Update README with more information about the project
README.md CHANGED
```diff
@@ -1,5 +1,5 @@
 ---
-title:
+title: Unit 8 Final Project - End-to-End AI Solution Implementation
 emoji: 🚀
 colorFrom: yellow
 colorTo: blue
@@ -7,7 +7,83 @@ sdk: gradio
 sdk_version: 6.0.1
 app_file: app.py
 pinned: false
-short_description:
+short_description: Multimodal image captioning & vibe evaluation.
 ---
 
-
```
# Assignment 8 – Multimodal Image Captioning & Vibe Evaluation

This Space implements a **multimodal AI web app** for my AI Solutions class.
The app compares two **image captioning models** on the same image, analyzes the emotional *“vibe”* of each caption, and evaluates model performance using **NLP metrics**.

The goal is to explore how **Vision-Language Models (VLMs)** and **text-based models (LLM-style components)** can work together in a single pipeline, and to provide a clear interface for testing and analysis.

---
## 🧠 What This App Does

Given an image and a user-provided *ground truth* caption, the app:

1. **Generates captions** with two image captioning models:
   - **Model 1:** BLIP image captioning
   - **Model 2:** ViT-GPT2 image captioning

2. **Detects the emotional “vibe”** of each caption using a **zero-shot text classifier** with labels such as:
   - Peaceful / Calm
   - Happy / Joy
   - Sad / Sorrow
   - Angry / Upset
   - Fear / Scared
   - Action / Violence

3. **Evaluates the captions** against the ground truth using NLP techniques:
   - **Semantic similarity** via `sentence-transformers` (cosine similarity)
   - **ROUGE-L** via the `evaluate` library (longest-common-subsequence word overlap)

4. **Displays all results** in a Gradio interface:
   - Captions for each model
   - Vibe labels + confidence scores
   - A summary block with similarity and ROUGE-L scores

This makes it easy to see not just *what* the models say, but also *how close* they are to a human caption and *how the wording affects the emotional tone*.
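In code, this flow maps naturally onto Hugging Face `transformers` pipelines. The sketch below is a minimal illustration under assumptions, not the exact contents of `app.py`: the checkpoint names (`Salesforce/blip-image-captioning-base`, `nlpconnect/vit-gpt2-image-captioning`, `facebook/bart-large-mnli`) are plausible choices for the BLIP, ViT-GPT2, and zero-shot components, and the helper function is illustrative.

```python
from transformers import pipeline
from PIL import Image

# Two image-captioning pipelines (the BLIP and ViT-GPT2 models described above).
blip_captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
vit_gpt2_captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

# Zero-shot classifier used to label the emotional "vibe" of a caption.
vibe_classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

VIBE_LABELS = [
    "peaceful / calm", "happy / joy", "sad / sorrow",
    "angry / upset", "fear / scared", "action / violence",
]

def caption_and_vibe(image: Image.Image) -> dict:
    """Generate one caption per model and classify each caption's vibe."""
    results = {}
    for name, captioner in [("BLIP", blip_captioner), ("ViT-GPT2", vit_gpt2_captioner)]:
        caption = captioner(image)[0]["generated_text"]
        vibe = vibe_classifier(caption, candidate_labels=VIBE_LABELS)
        results[name] = {
            "caption": caption,
            "vibe": vibe["labels"][0],            # highest-scoring label
            "confidence": round(vibe["scores"][0], 3),
        }
    return results
```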
---

## 🔍 Models & Libraries Used

- **Vision-Language Models (VLMs) for captioning**
  - BLIP image captioning model
  - ViT-GPT2 image captioning model

- **Text / NLP Components**
  - Zero-shot text classifier for vibe detection
  - `sentence-transformers/all-MiniLM-L6-v2` for semantic similarity
  - `evaluate` library for ROUGE-L

- **Framework / UI**
  - [Gradio](https://gradio.app/) for the web interface
  - Deployed as a **Hugging Face Space** (this repo)
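The evaluation step can be sketched in a few lines, assuming the components named above: `sentence-transformers/all-MiniLM-L6-v2` for embeddings and the `evaluate` library's ROUGE implementation. The function name and return format here are illustrative, not necessarily what `app.py` does.

```python
import evaluate
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
rouge = evaluate.load("rouge")

def score_caption(candidate: str, ground_truth: str) -> dict:
    """Compare a generated caption against the user-provided ground-truth caption."""
    # Semantic similarity: cosine similarity between sentence embeddings.
    embeddings = embedder.encode([candidate, ground_truth], convert_to_tensor=True)
    similarity = util.cos_sim(embeddings[0], embeddings[1]).item()

    # ROUGE-L: longest-common-subsequence overlap with the ground truth.
    rouge_scores = rouge.compute(predictions=[candidate], references=[ground_truth])

    return {
        "semantic_similarity": round(similarity, 3),
        "rouge_l": round(rouge_scores["rougeL"], 3),
    }
```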
---

## 🖼️ How to Use the App

1. **Upload an image**
   - Use one of the provided example images or upload your own.

2. **Enter a ground truth caption**
   - Type a short sentence that, in your own words, best describes the image.

3. **Click “Submit”**
   - The app will:
     - Run both captioning models
     - Classify the vibe of each caption
     - Compute similarity and ROUGE-L vs. your ground truth

4. **Review the outputs**
   - Compare how each model describes the scene
   - Check if the vibe matches what you expect
   - Look at the metrics to see which caption is closer to your description
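For orientation, here is a rough sketch of how these steps could be wired into a Gradio `Interface`; `analyze` is a hypothetical placeholder for the full captioning + vibe + metrics pipeline, and the actual `app.py` may use different components or layout.

```python
import gradio as gr

def analyze(image, ground_truth_caption):
    # Placeholder for the real pipeline: caption the image with both models,
    # classify each caption's vibe, and score them against the ground truth.
    return f"Results for ground truth: {ground_truth_caption!r}"

demo = gr.Interface(
    fn=analyze,
    inputs=[
        gr.Image(type="pil", label="Image"),
        gr.Textbox(label="Ground truth caption"),
    ],
    outputs=gr.Textbox(label="Captions, vibes, and metrics"),
    title="Multimodal Image Captioning & Vibe Evaluation",
)

if __name__ == "__main__":
    demo.launch()
```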
---