Room Vibe Check: Two Tiny Models Read the Vibe of Your Room

Community Article Published June 9, 2026

3.8B parameters, zero cloud APIs, and your room gets a personality profile.

The Idea

Everyone's room says something about them. The stack of unread books. The fairy lights. The gaming chair next to a yoga mat. What if AI could read that vibe and roast you for it?

Room Vibe Check does exactly that. Upload a photo of your room, pick Roast or Kind mode, and get a personality profile for your space.

What You Get

  • Vibe Profile — a personality-style label (e.g., "Cozy Chaos Goblin")
  • One-liner — the punchline ("This room screams 'I'll organize tomorrow'")
  • Objects Noticed — key items the AI spotted
  • Vibe Score — 1-10 rating
  • Quick Tip — one cheap improvement suggestion

Toggle between Roast mode (brutally funny) and Kind mode (warm and encouraging) for very different results from the same photo.

The Pipeline: Two Models, 3.8B Total

Agent Model Size Job
Room Analyzer Florence-2-large 0.8B See what's in the room
Vibe Generator Qwen2.5-3B-Instruct 3.0B Generate the vibe profile
Total 3.8B

Stage 1: Florence-2 runs a more-detailed captioning task and OCR on the photo. It describes the room and reads any visible text — posters, book titles, signs on the wall. This gives the vibe generator rich material to work with.

Stage 2: Qwen2.5-3B takes that description plus the selected mode (Roast/Kind) and generates structured JSON with the vibe profile, score, objects, one-liner, and tip.

Two specialized models instead of one general-purpose model. Florence-2 is a vision specialist — small and fast. Qwen2.5-3B handles the creative writing. Together they're 3.8B, well under the 4B limit.

What I Learned

Florence-2 Is an Underrated Vision Model

At only 0.8B parameters, Florence-2-large produces surprisingly detailed room descriptions. It catches things like "a guitar leaning against the wall" or "fairy lights draped over the headboard" — exactly the kind of details that make vibe readings funny and specific.

Prompt Engineering for Personality

The hardest part wasn't the tech — it was getting Qwen2.5-3B to be genuinely funny in Roast mode without being mean, and genuinely warm in Kind mode without being generic. The system prompts went through several iterations. The key was giving it a persona ("You are a brutally funny interior design critic") rather than just instructions ("Be funny").

Structured Output from Small Models

Qwen2.5-3B occasionally produces malformed JSON. The fix: parse what you can, retry once if needed, and fall back to showing the raw room description if all else fails. Users would rather see partial results than an error screen.

Lazy Loading for HF Spaces

Both models load on first request, not at startup. HF Spaces has a build timeout — if model download exceeds it, the Space never starts. Lazy loading means instant boot, models download when someone actually clicks "Check My Vibe."

The Stack

  • Vision: Florence-2-large (0.8B) via transformers
  • Text: Qwen2.5-3B-Instruct (Q4_K_M GGUF) via llama-cpp-python
  • UI: Gradio 6.x
  • Hosting: Hugging Face Spaces (CPU)
  • Cloud APIs: None

Try It

Try Room Vibe Check on Hugging Face Spaces

Built for the Build Small Hackathon 2026 — Thousand Token Wood track. 3.8B total parameters, no cloud APIs, full agent trace transparency.

Community

Sign up or log in to comment