hermes-detective/docs/ideas/007-vision-spot-the-difference.md
shoko ecfd0b1160 feat: Initial commit - Hermes Detective Agency concept
2026-04-20 00:00:30 +00:00

Idea 007: Spot the Difference Agent

Date: 2026-04-19
Status: Idea
Tags: hermes-agent, kimi-vision, puzzle, gamification, webapp

Concept

A daily "Spot the Difference" puzzle webapp in which an AI pipeline (Kimi + Hermes) analyzes two images and shows, step by step, how it finds the differences.

Core insight: Lean on the model's visual-analysis strength and keep the reasoning load low.

User Flow

  1. User opens webapp → sees today's "Spot the Difference" puzzle (two similar images)
  2. User either plays manually (clicking on the differences), or
  3. Clicks "Let AI Solve" and watches the AI's step-by-step analysis
  4. AI shows its reasoning process: "Scanning left-to-right... Found difference #1: color mismatch in top-left..."
  5. Leaderboard shows anonymous attempt stats
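
The manual-play path in step 2 needs some way to decide whether a click landed on a difference. A minimal sketch, assuming each difference is stored as a bounding box in normalized image coordinates (0.0 to 1.0); `Region`, `find_hit`, and the tolerance value are hypothetical names and numbers chosen for illustration, not part of any existing code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Region:
    """Hypothetical bounding box in normalized image coordinates (0.0 to 1.0)."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    def contains(self, px: float, py: float, tolerance: float = 0.0) -> bool:
        # Grow the box by `tolerance` on every side so near misses still count.
        return (self.x - tolerance <= px <= self.x + self.width + tolerance
                and self.y - tolerance <= py <= self.y + self.height + tolerance)


def find_hit(regions: Sequence[Region], px: float, py: float,
             already_found: set, tolerance: float = 0.02) -> Optional[int]:
    """Return the index of the first not-yet-found region under the click, or None."""
    for index, region in enumerate(regions):
        if index not in already_found and region.contains(px, py, tolerance):
            return index
    return None
```

Normalized coordinates keep the check independent of how large the images are rendered, so the same puzzle data would work on phone and desktop layouts.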

Why This Works

| Aspect | Implementation |
|---|---|
| Visual Analysis | Kimi compares images pixel-level + semantic |
| Low Reasoning | Pattern matching, not complex logic |
| Step-by-Step | Show each finding with visual highlight |
| Gamification | Daily puzzle, leaderboard, no auth |

Puzzle Types

Primary: Spot the Difference (v1)

  • Two images with subtle differences
  • Kimi identifies all differences
  • Each found difference highlighted + explanation

Secondary (future):

  • Find the anomaly (what's wrong in this image?)
  • Count the objects (how many X in this image?)
  • What's different? (semantic analysis)

Technical Stack

| Component | Source | Role |
|---|---|---|
| Frontend | Single HTML page | Display puzzle, show AI process |
| Image Analysis | Kimi Vision (via gateway) | Compare images, find differences |
| Orchestration | Hermes Agent | Coordinate flow, format output |
| Image Gen | Pollinations AI | Generate daily puzzle pairs |

Daily Puzzle Generation

Hermes + Pollinations → Generate base image
Hermes + Pollinations → Generate modified image (with subtle changes)
Store both → Serve to users daily
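
The three steps above could be sketched as follows. This is an illustration under stated assumptions, not the actual pipeline: `PuzzleSpec`, `build_prompts`, and `generate_pair` are hypothetical names, and `generate` is a stand-in callable for whatever request Hermes would make to Pollinations (no real API signature is assumed). Reusing the same seed for both prompts is a guess at how to keep the two images similar; it does not guarantee that only the listed details change.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical signature: (prompt, seed) -> image bytes. In a real pipeline this
# would wrap the image-generation request; nothing about that API is assumed here.
ImageGenerator = Callable[[str, int], bytes]


@dataclass(frozen=True)
class PuzzleSpec:
    base_prompt: str
    modifications: Sequence[str]   # e.g. "the lamp is red instead of blue"
    seed: int


def build_prompts(spec: PuzzleSpec) -> tuple:
    """Return (base prompt, modified prompt) for one puzzle pair."""
    if not spec.modifications:
        raise ValueError("a puzzle needs at least one modification")
    changes = "; ".join(spec.modifications)
    modified = f"{spec.base_prompt}. Identical scene except: {changes}"
    return spec.base_prompt, modified


def generate_pair(spec: PuzzleSpec, generate: ImageGenerator) -> dict:
    """Generate both images with the same seed and keep the intended changes."""
    base_prompt, modified_prompt = build_prompts(spec)
    return {
        "base_image": generate(base_prompt, spec.seed),
        "modified_image": generate(modified_prompt, spec.seed),
        # Stored alongside the images so the solver's answer can be checked later.
        "expected_differences": list(spec.modifications),
    }
```

Keeping the intended modifications with the stored pair gives a reference answer for each daily puzzle, which later checks of the AI's findings could rely on.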

AI Solving Process

1. Hermes receives both images
2. Send to Kimi Vision for analysis
3. Kimi returns list of differences with locations
4. Hermes formats step-by-step explanation
5. Frontend animates each finding
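
Steps 3 and 4 depend on turning Kimi's reply into structured data. A minimal sketch, assuming Kimi is prompted to answer with a JSON array of objects carrying `description`, `location`, and an optional `box` of four numbers; that reply format, the `Difference` type, and `parse_differences` are all hypothetical and would need to match whatever the gateway actually returns.

```python
import json
from dataclasses import dataclass
from typing import Optional

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in


@dataclass(frozen=True)
class Difference:
    description: str
    location: str                  # e.g. "Top-left corner"
    box: Optional[tuple] = None    # assumed (x, y, width, height), normalized


def parse_differences(raw: str) -> list:
    """Parse the model's reply into Difference records, skipping malformed items."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the surrounding fence and an optional "json" language tag.
        text = text.strip("`")
        if text.lower().startswith("json"):
            text = text[4:]
    try:
        items = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []

    differences = []
    for item in items:
        if not isinstance(item, dict) or not isinstance(item.get("description"), str):
            continue  # ignore entries without a usable description
        box = item.get("box")
        valid_box = (isinstance(box, list) and len(box) == 4
                     and all(isinstance(v, (int, float)) for v in box))
        differences.append(Difference(
            description=item["description"],
            location=str(item.get("location", "unknown")),
            box=tuple(float(v) for v in box) if valid_box else None,
        ))
    return differences
```

Returning an empty list on unparseable output lets Hermes decide whether to retry the request or report a failed solve, rather than crashing mid-animation.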

Features

Core

  • Daily puzzle auto-rotates
  • Two-image display (side by side)
  • "Let AI Solve" button
  • Step-by-step visualization of AI findings
  • Show each difference with highlight + explanation
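
One simple way to get the daily auto-rotation is to derive the puzzle index from the calendar date, so no scheduler or mutable state is needed. A sketch under assumptions: `LAUNCH_DATE` is a placeholder value, the puzzles are assumed to live in an ordered list, and the day boundary is taken to be midnight UTC.

```python
from datetime import date, datetime, timezone
from typing import Optional

LAUNCH_DATE = date(2026, 4, 20)  # hypothetical launch day; index 0 is served then


def puzzle_index_for(day: date, puzzle_count: int) -> int:
    """Map a calendar day to a puzzle index, cycling once the set is exhausted."""
    if puzzle_count <= 0:
        raise ValueError("puzzle_count must be positive")
    days_since_launch = (day - LAUNCH_DATE).days
    return days_since_launch % puzzle_count


def todays_puzzle_index(puzzle_count: int, now: Optional[datetime] = None) -> int:
    """Use the UTC date so every visitor sees the same puzzle at the same moment."""
    current = now if now is not None else datetime.now(timezone.utc)
    return puzzle_index_for(current.astimezone(timezone.utc).date(), puzzle_count)
```

With this scheme the set repeats after `puzzle_count` days, so the size of the stored puzzle pool directly determines how soon players see a repeat.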

Gamification (no auth)

  • Attempt counter (per user session, localStorage)
  • Leaderboard (anonymous, session-based)
  • "Perfect solve" badge (AI found all differences on first pass)
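
An anonymous, session-based leaderboard could be as small as the sketch below. It assumes the browser generates a random session ID, keeps it in localStorage, and sends it with each result; the `Leaderboard` class, its ranking rule (fewest attempts, then fastest time), and the in-memory storage are illustrative choices rather than a settled design.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(order=True, frozen=True)
class Entry:
    attempts: int                             # primary sort key: fewer is better
    seconds: float                            # tie-breaker: faster is better
    session_id: str = field(compare=False)    # anonymous ID, ignored when ranking


class Leaderboard:
    """Keeps each session's best result for one puzzle, in memory only."""

    def __init__(self) -> None:
        self._best = {}

    def submit(self, session_id: str, attempts: int, seconds: float) -> None:
        entry = Entry(attempts, seconds, session_id)
        current = self._best.get(session_id)
        # Only replace a stored result when the new one ranks strictly better.
        if current is None or entry < current:
            self._best[session_id] = entry

    def top(self, limit: int = 10) -> list:
        return sorted(self._best.values())[:limit]


def new_session_id() -> str:
    """Random identifier with no link to any account or personal data."""
    return uuid.uuid4().hex
```

Because nothing ties a session ID to a person, clearing localStorage simply creates a new player, which is an accepted trade-off of the no-auth approach.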

Nice to Have

  • Difficulty levels (Easy/Medium/Hard)
  • Share result as image
  • Hint system (Kimi reveals one difference, the user finds the rest)

Step-by-Step Output Format

🔍 Scanning image...
✅ Difference #1 found: "The lamp color changed from blue to red"
   📍 Location: Top-left corner
   👆 [Highlighted on image]

✅ Difference #2 found: "Window shape is slightly different"
   📍 Location: Center-right
   👆 [Highlighted on image]

...

🎯 Solved! Found X differences in Y steps.
⏱️ Time: Z seconds
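
The output format above could be produced by a small formatter like this one. It is a sketch: `format_solution` is a hypothetical helper, each finding is assumed to be a `(description, location)` pair, and the step count is passed in by the caller because the source does not define what counts as a step.

```python
from typing import Sequence


def format_solution(findings: Sequence[tuple], steps: int, seconds: float) -> str:
    """Render findings as the step-by-step text shown to the user."""
    lines = ["🔍 Scanning image..."]
    for number, (description, location) in enumerate(findings, start=1):
        lines.append(f'✅ Difference #{number} found: "{description}"')
        lines.append(f"   📍 Location: {location}")
        lines.append("   👆 [Highlighted on image]")
        lines.append("")  # blank line between findings
    lines.append(f"🎯 Solved! Found {len(findings)} differences in {steps} steps.")
    lines.append(f"⏱️ Time: {seconds:.1f} seconds")
    return "\n".join(lines)
```

Returning one string keeps the sketch short; a frontend that animates each finding would more likely consume the findings one at a time and render each group of lines as it arrives.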

Comparison with Other Ideas

| Aspect | 001 Visual Narrative | 007 Spot the Difference |
|---|---|---|
| Visual Analysis | Heavy | Heavy |
| Reasoning | Medium | Light |
| Demo Impact | High | High |
| Gamification | Low | High |
| Uniqueness | 7/10 | 9/10 |
| Step-by-Step | Yes | Yes (more natural) |

Why Stronger than 001

  1. Tangible use case — People actually play spot the difference
  2. Clear AI demonstration — "Watch AI see what you see"
  3. Gamification — Daily puzzle + leaderboard = engagement
  4. Low reasoning, high vision — Perfect for Kimi's strength
  5. Step-by-step is natural — Not forced, it's how you'd solve it

Risks

  • ⚠️ Need reliable daily puzzle generation (harder than it sounds)
  • ⚠️ Kimi's analysis quality depends on image complexity
  • ⚠️ Need a diverse puzzle set to avoid repeats
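
The analysis-quality risk can be measured rather than guessed at. A minimal sketch of one way to score a solve, assuming both the model's findings and the known differences are available as `(x, y, width, height)` boxes: each predicted box is greedily matched to the best-overlapping unmatched reference box using intersection-over-union (IoU). The `score_solve` name, the greedy matching, and the 0.3 threshold are illustrative assumptions.

```python
from typing import Sequence


def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    overlap_w = min(ax + aw, bx + bw) - max(ax, bx)
    overlap_h = min(ay + ah, by + bh) - max(ay, by)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0  # boxes do not overlap
    intersection = overlap_w * overlap_h
    union = aw * ah + bw * bh - intersection
    return intersection / union


def score_solve(predicted: Sequence[tuple], expected: Sequence[tuple],
                threshold: float = 0.3) -> dict:
    """Greedily match predictions to expected boxes and report precision/recall."""
    unmatched = list(expected)
    hits = 0
    for box in predicted:
        best = max(unmatched, key=lambda candidate: iou(box, candidate), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)  # each expected difference can be matched once
            hits += 1
    return {
        "hits": hits,
        "precision": hits / len(predicted) if predicted else 0.0,
        "recall": hits / len(expected) if expected else 0.0,
        # No misses and no false positives.
        "perfect": hits == len(expected) == len(predicted),
    }
```

Running this over puzzles of varying complexity would show where recall starts to drop, and the `perfect` flag is one possible basis for a "Perfect solve" badge.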

Next Steps

  • Test Kimi's spot-the-difference capability
  • Design puzzle generation pipeline
  • Mock up webapp UI
  • Prototype step-by-step visualization
  • See: 001-visual-narrative-agent.md