hermes-detective/docs/ideas/007-vision-spot-the-difference.md

# Idea 007: Spot the Difference Agent

**Date:** 2026-04-19
**Status:** Idea
**Tags:** hermes-agent, kimi-vision, puzzle, gamification, webapp

## Concept

A daily "Spot the Difference" puzzle webapp in which an AI pipeline (Kimi for vision, Hermes for orchestration) compares two images and shows, step by step, how it finds the differences.

**Core insight:** Lean on the model's visual analysis strength and keep the reasoning load low.

## User Flow

1. User opens webapp → sees today's "Spot the Difference" puzzle (two similar images)
2. User can play manually (click on differences) OR
3. User clicks "Let AI Solve" → watches AI's step-by-step analysis
4. AI shows its reasoning process: "Scanning left-to-right... Found difference #1: color mismatch in top-left..."
5. Leaderboard shows attempt stats (anonymous)
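
For manual play (step 2), each click has to be matched against the known difference regions. A minimal sketch, assuming each difference is stored as a bounding box in normalized coordinates (fractions of image width and height); `Region` and `hit_test` are hypothetical names, not part of any existing code:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Region:
    """Hypothetical difference region in normalized coordinates (0.0 to 1.0)."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


def hit_test(regions: Sequence[Region], px: float, py: float) -> Optional[int]:
    """Return the index of the first region containing the click, or None."""
    for index, region in enumerate(regions):
        if region.contains(px, py):
            return index
    return None
```

Normalized coordinates keep the check independent of how large the images are rendered in the browser.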

## Why This Works

| Aspect | Implementation |
|--------|----------------|
| **Visual Analysis** | Kimi compares the images at both the pixel level and the semantic level |
| **Low Reasoning** | Pattern matching, not complex logic |
| **Step-by-Step** | Show each finding with visual highlight |
| **Gamification** | Daily puzzle, leaderboard, no auth |

## Puzzle Types

### Primary: Spot the Difference (v1)
- Two images with subtle differences
- Kimi identifies all differences
- Each found difference highlighted + explanation

### Secondary (future):
- Find the anomaly (what's wrong in this image?)
- Count the objects (how many X in this image?)
- What's different? (semantic analysis)

## Technical Stack

| Component | Source | Role |
|-----------|--------|------|
| Frontend | Single HTML page | Display puzzle, show AI process |
| Image Analysis | Kimi Vision (via gateway) | Compare images, find differences |
| Orchestration | Hermes Agent | Coordinate flow, format output |
| Image Gen | Pollinations AI | Generate daily puzzle pairs |

### Daily Puzzle Generation
```
Hermes + Pollinations → Generate base image
Hermes + Pollinations → Generate modified image (with subtle changes)
Store both → Serve to users daily
```
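
The generation flow above could start from something like the sketch below, which only builds the two image URLs. It assumes Pollinations exposes a prompt-in-URL endpoint with `seed`, `width`, and `height` query parameters; the base URL, parameter names, and the helper names `image_url` and `puzzle_pair_urls` are assumptions to verify against the current Pollinations documentation. Whether a shared seed keeps the two scenes close enough for a fair puzzle is untested here.

```python
from urllib.parse import quote, urlencode

# Assumption: Pollinations serves images from a prompt-in-URL endpoint of this
# form; check the base URL and parameter names against its current docs.
POLLINATIONS_BASE = "https://image.pollinations.ai/prompt/"


def image_url(prompt: str, seed: int, width: int = 768, height: int = 768) -> str:
    """Build one image URL with the prompt percent-encoded into the path."""
    query = urlencode({"seed": seed, "width": width, "height": height})
    return f"{POLLINATIONS_BASE}{quote(prompt, safe='')}?{query}"


def puzzle_pair_urls(base_prompt: str, change: str, seed: int) -> tuple[str, str]:
    """Same seed for both images; only the prompt differs by the requested change."""
    modified_prompt = f"{base_prompt}, {change}"
    return image_url(base_prompt, seed), image_url(modified_prompt, seed)
```

For example, `puzzle_pair_urls("a cozy room", "red lamp", seed=7)` returns one URL for the base scene and one for the scene with the change appended to the prompt.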

### AI Solving Process
```
1. Hermes receives both images
2. Send to Kimi Vision for analysis
3. Kimi returns list of differences with locations
4. Hermes formats step-by-step explanation
5. Frontend animates each finding
```
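
Steps 3 and 4 hinge on turning the model's reply into structured data. A minimal sketch, assuming Kimi is prompted to answer with JSON of the form `{"differences": [{"description", "location", "box"}]}`, where `box` is `[x, y, width, height]` in normalized coordinates. This schema and the names `Difference` and `parse_differences` are hypothetical, not a documented Kimi or Hermes format:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Difference:
    description: str
    location: str                             # e.g. "Top-left corner"
    box: tuple[float, float, float, float]    # x, y, width, height in 0..1


def parse_differences(raw: str) -> list[Difference]:
    """Parse the JSON reply, skipping entries whose box is missing or out of range."""
    found = []
    for item in json.loads(raw).get("differences", []):
        box = item.get("box", [])
        if len(box) != 4 or not all(0.0 <= float(v) <= 1.0 for v in box):
            continue  # drop malformed or out-of-range boxes
        x, y, w, h = (float(v) for v in box)
        found.append(Difference(str(item.get("description", "")),
                                str(item.get("location", "")),
                                (x, y, w, h)))
    return found
```

Validating the boxes before they reach the frontend keeps a single bad entry from breaking the highlight animation.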

## Features

### Core
- [ ] Daily puzzle auto-rotates
- [ ] Two-image display (side by side)
- [ ] "Let AI Solve" button
- [ ] Step-by-step visualization of AI findings
- [ ] Show each difference with highlight + explanation
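
Daily rotation can be done without any scheduler by deriving the puzzle slot from the calendar date. A minimal sketch, assuming puzzles are pre-generated and stored in a fixed-size list; `EPOCH` (a hypothetical launch date) and `puzzle_index` are illustrative names:

```python
from datetime import date

# Hypothetical launch date; day 0 of the rotation.
EPOCH = date(2026, 4, 19)


def puzzle_index(today: date, puzzle_count: int) -> int:
    """Map a calendar date to a puzzle slot, wrapping around when the set runs out."""
    if puzzle_count <= 0:
        raise ValueError("puzzle_count must be positive")
    return (today - EPOCH).days % puzzle_count
```

With 30 stored puzzles, the same puzzle comes back every 30 days, so the stored set has to grow if repeats are unwanted.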

### Gamification (no auth)
- [ ] Attempt counter (per user session, localStorage)
- [ ] Leaderboard (anonymous, session-based)
- [ ] "Perfect solve" badge (AI found all differences on first pass)
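
One possible reading of the "Perfect solve" badge is that the AI's boxes cover every known difference with no extras. The sketch below assumes ground-truth boxes are recorded when the puzzle is generated and uses intersection-over-union (IoU) with a greedy one-to-one match; the 0.5 threshold and the names `iou` and `is_perfect_solve` are assumptions:

```python
Box = tuple[float, float, float, float]  # x, y, width, height (normalized)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    overlap_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = overlap_w * overlap_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def is_perfect_solve(truth: list[Box], found: list[Box], threshold: float = 0.5) -> bool:
    """True when every true box is matched by a distinct found box and none are left over."""
    unmatched = list(found)
    for target in truth:
        # Greedy: take the first unmatched found box that overlaps enough.
        match = next((f for f in unmatched if iou(target, f) >= threshold), None)
        if match is None:
            return False
        unmatched.remove(match)
    return not unmatched
```

Greedy matching is enough when differences are well separated; overlapping differences would need a proper assignment step.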

### Nice to Have
- [ ] Difficulty levels (Easy/Medium/Hard)
- [ ] Share result as image
- [ ] Hint system (Kimi finds 1, user finds rest)

## Step-by-Step Output Format

```
🔍 Scanning image...
✅ Difference #1 found: "The lamp color changed from blue to red"
   📍 Location: Top-left corner
   👆 [Highlighted on image]

✅ Difference #2 found: "Window shape is slightly different"
   📍 Location: Center-right
   👆 [Highlighted on image]

...

🎯 Solved! Found X differences in Y steps.
⏱️ Time: Z seconds
```
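
The format above can be produced by a small formatter on the Hermes side. A minimal sketch, assuming each finding carries a description and a location string; `Finding` and `format_steps` are hypothetical names, and the step count is passed in because the note does not define how steps are counted:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    description: str
    location: str


def format_steps(findings: list[Finding], steps: int, seconds: float) -> str:
    """Render findings in the step-by-step text layout shown above."""
    lines = ["🔍 Scanning image..."]
    for number, finding in enumerate(findings, start=1):
        lines.append(f'✅ Difference #{number} found: "{finding.description}"')
        lines.append(f"   📍 Location: {finding.location}")
        lines.append("   👆 [Highlighted on image]")
        lines.append("")  # blank line between findings
    lines.append(f"🎯 Solved! Found {len(findings)} differences in {steps} steps.")
    lines.append(f"⏱️ Time: {seconds:.1f} seconds")
    return "\n".join(lines)
```

The frontend could instead receive the findings as structured data and render each line itself, which would make the per-finding animation easier to time.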

## Comparison with Other Ideas

| Aspect | 001 Visual Narrative | 007 Spot the Difference |
|--------|---------------------|------------------------|
| Visual Analysis | Heavy | **Heavy** |
| Reasoning | Medium | **Light** |
| Demo Impact | High | **High** |
| Gamification | Low | **High** |
| Uniqueness | 7/10 | **9/10** |
| Step-by-Step | Yes | **Yes (more natural)** |

## Why Stronger than 001

1. **Tangible use case** — People actually play spot the difference
2. **Clear AI demonstration** — "Watch AI see what you see"
3. **Gamification** — Daily puzzle + leaderboard = engagement
4. **Low reasoning, high vision** — Perfect for Kimi's strength
5. **Step-by-step is natural** — Not forced, it's how you'd solve it

## Risks

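- ⚠️ Needs reliable daily puzzle generation, which is harder than it sounds: the modified image must match the base image everywhere except for a few intended changes
- ⚠️ Kimi's analysis quality depends on image complexity
- ⚠️ Needs a diverse puzzle set so that puzzles do not repeat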

## Next Steps

- [ ] Test Kimi's spot-the-difference capability
- [ ] Design puzzle generation pipeline
- [ ] Mock up webapp UI
- [ ] Prototype step-by-step visualization

## Related Ideas

- See: `001-visual-narrative-agent.md`