hermes-detective/docs/ideas/007-vision-spot-the-difference.md
shoko ecfd0b1160 feat: Initial commit - Hermes Detective Agency concept
- Hermes Detective Agency: Open-ended mystery investigation game
- Roles: Chief (human), Witness (Kimi), Detective (Hermes)
- 5 difficulty levels, community cases, open-ended solving
- Scoring: Alignment %, Evidence %, Time
- Features: Retry, Journal, Observe mode
- Tech: Kimi Vision + Hermes Agent + Pollinations

Changelog:
- Research phase: Kimi capabilities, Hermes agent, image APIs
- Brainstorming: 14 ideas explored
- Comparison matrix: Detective selected as winner
- Concept finalized with all design decisions
2026-04-20 00:00:30 +00:00


# Idea 007: Spot the Difference Agent
**Date:** 2026-04-19
**Status:** Idea
**Tags:** hermes-agent, kimi-vision, puzzle, gamification, webapp
## Concept
A daily "Spot the Difference" puzzle webapp where AI (Kimi + Hermes) analyzes two images and shows its step-by-step process in finding the differences.
**Core insight:** Lean on the models' visual analysis strength and keep the reasoning load low.
## User Flow
1. User opens webapp → sees today's "Spot the Difference" puzzle (two similar images)
2. User can play manually (click on differences) OR
3. User clicks "Let AI Solve" → watches AI's step-by-step analysis
4. AI shows its reasoning process: "Scanning left-to-right... Found difference #1: color mismatch in top-left..."
5. Leaderboard shows attempt stats (anonymous)
## Why This Works
| Aspect | Implementation |
|--------|----------------|
| **Visual Analysis** | Kimi compares the two images at both the pixel level and the semantic level |
| **Low Reasoning** | Pattern matching, not complex logic |
| **Step-by-Step** | Show each finding with visual highlight |
| **Gamification** | Daily puzzle, leaderboard, no auth |
## Puzzle Types
### Primary: Spot the Difference (v1)
- Two images with subtle differences
- Kimi identifies all differences
- Each found difference highlighted + explanation
### Secondary (future)
- Find the anomaly (what's wrong in this image?)
- Count the objects (how many X in this image?)
- What's different? (semantic analysis)
## Technical Stack
| Component | Source | Role |
|-----------|--------|------|
| Frontend | Single HTML page | Display puzzle, show AI process |
| Image Analysis | Kimi Vision (via gateway) | Compare images, find differences |
| Orchestration | Hermes Agent | Coordinate flow, format output |
| Image Gen | Pollinations AI | Generate daily puzzle pairs |
### Daily Puzzle Generation
```
Hermes + Pollinations → Generate base image
Hermes + Pollinations → Generate modified image (with subtle changes)
Store both → Serve to users daily
```
### AI Solving Process
```
1. Hermes receives both images
2. Send to Kimi Vision for analysis
3. Kimi returns list of differences with locations
4. Hermes formats step-by-step explanation
5. Frontend animates each finding
```
## Features
### Core
- [ ] Daily puzzle auto-rotates
- [ ] Two-image display (side by side)
- [ ] "Let AI Solve" button
- [ ] Step-by-step visualization of AI findings
- [ ] Show each difference with highlight + explanation
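One simple way to get the daily auto-rotation is to derive the puzzle index from the calendar date, so no scheduler or server-side state is needed. The sketch below assumes a pre-generated pool of puzzles; `LAUNCH_DAY`, `puzzle_index`, and `todays_index` are hypothetical names, and the launch date is a placeholder.

```python
from datetime import date, datetime, timezone

LAUNCH_DAY = date(2026, 4, 19)  # hypothetical launch date, used as day zero


def puzzle_index(today: date, pool_size: int, launch: date = LAUNCH_DAY) -> int:
    """Map a calendar day to a puzzle slot, wrapping around the pool."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return (today - launch).days % pool_size


def todays_index(pool_size: int) -> int:
    """Use the UTC date so all users switch puzzles at the same moment."""
    return puzzle_index(datetime.now(timezone.utc).date(), pool_size)
```

With a pool of 30 puzzles, the puzzle shown on launch day comes up again 30 days later, which is why pool size matters for avoiding visible repeats.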
### Gamification (no auth)
- [ ] Attempt counter (per user session, localStorage)
- [ ] Leaderboard (anonymous, session-based)
- [ ] "Perfect solve" badge (AI found all differences on first pass)
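Awarding the "Perfect solve" badge requires deciding whether the AI's findings match the differences that were planted. A possible check, assuming both sides are available as normalized `(x, y, width, height)` boxes, is a greedy match on intersection-over-union (IoU). The function names and the `0.3` threshold are illustrative choices, not something the concept specifies.

```python
Box = tuple[float, float, float, float]  # (x, y, width, height), normalized


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    overlap_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = overlap_w * overlap_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_perfect_solve(found: list[Box], truth: list[Box], threshold: float = 0.3) -> bool:
    """True when each planted difference is matched by a distinct finding
    and the AI reported no extra regions (greedy matching, not optimal)."""
    if len(found) != len(truth):
        return False
    unmatched = list(found)
    for target in truth:
        best = max(unmatched, key=lambda f: iou(f, target), default=None)
        if best is None or iou(best, target) < threshold:
            return False
        unmatched.remove(best)
    return True
```

Greedy matching is enough when the planted differences are well separated; overlapping regions would call for a proper assignment step.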
### Nice to Have
- [ ] Difficulty levels (Easy/Medium/Hard)
- [ ] Share result as image
- [ ] Hint system (Kimi finds 1, user finds rest)
## Step-by-Step Output Format
```
🔍 Scanning image...
✅ Difference #1 found: "The lamp color changed from blue to red"
📍 Location: Top-left corner
👆 [Highlighted on image]
✅ Difference #2 found: "Window shape is slightly different"
📍 Location: Center-right
👆 [Highlighted on image]
...
🎯 Solved! Found X differences in Y steps.
⏱️ Time: Z seconds
```
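The output format above could be produced by a small formatter along these lines. This is a sketch only: it assumes each finding arrives as a `(description, box)` pair with a normalized `(x, y, width, height)` box, and it maps the box center onto a 3×3 grid to get labels such as "Top-left". `location_label` and `format_steps` are hypothetical names.

```python
def location_label(x: float, y: float, w: float, h: float) -> str:
    """Name the 3x3 grid cell that contains the center of the box."""
    cx, cy = x + w / 2, y + h / 2
    col = "left" if cx < 1 / 3 else "right" if cx > 2 / 3 else "center"
    row = "Top" if cy < 1 / 3 else "Bottom" if cy > 2 / 3 else "Center"
    if row == "Center" and col == "center":
        return "Center"
    return f"{row}-{col}"


def format_steps(findings, steps: int, seconds: float) -> str:
    """Render findings as the step-by-step transcript shown to the user.

    `findings` is a list of (description, (x, y, w, h)) pairs.
    """
    lines = ["🔍 Scanning image..."]
    for number, (description, box) in enumerate(findings, start=1):
        lines.append(f'✅ Difference #{number} found: "{description}"')
        lines.append(f"📍 Location: {location_label(*box)}")
        lines.append("👆 [Highlighted on image]")
    lines.append(f"🎯 Solved! Found {len(findings)} differences in {steps} steps.")
    lines.append(f"⏱️ Time: {seconds:.1f} seconds")
    return "\n".join(lines)
```

Returning plain lines keeps the formatter independent of the frontend, which can then reveal one finding at a time as it animates the highlight.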
## Comparison with Other Ideas
| Aspect | 001 Visual Narrative | 007 Spot the Difference |
|--------|---------------------|------------------------|
| Visual Analysis | Heavy | **Heavy** |
| Reasoning | Medium | **Light** |
| Demo Impact | High | **High** |
| Gamification | Low | **High** |
| Uniqueness | 7/10 | **9/10** |
| Step-by-Step | Yes | **Yes (more natural)** |
## Why Stronger than 001
1. **Tangible use case** — People actually play spot the difference
2. **Clear AI demonstration** — "Watch AI see what you see"
3. **Gamification** — Daily puzzle + leaderboard = engagement
4. **Low reasoning, high vision** — Perfect for Kimi's strength
5. **Step-by-step is natural** — Not forced, it's how you'd solve it
## Risks
- ⚠️ Reliable daily puzzle generation is harder than it sounds: two separately generated images can differ everywhere, not only in the intended spots
- ⚠️ Kimi's analysis quality depends on image complexity
- ⚠️ Need a diverse puzzle set to avoid repeats
## Next Steps
- [ ] Test Kimi's spot-the-difference capability
- [ ] Design puzzle generation pipeline
- [ ] Mock up webapp UI
- [ ] Prototype step-by-step visualization
## Related Ideas
- See: `001-visual-narrative-agent.md`