From ecfd0b11609108531a68da4c01e5f4a0d6fcd795 Mon Sep 17 00:00:00 2001
From: shoko <shoko@fbrns.co>
Date: Mon, 20 Apr 2026 00:00:30 +0000
Subject: [PATCH] feat: Initial commit - Hermes Detective Agency concept

- Hermes Detective Agency: Open-ended mystery investigation game
- Roles: Chief (human), Witness (Kimi), Detective (Hermes)
- 5 difficulty levels, community cases, open-ended solving
- Scoring: Alignment %, Evidence %, Time
- Features: Retry, Journal, Observe mode
- Tech: Kimi Vision + Hermes Agent + Pollinations

Changelog:
- Research phase: Kimi capabilities, Hermes agent, image APIs
- Brainstorming: 14 ideas explored
- Comparison matrix: Detective selected as winner
- Concept finalized with all design decisions
---
 .issues/001-hermes-hackathon-project.md      |  63 +++
 CHANGELOG.md                                 | 204 ++++++++
 docs/chosen-detective-game.md                | 502 +++++++++++++++++++
 docs/ideas/001-visual-narrative-agent.md     |  79 +++
 docs/ideas/007-vision-spot-the-difference.md | 138 +++++
 docs/ideas/008-visual-detective.md           | 397 +++++++++++++++
 docs/ideas/COMPARISON.md                     | 132 +++++
 docs/research-hermes-agent.md                |  47 ++
 docs/research-image-generation-apis.md       |  72 +++
 docs/research-kimi-visual-capabilities.md    |  51 ++
 10 files changed, 1685 insertions(+)
 create mode 100644 .issues/001-hermes-hackathon-project.md
 create mode 100644 CHANGELOG.md
 create mode 100644 docs/chosen-detective-game.md
 create mode 100644 docs/ideas/001-visual-narrative-agent.md
 create mode 100644 docs/ideas/007-vision-spot-the-difference.md
 create mode 100644 docs/ideas/008-visual-detective.md
 create mode 100644 docs/ideas/COMPARISON.md
 create mode 100644 docs/research-hermes-agent.md
 create mode 100644 docs/research-image-generation-apis.md
 create mode 100644 docs/research-kimi-visual-capabilities.md

diff --git a/.issues/001-hermes-hackathon-project.md b/.issues/001-hermes-hackathon-project.md
new file mode 100644
index 0000000..3f4a8cd
--- /dev/null
+++ b/.issues/001-hermes-hackathon-project.md
@@ -0,0 +1,63 @@
+# Issue 001: Hermes Agent Creative Hackathon Project
+
+**Status:** active  
+**Created:** 2026-04-19  
+**Tags:** hackathon, hermes-agent, creative  
+
+## Summary
+
+Participate in the Hermes Agent Creative Hackathon. 16 days, $25k in prizes across two tracks.
+
+## Tracks
+
+| Track | Pool | 1st | 2nd | 3rd |
+|-------|------|-----|-----|-----|
+| Main | $15,000 | $10,000 | $3,500 | $1,500 |
+| Kimi | $5,000 | $3,500 | $1,000 | $500 |
+
+Plus $5k in Kimi Credits for winners.
+
+## Requirements
+
+- Submit via Discord (`creative-hackathon-submissions` channel)
+- Tweet demo video + writeup tagging @NousResearch
+- Kimi Track: must prove Kimi model usage (eligible for both tracks)
+- Judged on: creativity, usefulness, presentation
+- Deadline: EOD Sunday, May 3rd
+
+## Creative Domains
+
+Video, image, audio, 3D, long-form writing, creative software, interactive media
+
+## Next Steps
+
+- [ ] Brainstorm 3-5 project ideas
+- [ ] Compare ideas (uniqueness, feasibility, wow factor)
+- [ ] Decide on final concept
+- [ ] Build prototype
+- [ ] Test & iterate
+- [ ] Produce submission video + writeup
+
+## Research Summary
+
+### Kimi Visual Strengths (Validated)
+- **Video understanding** - Unique multimodal capability
+- **Visual coding** - Kimi Code, Kimi Claw
+- **Image + Text** - Full multimodal support
+- **Strong benchmarks** - SWE-bench 65.8%, Tau2-Bench ~64%
+
+### Hermes Agent Capabilities
+- **Function calling** - Trained for reliable tool use
+- **Structured output** - JSON/Pydantic
+- **OpenAI-compatible** - Easy integration
+- **Multi-turn agents** - Agentic workflow execution
+
+### Potential Project Concept
+**Visual Agent Pipeline:** Hermes Agent orchestrates workflows, Kimi K2.5 handles visual understanding (image/video analysis), creating a creative visual AI application.
+
+### Kimi Track Eligibility
+To qualify for the Kimi Track, the demo video must prove Kimi model usage.
+
+## Notes
+
+Session started 2026-04-19. Working directory: `/tmp/hermes-hack`
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..6e024f3
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,204 @@
+# Session Log
+
+## 2026-04-19
+
+### 001 - Session Start: Hermes Hackathon
+
+**What:** Started Hermes Agent Creative Hackathon collaboration.
+
+**Context:**
+- Hackathon: 16 days, $25k prizes (Main $15k, Kimi $5k, $5k Kimi credits)
+- Presented by Kimi Moonshot & Nous Research
+- Two tracks: Main (any creative use) and Kimi Track (must use Kimi models)
+- Deadline: EOD Sunday, May 3rd
+
+**Action:**
+- Set up workflow structure (`.issues/`, `docs/`, git init)
+- Created first issue file: `001-hermes-hackathon-project.md`
+
+**Next:**
+- Define project concept and creative domain focus
+- Explore Hermes Agent capabilities
+- Sketch initial prototype idea
+
+### 002 - Research Completed
+
+**What:** Validated Kimi and Hermes Agent capabilities.
+
+**Findings:**
+- Kimi K2.5: multimodal (text+image+video), video understanding, visual coding
+- Kimi benchmarks: SWE-bench 65.8%, Tau2 ~64%
+- Hermes 3: function calling, structured output, OpenAI-compatible
+- Hermes built-in skills: manim-video, ascii-video, ascii-art (accessibility-focused)
+
+**Action:**
+- Created `docs/research-kimi-visual-capabilities.md`
+- Created `docs/research-hermes-agent.md`
+- Created `docs/research-image-generation-apis.md`
+- Updated issue file with research summary
+
+**Next:**
+- Define concrete project concept
+- Choose specific creative angle (visual coding? video analysis? image generation?)
+- Start rapid prototyping
+
+---
+
+### 003 - Image Gen API Research
+
+**What:** Found an affordable/free image generation API.
+
+**Findings:**
+- **Pollinations AI** ✅: Free tier, OpenAI-compatible, multiple models (Flux, etc.)
+  - Endpoint: `https://gen.pollinations.ai/image/{prompt}`
+  - Simple: just curl it; no auth needed for basic use
+  - Models: flux, zimage, wan-image, qwen-image, gptimage
+  - Cost: Free tier (pollen credits), $1 ≈ 1 Pollen paid
+
+**Action:**
+- Created `docs/research-image-generation-apis.md`
+- Updated idea 001 with image gen options
+
+**Next:**
+- Sketch more project ideas for comparison
+- Build an idea comparison matrix
+
+---
+
+### 004 - Brainstorming Session
+
+**What:** Generated 7 project ideas, deeper dive on Idea 007.
+
+**Ideas Generated:**
+1. 001: Visual Narrative Agent (text → image loop)
+2. 002: Visual Memory Journal (AI scrapbook)
+3. 003: Reverse Design Critic (UI critique + fix)
+4. 004: Visual Poem Generator (two-AI art collaboration)
+5. 005: Scene-to-Scene Video Storyteller (visual journey)
+6. 006: Real-time Visual Debugger (screenshot → fix)
+7. 007: Spot the Difference Agent (NEW FOCUS)
+
+**User Preferences:**
+- Want high visual analysis, low reasoning
+- Single page webapp, no auth
+- Show step-by-step AI process
+- Gamification (leaderboard, daily puzzles)
+
+**Selected for deeper dive:** 007 Spot the Difference Agent
+
+**Action:**
+- Created `docs/ideas/007-vision-spot-the-difference.md`
+
+**Next:**
+- Compare all ideas to pick winner
+
+---
+
+### 005 - Ideas Comparison
+
+**What:** Created comparison matrix for all brainstormed ideas.
+
+**Ideas Compared:** 14 concepts across the visual-game, interactive, and creative categories
+
+**Scoring Criteria:**
+- Visual Analysis (30%)
+- Multi-Turn (20%)
+- Human-AI Interaction (20%)
+- Cost Efficiency (15%)
+- Uniqueness (10%)
+- Fun (5%)
+
+**Results:**
+
+| Rank | Idea | Score |
+|------|------|-------|
+| 🥇 | **033v2 Detective** | **4.7** |
+| 🥈 | Auction | 3.9 |
+| 🥉 | 032v2 Art Critic | 3.7 |
+| 4 | 013 Image Alchemy | 3.6 |
+| 5 | 009 Image Tarot | 3.5 |
+
+**Winner: 033v2 Detective**
+
+**Why:**
+- Best multi-turn (5+ rounds)
+- Human actively directs (Chief role)
+- Kimi does real visual work
+- Cost efficient (mostly text)
+- Natural mystery/narrative arc
+
+**Action:**
+- Created `docs/ideas/COMPARISON.md`
+- Created `docs/ideas/008-visual-detective.md` (includes multi-agent v2)
+
+**Next:**
+- Discuss and finalize concept details
+
+---
+
+### 006 - Concept Documented
+
+**What:** Documented the chosen 033v2 Detective concept in `chosen-detective-game.md`.
+
+**Documented:**
+- Elevator pitch
+- Game roles (Chief, Witness, Detective)
+- Evidence types (crime scene, documents, photos, etc.)
+- Round structure (7 rounds per case)
+- Scoring system
+- UI concept sketch
+- Difficulty tiers (Rookie → Chief)
+- Daily cases + leaderboard
+
+**Action:**
+- Created `docs/chosen-detective-game.md`
+
+**Open Questions (for discussion):**
+1. How much does Witness describe unprompted?
+2. Can Detective be wrong?
+3. Red herrings — yes/no?
+4. Plot twist mid-case?
+5. Timer?
+6. Replay past cases?
+7. Hints system?
+8. Skip evidence?
+9. Case sources (pre-made/generated)?
+10. Image sources (real/AI/illustrated)?
+11. Share results?
+12. Community cases?
+
+**Next:**
+- Discuss and finalize concept details
+
+---
+
+### 007 - Concept Finalized
+
+**What:** Finalized Hermes Detective Agency concept after extensive discussion.
+
+**Key Decisions:**
+- **Difficulty:** 5 levels (Easy → Impossible), one case per day
+- **Open-ended solving:** No single truth, multiple valid theories
+- **Scoring:** Alignment %, Evidence cited %, Time (turns × 10 min)
+- **Hints:** Embedded in evidence (too obvious, barely obvious, not too obvious)
+- **Witness:** Dynamic appearance based on triggers (harder cases)
+- **Truth reveal:** Available anytime, doesn't end game
+- **Retry:** Unlimited attempts, every attempt documented
+- **Journal:** Private by default, publish stats/journal optional
+- **Observe:** Watch others' published solves
+- **Case source:** 5 starter cases (one per difficulty) + community generation
+- **Community:** Visits + reviews (no auth, manipulable but requires effort)
+- **Discovery:** Jungle (browse) vs path (direct links from creator)
+- **Case format:** YAML-based template
+- **Creator tools:** Hermes skill + format validator
+
+**Action:**
+- Updated `docs/chosen-detective-game.md` with full finalized concept
+
+**Next:**
+- Technical architecture
+- UI/UX design
+- Prompt engineering
+- Prototype development
+
+---
diff --git a/docs/chosen-detective-game.md b/docs/chosen-detective-game.md
new file mode 100644
index 0000000..07c2b7f
--- /dev/null
+++ b/docs/chosen-detective-game.md
@@ -0,0 +1,502 @@
+# Project: Hermes Detective Agency
+
+**Chosen Concept:** 033v2 Detective  
+**Date:** 2026-04-19  
+**Status:** Concept Finalized  
+**Tags:** hermes-agent, kimi-vision, game, multi-agent, open-ended, community
+
+---
+
+## Concept Summary
+
+A mystery investigation game where a human (Chief) directs two AI agents — a **Witness** (powered by Kimi Vision) and a **Detective** (powered by Hermes) — to investigate visual cases.
+
+**Core philosophy:** Open-ended solving. No single truth. Evidence guides, but multiple theories are valid.
+
+---
+
+## Elevator Pitch
+
+> *"You're the Chief. Your Witness sees everything. Your Detective connects the dots. Build YOUR theory. See how it aligns with others."*
+
+---
+
+## The Story
+
+You run a small detective agency. Your two AI assistants have superhuman abilities:
+
+- **Witness** can look at any image and describe it perfectly — every detail, every inconsistency, every hidden clue.
+- **Detective** can take those observations and build theories, spot patterns, and identify suspects.
+
+Your job? **Direct the investigation.** Tell them what to look at. Ask the right questions. Build your theory.
+
+**Key difference:** There's no single "right answer." The creator has an intended story, but your theory is valid if evidence supports it.
+
+---
+
+## Game Roles
+
+### Chief (Human)
+The player. You run the investigation.
+
+| Action | Effect |
+|--------|--------|
+| Examine evidence | Witness + Kimi analyze |
+| Question suspects | Detective probes, Witness watches |
+| Compare items | Kimi highlights differences |
+| Build theory | Cite evidence, form conclusion |
+| Request truth | See creator's intended story (optional) |
+
+### Witness (Agent A + Kimi)
+The eyes. Analyzes visual evidence. Appears based on triggers.
+
+| Input | Output |
+|-------|--------|
+| Crime scene photo | "I see glass shards, muddy footprints, a broken frame..." |
+| Suspect photo | "This person has paint on their sleeve..." |
+| Document | Extracts text, notes inconsistencies |
+| Item close-up | Identifies details Chief might miss |
+
+**Dynamic Appearance:** In harder cases, Witness doesn't appear until triggered.
+
+### Detective (Agent B)
+The brain. Builds theories, responds to questions.
+
+| Input | Output |
+|-------|--------|
+| Witness observations | "Based on evidence, the thief entered through..." |
+| Suspect profiles | "Suspect A has motive: insurance fraud..." |
+| Human questions | "Good question, Chief. Let me look into that..." |
+| Theory building | Helps Chief cite evidence for their theory |
+
+---
+
+## Difficulty System
+
+### Difficulty Levels
+
+| Difficulty | Description | Evidence | Suspects | Red Herrings | Plot Twist |
+|-----------|-------------|----------|----------|---------------|------------|
+| **Easy** | Obvious clues, clear path | 4-5 | 2 | ❌ | ❌ |
+| **Medium** | Requires comparison | 6-7 | 3 | ❌ | ❌ |
+| **Hard** | Red herrings present | 8-9 | 4 | ✅ | ❌ |
+| **Hardcore** | Plot twist mid-case | 10-11 | 4 | ✅ | ✅ |
+| **Impossible** | All elements, complex | 12+ | 5 | ✅ | ✅ |
+
+### Daily Structure
+
+```
+One case per day, everyone gets the same case
+Same difficulty for all players
+Different case each day
+```
+
+### Starter Pack (5 Cases)
+
+| Week | Difficulty | Theme |
+|------|------------|-------|
+| 1 | Easy | Simple theft |
+| 2 | Medium | Missing person |
+| 3 | Hard | Corporate fraud |
+| 4 | Hardcore | Art heist (plot twist) |
+| 5 | Impossible | Multi-layered conspiracy |
+
+**Approach:** Add cases incrementally during development.
+
+---
+
+## Evidence System
+
+### Evidence Types
+
+| Type | What Kimi Sees | Example Clue |
+|------|---------------|--------------|
+| **Crime scene** | Scene layout, objects, anomalies | "Window was broken from inside" |
+| **Surveillance** | People, actions, timestamps | "Person lingered at door for 3 minutes" |
+| **Documents** | Text, handwriting, context | "Letter mentions 'meeting at midnight'" |
+| **Photos** | People, items, locations | "Suspect's shoes match the footprint" |
+| **Maps** | Routes, access points, exits | "Only one entrance visible to street" |
+| **Items** | Condition, marks, connections | "Key is copy — grooves don't match original" |
+
+### Evidence Citation
+
+Evidence helps build theory. Not all evidence is required.
+
+```
+Chief's Theory: "I think Suspect B did it."
+
+📎 Cited Evidence:
+- Evidence #3: Crime scene photo
+- Evidence #5: Security footage
+- Evidence #8: Witness testimony
+→ 3/10 evidence cited (30%)
+
+💬 Detective: "That's a solid theory. The evidence 
+supports B, but have you considered Evidence #7?"
+```
+
+### Hints Embedded in Evidence
+
+Not a separate button. Hints are part of the evidence design.
+
+| Level | Visibility | Example |
+|-------|-----------|---------|
+| **Too obvious** | Easy to find | "Letter saying 'I did it'" |
+| **Barely obvious** | Check certain places | "Muddy shoes near suspect's home" |
+| **Not too obvious** | Requires attention | "Timeline inconsistency in letter" |
+
+### Witness Trigger System
+
+In harder cases, Witness appears based on triggers.
+
+```
+Trigger Example:
+Turn 1: Chief examines crime scene photo
+Turn 2: Chief finds a hair sample on the floor
+   ↓ [Trigger activated]
+Turn 3: 👁️ Witness appears
+   ↓ "I recognize this hair... it belongs to Suspect B's dog"
+Turn 4: Chief examines suspect's home
+Turn 5: 👁️ Witness appears again (new trigger)
+   ↓ "I saw Suspect B leaving the gallery at midnight..."
+```
+
+**Indicator:** Each piece of evidence has a note indicating if it triggers Witness appearance.
+
+---
+
+## Open-Ended Solving
+
+### Core Philosophy
+
+> **No single truth. Multiple valid theories.**
+
+| Before | After |
+|--------|-------|
+| One correct answer | Multiple valid theories |
+| Wrong accusation = Fail | Theory valid if evidence supports |
+| One winner | Everyone discusses |
+| Truth ends game | Truth is guidance, not mandate |
+
+### Theory Building
+
+```
+👤 Chief builds theory:
+"I think Suspect B did it, with help from Suspect A.
+B had access (night guard), A had keys (curator).
+They split the insurance money."
+
+📎 Chief cites evidence:
+- Evidence #3: Crime scene (window not broken)
+- Evidence #5: Security footage (B was inside)
+- Evidence #7: A has master keys
+- Evidence #9: Financial records (recent debt)
+
+💬 Detective responds:
+"That's a coherent theory. Your cited evidence
+supports collaboration between A and B."
+```
+
+### Truth Reveal
+
+**Available anytime. Does NOT end the game.**
+
+| When | Why |
+|------|-----|
+| After building theory | "Did I get it right?" |
+| When stuck | "Give me guidance" |
+| Never | "I want to figure it out myself" |
+| After solving | "See how close I was" |
+
+```
+📜 THE TRUTH (Creator's Intended)
+
+The case was designed as:
+"A and B collaborated. A had keys, B had access.
+But C was the real mastermind, funding the whole thing."
+
+👤 Your theory:
+"Suspect B acted alone."
+
+💬 Comparison:
+- Your theory missed the collaboration element
+- You correctly identified B as main actor
+- Evidence you cited: 80% relevant
+- 🎯 65% alignment with intended truth
+
+💬 But: Your theory is still valid based on evidence!
+Discussion continues. Truth is guidance, not mandate.
+```
+
+---
+
+## Scoring System
+
+### Per Case Statistics
+
+| Metric | Calculation |
+|--------|-------------|
+| **Time** | Turns × 10 min (simplified) |
+| **Evidence** | Evidence cited / Total evidence |
+| **Alignment** | How close to creator's intended story |
+| **Coherence** | Theory makes sense based on evidence |
+
+### Statistics Display
+
+```
+┌─────────────────────────────────────┐
+│ 📊 CASE STATISTICS                  │
+├─────────────────────────────────────┤
+│ ⏱️ Time: 6 turns × 10 min = 60 min │
+│ 📎 Evidence: 7/10 cited (70%)       │
+│ 🎯 Alignment: 85% with creator     │
+│ 💬 Theory coherence: Strong        │
+├─────────────────────────────────────┤
+│ ⭐ Rating: Sharp Detective           │
+└─────────────────────────────────────┘
+```
+
+### Rating Tiers
+
+| Alignment | Rating |
+|-----------|--------|
+| 90-100% | Master Detective |
+| 75-89% | Sharp Detective |
+| 50-74% | Promising Detective |
+| 25-49% | Apprentice |
+| 0-24% | Rookie |
+
+---
+
+## Retry & Journal System
+
+### Multiple Attempts
+
+A user can solve the same case multiple times.
+
+```
+Case #47 — The Hartwell Heist
+
+Your Attempts:
+├── Attempt #1: 85% alignment, 6 turns 📖
+├── Attempt #2: 92% alignment, 4 turns 📖
+├── Attempt #3: In progress...
+└── Best: 92% alignment
+```
+
+### Journal Documentation
+
+Every attempt is documented (solved or not).
+
+```
+Attempt #1: April 19, 2026
+├── Status: Solved
+├── Evidence cited: 7/10
+├── Alignment: 85%
+├── Theory: "Suspect B acted alone"
+└── Notes: "Missed the A-B collaboration"
+```
+
+### Privacy Settings
+
+| Setting | Description |
+|---------|-------------|
+| **Private** | Only you see your attempts (default) |
+| **Publish stats** | Everyone sees your stats |
+| **Publish journal** | Anyone can read your solve |
+
+---
+
+## Replay (Observe Mode)
+
+Watch how others solved the case.
+
+```
+📺 OBSERVE MODE
+
+@alice's Solve of Case #47
+
+Turn 1: Examined crime scene
+Turn 2: Found hair sample → Witness appeared
+Turn 3: Questioned Suspect B
+Turn 4: Examined financial records
+Turn 5: Cited evidence, formed theory
+Turn 6: Requested truth reveal
+
+⏱️ 6 turns | 🎯 85% alignment | ⭐ Sharp
+```
+
+**Only published journals are observable.**
+
+---
+
+## Case Creation System
+
+### Starter Cases
+
+5 cases (one per difficulty) as templates.
+
+**Source:** Real solved cases adapted for the game.
+
+### Community Cases
+
+Anyone can create and share cases.
+
+#### Creation Flow
+
+```
+1. Choose reference case (optional)
+   "Let's base this on the Isabella Stewart Gardner theft"
+
+2. Gather/create evidence
+   Upload images (crime scene, suspects, documents)
+
+3. Write case brief
+   ├── Title, difficulty
+   ├── Suspect list (names, photos)
+   ├── Evidence set
+   ├── Hidden truth (creator's intended story)
+   ├── Red herrings (optional)
+   ├── Plot twist (optional)
+   └── Witness triggers (which evidence triggers Witness)
+
+4. Test it
+   Play through yourself to verify solvability
+
+5. Publish
+   ├── Private link (friends only)
+   └── Public (case library)
+```
+
+### Case Format
+
+```yaml
+case:
+  title: "The Hartwell Heist"
+  difficulty: medium
+  difficulty_description: "Requires comparison of evidence"
+  
+  evidence:
+    - id: 1
+      type: photo
+      image: crime_scene.jpg
+      description: "Crime scene photograph"
+      triggers_witness: true
+      hint_level: not_too_obvious
+    
+    - id: 2
+      type: document
+      image: letter.jpg
+      description: "Anonymous letter found"
+      triggers_witness: false
+      hint_level: barely_obvious
+  
+  suspects:
+    - name: "Suspect A"
+      photo: suspect_a.jpg
+      description: "Gallery curator"
+      
+  truth:
+    summary: "A and B collaborated..."
+    alignment_criteria:
+      - "Correctly identified collaboration"
+      - "Identified A as key holder"
+      - "Identified B as main actor"
+
+  witness_triggers:
+    - evidence_id: 1
+      testimony: "I see glass on the floor inside..."
+```
+
+### Case Creator Tools
+
+| Tool | Purpose |
+|------|---------|
+| **Skill** | Hermes skill for case creation guidance |
+| **Validator** | Verify case format is correct |
+
+---
+
+## Community Moderation
+
+### Discovery Philosophy
+
+> **Community cases are the jungle. Direct links are the path.**
+
+| Discovery Method | Quality | Effort |
+|-----------------|---------|--------|
+| Case library (browse) | Mixed (jungle) | Low |
+| Direct link from creator | Same quality | Medium |
+| Social media / community | Trusted (curated) | High |
+
+### Quality Signals
+
+| Signal | Description |
+|--------|-------------|
+| **Visits** | How many times case was played |
+| **Reviews** | 👍 or 👎 (no text, requires effort to spam) |
+
+```
+Case #47B — "The Missing Heirloom"
+├── Visits: 234
+├── 👍 45 | 👎 3
+└── Quality score: High
+```
+
+**Note:** Review manipulation is possible but requires effort. Not perfect, but workable.
+
+### Sharing Flow
+
+```
+Creator creates case
+    ↓
+Tests locally
+    ↓
+Publishes to community
+    ↓
+Shares link on social media / Discord
+    ↓
+Players open the case directly from the creator's link
+```
+
+---
+
+## Summary of Decisions
+
+| Element | Decision |
+|---------|----------|
+| Difficulty | 5 levels (Easy → Impossible) |
+| Daily structure | One case per day, same for all |
+| Timer | ❌ No (first phase) |
+| Hints | ✅ Embedded in evidence |
+| Retry | ✅ Unlimited attempts |
+| Journal | ✅ Every attempt documented |
+| Observe | ✅ Watch published solves |
+| Privacy | Private by default |
+| Publish | Stats and journal both optional |
+| Scoring | Alignment %, Evidence %, Time |
+| Open-ended | ✅ No single truth |
+| Truth reveal | Available anytime |
+| Case source | Real cases + community |
+| Witness | Dynamic (triggers in hard cases) |
+| Red herrings | ✅ Hard+ difficulty |
+| Plot twist | ✅ Hardcore+ difficulty |
+| Community | Visits + reviews (no auth) |
+
+---
+
+## What's Next
+
+Now that the concept is finalized:
+- Technical architecture
+- UI/UX design
+- Prompt engineering
+- Case creation template
+- Prototype development
+
+---
+
+## Related Documents
+
+- `docs/ideas/COMPARISON.md` — Full comparison matrix
+- `docs/ideas/008-visual-detective.md` — Initial brainstorm
diff --git a/docs/ideas/001-visual-narrative-agent.md b/docs/ideas/001-visual-narrative-agent.md
new file mode 100644
index 0000000..2dac70c
--- /dev/null
+++ b/docs/ideas/001-visual-narrative-agent.md
@@ -0,0 +1,79 @@
+# Idea 001: Visual Narrative Agent
+
+**Date:** 2026-04-19  
+**Status:** Idea  
+**Tags:** hermes-agent, kimi-vision, storytelling, image-generation
+
+## Concept
+
+An agentic storytelling system where Hermes orchestrates a narrative loop with Kimi's visual analysis and built-in image generation skills to produce coherent visual stories.
+
+## User Flow
+
+1. User provides text prompt (e.g., "A lone astronaut discovers an ancient alien garden on Mars")
+2. Hermes plans story structure (scenes, pacing, visual style)
+3. For each scene:
+   - Hermes generates image prompt
+   - Generate image (Hermes built-in skill: manim / ascii)
+   - Kimi analyzes generated image
+   - Kimi's feedback refines next scene's prompt
+4. Return compiled visual story to user
+
+## Key Differentiator
+
+Most story-to-image tools: **Generate → Done**
+
+This concept: **Generate → Analyze → Refine → Loop**
+
+Kimi serves as the **visual reasoning engine** — tells Hermes if the generated image matches the intended scene, catches inconsistencies, and informs prompt refinement for the next scene.
+
+## Tech Stack
+
+| Component | Source | Role |
+|-----------|--------|------|
+| Hermes Agent | Nous Research | Orchestration, planning, decision loop |
+| Kimi Vision | Moonshot AI (via gateway) | Image analysis, visual feedback |
+| Image Generation | Pollinations AI | Free tier, multiple models (Flux, etc.) |
+
+### Image Generation Options
+
+| Provider | Free Tier | Quality | Use Case |
+|---------|-----------|---------|----------|
+| **Pollinations** ✅ | ✅ Yes | Good | Primary (simple, free) |
+| **Flux (local)** | ✅ Free | High | If GPU available |
+| **Hermes skills** | ✅ Free | Niche | Fallback/ASCII aesthetic |
+
+### Pollinations API (Primary)
+- **Endpoint:** `https://gen.pollinations.ai/image/{prompt}`
+- **Models:** flux, zimage, wan-image, qwen-image, etc.
+- **Cost:** Free tier (pollen credits); paid usage at about $1 per Pollen
+- **Auth:** Optional for free tier
+
+## Strengths
+
+- ✅ Combines Hermes + Kimi + Pollinations natively
+- ✅ Agentic visual feedback loop is unique
+- ✅ Visual coherence check via Kimi ensures quality
+- ✅ Free tier = low barrier to test
+- ✅ User controls output format (default: image)
+
+## Weaknesses
+
+- ⚠️ Pollinations quality relative to DALL-E/Midjourney is untested
+- ⚠️ Kimi requires gateway access (no direct API key)
+- ⚠️ Loop adds latency (generate → analyze → refine)
+- ⚠️ Need to verify Pollinations reliability
+
+## Uniqueness Score
+
+**7/10** — Agentic visual feedback loop is novel, but need to verify if built-in image generation is compelling enough
+
+## Next Steps
+
+- [ ] Explore Hermes built-in image skills (manim, ascii)
+- [ ] Define output format options
+- [ ] Sketch technical architecture
+
+## Related Ideas
+
+- See: `002-xxx.md`, `003-xxx.md` for alternatives
diff --git a/docs/ideas/007-vision-spot-the-difference.md b/docs/ideas/007-vision-spot-the-difference.md
new file mode 100644
index 0000000..2b416a9
--- /dev/null
+++ b/docs/ideas/007-vision-spot-the-difference.md
@@ -0,0 +1,138 @@
+# Idea 007: Spot the Difference Agent
+
+**Date:** 2026-04-19  
+**Status:** Idea  
+**Tags:** hermes-agent, kimi-vision, puzzle, gamification, webapp
+
+## Concept
+
+A daily "Spot the Difference" puzzle webapp where AI (Kimi + Hermes) analyzes two images and shows its step-by-step process in finding the differences.
+
+**Core insight:** Use visual analysis strength, minimize reasoning load.
+
+## User Flow
+
+1. User opens webapp → sees today's "Spot the Difference" puzzle (two similar images)
+2. User can play manually (click on differences) OR
+3. User clicks "Let AI Solve" → watches AI's step-by-step analysis
+4. AI shows its reasoning process: "Scanning left-to-right... Found difference #1: color mismatch in top-left..."
+5. Leaderboard shows attempt stats (anonymous)
+
+## Why This Works
+
+| Aspect | Implementation |
+|--------|----------------|
+| **Visual Analysis** | Kimi compares the images at both pixel and semantic level |
+| **Low Reasoning** | Pattern matching, not complex logic |
+| **Step-by-Step** | Show each finding with visual highlight |
+| **Gamification** | Daily puzzle, leaderboard, no auth |
+
+## Puzzle Types
+
+### Primary: Spot the Difference (v1)
+- Two images with subtle differences
+- Kimi identifies all differences
+- Each found difference highlighted + explanation
+
+### Secondary (future):
+- Find the anomaly (what's wrong in this image?)
+- Count the objects (how many X in this image?)
+- What's different? (semantic analysis)
+
+## Technical Stack
+
+| Component | Source | Role |
+|-----------|--------|------|
+| Frontend | Single HTML page | Display puzzle, show AI process |
+| Image Analysis | Kimi Vision (via gateway) | Compare images, find differences |
+| Orchestration | Hermes Agent | Coordinate flow, format output |
+| Image Gen | Pollinations AI | Generate daily puzzle pairs |
+
+### Daily Puzzle Generation
+```
+Hermes + Pollinations → Generate base image
+Hermes + Pollinations → Generate modified image (with subtle changes)
+Store both → Serve to users daily
+```
+
+### AI Solving Process
+```
+1. Hermes receives both images
+2. Send to Kimi Vision for analysis
+3. Kimi returns list of differences with locations
+4. Hermes formats step-by-step explanation
+5. Frontend animates each finding
+```
+
+## Features
+
+### Core
+- [ ] Daily puzzle auto-rotates
+- [ ] Two-image display (side by side)
+- [ ] "Let AI Solve" button
+- [ ] Step-by-step visualization of AI findings
+- [ ] Show each difference with highlight + explanation
+
+### Gamification (no auth)
+- [ ] Attempt counter (per user session, localStorage)
+- [ ] Leaderboard (anonymous, session-based)
+- [ ] "Perfect solve" badge (AI found all differences on first pass)
+
+### Nice to Have
+- [ ] Difficulty levels (Easy/Medium/Hard)
+- [ ] Share result as image
+- [ ] Hint system (Kimi finds 1, user finds rest)
+
+## Step-by-Step Output Format
+
+```
+🔍 Scanning image...
+✅ Difference #1 found: "The lamp color changed from blue to red"
+   📍 Location: Top-left corner
+   👆 [Highlighted on image]
+
+✅ Difference #2 found: "Window shape is slightly different"
+   📍 Location: Center-right
+   👆 [Highlighted on image]
+
+...
+
+🎯 Solved! Found X differences in Y steps.
+⏱️ Time: Z seconds
+```
+
+## Comparison with Other Ideas
+
+| Aspect | 001 Visual Narrative | 007 Spot the Difference |
+|--------|---------------------|------------------------|
+| Visual Analysis | Heavy | **Heavy** |
+| Reasoning | Medium | **Light** |
+| Demo Impact | High | **High** |
+| Gamification | Low | **High** |
+| Uniqueness | 7/10 | **9/10** |
+| Step-by-Step | Yes | **Yes (more natural)** |
+
+## Why Stronger than 001
+
+1. **Tangible use case** — People actually play spot the difference
+2. **Clear AI demonstration** — "Watch AI see what you see"
+3. **Gamification** — Daily puzzle + leaderboard = engagement
+4. **Low reasoning, high vision** — Perfect for Kimi's strength
+5. **Step-by-step is natural** — Not forced, it's how you'd solve it
+
+## Risks
+
+- ⚠️ Need reliable daily puzzle generation (harder than it sounds)
+- ⚠️ Kimi analysis quality depends on image complexity
+- ⚠️ Need diverse puzzle set to not repeat
+
+## Next Steps
+
+- [ ] Test Kimi's spot-the-difference capability
+- [ ] Design puzzle generation pipeline
+- [ ] Mock up webapp UI
+- [ ] Prototype step-by-step visualization
+
+## Related Ideas
+
+- See: `001-visual-narrative-agent.md`
diff --git a/docs/ideas/008-visual-detective.md b/docs/ideas/008-visual-detective.md
new file mode 100644
index 0000000..8d3bf83
--- /dev/null
+++ b/docs/ideas/008-visual-detective.md
@@ -0,0 +1,397 @@
+# Idea 008: Visual Detective
+
+**Date:** 2026-04-19
+
+## Concept
+
+Upload a "crime scene" or mystery image. Kimi analyzes every detail. Hermes pieces together clues and generates a detective story/hypothesis.
+
+## Why Strong
+
+- Heavy visual analysis (Kimi reads the scene)
+- Low reasoning (observation, not complex logic)
+- Storytelling naturally fits step-by-step
+- Mystery genre = engaging
+
+## User Flow
+
+1. Upload image (or get random daily mystery)
+2. Kimi: "I see a broken window, muddy footprints, overturned chair..."
+3. Hermes: "Based on these clues, here's what likely happened..."
+4. Output: Detective story with visual evidence
+
+## Tech
+
+- Kimi Vision: Scene analysis
+- Hermes: Narrative orchestration
+- Pollinations: Generate mystery images
+
+## Unique?
+
+- We have not seen an existing "AI detective" that works from the user's own photos
+- Could be daily mystery + community solving
+
+---
+
+## 009: Image Tarot Reader
+
+**Date:** 2026-04-19
+
+## Concept
+
+Upload any image. AI interprets it like a tarot card reading.
+
+## Why Strong
+
+- Fun/playful, low stakes
+- Heavy visual analysis (Kimi interprets symbolism)
+- Storytelling fits perfectly
+- Shareable results
+
+## User Flow
+
+1. Upload image OR random draw
+2. Kimi: Analyzes composition, colors, objects, mood
+3. Hermes: "This represents [Tarot card]. Your reading: [Narrative]"
+4. Output: Tarot card + 3-card spread interpretation
+
+## Step-by-Step
+
+```
+🃏 Drawing your card...
+👁️ Analyzing your image...
+
+Visual Elements Detected:
+• A winding road (path in life)
+• Setting sun (endings/new beginnings)
+• Standing figure (you, the observer)
+
+🎴 Your Card: The Fool
+Interpretation: A new journey awaits. Trust the path ahead...
+
+Past: Confusion about direction
+Present: Standing at the crossroads
+Future: Leap of faith required
+```
+
+## Tech
+
+- Kimi Vision: Symbol analysis
+- Hermes: Tarot narrative generation
+- Pollinations: Generate thematic card visuals
+
+---
+
+## 010: Color Emotion Translator
+
+**Date:** 2026-04-19
+
+## Concept
+
+Upload image. AI analyzes dominant colors and translates them into emotions/mood.
+
+## Why Strong
+
+- Pure visual analysis
+- Art/design focused
+- Generates color palette + emotion report
+- Useful for designers
+
+## User Flow
+
+1. Upload image
+2. Kimi: Extracts colors, analyzes saturation, harmony
+3. Hermes: Translates to emotions, generates palette
+4. Output: Color palette + emotion breakdown + suggested uses
+
+## Step-by-Step
+
+```
+🔍 Scanning colors...
+🎨 Extracting dominant palette...
+
+Detected Colors:
+• #2D4A3E (Deep Forest Green) - 45%
+• #F5E6D3 (Warm Cream) - 30%
+• #8B4513 (Saddle Brown) - 15%
+• #CD853F (Peru Gold) - 10%
+
+🎭 Emotional Profile:
+Primary: Grounded, natural, calm
+Secondary: Warm, nostalgic, organic
+Accent: Vintage, artisanal, trustworthy
+
+💡 Recommendations:
+• Brand Identity for eco-friendly products
+• Interior design: cozy cabin aesthetic
+• Packaging: artisanal food products
+```
+
+---
+
+## 011: Before/After Time Machine
+
+**Date:** 2026-04-19
+
+## Concept
+
+Upload an old/historical photo. AI shows what it would look like today or vice versa.
+
+## Why Strong
+
+- Historical/educational angle
+- Visual transformation is compelling
+- Shows AI's understanding of time/changes
+
+## User Flow
+
+1. Upload old OR new photo
+2. Select transformation direction
+3. Kimi: Analyzes context, era, subject
+4. Hermes: Predicts/adapts to target era
+5. Output: Side-by-side transformation
+
+## Step-by-Step
+
+```
+📸 Analyzing source image...
+📅 Detected era: 1950s New York Street
+
+Identifying elements:
+• Black & white photography style
+• Vintage automobiles (1950s models)
+• Fashion: fedoras, swing coats
+• Architecture: Art Deco buildings
+
+🔮 Projecting to 2024...
+
+Transformation breakdown:
+• Colorization: Added natural skin tones + sky colors
+• Vehicles: Replaced with modern equivalents
+• Architecture: Updated signage, added modern elements
+• Fashion: Modernized while preserving style
+
+✨ Your 1950s scene in 2024!
+```
+
+---
+
+## 012: Visual Haiku Generator
+
+**Date:** 2026-04-19
+
+## Concept
+
+Upload any image. AI generates a haiku (5-7-5) based on visual elements.
+
+## Why Strong
+
+- Minimal reasoning, pure visual
+- Artistic/creative output
+- Japanese aesthetic + AI = unique
+- Highly shareable
+
+## User Flow
+
+1. Upload image
+2. Kimi: Analyzes scene, mood, elements
+3. Hermes: Crafts haiku (strict 5-7-5)
+4. Output: Image + haiku + syllable breakdown
+
+## Step-by-Step
+
+```
+🖼️ Analyzing your image...
+
+Scene Elements:
+• Autumn forest path
+• Golden leaves falling
+• Soft morning light through trees
+
+✍️ Crafting haiku...
+
+Forest whispers low
+Golden footsteps on the leaves
+Silence speaks aloud
+
+📝 Syllable breakdown:
+For-est (2) + whis-pers (2) + low (1) = 5
+Gol-den (2) + foot-steps (2) + on (1) + the (1) + leaves (1) = 7
+Si-lence (2) + speaks (1) + a-loud (2) = 5
+(5) - (7) - (5) ✅
+```
+
+---
+
+## 013: Image Alchemy
+
+**Date:** 2026-04-19
+
+## Concept
+
+Upload two random images. AI "fuses" them into a new concept based on their shared elements.
+
+## Why Strong
+
+- Surprising/comedic combinations
+- Pure visual + semantic analysis
+- Unique creative output
+- Viral potential
+
+## User Flow
+
+1. Upload image A (or random)
+2. Upload image B (or random)
+3. Kimi: Analyzes both separately
+4. Hermes: Finds connections, creates fusion
+5. Output: New concept + fused image prompt
+
+## Step-by-Step
+
+```
+🌀 Analyzing Image A: A Viking ship
+• Norse aesthetic
+• Ocean voyage
+• Historical warrior culture
+
+🌀 Analyzing Image B: A Coffee shop
+• Cozy atmosphere
+• Barista craft
+• Modern social space
+
+🔮 Alchemizing...
+
+Found connections:
+• Craft (warrior's craft → barista's craft)
+• Ritual (battle ritual → coffee ritual)
+• Journey (ocean voyage → daily commute)
+
+⚗️ Alchemy Result:
+
+"THE VIKING BARISTA"
+
+A warrior of the morning,
+steering through storms of exhaustion,
+claiming the sacred cup.
+
+Your coffee shop serves mead in horn-shaped mugs,
+the barista wears a helmet of foam,
+and every latte is a conquest.
+```
+
+---
+
+## 014: Visual Lie Detector
+
+**Date:** 2026-04-19
+
+## Concept
+
+Upload a photo + claim. AI analyzes if the image supports or contradicts the claim.
+
+## Why Strong
+
+- Useful in era of fake news
+- Pure visual verification
+- Educational about image analysis
+- "Is this real?" tool
+
+## User Flow
+
+1. Paste claim + upload image
+2. Kimi: Analyzes image details
+3. Hermes: Compares claim vs evidence
+4. Output: Verdict + reasoning
+
+## Step-by-Step
+
+```
+🔍 Analyzing claim: "This photo was taken in Paris"
+
+🔬 Image Analysis:
+• Architecture: Haussmannian buildings ✓
+• Street signs: French ✓
+• License plates: European format ✓
+• Language: French on signs ✓
+• Vegetation: Consistent with Paris climate ✓
+• Shadows: Consistent with claimed time of day ✓
+
+✅ VERDICT: LIKELY AUTHENTIC
+
+Confidence: 94%
+Supporting evidence: 6/6 elements match
+Caveats: Metadata not verified
+```
+
+---
+
+## 015: Object Archaeology
+
+**Date:** 2026-04-19
+
+## Concept
+
+Upload an object close-up. AI identifies it, tells its history/story.
+
+## Why Strong
+
+- Educational
+- Heavy visual (identification + knowledge)
+- Discovery/antiquities angle
+- Could work with museum APIs
+
+## User Flow
+
+1. Upload object photo
+2. Kimi: Visual identification + details
+3. Hermes: Tells object's "story"
+4. Output: Identity + history narrative
+
+## Step-by-Step
+
+```
+🔍 Scanning object...
+
+Visual Analysis:
+• Material: Ceramic
+• Style: Ming Dynasty blue and white
+• Pattern: Dragon with cloud motifs
+• Technique: Underglaze blue
+
+🏺 Object Identified:
+Ming Dynasty (1368-1644) Blue and White Porcelain
+Dragon Pattern Bowl
+
+📜 The Story:
+This bowl was crafted during the reign of Emperor Wanli,
+at the height of Jingdezhen's porcelain production.
+The dragon motif signifies imperial power and protection...
+
+[Full historical narrative]
+```
+
+---
+
+## Quick Comparison Matrix
+
+| # | Name | Visual | Reasoning | Uniqueness | Fun |
+|---|------|--------|-----------|------------|-----|
+| 007 | Spot the Difference | Heavy | Light | 9/10 | 8/10 |
+| 008 | Visual Detective | Heavy | Light | 8/10 | 9/10 |
+| 009 | Image Tarot | Heavy | Light | 8/10 | 10/10 |
+| 010 | Color Emotion | Medium | Light | 7/10 | 7/10 |
+| 011 | Before/After | Heavy | Medium | 8/10 | 8/10 |
+| 012 | Visual Haiku | Heavy | Light | 9/10 | 8/10 |
+| 013 | Image Alchemy | Heavy | Light | 10/10 | 10/10 |
+| 014 | Lie Detector | Heavy | Medium | 9/10 | 8/10 |
+| 015 | Object Archaeology | Heavy | Medium | 8/10 | 8/10 |
+
+---
+
+**My top picks for uniqueness + fun:**
+1. **013 Image Alchemy** — Most unique, viral potential
+2. **009 Image Tarot** — Fun, shareable, low friction
+3. **007 Spot the Difference** — Game + AI demonstration
+4. **014 Visual Lie Detector** — Useful, educational
+
+What stands out to you?
diff --git a/docs/ideas/COMPARISON.md b/docs/ideas/COMPARISON.md
new file mode 100644
index 0000000..b949245
--- /dev/null
+++ b/docs/ideas/COMPARISON.md
@@ -0,0 +1,132 @@
+# Ideas Comparison Matrix
+
+**Date:** 2026-04-19  
+**Purpose:** Compare all ideas to select final concept
+
+---
+
+## Scoring Criteria
+
+| Criteria | Weight | Description |
+|----------|--------|-------------|
+| **Visual Analysis** | 30% | Heavy Kimi use (aligned with Kimi's strength) |
+| **Multi-Turn** | 20% | Not single-turn, builds over time |
+| **Human-AI Interaction** | 20% | Human participates, not passive |
+| **Cost Efficiency** | 15% | Low API costs (image gen vs analysis) |
+| **Uniqueness** | 10% | Stand out from competitors |
+| **Fun/Engagement** | 5% | Enjoyable to play/watch |
+
+**Scoring:** 1-5 (5 = best)
+
+---
+
+## Full Comparison Matrix
+
+| # | Idea | Visual | Multi-Turn | Human-AI | Cost | Unique | Fun | **Total** |
+|---|------|--------|------------|----------|------|--------|-----|-----------|
+| 001 | Visual Narrative Agent | 4 | 4 | 3 | 2 | 3 | 4 | **3.5** |
+| 002 | Visual Memory Journal | 3 | 3 | 2 | 3 | 4 | 3 | **3.0** |
+| 003 | Design Critic | 3 | 2 | 2 | 3 | 2 | 3 | **2.6** |
+| 004 | Visual Poem | 4 | 2 | 2 | 3 | 4 | 4 | **3.2** |
+| 005 | Scene Journey | 4 | 3 | 2 | 2 | 3 | 4 | **3.2** |
+| 007 | Spot the Difference | 4 | 2 | 3 | 2 | 4 | 5 | **3.4** |
+| 008 | Visual Detective | 4 | 3 | 2 | 3 | 4 | 4 | **3.5** |
+| 009 | Image Tarot | 4 | 2 | 3 | 3 | 4 | 5 | **3.5** |
+| 013 | Image Alchemy | 4 | 2 | 3 | 2 | 5 | 5 | **3.6** |
+| 014 | Lie Detector | 4 | 2 | 3 | 3 | 4 | 4 | **3.4** |
+| 032v2 | Art Critic | 5 | 3 | 3 | 3 | 3 | 4 | **3.7** |
+| **033v2** | **Detective** | **5** | **5** | **5** | **4** | **4** | **5** | **4.7** |
+| 035 | Guess Artist | 5 | 2 | 3 | 3 | 3 | 4 | **3.5** |
+| Auction | Auction | 3 | 4 | 5 | 4 | 4 | 4 | **3.9** |
+
+---
+
+## Top Contenders
+
+| Rank | Idea | Score | Key Strengths |
+|------|------|-------|---------------|
+| 🥇 | **033v2 Detective** | **4.7** | Best multi-turn, human directs, Kimi does real work |
+| 🥈 | Auction | 3.9 | Human describes, human engages, cheap |
+| 🥉 | 032v2 Art Critic | 3.7 | Kimi visual analysis, multi-turn |
+| 4 | 013 Image Alchemy | 3.6 | Most unique, viral potential |
+| 5 | 009 Image Tarot | 3.5 | Fun, shareable |
+
+---
+
+## 033v2 Detective — Why It Wins
+
+### Alignment with User Goals
+
+| User Goal | How Detective Meets It |
+|-----------|----------------------|
+| Heavy visual analysis | Kimi analyzes each piece of evidence |
+| Low reasoning | Pattern matching, not complex logic |
+| Multi-turn | 5-7 rounds per case |
+| Human-AI collaboration | Human (Chief) directs the investigation |
+| Cost efficient | Mostly text between Kimi calls |
+| Fun/engagement | Mystery + competition |
+
+### What Makes It Special
+
+1. **Natural two-agent roles:** Witness (sees) + Detective (thinks)
+2. **Human as boss:** Chief directs investigation, not passive observer
+3. **Multi-turn structure:** Each round builds the case
+4. **Kimi's strength shines:** Visual evidence analysis is the core mechanic
+5. **Scoring system:** Track cases solved, rounds taken, accuracy
+
+### Comparison to Other Games
+
+| Aspect | Spot the Difference | Tarot | Alchemy | **Detective** |
+|--------|-------------------|-------|---------|---------------|
+| Visual Analysis | 4 | 4 | 4 | **5** |
+| Multi-Turn | 2 | 2 | 2 | **5** |
+| Human Role | Judge | Receive | Submit | **Direct** |
+| Narrative | None | Story | Surprise | **Full Mystery** |
+| Replayability | Medium | Low | Medium | **High** |
+
+---
+
+## Recommendation
+
+**Go with 033v2 Detective.**
+
+### Why Not Others
+
+| Idea | Why Not |
+|------|---------|
+| 001 Visual Narrative | Too similar to others, high cost |
+| 007 Spot Difference | Fun but shallow (1-turn) |
+| 009 Image Tarot | Not really interactive |
+| 013 Image Alchemy | Unique but single interaction |
+| Auction | Good but less "AI demonstration" |
+
+### Detective's Edge
+
+- **Multi-turn** = not just a quick demo
+- **Human directs** = active participation
+- **Kimi sees evidence** = clear AI capability showcase
+- **Cost efficient** = mostly text
+- **Daily cases** = reason to return
+
+---
+
+## Next Steps for 033v2 Detective
+
+- [ ] Define case structure (5-7 evidence images)
+- [ ] Design Chief interface (what buttons/actions)
+- [ ] Plan Witness + Detective prompts
+- [ ] Mock up UI
+- [ ] Prototype with one case
+
+---
+
+## Appendix: Ideas That Could Combine with Detective
+
+### Detective + Art Critic
+Two types of daily content: Mystery case OR Art analysis
+
+### Detective + Auction
+Hybrid mode: Evidence auction where Chief describes to Detective
+
+### Detective + Spot Difference
+Mini-game within case: "Find the clue hidden in this photo"
diff --git a/docs/research-hermes-agent.md b/docs/research-hermes-agent.md
new file mode 100644
index 0000000..dc5197c
--- /dev/null
+++ b/docs/research-hermes-agent.md
@@ -0,0 +1,47 @@
+# Research: Hermes Agent Capabilities
+
+**Date:** 2026-04-19  
+**Purpose:** Understand Hermes Agent framework for hackathon integration
+
+## Hermes 3 (Nous Research)
+
+### Core Capabilities
+- **Advanced agentic capabilities**
+- **Reliable function calling** - Trained specifically for tool use
+- **Structured output** - JSON mode / Pydantic schemas
+- **ChatML prompt format** - OpenAI-compatible
+- Multi-turn conversation
+- Long context coherence
+
+### Benchmark Performance
+| Benchmark | Hermes 3 Score |
+|-----------|---------------|
+| IFEval (0-shot) | 61.70% |
+| MMLU-Redux | 92.7% |
+| MMLU-Pro | 81.1% |
+| SimpleQA | 31.0% |
+
+### Function Calling
+- Trained on specific prompts for tool use
+- Tool call format: a JSON object wrapped in XML-style tags: `<tool_call>{"name": "...", "arguments": {...}}</tool_call>`
+- Supports recursive/chain tool calls
+- Native tool integration via NousResearch/Hermes-Function-Calling repo
+
+## Hermes Agent Framework
+
+### Key Components
+1. **ChatML format** - Structured system/user/assistant turns
+2. **Tool definitions** - JSON schema for function signatures
+3. **Tool parsing** - Parse and execute function calls
+4. **Response loop** - Multi-turn agentic execution
+
+### Integration Points
+- HuggingFace Transformers
+- vLLM inference
+- Ollama local deployment
+- OpenAI-compatible API
+
+## Sources
+- https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B
+- https://github.com/NousResearch/Hermes-Function-Calling
+- https://arxiv.org/abs/2408.11857 (Hermes 3 Technical Report)
diff --git a/docs/research-image-generation-apis.md b/docs/research-image-generation-apis.md
new file mode 100644
index 0000000..5fcd646
--- /dev/null
+++ b/docs/research-image-generation-apis.md
@@ -0,0 +1,72 @@
+# Research: Image Generation APIs
+
+**Date:** 2026-04-19  
+**Purpose:** Find affordable/free image generation for hackathon project
+
+## Pollinations AI (Recommended ✅)
+
+**Why:** Free tier, OpenAI-compatible, multiple models, simple API
+
+### Quick Start
+```bash
+# No auth needed for basic
+curl "https://gen.pollinations.ai/image/a%20cat%20in%20space"
+
+# With auth
+curl -H "Authorization: Bearer YOUR_KEY" ...
+```
+
+### Models Available
+| Model | Type | Notes |
+|-------|------|-------|
+| `flux` | Default | Good quality |
+| `zimage` | Default | Alternative |
+| `wan-image` | Quality | Higher quality option |
+| `qwen-image` | Quality | Alibaba model |
+| `gptimage` | Quality | GPT-based |
+| `seedream5` | Style | Special styles |
+| `kontext` | Edit | Image editing |
+
+### Pricing
+- **Free tier:** Weekly pollen credits (tier-based)
+- **Paid:** $1 ≈ 1 Pollen
+- **Free API:** Limited but usable
+- **Rate limits:** Anonymous = limited, Seed/Flower = more
+
+### API Details
+- **Base URL:** `https://gen.pollinations.ai`
+- **Image endpoint:** `GET /image/{prompt}`
+- **OpenAI-compatible:** `POST /v1/images/generations`
+- **No setup:** Just curl it
+
+### Strengths
+- ✅ 100% Open Source
+- ✅ Free tier available
+- ✅ Multiple model options
+- ✅ Simple API (no complex setup)
+- ✅ Works with OpenAI-compatible SDKs
+
+### Weaknesses
+- ⚠️ Quality may not match DALL-E/Midjourney
+- ⚠️ Free tier has rate limits
+- ⚠️ Infrastructure may vary in reliability
+
+## Other Options Considered
+
+| Provider | Free Tier | Quality | Notes |
+|----------|-----------|---------|-------|
+| **Midjourney** | ❌ No | High | Expensive |
+| **Stable Diffusion** | Local only | High | Needs GPU |
+| **DALL-E 3** | ❌ No | High | OpenAI pricing |
+| **Ideogram** | Limited | Good | API in beta |
+| **Flux (Local)** | ✅ Free | High | Self-hosted, needs GPU |
+
+## Recommendation
+
+**Primary:** Pollinations AI (free tier + simplicity)  
+**Fallback:** Flux (local, self-hosted) if we have GPU resources
+
+## Sources
+- https://gen.pollinations.ai
+- https://docs.pollinations.ai/
+- https://github.com/pollinations/pollinations
diff --git a/docs/research-kimi-visual-capabilities.md b/docs/research-kimi-visual-capabilities.md
new file mode 100644
index 0000000..2f283f3
--- /dev/null
+++ b/docs/research-kimi-visual-capabilities.md
@@ -0,0 +1,51 @@
+# Research: Kimi Visual Capabilities
+
+**Date:** 2026-04-19  
+**Purpose:** Validate Kimi's visual strengths for hackathon project
+
+## Kimi K2.5 - Multimodal Model
+
+### Core Capabilities
+- **Text + Images + Video** input support
+- 256K context length
+- Thinking/non-thinking modes
+- Agent task support
+
+### Visual API Models
+- `moonshot-v1-8k-vision-preview`
+- `moonshot-v1-32k-vision-preview`  
+- `moonshot-v1-128k-vision-preview`
+- `kimi-k2.5` (latest, supports video)
+
+### Supported Formats
+**Images:** png, jpeg, webp, gif  
+**Video:** mp4, mpeg, mov, avi, x-flv, mpg, webm, wmv, 3gpp
+
+### Unique Visual Features
+1. **Visual Coding** - Kimi Code, Kimi Claw for coding with visual context
+2. **Video Understanding** - Analyzes video content, which many multimodal models do not support
+3. **Real-time Visual Chat** - Interactive visual conversation
+
+## Kimi K2 Benchmarks (Coding/Agent)
+
+| Benchmark | Kimi K2 Score | Notes |
+|-----------|---------------|-------|
+| SWE-bench Verified (Single Attempt) | **65.8%** | State of the art among open-source models |
+| SWE-bench Multilingual | 47.3% | Outperforms most proprietary models |
+| LiveCodeBench v6 | 53.7% | Strong coding |
+| TerminalBench | 30.0% | Agentic tool use |
+| Aider-Polyglot | 60.0% | Code editing |
+| Tau2-Bench (avg) | ~64% | Tool use tasks |
+
+## Kimi Visual Strengths Summary
+
+✅ **Video understanding** (unique advantage)  
+✅ **Visual coding** capabilities  
+✅ **Image + Text multimodal**  
+✅ **Strong agentic tool use**  
+✅ **256K context** for large visual inputs  
+
+## Sources
+- https://platform.moonshot.cn/docs/guide/kimi-k2-5-quickstart
+- https://moonshotai.github.io/Kimi-K2/
+- https://platform.moonshot.cn/docs/guide/use-kimi-vision-model