- Hermes Detective Agency: Open-ended mystery investigation game
- Roles: Chief (human), Witness (Kimi), Detective (Hermes)
- 5 difficulty levels, community cases, open-ended solving
- Scoring: Alignment %, Evidence %, Time
- Features: Retry, Journal, Observe mode
- Tech: Kimi Vision + Hermes Agent + Pollinations

Changelog:
- Research phase: Kimi capabilities, Hermes agent, image APIs
- Brainstorming: 14 ideas explored
- Comparison matrix: Detective selected as winner
- Concept finalized with all design decisions
Session Log
2026-04-19
001 - Session Start: Hermes Hackathon
What: Started Hermes Agent Creative Hackathon collaboration.
Context:
- Hackathon: 16 days, $25k prizes (Main $15k, Kimi $5k, $5k Kimi credits)
- Presented by Kimi Moonshot & Nous Research
- Two tracks: Main (any creative use) and Kimi Track (must use Kimi models)
- Deadline: EOD Sunday, May 3rd
Action:
- Set up workflow structure (.issues/, docs/, git init)
- Created first issue file: 001-hermes-hackathon-project.md
Next:
- Define project concept and creative domain focus
- Explore Hermes Agent capabilities
- Sketch initial prototype idea
002 - Research Completed
What: Validated Kimi and Hermes Agent capabilities.
Findings:
- Kimi K2.5: multimodal (text+image+video), video understanding, visual coding
- Kimi benchmarks: SWE-bench 65.8%, Tau2 ~64%
- Hermes 3: function calling, structured output, OpenAI-compatible
- Hermes built-in skills: manim-video, ascii-video, ascii-art (accessibility-focused)
Action:
- Created docs/research-kimi-visual-capabilities.md
- Created docs/research-hermes-agent.md
- Created docs/research-image-generation-apis.md
- Updated issue file with research summary
Next:
- Define concrete project concept
- Choose specific creative angle (visual coding? video analysis? image generation?)
- Start rapid prototyping
003 - Image Gen API Research
What: Found affordable/free image generation API.
Findings:
- Pollinations AI ✅: Free tier, OpenAI-compatible, multiple models (Flux, etc.)
- Endpoint: https://gen.pollinations.ai/image/{prompt}
- Simple: just curl it, no auth needed for basic use
- Models: flux, zimage, wan-image, qwen-image, gptimage
- Cost: Free tier (pollen credits), $1 ≈ 1 Pollen paid
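The endpoint pattern above can be exercised with a small URL helper. This is a minimal sketch: only the base URL and the model names come from these notes. The `model` query parameter and the helper name `build_image_url` are assumptions for illustration and should be checked against the Pollinations documentation before use.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Base URL as recorded in the findings above.
BASE_URL = "https://gen.pollinations.ai/image/"


def build_image_url(prompt: str, model: Optional[str] = None) -> str:
    """Return a GET URL for the image endpoint with the prompt percent-encoded.

    The `model` query parameter is an assumption, not confirmed by these notes.
    """
    # safe="" also encodes "/" so the prompt stays a single path segment.
    url = BASE_URL + quote(prompt, safe="")
    if model is not None:
        url += "?" + urlencode({"model": model})
    return url


if __name__ == "__main__":
    # The printed URL could then be fetched with curl or any HTTP client.
    print(build_image_url("rainy crime scene at night", model="flux"))
```

Keeping URL construction separate from the network call makes the prompt encoding easy to check without spending any credits.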
Action:
- Created docs/research-image-generation-apis.md
- Updated idea 001 with image gen options
Next:
- Sketch more project ideas for comparison
- Do idea benchmark matrix
004 - Brainstorming Session
What: Generated 7 project ideas, deeper dive on Idea 007.
Ideas Generated:
- 001: Visual Narrative Agent (text → image loop)
- 002: Visual Memory Journal (AI scrapbook)
- 003: Reverse Design Critic (UI critique + fix)
- 004: Visual Poem Generator (two-AI art collaboration)
- 005: Scene-to-Scene Video Storyteller (visual journey)
- 006: Real-time Visual Debugger (screenshot → fix)
- 007: Spot the Difference Agent (NEW FOCUS)
User Preferences:
- Want high visual analysis, low reasoning
- Single page webapp, no auth
- Show step-by-step AI process
- Gamification (leaderboard, daily puzzles)
Selected for deeper dive: 007 Vision Puzzle
Action:
- Created docs/ideas/007-vision-spot-the-difference.md
Next:
- Compare all ideas to pick winner
005 - Ideas Comparison
What: Created comparison matrix for all brainstormed ideas.
Ideas Compared: 14 concepts across visual games, interactive, and creative
Scoring Criteria:
- Visual Analysis (30%)
- Multi-Turn (20%)
- Human-AI Interaction (20%)
- Cost Efficiency (15%)
- Uniqueness (10%)
- Fun (5%)
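The weights above sum to 100%, so a single score per idea can be computed as a weighted average of per-criterion ratings. The sketch below assumes each criterion is rated on a 0-5 scale, which matches the range of the scores in the results but is not stated in this log. The key names and the example ratings are hypothetical and are not taken from COMPARISON.md.

```python
# Criterion weights taken from the list above (they sum to 100%).
WEIGHTS = {
    "visual_analysis": 0.30,
    "multi_turn": 0.20,
    "human_ai_interaction": 0.20,
    "cost_efficiency": 0.15,
    "uniqueness": 0.10,
    "fun": 0.05,
}


def weighted_score(ratings: dict) -> float:
    """Combine per-criterion ratings into one weighted score."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical ratings on an assumed 0-5 scale, for illustration only.
    example = {
        "visual_analysis": 5,
        "multi_turn": 5,
        "human_ai_interaction": 5,
        "cost_efficiency": 4,
        "uniqueness": 4,
        "fun": 5,
    }
    print(round(weighted_score(example), 2))
```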
Results:
| Rank | Idea | Score |
|---|---|---|
| 🥇 | 033v2 Detective | 4.7 |
| 🥈 | Auction | 3.9 |
| 🥉 | 032v2 Art Critic | 3.7 |
| 4 | 013 Image Alchemy | 3.6 |
| 5 | 009 Image Tarot | 3.5 |
Winner: 033v2 Detective
Why:
- Best multi-turn (5+ rounds)
- Human actively directs (Chief role)
- Kimi does real visual work
- Cost efficient (mostly text)
- Natural mystery/narrative arc
Action:
- Created docs/ideas/COMPARISON.md
- Created docs/ideas/008-visual-detective.md (includes multi-agent v2)
Next:
- Discuss and finalize concept details
006 - Concept Documented
What: Documented the chosen 033v2 Detective concept in chosen-detective-game.md.
Documented:
- Elevator pitch
- Game roles (Chief, Witness, Detective)
- Evidence types (crime scene, documents, photos, etc.)
- Round structure (7 rounds per case)
- Scoring system
- UI concept sketch
- Difficulty tiers (Rookie → Chief)
- Daily cases + leaderboard
Action:
- Created docs/chosen-detective-game.md
Open Questions (for discussion):
- How much Witness describes unprompted?
- Can Detective be wrong?
- Red herrings — yes/no?
- Plot twist mid-case?
- Timer?
- Replay past cases?
- Hints system?
- Skip evidence?
- Case sources (pre-made/generated)?
- Image sources (real/AI/illustrated)?
- Share results?
- Community cases?
Next:
- Discuss and finalize concept details
007 - Concept Finalized
What: Finalized Hermes Detective Agency concept after extensive discussion.
Key Decisions:
- Difficulty: 5 levels (Easy → Impossible), one case per day
- Open-ended solving: No single truth, multiple valid theories
- Scoring: Alignment %, Evidence cited %, Time (turns × 10min)
- Hints: Embedded in evidence (too obvious, barely obvious, not too obvious)
- Witness: Dynamic appearance based on triggers (harder cases)
- Truth reveal: Available anytime, doesn't end game
- Retry: Unlimited attempts, every attempt documented
- Journal: Private by default, publish stats/journal optional
- Observe: Watch others' published solves
- Case source: 5 starter cases (one per difficulty) + community generation
- Community: Visits + reviews (no auth, manipulable but requires effort)
- Discovery: Jungle (browse) vs path (direct links from creator)
- Case format: YAML-based template
- Creator tools: Hermes skill + format validator
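The three scoring components in the decisions above can be sketched as a small record type. This is an illustration under stated assumptions, not the project's implementation: the class name `SolveAttempt` and its field names are hypothetical, the alignment percentage is assumed to be supplied by some external judge, and evidence cited is assumed to mean cited items divided by total items in the case. Only the 10 minutes per turn comes from the log.

```python
from dataclasses import dataclass

# From the decision above: Time = turns x 10min.
MINUTES_PER_TURN = 10


@dataclass
class SolveAttempt:
    """One attempt at a case (hypothetical structure for illustration)."""

    alignment_pct: float  # assumed to come from an external judge, 0-100
    evidence_cited: int   # evidence items the player referenced
    evidence_total: int   # evidence items available in the case
    turns: int            # number of turns taken

    @property
    def evidence_pct(self) -> float:
        """Share of available evidence that was cited, as a percentage."""
        if self.evidence_total == 0:
            return 0.0
        return 100.0 * self.evidence_cited / self.evidence_total

    @property
    def time_minutes(self) -> int:
        """In-game time: each turn counts as a fixed number of minutes."""
        return self.turns * MINUTES_PER_TURN


if __name__ == "__main__":
    attempt = SolveAttempt(alignment_pct=80.0, evidence_cited=3,
                           evidence_total=4, turns=6)
    print(attempt.alignment_pct, attempt.evidence_pct, attempt.time_minutes)
```

Keeping the three values separate, instead of folding them into one number, matches the log, which lists them as distinct scores.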
Action:
- Updated docs/chosen-detective-game.md with full finalized concept
Next:
- Technical architecture
- UI/UX design
- Prompt engineering
- Prototype development