feat: Initial commit - Hermes Detective Agency concept

- Hermes Detective Agency: Open-ended mystery investigation game
- Roles: Chief (human), Witness (Kimi), Detective (Hermes)
- 5 difficulty levels, community cases, open-ended solving
- Scoring: Alignment %, Evidence %, Time
- Features: Retry, Journal, Observe mode
- Tech: Kimi Vision + Hermes Agent + Pollinations

Changelog:
- Research phase: Kimi capabilities, Hermes agent, image APIs
- Brainstorming: 14 ideas explored
- Comparison matrix: Detective selected as winner
- Concept finalized with all design decisions
This commit is contained in:
2026-04-20 00:00:30 +00:00
commit ecfd0b1160
10 changed files with 1685 additions and 0 deletions

View File

@@ -0,0 +1,502 @@
# Project: Hermes Detective Agency
**Chosen Concept:** 033v2 Detective
**Date:** 2026-04-19
**Status:** Concept Finalized
**Tags:** hermes-agent, kimi-vision, game, multi-agent, open-ended, community
---
## Concept Summary
A mystery investigation game where a human (Chief) directs two AI agents — a **Witness** (powered by Kimi Vision) and a **Detective** (powered by Hermes) — to investigate visual cases.
**Core philosophy:** Open-ended solving. No single truth. Evidence guides, but multiple theories are valid.
---
## Elevator Pitch
> *"You're the Chief. Your Witness sees everything. Your Detective connects the dots. Build YOUR theory. See how it aligns with others."*
---
## The Story
You run a small detective agency. Your two AI assistants have superhuman abilities:
- **Witness** can look at any image and describe it perfectly — every detail, every inconsistency, every hidden clue.
- **Detective** can take those observations and build theories, spot patterns, and identify suspects.
Your job? **Direct the investigation.** Tell them what to look at. Ask the right questions. Build your theory.
**Key difference:** There's no single "right answer." The creator has an intended story, but your theory is valid if evidence supports it.
---
## Game Roles
### Chief (Human)
The player. You run the investigation.
| Action | Effect |
|--------|--------|
| Examine evidence | Witness + Kimi analyze |
| Question suspects | Detective probes, Witness watches |
| Compare items | Kimi highlights differences |
| Build theory | Cite evidence, form conclusion |
| Request truth | See creator's intended story (optional) |
### Witness (Agent A + Kimi)
The eyes. Analyzes visual evidence. Appears based on triggers.
| Input | Output |
|-------|--------|
| Crime scene photo | "I see glass shards, muddy footprints, a broken frame..." |
| Suspect photo | "This person has paint on their sleeve..." |
| Document | Extracts text, notes inconsistencies |
| Item close-up | Identifies details Chief might miss |
**Dynamic Appearance:** In harder cases, Witness doesn't appear until triggered.
### Detective (Agent B)
The brain. Builds theories, responds to questions.
| Input | Output |
|-------|--------|
| Witness observations | "Based on evidence, the thief entered through..." |
| Suspect profiles | "Suspect A has motive: insurance fraud..." |
| Human questions | "Good question, Chief. Let me look into that..." |
| Theory building | Helps Chief cite evidence for their theory |
---
## Difficulty System
### Difficulty Levels
| Difficulty | Description | Evidence | Suspects | Red Herrings | Plot Twist |
|-----------|-------------|----------|----------|---------------|------------|
| **Easy** | Obvious clues, clear path | 4-5 | 2 | ❌ | ❌ |
| **Medium** | Requires comparison | 6-7 | 3 | ❌ | ❌ |
| **Hard** | Red herrings present | 8-9 | 4 | ✅ | ❌ |
| **Hardcore** | Plot twist mid-case | 10-11 | 4 | ✅ | ✅ |
| **Impossible** | All elements, complex | 12+ | 5 | ✅ | ✅ |
### Daily Structure
```
One case per day, everyone gets the same case
Same difficulty for all players
Different case each day
```
### Starter Pack (5 Cases)
| Week | Difficulty | Theme |
|------|------------|-------|
| 1 | Easy | Simple theft |
| 2 | Medium | Missing person |
| 3 | Hard | Corporate fraud |
| 4 | Hardcore | Art heist (plot twist) |
| 5 | Impossible | Multi-layered conspiracy |
**Approach:** Add cases incrementally during development.
---
## Evidence System
### Evidence Types
| Type | What Kimi Sees | Example Clue |
|------|---------------|--------------|
| **Crime scene** | Scene layout, objects, anomalies | "Window was broken from inside" |
| **Surveillance** | People, actions, timestamps | "Person lingered at door for 3 minutes" |
| **Documents** | Text, handwriting, context | "Letter mentions 'meeting at midnight'" |
| **Photos** | People, items, locations | "Suspect's shoes match the footprint" |
| **Maps** | Routes, access points, exits | "Only one entrance visible to street" |
| **Items** | Condition, marks, connections | "Key is copy — grooves don't match original" |
### Evidence Citation
Evidence helps build theory. Not all evidence is required.
```
Chief's Theory: "I think Suspect B did it."
📎 Cited Evidence:
- Evidence #3: Crime scene photo
- Evidence #5: Security footage
- Evidence #8: Witness testimony
→ 3/10 evidence cited (30%)
💬 Detective: "That's a solid theory. The evidence
supports B, but have you considered Evidence #7?"
```
### Hints Embedded in Evidence
Not a separate button. Hints are part of the evidence design.
| Level | Visibility | Example |
|-------|-----------|---------|
| **Too obvious** | Easy to find | "Letter saying 'I did it'" |
| **Barely obvious** | Check certain places | "Muddy shoes near suspect's home" |
| **Not too obvious** | Requires attention | "Timeline inconsistency in letter" |
### Witness Trigger System
In harder cases, Witness appears based on triggers.
```
Trigger Example:
Turn 1: Chief examines crime scene photo
Turn 2: Chief finds a hair sample on the floor
↓ [Trigger activated]
Turn 3: 👁️ Witness appears
↓ "I recognize this hair... it belongs to Suspect B's dog"
Turn 4: Chief examines suspect's home
Turn 5: 👁️ Witness appears again (new trigger)
↓ "I saw Suspect B leaving the gallery at midnight..."
```
**Indicator:** Each piece of evidence has a note indicating if it triggers Witness appearance.
---
## Open-Ended Solving
### Core Philosophy
> **No single truth. Multiple valid theories.**
| Before | After |
|--------|-------|
| One correct answer | Multiple valid theories |
| Wrong accusation = Fail | Theory valid if evidence supports |
| One winner | Everyone discusses |
| Truth ends game | Truth is guidance, not mandate |
### Theory Building
```
👤 Chief builds theory:
"I think Suspect B did it, with help from Suspect A.
B had access (night guard), A had keys (curator).
They split the insurance money."
📎 Chief cites evidence:
- Evidence #3: Crime scene (window not broken)
- Evidence #5: Security footage (B was inside)
- Evidence #7: A has master keys
- Evidence #9: Financial records (recent debt)
💬 Detective responds:
"That's a coherent theory. Your cited evidence
supports collaboration between A and B."
```
### Truth Reveal
**Available anytime. Does NOT end the game.**
| When | Why |
|------|-----|
| After building theory | "Did I get it right?" |
| When stuck | "Give me guidance" |
| Never | "I want to figure it out myself" |
| After solving | "See how close I was" |
```
📜 THE TRUTH (Creator's Intended)
The case was designed as:
"A and B collaborated. A had keys, B had access.
But C was the real mastermind, funding the whole thing."
👤 Your theory:
"Suspect B acted alone."
💬 Comparison:
- Your theory missed the collaboration element
- You correctly identified B as main actor
- Evidence you cited: 80% relevant
- 🎯 65% alignment with intended truth
💬 But: Your theory is still valid based on evidence!
Discussion continues. Truth is guidance, not mandate.
```
---
## Scoring System
### Per Case Statistics
| Metric | Calculation |
|--------|-------------|
| **Time** | Turns × 10 min (simplified) |
| **Evidence** | Evidence cited / Total evidence |
| **Alignment** | How close to creator's intended story |
| **Coherence** | Theory makes sense based on evidence |
### Statistics Display
```
┌─────────────────────────────────────┐
│ 📊 CASE STATISTICS │
├─────────────────────────────────────┤
│ ⏱️ Time: 6 turns × 10 min = 60 min │
│ 📎 Evidence: 7/10 cited (70%) │
│ 🎯 Alignment: 85% with creator │
│ 💬 Theory coherence: Strong │
├─────────────────────────────────────┤
│ ⭐ Rating: Sharp Detective │
└─────────────────────────────────────┘
```
### Rating Tiers
| Alignment | Rating |
|-----------|--------|
| 90-100% | Master Detective |
| 75-89% | Sharp Detective |
| 50-74% | Promising Detective |
| 25-49% | Apprentice |
| 0-24% | Rookie |
---
## Retry & Journal System
### Multiple Attempts
User can solve same case multiple times.
```
Case #47 — The Hartwell Heist
Your Attempts:
├── Attempt #1: 85% alignment, 6 turns 📖
├── Attempt #2: 92% alignment, 4 turns 📖
├── Attempt #3: In progress...
└── Best: 92% alignment
```
### Journal Documentation
Every attempt is documented (solve or not).
```
Attempt #1: April 19, 2026
├── Status: Solved
├── Evidence cited: 7/10
├── Alignment: 85%
├── Theory: "Suspect B acted alone"
└── Notes: "Missed the A-B collaboration"
```
### Privacy Settings
| Setting | Description |
|---------|-------------|
| **Private** | Only you see your attempts |
| **Publish stats** | Everyone sees your stats (default) |
| **Publish journal** | Anyone can read your solve |
---
## Replay (Observe Mode)
Watch how others solved the case.
```
📺 OBSERVE MODE
@alice's Solve of Case #47
Turn 1: Examined crime scene
Turn 2: Found hair sample → Witness appeared
Turn 3: Questioned Suspect B
Turn 4: Examined financial records
Turn 5: Cited evidence, formed theory
Turn 6: Requested truth reveal
⏱️ 6 turns | 🎯 85% alignment | ⭐ Sharp
```
**Only published journals are observable.**
---
## Case Creation System
### Starter Cases
5 cases (one per difficulty) as templates.
**Source:** Real solved cases adapted for the game.
### Community Cases
Anyone can create and share cases.
#### Creation Flow
```
1. Choose reference case (optional)
"Let's base this on the Isabella Stewart Gardner theft"
2. Gather/create evidence
Upload images (crime scene, suspects, documents)
3. Write case brief
├── Title, difficulty
├── Suspect list (names, photos)
├── Evidence set
├── Hidden truth (creator's intended story)
├── Red herrings (optional)
├── Plot twist (optional)
└── Witness triggers (which evidence triggers Witness)
4. Test it
Play through yourself to verify solvability
5. Publish
├── Private link (friends only)
└── Public (case library)
```
### Case Format
```yaml
case:
title: "The Hartwell Heist"
difficulty: medium
difficulty_description: "Requires comparison of evidence"
evidence:
- id: 1
type: photo
image: crime_scene.jpg
description: "Crime scene photograph"
triggers_witness: true
hint_level: not_too_obvious
- id: 2
type: document
image: letter.jpg
description: "Anonymous letter found"
triggers_witness: false
hint_level: barely_obvious
suspects:
- name: "Suspect A"
photo: suspect_a.jpg
description: "Gallery curator"
truth:
summary: "A and B collaborated..."
alignment_criteria:
- "Correctly identified collaboration"
- "Identified A as key holder"
- "Identified B as main actor"
witness_triggers:
- evidence_id: 1
testimony: "I see glass on the floor inside..."
```
### Case Creator Tools
| Tool | Purpose |
|------|---------|
| **Skill** | Hermes skill for case creation guidance |
| **Validator** | Verify case format is correct |
---
## Community Moderation
### Discovery Philosophy
> **Community cases are the jungle. Direct links are the path.**
| Discovery Method | Quality | Effort |
|-----------------|---------|--------|
| Case library (browse) | Mixed (jungle) | Low |
| Direct link from creator | Same quality | Medium |
| Social media / community | Trusted (curated) | High |
### Quality Signals
| Signal | Description |
|--------|-------------|
| **Visits** | How many times case was played |
| **Reviews** | 👍 or 👎 (no text, requires effort to spam) |
```
Case #47B — "The Missing Heirloom"
├── Visits: 234
├── 👍 45 | 👎 3
└── Quality score: High
```
**Note:** Review manipulation is possible but requires effort. Not perfect, but workable.
### Sharing Flow
```
Creator creates case
Tests locally
Publishes to community
Shares link on social media / Discord
Players try directly from creator
```
---
## Summary of Decisions
| Element | Decision |
|---------|----------|
| Difficulty | 5 levels (Easy → Impossible) |
| Daily structure | One case per day, same for all |
| Timer | ❌ No (first phase) |
| Hints | ✅ Embedded in evidence |
| Retry | ✅ Unlimited attempts |
| Journal | ✅ Every attempt documented |
| Observe | ✅ Watch published solves |
| Privacy | Private by default |
| Publish | Stats always, journal optional |
| Scoring | Alignment %, Evidence %, Time |
| Open-ended | ✅ No single truth |
| Truth reveal | Available anytime |
| Case source | Real cases + community |
| Witness | Dynamic (triggers in hard cases) |
| Red herrings | ✅ Hard+ difficulty |
| Plot twist | ✅ Hardcore+ difficulty |
| Community | Visits + reviews (no auth) |
---
## What's Next
Once we finalize the concept:
- Technical architecture
- UI/UX design
- Prompt engineering
- Case creation template
- Prototype development
---
## Related Documents
- `docs/ideas/COMPARISON.md` — Full comparison matrix
- `docs/ideas/008-visual-detective.md` — Initial brainstorm

View File

@@ -0,0 +1,79 @@
# Idea 001: Visual Narrative Agent
**Date:** 2026-04-19
**Status:** Idea
**Tags:** hermes-agent, kimi-vision, storytelling, image-generation
## Concept
An agentic storytelling system where Hermes orchestrates a narrative loop with Kimi's visual analysis and built-in image generation skills to produce coherent visual stories.
## User Flow
1. User provides text prompt (e.g., "A lone astronaut discovers an ancient alien garden on Mars")
2. Hermes plans story structure (scenes, pacing, visual style)
3. For each scene:
- Hermes generates image prompt
- Generate image (Hermes built-in skill: manim / ascii)
- Kimi analyzes generated image
- Kimi's feedback refines next scene's prompt
4. Return compiled visual story to user
## Key Differentiator
Most story-to-image tools: **Generate → Done**
This concept: **Generate → Analyze → Refine → Loop**
Kimi serves as the **visual reasoning engine** — tells Hermes if the generated image matches the intended scene, catches inconsistencies, and informs prompt refinement for the next scene.
## Tech Stack
| Component | Source | Role |
|-----------|--------|------|
| Hermes Agent | Nous Research | Orchestration, planning, decision loop |
| Kimi Vision | Moonshot AI (via gateway) | Image analysis, visual feedback |
| Image Generation | Pollinations AI | Free tier, multiple models (Flux, etc.) |
### Image Generation Options
| Provider | Free Tier | Quality | Use Case |
|---------|-----------|---------|----------|
| **Pollinations** ✅ | ✅ Yes | Good | Primary (simple, free) |
| **Flux (local)** | ✅ Free | High | If GPU available |
| **Hermes skills** | ✅ Free | Niche | Fallback/ASCII aesthetic |
### Pollinations API (Primary)
- **Endpoint:** `https://gen.pollinations.ai/image/{prompt}`
- **Models:** flux, zimage, wan-image, qwen-image, etc.
- **Cost:** Free tier (pollen credits), ~$1/1 Pollen paid
- **Auth:** Optional for free tier
## Strengths
- ✅ Combines Hermes + Kimi + Pollinations natively
- ✅ Agentic visual feedback loop is unique
- ✅ Visual coherence check via Kimi ensures quality
- ✅ Free tier = low barrier to test
- ✅ User controls output format (default: image)
## Weaknesses
- ⚠️ Pollinations quality vs DALL-E/Midjourney (may need to test)
- ⚠️ Kimi requires gateway access (no direct API key)
- ⚠️ Loop adds latency (generate → analyze → refine)
- ⚠️ Need to verify Pollinations reliability
## Uniqueness Score
**7/10** — Agentic visual feedback loop is novel, but need to verify if built-in image generation is compelling enough
## Next Steps
- [ ] Explore Hermes built-in image skills (manim, ascii)
- [ ] Define output format options
- [ ] Sketch technical architecture
## Related Ideas
- See: `002-xxx.md`, `003-xxx.md` for alternatives

View File

@@ -0,0 +1,138 @@
# Idea 007: Spot the Difference Agent
**Date:** 2026-04-19
**Status:** Idea
**Tags:** hermes-agent, kimi-vision, puzzle, gamification, webapp
## Concept
A daily "Spot the Difference" puzzle webapp where AI (Kimi + Hermes) analyzes two images and shows its step-by-step process in finding the differences.
**Core insight:** Use visual analysis strength, minimize reasoning load.
## User Flow
1. User opens webapp → sees today's "Spot the Difference" puzzle (two similar images)
2. User can play manually (click on differences) OR
3. User clicks "Let AI Solve" → watches AI's step-by-step analysis
4. AI shows its reasoning process: "Scanning left-to-right... Found difference #1: color mismatch in top-left..."
5. Leaderboard shows attempt stats (anonymous)
## Why This Works
| Aspect | Implementation |
|--------|----------------|
| **Visual Analysis** | Kimi compares images pixel-level + semantic |
| **Low Reasoning** | Pattern matching, not complex logic |
| **Step-by-Step** | Show each finding with visual highlight |
| **Gamification** | Daily puzzle, leaderboard, no auth |
## Puzzle Types
### Primary: Spot the Difference (v1)
- Two images with subtle differences
- Kimi identifies all differences
- Each found difference highlighted + explanation
### Secondary (future):
- Find the anomaly (what's wrong in this image?)
- Count the objects (how many X in this image?)
- What's different? (semantic analysis)
## Technical Stack
| Component | Source | Role |
|-----------|--------|------|
| Frontend | Single HTML page | Display puzzle, show AI process |
| Image Analysis | Kimi Vision (via gateway) | Compare images, find differences |
| Orchestration | Hermes Agent | Coordinate flow, format output |
| Image Gen | Pollinations AI | Generate daily puzzle pairs |
### Daily Puzzle Generation
```
Hermes + Pollinations → Generate base image
Hermes + Pollinations → Generate modified image (with subtle changes)
Store both → Serve to users daily
```
### AI Solving Process
```
1. Hermes receives both images
2. Send to Kimi Vision for analysis
3. Kimi returns list of differences with locations
4. Hermes formats step-by-step explanation
5. Frontend animates each finding
```
## Features
### Core
- [ ] Daily puzzle auto-rotates
- [ ] Two-image display (side by side)
- [ ] "Let AI Solve" button
- [ ] Step-by-step visualization of AI findings
- [ ] Show each difference with highlight + explanation
### Gamification (no auth)
- [ ] Attempt counter (per user session, localStorage)
- [ ] Leaderboard (anonymous, session-based)
- [ ] "Perfect solve" badge (AI found all differences on first pass)
### Nice to Have
- [ ] Difficulty levels (Easy/Medium/Hard)
- [ ] Share result as image
- [ ] Hint system (Kimi finds 1, user finds rest)
## Step-by-Step Output Format
```
🔍 Scanning image...
✅ Difference #1 found: "The lamp color changed from blue to red"
📍 Location: Top-left corner
👆 [Highlighted on image]
✅ Difference #2 found: "Window shape is slightly different"
📍 Location: Center-right
👆 [Highlighted on image]
...
🎯 Solved! Found X differences in Y steps.
⏱️ Time: Z seconds
```
## Comparison with Other Ideas
| Aspect | 001 Visual Narrative | 007 Spot the Difference |
|--------|---------------------|------------------------|
| Visual Analysis | Heavy | **Heavy** |
| Reasoning | Medium | **Light** |
| Demo Impact | High | **High** |
| Gamification | Low | **High** |
| Uniqueness | 7/10 | **9/10** |
| Step-by-Step | Yes | **Yes (more natural)** |
## Why Stronger than 001
1. **Tangible use case** — People actually play spot the difference
2. **Clear AI demonstration** — "Watch AI see what you see"
3. **Gamification** — Daily puzzle + leaderboard = engagement
4. **Low reasoning, high vision** — Perfect for Kimi's strength
5. **Step-by-step is natural** — Not forced, it's how you'd solve it
## Risks
- ⚠️ Need reliable daily puzzle generation (harder than it sounds)
- ⚠️ Kimi analysis quality depends on image complexity
- ⚠️ Need diverse puzzle set to not repeat
## Next Steps
- [ ] Test Kimi's spot-the-difference capability
- [ ] Design puzzle generation pipeline
- [ ] Mock up webapp UI
- [ ] Prototype step-by-step visualization
## Related Ideas
- See: `001-visual-narrative-agent.md`

View File

@@ -0,0 +1,397 @@
# Idea 008: Visual Detective
**Date:** 2026-04-19
## Concept
Upload a "crime scene" or mystery image. Kimi analyzes every detail. Hermes pieces together clues and generates a detective story/hypothesis.
## Why Strong
- Heavy visual analysis (Kimi reads the scene)
- Low reasoning (observation, not complex logic)
- Storytelling naturally fits step-by-step
- Mystery genre = engaging
## User Flow
1. Upload image (or get random daily mystery)
2. Kimi: "I see a broken window, muddy footprints, overturned chair..."
3. Hermes: "Based on these clues, here's what likely happened..."
4. Output: Detective story with visual evidence
## Tech
- Kimi Vision: Scene analysis
- Hermes: Narrative orchestration
- Pollinations: Generate mystery images
## Unique?
- Nobody's doing "AI detective" with your photos
- Could be daily mystery + community solving
---
## 009: Image Tarot Reader
**Date:** 2026-04-19
## Concept
Upload any image. AI interprets it like a tarot card reading.
## Why Strong
- Fun/flirty, low stakes
- Heavy visual analysis (Kimi interprets symbolism)
- Storytelling fits perfectly
- Shareable results
## User Flow
1. Upload image OR random draw
2. Kimi: Analyzes composition, colors, objects, mood
3. Hermes: "This represents [Tarot card]. Your reading: [Narrative]"
4. Output: Tarot card + 3-card spread interpretation
## Step-by-Step
```
🃏 Drawing your card...
👁️ Analyzing your image...
Visual Elements Detected:
• A winding road (path in life)
• Setting sun (endings/new beginnings)
• Standing figure (you, the observer)
🎴 Your Card: The Fool
Interpretation: A new journey awaits. Trust the path ahead...
Past: Confusion about direction
Present: Standing at the crossroads
Future: Leap of faith required
```
## Tech
- Kimi Vision: Symbol analysis
- Hermes: Tarot narrative generation
- Pollinations: Generate thematic card visuals
---
## 010: Color Emotion Translator
**Date:** 2026-04-19
## Concept
Upload image. AI analyzes dominant colors and translates them into emotions/mood.
## Why Strong
- Pure visual analysis
- Art/design focused
- Generates color palette + emotion report
- Useful for designers
## User Flow
1. Upload image
2. Kimi: Extracts colors, analyzes saturation, harmony
3. Hermes: Translates to emotions, generates palette
4. Output: Color palette + emotion breakdown + suggested uses
## Step-by-Step
```
🔍 Scanning colors...
🎨 Extracting dominant palette...
Detected Colors:
• #2D4A3E (Deep Forest Green) - 45%
• #F5E6D3 (Warm Cream) - 30%
• #8B4513 (Saddle Brown) - 15%
• #CD853F (Peru Gold) - 10%
🎭 Emotional Profile:
Primary: Grounded, natural, calm
Secondary: Warm, nostalgic, organic
Accent: Vintage, artisanal, trustworthy
💡 Recommendations:
• Brand Identity for eco-friendly products
• Interior design: cozy cabin aesthetic
• Packaging: artisanal food products
```
---
## 011: Before/After Time Machine
**Date:** 2026-04-19
## Concept
Upload an old/historical photo. AI shows what it would look like today or vice versa.
## Why Strong
- Historical/educational angle
- Visual transformation is compelling
- Shows AI's understanding of time/changes
## User Flow
1. Upload old OR new photo
2. Select transformation direction
3. Kimi: Analyzes context, era, subject
4. Hermes: Predicts/adapts to target era
5. Output: Side-by-side transformation
## Step-by-Step
```
📸 Analyzing source image...
📅 Detected era: 1950s New York Street
Identifying elements:
• Black & white photography style
• Vintage automobiles (1950s models)
• Fashion: fedoras, swing coats
• Architecture: Art Deco buildings
🔮 Projecting to 2024...
Transformation breakdown:
• Colorization: Added natural skin tones + sky colors
• Vehicles: Replaced with modern equivalents
• Architecture: Updated signage, added modern elements
• Fashion: Modernized while preserving style
✨ Your 1950s scene in 2024!
```
---
## 012: Visual Haiku Generator
**Date:** 2026-04-19
## Concept
Upload any image. AI generates a haiku (5-7-5) based on visual elements.
## Why Strong
- Minimal reasoning, pure visual
- Artistic/creative output
- Japanese aesthetic + AI = unique
- Highly shareable
## User Flow
1. Upload image
2. Kimi: Analyzes scene, mood, elements
3. Hermes: Crafts haiku (strict 5-7-5)
4. Output: Image + haiku + syllable breakdown
## Step-by-Step
```
🖼️ Analyzing your image...
Scene Elements:
• Autumn forest path
• Golden leaves falling
• Soft morning light through trees
✍️ Crafting haiku...
Forest whispers
Golden footsteps on leaves—
Silence speaks loud
📝 Syllable breakdown:
"Forest" (2) - whisper (2)
s(1) - il(1) -ence (1) - speaks (1) - loud (1)
"Golden" (2) - foot (1) -steps (1) - on (1) - leaves (1)
(5) - (7) - (5) ✅
```
---
## 013: Image Alchemy
**Date:** 2026-04-19
## Concept
Upload two random images. AI "fuses" them into a new concept based on their shared elements.
## Why Strong
- Surprising/comedic combinations
- Pure visual + semantic analysis
- Unique creative output
- Viral potential
## User Flow
1. Upload image A (or random)
2. Upload image B (or random)
3. Kimi: Analyzes both separately
4. Hermes: Finds connections, creates fusion
5. Output: New concept + fused image prompt
## Step-by-Step
```
🌀 Analyzing Image A: A Viking ship
• Norse aesthetic
• Ocean voyage
• Historical warrior culture
🌀 Analyzing Image B: A Coffee shop
• Cozy atmosphere
• Barista craft
• Modern social space
🔮 Alchemizing...
Found connections:
• Craft (warrior's craft → barista's craft)
• Ritual (battle ritual → coffee ritual)
• Journey (ocean voyage → daily commute)
⚗️ Alchemy Result:
"THE VIKING BARISTA"
A warrior of the morning,
steering through storms of exhaustion,
claiming the sacred cup.
Your coffee shop serves mead in horn-shaped mugs,
the barista wears a helmet of foam,
and every latte is a conquest.
```
---
## 014: Visual Lie Detector
**Date:** 2026-04-19
## Concept
Upload a photo + claim. AI analyzes if the image supports or contradicts the claim.
## Why Strong
- Useful in era of fake news
- Pure visual verification
- Educational about image analysis
- "Is this real?" tool
## User Flow
1. Paste claim + upload image
2. Kimi: Analyzes image details
3. Hermes: Compares claim vs evidence
4. Output: Verdict + reasoning
## Step-by-Step
```
🔍 Analyzing claim: "This photo was taken in Paris"
🔬 Image Analysis:
• Architecture: Haussmannian buildings ✓
• Street signs: French ✓
• License plates: European format ✓
• Language: French on signs ✓
• Vegetation: Consistent with Paris climate ✓
• Shadows: Consistent with claimed time of day ✓
✅ VERDICT: LIKELY AUTHENTIC
Confidence: 94%
Supporting evidence: 8/8 elements match
Caveats: Metadata not verified
```
---
## 015: Object Archaeology
**Date:** 2026-04-19
## Concept
Upload an object close-up. AI identifies it, tells its history/story.
## Why Strong
- Educational
- Heavy visual (identification + knowledge)
- Discovery/antiquities angle
- Could work with museum APIs
## User Flow
1. Upload object photo
2. Kimi: Visual identification + details
3. Hermes: Tells object's "story"
4. Output: Identity + history narrative
## Step-by-Step
```
🔍 Scanning object...
Visual Analysis:
• Material: Ceramic
• Style: Ming Dynasty blue and white
• Pattern: Dragon with cloud motifs
• Technique: Underglaze blue
🏺 Object Identified:
Ming Dynasty (1368-1644) Blue and White Porcelain
Dragon Pattern Bowl
📜 The Story:
This bowl was crafted during the reign of Emperor Wanli,
at the height of Jingdezhen's porcelain production.
The dragon motif signifies imperial power and protection...
[Full historical narrative]
```
---
## Quick Comparison Matrix
| # | Name | Visual | Reasoning | Uniqueness | Fun |
|---|------|--------|-----------|------------|-----|
| 007 | Spot the Difference | Heavy | Light | 9/10 | 8/10 |
| 008 | Visual Detective | Heavy | Light | 8/10 | 9/10 |
| 009 | Image Tarot | Heavy | Light | 8/10 | 10/10 |
| 010 | Color Emotion | Medium | Light | 7/10 | 7/10 |
| 011 | Before/After | Heavy | Medium | 8/10 | 8/10 |
| 012 | Visual Haiku | Heavy | Light | 9/10 | 8/10 |
| 013 | Image Alchemy | Heavy | Light | 10/10 | 10/10 |
| 014 | Lie Detector | Heavy | Medium | 9/10 | 8/10 |
| 015 | Object Archaeology | Heavy | Medium | 8/10 | 8/10 |
---
**My top picks for uniqueness + fun:**
1. **013 Image Alchemy** — Most unique, viral potential
2. **009 Image Tarot** — Fun, shareable, low friction
3. **007 Spot the Difference** — Game + AI demonstration
4. **014 Visual Lie Detector** — Useful, educational
What stands out to you?

132
docs/ideas/COMPARISON.md Normal file
View File

@@ -0,0 +1,132 @@
# Ideas Comparison Matrix
**Date:** 2026-04-19
**Purpose:** Compare all ideas to select final concept
---
## Scoring Criteria
| Criteria | Weight | Description |
|----------|--------|-------------|
| **Visual Analysis** | 30% | Heavy Kimi use (aligned with Kimi's strength) |
| **Multi-Turn** | 20% | Not single-turn, builds over time |
| **Human-AI Interaction** | 20% | Human participates, not passive |
| **Cost Efficiency** | 15% | Low API costs (image gen vs analysis) |
| **Uniqueness** | 10% | Stand out from competitors |
| **Fun/Engagement** | 5% | Enjoyable to play/watch |
**Scoring:** 1-5 (5 = best)
---
## Full Comparison Matrix
| # | Idea | Visual | Multi-Turn | Human-AI | Cost | Unique | Fun | **Total** |
|---|------|--------|------------|----------|------|--------|-----|-----------|
| 001 | Visual Narrative Agent | 4 | 4 | 3 | 2 | 3 | 4 | **3.5** |
| 002 | Visual Memory Journal | 3 | 3 | 2 | 3 | 4 | 3 | **3.0** |
| 003 | Design Critic | 3 | 2 | 2 | 3 | 2 | 3 | **2.6** |
| 004 | Visual Poem | 4 | 2 | 2 | 3 | 4 | 4 | **3.2** |
| 005 | Scene Journey | 4 | 3 | 2 | 2 | 3 | 4 | **3.2** |
| 007 | Spot the Difference | 4 | 2 | 3 | 2 | 4 | 5 | **3.4** |
| 008 | Visual Detective | 4 | 3 | 2 | 3 | 4 | 4 | **3.5** |
| 009 | Image Tarot | 4 | 2 | 3 | 3 | 4 | 5 | **3.5** |
| 013 | Image Alchemy | 4 | 2 | 3 | 2 | 5 | 5 | **3.6** |
| 014 | Lie Detector | 4 | 2 | 3 | 3 | 4 | 4 | **3.4** |
| 032v2 | Art Critic | 5 | 3 | 3 | 3 | 3 | 4 | **3.7** |
| **033v2** | **Detective** | **5** | **5** | **5** | **4** | **4** | **5** | **4.7** |
| 035 | Guess Artist | 5 | 2 | 3 | 3 | 3 | 4 | **3.5** |
| Auction | Auction | 3 | 4 | 5 | 4 | 4 | 4 | **3.9** |
---
## Top Contenders
| Rank | Idea | Score | Key Strengths |
|------|------|-------|---------------|
| 🥇 | **033v2 Detective** | **4.7** | Best multi-turn, human directs, Kimi does real work |
| 🥈 | Auction | 3.9 | Human describes, human engages, cheap |
| 🥉 | 032v2 Art Critic | 3.7 | Kimi visual analysis, multi-turn |
| 4 | 013 Image Alchemy | 3.6 | Most unique, viral potential |
| 5 | 009 Image Tarot | 3.5 | Fun, shareable |
---
## 033v2 Detective — Why It Wins
### Alignment with User Goals
| User Goal | How Detective Meets It |
|-----------|----------------------|
| Heavy visual analysis | Kimi analyzes each piece of evidence |
| Low reasoning | Pattern matching, not complex logic |
| Multi-turn | 5-7 rounds per case |
| Human-AI collaboration | Human (Chief) directs the investigation |
| Cost efficient | Mostly text between Kimi calls |
| Fun/engagement | Mystery + competition |
### What Makes It Special
1. **Natural two-agent roles:** Witness (sees) + Detective (thinks)
2. **Human as boss:** Chief directs investigation, not passive observer
3. **Multi-turn structure:** Each round builds the case
4. **Kimi's strength shines:** Visual evidence analysis is the core mechanic
5. **Scoring system:** Track cases solved, rounds taken, accuracy
### Comparison to Other Games
| Aspect | Spot the Difference | Tarot | Alchemy | **Detective** |
|--------|-------------------|-------|---------|---------------|
| Visual Analysis | 4 | 4 | 4 | **5** |
| Multi-Turn | 2 | 2 | 2 | **5** |
| Human Role | Judge | Receive | Submit | **Direct** |
| Narrative | None | Story | Surprise | **Full Mystery** |
| Replayability | Medium | Low | Medium | **High** |
---
## Recommendation
**Go with 033v2 Detective.**
### Why Not Others
| Idea | Why Not |
|------|---------|
| 001 Visual Narrative | Too similar to others, high cost |
| 007 Spot Difference | Fun but shallow (1-turn) |
| 009 Image Tarot | Not really interactive |
| 013 Image Alchemy | Unique but single interaction |
| Auction | Good but less "AI demonstration" |
### Detective's Edge
- **Multi-turn** = not just a quick demo
- **Human directs** = active participation
- **Kimi sees evidence** = clear AI capability showcase
- **Cost efficient** = mostly text
- **Daily cases** = reason to return
---
## Next Steps for 033v2 Detective
- [ ] Define case structure (5-7 evidence images)
- [ ] Design Chief interface (what buttons/actions)
- [ ] Plan Witness + Detective prompts
- [ ] Mock up UI
- [ ] Prototype with one case
---
## Appendix: Ideas That Could Combine with Detective
### Detective + Art Critic
Two types of daily content: Mystery case OR Art analysis
### Detective + Auction
Hybrid mode: Evidence auction where Chief describes to Detective
### Detective + Spot Difference
Mini-game within case: "Find the clue hidden in this photo"

View File

@@ -0,0 +1,47 @@
# Research: Hermes Agent Capabilities
**Date:** 2026-04-19
**Purpose:** Understand Hermes Agent framework for hackathon integration
## Hermes 3 (Nous Research)
### Core Capabilities
- **Advanced agentic capabilities**
- **Reliable function calling** - Trained specifically for tool use
- **Structured output** - JSON mode / Pydantic schemas
- **ChatML prompt format** - OpenAI-compatible
- Multi-turn conversation
- Long context coherence
### Benchmark Performance
| Benchmark | Hermes 3 Score |
|-----------|---------------|
| IFEval (0-shot) | 61.70% |
| MMLU-Redux | 92.7% |
| MMLU-Pro | 81.1% |
| SimpleQA | 31.0% |
### Function Calling
- Trained on specific prompts for tool use
- XML-based tool call format: `<tool_call>{"name": "...", "arguments": {...}}</tool_call>`
- Supports recursive/chain tool calls
- Native tool integration via NousResearch/Hermes-Function-Calling repo
## Hermes Agent Framework
### Key Components
1. **ChatML format** - Structured system/user/assistant turns
2. **Tool definitions** - JSON schema for function signatures
3. **Tool parsing** - Parse and execute function calls
4. **Response loop** - Multi-turn agentic execution
### Integration Points
- HuggingFace Transformers
- vLLM inference
- Ollama local deployment
- OpenAI-compatible API
## Sources
- https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B
- https://github.com/NousResearch/Hermes-Function-Calling
- https://arxiv.org/abs/2408.11857 (Hermes 3 Technical Report)

View File

@@ -0,0 +1,72 @@
# Research: Image Generation APIs
**Date:** 2026-04-19
**Purpose:** Find affordable/free image generation for hackathon project
## Pollinations AI (Recommended ✅)
**Why:** Free tier, OpenAI-compatible, multiple models, simple API
### Quick Start
```bash
# No auth needed for basic
curl "https://gen.pollinations.ai/image/a%20cat%20in%20space"
# With auth
curl -H "Authorization: Bearer YOUR_KEY" ...
```
### Models Available
| Model | Type | Notes |
|-------|------|-------|
| `flux` | Default | Good quality |
| `zimage` | Default | Alternative |
| `wan-image` | Quality | Higher quality option |
| `qwen-image` | Quality | Alibaba model |
| `gptimage` | Quality | GPT-based |
| `seedream5` | Style | Special styles |
| `kontext` | Edit | Image editing |
### Pricing
- **Free tier:** Weekly pollen credits (tier-based)
- **Paid:** $1 ≈ 1 Pollen
- **Free API:** Limited but usable
- **Rate limits:** Anonymous = limited, Seed/Flower = more
### API Details
- **Base URL:** `https://gen.pollinations.ai`
- **Image endpoint:** `GET /image/{prompt}`
- **OpenAI-compatible:** `POST /v1/images/generations`
- **No setup:** Just curl it
### Strengths
- ✅ 100% Open Source
- ✅ Free tier available
- ✅ Multiple model options
- ✅ Simple API (no complex setup)
- ✅ OpenAI-compatible SDK
### Weaknesses
- ⚠️ Quality may not match DALL-E/Midjourney
- ⚠️ Free tier has rate limits
- ⚠️ Infrastructure may vary in reliability
## Other Options Considered
| Provider | Free Tier | Quality | Notes |
|----------|-----------|---------|-------|
| **Midjourney** | ❌ No | High | Expensive |
| **Stable Diffusion** | Local only | High | Needs GPU |
| **DALL-E 3** | ❌ No | High | OpenAI pricing |
| **Ideogram** | Limited | Good | API in beta |
| **Flux (Local)** | ✅ Free | High | Self-hosted, needs GPU |
## Recommendation
**Primary:** Pollinations AI (free tier + simplicity)
**Fallback:** Flux if we have GPU resources
## Sources
- https://gen.pollinations.ai
- https://docs.pollinations.ai/
- https://github.com/pollinations/pollinations

View File

@@ -0,0 +1,51 @@
# Research: Kimi Visual Capabilities
**Date:** 2026-04-19
**Purpose:** Validate Kimi's visual strengths for hackathon project
## Kimi K2.5 - Multimodal Model
### Core Capabilities
- **Text + Images + Video** input support
- 256K context length
- Thinking/non-thinking modes
- Agent task support
### Visual API Models
- `moonshot-v1-8k-vision-preview`
- `moonshot-v1-32k-vision-preview`
- `moonshot-v1-128k-vision-preview`
- `kimi-k2.5` (latest, supports video)
### Supported Formats
**Images:** png, jpeg, webp, gif
**Video:** mp4, mpeg, mov, avi, x-flv, mpg, webm, wmv, 3gpp
### Unique Visual Features
1. **Visual Coding** - Kimi Code, Kimi Claw for coding with visual context
2. **Video Understanding** - Analyzes video content (unique for multimodal models)
3. **Real-time Visual Chat** - Interactive visual conversation
## Kimi K2 Benchmarks (Coding/Agent)
| Benchmark | Kimi K2 Score | Notes |
|-----------|---------------|-------|
| SWE-bench Verified (Single Attempt) | **65.8%** | Global SOTA for open-source |
| SWE-bench Multilingual | 47.3% | Outperforms most proprietary |
| LiveCodeBench v6 | 53.7% | Strong coding |
| TerminalBench | 30.0% | Agentic tool use |
| Aider-Polyglot | 60.0% | Code editing |
| Tau2-Bench (avg) | ~64% | Tool use tasks |
## Kimi Visual Strengths Summary
**Video understanding** (unique advantage)
**Visual coding** capabilities
**Image + Text multimodal**
**Strong agentic tool use**
**256K context** for large visual inputs
## Sources
- https://platform.moonshot.cn/docs/guide/kimi-k2-5-quickstart
- https://moonshotai.github.io/Kimi-K2/
- https://platform.moonshot.cn/docs/guide/use-kimi-vision-model