feat: Initial commit - Hermes Detective Agency concept
- Hermes Detective Agency: Open-ended mystery investigation game - Roles: Chief (human), Witness (Kimi), Detective (Hermes) - 5 difficulty levels, community cases, open-ended solving - Scoring: Alignment %, Evidence %, Time - Features: Retry, Journal, Observe mode - Tech: Kimi Vision + Hermes Agent + Pollinations Changelog: - Research phase: Kimi capabilities, Hermes agent, image APIs - Brainstorming: 14 ideas explored - Comparison matrix: Detective selected as winner - Concept finalized with all design decisions
This commit is contained in:
502
docs/chosen-detective-game.md
Normal file
502
docs/chosen-detective-game.md
Normal file
@@ -0,0 +1,502 @@
|
||||
# Project: Hermes Detective Agency
|
||||
|
||||
**Chosen Concept:** 033v2 Detective
|
||||
**Date:** 2026-04-19
|
||||
**Status:** Concept Finalized
|
||||
**Tags:** hermes-agent, kimi-vision, game, multi-agent, open-ended, community
|
||||
|
||||
---
|
||||
|
||||
## Concept Summary
|
||||
|
||||
A mystery investigation game where a human (Chief) directs two AI agents — a **Witness** (powered by Kimi Vision) and a **Detective** (powered by Hermes) — to investigate visual cases.
|
||||
|
||||
**Core philosophy:** Open-ended solving. No single truth. Evidence guides, but multiple theories are valid.
|
||||
|
||||
---
|
||||
|
||||
## Elevator Pitch
|
||||
|
||||
> *"You're the Chief. Your Witness sees everything. Your Detective connects the dots. Build YOUR theory. See how it aligns with others."*
|
||||
|
||||
---
|
||||
|
||||
## The Story
|
||||
|
||||
You run a small detective agency. Your two AI assistants have superhuman abilities:
|
||||
|
||||
- **Witness** can look at any image and describe it perfectly — every detail, every inconsistency, every hidden clue.
|
||||
- **Detective** can take those observations and build theories, spot patterns, and identify suspects.
|
||||
|
||||
Your job? **Direct the investigation.** Tell them what to look at. Ask the right questions. Build your theory.
|
||||
|
||||
**Key difference:** There's no single "right answer." The creator has an intended story, but your theory is valid if evidence supports it.
|
||||
|
||||
---
|
||||
|
||||
## Game Roles
|
||||
|
||||
### Chief (Human)
|
||||
The player. You run the investigation.
|
||||
|
||||
| Action | Effect |
|
||||
|--------|--------|
|
||||
| Examine evidence | Witness + Kimi analyze |
|
||||
| Question suspects | Detective probes, Witness watches |
|
||||
| Compare items | Kimi highlights differences |
|
||||
| Build theory | Cite evidence, form conclusion |
|
||||
| Request truth | See creator's intended story (optional) |
|
||||
|
||||
### Witness (Agent A + Kimi)
|
||||
The eyes. Analyzes visual evidence. Appears based on triggers.
|
||||
|
||||
| Input | Output |
|
||||
|-------|--------|
|
||||
| Crime scene photo | "I see glass shards, muddy footprints, a broken frame..." |
|
||||
| Suspect photo | "This person has paint on their sleeve..." |
|
||||
| Document | Extracts text, notes inconsistencies |
|
||||
| Item close-up | Identifies details Chief might miss |
|
||||
|
||||
**Dynamic Appearance:** In harder cases, Witness doesn't appear until triggered.
|
||||
|
||||
### Detective (Agent B)
|
||||
The brain. Builds theories, responds to questions.
|
||||
|
||||
| Input | Output |
|
||||
|-------|--------|
|
||||
| Witness observations | "Based on evidence, the thief entered through..." |
|
||||
| Suspect profiles | "Suspect A has motive: insurance fraud..." |
|
||||
| Human questions | "Good question, Chief. Let me look into that..." |
|
||||
| Theory building | Helps Chief cite evidence for their theory |
|
||||
|
||||
---
|
||||
|
||||
## Difficulty System
|
||||
|
||||
### Difficulty Levels
|
||||
|
||||
| Difficulty | Description | Evidence | Suspects | Red Herrings | Plot Twist |
|
||||
|-----------|-------------|----------|----------|---------------|------------|
|
||||
| **Easy** | Obvious clues, clear path | 4-5 | 2 | ❌ | ❌ |
|
||||
| **Medium** | Requires comparison | 6-7 | 3 | ❌ | ❌ |
|
||||
| **Hard** | Red herrings present | 8-9 | 4 | ✅ | ❌ |
|
||||
| **Hardcore** | Plot twist mid-case | 10-11 | 4 | ✅ | ✅ |
|
||||
| **Impossible** | All elements, complex | 12+ | 5 | ✅ | ✅ |
|
||||
|
||||
### Daily Structure
|
||||
|
||||
```
|
||||
One case per day, everyone gets the same case
|
||||
Same difficulty for all players
|
||||
Different case each day
|
||||
```
|
||||
|
||||
### Starter Pack (5 Cases)
|
||||
|
||||
| Week | Difficulty | Theme |
|
||||
|------|------------|-------|
|
||||
| 1 | Easy | Simple theft |
|
||||
| 2 | Medium | Missing person |
|
||||
| 3 | Hard | Corporate fraud |
|
||||
| 4 | Hardcore | Art heist (plot twist) |
|
||||
| 5 | Impossible | Multi-layered conspiracy |
|
||||
|
||||
**Approach:** Add cases incrementally during development.
|
||||
|
||||
---
|
||||
|
||||
## Evidence System
|
||||
|
||||
### Evidence Types
|
||||
|
||||
| Type | What Kimi Sees | Example Clue |
|
||||
|------|---------------|--------------|
|
||||
| **Crime scene** | Scene layout, objects, anomalies | "Window was broken from inside" |
|
||||
| **Surveillance** | People, actions, timestamps | "Person lingered at door for 3 minutes" |
|
||||
| **Documents** | Text, handwriting, context | "Letter mentions 'meeting at midnight'" |
|
||||
| **Photos** | People, items, locations | "Suspect's shoes match the footprint" |
|
||||
| **Maps** | Routes, access points, exits | "Only one entrance visible to street" |
|
||||
| **Items** | Condition, marks, connections | "Key is copy — grooves don't match original" |
|
||||
|
||||
### Evidence Citation
|
||||
|
||||
Evidence helps build theory. Not all evidence is required.
|
||||
|
||||
```
|
||||
Chief's Theory: "I think Suspect B did it."
|
||||
|
||||
📎 Cited Evidence:
|
||||
- Evidence #3: Crime scene photo
|
||||
- Evidence #5: Security footage
|
||||
- Evidence #8: Witness testimony
|
||||
→ 3/10 evidence cited (30%)
|
||||
|
||||
💬 Detective: "That's a solid theory. The evidence
|
||||
supports B, but have you considered Evidence #7?"
|
||||
```
|
||||
|
||||
### Hints Embedded in Evidence
|
||||
|
||||
Not a separate button. Hints are part of the evidence design.
|
||||
|
||||
| Level | Visibility | Example |
|
||||
|-------|-----------|---------|
|
||||
| **Too obvious** | Easy to find | "Letter saying 'I did it'" |
|
||||
| **Barely obvious** | Check certain places | "Muddy shoes near suspect's home" |
|
||||
| **Not too obvious** | Requires attention | "Timeline inconsistency in letter" |
|
||||
|
||||
### Witness Trigger System
|
||||
|
||||
In harder cases, Witness appears based on triggers.
|
||||
|
||||
```
|
||||
Trigger Example:
|
||||
Turn 1: Chief examines crime scene photo
|
||||
Turn 2: Chief finds a hair sample on the floor
|
||||
↓ [Trigger activated]
|
||||
Turn 3: 👁️ Witness appears
|
||||
↓ "I recognize this hair... it belongs to Suspect B's dog"
|
||||
Turn 4: Chief examines suspect's home
|
||||
Turn 5: 👁️ Witness appears again (new trigger)
|
||||
↓ "I saw Suspect B leaving the gallery at midnight..."
|
||||
```
|
||||
|
||||
**Indicator:** Each piece of evidence has a note indicating if it triggers Witness appearance.
|
||||
|
||||
---
|
||||
|
||||
## Open-Ended Solving
|
||||
|
||||
### Core Philosophy
|
||||
|
||||
> **No single truth. Multiple valid theories.**
|
||||
|
||||
| Before | After |
|
||||
|--------|-------|
|
||||
| One correct answer | Multiple valid theories |
|
||||
| Wrong accusation = Fail | Theory valid if evidence supports |
|
||||
| One winner | Everyone discusses |
|
||||
| Truth ends game | Truth is guidance, not mandate |
|
||||
|
||||
### Theory Building
|
||||
|
||||
```
|
||||
👤 Chief builds theory:
|
||||
"I think Suspect B did it, with help from Suspect A.
|
||||
B had access (night guard), A had keys (curator).
|
||||
They split the insurance money."
|
||||
|
||||
📎 Chief cites evidence:
|
||||
- Evidence #3: Crime scene (window not broken)
|
||||
- Evidence #5: Security footage (B was inside)
|
||||
- Evidence #7: A has master keys
|
||||
- Evidence #9: Financial records (recent debt)
|
||||
|
||||
💬 Detective responds:
|
||||
"That's a coherent theory. Your cited evidence
|
||||
supports collaboration between A and B."
|
||||
```
|
||||
|
||||
### Truth Reveal
|
||||
|
||||
**Available anytime. Does NOT end the game.**
|
||||
|
||||
| When | Why |
|
||||
|------|-----|
|
||||
| After building theory | "Did I get it right?" |
|
||||
| When stuck | "Give me guidance" |
|
||||
| Never | "I want to figure it out myself" |
|
||||
| After solving | "See how close I was" |
|
||||
|
||||
```
|
||||
📜 THE TRUTH (Creator's Intended)
|
||||
|
||||
The case was designed as:
|
||||
"A and B collaborated. A had keys, B had access.
|
||||
But C was the real mastermind, funding the whole thing."
|
||||
|
||||
👤 Your theory:
|
||||
"Suspect B acted alone."
|
||||
|
||||
💬 Comparison:
|
||||
- Your theory missed the collaboration element
|
||||
- You correctly identified B as main actor
|
||||
- Evidence you cited: 80% relevant
|
||||
- 🎯 65% alignment with intended truth
|
||||
|
||||
💬 But: Your theory is still valid based on evidence!
|
||||
Discussion continues. Truth is guidance, not mandate.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Scoring System
|
||||
|
||||
### Per Case Statistics
|
||||
|
||||
| Metric | Calculation |
|
||||
|--------|-------------|
|
||||
| **Time** | Turns × 10 min (simplified) |
|
||||
| **Evidence** | Evidence cited / Total evidence |
|
||||
| **Alignment** | How close to creator's intended story |
|
||||
| **Coherence** | Theory makes sense based on evidence |
|
||||
|
||||
### Statistics Display
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────┐
|
||||
│ 📊 CASE STATISTICS │
|
||||
├─────────────────────────────────────┤
|
||||
│ ⏱️ Time: 6 turns × 10 min = 60 min │
|
||||
│ 📎 Evidence: 7/10 cited (70%) │
|
||||
│ 🎯 Alignment: 85% with creator │
|
||||
│ 💬 Theory coherence: Strong │
|
||||
├─────────────────────────────────────┤
|
||||
│ ⭐ Rating: Sharp Detective │
|
||||
└─────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Rating Tiers
|
||||
|
||||
| Alignment | Rating |
|
||||
|-----------|--------|
|
||||
| 90-100% | Master Detective |
|
||||
| 75-89% | Sharp Detective |
|
||||
| 50-74% | Promising Detective |
|
||||
| 25-49% | Apprentice |
|
||||
| 0-24% | Rookie |
|
||||
|
||||
---
|
||||
|
||||
## Retry & Journal System
|
||||
|
||||
### Multiple Attempts
|
||||
|
||||
User can solve same case multiple times.
|
||||
|
||||
```
|
||||
Case #47 — The Hartwell Heist
|
||||
|
||||
Your Attempts:
|
||||
├── Attempt #1: 85% alignment, 6 turns 📖
|
||||
├── Attempt #2: 92% alignment, 4 turns 📖
|
||||
├── Attempt #3: In progress...
|
||||
└── Best: 92% alignment
|
||||
```
|
||||
|
||||
### Journal Documentation
|
||||
|
||||
Every attempt is documented (solve or not).
|
||||
|
||||
```
|
||||
Attempt #1: April 19, 2026
|
||||
├── Status: Solved
|
||||
├── Evidence cited: 7/10
|
||||
├── Alignment: 85%
|
||||
├── Theory: "Suspect B acted alone"
|
||||
└── Notes: "Missed the A-B collaboration"
|
||||
```
|
||||
|
||||
### Privacy Settings
|
||||
|
||||
| Setting | Description |
|
||||
|---------|-------------|
|
||||
| **Private** | Only you see your attempts |
|
||||
| **Publish stats** | Everyone sees your stats (default) |
|
||||
| **Publish journal** | Anyone can read your solve |
|
||||
|
||||
---
|
||||
|
||||
## Replay (Observe Mode)
|
||||
|
||||
Watch how others solved the case.
|
||||
|
||||
```
|
||||
📺 OBSERVE MODE
|
||||
|
||||
@alice's Solve of Case #47
|
||||
|
||||
Turn 1: Examined crime scene
|
||||
Turn 2: Found hair sample → Witness appeared
|
||||
Turn 3: Questioned Suspect B
|
||||
Turn 4: Examined financial records
|
||||
Turn 5: Cited evidence, formed theory
|
||||
Turn 6: Requested truth reveal
|
||||
|
||||
⏱️ 6 turns | 🎯 85% alignment | ⭐ Sharp
|
||||
```
|
||||
|
||||
**Only published journals are observable.**
|
||||
|
||||
---
|
||||
|
||||
## Case Creation System
|
||||
|
||||
### Starter Cases
|
||||
|
||||
5 cases (one per difficulty) as templates.
|
||||
|
||||
**Source:** Real solved cases adapted for the game.
|
||||
|
||||
### Community Cases
|
||||
|
||||
Anyone can create and share cases.
|
||||
|
||||
#### Creation Flow
|
||||
|
||||
```
|
||||
1. Choose reference case (optional)
|
||||
"Let's base this on the Isabella Stewart Gardner theft"
|
||||
|
||||
2. Gather/create evidence
|
||||
Upload images (crime scene, suspects, documents)
|
||||
|
||||
3. Write case brief
|
||||
├── Title, difficulty
|
||||
├── Suspect list (names, photos)
|
||||
├── Evidence set
|
||||
├── Hidden truth (creator's intended story)
|
||||
├── Red herrings (optional)
|
||||
├── Plot twist (optional)
|
||||
└── Witness triggers (which evidence triggers Witness)
|
||||
|
||||
4. Test it
|
||||
Play through yourself to verify solvability
|
||||
|
||||
5. Publish
|
||||
├── Private link (friends only)
|
||||
└── Public (case library)
|
||||
```
|
||||
|
||||
### Case Format
|
||||
|
||||
```yaml
|
||||
case:
|
||||
title: "The Hartwell Heist"
|
||||
difficulty: medium
|
||||
difficulty_description: "Requires comparison of evidence"
|
||||
|
||||
evidence:
|
||||
- id: 1
|
||||
type: photo
|
||||
image: crime_scene.jpg
|
||||
description: "Crime scene photograph"
|
||||
triggers_witness: true
|
||||
hint_level: not_too_obvious
|
||||
|
||||
- id: 2
|
||||
type: document
|
||||
image: letter.jpg
|
||||
description: "Anonymous letter found"
|
||||
triggers_witness: false
|
||||
hint_level: barely_obvious
|
||||
|
||||
suspects:
|
||||
- name: "Suspect A"
|
||||
photo: suspect_a.jpg
|
||||
description: "Gallery curator"
|
||||
|
||||
truth:
|
||||
summary: "A and B collaborated..."
|
||||
alignment_criteria:
|
||||
- "Correctly identified collaboration"
|
||||
- "Identified A as key holder"
|
||||
- "Identified B as main actor"
|
||||
|
||||
witness_triggers:
|
||||
- evidence_id: 1
|
||||
testimony: "I see glass on the floor inside..."
|
||||
```
|
||||
|
||||
### Case Creator Tools
|
||||
|
||||
| Tool | Purpose |
|
||||
|------|---------|
|
||||
| **Skill** | Hermes skill for case creation guidance |
|
||||
| **Validator** | Verify case format is correct |
|
||||
|
||||
---
|
||||
|
||||
## Community Moderation
|
||||
|
||||
### Discovery Philosophy
|
||||
|
||||
> **Community cases are the jungle. Direct links are the path.**
|
||||
|
||||
| Discovery Method | Quality | Effort |
|
||||
|-----------------|---------|--------|
|
||||
| Case library (browse) | Mixed (jungle) | Low |
|
||||
| Direct link from creator | Same quality | Medium |
|
||||
| Social media / community | Trusted (curated) | High |
|
||||
|
||||
### Quality Signals
|
||||
|
||||
| Signal | Description |
|
||||
|--------|-------------|
|
||||
| **Visits** | How many times case was played |
|
||||
| **Reviews** | 👍 or 👎 (no text, requires effort to spam) |
|
||||
|
||||
```
|
||||
Case #47B — "The Missing Heirloom"
|
||||
├── Visits: 234
|
||||
├── 👍 45 | 👎 3
|
||||
└── Quality score: High
|
||||
```
|
||||
|
||||
**Note:** Review manipulation is possible but requires effort. Not perfect, but workable.
|
||||
|
||||
### Sharing Flow
|
||||
|
||||
```
|
||||
Creator creates case
|
||||
↓
|
||||
Tests locally
|
||||
↓
|
||||
Publishes to community
|
||||
↓
|
||||
Shares link on social media / Discord
|
||||
↓
|
||||
Players try directly from creator
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Summary of Decisions
|
||||
|
||||
| Element | Decision |
|
||||
|---------|----------|
|
||||
| Difficulty | 5 levels (Easy → Impossible) |
|
||||
| Daily structure | One case per day, same for all |
|
||||
| Timer | ❌ No (first phase) |
|
||||
| Hints | ✅ Embedded in evidence |
|
||||
| Retry | ✅ Unlimited attempts |
|
||||
| Journal | ✅ Every attempt documented |
|
||||
| Observe | ✅ Watch published solves |
|
||||
| Privacy | Private by default |
|
||||
| Publish | Stats always, journal optional |
|
||||
| Scoring | Alignment %, Evidence %, Time |
|
||||
| Open-ended | ✅ No single truth |
|
||||
| Truth reveal | Available anytime |
|
||||
| Case source | Real cases + community |
|
||||
| Witness | Dynamic (triggers in hard cases) |
|
||||
| Red herrings | ✅ Hard+ difficulty |
|
||||
| Plot twist | ✅ Hardcore+ difficulty |
|
||||
| Community | Visits + reviews (no auth) |
|
||||
|
||||
---
|
||||
|
||||
## What's Next
|
||||
|
||||
Once we finalize the concept:
|
||||
- Technical architecture
|
||||
- UI/UX design
|
||||
- Prompt engineering
|
||||
- Case creation template
|
||||
- Prototype development
|
||||
|
||||
---
|
||||
|
||||
## Related Documents
|
||||
|
||||
- `docs/ideas/COMPARISON.md` — Full comparison matrix
|
||||
- `docs/ideas/008-visual-detective.md` — Initial brainstorm
|
||||
79
docs/ideas/001-visual-narrative-agent.md
Normal file
79
docs/ideas/001-visual-narrative-agent.md
Normal file
@@ -0,0 +1,79 @@
|
||||
# Idea 001: Visual Narrative Agent
|
||||
|
||||
**Date:** 2026-04-19
|
||||
**Status:** Idea
|
||||
**Tags:** hermes-agent, kimi-vision, storytelling, image-generation
|
||||
|
||||
## Concept
|
||||
|
||||
An agentic storytelling system where Hermes orchestrates a narrative loop with Kimi's visual analysis and built-in image generation skills to produce coherent visual stories.
|
||||
|
||||
## User Flow
|
||||
|
||||
1. User provides text prompt (e.g., "A lone astronaut discovers an ancient alien garden on Mars")
|
||||
2. Hermes plans story structure (scenes, pacing, visual style)
|
||||
3. For each scene:
|
||||
- Hermes generates image prompt
|
||||
- Generate image (Hermes built-in skill: manim / ascii)
|
||||
- Kimi analyzes generated image
|
||||
- Kimi's feedback refines next scene's prompt
|
||||
4. Return compiled visual story to user
|
||||
|
||||
## Key Differentiator
|
||||
|
||||
Most story-to-image tools: **Generate → Done**
|
||||
|
||||
This concept: **Generate → Analyze → Refine → Loop**
|
||||
|
||||
Kimi serves as the **visual reasoning engine** — tells Hermes if the generated image matches the intended scene, catches inconsistencies, and informs prompt refinement for the next scene.
|
||||
|
||||
## Tech Stack
|
||||
|
||||
| Component | Source | Role |
|
||||
|-----------|--------|------|
|
||||
| Hermes Agent | Nous Research | Orchestration, planning, decision loop |
|
||||
| Kimi Vision | Moonshot AI (via gateway) | Image analysis, visual feedback |
|
||||
| Image Generation | Pollinations AI | Free tier, multiple models (Flux, etc.) |
|
||||
|
||||
### Image Generation Options
|
||||
|
||||
| Provider | Free Tier | Quality | Use Case |
|
||||
|---------|-----------|---------|----------|
|
||||
| **Pollinations** ✅ | ✅ Yes | Good | Primary (simple, free) |
|
||||
| **Flux (local)** | ✅ Free | High | If GPU available |
|
||||
| **Hermes skills** | ✅ Free | Niche | Fallback/ASCII aesthetic |
|
||||
|
||||
### Pollinations API (Primary)
|
||||
- **Endpoint:** `https://gen.pollinations.ai/image/{prompt}`
|
||||
- **Models:** flux, zimage, wan-image, qwen-image, etc.
|
||||
- **Cost:** Free tier (pollen credits), ~$1/1 Pollen paid
|
||||
- **Auth:** Optional for free tier
|
||||
|
||||
## Strengths
|
||||
|
||||
- ✅ Combines Hermes + Kimi + Pollinations natively
|
||||
- ✅ Agentic visual feedback loop is unique
|
||||
- ✅ Visual coherence check via Kimi ensures quality
|
||||
- ✅ Free tier = low barrier to test
|
||||
- ✅ User controls output format (default: image)
|
||||
|
||||
## Weaknesses
|
||||
|
||||
- ⚠️ Pollinations quality vs DALL-E/Midjourney (may need to test)
|
||||
- ⚠️ Kimi requires gateway access (no direct API key)
|
||||
- ⚠️ Loop adds latency (generate → analyze → refine)
|
||||
- ⚠️ Need to verify Pollinations reliability
|
||||
|
||||
## Uniqueness Score
|
||||
|
||||
**7/10** — Agentic visual feedback loop is novel, but need to verify if built-in image generation is compelling enough
|
||||
|
||||
## Next Steps
|
||||
|
||||
- [ ] Explore Hermes built-in image skills (manim, ascii)
|
||||
- [ ] Define output format options
|
||||
- [ ] Sketch technical architecture
|
||||
|
||||
## Related Ideas
|
||||
|
||||
- See: `002-xxx.md`, `003-xxx.md` for alternatives
|
||||
138
docs/ideas/007-vision-spot-the-difference.md
Normal file
138
docs/ideas/007-vision-spot-the-difference.md
Normal file
@@ -0,0 +1,138 @@
|
||||
# Idea 007: Spot the Difference Agent
|
||||
|
||||
**Date:** 2026-04-19
|
||||
**Status:** Idea
|
||||
**Tags:** hermes-agent, kimi-vision, puzzle, gamification, webapp
|
||||
|
||||
## Concept
|
||||
|
||||
A daily "Spot the Difference" puzzle webapp where AI (Kimi + Hermes) analyzes two images and shows its step-by-step process in finding the differences.
|
||||
|
||||
**Core insight:** Use visual analysis strength, minimize reasoning load.
|
||||
|
||||
## User Flow
|
||||
|
||||
1. User opens webapp → sees today's "Spot the Difference" puzzle (two similar images)
|
||||
2. User can play manually (click on differences) OR
|
||||
3. User clicks "Let AI Solve" → watches AI's step-by-step analysis
|
||||
4. AI shows its reasoning process: "Scanning left-to-right... Found difference #1: color mismatch in top-left..."
|
||||
5. Leaderboard shows attempt stats (anonymous)
|
||||
|
||||
## Why This Works
|
||||
|
||||
| Aspect | Implementation |
|
||||
|--------|----------------|
|
||||
| **Visual Analysis** | Kimi compares images pixel-level + semantic |
|
||||
| **Low Reasoning** | Pattern matching, not complex logic |
|
||||
| **Step-by-Step** | Show each finding with visual highlight |
|
||||
| **Gamification** | Daily puzzle, leaderboard, no auth |
|
||||
|
||||
## Puzzle Types
|
||||
|
||||
### Primary: Spot the Difference (v1)
|
||||
- Two images with subtle differences
|
||||
- Kimi identifies all differences
|
||||
- Each found difference highlighted + explanation
|
||||
|
||||
### Secondary (future):
|
||||
- Find the anomaly (what's wrong in this image?)
|
||||
- Count the objects (how many X in this image?)
|
||||
- What's different? (semantic analysis)
|
||||
|
||||
## Technical Stack
|
||||
|
||||
| Component | Source | Role |
|
||||
|-----------|--------|------|
|
||||
| Frontend | Single HTML page | Display puzzle, show AI process |
|
||||
| Image Analysis | Kimi Vision (via gateway) | Compare images, find differences |
|
||||
| Orchestration | Hermes Agent | Coordinate flow, format output |
|
||||
| Image Gen | Pollinations AI | Generate daily puzzle pairs |
|
||||
|
||||
### Daily Puzzle Generation
|
||||
```
|
||||
Hermes + Pollinations → Generate base image
|
||||
Hermes + Pollinations → Generate modified image (with subtle changes)
|
||||
Store both → Serve to users daily
|
||||
```
|
||||
|
||||
### AI Solving Process
|
||||
```
|
||||
1. Hermes receives both images
|
||||
2. Send to Kimi Vision for analysis
|
||||
3. Kimi returns list of differences with locations
|
||||
4. Hermes formats step-by-step explanation
|
||||
5. Frontend animates each finding
|
||||
```
|
||||
|
||||
## Features
|
||||
|
||||
### Core
|
||||
- [ ] Daily puzzle auto-rotates
|
||||
- [ ] Two-image display (side by side)
|
||||
- [ ] "Let AI Solve" button
|
||||
- [ ] Step-by-step visualization of AI findings
|
||||
- [ ] Show each difference with highlight + explanation
|
||||
|
||||
### Gamification (no auth)
|
||||
- [ ] Attempt counter (per user session, localStorage)
|
||||
- [ ] Leaderboard (anonymous, session-based)
|
||||
- [ ] "Perfect solve" badge (AI found all differences on first pass)
|
||||
|
||||
### Nice to Have
|
||||
- [ ] Difficulty levels (Easy/Medium/Hard)
|
||||
- [ ] Share result as image
|
||||
- [ ] Hint system (Kimi finds 1, user finds rest)
|
||||
|
||||
## Step-by-Step Output Format
|
||||
|
||||
```
|
||||
🔍 Scanning image...
|
||||
✅ Difference #1 found: "The lamp color changed from blue to red"
|
||||
📍 Location: Top-left corner
|
||||
👆 [Highlighted on image]
|
||||
|
||||
✅ Difference #2 found: "Window shape is slightly different"
|
||||
📍 Location: Center-right
|
||||
👆 [Highlighted on image]
|
||||
|
||||
...
|
||||
|
||||
🎯 Solved! Found X differences in Y steps.
|
||||
⏱️ Time: Z seconds
|
||||
```
|
||||
|
||||
## Comparison with Other Ideas
|
||||
|
||||
| Aspect | 001 Visual Narrative | 007 Spot the Difference |
|
||||
|--------|---------------------|------------------------|
|
||||
| Visual Analysis | Heavy | **Heavy** |
|
||||
| Reasoning | Medium | **Light** |
|
||||
| Demo Impact | High | **High** |
|
||||
| Gamification | Low | **High** |
|
||||
| Uniqueness | 7/10 | **9/10** |
|
||||
| Step-by-Step | Yes | **Yes (more natural)** |
|
||||
|
||||
## Why Stronger than 001
|
||||
|
||||
1. **Tangible use case** — People actually play spot the difference
|
||||
2. **Clear AI demonstration** — "Watch AI see what you see"
|
||||
3. **Gamification** — Daily puzzle + leaderboard = engagement
|
||||
4. **Low reasoning, high vision** — Perfect for Kimi's strength
|
||||
5. **Step-by-step is natural** — Not forced, it's how you'd solve it
|
||||
|
||||
## Risks
|
||||
|
||||
- ⚠️ Need reliable daily puzzle generation (harder than it sounds)
|
||||
- ⚠️ Kimi analysis quality depends on image complexity
|
||||
- ⚠️ Need diverse puzzle set to not repeat
|
||||
|
||||
## Next Steps
|
||||
|
||||
- [ ] Test Kimi's spot-the-difference capability
|
||||
- [ ] Design puzzle generation pipeline
|
||||
- [ ] Mock up webapp UI
|
||||
- [ ] Prototype step-by-step visualization
|
||||
|
||||
## Related Ideas
|
||||
|
||||
- See: `001-visual-narrative-agent.md`
|
||||
397
docs/ideas/008-visual-detective.md
Normal file
397
docs/ideas/008-visual-detective.md
Normal file
@@ -0,0 +1,397 @@
|
||||
# Idea 008: Visual Detective
|
||||
|
||||
**Date:** 2026-04-19
|
||||
|
||||
## Concept
|
||||
|
||||
Upload a "crime scene" or mystery image. Kimi analyzes every detail. Hermes pieces together clues and generates a detective story/hypothesis.
|
||||
|
||||
## Why Strong
|
||||
|
||||
- Heavy visual analysis (Kimi reads the scene)
|
||||
- Low reasoning (observation, not complex logic)
|
||||
- Storytelling naturally fits step-by-step
|
||||
- Mystery genre = engaging
|
||||
|
||||
## User Flow
|
||||
|
||||
1. Upload image (or get random daily mystery)
|
||||
2. Kimi: "I see a broken window, muddy footprints, overturned chair..."
|
||||
3. Hermes: "Based on these clues, here's what likely happened..."
|
||||
4. Output: Detective story with visual evidence
|
||||
|
||||
## Tech
|
||||
|
||||
- Kimi Vision: Scene analysis
|
||||
- Hermes: Narrative orchestration
|
||||
- Pollinations: Generate mystery images
|
||||
|
||||
## Unique?
|
||||
|
||||
- Nobody's doing "AI detective" with your photos
|
||||
- Could be daily mystery + community solving
|
||||
|
||||
---
|
||||
|
||||
## 009: Image Tarot Reader
|
||||
|
||||
**Date:** 2026-04-19
|
||||
|
||||
## Concept
|
||||
|
||||
Upload any image. AI interprets it like a tarot card reading.
|
||||
|
||||
## Why Strong
|
||||
|
||||
- Fun/flirty, low stakes
|
||||
- Heavy visual analysis (Kimi interprets symbolism)
|
||||
- Storytelling fits perfectly
|
||||
- Shareable results
|
||||
|
||||
## User Flow
|
||||
|
||||
1. Upload image OR random draw
|
||||
2. Kimi: Analyzes composition, colors, objects, mood
|
||||
3. Hermes: "This represents [Tarot card]. Your reading: [Narrative]"
|
||||
4. Output: Tarot card + 3-card spread interpretation
|
||||
|
||||
## Step-by-Step
|
||||
|
||||
```
|
||||
🃏 Drawing your card...
|
||||
👁️ Analyzing your image...
|
||||
|
||||
Visual Elements Detected:
|
||||
• A winding road (path in life)
|
||||
• Setting sun (endings/new beginnings)
|
||||
• Standing figure (you, the observer)
|
||||
|
||||
🎴 Your Card: The Fool
|
||||
Interpretation: A new journey awaits. Trust the path ahead...
|
||||
|
||||
Past: Confusion about direction
|
||||
Present: Standing at the crossroads
|
||||
Future: Leap of faith required
|
||||
```
|
||||
|
||||
## Tech
|
||||
|
||||
- Kimi Vision: Symbol analysis
|
||||
- Hermes: Tarot narrative generation
|
||||
- Pollinations: Generate thematic card visuals
|
||||
|
||||
---
|
||||
|
||||
## 010: Color Emotion Translator
|
||||
|
||||
**Date:** 2026-04-19
|
||||
|
||||
## Concept
|
||||
|
||||
Upload image. AI analyzes dominant colors and translates them into emotions/mood.
|
||||
|
||||
## Why Strong
|
||||
|
||||
- Pure visual analysis
|
||||
- Art/design focused
|
||||
- Generates color palette + emotion report
|
||||
- Useful for designers
|
||||
|
||||
## User Flow
|
||||
|
||||
1. Upload image
|
||||
2. Kimi: Extracts colors, analyzes saturation, harmony
|
||||
3. Hermes: Translates to emotions, generates palette
|
||||
4. Output: Color palette + emotion breakdown + suggested uses
|
||||
|
||||
## Step-by-Step
|
||||
|
||||
```
|
||||
🔍 Scanning colors...
|
||||
🎨 Extracting dominant palette...
|
||||
|
||||
Detected Colors:
|
||||
• #2D4A3E (Deep Forest Green) - 45%
|
||||
• #F5E6D3 (Warm Cream) - 30%
|
||||
• #8B4513 (Saddle Brown) - 15%
|
||||
• #CD853F (Peru Gold) - 10%
|
||||
|
||||
🎭 Emotional Profile:
|
||||
Primary: Grounded, natural, calm
|
||||
Secondary: Warm, nostalgic, organic
|
||||
Accent: Vintage, artisanal, trustworthy
|
||||
|
||||
💡 Recommendations:
|
||||
• Brand Identity for eco-friendly products
|
||||
• Interior design: cozy cabin aesthetic
|
||||
• Packaging: artisanal food products
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 011: Before/After Time Machine
|
||||
|
||||
**Date:** 2026-04-19
|
||||
|
||||
## Concept
|
||||
|
||||
Upload an old/historical photo. AI shows what it would look like today or vice versa.
|
||||
|
||||
## Why Strong
|
||||
|
||||
- Historical/educational angle
|
||||
- Visual transformation is compelling
|
||||
- Shows AI's understanding of time/changes
|
||||
|
||||
## User Flow
|
||||
|
||||
1. Upload old OR new photo
|
||||
2. Select transformation direction
|
||||
3. Kimi: Analyzes context, era, subject
|
||||
4. Hermes: Predicts/adapts to target era
|
||||
5. Output: Side-by-side transformation
|
||||
|
||||
## Step-by-Step
|
||||
|
||||
```
|
||||
📸 Analyzing source image...
|
||||
📅 Detected era: 1950s New York Street
|
||||
|
||||
Identifying elements:
|
||||
• Black & white photography style
|
||||
• Vintage automobiles (1950s models)
|
||||
• Fashion: fedoras, swing coats
|
||||
• Architecture: Art Deco buildings
|
||||
|
||||
🔮 Projecting to 2024...
|
||||
|
||||
Transformation breakdown:
|
||||
• Colorization: Added natural skin tones + sky colors
|
||||
• Vehicles: Replaced with modern equivalents
|
||||
• Architecture: Updated signage, added modern elements
|
||||
• Fashion: Modernized while preserving style
|
||||
|
||||
✨ Your 1950s scene in 2024!
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 012: Visual Haiku Generator
|
||||
|
||||
**Date:** 2026-04-19
|
||||
|
||||
## Concept
|
||||
|
||||
Upload any image. AI generates a haiku (5-7-5) based on visual elements.
|
||||
|
||||
## Why Strong
|
||||
|
||||
- Minimal reasoning, pure visual
|
||||
- Artistic/creative output
|
||||
- Japanese aesthetic + AI = unique
|
||||
- Highly shareable
|
||||
|
||||
## User Flow
|
||||
|
||||
1. Upload image
|
||||
2. Kimi: Analyzes scene, mood, elements
|
||||
3. Hermes: Crafts haiku (strict 5-7-5)
|
||||
4. Output: Image + haiku + syllable breakdown
|
||||
|
||||
## Step-by-Step
|
||||
|
||||
```
|
||||
🖼️ Analyzing your image...
|
||||
|
||||
Scene Elements:
|
||||
• Autumn forest path
|
||||
• Golden leaves falling
|
||||
• Soft morning light through trees
|
||||
|
||||
✍️ Crafting haiku...
|
||||
|
||||
Forest whispers
|
||||
Golden footsteps on leaves—
|
||||
Silence speaks loud
|
||||
|
||||
📝 Syllable breakdown:
|
||||
"Forest" (2) - whisper (2)
|
||||
s(1) - il(1) -ence (1) - speaks (1) - loud (1)
|
||||
"Golden" (2) - foot (1) -steps (1) - on (1) - leaves (1)
|
||||
(5) - (7) - (5) ✅
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 013: Image Alchemy
|
||||
|
||||
**Date:** 2026-04-19
|
||||
|
||||
## Concept
|
||||
|
||||
Upload two random images. AI "fuses" them into a new concept based on their shared elements.
|
||||
|
||||
## Why Strong
|
||||
|
||||
- Surprising/comedic combinations
|
||||
- Pure visual + semantic analysis
|
||||
- Unique creative output
|
||||
- Viral potential
|
||||
|
||||
## User Flow
|
||||
|
||||
1. Upload image A (or random)
|
||||
2. Upload image B (or random)
|
||||
3. Kimi: Analyzes both separately
|
||||
4. Hermes: Finds connections, creates fusion
|
||||
5. Output: New concept + fused image prompt
|
||||
|
||||
## Step-by-Step
|
||||
|
||||
```
|
||||
🌀 Analyzing Image A: A Viking ship
|
||||
• Norse aesthetic
|
||||
• Ocean voyage
|
||||
• Historical warrior culture
|
||||
|
||||
🌀 Analyzing Image B: A Coffee shop
|
||||
• Cozy atmosphere
|
||||
• Barista craft
|
||||
• Modern social space
|
||||
|
||||
🔮 Alchemizing...
|
||||
|
||||
Found connections:
|
||||
• Craft (warrior's craft → barista's craft)
|
||||
• Ritual (battle ritual → coffee ritual)
|
||||
• Journey (ocean voyage → daily commute)
|
||||
|
||||
⚗️ Alchemy Result:
|
||||
|
||||
"THE VIKING BARISTA"
|
||||
|
||||
A warrior of the morning,
|
||||
steering through storms of exhaustion,
|
||||
claiming the sacred cup.
|
||||
|
||||
Your coffee shop serves mead in horn-shaped mugs,
|
||||
the barista wears a helmet of foam,
|
||||
and every latte is a conquest.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 014: Visual Lie Detector
|
||||
|
||||
**Date:** 2026-04-19
|
||||
|
||||
## Concept
|
||||
|
||||
Upload a photo + claim. AI analyzes if the image supports or contradicts the claim.
|
||||
|
||||
## Why Strong
|
||||
|
||||
- Useful in era of fake news
|
||||
- Pure visual verification
|
||||
- Educational about image analysis
|
||||
- "Is this real?" tool
|
||||
|
||||
## User Flow
|
||||
|
||||
1. Paste claim + upload image
|
||||
2. Kimi: Analyzes image details
|
||||
3. Hermes: Compares claim vs evidence
|
||||
4. Output: Verdict + reasoning
|
||||
|
||||
## Step-by-Step
|
||||
|
||||
```
|
||||
🔍 Analyzing claim: "This photo was taken in Paris"
|
||||
|
||||
🔬 Image Analysis:
|
||||
• Architecture: Haussmannian buildings ✓
|
||||
• Street signs: French ✓
|
||||
• License plates: European format ✓
|
||||
• Language: French on signs ✓
|
||||
• Vegetation: Consistent with Paris climate ✓
|
||||
• Shadows: Consistent with claimed time of day ✓
|
||||
|
||||
✅ VERDICT: LIKELY AUTHENTIC
|
||||
|
||||
Confidence: 94%
|
||||
Supporting evidence: 8/8 elements match
|
||||
Caveats: Metadata not verified
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 015: Object Archaeology
|
||||
|
||||
**Date:** 2026-04-19
|
||||
|
||||
## Concept
|
||||
|
||||
Upload an object close-up. AI identifies it, tells its history/story.
|
||||
|
||||
## Why Strong
|
||||
|
||||
- Educational
|
||||
- Heavy visual (identification + knowledge)
|
||||
- Discovery/antiquities angle
|
||||
- Could work with museum APIs
|
||||
|
||||
## User Flow
|
||||
|
||||
1. Upload object photo
|
||||
2. Kimi: Visual identification + details
|
||||
3. Hermes: Tells object's "story"
|
||||
4. Output: Identity + history narrative
|
||||
|
||||
## Step-by-Step
|
||||
|
||||
```
|
||||
🔍 Scanning object...
|
||||
|
||||
Visual Analysis:
|
||||
• Material: Ceramic
|
||||
• Style: Ming Dynasty blue and white
|
||||
• Pattern: Dragon with cloud motifs
|
||||
• Technique: Underglaze blue
|
||||
|
||||
🏺 Object Identified:
|
||||
Ming Dynasty (1368-1644) Blue and White Porcelain
|
||||
Dragon Pattern Bowl
|
||||
|
||||
📜 The Story:
|
||||
This bowl was crafted during the reign of Emperor Wanli,
|
||||
at the height of Jingdezhen's porcelain production.
|
||||
The dragon motif signifies imperial power and protection...
|
||||
|
||||
[Full historical narrative]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Quick Comparison Matrix
|
||||
|
||||
| # | Name | Visual | Reasoning | Uniqueness | Fun |
|
||||
|---|------|--------|-----------|------------|-----|
|
||||
| 007 | Spot the Difference | Heavy | Light | 9/10 | 8/10 |
|
||||
| 008 | Visual Detective | Heavy | Light | 8/10 | 9/10 |
|
||||
| 009 | Image Tarot | Heavy | Light | 8/10 | 10/10 |
|
||||
| 010 | Color Emotion | Medium | Light | 7/10 | 7/10 |
|
||||
| 011 | Before/After | Heavy | Medium | 8/10 | 8/10 |
|
||||
| 012 | Visual Haiku | Heavy | Light | 9/10 | 8/10 |
|
||||
| 013 | Image Alchemy | Heavy | Light | 10/10 | 10/10 |
|
||||
| 014 | Lie Detector | Heavy | Medium | 9/10 | 8/10 |
|
||||
| 015 | Object Archaeology | Heavy | Medium | 8/10 | 8/10 |
|
||||
|
||||
---
|
||||
|
||||
**My top picks for uniqueness + fun:**
|
||||
1. **013 Image Alchemy** — Most unique, viral potential
|
||||
2. **009 Image Tarot** — Fun, shareable, low friction
|
||||
3. **007 Spot the Difference** — Game + AI demonstration
|
||||
4. **014 Visual Lie Detector** — Useful, educational
|
||||
|
||||
What stands out to you?
|
||||
132
docs/ideas/COMPARISON.md
Normal file
132
docs/ideas/COMPARISON.md
Normal file
@@ -0,0 +1,132 @@
|
||||
# Ideas Comparison Matrix
|
||||
|
||||
**Date:** 2026-04-19
|
||||
**Purpose:** Compare all ideas to select final concept
|
||||
|
||||
---
|
||||
|
||||
## Scoring Criteria
|
||||
|
||||
| Criteria | Weight | Description |
|
||||
|----------|--------|-------------|
|
||||
| **Visual Analysis** | 30% | Heavy Kimi use (aligned with Kimi's strength) |
|
||||
| **Multi-Turn** | 20% | Not single-turn, builds over time |
|
||||
| **Human-AI Interaction** | 20% | Human participates, not passive |
|
||||
| **Cost Efficiency** | 15% | Low API costs (image gen vs analysis) |
|
||||
| **Uniqueness** | 10% | Stand out from competitors |
|
||||
| **Fun/Engagement** | 5% | Enjoyable to play/watch |
|
||||
|
||||
**Scoring:** 1-5 (5 = best)
|
||||
|
||||
---
|
||||
|
||||
## Full Comparison Matrix
|
||||
|
||||
| # | Idea | Visual | Multi-Turn | Human-AI | Cost | Unique | Fun | **Total** |
|
||||
|---|------|--------|------------|----------|------|--------|-----|-----------|
|
||||
| 001 | Visual Narrative Agent | 4 | 4 | 3 | 2 | 3 | 4 | **3.5** |
|
||||
| 002 | Visual Memory Journal | 3 | 3 | 2 | 3 | 4 | 3 | **3.0** |
|
||||
| 003 | Design Critic | 3 | 2 | 2 | 3 | 2 | 3 | **2.6** |
|
||||
| 004 | Visual Poem | 4 | 2 | 2 | 3 | 4 | 4 | **3.2** |
|
||||
| 005 | Scene Journey | 4 | 3 | 2 | 2 | 3 | 4 | **3.2** |
|
||||
| 007 | Spot the Difference | 4 | 2 | 3 | 2 | 4 | 5 | **3.4** |
|
||||
| 008 | Visual Detective | 4 | 3 | 2 | 3 | 4 | 4 | **3.5** |
|
||||
| 009 | Image Tarot | 4 | 2 | 3 | 3 | 4 | 5 | **3.5** |
|
||||
| 013 | Image Alchemy | 4 | 2 | 3 | 2 | 5 | 5 | **3.6** |
|
||||
| 014 | Lie Detector | 4 | 2 | 3 | 3 | 4 | 4 | **3.4** |
|
||||
| 032v2 | Art Critic | 5 | 3 | 3 | 3 | 3 | 4 | **3.7** |
|
||||
| **033v2** | **Detective** | **5** | **5** | **5** | **4** | **4** | **5** | **4.7** |
|
||||
| 035 | Guess Artist | 5 | 2 | 3 | 3 | 3 | 4 | **3.5** |
|
||||
| Auction | Auction | 3 | 4 | 5 | 4 | 4 | 4 | **3.9** |
|
||||
|
||||
---
|
||||
|
||||
## Top Contenders
|
||||
|
||||
| Rank | Idea | Score | Key Strengths |
|
||||
|------|------|-------|---------------|
|
||||
| 🥇 | **033v2 Detective** | **4.7** | Best multi-turn, human directs, Kimi does real work |
|
||||
| 🥈 | Auction | 3.9 | Human describes, human engages, cheap |
|
||||
| 🥉 | 032v2 Art Critic | 3.7 | Kimi visual analysis, multi-turn |
|
||||
| 4 | 013 Image Alchemy | 3.6 | Most unique, viral potential |
|
||||
| 5 | 009 Image Tarot | 3.5 | Fun, shareable |
|
||||
|
||||
---
|
||||
|
||||
## 033v2 Detective — Why It Wins
|
||||
|
||||
### Alignment with User Goals
|
||||
|
||||
| User Goal | How Detective Meets It |
|
||||
|-----------|----------------------|
|
||||
| Heavy visual analysis | Kimi analyzes each piece of evidence |
|
||||
| Low reasoning | Pattern matching, not complex logic |
|
||||
| Multi-turn | 5-7 rounds per case |
|
||||
| Human-AI collaboration | Human (Chief) directs the investigation |
|
||||
| Cost efficient | Mostly text between Kimi calls |
|
||||
| Fun/engagement | Mystery + competition |
|
||||
|
||||
### What Makes It Special
|
||||
|
||||
1. **Natural two-agent roles:** Witness (sees) + Detective (thinks)
|
||||
2. **Human as boss:** Chief directs investigation, not passive observer
|
||||
3. **Multi-turn structure:** Each round builds the case
|
||||
4. **Kimi's strength shines:** Visual evidence analysis is the core mechanic
|
||||
5. **Scoring system:** Track cases solved, rounds taken, accuracy
|
||||
|
||||
### Comparison to Other Games
|
||||
|
||||
| Aspect | Spot the Difference | Tarot | Alchemy | **Detective** |
|
||||
|--------|-------------------|-------|---------|---------------|
|
||||
| Visual Analysis | 4 | 4 | 4 | **5** |
|
||||
| Multi-Turn | 2 | 2 | 2 | **5** |
|
||||
| Human Role | Judge | Receive | Submit | **Direct** |
|
||||
| Narrative | None | Story | Surprise | **Full Mystery** |
|
||||
| Replayability | Medium | Low | Medium | **High** |
|
||||
|
||||
---
|
||||
|
||||
## Recommendation
|
||||
|
||||
**Go with 033v2 Detective.**
|
||||
|
||||
### Why Not Others
|
||||
|
||||
| Idea | Why Not |
|
||||
|------|---------|
|
||||
| 001 Visual Narrative | Too similar to others, high cost |
|
||||
| 007 Spot Difference | Fun but shallow (1-turn) |
|
||||
| 009 Image Tarot | Not really interactive |
|
||||
| 013 Image Alchemy | Unique but single interaction |
|
||||
| Auction | Good but less "AI demonstration" |
|
||||
|
||||
### Detective's Edge
|
||||
|
||||
- **Multi-turn** = not just a quick demo
|
||||
- **Human directs** = active participation
|
||||
- **Kimi sees evidence** = clear AI capability showcase
|
||||
- **Cost efficient** = mostly text
|
||||
- **Daily cases** = reason to return
|
||||
|
||||
---
|
||||
|
||||
## Next Steps for 033v2 Detective
|
||||
|
||||
- [ ] Define case structure (5-7 evidence images)
|
||||
- [ ] Design Chief interface (what buttons/actions)
|
||||
- [ ] Plan Witness + Detective prompts
|
||||
- [ ] Mock up UI
|
||||
- [ ] Prototype with one case
|
||||
|
||||
---
|
||||
|
||||
## Appendix: Ideas That Could Combine with Detective
|
||||
|
||||
### Detective + Art Critic
|
||||
Two types of daily content: Mystery case OR Art analysis
|
||||
|
||||
### Detective + Auction
|
||||
Hybrid mode: Evidence auction where Chief describes to Detective
|
||||
|
||||
### Detective + Spot Difference
|
||||
Mini-game within case: "Find the clue hidden in this photo"
|
||||
47
docs/research-hermes-agent.md
Normal file
47
docs/research-hermes-agent.md
Normal file
@@ -0,0 +1,47 @@
|
||||
# Research: Hermes Agent Capabilities
|
||||
|
||||
**Date:** 2026-04-19
|
||||
**Purpose:** Understand Hermes Agent framework for hackathon integration
|
||||
|
||||
## Hermes 3 (Nous Research)
|
||||
|
||||
### Core Capabilities
|
||||
- **Advanced agentic capabilities**
|
||||
- **Reliable function calling** - Trained specifically for tool use
|
||||
- **Structured output** - JSON mode / Pydantic schemas
|
||||
- **ChatML prompt format** - OpenAI-compatible
|
||||
- Multi-turn conversation
|
||||
- Long context coherence
|
||||
|
||||
### Benchmark Performance
|
||||
| Benchmark | Hermes 3 Score |
|
||||
|-----------|---------------|
|
||||
| IFEval (0-shot) | 61.70% |
|
||||
| MMLU-Redux | 92.7% |
|
||||
| MMLU-Pro | 81.1% |
|
||||
| SimpleQA | 31.0% |
|
||||
|
||||
### Function Calling
|
||||
- Trained on specific prompts for tool use
|
||||
- XML-based tool call format: `<tool_call>{"name": "...", "arguments": {...}}</tool_call>`
|
||||
- Supports recursive/chain tool calls
|
||||
- Native tool integration via NousResearch/Hermes-Function-Calling repo
|
||||
|
||||
## Hermes Agent Framework
|
||||
|
||||
### Key Components
|
||||
1. **ChatML format** - Structured system/user/assistant turns
|
||||
2. **Tool definitions** - JSON schema for function signatures
|
||||
3. **Tool parsing** - Parse and execute function calls
|
||||
4. **Response loop** - Multi-turn agentic execution
|
||||
|
||||
### Integration Points
|
||||
- HuggingFace Transformers
|
||||
- vLLM inference
|
||||
- Ollama local deployment
|
||||
- OpenAI-compatible API
|
||||
|
||||
## Sources
|
||||
- https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B
|
||||
- https://github.com/NousResearch/Hermes-Function-Calling
|
||||
- https://arxiv.org/abs/2408.11857 (Hermes 3 Technical Report)
|
||||
72
docs/research-image-generation-apis.md
Normal file
72
docs/research-image-generation-apis.md
Normal file
@@ -0,0 +1,72 @@
|
||||
# Research: Image Generation APIs
|
||||
|
||||
**Date:** 2026-04-19
|
||||
**Purpose:** Find affordable/free image generation for hackathon project
|
||||
|
||||
## Pollinations AI (Recommended ✅)
|
||||
|
||||
**Why:** Free tier, OpenAI-compatible, multiple models, simple API
|
||||
|
||||
### Quick Start
|
||||
```bash
|
||||
# No auth needed for basic
|
||||
curl "https://gen.pollinations.ai/image/a%20cat%20in%20space"
|
||||
|
||||
# With auth
|
||||
curl -H "Authorization: Bearer YOUR_KEY" ...
|
||||
```
|
||||
|
||||
### Models Available
|
||||
| Model | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| `flux` | Default | Good quality |
|
||||
| `zimage` | Default | Alternative |
|
||||
| `wan-image` | Quality | Higher quality option |
|
||||
| `qwen-image` | Quality | Alibaba model |
|
||||
| `gptimage` | Quality | GPT-based |
|
||||
| `seedream5` | Style | Special styles |
|
||||
| `kontext` | Edit | Image editing |
|
||||
|
||||
### Pricing
|
||||
- **Free tier:** Weekly pollen credits (tier-based)
|
||||
- **Paid:** $1 ≈ 1 Pollen
|
||||
- **Free API:** Limited but usable
|
||||
- **Rate limits:** Anonymous = limited, Seed/Flower = more
|
||||
|
||||
### API Details
|
||||
- **Base URL:** `https://gen.pollinations.ai`
|
||||
- **Image endpoint:** `GET /image/{prompt}`
|
||||
- **OpenAI-compatible:** `POST /v1/images/generations`
|
||||
- **No setup:** Just curl it
|
||||
|
||||
### Strengths
|
||||
- ✅ 100% Open Source
|
||||
- ✅ Free tier available
|
||||
- ✅ Multiple model options
|
||||
- ✅ Simple API (no complex setup)
|
||||
- ✅ OpenAI-compatible SDK
|
||||
|
||||
### Weaknesses
|
||||
- ⚠️ Quality may not match DALL-E/Midjourney
|
||||
- ⚠️ Free tier has rate limits
|
||||
- ⚠️ Infrastructure may vary in reliability
|
||||
|
||||
## Other Options Considered
|
||||
|
||||
| Provider | Free Tier | Quality | Notes |
|
||||
|----------|-----------|---------|-------|
|
||||
| **Midjourney** | ❌ No | High | Expensive |
|
||||
| **Stable Diffusion** | Local only | High | Needs GPU |
|
||||
| **DALL-E 3** | ❌ No | High | OpenAI pricing |
|
||||
| **Ideogram** | Limited | Good | API in beta |
|
||||
| **Flux (Local)** | ✅ Free | High | Self-hosted, needs GPU |
|
||||
|
||||
## Recommendation
|
||||
|
||||
**Primary:** Pollinations AI (free tier + simplicity)
|
||||
**Fallback:** Flux if we have GPU resources
|
||||
|
||||
## Sources
|
||||
- https://gen.pollinations.ai
|
||||
- https://docs.pollinations.ai/
|
||||
- https://github.com/pollinations/pollinations
|
||||
51
docs/research-kimi-visual-capabilities.md
Normal file
51
docs/research-kimi-visual-capabilities.md
Normal file
@@ -0,0 +1,51 @@
|
||||
# Research: Kimi Visual Capabilities
|
||||
|
||||
**Date:** 2026-04-19
|
||||
**Purpose:** Validate Kimi's visual strengths for hackathon project
|
||||
|
||||
## Kimi K2.5 - Multimodal Model
|
||||
|
||||
### Core Capabilities
|
||||
- **Text + Images + Video** input support
|
||||
- 256K context length
|
||||
- Thinking/non-thinking modes
|
||||
- Agent task support
|
||||
|
||||
### Visual API Models
|
||||
- `moonshot-v1-8k-vision-preview`
|
||||
- `moonshot-v1-32k-vision-preview`
|
||||
- `moonshot-v1-128k-vision-preview`
|
||||
- `kimi-k2.5` (latest, supports video)
|
||||
|
||||
### Supported Formats
|
||||
**Images:** png, jpeg, webp, gif
|
||||
**Video:** mp4, mpeg, mov, avi, x-flv, mpg, webm, wmv, 3gpp
|
||||
|
||||
### Unique Visual Features
|
||||
1. **Visual Coding** - Kimi Code, Kimi Claw for coding with visual context
|
||||
2. **Video Understanding** - Analyzes video content (unique for multimodal models)
|
||||
3. **Real-time Visual Chat** - Interactive visual conversation
|
||||
|
||||
## Kimi K2 Benchmarks (Coding/Agent)
|
||||
|
||||
| Benchmark | Kimi K2 Score | Notes |
|
||||
|-----------|---------------|-------|
|
||||
| SWE-bench Verified (Single Attempt) | **65.8%** | Global SOTA for open-source |
|
||||
| SWE-bench Multilingual | 47.3% | Outperforms most proprietary |
|
||||
| LiveCodeBench v6 | 53.7% | Strong coding |
|
||||
| TerminalBench | 30.0% | Agentic tool use |
|
||||
| Aider-Polyglot | 60.0% | Code editing |
|
||||
| Tau2-Bench (avg) | ~64% | Tool use tasks |
|
||||
|
||||
## Kimi Visual Strengths Summary
|
||||
|
||||
✅ **Video understanding** (unique advantage)
|
||||
✅ **Visual coding** capabilities
|
||||
✅ **Image + Text multimodal**
|
||||
✅ **Strong agentic tool use**
|
||||
✅ **256K context** for large visual inputs
|
||||
|
||||
## Sources
|
||||
- https://platform.moonshot.cn/docs/guide/kimi-k2-5-quickstart
|
||||
- https://moonshotai.github.io/Kimi-K2/
|
||||
- https://platform.moonshot.cn/docs/guide/use-kimi-vision-model
|
||||
Reference in New Issue
Block a user