feat: Initial commit - Hermes Detective Agency concept
- Hermes Detective Agency: Open-ended mystery investigation game
- Roles: Chief (human), Witness (Kimi), Detective (Hermes)
- 5 difficulty levels, community cases, open-ended solving
- Scoring: Alignment %, Evidence %, Time
- Features: Retry, Journal, Observe mode
- Tech: Kimi Vision + Hermes Agent + Pollinations

Changelog:
- Research phase: Kimi capabilities, Hermes agent, image APIs
- Brainstorming: 14 ideas explored
- Comparison matrix: Detective selected as winner
- Concept finalized with all design decisions
docs/ideas/001-visual-narrative-agent.md
@@ -0,0 +1,79 @@
# Idea 001: Visual Narrative Agent

**Date:** 2026-04-19
**Status:** Idea
**Tags:** hermes-agent, kimi-vision, storytelling, image-generation

## Concept

An agentic storytelling system in which Hermes orchestrates a narrative loop, combining Kimi's visual analysis with Hermes's built-in image generation skills to produce coherent visual stories.

## User Flow

1. User provides a text prompt (e.g., "A lone astronaut discovers an ancient alien garden on Mars")
2. Hermes plans the story structure (scenes, pacing, visual style)
3. For each scene:
   - Hermes generates an image prompt
   - The image is generated (Hermes built-in skill: manim / ascii)
   - Kimi analyzes the generated image
   - Kimi's feedback refines the next scene's prompt
4. Return the compiled visual story to the user
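
The flow above can be sketched as a small loop. This is only an illustration under stated assumptions: `Scene`, `build_story`, and the four callables (`plan`, `write_prompt`, `generate`, `analyze`) are hypothetical names standing in for Hermes planning, prompt writing, image generation, and Kimi analysis; none of them are real Hermes or Kimi APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    description: str  # what the scene should depict
    prompt: str       # image prompt actually sent to the generator
    image_ref: str    # URL or path of the generated image
    feedback: str     # analyzer's notes on the generated image


def build_story(
    user_prompt: str,
    plan: Callable[[str], List[str]],         # hypothetical: Hermes story planning
    write_prompt: Callable[[str, str], str],  # (description, prior feedback) -> prompt
    generate: Callable[[str], str],           # prompt -> image reference
    analyze: Callable[[str, str], str],       # (image ref, description) -> feedback
) -> List[Scene]:
    """Run one generate -> analyze -> refine pass per planned scene."""
    scenes: List[Scene] = []
    feedback = ""  # no feedback exists before the first scene
    for description in plan(user_prompt):
        # Feedback from the previous scene shapes this scene's prompt.
        prompt = write_prompt(description, feedback)
        image_ref = generate(prompt)
        feedback = analyze(image_ref, description)
        scenes.append(Scene(description, prompt, image_ref, feedback))
    return scenes
```

Passing the four steps in as callables keeps the loop testable with stubs before any gateway or image API is wired up.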

## Key Differentiator

Most story-to-image tools: **Generate → Done**

This concept: **Generate → Analyze → Refine → Loop**

Kimi serves as the **visual reasoning engine**: it tells Hermes whether the generated image matches the intended scene, catches inconsistencies, and informs prompt refinement for the next scene.
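
One way to picture the refine step is as structured feedback folded into the next prompt. The sketch below is an assumption, not a description of Kimi's actual output: `SceneFeedback`, its fields, and `refine_prompt` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneFeedback:
    # Hypothetical shape for the analyzer's verdict on one image.
    matches_intent: bool
    inconsistencies: List[str] = field(default_factory=list)


def refine_prompt(next_prompt: str, feedback: SceneFeedback) -> str:
    """Append corrections from the previous scene to the next scene's prompt."""
    notes = list(feedback.inconsistencies)
    if not feedback.matches_intent:
        notes.insert(0, "previous image drifted from the intended scene")
    if not notes:
        return next_prompt  # nothing to correct, leave the prompt untouched
    return f"{next_prompt}. Corrections from the previous scene: " + "; ".join(notes)
```

For example, `refine_prompt("A garden at dusk", SceneFeedback(True, ["suit changed from white to orange"]))` carries the suit-color correction into the next prompt, while clean feedback returns the prompt unchanged.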

## Tech Stack

| Component | Source | Role |
|-----------|--------|------|
| Hermes Agent | Nous Research | Orchestration, planning, decision loop |
| Kimi Vision | Moonshot AI (via gateway) | Image analysis, visual feedback |
| Image Generation | Pollinations AI | Scene image generation; free tier, multiple models (Flux, etc.) |

### Image Generation Options

| Provider | Free Tier | Quality | Use Case |
|---------|-----------|---------|----------|
| **Pollinations** ✅ | ✅ Yes | Good | Primary (simple, free) |
| **Flux (local)** | ✅ Free | High | If GPU available |
| **Hermes skills** | ✅ Free | Niche | Fallback/ASCII aesthetic |
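
The table suggests a simple preference order among providers. The sketch below encodes one possible reading of it; the function name, the provider labels, and the exact ordering (Pollinations first, local Flux only when a GPU is present, Hermes skills last) are assumptions made for illustration.

```python
def choose_provider(pollinations_ok: bool, gpu_available: bool) -> str:
    """Pick an image provider following the table's primary/fallback roles."""
    if pollinations_ok:
        return "pollinations"   # primary: simple and free
    if gpu_available:
        return "flux-local"     # higher quality, but needs local GPU
    return "hermes-skills"      # fallback: manim / ASCII aesthetic
```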

### Pollinations API (Primary)
- **Endpoint:** `https://gen.pollinations.ai/image/{prompt}`
- **Models:** flux, zimage, wan-image, qwen-image, etc.
- **Cost:** Free tier (Pollen credits); paid usage is about $1 per Pollen
- **Auth:** Optional for free tier
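
A request URL for the endpoint above could be built as follows. This is a minimal sketch: the endpoint path comes from the notes, but passing the model as a `model` query parameter is an assumption that should be checked against the Pollinations documentation, and `image_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://gen.pollinations.ai/image/"


def image_url(prompt: str, model: str = "flux") -> str:
    """Build an image URL with the prompt percent-encoded into the path."""
    # safe="" also encodes "/" so the prompt cannot add extra path segments.
    encoded_prompt = quote(prompt, safe="")
    # Assumption: the model is selected through a `model` query parameter.
    return f"{BASE_URL}{encoded_prompt}?{urlencode({'model': model})}"
```

Only the URL is constructed here; no request is sent, so the helper can be tested offline before any quota is spent.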

## Strengths

- ✅ Combines Hermes + Kimi + Pollinations natively
- ✅ Agentic visual feedback loop is unique
- ✅ Visual coherence check via Kimi ensures quality
- ✅ Free tier = low barrier to test
- ✅ User controls output format (default: image)

## Weaknesses

- ⚠️ Pollinations image quality may lag DALL-E/Midjourney (needs testing)
- ⚠️ Kimi requires gateway access (no direct API key)
- ⚠️ Loop adds latency (generate → analyze → refine)
- ⚠️ Pollinations reliability is still unverified

## Uniqueness Score

**7/10**: The agentic visual feedback loop is novel, but it remains to be verified whether built-in image generation is compelling enough.

## Next Steps

- [ ] Explore Hermes built-in image skills (manim, ascii)
- [ ] Define output format options
- [ ] Sketch technical architecture

## Related Ideas

- See: `002-xxx.md`, `003-xxx.md` for alternatives