- Hermes Detective Agency: Open-ended mystery investigation game
- Roles: Chief (human), Witness (Kimi), Detective (Hermes)
- 5 difficulty levels, community cases, open-ended solving
- Scoring: Alignment %, Evidence %, Time
- Features: Retry, Journal, Observe mode
- Tech: Kimi Vision + Hermes Agent + Pollinations

Changelog:
- Research phase: Kimi capabilities, Hermes agent, image APIs
- Brainstorming: 14 ideas explored
- Comparison matrix: Detective selected as winner
- Concept finalized with all design decisions
Idea 001: Visual Narrative Agent
Date: 2026-04-19
Status: Idea
Tags: hermes-agent, kimi-vision, storytelling, image-generation
Concept
An agentic storytelling system in which Hermes orchestrates a narrative loop, combining Kimi's visual analysis with Hermes's built-in image generation skills, to produce coherent visual stories.
User Flow
- User provides text prompt (e.g., "A lone astronaut discovers an ancient alien garden on Mars")
- Hermes plans story structure (scenes, pacing, visual style)
- For each scene:
  - Hermes generates image prompt
  - Generate image (Hermes built-in skill: manim / ascii)
  - Kimi analyzes generated image
  - Kimi's feedback refines next scene's prompt
- Return compiled visual story to user
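The flow above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the project's implementation: `plan_scenes`, `generate_image`, and `analyze_image` are hypothetical callables standing in for Hermes planning, the image generator, and Kimi's analysis, and the way feedback is folded into the next prompt is purely illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    description: str        # what the scene should depict
    image_prompt: str = ""  # prompt actually sent to the image generator
    image_ref: str = ""     # URL or path of the generated image
    feedback: str = ""      # analyzer's notes on the generated image


def run_story_loop(
    user_prompt: str,
    plan_scenes: Callable[[str], List[str]],   # hypothetical: Hermes planning step
    generate_image: Callable[[str], str],      # hypothetical: image generation step
    analyze_image: Callable[[str, str], str],  # hypothetical: Kimi analysis step
) -> List[Scene]:
    """Generate -> analyze -> refine, once per planned scene."""
    scenes: List[Scene] = []
    carried_feedback = ""
    for description in plan_scenes(user_prompt):
        prompt = description
        if carried_feedback:
            # Feedback on the previous image shapes the next scene's prompt.
            prompt = f"{description}. Keep consistent with: {carried_feedback}"
        image_ref = generate_image(prompt)
        carried_feedback = analyze_image(image_ref, description)
        scenes.append(Scene(description, prompt, image_ref, carried_feedback))
    return scenes
```

Passing the three steps in as callables keeps the loop testable with stubs before any real model or image API is wired in.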
Key Differentiator
Most story-to-image tools: Generate → Done
This concept: Generate → Analyze → Refine → Loop
Kimi serves as the visual reasoning engine — tells Hermes if the generated image matches the intended scene, catches inconsistencies, and informs prompt refinement for the next scene.
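One way to make that feedback actionable is to give it a fixed shape. The sketch below is an assumption for illustration only: the `SceneFeedback` fields and the `refine_prompt` helper are hypothetical names, not part of Hermes or Kimi, and the "Style:" / "Avoid:" phrasing is just one possible prompt convention.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneFeedback:
    # Hypothetical structure for what the visual analyzer reports back.
    matches_intent: bool                      # did the image depict the intended scene?
    inconsistencies: List[str] = field(default_factory=list)
    style_notes: str = ""                     # visual traits to carry forward


def refine_prompt(next_scene: str, feedback: SceneFeedback) -> str:
    """Fold the analyzer's notes into the next scene's image prompt."""
    parts = [next_scene]
    if feedback.style_notes:
        parts.append(f"Style: {feedback.style_notes}")
    if feedback.inconsistencies:
        parts.append("Avoid: " + "; ".join(feedback.inconsistencies))
    return ". ".join(parts)
```

For example, `refine_prompt("The astronaut enters the garden", SceneFeedback(False, ["helmet missing"], "muted red palette"))` yields a prompt that repeats the style and names the inconsistency to avoid.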
Tech Stack
| Component | Source | Role |
|---|---|---|
| Hermes Agent | Nous Research | Orchestration, planning, decision loop |
| Kimi Vision | Moonshot AI (via gateway) | Image analysis, visual feedback |
| Image Generation | Pollinations AI | Free tier, multiple models (Flux, etc.) |
Image Generation Options
| Provider | Free Tier | Quality | Use Case |
|---|---|---|---|
| Pollinations ✅ | ✅ Yes | Good | Primary (simple, free) |
| Flux (local) | ✅ Free | High | If GPU available |
| Hermes skills | ✅ Free | Niche | Fallback/ASCII aesthetic |
Pollinations API (Primary)
- Endpoint: https://gen.pollinations.ai/image/{prompt}
- Models: flux, zimage, wan-image, qwen-image, etc.
- Cost: Free tier (pollen credits); paid usage is roughly $1 per Pollen
- Auth: Optional for free tier
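A minimal sketch of building a request URL for the endpoint listed above. The base URL comes from this note; selecting a model via a `model` query parameter is an assumption that should be checked against the Pollinations documentation before use. The helper only constructs the URL and makes no network call.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://gen.pollinations.ai/image/"  # endpoint as listed in this note


def build_image_url(prompt: str, model: Optional[str] = None) -> str:
    """Return the image URL for a prompt, percent-encoding the prompt text."""
    url = BASE_URL + quote(prompt, safe="")
    if model:
        # ASSUMPTION: the model is chosen with a `model` query parameter.
        url += "?" + urlencode({"model": model})
    return url
```

Percent-encoding the prompt matters because story prompts contain spaces, commas, and quotes that are not valid in a URL path segment.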
Strengths
- ✅ Combines Hermes + Kimi + Pollinations natively
- ✅ Agentic visual feedback loop is unique
- ✅ Visual coherence check via Kimi ensures quality
- ✅ Free tier = low barrier to test
- ✅ User controls output format (default: image)
Weaknesses
- ⚠️ Pollinations quality relative to DALL-E/Midjourney is unverified (needs testing)
- ⚠️ Kimi requires gateway access (no direct API key)
- ⚠️ Loop adds latency (generate → analyze → refine)
- ⚠️ Need to verify Pollinations reliability
Uniqueness Score
7/10: The agentic visual feedback loop is novel, but whether built-in image generation is compelling enough still needs to be verified.
Next Steps
- Explore Hermes built-in image skills (manim, ascii)
- Define output format options
- Sketch technical architecture
Related Ideas
- See: 002-xxx.md, 003-xxx.md for alternatives