# Idea 001: Visual Narrative Agent

**Date:** 2026-04-19
**Status:** Idea
**Tags:** hermes-agent, kimi-vision, storytelling, image-generation

## Concept

An agentic storytelling system in which Hermes orchestrates a narrative loop, combining its built-in image generation skills with Kimi's visual analysis to produce coherent visual stories.

## User Flow

1. User provides text prompt (e.g., "A lone astronaut discovers an ancient alien garden on Mars")
2. Hermes plans story structure (scenes, pacing, visual style)
3. For each scene:
   - Hermes generates image prompt
   - Generate image (Hermes built-in skill: manim / ascii)
   - Kimi analyzes generated image
   - Kimi's feedback refines next scene's prompt
4. Return compiled visual story to user

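The per-scene loop in the user flow could be sketched roughly as follows. This is a minimal illustration, not the actual Hermes or Kimi integration: `Scene`, `run_story_loop`, and the three injected callables (`make_prompt`, `generate_image`, `analyze_image`) are hypothetical names standing in for whatever Hermes, the image generator, and Kimi end up exposing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    """One finished scene: the prompt used, the image produced, and the critique."""
    prompt: str
    image: bytes
    feedback: str


def run_story_loop(
    scene_outlines: List[str],
    make_prompt: Callable[[str, str], str],      # (outline, previous feedback) -> image prompt
    generate_image: Callable[[str], bytes],      # image prompt -> image bytes
    analyze_image: Callable[[bytes, str], str],  # (image, outline) -> feedback text
) -> List[Scene]:
    """Generate -> Analyze -> Refine: each scene's feedback feeds the next prompt."""
    scenes: List[Scene] = []
    feedback = ""  # no feedback exists before the first scene
    for outline in scene_outlines:
        prompt = make_prompt(outline, feedback)   # hypothetical Hermes step
        image = generate_image(prompt)            # hypothetical generator step
        feedback = analyze_image(image, outline)  # hypothetical Kimi step
        scenes.append(Scene(prompt=prompt, image=image, feedback=feedback))
    return scenes
```

Passing the three steps in as callables keeps the loop testable with stubs before any gateway or image API is wired up.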
## Key Differentiator

Most story-to-image tools: **Generate → Done**

This concept: **Generate → Analyze → Refine → Loop**

Kimi serves as the **visual reasoning engine**: it tells Hermes whether the generated image matches the intended scene, catches inconsistencies, and informs prompt refinement for the next scene.

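One way Kimi's verdict might be represented and folded into the next prompt is sketched below. The `SceneFeedback` shape and the `refine_prompt` helper are assumptions made for illustration; the note does not define what Kimi actually returns, and a real refinement step would likely be done by Hermes rather than by string concatenation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneFeedback:
    """Hypothetical structured form of Kimi's analysis of one image."""
    matches_intent: bool
    inconsistencies: List[str] = field(default_factory=list)


def refine_prompt(next_prompt: str, feedback: SceneFeedback) -> str:
    """Append consistency notes to the next scene's prompt when the last image was off."""
    if feedback.matches_intent and not feedback.inconsistencies:
        return next_prompt  # nothing to correct, keep the planned prompt
    notes = "; ".join(feedback.inconsistencies) or "previous image drifted from the intended scene"
    return f"{next_prompt}. Keep consistent with earlier scenes: {notes}"
```

For example, feedback listing "helmet colour changed" would carry that note into the next scene's prompt, while a clean match leaves the prompt untouched.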
## Tech Stack

| Component | Source | Role |
|-----------|--------|------|
| Hermes Agent | Nous Research | Orchestration, planning, decision loop |
| Kimi Vision | Moonshot AI (via gateway) | Image analysis, visual feedback |
| Image Generation | Pollinations AI | Scene image generation (free tier, multiple models such as Flux) |

### Image Generation Options

| Provider | Free Tier | Quality | Use Case |
|----------|-----------|---------|----------|
| **Pollinations** ✅ | ✅ Yes | Good | Primary (simple, free) |
| **Flux (local)** | ✅ Free | High | If GPU available |
| **Hermes skills** | ✅ Free | Niche | Fallback/ASCII aesthetic |

### Pollinations API (Primary)

- **Endpoint:** `https://gen.pollinations.ai/image/{prompt}`
- **Models:** flux, zimage, wan-image, qwen-image, etc.
- **Cost:** Free tier (Pollen credits); about $1 per Pollen when paid
- **Auth:** Optional for free tier

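A small sketch of how a request URL for the endpoint above might be built. The base URL comes from this note; the `model` query parameter and the `build_image_url` helper are assumptions and should be checked against the Pollinations documentation before use. The code only constructs the URL and makes no network call.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Base URL taken from the endpoint listed in this note.
BASE_URL = "https://gen.pollinations.ai/image/"


def build_image_url(prompt: str, model: Optional[str] = None) -> str:
    """Percent-encode the prompt into the URL path; optionally add a model choice."""
    url = BASE_URL + quote(prompt, safe="")  # safe="" also escapes "/" inside the prompt
    if model:
        # Assumption: the model is selected via a `model` query parameter.
        url += "?" + urlencode({"model": model})
    return url


if __name__ == "__main__":
    print(build_image_url("A lone astronaut discovers an ancient alien garden on Mars", model="flux"))
```

Encoding the prompt with `safe=""` matters because story prompts can contain slashes or question marks that would otherwise be read as part of the URL structure.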
## Strengths

- ✅ Combines Hermes + Kimi + Pollinations natively
- ✅ Agentic visual feedback loop is unique
- ✅ Visual coherence check via Kimi ensures quality
- ✅ Free tier = low barrier to test
- ✅ User controls output format (default: image)

## Weaknesses

- ⚠️ Pollinations quality may lag DALL-E/Midjourney (needs testing)
- ⚠️ Kimi requires gateway access (no direct API key)
- ⚠️ Loop adds latency (generate → analyze → refine)
- ⚠️ Pollinations reliability still needs to be verified

## Uniqueness Score

**7/10**: The agentic visual feedback loop is novel, but it still needs to be verified whether built-in image generation is compelling enough.

## Next Steps

- [ ] Explore Hermes built-in image skills (manim, ascii)
- [ ] Define output format options
- [ ] Sketch technical architecture

## Related Ideas

- See: `002-xxx.md`, `003-xxx.md` for alternatives