hermes-detective/docs/ideas/001-visual-narrative-agent.md
shoko ecfd0b1160 feat: Initial commit - Hermes Detective Agency concept
- Hermes Detective Agency: Open-ended mystery investigation game
- Roles: Chief (human), Witness (Kimi), Detective (Hermes)
- 5 difficulty levels, community cases, open-ended solving
- Scoring: Alignment %, Evidence %, Time
- Features: Retry, Journal, Observe mode
- Tech: Kimi Vision + Hermes Agent + Pollinations

Changelog:
- Research phase: Kimi capabilities, Hermes agent, image APIs
- Brainstorming: 14 ideas explored
- Comparison matrix: Detective selected as winner
- Concept finalized with all design decisions
2026-04-20 00:00:30 +00:00


Idea 001: Visual Narrative Agent

Date: 2026-04-19
Status: Idea
Tags: hermes-agent, kimi-vision, storytelling, image-generation

Concept

An agentic storytelling system in which Hermes orchestrates a narrative loop, pairing image generation (Pollinations by default, Hermes built-in skills as a fallback) with Kimi's visual analysis to produce coherent visual stories.

User Flow

  1. User provides a text prompt (e.g., "A lone astronaut discovers an ancient alien garden on Mars")
  2. Hermes plans the story structure (scenes, pacing, visual style)
  3. For each scene:
    • Hermes writes the image prompt
    • Generate the image (Pollinations by default; Hermes built-in skills such as manim / ascii as a fallback)
    • Kimi analyzes the generated image
    • Kimi's feedback refines the next scene's prompt
  4. Return the compiled visual story to the user
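The per-scene loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `run_story_loop`, `Scene`, and the four callables are hypothetical names, each callable standing in for a Hermes, Pollinations, or Kimi call that has not been defined yet. The usage at the bottom wires in stub lambdas so the control flow runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    """One planned scene plus the artifacts produced for it (hypothetical structure)."""
    description: str
    image_prompt: str = ""
    image_ref: str = ""  # URL or file path returned by the image generator
    feedback: str = ""   # Kimi's critique of the generated image


def run_story_loop(
    premise: str,
    plan_scenes: Callable[[str], List[str]],   # Hermes: premise -> scene descriptions
    write_prompt: Callable[[str, str], str],   # Hermes: (scene, previous feedback) -> image prompt
    generate_image: Callable[[str], str],      # e.g. Pollinations: prompt -> image reference
    analyze_image: Callable[[str, str], str],  # Kimi: (image reference, scene) -> feedback
) -> List[Scene]:
    scenes: List[Scene] = []
    feedback = ""  # nothing to refine against before the first scene
    for description in plan_scenes(premise):
        scene = Scene(description)
        scene.image_prompt = write_prompt(description, feedback)
        scene.image_ref = generate_image(scene.image_prompt)
        scene.feedback = analyze_image(scene.image_ref, description)
        feedback = scene.feedback  # carried forward into the next scene's prompt
        scenes.append(scene)
    return scenes  # the compiled visual story


# Usage with stub callables in place of real Hermes / Pollinations / Kimi calls.
story = run_story_loop(
    "A lone astronaut discovers an ancient alien garden on Mars",
    plan_scenes=lambda premise: ["astronaut lands", "astronaut finds garden"],
    write_prompt=lambda scene, fb: f"{scene} | notes: {fb}" if fb else scene,
    generate_image=lambda prompt: f"img://{len(prompt)}",
    analyze_image=lambda image, scene: f"ok: {scene}",
)
for s in story:
    print(s.image_prompt)
```

Passing the services in as callables keeps the loop testable offline and makes it easy to swap Pollinations for another provider later.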

Key Differentiator

Most story-to-image tools: Generate → Done

This concept: Generate → Analyze → Refine → Loop

Kimi serves as the visual reasoning engine: it tells Hermes whether the generated image matches the intended scene, catches inconsistencies, and informs prompt refinement for the next scene.
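One way Kimi's feedback could feed the next prompt, assuming Kimi is asked to return a structured verdict rather than free text. `VisualFeedback` and `refine_prompt` are hypothetical names, and the fields (`matches_scene`, `issues`, `continuity`) are assumptions; Kimi's actual response format is not specified in this idea.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VisualFeedback:
    """Hypothetical shape of Kimi's critique; the real response format is still undecided."""
    matches_scene: bool
    issues: List[str] = field(default_factory=list)      # what the image got wrong
    continuity: List[str] = field(default_factory=list)  # details later scenes should keep


def refine_prompt(next_prompt: str, fb: VisualFeedback) -> str:
    """Fold feedback on the previous image into the next scene's image prompt."""
    parts = [next_prompt]
    if fb.continuity:
        parts.append("keep consistent: " + "; ".join(fb.continuity))
    if not fb.matches_scene and fb.issues:
        # Only carry issues forward when Kimi judged the previous image a mismatch.
        parts.append("avoid: " + "; ".join(fb.issues))
    return ", ".join(parts)


fb = VisualFeedback(
    matches_scene=False,
    issues=["sky rendered blue instead of Martian red"],
    continuity=["white spacesuit with orange stripes"],
)
refined = refine_prompt("astronaut kneels beside glowing alien plants", fb)
print(refined)
```

Keeping continuity notes separate from issues means even a matching image can still pass details such as costume or palette forward to later scenes.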

Tech Stack

| Component | Source | Role |
|---|---|---|
| Hermes Agent | Nous Research | Orchestration, planning, decision loop |
| Kimi Vision | Moonshot AI (via gateway) | Image analysis, visual feedback |
| Image Generation | Pollinations AI | Free tier, multiple models (Flux, etc.) |

Image Generation Options

| Provider | Free Tier | Quality | Use Case |
|---|---|---|---|
| Pollinations | Yes | Good | Primary (simple, free) |
| Flux (local) | Free | High | If GPU available |
| Hermes skills | Free | Niche | Fallback/ASCII aesthetic |
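The priority order in this table suggests a simple fallback chain. A minimal sketch, assuming each provider is wrapped as a function that returns an image reference or raises on failure; `build_provider_chain`, `generate_with_fallback`, and the `has_gpu` flag are hypothetical names, and the stub functions stand in for real provider calls.

```python
from typing import Callable, List, Tuple

# A provider takes an image prompt and returns an image reference, or raises on failure.
Provider = Callable[[str], str]


def build_provider_chain(
    pollinations: Provider, local_flux: Provider, hermes_skill: Provider, has_gpu: bool
) -> List[Tuple[str, Provider]]:
    """Order as in the table: Pollinations first, local Flux only with a GPU, Hermes skills last."""
    chain = [("pollinations", pollinations)]
    if has_gpu:
        chain.append(("flux-local", local_flux))
    chain.append(("hermes-skill", hermes_skill))
    return chain


def generate_with_fallback(prompt: str, chain: List[Tuple[str, Provider]]) -> Tuple[str, str]:
    """Return (provider_name, image_ref) from the first provider that succeeds."""
    errors: List[str] = []
    for name, provider in chain:
        try:
            return name, provider(prompt)
        except Exception as exc:  # e.g. timeouts, quota limits, service outages
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all image providers failed: " + " | ".join(errors))


def flaky_pollinations(prompt: str) -> str:
    raise ConnectionError("timeout")  # simulate an unreliable primary provider


chain = build_provider_chain(
    pollinations=flaky_pollinations,
    local_flux=lambda p: "flux.png",
    hermes_skill=lambda p: "ascii-art.txt",
    has_gpu=False,
)
name, ref = generate_with_fallback("alien garden at dusk", chain)
print(name, ref)
```

Recording which provider produced each image would also help with the open question of Pollinations reliability.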

Pollinations API (Primary)

  • Endpoint: https://gen.pollinations.ai/image/{prompt}
  • Models: flux, zimage, wan-image, qwen-image, etc.
  • Cost: Free tier (Pollen credits); paid usage roughly $1 per Pollen
  • Auth: Optional for free tier
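A small helper for building request URLs against the endpoint above. This is a sketch: the prompt is percent-encoded into the path, and selecting the model through a `model` query parameter is an assumption to check against the Pollinations documentation. `pollinations_url` and `fetch_image` are hypothetical helpers; only `fetch_image` touches the network, and it sends no auth, relying on auth being optional for the free tier.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://gen.pollinations.ai/image/"  # endpoint as listed above


def pollinations_url(prompt: str, model: Optional[str] = None) -> str:
    """Build a GET URL with the prompt percent-encoded into the path."""
    url = BASE_URL + quote(prompt, safe="")  # safe="" also encodes "/" so it cannot split the path
    if model:
        # Assumption: the model is selected with a `model` query parameter.
        url += "?" + urlencode({"model": model})
    return url


def fetch_image(url: str, timeout: float = 60.0) -> bytes:
    """Download the generated image bytes (no auth header; free tier assumed)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


url = pollinations_url("A lone astronaut on Mars", model="flux")
print(url)
# image_bytes = fetch_image(url)  # network call, left commented out
```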

Strengths

  • Combines Hermes + Kimi + Pollinations natively
  • Agentic visual feedback loop is unique
  • Kimi's visual coherence check helps catch quality and consistency problems
  • Free tier = low barrier to test
  • User controls output format (default: image)

Weaknesses

  • ⚠️ Pollinations quality vs. DALL-E/Midjourney is unproven (needs testing)
  • ⚠️ Kimi requires gateway access (no direct API key)
  • ⚠️ Loop adds latency (generate → analyze → refine)
  • ⚠️ Need to verify Pollinations reliability

Uniqueness Score

7/10: the agentic visual feedback loop is novel, but we still need to verify whether built-in image generation is compelling enough

Next Steps

  • Explore Hermes built-in image skills (manim, ascii)
  • Define output format options
  • Sketch technical architecture
  • See: 002-xxx.md, 003-xxx.md for alternatives