feat: Initial commit - Hermes Detective Agency concept

- Hermes Detective Agency: Open-ended mystery investigation game
- Roles: Chief (human), Witness (Kimi), Detective (Hermes)
- 5 difficulty levels, community cases, open-ended solving
- Scoring: Alignment %, Evidence %, Time
- Features: Retry, Journal, Observe mode
- Tech: Kimi Vision + Hermes Agent + Pollinations

Changelog:
- Research phase: Kimi capabilities, Hermes agent, image APIs
- Brainstorming: 14 ideas explored
- Comparison matrix: Detective selected as winner
- Concept finalized with all design decisions
This commit is contained in:
2026-04-20 00:00:30 +00:00
commit ecfd0b1160
10 changed files with 1685 additions and 0 deletions

View File

@@ -0,0 +1,51 @@
# Research: Kimi Visual Capabilities
**Date:** 2026-04-19
**Purpose:** Validate Kimi's visual strengths for hackathon project
## Kimi K2.5 - Multimodal Model
### Core Capabilities
- **Text + Images + Video** input support
- 256K context length
- Thinking/non-thinking modes
- Agent task support
### Visual API Models
- `moonshot-v1-8k-vision-preview`
- `moonshot-v1-32k-vision-preview`
- `moonshot-v1-128k-vision-preview`
- `kimi-k2.5` (latest, supports video)
### Supported Formats
**Images:** png, jpeg, webp, gif
**Video:** mp4, mpeg, mov, avi, x-flv, mpg, webm, wmv, 3gpp
### Unique Visual Features
1. **Visual Coding** - Kimi Code, Kimi Claw for coding with visual context
2. **Video Understanding** - Analyzes video content (unique for multimodal models)
3. **Real-time Visual Chat** - Interactive visual conversation
## Kimi K2 Benchmarks (Coding/Agent)
| Benchmark | Kimi K2 Score | Notes |
|-----------|---------------|-------|
| SWE-bench Verified (Single Attempt) | **65.8%** | Global SOTA for open-source |
| SWE-bench Multilingual | 47.3% | Outperforms most proprietary |
| LiveCodeBench v6 | 53.7% | Strong coding |
| TerminalBench | 30.0% | Agentic tool use |
| Aider-Polyglot | 60.0% | Code editing |
| Tau2-Bench (avg) | ~64% | Tool use tasks |
## Kimi Visual Strengths Summary
**Video understanding** (unique advantage)
**Visual coding** capabilities
**Image + Text multimodal**
**Strong agentic tool use**
**256K context** for large visual inputs
## Sources
- https://platform.moonshot.cn/docs/guide/kimi-k2-5-quickstart
- https://moonshotai.github.io/Kimi-K2/
- https://platform.moonshot.cn/docs/guide/use-kimi-vision-model