- Hermes Detective Agency: Open-ended mystery investigation game - Roles: Chief (human), Witness (Kimi), Detective (Hermes) - 5 difficulty levels, community cases, open-ended solving - Scoring: Alignment %, Evidence %, Time - Features: Retry, Journal, Observe mode - Tech: Kimi Vision + Hermes Agent + Pollinations Changelog: - Research phase: Kimi capabilities, Hermes agent, image APIs - Brainstorming: 14 ideas explored - Comparison matrix: Detective selected as winner - Concept finalized with all design decisions
1.7 KiB
1.7 KiB
Research: Kimi Visual Capabilities
Date: 2026-04-19
Purpose: Validate Kimi's visual strengths for hackathon project
Kimi K2.5 - Multimodal Model
Core Capabilities
- Text + Images + Video input support
- 256K context length
- Thinking/non-thinking modes
- Agent task support
Visual API Models
moonshot-v1-8k-vision-previewmoonshot-v1-32k-vision-previewmoonshot-v1-128k-vision-previewkimi-k2.5(latest, supports video)
Supported Formats
Images: png, jpeg, webp, gif
Video: mp4, mpeg, mov, avi, x-flv, mpg, webm, wmv, 3gpp
Unique Visual Features
- Visual Coding - Kimi Code, Kimi Claw for coding with visual context
- Video Understanding - Analyzes video content (unique for multimodal models)
- Real-time Visual Chat - Interactive visual conversation
Kimi K2 Benchmarks (Coding/Agent)
| Benchmark | Kimi K2 Score | Notes |
|---|---|---|
| SWE-bench Verified (Single Attempt) | 65.8% | Global SOTA for open-source |
| SWE-bench Multilingual | 47.3% | Outperforms most proprietary |
| LiveCodeBench v6 | 53.7% | Strong coding |
| TerminalBench | 30.0% | Agentic tool use |
| Aider-Polyglot | 60.0% | Code editing |
| Tau2-Bench (avg) | ~64% | Tool use tasks |
Kimi Visual Strengths Summary
✅ Video understanding (unique advantage)
✅ Visual coding capabilities
✅ Image + Text multimodal
✅ Strong agentic tool use
✅ 256K context for large visual inputs