Initial commit: kage-research project files

This commit is contained in:
shokollm
2026-04-09 00:39:52 +00:00
commit 71fc8b4495
19 changed files with 5303 additions and 0 deletions

llm-compression-research.md
# LLM for Context Compression/Summarization
## Overview
Research on the best LLMs for context compression (summarizing old messages to save tokens).

**Use case**: compress old conversation history when the context gets too long.
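A compression trigger needs a token estimate first. A minimal sketch, assuming a rough ~4 characters per token ratio and an illustrative 8,000-token budget (neither is a real project setting):

```typescript
interface Message {
  role: string;
  content: string;
}

// Rough heuristic: ~4 characters per token for English text (assumption).
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

// Compress once the estimated history size exceeds the budget.
function needsCompression(messages: Message[], budget = 8000): boolean {
  return estimateTokens(messages) > budget;
}
```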

---

## Ranking: Performance First

Based on general benchmarks and summarization capability:

| Rank | Model | Provider | Strengths |
|------|-------|----------|-----------|
| 1 | **GPT-4.1** | OpenAI | Best overall reasoning, good summarization |
| 2 | **Claude 4 Sonnet** | Anthropic | Excellent at long context tasks |
| 3 | **Gemini 2.5 Pro** | Google | Massive context, strong reasoning |
| 4 | **GPT-4o** | OpenAI | Balanced, reliable |
| 5 | **Gemini 2.0 Flash** | Google | Fast + good quality |
| 6 | **Claude 3.5 Sonnet** | Anthropic | Good value, fast |
| 7 | **Llama 3.3 70B** | Meta | Open source, good reasoning |
| 8 | **Qwen 3** | Alibaba | Excellent for coding/summarization |
| 9 | **Mistral Large** | Mistral | European option, fast |
| 10 | **Gemma 3** | Google | Lightweight, free |

**Note**: performance is subjective and varies by use case. For summarization specifically, fast models (Flash) often work well.

---

## Ranking: Price First (Cheapest)

Sorted by input cost (per 1M tokens):
### Free Models (OpenRouter)
| Model | Input | Output | Context | Notes |
|-------|-------|--------|---------|-------|
| **stepfun/step-3.5-flash:free** | $0 | $0 | 256K | ✅ Currently using |
| **minimax/minimax-m2.5:free** | $0 | $0 | 196K | Good quality |
| **meta-llama/llama-3.3-70b:free** | $0 | $0 | 128K | Solid |
| **arcee-ai/trinity-mini:free** | $0 | $0 | 131K | Lightweight |
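
Free models are sometimes rate-limited or temporarily unavailable, so a fallback chain across the slugs above is worth sketching. A hypothetical helper; `call` stands in for whatever LLM client the project uses:

```typescript
// Free-tier model slugs from the table above, in preference order.
const FREE_MODELS = [
  "stepfun/step-3.5-flash:free",
  "minimax/minimax-m2.5:free",
  "meta-llama/llama-3.3-70b:free",
];

// Try each model in order; fall back to the next one on failure
// (rate limit, outage, etc.). Rethrows the last error if all fail.
async function withFallback<T>(
  models: string[],
  call: (model: string) => Promise<T>
): Promise<T> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await call(model);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```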
### Paid Models (Cheapest)
| Model | Input | Output | Context | Notes |
|-------|-------|--------|---------|-------|
| **google/gemini-1.5-flash-8b** | $0.0375 | $0.15 | 1M | 🏆 Best cheap |
| **google/gemini-2.0-flash-lite** | $0.075 | $0.30 | 1M | Fast |
| **qwen/qwen3.5-flash-02-23** | $0.065 | $0.26 | 1M | Great context |
| **openai/gpt-5-nano** | $0.05 | $0.40 | 200K | Cheap |
| **openai/gpt-4.1-nano** | $0.10 | $0.40 | 1M | Good |
| **openai/gpt-4o-mini** | $0.15 | $0.60 | 128K | Reliable |
| **anthropic/claude-3-haiku** | $0.25 | $1.25 | 200K | Fast |
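
Prices are per 1M tokens, so the cost of a single compression call is simple arithmetic. For example, summarizing a 50K-token history into ~500 output tokens on `google/gemini-2.0-flash-lite` (prices from the table above; the token counts are illustrative):

```typescript
interface Pricing {
  inputPerM: number;  // $ per 1M input tokens
  outputPerM: number; // $ per 1M output tokens
}

function compressionCost(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens / 1e6) * p.inputPerM + (outputTokens / 1e6) * p.outputPerM;
}

const flashLite: Pricing = { inputPerM: 0.075, outputPerM: 0.3 };
const costUsd = compressionCost(50_000, 500, flashLite); // ≈ $0.0039 per compression
```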

---

## Ranking: Value for Money

Combines performance and price (subjective scoring):

| Rank | Model | Input Cost | Performance | Value Score |
|------|-------|------------|-------------|-------------|
| 1 🏆 | **google/gemini-2.0-flash-lite** | $0.075 | 7/10 | ⭐⭐⭐⭐⭐ |
| 2 | **qwen/qwen3.5-flash** | $0.065 | 6/10 | ⭐⭐⭐⭐⭐ |
| 3 | **stepfun/step-3.5-flash:free** | $0 | 5/10 | ⭐⭐⭐⭐⭐ |
| 4 | **minimax/minimax-m2.5:free** | $0 | 5/10 | ⭐⭐⭐⭐ |
| 5 | **openai/gpt-4o-mini** | $0.15 | 8/10 | ⭐⭐⭐⭐ |
| 6 | **google/gemini-1.5-flash-8b** | $0.0375 | 6/10 | ⭐⭐⭐⭐ |
| 7 | **anthropic/claude-3.5-haiku** | $0.40 | 7/10 | ⭐⭐⭐ |
| 8 | **openai/gpt-4.1** | $1.10 | 9/10 | ⭐⭐⭐ |
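
The value scores above are subjective. One way to make them reproducible, offered as an assumption rather than the scoring actually used here, is performance per dollar with a small floor so free models don't divide by zero:

```typescript
// Hypothetical value metric: performance points per dollar of input cost.
// The +0.01 floor is an arbitrary choice to keep free ($0) models finite.
function valueScore(performance: number, inputCostPerM: number): number {
  return performance / (inputCostPerM + 0.01);
}
```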

---

## Recommendation for Context Compression

### For This Project (Kugetsu/Pi)

**Option 1: Free (Current)**
- `stepfun/step-3.5-flash:free` - works, no cost
- Good enough for simple summarization

**Option 2: Best Value**
- `google/gemini-2.0-flash-lite` - $0.075/M tokens
- 1M context window
- Fast and reliable

**Option 3: Best Performance**
- `openai/gpt-4.1-nano` - $0.10/M tokens
- Excellent reasoning for better summaries
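
All three options are OpenRouter slugs, and OpenRouter exposes an OpenAI-compatible chat completions endpoint, so switching between them is a one-line change. A sketch (error handling omitted; the response shape follows the OpenAI chat format):

```typescript
const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

// Build the request body for a summarization call.
function buildSummarizeRequest(model: string, transcript: string) {
  return {
    model,
    messages: [
      { role: "system", content: "Summarize this conversation concisely." },
      { role: "user", content: transcript },
    ],
  };
}

async function summarize(apiKey: string, model: string, transcript: string): Promise<string> {
  const res = await fetch(OPENROUTER_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildSummarizeRequest(model, transcript)),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```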
---
## How Compression Would Work
```typescript
// Pseudocode for compression: llm.compress and formatMessages are
// placeholders for the project's LLM client and prompt helper.
const KEEP_RECENT = 10; // most recent messages are kept verbatim

async function compressContext(messages: Message[]): Promise<Message[]> {
  // 1. Select the old messages: skip the system prompt, keep the recent tail.
  const oldMessages = messages.slice(1, -KEEP_RECENT);
  if (oldMessages.length === 0) return messages; // nothing to compress yet

  // 2. Send them to the compression model.
  const summary = await llm.compress(`
    Summarize this conversation concisely:
    ${formatMessages(oldMessages)}
  `);

  // 3. Rebuild the context: system prompt + summary + recent messages.
  return [
    messages[0], // system prompt
    { role: "user", content: `[Previous conversation summarized: ${summary}]` },
    ...messages.slice(-KEEP_RECENT), // recent messages, untouched
  ];
}
```
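
The `formatMessages` helper referenced above can be as simple as a plain-text transcript (a sketch; the exact format is not critical for summarization):

```typescript
interface Message {
  role: string;
  content: string;
}

// Render messages as a "role: content" transcript for the prompt.
function formatMessages(messages: Message[]): string {
  return messages.map((m) => `${m.role}: ${m.content}`).join("\n");
}
```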
---
## Summary
| Priority | Recommended Model | Cost |
|----------|------------------|------|
| **Performance** | GPT-4.1 or Claude 4 Sonnet | $$ |
| **Price** | stepfun/free or Gemini Flash Lite | $0-0.075 |
| **Value** | Gemini 2.0 Flash Lite | $0.075 |

For this POC, I'd recommend:
- **Free**: keep using `stepfun/step-3.5-flash:free`
- **Production**: switch to `google/gemini-2.0-flash-lite` ($0.075/M)