Initial commit: kage-research project files

This commit is contained in:
shokollm
2026-04-09 00:39:52 +00:00
commit 71fc8b4495
19 changed files with 5303 additions and 0 deletions

research.md
# Research: Agent Frameworks for Programmatic/Headless Usage
## Summary
This research evaluates seven agent frameworks/tools for programmatic/headless usage: Hermes, OpenCode, Pi, OpenClaw, LangChain Agents, Claude Code, and Codex. The evaluation focuses on headless operation, resource usage, session management, agent lifecycle, data persistence, customizability, and integration complexity. **For the user's use case (replacing hermes + opencode with something better for local dev and cloud production)**, the top recommendations are:
- **Pi (agent-core)**: Best for pure programmatic control with excellent TypeScript SDK, event-driven architecture, and lightweight footprint
- **Claude Code**: Best for production-grade headless operation with structured output, CI/CD integration, and official SDK support
- **LangChain**: Best for flexibility and customization if the user wants full control over the agent loop
- **OpenCode**: Strong option if they want to stick with a similar architecture but need a better SDK
---
## Comparison Matrix
| Criteria | Hermes | OpenCode | Pi (agent-core) | OpenClaw | LangChain Agents | Claude Code | Codex |
|----------|--------|----------|-----------------|----------|-----------------|-------------|-------|
| **Headless/Programmatic** | ✅ Python lib (`AIAgent`) | ✅ SDK + server mode | ✅ Full TypeScript SDK | ✅ Gateway WS API | ✅ `create_agent()` Python | ✅ `-p` flag + SDK | ❌ CLI only |
| **Resource Usage** | ~500MB+ (Python) | ~200-400MB (Go) | ~50-100MB (TS core) | ~500MB+ (Node) | ~100-300MB (Python) | ~200-400MB (Node) | ~200-300MB (Rust) |
| **Multi-agent Support** | ✅ Subagents/spawn | ✅ Multiple sessions | ✅ Multiple instances | ✅ Multi-agent routing | ✅ Via LangGraph | ✅ Multiple sessions | ❌ Single agent |
| **Session Management** | SQLite-based | Session API | In-memory + custom | Gateway sessions | Manual state | `--resume` flag | Session-based |
| **Data Persistence** | SQLite + pluggable memory | File-based | Custom (you control) | SQLite + gateway | You implement | File-based | File-based |
| **Customizability** | High (skills, tools, prompts) | High (tools, prompts) | High (tools, middleware) | High (skills, MCP) | Very high | Medium (plugins, hooks) | Low |
| **Plug-and-Play** | Easy (pip install) | Easy (npm) | Easy (npm) | Moderate | Moderate | Easy | Easy |
| **LLM Flexibility** | 200+ via OpenRouter | Any (provider-agnostic) | Any (multi-provider) | Any (multi-provider) | Any | Anthropic-first | OpenAI-first |
---
## Per-Tool Deep Dives
### 1. Hermes Agent (NousResearch/hermes-agent)
**Repository**: https://github.com/NousResearch/hermes-agent (30.7K stars)
#### Headless / Programmatic API
**Yes - Python Library**
Hermes can be imported and used as a Python library:
```python
from run_agent import AIAgent
agent = AIAgent(
    model="anthropic/claude-sonnet-4",
    quiet_mode=True,
)
response = agent.chat("What is the capital of France?")
```
For full conversation control:
```python
result = agent.run_conversation(
    user_message="Search for recent Python features",
    task_id="my-task-1",
)
# Returns: final_response, messages, task_id
```
**CLI Headless**: Also supports `-p` flag via OpenClaw migration path.
#### Resource Usage
- **Memory**: ~500MB+ (Python runtime)
- **CPU**: Moderate (depends on model)
- **Multi-agent**: Supports subagents via `sessions_spawn` tool
- **Batch**: `batch_runner.py` for parallel processing
#### Session Management
- **SQLite-based** session storage (configurable location)
- **Pluggable memory providers** (v0.7.0+) - built-in, Honcho, or custom
- **Conversation history** preserved across sessions
- **FTS5 search** for cross-session recall
- Multi-turn conversations via `conversation_history` parameter
#### Agent Lifecycle
1. **Initialize**: `AIAgent(model=, quiet_mode=)`
2. **Run**: `chat()` or `run_conversation()`
3. **Terminate**: Automatic cleanup; resources released on conversation end
**Key options**:
- `max_iterations`: 90 default (configurable)
- `enabled_toolsets` / `disabled_toolsets`: Control available tools
- `skip_memory` / `skip_context_files`: Stateless mode for APIs
#### Data Persistence
- **SQLite**: Session data stored in `~/.hermes/`
- **Memory**: Pluggable providers (built-in, Honcho, vector stores)
- **Trajectories**: JSONL format for training data (`save_trajectories=True`)
- **API Server**: Shared SessionDB for Open WebUI integration
#### Customizability
- **Skills**: Procedural memory via `SKILL.md` files
- **Tools**: Custom tool registration
- **Prompts**: `ephemeral_system_prompt` for dynamic prompts
- **MCP**: Model Context Protocol support
- **Platform hints**: `platform` param for Discord, Telegram, etc.
#### Performance/Intelligence
- **Self-improving**: Agent creates skills from experience
- **Memory persistence**: Learns across sessions
- **Credential pooling**: Multiple API keys with rotation
- **Compression**: Context compression to prevent overflow
#### Integration Example (FastAPI)
```python
from fastapi import FastAPI
from pydantic import BaseModel
from run_agent import AIAgent
app = FastAPI()

class ChatRequest(BaseModel):
    message: str
    model: str = "anthropic/claude-sonnet-4"

@app.post("/chat")
def chat(request: ChatRequest):
    # Sync handler: FastAPI runs it in a threadpool, since agent.chat() blocks
    agent = AIAgent(
        model=request.model,
        quiet_mode=True,
        skip_context_files=True,
        skip_memory=True,
    )
    return {"response": agent.chat(request.message)}
```
---
### 2. OpenCode (anomalyco/opencode)
**Repository**: https://github.com/anomalyco/opencode (138.9K stars, but this is the frontend repo - the actual agent is https://github.com/opencode-ai/opencode with 11.8K stars)
#### Headless / Programmatic API
**Yes - SDK + Server Mode**
**Server Mode**:
```bash
opencode serve [--port 4096] [--hostname "127.0.0.1"]
```
**SDK**:
```typescript
import { createOpencode, createOpencodeClient } from "@opencode-ai/sdk"
const { client } = await createOpencode()
// Or client-only:
const client = createOpencodeClient({ baseUrl: "http://localhost:4096" })
```
#### Resource Usage
- **Memory**: ~200-400MB (Go runtime)
- **Architecture**: Client/server - TUI is just one client
- **Multi-agent**: Multiple sessions supported
#### Session Management
- Full **Session API**:
- `session.create()`, `session.list()`, `session.get()`
- `session.prompt()` - send prompts
- `session.abort()` - cancel running sessions
- `session.summarize()` - compress context
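The create → prompt → abort flow can be sketched end-to-end. The method names follow the Session API list above; the `SessionClient` interface and its shapes below are illustrative, narrowed to just what the sketch needs so the flow can be exercised without a running server - check the SDK's own types for the real signatures.

```typescript
// Drive one task through the create → prompt → abort lifecycle.
// SessionClient is narrowed to just the calls this sketch uses; the
// real SDK client exposes a richer version of this shape.
interface SessionClient {
  session: {
    create(): Promise<{ id: string }>;
    prompt(args: {
      path: { id: string };
      body: { parts: { type: string; text: string }[] };
    }): Promise<string>;
    abort(args: { path: { id: string } }): Promise<void>;
  };
}

async function runTask(client: SessionClient, text: string): Promise<string> {
  const { id } = await client.session.create();
  try {
    return await client.session.prompt({
      path: { id },
      body: { parts: [{ type: "text", text }] },
    });
  } catch (err) {
    await client.session.abort({ path: { id } }); // cancel the session on failure
    throw err;
  }
}
```

Because the client is injected, the same function works against a live `opencode serve` instance or a stub in tests.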
#### Agent Lifecycle
1. **Start server**: `opencode serve`
2. **Create session**: `client.session.create()`
3. **Prompt**: `client.session.prompt()`
4. **Terminate**: Server stays running; sessions are disposable
#### Data Persistence
- File-based configuration (`opencode.json`)
- Sessions stored in server memory (configurable)
#### Customizability
- **Tools**: Custom tool definitions
- **Prompts**: Custom system prompts
- **Structured Output**: JSON Schema support
- **Provider-agnostic**: Any model via configuration
#### Structured Output Example
```typescript
const result = await client.session.prompt({
  path: { id: sessionId },
  body: {
    parts: [{ type: "text", text: "Research Anthropic" }],
    format: {
      type: "json_schema",
      schema: {
        type: "object",
        properties: {
          company: { type: "string" },
          founded: { type: "number" },
        },
        required: ["company", "founded"],
      },
    },
  },
});
```
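When consuming structured output, a small runtime guard keeps downstream code from trusting the model blindly. This is a sketch assuming the parsed result eventually arrives as a plain object; where that object actually lives in the response envelope should be checked against the SDK types.

```typescript
// Mirrors the JSON Schema declared in the prompt's `format` option.
interface CompanyInfo {
  company: string;
  founded: number;
}

// Narrowing guard: verifies at runtime that an unknown value matches
// the schema before downstream code relies on its fields.
function isCompanyInfo(value: unknown): value is CompanyInfo {
  const v = value as Record<string, unknown>;
  return (
    typeof value === "object" &&
    value !== null &&
    typeof v.company === "string" &&
    typeof v.founded === "number"
  );
}
```

A guard like this is cheap insurance: even with schema-constrained decoding, a malformed or truncated response fails loudly at the boundary instead of deep inside business logic.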
---
### 3. Pi (badlogic/pi-mono)
**Repository**: https://github.com/badlogic/pi-mono (33.1K stars)
**This is the actual agent runtime that Feynman uses.**
#### Headless / Programmatic API
**Yes - Full TypeScript SDK**
```typescript
import { Agent } from "@mariozechner/pi-agent-core";
import { getModel } from "@mariozechner/pi-ai";
const agent = new Agent({
  initialState: {
    systemPrompt: "You are a helpful assistant.",
    model: getModel("anthropic", "claude-sonnet-4-20250514"),
  },
});

agent.subscribe((event) => {
  if (event.type === "message_update" && event.assistantMessageEvent.type === "text_delta") {
    process.stdout.write(event.assistantMessageEvent.delta);
  }
});
await agent.prompt("Hello!");
```
#### Resource Usage
- **Memory**: ~50-100MB for core agent (very lightweight)
- **CPU**: Minimal (just orchestration)
- **Multi-agent**: Create multiple `Agent` instances
- **Dependencies**: Requires `@mariozechner/pi-ai` for LLM calls
#### Session Management
- **In-memory** by default - you control persistence
- **Messages array** in agent state
- **Custom state schema** via TypeScript interfaces
- **Session ID** for provider caching
#### Agent Lifecycle
1. **Create**: `new Agent({ initialState })`
2. **Prompt**: `agent.prompt()` or `agent.continue()`
3. **Events**: Subscribe to `agent_start`, `turn_start`, `message_update`, etc.
4. **Terminate**: `agent.reset()` or let it go out of scope
**Key options**:
- `transformContext`: Prune/compress messages
- `convertToLlm`: Filter custom message types
- `beforeToolCall` / `afterToolCall`: Hooks for tool execution
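The pruning a `transformContext` hook performs is ultimately a pure function over the messages array. A minimal sketch of that logic (the `Msg` shape is illustrative; consult the package docs for the hook's exact signature):

```typescript
interface Msg {
  role: "system" | "user" | "assistant";
  content: string;
}

// Keep the system prompt plus the most recent `keep` messages -- the
// kind of pruning a transformContext hook would apply before each LLM
// call to stay under the context window.
function pruneContext(messages: Msg[], keep: number): Msg[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-keep)];
}
```

Because the hook runs on every turn, keeping it a pure function of the messages array makes it trivial to unit-test in isolation.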
#### Data Persistence
- **You control**: Implement persistence via middleware
- **State is mutable**: `agent.state.messages = newMessages`
- **No built-in storage**: Freedom to implement as needed
#### Customizability
- **Tools**: `AgentTool` with Typebox schemas
- **Middleware**: `@dynamic_prompt`, `@wrap_tool_call` decorators
- **Message types**: Custom via declaration merging
- **Thinking budgets**: Configurable per provider
#### Low-Level API
```typescript
import { agentLoop, agentLoopContinue } from "@mariozechner/pi-agent-core";
for await (const event of agentLoop([userMessage], context, config)) {
  console.log(event.type);
}
```
---
### 4. OpenClaw (openclaw/openclaw)
**Repository**: https://github.com/openclaw/openclaw (351.9K stars)
#### Headless / Programmatic API
**Yes - Gateway WebSocket API**
OpenClaw has an extensive Gateway WS API:
```bash
openclaw gateway --port 18789 --verbose
# Send a message
openclaw message send --to +1234567890 --message "Hello"
# Agent command
openclaw agent --message "Ship checklist" --thinking high
```
#### Resource Usage
- **Memory**: ~500MB+ (Node.js runtime)
- **Multi-agent**: Multi-agent routing via Gateway
#### Session Management
- **Gateway Sessions**: Main session + group isolation
- **Session tools**: `sessions_list`, `sessions_history`, `sessions_send`
- **SQLite-based** storage
#### Agent Lifecycle
1. **Start Gateway**: `openclaw gateway`
2. **Connect**: WebSocket to `ws://127.0.0.1:18789`
3. **Message**: Send via CLI or API
4. **Persistence**: Sessions saved to SQLite
#### Data Persistence
- **SQLite**: Gateway session storage
- **Workspace**: `~/.openclaw/workspace`
- **Skills**: `~/.openclaw/workspace/skills/<skill>/SKILL.md`
#### Customizability
- **Skills**: Full skill system (ClawHub registry)
- **MCP**: Model Context Protocol support
- **Channels**: 20+ messaging platforms
---
### 5. LangChain Agents (langchain-ai/langchain)
**Repository**: https://github.com/langchain-ai/langchain
#### Headless / Programmatic API
**Yes - Full Python API**
```python
from langchain.agents import create_agent
agent = create_agent("openai:gpt-5", tools=tools)
result = agent.invoke({"messages": [{"role": "user", "content": "Hello"}]})
```
#### Resource Usage
- **Memory**: ~100-300MB (Python)
- **Flexible**: Your code controls resource allocation
- **Multi-agent**: Via LangGraph subgraphs
#### Session Management
- **Manual**: You manage message history in state
- **Custom state**: Extend `AgentState` TypedDict
- **Memory integration**: Optional short-term/long-term memory
#### Agent Lifecycle
1. **Create**: `create_agent(model, tools, system_prompt)`
2. **Invoke**: `agent.invoke({"messages": [...]})`
3. **Stream**: `agent.stream()` for real-time events
#### Data Persistence
- **You implement**: Full control via middleware
- **Optional memory**: LangChain memory modules
#### Customizability
- **Very high**: Middleware, tools, prompts, dynamic everything
- **ReAct pattern**: Built-in reasoning + acting loop
- **ToolStrategy** / **ProviderStrategy**: Structured output
---
### 6. Claude Code (anthropics/claude-code)
**Repository**: https://github.com/anthropics/claude-code
#### Headless / Programmatic API
**Yes - Agent SDK + CLI**
**CLI Headless**:
```bash
claude -p "Find and fix the bug in auth.py" --allowedTools "Read,Edit,Bash"
claude --bare -p "Summarize" --allowedTools "Read"
```
**SDK** (Python/TypeScript):
```python
import anyio
from claude_agent_sdk import query

async def main():
    async for message in query(prompt="Fix the bug in auth.py"):
        print(message)

anyio.run(main)
```
#### Resource Usage
- **Memory**: ~200-400MB (Node.js)
- **Structured output**: JSON with `--output-format json`
- **Streaming**: `--output-format stream-json`
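Consuming `--output-format stream-json` typically means splitting the stream into one JSON event per line. A minimal parser sketch, assuming newline-delimited JSON framing (verify the exact framing and event shapes against the headless docs):

```typescript
// Split a stream-json buffer into parsed events, assuming one JSON
// object per line (newline-delimited JSON). Blank lines are skipped.
function parseStreamJson(buffer: string): unknown[] {
  return buffer
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as unknown);
}
```

In a real pipeline the same split-and-parse step runs incrementally on stdout chunks, buffering any trailing partial line until its newline arrives.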
#### Session Management
- **Session ID**: `--resume <session-id>`
- **Continue**: `--continue` for follow-up
- **Persistence**: File-based in `~/.claude/`
#### Agent Lifecycle
1. **Run**: `claude -p "task"`
2. **Continue**: `claude -p "more" --continue`
3. **Resume**: `claude --resume <session-id>`
#### Customizability
- **Hooks**: Pre/post tool use
- **Plugins**: Custom commands and agents
- **MCP**: Model Context Protocol
- **Settings**: JSON config files
---
### 7. Codex (openai/codex)
**Repository**: https://github.com/openai/codex
#### Headless / Programmatic API
**CLI Only - No official programmatic API**
```bash
npm install -g @openai/codex
codex "Write a function to sort a list"
```
#### Resource Usage
- **Memory**: ~200-300MB (Rust binary)
- **Lightweight**: Minimal footprint
#### Session Management
- **Limited**: Basic session support
- **No SDK**: Not designed for programmatic control
#### Customizability
- **Low**: No official extension API
- **Provider-locked**: OpenAI-first
---
## Recommendations for User's Use Case
### Primary Recommendation: Pi (agent-core)
**Why**:
- Lightest weight (~50-100MB)
- Full programmatic control via TypeScript
- Event-driven architecture perfect for custom integration
- Feynman already uses it - seamless replacement
- You control persistence - perfect for cloud production
**Best for**: User wants fine-grained control, lightweight footprint, TypeScript ecosystem
### Secondary: Claude Code
**Why**:
- Production-grade headless mode
- Structured output support
- Official SDK (Python/TypeScript)
- CI/CD integration built-in
- `--bare` mode for consistent CI runs
**Best for**: Production cloud deployment with structured requirements
### Alternative: LangChain
**Why**:
- Maximum flexibility
- Any LLM provider
- Rich ecosystem
- Full control over agent loop
**Best for**: User wants to build custom agent behavior from scratch
---
## Sources
### Primary Sources (Kept)
- **Hermes Agent**: https://github.com/NousResearch/hermes-agent - Python library docs, v0.7.0 release notes
- **OpenCode SDK**: https://opencode.ai/docs/sdk/ - Full TypeScript SDK documentation
- **Pi agent-core**: https://github.com/badlogic/pi-mono/tree/main/packages/agent - Complete TypeScript API
- **Claude Code Headless**: https://code.claude.com/docs/en/headless - Official headless documentation
- **LangChain Agents**: https://docs.langchain.com/oss/python/langchain/agents - Official agents documentation
- **OpenClaw**: https://github.com/openclaw/openclaw - Gateway architecture
- **Codex**: https://github.com/openai/codex - CLI tool
### Why These Sources
- Official repositories and documentation
- Recent updates (2025-2026)
- Direct technical details from source
- Code examples for integration
---
## Gaps & Limitations
### Not Fully Covered
1. **Benchmark data**: No comprehensive benchmarks comparing agent performance across tools
2. **OpenCode internal architecture**: Client/server details somewhat opaque
3. **Exact resource numbers**: Estimates based on typical Python/Node.js/Go runtime sizes
4. **OpenClaw detailed SDK**: Very large project; deep programmatic details require more investigation
5. **Codex SDK**: Currently CLI-only with no programmatic API
### Suggested Next Steps
1. **Test Pi locally**: Install `@mariozechner/pi-agent-core` and verify headless operation
2. **Test Claude Code**: Try `claude -p --bare` for CI use case
3. **OpenCode server test**: Run `opencode serve` and test SDK integration
4. **Hermes Python lib**: Test the programmatic API for comparison
### For Cloud Production
- Consider **Pi** for lightweight containers
- Consider **Claude Code** for structured output requirements
- Pi is provider-agnostic; Claude Code is Anthropic-first but can also run against gateways such as Bedrock or Vertex AI