# Research: Agent Frameworks for Programmatic/Headless Usage

## Summary

This research evaluates seven agent frameworks/tools for programmatic/headless usage: Hermes, OpenCode, Pi, OpenClaw, LangChain Agents, Claude Code, and Codex. The evaluation focuses on headless operation, resource usage, session management, agent lifecycle, data persistence, customizability, and integration complexity. **For the user's use case (replacing hermes + opencode with something better for local dev and cloud production)**, the top recommendations are:

- **Pi (agent-core)**: Best for pure programmatic control, with an excellent TypeScript SDK, event-driven architecture, and lightweight footprint
- **Claude Code**: Best for production-grade headless operation, with structured output, CI/CD integration, and official SDK support
- **LangChain**: Best for flexibility and customization if the user wants full control over the agent loop
- **OpenCode**: Strong option if the user wants to keep a similar architecture but needs a better SDK

---
## Comparison Matrix

| Criteria | Hermes | OpenCode | Pi (agent-core) | OpenClaw | LangChain Agents | Claude Code | Codex |
|----------|--------|----------|-----------------|----------|------------------|-------------|-------|
| **Headless/Programmatic** | ✅ Python lib (`AIAgent`) | ✅ SDK + server mode | ✅ Full TypeScript SDK | ✅ Gateway WS API | ✅ `create_agent()` Python | ✅ `-p` flag + SDK | ❌ CLI only |
| **Resource Usage** | ~500MB+ (Python) | ~200-400MB (Go) | ~50-100MB (TS core) | ~500MB+ (Node) | ~100-300MB (Python) | ~200-400MB (Node) | ~200-300MB (Rust) |
| **Multi-agent Support** | ✅ Subagents/spawn | ✅ Multiple sessions | ✅ Multiple instances | ✅ Multi-agent routing | ✅ Via LangGraph | ✅ Multiple sessions | ❌ Single agent |
| **Session Management** | SQLite-based | Session API | In-memory + custom | Gateway sessions | Manual state | `--resume` flag | Session-based |
| **Data Persistence** | SQLite + pluggable memory | File-based | Custom (you control) | SQLite + gateway | You implement | File-based | File-based |
| **Customizability** | High (skills, tools, prompts) | High (tools, prompts) | High (tools, middleware) | High (skills, MCP) | Very high | Medium (plugins, hooks) | Low |
| **Plug-and-Play** | Easy (pip install) | Easy (npm) | Easy (npm) | Moderate | Moderate | Easy | Easy |
| **LLM Flexibility** | 200+ via OpenRouter | Any (provider-agnostic) | Any (multi-provider) | Any (multi-provider) | Any | Anthropic-first | OpenAI-first |

---
## Per-Tool Deep Dives

### 1. Hermes Agent (NousResearch/hermes-agent)

**Repository**: https://github.com/NousResearch/hermes-agent (30.7K stars)

#### Headless / Programmatic API

✅ **Yes - Python Library**

Hermes can be imported and used as a Python library:

```python
from run_agent import AIAgent

agent = AIAgent(
    model="anthropic/claude-sonnet-4",
    quiet_mode=True,
)
response = agent.chat("What is the capital of France?")
```

For full conversation control:

```python
result = agent.run_conversation(
    user_message="Search for recent Python features",
    task_id="my-task-1",
)
# Returns: final_response, messages, task_id
```

**CLI Headless**: Also supports `-p` flag via OpenClaw migration path.
#### Resource Usage

- **Memory**: ~500MB+ (Python runtime)
- **CPU**: Moderate (depends on model)
- **Multi-agent**: Supports subagents via `sessions_spawn` tool
- **Batch**: `batch_runner.py` for parallel processing

#### Session Management

- **SQLite-based** session storage (configurable location)
- **Pluggable memory providers** (v0.7.0+) - built-in, Honcho, or custom
- **Conversation history** preserved across sessions
- **FTS5 search** for cross-session recall
- Multi-turn conversations via `conversation_history` parameter

#### Agent Lifecycle

1. **Initialize**: `AIAgent(model=..., quiet_mode=...)`
2. **Run**: `chat()` or `run_conversation()`
3. **Terminate**: Automatic cleanup; resources released on conversation end

**Key options**:

- `max_iterations`: Defaults to 90 (configurable)
- `enabled_toolsets` / `disabled_toolsets`: Control available tools
- `skip_memory` / `skip_context_files`: Stateless mode for APIs
#### Data Persistence

- **SQLite**: Session data stored in `~/.hermes/`
- **Memory**: Pluggable providers (built-in, Honcho, vector stores)
- **Trajectories**: JSONL format for training data (`save_trajectories=True`)
- **API Server**: Shared SessionDB for Open WebUI integration

#### Customizability

- **Skills**: Procedural memory via `SKILL.md` files
- **Tools**: Custom tool registration
- **Prompts**: `ephemeral_system_prompt` for dynamic prompts
- **MCP**: Model Context Protocol support
- **Platform hints**: `platform` param for Discord, Telegram, etc.

#### Performance/Intelligence

- **Self-improving**: Agent creates skills from experience
- **Memory persistence**: Learns across sessions
- **Credential pooling**: Multiple API keys with rotation
- **Compression**: Context compression to prevent overflow

#### Integration Example (FastAPI)

```python
from fastapi import FastAPI
from pydantic import BaseModel
from run_agent import AIAgent

app = FastAPI()


class ChatRequest(BaseModel):
    message: str
    model: str = "anthropic/claude-sonnet-4"


@app.post("/chat")
async def chat(request: ChatRequest):
    agent = AIAgent(
        model=request.model,
        quiet_mode=True,
        skip_context_files=True,
        skip_memory=True,
    )
    return {"response": agent.chat(request.message)}
```

---
### 2. OpenCode (anomalyco/opencode)

**Repository**: https://github.com/anomalyco/opencode (138.9K stars, but this is the frontend repo - the actual agent is https://github.com/opencode-ai/opencode with 11.8K stars)

#### Headless / Programmatic API

✅ **Yes - SDK + Server Mode**

**Server Mode**:

```bash
opencode serve [--port 4096] [--hostname "127.0.0.1"]
```

**SDK**:

```typescript
import { createOpencode, createOpencodeClient } from "@opencode-ai/sdk"

const { client } = await createOpencode()
// Or client-only, against an already-running server:
const client = createOpencodeClient({ baseUrl: "http://localhost:4096" })
```

#### Resource Usage

- **Memory**: ~200-400MB (Go runtime)
- **Architecture**: Client/server - the TUI is just one client
- **Multi-agent**: Multiple sessions supported

#### Session Management

- Full **Session API**:
  - `session.create()`, `session.list()`, `session.get()`
  - `session.prompt()` - send prompts
  - `session.abort()` - cancel running sessions
  - `session.summarize()` - compress context

#### Agent Lifecycle

1. **Start server**: `opencode serve`
2. **Create session**: `client.session.create()`
3. **Prompt**: `client.session.prompt()`
4. **Terminate**: Server stays running; sessions are disposable (see the sketch below)
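
A minimal sketch of that lifecycle against a running server. The method names come from the Session API above; the shape of the `create()` response (an object with an `id`) and the `abort()` argument shape are assumptions - check the SDK docs for the exact response envelope.

```typescript
import { createOpencodeClient } from "@opencode-ai/sdk";

// Assumes `opencode serve` is already running on port 4096.
const client = createOpencodeClient({ baseUrl: "http://localhost:4096" });

// 1. Create a disposable session (response shape assumed: { id: string }).
const session = await client.session.create();

// 2. Send a prompt into the session (same path/body shape as the
//    structured-output example further down).
const result = await client.session.prompt({
  path: { id: session.id },
  body: { parts: [{ type: "text", text: "List the TODOs in this repo" }] },
});
console.log(result);

// 3. Abort if it runs too long (argument shape assumed).
await client.session.abort({ path: { id: session.id } });
```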
#### Data Persistence

- File-based configuration (`opencode.json`)
- Sessions stored in server memory (configurable)

#### Customizability

- **Tools**: Custom tool definitions
- **Prompts**: Custom system prompts
- **Structured Output**: JSON Schema support
- **Provider-agnostic**: Any model via configuration

#### Structured Output Example

```typescript
const result = await client.session.prompt({
  path: { id: sessionId },
  body: {
    parts: [{ type: "text", text: "Research Anthropic" }],
    format: {
      type: "json_schema",
      schema: {
        type: "object",
        properties: {
          company: { type: "string" },
          founded: { type: "number" },
        },
        required: ["company", "founded"],
      },
    },
  },
});
```

---
### 3. Pi (badlogic/pi-mono)

**Repository**: https://github.com/badlogic/pi-mono (33.1K stars)

**This is the actual agent runtime that Feynman uses.**

#### Headless / Programmatic API

✅ **Yes - Full TypeScript SDK**

```typescript
import { Agent } from "@mariozechner/pi-agent-core";
import { getModel } from "@mariozechner/pi-ai";

const agent = new Agent({
  initialState: {
    systemPrompt: "You are a helpful assistant.",
    model: getModel("anthropic", "claude-sonnet-4-20250514"),
  },
});

agent.subscribe((event) => {
  if (event.type === "message_update" && event.assistantMessageEvent.type === "text_delta") {
    process.stdout.write(event.assistantMessageEvent.delta);
  }
});

await agent.prompt("Hello!");
```

#### Resource Usage

- **Memory**: ~50-100MB for core agent (very lightweight)
- **CPU**: Minimal (just orchestration)
- **Multi-agent**: Create multiple `Agent` instances
- **Dependencies**: Requires `@mariozechner/pi-ai` for LLM calls

#### Session Management

- **In-memory** by default - you control persistence
- **Messages array** in agent state
- **Custom state schema** via TypeScript interfaces
- **Session ID** for provider caching

#### Agent Lifecycle

1. **Create**: `new Agent({ initialState })`
2. **Prompt**: `agent.prompt()` or `agent.continue()`
3. **Events**: Subscribe to `agent_start`, `turn_start`, `message_update`, etc.
4. **Terminate**: `agent.reset()` or let the instance go out of scope

**Key options** (see the sketch after this list):

- `transformContext`: Prune/compress messages
- `convertToLlm`: Filter custom message types
- `beforeToolCall` / `afterToolCall`: Hooks for tool execution
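
A hedged sketch of how these options might be wired up. The option names come from the list above; their exact placement (constructor vs. per-prompt config) and callback signatures are assumptions to verify against the package.

```typescript
import { Agent } from "@mariozechner/pi-agent-core";
import { getModel } from "@mariozechner/pi-ai";

// Option names follow the "Key options" list above; where they are passed
// and their exact callback shapes are assumptions.
const agent = new Agent({
  initialState: {
    systemPrompt: "You are a terse CI assistant.",
    model: getModel("anthropic", "claude-sonnet-4-20250514"),
  },
  // Keep only the last 50 messages to bound context size (assumed hook shape).
  transformContext: (messages) => messages.slice(-50),
  // Log every tool call before it executes (assumed hook shape).
  beforeToolCall: (toolCall) => {
    console.log("tool:", toolCall);
  },
});

await agent.prompt("Summarize the failing tests.");
```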
#### Data Persistence

- **You control**: Implement persistence via middleware (see the sketch below)
- **State is mutable**: `agent.state.messages = newMessages`
- **No built-in storage**: Freedom to implement as needed
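
A minimal persistence sketch, assuming `agent.state.messages` is a plain JSON-serializable array as the mutable-state note above suggests; the file-based storage is purely illustrative.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

const SESSION_FILE = "./session.json"; // illustrative storage location

// Restore a previous conversation by assigning into the mutable agent state
// (assumes agent.state.messages is a JSON-serializable array).
function restoreSession(agent: { state: { messages: unknown[] } }) {
  if (existsSync(SESSION_FILE)) {
    agent.state.messages = JSON.parse(readFileSync(SESSION_FILE, "utf8"));
  }
}

// Persist after each turn; swap the file for SQLite/S3/etc. in production.
function persistSession(agent: { state: { messages: unknown[] } }) {
  writeFileSync(SESSION_FILE, JSON.stringify(agent.state.messages, null, 2));
}
```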
#### Customizability

- **Tools**: `AgentTool` with Typebox schemas (sketch below)
- **Middleware**: `@dynamic_prompt`, `@wrap_tool_call` decorators
- **Message types**: Custom via declaration merging
- **Thinking budgets**: Configurable per provider
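
A hypothetical tool definition to illustrate the Typebox-schema approach. The Typebox usage is standard, but the `{ name, description, parameters, execute }` field names are an assumed shape - verify against the `AgentTool` type in `@mariozechner/pi-agent-core`.

```typescript
import { Type } from "@sinclair/typebox";

// Assumed AgentTool-style shape; only the Typebox schema API is known-good here.
const readFileTool = {
  name: "read_file",
  description: "Read a UTF-8 text file from the workspace",
  parameters: Type.Object({
    path: Type.String({ description: "Path relative to the workspace root" }),
  }),
  execute: async (args: { path: string }) => {
    const { readFile } = await import("node:fs/promises");
    return await readFile(args.path, "utf8");
  },
};
```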
#### Low-Level API

```typescript
import { agentLoop, agentLoopContinue } from "@mariozechner/pi-agent-core";

for await (const event of agentLoop([userMessage], context, config)) {
  console.log(event.type);
}
```

---
### 4. OpenClaw (openclaw/openclaw)

**Repository**: https://github.com/openclaw/openclaw (351.9K stars)

#### Headless / Programmatic API

✅ **Yes - Gateway WebSocket API**

OpenClaw has an extensive Gateway WS API:

```bash
openclaw gateway --port 18789 --verbose

# Send a message
openclaw message send --to +1234567890 --message "Hello"

# Agent command
openclaw agent --message "Ship checklist" --thinking high
```
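
For programmatic use, a client would connect to the same Gateway over WebSocket. The sketch below only shows the connection plumbing; the JSON payload shape is a placeholder, since the actual Gateway message schema was not covered in this research.

```typescript
import WebSocket from "ws";

// Connect to the Gateway started by `openclaw gateway --port 18789`.
// NOTE: the payload below is purely illustrative - the real Gateway
// message schema must be taken from the OpenClaw docs.
const socket = new WebSocket("ws://127.0.0.1:18789");

socket.on("open", () => {
  socket.send(JSON.stringify({ type: "agent", message: "Ship checklist" }));
});

socket.on("message", (data) => {
  console.log("gateway event:", data.toString());
});
```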
#### Resource Usage

- **Memory**: ~500MB+ (Node.js runtime)
- **Multi-agent**: Multi-agent routing via Gateway

#### Session Management

- **Gateway Sessions**: Main session + group isolation
- **Session tools**: `sessions_list`, `sessions_history`, `sessions_send`
- **SQLite-based** storage

#### Agent Lifecycle

1. **Start Gateway**: `openclaw gateway`
2. **Connect**: WebSocket to `ws://127.0.0.1:18789`
3. **Message**: Send via CLI or API
4. **Persistence**: Sessions saved to SQLite

#### Data Persistence

- **SQLite**: Gateway session storage
- **Workspace**: `~/.openclaw/workspace`
- **Skills**: `~/.openclaw/workspace/skills/<skill>/SKILL.md`

#### Customizability

- **Skills**: Full skill system (ClawHub registry)
- **MCP**: Model Context Protocol support
- **Channels**: 20+ messaging platforms

---
### 5. LangChain Agents (langchain-ai/langchain)

**Repository**: https://github.com/langchain-ai/langchain

#### Headless / Programmatic API

✅ **Yes - Full Python API**

```python
from langchain.agents import create_agent

agent = create_agent("openai:gpt-5", tools=tools)
result = agent.invoke({"messages": [{"role": "user", "content": "Hello"}]})
```

#### Resource Usage

- **Memory**: ~100-300MB (Python)
- **Flexible**: Your code controls resource allocation
- **Multi-agent**: Via LangGraph subgraphs

#### Session Management

- **Manual**: You manage message history in state
- **Custom state**: Extend `AgentState` TypedDict
- **Memory integration**: Optional short-term/long-term memory

#### Agent Lifecycle

1. **Create**: `create_agent(model, tools, system_prompt)`
2. **Invoke**: `agent.invoke({"messages": [...]})`
3. **Stream**: `agent.stream()` for real-time events

#### Data Persistence

- **You implement**: Full control via middleware
- **Optional memory**: LangChain memory modules

#### Customizability

- **Very high**: Middleware, tools, prompts, dynamic everything
- **ReAct pattern**: Built-in reasoning + acting loop
- **ToolStrategy** / **ProviderStrategy**: Structured output

---
### 6. Claude Code (anthropics/claude-code)

**Repository**: https://github.com/anthropics/claude-code

#### Headless / Programmatic API

✅ **Yes - Agent SDK + CLI**

**CLI Headless**:

```bash
claude -p "Find and fix the bug in auth.py" --allowedTools "Read,Edit,Bash"
claude --bare -p "Summarize" --allowedTools "Read"
```

**SDK** (Python/TypeScript):

```python
from anthropic import Agent

agent = Agent(
    model="claude-sonnet-4-20250514",
    tools=[...],
)
result = agent.run("Fix the bug in auth.py")
```

#### Resource Usage

- **Memory**: ~200-400MB (Node.js)
- **Structured output**: JSON with `--output-format json` (see the sketch below)
- **Streaming**: `--output-format stream-json`
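
A sketch of driving the headless CLI from a service or CI job, using only the flags shown above. The JSON result schema is not documented here, so the sketch simply parses whatever the CLI emits.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Invoke `claude -p` headlessly and return the parsed JSON result.
// Flags follow the CLI examples above; the result schema is not assumed.
async function claudeHeadless(prompt: string): Promise<unknown> {
  const { stdout } = await run("claude", [
    "-p",
    prompt,
    "--allowedTools",
    "Read",
    "--output-format",
    "json",
  ]);
  return JSON.parse(stdout);
}

claudeHeadless("Summarize the open TODOs in this repo").then(console.log);
```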
#### Session Management

- **Session ID**: `--resume <session-id>`
- **Continue**: `--continue` for follow-up
- **Persistence**: File-based in `~/.claude/`

#### Agent Lifecycle

1. **Run**: `claude -p "task"`
2. **Continue**: `claude -p "more" --continue`
3. **Resume**: `claude --resume <session-id>`

#### Customizability

- **Hooks**: Pre/post tool use
- **Plugins**: Custom commands and agents
- **MCP**: Model Context Protocol
- **Settings**: JSON config files

---
### 7. Codex (openai/codex)

**Repository**: https://github.com/openai/codex

#### Headless / Programmatic API

❌ **CLI Only - No official programmatic API**

```bash
npm install -g @openai/codex
codex "Write a function to sort a list"
```

#### Resource Usage

- **Memory**: ~200-300MB (Rust binary)
- **Lightweight**: Minimal footprint

#### Session Management

- **Limited**: Basic session support
- **No SDK**: Not designed for programmatic control

#### Customizability

- **Low**: No official extension API
- **Provider-locked**: OpenAI-first

---
## Recommendations for User's Use Case

### Primary Recommendation: Pi (agent-core)

**Why**:

- Lightest weight (~50-100MB)
- Full programmatic control via TypeScript
- Event-driven architecture well suited to custom integration
- Feynman already uses it - a seamless replacement
- You control persistence - a good fit for cloud production

**Best for**: A user who wants fine-grained control, a lightweight footprint, and the TypeScript ecosystem

### Secondary: Claude Code

**Why**:

- Production-grade headless mode
- Structured output support
- Official SDK (Python/TypeScript)
- CI/CD integration built-in
- `--bare` mode for consistent CI runs

**Best for**: Production cloud deployment with structured output requirements

### Alternative: LangChain

**Why**:

- Maximum flexibility
- Any LLM provider
- Rich ecosystem
- Full control over the agent loop

**Best for**: A user who wants to build custom agent behavior from scratch

---
## Sources

### Primary Sources (Kept)

- **Hermes Agent**: https://github.com/NousResearch/hermes-agent - Python library docs, v0.7.0 release notes
- **OpenCode SDK**: https://opencode.ai/docs/sdk/ - Full TypeScript SDK documentation
- **Pi agent-core**: https://github.com/badlogic/pi-mono/tree/main/packages/agent - Complete TypeScript API
- **Claude Code Headless**: https://code.claude.com/docs/en/headless - Official headless documentation
- **LangChain Agents**: https://docs.langchain.com/oss/python/langchain/agents - Official agents documentation
- **OpenClaw**: https://github.com/openclaw/openclaw - Gateway architecture
- **Codex**: https://github.com/openai/codex - CLI tool

### Why These Sources

- Official repositories and documentation
- Recent updates (2025-2026)
- Direct technical details from source
- Code examples for integration

---
## Gaps & Limitations

### Not Fully Covered

1. **Benchmark data**: No comprehensive benchmarks comparing agent performance across tools
2. **OpenCode internal architecture**: Client/server details are somewhat opaque
3. **Exact resource numbers**: Estimates based on typical Python/Node.js/Go runtime sizes
4. **OpenClaw detailed SDK**: Very large project; deep programmatic details require more investigation
5. **Codex SDK**: Currently CLI-only with no programmatic API

### Suggested Next Steps

1. **Test Pi locally**: Install `@mariozechner/pi-agent-core` and verify headless operation
2. **Test Claude Code**: Try `claude -p --bare` for the CI use case
3. **OpenCode server test**: Run `opencode serve` and test SDK integration
4. **Hermes Python lib**: Test the programmatic API for comparison

### For Cloud Production

- Consider **Pi** for lightweight containers
- Consider **Claude Code** for structured output requirements
- Both support any LLM provider - not locked in