# Research: Agent Frameworks for Programmatic/Headless Usage

## Summary

This research evaluates seven agent frameworks/tools for programmatic/headless usage: Hermes, OpenCode, Pi, OpenClaw, LangChain Agents, Claude Code, and Codex. The evaluation focuses on headless operation, resource usage, session management, agent lifecycle, data persistence, customizability, and integration complexity. **For the user's use case (replacing hermes + opencode with something better for local dev and cloud production)**, the top recommendations are:

- **Pi (agent-core)**: Best for pure programmatic control, with an excellent TypeScript SDK, event-driven architecture, and lightweight footprint
- **Claude Code**: Best for production-grade headless operation, with structured output, CI/CD integration, and official SDK support
- **LangChain**: Best for flexibility and customization if the user wants full control over the agent loop
- **OpenCode**: Strong option if the user wants to keep a similar architecture but needs a better SDK

---

## Comparison Matrix

| Criteria | Hermes | OpenCode | Pi (agent-core) | OpenClaw | LangChain Agents | Claude Code | Codex |
|----------|--------|----------|-----------------|----------|------------------|-------------|-------|
| **Headless/Programmatic** | ✅ Python lib (`AIAgent`) | ✅ SDK + server mode | ✅ Full TypeScript SDK | ✅ Gateway WS API | ✅ `create_agent()` Python | ✅ `-p` flag + SDK | ❌ CLI only |
| **Resource Usage** | ~500MB+ (Python) | ~200-400MB (Go) | ~50-100MB (TS core) | ~500MB+ (Node) | ~100-300MB (Python) | ~200-400MB (Node) | ~200-300MB (Rust) |
| **Multi-agent Support** | ✅ Subagents/spawn | ✅ Multiple sessions | ✅ Multiple instances | ✅ Multi-agent routing | ✅ Via LangGraph | ✅ Multiple sessions | ❌ Single agent |
| **Session Management** | SQLite-based | Session API | In-memory + custom | Gateway sessions | Manual state | `--resume` flag | Session-based |
| **Data Persistence** | SQLite + pluggable memory | File-based | Custom (you control) | SQLite + gateway | You implement | File-based | File-based |
| **Customizability** | High (skills, tools, prompts) | High (tools, prompts) | High (tools, middleware) | High (skills, MCP) | Very high | Medium (plugins, hooks) | Low |
| **Plug-and-Play** | Easy (pip install) | Easy (npm) | Easy (npm) | Moderate | Moderate | Easy | Easy |
| **LLM Flexibility** | 200+ via OpenRouter | Any (provider-agnostic) | Any (multi-provider) | Any (multi-provider) | Any | Anthropic-first | OpenAI-first |

---

## Per-Tool Deep Dives

### 1. Hermes Agent (NousResearch/hermes-agent)

**Repository**: https://github.com/NousResearch/hermes-agent (30.7K stars)

#### Headless / Programmatic API

✅ **Yes - Python Library**

Hermes can be imported and used as a Python library:

```python
from run_agent import AIAgent

agent = AIAgent(
    model="anthropic/claude-sonnet-4",
    quiet_mode=True,
)
response = agent.chat("What is the capital of France?")
```

For full conversation control:

```python
result = agent.run_conversation(
    user_message="Search for recent Python features",
    task_id="my-task-1",
)
# Returns: final_response, messages, task_id
```

**CLI Headless**: Also supports a `-p` flag via the OpenClaw migration path.

#### Resource Usage

- **Memory**: ~500MB+ (Python runtime)
- **CPU**: Moderate (depends on model)
- **Multi-agent**: Supports subagents via the `sessions_spawn` tool
- **Batch**: `batch_runner.py` for parallel processing

#### Session Management

- **SQLite-based** session storage (configurable location)
- **Pluggable memory providers** (v0.7.0+): built-in, Honcho, or custom
- **Conversation history** preserved across sessions
- **FTS5 search** for cross-session recall
- Multi-turn conversations via the `conversation_history` parameter

#### Agent Lifecycle

1. **Initialize**: `AIAgent(model=, quiet_mode=)`
2. **Run**: `chat()` or `run_conversation()`
3. **Terminate**: Automatic cleanup; resources released on conversation end

**Key options**:

- `max_iterations`: 90 by default (configurable)
- `enabled_toolsets` / `disabled_toolsets`: Control available tools
- `skip_memory` / `skip_context_files`: Stateless mode for APIs

#### Data Persistence

- **SQLite**: Session data stored in `~/.hermes/`
- **Memory**: Pluggable providers (built-in, Honcho, vector stores)
- **Trajectories**: JSONL format for training data (`save_trajectories=True`)
- **API Server**: Shared SessionDB for Open WebUI integration
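
Because trajectories are plain newline-delimited JSON, the standard library is enough to write and re-read them for post-processing; a minimal sketch (the record fields are illustrative, not Hermes' actual schema):

```python
import json
import os
import tempfile

# Hypothetical trajectory records; the keys are illustrative, not Hermes' schema.
records = [
    {"task_id": "my-task-1", "role": "user", "content": "Search for recent Python features"},
    {"task_id": "my-task-1", "role": "assistant", "content": "Here is what I found..."},
]

path = os.path.join(tempfile.mkdtemp(), "trajectories.jsonl")
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")  # one JSON object per line

with open(path) as f:
    loaded = [json.loads(line) for line in f]
# loaded round-trips to the same list of dicts as records
```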

#### Customizability

- **Skills**: Procedural memory via `SKILL.md` files
- **Tools**: Custom tool registration
- **Prompts**: `ephemeral_system_prompt` for dynamic prompts
- **MCP**: Model Context Protocol support
- **Platform hints**: `platform` param for Discord, Telegram, etc.

#### Performance/Intelligence

- **Self-improving**: Agent creates skills from experience
- **Memory persistence**: Learns across sessions
- **Credential pooling**: Multiple API keys with rotation
- **Compression**: Context compression to prevent overflow

#### Integration Example (FastAPI)

```python
from fastapi import FastAPI
from pydantic import BaseModel

from run_agent import AIAgent

app = FastAPI()

class ChatRequest(BaseModel):
    message: str
    model: str = "anthropic/claude-sonnet-4"

@app.post("/chat")
async def chat(request: ChatRequest):
    # Stateless per-request agent: skip memory and context files.
    agent = AIAgent(
        model=request.model,
        quiet_mode=True,
        skip_context_files=True,
        skip_memory=True,
    )
    # Note: agent.chat() is synchronous; for production, consider running it
    # in a threadpool so it does not block the event loop.
    return {"response": agent.chat(request.message)}
```

---

### 2. OpenCode (anomalyco/opencode)

**Repository**: https://github.com/anomalyco/opencode (138.9K stars, but this is the frontend repo; the actual agent is https://github.com/opencode-ai/opencode with 11.8K stars)

#### Headless / Programmatic API

✅ **Yes - SDK + Server Mode**

**Server Mode**:

```bash
opencode serve [--port 4096] [--hostname "127.0.0.1"]
```

**SDK**:

```typescript
import { createOpencode, createOpencodeClient } from "@opencode-ai/sdk"

// Spawn a server and get a connected client:
const { client } = await createOpencode()

// Or connect client-only to an already-running server:
const standalone = createOpencodeClient({ baseUrl: "http://localhost:4096" })
```

#### Resource Usage

- **Memory**: ~200-400MB (Go runtime)
- **Architecture**: Client/server - the TUI is just one client
- **Multi-agent**: Multiple sessions supported

#### Session Management

- Full **Session API**:
  - `session.create()`, `session.list()`, `session.get()`
  - `session.prompt()` - send prompts
  - `session.abort()` - cancel running sessions
  - `session.summarize()` - compress context

#### Agent Lifecycle

1. **Start server**: `opencode serve`
2. **Create session**: `client.session.create()`
3. **Prompt**: `client.session.prompt()`
4. **Terminate**: The server stays running; sessions are disposable

#### Data Persistence

- File-based configuration (`opencode.json`)
- Sessions stored in server memory (configurable)

#### Customizability

- **Tools**: Custom tool definitions
- **Prompts**: Custom system prompts
- **Structured Output**: JSON Schema support
- **Provider-agnostic**: Any model via configuration

#### Structured Output Example

```typescript
const result = await client.session.prompt({
  path: { id: sessionId },
  body: {
    parts: [{ type: "text", text: "Research Anthropic" }],
    format: {
      type: "json_schema",
      schema: {
        type: "object",
        properties: {
          company: { type: "string" },
          founded: { type: "number" },
        },
        required: ["company", "founded"],
      },
    },
  },
});
```

---

### 3. Pi (badlogic/pi-mono)

**Repository**: https://github.com/badlogic/pi-mono (33.1K stars)

**This is the actual agent runtime that Feynman uses.**

#### Headless / Programmatic API

✅ **Yes - Full TypeScript SDK**

```typescript
import { Agent } from "@mariozechner/pi-agent-core";
import { getModel } from "@mariozechner/pi-ai";

const agent = new Agent({
  initialState: {
    systemPrompt: "You are a helpful assistant.",
    model: getModel("anthropic", "claude-sonnet-4-20250514"),
  },
});

agent.subscribe((event) => {
  if (event.type === "message_update" && event.assistantMessageEvent.type === "text_delta") {
    process.stdout.write(event.assistantMessageEvent.delta);
  }
});

await agent.prompt("Hello!");
```

#### Resource Usage

- **Memory**: ~50-100MB for the core agent (very lightweight)
- **CPU**: Minimal (just orchestration)
- **Multi-agent**: Create multiple `Agent` instances
- **Dependencies**: Requires `@mariozechner/pi-ai` for LLM calls

#### Session Management

- **In-memory** by default; you control persistence
- **Messages array** in agent state
- **Custom state schema** via TypeScript interfaces
- **Session ID** for provider caching

#### Agent Lifecycle

1. **Create**: `new Agent({ initialState })`
2. **Prompt**: `agent.prompt()` or `agent.continue()`
3. **Events**: Subscribe to `agent_start`, `turn_start`, `message_update`, etc.
4. **Terminate**: `agent.reset()`, or let the instance go out of scope

**Key options**:

- `transformContext`: Prune/compress messages
- `convertToLlm`: Filter custom message types
- `beforeToolCall` / `afterToolCall`: Hooks for tool execution
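
A common `transformContext` strategy is keep-system-plus-recent pruning; the pattern itself is language-independent, so here it is as a self-contained Python sketch (the message shape is illustrative, not Pi's actual types):

```python
def prune_context(messages, keep_last=20):
    """Keep the system prompt plus the most recent messages (illustrative)."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

# One system message plus 50 user turns:
history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user", "content": f"msg {i}"} for i in range(50)
]
pruned = prune_context(history, keep_last=20)
# pruned keeps the system message and the 20 most recent turns
```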

#### Data Persistence

- **You control**: Implement persistence via middleware
- **State is mutable**: `agent.state.messages = newMessages`
- **No built-in storage**: Freedom to implement as needed

#### Customizability

- **Tools**: `AgentTool` with Typebox schemas
- **Middleware**: `@dynamic_prompt`, `@wrap_tool_call` decorators
- **Message types**: Custom via declaration merging
- **Thinking budgets**: Configurable per provider

#### Low-Level API

```typescript
import { agentLoop, agentLoopContinue } from "@mariozechner/pi-agent-core";

for await (const event of agentLoop([userMessage], context, config)) {
  console.log(event.type);
}
```

---

### 4. OpenClaw (openclaw/openclaw)

**Repository**: https://github.com/openclaw/openclaw (351.9K stars)

#### Headless / Programmatic API

✅ **Yes - Gateway WebSocket API**

OpenClaw has an extensive Gateway WS API:

```bash
openclaw gateway --port 18789 --verbose

# Send a message
openclaw message send --to +1234567890 --message "Hello"

# Agent command
openclaw agent --message "Ship checklist" --thinking high
```

#### Resource Usage

- **Memory**: ~500MB+ (Node.js runtime)
- **Multi-agent**: Multi-agent routing via the Gateway

#### Session Management

- **Gateway sessions**: Main session + group isolation
- **Session tools**: `sessions_list`, `sessions_history`, `sessions_send`
- **SQLite-based** storage

#### Agent Lifecycle

1. **Start Gateway**: `openclaw gateway`
2. **Connect**: WebSocket to `ws://127.0.0.1:18789`
3. **Message**: Send via CLI or API
4. **Persistence**: Sessions saved to SQLite

#### Data Persistence

- **SQLite**: Gateway session storage
- **Workspace**: `~/.openclaw/workspace`
- **Skills**: `~/.openclaw/workspace/skills/<skill>/SKILL.md`

#### Customizability

- **Skills**: Full skill system (ClawHub registry)
- **MCP**: Model Context Protocol support
- **Channels**: 20+ messaging platforms

---

### 5. LangChain Agents (langchain-ai/langchain)

**Repository**: https://github.com/langchain-ai/langchain

#### Headless / Programmatic API

✅ **Yes - Full Python API**

```python
from langchain.agents import create_agent

agent = create_agent("openai:gpt-5", tools=tools)
result = agent.invoke({"messages": [{"role": "user", "content": "Hello"}]})
```

#### Resource Usage

- **Memory**: ~100-300MB (Python)
- **Flexible**: Your code controls resource allocation
- **Multi-agent**: Via LangGraph subgraphs

#### Session Management

- **Manual**: You manage message history in state
- **Custom state**: Extend the `AgentState` TypedDict
- **Memory integration**: Optional short-term/long-term memory

#### Agent Lifecycle

1. **Create**: `create_agent(model, tools, system_prompt)`
2. **Invoke**: `agent.invoke({"messages": [...]})`
3. **Stream**: `agent.stream()` for real-time events

#### Data Persistence

- **You implement**: Full control via middleware
- **Optional memory**: LangChain memory modules

#### Customizability

- **Very high**: Middleware, tools, prompts, dynamic everything
- **ReAct pattern**: Built-in reasoning + acting loop
- **ToolStrategy** / **ProviderStrategy**: Structured output
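
Since LangChain also lets you take over the loop entirely, it helps to see how small the core ReAct cycle is; a self-contained Python sketch with a stubbed model and one hypothetical tool (nothing here is LangChain API):

```python
# Minimal hand-rolled ReAct-style loop, showing what create_agent() manages
# for you. The stubbed model and the `add` tool are illustrative only.
def fake_model(messages):
    # A real model decides between a tool call and a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"final": f"The sum is {messages[-1]['content']}"}

tools = {"add": lambda a, b: a + b}

def run_agent(user_message, max_iterations=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        decision = fake_model(messages)
        if "final" in decision:
            return decision["final"]
        # Execute the requested tool and feed the observation back.
        result = tools[decision["tool"]](**decision["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("max_iterations exceeded")

answer = run_agent("What is 2 + 3?")
# answer == "The sum is 5"
```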

---

### 6. Claude Code (anthropics/claude-code)

**Repository**: https://github.com/anthropics/claude-code

#### Headless / Programmatic API

✅ **Yes - Agent SDK + CLI**

**CLI Headless**:

```bash
claude -p "Find and fix the bug in auth.py" --allowedTools "Read,Edit,Bash"
claude --bare -p "Summarize" --allowedTools "Read"
```

**SDK** (Python shown; a TypeScript SDK is also available):

```python
import asyncio

# The Python Agent SDK ships as the `claude-agent-sdk` package; option names
# here follow its docs but should be verified against your installed version.
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    async for message in query(
        prompt="Fix the bug in auth.py",
        options=ClaudeAgentOptions(allowed_tools=["Read", "Edit", "Bash"]),
    ):
        print(message)

asyncio.run(main())
```

#### Resource Usage

- **Memory**: ~200-400MB (Node.js)
- **Structured output**: JSON via `--output-format json`
- **Streaming**: `--output-format stream-json`
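
`--output-format json` makes the CLI scriptable from any language; a sketch of parsing the envelope (the sample payload is fabricated for illustration, and the exact fields vary by version, so check your own output):

```python
import json

# Fabricated sample of `claude -p ... --output-format json` output.
raw = '{"result": "Bug fixed in auth.py", "session_id": "abc-123"}'

payload = json.loads(raw)
answer = payload["result"]          # the agent's final text
session_id = payload["session_id"]  # can be fed back to --resume
```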

#### Session Management

- **Session ID**: `--resume <session-id>`
- **Continue**: `--continue` for follow-up
- **Persistence**: File-based in `~/.claude/`

#### Agent Lifecycle

1. **Run**: `claude -p "task"`
2. **Continue**: `claude -p "more" --continue`
3. **Resume**: `claude --resume <session-id>`

#### Customizability

- **Hooks**: Pre/post tool use
- **Plugins**: Custom commands and agents
- **MCP**: Model Context Protocol
- **Settings**: JSON config files
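
Project settings live in JSON files such as `.claude/settings.json`; a sketch that generates one programmatically (the `permissions.allow` shape follows Claude Code's documented settings, but verify against your version):

```python
import json
import os
import tempfile

# Sketch: generate a project-level settings file for headless/CI runs.
# The permissions.allow rule format is an assumption to verify.
settings = {
    "permissions": {
        "allow": ["Read", "Edit", "Bash(git diff:*)"],
    }
}

project_dir = tempfile.mkdtemp()
settings_path = os.path.join(project_dir, ".claude", "settings.json")
os.makedirs(os.path.dirname(settings_path), exist_ok=True)
with open(settings_path, "w") as f:
    json.dump(settings, f, indent=2)
```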

---

### 7. Codex (openai/codex)

**Repository**: https://github.com/openai/codex

#### Headless / Programmatic API

❌ **CLI only - no official programmatic API**

```bash
npm install -g @openai/codex
codex "Write a function to sort a list"
```

#### Resource Usage

- **Memory**: ~200-300MB (Rust binary)
- **Lightweight**: Minimal footprint

#### Session Management

- **Limited**: Basic session support
- **No SDK**: Not designed for programmatic control

#### Customizability

- **Low**: No official extension API
- **Provider-locked**: OpenAI-first

---

## Recommendations for User's Use Case

### Primary Recommendation: Pi (agent-core)

**Why**:

- Lightest weight (~50-100MB)
- Full programmatic control via TypeScript
- Event-driven architecture, well suited to custom integration
- Feynman already uses it, making it a seamless replacement
- You control persistence, which suits cloud production

**Best for**: Fine-grained control, a lightweight footprint, and the TypeScript ecosystem

### Secondary: Claude Code

**Why**:

- Production-grade headless mode
- Structured output support
- Official SDK (Python/TypeScript)
- Built-in CI/CD integration
- `--bare` mode for consistent CI runs

**Best for**: Production cloud deployment with structured-output requirements

### Alternative: LangChain

**Why**:

- Maximum flexibility
- Any LLM provider
- Rich ecosystem
- Full control over the agent loop

**Best for**: Building custom agent behavior from scratch

---

## Sources

### Primary Sources

- **Hermes Agent**: https://github.com/NousResearch/hermes-agent - Python library docs, v0.7.0 release notes
- **OpenCode SDK**: https://opencode.ai/docs/sdk/ - Full TypeScript SDK documentation
- **Pi agent-core**: https://github.com/badlogic/pi-mono/tree/main/packages/agent - Complete TypeScript API
- **Claude Code Headless**: https://code.claude.com/docs/en/headless - Official headless documentation
- **LangChain Agents**: https://docs.langchain.com/oss/python/langchain/agents - Official agents documentation
- **OpenClaw**: https://github.com/openclaw/openclaw - Gateway architecture
- **Codex**: https://github.com/openai/codex - CLI tool

### Why These Sources

- Official repositories and documentation
- Recent updates (2025-2026)
- Direct technical details from the source
- Code examples for integration

---

## Gaps & Limitations

### Not Fully Covered

1. **Benchmark data**: No comprehensive benchmarks comparing agent performance across tools
2. **OpenCode internal architecture**: Client/server details are somewhat opaque
3. **Exact resource numbers**: Estimates based on typical Python/Node.js/Go runtime sizes
4. **OpenClaw detailed SDK**: Very large project; deep programmatic details require more investigation
5. **Codex SDK**: Currently CLI-only, with no programmatic API

### Suggested Next Steps

1. **Test Pi locally**: Install `@mariozechner/pi-agent-core` and verify headless operation
2. **Test Claude Code**: Try `claude --bare -p` for the CI use case
3. **OpenCode server test**: Run `opencode serve` and test SDK integration
4. **Hermes Python lib**: Test the programmatic API for comparison

### For Cloud Production

- Consider **Pi** for lightweight containers
- Consider **Claude Code** for structured-output requirements
- Pi supports any LLM provider; Claude Code is Anthropic-first, so weigh provider lock-in accordingly