Initial commit: kage-research project files

This commit is contained in:
shokollm
2026-04-09 00:39:52 +00:00
commit 71fc8b4495
19 changed files with 5303 additions and 0 deletions

research.md
# Research: Agent Frameworks for Programmatic/Headless Usage
## Summary
This research evaluates seven agent frameworks/tools for programmatic/headless usage: Hermes, OpenCode, Pi, OpenClaw, LangChain Agents, Claude Code, and Codex. The evaluation focuses on headless operation, resource usage, session management, agent lifecycle, data persistence, customizability, and integration complexity. **For the user's use case (replacing hermes + opencode with something better for local dev and cloud production)**, the top recommendations are:
- **Pi (agent-core)**: Best for pure programmatic control with excellent TypeScript SDK, event-driven architecture, and lightweight footprint
- **Claude Code**: Best for production-grade headless operation with structured output, CI/CD integration, and official SDK support
- **LangChain**: Best for flexibility and customization if the user wants full control over the agent loop
- **OpenCode**: Strong option if they want to stick with a similar architecture but need a better SDK
---
## Comparison Matrix
| Criteria | Hermes | OpenCode | Pi (agent-core) | OpenClaw | LangChain Agents | Claude Code | Codex |
|----------|--------|----------|-----------------|----------|-----------------|-------------|-------|
| **Headless/Programmatic** | ✅ Python lib (`AIAgent`) | ✅ SDK + server mode | ✅ Full TypeScript SDK | ✅ Gateway WS API | ✅ `create_agent()` Python | ✅ `-p` flag + SDK | ❌ CLI only |
| **Resource Usage** | ~500MB+ (Python) | ~200-400MB (Go) | ~50-100MB (TS core) | ~500MB+ (Node) | ~100-300MB (Python) | ~200-400MB (Node) | ~200-300MB (Rust) |
| **Multi-agent Support** | ✅ Subagents/spawn | ✅ Multiple sessions | ✅ Multiple instances | ✅ Multi-agent routing | ✅ Via LangGraph | ✅ Multiple sessions | ❌ Single agent |
| **Session Management** | SQLite-based | Session API | In-memory + custom | Gateway sessions | Manual state | `--resume` flag | Session-based |
| **Data Persistence** | SQLite + pluggable memory | File-based | Custom (you control) | SQLite + gateway | You implement | File-based | File-based |
| **Customizability** | High (skills, tools, prompts) | High (tools, prompts) | High (tools, middleware) | High (skills, MCP) | Very high | Medium (plugins, hooks) | Low |
| **Plug-and-Play** | Easy (pip install) | Easy (npm) | Easy (npm) | Moderate | Moderate | Easy | Easy |
| **LLM Flexibility** | 200+ via OpenRouter | Any (provider-agnostic) | Any (multi-provider) | Any (multi-provider) | Any | Anthropic-first | OpenAI-first |
---
## Per-Tool Deep Dives
### 1. Hermes Agent (NousResearch/hermes-agent)
**Repository**: https://github.com/NousResearch/hermes-agent (30.7K stars)
#### Headless / Programmatic API
**Yes - Python Library**
Hermes can be imported and used as a Python library:
```python
from run_agent import AIAgent
agent = AIAgent(
    model="anthropic/claude-sonnet-4",
    quiet_mode=True,
)
response = agent.chat("What is the capital of France?")
```
For full conversation control:
```python
result = agent.run_conversation(
    user_message="Search for recent Python features",
    task_id="my-task-1",
)
# Returns: final_response, messages, task_id
```
**CLI Headless**: Also supports `-p` flag via OpenClaw migration path.
#### Resource Usage
- **Memory**: ~500MB+ (Python runtime)
- **CPU**: Moderate (depends on model)
- **Multi-agent**: Supports subagents via `sessions_spawn` tool
- **Batch**: `batch_runner.py` for parallel processing
#### Session Management
- **SQLite-based** session storage (configurable location)
- **Pluggable memory providers** (v0.7.0+) - built-in, Honcho, or custom
- **Conversation history** preserved across sessions
- **FTS5 search** for cross-session recall
- Multi-turn conversations via `conversation_history` parameter
#### Agent Lifecycle
1. **Initialize**: `AIAgent(model=, quiet_mode=)`
2. **Run**: `chat()` or `run_conversation()`
3. **Terminate**: Automatic cleanup; resources released on conversation end
**Key options**:
- `max_iterations`: 90 default (configurable)
- `enabled_toolsets` / `disabled_toolsets`: Control available tools
- `skip_memory` / `skip_context_files`: Stateless mode for APIs
#### Data Persistence
- **SQLite**: Session data stored in `~/.hermes/`
- **Memory**: Pluggable providers (built-in, Honcho, vector stores)
- **Trajectories**: JSONL format for training data (`save_trajectories=True`)
- **API Server**: Shared SessionDB for Open WebUI integration
#### Customizability
- **Skills**: Procedural memory via `SKILL.md` files
- **Tools**: Custom tool registration
- **Prompts**: `ephemeral_system_prompt` for dynamic prompts
- **MCP**: Model Context Protocol support
- **Platform hints**: `platform` param for Discord, Telegram, etc.
#### Performance/Intelligence
- **Self-improving**: Agent creates skills from experience
- **Memory persistence**: Learns across sessions
- **Credential pooling**: Multiple API keys with rotation
- **Compression**: Context compression to prevent overflow
#### Integration Example (FastAPI)
```python
from fastapi import FastAPI
from pydantic import BaseModel
from run_agent import AIAgent
app = FastAPI()

class ChatRequest(BaseModel):
    message: str
    model: str = "anthropic/claude-sonnet-4"

@app.post("/chat")
def chat(request: ChatRequest):
    # Sync handler: FastAPI runs it in a threadpool, since agent.chat() blocks
    agent = AIAgent(
        model=request.model,
        quiet_mode=True,
        skip_context_files=True,
        skip_memory=True,
    )
    return {"response": agent.chat(request.message)}
```
---
### 2. OpenCode (anomalyco/opencode)
**Repository**: https://github.com/anomalyco/opencode (138.9K stars, but this is the frontend repo - the actual agent is https://github.com/opencode-ai/opencode with 11.8K stars)
#### Headless / Programmatic API
**Yes - SDK + Server Mode**
**Server Mode**:
```bash
opencode serve [--port 4096] [--hostname "127.0.0.1"]
```
**SDK**:
```typescript
import { createOpencode, createOpencodeClient } from "@opencode-ai/sdk"
const { client } = await createOpencode()
// Or client-only:
const client = createOpencodeClient({ baseUrl: "http://localhost:4096" })
```
#### Resource Usage
- **Memory**: ~200-400MB (Go runtime)
- **Architecture**: Client/server - TUI is just one client
- **Multi-agent**: Multiple sessions supported
#### Session Management
- Full **Session API**:
- `session.create()`, `session.list()`, `session.get()`
- `session.prompt()` - send prompts
- `session.abort()` - cancel running sessions
- `session.summarize()` - compress context
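The create → prompt → abort flow can be sketched end-to-end. The method names follow the Session API list above; the `SessionClient` interface and its shapes below are illustrative, narrowed to just what the sketch needs so the flow can be exercised without a running server - check the SDK's own types for the real signatures.

```typescript
// Drive one task through the create → prompt → abort lifecycle.
// SessionClient is narrowed to just the calls this sketch uses; the
// real SDK client exposes a richer version of this shape.
interface SessionClient {
  session: {
    create(): Promise<{ id: string }>;
    prompt(args: {
      path: { id: string };
      body: { parts: { type: string; text: string }[] };
    }): Promise<string>;
    abort(args: { path: { id: string } }): Promise<void>;
  };
}

async function runTask(client: SessionClient, text: string): Promise<string> {
  const { id } = await client.session.create();
  try {
    return await client.session.prompt({
      path: { id },
      body: { parts: [{ type: "text", text }] },
    });
  } catch (err) {
    await client.session.abort({ path: { id } }); // cancel the session on failure
    throw err;
  }
}
```

Because the client is injected, the same function works against a live `opencode serve` instance or a stub in tests.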
#### Agent Lifecycle
1. **Start server**: `opencode serve`
2. **Create session**: `client.session.create()`
3. **Prompt**: `client.session.prompt()`
4. **Terminate**: Server stays running; sessions are disposable
#### Data Persistence
- File-based configuration (`opencode.json`)
- Sessions stored in server memory (configurable)
#### Customizability
- **Tools**: Custom tool definitions
- **Prompts**: Custom system prompts
- **Structured Output**: JSON Schema support
- **Provider-agnostic**: Any model via configuration
#### Structured Output Example
```typescript
const result = await client.session.prompt({
  path: { id: sessionId },
  body: {
    parts: [{ type: "text", text: "Research Anthropic" }],
    format: {
      type: "json_schema",
      schema: {
        type: "object",
        properties: {
          company: { type: "string" },
          founded: { type: "number" },
        },
        required: ["company", "founded"],
      },
    },
  },
});
```
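When consuming structured output, a small runtime guard keeps downstream code from trusting the model blindly. This is a sketch assuming the parsed result eventually arrives as a plain object; where that object actually lives in the response envelope should be checked against the SDK types.

```typescript
// Mirrors the JSON Schema declared in the prompt's `format` option.
interface CompanyInfo {
  company: string;
  founded: number;
}

// Narrowing guard: verifies at runtime that an unknown value matches
// the schema before downstream code relies on its fields.
function isCompanyInfo(value: unknown): value is CompanyInfo {
  const v = value as Record<string, unknown>;
  return (
    typeof value === "object" &&
    value !== null &&
    typeof v.company === "string" &&
    typeof v.founded === "number"
  );
}
```

A guard like this is cheap insurance: even with schema-constrained decoding, a malformed or truncated response fails loudly at the boundary instead of deep inside business logic.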
---
### 3. Pi (badlogic/pi-mono)
**Repository**: https://github.com/badlogic/pi-mono (33.1K stars)
**This is the actual agent runtime that Feynman uses.**
#### Headless / Programmatic API
**Yes - Full TypeScript SDK**
```typescript
import { Agent } from "@mariozechner/pi-agent-core";
import { getModel } from "@mariozechner/pi-ai";
const agent = new Agent({
  initialState: {
    systemPrompt: "You are a helpful assistant.",
    model: getModel("anthropic", "claude-sonnet-4-20250514"),
  },
});

agent.subscribe((event) => {
  if (event.type === "message_update" && event.assistantMessageEvent.type === "text_delta") {
    process.stdout.write(event.assistantMessageEvent.delta);
  }
});
await agent.prompt("Hello!");
```
#### Resource Usage
- **Memory**: ~50-100MB for core agent (very lightweight)
- **CPU**: Minimal (just orchestration)
- **Multi-agent**: Create multiple `Agent` instances
- **Dependencies**: Requires `@mariozechner/pi-ai` for LLM calls
#### Session Management
- **In-memory** by default - you control persistence
- **Messages array** in agent state
- **Custom state schema** via TypeScript interfaces
- **Session ID** for provider caching
#### Agent Lifecycle
1. **Create**: `new Agent({ initialState })`
2. **Prompt**: `agent.prompt()` or `agent.continue()`
3. **Events**: Subscribe to `agent_start`, `turn_start`, `message_update`, etc.
4. **Terminate**: `agent.reset()` or let it go out of scope
**Key options**:
- `transformContext`: Prune/compress messages
- `convertToLlm`: Filter custom message types
- `beforeToolCall` / `afterToolCall`: Hooks for tool execution
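The pruning a `transformContext` hook performs is ultimately a pure function over the messages array. A minimal sketch of that logic (the `Msg` shape is illustrative; consult the package docs for the hook's exact signature):

```typescript
interface Msg {
  role: "system" | "user" | "assistant";
  content: string;
}

// Keep the system prompt plus the most recent `keep` messages -- the
// kind of pruning a transformContext hook would apply before each LLM
// call to stay under the context window.
function pruneContext(messages: Msg[], keep: number): Msg[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-keep)];
}
```

Because the hook runs on every turn, keeping it a pure function of the messages array makes it trivial to unit-test in isolation.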
#### Data Persistence
- **You control**: Implement persistence via middleware
- **State is mutable**: `agent.state.messages = newMessages`
- **No built-in storage**: Freedom to implement as needed
#### Customizability
- **Tools**: `AgentTool` with Typebox schemas
- **Middleware**: `@dynamic_prompt`, `@wrap_tool_call` decorators
- **Message types**: Custom via declaration merging
- **Thinking budgets**: Configurable per provider
#### Low-Level API
```typescript
import { agentLoop, agentLoopContinue } from "@mariozechner/pi-agent-core";
for await (const event of agentLoop([userMessage], context, config)) {
  console.log(event.type);
}
```
---
### 4. OpenClaw (openclaw/openclaw)
**Repository**: https://github.com/openclaw/openclaw (351.9K stars)
#### Headless / Programmatic API
**Yes - Gateway WebSocket API**
OpenClaw has an extensive Gateway WS API:
```bash
openclaw gateway --port 18789 --verbose
# Send a message
openclaw message send --to +1234567890 --message "Hello"
# Agent command
openclaw agent --message "Ship checklist" --thinking high
```
#### Resource Usage
- **Memory**: ~500MB+ (Node.js runtime)
- **Multi-agent**: Multi-agent routing via Gateway
#### Session Management
- **Gateway Sessions**: Main session + group isolation
- **Session tools**: `sessions_list`, `sessions_history`, `sessions_send`
- **SQLite-based** storage
#### Agent Lifecycle
1. **Start Gateway**: `openclaw gateway`
2. **Connect**: WebSocket to `ws://127.0.0.1:18789`
3. **Message**: Send via CLI or API
4. **Persistence**: Sessions saved to SQLite
#### Data Persistence
- **SQLite**: Gateway session storage
- **Workspace**: `~/.openclaw/workspace`
- **Skills**: `~/.openclaw/workspace/skills/<skill>/SKILL.md`
#### Customizability
- **Skills**: Full skill system (ClawHub registry)
- **MCP**: Model Context Protocol support
- **Channels**: 20+ messaging platforms
---
### 5. LangChain Agents (langchain-ai/langchain)
**Repository**: https://github.com/langchain-ai/langchain
#### Headless / Programmatic API
**Yes - Full Python API**
```python
from langchain.agents import create_agent
agent = create_agent("openai:gpt-5", tools=tools)
result = agent.invoke({"messages": [{"role": "user", "content": "Hello"}]})
```
#### Resource Usage
- **Memory**: ~100-300MB (Python)
- **Flexible**: Your code controls resource allocation
- **Multi-agent**: Via LangGraph subgraphs
#### Session Management
- **Manual**: You manage message history in state
- **Custom state**: Extend `AgentState` TypedDict
- **Memory integration**: Optional short-term/long-term memory
#### Agent Lifecycle
1. **Create**: `create_agent(model, tools, system_prompt)`
2. **Invoke**: `agent.invoke({"messages": [...]})`
3. **Stream**: `agent.stream()` for real-time events
#### Data Persistence
- **You implement**: Full control via middleware
- **Optional memory**: LangChain memory modules
#### Customizability
- **Very high**: Middleware, tools, prompts, dynamic everything
- **ReAct pattern**: Built-in reasoning + acting loop
- **ToolStrategy** / **ProviderStrategy**: Structured output
---
### 6. Claude Code (anthropics/claude-code)
**Repository**: https://github.com/anthropics/claude-code
#### Headless / Programmatic API
**Yes - Agent SDK + CLI**
**CLI Headless**:
```bash
claude -p "Find and fix the bug in auth.py" --allowedTools "Read,Edit,Bash"
claude --bare -p "Summarize" --allowedTools "Read"
```
**SDK** (Python/TypeScript):
```python
import anyio
from claude_agent_sdk import query

async def main():
    async for message in query(prompt="Fix the bug in auth.py"):
        print(message)

anyio.run(main)
```
#### Resource Usage
- **Memory**: ~200-400MB (Node.js)
- **Structured output**: JSON with `--output-format json`
- **Streaming**: `--output-format stream-json`
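Consuming `--output-format stream-json` typically means splitting the stream into one JSON event per line. A minimal parser sketch, assuming newline-delimited JSON framing (verify the exact framing and event shapes against the headless docs):

```typescript
// Split a stream-json buffer into parsed events, assuming one JSON
// object per line (newline-delimited JSON). Blank lines are skipped.
function parseStreamJson(buffer: string): unknown[] {
  return buffer
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as unknown);
}
```

In a real pipeline the same split-and-parse step runs incrementally on stdout chunks, buffering any trailing partial line until its newline arrives.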
#### Session Management
- **Session ID**: `--resume <session-id>`
- **Continue**: `--continue` for follow-up
- **Persistence**: File-based in `~/.claude/`
#### Agent Lifecycle
1. **Run**: `claude -p "task"`
2. **Continue**: `claude -p "more" --continue`
3. **Resume**: `claude --resume <session-id>`
#### Customizability
- **Hooks**: Pre/post tool use
- **Plugins**: Custom commands and agents
- **MCP**: Model Context Protocol
- **Settings**: JSON config files
---
### 7. Codex (openai/codex)
**Repository**: https://github.com/openai/codex
#### Headless / Programmatic API
**CLI Only - No official programmatic API**
```bash
npm install -g @openai/codex
codex "Write a function to sort a list"
```
#### Resource Usage
- **Memory**: ~200-300MB (Rust binary)
- **Lightweight**: Minimal footprint
#### Session Management
- **Limited**: Basic session support
- **No SDK**: Not designed for programmatic control
#### Customizability
- **Low**: No official extension API
- **Provider-locked**: OpenAI-first
---
## Recommendations for User's Use Case
### Primary Recommendation: Pi (agent-core)
**Why**:
- Lightest weight (~50-100MB)
- Full programmatic control via TypeScript
- Event-driven architecture perfect for custom integration
- Feynman already uses it - seamless replacement
- You control persistence - perfect for cloud production
**Best for**: User wants fine-grained control, lightweight footprint, TypeScript ecosystem
### Secondary: Claude Code
**Why**:
- Production-grade headless mode
- Structured output support
- Official SDK (Python/TypeScript)
- CI/CD integration built-in
- `--bare` mode for consistent CI runs
**Best for**: Production cloud deployment with structured requirements
### Alternative: LangChain
**Why**:
- Maximum flexibility
- Any LLM provider
- Rich ecosystem
- Full control over agent loop
**Best for**: User wants to build custom agent behavior from scratch
---
## Sources
### Primary Sources (Kept)
- **Hermes Agent**: https://github.com/NousResearch/hermes-agent - Python library docs, v0.7.0 release notes
- **OpenCode SDK**: https://opencode.ai/docs/sdk/ - Full TypeScript SDK documentation
- **Pi agent-core**: https://github.com/badlogic/pi-mono/tree/main/packages/agent - Complete TypeScript API
- **Claude Code Headless**: https://code.claude.com/docs/en/headless - Official headless documentation
- **LangChain Agents**: https://docs.langchain.com/oss/python/langchain/agents - Official agents documentation
- **OpenClaw**: https://github.com/openclaw/openclaw - Gateway architecture
- **Codex**: https://github.com/openai/codex - CLI tool
### Why These Sources
- Official repositories and documentation
- Recent updates (2025-2026)
- Direct technical details from source
- Code examples for integration
---
## Gaps & Limitations
### Not Fully Covered
1. **Benchmark data**: No comprehensive benchmarks comparing agent performance across tools
2. **OpenCode internal architecture**: Client/server details somewhat opaque
3. **Exact resource numbers**: Estimates based on typical Python/Node.js/Go runtime sizes
4. **OpenClaw detailed SDK**: Very large project; deep programmatic details require more investigation
5. **Codex SDK**: Currently CLI-only with no programmatic API
### Suggested Next Steps
1. **Test Pi locally**: Install `@mariozechner/pi-agent-core` and verify headless operation
2. **Test Claude Code**: Try `claude -p --bare` for CI use case
3. **OpenCode server test**: Run `opencode serve` and test SDK integration
4. **Hermes Python lib**: Test the programmatic API for comparison
### For Cloud Production
- Consider **Pi** for lightweight containers
- Consider **Claude Code** for structured output requirements
- Pi is provider-agnostic; Claude Code is Anthropic-first but can also run against gateways such as Bedrock or Vertex AI