Agent timeout handling #137

Closed
opened 2026-04-05 04:36:17 +02:00 by shoko · 0 comments

Overview

Implement a configurable timeout that automatically kills hanging or idle agents.

Problem

When running parallel dev agents, some agents appear active but are actually stuck or idle:

  • The log shows the agent "doing something", then nothing for extended periods
  • The opencode process is still running but making no progress
  • There is no automatic cleanup mechanism

Solution

Track agent activity and kill agents that stay idle longer than a configurable limit.

Configuration

# Kill agent after N hours without activity (default: 1)
TASK_TIMEOUT_HOURS=1

# Check interval in daemon (default: 5 minutes)
AGENT_CHECK_INTERVAL_MINUTES=5
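A minimal sketch of reading these two settings with their documented defaults, assuming they arrive as environment variables. The names `TimeoutConfig` and `load_timeout_config`, and the choice to reject non-positive values, are illustrative assumptions rather than part of the spec.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TimeoutConfig:
    timeout_seconds: float         # TASK_TIMEOUT_HOURS converted to seconds
    check_interval_seconds: float  # AGENT_CHECK_INTERVAL_MINUTES converted to seconds


def _positive_float(env: Mapping[str, str], name: str, default: float) -> float:
    """Read a positive number from env, falling back to the documented default."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = float(raw)
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {raw!r}")
    return value


def load_timeout_config(env: Optional[Mapping[str, str]] = None) -> TimeoutConfig:
    """Hypothetical loader: defaults are 1 hour and 5 minutes, as in the issue."""
    env = os.environ if env is None else env
    hours = _positive_float(env, "TASK_TIMEOUT_HOURS", 1.0)
    minutes = _positive_float(env, "AGENT_CHECK_INTERVAL_MINUTES", 5.0)
    return TimeoutConfig(timeout_seconds=hours * 3600,
                         check_interval_seconds=minutes * 60)


if __name__ == "__main__":
    cfg = load_timeout_config({"TASK_TIMEOUT_HOURS": "0.5"})
    print(cfg)  # TimeoutConfig(timeout_seconds=1800.0, check_interval_seconds=300.0)
```

Converting to seconds once at load time keeps the later comparisons free of unit juggling.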

Tracking

Track per active session:

{
  "issue_ref": "github.com/shoko/kugetsu#14",
  "started_at": "2026-04-05T10:00:00Z",
  "last_activity": "2026-04-05T10:30:00Z",
  "opencode_session_id": "ses_xyz",
  "worktree_path": "..."
}
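One way to load a tracking record like the one above into typed fields, sketched under the assumption that each session is stored as a JSON object with exactly these keys. `AgentSession` and `parse_utc` are hypothetical names; timestamps are normalized to timezone-aware UTC so they can be compared safely.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2026-04-05T10:00:00Z' as UTC.

    datetime.fromisoformat() only accepts a trailing 'Z' on Python 3.11+,
    so it is rewritten to '+00:00' for older interpreters.
    """
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    dt = datetime.fromisoformat(ts)
    # Treat naive timestamps as UTC so comparisons never mix naive and aware values.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


@dataclass
class AgentSession:
    issue_ref: str
    started_at: datetime
    last_activity: datetime
    opencode_session_id: str
    worktree_path: str

    @classmethod
    def from_json(cls, text: str) -> "AgentSession":
        raw = json.loads(text)
        return cls(
            issue_ref=raw["issue_ref"],
            started_at=parse_utc(raw["started_at"]),
            last_activity=parse_utc(raw["last_activity"]),
            opencode_session_id=raw["opencode_session_id"],
            worktree_path=raw["worktree_path"],
        )


if __name__ == "__main__":
    sample = """{
      "issue_ref": "github.com/shoko/kugetsu#14",
      "started_at": "2026-04-05T10:00:00Z",
      "last_activity": "2026-04-05T10:30:00Z",
      "opencode_session_id": "ses_xyz",
      "worktree_path": "..."
    }"""
    s = AgentSession.from_json(sample)
    print(s.last_activity - s.started_at)  # 0:30:00
```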

Timeout Handling

  1. Idle timeout: If now - last_activity exceeds TASK_TIMEOUT_HOURS, kill the agent
  2. On timeout:
    • Kill opencode process
    • Mark session as "timeout" in session file
    • Log timeout event
    • Worktree is kept (can resume with kugetsu continue)
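The two rules above can be sketched as a pure check plus a handler, assuming the session is a dict loaded from the session file and that the actual process kill is passed in as a callable. The function names and the logger name `kugetsu.timeout` are hypothetical.

```python
import logging
from datetime import datetime, timedelta, timezone
from typing import Callable

log = logging.getLogger("kugetsu.timeout")  # hypothetical logger name


def is_idle_timed_out(last_activity: datetime, now: datetime, timeout_hours: float) -> bool:
    """Rule 1: timed out when now - last_activity exceeds TASK_TIMEOUT_HOURS."""
    return now - last_activity > timedelta(hours=timeout_hours)


def handle_timeout(session: dict, kill_agent: Callable[[dict], None]) -> dict:
    """Rule 2: kill the process, mark the session, log it; keep the worktree."""
    kill_agent(session)                        # e.g. terminate the opencode process
    updated = {**session, "state": "timeout"}  # caller writes this back to the session file
    log.warning("agent timed out: issue=%s worktree=%s",
                session.get("issue_ref"), session.get("worktree_path"))
    # The worktree is deliberately left on disk so `kugetsu continue` can resume it.
    return updated


if __name__ == "__main__":
    last = datetime(2026, 4, 5, 10, 30, tzinfo=timezone.utc)
    now = last + timedelta(hours=1, minutes=5)
    if is_idle_timed_out(last, now, timeout_hours=1):
        print(handle_timeout({"issue_ref": "github.com/shoko/kugetsu#14",
                              "state": "active"}, kill_agent=lambda s: None))
```

Because the comparison is a strict `>`, an agent idle for exactly TASK_TIMEOUT_HOURS survives until the next check.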

Integration with Queue Daemon

The queue daemon should also handle timeout checking:

  • Every AGENT_CHECK_INTERVAL_MINUTES, scan active sessions
  • Check last_activity against TASK_TIMEOUT_HOURS
  • If exceeded, kill the opencode process and mark as timeout
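A sketch of the daemon side, assuming (hypothetically) one JSON session file per agent in a single directory, each holding the tracking fields plus a "state" field. `scan_sessions_once` does one pass and is easy to test; `run_daemon` repeats it at the configured interval. Both names and the directory layout are assumptions.

```python
import json
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Callable, List


def _parse_utc(ts: str) -> datetime:
    # Accept the trailing 'Z' used in session files on Python < 3.11 as well.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def scan_sessions_once(sessions_dir: Path, timeout: timedelta, now: datetime,
                       kill_agent: Callable[[dict], None]) -> List[str]:
    """Check every active session file; return the issue refs that were timed out."""
    timed_out = []
    for path in sorted(sessions_dir.glob("*.json")):
        session = json.loads(path.read_text())
        if session.get("state") != "active":
            continue  # only running agents can hang
        if now - _parse_utc(session["last_activity"]) > timeout:
            kill_agent(session)
            session["state"] = "timeout"
            path.write_text(json.dumps(session, indent=2))
            timed_out.append(session["issue_ref"])
    return timed_out


def run_daemon(sessions_dir: Path, timeout: timedelta, interval: timedelta,
               kill_agent: Callable[[dict], None]) -> None:
    """Hypothetical daemon loop: one scan every AGENT_CHECK_INTERVAL_MINUTES."""
    while True:
        now = datetime.now(timezone.utc)
        for ref in scan_sessions_once(sessions_dir, timeout, now, kill_agent):
            print(f"{now.isoformat()} timeout: {ref}", flush=True)  # log every timeout event
        time.sleep(interval.total_seconds())
```

Passing `now` in explicitly keeps the scan deterministic under test; only the loop reads the wall clock.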

Implementation Notes

  • Use ps aux | grep opencode | grep <worktree_path> to find processes
  • Use kill <pid> to terminate
  • Update session file state: "state": "timeout"
  • Daemon should log all timeout events
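The `ps | grep` and `kill` steps can be mirrored in Python as sketched below. This assumes, like the grep pipeline, that the worktree path appears on the opencode command line; if opencode is only started with the worktree as its working directory, neither approach will find it. Matching in Python also means no grep process can show up in the results. The function names are hypothetical.

```python
import os
import signal
import subprocess
from typing import List


def find_agent_pids(ps_output: str, worktree_path: str) -> List[int]:
    """Filter `ps -eo pid=,args=` output like `ps aux | grep opencode | grep <worktree_path>`."""
    pids = []
    for line in ps_output.splitlines():
        pid_str, _, args = line.strip().partition(" ")
        if pid_str.isdigit() and "opencode" in args and worktree_path in args:
            pids.append(int(pid_str))
    return pids


def kill_agent_processes(worktree_path: str) -> List[int]:
    """Send SIGTERM (what plain `kill <pid>` sends) to each matching process."""
    out = subprocess.run(["ps", "-eo", "pid=,args="],
                         capture_output=True, text=True, check=True).stdout
    pids = find_agent_pids(out, worktree_path)
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # process exited between the ps snapshot and the kill
    return pids
```

Note that substring matching means a path like `/work/wt-1` would also match `/work/wt-14`; the grep pipeline has the same limitation.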

Session State Values

The "state" field in the session file takes one of:

  • "idle": waiting for work
  • "active": currently working
  • "timeout": killed due to timeout
  • "completed": finished successfully
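To keep typos out of session files, the four values can be pinned down in code. This is a sketch; `SessionState` and `parse_state` are hypothetical names. The `str` mixin lets each member compare equal to the raw string stored in the file.

```python
from enum import Enum


class SessionState(str, Enum):
    """Allowed values of the session file's "state" field."""
    IDLE = "idle"            # waiting for work
    ACTIVE = "active"        # currently working
    TIMEOUT = "timeout"      # killed due to timeout
    COMPLETED = "completed"  # finished successfully


def parse_state(raw: str) -> SessionState:
    """Reject unknown values such as "timed_out" instead of silently storing them."""
    try:
        return SessionState(raw)
    except ValueError:
        allowed = ", ".join(s.value for s in SessionState)
        raise ValueError(f"unknown session state {raw!r}; expected one of: {allowed}") from None
```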
Related Issues

  • Meta issue: #133
han was assigned by shoko 2026-04-05 04:45:03 +02:00
shoko added this to the v0.1.0 milestone 2026-04-05 04:45:03 +02:00
shoko closed this issue 2026-04-05 06:59:05 +02:00

Reference: shoko/kugetsu#137