Compare commits
v0.1.5...54aa6419eb (27 commits)
| SHA1 |
|---|
| 54aa6419eb |
| 98a31070a7 |
| 26346235c9 |
| 2212fabf22 |
| 0fa778353b |
| 151efadca3 |
| 379d53cedc |
| 043542344a |
| e763ceb0ad |
| 61f06f825f |
| b76a9b883a |
| ac850869fd |
| 3107dbf1e5 |
| b8b97e3c09 |
| d8af560e6d |
| 5d12f6ca42 |
| 91505345a2 |
| f7fe22de25 |
| 3ce43ffa65 |
| 416e8e5757 |
| 1c1d18b9ae |
| 8c639e2928 |
| c4c3556247 |
| 4342347ac6 |
| 7888a34bd9 |
| e2a37cdbb9 |
| 6e9472b5e2 |
67
.github/ISSUES/fix-queue-daemon-excess-agents.md
vendored
Normal file
@@ -0,0 +1,67 @@
# Fix: Queue daemon spawning excess agents due to race condition

## Problem

When enqueueing multiple tasks (e.g., 6 tasks), the queue daemon was spawning many more subagents than expected, eventually exhausting container memory.

**Root Cause:** The combination of:
1. `process_queue()` calling `opencode run` directly instead of `kugetsu start`, bypassing all concurrency logic
2. `count_active_dev_sessions()` counting `pm-agent.json` toward `MAX_CONCURRENT_AGENTS`, reducing effective dev agent slots
3. No atomic locking around session count check + session file creation (TOCTOU race condition)
4. Background spawning of multiple concurrent processes in `process_queue()`

**Expected behavior:** With `MAX_CONCURRENT_AGENTS=3` and 6 tasks:
- Tasks should be processed sequentially via `kugetsu start`
- Only 3 dev agents should run at a time
- Tasks should queue and wait for slots to free up
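
As a rough illustration of the slot arithmetic behind this expected behavior, the sketch below computes how many tasks could be started in one pass. The function name `available_slots` is hypothetical and not taken from the kugetsu source; only `MAX_CONCURRENT_AGENTS` comes from the issue.

```bash
#!/usr/bin/env bash
# Hypothetical helper: free dev-agent slots given a limit and the number
# of agents already running. Not the actual kugetsu implementation.
available_slots() {
    local max="$1" active="$2"
    local free=$((max - active))
    if [ "$free" -lt 0 ]; then
        free=0    # never report a negative slot count
    fi
    echo "$free"
}

# With a limit of 3 and nothing running, 3 of the 6 queued tasks can
# start; the remaining 3 wait for a slot to free up.
available_slots 3 0
```

With two agents already running, `available_slots 3 2` prints `1`, so only one more task would be picked up.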

## Solution

### 1. `count_active_dev_sessions()` - Exclude pm-agent
Only count actual dev agent session files (exclude `pm-agent.json`).
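
A minimal sketch of such a count, assuming sessions are stored as `*.json` files in a single directory. The directory argument and the loop are illustrative assumptions; the real function may read a different layout or skip other files as well.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: count session files in a directory, skipping
# pm-agent.json so the PM agent does not consume a dev-agent slot.
count_active_dev_sessions() {
    local sessions_dir="$1"
    local count=0 f
    for f in "$sessions_dir"/*.json; do
        [ -e "$f" ] || continue                                 # glob matched nothing
        [ "$(basename "$f")" = "pm-agent.json" ] && continue    # not a dev agent
        count=$((count + 1))
    done
    echo "$count"
}
```

Called on a directory holding `pm-agent.json` plus two issue sessions, this prints `2` rather than `3`.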

### 2. `process_queue()` - Call `kugetsu start` directly + retry logic
- Call `kugetsu start` directly (foreground, sequential) instead of spawning a background `opencode run` process
- Dynamic batch size = available slots (removes need for `QUEUE_DAEMON_BATCH_SIZE`)
- Retry logic (max 3 attempts) on failure
- On failure: cleanup worktree/session and revert to `pending` state
- Save `fork_pid` to queue item for timeout handling
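
The retry and revert behavior in the list above could look roughly like the sketch below. `run_with_retry`, `cleanup_failed_attempt`, and the plain-text state file are hypothetical stand-ins; the issue does not say whether the revert to `pending` happens after each attempt or only once all attempts are exhausted, so the latter is an assumption here.

```bash
#!/usr/bin/env bash
MAX_ATTEMPTS=3   # "max 3 attempts" from the issue

# Placeholder (assumption): the real fix would remove the worktree and
# session file left behind by the failed attempt.
cleanup_failed_attempt() { :; }

# run_with_retry <state_file> <command> [args...]
# Runs the command in the foreground, retrying on failure. If every
# attempt fails, the item's state is written back as "pending".
run_with_retry() {
    local state_file="$1"
    shift
    local attempt=1
    while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
        if "$@"; then
            return 0
        fi
        cleanup_failed_attempt
        attempt=$((attempt + 1))
    done
    echo "pending" > "$state_file"   # hand the task back to the queue
    return 1
}
```

Running the command in the foreground is what keeps processing sequential: the next item is not touched until the current call returns.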

### 3. `cmd_start()` - Add flock
- Add `flock` around critical section (count check + fork)
- Track `fork_pid` for queue item timeout handling
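
To show why the lock closes the TOCTOU window, here is a sketch using `flock(1)` from util-linux. `claim_slot`, the `.start.lock` file name, and the use of file descriptor 9 are assumptions made for illustration; the sketch reserves a slot by creating the session file, whereas the actual `cmd_start()` wraps its count check and fork.

```bash
#!/usr/bin/env bash
MAX_CONCURRENT_AGENTS="${MAX_CONCURRENT_AGENTS:-3}"

# claim_slot <sessions_dir> <session_name>
# Returns 0 and creates the session file if a slot is free, 1 otherwise.
# The count check and the file creation run under one exclusive lock, so
# two concurrent callers cannot both observe the same free slot.
claim_slot() {
    local sessions_dir="$1" name="$2"
    (
        flock 9 || exit 1          # block until we hold the exclusive lock
        count=0
        for f in "$sessions_dir"/*.json; do
            [ -e "$f" ] || continue
            [ "$(basename "$f")" = "pm-agent.json" ] && continue
            count=$((count + 1))
        done
        [ "$count" -lt "$MAX_CONCURRENT_AGENTS" ] || exit 1
        : > "$sessions_dir/$name.json"   # reserve the slot before unlocking
    ) 9>"$sessions_dir/.start.lock"
}
```

The lock is released when the subshell exits and descriptor 9 is closed, so no explicit unlock step is needed.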

### 4. Notification System
New notification types:

| Event | Type |
|-------|------|
| Task enqueued | `task_queued` |
| Task dequeued | `task_dequeued` |
| Task started | `task_started` |
| Task completed | `task_completed` |
| Task error | `task_error` |

### 5. Config
- Remove `QUEUE_DAEMON_BATCH_SIZE` (no longer needed - batch size is now dynamic)

## Notification Flow

| Event | Location | Type |
|-------|----------|------|
| Task enqueued | `enqueue_task()` | `task_queued` |
| Task dequeued | `process_queue()` after state change to `notified` | `task_dequeued` |
| Task started | `cmd_start()` after session file created | `task_started` |
| Task completed | `update_queue_item_state()` | `task_completed` |
| Task error | `update_queue_item_state()` | `task_error` |
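
The last two rows suggest that `update_queue_item_state()` derives the notification type from the new state. A sketch of that mapping follows; `notification_type_for_state` is a hypothetical helper name, and treating every other state as "no notification from this hook" is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a queue item state to the notification type
# listed in the table above. Returns 1 for states with no notification.
notification_type_for_state() {
    case "$1" in
        completed) echo "task_completed" ;;
        error)     echo "task_error" ;;
        *)         return 1 ;;   # e.g. pending/notified: handled elsewhere
    esac
}
```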

## Out of Scope

- Re-check loop in `cmd_start` (checking if session DB is reliable) - deferred to a separate research issue
- Buffer mechanism for excess forking (safety failsafe only)

## Status

- [x] Issue created
- [x] Implementation
- [x] PR created (#147)
- [ ] Merged
@@ -49,6 +49,8 @@ A default config file is created during `kugetsu init` with commented examples:
| `MAX_CONCURRENT_AGENTS` | 3 | Maximum number of concurrent dev agents |
| `KUGETSU_TEMP_DIR` | `~/.local/share/opencode/tool-output` | Temp directory for subagent tool output (useful in headless environments where /tmp is restricted) |
| `KUGETSU_VERBOSITY` | `default` | PM agent verbosity level: `verbose`, `default`, or `quiet` |
| `QUEUE_DAEMON_INTERVAL_MINUTES` | 5 | How often daemon polls queue (in minutes) |
| `QUEUE_CLEANUP_AGE_DAYS` | 7 | Auto-cleanup completed/error items older than N days |

### Environment Variables for Agents
@@ -111,6 +113,10 @@ Each issue session gets its own git worktree to prevent conflicts:
├── worktrees/
│   ├── github.com-shoko-kugetsu-14/ # Isolated workdir for issue #14
│   └── github.com-shoko-kugetsu-15/ # Isolated workdir for issue #15
├── queue/
│   ├── items/ # Queue item JSON files
│   ├── daemon.pid # Daemon process ID
│   └── daemon.log # Daemon log output
└── index.json # Maps session IDs and issue refs to session files
```
@@ -258,16 +264,17 @@ kugetsu destroy --base -y

### kugetsu delegate `<message>`

Send a message to the PM agent for task coordination (fire-and-forget):
Send a message to the PM agent for task coordination via queue:
```bash
kugetsu delegate "work on issue #14"
kugetsu delegate "review PR #92"
```

- Non-blocking: returns immediately, runs in background
- PM agent processes the message asynchronously
- Uses `KUGETSU_VERBOSITY` env var to control PM agent output verbosity
- Log output stored in `~/.kugetsu/logs/delegate-<timestamp>.log`
- **Always enqueues** (fire-and-forget): returns immediately
- Queue daemon polls queue and invokes PM when slots available
- Tasks are processed FIFO (first-in-first-out)
- Use `kugetsu queue list` to see pending tasks
- Use `kugetsu queue-daemon logs` to debug queue processing

### kugetsu logs [n]
@@ -328,35 +335,79 @@ kugetsu server default github # Set default server
kugetsu server get github # Get server URL
```

### kugetsu queue <list|enqueue|dequeue|clear>
### kugetsu queue <list|stats|clear>

Manage task queue for autonomous PM operation:
```bash
kugetsu queue list # Show queued tasks
kugetsu queue enqueue "task" # Add task to queue
kugetsu queue dequeue # Remove next task from queue
kugetsu queue clear # Clear all queued tasks
kugetsu queue list # Show queued tasks with status
kugetsu queue stats # Show queue statistics (total, pending, notified, completed, error)
kugetsu queue clear # Clean up old completed/error items
kugetsu queue enqueue <issue-ref> <message> # Manually enqueue a task
```

- Queue stored in `~/.kugetsu/queue.json`
**Queue Item States:**
- `pending` - Waiting in queue, daemon can pick up
- `notified` - PM agent has picked up the task
- `completed` - Dev agent finished, PR created
- `error` - Timeout or failure

### kugetsu queue-daemon <start|stop|restart|status|logs>

Manage the queue daemon background process:
```bash
kugetsu queue-daemon start # Start daemon in background
kugetsu queue-daemon stop # Stop daemon
kugetsu queue-daemon restart # Restart daemon
kugetsu queue-daemon status # Check if daemon is running
kugetsu queue-daemon logs # Show recent daemon logs
```

**Daemon Behavior:**
1. Runs at configurable interval (default: 5 minutes)
2. Checks if active agents < MAX_CONCURRENT_AGENTS
3. Picks 1-N pending items (configurable batch size)
4. Forks PM session for each picked item
5. PM decides whether to use `start` or `continue`

**Queue Directory:**
```
~/.kugetsu/queue/
├── items/ # Queue item JSON files
│   ├── q_1234567890.json # One file per queued task
│   └── q_1234567891.json
├── daemon.pid # Daemon process ID
├── daemon.lock # Daemon lock file
└── daemon.log # Daemon log output
```

## Workflow Example

### First-time Setup
```bash
# First-time setup (requires TTY)
# Initialize kugetsu (requires TTY)
kugetsu init
# Creates: base session + pm-agent session

# Start work on issue
kugetsu start github.com/shoko/kugetsu#14 "implement feature X"
# Creates: worktree at ~/.kugetsu/worktrees/github.com-shoko-kugetsu-14/
# Start the queue daemon (for autonomous operation)
kugetsu queue-daemon start
```

# Continue later
### Normal Workflow
```bash
# Enqueue tasks via delegate - agents will process them automatically
kugetsu delegate "work on issue #14"
kugetsu delegate "review PR #92"

# Check queue status
kugetsu queue list # See pending tasks
kugetsu queue stats # See statistics

# Debug queue daemon
kugetsu queue-daemon status # Is daemon running?
kugetsu queue-daemon logs # See daemon logs

# Continue work on existing issue
kugetsu continue github.com/shoko/kugetsu#14 "add tests"

# Continue again
kugetsu continue github.com/shoko/kugetsu#14 "fix failing test"

# List all sessions
kugetsu list
@@ -367,6 +418,21 @@ kugetsu prune --force
kugetsu destroy github.com/shoko/kugetsu#14
```

### Queue Daemon Management
```bash
# Check if daemon is running
kugetsu queue-daemon status

# View daemon logs for debugging
kugetsu queue-daemon logs

# Restart daemon if needed
kugetsu queue-daemon restart

# Stop daemon
kugetsu queue-daemon stop
```

## Headless Operation

This design solves the headless CLI limitation discovered in Issue #14:
File diff suppressed because it is too large