# Fix: Queue daemon spawning excess agents due to race condition

## Problem

When enqueueing multiple tasks (e.g., 6 tasks), the queue daemon was spawning many more subagents than expected, eventually exhausting container memory.

**Root Cause:** The combination of:

1. `process_queue()` calling `opencode run` directly instead of `kugetsu start`, bypassing all concurrency logic
2. `count_active_dev_sessions()` counting `pm-agent.json` toward `MAX_CONCURRENT_AGENTS`, reducing effective dev agent slots
3. No atomic locking around session count check + session file creation (TOCTOU race condition)
4. Background spawning of multiple concurrent processes in `process_queue()`

**Expected behavior:** With `MAX_CONCURRENT_AGENTS=3` and 6 tasks:

- Tasks should be processed sequentially via `kugetsu start`
- Only 3 dev agents should run at a time
- Tasks should queue and wait for slots to free up

## Solution

### 1. `count_active_dev_sessions()` - Exclude pm-agent

Only count actual dev agent session files (exclude `pm-agent.json`).

### 2. `process_queue()` - Call `kugetsu start` directly + retry logic

- Call `kugetsu start` directly (foreground, sequential) instead of spawning `opencode run` background process
- Dynamic batch size = available slots (removes need for `QUEUE_DAEMON_BATCH_SIZE`)
- Retry logic (max 3 attempts) on failure
- On failure: cleanup worktree/session and revert to `pending` state
- Save `fork_pid` to queue item for timeout handling

### 3. `cmd_start()` - Add flock

- Add flock around critical section (count check + fork)
- Track `fork_pid` for queue item timeout handling

### 4. Notification System

New notification types:

| Event | Type |
|-------|------|
| Task enqueued | `task_queued` |
| Task dequeued | `task_dequeued` |
| Task started | `task_started` |
| Task completed | `task_completed` |
| Task error | `task_error` |

### 5. Config

- Remove `QUEUE_DAEMON_BATCH_SIZE` (no longer needed - batch size is now dynamic)

## Notification Flow

| Event | Location | Type |
|-------|----------|------|
| Task enqueued | `enqueue_task()` | `task_queued` |
| Task dequeued | `process_queue()` after state change to `notified` | `task_dequeued` |
| Task started | `cmd_start()` after session file created | `task_started` |
| Task completed | `update_queue_item_state()` | `task_completed` |
| Task error | `update_queue_item_state()` | `task_error` |

## Out of Scope

- Re-check loop in cmd_start (checking if session DB is reliable) - deferred to separate research issue
- Buffer mechanism for excess forking (safety failsafe only)

## Status

- [x] Issue created
- [x] Implementation
- [x] PR created (#147)
- [ ] Merged
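
For reviewers, the TOCTOU fix (count check and slot claim under one lock, with `pm-agent.json` excluded from the count) can be sketched roughly as below. This is an illustration only, not the code in this PR: `SESSIONS_DIR`, `LOCK_FILE`, and `try_claim_slot` are hypothetical names, the session file contents are a placeholder, and the sketch assumes the `flock` utility from util-linux is installed.

```shell
#!/usr/bin/env bash
# Illustrative sketch; paths and helper names are assumptions, not the PR's code.

SESSIONS_DIR="${SESSIONS_DIR:-$(mktemp -d)}"   # hypothetical session directory
LOCK_FILE="$SESSIONS_DIR/.start.lock"          # hypothetical lock file
MAX_CONCURRENT_AGENTS="${MAX_CONCURRENT_AGENTS:-3}"

# Count dev agent session files, skipping pm-agent.json.
count_active_dev_sessions() {
  local count=0 f
  for f in "$SESSIONS_DIR"/*.json; do
    if [ ! -e "$f" ]; then continue; fi        # glob matched nothing
    if [ "$(basename "$f")" = "pm-agent.json" ]; then continue; fi
    count=$((count + 1))
  done
  echo "$count"
}

# Check the count and create the session file while holding an exclusive lock,
# so two concurrent callers cannot both see a free slot.
try_claim_slot() {
  local name="$1"
  (
    flock -x 9                                 # block until the lock is held
    if [ "$(count_active_dev_sessions)" -ge "$MAX_CONCURRENT_AGENTS" ]; then
      exit 1                                   # no free slot; leaves the subshell only
    fi
    echo '{}' > "$SESSIONS_DIR/$name.json"     # placeholder session file
  ) 9>"$LOCK_FILE"
}

# Demo: a pm-agent session exists, then 6 tasks race for 3 slots.
echo '{}' > "$SESSIONS_DIR/pm-agent.json"
for i in 1 2 3 4 5 6; do
  try_claim_slot "task-$i" &
done
wait
echo "active dev sessions: $(count_active_dev_sessions)"
```

With the defaults above, the six racing callers should leave exactly three `task-*.json` files, and `pm-agent.json` should not reduce the available slots. Without the lock, several callers could read the same count before any of them writes a session file, which is the race described under Root Cause.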