Design: Implement parallel agent capacity limits and queueing #37

Closed
opened 2026-03-31 06:01:57 +02:00 by shoko · 1 comment
Owner

Context

Testing revealed that running 8+ parallel opencode agents causes timeouts due to resource contention. Current implementation has no built-in limits.

Questions to Resolve

  1. What happens when max agents is reached?

    • Reject new requests?
    • Block waiting for slot?
    • Queue for later execution?
  2. Where does the queue live?

    • In-memory (lost on restart)?
    • Persistent (file/database)?
    • Hermes/Python layer?
    • External (Redis, SQLite)?
  3. Queue ordering?

    • FIFO (first come first served)?
    • Priority-based?
    • By issue severity?
  4. Queue timeout?

    • How long to wait in queue?
    • Auto-expire queued requests?
  5. Backpressure signaling?

    • How does PM agent know limits are reached?
    • Should PM stop delegating when near limit?

Current Behavior

  • No limits enforced
  • 8+ agents cause resource contention and timeouts
  • Recommended max: 5 agents based on testing

Implementation Options

Option A: Simple Semaphore

  • In-memory counter with max concurrent limit
  • New agents blocked until slot available
  • No persistence, no queue
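A minimal sketch of Option A using Python's `threading.Semaphore`; the limit of 5 comes from the testing above, while `run_agent` and the `execute` callable are hypothetical names, not existing project API:

```python
import threading

MAX_AGENTS = 5  # recommended cap from testing

agent_slots = threading.Semaphore(MAX_AGENTS)

def run_agent(task, execute):
    # Blocks the caller until one of the MAX_AGENTS slots is free,
    # runs the task, then releases the slot on exit.
    # No persistence, no inspectable queue -- exactly Option A's trade-off.
    with agent_slots:
        return execute(task)
```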

Option B: Queue with Persistence

  • Requests queued in SQLite/Redis
  • PM queries queue position
  • Human can inspect/manage queue
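Option B could be sketched with the stdlib `sqlite3` module; the schema and helper names here are assumptions for illustration, not the project's API. `rowid` ordering gives FIFO for free, and a human (or the PM) can query position at any time:

```python
import sqlite3

def open_queue(path=":memory:"):
    # One-table FIFO queue; the INTEGER PRIMARY KEY preserves arrival order.
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS queue (id INTEGER PRIMARY KEY, issue TEXT)")
    return db

def enqueue(db, issue):
    db.execute("INSERT INTO queue (issue) VALUES (?)", (issue,))
    db.commit()

def dequeue(db):
    # Pop the oldest entry (FIFO); returns None when the queue is empty.
    row = db.execute("SELECT id, issue FROM queue ORDER BY id LIMIT 1").fetchone()
    if row is None:
        return None
    db.execute("DELETE FROM queue WHERE id = ?", (row[0],))
    db.commit()
    return row[1]

def position(db, issue):
    # 1-based queue position, queryable by the PM agent or a human.
    rows = [r[0] for r in db.execute("SELECT issue FROM queue ORDER BY id")]
    return rows.index(issue) + 1
```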

Option C: PM Agent Self-Regulation

  • PM agent tracks active agents count
  • PM decides to delay delegation when near limit
  • No explicit queue, uses PM memory

Out of Scope for This Issue

  • Actual implementation (separate issues/PRs)
  • Specific queue backend choice
  • Priority implementation details

Related

  • Issue #3 (parallel capacity testing - completed)
  • Parallel capacity test tool: tools/parallel-capacity-test/
Author
Owner

Feedback

Great design issue! Here is my analysis based on the kugetsu architecture:

Recommended Approach: Option C (PM Agent Self-Regulation) + Minimal Queue

Why not pure Option A (Semaphore):

  • No visibility into queue state
  • PM agent can't inform user about wait times
  • Lost on restart (but an agent restart means a new context anyway)

Why not full Option B (Persistent Queue):

  • Overkill for Phase 3/initial Telegram UX
  • Redis/SQLite adds complexity
  • FIFO may not match priority needs

Proposed Hybrid Design

1. PM Agent tracks active count in memory

  • PM session stores: active_agents: [session_ids]
  • PM updates count on delegation and completion
  • Simple, no external dependency

2. User-facing queue via kugetsu index

```json
{
  "base": "ses_abc",
  "pm_agent": "ses_pm",
  "active": ["issue-14.json", "issue-15.json"],
  "queued": ["issue-16.json"]
}
```
  • Persisted to ~/.kugetsu/index.json
  • kugetsu CLI can show: kugetsu queue
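Assuming the index shape above, a small helper could implement the delegate/complete bookkeeping; the function names and the `max_agents` default are illustrative, not existing kugetsu API:

```python
def delegate(index, issue, max_agents=5):
    # Activate the issue if a slot is free; otherwise append to the
    # FIFO "queued" list and report the 1-based position to the user.
    if len(index["active"]) < max_agents:
        index["active"].append(issue)
        return "active"
    index["queued"].append(issue)
    return f"queued, position {len(index['queued'])}"

def complete(index, issue):
    # Free the slot and promote the next queued issue, if any.
    index["active"].remove(issue)
    if index["queued"]:
        index["active"].append(index["queued"].pop(0))
```

The same dict would be serialized to `~/.kugetsu/index.json` after each mutation so `kugetsu queue` can read it back.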

3. Queue behavior:

| Scenario | Behavior |
|----------|----------|
| Max reached (e.g., 5) | New request queued, PM tells user "queued, position N" |
| Slot frees up | PM picks next from queue |
| Queue timeout | PM notifies user "task expired after 24h" |
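The 24h-expiry behavior could be implemented by sweeping enqueue timestamps; this is a sketch with an assumed data shape (a list of issue/timestamp pairs), not the project's actual queue format:

```python
import time

QUEUE_TIMEOUT = 24 * 3600  # 24h, matching the scenario above

def expire_stale(queued, now=None, timeout=QUEUE_TIMEOUT):
    # queued: list of (issue, enqueued_at) pairs.
    # Returns (kept, expired_issue_names); the PM would notify the
    # user for each expired entry.
    now = time.time() if now is None else now
    kept = [(i, t) for i, t in queued if now - t < timeout]
    expired = [i for i, t in queued if now - t >= timeout]
    return kept, expired
```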

4. Queue ordering:

  • FIFO for now
  • Priority can come later (Phase 2 API)

Where to Implement

| Layer | What |
|-------|------|
| kugetsu CLI | Track active count, manage queued list, persist to index.json |
| PM Agent | Check capacity before delegation, update on completion |
| Telegram UX | PM responds with queue position |

Capacity Limit

Based on testing, 5 agents seems reasonable. But this should be:

  • Configurable via ~/.kugetsu/config.json
  • Default: 3 (safer for resource-constrained containers)
  • Tunable based on container RAM/CPU
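Loading the cap from `~/.kugetsu/config.json` with the safer default of 3 might look like the following; the `max_agents` key name is an assumption:

```python
import json
from pathlib import Path

DEFAULT_MAX_AGENTS = 3  # safer default for resource-constrained containers

def load_max_agents(config_path="~/.kugetsu/config.json"):
    # Fall back to the default when the file or the key is missing,
    # so a fresh install works without any configuration.
    path = Path(config_path).expanduser()
    if not path.exists():
        return DEFAULT_MAX_AGENTS
    config = json.loads(path.read_text())
    return int(config.get("max_agents", DEFAULT_MAX_AGENTS))
```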

Integration with Phase 3

For Telegram UX, the PM agent needs to:

  1. Check kugetsu queue before delegating
  2. If full, add to queue and tell user
  3. When slot frees, pick from queue and notify user

Questions

  1. Should we implement queue timeout with notification?
  2. Should PM agent auto-retry queued tasks or wait for user confirmation?
  3. Do we need "cancel queued task" functionality?
shoko closed this issue 2026-03-31 14:48:53 +02:00

Reference: shoko/kugetsu#37