Compare commits

...

4 Commits

Author SHA1 Message Date
shokollm
92f8369d6f fix CONTRIBUTING.md: branch naming should include issue number
Address han's review feedback:
- Changed feat/feature-name to feat/issue-N-feature-name
- Consistent with fix/issue-N-name format
2026-04-05 10:26:59 +00:00
shokollm
16e417c88e docs: add versioning policy, changelog, and update contributing guide
- Add VERSIONING.md documenting 0.1.x/0.2.x branch strategy
- Add docs/CHANGELOG.md with release notes for v0.1.0-v0.2.1
- Update CONTRIBUTING.md: fix master->main, add develop branch
- Closes #120
2026-04-05 09:38:25 +00:00
da0fa302de Merge pull request 'fix(kugetsu): prevent excess agent spawning with flock + sequential processing' (#147) from fix/issue-queue-daemon-excess-agents into main 2026-04-05 10:49:24 +02:00
shokollm
54aa6419eb fix(kugetsu): prevent excess agent spawning with flock + sequential processing
- count_active_dev_sessions() now excludes pm-agent.json from count
- process_queue() now calls kugetsu start directly (not opencode run)
- process_queue() uses dynamic batch size = available_slots
- process_queue() has retry logic (max 3 attempts) on failure
- cmd_start() now uses flock around critical section
- Added notification types: task_queued, task_dequeued, task_started, task_completed, task_error
- Removed QUEUE_DAEMON_BATCH_SIZE config (no longer needed)

Fixes issue #146
2026-04-05 08:44:45 +00:00
6 changed files with 406 additions and 94 deletions

View File

@@ -0,0 +1,67 @@
# Fix: Queue daemon spawning excess agents due to race condition
## Problem
When enqueueing multiple tasks (e.g., 6 tasks), the queue daemon was spawning many more subagents than expected, eventually exhausting container memory.
**Root Cause:** The combination of:
1. `process_queue()` calling `opencode run` directly instead of `kugetsu start`, bypassing all concurrency logic
2. `count_active_dev_sessions()` counting `pm-agent.json` toward `MAX_CONCURRENT_AGENTS`, reducing effective dev agent slots
3. No atomic locking around session count check + session file creation (TOCTOU race condition)
4. Background spawning of multiple concurrent processes in `process_queue()`
**Expected behavior:** With `MAX_CONCURRENT_AGENTS=3` and 6 tasks:
- Tasks should be processed sequentially via `kugetsu start`
- Only 3 dev agents should run at a time
- Tasks should queue and wait for slots to free up
## Solution
### 1. `count_active_dev_sessions()` - Exclude pm-agent
Only count actual dev agent session files (exclude `pm-agent.json`).
### 2. `process_queue()` - Call `kugetsu start` directly + retry logic
- Call `kugetsu start` directly (foreground, sequential) instead of spawning `opencode run` background process
- Dynamic batch size = available slots (removes need for `QUEUE_DAEMON_BATCH_SIZE`)
- Retry logic (max 3 attempts) on failure
- On failure: cleanup worktree/session and revert to `pending` state
- Save `fork_pid` to queue item for timeout handling
### 3. `cmd_start()` - Add flock
- Add flock around critical section (count check + fork)
- Track `fork_pid` for queue item timeout handling
### 4. Notification System
New notification types:
| Event | Type |
|-------|------|
| Task enqueued | `task_queued` |
| Task dequeued | `task_dequeued` |
| Task started | `task_started` |
| Task completed | `task_completed` |
| Task error | `task_error` |
### 5. Config
- Remove `QUEUE_DAEMON_BATCH_SIZE` (no longer needed - batch size is now dynamic)
## Notification Flow
| Event | Location | Type |
|-------|----------|------|
| Task enqueued | `enqueue_task()` | `task_queued` |
| Task dequeued | `process_queue()` after state change to `notified` | `task_dequeued` |
| Task started | `cmd_start()` after session file created | `task_started` |
| Task completed | `update_queue_item_state()` | `task_completed` |
| Task error | `update_queue_item_state()` | `task_error` |
## Out of Scope
- Re-check loop in cmd_start (checking if session DB is reliable) - deferred to separate research issue
- Buffer mechanism for excess forking (safety failsafe only)
## Status
- [x] Issue created
- [x] Implementation
- [x] PR created (#147)
- [ ] Merged

View File

@@ -2,10 +2,10 @@
## Workflow ## Workflow
1. Create a branch for your work: `git checkout -b fix/issue-N-name` or `git checkout -b docs/topic-name` 1. Create a branch for your work: `git checkout -b fix/issue-N-name` or `git checkout -b feat/issue-N-feature-name`
2. Make changes and commit with clear messages 2. Make changes and commit with clear messages
3. Open a Pull Request for review 3. Open a Pull Request for review
4. Do not merge directly to `master` for reviewable changes 4. Do not merge directly to `main` or `develop` for reviewable changes
5. After approval, squash and merge 5. After approval, squash and merge
## Guidelines ## Guidelines
@@ -14,10 +14,53 @@
- Keep PRs focused and reasonably sized - Keep PRs focused and reasonably sized
- Document any non-obvious decisions - Document any non-obvious decisions
- Test changes before submitting - Test changes before submitting
- See [VERSIONING.md](VERSIONING.md) for backport compatibility rules
## Branches ## Branches
- `master` — stable, reviewed content only ### Primary Branches
- `main` — stable 0.1.x releases, production-ready code
- `develop` — experimental 0.2.x work, next major version
### Feature Branches
- `fix/*` — bug fixes - `fix/*` — bug fixes
- `feat/*` — new features
- `docs/*` — documentation updates - `docs/*` — documentation updates
- `research/*`new research notes - `refactor/*`code refactoring (no behavior change)
## Branch Model
```
main (0.1.x stable)
└── v0.1.0, v0.1.1, v0.1.2, ...
develop (0.2.x experimental)
└── (next major version work)
```
### Which Branch to Target?
| Change Type | Target Branch | Backport? |
|-------------|---------------|-----------|
| Bug fix | `main` | N/A |
| Documentation | `main` | N/A |
| New feature (backport-compatible) | `main` | Can cherry-pick to `develop` |
| Experimental feature | `develop` | No |
| Breaking change | `develop` | No |
## Backport Compatibility
Before merging, consider if your change is backport-compatible:
- **YES**: Bug fixes, docs, adding new optional inputs
- **NO**: Changing behavior, changing defaults, removing features
See [VERSIONING.md](VERSIONING.md) for full policy.
## Release Process
1. Bug fixes and docs → directly to `main`
2. New features → `develop` or feature branches → `develop`
3. When `develop` is stable enough → merge to `main` for release

71
VERSIONING.md Normal file
View File

@@ -0,0 +1,71 @@
# Versioning Policy
## Branch Strategy
Kugetsu uses a dual-branch model:
| Branch | Purpose | Version | Stability |
|--------|---------|---------|-----------|
| `main` | Stable releases | 0.1.x | Production-ready |
| `develop` | Experimental work | 0.2.x | Active development |
### Branch Definitions
- **`main`**: Contains the latest stable 0.1.x releases. All changes here should be production-ready and backport-compatible when possible.
- **`develop`**: Contains work for the next major version (0.2.x). This branch may contain experimental features that could change or be removed.
## Version Format
Versions follow [Semantic Versioning](https://semver.org/):
```
MAJOR.MINOR.PATCH
```
- **MAJOR**: Incompatible API/behavior changes
- **MINOR**: New functionality (backward-compatible)
- **PATCH**: Bug fixes (backward-compatible)
## Backport Compatibility
### Backport-Compatible Changes (0.1.x)
- Bug fixes
- Documentation updates
- Performance improvements
- Adding new inputs/options (must have sensible defaults)
- Changes that only affect 0.2.x-specific features
### NOT Backport-Compatible
- Removing or renaming existing options
- Changing default values of existing options
- Changing behavior of existing commands
- Introducing breaking changes to the API/shell interface
## Deprecation Policy
When introducing breaking changes:
1. **Deprecate in minor X**: Add warning messages, document the change
2. **Remove in major X+1**: The breaking change is removed in the next major version
Example:
- Option `--old-flag` deprecated in v0.1.5
- Option `--old-flag` removed in v1.0.0 (not v0.2.0)
## What Constitutes a Version Bump
| Change Type | Version Bump |
|-------------|--------------|
| Add new command/option | MINOR |
| Bug fix | PATCH |
| Change default value | MINOR (may warrant PATCH) |
| Add new required input | MAJOR |
| Remove deprecated feature | MAJOR |
| Change behavior of existing command | MINOR (needs deprecation first) |
## Release Process
1. Changes are developed on feature branches
2. PRs are opened against `main` for 0.1.x changes, or `develop` for 0.2.x
3. After review and approval, changes are squash-merged
4. Releases are tagged from `main` after significant changes

111
docs/CHANGELOG.md Normal file
View File

@@ -0,0 +1,111 @@
# Changelog
All notable changes to kugetsu are documented here.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [Unreleased]
## [v0.2.1] - 2026-04-03
### Fixed
- Prevent excess agent spawning with flock + sequential processing
## [v0.2.0] - 2026-03-30
### Added
- Queue system with background daemon
- Agent timeout handling
- Context dump/load for session isolation
- PR tracking and safe destroy
## [v0.1.13] - 2026-03-29
### Fixed
- Add missing closing parenthesis in process_queue Python extraction
## [v0.1.12] - 2026-03-25
### Added
- Post-comment helper for PM agent
## [v0.1.11] - 2026-03-20
### Fixed
- Wrap cmd_continue in subshell with cd for correct worktree dir
## [v0.1.10] - 2026-03-15
### Fixed
- destroy --base now also deletes PM agent session
## [v0.1.9] - 2026-03-10
### Added
- init creates base session in ~/.kugetsu-worktrees
- Adds context to forked sessions
- Clears logs on init
## [v0.1.8] - 2026-03-05
### Fixed
- destroy --base and --pm-agent actually delete opencode sessions
## [v0.1.7] - 2026-02-28
### Fixed
- Warn if init run from non-empty directory
## [v0.1.6] - 2026-02-20
### Fixed
- Detect session via DB query instead of opencode session list
## [v0.1.5] - 2026-02-15
### Fixed
- Update forked session permissions after detection
## [v0.1.4] - 2026-02-10
### Fixed
- Call fix_session_permissions before forking
## [v0.1.3] - 2026-02-05
### Fixed
- Session detection ordering bug and debugging
## [v0.1.2] - 2026-01-28
### Fixed
- Improve session detection in cmd_start with retry logic and logging
## [v0.1.1] - 2026-01-20
### Fixed
- Use cd + worktree inside parent dir instead of --dir flag
## [v0.1.0] - 2026-01-15
### Added
- KUGETSU_VERBOSITY for PM agent output control
- Initial documented release
[Unreleased]: https://git.fbrns.co/shoko/kugetsu/compare/v0.2.1...HEAD
[v0.2.1]: https://git.fbrns.co/shoko/kugetsu/compare/v0.2.0...v0.2.1
[v0.2.0]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.13...v0.2.0
[v0.1.13]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.12...v0.1.13
[v0.1.12]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.11...v0.1.12
[v0.1.11]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.10...v0.1.11
[v0.1.10]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.9...v0.1.10
[v0.1.9]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.8...v0.1.9
[v0.1.8]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.7...v0.1.8
[v0.1.7]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.6...v0.1.7
[v0.1.6]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.5...v0.1.6
[v0.1.5]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.4...v0.1.5
[v0.1.4]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.3...v0.1.4
[v0.1.3]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.2...v0.1.3
[v0.1.2]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.1...v0.1.2
[v0.1.1]: https://git.fbrns.co/shoko/kugetsu/compare/v0.1.0...v0.1.1
[v0.1.0]: https://git.fbrns.co/shoko/kugetsu/releases/tag/v0.1.0

View File

@@ -50,7 +50,6 @@ A default config file is created during `kugetsu init` with commented examples:
| `KUGETSU_TEMP_DIR` | `~/.local/share/opencode/tool-output` | Temp directory for subagent tool output (useful in headless environments where /tmp is restricted) | | `KUGETSU_TEMP_DIR` | `~/.local/share/opencode/tool-output` | Temp directory for subagent tool output (useful in headless environments where /tmp is restricted) |
| `KUGETSU_VERBOSITY` | `default` | PM agent verbosity level: `verbose`, `default`, or `quiet` | | `KUGETSU_VERBOSITY` | `default` | PM agent verbosity level: `verbose`, `default`, or `quiet` |
| `QUEUE_DAEMON_INTERVAL_MINUTES` | 5 | How often daemon polls queue (in minutes) | | `QUEUE_DAEMON_INTERVAL_MINUTES` | 5 | How often daemon polls queue (in minutes) |
| `QUEUE_DAEMON_BATCH_SIZE` | 2 | How many tasks daemon picks per poll |
| `QUEUE_CLEANUP_AGE_DAYS` | 7 | Auto-cleanup completed/error items older than N days | | `QUEUE_CLEANUP_AGE_DAYS` | 7 | Auto-cleanup completed/error items older than N days |
### Environment Variables for Agents ### Environment Variables for Agents

View File

@@ -23,7 +23,6 @@ QUEUE_DAEMON_PID_FILE="${QUEUE_DAEMON_PID_FILE:-$QUEUE_DIR/daemon.pid}"
QUEUE_DAEMON_LOCK_FILE="${QUEUE_DAEMON_LOCK_FILE:-$QUEUE_DIR/daemon.lock}" QUEUE_DAEMON_LOCK_FILE="${QUEUE_DAEMON_LOCK_FILE:-$QUEUE_DIR/daemon.lock}"
QUEUE_DAEMON_LOG_FILE="${QUEUE_DAEMON_LOG_FILE:-$QUEUE_DIR/daemon.log}" QUEUE_DAEMON_LOG_FILE="${QUEUE_DAEMON_LOG_FILE:-$QUEUE_DIR/daemon.log}"
QUEUE_DAEMON_INTERVAL_MINUTES="${QUEUE_DAEMON_INTERVAL_MINUTES:-5}" QUEUE_DAEMON_INTERVAL_MINUTES="${QUEUE_DAEMON_INTERVAL_MINUTES:-5}"
QUEUE_DAEMON_BATCH_SIZE="${QUEUE_DAEMON_BATCH_SIZE:-2}"
QUEUE_CLEANUP_AGE_DAYS="${QUEUE_CLEANUP_AGE_DAYS:-7}" QUEUE_CLEANUP_AGE_DAYS="${QUEUE_CLEANUP_AGE_DAYS:-7}"
TASK_TIMEOUT_HOURS="${TASK_TIMEOUT_HOURS:-1}" TASK_TIMEOUT_HOURS="${TASK_TIMEOUT_HOURS:-1}"
@@ -63,7 +62,7 @@ count_active_dev_sessions() {
for session_file in "$SESSIONS_DIR"/*.json; do for session_file in "$SESSIONS_DIR"/*.json; do
if [ -f "$session_file" ]; then if [ -f "$session_file" ]; then
local filename=$(basename "$session_file") local filename=$(basename "$session_file")
if [ "$filename" != "base.json" ]; then if [ "$filename" != "base.json" ] && [ "$filename" != "pm-agent.json" ]; then
count=$((count + 1)) count=$((count + 1))
fi fi
fi fi
@@ -532,6 +531,8 @@ with open("$QUEUE_ITEMS_DIR/${queue_id}.json", "w") as f:
print(f"Enqueued: $queue_id") print(f"Enqueued: $queue_id")
PYEOF PYEOF
kugetsu_add_notification "task_queued" "Task queued: $issue_ref" "$issue_ref"
} }
get_pending_tasks() { get_pending_tasks() {
@@ -588,6 +589,7 @@ update_queue_item_state() {
python3 << PYEOF python3 << PYEOF
import json import json
import os
from datetime import datetime from datetime import datetime
item_file = "$item_file" item_file = "$item_file"
@@ -598,6 +600,8 @@ pid = "$pid"
with open(item_file, 'r') as f: with open(item_file, 'r') as f:
item = json.load(f) item = json.load(f)
issue_ref = item.get('issue_ref', '')
item['state'] = new_state item['state'] = new_state
if new_state == "notified": if new_state == "notified":
@@ -608,8 +612,10 @@ if new_state == "notified":
item['pid'] = int(pid) if pid.isdigit() else None item['pid'] = int(pid) if pid.isdigit() else None
elif new_state == "completed": elif new_state == "completed":
item['completed_at'] = datetime.now().isoformat() + "Z" item['completed_at'] = datetime.now().isoformat() + "Z"
os.system(f"kugetsu_add_notification 'task_completed' 'Task completed: {issue_ref}' '{issue_ref}'")
elif new_state == "error": elif new_state == "error":
item['error'] = datetime.now().isoformat() + "Z" item['error'] = datetime.now().isoformat() + "Z"
os.system(f"kugetsu_add_notification 'task_error' 'Task error: {issue_ref}' '{issue_ref}'")
with open(item_file, 'w') as f: with open(item_file, 'w') as f:
json.dump(item, f, indent=2) json.dump(item, f, indent=2)
@@ -1366,22 +1372,15 @@ process_queue() {
fi fi
local available_slots=$((MAX_CONCURRENT_AGENTS - active_count)) local available_slots=$((MAX_CONCURRENT_AGENTS - active_count))
local batch_size=$QUEUE_DAEMON_BATCH_SIZE
[ "$batch_size" -gt "$available_slots" ] && batch_size=$available_slots
if [ "$batch_size" -le 0 ]; then if [ "$available_slots" -le 0 ]; then
return
fi
local pm_session=$(get_pm_agent_session_id)
if [ -z "$pm_session" ] || [ "$pm_session" = "null" ]; then
return return
fi fi
local count=0 local count=0
for item in $(ls -t "$QUEUE_ITEMS_DIR"/*.json 2>/dev/null | head -20); do for item in $(ls -t "$QUEUE_ITEMS_DIR"/*.json 2>/dev/null | head -20); do
[ $count -ge "$available_slots" ] && break
[ -f "$item" ] || continue [ -f "$item" ] || continue
[ $count -ge "$batch_size" ] && break
local state=$(python3 -c "import json; print(json.load(open('$item')).get('state', ''))" 2>/dev/null) local state=$(python3 -c "import json; print(json.load(open('$item')).get('state', ''))" 2>/dev/null)
if [ "$state" != "pending" ]; then if [ "$state" != "pending" ]; then
@@ -1397,25 +1396,41 @@ process_queue() {
fi fi
update_queue_item_state "$queue_id" "notified" update_queue_item_state "$queue_id" "notified"
kugetsu_add_notification "task_dequeued" "Task dequeued: $issue_ref" "$issue_ref"
local log_file="$LOGS_DIR/delegate-${queue_id}.log" local log_file="$LOGS_DIR/delegate-${queue_id}.log"
mkdir -p "$LOGS_DIR" mkdir -p "$LOGS_DIR"
local env_sh="set -a; " local max_retries=3
if [ -f "$ENV_DIR/pm-agent.env" ]; then local attempt=1
env_sh="${env_sh}source '$ENV_DIR/pm-agent.env'; " local success=false
elif [ -f "$ENV_DIR/default.env" ]; then local fork_pid=""
env_sh="${env_sh}source '$ENV_DIR/default.env'; "
while [ $attempt -le $max_retries ]; do
if kugetsu start "$issue_ref" "$message" >> "$log_file" 2>&1; then
success=true
break
fi fi
env_sh="${env_sh}set +a; "
nohup sh -c "${env_sh}opencode run 'Delegate task: ${message}' --continue --session '$pm_session'" >> "$log_file" 2>&1 & echo "Attempt $attempt failed for $queue_id, cleaning up..." >> "$log_file"
local fork_pid=$!
update_queue_item_state "$queue_id" "notified" "" "$fork_pid" local session_file="$(issue_ref_to_filename "$issue_ref").json"
local worktree_path=$(issue_ref_to_worktree_path "$issue_ref" "$PWD")
echo "Queued task $queue_id for PM agent (PID: $fork_pid)" [ -f "$SESSIONS_DIR/$session_file" ] && rm -f "$SESSIONS_DIR/$session_file"
worktree_exists "$issue_ref" "$PWD" && remove_worktree_for_issue "$issue_ref" "$PWD"
remove_issue_from_index "$issue_ref" 2>/dev/null || true
attempt=$((attempt + 1))
done
if [ "$success" = true ]; then
echo "Started task $queue_id: $issue_ref"
count=$((count + 1)) count=$((count + 1))
else
echo "Failed to start task $queue_id after $max_retries attempts"
update_queue_item_state "$queue_id" "pending"
fi
done done
} }
@@ -2061,20 +2076,10 @@ cmd_start() {
create_worktree "$issue_ref" "$parent_dir" create_worktree "$issue_ref" "$parent_dir"
local session_file="$(issue_ref_to_filename "$issue_ref").json" local session_file="$(issue_ref_to_filename "$issue_ref").json"
echo "Forking session for '$issue_ref'..."
# Session-counting: count actual dev sessions, reject if at limit
local active_count=$(count_active_dev_sessions)
if [ "$active_count" -ge "$MAX_CONCURRENT_AGENTS" ]; then
echo "Error: Max concurrent agents ($MAX_CONCURRENT_AGENTS) reached" >&2
echo "Active sessions: $active_count" >&2
remove_worktree_for_issue "$issue_ref" "$parent_dir"
exit 1
fi
local fork_log="$SESSIONS_DIR/$session_file.fork.log" local fork_log="$SESSIONS_DIR/$session_file.fork.log"
local opencode_db="${OPENCODE_DB:-$HOME/.local/share/opencode/opencode.db}" local opencode_db="${OPENCODE_DB:-$HOME/.local/share/opencode/opencode.db}"
local lock_file="$KUGETSU_DIR/.session_lock"
local lock_fd=200
> "$fork_log" > "$fork_log"
@@ -2087,6 +2092,19 @@ ${previous_context}
## YOUR TASK ## YOUR TASK
$message" $message"
(
flock -x $lock_fd
local active_count=$(count_active_dev_sessions)
if [ "$active_count" -ge "$MAX_CONCURRENT_AGENTS" ]; then
echo "Error: Max concurrent agents ($MAX_CONCURRENT_AGENTS) reached" >&2
echo "Active sessions: $active_count" >&2
remove_worktree_for_issue "$issue_ref" "$parent_dir"
exit 1
fi
echo "Forking session for '$issue_ref'..."
fix_session_permissions fix_session_permissions
if [ "$DEBUG_MODE" = true ]; then if [ "$DEBUG_MODE" = true ]; then
@@ -2186,8 +2204,11 @@ PYEOF
kugetsu_context_dump "$issue_ref" "$message" "$branch_name" kugetsu_context_dump "$issue_ref" "$message" "$branch_name"
kugetsu_add_notification "task_started" "Task started: $issue_ref" "$issue_ref"
echo "Session started for '$issue_ref': $new_session_id" echo "Session started for '$issue_ref': $new_session_id"
echo "Worktree: $worktree_path" echo "Worktree: $worktree_path"
) 200>"$lock_file"
} }
cmd_continue() { cmd_continue() {