jigaido/.github/ISSUE_TEMPLATE/v2-simplify-storage.md

# Simplify Storage: Replace SQLite with Per-Chat JSON Files

## Status
Proposed

## Background

### What happened

The SQLite-based storage layer (`db.py`) introduced several categories of complexity that outweigh its benefits at this stage:

1. **Connection management bugs** — Python's `sqlite3` module manages transactions implicitly: by default it opens a transaction before data-modifying statements and leaves it open until `commit()` is called. In our connection setup (custom `row_factory`, `PRAGMA foreign_keys = ON`), `ON CONFLICT ... DO UPDATE` statements silently failed to commit. The fix was to set `conn.isolation_level = None` (autocommit mode) directly on the connection object after creation. None of this is obvious, and it took significant debugging time.

2. **Test fragility** — The `fresh_db` fixture patches `DB_PATH` but the SQLite connection is a module-level singleton with connection-level state. Tests passed in isolation but failed under pytest's caching, and the root cause was subtle enough to require multiple iterations.

3. **Tracking table complexity** — The `user_bounty_tracking` + `reminder_log` tables with dedup logic add non-trivial query complexity for what is essentially a "bookmark" feature.

4. **Schema migrations** — Any schema change requires a migration script. For a personal bot with 2 users and 50 bounties, this overhead is disproportionate.

5. **Cron/reminder system** — The daily reminder cron (`cron.py`) requires a separate process, scheduler (cron), and `reminder_log` table to prevent duplicate notifications. This is a significant operational surface for a v1.
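
The transaction behavior behind item 1 can be illustrated with the standard library alone. This is a minimal sketch, not a reproduction of the original bug: the `kv` table and the helper name are hypothetical, and it assumes the `sqlite3` module's default (legacy) transaction control and an SQLite version with upsert support (3.24+).

```python
import os
import sqlite3
import tempfile

# Hypothetical single-table upsert, used only for illustration.
UPSERT = (
    "INSERT INTO kv (k, v) VALUES ('a', '1') "
    "ON CONFLICT(k) DO UPDATE SET v = excluded.v"
)


def rows_seen_by_second_connection(isolation_level):
    """Run an upsert without commit(); return the row count another connection sees."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
        setup.commit()
        setup.close()

        writer = sqlite3.connect(path, isolation_level=isolation_level)
        writer.execute(UPSERT)  # deliberately no writer.commit()

        reader = sqlite3.connect(path)
        count = reader.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
        reader.close()
        writer.close()  # closing without commit discards the open transaction
        return count
    finally:
        os.remove(path)
```

With `isolation_level="DEFERRED"` the upsert stays inside an implicit, uncommitted transaction, so the second connection sees no rows; with `isolation_level=None` the statement is committed immediately.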

### Why it happened

The current design was over-engineered for the actual usage pattern:
- Most commands are stateless (one request → one response)
- The author is the primary (and likely only) user
- Scale target is 10-100 users, not 10,000+
- The bot is a personal project, not a production service

SQLite was chosen for "correctness", but at this scale its guarantees buy very little while the complexity is real.

### Current state

The bot works and 53/53 tests pass. But `db.py` is ~300 lines with subtle connection semantics, `schema.sql` defines 7 tables, `cron.py` is a separate process, and the command layer (`commands.py`) is entangled with the DB layer.

---

## Proposal

**Replace SQLite with a JSON file storage system — one directory per group or DM user.**

### Storage Design

```
data/
├── {group_id}/
│   ├── group.json           # group bounties (all bounties in this group)
│   └── {user_id}.json       # user tracking within this group (which bounty IDs they track)
└── {user_id}/
    └── user.json             # user's personal bounties (DM — only this user)
```

**Bot context lookup:**

| Context | Entry point |
|---|---|
| In group (`chat_id = -100123`) | `data/-100123/group.json` |
| In DM (`chat_id = 123`) | `data/123/user.json` |
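
A minimal sketch of this lookup. It assumes that negative `chat_id` values denote groups and positive ones denote DMs, as in the two examples in the table; `DATA_DIR` and `entry_path` are hypothetical names, not existing code.

```python
from pathlib import Path

DATA_DIR = Path("data")  # assumed storage root, matching the directory tree


def entry_path(chat_id: int, data_dir: Path = DATA_DIR) -> Path:
    """Map a chat_id to the JSON file that holds that chat's bounties."""
    # Assumption: group chats have negative IDs, DMs have positive IDs.
    filename = "group.json" if chat_id < 0 else "user.json"
    return data_dir / str(chat_id) / filename
```

Because the path is derived from `chat_id` alone, a handler never needs to scan the `data/` directory to find the right file.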

**File: `data/{group_id}/group.json`** — group bounties:
```json
{
  "group_id": -100123,
  "bounties": [
    {
      "id": 1,
      "created_by_user_id": 456,
      "text": "Fix login bug",
      "link": "https://github.com/example/repo/issues/1",
      "due_date_ts": 1735689600,
      "created_at": 1735603200
    }
  ],
  "next_id": 2
}
```

**File: `data/{group_id}/{user_id}.json`** — user tracking in a group:
```json
{
  "user_id": 456,
  "tracked": [1, 5, 9]
}
```
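
With tracking stored as a plain list, the dedup logic reduces to a membership check. The sketch below operates on the already-loaded tracking dict; `track`, `untrack`, and the `known_ids` parameter are hypothetical names chosen for illustration.

```python
from __future__ import annotations


def track(tracking: dict, bounty_id: int, known_ids: set[int]) -> bool:
    """Add bounty_id to the tracked list; return True only if the list changed."""
    # Reject IDs that do not exist in this group and IDs already tracked.
    if bounty_id not in known_ids or bounty_id in tracking["tracked"]:
        return False
    tracking["tracked"].append(bounty_id)
    return True


def untrack(tracking: dict, bounty_id: int) -> bool:
    """Remove bounty_id from the tracked list; return True only if it was present."""
    if bounty_id not in tracking["tracked"]:
        return False
    tracking["tracked"].remove(bounty_id)
    return True
```

Here `known_ids` would be the set of bounty IDs currently present in the group's `group.json`, so a user cannot track a bounty that was deleted or never existed.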

**File: `data/{user_id}/user.json`** — user's personal bounties (DM):
```json
{
  "user_id": 123,
  "bounties": [
    {
      "id": 1,
      "text": "Fix login bug",
      "link": "https://github.com/example/repo/issues/1",
      "due_date_ts": 1735689600,
      "created_at": 1735603200
    }
  ]
}
```

### Key design decisions

1. **Group/DM as directory** — `chat_id` is the gateway. Group → `data/{group_id}/group.json`. DM → `data/{user_id}/user.json`. No scanning needed.

2. **Tracking is per-group-per-user** — `data/{group_id}/{user_id}.json` stores the list of bounty IDs this user tracks in this group. Simple, isolated.

3. **No cross-group access** — Group bounties live only in that group's file. A member of Group A cannot see or track Group B's bounties.

4. **Bounty IDs are sequential integers per group** — Not global. Each `group.json` has its own `next_id` counter.

5. **No reminders in v1** — Drop the cron/reminder system entirely. The `reminder_log` table and `cron.py` are removed.

6. **No admin model in v1** — Anyone in the group can add bounties. Only the bounty creator can edit/delete (enforced by `created_by_user_id` check).
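
Decisions 4 and 6 can be sketched together on an in-memory `group.json` dict. The function names are hypothetical, and falling back to `max(id) + 1` when `next_id` is missing is an assumption made for this example, not something the proposal specifies.

```python
def add_bounty(group, user_id, text, now_ts, link=None, due_date_ts=None):
    """Append a bounty using the group's own next_id counter."""
    # Assumption: if next_id is absent, continue after the highest existing ID.
    highest = max((b["id"] for b in group["bounties"]), default=0)
    bounty_id = group.get("next_id", highest + 1)
    bounty = {
        "id": bounty_id,
        "created_by_user_id": user_id,
        "text": text,
        "link": link,
        "due_date_ts": due_date_ts,
        "created_at": now_ts,
    }
    group["bounties"].append(bounty)
    group["next_id"] = bounty_id + 1  # the counter only grows, so IDs are never reused
    return bounty


def delete_bounty(group, bounty_id, user_id):
    """Delete a bounty; only its creator may do so. Return True on success."""
    for index, bounty in enumerate(group["bounties"]):
        if bounty["id"] == bounty_id:
            if bounty["created_by_user_id"] != user_id:
                return False  # not the creator
            del group["bounties"][index]
            return True
    return False  # no such bounty in this group
```

Keeping the counter separate from the list means that deleting bounty 1 and adding a new one yields ID 3 rather than a recycled ID 1, so stale IDs in a user's tracked list never point at a different bounty.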

### Deleted components

- `db.py` — removed entirely
- `schema.sql` — removed entirely
- `cron.py` — removed entirely
- `reminder_log` table — removed
- `user_bounty_tracking` table — replaced by the `tracked` list in `data/{group_id}/{user_id}.json`
- `groups` table — removed (`group_id` is the directory name and a top-level field in `group.json`)
- `group_admins` table — removed (simplified permission model)

### Retained components

- `bot.py` — minimal entrypoint
- `commands.py` — command parsing and reply logic (simplified)
- `tests/` — simplified to match new data model

---

## Implementation Plan

### Phase 1: Data model + storage layer

1. Create `storage.py` with:
   - `get_user_path(user_id)` — returns `Path` to user's JSON
   - `load_user(user_id)` — reads and parses JSON, returns dict, creates file if missing
   - `save_user(user_id, data)` — writes JSON atomically (temp file + rename)
   - `next_bounty_id(user_id)` — increments and returns next ID for that user's file

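2. No locking needed at v1 scale. Writing to a `tempfile` in the same directory and then renaming it over the target (`os.replace`) makes each write atomic, so a reader never sees a partially written file. It does not prevent lost updates if two handlers read, modify, and write the same file at the same time, which is an acceptable risk at this scale.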
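
A minimal sketch of the four functions listed in step 1, under stated assumptions: the `data_dir` parameter is added here so tests can point at a temporary directory, and keeping a `next_id` counter inside `user.json` is an assumption of this example.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path

DATA_DIR = Path("data")  # assumed storage root


def get_user_path(user_id, data_dir=DATA_DIR):
    return Path(data_dir) / str(user_id) / "user.json"


def save_user(user_id, data, data_dir=DATA_DIR):
    """Write JSON to a temp file, then atomically rename it over the target."""
    path = get_user_path(user_id, data_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)  # do not leave a stray temp file behind
        raise


def load_user(user_id, data_dir=DATA_DIR):
    """Read the user's JSON, creating an empty file on first use."""
    path = get_user_path(user_id, data_dir)
    if not path.exists():
        data = {"user_id": user_id, "bounties": [], "next_id": 1}
        save_user(user_id, data, data_dir)
        return data
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def next_bounty_id(user_id, data_dir=DATA_DIR):
    """Return the next ID for this user's file and persist the incremented counter."""
    data = load_user(user_id, data_dir)
    bounty_id = data.get("next_id", 1)
    data["next_id"] = bounty_id + 1
    save_user(user_id, data, data_dir)
    return bounty_id
```

Since there is no module-level connection or cached state, a test can pass pytest's `tmp_path` as `data_dir` and get full isolation without patching anything.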

### Phase 2: Rewrite commands.py

Simplified command set:

| Command | Where | Who | Description |
|---|---|---|---|
| `/bounty` | Group / DM | Anyone | List all bounties (group-scoped in group, personal in DM) |
| `/add <text> [link] [due]` | Group | Anyone | Add bounty to group |
| `/add <text> [link] [due]` | DM | Anyone | Add personal bounty |
| `/edit <id> [text] [link] [due]` | Group | Creator only | Edit bounty |
| `/edit <id> [text] [link] [due]` | DM | Creator only | Edit personal bounty |
| `/delete <id>` | Group | Creator only | Delete bounty |
| `/delete <id>` | DM | Creator only | Delete personal bounty |
| `/track <id>` | Group | Anyone | Track a group bounty |
| `/untrack <id>` | Group | Anyone | Untrack a bounty |
| `/my` | Group | Anyone | Show tracked group bounties |
| `/my` | DM | Anyone | Show tracked personal bounties |
| `/start` | Anywhere | Anyone | Re-initialize user |
| `/help` | Anywhere | Anyone | Show help |
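
The "Where" column of the table can be encoded as data so that the dispatcher rejects out-of-scope commands in one place. This is a sketch: `COMMAND_SCOPES`, `chat_scope`, and `is_allowed` are hypothetical names, and the rule that negative `chat_id` values are groups is an assumption taken from the examples earlier in this document.

```python
GROUP, DM = "group", "dm"

# Where each command may be used, taken from the command table.
COMMAND_SCOPES = {
    "/bounty": {GROUP, DM},
    "/add": {GROUP, DM},
    "/edit": {GROUP, DM},
    "/delete": {GROUP, DM},
    "/track": {GROUP},
    "/untrack": {GROUP},
    "/my": {GROUP, DM},
    "/start": {GROUP, DM},
    "/help": {GROUP, DM},
}


def chat_scope(chat_id: int) -> str:
    # Assumption: group chats have negative IDs, DMs have positive IDs.
    return GROUP if chat_id < 0 else DM


def is_allowed(command: str, chat_id: int) -> bool:
    """Return True if the command exists and may be used in this chat."""
    return chat_scope(chat_id) in COMMAND_SCOPES.get(command, set())
```

Unknown commands, including the removed `/admin_add` and `/admin_remove`, are rejected because they have no entry in the mapping.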

**Removed commands:**
- `/admin_add`, `/admin_remove` — no admin model in v1
- Reminder-related logic — no cron in v1

### Phase 3: Simplify bot.py

- Remove `Application.post_init` setup (no DB init needed)
- Bot starts instantly — JSON files created on first use
- No migration logic

### Phase 4: Rewrite tests

- `test_commands.py` — keep (parsing is unchanged)
- `test_storage.py` — new, tests `load_user`, `save_user`, `next_bounty_id`
- Remove all DB-dependent tests (`test_db.py` deleted)

### Phase 5: Cleanup

- Delete `db.py`, `schema.sql`, `cron.py`, `test_db.py`
- Delete `requirements-dev.txt` (dev deps in `pyproject.toml`)
- Update README to reflect simplified commands

---

## Estimated effort

- Storage layer: ~80 lines
- Commands rewrite: ~200 lines (simpler than current)
- Tests: ~100 lines
- Cleanup: trivial

Total: ~1 day of work for one person.

---

## When to revert to SQLite

If any of these become true, SQLite is the right choice:
- Multiple concurrent users with write conflicts
- Need for complex queries (across all users, aggregations, etc.)
- Reminder system with proper deduplication
- Scale target > 1,000 users
- Need for ACID guarantees on concurrent writes

For a personal bot with < 100 users, JSON files are the right default.