# Simplify Storage: Replace SQLite with Per-User JSON Files

## Status

Proposed

## Background

### What happened

The SQLite-based storage layer (`db.py`) introduced several categories of complexity that outweigh its benefits at this stage:

1. **Connection management bugs** — Python's `sqlite3` module opens transactions implicitly before data-modifying statements, as governed by the connection's `isolation_level`. (`row_factory` only controls how result rows are built and has no effect on transactions.) Combined with `PRAGMA foreign_keys = ON`, this caused `INSERT ... ON CONFLICT DO UPDATE` statements to silently fail to commit. The fix required setting `conn.isolation_level = None` (autocommit mode) directly on the connection object after creation. None of this is obvious, and it took significant debugging time.
2. **Test fragility** — The `fresh_db` fixture patches `DB_PATH`, but the SQLite connection is a module-level singleton with connection-level state. Tests passed in isolation but failed when the full suite ran in a single pytest session, and the root cause was subtle enough to require multiple iterations to find.
3. **Tracking table complexity** — The `user_bounty_tracking` and `reminder_log` tables, with their dedup logic, add non-trivial query complexity for what is essentially a "bookmark" feature.
4. **Schema migrations** — Any schema change requires a migration script. For a personal bot with 2 users and 50 bounties, this overhead is disproportionate.
5. **Cron/reminder system** — The daily reminder job (`cron.py`) requires a separate process, a scheduler (cron), and the `reminder_log` table to prevent duplicate notifications. That is a significant operational surface for a v1.

### Why it happened

The current design was over-engineered for the actual usage pattern:

- Most commands are stateless (one request → one response)
- The bot's owner is the primary (and likely only) user
- The scale target is 10-100 users, not 10,000+
- The bot is a personal project, not a production service

SQLite was chosen for "correctness", but at this scale its correctness guarantees are rarely exercised, while its complexity is real.

### Current state

The bot works and 53/53 tests pass.
But `db.py` is ~300 lines with subtle connection semantics, `schema.sql` defines 7 tables, `cron.py` runs as a separate process, and the command layer (`commands.py`) is entangled with the DB layer.

---

## Proposal

**Replace SQLite with a per-user JSON file storage system.**

### Storage Design

```
data/
└── users/
    └── {telegram_user_id}.json   # one file per user
```

**File structure (`users/{id}.json`):**

```json
{
  "user_id": 123,
  "username": "alice",
  "next_id": 2,
  "personal_bounties": [
    {
      "id": 1,
      "text": "Fix login bug",
      "link": "https://github.com/example/repo/issues/1",
      "due_date_ts": 1735689600,
      "created_at": 1735603200
    }
  ],
  "tracked_bounties": [
    {"bounty_id": 5, "group_id": -1001, "created_at": 1735600000},
    {"bounty_id": 3, "group_id": null, "created_at": 1735590000}
  ]
}
```

### Key design decisions

1. **Single file per user** — No group-level files. Personal bounties live in the creator's file, and so do group bounties, with `group_id` set.
2. **Bounty IDs are sequential integers per file** — Not global. Each user's file has its own `next_id` counter. This avoids coordination between users at the cost of non-global IDs: two users' bounties in the same group can share an ID (acceptable for personal use).
3. **Cross-group tracking** — When Alice (in Group A) tracks a bounty created by Bob in Group B, Alice's file stores `{bounty_id: X, group_id: -100B}`. To display it, the bot loads Bob's file and finds bounty `X`. Because IDs are only unique within one file, the tracking entry must also identify the creator (e.g. Bob's user ID); `bounty_id` and `group_id` alone do not say which file to load.
4. **No reminders in v1** — Drop the cron/reminder system entirely. The `reminder_log` table and `cron.py` are removed. Reminders can return as a v2 feature with a simpler design (e.g., a "due soon" filter on `/my`).
5. **No admin model in v1** — Drop the `group_admins` table. Anyone in a group can add a group bounty, but only its creator can edit or delete it (enforced by a `created_by_user_id` check).
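To make decisions 1 to 3 concrete, here is a minimal sketch of how the bot might resolve a user's tracked bounties by loading each creator's file. It assumes group bounties sit in the same `personal_bounties` list as personal ones (decision 1) and that each tracking entry carries a hypothetical `created_by_user_id` field naming the creator, since bounty IDs are only unique within one file (decision 2). `find_bounty` and `resolve_tracked` are illustrative names, and the file loader is passed in as a parameter so the logic does not depend on any particular storage code.

```python
from typing import Callable, Optional

# A user file as parsed from users/{id}.json (a plain dict, as in the example above).
UserData = dict


def find_bounty(owner: UserData, bounty_id: int) -> Optional[dict]:
    """Find a bounty by its per-file ID in the creator's file, or return None."""
    # Assumption: group bounties share the personal_bounties list, with group_id set.
    for bounty in owner.get("personal_bounties", []):
        if bounty["id"] == bounty_id:
            return bounty
    return None


def resolve_tracked(user: UserData, load_user: Callable[[int], UserData]) -> list:
    """Return the full bounty objects a user is tracking, across creators' files.

    Assumes each tracking entry has a hypothetical `created_by_user_id` field,
    because `bounty_id` alone is only unique within one creator's file.
    """
    resolved = []
    for entry in user.get("tracked_bounties", []):
        owner = load_user(entry["created_by_user_id"])
        bounty = find_bounty(owner, entry["bounty_id"])
        if bounty is not None:  # skip bounties the creator has since deleted
            resolved.append(bounty)
    return resolved
```

Skipping deleted bounties instead of raising means a stale tracking entry simply drops out of `/my` rather than breaking the command; a cleanup pass could prune such entries later.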
### Deleted components

- `db.py` — removed entirely
- `schema.sql` — removed entirely
- `cron.py` — removed entirely
- `reminder_log` table — removed
- `user_bounty_tracking` table — replaced by `tracked_bounties` in the user JSON
- `groups` table — removed (`group_id` stored directly in bounty objects)
- `group_admins` table — removed (simplified permission model)

### Retained components

- `bot.py` — minimal entrypoint
- `commands.py` — command parsing and reply logic (simplified)
- `tests/` — simplified to match the new data model

---

## Implementation Plan

### Phase 1: Data model + storage layer

1. Create `storage.py` with:
   - `get_user_path(user_id)` — returns the `Path` to the user's JSON file
   - `load_user(user_id)` — reads and parses the JSON and returns a dict, creating the file if it is missing
   - `save_user(user_id, data)` — writes the JSON atomically (temp file + rename)
   - `next_bounty_id(user_id)` — increments and returns the next ID from that user's file
2. No locking is needed at v1 scale. Writing to a temp file in the same directory and renaming it over the original (e.g. with `os.replace`) ensures readers never see a half-written file. It does not prevent lost updates if two handlers read-modify-write the same user's file at the same time; that risk is accepted for v1.
### Phase 2: Rewrite commands.py

Simplified command set:

| Command | Where | Who | Description |
|---|---|---|---|
| `/bounty` | Group / DM | Anyone | List all bounties (group-scoped in a group, personal in DM) |
| `/add <text> [link] [due]` | Group | Anyone | Add a bounty to the group |
| `/add <text> [link] [due]` | DM | Anyone | Add a personal bounty |
| `/edit <id> [text] [link] [due]` | Group | Creator only | Edit a bounty |
| `/edit <id> [text] [link] [due]` | DM | Creator only | Edit a personal bounty |
| `/delete <id>` | Group | Creator only | Delete a bounty |
| `/delete <id>` | DM | Creator only | Delete a personal bounty |
| `/track <id>` | Group | Anyone | Track a group bounty |
| `/untrack <id>` | Group | Anyone | Untrack a bounty |
| `/my` | Group | Anyone | Show tracked group bounties |
| `/my` | DM | Anyone | Show tracked personal bounties |
| `/start` | Anywhere | Anyone | Re-initialize user |
| `/help` | Anywhere | Anyone | Show help |

**Removed commands:**

- `/admin_add`, `/admin_remove` — no admin model in v1
- Reminder-related logic — no cron in v1

### Phase 3: Simplify bot.py

- Remove the `Application.post_init` setup (no DB initialization needed)
- The bot starts instantly; JSON files are created on first use
- No migration logic

### Phase 4: Rewrite tests

- `test_commands.py` — keep (parsing is unchanged)
- `test_storage.py` — new; tests `load_user`, `save_user`, `next_bounty_id`
- Remove all DB-dependent tests (`test_db.py` is deleted)

### Phase 5: Cleanup

- Delete `db.py`, `schema.sql`, `cron.py`, `test_db.py`
- Delete `requirements-dev.txt` (dev dependencies move to `pyproject.toml`)
- Update the README to reflect the simplified commands

---

## Estimated effort

- Storage layer: ~80 lines
- Commands rewrite: ~200 lines (simpler than current)
- Tests: ~100 lines
- Cleanup: trivial

Total: ~1 day of work for one person.

---

## When to revert to SQLite

If any of these become true, SQLite is the right choice:

- Multiple concurrent users with write conflicts
- Need for complex queries (across all users, aggregations, etc.)
- Reminder system with proper deduplication
- Scale target > 1,000 users
- Need for ACID guarantees on concurrent writes

For a personal bot with < 100 users, JSON files are the right default.