Simplify Storage: Replace SQLite with Per-User JSON Files
Status
Proposed
Background
What happened
The SQLite-based storage layer (`db.py`) introduced several categories of complexity that outweigh its benefits at this stage:

- Connection management bugs: with a `row_factory` set and `PRAGMA foreign_keys = ON`, `INSERT ... ON CONFLICT ... DO UPDATE` (upsert) statements silently failed to commit. The likely culprit is Python's `sqlite3` transaction handling. By default the module implicitly opens a transaction before `INSERT`/`UPDATE`/`DELETE` statements, and nothing is persisted until `commit()` is called. The fix required setting `conn.isolation_level = None` (autocommit mode) directly on the connection object after creation. None of this is obvious, and it took significant debugging time.
- Test fragility: the `fresh_db` fixture patches `DB_PATH`, but the SQLite connection is a module-level singleton with connection-level state. Tests passed in isolation but failed when pytest reused the cached connection across tests. The root cause was subtle enough to require multiple iterations.
- Tracking table complexity: the `user_bounty_tracking` and `reminder_log` tables, with their dedup logic, add non-trivial query complexity for what is essentially a "bookmark" feature.
- Schema migrations: any schema change requires a migration script. For a personal bot with 2 users and 50 bounties, this overhead is disproportionate.
- Cron/reminder system: the daily reminder cron (`cron.py`) requires a separate process, a scheduler (cron), and the `reminder_log` table to prevent duplicate notifications. This is a significant operational surface for a v1.
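The connection-management pitfall described above can be reproduced in isolation. The sketch below is a standalone demonstration, not code from `db.py`. It assumes Python 3.6+ with the default (legacy) transaction control, where `sqlite3` implicitly issues `BEGIN` before an `INSERT` unless `isolation_level` is `None`. The `tracking` table and its columns are hypothetical.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Hypothetical upsert against a hypothetical table (requires SQLite >= 3.24).
UPSERT = (
    "INSERT INTO tracking (user_id, bounty_id) VALUES (1, 5) "
    "ON CONFLICT(user_id) DO UPDATE SET bounty_id = excluded.bounty_id"
)


def upsert_is_visible(isolation_level):
    """Run one upsert without calling commit() and report whether a
    second connection can see the row."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        # Create the schema on a separate connection and commit it explicitly.
        with closing(sqlite3.connect(path)) as setup:
            setup.execute(
                "CREATE TABLE tracking (user_id INTEGER PRIMARY KEY, bounty_id INTEGER)"
            )
            setup.commit()

        with closing(sqlite3.connect(path, isolation_level=isolation_level)) as writer:
            writer.execute("PRAGMA foreign_keys = ON")
            writer.execute(UPSERT)  # no writer.commit(): this mirrors the bug
            with closing(sqlite3.connect(path)) as reader:
                (count,) = reader.execute("SELECT COUNT(*) FROM tracking").fetchone()
        # Closing the writer without commit() discards its open transaction.
        return count == 1


print(upsert_is_visible("DEFERRED"))  # False: implicit BEGIN, never committed
print(upsert_is_visible(None))        # True: autocommit, visible immediately
```

With `isolation_level=None` each statement commits on its own, which is why setting it made the upserts stick.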
Why it happened
The current design was over-engineered for the actual usage pattern:
- Most commands are stateless (one request → one response)
- The user is the primary (and likely only) user
- Scale target is 10-100 users, not 10,000+
- The bot is a personal project, not a production service
SQLite was chosen for "correctness" but at this scale, the correctness guarantees are irrelevant while the complexity is real.
Current state
The bot works and 53/53 tests pass. But `db.py` is ~300 lines with subtle connection semantics, `schema.sql` defines 7 tables, `cron.py` is a separate process, and the command layer (`commands.py`) is entangled with the DB layer.
Proposal
Replace SQLite with a JSON file storage system — one directory per group or DM user.
Storage Design
```text
data/
├── {group_id}/
│   ├── group.json        # group bounties (all bounties in this group)
│   └── {user_id}.json    # user tracking within this group (which bounty IDs they track)
└── {user_id}/
    └── user.json         # user's personal bounties (DM: only this user)
```
Bot context lookup:
| Context | Entry point |
|---|---|
| In group (`chat_id = -100123`) | `data/-100123/group.json` |
| In DM (`chat_id = 123`) | `data/123/user.json` |
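The lookup in the table above can be written as a pure function from `chat_id` to a path. This sketch rests on two assumptions that the proposal does not state. First, the data root is `data/` relative to the working directory. Second, the sign of `chat_id` distinguishes groups (negative) from DMs (positive), as in the example IDs; a real handler could check the chat type instead. `entry_point` and `tracking_path` are hypothetical names.

```python
from pathlib import Path

DATA_DIR = Path("data")  # assumed data root


def entry_point(chat_id: int, data_dir: Path = DATA_DIR) -> Path:
    """Return the bounty file for a chat: group.json for groups, user.json for DMs."""
    # Assumption: group chat IDs are negative, DM chat IDs are positive.
    filename = "group.json" if chat_id < 0 else "user.json"
    return data_dir / str(chat_id) / filename


def tracking_path(group_id: int, user_id: int, data_dir: Path = DATA_DIR) -> Path:
    """Return the per-group tracking file for one user."""
    return data_dir / str(group_id) / f"{user_id}.json"


print(entry_point(-100123).as_posix())         # data/-100123/group.json
print(entry_point(123).as_posix())             # data/123/user.json
print(tracking_path(-100123, 456).as_posix())  # data/-100123/456.json
```

Under that sign assumption, a group directory and a DM directory can never share a name, even though both sit directly under `data/`.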
File: `data/{group_id}/group.json` (group bounties):

```json
{
  "group_id": -100123,
  "next_id": 2,
  "bounties": [
    {
      "id": 1,
      "created_by_user_id": 456,
      "text": "Fix login bug",
      "link": "https://github.com/example/repo/issues/1",
      "due_date_ts": 1735689600,
      "created_at": 1735603200
    }
  ]
}
```
File: `data/{group_id}/{user_id}.json` (user tracking in a group):

```json
{
  "user_id": 456,
  "tracked": [1, 5, 9]
}
```
File: `data/{user_id}/user.json` (user's personal bounties, DM):

```json
{
  "user_id": 123,
  "next_id": 2,
  "bounties": [
    {
      "id": 1,
      "text": "Fix login bug",
      "link": "https://github.com/example/repo/issues/1",
      "due_date_ts": 1735689600,
      "created_at": 1735603200
    }
  ]
}
```
Key design decisions
- Group/DM as directory: `chat_id` is the gateway. Group → `data/{group_id}/group.json`. DM → `data/{user_id}/user.json`. No scanning needed.
- Tracking is per-group-per-user: `data/{group_id}/{user_id}.json` stores the list of bounty IDs this user tracks in this group. Simple, isolated.
- No cross-group access: group bounties live only in that group's file. A member of Group A cannot see or track Group B's bounties.
- Bounty IDs are sequential integers per group, not global: each `group.json` has its own `next_id` counter.
- No reminders in v1: drop the cron/reminder system entirely. The `reminder_log` table and `cron.py` are removed.
- No admin model in v1: anyone in the group can add bounties. Only the bounty creator can edit or delete a bounty (enforced by a `created_by_user_id` check).
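The ID and permission rules above can be sketched as operations on an already-loaded `group.json` dict. This is a sketch under stated assumptions. The document carries a top-level `next_id` counter starting at 1. The function names and the use of `PermissionError`/`KeyError` are illustrative, not an API from the proposal. Disk I/O is left out.

```python
import time


def add_bounty(group_doc, user_id, text, link=None, due_date_ts=None):
    """Append a bounty using the group's own next_id counter; return its ID."""
    bounty_id = group_doc.get("next_id", 1)
    group_doc["next_id"] = bounty_id + 1
    group_doc["bounties"].append({
        "id": bounty_id,
        "created_by_user_id": user_id,
        "text": text,
        "link": link,
        "due_date_ts": due_date_ts,
        "created_at": int(time.time()),
    })
    return bounty_id


def _require_creator(group_doc, bounty_id, user_id):
    """Return the bounty if user_id created it; raise otherwise."""
    for bounty in group_doc["bounties"]:
        if bounty["id"] == bounty_id:
            if bounty["created_by_user_id"] != user_id:
                raise PermissionError("only the creator can modify this bounty")
            return bounty
    raise KeyError(f"no bounty #{bounty_id}")


def edit_bounty(group_doc, bounty_id, user_id, text=None, link=None, due_date_ts=None):
    bounty = _require_creator(group_doc, bounty_id, user_id)
    # Only overwrite the fields the caller actually supplied.
    for key, value in (("text", text), ("link", link), ("due_date_ts", due_date_ts)):
        if value is not None:
            bounty[key] = value


def delete_bounty(group_doc, bounty_id, user_id):
    bounty = _require_creator(group_doc, bounty_id, user_id)
    group_doc["bounties"].remove(bounty)
```

Because deleting never decrements `next_id`, IDs are not reused. A stale ID left in someone's tracking file therefore cannot silently point at a different bounty.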
Deleted components
- `db.py`: removed entirely
- `schema.sql`: removed entirely
- `cron.py`: removed entirely
- `reminder_log` table: removed
- `user_bounty_tracking` table: replaced by the `tracked` list in `data/{group_id}/{user_id}.json`
- `groups` table: removed (the group is identified by its directory and the `group_id` field in `group.json`)
- `group_admins` table: removed (simplified permission model)
Retained components
- `bot.py`: minimal entrypoint
- `commands.py`: command parsing and reply logic (simplified)
- `tests/`: simplified to match the new data model
Implementation Plan
Phase 1: Data model + storage layer
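- Create `storage.py` with:
  - `get_user_path(user_id)`: returns the `Path` to the user's JSON file
  - `load_user(user_id)`: reads and parses the JSON, returns a dict, and creates the file if missing
  - `save_user(user_id, data)`: writes JSON atomically (temp file + rename)
  - `next_bounty_id(user_id)`: increments and returns the next ID for that user's file
- No locking is needed at v1 scale. Writing to a `tempfile` in the same directory and renaming it over the target gives atomic writes, so a crash never leaves a half-written file. It does not prevent lost updates if two handlers read-modify-write the same file concurrently; that is acceptable while updates are handled one at a time.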
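A minimal sketch of the Phase 1 functions follows, under several assumptions. The extra `data_dir` parameter is hypothetical; it is added so tests can pass a temporary directory instead of patching a module-level constant. The `next_id` field name is also assumed. The rename step uses `os.replace`, which overwrites an existing target and, per the Python docs, is atomic on POSIX when both paths are on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path

DATA_DIR = Path("data")  # assumed data root


def get_user_path(user_id, data_dir=DATA_DIR):
    return Path(data_dir) / str(user_id) / "user.json"


def save_user(user_id, data, data_dir=DATA_DIR):
    """Write JSON to a temp file in the target directory, then rename it over the target."""
    path = get_user_path(user_id, data_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_user(user_id, data_dir=DATA_DIR):
    """Read the user's file, creating it with an empty document if missing."""
    path = get_user_path(user_id, data_dir)
    if not path.exists():
        data = {"user_id": user_id, "next_id": 1, "bounties": []}
        save_user(user_id, data, data_dir)
        return data
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def next_bounty_id(user_id, data_dir=DATA_DIR):
    """Allocate the next ID from the file's counter and persist the increment."""
    data = load_user(user_id, data_dir)
    bounty_id = data.get("next_id", 1)
    data["next_id"] = bounty_id + 1
    save_user(user_id, data, data_dir)
    return bounty_id
```

`next_bounty_id` is a read-modify-write. Two concurrent calls could hand out the same ID, which is exactly the case the "no locking" decision accepts at v1 scale.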
Phase 2: Rewrite commands.py
Simplified command set:
| Command | Where | Who | Description |
|---|---|---|---|
| `/bounty` | Group / DM | Anyone | List all bounties (group-scoped in a group, personal in a DM) |
| `/add <text> [link] [due]` | Group | Anyone | Add a bounty to the group |
| `/add <text> [link] [due]` | DM | Anyone | Add a personal bounty |
| `/edit <id> [text] [link] [due]` | Group | Creator only | Edit a bounty |
| `/edit <id> [text] [link] [due]` | DM | Creator only | Edit a personal bounty |
| `/delete <id>` | Group | Creator only | Delete a bounty |
| `/delete <id>` | DM | Creator only | Delete a personal bounty |
| `/track <id>` | Group | Anyone | Track a group bounty |
| `/untrack <id>` | Group | Anyone | Untrack a bounty |
| `/my` | Group | Anyone | Show tracked group bounties |
| `/my` | DM | Anyone | Show tracked personal bounties |
| `/start` | Anywhere | Anyone | Re-initialize user |
| `/help` | Anywhere | Anyone | Show help |
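In a group, `/track`, `/untrack` and `/my` reduce to list operations on the user's tracking file and the group's `group.json`. The sketch below works on already-loaded dicts, and the function names are hypothetical. Three behaviours are assumed rather than specified by the proposal: tracking twice is a no-op, tracking an unknown ID is rejected, and `/my` silently skips IDs whose bounty has since been deleted.

```python
def track(tracking_doc, group_doc, bounty_id):
    """Add bounty_id to the user's tracked list; return False if already tracked."""
    if not any(b["id"] == bounty_id for b in group_doc["bounties"]):
        raise KeyError(f"no bounty #{bounty_id} in this group")
    if bounty_id in tracking_doc["tracked"]:
        return False
    tracking_doc["tracked"].append(bounty_id)
    return True


def untrack(tracking_doc, bounty_id):
    """Remove bounty_id from the tracked list; return False if it was not tracked."""
    if bounty_id not in tracking_doc["tracked"]:
        return False
    tracking_doc["tracked"].remove(bounty_id)
    return True


def my_bounties(tracking_doc, group_doc):
    """Return tracked bounties that still exist, in the order they were tracked."""
    by_id = {b["id"]: b for b in group_doc["bounties"]}
    return [by_id[i] for i in tracking_doc["tracked"] if i in by_id]
```

Skipping stale IDs at read time avoids having to rewrite every member's tracking file when a bounty is deleted.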
Removed commands:
- `/admin_add`, `/admin_remove`: no admin model in v1
- Reminder-related logic: no cron in v1
Phase 3: Simplify bot.py
- Remove the `Application.post_init` setup (no DB init needed)
- Bot starts instantly; JSON files are created on first use
- No migration logic
Phase 4: Rewrite tests
- `test_commands.py`: keep (parsing is unchanged)
- `test_storage.py`: new; tests `load_user`, `save_user`, `next_bounty_id`
- Remove all DB-dependent tests (`test_db.py` deleted)
Phase 5: Cleanup
- Delete `db.py`, `schema.sql`, `cron.py`, `test_db.py`
- Delete `requirements-dev.txt` (dev dependencies live in `pyproject.toml`)
- Update README to reflect the simplified commands
Estimated effort
- Storage layer: ~80 lines
- Commands rewrite: ~200 lines (simpler than current)
- Tests: ~100 lines
- Cleanup: trivial
Total: ~1 day of work for one person.
When to revert to SQLite
If any of these become true, SQLite is the right choice:
- Multiple concurrent users with write conflicts
- Need for complex queries (across all users, aggregations, etc.)
- Reminder system with proper deduplication
- Scale target > 1,000 users
- Need for ACID guarantees on concurrent writes
For a personal bot with < 100 users, JSON files are the right default.