Files
jigaido/.github/ISSUE_TEMPLATE/v2-simplify-storage.md
2026-04-01 09:53:38 +00:00

7.0 KiB

Simplify Storage: Replace SQLite with Per-User JSON Files

Status

Proposed

Background

What happened

The SQLite-based storage layer (db.py) introduced several categories of complexity that outweigh its benefits at this stage:

  1. Connection management bugs — SQLite Python's row_factory disables implicit transaction handling. Combined with PRAGMA foreign_keys = ON, this caused ON CONFLICT UPDATE statements to silently fail to commit. The fix required setting conn.isolation_level = None directly on the connection object after creation. These are not obvious behaviors and took significant debugging time.

  2. Test fragility — The fresh_db fixture patches DB_PATH but the SQLite connection is a module-level singleton with connection-level state. Tests passed in isolation but failed under pytest's caching, and the root cause was subtle enough to require multiple iterations.

  3. Tracking table complexity — The user_bounty_tracking + reminder_log tables with dedup logic add non-trivial query complexity for what is essentially a "bookmark" feature.

  4. Schema migrations — Any schema change requires a migration script. For a personal bot with 2 users and 50 bounties, this overhead is disproportionate.

  5. Cron/reminder system — The daily reminder cron (cron.py) requires a separate process, scheduler (cron), and reminder_log table to prevent duplicate notifications. This is a significant operational surface for a v1.

Why it happened

The current design was over-engineered for the actual usage pattern:

  • Most commands are stateless (one request → one response)
  • The user is the primary (and likely only) user
  • Scale target is 10-100 users, not 10,000+
  • The bot is a personal project, not a production service

SQLite was chosen for "correctness" but at this scale, the correctness guarantees are irrelevant while the complexity is real.

Current state

The bot works and 53/53 tests pass. But db.py is ~300 lines with subtle connection semantics, schema.sql defines 7 tables, cron.py is a separate process, and the command layer (commands.py) is entangled with the DB layer.


Proposal

Replace SQLite with a per-user JSON file storage system.

Storage Design

data/
└── users/
    └── {telegram_user_id}.json    # one file per user

File structure (users/{id}.json):

{
  "user_id": 123,
  "username": "alice",
  "personal_bounties": [
    {
      "id": 1,
      "text": "Fix login bug",
      "link": "https://github.com/example/repo/issues/1",
      "due_date_ts": 1735689600,
      "created_at": 1735603200
    }
  ],
  "tracked_bounties": [
    {"bounty_id": 5, "group_id": -1001, "created_at": 1735600000},
    {"bounty_id": 3, "group_id": null, "created_at": 1735590000}
  ]
}

Key design decisions

  1. Single file per user — No group-level files. Personal bounties live in the creator's file. Group bounties live in the creator's file with group_id set.

  2. Bounty IDs are sequential integers per file — Not global. Each user's file has its own next_id counter. This avoids coordination between users at the cost of non-global IDs (acceptable for personal use).

  3. Cross-group tracking — When Alice (in Group A) tracks a bounty created by Bob in Group B, Alice's file stores {bounty_id: X, group_id: -100B}. To display it, the bot loads Bob's file and finds bounty X.

  4. No reminders in v1 — Drop the cron/reminder system entirely. The reminder_log table and cron.py are removed. Reminders can be added back as a v2 feature with a simpler design (e.g., just a "due soon" filter on /my).

  5. No admin model in v1 — Drop group_admins table. Group bounties are open to anyone in the group to add/edit/delete. The creator can be the only one who can modify (enforced by created_by_user_id check).

Deleted components

  • db.py — removed entirely
  • schema.sql — removed entirely
  • cron.py — removed entirely
  • reminder_log table — removed
  • user_bounty_tracking table — replaced by tracked_bounties in user JSON
  • groups table — removed (group_id stored directly in bounty objects)
  • group_admins table — removed (simplified permission model)

Retained components

  • bot.py — minimal entrypoint
  • commands.py — command parsing and reply logic (simplified)
  • tests/ — simplified to match new data model

Implementation Plan

Phase 1: Data model + storage layer

  1. Create storage.py with:

    • get_user_path(user_id) — returns Path to user's JSON
    • load_user(user_id) — reads and parses JSON, returns dict, creates file if missing
    • save_user(user_id, data) — writes JSON atomically (temp file + rename)
    • next_bounty_id(user_id) — increments and returns next ID for that user's file
  2. No locking needed at v1 scale. tempfile + rename gives atomic writes.

Phase 2: Rewrite commands.py

Simplified command set:

Command Where Who Description
/bounty Group / DM Anyone List all bounties (group-scoped in group, personal in DM)
/add <text> [link] [due> Group Anyone Add bounty to group
/add <text> [link] [due> DM Anyone Add personal bounty
/edit <id> [text] [link] [due> Group Creator only Edit bounty
/edit <id> [text] [link] [due> DM Creator only Edit personal bounty
/delete <id> Group Creator only Delete bounty
/delete <id> DM Creator only Delete personal bounty
/track <id> Group Anyone Track a group bounty
/untrack <id> Group Anyone Untrack a bounty
/my Group Anyone Show tracked group bounties
/my DM Anyone Show tracked personal bounties
/start Anywhere Anyone Re-initialize user
/help Anywhere Anyone Show help

Removed commands:

  • /admin_add, /admin_remove — no admin model in v1
  • Reminder-related logic — no cron in v1

Phase 3: Simplify bot.py

  • Remove Application.post_init setup (no DB init needed)
  • Bot starts instantly — JSON files created on first use
  • No migration logic

Phase 4: Rewrite tests

  • test_commands.py — keep (parsing is unchanged)
  • test_storage.py — new, tests load_user, save_user, next_bounty_id
  • Remove all DB-dependent tests (test_db.py deleted)

Phase 5: Cleanup

  • Delete db.py, schema.sql, cron.py, test_db.py
  • Delete requirements-dev.txt (dev deps in pyproject.toml)
  • Update README to reflect simplified commands

Estimated effort

  • Storage layer: ~80 lines
  • Commands rewrite: ~200 lines (simpler than current)
  • Tests: ~100 lines
  • Cleanup: trivial

Total: ~1 day of work for one person.


When to revert to SQLite

If any of these become true, SQLite is the right choice:

  • Multiple concurrent users with write conflicts
  • Need for complex queries (across all users, aggregations, etc.)
  • Reminder system with proper deduplication
  • Scale target > 1,000 users
  • Need for ACID guarantees on concurrent writes

For a personal bot with < 100 users, JSON files are the right default.