shoko/jigaido

shokollm 8647f2f4b8 Add issue template for v2 storage simplification

2026-04-01 09:53:38 +00:00

Simplify Storage: Replace SQLite with Per-User JSON Files

Status

Proposed

Background

What happened

The SQLite-based storage layer (db.py) introduced several categories of complexity that outweigh its benefits at this stage:

Connection management bugs — The connection setup (row_factory plus PRAGMA foreign_keys = ON) interacted badly with the implicit transaction handling of Python's sqlite3 module: ON CONFLICT ... DO UPDATE statements silently failed to commit. The fix was to set conn.isolation_level = None, which puts the connection in autocommit mode, directly on the connection object after creation. None of this is obvious behavior, and it took significant debugging time.
Test fragility — The fresh_db fixture patches DB_PATH but the SQLite connection is a module-level singleton with connection-level state. Tests passed in isolation but failed under pytest's caching, and the root cause was subtle enough to require multiple iterations.
Tracking table complexity — The user_bounty_tracking + reminder_log tables with dedup logic add non-trivial query complexity for what is essentially a "bookmark" feature.
Schema migrations — Any schema change requires a migration script. For a personal bot with 2 users and 50 bounties, this overhead is disproportionate.
Cron/reminder system — The daily reminder cron (cron.py) requires a separate process, scheduler (cron), and reminder_log table to prevent duplicate notifications. This is a significant operational surface for a v1.
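
For readers unfamiliar with the connection management item above, here is a minimal sketch of what isolation_level = None does in Python's sqlite3 module. It is not the project's db.py: the users table and the temporary database path are made up for illustration. It only shows that, in autocommit mode, an upsert is visible to a second connection without an explicit commit().

```python
import os
import sqlite3
import tempfile

# Hypothetical database file and table, for illustration only.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None puts the connection in autocommit mode:
# each statement is committed as soon as it finishes.
writer = sqlite3.connect(db_path, isolation_level=None)
writer.row_factory = sqlite3.Row
writer.execute("PRAGMA foreign_keys = ON")
writer.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")

upsert = (
    "INSERT INTO users (id, username) VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET username = excluded.username"
)
writer.execute(upsert, (123, "alice"))
writer.execute(upsert, (123, "alice2"))  # conflicts on id, so it updates

# A second connection sees the update even though commit() was never called.
reader = sqlite3.connect(db_path)
row = reader.execute("SELECT username FROM users WHERE id = 123").fetchone()
print(row[0])  # alice2

reader.close()
writer.close()
```

With the module's default isolation level, the same writes would sit in an open transaction until commit() is called, which is one plausible way for updates to appear to vanish.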

Why it happened

The current design was over-engineered for the actual usage pattern:

Most commands are stateless (one request → one response)
The author is the primary (and likely only) user
Scale target is 10-100 users, not 10,000+
The bot is a personal project, not a production service

SQLite was chosen for "correctness" but at this scale, the correctness guarantees are irrelevant while the complexity is real.

Current state

The bot works and 53/53 tests pass. But db.py is ~300 lines with subtle connection semantics, schema.sql defines 7 tables, cron.py is a separate process, and the command layer (commands.py) is entangled with the DB layer.

Proposal

Replace SQLite with a per-user JSON file storage system.

Storage Design

data/
└── users/
    └── {telegram_user_id}.json    # one file per user

File structure (users/{id}.json):

{
  "user_id": 123,
  "username": "alice",
  "next_id": 2,
  "personal_bounties": [
    {
      "id": 1,
      "text": "Fix login bug",
      "link": "https://github.com/example/repo/issues/1",
      "due_date_ts": 1735689600,
      "created_at": 1735603200
    }
  ],
  "tracked_bounties": [
    {"bounty_id": 5, "group_id": -1001, "created_at": 1735600000},
    {"bounty_id": 3, "group_id": null, "created_at": 1735590000}
  ]
}
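
As a reading aid, the file structure above could be described in Python roughly as follows. This is a sketch, not part of the proposal: the type names and new_user_doc are hypothetical, the next_id field comes from the counter mentioned under the key design decisions, and treating link and due_date_ts as optional is inferred from the `[link]` and `[due]` arguments of `/add`.

```python
import json
from typing import List, Optional, TypedDict


class PersonalBounty(TypedDict):
    id: int
    text: str
    link: Optional[str]          # assumed optional, as in /add
    due_date_ts: Optional[int]   # Unix timestamp in seconds; assumed optional
    created_at: int


class TrackedBounty(TypedDict):
    bounty_id: int
    group_id: Optional[int]      # null in the example; presumably a personal bounty
    created_at: int


class UserDoc(TypedDict):
    user_id: int
    username: str
    next_id: int                 # per-file ID counter from the design decisions
    personal_bounties: List[PersonalBounty]
    tracked_bounties: List[TrackedBounty]


def new_user_doc(user_id: int, username: str) -> UserDoc:
    """Return the empty document a first-time user might start with."""
    return {
        "user_id": user_id,
        "username": username,
        "next_id": 1,
        "personal_bounties": [],
        "tracked_bounties": [],
    }


doc = new_user_doc(123, "alice")
print(json.dumps(doc, indent=2))
```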

Key design decisions

Single file per user — No group-level files. Personal bounties live in the creator's file. Group bounties live in the creator's file with group_id set.
Bounty IDs are sequential integers per file — Not global. Each user's file has its own next_id counter. This avoids coordination between users at the cost of non-global IDs (acceptable for personal use).
Cross-group tracking — When Alice (in Group A) tracks a bounty created by Bob in Group B, Alice's file stores {bounty_id: X, group_id: -100B}. To display it, the bot loads Bob's file and finds bounty X.
No reminders in v1 — Drop the cron/reminder system entirely. The reminder_log table and cron.py are removed. Reminders can be added back as a v2 feature with a simpler design (e.g., just a "due soon" filter on /my).
No admin model in v1 — Drop the group_admins table. Anyone in the group can add a group bounty; only its creator can edit or delete it (enforced by a created_by_user_id check).
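
The cross-group tracking decision can be sketched as a lookup across user documents. Two things here are assumptions, not part of the proposal. First, the tracked entry as specified holds only bounty_id and group_id, so the sketch adds a hypothetical creator_id field to tell the bot whose file to open. Second, group bounties are assumed to sit in the same personal_bounties list with a group_id field. An in-memory dict stands in for the JSON files.

```python
from typing import Dict, List

# In-memory stand-in for data/users/{id}.json, keyed by user ID.
# "creator_id" is a hypothetical field; see the assumptions above.
FILES: Dict[int, dict] = {
    1: {  # Bob, who created bounty 5 in group -1002
        "user_id": 1,
        "personal_bounties": [
            {"id": 5, "text": "Review PR", "group_id": -1002, "created_by_user_id": 1},
        ],
        "tracked_bounties": [],
    },
    2: {  # Alice, who tracks Bob's bounty plus one Bob has since deleted
        "user_id": 2,
        "personal_bounties": [],
        "tracked_bounties": [
            {"bounty_id": 5, "group_id": -1002, "creator_id": 1},
            {"bounty_id": 9, "group_id": -1002, "creator_id": 1},
        ],
    },
}


def resolve_tracked(user_id: int) -> List[dict]:
    """Look up each tracked entry in its creator's document; skip stale entries."""
    resolved = []
    for entry in FILES[user_id]["tracked_bounties"]:
        creator_doc = FILES.get(entry["creator_id"])
        if creator_doc is None:
            continue  # creator's file no longer exists
        for bounty in creator_doc["personal_bounties"]:
            same_id = bounty["id"] == entry["bounty_id"]
            same_group = bounty.get("group_id") == entry["group_id"]
            if same_id and same_group:
                resolved.append(bounty)
                break
    return resolved


print([b["text"] for b in resolve_tracked(2)])  # ['Review PR']
```

Because bounty IDs are only unique within one file, some way of identifying the creator's file is needed for this lookup to be unambiguous.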

Deleted components

db.py — removed entirely
schema.sql — removed entirely
cron.py — removed entirely
reminder_log table — removed
user_bounty_tracking table — replaced by tracked_bounties in user JSON
groups table — removed (group_id stored directly in bounty objects)
group_admins table — removed (simplified permission model)

Retained components

bot.py — minimal entrypoint
commands.py — command parsing and reply logic (simplified)
tests/ — simplified to match new data model

Implementation Plan

Phase 1: Data model + storage layer

Create storage.py with:
- get_user_path(user_id) — returns Path to user's JSON
- load_user(user_id) — reads and parses JSON, returns dict, creates file if missing
- save_user(user_id, data) — writes JSON atomically (temp file + rename)
- next_bounty_id(user_id) — increments and returns next ID for that user's file
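No locking is needed at v1 scale. Writing to a temp file and renaming it over the target makes each write atomic, so a reader never sees a half-written file. It does not stop two concurrent read-modify-write cycles from overwriting each other, which is acceptable at this scale.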
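
A minimal sketch of what storage.py could look like under these requirements. The four function names come from the list above; DATA_DIR, the default document, and the top-level next_id field are assumptions made for illustration (the sketch writes to a temporary directory so it can be run anywhere).

```python
import json
import os
import tempfile
from pathlib import Path

# Assumption: a module-level data directory. The proposal only fixes the
# layout data/users/{telegram_user_id}.json.
DATA_DIR = Path(tempfile.mkdtemp()) / "data" / "users"


def get_user_path(user_id: int) -> Path:
    return DATA_DIR / f"{user_id}.json"


def save_user(user_id: int, data: dict) -> None:
    """Write to a temp file in the target directory, then rename it into place."""
    path = get_user_path(user_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        # Atomic as long as temp file and target are on the same filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_user(user_id: int) -> dict:
    """Read the user's document, creating an empty one on first use."""
    path = get_user_path(user_id)
    if not path.exists():
        data = {
            "user_id": user_id,
            "username": None,
            "next_id": 1,
            "personal_bounties": [],
            "tracked_bounties": [],
        }
        save_user(user_id, data)
        return data
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def next_bounty_id(user_id: int) -> int:
    """Return the next free ID for this user's file and advance the counter."""
    data = load_user(user_id)
    bounty_id = data["next_id"]
    data["next_id"] = bounty_id + 1
    save_user(user_id, data)
    return bounty_id


first = next_bounty_id(123)
second = next_bounty_id(123)
print(first, second)  # 1 2
```

The temp file is created in the same directory as the target on purpose: a rename across filesystems is not atomic.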

Phase 2: Rewrite commands.py

Simplified command set:

Command	Where	Who	Description
`/bounty`	Group / DM	Anyone	List all bounties (group-scoped in group, personal in DM)
`/add <text> [link] [due]`	Group	Anyone	Add bounty to group
`/add <text> [link] [due]`	DM	Anyone	Add personal bounty
`/edit <id> [text] [link] [due]`	Group	Creator only	Edit bounty
`/edit <id> [text] [link] [due]`	DM	Creator only	Edit personal bounty
`/delete <id>`	Group	Creator only	Delete bounty
`/delete <id>`	DM	Creator only	Delete personal bounty
`/track <id>`	Group	Anyone	Track a group bounty
`/untrack <id>`	Group	Anyone	Untrack a bounty
`/my`	Group	Anyone	Show tracked group bounties
`/my`	DM	Anyone	Show tracked personal bounties
`/start`	Anywhere	Anyone	Re-initialize user
`/help`	Anywhere	Anyone	Show help
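
The "Creator only" rows in the table could be enforced with a check along these lines. This is a sketch: created_by_user_id is the field named in the design decisions, while delete_bounty and NotCreatorError are hypothetical names.

```python
from typing import List


class NotCreatorError(Exception):
    """Raised when someone other than the creator tries to modify a bounty."""


def delete_bounty(bounties: List[dict], bounty_id: int, caller_id: int) -> List[dict]:
    """Return a new list without the bounty, or raise if the caller may not delete it."""
    for bounty in bounties:
        if bounty["id"] == bounty_id:
            if bounty["created_by_user_id"] != caller_id:
                raise NotCreatorError(f"bounty {bounty_id} belongs to another user")
            return [b for b in bounties if b["id"] != bounty_id]
    raise KeyError(bounty_id)  # no bounty with that ID


bounties = [
    {"id": 1, "text": "Fix login bug", "created_by_user_id": 123},
    {"id": 2, "text": "Write docs", "created_by_user_id": 456},
]

remaining = delete_bounty(bounties, 1, caller_id=123)
print([b["id"] for b in remaining])  # [2]

try:
    delete_bounty(bounties, 2, caller_id=123)
except NotCreatorError as exc:
    print("refused:", exc)
```

The same check would apply to `/edit`; only the action taken after it differs.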

Removed commands:

/admin_add, /admin_remove — no admin model in v1
Reminder-related logic — no cron in v1
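
With the cron process gone, the "due soon" filter on `/my` that the key design decisions mention as a possible v2 replacement could be a pure function over the bounty list. The sketch below is one possible shape, not a committed design: the name due_soon, the three-day default window, and the choice to exclude overdue bounties are all assumptions.

```python
import time
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def due_soon(bounties: List[dict], now: Optional[int] = None,
             window_days: int = 3) -> List[dict]:
    """Return bounties due between now and now + window_days, soonest first."""
    if now is None:
        now = int(time.time())
    cutoff = now + window_days * SECONDS_PER_DAY
    upcoming = [
        b for b in bounties
        if b.get("due_date_ts") is not None and now <= b["due_date_ts"] <= cutoff
    ]
    return sorted(upcoming, key=lambda b: b["due_date_ts"])


bounties = [
    {"id": 1, "text": "Fix login bug", "due_date_ts": 1735689600},
    {"id": 2, "text": "Write docs", "due_date_ts": None},  # no due date
    {"id": 3, "text": "Refactor", "due_date_ts": 1735689600 + 30 * SECONDS_PER_DAY},
]

# "now" is fixed to one day before bounty 1 is due so the result is reproducible.
soon = due_soon(bounties, now=1735603200)
print([b["id"] for b in soon])  # [1]
```

Since the filter runs only when a user sends `/my`, it needs no scheduler and no deduplication log.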

Phase 3: Simplify bot.py

Remove Application.post_init setup (no DB init needed)
Bot starts instantly — JSON files created on first use
No migration logic

Phase 4: Rewrite tests

test_commands.py — keep (parsing is unchanged)
test_storage.py — new, tests load_user, save_user, next_bounty_id
Remove all DB-dependent tests (test_db.py deleted)
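
One way test_storage.py might avoid the fixture problems described under "Test fragility" is to pass the data directory explicitly instead of patching a module-level path, so no state survives between tests. The sketch below uses plain functions and the standard library only. The base_dir parameter is an assumption (the proposed signatures take only user_id), and save_user is simplified here to a direct write rather than temp file plus rename.

```python
import json
import tempfile
from pathlib import Path


def load_user(base_dir: Path, user_id: int) -> dict:
    """Read a user's document from base_dir, or return an empty default."""
    path = base_dir / f"{user_id}.json"
    if not path.exists():
        return {"user_id": user_id, "next_id": 1,
                "personal_bounties": [], "tracked_bounties": []}
    return json.loads(path.read_text(encoding="utf-8"))


def save_user(base_dir: Path, user_id: int, data: dict) -> None:
    """Simplified, non-atomic write; enough for demonstrating test isolation."""
    base_dir.mkdir(parents=True, exist_ok=True)
    (base_dir / f"{user_id}.json").write_text(json.dumps(data), encoding="utf-8")


def test_round_trip():
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "users"
        doc = load_user(base, 123)
        doc["personal_bounties"].append({"id": 1, "text": "Fix login bug"})
        save_user(base, 123, doc)
        assert load_user(base, 123) == doc


def test_users_are_isolated():
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "users"
        save_user(base, 1, {"user_id": 1, "personal_bounties": [{"id": 1}]})
        # User 2 has no file yet, so it gets a fresh default document.
        assert load_user(base, 2)["personal_bounties"] == []


test_round_trip()
test_users_are_isolated()
print("ok")
```

Each test gets its own temporary directory, so the order in which tests run does not matter.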

Phase 5: Cleanup

Delete db.py, schema.sql, cron.py, test_db.py
Delete requirements-dev.txt (dev deps in pyproject.toml)
Update README to reflect simplified commands

Estimated effort

Storage layer: ~80 lines
Commands rewrite: ~200 lines (simpler than current)
Tests: ~100 lines
Cleanup: trivial

Total: ~1 day of work for one person.

When to revert to SQLite

If any of these becomes true, SQLite is the right choice:

Multiple concurrent users with write conflicts
Need for complex queries (across all users, aggregations, etc.)
Reminder system with proper deduplication
Scale target > 1,000 users
Need for ACID guarantees on concurrent writes

For a personal bot with < 100 users, JSON files are the right default.
