ADR-005: SQLite Response Cache

Date: 2026-03-14
Status: Accepted

Context

API calls to academic databases are:

- Slow (rate-limited, network latency)
- Potentially costly (OpenAlex may start charging)
- Repetitive (same queries during iterative development/testing)
- Not always available (offline work, API downtime)

We need a caching layer that works transparently across all source adapters.

Options Considered

- No cache: always hit the API
  - Pro: simplest, always fresh data
  - Con: slow, wasteful, breaks offline work
- File-based cache: one JSON file per request, hashed filename
  - Pro: simple, inspectable, no database dependency
  - Con: many small files, no expiry management, filesystem overhead
- SQLite cache: keyed by (base_url, path, params) with TTL
  - Pro: single file, stdlib, fast lookups, easy expiry, atomic writes
  - Con: slightly more complex than flat files
- DuckDB cache: use DuckDB for both cache and graph
  - Pro: single database technology
  - Con: adds DuckDB as a hard dependency even for basic search, overkill for key-value storage

Decision

Use SQLite for the response cache, stored at ~/.cache/litseer/responses.db.
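To make the decision concrete, here is a minimal sketch of opening such a cache file with the stdlib `sqlite3` module. Only the file path comes from this ADR; the `responses` table, its column names, and the `open_cache` helper are hypothetical and chosen purely for illustration.

```python
import sqlite3
from pathlib import Path

# Path taken from the decision above; everything else here is an assumption.
DEFAULT_CACHE_PATH = Path.home() / ".cache" / "litseer" / "responses.db"

# Hypothetical table layout: one row per cached request.
SCHEMA = """
CREATE TABLE IF NOT EXISTS responses (
    key       TEXT PRIMARY KEY,   -- digest identifying the request
    body      TEXT NOT NULL,      -- JSON-encoded response
    stored_at REAL NOT NULL       -- Unix timestamp, used for TTL checks
)
"""


def open_cache(path: Path = DEFAULT_CACHE_PATH) -> sqlite3.Connection:
    """Open the cache database at `path`, creating the file and table if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn
```

Calling `open_cache()` with no arguments would create the file at the default location; passing a temporary path keeps experiments away from a real cache.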

Rationale

- SQLite ships with the Python stdlib (the sqlite3 module), so basic search needs zero additional dependencies
- The cache is a simple key-value store with TTL; no analytical queries are needed
- Keeping the cache in SQLite means users who only want basic search don't need DuckDB installed
- The 7-day default TTL balances freshness against API savings (academic data doesn't change fast)
- The cache integrates at the AsyncClient.get_json() level, so all adapters benefit automatically
- A --no-cache flag and the litseer cache stats/clear commands give users control
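The TTL check and the transparent wrapping described in the rationale could look roughly like the sketch below. It is not the actual `AsyncClient.get_json()` code: `make_cache`, `lookup`, `store`, and `cached_get_json` are hypothetical names, the table layout is assumed, and `use_cache=False` stands in for what a `--no-cache` flag might do (here it skips both reading and writing, which is also an assumption).

```python
import asyncio
import json
import sqlite3
import time
from typing import Any, Awaitable, Callable, Optional

DEFAULT_TTL_SECONDS = 7 * 24 * 60 * 60  # 7-day default from the rationale


def make_cache() -> sqlite3.Connection:
    """In-memory stand-in for responses.db (hypothetical table layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE responses (key TEXT PRIMARY KEY, body TEXT, stored_at REAL)"
    )
    return conn


def lookup(conn: sqlite3.Connection, key: str,
           ttl: float = DEFAULT_TTL_SECONDS,
           now: Optional[float] = None) -> Optional[Any]:
    """Return the cached value, or None if it is missing or older than `ttl`."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT body, stored_at FROM responses WHERE key = ?", (key,)
    ).fetchone()
    if row is None or now - row[1] > ttl:
        return None
    return json.loads(row[0])


def store(conn: sqlite3.Connection, key: str, value: Any,
          now: Optional[float] = None) -> None:
    """Insert or overwrite the entry for `key`, stamping it with the current time."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO responses (key, body, stored_at) VALUES (?, ?, ?)",
        (key, json.dumps(value), now),
    )
    conn.commit()


async def cached_get_json(conn: sqlite3.Connection, key: str,
                          fetch: Callable[[], Awaitable[Any]],
                          use_cache: bool = True) -> Any:
    """Serve from the cache when possible; otherwise call `fetch` and store the result.

    If `fetch` raises, nothing is stored, so failed requests are never cached.
    """
    if use_cache:
        hit = lookup(conn, key)
        if hit is not None:
            return hit
    value = await fetch()
    if use_cache:
        store(conn, key, value)
    return value
```

Because the wrapper only needs a key and an awaitable fetch function, an adapter would not have to know the cache exists, which is the point of integrating at the client level.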

Consequences

- There are two database files: responses.db (SQLite, ephemeral) and graph.db (DuckDB, permanent)
- The cache key is the SHA-256 hash of (base_url, path, sorted params), so identical requests always map to the same key
- Only successful responses are cached; errors are not
- The database runs in WAL mode for concurrent read/write safety
- The full OpenAlex snapshot idea (300GB+ loaded into Neo4j) is permanently shelved; the cache is the practical middle ground
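As an illustration of the key derivation and the WAL setting listed above, the following sketch hashes a canonical JSON encoding of the request parts and switches a connection to WAL mode. The serialization format and the helper names `cache_key` and `enable_wal` are assumptions; the ADR only states that the key is a SHA-256 over (base_url, path, sorted params).

```python
import hashlib
import json
import sqlite3
from typing import Any, Mapping


def cache_key(base_url: str, path: str, params: Mapping[str, Any]) -> str:
    """Return the SHA-256 hex digest of (base_url, path, sorted params)."""
    # Sorting by parameter name makes the key independent of argument order.
    canonical = json.dumps([base_url, path, sorted(params.items())], default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def enable_wal(conn: sqlite3.Connection) -> str:
    """Ask SQLite to use write-ahead logging and return the mode now in effect."""
    return conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
```

Two calls with the same parameters in a different order produce the same key. `enable_wal` returns `"wal"` for a file-backed database; an in-memory database reports `"memory"` instead, since WAL does not apply there.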