ADR-005: SQLite Response Cache

Date: 2026-03-14
Status: Accepted

Context

API calls to academic databases are:

  • Slow (rate-limited, network latency)
  • Potentially costly (OpenAlex may start charging)
  • Repetitive (same queries during iterative development/testing)
  • Not always available (offline work, API downtime)

We need a caching layer that works transparently across all source adapters.

Options Considered

  1. No cache: always hit the API
     • Pro: simplest, always fresh data
     • Con: slow, wasteful, breaks offline work

  2. File-based cache: one JSON file per request, hashed filename
     • Pro: simple, inspectable, no database dependency
     • Con: many small files, no expiry management, filesystem overhead

  3. SQLite cache: keyed by (base_url, path, params) with TTL
     • Pro: single file, stdlib, fast lookups, easy expiry, atomic writes
     • Con: slightly more complex than flat files

  4. DuckDB cache: use DuckDB for both cache and graph
     • Pro: single database technology
     • Con: adds DuckDB as a hard dependency even for basic search, overkill for key-value

Decision

SQLite for the response cache. Located at ~/.cache/litseer/responses.db.

Rationale

  • SQLite is in the Python stdlib — zero additional dependencies for basic search use
  • The cache is a simple key-value store with TTL — no analytical queries needed
  • Keeping cache as SQLite means users who only want basic search don't need DuckDB installed
  • 7-day TTL default balances freshness vs API savings (academic data doesn't change fast)
  • Cache integrates at the AsyncClient.get_json() level so all adapters benefit automatically
  • A --no-cache flag and the litseer cache stats and litseer cache clear commands give users control
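
The key-value-with-TTL design described above can be sketched with nothing but the standard library. This is a minimal illustration, not the project's actual code: the `ResponseCache` class, the `responses` table layout, and the `cached_get_json` helper are hypothetical names, and the real integration point (AsyncClient.get_json()) is asynchronous, whereas this sketch is synchronous for brevity.

```python
import json
import sqlite3
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # the 7-day default TTL, in seconds


class ResponseCache:
    """Hypothetical key-value cache with per-entry expiry."""

    def __init__(self, db_path=":memory:", ttl=SEVEN_DAYS):
        self.ttl = ttl
        self.conn = sqlite3.connect(db_path)
        # Assumed schema: one row per cached response.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses ("
            "key TEXT PRIMARY KEY, body TEXT NOT NULL, expires_at REAL NOT NULL)"
        )

    def get(self, key, now=None):
        """Return the cached payload, or None if it is missing or expired."""
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT body FROM responses WHERE key = ? AND expires_at > ?",
            (key, now),
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, key, payload, now=None):
        """Store a JSON-serializable payload; the write commits atomically."""
        now = time.time() if now is None else now
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO responses (key, body, expires_at) "
                "VALUES (?, ?, ?)",
                (key, json.dumps(payload), now + self.ttl),
            )

    def clear_expired(self, now=None):
        """Delete expired rows and return how many were removed."""
        now = time.time() if now is None else now
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM responses WHERE expires_at <= ?", (now,)
            )
        return cur.rowcount


def cached_get_json(cache, key, fetch, use_cache=True):
    """Read-through lookup; `fetch` is a hypothetical zero-argument callable."""
    if use_cache:
        hit = cache.get(key)
        if hit is not None:
            return hit
    payload = fetch()  # an exception here propagates, so nothing is stored
    if use_cache:
        cache.put(key, payload)
    return payload
```

Because the lookup wraps the fetch itself, an adapter built on it needs no cache-specific code, and passing `use_cache=False` mirrors what a --no-cache flag would do in this sketch.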

Consequences

  • Two database files: responses.db (SQLite, ephemeral) and graph.db (DuckDB, permanent)
  • Cache key is SHA-256 of (base_url, path, sorted params) — deterministic
  • Only successful responses are cached; errors are never stored
  • WAL mode is enabled for concurrent read/write safety
  • The full OpenAlex snapshot idea (300GB+ into Neo4j) is permanently shelved — the cache is the practical middle ground
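
To make the cache-key and WAL points concrete, here is a small sketch under stated assumptions. The ADR only says the key is a SHA-256 of (base_url, path, sorted params); the JSON serialization, the stringifying of parameter values, and the function names `cache_key` and `open_cache` are illustrative choices, not the project's confirmed implementation.

```python
import hashlib
import json
import sqlite3


def cache_key(base_url, path, params=None):
    """Hypothetical key: SHA-256 over (base_url, path, sorted params)."""
    # Sorting makes the key independent of parameter order; stringifying
    # values (an assumption) makes 1 and "1" hash to the same key.
    normalized = sorted((str(k), str(v)) for k, v in (params or {}).items())
    payload = json.dumps([base_url, path, normalized], separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def open_cache(db_path):
    """Open the database and request write-ahead logging."""
    conn = sqlite3.connect(db_path)
    # The PRAGMA returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode
```

With this scheme, `cache_key(url, "/works", {"search": "x", "page": 1})` and the same call with the parameters in the opposite order yield the same 64-character hex digest, which is what makes the key deterministic. WAL applies to on-disk databases; an in-memory SQLite database keeps its own journal mode.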