
ADR-004: No Built-in LLM Calls

Date: 2026-03-14
Status: Accepted

Context

LLMs could add value to literature review in many ways:

  • Summarizing papers
  • Scoring relevance
  • Extracting key findings
  • Generating review narratives
  • Classifying papers by methodology

However, baking LLM calls into the tool has significant downsides.

Options Considered

  1. Built-in LLM integration — call OpenAI/Anthropic APIs for summarization, relevance scoring
     • Pro: powerful features out of the box
     • Con: API costs, non-deterministic, hard to verify, vendor lock-in, requires API keys

  2. LLM-ready structured output — produce clean JSON/markdown that LLMs can consume externally
     • Pro: deterministic core, user chooses their own LLM, verifiable data pipeline
     • Con: requires external tooling for LLM-powered features

  3. Optional LLM plugin — core is LLM-free, optional module adds LLM features
     • Pro: best of both worlds
     • Con: more complexity, still non-deterministic when enabled

Decision

Option 2: No built-in LLM calls. All output formats are designed for external LLM consumption.

Rationale

  • Determinism: Same search config should produce the same results every time. LLM outputs are inherently non-deterministic.
  • Verifiability: Every claim in a literature review should trace to a specific paper and DOI. LLM summaries can hallucinate.
  • Cost control: API calls add up. The user should control when and how they spend on LLM inference.
  • Flexibility: The user can use Claude, GPT, Llama, or any future model. No vendor lock-in.
  • Separation of concerns: Litseer is a data pipeline. LLM analysis is a downstream consumer.
  • Academic rigor: Researchers need to cite sources, not AI summaries. The tool should make citing easy, not replace it.

The tool compensates by making outputs maximally structured and machine-readable:

  • JSON with full metadata for every paper
  • Citation graph export (JSON, DOT, GraphML)
  • Structured markdown with DOI links
  • BibTeX for direct use in papers
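As an illustration, a per-paper JSON record in such an export might look like the following (field names are hypothetical, not litseer's actual schema):

```json
{
  "doi": "10.1000/xyz123",
  "title": "Example Paper",
  "authors": ["A. Author", "B. Author"],
  "year": 2024,
  "venue": "Example Conference",
  "cited_by": ["10.1000/abc456"],
  "tier": "B"
}
```

Because every field is explicit and every claim carries a DOI, an external LLM can consume this directly, and its output remains checkable against the source records.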

Consequences

  • Reference parsing (v0.3) must be rule-based (regex + heuristics), not LLM-powered
  • Quality classification uses deterministic tier rules, not LLM judgment
  • Users who want LLM features pipe litseer output to their own LLM toolchain
  • Documentation should include examples of how to use litseer output with LLMs
  • May revisit with an optional plugin system if there's strong demand (would be a new ADR)
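The rule-based parsing and deterministic tier classification mentioned above could be sketched roughly as follows (the regex and the tier thresholds are illustrative assumptions, not litseer's actual rules):

```python
import re

# Common DOI pattern: matches the vast majority of modern DOIs.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[-._;()/:A-Za-z0-9]+')

def extract_dois(reference_text: str) -> list[str]:
    """Pull DOI strings out of free-text references with a regex, no LLM."""
    return DOI_PATTERN.findall(reference_text)

def classify_tier(citation_count: int, peer_reviewed: bool) -> str:
    """Deterministic tier rules (thresholds here are hypothetical examples)."""
    if peer_reviewed and citation_count >= 100:
        return "A"
    if peer_reviewed:
        return "B"
    return "C"
```

The same input always yields the same output, so results are reproducible and any misclassification can be traced to a specific rule rather than to model behavior.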
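A downstream LLM workflow might, for instance, load litseer's JSON export and build a summarization prompt before handing it to whatever model the user prefers (the record fields shown are hypothetical, and no API call is made here):

```python
import json

# Hypothetical litseer JSON export (one record shown; real output would have many)
records = json.loads('[{"title": "Example Paper", "year": 2024, "doi": "10.1000/xyz123"}]')

def build_summary_prompt(papers: list[dict]) -> str:
    """Turn structured paper records into a prompt for any external LLM.
    Each line carries the DOI, so every cited claim stays traceable."""
    lines = ["Summarize the key findings of these papers. Cite each claim by DOI.", ""]
    for p in papers:
        lines.append(f"- {p['title']} ({p['year']}) doi:{p['doi']}")
    return "\n".join(lines)

prompt = build_summary_prompt(records)
```

The user can then send `prompt` to Claude, GPT, a local Llama, or any future model; the deterministic pipeline and the non-deterministic analysis stay cleanly separated.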