# ADR-004: No Built-in LLM Calls
Date: 2026-03-14
Status: Accepted
## Context
LLMs could add value to literature review in many ways:

- Summarizing papers
- Scoring relevance
- Extracting key findings
- Generating review narratives
- Classifying papers by methodology
However, baking LLM calls into the tool has significant downsides.
## Options Considered
1. Built-in LLM integration — call OpenAI/Anthropic APIs for summarization and relevance scoring
    - Pro: powerful features out of the box
    - Con: API costs, non-deterministic, hard to verify, vendor lock-in, requires API keys
2. LLM-ready structured output — produce clean JSON/markdown that LLMs can consume externally
    - Pro: deterministic core, user chooses their own LLM, verifiable data pipeline
    - Con: requires external tooling for LLM-powered features
3. Optional LLM plugin — core is LLM-free, an optional module adds LLM features
    - Pro: best of both worlds
    - Con: more complexity, still non-deterministic when enabled
## Decision
Option 2: no built-in LLM calls. All output formats are designed for external LLM consumption.
## Rationale
- Determinism: Same search config should produce the same results every time. LLM outputs are inherently non-deterministic.
- Verifiability: Every claim in a literature review should trace to a specific paper and DOI. LLM summaries can hallucinate.
- Cost control: API calls add up. The user should control when and how they spend on LLM inference.
- Flexibility: The user can use Claude, GPT, Llama, or any future model. No vendor lock-in.
- Separation of concerns: Litseer is a data pipeline. LLM analysis is a downstream consumer.
- Academic rigor: Researchers need to cite sources, not AI summaries. The tool should make citing easy, not replace it.
The tool compensates by making outputs maximally structured and machine-readable:

- JSON with full metadata for every paper
- Citation graph export (JSON, DOT, GraphML)
- Structured markdown with DOI links
- BibTeX for direct use in papers
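To illustrate the "downstream consumer" pattern, here is a minimal sketch of turning a structured JSON export into a citation-grounded prompt for whatever LLM the user chooses. The schema (`papers`, `title`, `doi`, etc.) is illustrative, not litseer's actual export format:

```python
import json

# Hypothetical litseer-style JSON export (field names are illustrative,
# not the tool's actual schema).
export = json.loads("""
{
  "papers": [
    {
      "title": "Example Paper on Topic X",
      "authors": ["A. Author", "B. Author"],
      "year": 2024,
      "doi": "10.1000/example.doi",
      "abstract": "A short abstract."
    }
  ]
}
""")

def build_prompt(papers):
    """Turn structured paper records into a prompt that keeps every
    claim traceable to a specific DOI."""
    lines = ["Summarize each paper in one sentence, citing its DOI:", ""]
    for p in papers:
        lines.append(f"- {p['title']} ({p['year']}) doi:{p['doi']}")
        lines.append(f"  Abstract: {p['abstract']}")
    return "\n".join(lines)

print(build_prompt(export["papers"]))
```

Because the DOI travels with every record, the LLM step stays verifiable: any summary it produces can be checked against the cited source.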
## Consequences
- Reference parsing (v0.3) must be rule-based (regex + heuristics), not LLM-powered
- Quality classification uses deterministic tier rules, not LLM judgment
- Users who want LLM features pipe litseer output to their own LLM toolchain
- Documentation should include examples of how to use litseer output with LLMs
- May revisit with an optional plugin system if there's strong demand (would be a new ADR)
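As a rough sketch of what rule-based reference parsing might look like, the snippet below extracts a DOI and publication year from a free-text reference string with regexes. The patterns are illustrative, not litseer's actual implementation:

```python
import re

# Illustrative regexes for rule-based reference parsing (regex + heuristics).
# DOIs start with "10." followed by a registrant code and a suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
YEAR_RE = re.compile(r"\((19|20)\d{2}\)")

def parse_reference(ref: str) -> dict:
    """Extract DOI and year from a free-text reference string.
    Deterministic: the same input always yields the same output."""
    doi = DOI_RE.search(ref)
    year = YEAR_RE.search(ref)
    return {
        "doi": doi.group(0) if doi else None,
        "year": int(year.group(0).strip("()")) if year else None,
        "raw": ref,
    }

ref = "Smith, J. (2021). A study of things. https://doi.org/10.1234/abcd.5678"
print(parse_reference(ref))
```

Unlike an LLM-powered parser, this approach is deterministic and auditable: a failed extraction returns `None` rather than a hallucinated value.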