
ADR-004: No Built-in LLM Calls

Date: 2026-03-14
Status: Accepted

Context

LLMs could add value to literature review in many ways:

  • Summarizing papers
  • Scoring relevance
  • Extracting key findings
  • Generating review narratives
  • Classifying papers by methodology

However, baking LLM calls into the tool has significant downsides.

Options Considered

  1. Built-in LLM integration — call OpenAI/Anthropic APIs for summarization, relevance scoring
     • Pro: powerful features out of the box
     • Con: API costs, non-deterministic, hard to verify, vendor lock-in, requires API keys

  2. LLM-ready structured output — produce clean JSON/markdown that LLMs can consume externally
     • Pro: deterministic core, user chooses their own LLM, verifiable data pipeline
     • Con: requires external tooling for LLM-powered features

  3. Optional LLM plugin — core is LLM-free, optional module adds LLM features
     • Pro: best of both worlds
     • Con: more complexity, still non-deterministic when enabled

Decision

Option 2: No built-in LLM calls. All output formats are designed for external LLM consumption.

Rationale

  • Determinism: Same search config should produce the same results every time. LLM outputs are inherently non-deterministic.
  • Verifiability: Every claim in a literature review should trace to a specific paper and DOI. LLM summaries can hallucinate.
  • Cost control: API calls add up. The user should control when and how they spend on LLM inference.
  • Flexibility: The user can use Claude, GPT, Llama, or any future model. No vendor lock-in.
  • Separation of concerns: Litseer is a data pipeline. LLM analysis is a downstream consumer.
  • Academic rigor: Researchers need to cite sources, not AI summaries. The tool should make citing easy, not replace it.

The tool compensates by making outputs maximally structured and machine-readable:

  • JSON with full metadata for every paper
  • Citation graph export (JSON, DOT, GraphML)
  • Structured markdown with DOI links
  • BibTeX for direct use in papers
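As an illustration, a per-paper JSON record in such an export might look like the following (field names are hypothetical, not litseer's actual schema):

```json
{
  "doi": "10.1000/xyz123",
  "title": "Example Paper",
  "authors": ["A. Author", "B. Author"],
  "year": 2024,
  "venue": "Example Conference",
  "cited_by": ["10.1000/abc456"],
  "tier": "B"
}
```

Because every field is explicit and every claim carries a DOI, an external LLM can consume this directly, and its output remains checkable against the source records.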

Consequences

  • Reference parsing (v0.3) must be rule-based (regex + heuristics), not LLM-powered
  • Quality classification uses deterministic tier rules, not LLM judgment
  • Users who want LLM features pipe litseer output to their own LLM toolchain
  • Documentation should include examples of how to use litseer output with LLMs
  • May revisit with an optional plugin system if there's strong demand (would be a new ADR)
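The rule-based parsing and deterministic tier classification mentioned above could be sketched roughly as follows (the regex and the tier thresholds are illustrative assumptions, not litseer's actual rules):

```python
import re

# Common DOI pattern: matches the vast majority of modern DOIs.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[-._;()/:A-Za-z0-9]+')

def extract_dois(reference_text: str) -> list[str]:
    """Pull DOI strings out of free-text references with a regex, no LLM."""
    return DOI_PATTERN.findall(reference_text)

def classify_tier(citation_count: int, peer_reviewed: bool) -> str:
    """Deterministic tier rules (thresholds here are hypothetical examples)."""
    if peer_reviewed and citation_count >= 100:
        return "A"
    if peer_reviewed:
        return "B"
    return "C"
```

The same input always yields the same output, so results are reproducible and any misclassification can be traced to a specific rule rather than to model behavior.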
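A downstream LLM workflow might, for instance, load litseer's JSON export and build a summarization prompt before handing it to whatever model the user prefers (the record fields shown are hypothetical, and no API call is made here):

```python
import json

# Hypothetical litseer JSON export (one record shown; real output would have many)
records = json.loads('[{"title": "Example Paper", "year": 2024, "doi": "10.1000/xyz123"}]')

def build_summary_prompt(papers: list[dict]) -> str:
    """Turn structured paper records into a prompt for any external LLM.
    Each line carries the DOI, so every cited claim stays traceable."""
    lines = ["Summarize the key findings of these papers. Cite each claim by DOI.", ""]
    for p in papers:
        lines.append(f"- {p['title']} ({p['year']}) doi:{p['doi']}")
    return "\n".join(lines)

prompt = build_summary_prompt(records)
```

The user can then send `prompt` to Claude, GPT, a local Llama, or any future model; the deterministic pipeline and the non-deterministic analysis stay cleanly separated.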