ADR-006: Multi-Source Federation as Core Design

Date: 2026-03-14
Status: Accepted

Context

Academic literature is scattered across many databases, each with different coverage, APIs, and strengths. No single source covers all of the aerospace/engineering literature. Researchers typically search 3-5 databases manually and deduplicate by hand.

Competitive analysis (2026-03-14) confirmed that no other open-source tool combines multi-source federation + dedup + snowballing + quality scoring in a single automated pipeline.

Options Considered

  1. Single-source wrapper: build on OpenAlex only (largest open database)
     • Pro: simplest, one API to maintain
     • Con: misses NTRS, IEEE, SKYbrary, domain-specific content

  2. Multi-source federation: unified interface across many sources
     • Pro: best coverage, dedup catches cross-source duplicates
     • Con: more adapters to maintain, API differences to abstract

  3. Meta-search via Google Scholar: scrape/use GS as the universal index
     • Pro: broadest coverage
     • Con: no official API, scraping is fragile and against ToS

Decision

Multi-source federation with a pluggable adapter architecture. Each source implements the SearchSource protocol. The orchestrator fans out searches in parallel and deduplicates results.
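The decision above can be illustrated with a minimal sketch. This ADR names the SearchSource protocol but does not show its signature, so the method shape (`name`, async `search(query, limit)`), the `Paper` record, the `StaticSource` stand-in, and `federated_search` are all assumptions made for illustration. The dedup step here is deliberately naive (DOI or lowercased title).

```python
import asyncio
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Paper:
    """Hypothetical normalized record; field names are assumptions."""
    title: str
    doi: Optional[str] = None
    source: str = ""


class SearchSource(Protocol):
    """Assumed protocol shape: a name plus one async search method."""
    name: str

    async def search(self, query: str, limit: int) -> list[Paper]: ...


class StaticSource:
    """Stand-in adapter returning canned results instead of calling an API."""

    def __init__(self, name: str, papers: list[Paper]) -> None:
        self.name = name
        self._papers = papers

    async def search(self, query: str, limit: int) -> list[Paper]:
        return self._papers[:limit]


async def federated_search(
    sources: list[SearchSource], query: str, limit: int = 20
) -> list[Paper]:
    # Fan out to every source concurrently; a failing source is skipped
    # rather than aborting the whole search.
    batches = await asyncio.gather(
        *(s.search(query, limit) for s in sources), return_exceptions=True
    )
    merged: list[Paper] = []
    seen: set[str] = set()
    for batch in batches:
        if isinstance(batch, BaseException):
            continue
        for paper in batch:
            # Naive dedup key: DOI if present, else the lowercased title.
            key = (paper.doi or paper.title).lower()
            if key not in seen:
                seen.add(key)
                merged.append(paper)
    return merged
```

Because the orchestrator depends only on the protocol, a new adapter can be passed in the `sources` list without touching `federated_search`.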

Current Sources

| Source | Coverage | Citation Walking | API Key |
|---|---|---|---|
| OpenAlex | 250M+ works, broad | Full (forward + backward) | Free (polite pool) |
| Semantic Scholar | 225M+ works, CS-heavy | Full | Free (rate-limited) |
| CrossRef | 150M+ DOIs, metadata | Backward only | Free (polite pool) |
| NASA NTRS | 500K+ aerospace reports | None (no graph API) | Free |
| IEEE Xplore | EE/CS journals, conferences | None | Required |
| AIAA | Aerospace journals (via CrossRef) | Backward only | Free |
| SAE | Automotive/aerospace (via CrossRef) | Backward only | Free |
| SKYbrary | Aviation safety knowledge base | Approximate (wiki links) | Free |
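One way the citation-walking column could be represented in code is as capability flags that a snowballing step consults before choosing sources. The following is a sketch only: `CitationWalk`, `CAPABILITIES`, and `sources_for` are hypothetical names, and "Full" is read as forward plus backward for both sources listed that way. SKYbrary is left out because its approximate wiki-link walking does not map cleanly onto either direction.

```python
from enum import Flag, auto


class CitationWalk(Flag):
    """Hypothetical capability flags mirroring the table's third column."""
    NONE = 0
    BACKWARD = auto()   # can list a paper's references
    FORWARD = auto()    # can list papers citing it
    FULL = BACKWARD | FORWARD


# Values transcribed from the table; the mapping itself is an assumption.
CAPABILITIES: dict[str, CitationWalk] = {
    "OpenAlex": CitationWalk.FULL,
    "Semantic Scholar": CitationWalk.FULL,
    "CrossRef": CitationWalk.BACKWARD,
    "NASA NTRS": CitationWalk.NONE,
    "IEEE Xplore": CitationWalk.NONE,
    "AIAA": CitationWalk.BACKWARD,
    "SAE": CitationWalk.BACKWARD,
}


def sources_for(direction: CitationWalk) -> list[str]:
    """Return the sources that support walking in the given direction."""
    return [name for name, caps in CAPABILITIES.items() if direction in caps]
```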

Future Source Candidates

  • Lens.org — patent + scholarly combined, CC-BY data, free tier
  • Dimensions.ai — grants + clinical trials + policy linkage, free tier
  • DTIC — US DoD technical reports (restricted access)
  • arXiv — preprints, important for CS/physics crossover

Rationale

  • Aerospace/engineering research spans journal papers (OpenAlex, CrossRef), conference proceedings (IEEE, AIAA), government reports (NTRS), and safety databases (SKYbrary) — no single source covers all
  • The adapter pattern makes adding new sources low-cost (~150 LOC each)
  • Cross-source dedup is a core value proposition — the same paper may appear in OpenAlex, CrossRef, and IEEE with slightly different metadata
  • OpenAlex now charges for usage beyond a $1/day free allowance, so federating across sources reduces dependency on any single API's pricing decisions
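The cross-source dedup point can be made concrete with a small sketch. It assumes a two-stage match (normalized DOI first, then normalized title plus year) and a simple gap-filling merge. The `Record` shape and the function names are hypothetical, and the actual matching rules used by the pipeline are not described in this ADR.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """Hypothetical record shape; real adapters may carry more fields."""
    title: str
    year: Optional[int] = None
    doi: Optional[str] = None
    sources: set[str] = field(default_factory=set)


def normalize_doi(doi: str) -> str:
    # DOIs are case-insensitive and often arrive with a resolver prefix.
    doi = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)


def title_key(title: str, year: Optional[int]) -> str:
    # Collapse case, punctuation, and whitespace so minor variations match.
    words = re.sub(r"[^a-z0-9]+", " ", title.lower()).split()
    return f"{' '.join(words)}|{year}"


def dedup(records: list[Record]) -> list[Record]:
    by_doi: dict[str, Record] = {}
    by_title: dict[str, Record] = {}
    merged: list[Record] = []
    for rec in records:
        doi = normalize_doi(rec.doi) if rec.doi else None
        tkey = title_key(rec.title, rec.year)
        existing = (by_doi.get(doi) if doi else None) or by_title.get(tkey)
        if existing is None:
            existing = Record(rec.title, rec.year, doi, set(rec.sources))
            merged.append(existing)
        else:
            # Fill gaps from the duplicate and remember where it was seen.
            existing.doi = existing.doi or doi
            existing.year = existing.year or rec.year
            existing.sources |= rec.sources
        if existing.doi:
            by_doi[existing.doi] = existing
        by_title[tkey] = existing
    return merged
```

A record that lacks both a DOI and a year only matches another record that also lacks a year, so this sketch errs toward keeping near-duplicates rather than merging distinct papers.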

Consequences

  • Each adapter must implement the SearchSource protocol
  • Response cache reduces API call costs across all sources
  • Source-specific quirks (rate limits, date formats, ID schemes) are isolated inside each adapter
  • Quality tier classification normalizes venue types across sources
  • New sources can be added without changing core search/dedup/export logic
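As an illustration of how a response cache might sit in front of any adapter without changing core logic, here is a sketch of a wrapper that exposes the same assumed `search(query, limit)` method as the source it wraps. `CachedSource`, the in-memory dict, the TTL default, and the injectable clock are all assumptions; the ADR does not say how the real cache is stored or expired.

```python
import time
from typing import Any, Callable


class CachedSource:
    """Wraps any object that has a name and an async search(query, limit).

    Hypothetical helper: class name, TTL, and in-memory storage are assumptions.
    """

    def __init__(
        self,
        inner: Any,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.name = inner.name
        self._inner = inner
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[tuple[str, int], tuple[float, list]] = {}

    async def search(self, query: str, limit: int) -> list:
        key = (query, limit)
        hit = self._store.get(key)
        if hit is not None and self._clock() - hit[0] < self._ttl:
            return hit[1]  # fresh entry: no upstream API call is made
        results = await self._inner.search(query, limit)
        self._store[key] = (self._clock(), results)
        return results
```

Because the wrapper has the same method shape as the adapter it wraps, the orchestrator can treat cached and uncached sources identically.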