ADR-006: Multi-Source Federation as Core Design
Date: 2026-03-14 Status: Accepted
Context
Academic literature is scattered across many databases, each with different coverage, APIs, and strengths. No single source covers all of the aerospace/engineering literature. Researchers typically search 3-5 databases manually and deduplicate the results by hand.
A competitive analysis (2026-03-14) found no other open-source tool that combines multi-source federation, deduplication, snowballing, and quality scoring in a single automated pipeline.
Options Considered
- Single-source wrapper: build on OpenAlex only (largest open database)
  - Pro: simplest, one API to maintain
  - Con: misses NTRS, IEEE, SKYbrary, and other domain-specific content
- Multi-source federation: unified interface across many sources
  - Pro: best coverage, dedup catches cross-source duplicates
  - Con: more adapters to maintain, API differences to abstract
- Meta-search via Google Scholar: scrape or otherwise use GS as the universal index
  - Pro: broadest coverage
  - Con: no official API, scraping is fragile and against ToS
Decision
Multi-source federation with a pluggable adapter architecture. Each source
implements the SearchSource protocol. The orchestrator fans out searches
in parallel and deduplicates results.
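The fan-out step can be illustrated with a minimal sketch. This ADR does not spell out the SearchSource signature, so the `name` attribute, the async `search(query, limit)` method, the `Paper` record, the `StaticSource` stand-in adapter, and `federated_search` are all assumptions made for illustration, not the project's actual API. Deduplication is left out of this sketch.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Paper:
    # Hypothetical minimal record; the real schema is not given in this ADR.
    title: str
    doi: Optional[str]
    source: str


class SearchSource(Protocol):
    # Assumed protocol shape: a source name plus one async search method.
    name: str

    async def search(self, query: str, limit: int) -> list[Paper]: ...


class StaticSource:
    """Stand-in adapter that returns canned results instead of calling an API."""

    def __init__(self, name: str, papers: list[Paper]) -> None:
        self.name = name
        self._papers = papers

    async def search(self, query: str, limit: int) -> list[Paper]:
        hits = [p for p in self._papers if query.lower() in p.title.lower()]
        return hits[:limit]


async def federated_search(
    sources: list[SearchSource], query: str, limit: int = 10
) -> list[Paper]:
    # Query every source concurrently. With return_exceptions=True a failing
    # source yields an exception object instead of aborting the whole search.
    results = await asyncio.gather(
        *(s.search(query, limit) for s in sources), return_exceptions=True
    )
    merged: list[Paper] = []
    for result in results:
        if isinstance(result, BaseException):
            continue  # skip the failed source, keep the rest
        merged.extend(result)
    return merged
```

Because `asyncio.gather` returns results in the order the sources were passed, the merged list is deterministic for a given source order. Calling `asyncio.run(federated_search(sources, "flutter"))` with any objects that satisfy the assumed protocol is enough to exercise it.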
Current Sources
| Source | Coverage | Citation Walking | API Key |
|---|---|---|---|
| OpenAlex | 250M+ works, broad | Full (forward + backward) | Free (polite pool) |
| Semantic Scholar | 225M+ works, CS-heavy | Full | Free (rate-limited) |
| CrossRef | 150M+ DOIs, metadata | Backward only | Free (polite pool) |
| NASA NTRS | 500K+ aerospace reports | None (no graph API) | Free |
| IEEE Xplore | EE/CS journals, conferences | None | Required |
| AIAA | Aerospace journals (via CrossRef) | Backward only | Free |
| SAE | Automotive/aerospace (via CrossRef) | Backward only | Free |
| SKYbrary | Aviation safety knowledge base | Approximate (wiki links) | Free |
Future Source Candidates
- Lens.org — patent + scholarly combined, CC-BY data, free tier
- Dimensions.ai — grants + clinical trials + policy linkage, free tier
- DTIC — US DoD technical reports (restricted access)
- arXiv — preprints, important for CS/physics crossover
Rationale
- Aerospace/engineering research spans journal papers (OpenAlex, CrossRef), conference proceedings (IEEE, AIAA), government reports (NTRS), and safety databases (SKYbrary) — no single source covers all
- The adapter pattern makes adding new sources low-cost (~150 LOC each)
- Cross-source dedup is a core value proposition — the same paper may appear in OpenAlex, CrossRef, and IEEE with slightly different metadata
- OpenAlex now charges for usage beyond its $1/day free allowance, and federating across sources reduces dependency on any single API's pricing decisions
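To make the cross-source dedup point concrete, here is one possible matching strategy: match on a normalized DOI first, then fall back to a normalized title. The ADR does not describe the project's actual dedup algorithm, so the function names, the dict-based record shape (`title`, `doi`, `source`), and the example DOI are hypothetical. Title-only matching can merge distinct papers that share a title, so a real implementation would likely add year or author checks.

```python
import re
from typing import Optional


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    # DOIs are case-insensitive; strip common URL and "doi:" prefixes.
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)
    return doi or None


def title_key(title: str) -> str:
    # Lowercase and collapse punctuation and whitespace so that minor
    # formatting differences between sources do not block a match.
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records: list[dict]) -> list[dict]:
    index: dict[str, dict] = {}  # match key -> merged record
    merged: list[dict] = []
    for rec in records:
        keys = [f"title:{title_key(rec['title'])}"]
        doi = normalize_doi(rec.get("doi"))
        if doi:
            keys.insert(0, f"doi:{doi}")  # prefer the DOI match when present
        hit = next((index[k] for k in keys if k in index), None)
        if hit is None:
            hit = {"title": rec["title"], "doi": doi, "sources": []}
            merged.append(hit)
        if rec["source"] not in hit["sources"]:
            hit["sources"].append(rec["source"])
        if hit["doi"] is None:
            hit["doi"] = doi  # backfill a DOI learned from a later source
        for k in keys:
            index.setdefault(k, hit)
    return merged


records = [
    {"title": "Flutter Analysis of Composite Wings",
     "doi": "https://doi.org/10.1234/EXAMPLE.1", "source": "openalex"},
    {"title": "Flutter analysis of composite wings.",
     "doi": "10.1234/example.1", "source": "crossref"},
    {"title": "Flutter Analysis of Composite Wings", "doi": None, "source": "ieee"},
]
unique = deduplicate(records)
```

In this made-up example the three records collapse into one entry whose `sources` list records every database that returned it, which is also useful input for later quality scoring.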
Consequences
- Each adapter must implement the SearchSource protocol
- Response cache reduces API call costs across all sources
- Source-specific quirks (rate limits, date formats, ID schemes) are isolated inside each adapter
- Quality tier classification normalizes venue types across sources
- New sources can be added without changing core search/dedup/export logic
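As a rough illustration of the response-cache consequence, the sketch below shows a small time-to-live cache keyed on source name and normalized query, sitting in front of any adapter call. The `ResponseCache` class, its `get_or_fetch` method, and the TTL value are hypothetical; the ADR does not say how the project's cache is keyed, stored, or expired.

```python
import time
from typing import Callable


class ResponseCache:
    """In-memory TTL cache; a real cache would probably persist to disk."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[tuple[str, str], tuple[float, list]] = {}

    def get_or_fetch(
        self, source: str, query: str, fetch: Callable[[], list]
    ) -> list:
        # Key on the source plus a normalized query so trivially different
        # spellings of the same query share one cached response.
        key = (source, query.strip().lower())
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh hit: no API call is made
        result = fetch()
        self._entries[key] = (now, result)
        return result
```

Keeping the cache outside the adapters, as assumed here, means a new source gets caching without any extra code, which fits the goal of adding sources without touching core logic.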