Litseer: Literature Positioning and Motivation¶
The Problem¶
Systematic literature review is a core research skill taught in every graduate program. Librarians have codified best practices for decades. Formal methodologies exist — PRISMA (Page et al. 2021), Wohlin snowballing (2014), Cochrane protocols, Campbell Collaboration guidelines. Yet no widely-adopted open-source tool implements these methodologies as reproducible software.
The current state of practice:
- Researchers query databases manually — typing keywords into Google Scholar, Scopus, or Web of Science one at a time
- They track results in spreadsheets — copy-pasting titles and DOIs
- They draw PRISMA flow diagrams by hand — in PowerPoint, after the fact
- They snowball by following reference lists — clicking through PDFs
- The process is not reproducible — a different researcher running the same review will get different results depending on which databases they searched, what terms they used, and when they ran the queries
This is a well-documented problem. Biocić et al. (2019) found that search strategies in published systematic reviews are not reproducible. Ioannidis (2016) documented the "mass production of redundant, misleading, and conflicted systematic reviews." The methodology exists; the implementation does not.
The Gap¶
What exists: databases¶
Google Scholar, Scopus, Web of Science, OpenAlex, Semantic Scholar, PubMed, and dozens of domain-specific databases (IEEE Xplore, NASA NTRS, AIAA Arc) provide search infrastructure. They are the equivalent of PostgreSQL — they store and index papers. They do not implement research methodology.
Gusenbauer & Haddaway (2020) evaluated 28 academic search systems and concluded that no single source is sufficient for systematic reviews. Their recommendation: search multiple databases with documented strategies. This finding validates multi-source federation as an architectural requirement.
What exists: screening tools¶
| Tool | What it does | What it doesn't do |
|---|---|---|
| Rayyan (Ouzzani et al. 2016, 22.9K citations) | Inclusion/exclusion screening | Search |
| Covidence | Full SR workflow (commercial, $300/yr) | Open-source, reproducible search |
| ASReview (van de Schoot et al. 2021, 935 citations) | ML-prioritized screening | Multi-source search |
| DistillerSR | Commercial SR platform | Open-source anything |
These tools solve the step after search — helping researchers decide which papers to include. None of them automate the search itself.
What exists: bibliometric analysis¶
| Tool | What it does | What it doesn't do |
|---|---|---|
| VOSviewer (van Eck & Waltman) | Citation network visualization | Search or data collection |
| Bibliometrix R (Aria & Cuccurullo 2017) | Bipartite matrices, co-citation analysis | Multi-source search |
| CiteSpace | Temporal citation analysis | Automated data collection |
| CitNetExplorer (van Eck & Waltman 2014) | Citation network exploration | Search federation |
These tools analyze citation networks after the data has been collected. They assume the researcher has already gathered a corpus.
What doesn't exist¶
No tool combines:

- Automated multi-source search across heterogeneous academic databases
- Systematic citation snowballing following Wohlin (2014) methodology
- Cross-source deduplication (DOI + title normalization)
- Local citation graph accumulation that grows over time
- Bibliometric network analysis using established methods
- Deterministic, reproducible execution from declarative configuration
- PRISMA-compatible reporting with auditable pipeline metadata
This is the gap litseer fills.
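The cross-source deduplication step (DOI + title normalization) can be sketched as follows. This is a minimal illustration, not litseer's actual implementation; the record fields (`doi`, `title`) are assumed, and real-world matching needs more care (e.g. DOI prefixes, subtitle variants).

```python
import re
import unicodedata

def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    t = unicodedata.normalize("NFKD", title)
    t = "".join(c for c in t if not unicodedata.combining(c))
    t = re.sub(r"[^a-z0-9 ]", " ", t.lower())
    return " ".join(t.split())

def dedupe(records):
    """Keep the first record per DOI (preferred) or per normalized title."""
    seen, unique = set(), []
    for rec in records:
        doi = (rec.get("doi") or "").lower().strip()
        key = ("doi", doi) if doi else ("title", normalize_title(rec.get("title", "")))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

DOI matching is case-insensitive by convention, which is why both keys are lowercased before comparison.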
Competitive Landscape (2025-2026)¶
A richer ecosystem of partial solutions exists beyond the academic tool papers. None combines all the capabilities litseer targets, but several solve individual pieces well.
Citation graph visualization (closest to litseer's vision)¶
| Tool | Strength | Missing |
|---|---|---|
| Connected Papers | Beautiful 2D graph from a single seed, bibliographic coupling | No multi-source, no CLI, no batch, single-seed only |
| Research Rabbit | Iterative citation chaining, collection-based (acquired by Litmaps 2025) | Cloud-only, no reproducible configs, no BibTeX pipeline |
| Litmaps | Timeline + citation visualization, monitoring alerts | Freemium ($10/mo), no CLI, no multi-database federation |
| Inciteful | Paper Discovery network, Literature Connector (bridge finder) | Web-only, no export pipeline, no batch/portfolio |
| Citation Gecko | Upload BibTeX, find missing papers via citation overlap | Abandoned (~2022), web-only |
arXiv Bibliographic Explorer and Influence Flower (both arXivLabs partners) add citation overlays directly on arXiv pages — lightweight but single-source.
These tools solve visual exploration but none is open source, CLI-first, multi-database, or config-driven.
Systematic review automation tools¶
| Tool | Strength | Missing |
|---|---|---|
| Polyglot Search (SR Accelerator) | Translates one PubMed query to 7+ database syntaxes | Generates query strings, doesn't execute them |
| SPARK (2024) | Automated collection + filtering + extraction scaffolding | Medical focus, early stage |
| ASReview | ML-prioritized screening, Nature Machine Intelligence paper | Screening only, not search |
| Rayyan | AI-powered inclusion/exclusion screening | Post-search tool, doesn't collect papers |
| Covidence | Full SR workflow, widely used in medical fields | Commercial ($300/yr), no CLI, medical focus |
| SR Toolbox | Directory of 235 tools (as of Dec 2024) | It's a directory, not a tool |
The pattern is clear: the ecosystem is heavily weighted toward screening (deciding which papers to include after you already have them) rather than search (systematically finding papers in the first place).
Reference management (researcher daily-drivers)¶
| Tool | Strength | Missing |
|---|---|---|
| Zotero + Better BibTeX | Best open-source reference manager, excellent BibTeX/LaTeX integration | No automated search, manual citation chasing only, no multi-source federation |
| Mendeley | PDF management, social features | Elsevier-owned, limited export, no automation |
| Paperpile | Clean Google Docs/Drive integration | Commercial, no CLI, no batch |
Zotero with Better BibTeX is the closest to litseer's ethos (open source, researcher-controlled). Litseer is designed to complement Zotero: search results export to BibTeX that Zotero can import, and `existing_bib_path` prevents re-discovering papers already in the researcher's library.
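In outline, the `existing_bib_path` filter could work like this: extract DOIs from the existing `.bib` file and drop search results that match. This is an illustrative sketch, not litseer's actual implementation; a production version would use a real BibTeX parser rather than a regex.

```python
import re

# Matches BibTeX fields like: doi = {10.1145/2601248.2601268} or doi = "..."
DOI_RE = re.compile(r'doi\s*=\s*[{"]([^}"]+)[}"]', re.IGNORECASE)

def known_dois(bib_text: str) -> set[str]:
    """Collect lowercased DOIs from BibTeX `doi` fields."""
    return {m.group(1).strip().lower() for m in DOI_RE.finditer(bib_text)}

def filter_new(results, bib_text):
    """Drop search results whose DOI already appears in the library."""
    seen = known_dois(bib_text)
    return [r for r in results if (r.get("doi") or "").lower() not in seen]
```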
Bibliometric analysis software¶
| Tool | Strength | Missing |
|---|---|---|
| VOSviewer (van Eck & Waltman) | Gold standard for co-citation/coupling visualization | Manual data import, no search, Java desktop app |
| Bibliometrix R (Aria & Cuccurullo) | Comprehensive R package, bipartite matrices | R-only, no search automation, no CLI pipeline |
| CiteSpace | Temporal citation analysis, burst detection | Java, manual data import |
| CitNetExplorer (van Eck & Waltman) | Citation network drill-down | Manual data import, desktop-only |
Litseer's `networks.py` implements the same bipartite matrix pattern as Bibliometrix R and the same normalization as VOSviewer/CitNetExplorer, but integrated into an automated search pipeline rather than a standalone analysis tool.
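The bipartite matrix pattern can be sketched with scipy sparse matrices: rows are citing papers, columns are cited references, and co-citation and bibliographic coupling both fall out of a single matrix multiplication. A simplified illustration of the general technique, not litseer's actual `networks.py`.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Bipartite incidence matrix A: rows = citing papers, cols = cited references.
# A[i, j] = 1 if paper i cites reference j.
A = csr_matrix(np.array([
    [1, 1],   # paper 1 cites refs 1 and 2
    [1, 1],   # paper 2 cites refs 1 and 2
    [1, 0],   # paper 3 cites ref 1 only
]))

# Co-citation: C[j, k] = number of papers that cite both refs j and k.
cocitation = (A.T @ A).toarray()

# Bibliographic coupling: B[i, k] = number of refs shared by papers i and k.
coupling = (A @ A.T).toarray()
```

The diagonals carry totals (how often a reference is cited, how many references a paper has), which is why downstream normalization steps typically zero or rescale them.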
Spatial / VR knowledge exploration¶
| Tool | What it is | Status |
|---|---|---|
| Graph2VR (2024) | VR knowledge graph exploration with gesture-driven queries | Academic prototype, generic knowledge graphs, not citation-specific |
| IEEE VR 2025 papers | VR search interfaces with Vision LLMs | Early research, no released tools |
| HoloLens medical (UCSF) | 3D holographic data from MRI/CT scans | Medical imaging, but demonstrates the interaction paradigm |
Nobody has built "Minority Report for citations." The specific combination of citation network + spatial exploration + research workflow is an open field. Connected Papers' 2D graph is the closest mainstream product; Graph2VR is the closest academic prototype for generic graph data in VR.
Big tech (conspicuously absent)¶
| Company | What they did | What happened |
|---|---|---|
| Google | Google Scholar (2004) — deliberately minimal, no API, blocks programmatic access | Unchanged for 20 years. Incentive is web traffic, not researcher workflow |
| Microsoft | Microsoft Academic Graph (MAG) — 260M+ papers, open data, citation network | Shut down in 2021. OpenAlex was created to replace it |
| Apple | Nothing. Zero scholarly products | Makes the hardware researchers use, but ignores their workflow |
| Meta | Internal knowledge graph tools, AI2/Semantic Scholar (Paul Allen legacy) | S2 is useful but a database, not a workflow tool |
These companies see academic search as a feature of a search engine (type query, get results), not as a research methodology workflow. The workflow layer — systematic snowballing, multi-source dedup, quality classification, reproducible configs, graph accumulation — falls between the cracks.
Where litseer sits¶
```
Automated Search ──────────────────► Manual Search
      │                                  │
 ┌────┴────┐                        ┌────┴────┐
 │ litseer │                        │ Scholar │
 │  SPARK  │                        │ Scopus  │
 └────┬────┘                        │ PubMed  │
      │                             └────┬────┘
Graph/Network                            │
      │                              Screening
 ┌────┴────┐                        ┌────┴────┐
 │ VOSview │     ◄─── gap ───►      │ Rayyan  │
 │ bibliom │                        │ASReview │
 │CitNetExp│                        │Covidence│
 └────┬────┘                        └─────────┘
      │
Visualization
      │
 ┌────┴────┐
 │ConnPaper│
 │ Litmaps │
 │ResRabbit│
 │Inciteful│
 └────┬────┘
      │
 Spatial/VR
      │
 ┌────┴────┐
 │Graph2VR │
 │ (empty) │  ◄─── litseer's future
 └─────────┘
```
Litseer is the only open-source, CLI-first tool in the top-left quadrant that connects all the way down to the visualization layer. The vertical integration — from automated multi-source search through bibliometric analysis to spatial exploration — is unique.
Methodological Foundations¶
Litseer implements established methods rather than inventing new ones:
Multi-source federation¶
Validated by Gusenbauer & Haddaway (2020), "Which academic search systems are suitable for systematic reviews?" Litseer implements this with nine source adapters covering general databases (OpenAlex, Semantic Scholar, CrossRef), domain-specific sources (NASA NTRS, IEEE, AIAA, SAE), and specialized knowledge bases (SKYbrary).
Database search + snowballing combination¶
Wohlin et al. (2022): "Successful combination of database search and snowballing for identification of primary studies in systematic literature studies." The paper demonstrates that combining database search with forward/backward citation walking discovers significantly more relevant papers than either method alone.
Bibliometric network analysis¶
Bipartite sparse matrix pattern from R bibliometrix (Aria & Cuccurullo 2017). Co-citation and bibliographic coupling via matrix multiplication. Association strength normalization following van Eck & Waltman (2009).
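Association strength (van Eck & Waltman 2009) normalizes each co-occurrence count by the product of the total occurrence counts of the two items, roughly a_ij = c_ij / (s_i * s_j). A minimal sketch of that normalization, with simplified diagonal handling; not litseer's actual code.

```python
import numpy as np

def association_strength(C: np.ndarray) -> np.ndarray:
    """Normalize a symmetric co-occurrence matrix by total occurrences.

    a_ij = c_ij / (s_i * s_j), where s_i is the total co-occurrence count of
    item i (association strength, van Eck & Waltman 2009). Items with zero
    occurrences yield 0 rather than NaN.
    """
    s = C.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        A = C / np.outer(s, s)
    return np.nan_to_num(A)
```

Unlike cosine or Jaccard normalization, association strength is a probabilistic measure: it compares the observed co-occurrence to what independence would predict, which is why VOSviewer favors it.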
Quality classification¶
Venue-based tier classification (peer-reviewed → technical → preprint → grey literature) following the evidence hierarchy used in systematic review protocols (Cochrane, Campbell Collaboration).
Design Principles¶
Deterministic reproducibility¶
Same YAML config + same date range = same results. No randomness, no LLM-dependent features baked in. This directly addresses the reproducibility critique from Biocić et al. (2019).
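A config in this spirit might look like the sketch below. The key names are hypothetical, chosen to mirror concepts named in this document (`existing_bib_path`, source adapters, snowballing depth), not litseer's actual schema.

```yaml
# Hypothetical config sketch — key names are illustrative,
# not litseer's actual schema.
query: "runway excursion AND friction"
date_range:
  start: 2015-01-01
  end: 2024-12-31
sources: [openalex, semantic_scholar, crossref]
snowball:
  depth: 2                      # forward/backward citation walk iterations
existing_bib_path: library.bib  # skip papers already in the Zotero export
```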
No built-in LLM¶
Every output is structured, machine-readable, and independently verifiable. The tool produces data for humans and LLMs to analyze — it does not analyze itself. This is a deliberate architectural choice for credibility: claims trace to specific papers, DOIs, and source databases, not to an opaque model.
Incremental accumulation¶
The local citation graph grows with every search and citation walk. Each run adds value to the knowledge base. This mirrors how a librarian's expertise accumulates, but deterministically and in a shareable form.
Automation-friendly¶
YAML-driven configuration, CLI-first interface, JSON/BibTeX/markdown export. Designed for cron jobs, CI pipelines, and integration with document preparation systems (LaTeX, Typst).
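As a concrete example of cron-driven use, a weekly crontab entry might look like the following. The `run` subcommand and `--export` flag are hypothetical placeholders (only `litseer serve` is named elsewhere in this document); the point is that a config-driven CLI slots directly into standard schedulers.

```
# m h dom mon dow  command   (subcommand and flag names are hypothetical)
0 6 * * 1  litseer run review.yaml --export bibtex >> litseer.log 2>&1
```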
Future: Spatial Visualization¶
Why spatial matters for literature review¶
Literature review data is inherently graph-structured: papers cite papers, keywords co-occur, authors collaborate, topics cluster. Traditional 2D interfaces (spreadsheets, flat search results) lose this structure.
Research supports the value of spatial and immersive data exploration:
- Krokos et al. (2018): "Virtual memory palaces: immersion aids recall." VR environments improved information recall by 8.8% compared to desktop, attributed to spatial encoding in the hippocampus. For literature review, spatial layout of citation clusters could leverage the same effect.
- Olshannikova et al. (2015): "Visualizing Big Data with augmented and virtual reality: challenges and research agenda." Identifies AR/VR as particularly suited for exploring complex relational data — exactly the structure of citation networks.
- Cipresso et al. (2018): Network and cluster analysis of VR/AR research itself, demonstrating how bibliometric visualization aids understanding of large research fields.
Proposed interaction paradigm¶
Visual cues map directly to bibliometric properties:

- Node size → citation count (highly cited papers are larger)
- Edge thickness → co-citation strength
- Cluster color → topic grouping (from keyword co-occurrence)
- Glow/sparkle effects → high-impact or newly discovered papers
- Spatial proximity → bibliographic coupling (papers that share references cluster together in 3D space)
- Temporal layering → publication year as depth axis
The goal is discovery through exploration — a researcher walking through their citation graph should notice unexpected connections, identify gaps (sparse regions between clusters), and experience the satisfying "aha" of finding a bridge paper that connects two previously unrelated topic areas.
Platform strategy¶
The data layer (graph JSON, network matrices, portfolio summaries) is platform-agnostic. A local API server (`litseer serve`) provides the backend. Multiple frontends can consume it:
- Web/WebGPU — 3D force-directed graph in the browser, works on wall displays. WebXR extensions enable Quest/headset use with zero native development.
- visionOS — Apple RealityKit for volumetric citation graphs. Spatial computing is a natural fit for graph data.
- OpenXR — Cross-platform VR (Quest, SteamVR) via a shared standard.
Key References¶
Systematic review methodology¶
- Page MJ et al. (2021). "The PRISMA 2020 statement." BMJ. doi:10.1136/bmj.n71
- Rethlefsen ML et al. (2021). "PRISMA-S: an extension for reporting literature searches." doi:10.1186/s13643-020-01542-z
- Moher D et al. (2015). "PRISMA-P 2015 statement." doi:10.1186/2046-4053-4-1
- Wohlin C (2014). "Guidelines for snowballing in systematic literature studies." doi:10.1145/2601248.2601268
- Wohlin C et al. (2022). "Successful combination of database search and snowballing." doi:10.1016/j.infsof.2022.106908
- Gusenbauer M, Haddaway NR (2020). "Which academic search systems are suitable for systematic reviews?" doi:10.1002/jrsm.1378
- Gusenbauer M (2024). "How to search for literature in systematic reviews." doi:10.1016/j.techfore.2024.123833
The reproducibility problem¶
- Ioannidis JPA (2016). "The mass production of redundant, misleading, and conflicted systematic reviews." doi:10.1111/1468-0009.12210
- Biocić M et al. (2019). "Reproducibility of search strategies is suboptimal." doi:10.1016/j.bja.2019.02.014
- Bolaños FJ et al. (2024). "AI for literature reviews: opportunities and challenges." doi:10.1007/s10462-024-10902-3
Existing tools¶
- Ouzzani M et al. (2016). "Rayyan — a web and mobile app for systematic reviews." doi:10.1186/s13643-016-0384-4
- van de Schoot R et al. (2021). "An open source ML framework for systematic reviews." Nature Machine Intelligence. doi:10.1038/s42256-020-00287-7
- van Eck NJ, Waltman L (2014). "CitNetExplorer: citation network analysis." doi:10.1016/j.joi.2014.07.006
- Moral-Muñoz JA et al. (2020). "Software tools for bibliometric analysis." doi:10.3145/epi.2020.ene.03
- Aria M, Cuccurullo C (2017). "bibliometrix: An R tool for science mapping." doi:10.1016/j.joi.2017.08.007
Bibliometric methods¶
- van Eck NJ, Waltman L (2009). "How to normalize co-occurrence data." ISSI.
- Ellegaard O, Wallin JA (2015). "The bibliometric analysis of scholarly production." doi:10.1007/s11192-015-1645-z
- Birkle C et al. (2020). "Web of Science as a data source." doi:10.1162/qss_a_00018
Spatial and immersive visualization¶
- Krokos E et al. (2018). "Virtual memory palaces: immersion aids recall." doi:10.1007/s10055-018-0346-3
- Olshannikova E et al. (2015). "Visualizing Big Data with AR and VR." doi:10.1186/s40537-015-0031-2
- Cipresso P et al. (2018). "Past, present, future of VR/AR research." doi:10.3389/fpsyg.2018.02086
- Isenberg P et al. (2011). "Collaborative visualization." doi:10.1177/1473871611412817
- Graph2VR (2024). "Visualization and exploration of linked data using virtual reality." Database. doi:10.1093/database/baae008
Citation graph visualization tools (non-academic)¶
- Connected Papers — single-seed bibliographic coupling graph
- Research Rabbit — iterative citation chaining (acquired by Litmaps 2025)
- Litmaps — timeline + citation visualization with monitoring
- Inciteful — paper discovery network and literature connector
- Polyglot Search — multi-database query translation
- SR Toolbox — directory of 235 systematic review tools
- arXiv Bibliographic Explorer — open-source citation overlay for arXiv
- Influence Flower — radial citation influence visualization (CMU)
- CORE — 300M+ open access papers from 11K+ global repositories
- IArxiv — AI-sorted daily arXiv paper delivery