Litseer: Literature Positioning and Motivation

The Problem

Systematic literature review is a core research skill taught in every graduate program. Librarians have codified best practices for decades. Formal methodologies exist — PRISMA (Page et al. 2021), Wohlin snowballing (2014), Cochrane protocols, Campbell Collaboration guidelines. Yet no widely-adopted open-source tool implements these methodologies as reproducible software.

The current state of practice:

  1. Researchers query databases manually — typing keywords into Google Scholar, Scopus, or Web of Science one at a time
  2. They track results in spreadsheets — copy-pasting titles and DOIs
  3. They draw PRISMA flow diagrams by hand — in PowerPoint, after the fact
  4. They snowball by following reference lists — clicking through PDFs
  5. The process is not reproducible — a different researcher running the same review will get different results depending on which databases they searched, what terms they used, and when they ran the queries

This is a well-documented problem. Biocić et al. (2019) found that search strategies in published systematic reviews are not reproducible. Ioannidis (2016) documented the "mass production of redundant, misleading, and conflicted systematic reviews." The methodology exists; the implementation does not.

The Gap

What exists: databases

Google Scholar, Scopus, Web of Science, OpenAlex, Semantic Scholar, PubMed, and dozens of domain-specific databases (IEEE Xplore, NASA NTRS, AIAA Arc) provide search infrastructure. They are the equivalent of PostgreSQL — they store and index papers. They do not implement research methodology.

Gusenbauer & Haddaway (2020) evaluated 28 academic search systems and concluded that no single source is sufficient for systematic reviews. Their recommendation: search multiple databases with documented strategies. This finding validates multi-source federation as an architectural requirement.

What exists: screening tools

| Tool | What it does | What it doesn't do |
| --- | --- | --- |
| Rayyan (Ouzzani et al. 2016, 22.9K citations) | Inclusion/exclusion screening | Search |
| Covidence | Full SR workflow (commercial, $300/yr) | Open-source, reproducible search |
| ASReview (van de Schoot et al. 2021, 935 citations) | ML-prioritized screening | Multi-source search |
| DistillerSR | Commercial SR platform | Open-source anything |

These tools solve the step after search — helping researchers decide which papers to include. None of them automate the search itself.

What exists: bibliometric analysis

| Tool | What it does | What it doesn't do |
| --- | --- | --- |
| VOSviewer (van Eck & Waltman) | Citation network visualization | Search or data collection |
| Bibliometrix R (Aria & Cuccurullo 2017) | Bipartite matrices, co-citation analysis | Multi-source search |
| CiteSpace | Temporal citation analysis | Automated data collection |
| CitNetExplorer (van Eck & Waltman 2014) | Citation network exploration | Search federation |

These tools analyze citation networks after the data has been collected. They assume the researcher has already gathered a corpus.

What doesn't exist

No tool combines:

  • Automated multi-source search across heterogeneous academic databases
  • Systematic citation snowballing following Wohlin (2014) methodology
  • Cross-source deduplication (DOI + title normalization)
  • Local citation graph accumulation that grows over time
  • Bibliometric network analysis using established methods
  • Deterministic, reproducible execution from declarative configuration
  • PRISMA-compatible reporting with auditable pipeline metadata
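Cross-source deduplication is the glue between these capabilities. A minimal sketch of the DOI + title normalization idea (function names and record shape are illustrative, not litseer's actual implementation):

```python
import re

def normalize_doi(doi):
    """Lowercase and strip URL prefixes so the same DOI matches across sources."""
    if not doi:
        return None
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi.strip().lower())
    return doi

def normalize_title(title):
    """Lowercase, drop punctuation, and collapse whitespace for title matching."""
    title = re.sub(r"[^a-z0-9 ]", "", title.lower())
    return re.sub(r"\s+", " ", title).strip()

def deduplicate(records):
    """Keep the first record seen per DOI; fall back to normalized title when DOI is missing."""
    seen, unique = set(), []
    for rec in records:
        key = normalize_doi(rec.get("doi")) or normalize_title(rec.get("title", ""))
        if key and key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Preferring the DOI key and falling back to the normalized title handles the common case where one source returns a bare DOI, another a doi.org URL, and a third no DOI at all.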

This is the gap litseer fills.

Competitive Landscape (2025-2026)

A richer ecosystem of partial solutions exists beyond the academically published tools. None combines all the capabilities litseer targets, but several solve individual pieces well.

Citation graph visualization (closest to litseer's vision)

| Tool | Strength | Missing |
| --- | --- | --- |
| Connected Papers | Beautiful 2D graph from a single seed, bibliographic coupling | No multi-source, no CLI, no batch, single-seed only |
| Research Rabbit | Iterative citation chaining, collection-based (acquired by Litmaps 2025) | Cloud-only, no reproducible configs, no BibTeX pipeline |
| Litmaps | Timeline + citation visualization, monitoring alerts | Freemium ($10/mo), no CLI, no multi-database federation |
| Inciteful | Paper Discovery network, Literature Connector (bridge finder) | Web-only, no export pipeline, no batch/portfolio |
| Citation Gecko | Upload BibTeX, find missing papers via citation overlap | Abandoned (~2022), web-only |

arXiv Bibliographic Explorer and Influence Flower (both arXivLabs partners) add citation overlays directly on arXiv pages — lightweight but single-source.

These tools solve visual exploration but none is open source, CLI-first, multi-database, or config-driven.

Systematic review automation tools

| Tool | Strength | Missing |
| --- | --- | --- |
| Polyglot Search (SR Accelerator) | Translates one PubMed query to 7+ database syntaxes | Generates query strings, doesn't execute them |
| SPARK (2024) | Automated collection + filtering + extraction scaffolding | Medical focus, early stage |
| ASReview | ML-prioritized screening, Nature Machine Intelligence paper | Screening only, not search |
| Rayyan | AI-powered inclusion/exclusion screening | Post-search tool, doesn't collect papers |
| Covidence | Full SR workflow, widely used in medical fields | Commercial ($300/yr), no CLI, medical focus |
| SR Toolbox | Directory of 235 tools (as of Dec 2024) | A directory, not a tool |

The pattern is clear: the ecosystem is heavily weighted toward screening (deciding which papers to include after you already have them) rather than search (systematically finding papers in the first place).

Reference management (researcher daily-drivers)

| Tool | Strength | Missing |
| --- | --- | --- |
| Zotero + Better BibTeX | Best open-source reference manager, excellent BibTeX/LaTeX integration | No automated search, manual citation chasing only, no multi-source federation |
| Mendeley | PDF management, social features | Elsevier-owned, limited export, no automation |
| Paperpile | Clean Google Docs/Drive integration | Commercial, no CLI, no batch |

Zotero with Better BibTeX is the closest to litseer's ethos (open source, researcher-controlled). Litseer is designed to complement Zotero: search results export to BibTeX that Zotero can import, and existing_bib_path prevents re-discovering papers already in the researcher's library.

Bibliometric analysis software

| Tool | Strength | Missing |
| --- | --- | --- |
| VOSviewer (van Eck & Waltman) | Gold standard for co-citation/coupling visualization | Manual data import, no search, Java desktop app |
| Bibliometrix R (Aria & Cuccurullo) | Comprehensive R package, bipartite matrices | R-only, no search automation, no CLI pipeline |
| CiteSpace | Temporal citation analysis, burst detection | Java, manual data import |
| CitNetExplorer (van Eck & Waltman) | Citation network drill-down | Manual data import, desktop-only |

Litseer's networks.py implements the same bipartite matrix pattern as Bibliometrix R and the same normalization as VOSviewer/CitNetExplorer, but integrated into an automated search pipeline rather than a standalone analysis tool.

Spatial / VR knowledge exploration

| Tool | What it is | Status |
| --- | --- | --- |
| Graph2VR (2024) | VR knowledge graph exploration with gesture-driven queries | Academic prototype, generic knowledge graphs, not citation-specific |
| IEEE VR 2025 papers | VR search interfaces with Vision LLMs | Early research, no released tools |
| HoloLens medical (UCSF) | 3D holographic data from MRI/CT scans | Medical imaging, but demonstrates the interaction paradigm |

Nobody has built "Minority Report for citations." The specific combination of citation network + spatial exploration + research workflow is an open field. Connected Papers' 2D graph is the closest mainstream product; Graph2VR is the closest academic prototype for generic graph data in VR.

Big tech (conspicuously absent)

| Company | What they did | What happened |
| --- | --- | --- |
| Google | Google Scholar (2004) — deliberately minimal, no API, blocks programmatic access | Unchanged for 20 years. Incentive is web traffic, not researcher workflow |
| Microsoft | Microsoft Academic Graph (MAG) — 260M+ papers, open data, citation network | Shut down in 2021. OpenAlex was created to replace it |
| Apple | Nothing. Zero scholarly products | Makes the hardware researchers use, but ignores the workflow |
| Meta | Internal knowledge graph tools, AI2/Semantic Scholar (Paul Allen legacy) | S2 is useful but a database, not a workflow tool |

These companies see academic search as a feature of a search engine (type query, get results), not as a research methodology workflow. The workflow layer — systematic snowballing, multi-source dedup, quality classification, reproducible configs, graph accumulation — falls between the cracks.

Where litseer sits

                    Automated Search ──────────────────► Manual Search
                         │                                    │
                    ┌────┴────┐                          ┌────┴────┐
                    │ litseer │                          │ Scholar │
                    │ SPARK   │                          │ Scopus  │
                    └────┬────┘                          │ PubMed  │
                         │                               └────┬────┘
                    Graph/Network                              │
                         │                               Screening
                    ┌────┴────┐                          ┌────┴────┐
                    │ VOSview │ ◄─── gap ───►            │ Rayyan  │
                    │ bibliom │                           │ASReview │
                    │CitNetExp│                           │Covidence│
                    └────┬────┘                          └─────────┘
                    Visualization
                    ┌────┴────┐
                    │ConnPaper│
                    │Litmaps  │
                    │ResRabbit│
                    │Inciteful│
                    └────┬────┘
                    Spatial/VR
                    ┌────┴────┐
                    │Graph2VR │
                    │ (empty) │ ◄─── litseer's future
                    └─────────┘

Litseer is the only open-source, CLI-first tool in the top-left quadrant that connects all the way down to the visualization layer. The vertical integration — from automated multi-source search through bibliometric analysis to spatial exploration — is unique.

Methodological Foundations

Litseer implements established methods rather than inventing new ones:

Multi-source federation

Validated by Gusenbauer & Haddaway (2020): "Which academic search systems are suitable for systematic reviews?" Nine source adapters covering general databases (OpenAlex, Semantic Scholar, CrossRef), domain-specific sources (NASA NTRS, IEEE, AIAA, SAE), and specialized knowledge bases (SKYbrary).

Database search + snowballing combination

Wohlin (2022): "Successful combination of database search and snowballing for identification of primary studies in systematic literature studies." Demonstrates that combining database search with forward/backward citation walking discovers significantly more relevant papers than either method alone.

Bibliometric network analysis

Bipartite sparse matrix pattern from R bibliometrix (Aria & Cuccurullo 2017). Co-citation and bibliographic coupling via matrix multiplication. Association strength normalization following van Eck & Waltman (2009).
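A compact sketch of this pattern, assuming a toy paper-by-reference incidence matrix (the corpus, and the use of row sums as the marginal weights in the association-strength formula, are illustrative choices, not litseer's exact code):

```python
import numpy as np
from scipy import sparse

# Hypothetical bipartite incidence matrix: rows = citing papers,
# columns = cited references; A[i, j] = 1 means paper i cites reference j.
A = sparse.csr_matrix(np.array([
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]))

# Bibliographic coupling: papers that cite the same references.
coupling = (A @ A.T).toarray()

# Co-citation: references that are cited together by the same papers.
cocitation = (A.T @ A).toarray()

def association_strength(C):
    """Normalize a co-occurrence matrix by the product of marginal totals,
    in the spirit of van Eck & Waltman (2009), ignoring the diagonal."""
    C = C.astype(float)
    np.fill_diagonal(C, 0)
    totals = C.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        S = C / np.outer(totals, totals)
    return np.nan_to_num(S)
```

The two products fall out of one bipartite matrix: `A @ A.T` compares rows (papers sharing references), `A.T @ A` compares columns (references cited together), which is why the sparse-matrix formulation scales to large corpora.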

Quality classification

Venue-based tier classification (peer-reviewed → technical → preprint → grey literature) following the evidence hierarchy used in systematic review protocols (Cochrane, Campbell Collaboration).
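A minimal sketch of venue-based tier classification; the keyword lists and function name are illustrative assumptions, not litseer's actual rules:

```python
# Tier names follow the evidence hierarchy described above; the
# venue keywords below are illustrative only.
TIER_ORDER = ["peer_reviewed", "technical", "preprint", "grey"]

def classify_venue(venue):
    """Map a venue string to a quality tier by simple keyword matching."""
    v = (venue or "").lower()
    if any(k in v for k in ("journal", "transactions", "proceedings")):
        return "peer_reviewed"
    if any(k in v for k in ("technical report", "ntrs", "standard")):
        return "technical"
    if any(k in v for k in ("arxiv", "preprint", "ssrn")):
        return "preprint"
    return "grey"  # unmatched venues default to grey literature
```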

Design Principles

Deterministic reproducibility

Same YAML config + same date range = same results. No randomness, no LLM-dependent features baked in. This directly addresses the reproducibility critique from Biocić et al. (2019).
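A sketch of what such a declarative config might look like; every key here is a hypothetical illustration except existing_bib_path, which is mentioned elsewhere in this document:

```yaml
# Illustrative litseer config sketch — key names are assumptions.
queries:
  - "citation network visualization"
  - "systematic review automation"
sources: [openalex, semanticscholar, crossref]
date_range:
  from: 2015-01-01
  to: 2024-12-31
snowball:
  method: wohlin
  iterations: 2
existing_bib_path: refs/library.bib   # skip papers already in the researcher's library
```

The point is that everything affecting the result set lives in the file: rerunning with the same config and date range reproduces the same corpus.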

No built-in LLM

Every output is structured, machine-readable, and independently verifiable. The tool produces data for humans and LLMs to analyze — it does not analyze itself. This is a deliberate architectural choice for credibility: claims trace to specific papers, DOIs, and source databases, not to an opaque model.

Incremental accumulation

The local citation graph grows with every search and citation walk. Each run adds value to the knowledge base. This mirrors how a librarian's expertise accumulates, but in a deterministic, shareable form.
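Accumulation can be as simple as idempotent merging into an on-disk graph. A sketch under an assumed JSON schema (DOI-keyed nodes, edge lists) — not litseer's actual file format:

```python
import json

def merge_graph(path, new_nodes, new_edges):
    """Load the accumulated citation graph, merge in newly discovered
    nodes and edges keyed by DOI, and write it back. Re-running the
    same search adds nothing, so runs are safely repeatable."""
    try:
        with open(path) as f:
            graph = json.load(f)
    except FileNotFoundError:
        graph = {"nodes": {}, "edges": []}
    for doi, meta in new_nodes.items():
        graph["nodes"].setdefault(doi, meta)   # first-seen metadata wins
    known = {tuple(e) for e in graph["edges"]}
    for edge in new_edges:
        if tuple(edge) not in known:
            graph["edges"].append(list(edge))
            known.add(tuple(edge))
    with open(path, "w") as f:
        json.dump(graph, f, indent=2)
    return graph
```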

Automation-friendly

YAML-driven configuration, CLI-first interface, JSON/BibTeX/markdown export. Designed for cron jobs, CI pipelines, and integration with document preparation systems (LaTeX, Typst).

Future: Spatial Visualization

Why spatial matters for literature review

Literature review data is inherently graph-structured: papers cite papers, keywords co-occur, authors collaborate, topics cluster. Traditional 2D interfaces (spreadsheets, flat search results) lose this structure.

Research supports the value of spatial and immersive data exploration:

  • Krokos et al. (2018): "Virtual memory palaces: immersion aids recall." VR environments improved information recall by 8.8% compared to desktop, attributed to spatial encoding in the hippocampus. For literature review, spatial layout of citation clusters could leverage the same effect.

  • Olshannikova et al. (2015): "Visualizing Big Data with augmented and virtual reality: challenges and research agenda." Identifies AR/VR as particularly suited for exploring complex relational data — exactly the structure of citation networks.

  • Cipresso et al. (2018): Network and cluster analysis of VR/AR research itself, demonstrating how bibliometric visualization aids understanding of large research fields.

Proposed interaction paradigm

Visual cues map directly to bibliometric properties:

  • Node size → citation count (highly cited papers are larger)
  • Edge thickness → co-citation strength
  • Cluster color → topic grouping (from keyword co-occurrence)
  • Glow/sparkle effects → high-impact or newly discovered papers
  • Spatial proximity → bibliographic coupling (papers that share references cluster together in 3D space)
  • Temporal layering → publication year as depth axis
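The cue mapping above reduces to a pure styling function over paper metadata. A sketch with illustrative field names and scaling constants (none of which are from litseer itself):

```python
import math

def node_style(paper):
    """Map bibliometric properties of one paper to visual attributes.
    Field names ('citations', 'year') and constants are assumptions."""
    return {
        # node size grows logarithmically with citation count
        "size": 1.0 + math.log1p(paper.get("citations", 0)),
        # publication year as the depth axis (offset from an arbitrary epoch)
        "depth": paper.get("year", 2000) - 2000,
        # glow highlights high-impact papers
        "glow": paper.get("citations", 0) > 1000,
    }
```

Keeping the mapping a pure function of the data layer is what lets multiple frontends (web, visionOS, OpenXR) render the same graph consistently.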

The goal is discovery through exploration — a researcher walking through their citation graph should notice unexpected connections, identify gaps (sparse regions between clusters), and experience the satisfying "aha" of finding a bridge paper that connects two previously unrelated topic areas.

Platform strategy

The data layer (graph JSON, network matrices, portfolio summaries) is platform-agnostic. A local API server (litseer serve) provides the backend. Multiple frontends can consume it:

  1. Web/WebGPU — 3D force-directed graph in the browser, works on wall displays. WebXR extensions enable Quest/headset use with zero native development.
  2. visionOS — Apple RealityKit for volumetric citation graphs. Spatial computing is a natural fit for graph data.
  3. OpenXR — Cross-platform VR (Quest, SteamVR) via a shared standard.

Key References

Systematic review methodology

  • Page MJ et al. (2021). "The PRISMA 2020 statement." BMJ. doi:10.1136/bmj.n71
  • Rethlefsen ML et al. (2021). "PRISMA-S: an extension for reporting literature searches." doi:10.1186/s13643-020-01542-z
  • Moher D et al. (2015). "PRISMA-P 2015 statement." doi:10.1186/2046-4053-4-1
  • Wohlin C (2014). "Guidelines for snowballing in systematic literature studies." doi:10.1145/2601248.2601268
  • Wohlin C et al. (2022). "Successful combination of database search and snowballing." doi:10.1016/j.infsof.2022.106908
  • Gusenbauer M, Haddaway NR (2020). "Which academic search systems are suitable for systematic reviews?" doi:10.1002/jrsm.1378
  • Gusenbauer M (2024). "How to search for literature in systematic reviews." doi:10.1016/j.techfore.2024.123833

The reproducibility problem

  • Ioannidis JPA (2016). "The mass production of redundant, misleading, and conflicted systematic reviews." doi:10.1111/1468-0009.12210
  • Biocić M et al. (2019). "Reproducibility of search strategies is suboptimal." doi:10.1016/j.bja.2019.02.014
  • Bolaños FJ et al. (2024). "AI for literature reviews: opportunities and challenges." doi:10.1007/s10462-024-10902-3

Existing tools

  • Ouzzani M et al. (2016). "Rayyan — a web and mobile app for systematic reviews." doi:10.1186/s13643-016-0384-4
  • van de Schoot R et al. (2021). "An open source ML framework for systematic reviews." Nature Machine Intelligence. doi:10.1038/s42256-020-00287-7
  • van Eck NJ, Waltman L (2014). "CitNetExplorer: citation network analysis." doi:10.1016/j.joi.2014.07.006
  • Moral-Muñoz JA et al. (2020). "Software tools for bibliometric analysis." doi:10.3145/epi.2020.ene.03
  • Aria M, Cuccurullo C (2017). "bibliometrix: An R tool for science mapping." doi:10.1016/j.joi.2017.08.007

Bibliometric methods

  • van Eck NJ, Waltman L (2009). "How to normalize co-occurrence data." ISSI.
  • Ellegaard O, Wallin JA (2015). "The bibliometric analysis of scholarly production." doi:10.1007/s11192-015-1645-z
  • Birkle C et al. (2020). "Web of Science as a data source." doi:10.1162/qss_a_00018

Spatial and immersive visualization

  • Krokos E et al. (2018). "Virtual memory palaces: immersion aids recall." doi:10.1007/s10055-018-0346-3
  • Olshannikova E et al. (2015). "Visualizing Big Data with AR and VR." doi:10.1186/s40537-015-0031-2
  • Cipresso P et al. (2018). "Past, present, future of VR/AR research." doi:10.3389/fpsyg.2018.02086
  • Isenberg P et al. (2011). "Collaborative visualization." doi:10.1177/1473871611412817
  • Graph2VR (2024). "Visualization and exploration of linked data using virtual reality." Database. doi:10.1093/database/baae008

Citation graph visualization tools (non-academic)