Litseer: Literature Positioning and Motivation¶
The Problem¶
Systematic literature review is a core research skill taught in every graduate program. Librarians have codified best practices for decades. Formal methodologies exist — PRISMA (Page et al. 2021), Wohlin snowballing (2014), Cochrane protocols, Campbell Collaboration guidelines. Yet no widely-adopted open-source tool implements these methodologies as reproducible software.
The current state of practice:
- Researchers query databases manually — typing keywords into Google Scholar, Scopus, or Web of Science one at a time
- They track results in spreadsheets — copy-pasting titles and DOIs
- They draw PRISMA flow diagrams by hand — in PowerPoint, after the fact
- They snowball by following reference lists — clicking through PDFs
- The process is not reproducible — a different researcher running the same review will get different results depending on which databases they searched, what terms they used, and when they ran the queries
This is a well-documented problem. Biocić et al. (2019) found that search strategies in published systematic reviews are not reproducible. Ioannidis (2016) documented the "mass production of redundant, misleading, and conflicted systematic reviews." The methodology exists; the implementation does not.
The Gap¶
What exists: databases¶
Google Scholar, Scopus, Web of Science, OpenAlex, Semantic Scholar, PubMed, and dozens of domain-specific databases (IEEE Xplore, NASA NTRS, AIAA Arc) provide search infrastructure. They are the equivalent of PostgreSQL — they store and index papers. They do not implement research methodology.
Gusenbauer & Haddaway (2020) evaluated 28 academic search systems and concluded that no single source is sufficient for systematic reviews. Their recommendation: search multiple databases with documented strategies. This finding validates multi-source federation as an architectural requirement.
What exists: screening tools¶
| Tool | What it does | What it doesn't do |
|---|---|---|
| Rayyan (Ouzzani et al. 2016, 22.9K citations) | Inclusion/exclusion screening | Search |
| Covidence | Full SR workflow (commercial, $300/yr) | Open-source, reproducible search |
| ASReview (van de Schoot et al. 2021, 935 citations) | ML-prioritized screening | Multi-source search |
| DistillerSR | Commercial SR platform | Open-source anything |
These tools solve the step after search — helping researchers decide which papers to include. None of them automate the search itself.
What exists: bibliometric analysis¶
| Tool | What it does | What it doesn't do |
|---|---|---|
| VOSviewer (van Eck & Waltman) | Citation network visualization | Search or data collection |
| Bibliometrix R (Aria & Cuccurullo 2017) | Bipartite matrices, co-citation analysis | Multi-source search |
| CiteSpace | Temporal citation analysis | Automated data collection |
| CitNetExplorer (van Eck & Waltman 2014) | Citation network exploration | Search federation |
These tools analyze citation networks after the data has been collected. They assume the researcher has already gathered a corpus.
What doesn't exist¶
No tool combines:

- Automated multi-source search across heterogeneous academic databases
- Systematic citation snowballing following Wohlin (2014) methodology
- Cross-source deduplication (DOI + title normalization)
- Local citation graph accumulation that grows over time
- Bibliometric network analysis using established methods
- Deterministic, reproducible execution from declarative configuration
- PRISMA-compatible reporting with auditable pipeline metadata
This is the gap litseer fills.
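The cross-source deduplication step (DOI + title normalization) can be sketched as follows. This is a minimal illustration, not litseer's actual implementation; the record fields (`doi`, `title`) are assumed, and real-world matching needs more care (e.g. DOI prefixes, subtitle variants).

```python
import re
import unicodedata

def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    t = unicodedata.normalize("NFKD", title)
    t = "".join(c for c in t if not unicodedata.combining(c))
    t = re.sub(r"[^a-z0-9 ]", " ", t.lower())
    return " ".join(t.split())

def dedupe(records):
    """Keep the first record per DOI (preferred) or per normalized title."""
    seen, unique = set(), []
    for rec in records:
        doi = (rec.get("doi") or "").lower().strip()
        key = ("doi", doi) if doi else ("title", normalize_title(rec.get("title", "")))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

DOI matching is case-insensitive by convention, which is why both keys are lowercased before comparison.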
Competitive Landscape (2025-2026)¶
A richer ecosystem of partial solutions exists beyond the academic tool papers. None combines all the capabilities litseer targets, but several solve individual pieces well.
Citation graph visualization (closest to litseer's vision)¶
| Tool | Strength | Missing |
|---|---|---|
| Connected Papers | Beautiful 2D graph from a single seed, bibliographic coupling | No multi-source, no CLI, no batch, single-seed only |
| Research Rabbit | Iterative citation chaining, collection-based (acquired by Litmaps 2025) | Cloud-only, no reproducible configs, no BibTeX pipeline |
| Litmaps | Timeline + citation visualization, monitoring alerts | Freemium ($10/mo), no CLI, no multi-database federation |
| Inciteful | Paper Discovery network, Literature Connector (bridge finder) | Web-only, no export pipeline, no batch/portfolio |
| Citation Gecko | Upload BibTeX, find missing papers via citation overlap | Abandoned (~2022), web-only |
arXiv Bibliographic Explorer and Influence Flower (both arXivLabs partners) add citation overlays directly on arXiv pages — lightweight but single-source.
These tools solve visual exploration but none is open source, CLI-first, multi-database, or config-driven.
Systematic review automation tools¶
| Tool | Strength | Missing |
|---|---|---|
| Polyglot Search (SR Accelerator) | Translates one PubMed query to 7+ database syntaxes | Generates query strings, doesn't execute them |
| SPARK (2024) | Automated collection + filtering + extraction scaffolding | Medical focus, early stage |
| ASReview | ML-prioritized screening, Nature Machine Intelligence paper | Screening only, not search |
| Rayyan | AI-powered inclusion/exclusion screening | Post-search tool, doesn't collect papers |
| Covidence | Full SR workflow, widely used in medical fields | Commercial ($300/yr), no CLI, medical focus |
| SR Toolbox | Directory of 235 tools (as of Dec 2024) | It's a directory, not a tool |
The pattern is clear: the ecosystem is heavily weighted toward screening (deciding which papers to include after you already have them) rather than search (systematically finding papers in the first place).
Reference management (researcher daily-drivers)¶
| Tool | Strength | Missing |
|---|---|---|
| Zotero + Better BibTeX | Best open-source reference manager, excellent BibTeX/LaTeX integration | No automated search, manual citation chasing only, no multi-source federation |
| Mendeley | PDF management, social features | Elsevier-owned, limited export, no automation |
| Paperpile | Clean Google Docs/Drive integration | Commercial, no CLI, no batch |
Zotero with Better BibTeX is the closest to litseer's ethos (open source, researcher-controlled). Litseer is designed to complement Zotero: search results export to BibTeX that Zotero can import, and `existing_bib_path` prevents re-discovering papers already in the researcher's library.
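In outline, the `existing_bib_path` filter could work like this: extract DOIs from the existing `.bib` file and drop search results that match. This is an illustrative sketch, not litseer's actual implementation; a production version would use a real BibTeX parser rather than a regex.

```python
import re

# Matches BibTeX fields like: doi = {10.1145/2601248.2601268} or doi = "..."
DOI_RE = re.compile(r'doi\s*=\s*[{"]([^}"]+)[}"]', re.IGNORECASE)

def known_dois(bib_text: str) -> set[str]:
    """Collect lowercased DOIs from BibTeX `doi` fields."""
    return {m.group(1).strip().lower() for m in DOI_RE.finditer(bib_text)}

def filter_new(results, bib_text):
    """Drop search results whose DOI already appears in the library."""
    seen = known_dois(bib_text)
    return [r for r in results if (r.get("doi") or "").lower() not in seen]
```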
Bibliometric analysis software¶
| Tool | Strength | Missing |
|---|---|---|
| VOSviewer (van Eck & Waltman) | Gold standard for co-citation/coupling visualization | Manual data import, no search, Java desktop app |
| Bibliometrix R (Aria & Cuccurullo) | Comprehensive R package, bipartite matrices | R-only, no search automation, no CLI pipeline |
| CiteSpace | Temporal citation analysis, burst detection | Java, manual data import |
| CitNetExplorer (van Eck & Waltman) | Citation network drill-down | Manual data import, desktop-only |
Litseer's `networks.py` implements the same bipartite matrix pattern as Bibliometrix R and the same normalization as VOSviewer/CitNetExplorer, but integrated into an automated search pipeline rather than a standalone analysis tool.
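The bipartite matrix pattern can be sketched with scipy sparse matrices: rows are citing papers, columns are cited references, and co-citation and bibliographic coupling both fall out of a single matrix multiplication. A simplified illustration of the general technique, not litseer's actual `networks.py`.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Bipartite incidence matrix A: rows = citing papers, cols = cited references.
# A[i, j] = 1 if paper i cites reference j.
A = csr_matrix(np.array([
    [1, 1],   # paper 1 cites refs 1 and 2
    [1, 1],   # paper 2 cites refs 1 and 2
    [1, 0],   # paper 3 cites ref 1 only
]))

# Co-citation: C[j, k] = number of papers that cite both refs j and k.
cocitation = (A.T @ A).toarray()

# Bibliographic coupling: B[i, k] = number of refs shared by papers i and k.
coupling = (A @ A.T).toarray()
```

The diagonals carry totals (how often a reference is cited, how many references a paper has), which is why downstream normalization steps typically zero or rescale them.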
Spatial / VR knowledge exploration¶
| Tool | What it is | Status |
|---|---|---|
| Graph2VR (2024) | VR knowledge graph exploration with gesture-driven queries | Academic prototype, generic knowledge graphs, not citation-specific |
| IEEE VR 2025 papers | VR search interfaces with Vision LLMs | Early research, no released tools |
| HoloLens medical (UCSF) | 3D holographic data from MRI/CT scans | Medical imaging, but demonstrates the interaction paradigm |
Nobody has built "Minority Report for citations." The specific combination of citation network + spatial exploration + research workflow is an open field. Connected Papers' 2D graph is the closest mainstream product; Graph2VR is the closest academic prototype for generic graph data in VR.
Big tech (conspicuously absent)¶
| Company | What they did | What happened |
|---|---|---|
| Google | Google Scholar (2004) — deliberately minimal, no API, blocks programmatic access | Unchanged for 20 years. Incentive is web traffic, not researcher workflow |
| Microsoft | Microsoft Academic Graph (MAG) — 260M+ papers, open data, citation network | Shut down in 2021. OpenAlex was created to replace it |
| Apple | Nothing. Zero scholarly products | Makes the hardware researchers use, but ignores their workflow |
| Meta | Internal knowledge graph tools, AI2/Semantic Scholar (Paul Allen legacy) | S2 is useful but a database, not a workflow tool |
These companies see academic search as a feature of a search engine (type query, get results), not as a research methodology workflow. The workflow layer — systematic snowballing, multi-source dedup, quality classification, reproducible configs, graph accumulation — falls between the cracks.
Where litseer sits¶
```
Automated Search ──────────────────► Manual Search
      │                                  │
 ┌────┴────┐                        ┌────┴────┐
 │ litseer │                        │ Scholar │
 │  SPARK  │                        │ Scopus  │
 └────┬────┘                        │ PubMed  │
      │                             └────┬────┘
Graph/Network                            │
      │                              Screening
 ┌────┴────┐                        ┌────┴────┐
 │ VOSview │     ◄─── gap ───►      │ Rayyan  │
 │ bibliom │                        │ASReview │
 │CitNetExp│                        │Covidence│
 └────┬────┘                        └─────────┘
      │
Visualization
      │
 ┌────┴────┐
 │ConnPaper│
 │ Litmaps │
 │ResRabbit│
 │Inciteful│
 └────┬────┘
      │
 Spatial/VR
      │
 ┌────┴────┐
 │Graph2VR │
 │ (empty) │  ◄─── litseer's future
 └─────────┘
```
Litseer is the only open-source, CLI-first tool in the top-left quadrant that connects all the way down to the visualization layer. The vertical integration — from automated multi-source search through bibliometric analysis to spatial exploration — is unique.
Methodological Foundations¶
Litseer implements established methods rather than inventing new ones:
Multi-source federation¶
Validated by Gusenbauer & Haddaway (2020), "Which academic search systems are suitable for systematic reviews?" Litseer implements this with nine source adapters covering general databases (OpenAlex, Semantic Scholar, CrossRef), domain-specific sources (NASA NTRS, IEEE, AIAA, SAE), and specialized knowledge bases (SKYbrary).
Database search + snowballing combination¶
Wohlin et al. (2022): "Successful combination of database search and snowballing for identification of primary studies in systematic literature studies." The paper demonstrates that combining database search with forward/backward citation walking discovers significantly more relevant papers than either method alone.
Bibliometric network analysis¶
Bipartite sparse matrix pattern from R bibliometrix (Aria & Cuccurullo 2017). Co-citation and bibliographic coupling via matrix multiplication. Association strength normalization following van Eck & Waltman (2009).
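Association strength (van Eck & Waltman 2009) normalizes each co-occurrence count by the product of the total occurrence counts of the two items, roughly a_ij = c_ij / (s_i * s_j). A minimal sketch of that normalization, with simplified diagonal handling; not litseer's actual code.

```python
import numpy as np

def association_strength(C: np.ndarray) -> np.ndarray:
    """Normalize a symmetric co-occurrence matrix by total occurrences.

    a_ij = c_ij / (s_i * s_j), where s_i is the total co-occurrence count of
    item i (association strength, van Eck & Waltman 2009). Items with zero
    occurrences yield 0 rather than NaN.
    """
    s = C.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        A = C / np.outer(s, s)
    return np.nan_to_num(A)
```

Unlike cosine or Jaccard normalization, association strength is a probabilistic measure: it compares the observed co-occurrence to what independence would predict, which is why VOSviewer favors it.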
Quality classification¶
Venue-based tier classification (peer-reviewed → technical → preprint → grey literature) following the evidence hierarchy used in systematic review protocols (Cochrane, Campbell Collaboration).
Design Principles¶
Deterministic reproducibility¶
Same YAML config + same date range = same results. No randomness, no LLM-dependent features baked in. This directly addresses the reproducibility critique from Biocić et al. (2019).
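A config in this spirit might look like the sketch below. The key names are hypothetical, chosen to mirror concepts named in this document (`existing_bib_path`, source adapters, snowballing depth), not litseer's actual schema.

```yaml
# Hypothetical config sketch — key names are illustrative,
# not litseer's actual schema.
query: "runway excursion AND friction"
date_range:
  start: 2015-01-01
  end: 2024-12-31
sources: [openalex, semantic_scholar, crossref]
snowball:
  depth: 2                      # forward/backward citation walk iterations
existing_bib_path: library.bib  # skip papers already in the Zotero export
```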
No built-in LLM¶
Every output is structured, machine-readable, and independently verifiable. The tool produces data for humans and LLMs to analyze — it does not analyze itself. This is a deliberate architectural choice for credibility: claims trace to specific papers, DOIs, and source databases, not to an opaque model.
Incremental accumulation¶
The local citation graph grows with every search and citation walk. Each run adds value to the knowledge base. This mirrors how a librarian's expertise accumulates, but deterministically and in a shareable form.
Automation-friendly¶
YAML-driven configuration, CLI-first interface, JSON/BibTeX/markdown export. Designed for cron jobs, CI pipelines, and integration with document preparation systems (LaTeX, Typst).
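As a concrete example of cron-driven use, a weekly crontab entry might look like the following. The `run` subcommand and `--export` flag are hypothetical placeholders (only `litseer serve` is named elsewhere in this document); the point is that a config-driven CLI slots directly into standard schedulers.

```
# m h dom mon dow  command   (subcommand and flag names are hypothetical)
0 6 * * 1  litseer run review.yaml --export bibtex >> litseer.log 2>&1
```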
Future: Spatial Visualization¶
Why spatial matters for literature review¶
Literature review data is inherently graph-structured: papers cite papers, keywords co-occur, authors collaborate, topics cluster. Traditional 2D interfaces (spreadsheets, flat search results) lose this structure.
Research supports the value of spatial and immersive data exploration:
- Krokos et al. (2018): "Virtual memory palaces: immersion aids recall." VR environments improved information recall by 8.8% compared to desktop, attributed to spatial encoding in the hippocampus. For literature review, spatial layout of citation clusters could leverage the same effect.
- Olshannikova et al. (2015): "Visualizing Big Data with augmented and virtual reality: challenges and research agenda." Identifies AR/VR as particularly suited for exploring complex relational data — exactly the structure of citation networks.
- Cipresso et al. (2018): Network and cluster analysis of VR/AR research itself, demonstrating how bibliometric visualization aids understanding of large research fields.
Proposed interaction paradigm¶
Visual cues map directly to bibliometric properties:

- Node size → citation count (highly cited papers are larger)
- Edge thickness → co-citation strength
- Cluster color → topic grouping (from keyword co-occurrence)
- Glow/sparkle effects → high-impact or newly discovered papers
- Spatial proximity → bibliographic coupling (papers that share references cluster together in 3D space)
- Temporal layering → publication year as depth axis
The goal is discovery through exploration — a researcher walking through their citation graph should notice unexpected connections, identify gaps (sparse regions between clusters), and experience the satisfying "aha" of finding a bridge paper that connects two previously unrelated topic areas.
Platform strategy¶
The data layer (graph JSON, network matrices, portfolio summaries) is platform-agnostic. A local API server (`litseer serve`) provides the backend. Multiple frontends can consume it:
- Web/WebGPU — 3D force-directed graph in the browser, works on wall displays. WebXR extensions enable Quest/headset use with zero native development.
- visionOS — Apple RealityKit for volumetric citation graphs. Spatial computing is a natural fit for graph data.
- OpenXR — Cross-platform VR (Quest, SteamVR) via a shared standard.
Key References¶
Systematic review methodology¶
- Page MJ et al. (2021). "The PRISMA 2020 statement." BMJ. doi:10.1136/bmj.n71
- Rethlefsen ML et al. (2021). "PRISMA-S: an extension for reporting literature searches." doi:10.1186/s13643-020-01542-z
- Moher D et al. (2015). "PRISMA-P 2015 statement." doi:10.1186/2046-4053-4-1
- Wohlin C (2014). "Guidelines for snowballing in systematic literature studies." doi:10.1145/2601248.2601268
- Wohlin C et al. (2022). "Successful combination of database search and snowballing." doi:10.1016/j.infsof.2022.106908
- Gusenbauer M, Haddaway NR (2020). "Which academic search systems are suitable for systematic reviews?" doi:10.1002/jrsm.1378
- Gusenbauer M (2024). "How to search for literature in systematic reviews." doi:10.1016/j.techfore.2024.123833
The reproducibility problem¶
- Ioannidis JPA (2016). "The mass production of redundant, misleading, and conflicted systematic reviews." doi:10.1111/1468-0009.12210
- Biocić M et al. (2019). "Reproducibility of search strategies is suboptimal." doi:10.1016/j.bja.2019.02.014
- Bolaños FJ et al. (2024). "AI for literature reviews: opportunities and challenges." doi:10.1007/s10462-024-10902-3
Existing tools¶
- Ouzzani M et al. (2016). "Rayyan — a web and mobile app for systematic reviews." doi:10.1186/s13643-016-0384-4
- van de Schoot R et al. (2021). "An open source ML framework for systematic reviews." Nature Machine Intelligence. doi:10.1038/s42256-020-00287-7
- van Eck NJ, Waltman L (2014). "CitNetExplorer: citation network analysis." doi:10.1016/j.joi.2014.07.006
- Moral-Muñoz JA et al. (2020). "Software tools for bibliometric analysis." doi:10.3145/epi.2020.ene.03
- Aria M, Cuccurullo C (2017). "bibliometrix: An R tool for science mapping." doi:10.1016/j.joi.2017.08.007
Bibliometric methods¶
- van Eck NJ, Waltman L (2009). "How to normalize co-occurrence data." ISSI.
- Ellegaard O, Wallin JA (2015). "The bibliometric analysis of scholarly production." doi:10.1007/s11192-015-1645-z
- Birkle C et al. (2020). "Web of Science as a data source." doi:10.1162/qss_a_00018
Spatial and immersive visualization¶
- Krokos E et al. (2018). "Virtual memory palaces: immersion aids recall." doi:10.1007/s10055-018-0346-3
- Olshannikova E et al. (2015). "Visualizing Big Data with AR and VR." doi:10.1186/s40537-015-0031-2
- Cipresso P et al. (2018). "Past, present, future of VR/AR research." doi:10.3389/fpsyg.2018.02086
- Isenberg P et al. (2011). "Collaborative visualization." doi:10.1177/1473871611412817
- Graph2VR (2024). "Visualization and exploration of linked data using virtual reality." Database. doi:10.1093/database/baae008
Citation graph visualization tools (non-academic)¶
- Connected Papers — single-seed bibliographic coupling graph
- Research Rabbit — iterative citation chaining (acquired by Litmaps 2025)
- Litmaps — timeline + citation visualization with monitoring
- Inciteful — paper discovery network and literature connector
- Polyglot Search — multi-database query translation
- SR Toolbox — directory of 235 systematic review tools
- arXiv Bibliographic Explorer — open-source citation overlay for arXiv
- Influence Flower — radial citation influence visualization (CMU)
- CORE — 300M+ open access papers from 11K+ global repositories
- IArxiv — AI-sorted daily arXiv paper delivery