Litseer Usage Guide

Litseer is an async Python CLI tool for multi-source academic literature search. It searches multiple databases in parallel, deduplicates results, classifies them into quality tiers, and exports to BibTeX, JSON, or Markdown. It is designed for systematic literature reviews in aerospace and engineering.

Installation

Requires Python 3.11+. Install with uv:

uv pip install -e .

# With development dependencies (pytest, ruff, respx):
uv pip install -e ".[dev]"

# With network analysis (bipartite matrices, co-citation, coupling):
uv pip install -e ".[networks]"

After installation the litseer command is available in your shell.

Quick Start

# Run a search from a config file
litseer search examples/aerospace-review.yaml

# Specify output directory and email for polite API access
litseer search examples/aerospace-review.yaml -o results/ --mailto you@university.edu

# Search only specific sources
litseer search examples/aerospace-review.yaml -s openalex -s crossref

# Enable debug logging
litseer -v search examples/aerospace-review.yaml

Config File Format

Searches are driven by YAML config files. A config defines topic clusters, each with a set of query strings. Litseer runs every query against every year in the range, across all selected sources.

name: my-literature-review
description: >
  Systematic review of film cooling effectiveness on turbine blades.

year_min: 2020
year_max: 2026

# Optional: path to your existing .bib file.
# Litseer will skip papers you already cite.
existing_bib_path: ../thesis/references.bib

# Optional: seed DOIs for citation walking
seed_dois:
  - "10.1115/1.4045389"

clusters:
  - id: film_cooling
    label: Film Cooling Effectiveness
    queries:
      - "film cooling turbine blade effectiveness"
      - "shaped hole film cooling"
    sections: [literature_review, methodology]

  - id: conjugate_ht
    label: Conjugate Heat Transfer
    queries:
      - "conjugate heat transfer turbine blade CFD"
    sections: [methodology]
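
To get a feel for how many requests a config generates, the sketch below expands a config into one request per (cluster, query, year, source) combination. It is an illustration only, not Litseer's internal code: the `expand_requests` helper is hypothetical, the config is written as a plain dict rather than parsed from YAML, and it assumes `year_min` and `year_max` are both inclusive.

```python
from itertools import product

# Hypothetical in-memory form of the example config (already parsed from YAML).
config = {
    "year_min": 2020,
    "year_max": 2026,
    "clusters": [
        {
            "id": "film_cooling",
            "queries": [
                "film cooling turbine blade effectiveness",
                "shaped hole film cooling",
            ],
        },
        {
            "id": "conjugate_ht",
            "queries": ["conjugate heat transfer turbine blade CFD"],
        },
    ],
}


def expand_requests(config, sources):
    """Yield one (cluster_id, query, year, source) tuple per request.

    Assumption: the year range is inclusive at both ends.
    """
    years = range(config["year_min"], config["year_max"] + 1)
    for cluster in config["clusters"]:
        for query, year, source in product(cluster["queries"], years, sources):
            yield cluster["id"], query, year, source


requests = list(expand_requests(config, ["openalex", "crossref"]))
# 3 queries x 7 years x 2 sources
print(len(requests))  # 42
```

The request count grows multiplicatively, so narrowing the year range or selecting fewer sources is the quickest way to shorten a run.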

Config Fields

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| name | no | "unnamed" | Name for this search configuration |
| description | no | "" | Free-text description |
| year_min | no | 2024 | Earliest publication year to search |
| year_max | no | 2026 | Latest publication year to search |
| existing_bib_path | no | none | Path to existing .bib file for dedup |
| seed_dois | no | [] | DOIs for citation walking |
| clusters | yes | -- | List of topic clusters (see below) |

Cluster Fields

| Field | Required | Description |
| --- | --- | --- |
| id | yes | Short identifier (used in output grouping) |
| label | no | Human-readable label |
| queries | yes | List of search query strings |
| sections | no | Thesis/paper sections this cluster maps to |

Search Workflow

When you run litseer search, the tool:

  1. Loads the YAML config
  2. Creates source adapters (OpenAlex, CrossRef, etc.)
  3. For each cluster, runs each query for each year across all sources
  4. Classifies every result by quality tier
  5. Filters results by the --max-tier threshold
  6. Deduplicates by DOI and title similarity
  7. Removes papers already in your existing .bib file
  8. Populates the local citation graph with discovered papers and edges
  9. Writes three output files to the output directory
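
Step 6 (deduplication by DOI and title similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not Litseer's actual implementation: the paper records are plain dicts with hypothetical `doi` and `title` keys, similarity is measured with the standard library's `difflib.SequenceMatcher`, and the 0.9 threshold is an arbitrary choice.

```python
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in title)
    return " ".join(cleaned.split())


def deduplicate(papers, title_threshold=0.9):
    """Keep the first occurrence of each paper.

    A paper is a duplicate if its DOI (case-insensitive) was already seen,
    or if its normalized title is near-identical to a kept paper's title.
    """
    seen_dois = set()
    kept = []
    for paper in papers:
        doi = (paper.get("doi") or "").lower()
        if doi and doi in seen_dois:
            continue
        title = normalize_title(paper["title"])
        is_title_dup = any(
            SequenceMatcher(None, title, normalize_title(k["title"])).ratio()
            >= title_threshold
            for k in kept
        )
        if is_title_dup:
            continue
        if doi:
            seen_dois.add(doi)
        kept.append(paper)
    return kept


papers = [
    {"doi": "10.1000/a", "title": "Shaped Hole Film Cooling"},
    {"doi": "10.1000/A", "title": "Shaped hole film cooling (reprint)"},
    {"doi": None, "title": "Shaped-hole film cooling"},
    {"doi": "10.1000/b", "title": "Conjugate heat transfer in turbine blades"},
]
unique = deduplicate(papers)
print([p["doi"] for p in unique])  # ['10.1000/a', '10.1000/b']
```

The title check is what catches records that arrive without a DOI, at the cost of a pairwise comparison against every kept paper.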

Output Files

For a search run on 2026-03-14, you get:

  • output/search-2026-03-14.json -- Full structured results with query log
  • output/new-refs-2026-03-14.bib -- BibTeX entries ready to merge
  • output/summary-2026-03-14.md -- Markdown summary grouped by cluster

Quality Tiers

Every result is classified into one of four quality tiers based on venue type:

| Tier | Label | Venue Types | Example |
| --- | --- | --- | --- |
| 1 | Peer-reviewed | journal, conference | AIAA Journal, ASME Turbo Expo |
| 2 | Technical | report, thesis, standard, book | NASA TM, SAE standards |
| 3 | Preprint | preprint | arXiv, SSRN |
| 4 | Grey literature | news, blog, unclassified | Trade press, blogs |

Use --max-tier to control which tiers to include:

# Only peer-reviewed (tier 1)
litseer search config.yaml --max-tier 1

# Peer-reviewed + technical reports (default)
litseer search config.yaml --max-tier 2

# Everything including grey literature
litseer search config.yaml --max-tier 4
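
The tier table and the --max-tier threshold amount to a lookup followed by a filter. The sketch below mirrors the table; the function names and the `venue_type` key are hypothetical, and Litseer's real classifier may use more signals than the venue type string alone.

```python
# Venue type -> quality tier, taken from the tier table in this guide.
TIER_BY_VENUE_TYPE = {
    "journal": 1,
    "conference": 1,
    "report": 2,
    "thesis": 2,
    "standard": 2,
    "book": 2,
    "preprint": 3,
    "news": 4,
    "blog": 4,
}


def classify_tier(venue_type):
    """Return the tier for a venue type; anything unrecognized is tier 4."""
    return TIER_BY_VENUE_TYPE.get(venue_type, 4)


def filter_by_max_tier(papers, max_tier):
    """Keep papers whose tier is at or below the threshold."""
    return [p for p in papers if classify_tier(p["venue_type"]) <= max_tier]


papers = [
    {"title": "A", "venue_type": "journal"},
    {"title": "B", "venue_type": "report"},
    {"title": "C", "venue_type": "preprint"},
    {"title": "D", "venue_type": "podcast"},  # unclassified
]
print([p["title"] for p in filter_by_max_tier(papers, 2)])  # ['A', 'B']
```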

Citation Snowball Walking

Use cite-walk to explore the citation graph around a seed paper. This finds papers that cite the seed (forward) and papers the seed references (backward). Discovered works are automatically ingested into the local citation graph for future use.

# Walk both directions from a DOI, depth 1
litseer cite-walk "10.1115/1.4045389"

# Forward citations only, 2 levels deep
litseer cite-walk "10.1115/1.4045389" --direction forward --depth 2

# Backward references only, limit 20 per level
litseer cite-walk "10.1115/1.4045389" --direction backward --limit 20

# Use Semantic Scholar instead of OpenAlex
litseer cite-walk "10.1115/1.4045389" --source semanticscholar

# Walk from a paper in your local graph (no API calls needed)
litseer cite-walk "10.1115/1.4045389" --source local_graph
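
Conceptually, a citation walk is a breadth-first traversal of the citation graph. The sketch below runs one over an in-memory dict instead of an API. It is an illustration, not Litseer's code: the `cite_walk` function and its data layout are hypothetical, and it assumes --limit caps the number of newly discovered papers per level.

```python
def cite_walk(seed, references, direction="both", depth=1, limit=None):
    """Breadth-first walk from a seed paper.

    references maps each paper ID to the list of paper IDs it cites.
    Returns a dict of discovered paper ID -> level at which it was found.
    """
    # Reverse index: paper ID -> papers that cite it (forward direction).
    cited_by = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by.setdefault(cited, []).append(citing)

    def neighbours(paper_id):
        found = []
        if direction in ("backward", "both"):
            found.extend(references.get(paper_id, []))
        if direction in ("forward", "both"):
            found.extend(cited_by.get(paper_id, []))
        return found

    levels = {}
    visited = {seed}
    frontier = [seed]
    for level in range(1, depth + 1):
        next_frontier = []
        for paper_id in frontier:
            for neighbour in neighbours(paper_id):
                # Assumed semantics: cap new discoveries per level.
                if limit is not None and len(next_frontier) >= limit:
                    break
                if neighbour not in visited:
                    visited.add(neighbour)
                    levels[neighbour] = level
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return levels


graph = {
    "seed": ["ref1", "ref2"],   # the seed cites two references
    "citer1": ["seed"],         # citer1 cites the seed
    "citer2": ["citer1"],       # citer2 cites citer1
}
print(cite_walk("seed", graph, direction="forward", depth=2))
# {'citer1': 1, 'citer2': 2}
```

The visited set keeps a paper from being reported twice when it is reachable along several citation paths.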

Supported Citation Walk Sources

| Source | Forward (citing) | Backward (references) |
| --- | --- | --- |
| openalex | yes | yes |
| semanticscholar | yes | yes |
| crossref | no | yes |
| local_graph | yes | yes |
| skybrary | yes (backlinks) | yes (wiki links) |
| nasa_ntrs | no | no |
| ieee | no | no |
| aiaa | no | yes |
| sae | no | yes |

Portfolio Search

For multi-technology research programs, the portfolio command runs batch searches across a directory of YAML configs. Each technology gets its own output subfolder, and a portfolio-level summary tracks references shared across technologies.

# Run all configs in a directory
litseer portfolio examples/portfolio-demo/ -o output/

# With quality filtering
litseer portfolio examples/portfolio-demo/ -o output/ --max-tier 2

# Specific sources only
litseer portfolio examples/portfolio-demo/ -o output/ --sources openalex --sources crossref

Portfolio Output Structure

output/
  turbine-cooling/
    search-2026-03-15.json
    new-refs-2026-03-15.bib
    summary-2026-03-15.md
  cmc-materials/
    search-2026-03-15.json
    new-refs-2026-03-15.bib
    summary-2026-03-15.md
  portfolio-summary-2026-03-15.json   # cross-technology analysis

The portfolio summary JSON includes:

  • Per-technology result counts
  • Papers shared across multiple technologies
  • Total unique papers across the portfolio

This is designed for scheduled/automated runs (cron, CI) to keep a research program's literature coverage up to date.
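
The cross-technology analysis boils down to set operations over the DOIs found for each technology. The sketch below shows one way to compute the three summary items. The `summarize_portfolio` function and its output keys (`counts`, `shared`, `total_unique`) are hypothetical and do not reflect the actual schema of the portfolio summary JSON.

```python
from collections import defaultdict


def summarize_portfolio(results_by_technology):
    """Summarize DOIs found per technology.

    results_by_technology maps a technology name to an iterable of DOIs.
    DOIs are compared case-insensitively.
    """
    techs_by_doi = defaultdict(set)
    for tech, dois in results_by_technology.items():
        for doi in dois:
            techs_by_doi[doi.lower()].add(tech)

    return {
        "counts": {
            tech: len({doi.lower() for doi in dois})
            for tech, dois in results_by_technology.items()
        },
        "shared": {
            doi: sorted(techs)
            for doi, techs in techs_by_doi.items()
            if len(techs) > 1
        },
        "total_unique": len(techs_by_doi),
    }


summary = summarize_portfolio(
    {
        "turbine-cooling": ["10.1/a", "10.1/b"],
        "cmc-materials": ["10.1/B", "10.1/c"],
    }
)
print(summary["shared"])        # {'10.1/b': ['cmc-materials', 'turbine-cooling']}
print(summary["total_unique"])  # 3
```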

Local Citation Graph

Every search and cite-walk automatically populates a local DuckDB citation graph at ~/.cache/litseer/graph.db. This graph accumulates paper metadata and citation edges over time, enabling:

  • Citation walking against locally cached papers without API calls
  • Graph statistics and visualization
  • Bibliometric network analysis (with [networks] extra)

# View graph statistics
litseer graph stats

# Export graph as JSON (for external tools or LLM consumption)
litseer graph export --format json -o graph.json

# Export as DOT for Graphviz visualization
litseer graph export --format dot -o citations.dot
dot -Tpdf citations.dot -o citations.pdf

# Disable graph population for a search
litseer --no-graph search config.yaml
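
For readers unfamiliar with DOT, the sketch below shows how a list of citation edges maps onto a Graphviz digraph. It only illustrates the file format; the `to_dot` helper is hypothetical, and the output of `litseer graph export --format dot` may carry different node labels and attributes.

```python
def to_dot(edges, name="citations"):
    """Render (citing, cited) pairs as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for citing, cited in sorted(edges):
        # One directed edge per citation: citing paper -> cited paper.
        lines.append(f'  "{citing}" -> "{cited}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot([("paper_c", "paper_a"), ("paper_b", "paper_a")])
print(dot_text)
```

Saving the result to a .dot file lets the `dot -Tpdf` command render it.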

Network Analysis

With the [networks] extra installed, you can build bibliometric co-occurrence networks from search results using sparse bipartite matrices. This follows the methodology from R bibliometrix (Aria & Cuccurullo, 2017).

Available operations:

  • Bibliographic coupling: papers that share references (A @ A.T)
  • Co-citation: references cited together (A.T @ A)
  • Keyword co-occurrence: author/indexed keywords appearing together
  • Association strength normalization: van Eck & Waltman (2009)
  • NetworkX export: weighted graphs for further analysis

from litseer.networks import (
    build_bipartite,
    compute_coupling,
    compute_cocitation,
    normalize_association_strength,
    keywords_accessor,
    references_accessor,
    to_networkx,
    top_nodes,
)

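# Build a keyword co-occurrence matrix.
# `works` is the list of paper records from a search, assumed to be loaded already.
bip = build_bipartite(works, keywords_accessor)
# A.T @ A on the paper-by-keyword matrix counts how often two keywords co-occur
CC = compute_cocitation(bip)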

# Top connected keywords
print(top_nodes(CC, bip.item_labels, n=20))

# Normalized network for visualization
S = normalize_association_strength(CC)
G = to_networkx(S, bip.item_labels, threshold=0.01)
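
The matrix products behind coupling and co-citation are easy to verify by hand on a tiny example. The sketch below uses plain Python lists rather than the sparse matrices the library works with, and it does not call any litseer.networks function. The association strength step assumes each item's occurrence count is taken from the diagonal of the co-occurrence matrix, which is one common formulation; Litseer's normalization may differ in detail.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    """Dense matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def association_strength(C):
    """Normalize C[i][j] by the product of the diagonal entries (occurrences)."""
    n = len(C)
    return [
        [
            C[i][j] / (C[i][i] * C[j][j]) if C[i][i] and C[j][j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


# Rows are citing papers, columns are cited references (1 = paper cites reference).
A = [
    [1, 1, 0],  # P1 cites R1, R2
    [1, 1, 1],  # P2 cites R1, R2, R3
    [0, 0, 1],  # P3 cites R3
]

coupling = matmul(A, transpose(A))    # papers x papers: shared references
cocitation = matmul(transpose(A), A)  # references x references: cited together

print(coupling)    # [[2, 2, 0], [2, 3, 1], [0, 1, 1]]
print(cocitation)  # [[2, 2, 1], [2, 2, 1], [1, 1, 2]]
print(association_strength(cocitation)[0][2])  # 0.25
```

P1 and P2 share two references, so their coupling strength is 2. R1 and R3 are cited together only by P2, so their co-citation count is 1.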

Merging Results

Combine multiple search result JSON files (e.g., from different runs or different configs) into one deduplicated set:

litseer merge output/search-2026-03-01.json output/search-2026-03-14.json \
    -o output/merged.json

Exporting

Convert a search result JSON to BibTeX or markdown:

# BibTeX (default)
litseer export output/search-2026-03-14.json --format bibtex

# BibTeX with unique keys relative to your existing .bib
litseer export output/search-2026-03-14.json --format bibtex \
    --existing-bib ../thesis/references.bib

# Markdown summary
litseer export output/search-2026-03-14.json --format markdown

Export output goes to stdout, so pipe or redirect as needed:

litseer export output/merged.json --format bibtex > new-refs.bib
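
Generating keys that are unique relative to an existing .bib means collecting the keys already in use and avoiding collisions. The sketch below shows one way to do that. The author-plus-year key scheme, the letter suffixes, and both helper functions are assumptions for illustration; Litseer's actual key format is not documented here.

```python
import re
from string import ascii_lowercase


def existing_keys(bib_text):
    """Collect citation keys from entries such as '@article{smith2020, ...'."""
    return set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bib_text))


def unique_key(surname, year, taken):
    """Return a surname+year key not in `taken`, adding a, b, c... on collision.

    The chosen key is added to `taken` so later calls avoid it too.
    """
    base = re.sub(r"[^a-z0-9]", "", surname.lower()) + str(year)
    candidates = [base] + [base + suffix for suffix in ascii_lowercase]
    for candidate in candidates:
        if candidate not in taken:
            taken.add(candidate)
            return candidate
    raise ValueError(f"no free key for {base}")


bib_text = """
@article{bogard2006,
  title = {Gas Turbine Film Cooling}
}
@inproceedings{bunker2005,
  title = {A Review of Shaped Hole Turbine Film-Cooling Technology}
}
"""
taken = existing_keys(bib_text)
print(unique_key("Bogard", 2006, taken))  # bogard2006a
print(unique_key("Han", 2013, taken))     # han2013
```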

Available Sources

Litseer searches nine sources by default. See sources.md for full details on each adapter.

| Source | Key needed | Best for |
| --- | --- | --- |
| openalex | no | Broad academic coverage, citation graph |
| semanticscholar | no | CS/engineering, citation graph |
| crossref | no | DOI metadata, publisher coverage |
| nasa_ntrs | no | NASA technical reports |
| ieee | yes | IEEE journals and conferences |
| aiaa | no | AIAA aerospace publications |
| sae | no | SAE automotive/aerospace standards |
| skybrary | no | Aviation safety knowledge base |
| local_graph | no | Locally cached papers and citations |

Select specific sources with -s:

litseer search config.yaml -s openalex -s aiaa -s nasa_ntrs

Environment Variables

| Variable | Purpose | Required |
| --- | --- | --- |
| IEEE_API_KEY | API key for IEEE Xplore | Only if using ieee source |

The address given with --mailto (default: litseer@example.com) is sent in the User-Agent header, which gives access to the polite rate-limit pools on OpenAlex and CrossRef. Set it to your real email for better throughput:

litseer search config.yaml --mailto researcher@mit.edu
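
Both OpenAlex and CrossRef identify polite-pool clients by a contact address in the request, conventionally written as `mailto:` inside the User-Agent string. The sketch below builds such a header. The `polite_headers` helper, the tool name, and the placeholder version number are assumptions; the exact string Litseer sends may differ.

```python
def polite_headers(mailto, tool="litseer", version="0.0.0"):
    """Build request headers carrying a contact address for polite pools.

    The version default is a placeholder, not Litseer's real version.
    """
    return {"User-Agent": f"{tool}/{version} (mailto:{mailto})"}


headers = polite_headers("researcher@mit.edu")
print(headers["User-Agent"])  # litseer/0.0.0 (mailto:researcher@mit.edu)
```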

Typical Workflow for a Literature Review

# 1. Write your search config
#    (see examples/aerospace-review.yaml for a real-world example,
#     or examples/quick-search.yaml for a minimal starting point)

# 2. Run the search
litseer search examples/aerospace-review.yaml \
    -o output/ \
    --mailto you@university.edu \
    --max-tier 2

# 3. Walk citations from key seed papers
litseer cite-walk "10.2514/1.J058123" --depth 2 --direction both

# 4. Check what's accumulated in the local graph
litseer graph stats

# 5. Re-walk using the local graph (no API calls)
litseer cite-walk "10.2514/1.J058123" --source local_graph --depth 2

# 6. Review the markdown summary
#    output/summary-2026-03-14.md

# 7. Merge the BibTeX into your thesis
#    Review output/new-refs-2026-03-14.bib, then copy relevant entries
#    into your references.bib

# 8. Re-run periodically to catch new publications
litseer search examples/aerospace-review.yaml -o output/

Multi-Technology Research Program

# 1. Create a directory with one YAML config per technology
#    (see examples/portfolio-demo/ for a working example)

# 2. Run the portfolio search
litseer portfolio examples/portfolio-demo/ \
    -o output/ \
    --mailto you@university.edu \
    --max-tier 2

# 3. Review per-technology results and the cross-tech summary
#    output/turbine-cooling/summary-2026-03-15.md
#    output/cmc-materials/summary-2026-03-15.md
#    output/portfolio-summary-2026-03-15.json

# 4. Re-run on a schedule (cron) to keep coverage current

Example Configs

The examples/ directory contains ready-to-use configs:

| File | Description |
| --- | --- |
| quick-search.yaml | Minimal single-cluster example |
| aerospace-review.yaml | Multi-cluster turbine blade cooling review with seed DOIs |
| orbital-edge-computing.yaml | Real research config: LEO satellite constellation economics |
| portfolio-demo/ | Two-technology portfolio batch search demo |