Litseer Usage Guide¶
Litseer is an async Python CLI tool for multi-source academic literature search. It searches multiple databases in parallel, deduplicates results, classifies quality tiers, and exports to BibTeX, JSON, or markdown. It is designed for systematic literature reviews in aerospace and engineering.
Installation¶
Requires Python 3.11+. Install with uv:
uv pip install -e .
# With development dependencies (pytest, ruff, respx):
uv pip install -e ".[dev]"
# With network analysis (bipartite matrices, co-citation, coupling):
uv pip install -e ".[networks]"
After installation the litseer command is available in your shell.
Quick Start¶
# Run a search from a config file
litseer search examples/aerospace-review.yaml
# Specify output directory and email for polite API access
litseer search examples/aerospace-review.yaml -o results/ --mailto you@university.edu
# Search only specific sources
litseer search examples/aerospace-review.yaml -s openalex -s crossref
# Enable debug logging
litseer -v search examples/aerospace-review.yaml
Config File Format¶
Searches are driven by YAML config files. A config defines topic clusters, each with a set of query strings. Litseer runs every query against every year in the range, across all selected sources.
name: my-literature-review
description: >
  Systematic review of film cooling effectiveness on turbine blades.
year_min: 2020
year_max: 2026
# Optional: path to your existing .bib file.
# Litseer will skip papers you already cite.
existing_bib_path: ../thesis/references.bib
# Optional: seed DOIs for citation walking
seed_dois:
  - "10.1115/1.4045389"
clusters:
  - id: film_cooling
    label: Film Cooling Effectiveness
    queries:
      - "film cooling turbine blade effectiveness"
      - "shaped hole film cooling"
    sections: [literature_review, methodology]
  - id: conjugate_ht
    label: Conjugate Heat Transfer
    queries:
      - "conjugate heat transfer turbine blade CFD"
    sections: [methodology]
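Because every query runs for every year and every source, the number of requests grows quickly. The sketch below counts them for the config above. The dict mirrors the YAML; `expand_jobs` and the tuple layout are hypothetical names for illustration, not part of Litseer's API, and the year range is assumed to be inclusive.

```python
from itertools import product

# Hypothetical in-memory form of the YAML config shown above.
config = {
    "year_min": 2020,
    "year_max": 2026,
    "clusters": [
        {
            "id": "film_cooling",
            "queries": [
                "film cooling turbine blade effectiveness",
                "shaped hole film cooling",
            ],
        },
        {
            "id": "conjugate_ht",
            "queries": ["conjugate heat transfer turbine blade CFD"],
        },
    ],
}


def expand_jobs(config, sources):
    """Yield one (cluster_id, query, year, source) tuple per search request."""
    # Assumption: year_min and year_max are both included in the range.
    years = range(config["year_min"], config["year_max"] + 1)
    for cluster in config["clusters"]:
        for query, year, source in product(cluster["queries"], years, sources):
            yield cluster["id"], query, year, source


jobs = list(expand_jobs(config, ["openalex", "crossref"]))
print(len(jobs))  # 3 queries x 7 years x 2 sources
```

Narrowing the year range or selecting fewer sources is the most direct way to shrink a run.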
Config Fields¶
| Field | Required | Default | Description |
|---|---|---|---|
| name | no | "unnamed" | Name for this search configuration |
| description | no | "" | Free-text description |
| year_min | no | 2024 | Earliest publication year to search |
| year_max | no | 2026 | Latest publication year to search |
| existing_bib_path | no | none | Path to existing .bib file for dedup |
| seed_dois | no | [] | DOIs for citation walking |
| clusters | yes | -- | List of topic clusters (see below) |
Cluster Fields¶
| Field | Required | Description |
|---|---|---|
| id | yes | Short identifier (used in output grouping) |
| label | no | Human-readable label |
| queries | yes | List of search query strings |
| sections | no | Thesis/paper sections this cluster maps to |
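To make the required fields and defaults concrete, here is a minimal sketch that applies them to an already-parsed config dict. `Cluster`, `SearchConfig`, and `parse_config` are hypothetical names, not Litseer's own classes. The defaults for the optional cluster fields are assumptions, since the table above does not state them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cluster:
    id: str
    queries: list
    label: str = ""  # assumed default
    sections: list = field(default_factory=list)  # assumed default


@dataclass
class SearchConfig:
    clusters: list
    name: str = "unnamed"
    description: str = ""
    year_min: int = 2024
    year_max: int = 2026
    existing_bib_path: Optional[str] = None
    seed_dois: list = field(default_factory=list)


def parse_config(raw: dict) -> SearchConfig:
    """Check the required fields and fill in the documented defaults."""
    if not raw.get("clusters"):
        raise ValueError("config must define at least one cluster")
    clusters = []
    for item in raw["clusters"]:
        missing = [key for key in ("id", "queries") if key not in item]
        if missing:
            raise ValueError(f"cluster is missing required field(s): {missing}")
        clusters.append(Cluster(**item))
    options = {key: value for key, value in raw.items() if key != "clusters"}
    return SearchConfig(clusters=clusters, **options)


minimal = parse_config(
    {"clusters": [{"id": "film_cooling", "queries": ["shaped hole film cooling"]}]}
)
print(minimal.name, minimal.year_min, minimal.year_max)
```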
Search Workflow¶
When you run litseer search, the tool:
- Loads the YAML config
- Creates source adapters (OpenAlex, CrossRef, etc.)
- For each cluster, runs each query for each year across all sources
- Classifies every result by quality tier
- Filters results by the --max-tier threshold
- Deduplicates by DOI and title similarity
- Removes papers already in your existing .bib file
- Populates the local citation graph with discovered papers and edges
- Writes three output files to the output directory
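The deduplication step in the list above can be sketched with the standard library. This is an illustration of the idea only, not Litseer's implementation: the record fields (`doi`, `title`), the 0.9 similarity threshold, and the use of `difflib` are all assumptions, and the DOIs are placeholders.

```python
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase and drop punctuation so near-identical titles compare equal."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(cleaned.split())


def deduplicate(papers, threshold=0.9):
    """Keep the first occurrence of each paper: match on DOI, then on title."""
    seen_dois = set()
    kept = []
    for paper in papers:
        doi = (paper.get("doi") or "").lower()
        if doi and doi in seen_dois:
            continue  # exact DOI match
        title = normalize_title(paper.get("title") or "")
        if title and any(
            SequenceMatcher(
                None, title, normalize_title(other.get("title") or "")
            ).ratio() >= threshold
            for other in kept
        ):
            continue  # fuzzy title match against a paper already kept
        if doi:
            seen_dois.add(doi)
        kept.append(paper)
    return kept


papers = [
    {"doi": "10.0000/example.1", "title": "Film Cooling Effectiveness on Turbine Blades"},
    {"doi": "10.0000/EXAMPLE.1", "title": "Film cooling effectiveness on turbine blades"},
    {"doi": None, "title": "Film Cooling Effectiveness on Turbine Blades."},
    {"doi": "10.0000/example.2", "title": "Conjugate Heat Transfer in Turbine Blade CFD"},
]
unique = deduplicate(papers)
print([p["doi"] for p in unique])
```

Comparing each title against every kept paper is quadratic, which is fine for a sketch but would need blocking or indexing for large result sets.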
Output Files¶
For a search run on 2026-03-14, you get:
- output/search-2026-03-14.json -- Full structured results with query log
- output/new-refs-2026-03-14.bib -- BibTeX entries ready to merge
- output/summary-2026-03-14.md -- Markdown summary grouped by cluster
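Scripts that post-process a run need to locate these files. A small helper can rebuild the date-stamped names from the pattern above. `output_paths` is a hypothetical helper, and the dictionary keys are arbitrary labels chosen for this example.

```python
from datetime import date
from pathlib import Path


def output_paths(output_dir, run_date):
    """Return the three date-stamped output paths for one search run."""
    stamp = run_date.isoformat()  # e.g. "2026-03-14"
    base = Path(output_dir)
    return {
        "json": base / f"search-{stamp}.json",
        "bibtex": base / f"new-refs-{stamp}.bib",
        "summary": base / f"summary-{stamp}.md",
    }


paths = output_paths("output", date(2026, 3, 14))
for kind, path in paths.items():
    print(kind, path.as_posix())
```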
Quality Tiers¶
Every result is classified into one of four quality tiers based on venue type:
| Tier | Label | Venue Types | Example |
|---|---|---|---|
| 1 | Peer-reviewed | journal, conference | AIAA Journal, ASME Turbo Expo |
| 2 | Technical | report, thesis, standard, book | NASA TM, SAE standards |
| 3 | Preprint | preprint | arXiv, SSRN |
| 4 | Grey literature | news, blog, unclassified | Trade press, blogs |
Use --max-tier to control which tiers to include:
# Only peer-reviewed (tier 1)
litseer search config.yaml --max-tier 1
# Peer-reviewed, technical, and preprints (tiers 1-3, default)
litseer search config.yaml --max-tier 3
# Everything including grey literature
litseer search config.yaml --max-tier 4
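The tier table and the threshold filter amount to a lookup followed by a comparison. The sketch below copies the venue types from the table. The `venue_type` field name and the function names are assumptions made for illustration.

```python
# Venue-type-to-tier mapping taken from the table above.
TIER_BY_VENUE = {
    "journal": 1,
    "conference": 1,
    "report": 2,
    "thesis": 2,
    "standard": 2,
    "book": 2,
    "preprint": 3,
    "news": 4,
    "blog": 4,
}


def classify_tier(venue_type):
    """Return the quality tier; unknown or missing venue types fall into tier 4."""
    return TIER_BY_VENUE.get((venue_type or "").lower(), 4)


def filter_by_max_tier(papers, max_tier):
    """Keep papers whose tier number is at or below the threshold."""
    return [p for p in papers if classify_tier(p.get("venue_type")) <= max_tier]


sample = [
    {"title": "A", "venue_type": "journal"},
    {"title": "B", "venue_type": "report"},
    {"title": "C", "venue_type": "preprint"},
    {"title": "D", "venue_type": "blog"},
    {"title": "E"},  # no venue type recorded
]
print([p["title"] for p in filter_by_max_tier(sample, 2)])
```

A lower tier number means stricter filtering, so `--max-tier 1` keeps the least and `--max-tier 4` keeps everything.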
Citation Snowball Walking¶
Use cite-walk to explore the citation graph around a seed paper.
This finds papers that cite the seed (forward) and papers the seed
references (backward). Discovered works are automatically ingested
into the local citation graph for future use.
# Walk both directions from a DOI, depth 1
litseer cite-walk "10.1115/1.4045389"
# Forward citations only, 2 levels deep
litseer cite-walk "10.1115/1.4045389" --direction forward --depth 2
# Backward references only, limit 20 per level
litseer cite-walk "10.1115/1.4045389" --direction backward --limit 20
# Use Semantic Scholar instead of OpenAlex
litseer cite-walk "10.1115/1.4045389" --source semanticscholar
# Walk from a paper in your local graph (no API calls needed)
litseer cite-walk "10.1115/1.4045389" --source local_graph
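Conceptually, a citation walk is a breadth-first traversal with a depth cap. The sketch below runs one on a toy edge list rather than a live API. The names are hypothetical. `limit` is interpreted here as a cap on neighbors taken per expanded paper, which is an assumption about how the CLI option behaves.

```python
from collections import deque

# Toy citation edges: (citing, cited) means "citing references cited".
EDGES = [("B", "A"), ("C", "A"), ("D", "B"), ("A", "X"), ("A", "Y"), ("X", "Z")]


def neighbors(paper, direction):
    """Forward = papers that cite `paper`; backward = papers it references."""
    found = []
    if direction in ("forward", "both"):
        found += [citing for citing, cited in EDGES if cited == paper]
    if direction in ("backward", "both"):
        found += [cited for citing, cited in EDGES if citing == paper]
    return found


def cite_walk(seed, direction="both", depth=1, limit=None):
    """Breadth-first walk; returns {paper: level} for every discovered paper."""
    levels = {seed: 0}
    queue = deque([seed])
    while queue:
        paper = queue.popleft()
        if levels[paper] >= depth:
            continue  # do not expand past the requested depth
        for other in neighbors(paper, direction)[:limit]:
            if other not in levels:
                levels[other] = levels[paper] + 1
                queue.append(other)
    del levels[seed]  # report only the papers that were discovered
    return levels


print(cite_walk("A", direction="forward", depth=2))
print(cite_walk("A", direction="backward", depth=1))
```

Tracking visited papers in `levels` keeps the walk from looping when two papers are reachable from each other through different directions.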
Supported Citation Walk Sources¶
| Source | Forward (citing) | Backward (references) |
|---|---|---|
| openalex | yes | yes |
| semanticscholar | yes | yes |
| crossref | no | yes |
| local_graph | yes | yes |
| skybrary | yes (backlinks) | yes (wiki links) |
| nasa_ntrs | no | no |
| ieee | no | no |
| aiaa | no | yes |
| sae | no | yes |
Technology Portfolio Search¶
For multi-technology research programs, the portfolio command runs batch
searches across a directory of YAML configs. Each technology gets its own
output subfolder, and a portfolio-level summary tracks cross-technology
shared references.
# Run all configs in a directory
litseer portfolio examples/portfolio-demo/ -o output/
# With quality filtering
litseer portfolio examples/portfolio-demo/ -o output/ --max-tier 2
# Specific sources only
litseer portfolio examples/portfolio-demo/ -o output/ --sources openalex --sources crossref
Portfolio Output Structure¶
output/
  turbine-cooling/
    search-2026-03-15.json
    new-refs-2026-03-15.bib
    summary-2026-03-15.md
  cmc-materials/
    search-2026-03-15.json
    new-refs-2026-03-15.bib
    summary-2026-03-15.md
  portfolio-summary-2026-03-15.json   # cross-technology analysis
The portfolio summary JSON includes:
- Per-technology result counts
- Papers shared across multiple technologies
- Total unique papers across the portfolio
This is designed for scheduled/automated runs (cron, CI) to keep a research program's literature coverage up to date.
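The three quantities in the portfolio summary can be derived from per-technology DOI lists. This is a sketch of the computation, not the actual summary schema: the key names `per_technology`, `shared`, and `total_unique` are made up for this example, and the DOIs are placeholders.

```python
from collections import defaultdict


def summarize_portfolio(results_by_tech):
    """Count results per technology and find DOIs that appear in more than one."""
    techs_by_doi = defaultdict(set)
    for tech, dois in results_by_tech.items():
        for doi in dois:
            techs_by_doi[doi.lower()].add(tech)  # DOIs are case-insensitive
    return {
        "per_technology": {
            tech: len({doi.lower() for doi in dois})
            for tech, dois in results_by_tech.items()
        },
        "shared": {
            doi: sorted(techs)
            for doi, techs in techs_by_doi.items()
            if len(techs) > 1
        },
        "total_unique": len(techs_by_doi),
    }


summary = summarize_portfolio(
    {
        "turbine-cooling": ["10.0000/a", "10.0000/b"],
        "cmc-materials": ["10.0000/B", "10.0000/c"],
    }
)
print(summary)
```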
Local Citation Graph¶
Every search and cite-walk automatically populates a local DuckDB citation
graph at ~/.cache/litseer/graph.db. This graph accumulates paper metadata
and citation edges over time, enabling:
- Citation walking against locally cached papers without API calls
- Graph statistics and visualization
- Bibliometric network analysis (with the [networks] extra)
# View graph statistics
litseer graph stats
# Export graph as JSON (for external tools or LLM consumption)
litseer graph export --format json -o graph.json
# Export as DOT for Graphviz visualization
litseer graph export --format dot -o citations.dot
dot -Tpdf citations.dot -o citations.pdf
# Disable graph population for a search
litseer --no-graph search config.yaml
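For readers unfamiliar with DOT, the sketch below shows roughly what a citation edge list looks like in that format. It illustrates the Graphviz syntax only. The exact node labels and attributes that Litseer writes are not documented here, and `to_dot` is a hypothetical helper.

```python
def to_dot(edges, graph_name="citations"):
    """Render (citing, cited) pairs as a Graphviz directed graph."""

    def quote(node_id):
        # DOIs contain slashes and dots, so node IDs must be quoted strings.
        return '"' + node_id.replace('"', '\\"') + '"'

    lines = [f"digraph {graph_name} {{"]
    for citing, cited in sorted(edges):
        lines.append(f"  {quote(citing)} -> {quote(cited)};")
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot([("10.0000/b", "10.0000/a"), ("10.0000/c", "10.0000/a")])
print(dot_text)
```

Text in this form can be rendered with the same `dot -Tpdf` command used above.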
Network Analysis¶
With the [networks] extra installed, you can build bibliometric
co-occurrence networks from search results using sparse bipartite
matrices. This follows the methodology from R bibliometrix
(Aria & Cuccurullo, 2017).
Available operations:
- Bibliographic coupling: papers that share references (A @ A.T)
- Co-citation: references cited together (A.T @ A)
- Keyword co-occurrence: author/indexed keywords appearing together
- Association strength normalization: van Eck & Waltman (2009)
- NetworkX export: weighted graphs for further analysis
from litseer.networks import (
build_bipartite,
compute_coupling,
compute_cocitation,
normalize_association_strength,
keywords_accessor,
references_accessor,
to_networkx,
top_nodes,
)
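# `works` holds the paper records from a search (loading not shown here).
# Build a paper-by-keyword bipartite matrix
bip = build_bipartite(works, keywords_accessor)
# A.T @ A on that matrix counts how often two keywords appear together
CC = compute_cocitation(bip)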
# Top connected keywords
print(top_nodes(CC, bip.item_labels, n=20))
# Normalized network for visualization
S = normalize_association_strength(CC)
G = to_networkx(S, bip.item_labels, threshold=0.01)
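The matrix products behind these operations are easy to check by hand on a tiny example. The sketch below uses plain nested lists instead of the sparse matrices the library works with, and none of it calls `litseer.networks`. The normalization divides each co-occurrence count by the product of the two items' occurrence counts (the diagonal), which is the association strength form. Zeroing the diagonal is a choice made for this example.

```python
# Rows are citing papers, columns are cited references (1 = paper cites reference).
A = [
    [1, 1, 0],  # P1 cites R1, R2
    [1, 1, 1],  # P2 cites R1, R2, R3
    [0, 0, 1],  # P3 cites R3
]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matmul(left, right):
    right_columns = transpose(right)
    return [
        [sum(a * b for a, b in zip(row, col)) for col in right_columns]
        for row in left
    ]


coupling = matmul(A, transpose(A))    # A @ A.T: references shared by two papers
cocitation = matmul(transpose(A), A)  # A.T @ A: papers citing both references


def association_strength(counts):
    """Normalize off-diagonal counts by the product of the diagonal entries."""
    size = len(counts)
    return [
        [
            counts[i][j] / (counts[i][i] * counts[j][j])
            if i != j and counts[i][i] and counts[j][j]
            else 0.0
            for j in range(size)
        ]
        for i in range(size)
    ]


strength = association_strength(cocitation)
print(coupling)
print(cocitation)
print(strength[0])
```

P1 and P2 share two references, so their coupling entry is 2. R1 and R2 are cited together by two papers, so their co-citation entry is also 2.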
Merging Results¶
Litseer can combine multiple search result JSON files (e.g., from different runs or different configs) into one deduplicated set.
Exporting¶
Convert a search result JSON to BibTeX or markdown:
# BibTeX (default)
litseer export output/search-2026-03-14.json --format bibtex
# BibTeX with unique keys relative to your existing .bib
litseer export output/search-2026-03-14.json --format bibtex \
--existing-bib ../thesis/references.bib
# Markdown summary
litseer export output/search-2026-03-14.json --format markdown
Export output goes to stdout, so pipe or redirect as needed:
litseer export output/search-2026-03-14.json --format bibtex > new-refs.bib
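Keeping keys unique relative to an existing .bib file comes down to collecting the keys already in use and adding a suffix on collision. The sketch below shows one way to do that. The letter-suffix scheme and both function names are assumptions, not a description of how `--existing-bib` is implemented.

```python
import re


def existing_keys(bib_text):
    """Collect the citation keys of all entries in a .bib file's text."""
    return set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bib_text))


def unique_key(base, taken):
    """Return `base`, or `base` plus a letter suffix, avoiding keys in `taken`."""
    candidates = [base] + [base + letter for letter in "abcdefghijklmnopqrstuvwxyz"]
    for candidate in candidates:
        if candidate not in taken:
            taken.add(candidate)  # reserve it for later entries in the same run
            return candidate
    raise ValueError(f"no free key available for {base!r}")


bib_text = """
@article{smith2021,
  title = {An existing entry},
}
@inproceedings{lee2020a,
  title = {Another existing entry},
}
"""
taken = existing_keys(bib_text)
print(unique_key("smith2021", taken))
print(unique_key("smith2021", taken))
print(unique_key("lee2020", taken))
```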
Available Sources¶
Litseer searches nine sources by default. See sources.md for full details on each adapter.
| Source | Key needed | Best for |
|---|---|---|
| openalex | no | Broad academic coverage, citation graph |
| semanticscholar | no | CS/engineering, citation graph |
| crossref | no | DOI metadata, publisher coverage |
| nasa_ntrs | no | NASA technical reports |
| ieee | yes | IEEE journals and conferences |
| aiaa | no | AIAA aerospace publications |
| sae | no | SAE automotive/aerospace standards |
| skybrary | no | Aviation safety knowledge base |
| local_graph | no | Locally cached papers and citations |
Select specific sources with -s:
litseer search examples/aerospace-review.yaml -s openalex -s crossref
Environment Variables¶
| Variable | Purpose | Required |
|---|---|---|
| IEEE_API_KEY | API key for IEEE Xplore | Only if using ieee source |
The --mailto flag (default: litseer@example.com) is sent in the
User-Agent header to access polite rate-limit pools on OpenAlex and CrossRef.
Set it to your real email for better throughput:
litseer search examples/aerospace-review.yaml -o results/ --mailto you@university.edu
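For scripts that call the same APIs directly, the two settings above can be handled as follows. Both helpers are hypothetical. The `tool (mailto:address)` layout follows the common polite-pool convention and may differ from the exact string Litseer sends. Dropping `ieee` when no key is set is one possible policy, not documented behavior.

```python
import os


def polite_headers(mailto, tool="litseer"):
    """Build a User-Agent header that identifies the caller by email."""
    return {"User-Agent": f"{tool} (mailto:{mailto})"}


def usable_sources(requested, env=None):
    """Drop the ieee source when IEEE_API_KEY is not set (assumed policy)."""
    env = os.environ if env is None else env
    return [
        source
        for source in requested
        if source != "ieee" or env.get("IEEE_API_KEY")
    ]


print(polite_headers("you@university.edu"))
print(usable_sources(["openalex", "ieee"], env={}))
```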
Typical Workflow for a Literature Review¶
# 1. Write your search config
# (see examples/aerospace-review.yaml for a real-world example,
# or examples/quick-search.yaml for a minimal starting point)
# 2. Run the search
litseer search examples/aerospace-review.yaml \
-o output/ \
--mailto you@university.edu \
--max-tier 2
# 3. Walk citations from key seed papers
litseer cite-walk "10.2514/1.J058123" --depth 2 --direction both
# 4. Check what's accumulated in the local graph
litseer graph stats
# 5. Re-walk using the local graph (no API calls)
litseer cite-walk "10.2514/1.J058123" --source local_graph --depth 2
# 6. Review the markdown summary
# output/summary-2026-03-14.md
# 7. Merge the BibTeX into your thesis
# Review output/new-refs-2026-03-14.bib, then copy relevant entries
# into your references.bib
# 8. Re-run periodically to catch new publications
litseer search examples/aerospace-review.yaml -o output/
Multi-Technology Research Program¶
# 1. Create a directory with one YAML config per technology
# (see examples/portfolio-demo/ for a working example)
# 2. Run the portfolio search
litseer portfolio examples/portfolio-demo/ \
-o output/ \
--mailto you@university.edu \
--max-tier 2
# 3. Review per-technology results and the cross-tech summary
# output/turbine-cooling/summary-2026-03-15.md
# output/cmc-materials/summary-2026-03-15.md
# output/portfolio-summary-2026-03-15.json
# 4. Re-run on a schedule (cron) to keep coverage current
Example Configs¶
The examples/ directory contains ready-to-use configs:
| File | Description |
|---|---|
| quick-search.yaml | Minimal single-cluster example |
| aerospace-review.yaml | Multi-cluster turbine blade cooling review with seed DOIs |
| orbital-edge-computing.yaml | Real research config: LEO satellite constellation economics |
| portfolio-demo/ | Two-technology portfolio batch search demo |