ADR-009: Spatial Visualization Architecture

Date: 2026-03-15
Status: Draft (iterating on interaction design)

Context

Litseer accumulates citation graph data, bibliometric network matrices, and portfolio summaries that are inherently graph-structured. Traditional 2D interfaces (spreadsheets, flat search result lists) lose this structure. The goal is a spatial visualization that lets researchers explore their literature as an interactive, immersive environment — from desktop browsers to wall displays to VR headsets.

The interaction design is still evolving. This ADR captures the architectural decisions that are stable, and marks the interaction paradigm sections as draft for continued iteration.

Design Principles

Three principles from John Underkoffler's work on spatial operating environments (MIT Tangible Media Group → Oblong Industries → the Minority Report interface design) shape this architecture:

1. The interface is the operating system. When the interaction paradigm changes fundamentally, you cannot bolt it onto the old architecture (Underkoffler, TED 2010: "the OS is the interface... they're inseparable, they are one"). Litseer's spatial view is not a "3D rendering of CLI output." The spatial environment is the research tool — search, browse, verify, annotate all happen in the same space. This means the litseer serve API must expose operations (trigger search, walk citations, exclude papers, export selection) as first-class actions in the spatial UI, not just passive graph rendering.

2. Input and output spaces must be co-located. The Luminous Room principle (Underkoffler & Ishii, 1999): when you use a mouse, your hand is on the desk while the pixels are on the screen — two different planes. In spatial computing, the researcher reaches into the citation graph and manipulates it directly. Selecting a paper, walking its citations, opening the referenced passage — these happen where the data is, not through an intermediary control surface. This is the difference between using a tool to look at data and being inside the data.

3. Non-spatial data can be spatialized to leverage human spatial cognition. "The part that isn't spatial can often be spatialized to allow our wetware to make greater sense of it" (Underkoffler, TED 2010). Citation networks are inherently graph-structured (spatial), but metadata like publication date, venue tier, and citation count are scalar. Encoding these as visual-spatial properties (node size, color saturation, depth-axis position) lets the researcher's spatial cognition process them without reading numbers. The node property → visual mapping table in the "Visual Feedback" section below implements this principle.

Corollary: Collaborative spatial analysis — multiple researchers exploring the same citation graph simultaneously, each able to select, annotate, and flag papers visible to others — is a natural extension of co-located input/output. This is a future consideration, not a v1 requirement, but the architecture should not preclude it.

References for Design Principles

  • Underkoffler J (2010). "Pointing to the future of UI." TED Talk. https://www.ted.com/talks/john_underkoffler_pointing_to_the_future_of_ui
  • Underkoffler J & Ishii H (1999). "Urp: a luminous-tangible workbench for urban planning and design." CHI '99. doi:10.1145/302979.303114
  • Ishii H & Ullmer B (1997). "Tangible Bits: towards seamless interfaces between people, bits and atoms." CHI '97. doi:10.1145/258549.258715

Decision

Platform Strategy: WebXR Progressive Enhancement

A single web application that renders in a standard browser and optionally enters immersive mode on XR-capable devices.

Platform          How it works
─────────────────────────────────────────────────────────────────────────
Desktop browser   Mouse orbit, click to select, scroll to zoom
Wall display      Same as desktop, optimized for large resolution
Quest 3           "Enter VR" button → full WebXR immersive-vr session
visionOS Safari   WebXR immersive-vr (gaze + pinch via transient-pointer)
SteamVR           WebXR via Chrome/Edge with OpenXR runtime

Rationale: WebXR is production-ready on all target platforms as of 2025-2026. Native development (Xcode/RealityKit, Unity/Unreal) is unnecessary for the initial implementation and would fragment the codebase. A single Three.js scene renders identically on all targets; only the interaction layer differs.

Tech Stack

Backend:   Starlette + uvicorn (same team as httpx, minimal deps)
Graph viz: three-forcegraph (raw Three.js Object3D, by vasturiano)
Renderer:  Three.js WebGPURenderer with automatic WebGL fallback
XR:        Native Three.js WebXR (renderer.xr.enabled = true)
Physics:   d3-force-3d (client-side, interactive simulation)

Why Starlette: Two dependencies (starlette + uvicorn). Same Encode team as httpx which litseer already uses. Starlette is pure ASGI — routes, static files, WebSocket. If we later need request validation or auto-generated API docs, upgrading to FastAPI is a one-import change (FastAPI is Starlette + Pydantic).

Why three-forcegraph: Proven library (vasturiano), renders as a plain Three.js Object3D that can be dropped into any scene — including one with WebXR enabled. The VR variant exists separately (3d-force-graph-vr via A-Frame) but we skip A-Frame and wire WebXR directly for fewer dependencies and more control.

Why WebGPU with fallback: Three.js WebGPURenderer (since r171) degrades automatically to WebGL 2. WebGPU is enabled by default on Chrome, Firefox 147+, Safari 26+, and visionOS. ~70% browser coverage as of early 2026. For citation graphs at realistic academic scale (hundreds to low thousands of nodes), WebGL is sufficient; WebGPU is a bonus for larger graphs or GPU-side physics.

Server Architecture (litseer serve)

┌─────────────────────────────────────────────────────┐
│  litseer core (existing)                            │
│  graph.py · networks.py · snowball.py · portfolio   │
│  sources/ · dedup · quality · export · sanitize     │
└──────────────────────┬──────────────────────────────┘
                       │ Python API
┌──────────────────────┴──────────────────────────────┐
│  litseer serve (new, Starlette + uvicorn)           │
│                                                      │
│  REST endpoints:                                     │
│    GET  /api/graph          full graph as JSON        │
│    GET  /api/graph/layout   graph + computed positions│
│    GET  /api/graph/stats    graph statistics          │
│    GET  /api/networks/{type} coupling/co-citation     │
│    GET  /api/configs        list YAML configs         │
│    PUT  /api/configs/{name} save/update config        │
│    POST /api/search         trigger search (job ID)   │
│    GET  /api/portfolio/{name} portfolio summary       │
│                                                      │
│  WebSocket:                                          │
│    WS   /ws/events          live search progress,    │
│                              new papers discovered   │
│                                                      │
│  Static files:                                       │
│    /    serve dashboard HTML/JS/CSS                   │
└──────────────────────┬──────────────────────────────┘
                       │ HTTP / WebSocket
┌──────────────────────┴──────────────────────────────┐
│  Visualization client (Three.js + WebXR)            │
│                                                      │
│  Shared: scene, rendering, data, layout physics      │
│  Per-platform:                                       │
│    DesktopControls (OrbitControls, mouse raycasting) │
│    QuestControls  (controller ray, trigger, teleport)│
│    VisionControls (transient-pointer, gaze + pinch)  │
└─────────────────────────────────────────────────────┘

Graph Layout: Server-Computed Initial + Client Physics

Hybrid approach: The server pre-computes initial node positions using networkx spring layout (fast, deterministic). The client receives these as starting positions and runs d3-force-3d for interactive physics — dragging nodes, exploding clusters, settling after new papers arrive.

This gives instant rendering on load (no simulation warm-up) while keeping the interactive feel of live physics.
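The division of labor can be sketched in Python. This toy force-directed pass stands in for networkx.spring_layout(dim=3) on the server side; the function name and constants are illustrative, not litseer API:

```python
import math
import random

def spring_layout_3d(nodes, edges, iters=50, seed=42):
    """Toy deterministic 3D force-directed pass (illustrative only).

    The real server would call networkx.spring_layout(G, dim=3, seed=...)
    and ship the resulting positions to the client as starting points.
    """
    rng = random.Random(seed)  # fixed seed -> same layout on every request
    pos = {n: [rng.uniform(-1.0, 1.0) for _ in range(3)] for n in nodes}
    for _ in range(iters):
        disp = {n: [0.0, 0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):           # pairwise repulsion
            for b in nodes[i + 1:]:
                d = [pa - pb for pa, pb in zip(pos[a], pos[b])]
                dist2 = sum(c * c for c in d) + 1e-9
                for k in range(3):
                    disp[a][k] += 0.05 * d[k] / dist2
                    disp[b][k] -= 0.05 * d[k] / dist2
        for a, b in edges:                      # attraction along edges
            d = [pa - pb for pa, pb in zip(pos[a], pos[b])]
            for k in range(3):
                disp[a][k] -= 0.1 * d[k]
                disp[b][k] += 0.1 * d[k]
        for n in nodes:
            for k in range(3):
                pos[n][k] += disp[n][k]
    return pos
```

The fixed seed keeps server output deterministic, so reloading the dashboard does not reshuffle the researcher's mental map of the graph.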

Data Format for Visualization Clients

{
  "nodes": [
    {
      "id": "10.1115/1.4045389",
      "title": "Film Cooling Review",
      "authors": "Bogard, Thole",
      "year": 2019,
      "citations": 342,
      "cluster": "film_cooling",
      "tier": 1,
      "source_db": "openalex",
      "position": {"x": 1.2, "y": 0.5, "z": -0.3},
      "size": 0.8,
      "color": "#4a90d9"
    }
  ],
  "edges": [
    {"source": "10.1115/1.4045389", "target": "10.2514/1.J058123",
     "weight": 0.7, "type": "cites"}
  ],
  "clusters": [
    {"id": "film_cooling", "label": "Film Cooling",
     "color": "#4a90d9", "paper_count": 45}
  ],
  "meta": {
    "total_papers": 230,
    "total_edges": 1847,
    "layout_algorithm": "spring",
    "generated_at": "2026-03-15T12:00:00Z"
  }
}

Interaction Design (DRAFT — iterating)

The following sections describe the envisioned interaction paradigm. These are working hypotheses, not final decisions. They need prototyping and user testing before committing to implementation.

Three-Level Navigation

Overview ──select cluster──► Focus ──select paper──► Detail
   ▲                           │ ▲                     │
   └───────────back────────────┘ └────────back────────┘

Overview mode (the constellation): The full citation graph rendered as a force-directed 3D layout. Papers are small nodes. Edges are faint lines. Clusters are color-coded regions. The researcher sees the shape of their literature — where the mass is, where the gaps are, which clusters connect and which are isolated.

  • Camera position: slightly above and pulled back, looking down at the graph like a topographic map. This provides grounding (the graph is the ground plane) without requiring an artificial floor.
  • Node representation: small spheres or dots. Size scaled by citation count.
  • Cluster labels float above each cluster centroid.
  • Suitable for initial orientation and gap analysis.

Focus mode (the card surface): When the researcher selects a cluster or zooms into a region, papers transition from nodes to cards — title, authors, year, citation count visible as text. Cards arranged on a floating surface (or loosely grouped in space, not requiring a literal table).

  • Cards can be grabbed and rearranged (VR) or dragged (desktop).
  • Cards show enough metadata to make include/exclude decisions.
  • The cluster's internal citation edges remain visible as subtle lines between cards.

Detail mode (the paper view): Selecting a card expands it to show full metadata: abstract, DOI link, venue, quality tier, all citation edges, keywords. This is the "flip the card over" moment.

  • In VR: the card floats in front of the researcher at comfortable reading distance, other cards dim.
  • On desktop: a sidebar or overlay panel.
  • Action buttons: "Add to BibTeX", "Walk citations from here", "Exclude."

Visual Feedback (DRAFT — needs prototyping)

These are initial ideas for how the system communicates state through visual cues. The exact aesthetics need iteration.

Node properties mapped to visuals:

Data property        Visual mapping             Notes
──────────────────────────────────────────────────────────────────────────────
Citation count       Node size                  Logarithmic scale to prevent outliers dominating
Quality tier         Opacity / saturation       Tier 1 (peer-reviewed) = solid; Tier 4 = translucent
Cluster membership   Color                      Consistent palette across sessions
Year                 Depth axis or brightness   Older papers further back or slightly dimmer
Open access          Ring / halo                Subtle indicator, not primary
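These mappings are simple pure functions. A sketch in Python, where the opacity values and scale constants are placeholders, not decided design values:

```python
import math

# Placeholder values -- the actual palette and opacities are draft
TIER_OPACITY = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4}

def node_visuals(citations, tier, year, newest=2026, oldest=1990):
    """Map scalar paper metadata onto the visual channels in the table."""
    return {
        # log scale: a 10,000-citation outlier is big, not graph-dominating
        "size": 0.2 + 0.2 * math.log10(1 + citations),
        # tier 1 (peer-reviewed) solid, tier 4 translucent
        "opacity": TIER_OPACITY.get(tier, 0.4),
        # older papers sit further back along the depth axis (0 = newest)
        "depth": -(newest - year) / (newest - oldest),
    }
```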

State transitions and animations:

Event                                Animation                                     Purpose
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
New paper discovered (live search)   Particle trail, floats into position          Discovery excitement, shows the search working
Paper settles into cluster           Gentle deceleration, slight bounce            Physics feels natural
High-impact paper identified         Brief golden pulse/glow                       Draws attention to important discoveries
Cluster grows past threshold         Cluster label brightens, slight expansion     Signals a rich area worth investigating
Gap detected (sparse region)         Faint grid or fog in empty space              Makes absence visible, not just presence
Citation edge traversed              Edge briefly brightens, flows directionally   Shows the citation direction (who cites whom)
Paper excluded by researcher         Fades out, shrinks, drifts away               Satisfying removal, reversible
Search completes                     All nodes settle, ambient glow stabilizes     Signals "done, ready for review"

Open questions for prototyping:

  • Should papers have a ground plane (table/landscape) or float freely? Research suggests anchoring aids comfort, but floating feels more "spatial." May depend on number of papers — small graphs feel fine floating, large graphs need grounding.

  • What's the right density? A graph of 50 papers should feel explorable. A graph of 5,000 should not feel overwhelming. Probably need LOD (level of detail) — distant clusters collapse to a single labeled sphere, expand on approach.

  • How should the YAML config editor look in VR? Probably not VR-native for v1 — open the web dashboard on desktop, view results in VR. Config editing in VR is a v2+ feature if ever.

  • Sound design: subtle audio cues for discovery events? A soft chime when a high-impact paper arrives? This could enhance the dopamine feedback loop but needs taste — too many sounds becomes annoying fast.

  • Should clusters be labeled by their actual topic (from keyword co-occurrence analysis) or by the config's cluster ID? Probably both — config clusters are the researcher's intent, keyword clusters are the actual structure. Showing divergence between intent and reality is itself an insight.

Spatial Metaphor Options (under evaluation)

Three metaphors emerged from the literature review. The right choice may be a combination, or may vary by graph size.

The Constellation

Papers as stars in space. Brightness = impact. Clusters = named constellations. Natural 3D, beautiful, leverages the "golden twinkle" concept directly.

Best for: Overview mode, aesthetic impact, small-medium graphs. Weak at: Detailed reading (stars don't have text), large graphs (everything looks like noise).

The City (CodeCity / SecCityVR pattern)

Papers as buildings on a ground plane. Height = citation count. Districts = clusters. Streets = citation paths.

Best for: Large graphs, grounding/comfort, intuitive navigation. Weak at: Showing citation edges, feels corporate rather than academic.

The Library

Papers as cards/documents on surfaces. Grouped by topic on different floating shelves or tables. Grabbable, rearrangeable.

Best for: Focus mode, tactile interaction, decision-making (include/exclude). Weak at: Overview of large networks, showing global structure.

Proposed combination

  • Overview: Constellation metaphor (nodes, edges, floating in space with the graph as the ground plane).
  • Focus: Library metaphor (cards on surfaces when zoomed into a cluster).
  • Detail: Document metaphor (full paper card floating at reading distance).

Key References

  • Underkoffler J (2010). "Pointing to the future of UI." TED Talk. ted.com/talks/john_underkoffler_pointing_to_the_future_of_ui
  • Underkoffler J & Ishii H (1999). "Urp: a luminous-tangible workbench for urban planning and design." CHI '99. doi:10.1145/302979.303114
  • Ishii H & Ullmer B (1997). "Tangible Bits." CHI '97. doi:10.1145/258549.258715
  • Marriott K et al. (2018). "Immersive Analytics." Frontiers in Robotics and AI. doi:10.3389/frobt.2019.00082
  • Krokos E et al. (2018). "Virtual memory palaces: immersion aids recall." doi:10.1007/s10055-018-0346-3
  • Ware C & Mitchell P (2008). "Visualizing Graphs in Three Dimensions." ACM TAP 5(1). doi:10.1145/1279640.1279642
  • Venturini T, Jacomy M & Jensen P (2021). "What do we see when we look at networks." Big Data & Society. doi:10.1177/20539517211018488
  • Apple (2025). "Spatial Layout." Human Interface Guidelines. developer.apple.com/design/human-interface-guidelines/spatial-layout
  • Lee B et al. (2022). "Design Space for Vis Transformations Between 2D and 3D in Mixed Reality." CHI 2022. doi:10.1145/3491102.3501859
  • Olshannikova E et al. (2015). "Visualizing Big Data with AR and VR." doi:10.1186/s40537-015-0031-2
  • Dingler T et al. (2018). "VR Reading: Text Presentation in VR." CHI 2018.
  • vasturiano/3d-force-graph. github.com/vasturiano/3d-force-graph
  • Graph2VR (2024). "Visualization and exploration of linked data using VR." doi:10.1093/database/baae008

Consequences

Positive:

  • Single codebase serves desktop, wall display, and XR devices
  • Starlette server adds only two dependencies to litseer
  • three-forcegraph is proven and actively maintained
  • Progressive enhancement means the basic dashboard is useful immediately without XR hardware
  • Data format is frontend-agnostic — any future client (native visionOS, Unreal, etc.) can consume the same API

Negative:

  • WebXR interaction quality varies by platform (Quest best, visionOS limited to immersive-vr, SteamVR needs runtime config)
  • Per-platform interaction adapters are bounded but non-trivial work
  • Web-based 3D will never match native visionOS RealityKit visual quality
  • The interaction design is unproven and needs prototyping

Open:

  • Frontend framework: vanilla Three.js vs React Three Fiber vs Svelte?
  • Sound design: yes/no/optional?
  • LOD strategy for large graphs
  • Whether to pursue native visionOS later for visual quality

PDF Rendering in Spatial View (DRAFT)

Researchers need to read actual papers in the spatial environment — open a referenced paper at the citation site, see the key fact highlighted, verify whether the citation supports the claim. This requires high-quality PDF rendering in WebXR.

Architecture: Pre-Rasterization to GPU Textures

PDFs are not rendered in real-time in the 3D scene. Instead, pages are pre-rasterized to canvas bitmaps in Web Workers, then uploaded as GPU textures. This decouples rendering cost from frame rate.

PDF binary → pdf.js Web Worker → OffscreenCanvas → THREE.CanvasTexture → GPU

Why pdf.js (not mupdf or pdfium): The critical requirement is text position extraction for highlighting citation passages. pdf.js's page.getTextContent() returns a per-span affine transform — exact x, y, width, and height for every text run. mupdf.js (WASM, faster rendering) is a fallback if pdf.js becomes a speed bottleneck, but pdf.js's text layer API is non-negotiable for the highlight feature.

Resolution Targets

Quest 3: ~25 pixels per degree (PPD). A paper page at comfortable reading distance (~0.6–0.8m) subtends ~30–40° vertically.

Distance from user   Texture resolution    GPU memory   Purpose
─────────────────────────────────────────────────────────────────────
Focused (< 0.5m)     2048 × 2650           ~21 MB       Reading text
Nearby (0.5–2m)      512 × 512             ~1 MB        Identifying papers
Peripheral (2–5m)    128 × 128 (atlas)     ~65 KB       Recognizing shape
Distant (> 5m)       Solid color + title   ~0           Orientation only

Memory budget (Quest 3, ~8 GB shared RAM):

  • 2–4 focused pages at full resolution: ~84 MB
  • 8–16 nearby pages at 512×512: ~16 MB
  • 1024 thumbnails packed into one 4096×4096 texture atlas: ~64 MB
  • Total: ~100–150 MB, well within the ~512 MB texture budget
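The LOD choice and the budget arithmetic are straightforward in code. A sketch, with thresholds taken from the table above and hypothetical function names:

```python
def texture_resolution(distance_m):
    """Pick a page texture resolution from viewer distance (per the table)."""
    if distance_m < 0.5:
        return (2048, 2650)   # focused: full-page readable text
    if distance_m < 2.0:
        return (512, 512)     # nearby: identify the paper
    if distance_m < 5.0:
        return (128, 128)     # peripheral: recognize the shape
    return None               # distant: solid color + title, no texture

def texture_bytes(w, h, bpp=4):
    """Uncompressed RGBA8 footprint; ASTC/BC7 cuts this roughly 8:1."""
    return w * h * bpp
```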

The Physical Document Metaphor

Papers appear as thin planes with page-turn animation, not as scrolling views. This matches the mental model of handling physical papers and hides rendering latency behind a satisfying interaction.

Page stack visualization:

  • N thin meshes (BoxGeometry, 0.001 thickness) stacked with ~0.002 spacing
  • Each gets a low-res thumbnail texture immediately
  • On approach/gaze: top page begins rendering at higher LOD
  • Focused page swaps to full-resolution texture when render completes

Page flip animation:

  • PlaneGeometry with sufficient subdivision (32 × 42 segments)
  • Vertex deformation along a cylinder surface during flip
  • Cylinder radius animates from infinity (flat) to small value (tight curl)
  • The page curling reveals the next page underneath, which has already been pre-rendered at the appropriate LOD
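The curl is a few lines of per-vertex math. A Python sketch of what the flip's vertex deformation would compute (axis placement and parameters are illustrative, not the final shader):

```python
import math

def curl_vertex(x, y, radius):
    """Wrap a flat page vertex (x, y, 0) onto a cylinder of the given radius.

    Arc length x maps to angle x/radius around the cylinder. As radius ->
    infinity the page is flat; shrinking radius during the flip animation
    tightens the curl.
    """
    if math.isinf(radius):
        return (x, y, 0.0)
    theta = x / radius
    return (radius * math.sin(theta), y, radius * (1.0 - math.cos(theta)))
```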

Fan/splay effect for browsing:

  • Pages spread in an arc (like fanning a deck of cards)
  • Each page has slight Y-axis rotation offset
  • Selection animates target page forward, others compress back
  • Computationally cheap — transform matrices only, no mesh deformation

Highlighting Citation Passages

When a researcher opens paper A from a citation edge in the graph, the system highlights the passage where paper A cites paper B (or vice versa).

Text position mapping (pdf.js):

page.getTextContent() → items[].transform → [fontSize, 0, 0, fontSize, x, y]
PDF coordinates → canvas coordinates (flip Y) → UV coordinates → 3D position

Highlight rendering: Geometry overlay approach — thin transparent colored quads (PlaneGeometry + MeshBasicMaterial, opacity 0.3, polygonOffset to prevent z-fighting) placed slightly in front of the page mesh at the computed text coordinates. Each highlight is independently interactive: hover for context, click to navigate to the cited paper, dismiss to clear.

Snippet matching: Use pdf.js text extraction to fuzzy-match the citation context snippet against concatenated page text, computing character offsets that map back to position transforms. Pre-index text positions at ingest time and store in DuckDB alongside the citation graph.
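A minimal sketch of the matching step, using stdlib difflib as a stand-in for whatever fuzzy matcher litseer adopts at ingest time (function name and stride heuristic are illustrative):

```python
import difflib

def locate_snippet(snippet, page_text, cutoff=0.75):
    """Fuzzy-locate a citation-context snippet in extracted page text.

    Returns (start, end) character offsets, or None below the cutoff.
    Slides a snippet-sized window with a coarse stride; a real
    implementation would refine around the best window.
    """
    n = len(snippet)
    best, best_ratio = None, cutoff
    step = max(1, n // 4)
    for start in range(0, max(1, len(page_text) - n + 1), step):
        window = page_text[start:start + n]
        ratio = difflib.SequenceMatcher(None, snippet, window).ratio()
        if ratio > best_ratio:
            best, best_ratio = (start, start + n), ratio
    return best
```

The resulting offsets map back to the pre-indexed text position transforms stored in DuckDB.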

Text Legibility Considerations

  • No subpixel rendering. ClearType/LCD AA assumes a fixed pixel-to-subpixel layout. In VR the head rotates relative to the display, breaking this assumption. Use grayscale antialiasing (OffscreenCanvas default).
  • Minimum 0.4° angular size per character for comfortable reading (Dingler et al., 2018). For 10pt body text at 0.7m: scale virtual page to ~1.4× physical size, or default reading distance of ~0.5m.
  • SDF text for overlay labels. All non-PDF text (titles, highlights, annotations) uses Signed Distance Field rendering (troika-three-text) — resolution-independent and sharp at any distance.
  • Background: medium gray (#404040–#606060) behind documents to reduce contrast strain vs. pure black VR environment.
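The angular-size arithmetic behind the 1.4× / 0.5m figures, treating the nominal point size as the glyph height (a simplification, but the one the numbers above use):

```python
import math

POINT_M = 0.000352778  # one PostScript point in metres (1/72 inch)

def angular_size_deg(font_pt, distance_m, scale=1.0):
    """Angular height of a character of the given font size at a distance."""
    h = font_pt * POINT_M * scale
    return math.degrees(2 * math.atan(h / (2 * distance_m)))
```

10pt at 0.7m comes out just under 0.29°, below the 0.4° comfort floor; scaling the page 1.4× or moving it to 0.5m clears it.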

Texture Compression Pipeline

For cross-platform GPU texture efficiency:

Canvas render → KTX2/Basis Universal encoding → GPU-native transcode
  Quest 3:  → ASTC 4×4 (native Adreno support, 8:1 compression)
  Desktop:  → BC7/DXT5 (native desktop GPU support)

Three.js KTX2Loader handles transcoding automatically. Encode once at render time, transcode to the best native format per device.

Rendering Worker Pool

  • 2–4 Web Workers running pdf.js instances
  • Priority queue: focused page renders first, then nearby, then thumbnails
  • OffscreenCanvas rendering keeps the main thread (and therefore frame rate) untouched
  • Pre-render visible thumbnails at initialization; full-res on demand
  • Cache rendered textures in IndexedDB for repeat visits
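The priority queue that feeds the worker pool can be a few lines of heapq. A sketch with hypothetical class and tier names:

```python
import heapq
import itertools

# Lower number = higher priority, matching the order described above
PRIORITY = {"focused": 0, "nearby": 1, "thumbnail": 2}

class RenderQueue:
    """Priority queue feeding the pdf.js worker pool (illustrative)."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # FIFO order within a priority level

    def submit(self, page_id, kind):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._tie), page_id))

    def next_job(self):
        """Pop the highest-priority pending render, or None when idle."""
        return heapq.heappop(self._heap)[2] if self._heap else None
```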

References

  • pdf.js (OffscreenCanvas rendering): https://github.com/mozilla/pdf.js
  • mupdf.js (WASM): https://mupdf.com/docs/mupdf-js.html
  • troika-three-text (SDF): https://github.com/protectwise/troika
  • KTX2 / Basis Universal: https://github.com/BinomialLLC/basis_universal
  • Dingler et al. (2018). "Reading in VR." CHI 2018.
  • Kojic et al. (2020). "User Experience of Reading in VR." QoMEX 2020.