ADR-009: Spatial Visualization Architecture

Date: 2026-03-15
Status: Draft (iterating on interaction design)

Context

Litseer accumulates citation graph data, bibliometric network matrices, and portfolio summaries that are inherently graph-structured. Traditional 2D interfaces (spreadsheets, flat search result lists) lose this structure. The goal is a spatial visualization that lets researchers explore their literature as an interactive, immersive environment — from desktop browsers to wall displays to VR headsets.

The interaction design is still evolving. This ADR captures the architectural decisions that are stable, and marks the interaction paradigm sections as draft for continued iteration.

Design Principles

Three principles from John Underkoffler's work on spatial operating environments (MIT Tangible Media Group → Oblong Industries → the Minority Report interface design) shape this architecture:

1. The interface is the operating system. When the interaction paradigm changes fundamentally, you cannot bolt it onto the old architecture (Underkoffler, TED 2010: "the OS is the interface... they're inseparable, they are one"). Litseer's spatial view is not a "3D rendering of CLI output." The spatial environment is the research tool — search, browse, verify, annotate all happen in the same space. This means the litseer serve API must expose operations (trigger search, walk citations, exclude papers, export selection) as first-class actions in the spatial UI, not just passive graph rendering.

2. Input and output spaces must be co-located. The Luminous Room principle (Underkoffler & Ishii, 1999): when you use a mouse, your hand is on the desk while the pixels are on the screen — two different planes. In spatial computing, the researcher reaches into the citation graph and manipulates it directly. Selecting a paper, walking its citations, opening the referenced passage — these happen where the data is, not through an intermediary control surface. This is the difference between using a tool to look at data and being inside the data.

3. Non-spatial data can be spatialized to leverage human spatial cognition. "The part that isn't spatial can often be spatialized to allow our wetware to make greater sense of it" (Underkoffler, TED 2010). Citation networks are inherently graph-structured (spatial), but metadata like publication date, venue tier, and citation count are scalar. Encoding these as visual-spatial properties (node size, color saturation, depth-axis position) lets the researcher's spatial cognition process them without reading numbers. The node property → visual mapping table in the "Visual Feedback" section below implements this principle.

Corollary: Collaborative spatial analysis — multiple researchers exploring the same citation graph simultaneously, each able to select, annotate, and flag papers visible to others — is a natural extension of co-located input/output. This is a future consideration, not a v1 requirement, but the architecture should not preclude it.

References for Design Principles

  • Underkoffler J (2010). "Pointing to the future of UI." TED Talk. https://www.ted.com/talks/john_underkoffler_pointing_to_the_future_of_ui
  • Underkoffler J & Ishii H (1999). "Urp: a luminous-tangible workbench for urban planning and design." CHI '99. doi:10.1145/302979.303114
  • Ishii H & Ullmer B (1997). "Tangible Bits: towards seamless interfaces between people, bits and atoms." CHI '97. doi:10.1145/258549.258715

Decision

Platform Strategy: WebXR Progressive Enhancement

A single web application that renders in a standard browser and optionally enters immersive mode on XR-capable devices.

Platform          How it works
─────────────────────────────────────────────────────────────────────────
Desktop browser   Mouse orbit, click to select, scroll to zoom
Wall display      Same as desktop, optimized for large resolution
Quest 3           "Enter VR" button → full WebXR immersive-vr session
visionOS Safari   WebXR immersive-vr (gaze + pinch via transient-pointer)
SteamVR           WebXR via Chrome/Edge with OpenXR runtime

Rationale: WebXR is production-ready on all target platforms as of 2025-2026. Native development (Xcode/RealityKit, Unity/Unreal) is unnecessary for the initial implementation and would fragment the codebase. A single Three.js scene renders identically on all targets; only the interaction layer differs.

Tech Stack

Backend:   Starlette + uvicorn (same team as httpx, minimal deps)
Graph viz: three-forcegraph (raw Three.js Object3D, by vasturiano)
Renderer:  Three.js WebGPURenderer with automatic WebGL fallback
XR:        Native Three.js WebXR (renderer.xr.enabled = true)
Physics:   d3-force-3d (client-side, interactive simulation)

Why Starlette: Two dependencies (starlette + uvicorn). Same Encode team as httpx which litseer already uses. Starlette is pure ASGI — routes, static files, WebSocket. If we later need request validation or auto-generated API docs, upgrading to FastAPI is a one-import change (FastAPI is Starlette + Pydantic).

Why three-forcegraph: Proven library (vasturiano), renders as a plain Three.js Object3D that can be dropped into any scene — including one with WebXR enabled. The VR variant exists separately (3d-force-graph-vr via A-Frame) but we skip A-Frame and wire WebXR directly for fewer dependencies and more control.

Why WebGPU with fallback: Three.js WebGPURenderer (since r171) degrades automatically to WebGL 2. WebGPU is enabled by default on Chrome, Firefox 147+, Safari 26+, and visionOS. ~70% browser coverage as of early 2026. For citation graphs at realistic academic scale (hundreds to low thousands of nodes), WebGL is sufficient; WebGPU is a bonus for larger graphs or GPU-side physics.

Server Architecture (litseer serve)

┌─────────────────────────────────────────────────────┐
│  litseer core (existing)                            │
│  graph.py · networks.py · snowball.py · portfolio   │
│  sources/ · dedup · quality · export · sanitize     │
└──────────────────────┬──────────────────────────────┘
                       │ Python API
┌──────────────────────┴──────────────────────────────┐
│  litseer serve (new, Starlette + uvicorn)           │
│                                                      │
│  REST endpoints:                                     │
│    GET  /api/graph          full graph as JSON        │
│    GET  /api/graph/layout   graph + computed positions│
│    GET  /api/graph/stats    graph statistics          │
│    GET  /api/networks/{type} coupling/co-citation     │
│    GET  /api/configs        list YAML configs         │
│    PUT  /api/configs/{name} save/update config        │
│    POST /api/search         trigger search (job ID)   │
│    GET  /api/portfolio/{name} portfolio summary       │
│                                                      │
│  WebSocket:                                          │
│    WS   /ws/events          live search progress,    │
│                              new papers discovered   │
│                                                      │
│  Static files:                                       │
│    /    serve dashboard HTML/JS/CSS                   │
└──────────────────────┬──────────────────────────────┘
                       │ HTTP / WebSocket
┌──────────────────────┴──────────────────────────────┐
│  Visualization client (Three.js + WebXR)            │
│                                                      │
│  Shared: scene, rendering, data, layout physics      │
│  Per-platform:                                       │
│    DesktopControls (OrbitControls, mouse raycasting) │
│    QuestControls  (controller ray, trigger, teleport)│
│    VisionControls (transient-pointer, gaze + pinch)  │
└─────────────────────────────────────────────────────┘

Graph Layout: Server-Computed Initial + Client Physics

Hybrid approach: The server pre-computes initial node positions using networkx spring layout (fast, deterministic). The client receives these as starting positions and runs d3-force-3d for interactive physics — dragging nodes, exploding clusters, settling after new papers arrive.

This gives instant rendering on load (no simulation warm-up) while keeping the interactive feel of live physics.
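The division of labor can be sketched in Python. This toy force-directed pass stands in for networkx.spring_layout(dim=3) on the server side; the function name and constants are illustrative, not litseer API:

```python
import math
import random

def spring_layout_3d(nodes, edges, iters=50, seed=42):
    """Toy deterministic 3D force-directed pass (illustrative only).

    The real server would call networkx.spring_layout(G, dim=3, seed=...)
    and ship the resulting positions to the client as starting points.
    """
    rng = random.Random(seed)  # fixed seed -> same layout on every request
    pos = {n: [rng.uniform(-1.0, 1.0) for _ in range(3)] for n in nodes}
    for _ in range(iters):
        disp = {n: [0.0, 0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):           # pairwise repulsion
            for b in nodes[i + 1:]:
                d = [pa - pb for pa, pb in zip(pos[a], pos[b])]
                dist2 = sum(c * c for c in d) + 1e-9
                for k in range(3):
                    disp[a][k] += 0.05 * d[k] / dist2
                    disp[b][k] -= 0.05 * d[k] / dist2
        for a, b in edges:                      # attraction along edges
            d = [pa - pb for pa, pb in zip(pos[a], pos[b])]
            for k in range(3):
                disp[a][k] -= 0.1 * d[k]
                disp[b][k] += 0.1 * d[k]
        for n in nodes:
            for k in range(3):
                pos[n][k] += disp[n][k]
    return pos
```

The fixed seed keeps server output deterministic, so reloading the dashboard does not reshuffle the researcher's mental map of the graph.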

Data Format for Visualization Clients

{
  "nodes": [
    {
      "id": "10.1115/1.4045389",
      "title": "Film Cooling Review",
      "authors": "Bogard, Thole",
      "year": 2019,
      "citations": 342,
      "cluster": "film_cooling",
      "tier": 1,
      "source_db": "openalex",
      "position": {"x": 1.2, "y": 0.5, "z": -0.3},
      "size": 0.8,
      "color": "#4a90d9"
    }
  ],
  "edges": [
    {"source": "10.1115/1.4045389", "target": "10.2514/1.J058123",
     "weight": 0.7, "type": "cites"}
  ],
  "clusters": [
    {"id": "film_cooling", "label": "Film Cooling",
     "color": "#4a90d9", "paper_count": 45}
  ],
  "meta": {
    "total_papers": 230,
    "total_edges": 1847,
    "layout_algorithm": "spring",
    "generated_at": "2026-03-15T12:00:00Z"
  }
}

Interaction Design (DRAFT — iterating)

The following sections describe the envisioned interaction paradigm. These are working hypotheses, not final decisions. They need prototyping and user testing before committing to implementation.

Three-Level Navigation

Overview ──select cluster──► Focus ──select paper──► Detail
   ▲                           │ ▲                     │
   └───────────back────────────┘ └────────back────────┘

Overview mode (the constellation): The full citation graph rendered as a force-directed 3D layout. Papers are small nodes. Edges are faint lines. Clusters are color-coded regions. The researcher sees the shape of their literature — where the mass is, where the gaps are, which clusters connect and which are isolated.

  • Camera position: slightly above and pulled back, looking down at the graph like a topographic map. This provides grounding (the graph is the ground plane) without requiring an artificial floor.
  • Node representation: small spheres or dots. Size scaled by citation count.
  • Cluster labels float above each cluster centroid.
  • Suitable for initial orientation and gap analysis.

Focus mode (the card surface): When the researcher selects a cluster or zooms into a region, papers transition from nodes to cards — title, authors, year, citation count visible as text. Cards arranged on a floating surface (or loosely grouped in space, not requiring a literal table).

  • Cards can be grabbed and rearranged (VR) or dragged (desktop).
  • Cards show enough metadata to make include/exclude decisions.
  • The cluster's internal citation edges remain visible as subtle lines between cards.

Detail mode (the paper view): Selecting a card expands it to show full metadata: abstract, DOI link, venue, quality tier, all citation edges, keywords. This is the "flip the card over" moment.

  • In VR: the card floats in front of the researcher at comfortable reading distance, other cards dim.
  • On desktop: a sidebar or overlay panel.
  • Action buttons: "Add to BibTeX", "Walk citations from here", "Exclude."

Visual Feedback (DRAFT — needs prototyping)

These are initial ideas for how the system communicates state through visual cues. The exact aesthetics need iteration.

Node properties mapped to visuals:

Data property        Visual mapping             Notes
──────────────────────────────────────────────────────────────────────────────
Citation count       Node size                  Logarithmic scale to prevent outliers dominating
Quality tier         Opacity / saturation       Tier 1 (peer-reviewed) = solid; Tier 4 = translucent
Cluster membership   Color                      Consistent palette across sessions
Year                 Depth axis or brightness   Older papers further back or slightly dimmer
Open access          Ring / halo                Subtle indicator, not primary
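These mappings are simple pure functions. A sketch in Python, where the opacity values and scale constants are placeholders, not decided design values:

```python
import math

# Placeholder values -- the actual palette and opacities are draft
TIER_OPACITY = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4}

def node_visuals(citations, tier, year, newest=2026, oldest=1990):
    """Map scalar paper metadata onto the visual channels in the table."""
    return {
        # log scale: a 10,000-citation outlier is big, not graph-dominating
        "size": 0.2 + 0.2 * math.log10(1 + citations),
        # tier 1 (peer-reviewed) solid, tier 4 translucent
        "opacity": TIER_OPACITY.get(tier, 0.4),
        # older papers sit further back along the depth axis (0 = newest)
        "depth": -(newest - year) / (newest - oldest),
    }
```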

State transitions and animations:

Event                                Animation                                     Purpose
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
New paper discovered (live search)   Particle trail, floats into position          Discovery excitement, shows the search working
Paper settles into cluster           Gentle deceleration, slight bounce            Physics feels natural
High-impact paper identified         Brief golden pulse/glow                       Draws attention to important discoveries
Cluster grows past threshold         Cluster label brightens, slight expansion     Signals a rich area worth investigating
Gap detected (sparse region)         Faint grid or fog in empty space              Makes absence visible, not just presence
Citation edge traversed              Edge briefly brightens, flows directionally   Shows the citation direction (who cites whom)
Paper excluded by researcher         Fades out, shrinks, drifts away               Satisfying removal, reversible
Search completes                     All nodes settle, ambient glow stabilizes     Signals "done, ready for review"

Open questions for prototyping:

  • Should papers have a ground plane (table/landscape) or float freely? Research suggests anchoring aids comfort, but floating feels more "spatial." May depend on number of papers — small graphs feel fine floating, large graphs need grounding.

  • What's the right density? A graph of 50 papers should feel explorable. A graph of 5,000 should not feel overwhelming. Probably need LOD (level of detail) — distant clusters collapse to a single labeled sphere, expand on approach.

  • How should the YAML config editor look in VR? Probably not VR-native for v1 — open the web dashboard on desktop, view results in VR. Config editing in VR is a v2+ feature if ever.

  • Sound design: subtle audio cues for discovery events? A soft chime when a high-impact paper arrives? This could enhance the dopamine feedback loop but needs taste — too many sounds becomes annoying fast.

  • Should clusters be labeled by their actual topic (from keyword co-occurrence analysis) or by the config's cluster ID? Probably both — config clusters are the researcher's intent, keyword clusters are the actual structure. Showing divergence between intent and reality is itself an insight.

Spatial Metaphor Options (under evaluation)

Three metaphors emerged from the literature review. The right choice may be a combination, or may vary by graph size.

The Constellation

Papers as stars in space. Brightness = impact. Clusters = named constellations. Natural 3D, beautiful, leverages the "golden twinkle" concept directly.

Best for: Overview mode, aesthetic impact, small-medium graphs. Weak at: Detailed reading (stars don't have text), large graphs (everything looks like noise).

The City (CodeCity / SecCityVR pattern)

Papers as buildings on a ground plane. Height = citation count. Districts = clusters. Streets = citation paths.

Best for: Large graphs, grounding/comfort, intuitive navigation. Weak at: Showing citation edges, feels corporate rather than academic.

The Library

Papers as cards/documents on surfaces. Grouped by topic on different floating shelves or tables. Grabbable, rearrangeable.

Best for: Focus mode, tactile interaction, decision-making (include/exclude). Weak at: Overview of large networks, showing global structure.

Proposed combination

  • Overview: Constellation metaphor (nodes, edges, floating in space with the graph as the ground plane).
  • Focus: Library metaphor (cards on surfaces when zoomed into a cluster).
  • Detail: Document metaphor (full paper card floating at reading distance).

Key References

  • Underkoffler J (2010). "Pointing to the future of UI." TED Talk. ted.com/talks/john_underkoffler_pointing_to_the_future_of_ui
  • Underkoffler J & Ishii H (1999). "Urp: a luminous-tangible workbench for urban planning and design." CHI '99. doi:10.1145/302979.303114
  • Ishii H & Ullmer B (1997). "Tangible Bits." CHI '97. doi:10.1145/258549.258715
  • Marriott K et al. (2018). "Immersive Analytics." Frontiers in Robotics and AI. doi:10.3389/frobt.2019.00082
  • Krokos E et al. (2018). "Virtual memory palaces: immersion aids recall." doi:10.1007/s10055-018-0346-3
  • Ware C & Mitchell P (2008). "Visualizing Graphs in Three Dimensions." ACM TAP 5(1). doi:10.1145/1279640.1279642
  • Venturini T, Jacomy M & Jensen P (2021). "What do we see when we look at networks." Big Data & Society. doi:10.1177/20539517211018488
  • Apple (2025). "Spatial Layout." Human Interface Guidelines. developer.apple.com/design/human-interface-guidelines/spatial-layout
  • Lee B et al. (2022). "Design Space for Vis Transformations Between 2D and 3D in Mixed Reality." CHI 2022. doi:10.1145/3491102.3501859
  • Olshannikova E et al. (2015). "Visualizing Big Data with AR and VR." doi:10.1186/s40537-015-0031-2
  • Dingler T et al. (2018). "VR Reading: Text Presentation in VR." CHI 2018.
  • vasturiano/3d-force-graph. github.com/vasturiano/3d-force-graph
  • Graph2VR (2024). "Visualization and exploration of linked data using VR." doi:10.1093/database/baae008

Consequences

Positive:

  • Single codebase serves desktop, wall display, and XR devices
  • Starlette server adds only two dependencies to litseer
  • three-forcegraph is proven and actively maintained
  • Progressive enhancement means the basic dashboard is useful immediately without XR hardware
  • Data format is frontend-agnostic — any future client (native visionOS, Unreal, etc.) can consume the same API

Negative:

  • WebXR interaction quality varies by platform (Quest best, visionOS limited to immersive-vr, SteamVR needs runtime config)
  • Per-platform interaction adapters are bounded but non-trivial work
  • Web-based 3D will never match native visionOS RealityKit visual quality
  • The interaction design is unproven and needs prototyping

Open:

  • Frontend framework: vanilla Three.js vs React Three Fiber vs Svelte?
  • Sound design: yes/no/optional?
  • LOD strategy for large graphs
  • Whether to pursue native visionOS later for visual quality

PDF Rendering in Spatial View (DRAFT)

Researchers need to read actual papers in the spatial environment — open a referenced paper at the citation site, see the key fact highlighted, verify whether the citation supports the claim. This requires high-quality PDF rendering in WebXR.

Architecture: Pre-Rasterization to GPU Textures

PDFs are not rendered in real-time in the 3D scene. Instead, pages are pre-rasterized to canvas bitmaps in Web Workers, then uploaded as GPU textures. This decouples rendering cost from frame rate.

PDF binary → pdf.js Web Worker → OffscreenCanvas → THREE.CanvasTexture → GPU

Why pdf.js (not mupdf or pdfium): The critical requirement is text position extraction for highlighting citation passages. pdf.js's page.getTextContent() returns a per-span affine transform — exact x, y, width, and height for every text run. mupdf.js (WASM, faster rendering) is a fallback if pdf.js becomes a speed bottleneck, but pdf.js's text layer API is non-negotiable for the highlight feature.

Resolution Targets

Quest 3: ~25 pixels per degree (PPD). A paper page at comfortable reading distance (~0.6–0.8m) subtends ~30–40° vertically.

Distance from user   Texture resolution    GPU memory   Purpose
─────────────────────────────────────────────────────────────────────
Focused (< 0.5m)     2048 × 2650           ~21 MB       Reading text
Nearby (0.5–2m)      512 × 512             ~1 MB        Identifying papers
Peripheral (2–5m)    128 × 128 (atlas)     ~65 KB       Recognizing shape
Distant (> 5m)       Solid color + title   ~0           Orientation only

Memory budget (Quest 3, ~8 GB shared RAM):

  • 2–4 focused pages at full resolution: ~84 MB
  • 8–16 nearby pages at 512×512: ~16 MB
  • 1024 thumbnails packed into one 4096×4096 texture atlas: ~64 MB
  • Total: ~100–150 MB, well within the ~512 MB texture budget
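The LOD choice and the budget arithmetic are straightforward in code. A sketch, with thresholds taken from the table above and hypothetical function names:

```python
def texture_resolution(distance_m):
    """Pick a page texture resolution from viewer distance (per the table)."""
    if distance_m < 0.5:
        return (2048, 2650)   # focused: full-page readable text
    if distance_m < 2.0:
        return (512, 512)     # nearby: identify the paper
    if distance_m < 5.0:
        return (128, 128)     # peripheral: recognize the shape
    return None               # distant: solid color + title, no texture

def texture_bytes(w, h, bpp=4):
    """Uncompressed RGBA8 footprint; ASTC/BC7 cuts this roughly 8:1."""
    return w * h * bpp
```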

The Physical Document Metaphor

Papers appear as thin planes with page-turn animation, not as scrolling views. This matches the mental model of handling physical papers and hides rendering latency behind a satisfying interaction.

Page stack visualization:

  • N thin meshes (BoxGeometry, 0.001 thickness) stacked with ~0.002 spacing
  • Each gets a low-res thumbnail texture immediately
  • On approach/gaze: top page begins rendering at higher LOD
  • Focused page swaps to full-resolution texture when render completes

Page flip animation:

  • PlaneGeometry with sufficient subdivision (32 × 42 segments)
  • Vertex deformation along a cylinder surface during flip
  • Cylinder radius animates from infinity (flat) to small value (tight curl)
  • The page curling reveals the next page underneath, which has already been pre-rendered at the appropriate LOD
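The curl is a few lines of per-vertex math. A Python sketch of what the flip's vertex deformation would compute (axis placement and parameters are illustrative, not the final shader):

```python
import math

def curl_vertex(x, y, radius):
    """Wrap a flat page vertex (x, y, 0) onto a cylinder of the given radius.

    Arc length x maps to angle x/radius around the cylinder. As radius ->
    infinity the page is flat; shrinking radius during the flip animation
    tightens the curl.
    """
    if math.isinf(radius):
        return (x, y, 0.0)
    theta = x / radius
    return (radius * math.sin(theta), y, radius * (1.0 - math.cos(theta)))
```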

Fan/splay effect for browsing:

  • Pages spread in an arc (like fanning a deck of cards)
  • Each page has slight Y-axis rotation offset
  • Selection animates target page forward, others compress back
  • Computationally cheap — transform matrices only, no mesh deformation

Highlighting Citation Passages

When a researcher opens paper A from a citation edge in the graph, the system highlights the passage where paper A cites paper B (or vice versa).

Text position mapping (pdf.js):

page.getTextContent() → items[].transform → [fontSize, 0, 0, fontSize, x, y]
PDF coordinates → canvas coordinates (flip Y) → UV coordinates → 3D position

Highlight rendering: Geometry overlay approach — thin transparent colored quads (PlaneGeometry + MeshBasicMaterial, opacity 0.3, polygonOffset to prevent z-fighting) placed slightly in front of the page mesh at the computed text coordinates. Each highlight is independently interactive: hover for context, click to navigate to the cited paper, dismiss to clear.

Snippet matching: Use pdf.js text extraction to fuzzy-match the citation context snippet against concatenated page text, computing character offsets that map back to position transforms. Pre-index text positions at ingest time and store in DuckDB alongside the citation graph.
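A minimal sketch of the matching step, using stdlib difflib as a stand-in for whatever fuzzy matcher litseer adopts at ingest time (function name and stride heuristic are illustrative):

```python
import difflib

def locate_snippet(snippet, page_text, cutoff=0.75):
    """Fuzzy-locate a citation-context snippet in extracted page text.

    Returns (start, end) character offsets, or None below the cutoff.
    Slides a snippet-sized window with a coarse stride; a real
    implementation would refine around the best window.
    """
    n = len(snippet)
    best, best_ratio = None, cutoff
    step = max(1, n // 4)
    for start in range(0, max(1, len(page_text) - n + 1), step):
        window = page_text[start:start + n]
        ratio = difflib.SequenceMatcher(None, snippet, window).ratio()
        if ratio > best_ratio:
            best, best_ratio = (start, start + n), ratio
    return best
```

The resulting offsets map back to the pre-indexed text position transforms stored in DuckDB.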

Text Legibility Considerations

  • No subpixel rendering. ClearType/LCD AA assumes a fixed pixel-to-subpixel layout. In VR the head rotates relative to the display, breaking this assumption. Use grayscale antialiasing (OffscreenCanvas default).
  • Minimum 0.4° angular size per character for comfortable reading (Dingler et al., 2018). For 10pt body text at 0.7m: scale virtual page to ~1.4× physical size, or default reading distance of ~0.5m.
  • SDF text for overlay labels. All non-PDF text (titles, highlights, annotations) uses Signed Distance Field rendering (troika-three-text) — resolution-independent and sharp at any distance.
  • Background: medium gray (#404040–#606060) behind documents to reduce contrast strain vs. pure black VR environment.
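The angular-size arithmetic behind the 1.4× / 0.5m figures, treating the nominal point size as the glyph height (a simplification, but the one the numbers above use):

```python
import math

POINT_M = 0.000352778  # one PostScript point in metres (1/72 inch)

def angular_size_deg(font_pt, distance_m, scale=1.0):
    """Angular height of a character of the given font size at a distance."""
    h = font_pt * POINT_M * scale
    return math.degrees(2 * math.atan(h / (2 * distance_m)))
```

10pt at 0.7m comes out just under 0.29°, below the 0.4° comfort floor; scaling the page 1.4× or moving it to 0.5m clears it.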

Texture Compression Pipeline

For cross-platform GPU texture efficiency:

Canvas render → KTX2/Basis Universal encoding → GPU-native transcode
  Quest 3:  → ASTC 4×4 (native Adreno support, 8:1 compression)
  Desktop:  → BC7/DXT5 (native desktop GPU support)

Three.js KTX2Loader handles transcoding automatically. Encode once at render time, transcode to the best native format per device.

Rendering Worker Pool

  • 2–4 Web Workers running pdf.js instances
  • Priority queue: focused page renders first, then nearby, then thumbnails
  • OffscreenCanvas rendering keeps the main thread (and therefore frame rate) untouched
  • Pre-render visible thumbnails at initialization; full-res on demand
  • Cache rendered textures in IndexedDB for repeat visits
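The priority queue that feeds the worker pool can be a few lines of heapq. A sketch with hypothetical class and tier names:

```python
import heapq
import itertools

# Lower number = higher priority, matching the order described above
PRIORITY = {"focused": 0, "nearby": 1, "thumbnail": 2}

class RenderQueue:
    """Priority queue feeding the pdf.js worker pool (illustrative)."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # FIFO order within a priority level

    def submit(self, page_id, kind):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._tie), page_id))

    def next_job(self):
        """Pop the highest-priority pending render, or None when idle."""
        return heapq.heappop(self._heap)[2] if self._heap else None
```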

References

  • pdf.js (OffscreenCanvas rendering): https://github.com/mozilla/pdf.js
  • mupdf.js (WASM): https://mupdf.com/docs/mupdf-js.html
  • troika-three-text (SDF): https://github.com/protectwise/troika
  • KTX2 / Basis Universal: https://github.com/BinomialLLC/basis_universal
  • Dingler et al. (2018). "Reading in VR." CHI 2018.
  • Kojic et al. (2020). "User Experience of Reading in VR." QoMEX 2020.