Architecture

PDAL → CSF / ML → COPC → R2 → CesiumJS. Same boxes, v0 through v4.

The shape of the pipeline doesn't change between v0 and the multi-tenant SaaS; only the components inside each box get swapped. An ML model takes over from CSF as the primary classifier (CSF stays on as the geometric safety net). Resumable R2 multipart replaces local upload. The Cesium viewer gains a Potree companion for power users. The interfaces stay stable, so each phase builds on the last.

                  ┌───────────────────────────────────┐
                  │ lidar.weygand.com (CF Pages)      │
                  │  - Upload UI (resumable, R2)      │
                  │  - QC viewer (CesiumJS + Potree)  │
                  └─────────────┬─────────────────────┘
                                │
                 ┌──────────────▼───────────────┐
                 │  Cloudflare R2 (storage)      │
                 │  /uploads/<job>/raw.laz       │
                 │  /jobs/<job>/normalized.copc  │
                 │  /jobs/<job>/classified.copc  │
                 │  /jobs/<job>/dtm.tif          │
                 │  /jobs/<job>/edits.delta      │
                 └──────────────┬───────────────┘
                                │
                 ┌──────────────▼───────────────┐
                 │  D1: jobs + planning_log      │
                 └──────────────┬───────────────┘
                                │
             ┌──────────────────┼──────────────────┐
             │                  │                  │
  ┌──────────▼────────┐ ┌───────▼──────┐ ┌─────────▼──────┐
  │ DO GPU droplet    │ │ Modal burst  │ │ DO CPU droplet │
  │ (L40S, ML inf.)   │ │ (queue > 2)  │ │ (PDAL, COPC,   │
  │ → classified.copc │ │              │ │  DTM, contours)│
  └───────────────────┘ └──────────────┘ └────────────────┘
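A minimal sketch of the routing and storage layout the diagram implies. The object keys and the "queue > 2" burst rule come from the diagram; the function names, stage names, and backend labels are hypothetical and only illustrate the idea.

```python
# Stage names and backend labels are illustrative; only the burst threshold
# and the R2 key layout are taken from the diagram.
CPU_STAGES = {"normalize", "copc", "dtm", "contours"}
BURST_QUEUE_DEPTH = 2  # diagram: Modal burst when queue > 2


def job_keys(job_id: str) -> dict[str, str]:
    """Return the R2 object keys for one job, as laid out in the diagram."""
    return {
        "raw": f"uploads/{job_id}/raw.laz",
        "normalized": f"jobs/{job_id}/normalized.copc",
        "classified": f"jobs/{job_id}/classified.copc",
        "dtm": f"jobs/{job_id}/dtm.tif",
        "edits": f"jobs/{job_id}/edits.delta",
    }


def pick_backend(stage: str, gpu_queue_depth: int) -> str:
    """Choose the worker pool for a stage (hypothetical dispatch rule)."""
    if stage in CPU_STAGES:
        return "do-cpu-droplet"
    if stage == "classify_ml":
        # Overflow to Modal only once the L40S queue is backed up.
        if gpu_queue_depth > BURST_QUEUE_DEPTH:
            return "modal-burst"
        return "do-gpu-droplet"
    raise ValueError(f"unknown stage: {stage}")
```

Keeping the key layout in one function is one way to hold the interfaces stable while the components behind them change.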
Stages

The seven stages, end-to-end.

1. Ingest: CF Pages → R2

The browser uploads .las / .laz files directly to R2 via presigned multipart URLs, in resumable 10 MB parts. A job row is written to D1 with status queued. No file ever passes through a Worker, since the files are well over the 100 MB Worker body limit.
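Resumable multipart upload comes down to a fixed part plan plus a record of which parts already landed. The sketch below shows that bookkeeping only. The limits (5 MiB minimum part, 5 GiB maximum part, 10,000 parts) are what we understand the S3-compatible multipart API to enforce and should be checked against the R2 docs. The function names are hypothetical, and the part size is left as a parameter.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB   # assumed minimum for every part except the last
MAX_PART = 5 * GIB   # assumed maximum size of a single part
MAX_PARTS = 10_000   # assumed maximum number of parts per upload


def plan_parts(file_size: int, part_size: int) -> list[tuple[int, int, int]]:
    """Return (part_number, start_byte, end_byte_exclusive) for each part."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part_size outside the 5 MiB to 5 GiB range")
    count = -(-file_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return [
        (n + 1, n * part_size, min((n + 1) * part_size, file_size))
        for n in range(count)
    ]


def remaining_parts(plan, uploaded_part_numbers):
    """Return the parts still to send after an interrupted upload."""
    done = set(uploaded_part_numbers)
    return [part for part in plan if part[0] not in done]
```

On resume, the browser would ask which part numbers are already stored, then request presigned URLs only for what `remaining_parts` returns.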

2. Normalize: PDAL on DO CPU droplet

Strip duplicates, reproject to a survey-grade CRS (configurable, EPSG:6442 default for AL state plane), thin to a working density if requested. Output normalized.copc. ~1–5 min per tract on c-8.
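A sketch of what the normalize pipeline could look like as PDAL pipeline JSON, built in Python so the CRS and thinning radius stay configurable. It assumes the standard PDAL stages readers.las, filters.reprojection, filters.sample, and writers.copc. The duplicate-removal step is left out because the source does not say which filter handles it, and the function name and thinning radius are hypothetical.

```python
import json


def normalize_pipeline(src, dst, out_srs="EPSG:6442", thin_radius=None):
    """Build a PDAL pipeline definition for the normalize stage.

    thin_radius (in CRS units) is optional and illustrative; when set,
    filters.sample keeps points at least that far apart.
    """
    stages = [
        {"type": "readers.las", "filename": src},
        {"type": "filters.reprojection", "out_srs": out_srs},
    ]
    if thin_radius is not None:
        stages.append({"type": "filters.sample", "radius": thin_radius})
    stages.append({"type": "writers.copc", "filename": dst})
    return json.dumps({"pipeline": stages}, indent=2)


if __name__ == "__main__":
    # The JSON can be saved to a file and run with `pdal pipeline <file>`.
    print(normalize_pipeline("raw.laz", "normalized.copc", thin_radius=0.25))
```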

3. Classify (geometric baseline): CSF / SMRF

Run filters.smrf (Simple Morphological Filter) and filters.csf (Cloth Simulation) in parallel as the geometric safety net. Always runs, regardless of ML. Disagreement regions auto-flag for QC.
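One way to turn "disagreement regions" into something flaggable: bin points into a coarse XY grid and flag cells where the two filters disagree on ground too often. This is a sketch under assumptions, not the production rule. The cell size and threshold are illustrative, and the function name is hypothetical. Class code 2 is the ASPRS LAS code for ground.

```python
from collections import defaultdict

GROUND = 2  # ASPRS LAS classification code for ground


def flag_disagreement(points, smrf_cls, csf_cls, cell=10.0, threshold=0.2):
    """Return grid cells where SMRF and CSF disagree on ground too often.

    points: sequence of (x, y) coordinates, aligned with both class lists.
    cell and threshold are illustrative defaults, not tuned values.
    """
    total = defaultdict(int)
    differ = defaultdict(int)
    for (x, y), a, b in zip(points, smrf_cls, csf_cls):
        key = (int(x // cell), int(y // cell))
        total[key] += 1
        # Only the ground / non-ground decision matters here.
        if (a == GROUND) != (b == GROUND):
            differ[key] += 1
    return {key for key in total if differ[key] / total[key] > threshold}
```

The flagged cell indices can then be converted back to bounding boxes and surfaced in the QC viewer.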

4. Classify (ML): PTv3 + Sonata · L40S

Sonata-pretrained PTv3 fine-tuned on Weygand tracts. ONNX inference on the L40S. Falls back to SuperPoint Transformer (200× fewer params, 3× faster) for cost-sensitive jobs. Writes classified.copc + per-point confidence.
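The source does not say how per-point confidence is derived. A common choice, assumed here, is the maximum softmax probability over the model's class logits. The sketch below shows that for a single point in plain Python; in practice it would run vectorized over the ONNX output.

```python
import math


def softmax_confidence(logits):
    """Return (predicted_class_index, confidence) for one point's logits.

    Confidence is the max softmax probability. This is an assumption,
    not a confirmed detail of the pipeline.
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]
```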

5. QC: browser, append-only delta

Surveyor opens classified.copc in Potree v2 embed. Edits stored as sparse Parquet delta (edits/vN.delta.parquet) — never rewriting source. Operations log (ops.jsonl) keeps polygon coords, user, timestamp for PLS audit trail.
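The append-only idea can be sketched as an overlay: the source classes are never mutated, and each delta version maps point index to new class, with later versions winning. The in-memory dicts below stand in for the Parquet deltas, and the record fields are a guess at what an ops.jsonl line might carry. All names are hypothetical.

```python
import json
from datetime import datetime, timezone


def apply_deltas(base_classes, deltas):
    """Overlay ordered deltas (oldest first) on a copy of the base classes.

    Each delta maps point index -> new class code; later versions win.
    """
    merged = list(base_classes)  # copy, so the source is never rewritten
    for delta in deltas:
        for index, new_class in delta.items():
            merged[index] = new_class
    return merged


def op_record(user, polygon, new_class, when=None):
    """Serialize one edit operation as a JSON line for the audit log."""
    when = when or datetime.now(timezone.utc)
    return json.dumps({
        "user": user,
        "polygon": polygon,
        "class": new_class,
        "ts": when.isoformat(),
    })
```

Because deltas are replayed in order, rolling back a bad edit means dropping the latest version rather than repairing the source file.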

6. Export: LAS / DTM / SHP / DWG / E57

PDAL pipeline merges source COPC + deltas → final job.las + dtm.tif (Cloud Optimized GeoTIFF) + contours.shp + breaklines.dwg (ezdxf) + job.e57. Hot-folder push to customer Google Drive via the existing Weygand Drive token broker.

7. Deliver: share link + email + Trimble Connect

Resend email with download bundle + Cesium share-link URL. Optional one-click push to Trimble Connect via its API, which almost no competitor automates. A customer's iPad on a job site opens the share link and shows their site over satellite imagery.

ML training plan

Sonata pretraining → PTv3 fine-tune → active learning loop.

The literature has matured. Pointcept (PTv3 + Sonata) is the most important repo in the field. We don't need to invent — we need to label well and iterate.
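One plausible shape for the active learning loop is uncertainty sampling: after each inference pass, send the tiles the model is least sure about to labeling. The sketch below ranks tiles by mean per-point confidence. The selection rule, function name, and budget parameter are assumptions, not something the plan specifies.

```python
def select_tiles_for_labeling(tile_confidences, budget):
    """Pick the `budget` tiles with the lowest mean confidence.

    tile_confidences: tile id -> list of per-point confidences in [0, 1].
    Empty tiles are skipped.
    """
    scored = [
        (sum(conf) / len(conf), tile)
        for tile, conf in tile_confidences.items()
        if conf
    ]
    scored.sort()  # lowest mean confidence first
    return [tile for _, tile in scored[:budget]]
```

Labeled tiles go back into the fine-tune set and the loop repeats.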

Model                    DALES Ground IoU (%)   Throughput (A100)   Train time                         Use
PTv3                     ~97+                   500K–1.5M pts/s     24–72h scratch / 2–6h fine-tune    Primary, with Sonata pretraining
SuperPoint Transformer   ~96                    ~3–5M pts/s         3h                                 Cost-attractive fast path
KPConv-X                 ~97                    700K–1.5M pts/s     8–20h                              Ablation / second opinion
CSF / SMRF (geometric)   ~92                    CPU-only            n/a                                Always-on safety net

Labeling investment

Unit economics

$0.05–0.50 COGS per km² · $50–800 price · >95% gross margin.

Stack                          50M-pt tile             200M-pt tile             500M-pt tile
CPU PDAL + CSF (16-core c7i)   2–5 min · $0.02–0.05    10–20 min · $0.10–0.20   30–60 min · $0.25–0.50
GPU PTv3 (A100 spot)           60–180 s · $0.02–0.09   4–12 min · $0.10–0.40    10–30 min · $0.30–1.00
GPU SPT (A100 spot)            20–60 s · $0.01–0.03    1–4 min · $0.03–0.13     3–10 min · $0.10–0.33

At 50 jobs/month averaging 30 GB each, the all-in cost is $2,375/mo (R2 storage + DO L40S + CPU droplet + Modal burst + D1), or about $47.50 per job. At $300–800 per job, the margin is real. R2 zero-egress is the structural advantage that makes this economic.
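The per-job arithmetic, worked through as a quick check. The inputs are the figures from the paragraph above; the function names are hypothetical. Note that this margin is measured against the all-in monthly infrastructure cost, which is a different basis from the per-km² COGS figure in the headline.

```python
def per_job_cost(monthly_cost, jobs_per_month):
    """All-in infrastructure cost attributed to one job."""
    return monthly_cost / jobs_per_month


def gross_margin(price, cost):
    """Gross margin as a fraction of price."""
    return (price - cost) / price


if __name__ == "__main__":
    cost = per_job_cost(2375, 50)
    print(f"cost per job: ${cost:.2f}")
    for price in (300, 800):
        print(f"price ${price}: margin {gross_margin(price, cost):.1%}")
```

With these inputs the per-job cost is $47.50, and the margin works out to roughly 84% at $300 and 94% at $800.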