The shape of the pipeline doesn't change between v0 and the multi-tenant SaaS. We swap in components: an ML model takes over from CSF as the primary classifier, resumable R2 multipart replaces local upload, and the Cesium viewer gains a Potree companion for power users. The interfaces stay stable, so each phase builds on the last.

                    ┌───────────────────────────────────┐
                    │  lidar.weygand.com (CF Pages)     │
                    │  - Upload UI (resumable, R2)      │
                    │  - QC viewer (CesiumJS + Potree)  │
                    └─────────────┬─────────────────────┘
                                  │
                   ┌──────────────▼───────────────┐
                   │  Cloudflare R2 (storage)     │
                   │  /uploads/<job>/raw.laz      │
                   │  /jobs/<job>/normalized.copc │
                   │  /jobs/<job>/classified.copc │
                   │  /jobs/<job>/dtm.tif         │
                   │  /jobs/<job>/edits.delta     │
                   └──────────────┬───────────────┘
                                  │
                   ┌──────────────▼───────────────┐
                   │  D1: jobs + planning_log     │
                   └──────────────┬───────────────┘
                                  │
               ┌──────────────────┼──────────────────┐
               │                  │                  │
    ┌──────────▼────────┐ ┌───────▼──────┐ ┌─────────▼──────┐
    │ DO GPU droplet    │ │ Modal burst  │ │ DO CPU droplet │
    │ (L40S, ML inf.)   │ │ (queue > 2)  │ │ (PDAL, COPC,   │
    │ → classified.copc │ │              │ │  DTM, contours)│
    └───────────────────┘ └──────────────┘ └────────────────┘

Browser uploads .las / .laz directly to R2 via presigned multipart URLs. 10 MB chunks, resumable. Job row written to D1 with status queued. Files never pass through a Worker, since they are well over the 100 MB Worker request-body limit.
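The chunking behind a resumable multipart upload can be sketched as below. This is a minimal illustration, not the upload UI's actual code: `plan_parts` and `remaining_parts` are hypothetical helpers, the part size is a parameter, and the three limits are the S3-style multipart limits as I understand R2 to document them, so they are worth re-checking before relying on them.

```python
import math

MIB = 1024 * 1024
MIN_PART = 5 * MIB          # assumed minimum part size (the last part may be smaller)
MAX_PART = 5 * 1024 * MIB   # assumed maximum part size (5 GiB)
MAX_PARTS = 10_000          # assumed maximum number of parts per upload


def plan_parts(file_size, part_size=10 * MIB):
    """Return a list of (part_number, start, end_exclusive) byte ranges."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    # Grow the part size if the file would otherwise need too many parts.
    part_size = max(part_size, MIN_PART, math.ceil(file_size / MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("file too large for a single multipart upload")
    count = math.ceil(file_size / part_size)
    return [
        (n + 1, n * part_size, min((n + 1) * part_size, file_size))
        for n in range(count)
    ]


def remaining_parts(plan, uploaded_part_numbers):
    """Parts still to send after an interrupted session."""
    done = set(uploaded_part_numbers)
    return [part for part in plan if part[0] not in done]


plan = plan_parts(25 * MIB)
print(len(plan), "parts;", len(remaining_parts(plan, [1, 2])), "left after resuming")
```

Each tuple maps to one presigned part URL. Resuming only needs the list of part numbers the storage side already holds.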
Strip duplicates, reproject to a survey-grade CRS (configurable, EPSG:6442 default for AL state plane), thin to a working density if requested. Output normalized.copc. ~1–5 min per tract on c-8.
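One way the normalize step could be expressed as a PDAL pipeline is sketched below, built as a plain dict so it can be dumped to JSON. The stage choices are assumptions rather than the author's configuration: in particular, a very small `filters.sample` radius stands in for duplicate removal, and a second `filters.sample` pass does the optional thinning. `normalize_pipeline` and its parameters are hypothetical names.

```python
import json


def normalize_pipeline(src, dst, out_srs="EPSG:6442", thin_radius=None):
    """Build a PDAL pipeline dict: read, reproject, dedupe, optionally thin, write COPC."""
    stages = [
        {"type": "readers.las", "filename": src},               # reads .las and .laz
        {"type": "filters.reprojection", "out_srs": out_srs},
        # Assumption: a tiny Poisson radius drops points that coincide with a kept point.
        {"type": "filters.sample", "radius": 0.001},
    ]
    if thin_radius is not None:
        # Optional thinning to a working density; radius is in target-CRS units.
        stages.append({"type": "filters.sample", "radius": thin_radius})
    stages.append({"type": "writers.copc", "filename": dst})
    return {"pipeline": stages}


pipeline = normalize_pipeline("raw.laz", "normalized.copc", thin_radius=0.5)
print(json.dumps(pipeline, indent=2))
```

Saved to a file, the JSON can be run with `pdal pipeline normalize.json`.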
Run filters.smrf (Simple Morphological Filter) and filters.csf (Cloth Simulation Filter) in parallel as the geometric safety net. Both always run, regardless of ML. Disagreement regions are auto-flagged for QC.
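Disagreement flagging could look roughly like this. The source does not say which pair of classifications is compared or how regions are formed, so this sketch takes any two label arrays and bins disagreements into square grid cells. The cell size, the threshold, and both function names are illustrative assumptions.

```python
GROUND = 2  # ASPRS LAS classification code for ground


def disagreement_mask(classes_a, classes_b):
    """True where exactly one of the two classifications calls the point ground."""
    if len(classes_a) != len(classes_b):
        raise ValueError("both inputs must classify the same points")
    return [(a == GROUND) != (b == GROUND) for a, b in zip(classes_a, classes_b)]


def flag_cells(xy, mask, cell=10.0, threshold=0.2):
    """Return grid cells whose share of disagreeing points exceeds the threshold."""
    totals, disputed = {}, {}
    for (x, y), bad in zip(xy, mask):
        key = (int(x // cell), int(y // cell))
        totals[key] = totals.get(key, 0) + 1
        disputed[key] = disputed.get(key, 0) + (1 if bad else 0)
    return sorted(k for k in totals if disputed[k] / totals[k] > threshold)


smrf = [2, 2, 1, 1, 2]
csf = [2, 1, 1, 1, 2]
points = [(1, 1), (2, 2), (3, 3), (15, 1), (16, 2)]
print(flag_cells(points, disagreement_mask(smrf, csf)))  # [(0, 0)]
```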
Sonata-pretrained PTv3 fine-tuned on Weygand tracts. ONNX inference on the L40S. Falls back to SuperPoint Transformer (200× fewer params, 3× faster) for cost-sensitive jobs. Writes classified.copc + per-point confidence.
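The routing and confidence logic can be sketched in a few lines. Everything named here is hypothetical: the model and backend labels are placeholders, the burst rule reads the diagram's "queue > 2" literally, and taking the top softmax probability as the per-point confidence is an assumption, since the source does not say how confidence is derived.

```python
import math


def pick_model(cost_sensitive):
    """Primary model unless the job is flagged cost-sensitive."""
    return "superpoint-transformer" if cost_sensitive else "ptv3-sonata"


def pick_backend(queue_depth, burst_threshold=2):
    """Spill to the burst backend once the queue is deeper than the threshold."""
    return "modal-burst" if queue_depth > burst_threshold else "do-l40s"


def softmax(logits):
    """Numerically stable softmax over one point's class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_point(logits):
    """Return (class_index, confidence), with confidence = top softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


label, confidence = classify_point([1.0, 3.0, 0.5])
print(pick_model(False), pick_backend(3), label, round(confidence, 3))
```

Low-confidence points are natural candidates for the same QC queue that the geometric filters feed.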
Surveyor opens classified.copc in an embedded Potree v2 viewer. Edits are stored as sparse Parquet deltas (edits/vN.delta.parquet), so the source is never rewritten. An operations log (ops.jsonl) keeps polygon coords, user, and timestamp for the PLS audit trail.
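The delta-overlay idea can be illustrated without Parquet. In this sketch each edit version is a plain dict of point index to new class, standing in for one vN.delta.parquet file, and the ops.jsonl record carries the three fields the text names plus a `new_class` field that is my own addition. The real schema is not given in the source.

```python
import json
from datetime import datetime, timezone


def apply_deltas(base_classes, deltas):
    """Overlay edit versions, oldest first, on a copy of the base classes."""
    merged = list(base_classes)      # the source list is never mutated
    for delta in deltas:             # each delta: {point_index: new_class}
        for index, new_class in delta.items():
            merged[index] = new_class
    return merged


def ops_record(user, polygon, new_class, when=None):
    """Serialize one audit-log line for a polygon reclassification."""
    when = when or datetime.now(timezone.utc)
    return json.dumps({
        "user": user,
        "timestamp": when.isoformat(),
        "polygon": polygon,
        "new_class": new_class,      # hypothetical field, not named in the source
    })


base = [1, 1, 2, 2]
v1, v2 = {0: 2}, {0: 6, 3: 1}
print(apply_deltas(base, [v1, v2]), base)
print(ops_record("surveyor@example.com", [[0, 0], [1, 0], [1, 1]], 2))
```

Because later versions win, rolling back an edit is a matter of dropping the newest delta rather than restoring a file.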
PDAL pipeline merges source COPC + deltas → final job.las + dtm.tif (Cloud Optimized GeoTIFF) + contours.shp + breaklines.dwg (ezdxf) + job.e57. Hot-folder push to customer Google Drive via the existing Weygand Drive token broker.
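The DTM leg of the export might be expressed as the PDAL pipeline below. It is a sketch under assumptions: it keeps only ground points (class 2), rasterizes with inverse-distance weighting at a hypothetical 1-unit resolution, and writes a plain GeoTIFF. Merging the edit deltas and converting to a Cloud Optimized GeoTIFF are left out, because the source does not say how either is done.

```python
import json


def dtm_pipeline(src, dst, resolution=1.0):
    """Build a PDAL pipeline dict that rasterizes ground points into a DTM."""
    stages = [
        {"type": "readers.copc", "filename": src},
        # Keep ground only; class 2 is the ASPRS ground code.
        {"type": "filters.range", "limits": "Classification[2:2]"},
        {
            "type": "writers.gdal",
            "filename": dst,
            "resolution": resolution,   # cell size in CRS units (assumed value)
            "output_type": "idw",       # inverse-distance-weighted elevation
            "gdaldriver": "GTiff",
        },
    ]
    return {"pipeline": stages}


print(json.dumps(dtm_pipeline("classified.copc", "dtm.tif"), indent=2))
```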
Resend email with download bundle + Cesium share-link URL. Optional one-click push to Trimble Connect (their API, almost no competitor automates this). Customer iPad on a job site opens the share link, sees their site over satellite.
The literature has matured. Pointcept (PTv3 + Sonata) is the most important repo in the field. We don't need to invent; we need to label well and iterate.
| Model | DALES ground IoU (%) | Inference throughput (A100) | Train time | Use |
|---|---|---|---|---|
| PTv3 | ~97+ | 500K–1.5M pts/s | 24–72h scratch / 2–6h fine-tune | Primary, with Sonata pretraining |
| SuperPoint Transformer | ~96 | ~3–5M pts/s | 3h | Cost-attractive fast path |
| KPConv-X | ~97 | 700K–1.5M pts/s | 8–20h | Ablation / second opinion |
| CSF / SMRF (geometric) | ~92 | CPU-only | n/a | Always-on safety net |
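
For reference, the ground IoU in the table is the intersection over union of predicted and true ground points: TP / (TP + FP + FN). A minimal sketch follows. The `ground` label defaults to the ASPRS code 2, which is an assumption; benchmark datasets may use their own label indices.

```python
def ground_iou(predicted, truth, ground=2):
    """Intersection over union for the ground class."""
    pairs = list(zip(predicted, truth))
    tp = sum(p == ground and t == ground for p, t in pairs)   # both say ground
    fp = sum(p == ground and t != ground for p, t in pairs)   # predicted only
    fn = sum(p != ground and t == ground for p, t in pairs)   # truth only
    union = tp + fp + fn
    return tp / union if union else float("nan")


print(ground_iou([2, 2, 1, 1, 2], [2, 1, 1, 2, 2]))  # 0.5
```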

| Stack | 50M-pt tile (time · cost) | 200M-pt tile (time · cost) | 500M-pt tile (time · cost) |
|---|---|---|---|
| CPU PDAL + CSF (16-core c7i) | 2–5 min · $0.02–0.05 | 10–20 min · $0.10–0.20 | 30–60 min · $0.25–0.50 |
| GPU PTv3 (A100 spot) | 60–180 s · $0.02–0.09 | 4–12 min · $0.10–0.40 | 10–30 min · $0.30–1.00 |
| GPU SPT (A100 spot) | 20–60 s · $0.01–0.03 | 1–4 min · $0.03–0.13 | 3–10 min · $0.10–0.33 |
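
The two tables can be cross-checked with simple arithmetic: time is points divided by throughput, and each cell's time and cost imply an hourly rate. The sketch below uses only figures from the tables. For PTv3 on a 50M-pt tile, raw throughput gives about 33–100 s against the quoted 60–180 s, so the quoted times appear to carry headroom for I/O and tiling (my reading, not stated in the source).

```python
def seconds_for(points, pts_per_second):
    """Raw inference time implied by a throughput figure."""
    return points / pts_per_second


def implied_hourly_rate(cost_usd, seconds):
    """Hourly price implied by one table cell's cost and time."""
    return cost_usd / (seconds / 3600)


# PTv3, 50M-pt tile: throughput range 500K to 1.5M pts/s.
print(round(seconds_for(50_000_000, 1_500_000)), round(seconds_for(50_000_000, 500_000)))

# Implied A100 spot rate from the PTv3 50M-pt cell (60 s at $0.02, 180 s at $0.09).
print(round(implied_hourly_rate(0.02, 60), 2), round(implied_hourly_rate(0.09, 180), 2))
```

The implied rates land between roughly $1.20 and $1.80 per hour, which is a useful sanity check when spot prices move.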

At 50 jobs/month averaging 30 GB each, the all-in cost is about $2,375/mo (R2 storage + DO L40S + CPU droplet + Modal burst + D1), or roughly $47.50 per job. At $300–800/job, the margin is real. R2's zero egress fees are the structural advantage that makes this economic.
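The margin claim is easy to check. This sketch treats the $2,375 as a fixed monthly total spread evenly over 50 jobs, which is an assumption; `unit_economics` is a hypothetical helper.

```python
def unit_economics(monthly_cost, jobs_per_month, price_per_job):
    """Return (cost_per_job, gross_margin_fraction)."""
    cost_per_job = monthly_cost / jobs_per_month
    margin = (price_per_job - cost_per_job) / price_per_job
    return cost_per_job, margin


for price in (300, 800):
    cost, margin = unit_economics(2375, 50, price)
    print(f"${price}/job: cost ${cost:.2f}, margin {margin:.1%}")
# $300/job: cost $47.50, margin 84.2%
# $800/job: cost $47.50, margin 94.1%
```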