Scope brief — Bedrock

Prepared by May (Brian's chief of staff) on 2026-06-27. The full document covers the problem, ML approach, cloud architecture, QC editing data model, exports, honest risks, phased build plan, recommended Phase 0, and bottom line. Reproduced verbatim below.

0. Executive Summary

What's being scoped. A cloud-native SaaS that takes raw drone lidar (10–100 GB .las/.laz), produces a best-in-class ground classification specifically for heavy-vegetation tracts, lets you QC the result in a browser, and exports clean deliverables into LP360 / Carlson / Civil 3D / TBC. Built first for Weygand, designed to be sold to other surveying firms.

Is it feasible in 2026? Yes — every piece is buildable with open source or commodity infra. The ML literature (PTv3, KPConv-X, SuperPoint Transformer, Sonata self-supervised pretraining) has matured to where a small team can train a model that beats CSF/PMF/TerraScan on dense canopy.

Is there a moat? One real one: a proprietary labeled dataset of Southeast US heavy-vegetation lidar tracts, captured by Weygand's own flights, corrected by your own surveyors. Nobody else has it and nobody else can build it without your sensors and your terrain.

1. The Problem We're Solving

Every other lidar tool already classifies ground "well enough" in open terrain. The brutal failure mode — the one that costs your surveyors hours per job — is dense canopy with sparse ground returns:

Mixed Southeast hardwood/pine
Pine plantation with understory
Cypress swamp
Laurel / rhododendron thickets
Kudzu

In those conditions, ground-return density can drop below 1 pt/m² even at 60 pts/m² aggregate density. Geometric filters (CSF, PMF, and the Axelsson TIN densification that powers TerraScan) all share the same failure mode: they assume "lowest local point = ground." When the lowest local point is actually a root buttress or low understory, the DTM floats 0.5–2 m high.
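The failure mode above can be illustrated with a toy sketch. It is not how CSF, PMF, or TerraScan are implemented; it only encodes the shared "lowest local point = ground" assumption as a per-cell minimum. The function name, the 1 m cell size, and the four sample points are all made up for illustration.

```python
import numpy as np

def lowest_point_dtm(xyz, cell=1.0):
    """Per-cell minimum elevation: the 'lowest local point = ground' assumption."""
    xyz = np.asarray(xyz, dtype=float)
    # Map each point's (x, y) to an integer grid cell, shifted to start at 0.
    ij = np.floor(xyz[:, :2] / cell).astype(int)
    ij -= ij.min(axis=0)
    shape = tuple(int(n) for n in ij.max(axis=0) + 1)
    flat = np.ravel_multi_index((ij[:, 0], ij[:, 1]), shape)
    # Keep the minimum z seen in each cell; cells with no points become NaN.
    mins = np.full(shape[0] * shape[1], np.inf)
    np.minimum.at(mins, flat, xyz[:, 2])
    mins[np.isinf(mins)] = np.nan
    return mins.reshape(shape)

# Hypothetical tile with true ground at z = 0 everywhere.
points = np.array([
    [0.5, 0.5, 0.0],   # ground return in an open cell
    [0.5, 0.5, 14.0],  # canopy above the open cell
    [1.5, 0.5, 1.2],   # understory: lowest return in the occluded cell
    [1.5, 0.5, 15.0],  # canopy above the occluded cell
])
dtm = lowest_point_dtm(points)
print(dtm[0, 0], dtm[1, 0])
```

The open cell recovers the true ground. The occluded cell reports the understory height as ground, so the surface floats by exactly the height of the lowest obstruction.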

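Modern ML semantic segmentation models, given proper training data, treat this case differently, because they classify each point from learned context rather than from the lowest local point alone. The expected gain is a 5–15% IoU improvement on dense forest specifically.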
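For reference, IoU for the ground class is the number of points both the prediction and the reference call ground, divided by the number either one calls ground. A minimal sketch follows, assuming ASPRS class code 2 for ground. The label arrays are invented to exercise the function and are not measured results.

```python
import numpy as np

def ground_iou(pred, truth, ground=2):
    """Intersection-over-union for the ground class (ASPRS code 2 assumed)."""
    p = np.asarray(pred) == ground
    t = np.asarray(truth) == ground
    union = np.logical_or(p, t).sum()
    if union == 0:
        return float("nan")  # neither array contains any ground points
    return float(np.logical_and(p, t).sum() / union)

# Toy labels: 2 = ground, 5 = high vegetation.
truth  = [2, 2, 2, 2, 5, 5, 5, 5, 5, 5]
pred_a = [2, 2, 5, 5, 2, 5, 5, 5, 5, 5]  # misses two ground points, adds one false one
pred_b = [2, 2, 2, 5, 5, 5, 5, 5, 5, 5]  # misses one ground point

print(ground_iou(pred_a, truth))  # 2 shared / 5 in union
print(ground_iou(pred_b, truth))  # 3 shared / 4 in union
```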

2. Competitive Landscape

Where the gap is: nobody combines all four — (a) cloud-native, (b) surveyor workflow & stamped deliverables, (c) heavy-vegetation ML specialty, (d) per-tract pricing under $5K/yr.

LP360 has (b)
DroneDeploy has (a)
Lidarvisor has (a)+(d)
Virtual Surveyor has (b) + bare-earth focus but is desktop-only
No one has the full stack.

3. ML Approach

Recommended stack (Pointcept library):

Primary backbone: Point Transformer v3 fine-tuned from Sonata self-supervised pretrained checkpoint.
Fast-path fallback: SuperPoint Transformer (200× fewer params, 3 hr training).
Always run CSF in parallel as the geometric baseline. Disagreement > 5% auto-flagged for human QC.
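The disagreement check could be sketched as below. The brief does not define how disagreement is measured, so this assumes it is the fraction of points in a tile whose ground / non-ground decision differs between the ML model and CSF. The function names and the use of ASPRS class 2 for ground are also assumptions.

```python
import numpy as np

GROUND = 2  # ASPRS ground class code (assumed labeling convention)

def disagreement_rate(ml_labels, csf_labels):
    """Fraction of points where the two classifiers disagree on ground vs. not."""
    ml = np.asarray(ml_labels) == GROUND
    csf = np.asarray(csf_labels) == GROUND
    if ml.shape != csf.shape:
        raise ValueError("label arrays must cover the same points")
    if ml.size == 0:
        return 0.0
    return float(np.mean(ml != csf))

def needs_qc(ml_labels, csf_labels, threshold=0.05):
    """True when disagreement exceeds the 5% threshold named in the brief."""
    return disagreement_rate(ml_labels, csf_labels) > threshold

# Hypothetical 100-point tile where the ML model calls every point ground.
ml = np.full(100, GROUND)
csf_close = ml.copy()
csf_close[:4] = 1   # CSF disagrees on 4 points
csf_far = ml.copy()
csf_far[:6] = 1     # CSF disagrees on 6 points

print(needs_qc(ml, csf_close), needs_qc(ml, csf_far))
```

Computing the rate per tile rather than per flight keeps a small, badly classified thicket from being averaged away by open terrain elsewhere in the tract.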

Realistic compute spend for full development + 20–30 ablation runs: $500–2,500. Trivial vs. labeling cost ($5–20K surveyor time).

4. Cloud Architecture

Storage format: COPC, full stop. Won the format war by 2024. Adopted by USGS 3DEP, OpenTopography, every major drone vendor. A COPC file is also a valid LAZ 1.4 — zero migration cost.
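One way to produce COPC from an incoming .las/.laz is a two-stage PDAL pipeline using `readers.las` and `writers.copc`. The sketch below only builds the pipeline JSON. The brief does not say PDAL is the conversion tool (it is named only for the Phase 0 spike), and the file names are placeholders.

```python
import json

def copc_pipeline(src, dst):
    """Build a PDAL pipeline definition that reads LAS/LAZ and writes COPC."""
    return {
        "pipeline": [
            {"type": "readers.las", "filename": src},
            {"type": "writers.copc", "filename": dst},
        ]
    }

# Placeholder file names for illustration only.
pipeline = copc_pipeline("flight.laz", "flight.copc.laz")
print(json.dumps(pipeline, indent=2))
```

Saving the printed JSON to a file and running `pdal pipeline` on it would perform the conversion, assuming a PDAL build that includes the COPC writer.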

Recommended compute: R2 storage + DigitalOcean baseline GPU + Modal burst.

R2 zero-egress = browser streams a 50GB COPC for free
One always-on L40S handles steady state
Modal handles burst (10s cold-start on warm pool, pay-per-second)
D1 as queue is fine to ~100 concurrent jobs
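A database-backed queue of the kind suggested for D1 can be sketched with a single jobs table and a conditional update to claim work. D1 is SQLite-based, so this uses Python's built-in `sqlite3` as a local stand-in. It is not the D1 API, and the table layout, function names, and tract names are hypothetical.

```python
import sqlite3

def open_queue(path=":memory:"):
    """Create the jobs table if needed and return the connection."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id     INTEGER PRIMARY KEY AUTOINCREMENT,
               tract  TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'queued',
               worker TEXT
           )"""
    )
    return db

def enqueue(db, tract):
    """Add a job in the 'queued' state and return its id."""
    cur = db.execute("INSERT INTO jobs (tract) VALUES (?)", (tract,))
    db.commit()
    return cur.lastrowid

def claim(db, worker):
    """Claim the oldest queued job, or return None if there is nothing to claim."""
    row = db.execute(
        "SELECT id, tract FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # The status check in WHERE means a job another worker took first is not re-claimed.
    cur = db.execute(
        "UPDATE jobs SET status = 'running', worker = ? "
        "WHERE id = ? AND status = 'queued'",
        (worker, row[0]),
    )
    db.commit()
    return row if cur.rowcount == 1 else None

db = open_queue()
enqueue(db, "tract-a")
enqueue(db, "tract-b")
first = claim(db, "gpu-1")
second = claim(db, "gpu-2")
third = claim(db, "gpu-3")
print(first, second, third)
```

The conditional update is what makes claiming safe under contention: a worker that loses the race sees zero rows updated and simply tries again.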

5. Honest Risks

GeoCue's AI Ground+ and AI Forestry are aimed at exactly this wedge. Estimated window before they close it: 18–36 months.
Distribution to small firms is slow. Conservative buyers, long sales cycles.
Per-tract pricing economics are tight at low volume. Lidarvisor at $89/mo struggles with margins.
The "internal tool" trap. Productization, support, billing, onboarding are 5× the engineering work.

6. Phased Build Plan

Phase 0 (2 wks): PDAL+Cesium spike. Three real flights. Measure vs LP360.
Phase 1 (6 wks): Internal pipeline. First paid Weygand job.
Phase 2 (8 wks): QC viewer. Replace LP360 internally.
Phase 3 (10 wks): ML model. ≥95% canopy IoU gate.
Phase 4 (8 wks): Customer SaaS. First paying outside customer.

7. Bottom Line

The internal-tool ROI alone is positive — that's the floor and it's a strong floor. The SaaS upside is real but takes commitment and probably a second hire. The single most valuable asset you'd create is the labeled dataset, not the software. Phase 0 is two weeks and a few hundred dollars in cloud spend. That's the only commitment to make today.

