Prepared by May (Brian's chief of staff) on 2026-06-27. The full document covers the problem, ML approach, cloud architecture, QC editing data model, exports, honest risks, phased build plan, recommended Phase 0, and bottom line. Reproduced verbatim below.
What's being scoped. A cloud-native SaaS that takes raw drone lidar (10–100 GB .las/.laz), produces a best-in-class ground classification specifically for heavy-vegetation tracts, lets you QC the result in a browser, and exports clean deliverables into LP360 / Carlson / Civil 3D / TBC. Built first for Weygand, designed to be sold to other surveying firms.
Is it feasible in 2026? Yes — every piece is buildable with open source or commodity infra. The ML literature (PTv3, KPConv-X, SuperPoint Transformer, Sonata self-supervised pretraining) has matured to where a small team can train a model that beats CSF/PMF/TerraScan on dense canopy.
Is there a moat? One real one: a proprietary labeled dataset of Southeast US heavy-vegetation lidar tracts, captured by Weygand's own flights, corrected by your own surveyors. Nobody else has it and nobody else can build it without your sensors and your terrain.
Every other lidar tool already classifies ground "well enough" in open terrain. The brutal failure mode — the one that costs your surveyors hours per job — is dense canopy with sparse ground returns:
In those conditions, per-pulse ground return rates can drop below 1 pt/m² even at 60 pts/m² aggregate density. Geometric filters (CSF, PMF, Axelsson's TIN-densification that powers TerraScan) all share the same failure mode: they assume "lowest local point = ground." When the lowest local point is actually a root buttress or low understory, the DTM floats 0.5–2m high.
Modern ML semantic segmentation models, given proper training data, see this case differently. 5–15% IoU improvement on dense forest specifically.
Where the gap is: nobody combines all four — (a) cloud-native, (b) surveyor workflow & stamped deliverables, (c) heavy-vegetation ML specialty, (d) per-tract pricing under $5K/yr.
Recommended stack (Pointcept library):
Realistic compute spend for full development + 20–30 ablation runs: $500–2,500. Trivial vs. labeling cost ($5–20K surveyor time).
Storage format: COPC, full stop. Won the format war by 2024. Adopted by USGS 3DEP, OpenTopography, every major drone vendor. A COPC file is also a valid LAZ 1.4 — zero migration cost.
Recommended compute: R2 storage + DigitalOcean baseline GPU + Modal burst.
The internal-tool ROI alone is positive — that's the floor and it's a strong floor. The SaaS upside is real but takes commitment and probably a second hire. The single most valuable asset you'd create is the labeled dataset, not the software. Phase 0 is two weeks and a few hundred dollars in cloud spend. That's the only commitment to make today.