Spatial AI in 2026, the Market, the Players and the Gaps

"Spatial AI is not another computer vision checkbox. It is the layer that lets machines understand the shape of the world, reason about it in natural language, and act on it in the physical world. The winners of this decade will be whoever controls that layer for their industry." — Dr. Florent Poux

This guide is a strategic primer, not a product brochure. I wrote it to answer the four questions that senior engineers, founders and R&D directors keep asking me when they discover the field: what is Spatial AI exactly, why is it suddenly everywhere, who is already winning, and where do I push if I want leverage? It pulls from my academic work, client projects at 3D Geodata Academy, and the market moves I track month by month.

Companion reading: the 12-step technical roadmap and the hands-on 3D scene graph tutorial.

1. What Spatial AI actually is

Spatial AI is the fusion of three things a classical AI stack never had together in the same engine:

  1. A geometric substrate. Point clouds, meshes, voxels, signed distance fields, Gaussian splats. Anything that gives a machine a metrically consistent model of a real place or object.
  2. A semantic overlay. What each piece of geometry is: wall, chair, support column, exit, pedestrian, crack. Produced by 3D deep learning models and increasingly by open-vocabulary foundation models.
  3. A reasoning layer. Today this is an LLM. Its job is to answer grounded questions, plan multi-step tasks, and call the right spatial operators on the right subset of geometry.
[Figure: Layer 1, Geometry (points, meshes, splats; LiDAR, SfM, MVS, depth): how the world is shaped. Layer 2, Semantics (segmentation, detection, relations, scene graphs): what each shape means. Layer 3, Reasoning (LLM agent, tool use, planning, verification): what to do about it.]
The three layers of a working spatial AI system. Remove any one, the stack stops being useful.
Florent's take: The reason “just plug GPT into my point cloud” keeps failing is that an LLM cannot tokenise a million points. It needs the semantic overlay to compress geometry into a graph it can actually reason about. That is the entire game and that is why scene graphs are back at the centre of the field.
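To make the compression step concrete, here is a minimal sketch, assuming the semantic layer has already assigned an instance id and a label to every point. The `Node` fields, the `near` relation and the 1.5 m threshold are illustrative assumptions, not a standard schema. The point is only that a few lines of text per object replace thousands of raw coordinates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    label: str
    centroid: tuple
    bbox_min: tuple
    bbox_max: tuple
    n_points: int


def build_nodes(points, instance_ids, labels):
    """Collapse per-point instance ids into one summary node per object."""
    groups = defaultdict(list)
    for point, inst in zip(points, instance_ids):
        groups[inst].append(point)
    nodes = {}
    for inst, pts in groups.items():
        xs, ys, zs = zip(*pts)
        nodes[inst] = Node(
            node_id=inst,
            label=labels[inst],
            centroid=(sum(xs) / len(xs), sum(ys) / len(ys), sum(zs) / len(zs)),
            bbox_min=(min(xs), min(ys), min(zs)),
            bbox_max=(max(xs), max(ys), max(zs)),
            n_points=len(pts),
        )
    return nodes


def near_edges(nodes, max_dist=1.5):
    """Relate two nodes as 'near' when their centroids are within max_dist metres."""
    edges = []
    ids = sorted(nodes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            dist = sum(
                (u - v) ** 2 for u, v in zip(nodes[a].centroid, nodes[b].centroid)
            ) ** 0.5
            if dist <= max_dist:
                edges.append((a, "near", b))
    return edges


def describe(nodes, edges):
    """Render the graph as short text lines an LLM prompt could carry."""
    lines = [
        f"{n.label}#{n.node_id} at "
        f"({n.centroid[0]:.1f}, {n.centroid[1]:.1f}, {n.centroid[2]:.1f})"
        for n in nodes.values()
    ]
    lines += [f"{nodes[a].label}#{a} {rel} {nodes[b].label}#{b}" for a, rel, b in edges]
    return "\n".join(lines)


# Toy input (made-up values): six labelled points standing in for a real scan.
points = [
    (0.0, 0.0, 0.0), (0.2, 0.0, 0.4),   # chair
    (1.0, 0.0, 0.0), (1.2, 0.2, 0.8),   # table
    (5.0, 5.0, 0.0), (5.0, 5.0, 2.0),   # column
]
instance_ids = [0, 0, 1, 1, 2, 2]
labels = {0: "chair", 1: "table", 2: "column"}

nodes = build_nodes(points, instance_ids, labels)
edges = near_edges(nodes)
scene_text = describe(nodes, edges)
print(scene_text)
```

A real pipeline would add richer relations (supports, contains, adjacent-to) and keep a pointer from each node back to its points, so that a spatial operator can still run on the right subset of geometry.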

2. Why now, the four forces that unlocked the field

Spatial AI is not a new word. Andrew Davison used it in SLAM research in 2018. What changed is not the idea, it is the convergence of four enabling shifts:

System view: Each of these four forces alone is a nice research paper. Stacked, they collapse the cost of putting a spatially aware agent in front of a real environment from a multi-year research programme to a sprint. That is why every CTO with a physical asset portfolio started a spatial AI project between 2024 and 2026.

3. The market, in numbers

Below are the ranges I track. Sources are analyst firms and public filings. Treat them as orders of magnitude, not gospel. The direction of travel is unambiguous.

Segment | 2024 size | 2028 projection | CAGR | What it contains
Spatial computing (umbrella) | ~$100B | ~$280B | ~28% | AR, VR, spatial software, enterprise twins
Digital twin | ~$20B | ~$95B | ~37% | Built environment, industrial, networks
3D scanning & reality capture | ~$10B | ~$18B | ~15% | Hardware, capture services, software
Generative 3D / content | ~$2B | ~$12B | ~45% | Foundation models, asset generation, avatars
Robotics with spatial AI | ~$15B | ~$45B | ~30% | Humanoids, AMRs, drones, autonomous vehicles
Geospatial / earth observation AI | ~$6B | ~$18B | ~25% | Satellite, aerial, UAV analytics
[Figure: Segment size, 2024 vs 2028 projection (USD B), one pair of bars per segment.]
Every segment grows at roughly 15% CAGR or more. Generative 3D compounds fastest from a smaller base; digital twin has the largest absolute expansion outside the umbrella category.

The interesting signal is not the absolute numbers but that every sub-segment grows at roughly 15% CAGR or more. That is rare. It means spatial AI is not cannibalising an older market; it is creating new budget lines in industries that did not have them five years ago.
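As a sanity check on any table like this one, the growth rate implied by two endpoints is easy to recompute. The sketch below uses the 2024 and 2028 figures from the table. The four-year window is an assumption, and the implied rates will not necessarily match the quoted CAGR column, since analyst reports may use different base years or forecast windows.

```python
def implied_cagr(start, end, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end / start) ** (1 / years) - 1


# (2024 size, 2028 projection) in USD billions, copied from the table.
segments = {
    "Spatial computing (umbrella)": (100, 280),
    "Digital twin": (20, 95),
    "3D scanning & reality capture": (10, 18),
    "Generative 3D / content": (2, 12),
    "Robotics with spatial AI": (15, 45),
    "Geospatial / earth observation AI": (6, 18),
}

YEARS = 2028 - 2024  # assumed compounding window

for name, (size_2024, size_2028) in segments.items():
    rate = implied_cagr(size_2024, size_2028, YEARS)
    growth = size_2028 - size_2024
    print(f"{name:36s} implied CAGR {rate:6.1%}   absolute growth +${growth}B")
```

Running the same check on your own source reports is a quick way to see which figures are endpoint-derived and which come from a separate forecast model.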

Analyst sourcing: Combining MarketsAndMarkets, Grand View Research, IDC, ABI Research and NVIDIA's own investor materials gives you the envelope above. The spatial computing figure swings depending on whether consumer AR/VR hardware is counted, which is why I show the range rather than a single number.

4. The player map, who is actually winning what

I group players into seven lanes. Any company can play more than one lane. The useful exercise is to decide which lane you plan to defend.

4.1 Platforms and compute

4.2 Reality capture specialists

4.3 Design, BIM and AEC stacks

4.4 Generative 3D and content

4.5 Robotics and embodied AI

4.6 Vertical SaaS with spatial AI inside

The seven lanes, who consolidates and who insurges:
  1. Platforms & compute. Consolidators: NVIDIA, Microsoft, Apple, Google. Insurgents: Unity, Epic, Niantic Spatial Cloud.
  2. Reality capture. Consolidators: Matterport, Leica, Trimble, Faro. Insurgents: NavVis, Skydio, Polycam, Scaniverse.
  3. Design, BIM, AEC. Consolidators: Autodesk, Bentley, Esri, Dassault. Insurgents: Hypar, Snaptrude, Motif, Arcol.
  4. Generative 3D. Consolidators: Luma, Stability, OpenAI, Adobe Substance. Insurgents: CSM, Meshy, Tripo, Rodin, Sloyd.
  5. Robotics / embodied. Consolidators: Waymo, Mobileye, Amazon/Covariant. Insurgents: Figure, 1X, Agility, Wayve, Apptronik.
  6. Vertical SaaS. Consolidators: OpenSpace, Hover, Placer, EagleView. Insurgents: Buildots, Disperse, Reconstruct, CAPE.
  7. Research labs (agenda-setters). Stanford SVL, MIT CSAIL, CMU RI, Berkeley BAIR, ETH CVG, EPFL CVLab, INRIA, Meta FAIR, NVIDIA Research.
Every lane has a consolidator and at least one insurgent. Your job as a builder is to pick the lane where the consolidator is complacent.

4.7 Research labs setting the agenda

Florent's take: Ignore the logo race. The useful read on this map is: every lane has a consolidator (someone who owns the workflow) and at least one insurgent (someone who bets on a 10x better substrate, usually a foundation model). If you are building, pick a lane where the consolidator is complacent and the insurgent has not chosen an industry yet.

5. Where the value actually is, applications that already generate revenue

Research demos do not pay salaries. These are the application clusters I see producing recurring revenue today, with concrete outcome metrics.

Cluster | Concrete use case | Measured outcome
Construction progress | Weekly site scan compared against the 4D BIM to flag drift | Schedule slippage caught 2–4 weeks earlier, ~15% reduction in rework cost
Real-estate listings | Matterport-style virtual tour with automatic room labelling and measurement | ~30% more time-on-listing, ~20% faster close
Facility and asset ops | Digital twin of a factory or hospital wing queried by technicians in natural language | ~25% drop in average time-to-locate equipment, measurable MTTR reduction
Insurance & restoration | Drone or phone capture, AI measurement, automated claim scoping | Claim cycle from weeks to days, fraud detection materially improved
Retail analytics | 3D heatmaps of customer flow through a store | 3–8% uplift on shelf-level conversion after layout changes
Autonomous navigation | Robots and drones that plan around a live 3D map | Deployments that previously required weeks of integration go live in days
Heritage and culture | Virtual visits, semantic query of scanned monuments, restoration planning | Funding unlocked, damage assessment for sites like Notre-Dame, Mosul, Palmyra
Energy and utilities | Inspection twins of pylons, substations, wind turbines, pipelines | Inspection cost cut ~40%, safety incidents on inspection routes down
Pattern worth copying: In every one of these clusters, the winning product is not the one with the best model. It is the one that removes the messy capture step and the messy integration step from the customer's plate. The AI is a feature. The workflow is the moat.

6. Five monetization patterns that actually work

If you are trying to turn a spatial AI capability into a business, there are really only five recurring shapes. Pick one, then be rigorous about the unit economics.

[Figure: the five monetization patterns. Pattern 1, Capture as a service: $/m² or $/asset; drones, scanners, site teams; who: NavVis, DroneDeploy, local service shops. Pattern 2, Vertical SaaS: per-seat, per-project; wraps AI in a workflow; who: OpenSpace, Buildots, Hover, PlanRadar. Pattern 3, Platform / SDK: compute, runtime; licence or usage; picks-and-shovels; who: NVIDIA Omniverse, Unity, Unreal, ARKit. Pattern 4, Foundation model API: per-call, per-asset; text→3D, depth, scene understanding; who: Luma, CSM, Meshy, Tripo, Stability. Pattern 5, Spatial hardware: unit + platform fee; bundled, high margin later; who: Apple Vision Pro, Figure, Skydio.]
Each pattern has different unit economics. Pick one, commit to it, measure against its specific moat.
  1. Capture-as-a-service. You own the scanner, the crew and the pipeline. Customer pays per square metre, per asset or per delivery. Commoditising fast. Defensible if you bundle a vertical workflow on top.
  2. Vertical SaaS. You pick one industry, solve one nasty workflow (progress tracking, claim scoping, asset inventory) and charge per seat or per project. The current highest-return quadrant. Customer acquisition is the bottleneck, not tech.
  3. Platform and SDK. You sell the picks and shovels. Omniverse, Unity, ARKit, RealityKit, Isaac. Few players will ever succeed here, but those that do win big. Irrelevant for most founders.
  4. Foundation model API. You train the largest text-to-3D or scene-understanding model in a niche and sell it by the call. Marginal cost trends toward zero, marginal price follows. Winners own the data advantage, not the architecture.
  5. Spatial hardware plus bundled platform. Apple's Vision Pro playbook, applied to robotics, drones, mobile mapping and surgical nav. Capital-intensive. The prize is the default spatial computer of a category.
Strategic honesty: Most founders in this space fail because they confuse patterns. A vertical SaaS priced like a platform dies. A platform priced like a SaaS never funds its R&D. Clarify your pattern before your price list.

7. Real products, real examples

If you want concrete anchors for each pattern, study these products. I chose them because they are all public enough to learn from.

Florent's take: Every one of these products still has a painful capture step, a painful integration step, or a painful accuracy ceiling. Pick any of the three, own it for a single industry and you have a company. This is where I push most of my Accelerator students who are builders.

8. Where the real gaps still are, the opportunity map

When I advise teams on where to push, I keep coming back to the same six under-served gaps. They are not research problems, they are execution problems where the winner will be the first team that ships a usable workflow.

  1. Open-vocabulary outdoor scene graphs. Indoor is solved enough for a demo. Outdoor, at city scale, with vegetation, weather and occlusion, is wide open. Anyone who nails it at 10× cheaper than current survey pipelines owns a huge market.
  2. Scan-to-BIM that does not need a human. There is no product that takes a raw scan and produces a compliant IFC without human clean-up. The first team that hits 80% autonomy here unlocks billions in AEC.
  3. Spatial agent memory. LLMs forget the map. Robots forget the room. A persistent, scene-graph-backed memory layer for agents is a missing primitive. Whoever ships a standard gets the platform prize.
  4. Edge-grade 3D perception. Most 3D deep learning still requires a workstation GPU. The market of embedded devices that want spatial reasoning (drones, robots, wearables, cars) vastly outnumbers workstations. Quantisation and distillation for 3D are immature.
  5. Evaluation and trust. Nobody has an agreed accuracy standard for spatial agents the way we have mAP, BLEU or MMLU. The first credible benchmark that industry buyers trust is a moat.
  6. Domain-specific foundation models. A single foundation model for medical imaging, another for subsurface, another for heritage, another for industrial inspection. General-purpose text-to-3D is crowded. Vertical 3D foundation models are not.
Where to push: If you want leverage in 2026, pick one gap in this list and own it in one industry. The technical moat is small, the workflow moat is large, the distribution moat is where you will actually win.

9. Honest challenges that will slow you down

Under the hood: A realistic building-scale scene graph lands around 10 000 to 50 000 nodes, with 3–10× as many edges. That already exceeds the context window of most deployed LLMs. Retrieval-augmented reasoning over the graph, plus structured tool calls that fetch sub-graphs on demand, is the only scalable pattern I have seen work in production.
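One way to picture "fetch sub-graphs on demand" is a k-hop neighbourhood query around a seed node. Only that slice of the graph is serialised into the prompt. The sketch below is a toy under stated assumptions: the node ids, the relation names and the `fetch_subgraph` helper are all hypothetical, and a production system would more likely sit on a graph database or a spatial index.

```python
from collections import deque

# Hypothetical toy scene graph: (source, relation, target) triples.
edges = [
    ("room_101", "contains", "pump_7"),
    ("pump_7", "connected_to", "pipe_12"),
    ("pipe_12", "connected_to", "valve_3"),
    ("room_102", "contains", "chair_9"),
]


def fetch_subgraph(seed, hops, edges, max_nodes=200):
    """Breadth-first k-hop neighbourhood around a seed node, capped at max_nodes."""
    # Build an undirected adjacency list so traversal ignores edge direction.
    adjacency = {}
    for src, _, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, []).append(src)

    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue  # do not expand beyond the requested radius
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen and len(seen) < max_nodes:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))

    # Keep only the relations whose two endpoints both survived.
    kept = [(s, r, d) for s, r, d in edges if s in seen and d in seen]
    return seen, kept


sub_nodes, sub_edges = fetch_subgraph("pump_7", hops=1, edges=edges)
print(sorted(sub_nodes))
for src, rel, dst in sub_edges:
    print(src, rel, dst)
```

The `max_nodes` cap is the part that matters for context budgets: it bounds what reaches the model regardless of how dense the neighbourhood turns out to be.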

10. The five skills that matter most in 2026

If you are a practitioner deciding where to invest the next twelve months, this is the short list. Every one of them maps to a chapter of the Spatial AI Architect Programme.

  1. 3D deep learning on point clouds and meshes. PointNet++, Mask3D, OpenMask3D-family open-vocabulary segmentation.
  2. Scene graph construction and reasoning. From segmentation to relations, from relations to a queryable graph.
  3. Agentic LLM orchestration for 3D tools. Tool schemas, retrieval over the graph, verification loops.
  4. Production geometry pipelines. Open3D, PDAL, Trimesh, PyTorch3D, CUDA basics, exchange and streaming formats (E57, LAZ, glTF, USD).
  5. Digital twin integration. BIM, IFC, USD, OGC standards. The boring layer where enterprises pay you.
[Figure: the five-skill radar of a spatial AI builder. Axes: 3D deep learning (PointNet++, Mask3D); scene graphs (relations, queries); agentic LLMs (tool use, verification); geometry pipelines (Open3D, PDAL, USD); twin integration (IFC, OGC, BIM). Shaded area: the minimum profile enterprises start paying for in 2026.]
You do not need depth in all five. You need a base level in all five, and a sharp peak in at least two. That is the T-shape that sells.
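For skill 3, a small sketch of what "tool schemas" and "verification loops" can look like in practice. Everything here is an illustrative assumption: the `measure_distance` tool, the JSON-Schema-style layout (similar in spirit to what many LLM tool-calling interfaces accept), the toy centroids and the `dispatch` helper. The idea is that a model-proposed call is validated against the schema and against the scene graph before any geometry operator runs.

```python
import json
import math

# Hypothetical tool schema, written in a JSON-Schema-like style.
MEASURE_DISTANCE_TOOL = {
    "name": "measure_distance",
    "description": "Euclidean distance in metres between two scene-graph nodes.",
    "parameters": {
        "type": "object",
        "properties": {
            "node_a": {"type": "string"},
            "node_b": {"type": "string"},
        },
        "required": ["node_a", "node_b"],
    },
}

# Toy scene graph: node id -> centroid in metres (made-up values).
CENTROIDS = {
    "door_1": (0.0, 0.0, 1.0),
    "extinguisher_2": (3.0, 4.0, 1.0),
}


def measure_distance(node_a, node_b):
    """Spatial operator: straight-line distance between two node centroids."""
    return math.dist(CENTROIDS[node_a], CENTROIDS[node_b])


TOOLS = {"measure_distance": (MEASURE_DISTANCE_TOOL, measure_distance)}


def dispatch(call_json):
    """Validate a model-proposed tool call, run it, and return a JSON result."""
    call = json.loads(call_json)
    if call.get("name") not in TOOLS:
        return json.dumps({"ok": False, "error": "unknown tool"})
    schema, fn = TOOLS[call["name"]]
    args = call.get("arguments", {})
    required = schema["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        return json.dumps({"ok": False, "error": f"missing arguments: {missing}"})
    # Verification step: reject node ids that do not exist in the graph,
    # so a hallucinated id becomes an error the agent can react to.
    unknown = [args[key] for key in required if args[key] not in CENTROIDS]
    if unknown:
        return json.dumps({"ok": False, "error": f"unknown nodes: {unknown}"})
    kwargs = {key: args[key] for key in required}
    return json.dumps({"ok": True, "result": fn(**kwargs)})


reply = dispatch(json.dumps({
    "name": "measure_distance",
    "arguments": {"node_a": "door_1", "node_b": "extinguisher_2"},
}))
print(reply)
```

Returning a structured error instead of raising is a deliberate choice in this sketch: the error text goes back to the model, which can then retry with a corrected node id.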

11. Research and primary sources

12. Your next move

Reading a market map is step zero. The useful next move is to decide where you will push, and to do it inside a structure that stops you guessing.

The 3D AI Accelerator is the programme I built for exactly that. Six acts, from foundations to a deployed spatial agent, with a live strategic compass keyed to your current role and goals.