Spatial AI in 2026, the Market, the Players and the Gaps

Q: What is Spatial AI?

Spatial AI is the discipline that gives machines geometric understanding of the world, so they can reason, plan and act inside 3D environments. It combines sensing (LiDAR, cameras, depth), representation (point clouds, meshes, scene graphs, Gaussian splats), semantic perception (segmentation, detection, relation extraction) and a reasoning layer, today usually a Large Language Model.

Q: How big is the spatial AI market?

Spatial computing, the broader umbrella, is projected above $280B by 2028. Digital twins alone are tracked at around $95B by 2028 with a 37% CAGR. The narrower 3D scanning and reality capture market is around $18B by 2027. Adjacent robotics and AR/VR add another $60B of spatial-AI-dependent demand.

Q: Who are the dominant players in spatial AI?

NVIDIA (Omniverse, Cosmos, Isaac), Apple (Vision Pro, ARKit, Object Capture), Meta (Reality Labs, Segment Anything, Habitat), Google DeepMind (Gemini, Genie, ARCore), Microsoft (Azure Digital Twins, HoloLens), Matterport, Niantic (Scaniverse, 8th Wall), Autodesk, Bentley iTwin, Esri, plus a vertical wave of robotics players like Figure, 1X, Skydio, Agility and Waymo.

Q: How do companies make money with spatial AI today?

Five dominant models: reality-capture services paid per square metre or per asset; vertical SaaS that wraps spatial AI into construction, retail, insurance or heritage workflows; platforms that sell compute and SDKs (Omniverse, ARKit, Unity); foundation models that license generative 3D or spatial reasoning APIs; and end-user hardware that bundles spatial AI as the default interaction layer.

"Spatial AI is not another computer vision checkbox. It is the layer that lets machines understand the shape of the world, reason about it in natural language, and act on it in the physical world. The winners of this decade will be whoever controls that layer for their industry." — Dr. Florent Poux

This guide is a strategic primer, not a product brochure. I wrote it to answer the four questions that senior engineers, founders and R&D directors keep asking me when they discover the field: what is Spatial AI exactly, why is it suddenly everywhere, who is already winning, and where do I push if I want leverage? It pulls from my academic work, client projects at 3D Geodata Academy, and the market moves I track month by month.

Companion reading: the 12-step technical roadmap and the hands-on 3D scene graph tutorial.

1. What Spatial AI actually is

Spatial AI is the fusion of three things a classical AI stack never had together in the same engine:

A geometric substrate. Point clouds, meshes, voxels, signed distance fields, Gaussian splats. Anything that gives a machine a metrically consistent model of a real place or object.
A semantic overlay. What each piece of geometry is: wall, chair, support column, exit, pedestrian, crack. Produced by 3D deep learning models and increasingly by open-vocabulary foundation models.
A reasoning layer. Today this is an LLM. Its job is to answer grounded questions, plan multi-step tasks, and call the right spatial operators on the right subset of geometry.

The three layers of a working spatial AI system. Remove any one, the stack stops being useful.

Florent's take: The reason “just plug GPT into my point cloud” keeps failing is that an LLM cannot fit a million points into its context window. It needs the semantic overlay to compress geometry into a graph it can actually reason about. That is the entire game, and that is why scene graphs are back at the centre of the field.
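Here is a minimal sketch of that compression step. It assumes the semantic overlay has already given every point a class label and an object instance id. The function `points_to_nodes` and its node fields (point count, centroid, bounding box) are hypothetical illustrations, not a standard scene-graph schema. A real pipeline would also use NumPy or Open3D rather than pure Python loops.

```python
from collections import defaultdict
import json


def points_to_nodes(points, labels, instance_ids):
    """Collapse labelled points into one compact node per object instance.

    points: list of (x, y, z) tuples in metres
    labels: semantic label per point, e.g. "wall" or "chair"
    instance_ids: object instance id per point
    """
    groups = defaultdict(list)
    for p, label, inst in zip(points, labels, instance_ids):
        groups[(inst, label)].append(p)

    nodes = []
    for (inst, label), pts in sorted(groups.items()):
        axes = list(zip(*pts))  # [(all x), (all y), (all z)]
        nodes.append({
            "id": inst,
            "label": label,
            "n_points": len(pts),
            "centroid": [round(sum(a) / len(a), 3) for a in axes],
            "bbox_min": [min(a) for a in axes],
            "bbox_max": [max(a) for a in axes],
        })
    return nodes


# Toy input: a 3-point chair and a 2-point wall (illustrative data only)
points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.4), (0.5, 0.5, 0.8),
          (3.0, 0.0, 0.0), (3.0, 4.0, 2.5)]
labels = ["chair", "chair", "chair", "wall", "wall"]
instance_ids = [1, 1, 1, 2, 2]

nodes = points_to_nodes(points, labels, instance_ids)
# Compact JSON is what would be placed in the LLM context instead of raw points
prompt_context = json.dumps(nodes, separators=(",", ":"))
print(prompt_context)
```

However many points an object has, it becomes one short record. The LLM therefore reads a few hundred objects instead of millions of coordinates. Relations between nodes, which form the edges of the scene graph, would be added in a separate step.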

2. Why now, the four forces that unlocked the field

Spatial AI is not a new word. Andrew Davison used it in SLAM research in 2018. What changed is not the idea, it is the convergence of four enabling shifts:

Cheap reality capture. Many recent phones ship with a LiDAR or ToF sensor. A Matterport Pro3 captures a full building in hours. Drones with survey-grade GNSS cost less than a used car. The cost of a usable 3D dataset dropped by an order of magnitude between 2020 and 2025. (Source: Apple Object Capture)
Foundation models that generalise to 3D. Segment Anything (2023) ended the era of training a dedicated segmentation model per object class. Depth Anything V2 gave us robust monocular depth. Point-E, Shap-E, Stable Zero123 and Luma's Genie closed the generative 3D loop. (Sources: SAM 2023; Depth Anything V2)
LLMs good enough to orchestrate tools. The tool-use primitive exposed by Claude, GPT, Gemini and open-weight models like Qwen and Llama is what turns a perception pipeline into an agent. Before 2023, you wrote the orchestration by hand. Today the model writes it. (Sources: Toolformer; ReAct)
A real-time rendering substrate. 3D Gaussian Splatting, released in 2023, gave the industry a representation that is edit-friendly, streamable and differentiable. It now underpins consumer apps (Scaniverse, Luma) and enterprise stacks (Omniverse, Bentley iTwin experiments). (Source: 3D Gaussian Splatting 2023)
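To show what the tool-use primitive buys you, here is a hedged sketch of the dispatch side. The model emits a JSON tool call, and your code checks it and runs a spatial operator on the scene graph. The toy scene, the operators `find_objects` and `distance`, and the call shape `{"name": ..., "arguments": ...}` are illustrative assumptions. Each LLM provider defines its own wire format for tool definitions and tool calls.

```python
import json

# Toy scene graph nodes: id -> attributes (hypothetical example data)
SCENE = {
    1: {"label": "chair", "centroid": [0.3, 0.2, 0.4]},
    2: {"label": "wall", "centroid": [3.0, 2.0, 1.25]},
    3: {"label": "chair", "centroid": [2.5, 1.0, 0.4]},
}


def find_objects(label):
    """Spatial operator: ids of all nodes carrying the given label."""
    return [nid for nid, node in SCENE.items() if node["label"] == label]


def distance(a, b):
    """Spatial operator: Euclidean distance (metres) between two node centroids."""
    pa, pb = SCENE[a]["centroid"], SCENE[b]["centroid"]
    return round(sum((x - y) ** 2 for x, y in zip(pa, pb)) ** 0.5, 3)


# Tool registry: callable plus the parameter names the model must supply
TOOLS = {
    "find_objects": {"fn": find_objects, "params": {"label": "string"}},
    "distance": {"fn": distance, "params": {"a": "integer", "b": "integer"}},
}


def dispatch(call_json):
    """Run one model-emitted tool call after checking the tool and argument names."""
    call = json.loads(call_json)
    spec = TOOLS.get(call["name"])
    if spec is None:
        raise ValueError(f"unknown tool: {call['name']}")
    args = call.get("arguments", {})
    if set(args) != set(spec["params"]):
        raise ValueError(f"bad arguments for {call['name']}: {sorted(args)}")
    return spec["fn"](**args)


print(dispatch('{"name": "find_objects", "arguments": {"label": "chair"}}'))
print(dispatch('{"name": "distance", "arguments": {"a": 1, "b": 3}}'))
```

In a real agent loop, the model chains calls like these, for example finding every chair and then measuring distances. Each returned value goes back into its context before the next step.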

System view: Each of these four forces alone is a nice research paper. Stacked, they collapse the cost of putting a spatially aware agent in front of a real environment from a multi-year research programme to a sprint. That is why every CTO with a physical asset portfolio started a spatial AI project between 2024 and 2026.

3. The market, in numbers

Below are the ranges I track. Sources are analyst firms and public filings. Treat them as orders of magnitude, not gospel. The direction of travel is unambiguous.

Segment	2024 size	2028 projection	CAGR	What it contains
Spatial computing (umbrella)	~$100B	~$280B	~28%	AR, VR, spatial software, enterprise twins
Digital twin	~$20B	~$95B	~37%	Built environment, industrial, networks
3D scanning & reality capture	~$10B	~$18B	~15%	Hardware, capture services, software
Generative 3D / content	~$2B	~$12B	~45%	Foundation models, asset generation, avatars
Robotics with spatial AI	~$15B	~$45B	~30%	Humanoids, AMRs, drones, autonomous vehicles
Geospatial / earth observation AI	~$6B	~$18B	~25%	Satellite, aerial, UAV analytics

Every segment grows at roughly 15% CAGR or more. Generative 3D compounds fastest from a smaller base; digital twin has the largest absolute expansion.

The interesting signal is not the absolute numbers; it is that every sub-segment grows at roughly 15% CAGR or more. That is rare. It means spatial AI is not cannibalising an older market; it is creating new budget lines in industries that did not have them five years ago.
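The table mixes analyst sources, so it is worth recomputing the growth rate each row actually implies. This sketch applies the standard CAGR formula, (end / start)^(1 / years) - 1, to the 2024 and 2028 figures over four years. The inputs are copied from the table; nothing else is assumed.

```python
# (2024 size, 2028 projection, quoted CAGR), sizes in $B, copied from the table
SEGMENTS = {
    "Spatial computing": (100, 280, 0.28),
    "Digital twin": (20, 95, 0.37),
    "3D scanning & reality capture": (10, 18, 0.15),
    "Generative 3D / content": (2, 12, 0.45),
    "Robotics with spatial AI": (15, 45, 0.30),
    "Geospatial / EO AI": (6, 18, 0.25),
}


def implied_cagr(start, end, years):
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


for name, (start, end, quoted) in SEGMENTS.items():
    implied = implied_cagr(start, end, years=4)  # 2024 -> 2028
    print(f"{name:32s} implied {implied:6.1%}  quoted {quoted:4.0%}")
```

For three rows, the implied rates land close to the quoted column: spatial computing (about 29%), reality capture (about 16%) and robotics (about 32%). For the other three they come out higher: digital twin (about 48%), generative 3D (about 57%) and geospatial (about 32%). One plausible reason is that the quoted CAGRs come from analyst reports with different forecast windows or base years. Either way, every implied rate still clears 15%.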

Analyst sourcing: Combining MarketsAndMarkets, Grand View Research, IDC, ABI Research and NVIDIA's own investor materials gives you the envelope above. The spatial computing figure swings depending on whether consumer AR/VR hardware is counted, which is why I show the range rather than a single number.

4. The player map, who is actually winning what

I group players into seven lanes. Any company can play in more than one lane. The useful exercise is to decide which lane you plan to defend.

4.1 Platforms and compute

NVIDIA — the quiet monopoly. Omniverse for synthetic scenes and twins, Cosmos for world foundation models, Isaac for robotics, CUDA for everything else. Ships the reference stack that every other vendor rents.
Microsoft — Azure Digital Twins, Azure Remote Rendering, plus HoloLens (slower, niched to defence and industrial). Strong in enterprise integration.
Google / DeepMind — ARCore on mobile, Gemini for multi-modal reasoning, Genie and Veo for world and video generation. Less noise, deep research.
Apple — Vision Pro is a spatial AI computer first, a VR headset second. ARKit, RoomPlan and Object Capture are the quietly dominant capture SDKs on mobile.
Unity and Unreal (Epic) — the rendering and runtime substrate of nearly every industrial digital twin demo you have seen in the last decade.

4.2 Reality capture specialists

Matterport (acquired by CoStar) — indoor capture plus a SaaS platform, dominant in real estate and facilities.
Leica Geosystems (Hexagon), Trimble, FARO, Topcon — survey-grade hardware with mature software ecosystems in AEC.
NavVis, GeoSLAM — mobile mapping systems that made SLAM a commodity.
DJI and Skydio — drones as spatial data collectors, with autonomy stacks on board.

4.3 Design, BIM and AEC stacks

Autodesk (Forma, Revit, Tandem) — pivoting hard into cloud-native and AI-assisted design.
Bentley Systems (iTwin, OpenCities) — infrastructure twins, often pioneering, often quiet.
Esri (ArcGIS) — still the default on geospatial, now shipping 3D and indoor.
Dassault Systèmes (3DEXPERIENCE, Virtual Twin) — heavy industrial and life-sciences twins.

4.4 Generative 3D and content

Luma AI — Genie for generative 3D, Dream Machine for video.
Common Sense Machines (CSM), Meshy, Tripo, Rodin, Sloyd — text-to-3D and image-to-3D pipelines for games, e-commerce and synthetic data.
Stability AI (Stable Zero123), OpenAI (Shap-E, Point-E) — research-leaning labs with open checkpoints.
Niantic — Visual Positioning System, Lightship and the new Spatial Cloud, betting on world-scale maps.

4.5 Robotics and embodied AI

Humanoid race — Figure, 1X, Agility, Apptronik, Sanctuary, Tesla Optimus, Unitree. All using spatial AI to ground language commands into action.
Logistics and AMRs — Covariant (founders hired and models licensed by Amazon in 2024), Symbotic, Locus, Berkshire Grey.
Autonomous driving — Waymo, Cruise (robotaxi programme ended by GM in 2024), Wayve, Mobileye, Tesla FSD. All spatial AI pipelines at scale.
Drones with autonomy — Skydio, Anduril (Lattice), Shield AI.

4.6 Vertical SaaS with spatial AI inside

Construction / AEC — OpenSpace, Buildots, Disperse, DroneDeploy, Reconstruct, PlanRadar. They wrap capture plus spatial AI into a subscription that a site manager actually uses.
Retail and spaces — Placer.ai (2D/3D flow), RetailNext, the Matterport Retail Pack.
Insurance — Hover, CAPE Analytics, EagleView. Property claims reimagined around 3D.
Heritage and culture — CyArk, Iconem, national IGN programmes.
Healthcare spaces — surgical navigation, OR planning; players like Brainlab, Stryker Mako.


4.7 Research labs setting the agenda

Stanford SVL / Fei-Fei Li's group — pushing the spatial intelligence thesis at the frontier.
MIT CSAIL — 3D-LLM, scene graph reasoning, task planning.
CMU Robotics Institute, Berkeley BAIR, ETH Zürich CVG, EPFL CVLab, INRIA — the classic strongholds.
Meta FAIR, NVIDIA Research, Google DeepMind, Apple AIML — industry labs with open publications that move the field.

Florent's take: Ignore the logo race. The useful read on this map is: every lane has a consolidator (someone who owns the workflow) and at least one insurgent (someone who bets on a 10x better substrate, usually a foundation model). If you are building, pick a lane where the consolidator is complacent and the insurgent has not chosen an industry yet.

5. Where the value actually is, applications that already generate revenue

Research demos do not pay salaries. These are the application clusters I see producing recurring revenue today, with concrete outcome metrics.

Cluster	Concrete use case	Measured outcome
Construction progress	Weekly site scan compared against the 4D BIM to flag drift	Schedule slippage caught 2–4 weeks earlier, ~15% reduction in rework cost
Real-estate listings	Matterport-style virtual tour with automatic room labelling and measurement	~30% more time-on-listing, ~20% faster close
Facility and asset ops	Digital twin of a factory or hospital wing queried by technicians in natural language	~25% drop in average time-to-locate equipment, measurable MTTR reduction
Insurance & restoration	Drone or phone capture, AI measurement, automated claim scoping	Claim cycle from weeks to days, fraud detection materially improved
Retail analytics	3D heatmaps of customer flow through a store	3–8% uplift on shelf-level conversion after layout changes
Autonomous navigation	Robots and drones that plan around a live 3D map	Deployments that previously required weeks of integration go live in days
Heritage and culture	Virtual visits, semantic query of scanned monuments, restoration planning	Funding unlocked, damage assessment for sites like Notre-Dame, Mosul, Palmyra
Energy and utilities	Inspection twins of pylons, substations, wind turbines, pipelines	Inspection cost cut ~40%, safety incidents on inspection routes down

Pattern worth copying: In every one of these clusters, the winning product is not the one with the best model. It is the one that removes the messy capture step and the messy integration step from the customer's plate. The AI is a feature. The workflow is the moat.

6. Five monetization patterns that actually work

If you are trying to turn a spatial AI capability into a business, there are really only five recurring shapes. Pick one, then be rigorous about the unit economics.

Each pattern has different unit economics. Pick one, commit to it, measure against its specific moat.

Capture-as-a-service. You own the scanner, the crew and the pipeline. Customer pays per square metre, per asset or per delivery. Commoditising fast. Defensible if you bundle a vertical workflow on top.
Vertical SaaS. You pick one industry, solve one nasty workflow (progress tracking, claim scoping, asset inventory) and charge per seat or per project. The current highest-return quadrant. Customer acquisition is the bottleneck, not tech.
Platform and SDK. You sell the picks and shovels. Omniverse, Unity, ARKit, RealityKit, Isaac. Few players will ever succeed here, but those who do win big. Irrelevant for most founders.
Foundation model API. You train the largest text-to-3D or scene-understanding model in a niche and sell it by the call. Marginal cost trends toward zero, marginal price follows. Winners own the data advantage, not the architecture.
Spatial hardware plus bundled platform. Apple's Vision Pro playbook, applied to robotics, drones, mobile mapping and surgical nav. Capital-intensive. The prize is the default spatial computer of a category.

Strategic honesty: Most founders in this space fail because they confuse patterns. A vertical SaaS priced like a platform dies. A platform priced like a SaaS never funds its R&D. Clarify your pattern before your price list.

7. Real products, real examples

If you want concrete anchors for each pattern, study these products. I chose them because they are all public enough to learn from.

Matterport Pro3 + Axis platform. A scanner priced at prosumer level, a SaaS platform for hosting, tagging and measuring, integrations into real estate and insurance. A decade-long masterclass in capture plus vertical SaaS.
OpenSpace. 360 camera on a hardhat, auto-aligned to the BIM, timeline scrub of the job site. Construction teams pay for the clarity. Vertical SaaS.
Hover. Phone photos of a house, AI measurements, exterior quote in minutes. The insurance and restoration ecosystem plugs into it. Vertical SaaS with API.
NVIDIA Omniverse. Universal Scene Description (USD) as the substrate, cloud and on-prem runtimes, Isaac and Cosmos as the AI back-ends. Platform.
Apple Vision Pro with ARKit, RoomPlan, Object Capture. A hardware flagship plus a developer SDK that already ingests your home in seconds. Hardware plus bundled platform.
Luma Genie / Dream Machine. Text-to-3D and image-to-video at consumer price points. Foundation model API.
Niantic Lightship / Visual Positioning System. A crowd-sourced world map sold as a developer SDK. Platform with data moat.
Skydio X10. Drone with on-board autonomy and spatial AI, sold to public safety, inspection and defence customers. Hardware plus bundled platform.
Figure 02 and 1X NEO. Humanoids running spatial AI plus a language model. Pilots at BMW, automotive logistics, home use cases. Hardware plus bundled platform, years out for stable unit economics.
Esri ArcGIS Indoors. Indoor GIS with spatial analytics, sold into facilities and airports. Vertical SaaS on top of a platform.

Florent's take: Every one of these products still has a painful capture step, a painful integration step, or a painful accuracy ceiling. Pick any of the three, own it for a single industry and you have a company. This is where I push most of the builders in my Accelerator.

8. Where the real gaps still are, the opportunity map

When I advise teams on where to push, I keep coming back to the same six under-served gaps. They are not research problems, they are execution problems where the winner will be the first team that ships a usable workflow.

Open-vocabulary outdoor scene graphs. Indoor is solved well enough for a demo. Outdoor, at city scale, with vegetation, weather and occlusion, is wide open. Anyone who nails it at a tenth of the cost of current survey pipelines owns a huge market.
Scan-to-BIM that does not need a human. There is no product that takes a raw scan and produces a compliant IFC without human clean-up. The first team that hits 80% autonomy here unlocks billions in AEC.
Spatial agent memory. LLMs forget the map. Robots forget the room. A persistent, scene-graph-backed memory layer for agents is a missing primitive. Whoever ships a standard gets the platform prize.
Edge-grade 3D perception. Most 3D deep learning still requires a workstation GPU. Embedded devices that need spatial reasoning (drones, robots, wearables, cars) vastly outnumber workstations. Quantisation and distillation for 3D are immature.
Evaluation and trust. Nobody has an agreed accuracy standard for spatial agents the way we have mAP, BLEU or MMLU. The first credible benchmark that industry buyers trust is a moat.
Domain-specific foundation models. A single foundation model for medical imaging, another for subsurface, another for heritage, another for industrial inspection. General-purpose text-to-3D is crowded. Vertical 3D foundation models are not.
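For the evaluation-and-trust gap, one starting point is to borrow the convention of academic 3D visual grounding benchmarks such as ScanRefer. These score a prediction as correct when its 3D box reaches an IoU of at least 0.25 (or 0.5) with the reference box. The sketch below assumes axis-aligned boxes and a dictionary keyed by query id. The function names are hypothetical. This is a research-style metric, not an industry-agreed standard.

```python
def box_volume(box):
    """Volume of an axis-aligned box given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis, so no overlap at all
        inter *= hi - lo
    return inter / (box_volume(a) + box_volume(b) - inter)


def grounding_accuracy(predictions, ground_truth, threshold=0.25):
    """Fraction of queries whose predicted box reaches IoU >= threshold.

    predictions / ground_truth: dict query_id -> box.
    Queries with no prediction count as failures.
    """
    hits = sum(
        1
        for qid, gt_box in ground_truth.items()
        if qid in predictions and iou_3d(predictions[qid], gt_box) >= threshold
    )
    return hits / len(ground_truth)


# Illustrative data: q1 is half-overlapping, q2 points at the wrong place
ground_truth = {"q1": (0, 0, 0, 1, 1, 1), "q2": (2, 2, 0, 3, 3, 1)}
predictions = {"q1": (0, 0, 0, 1, 1, 0.5), "q2": (5, 5, 0, 6, 6, 1)}

acc_025 = grounding_accuracy(predictions, ground_truth, threshold=0.25)
acc_050 = grounding_accuracy(predictions, ground_truth, threshold=0.5)
print(acc_025, acc_050)
```

Reporting at two thresholds separates "found roughly the right object" from "localised it tightly". A buyer-facing benchmark would still need oriented boxes and per-class breakdowns on top.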

Where to push: If you want leverage in 2026, pick one gap in this list and own it in one industry. The technical moat is small, the workflow moat is large, the distribution moat is where you will actually win.

9. Honest challenges that will slow you down

Data gravity. 3D datasets are huge. A single building scan can hit hundreds of gigabytes. Every architecture decision downstream is shaped by that fact. Plan storage, streaming and indexing before you plan models.
Semantic ambiguity. An LLM will happily answer a question about “the column in the corner” with no grounded referent. Without geometric grounding you are shipping confabulations, not answers. This is the hallucination risk of spatial agents.
Sparse 3D-relation benchmarks. ScanNet, 3RScan, HM3D, ScanNet++, Habitat, Replica, ScanScribe are great, but narrower than their 2D counterparts. Expect to build part of your evaluation set yourself. (Sources: 3RScan; ScanScribe)
Privacy and scanning consent. Any capture of a real place touches privacy law. Facial blurring, person removal and geofencing are not features, they are entry tickets for enterprise sales.
Hardware churn. Sensors get rebuilt every 18 months. Whatever pipeline you lock into a specific sensor stack will need a migration path. Abstract the capture layer early.
Talent density. People who can read a scene graph paper and ship a Python pipeline are rare. Compete on culture and meaningful problems, not salaries.
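One way to contain semantic ambiguity is to make the agent resolve every referring expression to exactly one scene-graph node before it answers. If it cannot, it should ask a clarifying question. Below is a minimal sketch. It assumes nodes carry a label and a centroid, and that an upstream parser has already reduced a phrase like "the column near the door" to a head label plus an optional landmark label. `resolve_referent` and the 1.5 m radius are hypothetical choices.

```python
import math


def resolve_referent(nodes, label, near_label=None, max_dist=1.5):
    """Ground a referring expression to a single node, or report why it can't.

    nodes: dict node_id -> {"label": str, "centroid": (x, y, z)}
    label: head noun of the expression, e.g. "column"
    near_label: optional landmark used to disambiguate, e.g. "door"
    Returns ("grounded", id), ("missing", None) or ("ambiguous", [ids]).
    """
    candidates = [nid for nid, n in nodes.items() if n["label"] == label]
    if near_label is not None:
        landmarks = [n["centroid"] for n in nodes.values() if n["label"] == near_label]
        candidates = [
            nid for nid in candidates
            if any(math.dist(nodes[nid]["centroid"], lm) <= max_dist for lm in landmarks)
        ]
    if len(candidates) == 1:
        return "grounded", candidates[0]
    if not candidates:
        return "missing", None  # no referent: answering would be a confabulation
    return "ambiguous", candidates  # several referents: ask the user which one


# Illustrative scene: two columns, one of them next to a door
nodes = {
    10: {"label": "column", "centroid": (0.0, 0.0, 1.5)},
    11: {"label": "column", "centroid": (8.0, 0.0, 1.5)},
    20: {"label": "door", "centroid": (8.5, 0.5, 1.0)},
}

print(resolve_referent(nodes, "column"))                     # two candidates
print(resolve_referent(nodes, "column", near_label="door"))  # one candidate
print(resolve_referent(nodes, "beam"))                       # none
```

Only the "grounded" branch should reach the LLM as a fact. The other two branches become clarifying questions or explicit "not found" answers.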

Under the hood: A realistic building-scale scene graph lands around 10 000 to 50 000 nodes, with 3–10× as many edges. That already exceeds the context window of most deployed LLMs. Retrieval-augmented reasoning over the graph, plus structured tool calls that fetch sub-graphs on demand, is the only scalable pattern I have seen work in production.
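Here is a minimal sketch of the fetch-on-demand pattern. A tool returns only the k-hop neighbourhood of the nodes a question mentions, capped at a node budget, and serialises it as compact triples. The adjacency format, `fetch_subgraph`, `to_prompt` and the default budgets are assumptions for illustration. A production system might back this with a graph database and use an embedding index to pick the seed nodes.

```python
from collections import deque


def fetch_subgraph(adjacency, seeds, max_hops=2, max_nodes=50):
    """Breadth-first k-hop neighbourhood around the seeds, capped at max_nodes.

    adjacency: dict node_id -> list of (relation, neighbour_id), directed edges
    Returns (set of node ids, list of (source, relation, target) triples).
    """
    visited = set()
    queue = deque()
    for seed in seeds:
        if seed in adjacency and seed not in visited and len(visited) < max_nodes:
            visited.add(seed)
            queue.append((seed, 0))

    edges = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # frontier node: keep it, but do not expand further
        for relation, neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                if len(visited) >= max_nodes:
                    continue  # budget exhausted: drop this neighbour and its edge
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
            edges.append((node, relation, neighbour))
    return visited, edges


def to_prompt(edges):
    """Serialise triples one per line, for the LLM context."""
    return "\n".join(f"{s} {r} {t}" for s, r, t in edges)


# Illustrative scene graph fragment
adjacency = {
    "room_1": [("contains", "desk_3"), ("contains", "chair_7"), ("adjacent_to", "room_2")],
    "desk_3": [("supports", "laptop_9")],
    "chair_7": [],
    "room_2": [("contains", "cabinet_4")],
    "laptop_9": [],
    "cabinet_4": [],
}

nodes, edges = fetch_subgraph(adjacency, ["room_1"], max_hops=1)
print(to_prompt(edges))
```

The two knobs trade recall for context budget. `max_hops` controls how much surrounding context the model sees, and `max_nodes` guarantees the serialised sub-graph fits the window even around a densely connected node.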

10. The five skills that matter most in 2026

If you are a practitioner deciding where to invest the next twelve months, this is the short list. Every one of them maps to a chapter of the Spatial AI Architect Programme.

3D deep learning on point clouds and meshes. PointNet++, Mask3D, OpenMask3D-family open-vocabulary segmentation.
Scene graph construction and reasoning. From segmentation to relations, from relations to a queryable graph.
Agentic LLM orchestration for 3D tools. Tool schemas, retrieval over the graph, verification loops.
Production geometry pipelines. Open3D, PDAL, Trimesh, PyTorch3D, CUDA basics, exchange and streaming formats (E57, LAZ, glTF, USD).
Digital twin integration. BIM, IFC, USD, OGC standards. The boring layer where enterprises pay you.
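The step "from segmentation to relations" can be illustrated with a deliberately simple rule-based sketch. It turns axis-aligned object boxes into `on_top_of` and `next_to` triples. The thresholds (5 cm contact tolerance, 30 cm horizontal gap) and the relation names are hypothetical. Hand-written rules like these are easy to inspect but brittle in cluttered scenes.

```python
def on_top_of(a, b, z_tol=0.05):
    """True if box a rests on box b: a's bottom lies within z_tol of b's top
    and their footprints overlap. Boxes: (xmin, ymin, zmin, xmax, ymax, zmax), metres."""
    touching = abs(a[2] - b[5]) <= z_tol
    overlap_xy = a[0] < b[3] and b[0] < a[3] and a[1] < b[4] and b[1] < a[4]
    return touching and overlap_xy


def next_to(a, b, gap=0.3):
    """True if the horizontal gap between footprints is at most `gap` metres
    and neither box rests on the other. Vertical separation is ignored."""
    dx = max(b[0] - a[3], a[0] - b[3], 0.0)
    dy = max(b[1] - a[4], a[1] - b[4], 0.0)
    return (dx**2 + dy**2) ** 0.5 <= gap and not on_top_of(a, b) and not on_top_of(b, a)


def extract_relations(objects):
    """objects: dict object_id -> box. Returns (subject, relation, object) triples."""
    triples = []
    for i, a in objects.items():
        for j, b in objects.items():
            if i == j:
                continue
            if on_top_of(a, b):
                triples.append((i, "on_top_of", j))
            elif i < j and next_to(a, b):  # symmetric relation: emit once per pair
                triples.append((i, "next_to", j))
    return triples


# Illustrative boxes from an upstream instance segmentation step
objects = {
    "desk": (0.0, 0.0, 0.0, 1.2, 0.6, 0.75),
    "laptop": (0.3, 0.1, 0.75, 0.65, 0.35, 0.77),
    "chair": (0.2, 0.7, 0.0, 0.7, 1.2, 0.9),
}
triples = extract_relations(objects)
print(triples)
```

The resulting triples are the edges that turn the per-object nodes into a queryable graph.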

You do not need depth in all five. You need a base level in all five, and a sharp peak in at least two. That is the T-shape that sells.

11. Research and primary sources

Segment Anything (Kirillov et al., 2023) — foundation model for zero-shot 2D segmentation.
Depth Anything V2 (Yang et al., 2024) — robust monocular depth.
3D Gaussian Splatting (Kerbl et al., 2023) — the real-time rendering substrate.
Mask3D (Schult et al., 2023) — transformer-based 3D instance segmentation.
ConceptFusion (Jatavallabhula et al., 2023) / OpenScene (Peng et al., 2023) family — open-vocabulary 3D segmentation at scale.
3D-LLM (Hong et al., 2023) — injecting 3D context into large language models.
Toolformer (Schick et al., 2023) — the original tool-use primitive.
ReAct (Yao et al., 2022) — reasoning and acting in agents.
The Smart Point Cloud (Poux, PhD thesis) — foundational framework this course stack is built on.
The 12-Step Roadmap to Master Spatial AI in 2026 — the technical companion to this strategic guide.
Build 3D Scene Graphs for Spatial AI LLMs — hands-on tutorial.

12. Your next move

Reading a market map is step zero. The useful next move is to decide where you will push, and to do it inside a structure that stops you guessing.

The 3D AI Accelerator is the programme I built for exactly that. Six acts, from foundations to a deployed spatial agent, with a live strategic compass keyed to your current role and goals.

Join the 3D AI Accelerator