Large multimodal models, maybe using diffusion transformers. Possibly integrated into lightweight, comfortable AR glasses over Wi-Fi so the models can see everything you can, optionally receiving context such as frame captures automatically. Optionally with a very realistic 3D avatar "teleported" into your space.