Imagine next-generation VR experiences with realistic scenes, intelligent characters, and complex situations you can talk to and interact with in real time. That future is taking shape through the convergence of several technologies, including advances in real-time 3D video.
I’ve been thinking about this for months now. Between AI video generators like Sora, AI character- and story-building tools, AI music and sound-effect creation tools, and projects like Google Genie, which generates interactive games and entire experiences live in real time, most of the key pieces are already in place – if only in embryonic form.
Admittedly, we don’t have a proper hologram generator yet, but if you’re willing to strap on a VR headset, speed, latency, and convergence seem to me like the only barriers between where we are now and a fully functional Holodeck experience. Just say where you want to go, who else should be there, and what should happen, and a version of it will appear in front of you as a fully interactive experience.
With the rapid advances in AI, every so often we see something that seems to bring us one step closer to this kind of experience. Today, it’s a research paper titled Representing Long Volumetric Video with Temporal Gaussian Hierarchy.
By their nature, volumetric videos are more complex than regular videos. Instead of a 2D array of square pixels that change over time, volumetric video captures cubic “voxels” in 3D space. That’s a far more useful representation of a scene if you want to be able to walk around in it and change your perspective – the same basic idea behind how 3D video games present their worlds.
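To make the pixel-versus-voxel contrast concrete, here’s a back-of-the-envelope sketch. The dimensions are illustrative only (they’re not from the paper), but they show why a naive dense voxel grid is so much heavier than a 2D frame:

```python
import numpy as np

# A regular 2D video frame: height x width x RGB channels
pixel_frame = np.zeros((1080, 1920, 3), dtype=np.uint8)

# A naive dense volumetric frame: a cube of voxels, each storing
# RGB + opacity. 512 per side is a modest, purely illustrative size.
voxel_frame = np.zeros((512, 512, 512, 4), dtype=np.uint8)

print(f"2D frame: {pixel_frame.nbytes / 1e6:.1f} MB")  # ~6.2 MB
print(f"3D frame: {voxel_frame.nbytes / 1e9:.2f} GB")  # ~0.54 GB
```

Nearly a hundredfold jump per frame, before time even enters the picture – which is why clever representations like the one in this paper matter so much.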
This paper details advances in volumetric video representation that dramatically reduce the video RAM and storage required to render photorealistic video from 3D assets. The method can render highly detailed scenes more than 10 minutes long, at 1080p resolution and 450 frames per second, on a single Nvidia RTX 4090 GPU – and it does so in real time, which makes things like interactive camera movement possible.
The key technique – the temporal Gaussian hierarchy of the title – essentially analyzes a scene to determine which areas are changing quickly, which are moving more slowly, and which aren’t moving at all, then builds a hierarchy of representations accordingly. Processing power goes to the complex, fast-moving bits, while slow-moving or static bits get far less.
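A toy sketch of that allocation idea, assuming nothing about the paper’s actual implementation (the region names, thresholds, and update rates here are all invented for illustration): measure how much each region changes over time, then give fast-moving regions a fresh representation every frame while slow or static regions are refreshed far less often.

```python
import numpy as np

def temporal_keyframe_budget(region_frames, rates=(1, 8, 64)):
    """Toy hierarchy-style allocation: regions whose content changes fast
    get a fresh representation every frame; slower regions are refreshed
    once per 8 or 64 frames and reused in between.
    region_frames: dict mapping region name -> array of shape (T, ...)."""
    budget = {}
    for name, frames in region_frames.items():
        # Crude "motion" measure: mean absolute frame-to-frame difference
        motion = np.abs(np.diff(frames, axis=0)).mean()
        if motion > 0.1:      # fast-moving: update every frame
            budget[name] = rates[0]
        elif motion > 0.01:   # slow-moving: update every 8 frames
            budget[name] = rates[1]
        else:                 # static: update every 64 frames
            budget[name] = rates[2]
    return budget

T = 64
rng = np.random.default_rng(0)
scene = {
    "dancer":    rng.random((T, 16)),                                   # changes constantly
    "curtain":   0.5 * np.sin(np.linspace(0, 3, T))[:, None] * np.ones((T, 16)),  # sways gently
    "back_wall": np.ones((T, 16)),                                      # never changes
}
print(temporal_keyframe_budget(scene))
# {'dancer': 1, 'curtain': 8, 'back_wall': 64}
```

The payoff is the same in the sketch as in the real thing: the static wall costs 1/64th the work of the dancer, instead of all three being treated equally.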
And boy, does it do a good job. The researchers – a multinational team from Zhejiang University, Stanford University, and the Hong Kong University of Science and Technology – say the technique lets them represent 18,000 frames of video using just 17.2 GB of VRAM and 2.2 GB of storage: 30-fold and 26-fold reductions, respectively, compared with the previous state-of-the-art 4K4D method.
If you’re familiar with this kind of thing, check out the video below for a more detailed explanation.
Representing Long Volumetric Video with Temporal Gaussian Hierarchy (SIGGRAPH Asia 2024, TOG)
Whatever the magic behind it, the results are extraordinary, as you can see in the videos embedded throughout this piece. The hair rendering alone blows my tiny mind. And again, this is running in real time on a standard, if high-end, consumer video card.
Efficient, instantaneous rendering of complex 3D worlds like these could become an important piece of a Holodeck-style VR experience. If you can generate volumetric video at 450 frames per second, you could generate stereo 1080p imagery at 225 frames per second per eye in a VR headset like the Apple Vision Pro.
This is pretty wild stuff, and yet another reminder of the breakneck acceleration we’re seeing across multiple sectors in 2024. What a time.
Source: GitHub via Min Choi