This example project combines several popular computer vision methods and uses Rerun to visualize the results and how the pieces fit together.
By combining Meta AI's Segment Anything Model (SAM) and Multiview Compressive Coding (MCC), we can reconstruct a 3D object from a single image.
The basic idea is to use SAM to create a generic object mask so we can exclude the background.
The next step is to generate a depth image. Here we use the awesome ZoeDepth to get realistic depth from the color image.
With depth, color, and an object mask, we have everything needed to create a colored point cloud of the object from a single view.
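The back-projection step can be sketched with plain NumPy. This is a minimal sketch, not the example's actual code: it assumes a simple pinhole camera model, and the intrinsics (`fx`, `fy`, `cx`, `cy`) in the demo are hypothetical values.

```python
import numpy as np


def depth_to_point_cloud(depth, rgb, mask, fx, fy, cx, cy):
    """Back-project a masked depth image into a colored 3D point cloud.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy). Only pixels inside the object mask with valid depth
    contribute points, which is how the background gets excluded.
    """
    h, w = depth.shape
    # Pixel coordinate grids: us[v, u] = u, vs[v, u] = v.
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    # Keep only masked pixels with a positive depth reading.
    valid = mask & (depth > 0)
    z = depth[valid]
    x = (us[valid] - cx) * z / fx
    y = (vs[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=-1)  # (N, 3) positions
    colors = rgb[valid]                    # (N, 3) colors
    return points, colors


# Tiny synthetic demo: a 4x4 image where only the center 2x2 is "object".
depth = np.full((4, 4), 2.0)
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[1:3, 1:3] = [255, 0, 0]
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

points, colors = depth_to_point_cloud(depth, rgb, mask, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
print(points.shape)  # one point per masked pixel: (4, 3)
```

The resulting `(N, 3)` position and color arrays are exactly the shape a point-cloud logging call expects.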
MCC encodes the colored points and then creates a reconstruction by sweeping through the volume, querying the network for occupancy and color at each point.
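The sweep itself can be sketched independently of the network. In this sketch, `query_fn` is a stand-in for MCC's trained decoder (its real interface differs); we query a regular grid of 3D points for occupancy and color and keep the points the "network" deems occupied.

```python
import numpy as np


def sweep_volume(query_fn, bounds, resolution, threshold=0.5):
    """Sweep a regular grid through a volume, querying `query_fn` for
    occupancy and color at each point, and keep the occupied points.

    `query_fn` maps an (N, 3) array of query points to a pair of
    arrays: (N,) occupancy scores and (N, 3) RGB colors.
    """
    lo, hi = bounds
    axes = [np.linspace(lo[i], hi[i], resolution) for i in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    occupancy, rgb = query_fn(grid)
    keep = occupancy > threshold
    return grid[keep], rgb[keep]


# Dummy stand-in "network": occupied inside a sphere of radius 0.5,
# every occupied point colored gray.
def dummy_query(points):
    occupancy = (np.linalg.norm(points, axis=-1) < 0.5).astype(float)
    rgb = np.full((len(points), 3), 128, dtype=np.uint8)
    return occupancy, rgb


occupied_points, occupied_colors = sweep_volume(
    dummy_query, bounds=((-1, -1, -1), (1, 1, 1)), resolution=16
)
```

The real pipeline replaces `dummy_query` with batched forward passes through the MCC decoder, but the sweep-and-threshold structure is the same.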
This is a great example of how many cool solutions are built these days: by stringing together more targeted pre-trained models. The details of the three building blocks can be found in the respective papers: