Single-View 3D Object Reconstruction from Shape Priors in Memory
Shuo Yang, Min Xu, Haozhe Xie, Stuart Perry, Jiahao Xia | CVPR 2021
In a Nutshell 🥜
Reconstructing 3D objects from a single RGB image is an ill-posed problem due to the infinite possibilities that could exist on the unseen side of the image. Based on the intuition that humans can still picture and approximate such shapes fairly well from memory, Yang et al. [1] proposed a memory network that stores 3D volumes from the training set as shape priors for inference.
Specifically, given an image feature at inference time, the feature is compared against all memory keys, and the K most similar keys are retrieved, each pointing to a stored volume that serves as a prior. The sequence of K prior volumes is fed into an LSTM encoder to produce a shape prior vector. The image feature and the shape prior vector are then passed together through the shape decoder for reconstruction, as sketched below.
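To make the retrieve-then-fuse idea concrete, here is a minimal PyTorch sketch of the pipeline as described above. All names and dimensions (`feat_dim`, `vox_res`, `memory_size`), the cosine-similarity lookup, and the decoder layout are illustrative assumptions, not the paper's exact architecture; the randomly initialized memory buffers stand in for keys and volumes that would actually be collected from the training set.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryPriorReconstructor(nn.Module):
    """Sketch of retrieve-K-priors -> LSTM fusion -> decode.
    Shapes and module choices are illustrative, not the paper's."""

    def __init__(self, feat_dim=256, vox_res=32, k=5, memory_size=1000):
        super().__init__()
        self.k = k
        # Memory: keys are image features, values are flattened voxel
        # volumes. Random stand-ins here; in the paper they come from
        # the training set.
        self.register_buffer("keys", torch.randn(memory_size, feat_dim))
        self.register_buffer("values", torch.randn(memory_size, vox_res ** 3))
        # LSTM encoder fuses the K retrieved volumes into one prior vector.
        self.prior_encoder = nn.LSTM(vox_res ** 3, feat_dim, batch_first=True)
        # Decoder maps [image feature ; shape prior] to voxel occupancies.
        self.decoder = nn.Sequential(
            nn.Linear(2 * feat_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, vox_res ** 3),
        )

    def forward(self, img_feat):                      # img_feat: (B, feat_dim)
        # Cosine similarity between the query feature and every memory key.
        sim = F.normalize(img_feat, dim=1) @ F.normalize(self.keys, dim=1).t()
        topk = sim.topk(self.k, dim=1).indices        # (B, K)
        priors = self.values[topk]                    # (B, K, vox_res^3)
        # Last hidden state of the LSTM acts as the shape prior vector.
        _, (h_n, _) = self.prior_encoder(priors)
        prior_vec = h_n[-1]                           # (B, feat_dim)
        fused = torch.cat([img_feat, prior_vec], dim=1)
        logits = self.decoder(fused)                  # (B, vox_res^3)
        return torch.sigmoid(logits)                  # occupancy probabilities
```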
The paper achieved state-of-the-art results on the ShapeNet dataset for voxel-based reconstruction, and also showed strong results on real-world images from Pix3D.
Some Thoughts 💭
An interesting direction to explore is how well this type of memory-based network extends to few-shot scenarios, where examples of a given object category are very limited during training.
[1] Yang, S., Xu, M., Xie, H., Perry, S., & Xia, J. (2021). Single-View 3D Object Reconstruction from Shape Priors in Memory. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 3152-3161).