Given a single image of a target object, image-to-3D generation aims to reconstruct its texture and geometric shape. Recent methods often use intermediate media, such as multi-view images or videos, to bridge the input image and the 3D target and to guide the generation of both shape and texture. However, inconsistencies among the generated multi-view snapshots often introduce noise, artifacts, and errors along object boundaries, which undermine the 3D reconstruction process. To address this, we build on 3D Gaussian Splatting (3DGS), a powerful framework for 3D reconstruction, and integrate uncertainty-aware learning into its optimization. We exploit the stochasticity between two Gaussian models to estimate an uncertainty map, which is then used for uncertainty-aware regularization to reduce the impact of the inconsistencies. Specifically, we optimize both Gaussian models simultaneously and compute the uncertainty map from the differences between their rendered images at identical viewpoints. We then apply adaptive pixel-wise loss weighting during training to regularize the models, down-weighting the reconstruction loss in high-uncertainty regions. This approach dynamically detects and alleviates conflicts in the multi-view labels, producing smoother results and mitigating artifacts. Extensive experiments demonstrate that our method improves 3D generation quality by reducing inconsistencies and artifacts.
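The uncertainty estimation and pixel-wise weighting described above can be illustrated with a minimal NumPy sketch. It is not the exact formulation of the method: the function names, the mean absolute difference used as the disagreement measure, the exponential weighting, and the `temperature` parameter are all assumptions made for illustration. In an actual training loop the weights would be treated as constants (no gradient flows through them).

```python
import numpy as np


def uncertainty_map(render_a, render_b):
    """Per-pixel disagreement between two renders of the same viewpoint.

    Both inputs have shape (H, W, 3); the result has shape (H, W).
    The mean absolute difference over channels is an assumed measure.
    """
    return np.abs(render_a - render_b).mean(axis=-1)


def uncertainty_weights(uncertainty, temperature=0.1):
    """Map uncertainty to weights in (0, 1]: high uncertainty, low weight.

    The exponential form and the temperature value are assumptions.
    """
    return np.exp(-uncertainty / temperature)


def weighted_l1_loss(render, pseudo_label, weights):
    """Pixel-wise L1 loss against a pseudo-label, scaled by the weights."""
    per_pixel = np.abs(render - pseudo_label).mean(axis=-1)
    return float((weights * per_pixel).mean())


def two_model_loss(render_a, render_b, pseudo_label, temperature=0.1):
    """Sum of the weighted losses of both models for one shared viewpoint."""
    weights = uncertainty_weights(uncertainty_map(render_a, render_b), temperature)
    loss_a = weighted_l1_loss(render_a, pseudo_label, weights)
    loss_b = weighted_l1_loss(render_b, pseudo_label, weights)
    return loss_a + loss_b
```

Pixels where the two renders agree keep a weight of 1, so they are supervised as usual. Pixels where the renders disagree contribute less to the loss, which limits how much a conflicting pseudo-label can affect either model.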
Overview of the RIGI pipeline, which takes a reference image as input and produces 3D assets as output. We adopt a two-stage approach: a multi-view video diffusion model first generates dense, high-quality frames, which then serve as pseudo-labels to guide 3D asset optimization. Specifically, we first use SV3D to generate multiple videos covering a wide range of viewpoints. Next, we introduce uncertainty-aware learning, estimating an uncertainty map from the stochasticity of two simultaneously optimized Gaussian models. Finally, we apply uncertainty-aware regularization to mitigate the impact of inconsistencies in the generated pseudo-labels, resulting in high-quality 3D assets.