Abstract: Encoder-decoder models are widely used for single-image 3D reconstruction, but their limited feature-extraction ability causes fine details of the reconstructed object to be lost. To address this problem, this paper proposes a 3D reconstruction model based on attention and an intermediate fusion representation, aimed at reconstructing fine-grained 3D models. An axial spatial attention mechanism learns information along different spatial directions and is embedded into residual units to capture local structural features. Building on a dual-stream network that infers a depth map and a 3D mean shape, an intermediate fusion representation module is designed that effectively integrates visible surface details and better describes the 3D spatial structure of the object. Experimental results show that the axial spatial attention mechanism and the intermediate fusion representation module enhance feature extraction: compared with Pix2Vox++, IoU and F-score improve by 1.3% and 0.4%, respectively, and the reconstructions are of higher quality.
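The abstract does not specify how the axial spatial attention works internally. The sketch below is a minimal illustration under a stated assumption: it treats "learning information from different directions" as direction-aware pooling (averaging each feature map along the width axis and along the height axis separately), turns the two pooled descriptors into sigmoid gates, and adds the gated features back through a residual connection. The function names and the per-channel scalar weights `w_h` and `w_w` are hypothetical stand-ins for the learned convolutions a real module would use; this is not the paper's implementation.

```python
import math
from typing import List

# Feature map layout: [channels][height][width]
Tensor3 = List[List[List[float]]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def axial_attention(x: Tensor3, w_h: List[float], w_w: List[float]) -> Tensor3:
    """Gate each feature by one attention weight per row and one per column."""
    out: Tensor3 = []
    for c, fmap in enumerate(x):
        h, w = len(fmap), len(fmap[0])
        # Average over the width axis: one descriptor per row (height direction).
        row_desc = [sum(row) / w for row in fmap]
        # Average over the height axis: one descriptor per column (width direction).
        col_desc = [sum(fmap[i][j] for i in range(h)) / h for j in range(w)]
        # Hypothetical per-channel scalar weights replace learned 1x1 convolutions.
        a_h = [sigmoid(w_h[c] * d) for d in row_desc]
        a_w = [sigmoid(w_w[c] * d) for d in col_desc]
        # Each position is scaled by its row gate and its column gate.
        out.append([[fmap[i][j] * a_h[i] * a_w[j] for j in range(w)] for i in range(h)])
    return out


def residual_axial_block(x: Tensor3, w_h: List[float], w_w: List[float]) -> Tensor3:
    """Residual unit: input plus its axially attended version."""
    att = axial_attention(x, w_h, w_w)
    return [
        [[x[c][i][j] + att[c][i][j] for j in range(len(x[c][i]))] for i in range(len(x[c]))]
        for c in range(len(x))
    ]


if __name__ == "__main__":
    x = [[[1.0, 2.0], [3.0, 4.0]]]  # one channel, 2x2 map
    # With zero weights every gate is sigmoid(0) = 0.5, so output = x * (1 + 0.25).
    print(residual_axial_block(x, w_h=[0.0], w_w=[0.0]))
```

Separating the pooling by axis keeps positional information along the other axis, which is one plausible way such a module could capture local structure that global pooling would average away.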