Operational pipeline for large-scale 3D reconstruction of buildings from satellite images
Automatic 3D reconstruction of urban scenes from stereo pairs of satellite images remains a popular yet challenging research topic, driven by numerous applications such as telecommunications and defence. The quality of the reconstruction depends strongly on the quality of the available stereo pair. In this paper, we propose an operational pipeline for large-scale 3D reconstruction of buildings from stereo satellite images. The proposed chain uses U-net to extract contour polygons of buildings, and combines optimisation and computational geometry techniques to reconstruct a digital terrain model and a digital height model and to correctly estimate the position of building footprints. The pipeline has proven to be efficient for 3D building reconstruction, even when a close-to-nadir image is not available.
1. INTRODUCTION
Recent years have witnessed an increasing interest in 3D reconstruction of urban scenes from stereo satellite images. Until recently, the quality of satellite imagery, coupled with existing methodologies, did not allow 3D city models to be produced automatically at high spatial resolution. The very-high-resolution commercial satellites launched in the last decade (Worldview, Pleiades) now acquire high-quality stereo images all over the Earth, with a spatial resolution of up to 30 cm/pixel. This has boosted the development of stereo reconstruction methods in the remote sensing community.
One of the first methods for urban scene reconstruction in LOD1 (a model in which buildings have flat roofs) used a semi-global matching (SGM) technique to find correspondences in a stereo pair of epipolar images, followed by a joint classification that combines image radiometry with the estimated elevation information to retrieve 3D city models. Even though this method offered a solution for 3D urban reconstruction at a large scale, it could not capture small geometries precisely. The recently released benchmarks for large-scale semantic 3D reconstruction [4, 5] further intensified research on this topic. The winning solutions of the 2019 IEEE GRSS data fusion challenge mostly used U-net or ResNet for semantic labelling, and SGM or a pyramid stereo matching network for disparity estimation.
In most existing works on semantic 3D reconstruction from a stereo pair of satellite images [6, 7], the output is a disparity map together with a semantic segmentation of one or both of the given images. In this paper, we propose a complete operational pipeline, which takes as input one stereo pair of satellite images and the corresponding rational polynomial coefficient (RPC) models, and automatically reconstructs a 3D model consisting of a digital terrain model (DTM) and vectors of building footprints together with their heights in LOD1. One of the important contributions is a method that automatically projects building rooftops, extracted by U-net from a single image, to footprints (the bases of buildings in the geographical coordinate system).
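To illustrate why rooftops extracted from an off-nadir image do not coincide with footprints, the following minimal sketch applies the standard relief-displacement relation: a roof at height h above the terrain appears shifted on the ground plane by h·tan(θ), away from the satellite, where θ is the off-nadir viewing angle. This is only a geometric illustration, not the projection method proposed in this paper. It assumes that the rooftop polygon is already expressed in ground-plane coordinates in metres, that the terrain is locally flat, and that the viewing angles are constant over the building. The names `Lod1Building` and `rooftop_to_footprint`, and the parameters `sat_azimuth_deg` and `off_nadir_deg`, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (easting, northing) in metres


@dataclass
class Lod1Building:
    """Hypothetical LOD1 record: a footprint polygon and one flat-roof height."""
    footprint: List[Point]
    height: float  # metres above the terrain


def rooftop_to_footprint(rooftop: List[Point], height: float,
                         sat_azimuth_deg: float,
                         off_nadir_deg: float) -> Lod1Building:
    """Shift an apparent rooftop polygon back to the building base.

    Assumption: sat_azimuth_deg is the direction from the building towards
    the satellite, measured clockwise from north, and off_nadir_deg is the
    viewing angle measured from the vertical at the building.
    """
    # Horizontal relief displacement of a point at the given height.
    shift = height * math.tan(math.radians(off_nadir_deg))
    az = math.radians(sat_azimuth_deg)
    # The roof appears displaced away from the satellite, so the correction
    # moves every vertex towards the satellite.
    dx = shift * math.sin(az)  # east component
    dy = shift * math.cos(az)  # north component
    return Lod1Building([(x + dx, y + dy) for x, y in rooftop], height)


# Example: a 10 m high building seen 45 degrees off nadir from the east.
roof = [(0.0, 0.0), (20.0, 0.0), (20.0, 10.0), (0.0, 10.0)]
building = rooftop_to_footprint(roof, 10.0, sat_azimuth_deg=90.0,
                                off_nadir_deg=45.0)
```

In this example every vertex moves about 10 m to the east, which shows how large the rooftop-to-footprint offset can become when no close-to-nadir image is available.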
2. PROPOSED PIPELINE
The proposed chain for large-scale 3D reconstruction of urban scenes in LOD1 is described in Fig. 1.