At Luxcarta, we recently developed a novel deep learning technique for 3D building extraction from textured meshes. The process significantly speeds up the creation of accurate 3D maps of dense urban areas.
- Accurate 3D maps of urban areas are vital to many industries but are challenging and time-consuming to produce.
- We developed a 3D building extraction method that uses deep learning techniques to generalize and speed up the whole process.
- Learn about our new process and how we evaluated it.
- Read about potential uses for this new technique.
In recent years, vast swathes of the planet’s surface have been photographed by satellites, aircraft, drones, and other platforms. Using powerful computers, it is now possible to stitch these images together into a ‘textured mesh’. 3D building extraction can then enhance these meshes further: the process cuts building footprints, together with their heights, out of the mesh, which allows us to identify individual structures.
This is an incredibly powerful tool. Polygonal extraction of building footprints allows urban planners, architects, utilities providers, engineers, and many other professionals to achieve a far deeper understanding of urban areas and building heights for all sorts of purposes.
However, 3D mesh building modeling is typically very time-consuming and resource-intensive. So we decided to experiment with a new deep learning method for building segmentation from colour images with elevation data, which speeds up the generation of accurate LoD2 buildings. We demonstrate the performance and potential of the new method by evaluating it on three cities around the world with different characteristics, work we presented at the SPIE conference in October 2023 (you can read the paper here).
A faster method for 3D building extraction is needed
For many years, cartographers have been able to manually ‘cut out’ building footprints from images and add them as a layer in their GIS mapping systems. However, this process is very time-consuming. Various techniques also exist for turning 2D aerial or satellite images into a 3D model, but again, these tend to be resource-intensive and can take several days or even weeks to complete.
This is problematic for several reasons.
First and foremost, it adds a significant delay to any project. Imagine that a city wanted to create a map of its urban environment to help plan its flood defences. Creating a detailed 3D map would usually add several weeks to the process, and may also require skilled (and expensive) consultants.
There’s also the issue of change. In modern cities, new buildings, both permitted and unofficial (i.e., informal housing), can be added rapidly, so existing maps can quickly go out of date. If a utilities business wants to build new electricity lines, it needs the most up-to-date maps to know where buildings are and how tall they are. If new structures have appeared in formerly empty space, they could seriously disrupt the plans. Being able to create up-to-date and accurate 3D maps is therefore very valuable.
Another common problem is image noise and distortion. Satellite and aerial images must be orthorectified (corrected so that they appear as if each part of the photo was taken from directly above). But in urban environments this can be very challenging: tall buildings sometimes obscure the lower buildings next to them. Identifying and correcting these sorts of issues usually requires highly experienced technicians.
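Where full 3D geometry is available, as it is in a textured mesh, an orthographic view can be produced directly instead of correcting a perspective photo: each ground cell takes the colour and elevation of the highest surface point above it, so roofs sit over their own footprints rather than leaning across their neighbours. The minimal sketch below assumes the mesh has already been sampled into coloured 3D points; the function name `ortho_rasterize`, the point format, and the grid layout are illustrative assumptions, not Luxcarta's implementation, and a production pipeline would rasterize the mesh triangles themselves.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical sample format: (x, y, z, (r, g, b)), with x, y, z in metres.
Colour = Tuple[int, int, int]
Sample = Tuple[float, float, float, Colour]


def ortho_rasterize(
    samples: Sequence[Sample],
    x0: float,
    y0: float,
    cell: float,
    width: int,
    height: int,
) -> Tuple[List[List[Optional[Colour]]], List[List[Optional[float]]]]:
    """Project coloured 3D points straight down onto a width x height grid.

    Each cell keeps the colour and elevation of its highest point (a top-down
    z-buffer). Row index grows with y and column index with x, starting at
    (x0, y0). Cells that receive no point stay None.
    """
    rgb: List[List[Optional[Colour]]] = [[None] * width for _ in range(height)]
    dsm: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for x, y, z, colour in samples:
        col = int((x - x0) // cell)
        row = int((y - y0) // cell)
        if not (0 <= row < height and 0 <= col < width):
            continue  # point falls outside the output grid
        current = dsm[row][col]
        if current is None or z > current:
            dsm[row][col] = z  # topmost surface wins, as seen from directly above
            rgb[row][col] = colour
    return rgb, dsm


# A red roof at 10 m above a green patch at 2 m in the same cell,
# and a blue surface at 2 m in the next cell.
samples = [
    (0.5, 0.5, 10.0, (200, 0, 0)),
    (0.5, 0.5, 2.0, (0, 200, 0)),
    (1.5, 0.5, 2.0, (0, 0, 200)),
]
rgb, dsm = ortho_rasterize(samples, 0.0, 0.0, 1.0, 2, 1)
print(rgb, dsm)
```

The second grid returned is a simple Digital Surface Model (the elevation of the topmost surface in each cell), the kind of elevation map that can accompany an ortho image.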
A new process for 3D mesh building extraction
Recent advances in deep learning present tantalising possibilities for polygonal extraction of building footprints from imagery. At Luxcarta, we wanted to explore semantic segmentation of textured 3D meshes.
First, some helpful definitions:
- A ‘textured 3D mesh’ is a 3D model of a place built from a surface of connected triangles, with colour from the source photographs (the texture) draped over them, so that roofs, walls, roads, and trees look as they do in the imagery.
- ‘Semantic segmentation’ is a computer pattern recognition technique. Simplifying somewhat, every pixel is given a label for the category of object it belongs to. For example, in a map of a street, all pixels containing buildings might be labelled (and shaded) blue, all pixels containing roads red, and all pixels containing vegetation green, which helps with recognising features of interest. Rather than a human searching the image for objects of interest and labelling each pixel by hand, deep learning allows a machine to do this rapidly and at scale.
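To make the idea concrete: a segmentation network typically outputs, for every pixel, one score per class, and the pixel's label is the class with the highest score. The sketch below assumes a hypothetical three-class label set matching the street example above (building, road, vegetation) and uses hard-coded scores as a stand-in for a real network's output.

```python
from typing import List

# Hypothetical label set, matching the street example; index order matters.
CLASSES = ["building", "road", "vegetation"]


def segment(scores: List[List[List[float]]]) -> List[List[str]]:
    """Assign each pixel the class with the highest score.

    scores[row][col] holds one score per entry in CLASSES, as a segmentation
    network might output after a softmax.
    """
    labels = []
    for row in scores:
        labels.append([CLASSES[max(range(len(s)), key=lambda k: s[k])] for s in row])
    return labels


# Stand-in for network output on a 2 x 2 image.
example_scores = [
    [[0.90, 0.05, 0.05], [0.10, 0.70, 0.20]],
    [[0.20, 0.10, 0.70], [0.60, 0.30, 0.10]],
]
print(segment(example_scores))  # [['building', 'road'], ['vegetation', 'building']]
```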
Overview of the technique
For a complete description of our method, read the paper, which was published by SPIE. But here’s an overview of our technique:
- First, we collected 50 km² of textured meshes covering 22 different cities around the world from various sources.
- From these meshes we generated orthorectified images and their corresponding elevation maps as our training data. Our experienced technicians manually generated Ground Truth building polygons, then verified and validated them.
- We used a U-Net-based CNN architecture for semantic segmentation. The network also outputs an orientation map as an additional layer to facilitate polygonization, and our method improves segmentation quality by leveraging multiple learning tasks.
- We applied our in-house geometric polygonization algorithm for compact LoD1 building vectorization. Our automatic pipeline also generates a high-quality Digital Height Model by extracting the Digital Terrain Model (the bare-ground elevation), and uses it to assign height values to the building polygons.
- We then conducted various tests to validate the quality and accuracy of the 3D building models we had generated. We selected three typical cities, in Brazil, the USA, and France, each representing a different urban design style, to showcase the visual quality and statistical accuracy of our method. For detailed technical statistics and comparisons, please refer to our SPIE 2023 paper.
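The height-assignment step can be illustrated with a common approach, sketched here under our own assumptions rather than as the pipeline described in the paper. A Digital Height Model (DHM) is the surface elevation (Digital Surface Model, DSM) minus the bare-ground elevation (DTM), and each building polygon takes a robust statistic, here the median, of the DHM cells whose centres fall inside it. The function names, the grid convention (cell `(row, col)` centred at `((col + 0.5) * cell, (row + 0.5) * cell)`), and the choice of the median are all illustrative.

```python
from statistics import median
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Polygon = Sequence[Tuple[float, float]]


def height_model(dsm: Grid, dtm: Grid) -> Grid:
    """DHM = DSM - DTM: height of the surface above the bare ground, per cell."""
    return [[s - t for s, t in zip(srow, trow)] for srow, trow in zip(dsm, dtm)]


def point_in_polygon(px: float, py: float, polygon: Polygon) -> bool:
    """Even-odd ray-casting test for a simple polygon given as (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def building_height(polygon: Polygon, dhm: Grid, cell: float = 1.0) -> float:
    """Median DHM over cells whose centres fall inside the footprint.

    Assumes cell (row, col) is centred at ((col + 0.5) * cell, (row + 0.5) * cell).
    Returns 0.0 if no cell centre falls inside the polygon.
    """
    values = [
        dhm[r][c]
        for r in range(len(dhm))
        for c in range(len(dhm[r]))
        if point_in_polygon((c + 0.5) * cell, (r + 0.5) * cell, polygon)
    ]
    return median(values) if values else 0.0


# A 2 x 2-cell roof about 5 m high with a 6 m chimney cell, on flat ground at 100 m.
dsm = [[105.0, 105.0, 100.0], [105.0, 106.0, 100.0], [100.0, 100.0, 100.0]]
dtm = [[100.0] * 3 for _ in range(3)]
dhm = height_model(dsm, dtm)
footprint = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(building_height(footprint, dhm))  # 5.0
```

The median keeps a chimney or a stray ground cell near the edge from skewing the footprint's height; a percentile would be an equally reasonable choice.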
The results were impressive. Our model demonstrated high accuracy, with precision and recall of 90% or more, automatically identifying large numbers of building structures, their elevations, and their footprints in very different urban environments: from suburban US cities, to compact semi-formal structures in Brazil, through to mixed building types in France. Most importantly, the process was significantly faster than manual polygonal extraction of building footprints. We estimate it could deliver a fourfold increase in productivity.
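Precision is the share of predicted building area that really is building, TP / (TP + FP), and recall is the share of real building area the model found, TP / (TP + FN). The post does not say whether these figures were computed per pixel or per matched building, so the sketch below simply illustrates the two definitions at pixel level on binary masks; the function name and mask format are assumptions.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = building, 0 = not building


def precision_recall(predicted: Mask, truth: Mask) -> Tuple[float, float]:
    """Pixel-level precision and recall of a predicted building mask."""
    tp = fp = fn = 0
    for prow, trow in zip(predicted, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1  # building pixel correctly detected
            elif p:
                fp += 1  # predicted building where there is none
            elif t:
                fn += 1  # building pixel the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


predicted = [[1, 0, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
p, r = precision_recall(predicted, truth)
print(p, r)
```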
Qualitative evaluation on Rio de Janeiro, Brazil: segmentation
Qualitative evaluation on Rio de Janeiro, Brazil: polygonization
Implications for our new model
Our new method for building extraction via semantic segmentation of textured 3D meshes has many potential use cases in almost any sector that requires accurate maps of towns and cities. Because it builds 3D models of large areas much faster and more accurately than was previously possible, it is especially valuable in challenging areas. Here are just some example use cases:
- Urban design and architecture: Will a new building be overlooked by neighbours? How will it change the feel of a street? Where will a new structure cast shadows? Our model can help architects and urban designers better understand the impact of their structures, using accurate, compact, and lightweight building layers rather than heavy meshes or other low-quality products.
- Utilities: Our model can help utilities companies plan pipe networks, electricity cables, and other infrastructure depending on the height and location of buildings.
- Urban planning: The ability to quickly generate an accurate model of a town or city allows urban planners to better plan out interventions. From waste collection to cycle paths to the location of EV charging points or flood defences, accurate 3D maps allow them to plan the most effective possible interventions.
- Logistics: When planning routes for drivers, cycle couriers, or even drone delivery, an accurate view of building height, location, parking, and space is essential to logistics businesses.
- Telecoms: When planning the location of cell towers, 5G infrastructure, or RF networks, telecoms firms need an extremely detailed view of which buildings are where. Our model helps them clearly identify line-of-sight obstacles.
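As an illustration of how building heights feed into the telecoms case, the sketch below checks whether a straight radio path between two antennas clears the buildings beneath it. It is a simplified example under our own assumptions, not a Luxcarta product feature: the height grid layout, the function name `line_of_sight`, and the fixed sampling step are hypothetical, and it ignores earth curvature and Fresnel-zone clearance.

```python
from typing import List, Tuple

Grid = List[List[float]]
Point3 = Tuple[float, float, float]  # (x, y, z): x, y in cell units, z in metres


def line_of_sight(heights: Grid, a: Point3, b: Point3, steps: int = 200) -> bool:
    """Return True if the straight segment from a to b clears all sampled buildings.

    heights[row][col] is the surface height of the cell covering
    x in [col, col + 1) and y in [row, row + 1). Assumes both endpoints lie
    inside the grid. The segment is sampled at steps - 1 interior points;
    the endpoints are skipped so the antenna masts do not count as obstacles.
    """
    (x0, y0, z0), (x1, y1, z1) = a, b
    for i in range(1, steps):
        t = i / steps
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        z = z0 + t * (z1 - z0)
        if heights[int(y)][int(x)] > z:
            return False  # a building rises above the ray at this sample
    return True


# One street row of cells with a building in the middle cell.
tall = [[0.0, 0.0, 30.0, 0.0, 0.0]]
low = [[0.0, 0.0, 5.0, 0.0, 0.0]]
tx, rx = (0.5, 0.5, 10.0), (4.5, 0.5, 10.0)
print(line_of_sight(tall, tx, rx), line_of_sight(low, tx, rx))
```

Fixed-step sampling can skip very thin obstacles; a more rigorous check would visit every grid cell the segment crosses.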
Need to map your world faster?
Our new technique for extracting 3D buildings from meshes is robust, reliable, accurate, and fast. By applying deep learning to perform semantic segmentation of textured 3D meshes, we can significantly speed up the mapping of towns and cities.
For support with rapidly and accurately mapping your town or city, contact Luxcarta today.