RoBus: A Multimodal Dataset for Controllable Road Networks and Building Layouts Generation

 Tao Li, Ruihang Li, Huangnan Zheng, Shanding Ye, Shijian Li, Zhijie Pan
Department of Computer Science and Technology
Zhejiang University

Corresponding Author. The authors’ emails are {litaocs, 12221089, hnz, ysd, shijianli, zhijie_pan}@zju.edu.cn.
Abstract

Automated 3D city generation, focusing on road networks and building layouts, is in high demand for applications in urban design, multimedia games and autonomous driving simulations. The recent surge in generative AI models has facilitated the design of city layouts. However, the lack of high-quality datasets and benchmarks hinders the progress of these data-driven methods in generating road networks and building layouts. Furthermore, few studies use urban characteristics, which are generally analyzed using graphics and are crucial for practical applications, to control the generative process. To alleviate these problems, we introduce a multimodal dataset with accompanying evaluation metrics for controllable generation of Road networks and Building layouts (RoBus), which is the first and largest open-source dataset in city generation so far. The RoBus dataset is formatted as images, graphics and texts, with 72,400 paired samples that cover around 80,000 km² globally. We analyze the RoBus dataset statistically and validate its effectiveness with existing road network and building layout generation methods. Additionally, we design new baselines that incorporate urban characteristics, such as road orientation and building density, in the process of generating road networks and building layouts using the RoBus dataset, enhancing the practicality of automated urban design. The RoBus dataset and related codes are published at https://github.com/tourlics/RoBus_Dataset.

Keywords: Dataset · Generative Design · Road Networks · Building Layouts

Figure 1: An example from the RoBus Dataset, including images, graphics, and texts that describe road networks and building layouts in a city tile. The RoBus Dataset is scalable for various 3D urban generation tasks and practical applications.

1 Introduction

The generative design of three-dimensional (3D) cities is of considerable importance across multiple fields such as urban planning [50, 25], multimedia games [38], and autonomous driving simulations [36]. Road networks and building layouts are core components in designing or generating 3D urban configurations. Manual design of these components has long been prevalent in both the game industry and urban planning, and is often criticized for being expensive and time-consuming. Consequently, there is a growing demand for automated approaches that generate diverse and extensive urban layouts tailored to specific city characteristics, which has recently prompted a significant increase in research on automated urban generation. Such research not only alleviates the limitations of manual design but also deepens the understanding of our living environments.

Traditional procedural modeling methods progressively generate road networks or building layouts based on grammars or L-systems [30, 23, 22], which rely on expert knowledge to manually design rule sets. With the rapid advancement of generative artificial intelligence (AI), new approaches employing deep generative models have emerged in numerous design tasks. Highly relevant and prevalent research topics include the generation of poster layouts and house floor plans [15, 45], where numerous data-driven methods have been proposed to produce varied and contextually appropriate results, significantly reducing the reliance on extensive domain knowledge [40]. Overall, research on such small-scale layout generation (small in terms of the size of the entities) has matured thanks to the availability of numerous public datasets [49, 43]. In contrast, methods for large-scale urban layout generation remain relatively immature. Existing works often rely on a limited amount of self-collected (and generally private) data for specific tasks, such as road completion [10] and building generation [41]. The lack of adequate, high-quality, open-source datasets and benchmarks significantly hinders the progress of data-driven methods for city-scale road networks and building layouts. Furthermore, few existing studies consider urban characteristics during the generative process, for which graphics such as road network topologies and vectored buildings are generally required. Because graphic data and urban features are overlooked, existing methods are sub-optimal for synthesizing new road networks and building layouts with desired properties, a capability that is crucial for practical applications in fields like urban planning and the game industry.

To address these challenges and support the advancement of techniques for automated 3D city generation, we introduce the RoBus dataset for controllable Road Networks and Building Layouts Generation. As shown in Fig. 1, the RoBus dataset includes images, graphics and texts, providing a comprehensive description of 3D urban layouts. It is the first and largest multimodal dataset in generative 3D urban design. The RoBus dataset contains 72,400 paired samples that cover approximately 80,000 km² of different places across the world, showcasing remarkable diversity. Additionally, the RoBus dataset is scalable to various tasks related to 3D urban generation, such as geometry-constrained city layout generation or completion, road graph generation, vectored building layout generation, and text-to-image generation. To validate the effectiveness of the RoBus dataset, we apply prevalent methods for road network and building layout generation to it. We also establish a comprehensive benchmark to assess the quality, diversity, validity and urban properties of the generated results. More importantly, we propose a baseline that integrates road attributes into generative models to synthesize desired road layouts, and that enhances traffic convenience by concentrating on the topological structure of the generated road networks. For building layout generation, we design a baseline that directly generates vectored buildings with height information by incorporating building attributes (shown in Fig. 2) into the latent space of the model, so that building layouts are generated conditioned on a user-defined density. To validate the applicability of the RoBus dataset and the proposed baselines, we apply the generated road networks and building layouts in Carla [9], which is designed for autonomous driving simulations and is based on Unreal Engine, a game engine widely used in the game industry.

The proposed RoBus Dataset is characterized by its diversity, scalability, usability, and applicability. Our contributions can be summarized as follows:

  • We release the RoBus dataset, the first and largest multimodal dataset in generative 3D urban design, including images, texts and graphics such as topological road networks and vectored buildings with heights.

  • We propose two baselines that incorporate urban characteristics, such as road orientation and building density, into the generative process of deep-learning models based on the RoBus dataset.

  • We establish a benchmark to evaluate existing generative methods for road networks and building layouts in terms of the quality, diversity, validity and urban properties of the synthesized results.

  • Experiments on prevalent generative tasks related to 3D urban generation demonstrate the usability and scalability of the RoBus dataset. We also extend the generated results into game engines to showcase the application values.

2 Related Works

Computer-aided urban design has been a significant focus of computer graphics research for many years. This field has witnessed a transition from traditional procedural modelling to data-driven generative AI models. Traditional methods [30, 23] depend on expert knowledge to manually create rules for generating roads, buildings or terrain. With the recent increase in deep generative models, more researchers are exploring the potential of these models for urban design. In this section, we briefly review these AI-based methods for designing road networks and building layouts. Then we summarize the datasets relevant to these data-driven approaches.

2.1 Road Networks Generation Methods

Methods based on generative AI models usually learn the distribution of road networks from the real world, free from the dependency on extensive domain knowledge. These methods can be generally divided into image-based and graph-based approaches. Image-based approaches treat road network generation as an image generation problem. These methods utilize models such as generative adversarial networks (GANs) or variational autoencoders (VAEs) to learn the pixel-level distribution of road networks within images. Hartmann et al. first proposed the GAN-based road network generation pipeline StreetGAN [13]. Subsequently, numerous studies [19, 10, 47] have attempted to use conditional GANs to generate roads according to user-defined inputs or contextual content. Besides GAN-based models, Murcio et al. [20] trained a VAE model to capture real-world road patterns and generate new roads by sampling the encoded features. To generate large-scale road networks, Birsak et al. [2] introduced vector-quantized VAEs into large-scale urban road network generation by incorporating population constraints. More recently, Przymus et al. [31] and Qin et al. [32] applied diffusion models to generate maps via text prompts. Graph-based approaches model the road network as a topological planar graph consisting of nodes and edges, framing road network generation as a planar graph generation problem. Inspired by traditional turtle graphics methods, Chu et al. [8] proposed the Neural Turtle Graphics (NTG) model for generating road graphs, which employs an encoder-decoder with a recurrent neural network to process and generate road networks. However, it is primarily limited to understanding patterns within a specific local scope and is suited only for small-scale road network generation. To address this limitation, Owaki et al. [28] introduced RoadNetGAN, applying the principles of NetGAN [4] to large-scale road network generation using random walks. However, it struggles to produce diverse and plausible road configurations, often resulting in numerous sharp turns. Overall, image-based approaches to road network generation struggle to model topological features, while graph-based approaches struggle to encode spatial information. Furthermore, both fail to generate road networks with desired properties.

2.2 Building Layouts Generation Methods

Research on building layout generation has certain similarities with road network generation, as both learn spatial patterns from real-world design instances. Models like LayoutGAN++ [21] and LayoutVAE [18] provide basic frameworks for layout generation, which are widely used and improved in tasks such as document and poster layout generation [48, 33, 5] and house floor planning [6, 27]. For larger-scale building layout generation, Fedorova et al. [12] and Quan et al. [34] use GAN-based methods to generate building layouts for various cities, demonstrating the models' adaptability to different urban morphologies. BlockPlanner [45] generates building layouts in dense urban blocks, assuming rectangular buildings within rectangular city plots, and uses graph models to manage the building layouts. BlockPlanner is therefore limited when handling more complex block shapes, irregular buildings, and large-scale building layouts. To address these limitations, He et al. [14] introduced a VAE-based graph attention network for modelling and generating building layouts, capable of handling arbitrary block shapes and building types. Additionally, generative AI design for building layouts has attracted wide interest from urban design experts [25, 17, 39], which promotes its application in urban analysis and planning.

2.3 Datasets for Generative Urban Design

Generally, small-scale layout generation tasks such as poster or indoor room layout are relatively mature and widely studied, benefiting from numerous public datasets such as the Magazine [49] and RPLAN [43] datasets, which collect 3,919 magazine pages and 80K room layouts respectively. Larger-scale generation tasks, such as urban road network and building layout generation, have long struggled with limited amounts of data. For instance, Yan et al. [46] collect 2,194 samples to train a model that classifies building patterns. Wu et al. [41] collect data from Singapore to generate building layouts based on road networks. Przymus et al. [31] provide a large amount of raw maps for text-to-image generation. Recent works by Chen et al. [7] and He et al. [14] collect a large amount of data for community building layout generation, which promotes the advancement of large-scale generation methods. However, these open-source data lack city-scale buildings and matched road networks, which leaves a gap before they can be applied to large-scale city generation. The situation in road network generation is even worse, as researchers in this area are struggling with inadequate data samples. For example, Chu et al. [8] collect 17 unique cities and, through complex preprocessing, select the most densely annotated 10 km² region within each city from OSM. Owaki et al. [29] collect 10 km² regions of 4 cities in Japan. Yang et al. [47] collect 2586 road images from three major Australian cities. Birsak et al. [2] collect around 400 km² of data. Unfortunately, these self-collected data are small in amount and closed-source. Research on road network generation is in need of a large-scale dataset and benchmarks.

To summarize, numerous datasets for posters and house floor plans benefit small-scale layout generation tasks. Recent studies that focus on larger-scale generation tasks, such as community layout planning, often provide datasets that only include buildings. These datasets are insufficient for city generation, as they lack data on city-scale buildings and the corresponding matched road networks. Moreover, research on road network generation typically relies on self-collected data comprising a limited number of samples. This highlights a pressing need for large-scale, high-quality road network datasets and corresponding benchmarks. In this paper, we intend to overcome existing challenges in 3D urban generation through the release of the RoBus dataset, which is expected to encourage progress in data-driven methods for generating road networks and building layouts.

Figure 2: Considering urban characteristics, we classify buildings in a city block according to density and average height.
Figure 3: The Generating Pipeline of RoBus Dataset.

3 RoBus Dataset

In this section, we first describe the proposed RoBus dataset and then illustrate the pipeline that generates it from raw data. Lastly, we analyze the RoBus dataset statistically and present its highlights.

3.1 Dataset Description

The RoBus dataset is a comprehensive and multimodal dataset for generative urban design, focusing on generation of road networks and building layouts. It is composed of:

Images: This component includes tif files with 6 channels, each representing a different urban element: primary roads, secondary roads, water bodies, green spaces, buildings with heights, and density. Each image contains coordinate reference information, which facilitates further processing and expansion.

Graphics: This component includes simplified road network graphs in gpickle format and vectored buildings with heights in geojson format. These 3D vectored buildings are provided at both city tile and block scales.

Texts: This component encompasses statistical values, labels, and descriptive texts that detail characteristics of urban tiles, such as road orientation, traffic convenience, building height, and density.

The RoBus dataset encompasses an extensive area, covering approximately 80,000 square kilometers across multiple regions in Australia, China, Europe, and the United States. This broad geographic coverage ensures that the dataset captures a diverse range of urban layouts. It is designed to be scalable to a variety of existing generative tasks related to road networks and building layouts, including: (1) Geometry-Constrained Generation, such as generating secondary roads based on primary road networks, roads based on density and land-use maps [47], and building layouts based on road layouts [41] or boundary maps [7]; (2) Graphic Generation, such as generating topological road networks [8, 29] and vectored building layouts [14]; (3) Text-to-Image Generation, such as urban map generation [31] and completion [32], which uses text descriptions with popular models like CLIP and Stable Diffusion; and (4) other tasks, including urban analysis and planning [3], road network completion [10], and road topology extraction [24].

3.2 Data Collection and Generating Pipeline

As shown in Fig. 3, our data collection and generating pipeline can be generally summarized as follows:

Step 1: Data Collection and Preprocessing. To capture the spatial patterns of road networks and building layouts on a large scale, we collect raw data from OpenStreetMap (https://extract.bbbike.org/) and extract road networks, rivers, greenery and building contours by filtering different OSM tags, which are detailed in the appendix. However, data from OSM are generally noisy and incomplete, and thus cannot be directly applied to model training without complex preprocessing [7]. To clean the road network data, we rasterize roads into images and then apply thinning to refine the roads, as suggested by [41]. Additionally, we observed that building contours from OSM frequently lack height information and are often incomplete. To address these limitations, we enhance the extracted OSM data with publicly available datasets on building heights. Specifically, we collect data from Microsoft (https://planetarycomputer.microsoft.com/dataset/ms-buildings) for areas in Australia, Europe, and the United States, as well as the CNBH dataset [44] for areas in China. We align OSM building contours with CNBH tiles and calculate the average height values of valid pixels. For buildings with missing values, we estimate the heights using nearby buildings within a 300 m radius. If data are still unavailable, we default to a height of 24 meters. This preprocessing is intended to improve the quality and reliability of the building information in the RoBus dataset, which is crucial for accurate urban modeling and generation.
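The height fallback described above can be sketched as follows. This is a minimal illustration rather than the released preprocessing code: the record layout (`x`, `y`, `height`) and the function name `fill_missing_heights` are hypothetical, coordinates are assumed to be in meters, and averaging the neighbouring heights is an assumption, since the text only states that nearby buildings within the radius are used.

```python
import math

SEARCH_RADIUS_M = 300.0   # neighbourhood radius stated in the text
DEFAULT_HEIGHT_M = 24.0   # fallback height stated in the text


def fill_missing_heights(buildings):
    """Return one height per building, filling missing values.

    `buildings` is a list of dicts with keys 'x', 'y' (meters) and
    'height' (float, or None when unknown). This layout is hypothetical.
    """
    known = [b for b in buildings if b["height"] is not None]
    filled = []
    for b in buildings:
        if b["height"] is not None:
            filled.append(b["height"])
            continue
        # Heights of buildings with known height inside the search radius.
        nearby = [
            k["height"]
            for k in known
            if math.hypot(k["x"] - b["x"], k["y"] - b["y"]) <= SEARCH_RADIUS_M
        ]
        # Assumption: use the mean of the neighbours; otherwise the default.
        filled.append(sum(nearby) / len(nearby) if nearby else DEFAULT_HEIGHT_M)
    return filled
```

Only buildings whose height is known serve as neighbours here, so an estimated value is never propagated to other buildings.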

Step 2: Construction of Images. To construct the image component of the RoBus dataset, we rasterize the preprocessed data to produce tif files with six channels. These channels include primary roads (Road-P), secondary roads (Road-S), water bodies (rivers or lakes), green spaces, density [2], and buildings with height information. We choose the Pseudo-Mercator projection (EPSG:3857) as the coordinate reference system (CRS) for these tif images, with a resolution of 5 meters per pixel. Given that each geojson file may have slightly different boundaries when rasterized, we identify the common area and align the rasters to ensure consistency. After alignment, we crop the tif images into small tiles with a 20% overlap, each sized at 256 × 256 pixels, corresponding to an actual area of approximately 1.6 km². These tiles with 6 channels constitute the image part of the RoBus dataset, which supports various image-based generation tasks. Additionally, we preserve the coordinate system and geographic coordinates during the cropping process, aiming to facilitate further research on larger-scale generation tasks.
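The tiling arithmetic can be checked with a short sketch. The tile size, resolution and overlap come from the text; the function names and the choice to round the stride down to 204 pixels are assumptions, since the text does not say how the fractional stride of 204.8 pixels is handled.

```python
TILE_PX = 256        # tile side length in pixels (from the text)
RESOLUTION_M = 5.0   # meters per pixel (from the text)
OVERLAP = 0.20       # overlap between neighbouring tiles (from the text)


def tile_origins(width_px, height_px, tile=TILE_PX, overlap=OVERLAP):
    """Top-left pixel coordinates of all full tiles inside a raster."""
    # Assumption: the stride is rounded down to a whole pixel (204 px).
    stride = int(tile * (1.0 - overlap))
    xs = range(0, width_px - tile + 1, stride)
    ys = range(0, height_px - tile + 1, stride)
    return [(x, y) for y in ys for x in xs]


def tile_area_km2(tile=TILE_PX, resolution_m=RESOLUTION_M):
    """Ground area of one tile in square kilometers."""
    side_km = tile * resolution_m / 1000.0
    return side_km * side_km
```

A 256-pixel tile at 5 m per pixel spans 1.28 km per side, so `tile_area_km2()` evaluates to about 1.64 km², in line with the approximate 1.6 km² quoted above.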

Step 3: Construction of Graphics. To construct the topological graph of road networks, we combine the first two channels of the images, which represent primary and secondary road networks, and then skeletonize the result using morphological thinning methods. Given that skeletonized road graphs are densely populated with redundant nodes, leading to excessive computational costs for further processing, we simplify these graphs using Eq. 1. This simplification process preserves critical features such as road joints and sharp turns. In Eq. 1, $\mathbf{e}_{ki}$ represents the vector from node $v_k$ to node $v_i$ in the road graph $G_R$. The function $m(\cdot)$ refers to the operation that merges two closely positioned nodes, and $C_{tr}$ is the threshold defined for smoothness. The simplified road network graph serves three primary purposes. First, it facilitates the computation of topological attributes essential for transportation research, which are crucial for generating the text domain in our dataset. Second, the topology of road networks typically forms cycles that outline city blocks. Third, the graphic structure is directly applicable to topological road generation methods like random walks [8].

$$C_{tr} < |\cos(\mathbf{e}_{ki} \cdot \mathbf{e}_{kj})| \leq 1, \quad v_k \in \{ v \in m(\mathbf{G_R}) \mid deg(v) = 2 \} \tag{1}$$
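One way to read the criterion in Eq. 1 is sketched below: a degree-2 node is flagged as redundant when the absolute cosine of the angle between the vectors to its two neighbours exceeds the threshold. This reading, the function names, the adjacency-dictionary layout and the threshold value used in any call are assumptions for illustration; the merging operation $m(\cdot)$ is assumed to have been applied beforehand and is not implemented here.

```python
import math


def abs_cos_at(vk, vi, vj):
    """|cos| of the angle between the vectors vk->vi and vk->vj."""
    ax, ay = vi[0] - vk[0], vi[1] - vk[1]
    bx, by = vj[0] - vk[0], vj[1] - vk[1]
    norm_a, norm_b = math.hypot(ax, ay), math.hypot(bx, by)
    if norm_a == 0.0 or norm_b == 0.0:
        # Coincident nodes should already have been merged; never flag them.
        return 0.0
    return abs(ax * bx + ay * by) / (norm_a * norm_b)


def redundant_nodes(coords, adjacency, c_tr):
    """Degree-2 nodes whose two incident edges satisfy the Eq. 1 test.

    coords:    {node: (x, y)}
    adjacency: {node: set of neighbouring nodes}
    c_tr:      smoothness threshold in [0, 1)
    """
    flagged = []
    for k, neighbours in adjacency.items():
        if len(neighbours) != 2:
            continue  # junctions and dead ends are always kept
        i, j = sorted(neighbours)
        if abs_cos_at(coords[k], coords[i], coords[j]) > c_tr:
            flagged.append(k)
    return sorted(flagged)
```

On an L-shaped polyline, the node in the middle of the straight segment is flagged while the node at the right-angle corner is kept, which matches the stated goal of preserving road joints and turns.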

We provide vectored building layouts with heights at both the city tile scale and the city block scale. To generate building layouts at the block scale, we use boundaries that are automatically partitioned by road networks. A critical challenge in this process is to efficiently identify as many geometric minimal cycles within the road network’s topology as possible. For clarity, we define the geometric minimal cycle as a cycle in a graph with geometric coordinates that does not enclose any other cycles. Most existing algorithms primarily focus on finding basic cycles or enumerating all cycles within a topological graph, without considering the geometric positions. To address this gap, we develop a new algorithm tailored for finding geometric minimal cycles. Initially, we iteratively remove all vertices with a degree of one. Next, for each node with a degree of two, we identify simple paths that include the node and its two immediate neighbors, restricting the path length to a maximum of 12 edges to ensure efficiency. Finally, we examine all simple paths to pinpoint the shortest cycles, designated as geometric minimal cycles. We repeat the last two steps until no vertices remain in the graph. Details can be found in the appendix.
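The first step of the procedure above, iteratively removing dangling vertices, can be sketched as follows. Only this pruning step is shown; the path search and cycle selection are left to the appendix. The function name and adjacency layout are hypothetical, and the sketch also drops vertices that become isolated (degree zero), which is an assumption, since such vertices cannot lie on any cycle.

```python
def prune_dangling(adjacency):
    """Iteratively remove vertices of degree <= 1 from an undirected graph.

    adjacency: {node: set of neighbouring nodes}
    Returns a new adjacency dict; the input is left unchanged.
    """
    adj = {v: set(neighbours) for v, neighbours in adjacency.items()}
    stack = [v for v, neighbours in adj.items() if len(neighbours) <= 1]
    while stack:
        v = stack.pop()
        if v not in adj:
            continue  # already removed through another neighbour
        for u in adj.pop(v):
            adj[u].discard(v)
            if len(adj[u]) <= 1:
                stack.append(u)  # removing v may expose a new dead end
    return adj
```

After pruning, every remaining vertex has degree at least two, so dead-end streets no longer contribute candidate paths to the cycle search.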

The topological graph of road networks and the vectored buildings with heights at both the city tile scale and the city block scale constitute the graphics component of the RoBus dataset. These graphics are crucial for modeling and generating urban layouts with specific properties, essential for advanced 3D urban analysis and planning.

Step 4: Construction of Texts. We analyze the attributes of the road networks and building layouts statistically to generate labels and texts. Drawing on urban planning research, we categorize the road networks based on their density and orientation. Road density is classified as either 'dense' or 'sparse', which influences traffic flow and accessibility. The orientation, assessed through the entropy of street bearings [3], indicates the degree of order (e.g., grid-like road networks) or disorder (e.g., random road networks) in the road layout. We also categorize buildings based on density and average height, using the categories illustrated in Fig. 2. On the basis of these labels and categories, we generate descriptive text using predefined templates, where sentences begin with "OSM," as suggested by [31]. These texts serve as prompts that describe the characteristics of urban areas and hold potential for application in text-to-image models.
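The orientation measure can be illustrated with a short sketch that computes the Shannon entropy of a histogram of street bearings. The use of 36 equal-width bins and the natural logarithm are assumptions made for this illustration, as is the function name; the text does not state the binning used to build the dataset labels.

```python
import math


def bearing_entropy(bearings_deg, bins=36):
    """Shannon entropy (in nats) of street bearings given in degrees."""
    counts = [0] * bins
    width = 360.0 / bins
    for bearing in bearings_deg:
        index = int((bearing % 360.0) // width) % bins
        counts[index] += 1
    total = sum(counts)
    entropy = 0.0
    for count in counts:
        if count:
            p = count / total
            entropy -= p * math.log(p)
    return entropy
```

Under these assumptions, a perfect four-way grid concentrates its bearings in four bins and yields an entropy of ln 4, whereas bearings spread evenly over all 36 bins approach the maximum of ln 36. Lower entropy therefore indicates a more ordered, grid-like layout.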

Step 5: Filter for Different Tasks. For specific generation tasks like creating building layouts from road networks, we filter out tiles that lack roads or buildings. This filtering process is similarly applied to other targeted tasks.

3.3 Dataset Analysis and Highlights

Table 1: Statistics of the RoBus dataset.
Stats   Road Length (km)   # of Buildings   # of Boundaries
Max     35.13              4196             66
Mean    8.64               312.10           6.38
Total   625,944            22,596,169       461,666
Figure 4: Visualization of statistics of the RoBus dataset.

To provide a more detailed view of the RoBus dataset, we perform a statistical analysis of key urban elements such as road length, building count, and boundary count at the city tile scale, detailed in Table 1. We visualize these statistics in Fig. 4. Specifically, (a) illustrates the orientation of primary roads in Melbourne as selected from the RoBus dataset, (b) displays the proportion of different building types within a city block, and (c) and (d) show the distribution of road orientation and road density in the RoBus dataset, respectively. We compare the RoBus dataset with existing relevant datasets in Tab. 2. The RoBus dataset is the largest of its kind, with 72,400 paired samples that include multimodal data such as road graphs, building vectors, and labels, which are missing in existing datasets.

Table 2: Comparison with existing datasets.
Dataset/Papers  Accessibility  Coverage (# Samples)  Road Graph  Building Vec  Labels  Tasks
NTG [8]  Closed  170 km²  Road Graph Generation
Owaki et al. [28]  Closed  11.56 km²  Road Graph Generation
Fang et al. [11]  Closed  #56,000  Image  Road Network Completion
Birsak et al. [2]  Closed  400 km²  Image  Road Image Generation
GanMapper [41]  Closed  #12,139  Image  Building Layouts Generation
Reco [7]  Open  #37,646  Community Layouts Generation
RoBus  Open  80,000 km² (#72,400)  3D Urban Generation

In summary, the highlights of our dataset are: Multimodality: The dataset delivers a comprehensive multimodal description of urban scenes, integrating images, graphics, and texts. The strong correlation among these modalities guarantees a synchronized perspective of urban layouts. Diversity: The dataset covers approximately 80,000 km² across various locations worldwide, showcasing extensive diversity. Scalability: The dataset supports a variety of tasks relevant to 3D urban generation, including geometry-constrained image generation, graphics generation and text-to-image generation. Usability: Experiments have been conducted on multiple tasks using this dataset, demonstrating its effectiveness and usability. Applicability: The dataset, together with the proposed generation baselines, can be readily applied to 3D games. We expect the RoBus dataset to encourage progress in data-driven methods for 3D urban design.

4 Experiments

In this section, we design and conduct experiments to address the following research questions (RQs) both qualitatively and quantitatively, using the proposed benchmarks detailed in Section 5. We aim to demonstrate the effectiveness, scalability, and applicability of the RoBus dataset, as well as of the proposed baselines that incorporate urban attributes into the generative design process.

RQ1: Can the dataset be applied to existing road networks and building layouts generation methods conditioned on geometric constraints, and how does it perform?

RQ2: Can the dataset be applied to the proposed tasks of generating urban layouts with desired properties? Does it make any difference to the generated results?

RQ3: Are the generated results applicable for use in 3D games or autonomous driving simulations?

RQ4: How do the image resolution, size and distribution of the dataset affect the quality of the generated results?

4.1 Generation based on Geometry Constraints

We select two representative tasks to evaluate geometry-constrained urban generation: generating road networks conditioned on land-use maps (Task I) and generating building layouts conditioned on road networks (Task II). GANs have marked a significant advancement in generating road networks [13, 47] and building layouts [41, 42, 17]. Given the widespread adoption of GANs, we train the popular pix2pix model [16] for Tasks I and II.

4.2 Generating with Desired Properties

Figure 5: Framework of Task III.

For the task of generating road networks (Task III), we introduce a baseline model that incorporates desired characteristics, such as grid-like road networks with low orientation order, and optimizes road connectivity to achieve higher traffic convenience. To construct the simplest version of our baseline, we adopt the pix2pix model as the backbone, as shown in Fig. 5. The input local density maps and the target road network maps in the generator are each limited to one channel. We integrate the road attributes $R$, such as global road density and orientation order, into the encoded latent vector of the U-shaped generator. The decoder synthesizes binary road images conditioned on the encoded local density maps and the concatenated road attribute vectors.

To improve the topology of the generated roads, which has a great impact on the urban attribute of traffic convenience, we focus on the topological structure of the generated roads. Specifically, we extract the topological skeleton of the synthesized images and calculate the center-line Dice score to enhance road connectivity; this score has been shown to be differentiable in [35, 26]. The overall loss of the generator is formulated as Eq. 2.

$$\mathcal{L}(G_{\theta}) = \lambda_{1}\mathcal{L}_{1} + \lambda_{2}\underset{p_{x}(z|r)}{\mathbb{E}}\left[-\log D(G_{\theta}(z|r))\right] + \lambda_{3}\mathcal{L}_{topo} \tag{2}$$

where $\mathcal{L}_{1}$ is the L1 loss that measures the similarity between the generated image and the ground-truth image, $r$ denotes the road attribute conditions, and $\mathcal{L}_{topo}$ is the center-line Dice loss that enhances the topology of the generated roads.
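The three terms of Eq. 2 can be illustrated with a forward-pass sketch in NumPy. The soft skeleton follows the iterative min/max-pooling scheme of clDice [35]; this version is not differentiable and only shows what is being computed. The weights $\lambda_1 = \lambda_2 = 1$, the iteration count and the function names are assumptions; only $\lambda_3 = 0.3$ appears in Tab. 3.

```python
import numpy as np


def _pool(img, kh, kw, reduce_fn, pad_value):
    """Stride-1 sliding-window reduction that keeps the input size."""
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=pad_value)
    h, w = img.shape
    windows = [padded[i:i + h, j:j + w] for i in range(kh) for j in range(kw)]
    return reduce_fn(np.stack(windows), axis=0)


def soft_erode(img):
    # Minimum over a cross-shaped neighbourhood; padding is ignored (+inf).
    return np.minimum(_pool(img, 3, 1, np.min, np.inf),
                      _pool(img, 1, 3, np.min, np.inf))


def soft_dilate(img):
    # Maximum over a 3x3 neighbourhood; padding is ignored (-inf).
    return _pool(img, 3, 3, np.max, -np.inf)


def soft_skeleton(img, iters=5):
    opened = soft_dilate(soft_erode(img))
    skel = np.maximum(img - opened, 0.0)
    for _ in range(iters):
        img = soft_erode(img)
        opened = soft_dilate(soft_erode(img))
        delta = np.maximum(img - opened, 0.0)
        skel = skel + np.maximum(delta - skel * delta, 0.0)
    return skel


def cldice_loss(pred, target, iters=5, eps=1.0):
    skel_pred = soft_skeleton(pred, iters)
    skel_true = soft_skeleton(target, iters)
    tprec = (np.sum(skel_pred * target) + eps) / (np.sum(skel_pred) + eps)
    tsens = (np.sum(skel_true * pred) + eps) / (np.sum(skel_true) + eps)
    return 1.0 - 2.0 * tprec * tsens / (tprec + tsens)


def generator_loss(pred, target, disc_prob, lambdas=(1.0, 1.0, 0.3)):
    """Weighted sum of L1, adversarial and topology terms (cf. Eq. 2)."""
    lam1, lam2, lam3 = lambdas           # lam1, lam2 are placeholder values
    l1 = np.mean(np.abs(pred - target))
    adv = -np.log(disc_prob + 1e-8)      # disc_prob: D's score for the fake image
    return lam1 * l1 + lam2 * adv + lam3 * cldice_loss(pred, target)


# A 3-pixel-wide road and a copy with a gap in the middle.
road = np.zeros((16, 16))
road[7:10, :] = 1.0
broken = road.copy()
broken[:, 6:10] = 0.0
loss_intact = cldice_loss(road, road)
loss_broken = cldice_loss(broken, road)
```

The intact road yields a topology loss of zero, while the broken copy is penalized because part of the ground-truth center line is no longer covered.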

Figure 6: Framework of Task IV.

For vectored building layout generation tasks, we focus on generating building layouts constrained by the boundary of a city block (Task IV) that match the predetermined density and heights. This is achieved by integrating building attributes into a Conditional Variational Autoencoder (cVAE). As shown in Fig. 6, we include an autoencoder that compresses the building boundaries into latent vectors $b$, and use graph attention networks (GAT) [37] as the backbone of the encoder and decoder. Additionally, we encode the building heights and density as a one-hot vector $a$ that serves as the attribute prior of the buildings in the city block. We follow GlobalMapper [14] to transform buildings into a canonical spatial format and subsequently into graph structures $G$. To enable the learning of building heights, we add building heights to the node attributes in $G$, alongside the original building location and minimum bounding box used in GlobalMapper [14].

During training, the conditional VAE learns to capture the distribution $p(G_{B}|b,a)$ in the dataset, i.e., the probability of generating a building graph $G_{B}$ conditioned on the encoded boundary vectors $b$ and attribute vectors $a$. The learned distribution is then sampled to generate new building graphs. The model captures $p(G_{B}|b,a)$ by maximizing its evidence lower bound, as is common in conditional VAEs [51, 14]. Additionally, to make the model focus on building attributes such as heights, we measure the similarity to the ground truth $\hat{G_{B}}$ using an L2 loss. The overall loss function is formulated in Eq. 3.

$$\mathcal{L} = \beta_{1}\,\|G_{B}^{h}-\hat{G_{B}}^{h}\|_{2} + \beta_{2}\left[\underset{q}{\mathbb{E}}\left(\log p(G_{B}|z,b,a)\right) - D_{KL}\left(q\,\|\,p(z|b,a)\right)\right] \tag{3}$$

where $h$ refers to the attributes in the graph such as heights, $p(z|b,a)$ is the prior distribution of $z$ conditioned on $b$ and $a$, and $q$ refers to $q(z|G_{B},b,a)$, which is the approximate posterior distribution of the latent variables.
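The terms of Eq. 3 can be made concrete with a small NumPy sketch, assuming that both the prior $p(z|b,a)$ and the posterior $q(z|G_B,b,a)$ are diagonal Gaussians. That parameterization, the function names and the default weights are assumptions. The sketch also subtracts the bracketed evidence lower bound, so that minimizing the loss maximizes the bound as described in the text; this sign convention is an interpretation rather than something stated in Eq. 3.

```python
import numpy as np


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians given means and log-variances."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )


def cvae_loss(heights_pred, heights_true, recon_log_lik,
              mu_q, logvar_q, mu_p, logvar_p, beta1=1.0, beta2=1.0):
    """beta1 * L2 height term minus beta2 * ELBO (sign chosen for minimization)."""
    height_term = np.linalg.norm(heights_pred - heights_true)   # L2 norm
    elbo = recon_log_lik - kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p)
    return beta1 * height_term - beta2 * elbo


# Toy values: two latent dimensions, three buildings.
zeros = np.zeros(2)
kl_same = kl_diag_gaussians(zeros, zeros, zeros, zeros)
kl_shifted = kl_diag_gaussians(np.array([1.0, 0.0]), zeros, zeros, zeros)
loss = cvae_loss(np.array([10.0, 20.0, 30.0]), np.array([10.0, 20.0, 27.0]),
                 recon_log_lik=-2.0, mu_q=zeros, logvar_q=zeros,
                 mu_p=zeros, logvar_p=zeros)
```

In this toy case the height error is 3, the reconstruction log-likelihood is -2 and the KL term vanishes, so the two contributions simply add up.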

5 Results and Analysis

In this section, we introduce the benchmarks with comprehensive evaluation metrics. Additionally, we report our qualitative, quantitative and ablation results to answer the RQs in section 4.

5.1 Evaluation Metrics

To comprehensively evaluate methods for generating road networks and building layouts, we introduce a benchmark that assesses the quality, diversity, validity, and urban properties of the synthesized results.

Figure 7: Illustration of Traffic Convenience and Validity.

Quality. We use the commonly used Fréchet Inception Distance (FID) [8] to evaluate the quality of the generated results; it is computed on image features extracted by InceptionV3.

Diversity. We assess a method's ability to create novel urban layouts in order to detect mode collapse. For road networks, we calculate the Chamfer Distance (CD) against all paired road graphs [8]. For building layouts, we measure the degree of overlap between the generated images and the ground truth using the Mean Intersection over Union (mIoU) [14]. A higher CD in road network generation and a lower mIoU in building layout generation indicate better diversity.
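Both diversity measures are simple to state in code. The sketch below is a minimal NumPy version under stated assumptions: the Chamfer Distance is taken as the sum of the two directed mean nearest-neighbour distances between point sets sampled from the road graphs (other variants use squared distances), and the IoU is computed on binary building masks. The function names are illustrative.

```python
import numpy as np


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer Distance between two 2D point sets."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def iou(mask_a, mask_b):
    """Intersection over Union of two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0                     # two empty masks overlap trivially
    return np.logical_and(a, b).sum() / union


def mean_iou(mask_pairs):
    """Average IoU over (generated, ground-truth) mask pairs."""
    return float(np.mean([iou(g, t) for g, t in mask_pairs]))


cd = chamfer_distance([[0, 0]], [[3, 4]])
overlap = iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

A generator that reproduces the same layout for every input would drive the Chamfer Distance between its samples towards zero, which is the collapse this metric is meant to expose.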

Validity. This metric is designed for geometry-constrained generation tasks, such as road network constraints and boundary constraints in building layout generation. Validity is the percentage of invalid samples, i.e., samples that violate the geometric constraints, as shown in Fig. 7.

Urban Properties. These metrics are designed to analyze the urban attributes of the generated results. For road network tasks, Traffic Convenience is the average value of $d_{E}(v_{i},v_{j})/d_{S}(v_{i},v_{j})$ over all node pairs $v_{i},v_{j}$ that are more than $300\,m$ apart, where $d_{E}$ and $d_{S}$ are the Euclidean distance and the Dijkstra shortest-path distance, respectively, as shown in Fig. 7. Orientation measures the entropy of street bearings [3]; lower orientation indicates more ordered (e.g., grid-like) road networks. For building layout tasks, we calculate the Wasserstein distance (WD) of building heights, counts and density to measure the distance between the distributions of the generated results and the ground-truth dataset.
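The two road metrics can be sketched with the Python standard library alone. Several choices below are assumptions rather than details from the paper: node pairs are filtered by their Euclidean distance, unreachable pairs contribute a ratio of zero, and the orientation entropy uses 36 bearing bins with the natural logarithm.

```python
import heapq
import math
from itertools import combinations


def dijkstra(adj, source):
    """Shortest-path lengths from source; adj maps node -> [(neighbour, weight)]."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue                   # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def traffic_convenience(coords, edges, min_dist=300.0):
    """Mean of d_E / d_S over node pairs whose Euclidean distance exceeds min_dist."""
    adj = {v: [] for v in coords}
    for u, v in edges:
        w = math.dist(coords[u], coords[v])
        adj[u].append((v, w))
        adj[v].append((u, w))
    shortest = {v: dijkstra(adj, v) for v in coords}
    ratios = []
    for u, v in combinations(coords, 2):
        d_e = math.dist(coords[u], coords[v])
        if d_e > min_dist:
            # Unreachable pairs have d_S = inf and therefore a ratio of 0.
            ratios.append(d_e / shortest[u].get(v, math.inf))
    return sum(ratios) / len(ratios) if ratios else 0.0


def orientation_entropy(bearings_deg, n_bins=36):
    """Shannon entropy (nats) of street bearings binned into n_bins sectors."""
    width = 360.0 / n_bins
    counts = [0] * n_bins
    for b in bearings_deg:
        # Shift by half a bin so that 0 degrees sits at a bin centre.
        counts[int(((b + width / 2) % 360.0) // width)] += 1
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c)


# A square block with 400 m sides, coordinates in metres.
coords = {0: (0, 0), 1: (400, 0), 2: (400, 400), 3: (0, 400)}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
convenience = traffic_convenience(coords, edges)
grid_entropy = orientation_entropy([0, 90, 180, 270])
```

For the square block, the four sides have a ratio of one and the two diagonals are penalized by the detour around the block; a perfect grid concentrates its bearings in four bins, which gives a much lower entropy than bearings spread uniformly over all bins.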

5.2 Overall Results (RQ1,2)

We report the quantitative results for the four tasks in Tab. 3, where "DIV" stands for diversity, "O" for road orientation, "C" for traffic convenience, "Ro" for road network generation tasks and "Bu" for building layout generation tasks. "WD" measures the distribution of building counts, and "V" refers to validity. We divide the dataset into training, validation, and testing sets with a ratio of 8:1:1. The qualitative results are displayed in row (a) of Fig. 8, where the first and third columns show the generated results, and the second and fourth columns present the ground truth. The model used for Task I has evidently learned the pattern that roads should not cross mountains. Comparing Task I with Task III, both of which focus on road network generation, we conclude that Task III achieves superior image quality, with a lower FID and higher road connectivity, but it also exhibits reduced diversity. Additionally, Task III is capable of generating more grid-like road networks when provided with the corresponding attribute vectors.

For the building layout generation tasks (II and IV), we conclude that the model proposed in Task IV achieves much higher quality, as evidenced by its lower FID and WD. However, it tends to generate more buildings outside the boundaries than the image-based model in Task II. The results for Tasks II and IV are visualized in rows (b) and (d) of Fig. 8, respectively. Additionally, row (d) shows that the building density attributes indeed control the generative process.

Figure 8: Examples of generated results. Rows (a)–(d) are results from Tasks I–IV, respectively. Row (e) is the visualization in UE of row (d).
Table 3: Quantitative results. Tasks II and IV are building layout generation tasks, while the others are road network generation tasks.
Baseline                     FID↓    DIV(Ro/Bu)↑   Ro(O↓/C↑)   Bu(WD/V)↓
Task I                       27.03   22.50/–       3.27/0.75   –
Task II                      32.73   –/79.57       –           7.64/1.89
Task IV                      17.42   –/68.23       –           3.37/2.25
Task III ($\lambda_{3}$=0)     25.83   19.98/–       3.25/0.77   –
Task III ($\lambda_{3}$=0.3)   26.98   19.05/–       3.27/0.83   –
RES_5m                       28.97   22.38/–       3.32/0.73   –
RES_10m                      35.28   24.24/–       3.38/0.69   –
US_1w                        21.36   20.81/–       3.11/0.80   –
Global_1w                    26.39   21.74/–       3.28/0.78   –

5.3 Ablation Studies (RQ2,4)

In this subsection, we conduct ablation studies on road network generation to answer RQ2 and RQ4; the results are presented in Tab. 3. We conclude that the method proposed in Task III greatly enhances connectivity by utilizing the topological loss. Additionally, we compare the 5m and 10m resolutions (denoted "RES_5m" and "RES_10m" in Tab. 3) and conclude that, as the resolution changes from 5m to 10m, the quality of the generated results worsens while diversity increases. To explore the influence of dataset size, we randomly select 10,000 samples from the entire dataset, denoted as "Global_1w". We conclude that a smaller dataset size leads to higher diversity, compared to Task I. Additionally, we compare "Global_1w" with "US_1w", which consists of 10,000 samples selected within the United States. We conclude that a smaller collection area results in lower diversity, which underscores the importance of the diverse geographic coverage provided by the RoBus dataset.

5.4 Applied to Auto Driving Simulations (RQ3)

To validate the applicability of the RoBus dataset and the baselines proposed in Tasks III and IV, we apply the generated results to autonomous driving simulation software such as CARLA [9], which is built on a 3D game engine. As shown in the last three rows of Fig. 8, our pipeline for generating 3D urban scenes proceeds as follows. First, we generate road network graphics with a low orientation (grid-like road networks), using the method proposed in Task III. For each city block boundary partitioned by the road networks, we generate vectored building layouts with heights using the method outlined in Task IV. We convert the generated road networks to the OpenDRIVE format by first partitioning the road network graph into linestrings, which are then assigned a coordinate reference system (WGS84). More importantly, we construct 3D buildings based on the generated building layouts with heights, and render the white building models in UnrealEngine by randomly assigning different materials.

6 Conclusion and Future Work

In this work, we introduce the RoBus dataset, the first and largest open-source multimodal dataset designed for generative 3D urban design, specifically focusing on city-scale road networks and building layouts, which addresses the urgent need for comprehensive, high-quality training data for deep generative models. To make it more applicable, we apply the generated 3D cities in UnrealEngine. However, the automated rendering of buildings is neglected in our work, which represents a promising topic for further research. More importantly, there is substantial room for improvement on the generative model for generative 3D urban design, especially based on graphics like road networks topology and vectored buildings. We expect the RoBus dataset and proposed baselines to inspire more creative and practical work in 3D city generation for multimedia games, metaverse and other socially aware multimedia applications.

References

  • [1]
  • Birsak et al. [2022] Michael Birsak, Tom Kelly, Wamiq Para, and Peter Wonka. 2022. Large-Scale Auto-Regressive Modeling Of Street Networks. arXiv preprint arXiv:2209.00281 (2022).
  • Boeing [2019] Geoff Boeing. 2019. Urban spatial order: Street network orientation, configuration, and entropy. Applied Network Science 4, 1 (2019), 1–19.
  • Bojchevski et al. [2018] Aleksandar Bojchevski, Oleksandr Shchur, Daniel Zügner, and Stephan Günnemann. 2018. NetGAN: Generating Graphs via Random Walks. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. 609–618.
  • Chai et al. [2023] Shang Chai, Liansheng Zhuang, Fengying Yan, and Zihan Zhou. 2023. Two-stage Content-Aware Layout Generation for Poster Designs. In Proceedings of the 31st ACM International Conference on Multimedia. 8415–8423.
  • Chang et al. [2021] Kai-Hung Chang, Chin-Yi Cheng, Jieliang Luo, Shingo Murata, Mehdi Nourbakhsh, and Yoshito Tsuji. 2021. Building-GAN: Graph-conditioned architectural volumetric design generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 11956–11965.
  • Chen et al. [2023] Xi Chen, Yun Xiong, Siqi Wang, Haofen Wang, Tao Sheng, Yao Zhang, and Yu Ye. 2023. ReCo: A dataset for residential community layout planning. In Proceedings of the 31st ACM International Conference on Multimedia. 397–405.
  • Chu et al. [2019] Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, Antonio Torralba, and Sanja Fidler. 2019. Neural turtle graphics for modeling city road layouts. In Proceedings of the IEEE/CVF international conference on computer vision. 4522–4530.
  • Dosovitskiy et al. [2017] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. 2017. CARLA: An Open Urban Driving Simulator. In Proceedings of the 1st Annual Conference on Robot Learning. 1–16.
  • Fang et al. [2022a] Zhou Fang, Ying Jin, and Tianren Yang. 2022a. Incorporating planning intelligence into deep learning: A planning support tool for street network design. Journal of Urban Technology 29, 2 (2022), 99–114.
  • Fang et al. [2022b] Zhou Fang, Jiaxin Qi, Lubin Fan, Jianqiang Huang, Ying Jin, and Tianren Yang. 2022b. A topography-aware approach to the automatic generation of urban road networks. International Journal of Geographical Information Science 36, 10 (2022), 2035–2059.
  • Fedorova [2021] Stanislava Fedorova. 2021. Generative adversarial networks for urban block design. In SimAUD 2021: A Symposium on Simulation for Architecture and Urban Design.
  • Hartmann et al. [2017] Stefan Hartmann, Michael Weinmann, Raoul Wessel, and Reinhard Klein. 2017. Streetgan: Towards road network synthesis with generative adversarial networks. (2017).
  • He and Aliaga [2023] Liu He and Daniel Aliaga. 2023. GlobalMapper: Arbitrary-Shaped Urban Layout Generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 454–464.
  • Inoue et al. [2023] Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, and Kota Yamaguchi. 2023. Layoutdm: Discrete diffusion model for controllable layout generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10167–10176.
  • Isola et al. [2017] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. 2017. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1125–1134.
  • Jiang et al. [2023] Feifeng Jiang, Jun Ma, Christopher John Webster, Xiao Li, and Vincent JL Gan. 2023. Building layout generation using site-embedded GAN model. Automation in Construction 151 (2023), 104888.
  • Jyothi et al. [2019] Akash Abdu Jyothi, Thibaut Durand, Jiawei He, Leonid Sigal, and Greg Mori. 2019. Layoutvae: Stochastic scene layout generation from a label set. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 9895–9904.
  • Kelvin and Anand [2020] Lin Ziwen Kelvin and Bhojan Anand. 2020. Procedural Generation of Roads with Conditional Generative Adversarial Networks. In 2020 IEEE Sixth International Conference on Multimedia Big Data (BigMM). 277–281. https://doi.org/10.1109/BigMM50055.2020.00048
  • Kempinska and Murcio [2019] Kira Kempinska and Roberto Murcio. 2019. Modelling urban networks using Variational Autoencoders. Applied Network Science 4, 1 (2019), 1–11.
  • Kikuchi et al. [2021] Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, and Kota Yamaguchi. 2021. Constrained graphic layout generation via latent optimization. In Proceedings of the 29th ACM International Conference on Multimedia. 88–96.
  • Lechner et al. [2006] Thomas Lechner, Pin Ren, Ben Watson, Craig Brozefski, and Uri Wilensky. 2006. Procedural modeling of urban land use. In ACM SIGGRAPH 2006 Research posters. 135–es.
  • Lechner et al. [2003] Thomas Lechner, Ben Watson, Uri Wilensky, and Martin Felsen. 2003. Procedural city modeling. In 1st Midwestern Graphics Conference, Vol. 4.
  • Li et al. [2023] Tao Li, Shanding Ye, Ruihang Li, Yongjian Fu, Guoqing Yang, and Zhijie Pan. 2023. Topology-aware Road Extraction via Multi-task Learning for Autonomous Driving. In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2275–2281.
  • Liao et al. [2024] Wenjie Liao, Xinzheng Lu, Yifan Fei, Yi Gu, and Yuli Huang. 2024. Generative AI design for building structures. Automation in Construction 157 (2024), 105187.
  • Menten et al. [2023] Martin J Menten, Johannes C Paetzold, Veronika A Zimmer, Suprosanna Shit, Ivan Ezhov, Robbie Holland, Monika Probst, Julia A Schnabel, and Daniel Rueckert. 2023. A skeletonization algorithm for gradient-based optimization. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 21394–21403.
  • Nauata et al. [2020] Nelson Nauata, Kai-Hung Chang, Chin-Yi Cheng, Greg Mori, and Yasutaka Furukawa. 2020. House-gan: Relational generative adversarial networks for graph-constrained house layout generation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16. Springer, 162–177.
  • Owaki and Machida [2020] Takashi Owaki and Takashi Machida. 2020. RoadNetGAN: generating road networks in planar graph representation. In Neural Information Processing: 27th International Conference, ICONIP 2020, Bangkok, Thailand, November 18–22, 2020, Proceedings, Part IV 27. Springer, 535–543.
  • Owaki and Machida [2022] Takashi Owaki and Takashi Machida. 2022. Road Network Generation with City Block Attributes Using Link Attribute Aggregation. In 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 262–267.
  • Parish and Müller [2001] Yoav IH Parish and Pascal Müller. 2001. Procedural modeling of cities. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques. 301–308.
  • Przymus and Szymański [2023] Marcin Przymus and Piotr Szymański. 2023. Map Diffusion-Text Promptable Map Generation Diffusion Model. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on Advances in Urban-AI. 32–41.
  • Qin et al. [2024] Yiming Qin, Nanxuan Zhao, Bin Sheng, and Rynson WH Lau. 2024. Text2City: One-Stage Text-Driven Urban Layout Regeneration. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 4578–4586.
  • Qu et al. [2023] Leigang Qu, Shengqiong Wu, Hao Fei, Liqiang Nie, and Tat-Seng Chua. 2023. Layoutllm-t2i: Eliciting layout guidance from llm for text-to-image generation. In Proceedings of the 31st ACM International Conference on Multimedia. 643–654.
  • Quan [2022] Steven Jige Quan. 2022. Urban-GAN: An artificial intelligence-aided computation system for plural urban design. Environment and Planning B: Urban Analytics and City Science 49, 9 (2022), 2500–2515.
  • Shit et al. [2021] Suprosanna Shit, Johannes C Paetzold, Anjany Sekuboyina, Ivan Ezhov, Alexander Unger, Andrey Zhylka, Josien PW Pluim, Ulrich Bauer, and Bjoern H Menze. 2021. clDice-a novel topology-preserving loss function for tubular structure segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 16560–16569.
  • Tian et al. [2024] Xiaoyu Tian, Tao Jiang, Longfei Yun, Yucheng Mao, Huitong Yang, Yue Wang, Yilun Wang, and Hang Zhao. 2024. Occ3d: A large-scale 3d occupancy prediction benchmark for autonomous driving. Advances in Neural Information Processing Systems 36 (2024).
  • Velickovic et al. [2018] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In International Conference on Learning Representations.
  • Vimpari et al. [2023] Veera Vimpari, Annakaisa Kultima, Perttu Hämäläinen, and Christian Guckelsberger. 2023. “An Adapt-or-Die Type of Situation”: Perception, Adoption, and Use of Text-to-Image-Generation AI by Game Industry Professionals. Proc. ACM Hum.-Comput. Interact. 7, CHI PLAY, Article 379 (oct 2023), 34 pages. https://doi.org/10.1145/3611025
  • Wang et al. [2023] Lufeng Wang, Jiepeng Liu, Yan Zeng, Guozhong Cheng, Huifeng Hu, Jiahao Hu, and Xuesi Huang. 2023. Automated building layout generation using deep learning and graph algorithms. Automation in Construction 154 (2023), 105036.
  • Wang et al. [2024] Shiyu Wang, Yuanqi Du, Xiaojie Guo, Bo Pan, Zhaohui Qin, and Liang Zhao. 2024. Controllable Data Generation by Deep Learning: A Review. ACM Comput. Surv. (mar 2024). https://doi.org/10.1145/3648609 Just Accepted.
  • Wu and Biljecki [2022] Abraham Noah Wu and Filip Biljecki. 2022. GANmapper: geographical data translation. International Journal of Geographical Information Science 36, 7 (2022), 1394–1422.
  • Wu and Biljecki [2023] Abraham Noah Wu and Filip Biljecki. 2023. InstantCITY: Synthesising morphologically accurate geospatial data for urban form analysis, transfer, and quality control. ISPRS Journal of Photogrammetry and Remote Sensing 195 (2023), 90–104.
  • Wu et al. [2019] Wenming Wu, Xiao-Ming Fu, Rui Tang, Yuhan Wang, Yu-Hao Qi, and Ligang Liu. 2019. Data-driven interior plan generation for residential buildings. ACM Transactions on Graphics (TOG) 38, 6 (2019), 1–12.
  • Wu et al. [2023] Wan-Ben Wu, Jun Ma, Ellen Banzhaf, Michael E Meadows, Zhao-Wu Yu, Feng-Xiang Guo, Dhritiraj Sengupta, Xing-Xing Cai, and Bin Zhao. 2023. A first Chinese building height estimate at 10 m resolution (CNBH-10 m) using multi-source earth observations and machine learning. Remote Sensing of Environment 291 (2023), 113578.
  • Xu et al. [2021] Linning Xu, Yuanbo Xiangli, Anyi Rao, Nanxuan Zhao, Bo Dai, Ziwei Liu, and Dahua Lin. 2021. BlockPlanner: city block generation with vectorized graph representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5077–5086.
  • Yan et al. [2019] Xiongfeng Yan, Tinghua Ai, Min Yang, and Hongmei Yin. 2019. A graph convolutional neural network for classification of building patterns using spatial vector data. ISPRS journal of photogrammetry and remote sensing 150 (2019), 259–273.
  • Yang et al. [2023] Lehao Yang, Long Li, Qihao Chen, Jiling Zhang, Tian Feng, and Wei Zhang. 2023. Street Layout Design via Conditional Adversarial Learning. arXiv preprint arXiv:2305.08186 (2023).
  • Zhang et al. [2023] Junyi Zhang, Jiaqi Guo, Shizhao Sun, Jian-Guang Lou, and Dongmei Zhang. 2023. Layoutdiffusion: Improving graphic layout generation by discrete diffusion probabilistic models. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7226–7236.
  • Zheng et al. [2019] Xinru Zheng, Xiaotian Qiao, Ying Cao, and Rynson WH Lau. 2019. Content-aware generative modeling of graphic design layouts. ACM Transactions on Graphics (TOG) 38, 4 (2019), 1–15.
  • Zheng et al. [2023] Yu Zheng, Hongyuan Su, Jingtao Ding, Depeng Jin, and Yong Li. 2023. Road planning for slums via deep reinforcement learning. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5695–5706.
  • Zhu et al. [2019] Yaochen Zhu, Zhenzhong Chen, and Feng Wu. 2019. Multimodal deep denoise framework for affective video content analysis. In Proceedings of the 27th ACM international conference on multimedia. 130–138.

Appendix A APPENDIX OVERVIEW

  • Section B introduces the key-value pairs used in OSM tags for extracting raw OSM data.

  • Section C describes the algorithm developed to find the Geometric Minimal Cycle in a graph.

  • Section D presents additional statistical results from the RoBus dataset.

  • Section E provides further examples from the RoBus dataset.

Appendix B Data collection

Roads play a pivotal role in our dataset. We use road data sourced from OpenStreetMap (OSM; https://download.geofabrik.de/osm-data-in-gis-formats-free.pdf) to acquire road segments and their corresponding labels in OSM standards. Specifically, we extract the key-value pairs assigned to each "way". The key used to identify ways as roads is "highway", and the associated value specifies the type of road. In our dataset, roads are divided into primary and secondary roads to support tasks like graded road network generation (e.g., generating secondary roads based on the primary roads in a city). We conduct the categorization based on the values listed in Tab. 4. For instance, to construct Road-P in our dataset, we extract the roads in OSM that carry the tag highway:motorway, highway:trunk, or highway:primary.

Table 4: OSM key-value pairs extracted for our two road classes.
Key       Road-P     Road-S
highway   motorway   secondary
          trunk      tertiary
          primary    residential
          –          unclassified

Similarly, we extract water bodies and greenery spaces according to the corresponding key-value pairs in OSM. To extract water bodies, we select the key 'water' with the values 'reservoir' and 'river'. Additionally, we select the key 'natural' with the values 'water', 'wetland' and 'glacier', and the key 'leisure' with the value 'nature reserve'. We also select the key 'waterway' with the values 'riverbank', 'dock', 'canal', 'drain', 'ditch', 'stream', 'brook', 'wadi', and 'drystream'.

To extract greenery spaces, we mainly select the key 'landuse' with the values 'forest', 'farmland', 'allotments', 'meadow', 'scrub', and 'grass'. For completeness, we also select the key 'natural' with the value 'wood' and the key 'leisure' with the value 'garden'.
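The tag rules of this appendix can be collected into a small lookup, sketched below in Python. The function `classify` and the layer names are hypothetical; the tag values are those listed above, except that the value printed as 'nature reserve' is written with an underscore, which is how OSM spells it.

```python
# Road classes from Tab. 4 (values of the OSM key "highway").
ROAD_CLASSES = {
    "Road-P": {"motorway", "trunk", "primary"},
    "Road-S": {"secondary", "tertiary", "residential", "unclassified"},
}

# Key -> accepted values for the water and greenery layers.
LAYER_TAGS = {
    "water": {
        "water": {"reservoir", "river"},
        "natural": {"water", "wetland", "glacier"},
        "leisure": {"nature_reserve"},   # printed as 'nature reserve' in the text
        "waterway": {"riverbank", "dock", "canal", "drain", "ditch",
                     "stream", "brook", "wadi", "drystream"},
    },
    "greenery": {
        "landuse": {"forest", "farmland", "allotments", "meadow",
                    "scrub", "grass"},
        "natural": {"wood"},
        "leisure": {"garden"},
    },
}


def classify(tags):
    """Map the tag dictionary of one OSM element to a dataset layer, or None."""
    highway = tags.get("highway")
    for road_class, values in ROAD_CLASSES.items():
        if highway in values:
            return road_class
    for layer, rules in LAYER_TAGS.items():
        for key, values in rules.items():
            if tags.get(key) in values:
                return layer
    return None


examples = [
    classify({"highway": "trunk"}),
    classify({"highway": "residential", "name": "Example Street"}),
    classify({"waterway": "canal"}),
    classify({"natural": "wood"}),
    classify({"highway": "footway"}),
]
```

Elements whose tags match none of the rules, such as footways, are dropped from every layer in this sketch.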

Appendix C Geometric Minimal Cycle

We employ automated partitioning of boundaries by road networks to generate building layouts at the block scale. A key challenge in this process is the efficient identification of geometric minimal cycles, which are defined as the cycles in a graph with geometric coordinates that do not enclose any other cycles. The algorithm designed to find geometric minimal cycles is shown in Algorithm 1 and Fig. 9. Specifically, we begin by removing all vertices with a degree of one from the graph. Next, for each node with a degree of two, we identify the simple paths connecting the node to its two immediate neighbors. Subsequently, we analyze all identified simple paths to determine the shortest cycles. We repeat the above steps iteratively until no more vertices remain in the graph.

Figure 9: An Example of Finding Geometric Minimal Cycle.
Input: Graph $G$ with vertex set $V$ and edge set $E$; the depth at which to stop the search, $cutoff$
Output: The set $C$ of geometric minimal cycles
while $G$ is not None do
    // Remove all vertices with degree 1
    foreach vertex $v \in V$ do
        if deg($v$) == 1 then
            $G, V, E$ = RemoveVertex($v, G, V, E$);
        end if
    end foreach
    // Identify simple paths
    foreach $v \in V$ and deg($v$) == 2 do
        $v_{1}, v_{2} \leftarrow neighbour(v)$;
        // Find simple paths from $v$ to $v_{1}$
        $P \leftarrow$ FindSimplePath($v, v_{1}, cutoff$);
        // Find geometric minimal cycles
        $minc \leftarrow None$;
        foreach $p \in P$ do
            if $v_{2} \in p$ then
                $c \leftarrow p + v$;
                if minc is None or len(c) < len(minc) then
                    $minc \leftarrow c$
                end if
            end if
        end foreach
        if minc is not None then
            $C$.append($minc$);
            $G, V, E$ = RemoveVertex($v, G, V, E$);
        end if
    end foreach
end while
Algorithm 1 Find Geometric Minimal Cycles in a Graph
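The steps of Algorithm 1 can be sketched in plain Python as follows. Two simplifications are assumptions of this sketch: the shortest cycle through a degree-two vertex $v$ is found with a breadth-first search between its two neighbours that avoids $v$ (equivalent to picking the shortest simple path that contains $v_2$), and the loop stops when a pass changes nothing, so that graphs without any vertex of degree one or two cannot loop forever. The input is assumed to be a simple undirected graph.

```python
from collections import deque


def _remove_vertex(adj, v):
    for n in adj.pop(v):
        adj[n].discard(v)


def _shortest_path(adj, src, dst, banned, cutoff):
    """Fewest-edge path from src to dst that avoids `banned`, or None."""
    parents = {src: None}
    queue = deque([(src, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        if depth >= cutoff:
            continue
        for nxt in adj[node]:
            if nxt != banned and nxt not in parents:
                parents[nxt] = node
                queue.append((nxt, depth + 1))
    return None


def find_minimal_cycles(edges, cutoff=10):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cycles = []
    while adj:
        changed = False
        # Remove dangling vertices (degree one, or isolated leftovers).
        for v in [u for u in adj if len(adj[u]) <= 1]:
            _remove_vertex(adj, v)
            changed = True
        # Close the shortest cycle through each degree-two vertex.
        for v in list(adj):
            if len(adj[v]) != 2:
                continue
            v1, v2 = adj[v]
            path = _shortest_path(adj, v1, v2, banned=v, cutoff=cutoff)
            if path is not None:
                cycles.append([v] + path)
                _remove_vertex(adj, v)
                changed = True
        if not changed:
            break      # no vertex of degree <= 2 left; stop instead of looping
    return cycles


# Usage: a 3x3 grid of intersections enclosing four city blocks.
grid_edges = []
for r in range(3):
    for c in range(3):
        if c + 1 < 3:
            grid_edges.append(((r, c), (r, c + 1)))
        if r + 1 < 3:
            grid_edges.append(((r, c), (r + 1, c)))
cycles = find_minimal_cycles(grid_edges)
```

On the grid example, each of the four returned cycles is one city block with four corner vertices, which is the kind of boundary Task IV takes as input.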

Appendix D Dataset Statistics

We provide more detailed statistics of the RoBus dataset in Tab. 5 and Fig. 10. As shown in Tab. 5, the RoBus dataset encompasses a total of 72,400 tiles. The United States contributes the largest share, with 37,429 tiles that cover 41,278 square kilometers and include 357,496 km of roads. China contributes 19,243 tiles, with 123,227 km of roads and an area coverage of 21,211 square kilometers. Europe contributes 11,633 tiles, covering 12,960 square kilometers and including 105,450 km of roads. Australia contributes 4,101 tiles, with 39,771 km of roads and an area coverage of 4,493 square kilometers. Drawing samples from different regions around the world gives the RoBus dataset substantial diversity, reflecting a wide range of geographical variations.

Table 5: Statistics of the RoBus Dataset by Country.
Region          # of tiles   Road Length ($km$)   Covering ($km^{2}$)
Australia       4,101        39,771               4,493
China           19,243       123,227              21,211
Europe          11,633       105,450              12,960
United States   37,429       357,496              41,278
Total           72,400       625,944              79,942
Figure 10: Statistics of Building Density and Traffic Convenience.

Appendix E RoBus Examples

We visualize the tif images and graphics such as road topology (the fourth column) and building vectors (the fifth column) in Fig. 11. The visualization of the tif images is structured into three columns, each highlighting a different aspect of the data. The first column displays all channels except the density channel, which gives a clear view of the spatial distribution and characteristics of the other data layers. The second column visualizes the channels that represent water bodies, greenery spaces, and density, allowing for an immediate visual assessment of environmental and urban planning elements. The third column is dedicated to the road channels: primary roads (Road-P) are colored gold, while secondary roads (Road-S) are colored silver. This color coding distinguishes the two types of roads and helps in understanding their hierarchy and connectivity within the road network.

Additionally, the original files are included in the supplementary directory ’Samples_from_RoBus_Dataset’. For visualization and further analysis, users can utilize Python scripts or GIS software such as QGIS.

Figure 11: More Examples from RoBus Dataset.