Deep learning-based extraction of Kenya’s historical road network from topographic maps


Collection of Historical Topographic Maps

The topographic maps used in this study were obtained from various sources in Kenya and the UK. All maps were created and published between the 1950s and early 1980s47,48. The majority of maps were collected in Kenya from the Survey of Kenya, the official custodian and authority for Kenyan topographic maps. Additionally, maps were obtained from the survey and urban planning departments of different county governments. A total of 449 1:50,000 scale maps and 71 1:100,000 scale maps were collected from Kenyan sources. However, 166 of the 1:50,000 maps and 94 of the 1:100,000 were only available as data frame clips with no further information on the year of publication and source. In addition, some maps were technical drawings rather than fully processed topographic maps and were of a notably different standard from the other maps. Nevertheless, these maps also showed road networks, making them suitable for the study’s purposes.

As the Survey of Kenya could not supply all the sheets needed for complete coverage of the country, several additional maps were purchased from archives in Great Britain. These British archives contain historical maps originally created and published by the Directorate of Overseas Surveys (DOS) and the War Office, General Staff, Geographical Section. A total of nine maps were purchased from the Cambridge University Library and four maps came from the Bodleian Libraries of the University of Oxford. Figure 1 provides an overview of all the acquired map sheets. No suitable maps could be obtained for an area of 31,400 km². The publication year of each map sheet is depicted in Fig. 2. None of the maps used in this study is accessible online; all are subject to the licences of the aforementioned British archives and the Survey of Kenya.

Fig. 1

Source and location of the topographic maps used in this study covering the time period between the 1950s and 1980s. Each square represents a single map sheet. Large squares correspond to maps with a scale of 1:100,000, smaller squares to maps with a scale of 1:50,000.

Fig. 2

Publication years of the topographic maps used in this study. Each square represents a single map sheet. Large squares correspond to maps with a scale of 1:100,000, smaller squares to maps with a scale of 1:50,000.

Provenance of the Historical Maps

The Survey Department established in Kenya’s colonial capital, Nairobi, in 1903, first began work to register land claims made by prospective white settlers, and to issue deeds to commercial enterprises. Later renamed as the Survey of Kenya, by the 1950s the department had a technical Training Wing and a large African Lands Division devoted to mapping consolidated plots of fragmented African rural land49. The government department still functions today at Ruaraka, and is responsible for geodetic, topographic, photogrammetric and cadastral surveying, and the publication of all official topographic and cadastral maps of Kenya. However, because of government controls, maps produced by the Survey of Kenya were difficult to obtain in Kenya until after 2000, and the public relied predominantly upon the commercially-produced Michelin Motoring Map Series 746 (and its predecessor Series 155)50.

The Survey of Kenya has historically seen its role as servicing the government, rather than the public. From the 1920s onwards, the Survey of Kenya enjoyed a good reputation for ground surveying, but the accuracy of its maps came to depend on extensive aerial survey, which began to be used in the late 1930s and became the norm after the Second World War51. The war also brought about far greater coordination of map-making across the British Empire, and led directly to the creation by the Colonial Office of the Directorate of Overseas Surveys (DOS). From 1946, this body was responsible for mapping in all British dependencies globally. The DOS contracted wide-ranging aerial surveys of each country, to provide a basis for cartography. Initially, air surveys in Kenya were undertaken by local aviation companies, such as Spartan Air Services (Eastern) Ltd, but by the end of the 1960s the job was being contracted to larger professional consultancy firms, such as Hunting Technical Services52.

The DOS operated independently, under the auspices of the Colonial (and then Foreign & Commonwealth) Office until 1984, when it merged with the British Ordnance Survey (OS). Throughout its existence, the DOS worked closely with the Survey of Kenya, producing annual reports that detail its activities in the former colony49 – these can be consulted in series OS 46 at the British National Archive, Kew53. The involvement of DOS in conducting aerial surveys continued after Kenya’s political independence in 1963. After 1984, DOS became known as the Overseas Surveys Directorate, operating within the UK’s Ordnance Survey department. It continued to function without much change until 1991, when the last significant aid-funded mapping projects in the former British colonies came to an end. At this point, the colonial ties were finally cut, and the name was changed yet again, to Ordnance Survey International (OSI), which now functioned as a commercial consultancy52.

From 1991, there was uncertainty about how, and where, the archive of the DOS and its successor should be housed. Over the next decade, many elements of the archive became scattered in libraries and other depositories around the UK – for example, at the Bodleian Library in Oxford. Only when Ordnance Survey International finally ceased to function in 2001, did responsibility for its records, and for the maintenance of the historical maps, finally pass to the UK National Archive. The collection of maps had been stored as a working collection with the Ordnance Survey in Southampton, but under Ordnance Survey International this was partially moved to the British Empire and Commonwealth Museum in Bristol. When this museum became defunct, the cartography collection eventually (2012) found a new home in the National Collection of Aerial Photographs, under the management of Historic Environment Scotland52,53.

The splintering of the UK overseas survey functions after 1991 coincided with a very difficult period of under-funding and administrative neglect for the Survey of Kenya, when the archiving of their own collection of maps fell into disarray – a situation that has only improved again in very recent years. This fragmented history of institutional change helps to explain why Kenya’s historical maps are so scattered among several incomplete collections. In a final complexity, the correspondence files relating to the creation of each series of maps that should accompany them, are stored separately at The National Archive, Kew, as part of the Overseas Development (OD) series53.

Scanning & Georeferencing

All available analogue maps were scanned into high-resolution raster data for further processing. All digitized maps were then imported into ArcGIS Pro 3.2 and manually georeferenced to the Universal Transverse Mercator (UTM) projection using Arc 1960 as the geodetic datum. As Kenya lies on both sides of the equator and spans UTM zones 36 and 37, each map was projected to one of the coordinate systems Arc_1960_UTM_Zone_36N, Arc_1960_UTM_Zone_37N, Arc_1960_UTM_Zone_36S or Arc_1960_UTM_Zone_37S, depending on its location. A Topographical Index Grid shapefile, produced by the Survey of Kenya for both the 1:50,000 and 1:100,000 scales, served as the reference grid for georeferencing. The Index Grid contains the sheet number of the respective topographic map and consists of regular quadrangles that match the coordinates of the map frame boundaries. We always used one map sheet per grid tile to avoid any overlap of map sheets. For maps that contained a coordinate grid or other coordinate information, additional reference points were set across the map to increase positional accuracy. Depending on the number of control points set, a 1st-, 2nd-, or 3rd-order polynomial transformation was applied, choosing the order that yielded the smallest residuals.
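The choice of zone and hemisphere described above can be sketched in a few lines. This is a minimal illustration rather than the authors' workflow: it assumes that the centre of each sheet is known in geographic coordinates (decimal degrees) and relies on the standard 6° UTM zone layout. The function name `arc1960_utm_name` and the example coordinates are hypothetical.

```python
def arc1960_utm_name(lon_deg, lat_deg):
    """Return the Arc 1960 UTM coordinate system name for a sheet centre.

    Assumption: lon_deg and lat_deg give the sheet centre in decimal degrees.
    """
    # Standard UTM layout: zone 1 starts at 180°W and each zone spans 6° of longitude.
    zone = int((lon_deg + 180) // 6) + 1
    # Sheets centred on or north of the equator use the northern variant.
    hemisphere = "N" if lat_deg >= 0 else "S"
    return f"Arc_1960_UTM_Zone_{zone}{hemisphere}"


# Illustrative sheet centres (approximate coordinates, not taken from the study).
for place, lon, lat in [("Nairobi", 36.82, -1.29), ("Lodwar", 35.60, 3.12)]:
    print(place, arc1960_utm_name(lon, lat))
```

A sheet that straddles the equator or the 36°E meridian would still need a manual decision, which this sketch does not attempt to make.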

Training & Classification

Several methodological steps were necessary to perform the road extraction. The overall workflow is presented in Fig. 3. In the initial preparation step, the maps were sorted into different groups according to their map style, as they differed greatly in appearance and symbology. Overall, the maps contained three different types of roads (Main Road, Secondary Road, Dry Weather Road) according to their symbology. However, since the maps were from various sources, editions and years, a road class was represented by different symbols on different maps. For example, main roads were shown as dark red lines on one map and as black lines on another map of a different edition or source. Consequently, each individual road symbol from each map group had to be trained and classified separately. In total, 20 individual classifications had to be performed to classify all different types of road symbology (Fig. 4).

Fig. 3

Schematic workflow used in this study to extract historical roads from topographic maps.

Fig. 4

Different road symbols that appeared in the topographic maps used and had to be extracted separately. Six road symbols belong to the category “Main Road with Bound Surface” in the historical maps, five symbols to the category “Secondary Road with Loose Surface” and nine symbols to the category “Dry Weather Road/Motorable Track”.

Before model training, a ground truth dataset was manually created for each road symbol in ArcGIS Pro by digitizing the roads shown in the georeferenced maps. To match the width of the roads on the map, a 20 m buffer was created around all digitized road lines. For deep learning model training, it is crucial that the buffer fully encompasses the road symbol, which requires a buffer slightly wider than the symbol itself. In our case, a buffer width of 20 m proved appropriate for all road symbols used. It was important that the digitized roads be as representative as possible, including all the different types of road representation on the maps and covering horizontal and vertical roads, very curvy and straight roads, and different types of road intersections. Following this preparation, the polygons were exported as image chips with corresponding labels using the ArcGIS Pro 3.2 tool “Export training data for Deep Learning”. This step ensured that the data adhered to the format required for subsequent deep learning model training. The tool was set to automatically produce labels and image tiles of 512 × 512 pixels with an overlap of 128 × 128 pixels. Overlapping tiles increase the number of extracted image tiles and reduce information loss during training, but higher overlaps require more computational power and extend the overall computation time. The chosen value proved to be the best compromise in our case between achieving high model accuracy and maintaining reasonable computation times. The output was exported in TIFF file format using the Metadata Format “Classified Tiles”. Depending on the available length of each road type, between 900 and 8400 image tiles were extracted as training data. The export of training data was handled identically for maps of both scales, using the same workflow and parameter values.
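To make the tile and overlap settings concrete, the sketch below enumerates tile origins for a raster of a given size. It assumes that an overlap of 128 pixels corresponds to a stride of 512 − 128 = 384 pixels between neighbouring tiles and that partial tiles at the raster edges are skipped. The export tool may treat edges differently, and `tile_origins` is a hypothetical helper, not part of ArcGIS Pro.

```python
def tile_origins(width, height, tile=512, overlap=128):
    """List the upper-left pixel coordinates of all full tiles in a raster.

    Assumption: stride = tile - overlap, and tiles that would extend
    beyond the raster are dropped.
    """
    if width < tile or height < tile:
        return []  # raster too small for a single full tile
    stride = tile - overlap
    xs = range(0, width - tile + 1, stride)
    ys = range(0, height - tile + 1, stride)
    return [(x, y) for y in ys for x in xs]


# Illustrative raster size in pixels (not a value from the study).
origins = tile_origins(1280, 896)
print(len(origins), origins[0], origins[-1])
```

Halving the stride roughly quadruples the number of tiles, which illustrates why larger overlaps lengthen export and training times.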

In the next step, the exported training data were used to train the deep learning model and perform the classification for each individual road type. A Python script automated model training and the classification of the trained road type across the large number of maps on which that road type occurred54. The ArcGIS Pro deep learning-based “Multi-Task Road Extractor” tool was used for model training with ResNet34, a convolutional neural network (CNN) architecture. The tool was set to the “hourglass” architecture, using a U-Net model for image segmentation. An optimal learning rate was calculated before model training. A maximum of 50 training epochs was set, with early stopping if model accuracy plateaued. To test the accuracy of each model, the “Multi-Task Road Extractor” tool split the exported training data into 90% training and 10% test data during each training run, and the mean Intersection over Union (mIoU) was calculated for each trained model. All trained models in this study had an mIoU of 0.992 or higher, indicating high performance. Each trained model was then used to classify its road type on all maps using the ArcGIS Pro “Classify Pixels Using Deep Learning” tool, resulting in binary classifications of all individual road types across all available maps.
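The hold-out split and the mIoU can be illustrated as follows. This is a simplified sketch and does not reproduce the internals of the “Multi-Task Road Extractor” tool: it assumes a random 90/10 split of tile identifiers and an mIoU computed as the unweighted mean of the per-class IoU over the road and background classes. The function names and the fixed seed are illustrative.

```python
import random


def split_tiles(tile_ids, test_fraction=0.1, seed=0):
    """Shuffle tile identifiers and hold out a fraction for testing."""
    ids = list(tile_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]  # (training tiles, test tiles)


def mean_iou(predicted, truth, classes=(0, 1)):
    """Unweighted mean of per-class IoU over flat sequences of pixel labels.

    Assumption: label 1 marks road pixels and label 0 marks background.
    """
    pairs = list(zip(predicted, truth))
    ious = []
    for c in classes:
        intersection = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union:  # skip a class that is absent from both inputs
            ious.append(intersection / union)
    return sum(ious) / len(ious)


train, test = split_tiles(range(900))
print(len(train), len(test))
print(mean_iou([1, 1, 0, 0], [1, 0, 0, 0]))
```

Because background pixels usually dominate a map tile, a mean over both classes tends to be higher than the IoU of the road class alone.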

Postprocessing

Subsequently, the extracted roads were converted from raster data to editable polygon vector data to allow further processing. To remove small misclassifications outside of roads and fill small holes within the classified roads, the polygons were simplified with the “Simplify Polygon” tool and then converted into a line shapefile using the “Polygon to Centerline” tool. The objective was to create a seamless dataset by merging all classified road segments and assigning the road class to each individual road as an attribute. The resulting set of roads encompassed three distinct road classes based on the road symbology in the original maps: main roads with bound surface, secondary roads with loose surface, and dry weather roads that are suitable for off-road motor vehicles under favourable weather conditions.

To improve the overall classification accuracy, several post-processing steps were conducted. These involved manually correcting any remaining misclassifications and adding any road segments that were initially missed by the classifier. Furthermore, topological checks were performed to identify over- or undershooting lines. To improve the topological integrity of the dataset, the ArcGIS Pro tools “Trim Line” and “Extend Line” were used to automatically eliminate the majority of topological issues and to create connected intersections. Finally, the dataset was manually revised to eliminate remaining issues.

Validation

To assess the accuracy of the classified roads, several evaluation metrics were computed. First, the overall accuracy was calculated from confusion matrices of the proportions of correctly and incorrectly classified road and non-road areas. In addition, the common evaluation metrics precision (often also referred to as correctness) and recall (often also referred to as completeness) were calculated. Precision is a widely used metric in machine learning evaluation that quantifies the proportion of true positive predictions out of the total number of positive predictions made by the model. Essentially, it measures the model’s ability to avoid false-positive predictions. Recall is the ratio of correctly predicted positive instances to the total number of actual positive instances. In simpler terms, recall measures how effectively the model can detect positive instances, even if it occasionally misclassifies some negative instances as positive. Both metrics can be calculated as follows:

$$\,{\rm{Precision}}=\frac{{\rm{True\; Positive}}}{{\rm{True\; Positive}}+{\rm{False\; Positive}}}$$

(1)

$$\,{\rm{Recall}}=\frac{{\rm{True\; Positive}}}{{\rm{True\; Positive}}+{\rm{False\; Negative}}}$$

(2)

The F1 score combines precision and recall and is a widely used metric for assessing the performance of classification models. It is the harmonic mean of precision and recall, calculated using the following formula:

$$\,{\rm{F1}}=2\times \frac{{\rm{Precision}}\times {\rm{Recall}}}{{\rm{Precision}}+{\rm{Recall}}}$$

(3)

Additionally, we calculated the Intersection over Union (IoU) score, which quantifies the degree of overlap between the classified road segments and the ground truth. It divides the overlapping area between the classified and ground truth datasets by the total area covered by both datasets and is calculated using the following formula:

$$\,{\rm{IoU}}=\frac{{\rm{Area\; of\; Intersection}}}{{\rm{Area\; of\; Union}}}$$

(4)
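Equations (1) to (4), together with the overall accuracy, can be evaluated directly from the four cells of a confusion matrix. The sketch below assumes that the cells are expressed as pixel or area counts for the road class; under that assumption the area of intersection is the true positive area and the area of union is the sum of true positives, false positives and false negatives. The function name and the example counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, F1 and IoU from confusion counts."""
    precision = tp / (tp + fp)                # Eq. (1)
    recall = tp / (tp + fn)                   # Eq. (2)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (3)
    iou = tp / (tp + fp + fn)                 # Eq. (4): intersection / union
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "iou": iou,
    }


# Illustrative counts only, not results from the study.
for name, value in classification_metrics(tp=80, fp=20, fn=10, tn=890).items():
    print(f"{name}: {value:.3f}")
```

Since road pixels make up a small share of a map sheet, the overall accuracy is dominated by the true negatives, which is one reason to report precision, recall, F1 and IoU alongside it.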

All described metrics yield values between 0 and 1. A value of 1 indicates perfect accuracy, while a value of 0 signifies an overall poor classification result. These metrics were calculated for the classified roads without further postprocessing on seven representative map sheets that vary considerably in map appearance and depicted landscape. As a ground truth dataset, the road network of these map sheets was drawn manually to match exactly the roads displayed on the map. This ground truth road network was then compared with the uncorrected classification result, which was obtained from the classification process without any subsequent manual improvements. Since both datasets were created from the same maps, these accuracy values evaluate the classification accuracy only, without considering the positional accuracy of the extracted roads. To account for the average width of roads shown on the maps, both datasets were compared using a buffer of 20 m. The selected map sheets and their respective locations are depicted in Fig. 5.

Fig. 5

Overview and location of the seven maps (labelled with their sheet number) and the created ground truth data used for accuracy analysis.

Additionally, we assessed the positional accuracy of the extracted road dataset to account for positional inaccuracies introduced by the maps themselves or during the georeferencing process. The assessment was conducted by measuring the spatial deviations of road intersections that already existed in the historical dataset and appear to be largely unchanged in a recent road dataset of Kenya. As reference data, we used a current road dataset provided by the Kenya Roads Board (KRB)55. This dataset is not openly accessible, but was provided to us by the KRB on request. To assess the spatial accuracy, 100 reference points were identified on the maps with a scale of 1:50,000 and 44 points on the 1:100,000 scale maps. Based on the deviations between the intersections in both datasets at these points, an RMSE and an MAD were calculated for each map scale.
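The two positional error measures can be sketched as follows. The text does not spell out the abbreviation MAD, so this example assumes it denotes the mean absolute deviation, taken here as the mean offset distance between matched intersections; if the median absolute deviation is meant, the mean would have to be replaced accordingly. It is further assumed that both datasets share one projected coordinate system with units in metres. The function name and the coordinates are hypothetical.

```python
import math


def positional_errors(point_pairs):
    """Return (RMSE, MAD) of the offsets between matched intersections.

    Assumption: each pair is ((x_hist, y_hist), (x_ref, y_ref)) in metres.
    """
    distances = [
        math.hypot(ref[0] - hist[0], ref[1] - hist[1])
        for hist, ref in point_pairs
    ]
    n = len(distances)
    rmse = math.sqrt(sum(d * d for d in distances) / n)
    mad = sum(distances) / n  # mean absolute offset (assumed meaning of MAD)
    return rmse, mad


# Illustrative coordinates only, not measurements from the study.
pairs = [((0.0, 0.0), (3.0, 4.0)), ((10.0, 10.0), (10.0, 11.0))]
rmse, mad = positional_errors(pairs)
print(f"RMSE = {rmse:.2f} m, MAD = {mad:.2f} m")
```

The RMSE weights large offsets more heavily than the mean offset does, so reporting both gives an impression of how strongly a few poorly matched intersections influence the result.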
