Prepare for deep learning analysis

In the first part of this tutorial, you will set up the ArcGIS Pro project, choose a deep learning pretrained model, prepare imagery to better match the model, and understand the need for transfer learning.

Set up the project

To get started, you'll download a project that contains all the data for this tutorial and open it in ArcGIS Pro. You’ll then add imagery to the project map.

  1. Download the Seattle_Building_Detection.zip file and locate the downloaded file on your computer.
    Note:

    Most web browsers download files to your computer's Downloads folder by default.

  2. Right-click the Seattle_Building_Detection.zip file and extract it to a location on your computer, such as a folder on your C: drive.
  3. Open the extracted Seattle_Building_Detection folder, and double-click Seattle_Building_Detection.aprx to open the project in ArcGIS Pro.

    .aprx file in project folder

  4. If prompted, sign in to your ArcGIS organizational account or to ArcGIS Enterprise using a named user account.
    Note:

    If you don't have access to ArcGIS Pro or an ArcGIS organizational account, see options for software access.

    The project opens.

    Initial view

    The map contains only the default topographic basemap. In this workflow, you’ll use aerial imagery to detect buildings. You will now add that imagery to the map.

  5. On the ribbon, click the View tab. In the Windows group, click Catalog Pane.

    Catalog Pane button

    The Catalog pane appears.

  6. In the Catalog pane, expand Folders, Seattle_Building_Detection, and Imagery_data.

    Folders, Seattle_Building_Detection, and Imagery_data expanded

  7. Right-click Seattle_imagery.jp2 and choose Add To Current Map.

    Add To Current Map menu option

  8. If prompted to calculate statistics, click Yes.

    Statistics are required to perform certain tasks on the imagery, such as rendering it with a stretch. The imagery appears on the map. It represents an area of Seattle.

    Seattle_imagery.jp2 on the map

    Note:

    This aerial imagery comes from the U.S. National Agriculture Imagery Program (NAIP) website. NAIP imagery covering the entire United States can be downloaded from the USGS Earth Explorer website.

  9. Zoom in and pan to examine the imagery. Observe that there are many buildings in this image.

Choose a pretrained model and inspect it

You want to use deep learning to extract buildings from the aerial imagery. If you don't already have a deep learning model available, this would normally require training a model from scratch, feeding it large numbers of examples to show it what a building is. High-performing models may need to see tens of thousands of examples. An alternative is to use a model that has already been trained for you. You will retrieve such a model and learn about its specifications.

Note:

Using the deep learning tools in ArcGIS Pro requires that you have the correct deep learning libraries installed on your computer. If you do not have these files installed, save your project, close ArcGIS Pro, and follow the steps in the Get ready for deep learning in ArcGIS Pro instructions. These instructions also explain how to check whether your computer hardware and software can run deep learning workflows and include other useful tips. Once done, you can reopen your project and continue with the tutorial.

  1. Go to the ArcGIS Living Atlas of the World website.
  2. In the search box, type Pretrained model and press Enter.

    Pretrained model in the search box

  3. Browse the list of results to see the more than 50 pretrained models available.

    Pretrained model result list

  4. In the search box, type Building Footprint Extraction and press Enter.

    Building Footprint Extraction in the search box

    The list of results contains pretrained deep learning models for different regions of the world. Since your area of interest is in the United States, you will choose the model trained on that area.

    Building Footprint Extraction result list

  5. In the list of results, click Building Footprint Extraction – USA.

    Building Footprint Extraction – USA result

    The description page for the model appears. It contains a lot of relevant information about the model. The most important thing is to understand what type of input the model expects. If your input data is not similar enough to the type of data the model was trained on, the model will not perform well.

  6. Take some time to read the content of that page. Pay particular attention to the section shown in the following example image:

    Content of the item detail page

    You learn several facts about the model:

    • Input—As input, the model expects 8-bit, 3-band high-resolution (10-40 cm) imagery. To know whether your data matches these specifications, you will need to investigate further; you will do that later in the tutorial.
    • Output—The model will produce a feature class containing building footprints. Getting building footprint polygons as an output is exactly what you are looking for.
    • Applicable geographies—This model should work well in the United States. This is perfect since your area of interest is in the United States.
    • Model architecture—The model uses the MaskRCNN model architecture. You should make a note of that information, as you will need it later in the workflow.

    Since the model seems quite promising for your project, you will download it.

  7. Under Overview, click Download.

    Download button

    After a few moments, the download is complete.

  8. Locate the downloaded file, usa_building_footprints.dlpk, on your computer.
    Tip:

    Most web browsers download files to your computer's Downloads folder by default.

  9. Create a folder named Pretrained_model in your Seattle_Building_Detection folder.

    Pretrained_model folder

  10. Move the usa_building_footprints.dlpk model file from your download location to the Pretrained_model folder.

Examine imagery properties

You will now investigate how well your data matches the ideal input: 8-bit, 3-band, high-resolution (10-40 cm) imagery.

  1. Go back to your Seattle_Building_Detection project in ArcGIS Pro.
  2. In the Contents pane, right-click Seattle_imagery.jp2 and choose Properties.

    Properties menu option

  3. In the Layer Properties window, click Source and expand Raster Information.

    Raster Information section

  4. Find the Number of Bands field.

    Number of Bands field

    Its value is 4. The NAIP program collects multispectral imagery composed of four spectral bands: red, green, blue, and near-infrared. The near-infrared band is often used to assess vegetation health. The model, however, expects three bands (red, green, and blue). You will need to remedy this difference.

  5. Find the Cell Size X and Cell Size Y fields.

    Cell Size X and Cell Size Y fields

    The value is 1 in both cases. This means that each cell (or pixel) in the imagery measures 1 by 1 meter. This NAIP image was indeed captured at a 1-meter resolution, which is lower than the 10-40 cm resolution recommended by the model. You will also need to remedy this issue.

  6. Find the Pixel Depth field.

    Pixel Depth field

    Its value is 8 Bit, which matches the 8-bit depth expected by the model.

  7. Click OK to close the Layer Properties window.

    You will now look at another way to visualize the number of bands.

  8. In the Contents pane, right-click Seattle_imagery.jp2, and choose Symbology.

    Symbology menu option

  9. In the Symbology pane, for Red, click Band_1 to expand the drop-down list.

    Value drop-down list for Red

    Four bands are listed. When viewing a multispectral image, only three bands can be displayed at a time, through the red, green, and blue channels; the three selected bands are combined into an RGB composite. However, you can see that four bands are present in the image and can be used for various analysis purposes.

  10. Close the Symbology pane.

    Close button

You have found that there is a mismatch on two criteria between your imagery and the pretrained model’s expectations: the number of bands and the resolution. You will learn how to remedy these two issues later in this workflow.
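
If you prefer to check these properties with Python, the following minimal sketch reads the same information using the arcpy.Raster object in the ArcGIS Pro Python environment. The file path is hypothetical; adjust it to where you extracted the tutorial data.

```python
import arcpy

# Hypothetical path; adjust to where you extracted the tutorial data.
imagery = r"C:\Seattle_Building_Detection\Imagery_data\Seattle_imagery.jp2"

ras = arcpy.Raster(imagery)
print("Number of bands:", ras.bandCount)                            # 4 for this NAIP image
print("Cell size (X, Y):", ras.meanCellWidth, ras.meanCellHeight)   # 1 x 1 meter
print("Pixel type:", ras.pixelType)                                 # U8 corresponds to 8-bit unsigned
```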

Select relevant imagery bands

You’ll now fix the band mismatch. Your imagery has four spectral bands:

  • Band 1—red
  • Band 2—green
  • Band 3—blue
  • Band 4—near infrared

The model, however, expects a three-band input (red, green, and blue). To remedy that, you need to produce a new layer that contains only the first three bands of the NAIP imagery so that it better matches the model's expectations. This is an important step; if skipped, the model will underperform.

Note:

It is crucial to know the precise band order in your imagery. For instance, some other types of imagery might have their bands in a different order: band 1—blue, band 2—green, and band 3—red. You can find that information either in the properties of your imagery or in its documentation.

You will produce the new three-band layer using a raster function.

  1. On the ribbon, on the Imagery tab, in the Analysis group, click the Raster Functions button.

    Raster Functions button

  2. In the Raster Functions pane, in the search box, type Extract Bands. Under Data Management, click Extract Bands.

    Extract Bands raster function

  3. Set the following Extract Bands parameter values:
    • For Raster, choose Seattle_imagery.jp2.
    • For Combination, verify that the value is 1 2 3, referencing bands 1 (red), 2 (green), and 3 (blue).
    • For Missing Band Action, choose Fail.

    Missing Band Action specifies what happens if one of the listed bands is not available. Fail means that the raster function will stop and return an error. You are choosing this option because all three bands must be present to complete this tutorial successfully.

    Extract Bands parameter values

  4. Click Create new layer.

    A new layer, named Extract Bands_Seattle_imagery.jp2, appears in the Contents pane. Layers created by raster functions are computed dynamically and not saved on disk. In this case, you want to persist the resulting layer as a TIFF file on your computer. You will do that with Export Raster.

  5. Right-click Extract Bands_Seattle_imagery.jp2, and choose Data and Export Raster.

    Export Raster menu option

  6. In the Export Raster pane, for Output Raster Dataset, click the Browse button.

    Browse button

  7. In the Output Location window, browse to Folders > Seattle_Building_Detection > Imagery_data, for Name, type Seattle_RGB.tif, and click Save.

    Output Location window

  8. In the Export Raster pane, accept all other default values and click Export.

    Export button

    Note:

    If your imagery is 16 bit, this Export Raster step would be a good time to convert to the 8-bit depth expected by the model. For Pixel Type, choose 8 Bit Unsigned and check the Scale Pixel Value box. Scale Pixel Value ensures that the values are rescaled to the 8-bit range (instead of the high values being dropped). For NoData value, enter the NoData value of your original image, for instance 0.

    To find that NoData value, in the Contents pane, right-click the original image, choose Properties, and browse to Source > Raster Information > NoData Value.

    The new Seattle_RGB.tif layer appears in the Contents pane.

  9. Close the Export Raster pane.

    You will now verify the number of bands.

  10. In the Contents pane, right-click Seattle_RGB.tif, and choose Properties.
  11. In the Layer Properties window, click Source and expand Raster Information.
  12. Find the Number of Bands field.

    Number of Bands field

    The field value is 3, confirming that the layer now has three bands, just as the pretrained model expects.

  13. Close the Layer Properties window.

    You will now remove the imagery layers that you won’t need in the rest of the workflow.

  14. In the Contents pane, right-click Extract Bands_Seattle_imagery.jp2 and choose Remove.
  15. Similarly, remove Seattle_imagery.jp2.

    You will save your project.

  16. On the Quick Access Toolbar, click the Save button.

    Save button

You now have a three-band imagery layer, as the pretrained model expects.
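
The same band extraction and export can also be scripted. The sketch below is a minimal example that uses the Extract Bands raster function through arcpy.ia (which requires the Image Analyst extension) and saves the result as a TIFF, like the Export Raster steps above. The paths are hypothetical; adjust them to your project folder.

```python
import arcpy

arcpy.CheckOutExtension("ImageAnalyst")   # Extract Bands is an Image Analyst raster function
arcpy.env.overwriteOutput = True

# Hypothetical paths; adjust them to where you extracted the tutorial data.
in_image = r"C:\Seattle_Building_Detection\Imagery_data\Seattle_imagery.jp2"
out_tif = r"C:\Seattle_Building_Detection\Imagery_data\Seattle_RGB.tif"

# Keep only bands 1-3 (red, green, blue); band 4 (near infrared) is dropped.
rgb = arcpy.ia.ExtractBand(in_image, [1, 2, 3])

# Persist the dynamically computed result as a TIFF, as Export Raster does.
rgb.save(out_tif)
```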

Understand the need for transfer learning

You must now address the resolution mismatch: the model expects higher-resolution (10-40 cm) imagery, while the NAIP imagery was captured at a lower 1-meter resolution. If you were to apply the Building Footprint Extraction – USA pretrained model to the Seattle_RGB.tif layer directly, you would get poor results, as you can see in the following example image:

Result of pretrained model applied directly

In that image, the buildings detected are shown in pink. Because of the resolution mismatch, the model could detect the larger buildings, but struggled to identify any of the smaller ones.

Note:

For an example of a workflow in which a pretrained model is used directly and successfully, see the Detect objects with a deep learning pretrained model tutorial.

One approach to remedy this issue is to use transfer learning. Transfer learning is a technique in machine learning in which knowledge learned from a task is reused to boost performance on a related task. Here, the original task was to detect buildings in 10-40 cm resolution imagery, and the new task is to detect buildings in 1-meter resolution imagery.

Note:

Transfer learning can be used for reasons other than an imagery resolution mismatch. For instance, starting from a model trained to detect buildings in a specific country, you could use transfer learning to have the model learn to detect buildings in another country.

A major advantage of transfer learning is that it requires a relatively small amount of training data and short training time compared to what would be needed to train a model from scratch.

Note:

There is a limit to what transfer learning can do if the mismatch between your imagery and the expected input is too extreme. For example, if you had 30-meter resolution satellite imagery, where you can barely see the smaller buildings, it is unrealistic to think that the model could be fine-tuned to be successful on that imagery. The more dissimilar the new task is from the original one, the less effective transfer learning will be.

Caution:

Transfer learning doesn’t work on all deep learning pretrained models. For instance, models relying on SAM and DeepForest don’t support transfer learning. You can review the description of the pretrained model on the ArcGIS Living Atlas website to see whether it relies on SAM or DeepForest.

In the rest of the tutorial, you will learn how to perform transfer learning to fine-tune the pretrained model to perform better on your data.


Prepare training samples for transfer learning

To perform transfer learning, you first need to produce training examples to show the model what a building looks like in your data. If you were training a model from scratch, you would need tens of thousands of building samples. Thankfully, with transfer learning, you only need a few hundred. In this part of the tutorial, you will learn to produce the training samples. First, you’ll create an empty feature class in which to store the samples. Then you’ll draw polygons representing buildings and add them to the feature class. Finally, you’ll export the feature class and the imagery into training chips used for transfer learning.

Create a feature class

First, you’ll create a feature class.

  1. On the ribbon, on the View tab, in the Windows group, click Geoprocessing.

    Geoprocessing button

    The Geoprocessing pane appears.

  2. In the Geoprocessing pane, in the search box, type Create feature class. In the list of results, click the Create Feature Class tool to open it.

    Create Feature Class tool

  3. Set the following parameter values:
    • For Feature Class Name, type Training_examples.
    • For Geometry Type, verify that Polygon is selected.
    • For Coordinate System, choose Seattle_RGB.tif.

    Create Feature Class parameters

  4. Accept all the other default values and click Run.

    In the Contents pane, the new Training_examples feature class appears. It is currently empty.
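
As a point of reference, the same feature class could be created with a short Python snippet. The geodatabase name and paths below are assumptions based on the tutorial's folder structure; adjust them to your own project.

```python
import arcpy

project_folder = r"C:\Seattle_Building_Detection"                  # adjust to your location
gdb = project_folder + r"\Seattle_Building_Detection.gdb"          # assumed default project geodatabase
rgb_tif = project_folder + r"\Imagery_data\Seattle_RGB.tif"

# Create an empty polygon feature class using the imagery's coordinate system.
arcpy.management.CreateFeatureclass(
    out_path=gdb,
    out_name="Training_examples",
    geometry_type="POLYGON",
    spatial_reference=arcpy.Describe(rgb_tif).spatialReference,
)
```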

Draw training examples

You will now trace building footprints that will be saved as polygon features in the Training_examples layer.

  1. On the ribbon, on the Edit tab, in the Features group, click Create.

    Create button

    The Create Features pane appears.

  2. In the Create Features pane, click Training_examples.
  3. Click the Polygon button.

    Polygon button

    The construction toolbar appears on the map. By default, it is set to the Line mode, which draws straight lines.

    Line button

  4. On the construction toolbar, click the Right Angle Line button.

    Right Angle Line button

  5. Note that the Right Angle Line mode constrains all lines to be straight and all angles to be right angles. This is helpful when drawing building footprints, since most buildings have 90-degree corners. You can switch between this mode and the Line mode as needed during the drawing process.
  6. On the ribbon, on the Map tab, in the Navigate group, click Bookmarks, and select Labeling extent.

    Labeling extent bookmark

    This is the area where you’ll start drawing polygons to delineate buildings. This process is also called labeling, since you are telling the model where the objects of interest are in the image.

    Extent of the Labeling extent bookmark

    Note:

    When deciding where to create the training examples in your image, choose an area that has typical buildings for your geographic location.

  7. On the map, choose a specific building, and click one of its corners (or vertices).
  8. Click each of its corners clockwise.
  9. On the last corner, double-click to complete the polygon.

    Tracing a polygon

    Note:

    The color for the feature class (here light green) is assigned at random and might be different in your project.

  10. Similarly, create two or three more polygons in the same area.
    Tip:

    If you don’t like a polygon you created, you can delete it. On the ribbon, on the Edit tab, in the Selection group, click Select. On the map, click the polygon. On the Edit tab, in the Features group, click Delete.

    Select and Delete buttons

    You’ll save the polygon features to the feature class.

  11. On the construction toolbar, click the Finish button.

    Finish button

  12. On the ribbon, on the Edit tab, in the Manage Edits group, click Save.

    Save button

  13. Close the Create Features pane.

    In a real-life project, you would need to delineate 200 or 300 more buildings. However, to keep this tutorial brief, you will use a set of about 200 training samples that were prepared for you.

  14. At the bottom of the Geoprocessing pane, click the Catalog tab to switch to the Catalog pane.

    Geoprocessing tab

  15. In the Catalog pane, expand Databases and Output_provided.gdb.
  16. Right-click Training_examples_larger_set and choose Add To Current Map.

    Add To Current Map menu option

    The set of training samples appears.

    Set of training samples

    Observe that a rectangular extent was chosen and polygons were created for every building in the extent. You’ll remove the Training_examples layer, as you no longer need it.

  17. In the Contents pane, right-click the Training_examples layer and choose Remove.

    Remove menu option

  18. Press Ctrl+S to save the project.

You now have a layer containing over 200 training samples.

Add a class field

Now that you have traced building footprint polygons, you must designate them all as belonging to a specific class. In some workflows, labeled objects might belong to different classes (or categories), such as building footprints, trees, or cars. In this tutorial, there is only one class: building footprints. You will add a Class field to the Training_examples_larger_set layer and populate it.

  1. In the Contents pane, right-click the Training_examples_larger_set layer and choose Attribute Table.

    Attribute Table menu option

    The attribute table for the layer appears, showing information about each polygon.

  2. In the Training_examples_larger_set attribute table, click Add.

    Add button

  3. On the Fields: Training_examples_larger_set tab, in the last row of the table, enter the following information:
    • For Field Name, type Class.
    • For Data Type, click Long and change it to Short.

    The Short data type holds integer values.

    The new Class field

  4. On the ribbon, on the Fields tab, in the Changes group, click Save.

    Save button

  5. Close the Fields: Training_examples_larger_set window.

    Now that you have created the Class field, you will populate it with a numeric value. You arbitrarily decide that the building footprint class will be represented by the numeric value of 1.

  6. In the Training_examples_larger_set attribute table, click Calculate.

    Calculate button

  7. In the Calculate Field window, set the following parameter values:
    • For Field Name, choose Class.
    • For Class =, type 1.

    Calculate Field window

  8. Accept all other default values and click OK.
  9. In the Class column, verify that the value of 1 has been assigned to each polygon feature.

    Class column populated

    Thanks to the Class field, the model will know that all the training examples are the same kind of object: building footprints represented by 1’s.

  10. Close the Training_examples_larger_set attribute table.
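
If you were scripting this step instead, adding and populating the Class field could look like the minimal sketch below. The path to the geodatabase is an assumption based on the tutorial's folder structure; adjust it to your project.

```python
import arcpy

# Assumed path; adjust to where Output_provided.gdb is stored in your project folder.
samples = r"C:\Seattle_Building_Detection\Output_provided.gdb\Training_examples_larger_set"

# Add a short-integer Class field and set it to 1 for every building polygon.
arcpy.management.AddField(samples, field_name="Class", field_type="SHORT")
arcpy.management.CalculateField(samples, field="Class", expression="1", expression_type="PYTHON3")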

Learn about training chips and clip the imagery

A deep learning model can’t train over a large area in one pass; it can only handle small cutouts of the image, known as chips. A chip is made of an image tile and a corresponding label tile that shows where the objects (in this case, buildings) are located. These chips are fed to the model during the transfer learning training process.

Example of a training chip
A training chip is shown, with its image tile (left) and its corresponding label tile (right).

You will use the Seattle_RGB.tif imagery and the Training_examples_larger_set layer to generate training chips. One important point is to avoid generating chips that contain unlabeled buildings. Such chips would be the equivalent of showing buildings to the model while telling it that they are not buildings at all. This would confuse the model and hurt its performance. To prevent this, you will create a clip of the imagery that is limited to the extent where the training samples are located.

Example of a training chip where some buildings have not been labeled
An example shows a chip where some buildings have not been labeled. Such chips must be avoided.

  1. At the bottom of the Catalog pane, click Geoprocessing.

    Geoprocessing tab

  2. In the Geoprocessing pane, click the Back button.

    Back button

  3. Search for and open the Clip Raster tool.

    Clip Raster tool

  4. Set the following Clip Raster parameter values:
    • For Input Raster, choose Seattle_RGB.tif.
    • For Output Extent, choose Training_examples_larger_set.
    • For Output Raster Dataset, click the Browse button. In the Output Raster Dataset window, browse to Folders > Seattle_Building_Detection > Imagery_data, for Name, type Seattle_RGB_clip.tif and click Save.

    Clip Raster parameters

  5. Click Run.

    In the Contents pane, the Seattle_RGB_clip.tif layer appears.

  6. In the Contents pane, click the box next to Seattle_RGB.tif to turn the layer off.

    Seattle_RGB.tif turned off

    On the map, you now see only the clipped layer and the training samples. All the buildings that appear in the imagery have a corresponding building polygon.

    The clipped layer and training samples on the map
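
For reference, the clip can also be performed with arcpy. The sketch below derives the clipping rectangle from the extent of the training samples, matching the Output Extent choice above; the paths are hypothetical.

```python
import arcpy

project = r"C:\Seattle_Building_Detection"                                  # adjust to your location
rgb_tif = project + r"\Imagery_data\Seattle_RGB.tif"
samples = project + r"\Output_provided.gdb\Training_examples_larger_set"
out_clip = project + r"\Imagery_data\Seattle_RGB_clip.tif"

# Use the extent of the labeled samples so no unlabeled buildings end up in the chips.
ext = arcpy.Describe(samples).extent
rectangle = f"{ext.XMin} {ext.YMin} {ext.XMax} {ext.YMax}"

arcpy.management.Clip(rgb_tif, rectangle, out_clip)
```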

Generate training chips

You will now generate the training chips. First, you’ll create a folder in which to store the data elements related to the transfer learning process.

  1. Click the Catalog tab to switch panes.
  2. If necessary, expand Folders and Seattle_Building_Detection.
  3. Right-click Seattle_Building_Detection, point to New, and choose Folder.

    Folder menu option

  4. For the New Folder name, type Transfer_learning_data and press Enter.

    New Transfer_learning_data folder

  5. Click the Geoprocessing tab to switch panes.
  6. In the Geoprocessing pane, click the Back button.
  7. Search for and open the Export Training Data For Deep Learning tool.

    Export Training Data For Deep Learning tool

  8. Set the following parameter values for the Export Training Data For Deep Learning tool:
    • For Input Raster, choose Seattle_RGB_clip.tif.
    • For Output Folder, click the Browse button. In the Output Folder window, browse to Folders > Seattle_Building_Detection > Transfer_learning_data. For Name, type Training_chips, and click OK.
    • For Input Feature Class, choose Training_examples_larger_set.

    The chips generated from the clipped imagery and training examples will be stored in a folder named Training_chips.

    Export Training Data For Deep Learning parameters

  9. For Class Value Field, choose Class.

    As you defined it earlier, the Class field specifies which class each object belongs to (in this case, all objects belong to class 1, representing building footprints).

    Class Value Field parameter

  10. For Tile Size X and Tile Size Y, verify that the value is 256.

    These parameters decide the size of the chip in the X and Y directions (in pixels). In this case, the default value of 256 is a good choice.

    Note:

    You want to make your training chips as similar as possible to the chips that were used to train the original model. The original model was trained on 512 x 512 chips produced from 10-40 cm resolution data. Your NAIP imagery is 1-meter resolution. A 256 x 256 pixel chip at that resolution covers roughly the same ground area as a 512 x 512 chip at 40 cm resolution, so 256 x 256 is a good chip size to choose.

    One way to know the chip size that was originally used in the pretrained model is to look inside the dlpk package. In Microsoft File Explorer, make a copy of the usa_building_footprints.dlpk file to a separate folder and change its extension from .dlpk to .zip. Right-click the .zip file and extract it. Among the extracted files, locate usa_building_footprints.emd and change its extension to .txt. Open usa_building_footprints.txt in a text editor, and look for the lines "ImageHeight" and "ImageWidth".

    ImageHeight and ImageWidth parameters

  11. For Stride X and Stride Y, type 64.

    These parameters control the distance to move, in pixels, in the X and Y directions when creating the next image chip. The best value depends on how much training data you have: a smaller stride maximizes the number of chips generated. You can experiment with this value; for this tutorial, a value of 64 was found to work well.

  12. For Metadata Format, choose RCNN Masks.

    Different deep learning model types require different metadata formats for the chips. Earlier in the workflow, you noted that the pretrained model was based on the MaskRCNN architecture. Here you must choose the value corresponding to that model.

    Tile Size X and Tile Size Y and more parameters

    Tip:

    To learn more about any of the tool’s parameters, point to the parameter and click the information button next to it.

    Information button

  13. Accept all other default values and click Run.

    After a few moments, the process completes.
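
The same export can also be run with the corresponding arcpy.ia geoprocessing function. The sketch below is a minimal example using the parameter values from this tutorial; the paths are hypothetical, the keyword names follow the documented tool parameters, and the Image Analyst extension is assumed to be licensed.

```python
import arcpy

arcpy.CheckOutExtension("ImageAnalyst")

project = r"C:\Seattle_Building_Detection"   # adjust to your location
arcpy.ia.ExportTrainingDataForDeepLearning(
    in_raster=project + r"\Imagery_data\Seattle_RGB_clip.tif",
    out_folder=project + r"\Transfer_learning_data\Training_chips",
    in_class_data=project + r"\Output_provided.gdb\Training_examples_larger_set",
    image_chip_format="TIFF",
    tile_size_x=256, tile_size_y=256,     # about 256 m per chip at 1-meter resolution
    stride_x=64, stride_y=64,             # a small stride maximizes the number of chips
    metadata_format="RCNN_Masks",         # matches the MaskRCNN architecture
    class_value_field="Class",
)
```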

Examine training chips

You will examine some of the chips generated.

  1. In the Catalog pane, expand Folders, Seattle_Building_Detection, Transfer_learning_data, and Training_chips.
  2. Notice that the image tiles are stored in the images folder and the label tiles in the labels folder.

    Images and labels folders

  3. Expand the images folder, right-click the first image, 000000000000.tif, and choose Add To Current Map. If you are prompted to calculate statistics, click No.

    000000000000.tif file in the images folder

  4. In the Contents pane, turn off Training_examples_larger_set and Seattle_RGB_clip.tif to better see the tile.

    Example of a tile on the map

  5. In the Catalog pane, collapse the images folder, expand the labels and 1 folders, and add the first label tile, 000000000000.tif, to the map. If you are prompted to calculate statistics, click No.

    000000000000.tif file in the labels folder

    Note:

    Image and label pairs can be recognized by their identical names.

  6. In the Contents pane, turn the label tile on and off to reveal the image tile underneath.

    Example of a training chip

  7. Click some of the label tile pixels to view their values in the informational pop-up.

    Pop-up showing the value 28

    Note:

    On the label tile, the pixels that don’t represent a building have the value 0. All pixels that represent a building have a value greater than 0. The specific values come from the object IDs of the original building polygons, such as 28 on the previous example image.

  8. Optionally, add more image and label tile pairs to the map and examine them.
  9. When done, remove all the tiles from the Contents pane and turn the Training_examples_larger_set and Seattle_RGB.tif layers back on.

    Training_examples_larger_set and Seattle_RGB.tif layers turned back on

  10. In the Catalog pane, collapse the Training_chips folder.

    Training_chips folder collapsed

  11. Press Ctrl+S to save your project.

You generated training chips and you are now ready to start the transfer learning process.
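
If you want to inspect the label values programmatically rather than by clicking pixels, a short sketch like the following reads a label tile into a NumPy array and lists its unique values (0 for background, object IDs for buildings). The tile path is hypothetical; adjust it to your Training_chips folder.

```python
import arcpy
import numpy as np

# Hypothetical path to one label tile; adjust to your Training_chips folder.
label_tile = r"C:\Seattle_Building_Detection\Transfer_learning_data\Training_chips\labels\1\000000000000.tif"

# 0 = background, values greater than 0 = object IDs of the source building polygons.
arr = arcpy.RasterToNumPyArray(label_tile)
print("Unique pixel values:", np.unique(arr))
```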


Conduct transfer learning and extract buildings

You will now conduct transfer learning. You’ll use the chips you generated to further train the usa_building_footprints.dlpk pretrained model. You’ll then apply the fine-tuned model to your Seattle imagery and observe that it now performs much better.

Fine-tune the model

First, you’ll use the Train Deep Learning Model tool to fine-tune the model.

  1. Switch to the Geoprocessing pane and click the Back button.
  2. In the Geoprocessing pane, search for and open the Train Deep Learning Model tool.

    Train Deep Learning Model tool

  3. Set the following parameter values for the Train Deep Learning Model tool:
    • For Input Training Data, click the Browse button. Browse to Folders > Seattle_Building_Detection > Transfer_learning_data. Select Training_chips and click OK.
    • For Output Model, click the Browse button. Browse to Folders > Seattle_Building_Detection > Transfer_learning_data. Type Seattle_1m_Building_Footprints_model and click OK.

    Train Deep Learning Model parameters

    Seattle_1m_Building_Footprints_model will be the name of the new fine-tuned model resulting from the transfer learning process.

    Tip:

    It is easier to remember what model was trained on which data if you keep each model and its corresponding training chips in the same folder.

  4. Expand the Advanced section and set the following parameter values:
    • For Pre-trained Model, click the Browse button. Browse to the folder where you saved the usa_building_footprints.dlpk pretrained model, select it, and click OK.
    • Verify that the Freeze Model box is checked.

    Advanced section

    The Freeze Model option ensures that only the final layer of the model will be impacted by the new training data, while its core layers remain unchanged. This setting is chosen in many transfer learning cases, as it avoids the risk of the model unlearning its core knowledge.

    Note:

    If you now see an error indicator next to Input Training Data, you do not have the correct version of the Deep Learning Libraries installed. Press Ctrl+S to save your project, close ArcGIS Pro, and follow the instructions to install the deep learning framework for ArcGIS. If you have installed the Deep Learning Libraries before, follow the instructions listed under Upgrading From a Previous Version. When the installation is complete, you can reopen your ArcGIS Pro project and continue with the tutorial.

  5. Expand the Model Parameters section and verify that the Batch Size is set to 4.

    Model Parameters section

    Tip:

    To learn more about any of the tool’s parameters, click the information button next to it.

  6. In the Geoprocessing pane, click the Environments tab. For Processor Type, choose GPU.

    Environments tab

    Note:

    This tutorial assumes that your computer has GPU capabilities. If you don't have a GPU, you can still do the process with your CPU, but it will take longer to process the data. In that latter case, choose the CPU option.

  7. Accept all other default values and click Run.

    The process might take 10 minutes or more to run.

    Tip:

    If you get an out of memory error, it may be because your computer doesn’t have enough memory to process four tiles at a time. Try decreasing the Batch Size value from 4 to 2 or 1. Decreasing this value will not affect the quality of the model, only the efficiency of the model’s training process.

You now have an enhanced model, Seattle_1m_Building_Footprints_model, that is fine-tuned to perform better on your data.
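
For reference, the same training run could be launched from Python with the arcpy.ia equivalent of the tool. The sketch below mirrors the settings used above; the paths are hypothetical, the keyword names follow the documented tool parameters, and the Image Analyst extension is assumed to be licensed.

```python
import arcpy

arcpy.CheckOutExtension("ImageAnalyst")
arcpy.env.processorType = "GPU"   # use "CPU" if your computer has no GPU

project = r"C:\Seattle_Building_Detection"   # adjust to your location
arcpy.ia.TrainDeepLearningModel(
    in_folder=project + r"\Transfer_learning_data\Training_chips",
    out_folder=project + r"\Transfer_learning_data\Seattle_1m_Building_Footprints_model",
    model_type="MASKRCNN",
    batch_size=4,                      # lower to 2 or 1 if you run out of GPU memory
    pretrained_model=project + r"\Pretrained_model\usa_building_footprints.dlpk",
    freeze="FREEZE_MODEL",             # keep the core layers unchanged; train only the final layers
)
```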

Run inference

Now that you have completed transfer learning, you will use your fine-tuned model to run inference on the Seattle_RGB.tif imagery layer and detect the buildings it contains.

  1. In the Geoprocessing pane, click the Back button.
  2. Search for and open the Detect Objects Using Deep Learning tool.

    Detect Objects Using Deep Learning tool

  3. Set the following parameter values for the Detect Objects Using Deep Learning tool:
    • For Input Raster, choose Seattle_RGB.tif.
    • For Output Detected Objects, type Seattle_buildings.
    • For Model Definition, click the Browse button. Browse to your Seattle_Building_Detection folder, expand Transfer_learning_data and Seattle_1m_Building_Footprints_model, select Seattle_1m_Building_Footprints_model.dlpk, and click OK.

    Detect Objects Using Deep Learning parameters

    As the model definition loads, the model’s arguments fill in automatically.

  4. For padding, verify that the value is 64.

    Padding indicates the number of pixels that will be added around the input image. Because of how convolutional neural networks work, the center pixels of an image receive more attention than the edge pixels. Adding padding pixels minimizes this behavior so that all pixels in the image receive roughly equal attention from the model. You’ll leave this value at the default of 64.

  5. For batch_size, use the same value as you did for the training process (4 or less).

    This will ensure that the tool can run within the amount of memory you have available on your computer.

  6. For threshold, verify that the value is 0.9.

    This is a cutoff value between 0 and 1. It expresses how confident the model must be before it declares an object to be a building. The 0.9 value indicates that the model should have a 90 percent confidence.

  7. For tile_size, verify that the value is 256.

    This indicates the size of the imagery chips that the model will take in to run inference. This value should be the same as the size of the chips that were used to train the model.

    List of Arguments

  8. For Non Maximum Suppression, check the box.

    When overlapping duplicate building footprints are detected, the Non Maximum Suppression option ensures that only the building polygon with the highest confidence is kept and the others are deleted.

    Non Maximum Suppression option

  9. In the Geoprocessing pane, click the Environments tab.

    Environments tab

  10. For Processor Type, choose GPU.

    GPU option value

    At this point, you could run the tool as is: it would detect buildings over the entire Seattle_RGB.tif image, which could take from 10 minutes to 1 hour depending on your computer’s specifications. To keep this tutorial brief, you will only detect buildings in a small subset of the input image.

  11. On the ribbon, on the Map tab, in the Navigate group, click Bookmarks and choose Inference extent.

    Inference extent bookmark

    The map zooms in to a smaller area of Seattle.

    Extent for the Inference extent bookmark

  12. In the Geoprocessing pane, on the Environments tab, under Processing Extent, for Extent, choose Current Display Extent.

    Current Display Extent menu option

  13. Click Run.

    After a few minutes, the process completes and the Seattle_buildings output layer appears in the Contents pane and on the map. This time, you can see that almost all the buildings were detected.

    Seattle_buildings output layer on the map

You successfully detected buildings in an area of Seattle using a pretrained model fine-tuned through transfer learning.
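
The inference step also has an arcpy.ia equivalent. The sketch below mirrors the parameter values used above; the paths and output geodatabase are hypothetical, the keyword names follow the documented tool parameters, and the commented extent line is only a placeholder for the bookmark area.

```python
import arcpy

arcpy.CheckOutExtension("ImageAnalyst")
arcpy.env.processorType = "GPU"
# Optionally limit processing to a smaller area, such as the Inference extent bookmark:
# arcpy.env.extent = "<xmin> <ymin> <xmax> <ymax>"

project = r"C:\Seattle_Building_Detection"   # adjust to your location
arcpy.ia.DetectObjectsUsingDeepLearning(
    in_raster=project + r"\Imagery_data\Seattle_RGB.tif",
    out_detected_objects=project + r"\Seattle_Building_Detection.gdb\Seattle_buildings",
    in_model_definition=project + r"\Transfer_learning_data\Seattle_1m_Building_Footprints_model"
                                + r"\Seattle_1m_Building_Footprints_model.dlpk",
    arguments="padding 64;batch_size 4;threshold 0.9;tile_size 256",
    run_nms="NMS",   # keep only the highest-confidence footprint among overlapping duplicates
)
```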

Compare the results

You will now compare two building footprint layers: one obtained by running the off-the-shelf pretrained model and one obtained with the model fine-tuned through transfer learning. In both cases, they show the results for the entire extent of your imagery. While you could generate both layers yourself, using the technique you learned in the previous section, in the interest of time, you’ll use layers that were prepared for you. First, you’ll open a map that contains these layers.

  1. In the Catalog pane, expand Maps. Right-click Full extent results and choose Open.

    Open menu option

    The map appears. It contains two polygon feature classes:

    • Seattle_buildings_off_the_shelf
    • Seattle_buildings_with_transfer_learning

    You’ll use the Swipe tool to compare the two layers.

  2. In the Contents pane, click Seattle_buildings_off_the_shelf to select it.

    Seattle_buildings_off_the_shelf selected

  3. On the ribbon, on the Feature Layer tab, in the Compare group, click Swipe.

    Swipe button

  4. On the map, use the swipe handle to drag repeatedly from top to bottom or side to side to peel off the top layer and reveal the one below.

    Swipe cursor

  5. Zoom in and out and pan to examine different areas and visually assess the difference in the quality of the results.
    Tip:

    While in swipe mode, you can zoom in and out with the mouse wheel, and pan by pressing C on the keyboard and dragging with the mouse.

    The fine-tuned model does a much better job at identifying the building footprints of smaller buildings in your imagery, compared to the off-the-shelf model. You’ll now use the Swipe tool to compare the results in the transfer learning layer to the buildings you can observe visually in the imagery.

  6. In the Contents pane, turn off the Seattle_buildings_off_the_shelf layer and select the Seattle_buildings_with_transfer_learning layer.

    Seattle_buildings_with_transfer_learning layer selected

  7. Use the Swipe tool to compare the two layers.

    You might notice that the layer resulting from the fine-tuned model is still not perfect and a few buildings are missing here and there. Fine-tuning a model with transfer learning tends to be an iterative process. You could continue to improve your model’s performance by collecting more training examples and conducting another round of transfer learning training. For a quick overview, the steps would be as follows:

    • First, observe the type of buildings that were missed by the model.
    • Collect new example polygons targeting this type of building and generate new training chips, saving them to a new folder. You should follow the same guidelines as before, clipping the imagery to ensure that no unlabeled buildings are included in the chips.
    • Run a new training session, starting with the off-the-shelf pretrained model and feeding it all the chips created so far (that is, for the Input Training Data parameter, you’ll list all your chip folders). This is a best practice that ensures the model treats all the training chips equally.
  8. When you are finished exploring the images, on the ribbon, on the Map tab, in the Navigate group, click the Explore button to exit the swipe mode.

    Explore button

  9. Press Ctrl+S to save your project.

In this tutorial, you used deep learning to extract building footprints from aerial imagery in ArcGIS Pro. You chose a pretrained model from ArcGIS Living Atlas and learned the importance of matching your input data to the model’s expectations. You produced a new imagery layer with the expected number of bands. You then applied transfer learning to remedy a resolution mismatch and fine-tune the model’s performance on your imagery: you provided a small number of new training samples and further trained the model. You then applied the fine-tuned model to a Seattle neighborhood and obtained enhanced results.