Set up the project and examine the data

You'll set up the ArcGIS Pro project and examine the input data. But first, you'll learn some basics about the machine learning workflow you'll use in this tutorial.

Understand the machine learning workflow

The fundamental concept of machine learning is to enable computers to learn from sample data and apply what they have learned to unknown data. One way of doing that is to train a regression model and use it to predict new results. This is the approach you'll take in this tutorial.

You want to predict the aboveground biomass (AGB) throughout several counties in Georgia. You'll need the following data:

Target sample data—This will be a set of known AGB values for sample locations. You'll use point data extracted from a GEDI satellite lidar trajectory dataset, as shown in the following example image.
Explanatory variables—This will be data that can explain the AGB sample values and can then help predict AGB values for new areas. You'll use Landsat 9 multispectral satellite imagery, digital elevation model (DEM) data, and additional derived raster layers. The following example images show the Landsat imagery (left) and DEM raster data (right).

The Landsat 9 multispectral satellite imagery was chosen as an explanatory variable because the sensor's spectral characteristics respond to vegetation, which is directly related to biomass. The digital elevation model (DEM) captures the topographic variability and terrain complexity, which can also influence vegetation growth.

You will train the model using the target sample data and explanatory variables as input. During the training, the model will capture the relationships between sample values and explanatory variables. Once you are satisfied with the model, you'll use it to predict AGB values throughout the full extent of the Georgia study area. This output will be a raster, as shown in the following example image, where the higher AGB values appear in dark green and the lower values in white or light green.
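
To make the idea of learning from samples and then predicting more concrete, here is a minimal sketch. It is only an illustration under strong assumptions: it uses a single made-up explanatory variable, and a straight-line least-squares fit stands in for the random trees model that the tutorial actually trains. The function names and sample values are hypothetical.

```python
# Toy illustration of "learn from samples, then predict" with one explanatory
# variable. A least-squares line stands in for the model here; the tutorial
# itself trains a random trees regression model in ArcGIS Pro.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply the fitted line to a new explanatory value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical samples: an explanatory value and a known AGB value per location.
sample_index = [0.2, 0.4, 0.6, 0.8]
sample_agb = [40.0, 80.0, 120.0, 160.0]

model = fit_line(sample_index, sample_agb)   # training step
new_location_agb = predict(model, 0.5)       # prediction for an unsampled location
print(round(new_location_agb, 1))
```

The same two phases, training on known samples and predicting for new locations, are what you will perform later with many explanatory rasters at once.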

Download and open the project

To get started, you'll download a project that contains the data for this tutorial and you'll open it in ArcGIS Pro.

Download the Estimate_Biomass.zip file and locate the downloaded file on your computer.
Note:
Most web browsers download files to your computer's Downloads folder by default.
The .zip file is 2.9 GB and might take a few minutes to download.
Right-click the Estimate_Biomass.zip file and unzip it to a location on your computer, such as drive C.
Open the extracted Estimate_Biomass folder and double-click Estimate_Biomass.aprx to open the project in ArcGIS Pro.
If prompted, sign in to your ArcGIS organizational account.
Note:
If you don't have access to ArcGIS Pro or an ArcGIS organizational account, see options for software access.
The project opens.
The map displays the study area boundaries as a polygon outlined in orange. This area represents 20 counties in Georgia.

Examine the input data

You'll now examine the rest of the input data provided in the project. First, you'll add the Landsat image to the map.

On the ribbon, click the View tab. In the Windows group, click Catalog Pane.
In the Catalog pane, expand Folders, Estimate_Biomass, and InputData.
Under InputData, expand LC09_L2SP_018038_20221004_20230327_02_T1.
This is a Landsat 9 satellite imagery scene that includes seven spectral bands with surface reflectance values:
- Band 1—Coastal Aerosol
- Band 2—Blue
- Band 3—Green
- Band 4—Red
- Band 5—Near Infrared (NIR)
- Band 6—Short wave infrared (SWIR) 1
- Band 7—Short wave infrared (SWIR) 2
Note:
You can drag to expand the width of the pane to better see the longer file names.
These bands will be used as explanatory variables. You'll now add the Landsat scene to the map.
Right-click LC09_L2SP_018038_20221004_20230327_02_T1_MTL.txt and choose Add To Current Map.
If prompted to calculate statistics, click Yes.
After a few moments, the image appears on the map. You'll rename it to a shorter name.
In the Contents pane, click Surface Reflectance_LC09_L2SP_018038_20221004_20230327_02_T1_MTL to select it, and click it once more to enter edit mode. Change the name to Landsat9 and press Enter.
You'll change the image rendering to natural color, a combination of the red, green, and blue bands, which shows colors close to what the human eye would usually see.
In the Contents pane, make sure that Landsat9 is selected.
On the ribbon, click the Raster Layer tab. In the Rendering group, click the Symbology button.
In the Symbology pane, set the following parameter values:
- For Primary symbology, ensure that RGB is selected.
- For Red, choose SRB4
- For Green, choose SRB3
- For Blue, choose SRB2
The image rendering updates to the natural color rendering.
Close the Symbology pane.
Next, you'll add the digital elevation model (DEM) to the map.
In the Catalog pane, in the InputData folder, collapse LC09_L2SP_018038_20221004_20230327_02_T1.
Right-click DEM.tif and choose Add To Current Map.
In the Contents pane, rename the DEM.tif layer to DEM.
Examine the DEM layer on the map.
The DEM provides elevation data. The lighter tones indicate areas with higher elevation and the darker tones areas with lower elevation.
That layer will also be used as an explanatory variable. Next, you'll review the GEDI data.
In the Catalog pane, under InputData, expand the GEDI_L4A folder.
This folder contains eight GEDI files that will be used as the samples with known AGB values, or training targets. Note that these are trajectory HDF5 files: they are not raster files but trajectory data. You will learn how to handle this data and display it on the map later in the workflow.
There are two other data layers in the Contents pane. You have already seen the AOI layer, which delineates the overall study area. There is also the Counties layer, which provides the county boundaries. You will turn it on.
In the Contents pane, expand the arrow next to the Counties layer to reveal its legend, and check the box next to the Counties layer to turn it on.
Review the AOI and Counties layers (orange and bright purple) on the map.
You will use these two layers later in the analysis.
Click the boxes next to the Counties, DEM, and Landsat9 layers to turn them off, as you won't need them for the next workflow steps.
On the Quick Access Toolbar, click Save to save your project.

In this part of the workflow, after an overview of the machine learning workflow, you set up the ArcGIS Pro project. You then examined the input data: a seven-band Landsat 9 scene, a DEM raster, GEDI data, and some boundary layers.

Process and extract GEDI data

AGB represents living vegetation above the ground, measured as mass per unit area, typically in megagrams (that is, metric tons) per hectare. Measuring AGB physically on the ground over a large study area is labor intensive and often impractical. Estimating AGB from remote sensing data is a practical alternative.
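
As a worked example of the units, the following sketch converts a density in megagrams per hectare into a total mass for a given ground area. The function name, the 30 m by 30 m cell, and the density value are hypothetical and chosen only for illustration.

```python
# Convert an aboveground biomass density (Mg/ha) into total biomass (Mg)
# for a given ground area. 1 hectare = 10,000 square meters.

SQUARE_METERS_PER_HECTARE = 10_000


def biomass_in_area(agbd_mg_per_ha, area_m2):
    """Total biomass in megagrams for an area given in square meters."""
    return agbd_mg_per_ha * area_m2 / SQUARE_METERS_PER_HECTARE


# Hypothetical example: a 30 m x 30 m cell (0.09 ha) with a density of 150 Mg/ha.
cell_biomass = biomass_in_area(150.0, 30 * 30)
print(cell_biomass)
```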

GEDI is a satellite lidar mission from NASA that measures the 3D structure of the Earth's surface. This includes the forest canopy height and its vertical structure, that is, the stacked-up layers of trees and shrubs that might together amount to more or less biomass. GEDI captures sample points along the sensor's tracks. From those measurements, the aboveground biomass density (AGBD) can be derived, and the GEDI L4A product contains these derived AGBD point values. The following example image shows the GEDI tracks where sample AGBD data was captured, as they intersect in this tutorial's study area.

Such data is delivered as trajectory-structured HDF5 files and can be brought into ArcGIS as a trajectory dataset, a geodatabase data model meant to manage a collection of trajectory files. You will now create a trajectory dataset, add the provided GEDI data to it, and extract the relevant AGBD point data that will be used as training samples later in the workflow.

Create a trajectory dataset

First, you'll create an empty trajectory dataset in the project geodatabase.

In the Catalog pane, expand Databases.
Right-click Estimate_Biomass.gdb, click New, and choose Trajectory Dataset.
In the Geoprocessing pane, the Create Trajectory Dataset tool appears.
For Trajectory Dataset Name, type Gedi.
Accept the other default values and click Run.
The trajectory dataset appears in the Contents pane. It contains Footprint and Point sublayers.
This dataset is currently empty and will act as a container for the GEDI data.

Add GEDI data to the trajectory dataset

You'll now add the GEDI data that was provided for this workflow into the empty trajectory dataset you just created.

Switch back to the Catalog pane.
In the Catalog pane, expand the Estimate_Biomass.gdb geodatabase, right-click Gedi, and choose Add Trajectories.
First, you'll set up the trajectory dataset type and properties.
In the Add Data to Trajectory Dataset pane, for Trajectory Type choose GEDI.
Under Trajectory Type, click the Properties button.
In the Trajectory Type Properties window, click the Trajectory tab.
The GEDI data provided is of the L4A type, so you will set the properties accordingly.
Under Product Filter, choose GEDIL4A.
Under Groundtracks, check the box next to Name to select all tracks.
GEDI data is captured as eight distinct beams, and you want to include them all.
Under Predefined Variables, check the box for the Aboveground Biomass Density variable.
This is the only variable that you are interested in for this dataset.
Click OK to save the properties.
In the Add Data To Trajectory Dataset tool pane, under Input Data, choose Folder, and click the Browse button.
In the Input Data window, expand Folders, Estimate_Biomass, and InputData, click GEDI_L4A, and click OK.
In the Add Data To Trajectory Dataset tool pane, accept all other default values and click Run.
After a few moments, the GEDI data is added to the trajectory dataset and it appears on the map. You will zoom out to see the entire dataset.
In the Contents pane, right-click the Gedi layer, and choose Zoom To Layer.
The green polygons crisscrossing North America represent the footprints of the GEDI sensor's trajectories. These specific trajectories were selected because they intersect the study area.
In the Contents pane, right-click the Footprint layer, and choose Attribute Table.
The Footprint attribute table appears.
Each row corresponds to one trajectory and contains information about it. For instance, the Count field indicates how many points there are in each trajectory.
Close the Footprint table.
You will now look at the individual points contained in the trajectories.
In the Contents pane, turn on the AOI layer. Right-click the AOI layer and choose Zoom To Layer.
Tip:
If the Gedi trajectory layer doesn't display on the map, zoom out a bit.
Turn off the Footprint layer, and turn on the Point sublayer.
The point layer may take some time to display because it contains hundreds of thousands of points.
Zoom in to an area of your choice until you see the individual points.
Each point contains an AGBD value.

You added GEDI data to a trajectory dataset and you examined it.

Extract the relevant AGBD point data

Only the GEDI points within the study area are relevant to your workflow. You will now extract the points located within the AOI boundary using the Clip tool. The output will be a point feature layer.
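
Conceptually, this extraction is a point-in-polygon test applied to every point. The sketch below illustrates the idea with a standard ray-casting test on a made-up square study area. It is not how the Clip tool is implemented, and the coordinates and values are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; the last vertex connects to the first.
    Points exactly on the boundary are not handled specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast from the point toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square study area and three sample points (x, y, AGBD).
aoi = [(0, 0), (10, 0), (10, 10), (0, 10)]
points = [(2, 3, 85.0), (12, 5, 40.0), (9, 9, 130.0)]

# Keep only the points that fall inside the study area.
clipped = [p for p in points if point_in_polygon(p[0], p[1], aoi)]
print(clipped)
```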

In the Geoprocessing pane, click the Back button.
In the Geoprocessing search box, type Clip. In the list of results, click the Clip tool to open it.
In the Clip tool pane, set the following parameters:
- For Input Features or Dataset, choose Point.
- For Clip Features, choose the AOI layer.
- For Output Features or Dataset, type AGBD_observations as the output name.
Click Run.
After a few moments, the AGBD_observations point layer is added to the map. You will examine it in more detail.
In the Contents pane, turn off the Gedi layer, as you won't need it any longer in this workflow.
Right-click the AGBD_observations layer and choose Zoom To Layer.
You can see that the AGBD_observations layer contains only the points within the study area.
In the Contents pane, right-click the AGBD_observations layer, and choose Attribute Table.
The AGBD_observations attribute table appears.
Each row corresponds to a point, and the AGBD field gives the aboveground biomass density value for each point (in metric tons per hectare). In total, there are 106,159 points in this layer.
Close the AGBD_observations attribute table.
Next, you will apply an imported symbology to this layer to visualize it more effectively.
In the Geoprocessing pane, click the Back button.
Search for the Apply Symbology From Layer tool and open it.
In the Apply Symbology From Layer tool, for Input Layer, choose AGBD_observations.
For Symbology Layer, click the Browse button. Browse to Folders > Estimate_Biomass > InputData and choose the AGBD.lyrx layer file.
Click Run.
The map updates.
The AGBD_observations layer now displays with a symbology where the points in dark green tones indicate the highest AGBD values and the points in light yellow color tones indicate the lowest AGBD values. This layer will be used as known samples, or training targets, during the model training.
Press Ctrl+S to save the project.

In this part of the workflow, you created a trajectory dataset and ingested the AGBD variable from GEDI Level 4A trajectory data into it. You then extracted the relevant AGBD points as a feature layer and symbolized it.

Prepare derived explanatory variables

You'll now prepare additional explanatory variables from the initial Landsat 9 scene and DEM raster. Specifically, you will create seven spectral indices derived from the Landsat 9 scene and one aspect raster derived from the DEM.

Generate spectral indices

A spectral index combines different spectral bands through a mathematical formula, usually computing some type of ratio. The resulting output is a new raster image that emphasizes a specific phenomenon, such as vegetation, water, urban development, or moisture. These spectral index layers will provide additional information to account for different vegetation conditions, in turn helping better predict AGB values.

Note:

Learn more about common spectral indices.

You'll create several indices that will serve as additional explanatory variables:

NDVI—Normalized difference vegetation index
EVI—Enhanced vegetation index
PVI—Perpendicular vegetation index
NBR—Normalized burn ratio
NDWI—Normalized difference water index
NDBI—Normalized difference built-up index
MSI—Moisture stress index

You'll start with NDVI, used to differentiate healthy vegetation from unhealthy vegetation or absence of vegetation. You'll use the Band Arithmetic raster function.
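
For reference, NDVI is computed per pixel as (NIR - Red) / (NIR + Red). The following sketch applies that formula to single reflectance values. It assumes reflectance expressed as fractions between 0 and 1, and the sample values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return None  # undefined when both bands are zero
    return (nir - red) / total


# Hypothetical surface reflectance values (fractions between 0 and 1).
dense_vegetation = ndvi(nir=0.45, red=0.05)  # strong NIR response, low red
bare_surface = ndvi(nir=0.25, red=0.20)      # similar response in both bands
print(dense_vegetation, bare_surface)
```

Healthy vegetation reflects strongly in the near infrared and absorbs red light, which is why it produces values closer to 1.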

In the Contents pane, turn off the AGBD_observations layer.
On the ribbon, on the Imagery tab, in the Analysis group, click the Raster Functions button.
In the Raster Functions pane, in the search box, type Band Arithmetic.
In the list of results, click the Band Arithmetic raster function to open it.
In the Band Arithmetic Properties raster function pane, set the following parameters:
- For Raster, choose Landsat9.
- For Method, choose NDVI.
- For Band Indexes, type 5 4, corresponding to the near infrared and red bands that are needed for the NDVI calculation.
Click the General tab, and for Name, type NDVI.
Click Create new layer.
A new layer named NDVI_Landsat9 is added to the map. The raster in the map contains calculated NDVI values ranging between -1 (absence of vegetation) and 1 (healthy vegetation).
Next, you'll create the remaining spectral index layers—EVI, NBR, PVI, NDWI, and NDBI—following the same steps.

Repeat steps 4 to 7 with the following band settings:


Name/Method	Description (for reference)	Band Indexes	Band names
EVI	Enhanced vegetation index	5 4 2	NIR, red, blue
NBR	Normalized burn ratio (used to identify burn scars)	5 7	NIR, SWIR 2
PVI	Perpendicular vegetation index	5 4 0.3 0.5	NIR, red (and slope and gradient values)
NDWI	Normalized difference water index	5 3	NIR, green
NDBI	Normalized difference built-up index	6 5	SWIR 1, NIR
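
To show what some of these methods compute, here is a minimal sketch of three of the indices using their commonly published formulas. It assumes reflectance values between 0 and 1 and the standard EVI coefficients. The exact formulas applied by the Band Arithmetic raster function should be checked in its documentation. PVI and NDWI are left out of the sketch. The pixel values are hypothetical.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    return None if total == 0 else (a - b) / total


def nbr(nir, swir2):
    """Normalized burn ratio: (NIR - SWIR 2) / (NIR + SWIR 2)."""
    return normalized_difference(nir, swir2)


def ndbi(swir1, nir):
    """Normalized difference built-up index: (SWIR 1 - NIR) / (SWIR 1 + NIR)."""
    return normalized_difference(swir1, nir)


def evi(nir, red, blue):
    """Enhanced vegetation index with the commonly used coefficients."""
    denominator = nir + 6.0 * red - 7.5 * blue + 1.0
    return None if denominator == 0 else 2.5 * (nir - red) / denominator


# Hypothetical reflectance values for a vegetated pixel.
pixel = {"blue": 0.03, "red": 0.05, "nir": 0.45, "swir1": 0.20, "swir2": 0.10}

pixel_nbr = nbr(pixel["nir"], pixel["swir2"])
pixel_ndbi = ndbi(pixel["swir1"], pixel["nir"])
pixel_evi = evi(pixel["nir"], pixel["red"], pixel["blue"])
print(pixel_nbr, pixel_ndbi, pixel_evi)
```

The argument order of each function follows the band order listed in the table above.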

For MSI (moisture stress index), the Band Arithmetic raster function doesn't include the MSI option under Method. Instead, you'll use the User Defined option to calculate it, spelling out the mathematical formula explicitly: B6 / B5, where the bands are referred to by B + [a band number]. So, this formula means that the SWIR 1 band should be divided by the NIR band.
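
The same ratio can be sketched for a single pixel as follows. The function name and reflectance values are hypothetical, and the zero check is an assumption added so that the example never divides by zero.

```python
def msi(swir1, nir):
    """Moisture stress index: SWIR 1 divided by NIR (B6 / B5 for Landsat 9)."""
    if nir == 0:
        return None  # ratio is undefined when the NIR value is zero
    return swir1 / nir


# Hypothetical reflectance values for two pixels.
pixel_a = msi(swir1=0.20, nir=0.45)
pixel_b = msi(swir1=0.30, nir=0.25)
print(pixel_a, pixel_b)
```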

Repeat steps 4 to 7 to create the MSI layer, using the following parameters:
- For Raster, choose Landsat9.
- For Method, choose User Defined.
- For Band Indexes, type B6 / B5.
- Under General, for Name, type MSI.
At the end of this process, all seven index layers should be added to the map and listed in the Contents pane.

Derive an aspect layer from the DEM

You will now derive an aspect layer from the DEM layer using the Aspect raster function. The aspect indicates the direction that each downhill slope faces (north, south, east, west). It is relevant as an explanatory variable since solar illumination will vary according to the aspect value and this will affect vegetation growth.
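
To illustrate what an aspect value represents, the following sketch estimates it for the center cell of a 3 by 3 elevation window using a widely documented finite-difference approach. It is an assumption that this matches the Aspect raster function closely. The tool's own documentation describes its exact method. The elevation windows are hypothetical.

```python
import math


def aspect_degrees(window, cell_size=1.0):
    """Compass direction (0-360, clockwise from north) that the center cell faces.

    window is a 3x3 list of elevations whose first row is the northern row.
    Returns -1.0 for flat cells.
    """
    (a, b, c), (d, _center, f), (g, h, i) = window
    # Rate of change in the west-east and north-south directions.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    if dz_dx == 0 and dz_dy == 0:
        return -1.0
    angle = math.degrees(math.atan2(dz_dy, -dz_dx))
    # Convert the mathematical angle to a compass bearing.
    if angle > 90.0:
        return 450.0 - angle
    return 90.0 - angle


# Hypothetical windows: terrain dropping toward the east and toward the south.
east_facing = [[2, 1, 0], [2, 1, 0], [2, 1, 0]]
south_facing = [[2, 2, 2], [1, 1, 1], [0, 0, 0]]
print(aspect_degrees(east_facing), aspect_degrees(south_facing))
```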

In the Raster Functions pane, search for and open the Aspect raster function.
In the Aspect raster function pane, for Raster, choose the DEM layer.
Click Create new layer.
A layer named Aspect_DEM is added to the map.
In the next section, you will use all the explanatory variable layers you created as input to the machine learning model. However, you won't need to see them on your map, so you will now turn them off.
In the Contents pane, turn off all seven spectral index layers and the DEM and Aspect_DEM layers.
Press Ctrl+S to save the project.

In this part of the workflow, you prepared seven layers derived from the Landsat scene and one aspect layer derived from the DEM. These layers will be used as explanatory variables alongside the Landsat scene and the DEM when training the regression model.

Train a regression model and predict biomass density

You have now prepared the target sample data and explanatory variables. Next, you'll use all this data as input to train your regression model and capture the relationships between known AGBD values and explanatory variables. You will then examine the performance of your model, proceed to do some data cleanup, and retrain your model to obtain a higher performance. Then, you'll use the resulting model to predict AGBD values throughout the entire study area. Finally, you'll summarize the results to obtain the average AGBD by county in the study area.

Train a random tree regression model

First, you'll train the model to predict biomass with the Train Random Trees Regression Model tool. Random trees regression, also known as random forest regression, is a machine learning approach that constructs a multitude of decision trees at training time and combines their outputs to make a prediction.
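
The following sketch illustrates the core idea of building many trees on resampled data and averaging their predictions. It is heavily simplified and is not the tool's implementation: it assumes a single explanatory variable, each tree makes only one split, and the random selection of variables used by full random forests is omitted. All names and sample values are hypothetical.

```python
import random


def fit_stump(rows):
    """Fit a one-split regression tree on rows of (feature_value, target)."""
    rows = sorted(rows)
    best = None
    for k in range(1, len(rows)):
        if rows[k][0] == rows[k - 1][0]:
            continue  # cannot split between identical feature values
        left = [t for _, t in rows[:k]]
        right = [t for _, t in rows[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            threshold = (rows[k - 1][0] + rows[k][0]) / 2
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all feature values identical: always predict the mean
        mean = sum(t for _, t in rows) / len(rows)
        return (float("inf"), mean, mean)
    return best[1:]


def fit_forest(rows, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(rows) for _ in rows]
        forest.append(fit_stump(bootstrap))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    votes = [left if x <= threshold else right
             for threshold, left, right in forest]
    return sum(votes) / len(votes)


# Hypothetical samples: (explanatory value, known AGBD).
samples = [(0.1, 20), (0.2, 30), (0.3, 35), (0.6, 110), (0.7, 120), (0.8, 140)]
forest = fit_forest(samples)
low = predict_forest(forest, 0.15)
high = predict_forest(forest, 0.75)
print(low, high)
```

Averaging over many trees trained on different resamples is what makes the combined prediction more stable than that of any single tree.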

In the Geoprocessing pane, if necessary, click the Back button.
Note:
If you closed the Geoprocessing pane, you can reopen it by going to the ribbon, to the Analysis tab, in the Geoprocessing group, and clicking Tools.
Search for and open the Train Random Trees Regression Model tool.
You'll define the explanatory variable inputs.
In the Train Random Trees Regression Model tool pane, for Input Rasters, add Landsat9, DEM, and all eight derived explanatory variable layers.
Caution:
You should use the exact same order for these layers now in the Train Random Trees Regression Model tool and later in the Predict Using Regression Model tool.
You'll then point to the AGBD target sample data.
For Target Raster or Points, choose AGBD_observations.
For Target Value Field, choose AGBD.
The resulting output model will be an .ecd file. You'll choose a name for it.
For Output Regression Definition File, click the Browse button.
In the Output Regression Definition File window, browse to Folders > Estimate_Biomass and for Name, type Biomass_model.ecd and click Save.
The output will also include some additional auxiliary files that you can use to understand the model's accuracy. You'll set up their names.
In the Train Random Trees Regression Model tool pane, expand Additional Outputs.
For Output Importance Table, click the Browse button, browse to Folders > Estimate_Biomass and for Name, type Importance.csv.
For Output Scatter Plots, click the Browse button, browse to Folders > Estimate_Biomass and for Name, type Biomass_scatterplots.pdf.
Finally, you will also set up the training option parameters.
Expand Training Options.
For Percent of Samples for Testing, type 5, and accept the other default values.
Note:
The 5 percent value (instead of the default 10) ensures that less data will be set aside for testing and more will remain available for training.
Click Run.
After a couple of minutes, the model training is complete.

Review the model performance

To understand the model performance, you will now review the outputs from the Train Random Trees Regression Model tool. Machine learning workflows are often iterative. You must decide if the model is performing optimally or whether cleaning up some of the input data could improve its performance. In that latter case, you will need to retrain the model using the cleaned-up data.

First, you will look at the content of the Importance.csv table, which shows how much each explanatory variable contributed to predicting the target sample values. You'll create a bar chart to summarize that information.

In the Contents pane, under Standalone Tables, right-click the Importance.csv table layer, click Create Chart and choose Bar Chart.
An Importance.csv chart pane and a Chart Properties pane appear.
In the Chart Properties pane, set the following parameters:
- For Category or Date, choose Explanatory_Variables.
- For Aggregation, choose <none>.
- Under Numeric field(s), click Select, check the Importance field, and click Apply.
In the Importance.csv chart pane, the Importance by Explanatory_Variable chart appears.
You can observe that the Landsat spectral bands, especially SWIR 1 (Landsat9_6) and near infrared (Landsat9_5), play important roles in explaining (or predicting) the biomass values. Additionally, several band indices make substantial contributions, especially MSI_Landsat9, PVI_Landsat9, and NDBI_Landsat9. On the other hand, the DEM and Aspect_DEM layers contribute the least, which makes sense, since this study area is mostly flat terrain. However, in other extents with more elevation variation, the importance of the elevation data would probably be higher. Next, you'll review the scatterplots document.

Note:
The Random Trees algorithm is not deterministic, so the results you obtain may vary slightly.
Close the Importance.csv chart pane.
In File Explorer, browse to the Estimate_Biomass folder, and double-click the Biomass_scatterplots.pdf file to open it.
In the PDF, the first scatterplot shows for each sample point used in training:
- The original known value (x axis).
- The predicted value, after the training is complete (y axis).
The R² value, ranging from 0 to 1, serves as an indicator of the model's performance, with values closer to 1 indicating a better fit. An R² value of 0.834 for the training performance is acceptable. However, while most values are concentrated under 1,000, you can observe some extremely high values scattered from a bit under 1,000 to over 4,000.
You suspect that these points might be erroneous outliers that degrade the model's learning performance. To decide whether you should keep these extreme points or remove them from the training data, you will review them on the map. First, you will look at a histogram chart for the AGBD_observations layer to choose a more precise threshold for the outlier points.
Close the PDF and switch back to ArcGIS Pro.
In the Contents pane, right-click the AGBD_observations layer and choose Attribute Table.
In the attribute table, right-click the AGBD field, and choose Visualize Statistics.
The statistics for the AGBD field appear in a histogram chart named Distribution of AGBD.
The histogram shows the distribution of the AGBD_observations point features across all possible AGBD values. You can see that most of the points have AGBD values that are less than 700, with only a few points having values greater than 1,000. You will choose 1,000 as the threshold to define outlier points.
You will now modify the display on the map to make the exploration of the high-value points easier.
In the Contents pane, drag the Landsat9 layer to position it just above Aspect_DEM, and turn on the AGBD_observations and Landsat9 layers.
Right-click the AGBD_observations layer and choose Symbology.
In the Symbology pane, for Primary symbology, choose Single Symbol.
Note:
The color of the symbol may vary.
This symbology will make it easier to see the points you select on the map.
Tip:
You can shrink the size of the chart pane to increase the size of the map.
You will now select the high-value AGBD points.
In the Contents pane, ensure that the AGBD_observations layer is selected.
On the ribbon, on the Map tab, in the Selection group, click Select By Attributes.
In the Select By Attributes window, under Expression, form the expression Where AGBD is greater than 1000.
Click OK.
About 40 points are selected. They appear in cyan on the map.
You will now review a few of these points individually.
Click the AGBD_observations tab, and click the Show Selected Records button at the bottom of the pane.
Only the selected features are now listed in the table.
Double-click the row header for the first feature.
On the map, the point appears highlighted in yellow.
Zoom in until you can see the imagery details underneath.
The point falls on some type of not-so-dense grass field, which should not have an AGBD value above 1,000. In contrast, you can see that neighboring points don't appear in cyan, because they were not selected. This means their AGBD value is under 1,000 and is not abnormally high.
In the attribute table, double-click the row header for the third feature.
That point also falls on some type of grass field, which should not have a value above 1,000. You can see that these high value points are outliers that must be faulty. You will delete them.
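
For reference, the R² value reviewed in the scatterplots earlier in this section can be computed from pairs of known and predicted values. The sketch below uses one common definition, 1 minus the ratio of the residual sum of squares to the total sum of squares. It is an assumption that the tool reports the same definition. The sample values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical known and predicted AGBD values for four sample points.
observed = [50.0, 100.0, 150.0, 200.0]
predicted = [60.0, 90.0, 160.0, 190.0]
score = r_squared(observed, predicted)
print(round(score, 3))
```

A few extreme observed values can inflate the total and residual sums of squares, which is one reason outliers deserve a closer look before retraining.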

Clean up AGBD observations and retrain the model

You'll now delete the high-value outlier points. You'll also delete the points that have a null value, since they are of no use for training. Then, you'll retrain the model.
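
The cleanup rule amounts to keeping only the points whose AGBD value is present and does not exceed the threshold of 1,000. Here is a minimal sketch of that rule on a few made-up records. The record structure and values are hypothetical, and in the tutorial the deletion is done interactively in ArcGIS Pro.

```python
THRESHOLD = 1000  # outlier threshold chosen from the histogram


def keep_for_training(agbd):
    """True when the value is present and not above the outlier threshold."""
    return agbd is not None and agbd <= THRESHOLD


# Hypothetical observation records; None stands for a null AGBD value.
observations = [
    {"id": 1, "AGBD": 85.2},
    {"id": 2, "AGBD": None},
    {"id": 3, "AGBD": 2450.0},
    {"id": 4, "AGBD": 310.7},
]

cleaned = [row for row in observations if keep_for_training(row["AGBD"])]
print([row["id"] for row in cleaned])
```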

In the Contents pane, right-click AGBD_observations and choose Zoom To Layer.
On the ribbon, on the Map tab, click the Select By Attributes button.
In the Select By Attributes window, the first clause Where AGBD is greater than 1000 is still present. You will add a second clause to select the features with null values.
In the Select By Attributes window, click the Add Clause button.
For the new clause, form the expression Or AGBD is null and click OK.
In the AGBD_observations attribute table, there are now over 20,000 points selected, including both the abnormally high values and the null values.
On the attribute table toolbar, click the Delete Selection button.
When prompted to confirm that you want to delete the data, click Yes.
You will save these edits.
On the ribbon, on the Edit tab, in the Manage Edits group, click Save.
The selected points are deleted from the AGBD_observations feature class. Next, you will rerun the training tool with the updated data to obtain a higher performing model.
On the ribbon, on the Analysis tab, in the Geoprocessing group, click History.
The History pane appears. It contains the history of all the tools you have run so far in this project.
In the History pane, double-click the Train Random Trees Regression Model entry.
The Train Random Trees Regression Model tool appears, with all the parameter values you used originally.
You will rename the outputs, so that they don't overwrite the original results.
For Output Regression Definition File, rename Biomass_model.ecd to Biomass_model2.ecd.
Expand Additional Outputs, rename Importance.csv to Importance2.csv, and rename Biomass_scatterplots.pdf to Biomass_scatterplots2.pdf.
Click Run.
After a couple of minutes, the model is retrained.
In File Explorer, browse to the Estimate_Biomass folder, and double-click the Biomass_scatterplots2.pdf file to open it.
In the PDF, in the first scatterplot, you can see that the model performance has improved to an R² of 0.888 (up from 0.834 previously). You can also see that all the values in the plot are now lower than 1,000.
You have also obtained better results in the second and third scatterplots found in the PDF, which show the model performance on test points.
Close the PDF and switch back to ArcGIS Pro.

Create biomass prediction

You'll now use the model to predict biomass for the entire study area. You will do that with the Predict Using Regression Model tool. The input will be the same explanatory variables that you used for the model training (seven-band Landsat scene, DEM layer, spectral index layers, and aspect layer).

In the Geoprocessing pane, click the Back button.
Search for and open the Predict Using Regression Model tool.
In the Predict Using Regression Model tool pane, for Input Rasters, add Landsat9, DEM, and all eight derived layers in the same order as before.
Caution:
It is important that you use the same order for these layers in the Predict Using Regression Model tool as you did earlier in the Train Random Trees Regression Model tool.
You will now point to the trained model.
For Input Regression Definition File, click the Browse button, browse to Folders > Estimate_Biomass, click Biomass_model2.ecd, and click OK.
Finally, you will name the output.
For Output predicted raster, type Biomass_prediction.crf.
Click Run.
After a few minutes, the resulting layer is added to the map. You will now change the color scheme.
In the Contents pane, right-click the Biomass_prediction.crf symbol.
In the color scheme drop-down list, check the Show names box, and click the Blue-Green (Continuous) color scheme.
Turn off the AGBD_observations and Landsat9 layers.
Turn off all the derived layers (spectral indices and aspect).
On the map, review the Biomass_prediction.crf layer.
Dark green tones indicate the areas with the highest biomass density, and light or white tones indicate low density or absence of biomass.

Summarize biomass density by county

Finally, you'll compute the biomass density per county. You'll use the Counties polygon layer and the Zonal Statistics as Table tool to find the average biomass density per county and you'll generate a chart to give an overview of your results.
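
Conceptually, a zonal mean groups the predicted cell values by the zone they fall in and averages each group. The sketch below illustrates this on a handful of made-up cells. It is not the tool's implementation, and the zone names and values are hypothetical.

```python
from collections import defaultdict


def zonal_mean(zone_ids, values):
    """Average the values per zone, skipping cells without a value (None)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for zone, value in zip(zone_ids, values):
        if value is None:
            continue
        sums[zone] += value
        counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# Hypothetical cells: the zone each cell belongs to and its predicted AGBD.
cell_zones = ["County A", "County A", "County B", "County B", "County B"]
cell_values = [120.0, 100.0, 90.0, None, 110.0]

means = zonal_mean(cell_zones, cell_values)
print(means)
```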

In the Contents pane, turn on the Counties layer.
The county boundaries appear on the map.
In the Geoprocessing pane, click the Back button.
Search for and open the Zonal Statistics as Table tool.
In the Zonal Statistics as Table tool pane, set the following parameters.
- For Input Raster or Feature Zone Data, choose Counties.
- For Zone Field, verify that Name is selected.
- For Input Value Raster, choose Biomass_prediction.crf.
- For Output Table, type Average_biomass_by_county.
- For Statistics Type, choose Mean.
Accept all other default values and click Run.
The Average_biomass_by_county table is added to the Contents pane.
In the Contents pane, under Standalone Tables, right-click the Average_biomass_by_county table, click Create Chart, and choose Bar Chart.
In the Chart Properties pane, on the Data tab, set the following parameters:
- For Category or Date, choose NAME.
- For Aggregation, choose <none>.
- Under Numeric field(s), click Select, check the MEAN field, and click Apply.
- Under Sort, choose Y-axis Descending.
Click the General tab and set the following parameters:
- For Chart title, type Average biomass by county.
- For X axis title, type Counties.
- For Y axis title, type Biomass density (in metric tons per hectare).
In the Average_biomass_by_county chart pane, view the Average biomass by county chart.
From the bar chart, you see that some counties, such as Telfair, Houston, Macon, and Ben Hill, have higher average biomass density. Based on the United States Energy Information Administration report, almost half of the households in Georgia use biomass as a fuel, and 80 percent of that happens in rural areas. Understanding the status of biomass in those rural counties will help the government develop practical policies to mitigate biomass consumption and prevent forest and biodiversity loss.
Note:
You can also join the Average_biomass_by_county table to the Counties layer to create a thematic map showing the average biomass by county. To do that, in the Contents pane, right-click Counties, click Joins and Relates, and choose Add Join.
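Conceptually, such a join matches each county feature to the table row with the same county name and attaches that row's MEAN value. The following is a minimal sketch of that idea in plain Python, assuming that all counties are kept even when no table row matches. The records, the field names used as keys, and the join_table function are hypothetical.

```python
# Hypothetical county features and table rows; names and values are invented.
counties = [{"Name": "County A"}, {"Name": "County B"}, {"Name": "County D"}]
table = [
    {"NAME": "County A", "MEAN": 82.5},
    {"NAME": "County B", "MEAN": 121.0},
]


def join_table(features, table_rows, feature_key="Name", table_key="NAME"):
    """Attach each table row's MEAN to the feature with the matching key.

    Features without a matching row are kept and receive MEAN = None.
    """
    lookup = {row[table_key]: row for row in table_rows}
    joined = []
    for feature in features:
        match = lookup.get(feature[feature_key], {})
        joined.append({**feature, "MEAN": match.get("MEAN")})
    return joined


joined = join_table(counties, table)
for feature in joined:
    print(feature)
```

Once every county carries its own MEAN value, that value can drive the symbology of a thematic map.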
Press Ctrl+S to save the project.

In this tutorial, after setting up the project and examining the data, you prepared a trajectory dataset containing GEDI data and extracted the relevant AGBD point data for the study area. You used raster functions to prepare explanatory variables. You then trained a model to predict biomass density. You examined the performance of the model, cleaned up the data, and retrained the model to achieve better performance. You used this better-performing model to predict biomass density throughout your entire study area. Finally, you summarized the results to obtain the average biomass density per county in the study area.

To keep this workflow brief, you used a relatively small study area. To apply a similar workflow to larger areas that span several Landsat scenes and include images containing clouds or shadows, it is recommended that you first remove the clouds and shadows and combine the images into a mosaic dataset. Refer to the Python workflow and code-free workflow on creating a cloud-free image composite from satellite imagery. Furthermore, because the data used in this tutorial is also accessible from cloud platforms such as AWS or Microsoft Planetary Computer, you can take advantage of direct data access and cloud-based computing using ArcGIS Pro. To learn more, see the Cloud-Based Aboveground Biomass Mapping using Landsat and GEDI Data article.

You can find more tutorials in the tutorial gallery.

Set up the project and examine the data: Download and open the project, and examine the provided Landsat imagery, DEM, and GEDI data. (10 minutes)
Process and extract GEDI data: Create a trajectory dataset, add GEDI data to it, and extract aboveground biomass density (AGBD) point data to be used as target sample data. (15 minutes)
Prepare derived explanatory variables: Generate several spectral indices and an aspect raster to be used as explanatory variables along with the Landsat scene and the DEM. (15 minutes)
Train a regression model and predict biomass density: Train a random tree regression model using training data and explanatory variables, improve the model performance, use the model to predict biomass density for the entire study area, and summarize the results by county. (20 minutes)
