In the previous lesson, you learned how to use the Geostatistical Wizard to interpolate temperature measurements in Madison, Wisconsin, on August 8, 2016, at 8:00 p.m. You first used a classical interpolation method called simple kriging. You then learned to use a more modern and robust method called empirical Bayesian kriging (EBK) that provided moderately more accurate predictions using fewer parameters and settings. In this lesson, you will learn how to incorporate explanatory variables into the interpolation using EBK Regression Prediction.
An explanatory variable (sometimes called a covariate) is any dataset that is related to the variable you are investigating and can be incorporated into a model to improve its accuracy or reliability. As the name implies, EBK Regression Prediction is a regression-kriging method that is a hybrid of EBK and linear regression. EBK Regression Prediction allows you to use explanatory variable rasters that you know are related to the variable you are interpolating.
For these temperature measurements, you will incorporate the locations of impervious surfaces into the interpolation. Impervious surfaces are important contributors to urban heat islands because these surfaces (usually buildings and other manmade structures) trap the heat in the middle of dense cities and prevent it from diffusing into surrounding rural areas.
A deep understanding of regression is not required to complete this lesson, but a little background will be helpful. Both kriging and regression make predictions by explicitly separating an estimate of the average value and an estimate of the error:
Prediction = Average + Error
In regression, the average component of the prediction is estimated with a weighted sum of explanatory variables, and the error component is assumed to be random noise. In this sense, all of the predictive power in regression comes from the average component, and the error component is just noise that you want to minimize.
In kriging, however, the predictive power comes from the error component, and the average is equal to the average of the measured values of all the input points (or some other specified constant). The error component is estimated by the semivariogram and the values of the neighboring points. If the values of the neighbors tend to be above the average value of all input points, the error component will be positive, and the prediction will be larger than the average value of all the points. Conversely, if the values of the neighbors are below the average, the error component will be negative, and the prediction will be lower than the average.
At their mathematical cores, regression operates only on the average component and kriging operates only on the error component. Regression-kriging, however, operates on both components at the same time. It simultaneously estimates the average using linear regression and the error component using EBK. Because both kriging and regression are special cases of regression-kriging, EBK Regression Prediction can typically match or exceed the predictive power of either kriging or regression individually.
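The decomposition Prediction = Average + Error can be illustrated with a small sketch. This is not the EBK Regression Prediction algorithm: the average is fitted with ordinary least squares on a single explanatory variable, and the error component is estimated with inverse distance weighting of the regression residuals, used here only as a simple stand-in for kriging. The function names and all sample values are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def idw(target, locations, values, power=2.0):
    """Inverse-distance-weighted average (a stand-in for kriging, not kriging)."""
    weights = []
    for loc, val in zip(locations, values):
        d = math.dist(target, loc)
        if d == 0.0:
            return val  # exactly on a measured point: reuse its value
        weights.append(1.0 / d ** power)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def predict(target, target_cov, locations, covariate, measured):
    """Return (average, error, prediction) at a target location."""
    intercept, slope = fit_line(covariate, measured)
    # Residuals are what the regression could not explain at each point.
    residuals = [m - (intercept + slope * c) for m, c in zip(measured, covariate)]
    average = intercept + slope * target_cov     # regression part
    error = idw(target, locations, residuals)    # spatial part
    return average, error, average + error


# Hypothetical sample: point locations, impervious percentage, temperature (F).
locations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
impervious = [10.0, 40.0, 20.0, 60.0]
temperature = [75.0, 80.0, 76.0, 84.0]

average, error, prediction = predict((0.5, 0.5), 30.0, locations, impervious, temperature)
print(average, error, prediction)
```

If the neighboring residuals are mostly positive, the error term raises the prediction above the regression average; if they are mostly negative, it lowers it.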
Due to the computational cost of the simulations in EBK and EBK Regression Prediction, many mathematical operations are optimized for different processors. Depending on the hardware of your computer, you may get slightly different results in this section. These differences can be as large as 1 percent in some cases.
Incorporate an Impervious Surface layer from the Living Atlas
In this section, you'll add a raster layer from the ArcGIS Living Atlas of the World and extract the Impervious Surface values within your study area. This layer comes from the National Land Cover Database (NLCD) and the value of each cell represents the proportion of the cell that is impervious to water as a result of development.
- If necessary, open your project.
- On the ribbon, on the Map tab, in the Layer group, click Add Data.
- In the Add Data window, expand Portal and click Living Atlas.
- In the search box, type Impervious and press Enter.
- In the search results, locate and choose USA NLCD Impervious Surface 2011.
- Click OK to add the layer to your map.
It may take a few minutes for the layer to load.
The USA NLCD Impervious Surface 2011 layer covers the entire continental United States, but your study area covers the extent of the Madison, Wisconsin, area. As a result, you'll create a subset of the source data to the extent of your study area by using the Extract By Mask geoprocessing tool.
- On the ribbon, on the Analysis tab, in the Geoprocessing group, click Tools.
The Geoprocessing pane opens.
- In the Geoprocessing pane search box, type extract.
- In the search results, click Extract by Mask.
- In the Extract by Mask tool, set the following parameters:
- For Input raster, choose USA NLCD Impervious Surface 2011.
- For Input raster or feature mask data, choose Block_Groups.
- For Output raster, type Impervious_Surfaces.
The output raster will be saved in the default geodatabase of the project.
In addition to extracting Impervious Surface values within your study area, you also want to update the coordinate system to the same projection as the rest of your data and additionally resample the source data to a more suitable cell size of 100 meters. These changes will allow faster calculations later in the lesson.
- In the Geoprocessing pane, click the Environments tab and change the following parameters:
- For Output Coordinate System, choose Block_Groups.
- For Cell Size, type 100.
The output coordinate system is now set to match that of the Block_Groups layer, which is NAD_1983_2011_Wisconsin_TM, and the output will be resampled to a cell size of 100 meters.
- Click Run.
You don't need USA NLCD Impervious Surface 2011 any more, so you will remove it.
- In the Contents pane, right-click USA NLCD Impervious Surface 2011 and choose Remove.
Your raster layer Impervious_Surfaces is a subset of the USA NLCD Impervious Surface 2011 layer and contains extracted values covering the extent of the Block_Groups layer that are resampled to 100-meter cell size in the correct projection needed for your analysis.
- Using the Explore tool, zoom to the city center.
The highest percentages of impervious surfaces are in the middle of the city and along transportation corridors. Fewer impervious surfaces are located in the suburban and rural areas surrounding the city, which generally have higher percentages of vegetation and open space.
There are also no impervious surface values covering the lakes. As a result, EBK Regression Prediction will not make temperature predictions across lakes. This is desirable because all your source temperature measurements were taken on the land, and are thus unlikely to reliably predict temperature over the lakes. Temperature variation across water is driven by different factors than land temperatures.
Create a scatter plot of temperature and impervious surfaces
You have strong reason to believe that impervious surfaces are related to and contribute to urban heat, but you need to quantify this assumption. To visualize the relationship, you'll extract the values of the Impervious_Surfaces layer and add these values to the temperature layer, and then visualize the relationship using a scatter plot.
- In the Geoprocessing pane, click Back twice to get back to the search box.
- In the search box, type extract values. In the search results, click Extract Values to Points.
- Set the following parameters:
- For Input point features, choose Temperature_Aug_08_8pm.
- For Input raster, choose Impervious_Surfaces.
- For Output point features, type Impervious_Points.
- Click Run.
The Impervious_Points layer is added to the Contents pane of the map. This layer is identical to the Temperature_Aug_08_8pm layer except that source points have a new field named RASTERVALU appended. This attribute represents the impervious surface value extracted from the Impervious_Surfaces raster layer for each point location.
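Conceptually, extracting a raster value at a point means finding the cell that contains the point and copying that cell's value to a new attribute. The sketch below illustrates the idea with a plain nested list as the raster. The function name, the raster layout (row 0 at the north edge, origin at the upper-left corner), and the sample values are assumptions made for illustration, not the tool's implementation.

```python
import math


def extract_value(raster, origin_x, origin_y, cell_size, x, y, nodata=None):
    """Return the value of the cell containing (x, y), or nodata if outside.

    raster is a list of rows, with row 0 at the top (north) edge.
    (origin_x, origin_y) is the upper-left corner of the raster.
    """
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((origin_y - y) / cell_size)
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return nodata


# Hypothetical 2 x 3 raster of impervious percentages with 100-meter cells.
impervious = [
    [0, 10, 20],
    [30, 40, 50],
]

# Hypothetical temperature points; RASTERVALU is appended to each record.
points = [
    {"TemperatureF": 82.1, "x": 150.0, "y": 150.0},
    {"TemperatureF": 84.3, "x": 250.0, "y": 50.0},
]
for point in points:
    point["RASTERVALU"] = extract_value(
        impervious, 0.0, 200.0, 100.0, point["x"], point["y"]
    )
print(points)
```

Points that fall outside the raster receive the NoData marker instead of a value.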
- In the Contents pane, for Impervious_Points, right-click the point symbol and change the symbol color to green.
- In the Contents pane, right-click Impervious_Points, point to Create Chart, and choose Scatter Plot.
- If necessary, click the Properties button in the chart area to open the Chart Properties pane. In the Chart Properties pane, set the following parameters:
- For X-axis number, choose TemperatureF.
- For Y-axis number, choose RASTERVALU.
The chart updates to display the scatter plot and is titled Relationship between TemperatureF and RASTERVALU.
The scatter plot shows a clear positive relationship between the measured temperature (TemperatureF) and the percentage of impervious surfaces (RASTERVALU). In addition, the relationship appears to be roughly linear, as the trend line appears to pass through the middle of the points. The higher the percentage of impervious surfaces, the higher the temperature. This linear relationship between the variables is important because linear regression rests on this assumption.
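The visual impression from a scatter plot can be backed up with two numbers: the Pearson correlation coefficient and the slope of a least-squares trend line. The following is a minimal sketch that uses the same axis orientation as the chart (temperature on the x-axis, impervious value on the y-axis). The sample values are hypothetical and are not taken from the lesson data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def trend_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values: temperature (x-axis) and impervious percentage (y-axis).
temperature_f = [75.2, 77.0, 78.5, 80.1, 82.4, 84.0]
rastervalu = [5.0, 12.0, 30.0, 41.0, 63.0, 78.0]

print("r =", pearson_r(temperature_f, rastervalu))
print("slope, intercept =", trend_line(temperature_f, rastervalu))
```

A coefficient close to 1 together with a positive slope is consistent with a positive, roughly linear relationship. The coefficient alone does not prove linearity, so the plot is still worth inspecting.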
- When you are done exploring the Relationship between TemperatureF and RASTERVALU scatter plot, close both the chart and chart properties panes.
- In the Contents pane, remove the Impervious_Points layer.
You only needed this layer to review the scatter plot.
- In the Contents pane, uncheck the Impervious_Surfaces layer to turn it off.
Interpolate temperature using the EBK Regression Prediction tool
In the previous section, you verified that impervious surfaces are an important explanatory variable for predicting temperature in Madison, Wisconsin. In this section, you'll use the EBK Regression Prediction geoprocessing tool to interpolate the temperature measurements using the impervious surfaces as an explanatory variable. You'll then compare the cross-validation results from EBK Regression Prediction to the previous two kriging models and apply meaningful symbology to your results.
EBK Regression Prediction can be executed from both the Geostatistical Wizard and a geoprocessing tool. The primary advantage of using a geoprocessing tool is the ability to incorporate the tool in a model or script for automation and documentation of a workflow, while using the Geostatistical Wizard is an excellent way to explore data and test various interpolation techniques and parameters before committing to one specific choice.
- In the Geoprocessing pane, click the Back button. In the search box, type EBK.
- In the search results, click EBK Regression Prediction.
- In the EBK Regression Prediction tool, set the following parameters:
- For Input dependent variable features, choose Temperature_Aug_08_8pm.
- For Dependent variable field, choose TemperatureF.
- For Input explanatory variable rasters, choose Impervious_Surfaces.
- For Output prediction raster, type Temperature_Prediction.
- In the tool, expand Additional Model Parameters. For Maximum number of points in each local model, type 50.
This parameter specifies that each subset will have at most 50 points, which matches the value used in EBK in the previous lesson.
- Click Environments. For Extent, choose Block_Groups.
The control updates to As Specified Below and the output minimum and maximum extent values are updated to match the minimum and maximum extent of the Block_Groups layer.
- Click Run.
It may take several minutes for the tool to execute. The resulting layers will be added to the Contents pane upon completion.
Two layers, named EBKRegressionPrediction1 and Temperature_Prediction, are added to the Contents pane.
- In the Contents pane, turn off Temperature_Prediction.
The EBKRegressionPrediction1 layer shows the same interpolation pattern of urban heat as both simple kriging and EBK, but it clearly shows much more detail. The contours are more refined, and the temperature values change over much shorter distances, suggesting a higher degree of accuracy. No interpolation has occurred over the lakes, and as a result, you see a more realistic temperature map, which once again needs quantitative verification using cross-validation.
- In the Contents pane, right-click EBKRegressionPrediction1 and choose Cross Validation to display a cross-validation window.
This window is identical to the final page of the Geostatistical Wizard and allows you to explore the geostatistical layer's results. Summary statistics are organized on the right and graphical diagnostics on the left.
The following table compares summary statistics for this EBK Regression Prediction as well as for the EBK and simple kriging you completed in previous lessons:
You may notice slight variations due to rounding.
Table columns: Summary statistic, Simple kriging, EBK, EBK Regression Prediction. Rows include Inside 90 Percent Interval, Inside 95 Percent Interval, and Average Standard Error.
- For EBK Regression Prediction, the Average CRPS value is about 20 percent lower than EBK, and the Root-Mean-Square value is about 25 percent lower than EBK. These are both strong indications that EBK Regression Prediction is more accurate than EBK or simple kriging.
- The smaller Mean and Mean Standardized values also show that EBK Regression Prediction has the lowest level of bias, and the Average Standard Error value is closely aligned with the Root-Mean-Square value.
- There is some evidence that the standard errors are being slightly overestimated because the Root-Mean-Square Standardized value is less than one, and the Inside 90 Percent Prediction Intervals and Inside 95 Percent Prediction Intervals contain a slightly different percentage of points than they are expected to (91.971 and 93.431 percent, respectively), but the standard errors look accurate overall.
Based on these statistics, EBK Regression Prediction is clearly the most accurate and reliable of the three kriging models.
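To make the summary statistics more concrete, the sketch below computes several of them from cross-validation results. It assumes commonly used definitions: error is predicted minus measured, the Average Standard Error is the square root of the mean squared standard error, and the interval coverage is computed from a normal distribution around each prediction. The software may compute some of these differently (for example, from the full predictive distribution), and Average CRPS is omitted. The function name and sample values are hypothetical.

```python
import math
from statistics import NormalDist


def cross_validation_summary(measured, predicted, std_errors):
    """Summary statistics from cross-validation results (assumed definitions)."""
    n = len(measured)
    errors = [p - m for p, m in zip(predicted, measured)]
    standardized = [e / s for e, s in zip(errors, std_errors)]

    def inside(level):
        # Percentage of points whose error lies within the two-sided
        # normal interval of the given confidence level.
        z = NormalDist().inv_cdf(0.5 + level / 2.0)
        hits = sum(1 for e, s in zip(errors, std_errors) if abs(e) <= z * s)
        return 100.0 * hits / n

    return {
        "mean": sum(errors) / n,
        "root_mean_square": math.sqrt(sum(e * e for e in errors) / n),
        "mean_standardized": sum(standardized) / n,
        "rms_standardized": math.sqrt(sum(t * t for t in standardized) / n),
        "average_standard_error": math.sqrt(sum(s * s for s in std_errors) / n),
        "inside_90": inside(0.90),
        "inside_95": inside(0.95),
    }


# Hypothetical cross-validation results for four points.
summary = cross_validation_summary(
    measured=[80.0, 82.0, 78.0, 81.0],
    predicted=[81.0, 81.0, 78.0, 83.0],
    std_errors=[1.0, 1.0, 1.0, 1.0],
)
for name, value in summary.items():
    print(name, value)
```

A Root-Mean-Square Standardized value below one suggests that the standard errors are overestimated, and a value above one suggests that they are underestimated, which matches the interpretation given in the list above.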
- Confirm that the Predicted tab is active in the graphical diagnostics pane.
In the Predicted graph, the regression line (blue) is almost perfectly aligned with the reference line (gray). There is a lot of variability in the points around the regression line, but this graph should give you further confidence in the accuracy of the model.
- Click the Error tab.
Like the two models before, the regression line in the Error graph is trending down. This indicates some smoothing in the model, but once again, the smoothing is not severe.
- Click the Normal QQ Plot tab.
Points in the Normal QQ Plot graph fall closer to the reference line than in either of the previous two models. Even the largest values fall very close to the line. There is some minor deviation from the line for the smallest values, but you can safely assume that the predictions follow a normal distribution based on this graph.
Based on the numerical and graphical cross-validation diagnostics, you now have strong evidence that the EBK Regression Prediction model provides the most accurate predictions of the three models you have used in these lessons. This is the model that will serve as your recommended procedure for interpolating temperature in Madison, Wisconsin.
Now that you have decided on using the EBK Regression Prediction model, you'll apply attractive and meaningful symbology to the Temperature_Prediction raster.
- Close the Cross validation window.
- In the Contents pane, turn off EBKRegressionPrediction1. Turn on Temperature_Prediction.
You will now apply more meaningful symbology to Temperature_Prediction by importing a custom stretch renderer from an existing layer file.
- In the Contents pane, right-click Temperature_Prediction and choose Symbology.
- In the Symbology pane, click the Menu button and choose Import.
- In the Import Symbology dialog box, browse to the location where you extracted the downloaded project in the first lesson, double-click analyze-urban-heat-using-kriging, and choose EBKRP_Symbology.lyrx.
The EBKRP_Symbology.lyrx file contains predefined symbolization methods and properties suitable for the Temperature_Prediction layer.
- Close the Symbology pane.
The layer is symbolized with a stretched color scheme ranging from 73 degrees Fahrenheit in the lightest shade of yellow to 86 degrees in the darkest shade of red. This color ramp matches the one that was used for temperature measurement points in the Temperature_Aug_08_8pm layer.
The urban heat effect is obvious just by viewing the layer. The hottest temperatures are in the middle of the city, and the coldest temperatures are in the surrounding rural areas. However, by including the impervious surfaces layer, you are getting far greater detail in the predicted surface. In some areas, you can even pick out urban corridors and view how the heat flows between the buildings and along the highways and freeways.
- Pan and zoom around the map to investigate any areas that interest you. Click several locations within the city center and suburban and rural areas to identify predicted temperature.
Estimate the average temperature within each block group
In this section, you'll predict the average temperature within each of the block groups using zonal statistics. You'll then join the predictions to the block groups and apply relevant symbology to visualize average temperatures.
- In the Contents pane, turn off Temperature_Prediction. Turn on Block_Groups.
- In the Geoprocessing pane, click the Back button, and search for zonal statistics. In the search results, click Zonal Statistics as Table.
- In the Zonal Statistics as Table tool, set the following parameters:
- For Input raster or feature zone data, choose Block_Groups.
- For Zone field, choose OBJECTID.
- For Input value raster, choose Temperature_Prediction.
- For Output table, type Mean_Temperature.
- For Statistics type, choose Mean.
Choosing Mean for the statistics type indicates that you want to determine the average of all temperature predictions within a block group.
- Click Run.
The table appears in the Contents pane, under the Standalone Tables section. It contains 269 records, one for each of the 269 block groups in the study area. In the table, the OBJECTID field identifies individual block groups and the Mean field contains the average predicted temperature within each block group.
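The Mean statistic can be thought of as grouping raster cells by the zone they fall in and averaging the cell values in each group. The sketch below illustrates that idea with two small nested lists of the same shape, one holding zone identifiers and one holding predicted temperatures. The function name, the use of None as the NoData marker, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def zonal_mean(zones, values):
    """Mean of values per zone id; cells where either grid is None are skipped."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for zone_row, value_row in zip(zones, values):
        for zone, value in zip(zone_row, value_row):
            if zone is None or value is None:
                continue  # NoData, for example a cell over a lake
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# Hypothetical grids: block group ids and predicted temperatures (F).
zone_grid = [
    [1, 1, 2],
    [1, 2, 2],
]
temperature_grid = [
    [80.0, 82.0, None],
    [81.0, 78.0, 76.0],
]

print(zonal_mean(zone_grid, temperature_grid))
```

Each key in the result corresponds to one record in the output table, and the value corresponds to that record's Mean field.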
Next you'll join the Mean_Temperature table to the block groups in order to add the Mean field values to each individual block group polygon.
- In the Geoprocessing pane, click the Back button, and search for Add Join. In the search results, click Add Join.
- In the Add Join tool, set the following parameters:
- For Layer Name or Table View, choose Block_Groups.
- For Input Join Field, choose OBJECTID.
- For Join Table, choose Mean_Temperature.
- For Output Join Field, choose OBJECTID.
- Click Run.
Attribute fields from the Mean_Temperature table are now joined to block groups using the OBJECTID to identify each unique block group.
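A join of this kind is essentially a key lookup: for each block group, find the table row with the same OBJECTID and attach its fields. The following sketch shows the idea with plain dictionaries. The function name and the sample records are hypothetical, and keeping unmatched features without the joined fields is an assumption of the sketch.

```python
def add_join(features, table, feature_key, table_key, table_name):
    """Attach fields from matching table rows to each feature record.

    Joined fields are prefixed with the table name, for example
    Mean_Temperature.MEAN. Features without a match are kept unchanged.
    """
    lookup = {row[table_key]: row for row in table}
    joined = []
    for feature in features:
        record = dict(feature)
        match = lookup.get(feature[feature_key])
        if match is not None:
            for field, value in match.items():
                record[f"{table_name}.{field}"] = value
        joined.append(record)
    return joined


# Hypothetical block groups and zonal statistics rows.
block_groups = [
    {"OBJECTID": 1, "DensityOver65": 120000.0},
    {"OBJECTID": 2, "DensityOver65": 45000.0},
    {"OBJECTID": 3, "DensityOver65": 98000.0},
]
mean_temperature = [
    {"OBJECTID": 1, "MEAN": 81.5},
    {"OBJECTID": 2, "MEAN": 79.0},
]

for record in add_join(block_groups, mean_temperature,
                       "OBJECTID", "OBJECTID", "Mean_Temperature"):
    print(record)
```

The original records are not modified; the function returns new records, much as a join adds fields to the layer's view without rewriting the source data.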
- In the Contents pane, right-click Block_Groups and choose Attribute Table.
- In the Block_Groups attribute table, scroll to the far right and confirm that the Mean field has been appended to the table.
This field contains the average predicted temperature for each block group.
- Close the Block_Groups attribute table.
Next, you'll symbolize the block groups by the predicted average temperature and apply symbology from an imported layer file.
- In the Geoprocessing pane, click the Back button, and search for Apply Symbology. Open the Apply Symbology From Layer tool.
- In the Apply Symbology From Layer tool, set the following parameters:
- For Input Layer, choose Block_Groups.
- For Symbology Layer, browse to the location where you extracted the downloaded project, double-click analyze-urban-heat-using-kriging, and choose BG_temperature.lyrx.
- Under Symbology Fields, for Type, verify that the value is Value field.
- For Source Field, verify that the value is Mean_Temperature.MEAN.
- For Target Field, verify that the value is MEAN.
- Click Run.
The block group symbology updates to show each block group polygon shaded by the average predicted temperature within that block group. The color range used is the same as the original Temperature_Aug_08_8pm layer. The average temperature follows the same patterns as the prediction raster: the hottest block groups are located in and around the center of the city, and the coldest block groups are in the surrounding suburban and rural areas.
- Open the pop-ups for several block groups that show high mean temperatures.
Identify block groups with high numbers of vulnerable residents
In the previous section, you used zonal statistics to predict the average temperature within each of the block groups. In this section, you'll use a query to identify any block groups that have both high average temperatures and a high density of residents over the age of 65. Elderly residents over 65 are most susceptible to heat-related illnesses, so priority for remedial measures should be given to areas of Madison that have the highest numbers of these at-risk residents. You'll build a query expression to select all block groups where the mean temperature is greater than 81 and the density of residents 65 years of age or older is greater than 100,000.
- In the Geoprocessing pane, search for Select Layer.
- In the search results, click Select Layer by Attribute.
Query expressions use the following syntax:
Field name + Operator + Value or Field
- In the Select Layer by Attribute tool, set the following parameters:
- For Input Rows, choose Block_Groups.
- For Selection type, choose New selection.
- In the Expression group, click New expression.
- Create the expression Mean is Greater Than 81. You may need to remove values after the decimal.
- Click Add Clause.
Expressions can include additional clauses or conditions that are connected to the original clause using a connector such as And or Or. Connectors indicate whether one or both clauses need to be true to select a feature.
- In the Expression group, click Add Clause to add a second clause to your query.
- Create the expression And DensityOver65 is Greater Than 100000.
- Click Enter.
Verify your expressions and make adjustments if necessary.
This expression selects block groups with an average temperature above 81 degrees Fahrenheit and a density of residents over the age of 65 that is greater than 100,000 people per square kilometer.
- Click Run.
- Close the Geoprocessing pane.
Five block groups are selected based on your criteria. They are all located in downtown areas and areas along transportation corridors, and they represent the areas of the city where there is high potential for heat-related illnesses in the vulnerable population. In an emergency, these are the areas that should be prioritized by health care authorities.
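The query built in the steps above follows the Field name + Operator + Value pattern, with the two clauses connected by And. The sketch below shows the same logic applied to plain dictionaries. The function name, the clause representation, and the sample records are hypothetical, and only the And connector is implemented.

```python
import operator

# Comparison operators supported by this sketch.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def select_by_attribute(rows, clauses):
    """Return rows for which every (field, operator, value) clause is true."""
    return [
        row for row in rows
        if all(OPERATORS[op](row[field], value) for field, op, value in clauses)
    ]


# Hypothetical block group records.
block_groups = [
    {"OBJECTID": 1, "MEAN": 81.6, "DensityOver65": 150000.0},
    {"OBJECTID": 2, "MEAN": 82.0, "DensityOver65": 90000.0},
    {"OBJECTID": 3, "MEAN": 80.5, "DensityOver65": 300000.0},
    {"OBJECTID": 4, "MEAN": 81.0, "DensityOver65": 200000.0},
]

selected = select_by_attribute(
    block_groups,
    [("MEAN", ">", 81), ("DensityOver65", ">", 100000)],
)
print([row["OBJECTID"] for row in selected])
```

Because the comparison is strictly greater than, a block group with a mean temperature of exactly 81 is not selected.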
As a final check, you'll create a scatter plot of the average temperature versus the density of elderly residents to visualize the overall relationship.
- In the Contents pane, right-click Block_Groups, point to Create Chart, and choose Scatter Plot. If necessary, click Properties in the chart area to open the Chart Properties pane.
- In Chart Properties, for X-axis number, choose Mean.
- For Y-axis number, choose DensityOver65.
The scatter plot updates to show the relationship between average temperature and density of elderly residents. The five selected block groups remain selected in the scatter plot and indicate occurrences where the average temperature is above 81 degrees Fahrenheit and the density of residents over the age of 65 is above 100,000.
There appears to be no relationship between average temperature and density of elderly residents. The trend line is very flat with a slightly negative slope, and the scatter plot does not show any marked patterns. This is good news, because it means that elderly residents over 65 do not tend to live in the hottest parts of Madison, Wisconsin.
- In the Relationship between MEAN and DensityOver65 chart, click the single point located at the top of the graph.
The selected block group has a high density of elderly residents (over 700,000) and falls in the middle of the temperature range (around 80 degrees Fahrenheit).
Because this block group has such a high density of elderly residents, the temperature of the block group should be closely monitored by the emergency managers of Madison, Wisconsin. Fortunately, on August 8 at 8:00 p.m., this block group did not experience higher temperatures compared to the rest of Madison.
- Save the map.
Share your work
You've completed your analysis of temperature in Madison, Wisconsin, for August 8, 2016, at 8:00 p.m. You've developed a workflow for identifying block groups with high numbers of at-risk individuals and performed several different types of kriging. After comparing their results, you applied attractive and meaningful symbology. All you need to do now is to identify an efficient and suitable way to deliver your results to authorities and the public.
ArcGIS offers several ways for you to share your findings, each appropriate for different audiences. The traditional, static approach is to create a layout that can be printed or exported to a PDF or an image file. For a more dispersed audience, you could consider a more dynamic approach and share results online in the form of a web package, web layer, or web map.
Printed maps are still popular and offer a more accessible way to share results with many users. It is also possible to export a map to various image formats, such as PNG or JPEG, that can be embedded in a presentation for use by those who do not have access to GIS software. Maps can also be exported to a PDF file that users can interact with by turning layers on and off.
Printed maps, PDF files, and images are generally the result of creating a map layout. A map layout allows you to communicate your map's message to users, so depending on the purpose, you'll need to make decisions based on the audience and the goal of the map.
When designing a layout, take note of the following elements:
- Page size
- Landscape or portrait orientation
- Operational layers
- Group layers
- Coordinate system
The addition of map elements further helps to communicate the message of your map to your audience. A layout may include many of the following elements:
- Map frame
- North arrow
- Scale bar
- Overview or reference map
- Supporting text (author, information about data, date)
- Coordinate grids
When sharing dynamic content, options include publishing layers, maps, data, and projects in the form of various package types or as a web layer or web map. Users can access shared content directly through ArcGIS Pro or through ArcGIS Online. Packages are intended for sharing projects between ArcGIS Pro users, while web layers and web maps can be seen by a broader audience over the Internet.
If you choose to share parts of your ArcGIS Pro project or the entire project, you can create a package. Packages include layer packages, map packages, or project packages. Packages can be saved locally, or they can be shared on ArcGIS Online so that users can download your maps and data. When other users access a package that you have shared, they can unpack it locally and edit and modify the local copy of the shared map, layer, or project packages.
- Layer packages contain layer properties and the source data referenced by the layer.
- Map packages contain layer properties for each layer in the map, and the source data referenced by all layers.
- Project packages contain layer properties, maps, layouts, referenced data, models, toolboxes, geodatabases, and all other associated project elements.
When packaging a layer, you perform the following steps:
- Select whether to upload to a file or to your ArcGIS Online account.
- Provide a name for the package.
- Give the package an item description.
- Provide tags.
- Set sharing options.
- Analyze the package and correct any errors.
- Share the package.
A web layer is like a feature layer in ArcGIS Pro, but it is hosted online instead of stored locally on a computer. Web layers are used for map visualization and can be edited and queried. Web layers can be created from any feature layers you have in an ArcGIS Pro project.
A web map is an interactive collection of map layers that can be used to create maps for visualization, editing, querying, and analysis. Web maps always contain one basemap and additional supporting operational layers. Web maps are often used for generating apps such as Story Maps.
When sharing a web layer, you perform the following steps:
- Provide a name for the web layer.
- Select features or tiles to share.
- Supply the web layer with an item description.
- Provide tags.
- Set sharing options.
- Analyze the web layer and correct any errors.
- Share the web layer.
For examples and instructions on how to create some of these forms of outputs, you can look at the following lessons: Get Started with ArcGIS Online walks you through creating a web app. Cartographic Creations in ArcGIS Pro shows a detailed, professional layout view, with explanatory text and elements. If you want to combine a web map with storytelling, look at Get Started with Story Maps to learn how to create a high-quality and highly accessible story map.
In these lessons, you learned how to develop a workflow to assess interpolation procedures for analyzing urban heat in Madison, Wisconsin. By exploring the temperature measurements on the map and performing interpolation, you verified the presence of a suspected urban heat island in downtown Madison.
To make a temperature map for all of Madison, you first interpolated the data using simple kriging, one of the oldest and most researched geostatistical methods. This resulted in a scientifically and statistically defensible baseline for the interpolation. Once this baseline was established, you improved the results of the interpolation by using empirical Bayesian kriging. By using locally simulated semivariograms, you improved the accuracy and stability of the interpolated temperatures. Using a scatter plot chart, you then determined that the locations of impervious surfaces were highly related to temperature, and you incorporated this information into the interpolation using EBK Regression Prediction. This resulted in a 25 percent reduction in the Root-Mean-Square cross-validation error compared to EBK.
You completed the workflow by querying and locating census block groups in Madison that have the highest average temperature and the highest density of residents over the age of 65, who are at highest risk for heat-related illnesses.
Using selections, you identified five block groups with an average temperature above 81 degrees Fahrenheit and a population density of residents over the age of 65 above 100,000 people per square kilometer. A scatter plot chart revealed that the population density of elderly residents does not seem to be correlated with temperature. This was a desirable result because if elderly residents tended to live in the hottest parts of the city, that would pose extra challenges for emergency managers and health care providers when trying to mitigate the effects of extreme heat events.
The urban heat island effect is present in virtually every major city in the world, and the workflow you developed in these lessons can be used to analyze other cities and other dates. During the creation of these lessons, various potential explanatory rasters were investigated, including elevation, distance to industry, distance to open spaces, population density, and canopy cover. These variables did not significantly improve the interpolation results for Madison, Wisconsin, on August 8, 2016, at 8:00 p.m., but any of these (and many more) could prove useful for interpolating temperature in other urban settings. You are encouraged to attempt to repeat these exercises using temperature data from different cities on different days. You may find that different explanatory variables are useful for different locations and dates, and you should try to find the variables that work best for your data.
You can find more lessons in the Learn ArcGIS Lesson Gallery.