In the previous lesson, you mapped and explored the distribution of temperature measurements in Madison, Wisconsin, on August 8, 2016, at 8:00 p.m. By looking at the points symbolized with a graduated yellow-to-red color range and using selections in the histogram chart, you found strong visual evidence of the urban heat island effect at that date and time. In this lesson, you'll use the Geostatistical Wizard to interpolate the point temperature measurements and create a continuous surface that predicts the temperature at every location in Madison and surrounding areas.
Interpolate temperature using simple kriging
The Geostatistical Wizard is a guided step-by-step environment for building and validating interpolation models. At each step in the model-building process, you'll make important choices that will affect the final temperature map. You can learn more about the Geostatistical Wizard in Get started with Geostatistical Analyst in ArcGIS Pro.
- If necessary, open your project.
- On the ribbon, click the Analysis tab. In the Tools group, click Geostatistical Wizard.
The Geostatistical Wizard opens and shows the available interpolation methods in the left pane and dataset options in the right pane.
- Under Geostatistical methods, choose Kriging/CoKriging.
The right side of the Geostatistical Wizard updates to show applicable Kriging/CoKriging options.
- Under Input Dataset 1, confirm the following parameters and set them if needed:
- Source Dataset: Temperature_Aug_08_8pm
- Data Field: TemperatureF
By choosing Temperature_Aug_08_8pm as the source dataset and TemperatureF as the data field, you specify that you want to perform kriging on the temperature measurements. You'll choose the specific type of kriging on the next page. By not providing a second dataset, you'll perform kriging rather than cokriging. You can learn more about cokriging in Understanding cokriging.
- Click Next.
On the second page of the Geostatistical Wizard, you'll specify which type of kriging you want to perform and configure options applicable to that type of kriging.
- In the left pane, under Simple Kriging, confirm that Prediction is checked.
Simple kriging is one of the oldest and most-studied kriging models, and it will serve as a robust baseline for temperature interpolation. Choosing the Prediction option specifies that you want to predict the value of the temperature. Other options allow different types of outputs. You can learn more about the other output options in What output surface types can the interpolation models generate?
- For Dataset #1, change Transformation type to None.
This setting specifies that no transformation will be applied to the temperature values before kriging.
- Click Next.
The Semivariogram/Covariance Modeling page opens.
- In General Properties, change Function Type to Semivariogram.
The graph on the left updates to display a semivariogram instead of a covariance function. The semivariogram is the mathematical backbone of kriging, and fitting a valid semivariogram is almost always the most difficult and time-consuming step in building a kriging model.
The semivariogram can be considered a quantification of Waldo Tobler's First Law of Geography: "Everything is related to everything else, but near things are more related than distant things."
The semivariogram defines exactly how similar the values of the points are given how far apart they are. The x-axis of the semivariogram is the distance between any two data points, and the y-axis is the expected squared difference between the values of the two points. For any two locations on the map, you can use a semivariogram to estimate the similarity in the data values of the two locations. Because near points are more similar than distant points, the semivariogram always increases with distance before eventually becoming flat.
The semivariogram pane is composed of three sections:
- Semivariogram—The graph in the upper left of the pane, containing binned values (red points), averaged values (blue crosses), and the semivariogram model (blue curve).
- General Properties—The parameters in the right pane of the page, used to configure the shape of the blue semivariogram model.
- Semivariogram map—Located on the lower left of the page, used to detect anisotropy. Anisotropy will not be discussed in these lessons.
The semivariogram is configured by three parameters that are found in General Properties:
- Nugget—The value of the semivariogram at the y-axis, which represents the expected squared difference in the value of points that are zero distance apart. While in theory the expected squared difference for these points should be zero, a nugget value greater than zero often occurs due to microscale variation and measurement errors.
- Major Range—The distance where the semivariogram becomes flat. If two points are separated by a distance larger than the major range, the points are considered uncorrelated.
- Partial Sill—The value of the semivariogram at the major range is called the sill. The partial sill is calculated by subtracting the nugget from the sill and represents the expected squared difference in value between points that are spatially uncorrelated. This value provides information about the variance of the underlying spatial process.
The details of the semivariogram parameters do not need to be deeply understood for these lessons. You can learn more about the nugget, range, and sill in Understanding a semivariogram: The range, sill, and nugget.
The goal of the semivariogram page is to configure the parameters in General Properties such that the blue semivariogram passes as closely as possible through the middle of the binned and averaged values in the semivariogram graph.
The binned (red points) and averaged (blue crosses) values in the semivariogram graph are calculated directly from the input points using sectors that are defined by the Lag Size and Number of Lags parameters in General Properties. These averaged and binned values are together called an empirical semivariogram. The semivariogram model (blue curve) is then fitted to this empirical semivariogram using a simple curve-fitting algorithm. This process does not need to be understood for these lessons, but you can learn more about the procedure in Empirical semivariogram and covariance functions.
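The binning idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the Geostatistical Wizard's actual procedure: it ignores direction (the wizard bins by sector), it uses the conventional semivariance, which is half the average squared difference of the pairs in each lag, and the function name and sample points are hypothetical.

```python
import math

def empirical_semivariogram(points, lag_size, number_of_lags):
    """Group point pairs by separation distance and average their semivariance.

    points: list of (x, y, value) tuples.
    Returns (mean_distance, semivariance, pair_count) for each non-empty lag.
    """
    # One accumulator per lag: [sum of distances, sum of half squared differences, pair count]
    sums = [[0.0, 0.0, 0] for _ in range(number_of_lags)]
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            distance = math.hypot(xi - xj, yi - yj)
            lag = int(distance // lag_size)
            if lag >= number_of_lags:
                continue  # pair is farther apart than the last lag covers
            sums[lag][0] += distance
            sums[lag][1] += 0.5 * (zi - zj) ** 2
            sums[lag][2] += 1
    return [(d / n, g / n, n) for d, g, n in sums if n > 0]

# Hypothetical measurements: (x, y, temperature)
sample = [(0, 0, 80.0), (1, 0, 81.0), (0, 1, 79.0), (3, 0, 84.0)]
lags = empirical_semivariogram(sample, lag_size=1, number_of_lags=3)
for mean_distance, semivariance, pair_count in lags:
    print(f"{mean_distance:.2f}  {semivariance:.2f}  {pair_count}")
```

In this toy sample, the semivariance grows with distance, which is the pattern the semivariogram model is fitted to.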
- In General Properties, change Model #1 to Spherical.
The blue semivariogram curve changes slightly when you change the model.
Note: There are many ways to fit a semivariogram to the same binned and averaged points, and every semivariogram model will estimate a different semivariogram for the same binned and averaged points. All semivariogram models will honor the same nugget, range, and sill, but they will have slightly different shapes.
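To see how two models can share a nugget, range, and sill yet differ in shape, here is a rough sketch of the spherical and exponential models as they are commonly written in geostatistics textbooks. The parameter values are hypothetical, and the factor of 3 in the exponential model is one common convention (the curve reaches about 95 percent of the partial sill at the range); the software's exact parameterization is an assumption here.

```python
import math

def spherical(h, nugget, partial_sill, major_range):
    """Spherical model: rises to the sill and is exactly flat beyond the range."""
    if h >= major_range:
        return nugget + partial_sill
    r = h / major_range
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, nugget, partial_sill, major_range):
    """Exponential model: approaches the sill gradually and never quite reaches it."""
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / major_range))

# Hypothetical parameters, not values from the lesson data
nugget, partial_sill, major_range = 0.5, 2.0, 10.0
for h in (0, 5, 10, 20):
    s = spherical(h, nugget, partial_sill, major_range)
    e = exponential(h, nugget, partial_sill, major_range)
    print(f"h={h:>2}  spherical={s:.3f}  exponential={e:.3f}")
```

Both curves start at the nugget and level off near the sill (nugget plus partial sill), but they take different paths between those two values.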
There is a lot of detail packed into the semivariogram page, and it is often difficult even for experienced geostatisticians to determine the appropriate parameters of a semivariogram. For this reason, the Optimize model button was created.
- In General Properties, click the Optimize model button.
The purpose of the Optimize model button is to automate finding a nugget, major range, and partial sill that result in the smallest root mean square cross-validation error (cross-validation will be shown and explained later in this lesson). Because this optimization can sometimes take a long time to calculate, it is not done automatically by default.
After optimization, the semivariogram and parameters are updated. These are the values that you will use for your first kriging model.
- Click Next.
The wizard updates to display the Searching Neighborhood page, which consists of a preview of the prediction map along with parameters that control the searching neighborhood.
You can click anywhere in the preview surface and see the predicted value at that location in the Identify Result section on the lower right. Alternatively, you can type an x,y coordinate, and the center of the searching circle will move to the specified location.
Each prediction is based on neighboring input points, and this page allows you to control how many neighbors will be used and which direction the neighbors will come from. Because your temperature measurements are evenly spread over the map, the default searching neighborhood does not need to be altered. If the input points were more clustered or unevenly spaced, you would need to account for this in the searching neighborhood.
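As a rough picture of what a searching neighborhood does, the sketch below keeps only the closest input points within a circle around the prediction location. It is a simplification under stated assumptions: the function name, parameters, and sample points are hypothetical, and the wizard's neighborhood also supports options such as sectors that are not modeled here.

```python
import math

def nearest_neighbors(points, x, y, max_neighbors, radius):
    """Return up to max_neighbors points within radius of (x, y), closest first.

    points: list of (x, y, value) tuples.
    Each result is (distance, x, y, value).
    """
    candidates = []
    for px, py, value in points:
        distance = math.hypot(px - x, py - y)
        if distance <= radius:
            candidates.append((distance, px, py, value))
    candidates.sort(key=lambda c: c[0])
    return candidates[:max_neighbors]

# Hypothetical measurements: (x, y, temperature)
sample = [(0, 0, 80.0), (1, 0, 81.0), (5, 5, 75.0), (0, 2, 79.0)]
neighbors = nearest_neighbors(sample, x=0, y=0, max_neighbors=2, radius=3.0)
print([value for _, _, _, value in neighbors])
```

Only the selected neighbors would contribute to the prediction at that location.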
- In Identify Result, change X to 571000 and Y to 290000. Press Enter between each entry.
The center of the searching circle moves to the specified x,y coordinate in the middle of a hot part of the city.
Have you pinpointed the center of a heat island at this location? No. Heat islands don't really have a center—they tend to spread out across a city.
At this x,y location, Identify Result predicts that the temperature is 83.26 degrees with a standard error of 0.51 degrees. Standard errors quantify the uncertainty in the predicted values. The larger the standard error of the prediction, the higher the uncertainty in the predicted value.
If the predictions are normally distributed, you can construct margins of error for each predicted value based on this rule: Double the standard error and add it to and subtract it from the predicted value to create a 95 percent confidence interval.
- In this location, for example, the lower bound of the 95 percent confidence interval is (83.26 – 2 * 0.51) = 82.24.
- The upper bound of the confidence interval is (83.26 + 2 * 0.51) = 84.28.
Therefore, the best estimate for the temperature at this location is 83.26 degrees Fahrenheit, but you can be 95 percent confident that the true temperature is somewhere between 82.24 and 84.28 degrees Fahrenheit.
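The same arithmetic can be written as a small helper. This is only a sketch of the doubling rule described above, which assumes normally distributed predictions; the function name is hypothetical.

```python
def confidence_interval_95(prediction, standard_error):
    """Approximate 95 percent interval: prediction plus or minus two standard errors."""
    margin = 2 * standard_error
    return prediction - margin, prediction + margin

# Values reported by Identify Result at X = 571000, Y = 290000
lower, upper = confidence_interval_95(83.26, 0.51)
print(f"{lower:.2f} to {upper:.2f}")
```

A larger standard error widens the interval, so the same rule applied where measurements are sparse gives a less precise estimate.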
- For Identify Result, change X to 572000 and Y to 307000. Press Enter between each entry.
The prediction location moves to the top of the study area in the coldest part of the map. The predicted value for this location is about 75.22 degrees with a standard error of 1.76. At this location, the standard error is much larger. This is because there are fewer temperature measurements toward the top of the map than there are in the city center. This results in larger uncertainty in temperature predictions in areas with fewer measurements.
Next, you'll explore the cross-validation page. The cross-validation page displays various numerical and graphical diagnostics that allow you to assess how well your interpolation model fits your data. Cross-validation is a leave-one-out validation method that sequentially hides each input point and uses all remaining points to predict back to the location of the hidden point. The measured value at the hidden point is then compared to the prediction value from cross-validation; the difference between these two values is called the cross-validation error.
- Click Next to display the cross-validation page.
The logic of cross-validation is that if your interpolation model is accurate and reliable, the remaining points should be able to accurately predict the measured value of the hidden point. If the predictions from cross-validation are close to the measured temperature values, this gives you confidence that your model can accurately predict temperature values at new locations.
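The leave-one-out loop itself is short. In the sketch below, inverse distance weighting stands in for the interpolation model purely to keep the example self-contained; it is not kriging, and the function names and sample points are hypothetical. The error is taken as predicted minus measured, which is an assumption about the sign convention.

```python
import math

def idw_predict(points, x, y, power=2.0):
    """Stand-in interpolator (inverse distance weighting), not kriging."""
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        distance = math.hypot(px - x, py - y)
        if distance == 0:
            return value  # prediction location coincides with a measurement
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

def leave_one_out_errors(points, predict=idw_predict):
    """Hide each point in turn and predict it from all remaining points."""
    errors = []
    for i, (x, y, measured) in enumerate(points):
        remaining = points[:i] + points[i + 1:]
        predicted = predict(remaining, x, y)
        errors.append(predicted - measured)  # assumed sign: predicted minus measured
    return errors

# Hypothetical measurements: (x, y, temperature)
sample = [(0, 0, 10.0), (2, 0, 20.0), (1, 0, 14.0)]
cv_errors = leave_one_out_errors(sample)
print([round(e, 2) for e in cv_errors])
```

Any interpolator with the same call signature could be passed as `predict`, which is why cross-validation can be used to compare different models on the same data.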
- Review the Summary panel on the right side of the cross-validation page.
The summary is useful for quickly assessing the overall accuracy and reliability of the model. Each summary statistic provides different information about the model.
- Count—The number of input points.
- Mean—The average of the cross-validation errors. This provides a measure of bias. A biased model is one that tends to predict values that are either too high or too low on average. If the model is unbiased, this value should be close to zero.
- Root-Mean-Square—The square root of the mean squared error. This measures how close the predicted values are to the measured values on average. The smaller the value, the more accurate the predictions.
- Mean Standardized—A standardized version of the mean error. A value close to zero indicates that the model is unbiased. Because this value is standardized, it can be compared between different models that use different data and units.
- Root-Mean-Square Standardized—A standardized version of the root mean square. This value quantifies the reliability of the standard errors of prediction and should be close to one. Significant deviation from one indicates that the standard errors of prediction are not accurate. It is standardized, so it can be compared between different models.
- Average Standard Error—The average of the standard errors at the input point locations. This value should be close to the root mean square. If this value significantly deviates from the root mean square, this indicates that the standard errors may not be accurate.
Overall, these statistics are adequate to justify the accuracy of your kriging model.
- The Mean statistic indicates that on average the temperature predictions are 0.14 degrees too high, which is a small amount of bias and should not be concerning.
- The Root-Mean-Square statistic indicates that on average the predictions differed from the measured values by a little less than two degrees.
- The Root-Mean-Square Standardized statistic is larger than one, which indicates that the standard errors are being slightly underestimated.
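For readers who want to see how such summary statistics relate to each other, the sketch below computes them from a list of cross-validation errors and their standard errors. It follows the verbal descriptions in this lesson literally; the exact formulas used by the Geostatistical Wizard may differ (for example, in how the average standard error is computed), and the input numbers are hypothetical.

```python
import math

def cross_validation_summary(errors, standard_errors):
    """Summary diagnostics from cross-validation errors and their standard errors."""
    n = len(errors)
    # Standardized error: each error divided by its own standard error
    standardized = [e / s for e, s in zip(errors, standard_errors)]
    return {
        "count": n,
        "mean": sum(errors) / n,
        "root_mean_square": math.sqrt(sum(e * e for e in errors) / n),
        "mean_standardized": sum(standardized) / n,
        "root_mean_square_standardized": math.sqrt(sum(z * z for z in standardized) / n),
        "average_standard_error": sum(standard_errors) / n,
    }

# Hypothetical errors and standard errors, not values from the lesson data
summary = cross_validation_summary(
    errors=[1.0, -1.0, 2.0, -2.0],
    standard_errors=[1.0, 1.0, 2.0, 2.0],
)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In this toy input, each error is exactly as large as its standard error, so the root-mean-square standardized value comes out as one. Errors that are consistently larger than their standard errors would push it above one.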
- On the graphical diagnostics pane, click the Predicted tab to select it, if necessary.
The Predicted graph displays a scatterplot of the cross-validation predictions (y-axis) versus the measured values (x-axis) for each input point. In addition, a blue regression line is fitted to the data, and a gray reference line shows the ideal against which the blue regression line can be compared. If your interpolation model is valid, the predictions should be approximately equal to the measured values, so the regression line would follow a 45-degree angle.
In your graph, the blue regression line follows the reference line very closely, which gives you further confidence in the accuracy of your model.
- Click the Error tab.
Notice in the Error graph, your blue regression line is decreasing. This indicates that the interpolation model is smoothing the data, meaning that large values are being underpredicted, and smaller values are being overpredicted. Some degree of smoothing occurs in almost every geostatistical model, and in this result, smoothing is not severe.
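A decreasing regression line in an error plot can be reproduced with a few made-up numbers. The sketch below assumes the errors are plotted against the measured values and are defined as predicted minus measured; both are assumptions, and the data is hypothetical.

```python
def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical smoothed predictions: low values pulled up, high values pulled down
measured = [74.0, 77.0, 80.0, 83.0]
predicted = [75.0, 77.5, 79.5, 82.0]
errors = [p - m for p, m in zip(predicted, measured)]
slope = regression_slope(measured, errors)
print(f"slope = {slope:.3f}")
```

The negative slope reflects that the coolest points are overpredicted and the warmest points are underpredicted, which is what smoothing means here.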
- Click the Normal QQ Plot tab to display the distribution of standardized errors versus the equivalent quantiles from the standard normal distribution.
In the Normal QQ Plot graph, if the red dots fall close to the gray reference line, it indicates that the predictions follow a normal distribution. In your graph, the red points do generally fall close to the reference line, but there are some deviations, especially for the points on the upper right part of the graph. While interpreting QQ plots is not an exact science, your graph indicates that you are justified in assuming that the predictions follow a normal distribution.
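The points in a normal QQ plot can be built by sorting the standardized errors and pairing each one with the matching quantile of the standard normal distribution. The sketch below uses the plotting positions (i + 0.5) / n, which is one common convention and an assumption about what the software does; the input values are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(standardized_errors):
    """Pair each sorted standardized error with a standard normal quantile."""
    ordered = sorted(standardized_errors)
    n = len(ordered)
    normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (normal.inv_cdf((i + 0.5) / n), value)
        for i, value in enumerate(ordered)
    ]

# Hypothetical standardized errors
qq_points = normal_qq_points([0.3, -1.2, 1.1])
for theoretical, observed in qq_points:
    print(f"{theoretical:+.3f}  {observed:+.3f}")
```

If the standardized errors are close to normally distributed, the two numbers in each pair are similar, so the plotted points fall near the reference line.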
- Click Finish.
The final page of the wizard is the Method Report page, which displays all the parameters and settings that were used for the interpolation.
- On the Method Report page, click OK.
The Geostatistical Wizard closes and a layer named Kriging, showing predicted temperature values, is added to the Contents pane of your map.
Explore the Kriging layer on the map
In the previous section, you used the Geostatistical Wizard to interpolate the temperature measurements using simple kriging. You finished by creating a geostatistical layer of your kriging results. Geostatistical layers are custom layers that are only created and analyzed in the Geostatistical Analyst extension. They allow fast visualization and analysis, and they can be exported to raster or feature formats. In this section, you'll explore your geostatistical layer on the map.
- In the Contents pane, uncheck the Temperature_Aug_08_8pm layer.
- Expand the Kriging layer legend to review the symbology used to indicate warmer and cooler interpolated temperatures.
The urban heat island effect is clear just from looking at the map. The highest predicted temperatures are in the downtown area of Madison, with temperatures generally in the range of 80 to 84 degrees. Lower predicted temperatures are in the surrounding suburban and rural areas, with temperatures in the range of 73 to 78 degrees.
- On the ribbon, click the Map tab. In the Navigate group, click Explore.
- Click several locations on the map to preview predicted temperatures and the standard error of the prediction. Make sure to click some areas in the middle of the city as well as some locations in the suburban and rural areas outside the city.
As you investigate higher predicted temperature locations in the middle of the city, notice the associated lower standard errors. It is safe to assume that the predicted temperatures are higher due to the urban heat island effect, and the standard errors are lower because there are more temperature measurements in the middle of the city.
- In the Contents pane, for the Kriging layer, collapse the legend and turn the layer off.
- Turn on Temperature_Aug_08_8pm.
- Save the project.
In this lesson, you used the Geostatistical Wizard to create a map predicting the temperature in Madison, Wisconsin, on August 8, 2016, at 8:00 p.m. You started with 139 points measuring the temperature across the city. In the first lesson, you found evidence of the urban heat island effect by exploring the temperature measurements using symbology and the histogram chart. To verify this observation, you used the Geostatistical Wizard to interpolate the temperature measurements using simple kriging. By creating a continuous map predicting the temperature across Madison and surrounding townships, you confirmed that there is nearly a 10-degree difference in temperature between the middle of the city and surrounding rural areas.
In the next lesson, you'll interpolate the temperature measurements again using a newer type of kriging called empirical Bayesian kriging. You'll then compare the results from empirical Bayesian kriging to the results from simple kriging.