In the previous lesson, you explored the temperature measurements in Madison, Wisconsin, and used the Geostatistical Wizard to create a simple kriging layer predicting the temperature across the entire city, which confirmed the presence of the urban heat island effect. The simple kriging model that you created is a classical kriging model, and it is the exact kind of model that you would expect to find in geostatistical textbooks and published scientific journals. In recent years, however, the rapid increase in computer processing power has led to the development of more sophisticated kriging models that are both more accurate and easier to configure. In this lesson, you will interpolate the temperature measurements using one of these new kriging models known as empirical Bayesian kriging.
Empirical Bayesian kriging (EBK) was developed specifically to overcome some of the more difficult theoretical and practical limitations of classical kriging. By far, the biggest limitation of classical kriging is the assumption that one single semivariogram can accurately represent the spatial structure of the data everywhere. Recall that the semivariogram represents the expected difference in data value for pairs of points that are a given distance apart. Regardless of where the points are on the map, if two pairs of points are the same distance apart, they are supposed to have the same expected difference in data values. However, for most datasets this assumption is not reasonable. One semivariogram model may fit best in one part of the map and a completely different semivariogram model may fit best in a different part of the map. In situations like this, you cannot hope to find a single semivariogram model that accurately represents the data everywhere on the map.
Even if there were a single semivariogram that fit well everywhere in the dataset, you would still need to estimate it. Unfortunately, the mathematical equations behind classical kriging assume that the semivariogram has been modeled perfectly, and any inaccuracy in the semivariogram parameters will not be properly accounted for in the predictions and standard errors. Because the math of kriging is based entirely on this single semivariogram, it is critical to estimate it as well as you possibly can. This is why there are so many parameters that can be used to change the shape of a semivariogram: you need as much flexibility as possible to accommodate all of the possible spatial structures of different datasets.
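To make the semivariogram concrete, the following is a minimal sketch of how an empirical semivariogram could be computed from point data. It uses the usual textbook estimator, in which the semivariance for a distance bin is half the average squared difference between the values of point pairs separated by that distance. The function name, the bin width, and the number of bins are illustrative assumptions; this is not the wizard's implementation.

```python
import numpy as np

def empirical_semivariogram(xy, values, bin_width, n_bins):
    """Return (bin_centers, semivariances) for equal-width distance bins."""
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)

    # Every unique pair of points (i < j).
    i, j = np.triu_indices(len(values), k=1)
    distances = np.linalg.norm(xy[i] - xy[j], axis=1)
    half_squared_diffs = 0.5 * (values[i] - values[j]) ** 2

    # Assign each pair to a distance bin and average within the bin.
    bin_index = np.floor(distances / bin_width).astype(int)
    centers = (np.arange(n_bins) + 0.5) * bin_width
    semivariances = np.full(n_bins, np.nan)  # NaN marks bins with no pairs
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.any():
            semivariances[b] = half_squared_diffs[in_bin].mean()
    return centers, semivariances
```

Fitting a model curve through these binned values is the step that classical kriging treats as exact.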
Empirical Bayesian kriging overcomes these problems through a process of subsetting and simulation. EBK starts by dividing the input data into small subsets. In each subset, a semivariogram is estimated automatically, and this semivariogram is used to simulate new data values in the subset. These simulated data values are then used to estimate a new semivariogram for the subset. This simulation and estimation process repeats many times, and it results in many simulated semivariograms in each subset. These simulations are then mixed together to produce the final prediction map.
By estimating the semivariograms on small subsets, different semivariograms will be estimated in different regions of the study area. This allows the model to change locally, and you no longer need to assume that a single semivariogram model can fit the data everywhere. Additionally, by simulating many semivariograms in each subset, you do not have to worry as much about the accuracy of any single semivariogram. When all math is based on a single semivariogram, you must be very careful to make sure that it is as good as it possibly can be, but when many semivariograms are simulated, it is not critical that each of them be perfect.
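The subset-then-simulate loop described above can be sketched in heavily simplified form. The sketch below makes several assumptions that are not part of EBK itself: subsets are formed by sorting points on the x-coordinate with no overlap, the spatial model is an exponential covariance with a fixed range, only the sill is estimated, and every simulation is drawn from the initial estimate. All function names are hypothetical. The point is only to show how repeated simulation and re-estimation yields a spread of semivariogram parameters in each subset instead of a single value.

```python
import numpy as np

def correlation_matrix(xy, range_):
    """Exponential correlation between all pairs of points (assumed model)."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    return np.exp(-d / range_)

def estimate_sill(xy, z, range_):
    """Estimate the sill for a fixed range from centered data."""
    R = correlation_matrix(xy, range_)
    resid = z - z.mean()
    return float(resid @ np.linalg.solve(R, resid) / len(z))

def simulate_sills(xy, z, range_, n_sim, rng):
    """Simulate data from the fitted model and re-estimate the sill each time."""
    R = correlation_matrix(xy, range_)
    L = np.linalg.cholesky(R + 1e-10 * np.eye(len(z)))  # small jitter for stability
    sill0 = estimate_sill(xy, z, range_)
    sills = []
    for _ in range(n_sim):
        z_sim = z.mean() + np.sqrt(sill0) * (L @ rng.standard_normal(len(z)))
        sills.append(estimate_sill(xy, z_sim, range_))
    return np.array(sills)

def ebk_like_sills(xy, z, subset_size, range_, n_sim, seed=0):
    """Return one array of simulated sills per subset."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    rng = np.random.default_rng(seed)
    order = np.argsort(xy[:, 0])  # crude spatial grouping (assumption)
    results = []
    for start in range(0, len(z), subset_size):
        idx = order[start:start + subset_size]
        results.append(simulate_sills(xy[idx], z[idx], range_, n_sim, rng))
    return results
```

Each subset ends up with its own distribution of parameters, which is what lets the model vary locally and reduces the reliance on any single estimate.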
Perform empirical Bayesian kriging in the Geostatistical Wizard
You'll use the Geostatistical Wizard to interpolate the temperature measurements using empirical Bayesian kriging.
Due to the computational cost of the simulations in EBK, many mathematical operations are optimized for different processors. Depending on the hardware of your computer, you may get slightly different results in this section. These differences can be as large as 1 percent in some cases.
- If necessary, open your project.
- On the ribbon, on the Analysis tab, in the Tools group, click Geostatistical Wizard.
- For Geostatistical methods, choose Empirical Bayesian Kriging.
- Under Input Dataset, for Source Dataset, choose Temperature_Aug_08_8pm.
- For Data Field, choose TemperatureF.
- Click Next to update the Empirical Bayesian Kriging semivariogram and preview.
The top left pane displays a preview of the interpolated surface with a searching circle centered in the middle of the data extent.
The lower right displays Identify Result.
General Properties shows parameters for the semivariograms and the searching neighborhood.
Parameters in General Properties provide control over subsets and simulations in EBK:
- Subset Size specifies the number of points in each subset.
- Overlap Factor allows you to control how much these subsets overlap each other.
- Number of Simulations controls how many semivariograms will be simulated in each subset.
The Simulated semivariograms (blue lines) and Empirical semivariogram (blue crosses) are displayed in the lower left. The median semivariogram is solid red, and the first and third quartiles are displayed as dashed red lines.
- In General Properties, for Subset Size, type 50 and press Enter.
The preview surface updates to reflect the new subset size. With 139 input points, using a subset size of 50 will create approximately three subsets. This ensures that the semivariograms will be sufficiently estimated at a local level, while still maintaining enough points in each subset to reliably estimate the semivariogram parameters.
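If you later want to script this step, the settings chosen in the wizard correspond to parameters of the Empirical Bayesian Kriging geoprocessing tool. The following is a hedged sketch: the parameter names (max_local_points, overlap_factor, number_semivariograms) are given as I understand them from the tool's documentation and should be verified against your ArcGIS Pro version. The overlap factor and number of simulations shown are placeholder values, not values taken from this lesson, and the helper function names are hypothetical.

```python
import math

def ebk_tool_arguments(in_features, z_field, subset_size=50,
                       overlap_factor=1.0, n_simulations=100):
    """Collect wizard settings as keyword arguments for the tool (assumed names)."""
    return {
        "in_features": in_features,
        "z_field": z_field,
        "out_ga_layer": "Empirical Bayesian Kriging",
        "max_local_points": subset_size,          # Subset Size
        "overlap_factor": overlap_factor,         # Overlap Factor (placeholder)
        "number_semivariograms": n_simulations,   # Number of Simulations (placeholder)
    }

def approximate_subset_count(n_points, subset_size):
    """Rough number of subsets, ignoring overlap between subsets."""
    return math.ceil(n_points / subset_size)

def run_ebk(arguments):
    """Run the tool; requires ArcGIS Pro with the Geostatistical Analyst extension."""
    import arcpy  # imported here so the helpers above work without ArcGIS
    arcpy.CheckOutExtension("GeoStats")
    return arcpy.ga.EmpiricalBayesianKriging(**arguments)

arguments = ebk_tool_arguments("Temperature_Aug_08_8pm", "TemperatureF")
print(approximate_subset_count(139, 50))
```

Only run_ebk needs an ArcGIS Pro Python environment; the other two helpers are plain Python.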
- In Identify Result, change X to 571000 and Y to 290000. Press Enter between each entry.
The predicted temperature at this location is about 83.39 degrees with a standard error of 0.63 degrees. In the previous lesson, simple kriging predicted 83.26 degrees with a standard error of 0.51 degrees at this same location.
Both simple kriging and EBK predict nearly the same temperature, but there is a notable difference in the standard errors of the predictions. This is because simple kriging almost always underestimates standard errors: it uses only a single semivariogram and treats that semivariogram as if it were known exactly. While a larger standard error in EBK seems to imply that EBK has larger uncertainty than simple kriging, the truth is that the standard errors of simple kriging are incorrectly low.
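One way to see what the difference in standard errors means in practice is to turn each prediction and standard error into an approximate prediction interval. The sketch below assumes the predictions are normally distributed, which is a simplification, and the function name is hypothetical.

```python
from statistics import NormalDist

def prediction_interval(prediction, standard_error, level=0.95):
    """Approximate prediction interval, assuming a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * standard_error
    return prediction - half_width, prediction + half_width

# Values reported at (571000, 290000) in this lesson and the previous one.
ebk_low, ebk_high = prediction_interval(83.39, 0.63)
sk_low, sk_high = prediction_interval(83.26, 0.51)
print(f"EBK:            {ebk_low:.2f} to {ebk_high:.2f}")
print(f"Simple kriging: {sk_low:.2f} to {sk_high:.2f}")
```

The simple kriging interval is narrower, but a narrower interval is only better if it still contains the true value as often as it claims to.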
At this location (571000, 290000), the simulated semivariograms seem to pass through the averaged values (blue crosses) fairly well, particularly at short distances. The averaged values at the largest distances tend to fall near the lower edge of the simulated semivariograms, but it is most critical to properly model the semivariogram at short distances, as these are the distances that will contribute most to the predicted values.
- In Identify Result, change X to 572000 and Y to 307000. Press Enter between each entry.
The prediction location moves to the top of the study area in the coldest part of the map. The predicted value for this location (572000, 307000) is about 74.14 degrees with a standard error of 2.28. Simple kriging predicted about 75.22 degrees with a standard error of 1.76. This time, the two predictions differ by about one degree, but this is likely due to the larger uncertainty in the predicted values at this location. This uncertainty is reflected in the standard errors, which are much larger here than at the previous location.
- Click other locations on the preview surface to see the predicted values and the simulated semivariograms until you are satisfied that the semivariograms seem to fit the averaged values well almost everywhere on the map.
- Click Next to display the cross-validation page.
As with simple kriging, the cross-validation page displays summary statistics on the right and graphical diagnostics on the left. In the EBK summary statistics, there are now three additional statistics that did not appear in simple kriging:
- Average CRPS—This statistic simultaneously quantifies the accuracy and stability of the model, and it should be as small as possible. Unfortunately, it has no direct interpretation, and it can only be used to compare different interpolation models.
- Inside 90 Percent Interval—The percent of cross-validation points contained in a 90 percent prediction interval. This value should be close to 90. Your value of 89.928 is nearly perfect.
- Inside 95 Percent Interval—The percent of cross-validation points contained in a 95 percent prediction interval. This value should be close to 95. Your value of 96.403 is quite close to the ideal value of 95.
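To illustrate what these statistics measure, the following sketch computes a CRPS value and an interval coverage percentage from cross-validation results. It assumes each prediction has a normal predictive distribution defined by its predicted value and standard error, which allows the well-known closed form for the Gaussian CRPS. The wizard may compute these statistics from a different predictive distribution, so treat this as an approximation; the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def gaussian_crps(measured, predicted, standard_error):
    """CRPS of a normal predictive distribution at one measured value."""
    z = (measured - predicted) / standard_error
    return standard_error * (
        z * (2 * STD_NORMAL.cdf(z) - 1)
        + 2 * STD_NORMAL.pdf(z)
        - 1 / math.sqrt(math.pi)
    )

def percent_inside_interval(measured, predicted, standard_errors, level):
    """Percentage of measured values inside the central prediction interval."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2)
    inside = sum(
        abs(m - p) <= z_crit * s
        for m, p, s in zip(measured, predicted, standard_errors)
    )
    return 100 * inside / len(measured)
```

As a consistency check on the reported values, 125 of 139 points is 89.928 percent and 134 of 139 points is 96.403 percent.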
The following table shows a comparison of cross-validation summary statistics from EBK and simple kriging:
Your values may differ slightly from the table below due to rounding.
Summary statistic Simple kriging EBK
Average Standard Error
- Larger Mean and Mean Standardized values in EBK indicate that it has slightly more bias than simple kriging, but overall both models have very small amounts of bias.
- The slightly lower Root-Mean-Square value indicates that on average EBK predicts slightly more accurate temperature values.
The biggest difference between the two models is that the standard errors in EBK are much more accurate.
- The larger Average Standard Error value in EBK shows that on average, EBK is estimating larger standard errors than simple kriging.
- The nearly perfect Root-Mean-Square Standardized value in EBK (recall that ideally it should be one) indicates that these standard errors are being more correctly estimated.
- The Average Standard Error value of EBK also more closely matches the Root-Mean-Square value than it does in simple kriging.
Taken together, this is strong evidence that the EBK model is more reliable than the simple kriging model.
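The summary statistics compared above can be reproduced from cross-validation output with a few lines of code. This is a sketch using common definitions: errors are predicted minus measured, and standardized errors divide each error by its standard error. Taking Average Standard Error as the square root of the mean squared standard error is an assumption here; check the software documentation for the exact definition it uses.

```python
import numpy as np

def cross_validation_summary(measured, predicted, standard_errors):
    """Compute common cross-validation summary statistics."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    standard_errors = np.asarray(standard_errors, dtype=float)

    errors = predicted - measured
    standardized = errors / standard_errors
    return {
        "Mean": errors.mean(),
        "Root-Mean-Square": np.sqrt((errors ** 2).mean()),
        "Mean Standardized": standardized.mean(),
        "Root-Mean-Square Standardized": np.sqrt((standardized ** 2).mean()),
        # Assumed definition: root of the mean squared standard error.
        "Average Standard Error": np.sqrt((standard_errors ** 2).mean()),
    }
```

When the standard errors are well calibrated, Root-Mean-Square Standardized is close to one and Average Standard Error is close to Root-Mean-Square.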
- Confirm that the graphical diagnostics pane is displaying the Predicted graph.
The graph shows predicted values from cross-validation versus measured values. The blue regression line is so close to the gray reference line that you can hardly see the reference line. In simple kriging, the regression line was not as perfectly aligned with the reference line. This should give you further confidence that the EBK model is more reliable.
- Click the Error tab.
As with the simple kriging model, the blue regression line is slightly decreasing, which indicates that the model has smoothed the data, but this smoothing is not severe.
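The link between a decreasing regression line and smoothing can be checked numerically. The sketch below fits a straight line to the cross-validation errors (predicted minus measured) against the measured values. A negative slope means high values are underpredicted and low values are overpredicted, which is what smoothing does. The function name and the sample values are illustrative only.

```python
import numpy as np

def error_regression_slope(measured, predicted):
    """Slope of the least-squares line of error versus measured value."""
    measured = np.asarray(measured, dtype=float)
    errors = np.asarray(predicted, dtype=float) - measured
    slope, _intercept = np.polyfit(measured, errors, 1)
    return slope

# Made-up example: predictions pulled 20 percent toward the mean.
measured = np.array([70.0, 75.0, 80.0, 85.0, 90.0])
predicted = measured.mean() + 0.8 * (measured - measured.mean())
print(error_regression_slope(measured, predicted))
```

A slope near zero indicates little smoothing; the more negative the slope, the more the model pulls predictions toward the average.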
- Click the Normal QQ Plot tab.
The red points very closely follow the gray reference line. There is still some deviation from the reference line for the largest values, but this deviation is smaller than it was in simple kriging. Based on this graph, you can safely assume that the predictions follow a normal distribution.
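For reference, the points on a normal QQ plot can be computed by sorting the standardized errors and pairing them with the corresponding quantiles of a standard normal distribution. This sketch uses the plotting positions (k + 0.5) / n, which is one common convention and an assumption here; the wizard may use a different one. The function name is hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(measured, predicted, standard_errors):
    """Return (theoretical_quantile, standardized_error) pairs, sorted."""
    standardized = sorted(
        (p - m) / s for m, p, s in zip(measured, predicted, standard_errors)
    )
    n = len(standardized)
    theoretical = [NormalDist().inv_cdf((k + 0.5) / n) for k in range(n)]
    return list(zip(theoretical, standardized))
```

If the standardized errors are approximately normal, these pairs fall close to the line where both coordinates are equal.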
- Click Finish.
- On the Method Report page, click OK.
The Geostatistical Wizard closes and the Empirical Bayesian Kriging geostatistical layer is added to the Contents pane. This layer has the same symbology as the Kriging layer, so they can be visually compared.
- In the Contents pane, turn off Temperature_Aug_08_8pm. Turn on Kriging and keep Empirical Bayesian Kriging turned on too. Click Empirical Bayesian Kriging to select it.
Your results may vary slightly due to rounding.
- On the ribbon, on the Appearance tab, in the Effects group, click Swipe. Swipe up and down or left and right to display the difference between the Empirical Bayesian Kriging and Kriging layers.
- On the Map tab, in the Navigate group, click Explore.
- Click several locations on the map to preview predicted temperatures and the standard error of the prediction. Make sure to click some areas in the middle of the city as well as some locations in the suburban and rural areas outside of the city.
- When finished, turn off Empirical Bayesian Kriging and Kriging, and collapse their legends, if necessary.
- Save the project.
In this lesson, you interpolated the temperature measurements using empirical Bayesian kriging in the Geostatistical Wizard. As with simple kriging in the previous lesson, you could confirm the presence of an urban heat island on the prediction map; the center of the city is notably warmer than the surrounding areas. Using cross-validation, you showed that EBK produced a moderately more accurate temperature prediction map, particularly for the standard errors of predicted temperatures.
In the next lesson, you'll use an even more sophisticated version of kriging called EBK Regression Prediction, which will allow you to incorporate the locations of impervious surfaces into the interpolation.