Map and explore oxygen level data using charts
First, you'll use line and histogram charts to explore the properties and characteristics of your data. Exploring your data is an important first step of nearly every analytical workflow. Then, using these charts, you'll determine whether the data is viable for an interpolation workflow. By using a line chart to see how dissolved oxygen levels change over time, you can choose appropriate time windows for the analysis. Once time windows are chosen, the histogram chart allows you to see the various levels of dissolved oxygen across the bay.
Download and open the project
A folder has been provided with water quality data collected from the estuaries of Chesapeake Bay and several data layers in an ArcGIS Pro package. This data was supplied by the Chesapeake Bay Program.
- Download the Chesapeake_WaterQuality.zip file.
- Locate the downloaded file on your computer.
Depending on your web browser, you may have been prompted to choose the file's location before you began the download. Most browsers download to your computer's Downloads folder by default.
- Right-click the file and extract the contents to a convenient location on your computer, such as your Documents folder.
- Open the unzipped folder to view the contents.
- If you have ArcGIS Pro installed on your computer, double-click Chesapeake_WaterQuality.ppkx to open the project.
If you don't have ArcGIS Pro or an ArcGIS account, you can sign up for an ArcGIS free trial.
- If prompted, sign in using your licensed ArcGIS account.
The project contains a map named Chesapeake Bay Dissolved O2 that contains a topographic basemap and the following data layers:
- The DissolvedO2 layer shows locations where dissolved oxygen and numerous other compounds have been monitored since 1984. While you only see 131 points on the map, each location contains hundreds or thousands of historical measurements.
- The Bay layer represents a simplified polygon of the coastline of the bay.
Dissolved oxygen is measured in milligrams per liter of water (mg/L). According to the National Oceanic and Atmospheric Administration (NOAA), any persistent dissolved oxygen levels below 5.0 mg/L are considered unhealthy, and any locations with persistent levels below 0.2 mg/L are dead zones where fish and plants cannot survive.
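The two thresholds above can be written down as a small classification rule. The following is a minimal Python sketch, not part of the ArcGIS Pro project; the function name and the labels are illustrative assumptions, and only the 5.0 mg/L and 0.2 mg/L values come from the text.

```python
# Thresholds quoted in the lesson text (mg/L).
# The names and labels below are illustrative, not part of the project.
UNHEALTHY_BELOW = 5.0
DEAD_ZONE_BELOW = 0.2


def classify_dissolved_oxygen(mg_per_l: float) -> str:
    """Label a single dissolved oxygen reading against the two thresholds."""
    if mg_per_l < 0:
        raise ValueError("dissolved oxygen cannot be negative")
    if mg_per_l < DEAD_ZONE_BELOW:
        return "dead-zone level"
    if mg_per_l < UNHEALTHY_BELOW:
        return "unhealthy"
    return "healthy"
```

A single reading labeled this way is only a warning sign. Both thresholds apply to persistent levels, so one low measurement does not by itself identify a dead zone.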
- In the Contents pane for the Chesapeake Bay Dissolved O2 map, turn on the Bay layer.
Depending on your default ArcGIS Pro configuration, the Contents pane may not open automatically. If necessary, on the ribbon, click the View tab. In the Windows group, click Contents.
- On the Map tab, in the Navigate group, click Explore.
- Click and pan the map to the northern tip of Chesapeake Bay.
- In the Contents pane, click the Bay layer to select it. On the ribbon, click the Appearance tab. In the Compare group, click Swipe.
When you point to the map, the pointer changes.
- Click and drag the pointer up and down or left to right to hide the Bay layer.
The extent of the Bay polygon does not exactly match the Topographic basemap below. The Bay layer has been simplified and generalized from the actual Chesapeake Bay boundary. The generalization will make future analysis faster.
- On the Map tab, click Explore. Scroll your mouse wheel to zoom back to the full extent of Chesapeake Bay.
Reselecting the Explore tool disables the swipe effect, allowing you to pan and zoom normally.
- In the Contents pane, turn off the Bay layer and turn on the DissolvedO2 layer.
The DissolvedO2 layer is sourced from a CSV file downloaded from the Chesapeake Bay Program Water Quality Database (1984 – present). This data was geocoded, projected, and filtered to retain data sampled between early 2014 and late 2015 that related to dissolved oxygen.
- Use the Explore tool to visualize the distribution of the dissolved oxygen measurements throughout the Chesapeake Bay.
Using the Chesapeake Bay Program Water Quality Database (1984 – present) link, you can download additional nutrient data for multiple years that you can investigate on your own.
Create a line chart of dissolved oxygen
Now that you've explored the data, you'll create a line chart for the dissolved oxygen levels. A line chart is a type of graph that shows how a value changes over time. Your line chart will show how the average dissolved oxygen level of the entire bay changed during 2014 and 2015.
Setting SampleDate as the Date or Number variable field specifies that the date and time each DissolvedO2 measurement was taken is plotted on the horizontal x-axis of the line chart.
- In the Contents pane, right-click DissolvedO2, point to Create Chart, and choose Line Chart.
The Chart Properties - DissolvedO2 and the DissolvedO2 – Chart of DissolvedO2 panes appear.
- In the Chart Properties pane, on the Data tab, for Date or Number, choose SampleDate. For Aggregation, choose Mean.
- Under Numeric field(s), click Select. Check MeasureValue, then click Apply.
The chart now shows the average (mean) of the dissolved oxygen measurements for each date.
The dissolved oxygen measurements stored in the MeasureValue field are plotted on the vertical y-axis of the line chart. You can now choose to aggregate the data differently. Because the SampleDate attributes are stored as dates, the default option is Count. This method counts the number of days that observations were recorded. MeasureValue is stored as a number, allowing different arithmetic operations to be applied.
- In the Time binning options section, confirm that Interval size is set to 5 Days. For Empty bins, choose Connect line.
Connect line makes the line chart more readable by ensuring that the line connects even if there are dates with no available measurements.
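The idea behind the Mean aggregation and the 5-day interval can be sketched in plain Python. This is an illustration only: it assumes the bins start at the earliest sample date, which may not match how ArcGIS Pro aligns its time bins, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def mean_by_interval(samples, interval_days=5):
    """Average values in fixed-width date bins.

    samples: iterable of (date, value) pairs.
    Returns a sorted list of (bin_start_date, mean_value).
    Bins that contain no samples are simply omitted.
    """
    samples = list(samples)
    if not samples:
        return []
    # Assumption: the first bin starts at the earliest sample date.
    origin = min(d for d, _ in samples)
    bins = defaultdict(list)
    for d, value in samples:
        index = (d - origin).days // interval_days
        bins[index].append(value)
    return [
        (origin + timedelta(days=i * interval_days), sum(v) / len(v))
        for i, v in sorted(bins.items())
    ]
```

Because empty bins are omitted from the result, drawing a line through the returned points in order corresponds to the Connect line option: the line skips over intervals that have no measurements.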
The title of the chart and chart pane are updated to DissolvedO2 – Change in mean MeasureValue over SampleDate, reflecting the variables used to generate the line chart.
- In the chart pane, visually identify the average dissolved oxygen levels above 12–13 mg/L observed between 4/1/2014 and 4/1/2015. In addition, identify the summer dates that correspond to average dissolved oxygen levels lower than 5–6 mg/L.
Your chart content may appear different than the example image due to monitor resolution and chart size that affect which sample dates and measure values are displayed on the horizontal and vertical axis. The line color of your chart may also differ, but the results are the same.
There is a clear seasonal cycle to dissolved oxygen in Chesapeake Bay. The average dissolved oxygen level is highest during the winter (with average levels as high as 12–13 mg/L) and lowest during the summer (with average levels as low as 5–6 mg/L). Since anything below 5.0 mg/L is considered unhealthy, the dissolved oxygen level between June and September needs to be investigated. It is, however, encouraging to see that at no time was the average dissolved oxygen level close to 0.2 mg/L, the level at which marine life cannot be sustained.
Filter the line chart to show summer 2014 data
Although you've observed a seasonal cycle in the dissolved oxygen levels, you want to look more closely at individual seasons. While the general trend of the data rises and falls, there is a lot of variation between each observation. You'll use a task to select measurements taken between June 15, 2014, and September 15, 2014, and sampled at a depth greater than 5 meters. A task is a set of preconfigured steps that guide you through a workflow. The task for this selection query is included in your project.
- On the ribbon, click the View tab. In the Windows group, click Catalog Pane.
- In the Catalog pane, expand the Tasks folder.
- Double-click the Filter Samples for Summer 2014 and Summer 2015 task.
The Tasks pane appears.
- In the Tasks pane, double-click Apply Summer 2014 Filter.
The task opens. This task consists of one step that executes a three-part query on the DissolvedO2 layer.
You can resize the pane by pointing to the right side of the pane and dragging the pane to a larger size.
The task parameters are as follows:
- For Input Rows, select DissolvedO2.
- For Selection type, select New selection.
The expression combines the following SQL clauses:
- Where TotalDepth is greater than 5
- And SampleDate is after 6/15/2014 12:00:00 AM
- And SampleDate is before 9/16/2014 12:00:00 AM
The query's expressions select all samples taken at a depth greater than 5 meters between June 15, 2014, and September 15, 2014.
To learn how to write your own SQL query expression, see Write a query in the query builder.
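The three-part query can be read as a single Boolean test applied to each sample. The Python sketch below mirrors the three clauses listed above; it is not the SQL that the task runs, and the function and parameter names are illustrative assumptions.

```python
from datetime import datetime


def is_summer_2014_deep_sample(total_depth_m, sample_date):
    """Return True when a sample satisfies all three clauses of the query."""
    return (
        total_depth_m > 5                         # TotalDepth is greater than 5
        and sample_date > datetime(2014, 6, 15)   # after 6/15/2014 12:00:00 AM
        and sample_date < datetime(2014, 9, 16)   # before 9/16/2014 12:00:00 AM
    )
```

The upper bound is midnight at the start of September 16, which is why samples taken at any time on September 15 are still included. In this sketch "after" is treated as strictly greater than, so a sample stamped exactly at midnight on June 15 would be excluded.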
- Click Run.
The task filter selects the points in the summer of 2014 on your line chart.
- At the top of the line chart, click the Filter by Selection button.
The chart updates to only show the selected points.
In the summer months of 2014, the average dissolved oxygen level fluctuates up and down without any clear pattern. The seasonal trends that you could see in the entire dataset disappear when viewing a single season. This is good; trends can cause difficulties for interpolation workflows. It appears that if you only use measurements within this three-month window, seasonal trends can be ignored.
- In the Tasks pane, click Finish to stop running the task. Close the Tasks pane.
Create a filtered histogram chart
In the previous section, you used a line chart to determine that you should limit your analysis to the summer months of 2014. These are the months when the average dissolved oxygen levels are near unhealthy levels. However, the line chart only showed the average dissolved oxygen level of the entire bay. What if some parts of the bay have low levels and other parts of the bay have high levels? Could the average be hiding some very low levels? To answer these questions, you'll create a histogram chart for the selected data.
- In the Contents pane, right-click DissolvedO2, point to Create Chart, and choose Histogram.
- In the Chart Properties pane, on the Data tab, make the following changes:
- Under Variable, for Number, choose MeasureValue.
- Under Bins, choose 64.
The DissolvedO2 – Distribution of MeasureValue pane updates, showing a histogram of DissolvedO2 for all samples.
Notice the samples from the summer of 2014 remain selected in blue.
- At the top of the histogram chart, click the Filter by Selection button.
The histogram updates to display only selected samples for summer 2014. In the summer of 2014, most of the data measurements ranged between 3 mg/L and 9 mg/L of dissolved oxygen. The average (mean) level over the three summer months was 5.26 mg/L.
However, the two bars on the far left side of the histogram are worth noting, as the dissolved oxygen levels are far below the average and are observed for a high number of samples. You'll investigate these next.
- In the Distribution of MeasureValue histogram, hover over the first data bin (bar) on the left to show MeasureValue and Count values for the data bin.
Measure values may differ slightly due to rounding.
The bin properties reveal that 185 samples out of the total of 4,086 samples had dissolved oxygen levels between 0 and 0.2 mg/L. This is indicative of a dead zone, and the result should be very concerning. However, a dead zone only occurs when the level of dissolved oxygen is persistently low for extended periods of time. Whether these locations have persistently low dissolved oxygen will be the focus of the next lesson.
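To make the bin counts concrete, here is a minimal Python sketch of equal-width binning together with the share of samples that falls in one bin. It assumes the bins span the minimum to the maximum value, which may differ from how the chart places its bin edges; both function names are hypothetical.

```python
def histogram_counts(values, n_bins):
    """Count values in n_bins equal-width bins spanning min(values)..max(values).

    Returns a list of (bin_start, bin_end, count) tuples.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [(lo, hi, len(values))]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


def share_percent(count, total):
    """Percentage of the total represented by count."""
    return 100.0 * count / total
```

With the counts reported above, `share_percent(185, 4086)` evaluates to roughly 4.5, so about one sample in twenty-two fell in the lowest bin.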
- Close the Chart panes and the Chart Properties pane.
Closing the charts will not delete them from the project.
- On the quick access toolbar, click the Save button to save your project.
You've used a line chart and histogram to explore your data while applying a selection filter. The line chart indicated a strong seasonal pattern of dissolved oxygen distribution, with the lowest levels occurring in the summer months. In the summer of 2014, the average dissolved oxygen of the bay was close to the unhealthy level of 5 mg/L.
In the histogram, it was clear that some individual points had levels indicative of dead zones if the dissolved oxygen levels remain very low for extended periods. It is now critical to determine whether there are any areas of the bay that have persistently low dissolved oxygen levels.
Previously, you used the line and histogram charts to determine that there may be dead zones in Chesapeake Bay. Next, you'll use the Geostatistical Wizard to create an interpolated surface to locate the dead zones. Spatial interpolation is the process of taking measurements at a set of points and predicting the value everywhere between the measured points. Because it is not practical to collect data at every possible point, individual locations (samples) are measured, and interpolation is used to fill in the gaps between the measured points.
All interpolation methods must define how to measure the distance between any two points, and almost all interpolation methods use straight-line (Euclidean) distance. However, for data collected in an estuary, this definition of distance doesn't work because the straight line between two points may cross over land. Instead, you need an interpolation method that can use water distance. The water distance between any two points is the shortest distance between them that only goes through water.
The following image shows the difference between a Euclidean and a water distance between two points in Chesapeake Bay.
The Euclidean distance between the points crosses over land, while the water distance follows the shortest path around the land. Water distance best represents how dissolved oxygen moves through the water. To interpolate using water distances, you'll perform an interpolation with barriers.
In geostatistics, a barrier is anything that causes an instantaneous change in the value of data. Common forms of barriers include faults, cliffs, and shorelines. By using a barrier in an interpolation, all distances are calculated as the shortest distance not crossing the barrier. In this lesson, the coastline of Chesapeake Bay will be used as a barrier, thus ensuring that distances are calculated as water distances.
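The difference between the two distance definitions can be illustrated on a toy grid. The Python sketch below is a coarse simplification and not how the Geostatistical Wizard computes distances: it assumes the bay is rasterized into cells, with `.` for water and `#` for land, and it counts steps between neighboring water cells using a breadth-first search. The grid layout and function names are made up for the example.

```python
from collections import deque
from math import hypot


def euclidean_distance(a, b):
    """Straight-line distance between two (row, col) cells, ignoring land."""
    return hypot(a[0] - b[0], a[1] - b[1])


def water_distance(grid, start, goal):
    """Shortest path length, in cell steps, that only crosses water cells.

    grid: list of equal-length strings; '.' is water and '#' is land.
    start, goal: (row, col) tuples that are assumed to be water cells.
    Returns None when the goal cannot be reached through water.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (
                0 <= nr < rows
                and 0 <= nc < cols
                and grid[nr][nc] == "."
                and (nr, nc) not in seen
            ):
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None


# A strip of land (column 2) separates two monitoring points in the top row.
toy_bay = [
    "..#..",
    "..#..",
    "..#..",
    ".....",
]
```

For the points `(0, 1)` and `(0, 3)` in `toy_bay`, the straight-line distance is 2 cells, while the path through water has to go around the bottom of the land strip and is 8 steps long.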
Open the Geostatistical Wizard and choose an interpolation method
The Geostatistical Wizard is a guided step-by-step environment for performing spatial interpolation. You'll start by choosing an interpolation method that supports barriers; then you'll provide the data you want to interpolate. You'll configure the options for the interpolation method, use the graphs and diagnostics provided, and assess how well your interpolation model performs.
- If necessary, open your Chesapeake_WaterQuality project.
- On the ribbon, click the Analysis tab. In the Workflows group, click the Geostatistical Wizard button.
The Geostatistical Wizard opens, showing available interpolation methods on the left and dataset options on the right.
The Geostatistical Wizard offers several interpolation tools, such as Inverse Distance Weighting and Kriging, that apply various geostatistical and deterministic interpolation methods. Choosing which interpolation method and tool to use for your data is critical to achieving successful and meaningful results. Because this data was collected in an estuary, you will perform interpolation with barriers to use water distances in the interpolation.
- In the Geostatistical Wizard, for Interpolation with barriers, choose Kernel Interpolation.
There are two available methods for interpolation with barriers: kernel and diffusion. Kernel interpolation is the most common method and is most often used for data collected in estuaries. Kernel Interpolation with barriers is a type of local predictor that allows for the inclusion of feature barriers. The distance between two locations in this method is defined as the shortest sequence of straight lines that connect two locations but do not cross a barrier.
Next, you'll create a result by interpolating the MeasureValue field of the DissolvedO2 layer, using the Bay layer as a barrier.
- In the Geostatistical Wizard, choose the following parameters for Dataset:
- For Source Dataset, select DissolvedO2.
- For Data Field, select MeasureValue.
- For Barrier Features, select Bay.
- Click Next.
You are notified that two or more samples exist at the same location. You are prompted to select a method for handling the coincident points.
- Confirm that Use Mean is chosen.
This will average dissolved oxygen values for all measurements at each location before proceeding with the interpolation. Other options provide the ability to ignore all coincident points (Remove all), only use the smallest of the coincident points (Use Minimum), only use the largest (Use Maximum), or use all coincident points (Include all).
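The effect of the coincident point options can be sketched as a group-and-combine step. The Python example below is an assumption-laden illustration, not the wizard's implementation: it treats two samples as coincident only when their coordinates are exactly equal, and it covers only the Use Mean, Use Minimum, and Use Maximum behaviors. The function name and the method keys are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def resolve_coincident(points, method="mean"):
    """Collapse samples that share a location into one value per location.

    points: iterable of (x, y, value) tuples.
    method: "mean", "min", or "max".
    Returns a sorted list of (x, y, combined_value).
    """
    combiners = {"mean": mean, "min": min, "max": max}
    combine = combiners[method]
    by_location = defaultdict(list)
    for x, y, value in points:
        by_location[(x, y)].append(value)
    return sorted((x, y, combine(vals)) for (x, y), vals in by_location.items())
```

With Use Mean, a station that was sampled many times over the summer contributes a single averaged value, which is why the resulting surface represents average summer conditions.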
- Click Next.
The Geostatistical Wizard completes an initial interpolation using default settings for the kernel interpolation with barriers method.
The current window of the Geostatistical Wizard is divided into three sections: the Preview surface on the left, General Properties at the upper right, and Identify Result at the lower right.
In the next section, you will explore and understand how these three sections work together in building an interpolation model.
Explore and configure kernel interpolation with barrier options
Before configuring options and exploring how the preview surface responds to changes, familiarize yourself with the contents of this pane in the wizard.
Any changes made to the general properties will automatically update the preview surface. You can then click anywhere in the preview surface to see the predicted values. This interaction allows you to quickly make changes to interpolation parameters and explore how these changes affect the prediction surface. Most interpolation methods offered in the Geostatistical Wizard allow you to interact in this way.
- Review the preview image of your interpolated map displayed in the Preview surface pane located on the left of the wizard.
The interpolated values are colored using contour polygons. Red areas have the largest predicted values, blue areas have the smallest, and orange, yellow, and green are in the middle.
- In the upper right General Properties section, review the parameters and options supported by kernel interpolation.
The general properties include advanced options for kernel function, order of polynomial, and ridge.
The Output Surface Type option allows you to switch the preview surface from prediction to standard errors of prediction and vice versa.
The Bandwidth value controls the radius of the searching circle in the preview surface. For this data, the bandwidth is measured in meters, and a default is provided by the software based on a simple optimization.
- For Bandwidth, type 37540.48.
- In the lower right pane, review the Identify Result section.
The pane displays the x- and y-coordinates of the crosshairs in the preview surface as well as the prediction and the standard error of prediction at that location. In this case, the starting coordinate is slightly outside the boundary of Chesapeake Bay, so the Prediction and Standard Error of Prediction options both initially display Not a Number to indicate that a prediction cannot be made at that coordinate.
Standard errors of prediction will not be used in these lessons but are important to understand. Standard errors are statistical measures that quantify the uncertainty in a predicted value. A common rule of thumb is to double the standard error and add it to or subtract it from the predicted value to create a 95 percent confidence interval for the predicted value. For example, if a location has a predicted value of 10 with a standard error of 1, you can be 95 percent confident that the true value is between 8 and 12.
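The rule of thumb described above is simple arithmetic, shown here as a short Python sketch. The function name is illustrative, and the factor of 2 is the approximation from the text rather than an exact statistical constant.

```python
def approximate_95_interval(prediction, standard_error):
    """Rule-of-thumb 95 percent interval: prediction plus or minus two standard errors."""
    half_width = 2 * standard_error
    return (prediction - half_width, prediction + half_width)
```

Calling `approximate_95_interval(10, 1)` reproduces the example in the text, an interval from 8 to 12.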
- In the Identify Result pane, for X, type 370000 and press Enter.
- For Y, type 4220000 and press Enter.
The Prediction and Standard Error of Prediction options for the location update, and the searching circle in the Preview surface moves to the new coordinate location.
- In the Preview surface pane, zoom to the searching circle using your mouse wheel.
Make sure when you zoom in that the x- and y-coordinates are still set to (370000,4220000).
In the preview surface, neighbors are highlighted in colors to represent how much influence they have on the predicted value. Points colored in red have the largest influence, green points have the least, and brown and yellow are in the middle.
To calculate the prediction, a set of neighboring points must be identified near the prediction location. These neighbors are chosen with the searching circle around the prediction location. The radius of this circle is controlled by the Bandwidth option in General Properties, and only points within the bandwidth distance of the prediction location can be used as neighbors in the calculation.
- If necessary, zoom closer to the crosshairs representing the prediction location in the searching circle.
Four points at the upper right of the searching circle are not highlighted in any color. These points are excluded, as their water distance is farther than the bandwidth from the prediction location.
The blue line in the following image represents the water distance from one of the points to the prediction location. Since the length of the blue line exceeds the bandwidth, the point cannot be used as a neighbor.
Because the bandwidth value controls which neighbors can be used in the calculation, it is the most critical choice you'll make when performing kernel interpolation with barriers.
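The role of the bandwidth can be illustrated with a much simpler predictor than the one the wizard uses. The Python sketch below is a plain kernel-weighted average (a zeroth-order local predictor) with an Epanechnikov weight function and straight-line distances; it ignores barriers and is not the Kernel Interpolation With Barriers algorithm. All names are assumptions made for the example.

```python
from math import hypot


def epanechnikov_weight(distance, bandwidth):
    """Weight that falls from 0.75 at distance 0 to 0 at the bandwidth."""
    u = distance / bandwidth
    return 0.75 * (1 - u * u) if u < 1 else 0.0


def kernel_predict(points, x, y, bandwidth):
    """Kernel-weighted average of nearby values.

    points: list of (px, py, value) tuples.
    Returns None when no point lies within the bandwidth of (x, y).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, value in points:
        # Simplification: straight-line distance instead of water distance.
        w = epanechnikov_weight(hypot(px - x, py - y), bandwidth)
        weighted_sum += w * value
        weight_total += w
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total
```

Points farther away than the bandwidth receive a weight of zero, so they play no part in the prediction. When every point is beyond the bandwidth, the function returns `None`, which corresponds to a location where no prediction can be made.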
Discover how bandwidth changes affect interpolation
Next, you'll change the bandwidth value to see how the preview surface changes.
- In the General Properties section, for Bandwidth, type 10000 and press Enter.
The preview surface updates to reflect the new bandwidth.
- In the Preview surface pane menu, click the Reset View button.
The preview pane updates to show the entire Bay layer.
By decreasing the bandwidth, you introduced many holes in the preview surface where no predictions can be made. The predicted value also changes erratically as it moves from point to point. These are both indications that this bandwidth is too small.
- In the General Properties pane, for Bandwidth, type 100000 and press Enter.
The Preview surface section updates to reflect the new bandwidth.
By increasing the bandwidth, you introduced many neighbors that can be used in the calculation. However, more neighbors are not necessarily better. By including so many neighbors from the lower and upper parts of the bay, you have made the surface too smooth. The contour lines now indicate a gradual change that is unrealistic for dissolved oxygen values.
- For Bandwidth, click the Click to Optimize button.
The bandwidth reverts to the original default value (37540.4870045742), or approximately 37.5 kilometers.
- Click Next.
The Geostatistical Wizard updates and displays a Cross validation window. This is the final window of the Geostatistical Wizard and contains numerous graphical and numerical diagnostics that allow you to determine how well the interpolation performs. You'll learn more about cross validation later.
- Click Finish.
The Method Report window appears, showing a summary of interpolation and the settings you are applying to your data.
- Click OK.
The Geostatistical Wizard closes and a layer named Kernel Interpolation is added to the map.
The Kernel Interpolation layer is a custom layer type only used with the ArcGIS Geostatistical Analyst extension. It is optimized for quick visualization and calculation and can be exported to either a raster or feature layer.
- In the Contents pane, turn off the DissolvedO2 layer.
- Zoom to the full extent of Chesapeake Bay.
On the map, red and orange represent the highest average dissolved oxygen levels. Note that most of these high values are in the southern section of the bay near the Atlantic Ocean and at the tips of the inlets. The lowest levels (indicated by blue and green) are in the middle and upper parts of the bay.
- In the Contents pane, expand the legend of the Kernel Interpolation layer.
The ranges of values for the different colors indicate that significant areas of Chesapeake Bay had predicted average dissolved oxygen levels below the unhealthy level of 5.0 mg/L, but no locations appear to have had average levels near the fatal level of 0.2 mg/L in the summer of 2014. This means that the values as observed in the histogram chart that were below 0.2 mg/L must have persisted for short periods of time, and the average dissolved oxygen levels did not stay near the fatal level all summer in 2014.
- Save the project.
You've used the Geostatistical Wizard to interpolate average dissolved oxygen levels during the summer of 2014 in Chesapeake Bay. You saw how the searching neighborhood affected the interpolation result, and you produced a geostatistical layer of the interpolation in your map. Based on the interpolated map, you can infer that some areas of Chesapeake Bay may have been under the healthy level of dissolved oxygen in the summer of 2014, but there was no indication that there were any persistent dead zones where fish and plants couldn't survive.
Assess and compare interpolation results
Previously, you used the Geostatistical Wizard to perform interpolation with barriers. You experimented with some of the parameters, which changed the surface output. If you had made different choices, you would have gotten a different map.
The accuracy of an interpolation model is defined by how closely the predicted value of a location matches the actual value at that location. However, this definition of accuracy immediately presents a seeming contradiction. If you only measured the dissolved oxygen at a particular set of locations, how can you judge how well the interpolation model is predicting at new locations? If you don't know the real values at the new locations, what basis do you have for the accuracy of the prediction? It seems like an impassable contradiction, but there is a common and accepted resolution to this contradiction, known as cross validation.
Cross validation is a "leave-one-out" statistical method. The accuracy of a model is assessed by sequentially removing each measured point from the dataset and using the remaining points to predict a value at the location of the removed point. If your interpolation model is reliable, the remaining points should accurately predict the true (measured) value of the hidden point. You can then compare the prediction to the true measured value and see how close it is. The difference between the true value and the predicted value for a particular point is called the cross validation error. After cross validating every measured point, various numerical and graphical diagnostics can be generated to allow you to assess the overall accuracy of your model. You'll interpret cross validation diagnostics by interpolating average dissolved oxygen levels from the summer of 2014 and comparing the results to those of the summer of 2015.
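The leave-one-out procedure can be sketched in a few lines of Python. To keep the example self-contained, it uses inverse distance weighting with straight-line distances as a stand-in predictor; this is an assumption for illustration and not the kernel interpolation model built in the wizard. The function names are hypothetical, and the error is computed as predicted minus measured.

```python
from math import hypot


def idw_predict(points, x, y, power=2):
    """Inverse distance weighted prediction, used here only as a stand-in model."""
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        d = hypot(px - x, py - y)
        if d == 0:
            return value  # a point exactly at the location decides the value
        w = 1 / d ** power
        numerator += w * value
        denominator += w
    return numerator / denominator


def leave_one_out(points, predict=idw_predict):
    """Cross validate each point by predicting it from all the others.

    points: list of at least two (x, y, measured_value) tuples.
    Returns a list of (measured, predicted, error) tuples,
    where error = predicted - measured.
    """
    results = []
    for i, (x, y, measured) in enumerate(points):
        others = points[:i] + points[i + 1:]  # hide the current point
        predicted = predict(others, x, y)
        results.append((measured, predicted, predicted - measured))
    return results
```

Any predictor with the same `(points, x, y)` signature can be passed as `predict`, so the same loop could be used to compare different interpolation models on the same data.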
Open and explore the Cross validation window
Next, you'll look at the Cross validation window of the layer you created in the previous lesson and interpret its various elements.
- If necessary, open your Chesapeake_WaterQuality project.
- In the Contents pane, right-click the Kernel Interpolation layer and choose Cross Validation.
Cross validation is a property of a geostatistical layer and is not supported for any other layer type.
The Cross validation window for the Kernel Interpolation layer appears.
To learn about all of the various tabs and statistics available in the Cross validation window, see Performing cross validation and validation.
- On the right side of the Cross Validation window, click the Table tab.
The table contains cross validation results for every measured point.
- If necessary, resize the window to display the Error column.
For each point, the Measured value for the point as well as the Predicted value from cross validation is maintained. The Error value is calculated by subtracting the Measured value from the Predicted value. If the Error value is greater than zero, that means the prediction from cross validation was higher than the true value. If the Error value is less than zero, the prediction was lower than the true value.
- Click the Error column title to sort from lowest to highest.
In the resorted Error column, the lowest cross validation error is -2.76. This means that cross validation predicted a dissolved oxygen level 2.76 mg/L less than the actual value at that location.
- Click the Error column title to sort from highest to lowest.
The highest cross validation error is approximately 3.03. This means that cross validation predicted a dissolved oxygen level approximately 3.03 mg/L higher than the measured value for that point.
- Click the first row to select the point with the highest cross validation error.
Selecting the record in the table highlights the associated point in the graph on the left. For this record, the point is on the x-axis of the graph.
This graph displays a scatter plot of the predicted values versus the measured values for each point, together with a blue regression line for the points. Ideally, the predicted values will be close to the measured values, so you want the regression line to follow a 45-degree angle. A gray reference line is shown in the window to help you assess how close the regression line is to this ideal 45-degree angle. For this model, the blue regression line is a bit flatter than the gray reference line, and there is a lot of variability in the points around the lines. However, the difference does not appear to be too severe. If the blue line were close to completely flat or vertical, that would indicate severe problems that should not be accepted.
- In the graphical diagnostics portion of the window, click the Error tab.
The Error tab displays a scatter plot of the measured values versus cross validation errors. This graph is used to determine whether the cross validation errors are independent of the measured values.
Independence between the errors and the measured values is important because you want to make equally accurate predictions for low, medium, and high levels of dissolved oxygen. Independence between the errors and measured values is indicated by a flat regression line. In your graph, the regression line is decreasing, which indicates that the highest measured values were underpredicted, and the lowest measured values were overpredicted.
This is known as smoothing, and it is a common phenomenon. The degree of smoothing in your graph is typical, but you should be aware that this smoothing means that the model could be incorrectly predicting safe levels of dissolved oxygen in locations that actually have unhealthy or dangerous levels. This should not dissuade you from continuing your analysis, but it is something that should be disclosed when reporting your findings.
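Smoothing can be detected numerically by fitting a line to the errors as a function of the measured values and checking the sign of its slope. The following Python sketch uses ordinary least squares; the function name is illustrative, and it assumes that the measured values are not all identical.

```python
def regression_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up values: low measurements are overpredicted, high ones underpredicted.
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.5, 5.5, 7.0]
errors = [p - m for m, p in zip(measured, predicted)]
slope = regression_slope(measured, errors)
```

A slope near zero suggests that the errors are independent of the measured values. A clearly negative slope, as in this made-up example, is the pattern described above as smoothing.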
- In the numerical diagnostics portion of the Cross validation window, click the Summary tab.
The Summary tab displays summary statistics for the information on the Table tab and provides a simple and useful way to assess the cross validation results.
Root-Mean-Square is the most important statistic for judging the accuracy of a model. Its value is never negative, and the closer it is to zero, the closer the cross validation predictions are to the measured values, on average. Your Root-Mean-Square value of approximately 1.12 indicates that, on average, the cross validation predictions were off from the true values by a little more than 1 mg/L of dissolved oxygen. All the other statistics give useful information about the model, but the Root-Mean-Square value is the only one that measures the accuracy of the predictions directly.
The other summary statistic to focus on for these lessons is the Mean value. This is the average (mean) of the cross validation errors, and it is used to assess whether the model has a tendency to predict too high or too low (this is known as bias). If the model is unbiased, this value should be close to zero. If this value is significantly larger than zero, it means that the model is systematically making predictions that are too high. Similarly, if the value is significantly less than zero, it means that the model is systematically making predictions that are too low. Your value of approximately 0.045 indicates that this model has very little bias. On average, it is making predictions that are about 0.045 mg/L too high, but that is a very small amount. You can safely assume that your model is unbiased based on such a small Mean value.
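Both summary statistics follow directly from the per-point errors. The Python sketch below computes them from lists of measured and predicted values, using the same error definition as the Table tab (predicted minus measured). The function name is an assumption for the example.

```python
from math import sqrt


def cross_validation_summary(measured, predicted):
    """Return (mean_error, root_mean_square_error) for paired values."""
    errors = [p - m for m, p in zip(measured, predicted)]
    n = len(errors)
    mean_error = sum(errors) / n
    rms_error = sqrt(sum(e * e for e in errors) / n)
    return mean_error, rms_error
```

Positive and negative errors cancel in the mean, which is why it measures bias rather than accuracy. The errors are squared before averaging in the root-mean-square, so they cannot cancel, and the result reflects the typical size of an error regardless of its sign.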
- Close the Cross validation window.
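The two summary statistics discussed above can be reproduced by hand. The following is a minimal sketch, assuming each cross validation error is the predicted value minus the measured value (which matches the description of a positive Mean indicating predictions that are too high). The function name and the sample values are hypothetical and are not taken from the lesson data.

```python
import math


def cross_validation_summary(measured, predicted):
    """Return (mean_error, rmse), where each error is predicted minus measured."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and the same length")
    errors = [p - m for m, p in zip(measured, predicted)]
    # Mean error: values far from zero suggest bias (too high or too low).
    mean_error = sum(errors) / len(errors)
    # Root-Mean-Square error: typical size of an error, regardless of sign.
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_error, rmse


# Hypothetical values in mg/L, for illustration only.
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.5, 5.5, 7.0]
mean_error, rmse = cross_validation_summary(measured, predicted)
print(f"Mean error: {mean_error:.3f} mg/L, RMSE: {rmse:.3f} mg/L")
```

In this made-up example the positive and negative errors cancel, so the Mean is zero even though the Root-Mean-Square is not. This is why the two statistics answer different questions: one measures bias, the other measures accuracy.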
View the line chart and histogram for summer 2015
Next, you'll select the dissolved oxygen measurements taken during the summer of 2015. You'll explore the data with charts.
- If necessary, open the Filter Samples for Summer 2014 and Summer 2015 task.
On the ribbon, select View, then click Catalog Pane. Expand the Tasks folder.
- Double-click Apply Summer 2015 Filter.
- Click Run.
The measurements that were taken between June 15, 2015, and September 15, 2015, at depths greater than 5 meters are selected.
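The selection rule applied by the task can be written out as a simple predicate. This is only a sketch of the logic described above, not the task's actual query: the `Depth` field name is hypothetical, and treating both end dates as inclusive is an assumption.

```python
from datetime import date

# Time window and depth threshold described in the lesson text.
SUMMER_START = date(2015, 6, 15)
SUMMER_END = date(2015, 9, 15)
MIN_DEPTH_M = 5.0


def in_summer_2015_selection(sample_date, depth_m):
    """True if a sample falls in the window (assumed inclusive) and is deeper than 5 m."""
    return SUMMER_START <= sample_date <= SUMMER_END and depth_m > MIN_DEPTH_M


# Hypothetical records; "Depth" is an assumed field name.
samples = [
    {"SampleDate": date(2015, 6, 14), "Depth": 8.0, "MeasureValue": 6.1},
    {"SampleDate": date(2015, 6, 15), "Depth": 6.0, "MeasureValue": 4.2},
    {"SampleDate": date(2015, 7, 20), "Depth": 5.0, "MeasureValue": 7.3},
    {"SampleDate": date(2015, 9, 15), "Depth": 12.0, "MeasureValue": 1.1},
    {"SampleDate": date(2015, 9, 16), "Depth": 9.0, "MeasureValue": 3.8},
]
selected = [s for s in samples
            if in_summer_2015_selection(s["SampleDate"], s["Depth"])]
print([s["MeasureValue"] for s in selected])
```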
- Click Finish and close the Tasks pane.
- In the Contents pane, click the List By Drawing Order button.
You can switch back to the Contents pane by clicking the Contents tab at the bottom of the Tasks pane.
Charts are stored as a type of layer property that you manage along with the list of layers in the map Contents pane.
- Double-click Distribution of MeasureValue to reopen the histogram.
- If necessary, click the Filter By Selection button to display only selected samples for summer 2015.
The histogram chart automatically opens and is updated to show only the dissolved oxygen measurements from the summer of 2015.
- In the Chart Properties pane, under Statistics, turn on Median and Std. Dev.
This histogram looks similar to the histogram for the summer of 2014. Most dissolved oxygen measurements are between approximately 3 mg/L and 9 mg/L, and there is also a large bar on the left side at levels close to the dangerous level of 0.2 mg/L.
- In the Contents pane, double-click Change in mean MeasureValue over SampleDate to reopen the line chart.
- In the Chart Properties pane, for Time binning options, change the Interval size to 5 Days.
- If necessary, click the Filter By Selection button to display only selected samples for summer 2015.
The line chart also looks similar to summer 2014. The overall average dissolved oxygen level of Chesapeake Bay moves up and down without any clear pattern. This means that you can safely average the values at each location during this time period.
- Close the Chart Properties pane as well as both charts.
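The 5-day time binning used by the line chart can be illustrated with a short sketch. It assumes that bins are fixed-width and start at a chosen date; how ArcGIS Pro actually aligns its bins is not stated in this lesson. The function name and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def mean_by_interval(samples, start, interval_days=5):
    """Group (sample_date, value) pairs into fixed-width bins and average each bin."""
    bins = defaultdict(list)
    for sample_date, value in samples:
        # Bin 0 covers the first interval_days days from start, bin 1 the next, and so on.
        index = (sample_date - start).days // interval_days
        bins[index].append(value)
    return {index: sum(values) / len(values)
            for index, values in sorted(bins.items())}


# Hypothetical measurements in mg/L.
samples = [
    (date(2015, 6, 15), 4.0),
    (date(2015, 6, 17), 6.0),
    (date(2015, 6, 20), 5.0),
    (date(2015, 6, 24), 7.0),
]
bin_means = mean_by_interval(samples, start=date(2015, 6, 15))
print(bin_means)
```

If the bin means drift steadily up or down across the window, averaging all measurements at a location would hide a trend. A series that moves up and down without a clear pattern is what makes the averaging step reasonable.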
Interpolate average dissolved oxygen levels from summer 2015
Earlier, you used the Geostatistical Wizard to interpolate the measurements from the summer of 2014. However, most of the interpolation methods that are available in the Geostatistical Wizard are also available as geoprocessing tools. Next, you'll interpolate the average dissolved oxygen levels from the summer of 2015 using the Kernel Interpolation With Barriers geoprocessing tool.
- On the ribbon, on the Analysis tab, in the Geoprocessing group, click Tools.
The Geoprocessing pane appears.
- In the Geoprocessing pane, search for Kernel.
The search returns several possible geoprocessing tools that implement or contain the search term.
- Click Kernel Interpolation With Barriers.
The Kernel Interpolation With Barriers geoprocessing tool opens in the Geoprocessing pane.
- For Input features, choose DissolvedO2.
This parameter specifies that the DissolvedO2 layer contains the points that you want to interpolate.
- For Z value field, choose MeasureValue.
This parameter specifies that the MeasureValue field contains the dissolved oxygen measurements.
- For Output geostatistical layer, type Summer 2015.
This parameter specifies the resultant geostatistical layer name.
- For Input absolute barrier features, choose Bay.
This parameter specifies that the Bay layer will be used as a barrier in the interpolation. This allows the tool to measure distances between points through the water rather than in straight lines across land.
- Accept the remaining default values.
By leaving the Bandwidth parameter empty, the tool will determine the value of the bandwidth that results in the smallest possible Root-Mean-Square cross validation error. This is also how the Geostatistical Wizard determined the optimal bandwidth in the previous lesson.
The Kernel Interpolation With Barriers tool will take the average of all coincident points by default, so this does not need to be explicitly specified in the geoprocessing tool. The other aggregation methods for coincident points can be found on the Environments tab of the tool.
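Averaging coincident points amounts to grouping measurements by location and taking the mean of each group. The following is a minimal sketch of that idea, assuming locations are coincident only when their coordinates match exactly. The function name and sample values are hypothetical, and this is not the tool's internal implementation.

```python
from collections import defaultdict


def average_coincident_points(points):
    """points: iterable of (x, y, value). Returns {(x, y): mean value at that location}."""
    groups = defaultdict(list)
    for x, y, value in points:
        groups[(x, y)].append(value)
    return {xy: sum(values) / len(values) for xy, values in groups.items()}


# Two hypothetical measurements share a location; the third stands alone.
points = [(1.0, 2.0, 4.0), (1.0, 2.0, 6.0), (3.0, 4.0, 7.5)]
averaged = average_coincident_points(points)
print(averaged)
```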
- Click Run.
The tool executes. A layer named Summer 2015 is added to the Chesapeake Bay Dissolved O2 map. This layer represents the predicted average dissolved oxygen level across Chesapeake Bay for the summer of 2015.
- In the Contents pane, turn off the DissolvedO2 layer.
- In the Contents pane, turn the Summer 2015 layer on and off and compare it to the layer Kernel Interpolation, which contains data from the summer of 2014.
As with summer 2014, the highest average dissolved oxygen levels in summer 2015 are at the tips of the inlets and near the Atlantic Ocean in the southern part of the bay. The lowest dissolved oxygen levels are again in the middle and upper parts of the bay.
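The idea of choosing the bandwidth that gives the smallest Root-Mean-Square cross validation error can be sketched with a much simpler model. The example below swaps in a Gaussian-weighted average with straight-line distances and no barriers, and it searches a short list of candidate bandwidths. It is not the algorithm used by Kernel Interpolation With Barriers, and all names and values are hypothetical.

```python
import math


def kernel_predict(x, y, samples, bandwidth):
    """Gaussian-weighted average of sample values around (x, y)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance_sq = (x - sx) ** 2 + (y - sy) ** 2
        weight = math.exp(-distance_sq / (2.0 * bandwidth ** 2))
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        # Every weight underflowed to zero; fall back to the plain mean.
        return sum(v for _, _, v in samples) / len(samples)
    return weighted_sum / weight_total


def loo_rmse(samples, bandwidth):
    """Leave-one-out cross validation: predict each point from all the others."""
    squared_errors = []
    for i, (x, y, value) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        error = kernel_predict(x, y, others, bandwidth) - value
        squared_errors.append(error ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def best_bandwidth(samples, candidates):
    """Return the candidate bandwidth with the smallest leave-one-out RMSE."""
    return min(candidates, key=lambda b: loo_rmse(samples, b))


# Hypothetical points: (x, y, dissolved oxygen in mg/L).
samples = [(0, 0, 2.0), (1, 0, 3.0), (2, 0, 4.1),
           (3, 0, 4.9), (4, 0, 6.0), (5, 0, 7.2)]
candidates = [0.5, 1.0, 2.0, 4.0]
chosen = best_bandwidth(samples, candidates)
print(f"Chosen bandwidth: {chosen}, RMSE: {loo_rmse(samples, chosen):.3f}")
```

A small bandwidth follows nearby points closely, while a large one averages over a wide area and smooths more. Cross validation gives an objective way to choose between them.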
Compare the Summer 2014 and Summer 2015 layers using cross validation
Next, you'll view the Cross validation window for the layer created in the previous section and compare its numbers and graphs to those of the layer from summer 2014.
- In the Contents pane, double-click the Kernel Interpolation layer.
The Layer Properties window appears.
- On the General tab, for Name, delete Kernel Interpolation and type Summer 2014.
Renaming the layer Summer 2014 will help you differentiate and compare results for 2014 and 2015.
- Click OK.
- In the Contents pane, right-click Summer 2014 and choose Cross Validation.
The Cross validation window for the dissolved oxygen levels of summer 2014 appears.
- In the Contents pane, right-click Summer 2015 and choose Cross Validation.
The Cross validation window for the dissolved oxygen levels of summer 2015 appears.
- Compare the Root-Mean-Square and Mean values for both summer 2014 and summer 2015.
The Root-Mean-Square dropped from 1.117 in summer 2014 to 1.002 in summer 2015. This indicates that the predictions from cross validation were about 10 percent more accurate in summer 2015 than summer 2014. This is likely because summer 2015 had about 10 percent more data (85 points versus 78 points), as indicated by the Count value.
The Mean value changed from 0.045 in summer 2014 to 0.021 in summer 2015. This value should be as close to zero as possible, so summer 2015 had slightly lower bias than summer 2014 (though both summers had low levels of bias).
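The percentages quoted for the Root-Mean-Square and Count values follow from simple relative-change arithmetic. This short sketch uses the four numbers reported above; the helper function name is hypothetical.

```python
def percent_change(old, new):
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


# Values reported in the lesson for summer 2014 and summer 2015.
rmse_change = percent_change(1.117, 1.002)
count_change = percent_change(78, 85)
print(f"Root-Mean-Square change: {rmse_change:.1f}%")
print(f"Count change: {count_change:.1f}%")
```

The Root-Mean-Square falls by roughly 10 percent while the number of points rises by roughly 9 percent, which is consistent with the "about 10 percent" figures in the text.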
- In the graphical diagnostics, click the Predicted tab for both Summer 2014 and Summer 2015.
- Compare the graphs on the Predicted tab. If necessary, position the windows for Summer 2014 and Summer 2015 side by side for comparison.
The blue regression line for Summer 2015 (right) appears to fall closer to the gray reference line than the regression line for Summer 2014 (left).
- In the graphical diagnostics, click the Error tab for both Summer 2014 and Summer 2015.
The graphs on the Error tab for Summer 2014 and Summer 2015 appear to be nearly identical. You may recall that ideally the blue regression line will be flat. A regression line that is decreasing, as in both Summer 2014 and Summer 2015, indicates that the model is smoothing the data and underpredicting large values and overpredicting small values.
- Compare the slope of the Regression function, located at the lower left of each graph.
The Regression function shows that the slope of the blue regression line is slightly more negative for Summer 2014 than it is for Summer 2015 (-0.668 versus -0.581). This indicates that there is slightly more smoothing in Summer 2014 than Summer 2015.
Thus, you can conclude that the interpolation of Summer 2015 is slightly less likely than the interpolation of Summer 2014 to incorrectly predict safe levels of dissolved oxygen in locations where the actual levels are unhealthy or dangerous. However, neither year shows severe levels of smoothing.
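The link between smoothing and a negative slope on the Error tab can be seen with a small worked example. The sketch below fits an ordinary least-squares line to error versus measured value, assuming error is predicted minus measured. The data is hypothetical: the predictions are deliberately pulled halfway toward the mean to imitate a smoothing model.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    if variance == 0.0:
        raise ValueError("xs must contain at least two distinct values")
    return covariance / variance


# Hypothetical measured values (mg/L) with a mean of 5.0.
measured = [1.0, 3.0, 5.0, 7.0, 9.0]
# Smoothed predictions: each value is pulled halfway toward the mean.
predicted = [3.0, 4.0, 5.0, 6.0, 7.0]
errors = [p - m for m, p in zip(measured, predicted)]
error_slope = least_squares_slope(measured, errors)
print(f"Slope of error versus measured: {error_slope}")
```

Low measured values get positive errors (overprediction) and high measured values get negative errors (underprediction), so the fitted line slopes downward. A slope closer to zero, as in Summer 2015, means less smoothing.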
- Close both Cross validation windows.
- Save the project.
You've assessed and compared the accuracy and reliability of an interpolation model using cross validation. By learning about the cross validation table, summary statistics, and graphs, you are now equipped to quantify the accuracy and reliability of an interpolation model. These skills also allow you to make important disclosures about the limitations of your models. It is necessary to disclose that your models appear to be smoothing the data, as this could potentially hide some dangerous dissolved oxygen levels of Chesapeake Bay.
Create a poster using a layout to share results
Previously, you performed geostatistical interpolation using geoprocessing tools. You also learned how to explore your data using charts and how to assess and compare the reliability of interpolation models. Now that the statistical components of the analysis are complete, you need to present this information so that it can be easily understood by colleagues and decision makers. Even the best analysis will have no impact if the right people are not alerted to it or are unable to quickly understand it.
Next, you'll export your geostatistical layers to rasters and apply an attractive and meaningful color ramp using the Symbology pane. Then, you'll add these rasters to a layout to make a poster of your findings. Along with a written analysis and interpretation of results, this poster will allow colleagues and collaborators to learn about the dissolved oxygen levels across Chesapeake Bay during the summer months of 2014 and 2015.
Export geostatistical layers to rasters and apply a custom color scheme
First, you'll export your geostatistical layers to rasters and apply custom symbology. Geostatistical layers are useful for quick visualization and easy access to cross validation results, but they are not recommended for display in a final map. Instead, convert them to raster format, which supports visualization options that are not available for geostatistical layers.
- If necessary, open your Chesapeake_WaterQuality project.
- In the Contents pane, right-click Summer 2014, hover over Export Layer, and click To Rasters.
The GA Layer To Rasters geoprocessing tool opens in the Geoprocessing pane.
- In the GA Layer To Rasters tool, confirm that Input geostatistical layer is Summer 2014.
- For Output raster, type DissolvedO2_2014.
- For Output cell size, type 500.
- In the GA Layer To Rasters pane, click the Environments tab.
The Environments tab displays the geoprocessing environments parameters supported by the GA Layer To Rasters geoprocessing tool. You'll set the extent for your output raster to match the extent of the Bay layer.
- For Extent, choose Bay.
This specifies that you want the output raster to have the same extent as the Bay layer. The Extent control updates to As Specified Below and displays an updated minimum and maximum extent as defined by the Bay layer.
- Click Run.
The DissolvedO2_2014 raster layer is added to the map.
- In the Contents pane, right-click Summer 2015, hover over Export Layer, and click To Rasters.
- In the GA Layer To Rasters tool, set the following parameters:
- For Input geostatistical layer, choose Summer 2015.
- For Output raster, type DissolvedO2_2015.
- For Output cell size, type 500.
- On the Environments tab, set Extent to Bay.
- Click Run.
Now that both the Summer 2014 and Summer 2015 geostatistical layers have been converted to rasters, you can remove them from your map.
- In the Contents pane, click List By Drawing Order.
- Right-click the Summer 2014 layer and click Remove.
- Remove the Summer 2015 layer.
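The output cell size and the extent together determine how many cells the exported raster contains. The sketch below shows the arithmetic under two assumptions: the cell size of 500 is expressed in the linear units of the output coordinate system, and partial cells at the edge are rounded up. The extent coordinates are hypothetical and are not the real extent of the Bay layer.

```python
import math


def raster_shape(xmin, ymin, xmax, ymax, cell_size):
    """Return (rows, columns) needed to cover an extent with square cells."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    # Round up so that the raster fully covers the extent (an assumption).
    columns = math.ceil((xmax - xmin) / cell_size)
    rows = math.ceil((ymax - ymin) / cell_size)
    return rows, columns


# Hypothetical extent in map units; not the actual Bay layer extent.
rows, columns = raster_shape(0.0, 0.0, 120_250.0, 300_000.0, 500.0)
print(f"{rows} rows x {columns} columns = {rows * columns} cells")
```

Halving the cell size roughly quadruples the number of cells, so a smaller cell size gives a finer raster at the cost of a larger file and longer processing.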
Symbolize the rasters
Next, you'll symbolize the new rasters.
- In the Contents pane, turn off the DissolvedO2_2015 layer. Right-click DissolvedO2_2014 and choose Symbology.
The Symbology pane appears, displaying the default raster symbology for the DissolvedO2_2014 layer. To better symbolize dissolved O2 values in the Chesapeake Bay, you'll modify the default symbology using a custom stretch renderer.
- In the Symbology pane, click the Menu button and choose Import from layer file.
You'll import and apply symbology from an existing layer file.
- Browse to the location of the extracted Chesapeake_WaterQuality.zip file that you downloaded. (You may have saved it to your Documents folder.) Double-click Symbology.lyrx.
This Symbology.lyrx layer file contains existing symbology that is appropriate for use on the Dissolved O2 rasters. When you apply it, the DissolvedO2_2014 raster layer displays using a Stretch renderer. Each cell is symbolized based on its value as follows:
- Cells with predicted values of dissolved oxygen levels below 5.0 mg/L are displayed in shades of red and orange.
- Cells with predicted values of dissolved oxygen levels above 5.0 mg/L are displayed in shades of yellow and green.
Because 5.0 mg/L is the accepted threshold between healthy and unhealthy levels, this color scheme allows you to quickly see the areas of Chesapeake Bay that are predicted to have healthy and unhealthy levels of dissolved oxygen. Now you'll apply this symbology to the DissolvedO2_2015 layer.
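The thresholds behind this color scheme can be expressed as a small classification function. This is a sketch only: it uses the 5.0 mg/L and 0.2 mg/L levels cited in this lesson, it classifies a single value rather than a persistent level, and the handling of a value of exactly 5.0 mg/L is an assumption because the lesson does not specify it. The function names and category labels are hypothetical.

```python
UNHEALTHY_BELOW = 5.0   # mg/L, threshold between healthy and unhealthy levels
DEAD_ZONE_BELOW = 0.2   # mg/L, level cited in the lesson for dead zones


def health_category(value):
    """Classify a dissolved oxygen value in mg/L."""
    if value < DEAD_ZONE_BELOW:
        return "dead zone"
    if value < UNHEALTHY_BELOW:
        return "unhealthy"
    # Assumption: exactly 5.0 mg/L is grouped with the healthy values.
    return "healthy"


def color_family(value):
    """Color family used for a cell, following the scheme described above."""
    return "red-orange" if value < UNHEALTHY_BELOW else "yellow-green"


for value in [0.1, 3.2, 5.0, 8.4]:
    print(value, health_category(value), color_family(value))
```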
- In the Contents pane, turn on the DissolvedO2_2015 layer. Right-click the layer and choose Symbology.
- In the Symbology pane, click the Menu button at the upper right and choose Import from layer file.
- Browse to the location of the extracted Chesapeake_WaterQuality.zip file and double-click Symbology.lyrx.
The same stretched symbology is applied to the DissolvedO2_2015 layer.
- Close the Symbology pane.
Change the basemap and add the Summer 2015 results to a map
Next, you'll prepare your maps to be added to a layout poster. First, you'll change the basemap to one that looks better on a poster. Then, you'll move your results from 2015 into the new map.
- On the ribbon, on the Map tab, in the Layer group, click Basemap and choose Light Gray Canvas.
The current Topographic basemap layer is replaced by the World Light Gray Reference and World Light Gray Canvas Base basemap layers.
Your layout will display two maps, one for DissolvedO2_2014 and one for DissolvedO2_2015. To create the second map, you'll copy the first one.
- In the Catalog pane, expand the Maps folder. Right-click Chesapeake Bay Dissolved O2 and choose Copy.
- Right-click the Maps folder and click Paste.
The Chesapeake Bay Dissolved O2 map is duplicated and named Chesapeake Bay Dissolved O21.
- Right-click Chesapeake Bay Dissolved O2, click Rename, and type 2014 Chesapeake Bay Dissolved O2. Press Enter.
- Change the name of Chesapeake Bay Dissolved O21 to 2015 Chesapeake Bay Dissolved O2.
- Right-click 2015 Chesapeake Bay Dissolved O2 and click Open.
A new map tab for the 2015 Chesapeake Bay Dissolved O2 map appears below the ribbon. You can use the tabs to switch between both maps.
- Click the map tab for 2014 Chesapeake Bay Dissolved O2.
- In the Contents pane, remove the DissolvedO2_2015 layer.
- Click the map tab for 2015 Chesapeake Bay Dissolved O2.
- In the Contents pane, remove the DissolvedO2_2014 layer.
Insert a layout and add the maps
Next, you'll create a layout containing the 2014 Chesapeake Bay Dissolved O2 and 2015 Chesapeake Bay Dissolved O2 maps. One of the goals of layout design is to leave as little white space as possible. You'll arrange your map with this in mind.
- On the ribbon, click the Insert tab. In the Project group, click New Layout. Under ANSI - Landscape, choose Letter 8.5" x 11".
A new layout is created and displayed in the map viewer. Your layout is currently empty. You'll add the map frames that contain your 2014 and 2015 Dissolved O2 maps.
- On the ribbon, on the Insert tab, in the Map Frames group, click the Map Frame drop-down arrow and choose the 2014 Chesapeake Bay Dissolved O2 map.
- Click and drag a rectangle on the map layout to place the map.
- In the Contents pane, double-click Map Frame.
The Format Map Frame pane appears.
- In the Format Map Frame pane, under Options, name the map frame 2014 Chesapeake Bay Dissolved O2.
- On the ribbon, click the Format tab.
- In the Size & Position group, change Width to 4.5 in and Height to 6 in (or 114 mm and 152 mm in metric units).
The map frame size adjusts. Next, you will align the frame to the page edges.
- In the Size & Position group, change the X value to 0.5 in and the Y value to 1 in. Press Enter.
This moves the X anchor position of the map frame from the left edge of the page to a position 0.5 inches from that edge.
The map frame is now sized and aligned, with space to add the title, subtitle, and the 2015 map.
- On the ribbon, on the Insert tab, click the Map Frame drop-down arrow and choose the 2015 Chesapeake Bay Dissolved O2 map. Click and drag a rectangle on the map layout to place the map.
- In the Format Map Frame pane, in the Options section, for Name, type 2015 Chesapeake Bay Dissolved O2.
- On the ribbon, on the Format tab, in the Size & Position group, change Width to 4.5 in and Height to 6 in (or 114 mm and 152 mm in metric units).
- In the Size & Position group, change the X value to 6 in and the Y value to 1 in. Press Enter.
- Confirm that your layout now displays both the 2014 Chesapeake Bay Dissolved O2 and 2015 Chesapeake Bay Dissolved O2 maps, positioned 0.5 inches from the left and right edges of the page.
- In the Contents pane, expand the 2014 Chesapeake Bay Dissolved O2 map frame. Right-click DissolvedO2_2014 and click Zoom To Layer.
The map on the left updates to show Chesapeake Bay centered on the map. You'll repeat the process for the map on the right.
- In the Contents pane, expand the 2015 Chesapeake Bay Dissolved O2 map frame. Right-click DissolvedO2_2015 and click Zoom To Layer.
The two map frames are now in alignment and are zoomed in to the area of Chesapeake Bay. By placing the map frames side by side, it is easier to compare and contrast the dissolved oxygen levels from summer 2014 and summer 2015.
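The size and position values used for the two map frames can be checked with a little arithmetic. The sketch below assumes that X is measured from the left edge of the page to the left edge of each frame and that the page is 11 inches wide (Letter, landscape). The dictionary layout and function names are hypothetical.

```python
MM_PER_INCH = 25.4
PAGE_WIDTH_IN = 11.0  # Letter 8.5" x 11" in landscape orientation

# Frame values entered in the Size & Position group, in inches.
frames = {
    "2014": {"x": 0.5, "y": 1.0, "width": 4.5, "height": 6.0},
    "2015": {"x": 6.0, "y": 1.0, "width": 4.5, "height": 6.0},
}


def right_margin(frame, page_width):
    """Distance from the frame's right edge to the right edge of the page."""
    return page_width - (frame["x"] + frame["width"])


def gap_between(left, right):
    """Horizontal space between two side-by-side frames."""
    return right["x"] - (left["x"] + left["width"])


left_margin_in = frames["2014"]["x"]
right_margin_in = right_margin(frames["2015"], PAGE_WIDTH_IN)
gap_in = gap_between(frames["2014"], frames["2015"])
width_mm = round(frames["2014"]["width"] * MM_PER_INCH)
height_mm = round(frames["2014"]["height"] * MM_PER_INCH)

print(f"Margins: {left_margin_in} in left, {right_margin_in} in right")
print(f"Gap between frames: {gap_in} in")
print(f"Frame size: {width_mm} mm x {height_mm} mm")
```

Under these assumptions the frames sit symmetrically on the page, with a 1-inch gap between them that is later used for the legend and north arrow.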
Add text to the layout
Without a title and subtitles, your map doesn't make much sense to viewers. This text will be placed above the map and provide important information for understanding what the map shows. You'll add both the title and subtitles using text rectangles.
- On the ribbon, on the Insert tab, in the Graphics and Text group, click the Rectangle text button.
- Draw a long rectangle that spans the upper part of the map.
- In the rectangle, delete the placeholder text and type Dissolved Oxygen Levels in Chesapeake Bay.
- Click outside the rectangle to save the text.
The text is small and uses a generic font. You'll make it bigger and eye-catching.
- If necessary, in the Contents pane, double-click the text box to open the Format Text pane.
- In the Format Text pane, expand the General section and change Name to Title text.
- Click the Text Symbol tab. In General, expand Appearance and change the following parameters:
- For Font name, choose Century Gothic.
- For Font style, choose Bold.
- For Size, type 30 pt.
- For Color, choose Gray 80%.
- For Outline color, choose No color.
- Expand Position. For Horizontal alignment, click the Center button.
- Click the Formatting button.
- Expand the Formatting section, and change Letter spacing to 5%.
- Click Apply.
- Move the text rectangle to center it at the top of the layout, and resize it to fit the entire title.
You can use the handles at the corners and on the sides of the text box to resize the element.
Adding subtitles for both maps will use a similar process.
- Select the title text box and press Ctrl+C to copy.
- Press Ctrl+V.
A duplicate of the formatted text box appears.
- In the Format Text pane, under Options, expand General. For Name, type 2014 text.
- In the Format Text pane, under Text, type 2014.
- Click the Text Symbol tab. Under Appearance, change Size to 24 pt and Color to Gray 60%.
- Click Apply.
- Resize and move the duplicate text box above the 2014 Chesapeake Bay Dissolved O2 map frame.
- With the 2014 text box selected, press Ctrl+C and then Ctrl+V to duplicate the formatted text box.
- Resize and move the newly duplicated text box to center it above the 2015 Chesapeake Bay Dissolved O2 map.
- In the Format Text pane, expand General. For Name, type 2015 text.
- In the Format Text pane, under Text, type 2015.
To give a visual indication of direction, you'll add a north arrow to your layout.
- On the ribbon, on the Insert tab, in the Map Surrounds group, click North Arrow.
This inserts a default north arrow that can be modified as needed. There is also a gallery of north arrow styles available. For this project, you don't want to draw attention away from the maps, so you'll keep the default.
- On the layout, draw a rectangle to position the north arrow between the map frames.
To modify the appearance of the north arrow, select the north arrow in the map layout, and use the controls on the North Arrow contextual tab.
Add a legend to the layout
One of the most important elements of the layout is the legend. The legend explains the symbology used in the Dissolved O2 layers for summer 2014 and summer 2015.
- In the Contents pane, click the 2014 Chesapeake Bay Dissolved O2 map frame to select it.
- On the ribbon, on the Insert tab, in the Map Surrounds group, click Legend.
- Draw a rectangle between the map frames in the layout.
The legend displays for the 2014 Chesapeake Bay Dissolved O2 map frame. Because you used the same stretched symbology for both maps, the legend is the same for both. The legend's title and subtitle aren't needed because the map title already indicates what the map frames show.
- In the Contents pane, expand Legend. Uncheck all layers except DissolvedO2_2014.
If the symbols for DissolvedO2 and Bay were displayed, they will no longer appear in the legend. Any layers that you have turned on in the 2014 Chesapeake Bay Dissolved O2 map will appear in the legend by default.
- Under Legend, click DissolvedO2_2014.
- In the Format Legend Item pane, under Show, uncheck everything except Label (or layer name).
All text in the legend is removed except the labels showing the range of dissolved oxygen levels. Finally, you'll change the appearance of the label text to match the rest of the text in your layout.
- Click the arrow next to Legend Item and choose Labels.
Options to modify the label text become available.
- Expand Appearance and update the following text parameters:
- For Font name, choose Century Gothic.
- For Font style, choose Bold.
- For Size, type 18 pt.
- For Color, select Gray 70%.
- For Outline color, select No color.
- For Letter spacing, type 5%.
- Click Apply.
Next, you will add a title to the legend so viewers can understand what the values mean.
- In the Contents pane, click the Legend item.
The Format Legend pane appears.
- In the Format Legend pane, next to the Labels tab, click the drop-down arrow and choose Legend. Under Legend, check the box for Show and for Title, type Dissolved Oxygen (mg/L).
- Next to the Legend tab, click the drop-down arrow and select Title.
- Change the parameters as follows:
- For Font name, choose Century Gothic.
- For Font style, choose Bold.
- For Size, type 8.
- For Horizontal alignment, click the Center button.
- Click Apply.
The legend now has a title and unit label, but the color ramp for the legend is short. Adjusting the legend patch sizes will lengthen and widen the legend symbol display, making it easier to visualize the range of values.
- In the Contents pane, under Legend, select the DissolvedO2_2014 legend item.
The Format Legend Item pane appears.
- On the Legend Item tab, for Sizing, change the following patch parameters:
- For Patch width, type 24 pt.
- For Patch height, type 24 pt.
The patch size updates on the legend.
- If necessary, resize and position the legend to a location between the map frames and above the north arrow.
Add a scale bar
The final element that a good reference map needs is a scale bar. You'll add one to both map frames for clarity.
- On the ribbon, on the Insert tab, in the Map Surrounds group, click the Scale Bar drop-down arrow and select Scale Line 1.
Clicking the top half of the button inserts a default scale bar. As with the north arrow, there is a gallery of scale bar styles available, but the default is suitable for your purposes.
- Click and drag a rectangle on the map layout to place the scale bar.
- Move and position the scale bar below the 2014 map frame. Resize the scale bar so the upper range stops at 120 miles.
- In the Contents pane, right-click Scale Bar and choose Copy. Right-click Layout and choose Paste.
- On the layout, drag the new scale bar under the 2015 map frame.
- Save the project.
The layout now showcases the dissolved O2 maps you created for the summers of 2014 and 2015. It includes information relevant to people who need to understand your map, such as the title, subtitles, and legend. It also includes elements that help orient the audience, including the north arrow and scale bars. There is very little white space left in the layout.
Print the map
Next, you'll print your map to show others.
- Optionally, adjust any of the map elements until everything is the way you want it.
- On the ribbon, click the Share tab. In the Output group, click the Print Layout button.
The Print Layout pane appears.
Alternatively, in the Output group, you can click the Export Layout button to save a copy of the map to your computer.
- Confirm that your printer and print settings are as expected (depending on your printer, you may need to change the paper size to 8.5 by 11 inches), and click Print.
In this lesson, you used the ArcGIS Geostatistical Analyst extension to analyze the average dissolved oxygen levels in Chesapeake Bay during the summers of 2014 and 2015. Using interpolation, you created geostatistical layers that predicted the average dissolved oxygen levels across the entire bay. Then, you cross validated the results to quantify the accuracy of the interpolation. Finally, you converted your geostatistical layers to raster layers and added them to a layout to display and share your results.
Based on your results for the Chesapeake Bay, the average levels were never near the dangerous level of 0.2 mg/L, but many individual measurements were near or below this critically low level. Although mitigation efforts must be taken to bring dissolved oxygen levels in Chesapeake Bay above the healthy level of 5.0 mg/L, your analysis provides a scientifically and statistically defensible conclusion that even in the worst parts of the summer months, the dissolved oxygen levels were sufficient to sustain a thriving marine ecosystem.
Dead zones are a problem around the world. Similar processes of interpolating levels of dissolved oxygen could be used in places such as the Gulf of Mexico, the English Channel, and the East China Sea. The process of exploring data with charts, interpolating the data with the Geostatistical Wizard, and assessing the accuracy of your results with cross validation is common to almost all interpolation workflows. You are encouraged to download data from other sources and for other years from the Chesapeake Bay Program Water Quality Database (1984 – present) and repeat the lesson steps using this new or updated data.
You can find more lessons in the Learn ArcGIS Lesson Gallery.