Fill gaps in your data with areal interpolation

Interpolate the percentage of seniors across Poland

If you know the values for most of the features in your dataset, you can use them to predict continuous values across the entire area. You'll do this to map the spatial distribution of seniors in Poland.

  1. Download the FillGaps project package.
  2. Locate the downloaded file on your computer. Double-click FillGaps.ppkx to open it.
    Note:

    If you don't have access to ArcGIS Pro or an ArcGIS organizational account, see options for software access.

    The project opens in ArcGIS Pro.

    Map of Poland with powiaty colored by percentage of seniors. 10 polygons are empty

    This map depicts powiaty, which are administrative units similar to counties, in Poland. The polygons are colored to represent the percentage of the population aged 65 years or older. Unfortunately, the data is incomplete. Ten powiaty contain no value for the percentage of seniors.

    This spatial data can be found on ArcGIS Living Atlas of the World. The values for the percentage of seniors were provided by Statistics Poland. (The missing values were artificially removed for the purpose of this tutorial.)

    Demographic data is often difficult to model with geostatistics because urban areas show dramatically different patterns than rural ones. In this case, the spatial variation in this data is relatively smooth, without dramatically distinct breaks. This means that the data might be appropriate for geostatistics.

  3. On the ribbon, on the Analysis tab, in the Workflows group, click Geostatistical Wizard.

    The Geostatistical Wizard on the Analysis tab of the ribbon

    The Geostatistical Wizard window appears.

  4. In the Geostatistical Wizard window, under Geostatistical methods, choose Areal Interpolation.

    Most interpolation methods require point data as the input, but areal interpolation uses polygons. In this tutorial, you are using polygons that are nearly complete and fit together like puzzle pieces. You can also use polygons that are widely spaced or overlapping. For example, you may have data representing observations of birds, which is stored in polygons for the ground covered by each observer.

    Note:

    You can read more about this geostatistical method at What is areal interpolation?

    Areal interpolation will process values differently if you declare them as representing averages, rates, or events. You are mapping the percentage of a population over a certain age, which is a rate.

  5. Under Input Dataset 1, for Type, choose Rate. For Source Dataset, choose Powiaty_Seniors.
  6. For Count Field, choose 2017 Senior Population, and for Population Field, choose 2017 Total Population.

    Areal Interpolation selected in the Geostatistical Wizard with Type set to Rate

  7. Click Next.

    The next window shows a covariance chart. The blue crosses represent your data without any modeling. The blue line represents the model that will be used to predict the percentage of seniors over the entire area. You want to edit parameters of the model until the model line follows the path of the crosses and 90 percent of the crosses fall within the red confidence intervals. Currently, that is not the case.

    Covariance graph

    Not only does the line fail to follow the crosses closely, but two crosses also lie far away from the path. In many situations you won't be able to achieve an ideal model, but you can try to get as close as possible. A good place to start is by making the lag size smaller. Doing so will reduce the area that is searched when sampling to generate the blue crosses.

  8. Under General Properties, for Lag Size, type 12000.

    The model changes. However, the crosses are now even farther from the confidence intervals.

    Covariance graph

    Next, you'll try to improve the model by changing its shape.

  9. For Model, choose Stable.
    Note:

    Stable and K-Bessel models often give the best result, but also take more time to process.

    Covariance graph with Model set to Stable

    Achieving a perfect model can be difficult or even impossible, especially if you are working with demographic data instead of a natural phenomenon. In this scenario, even though only one of the crosses falls within the confidence intervals, the model line follows the crosses relatively closely. This model isn't perfect, but it is a suitable compromise.

  10. Click Next.

    The next window contains a preview map.

    The Searching Neighborhood page of the Geostatistical Wizard with neighboring polygons highlighted

  11. Click different parts of this preview map.

    The map highlights neighboring polygons that will be used to determine the predicted value for the location you clicked. Polygons colored red will be weighted more heavily in the analysis than those colored green.

  12. Click Next.

    The Cross validation page opens. Cross-validation assesses the accuracy of a prediction surface. It does so by removing a single polygon from the dataset and using the remaining data to predict a value within the removed polygon.

    Cross-validation results with Predicted scatterplot and Summary values

    The Predicted scatterplot for this model does not look good. Ideally, the red values should follow the trend of the blue and gray lines. Your chart looks more like a random cloud of points. On the other hand, the values listed on the Summary tab look good. These numbers should all be close to zero, except for Root-Mean-Square Standardized, which should be close to 1. The Root-Mean-Square value of 0.02 means that the predicted proportion of seniors will typically be off by about 0.02, or 2 percentage points, from the real value. This is a reasonable margin of error. These values are more indicative of the quality of your model than the scatterplot.

  13. Click Finish. In the Method Report window, click OK.

    An interpolated layer is added to the map.

  14. In the Contents pane, turn off Powiaty_Seniors and turn on Powiaty_Seniors outlines.

    The areas with heavy black outlines are the ones with missing data.

    Orange and blue interpolated surface beneath powiaty outlines
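
The leave-one-out idea behind the cross-validation page in step 12 can be sketched in plain Python. This is only an illustration under stated assumptions, not the Geostatistical Wizard's implementation: it swaps kriging for a simple inverse-distance-weighted predictor, treats each polygon as a single centroid point, and uses made-up coordinates and proportions. The function names are hypothetical. Root-Mean-Square Standardized is left out because it needs the prediction standard errors that kriging produces and inverse-distance weighting does not.

```python
import math


def idw_predict(target, known, power=2.0):
    """Predict a value at target (x, y) from known (x, y, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # exact hit: reuse the known value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def leave_one_out(samples, power=2.0):
    """Hide each sample in turn, predict it from the rest, and summarize the errors."""
    errors = []
    for i, (x, y, value) in enumerate(samples):
        remaining = samples[:i] + samples[i + 1:]
        errors.append(idw_predict((x, y), remaining, power) - value)
    mean_error = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_error, rmse


# Made-up polygon centroids (x, y in km) with senior proportions (0 to 1).
samples = [
    (0.0, 0.0, 0.16),
    (10.0, 0.0, 0.17),
    (0.0, 10.0, 0.15),
    (10.0, 10.0, 0.18),
    (5.0, 5.0, 0.17),
]

mean_error, rmse = leave_one_out(samples)
print(f"Mean error: {mean_error:.4f}")
print(f"Root-mean-square error: {rmse:.4f}")
```

Because the inputs are proportions, a root-mean-square error of 0.02 would correspond to roughly 2 percentage points, which is how the Summary value in step 12 is read.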

Create polygons from the interpolation

The interpolation you created is continuous and ignores the polygon outlines. Geostatistics has smoothed the demographic data to create a gradual surface. While it may not match known data precisely, smooth interpolations like this are often better at predicting unknown values.

Next, you'll convert the continuous interpolation surface into polygons.

  1. On the ribbon, on the Map tab, in the Navigate group, click Bookmarks and choose Kluczborski.

    The Bookmark gallery opened from the Map tab of the ribbon

    The map navigates to Kluczborski powiat.

    Kluczborski powiat covers four colors on the underlying geostatistical layer

    The Areal Interpolation layer is a geostatistical layer, which means that every location on the map has a slightly different value. Some of the polygons that you need to fill, such as this one, have a wide range of predicted values. You'll convert this predicted surface into a polygon layer with a single predicted value for each powiat.

  2. On the ribbon, on the Analysis tab, in the Geoprocessing group, click Tools.

    Select Geoprocessing Tools

    The Geoprocessing pane appears.

  3. In the Geoprocessing pane, in the search bar, type Areal Interpolation Layer and in the list of results, choose the Areal Interpolation Layer To Polygons tool.
  4. In the Areal Interpolation Layer To Polygons tool pane, enter the following:
    • For Input areal interpolation geostatistical layer, choose Areal Interpolation.
    • For Input polygon features, choose Powiaty_Seniors.
    • For Output polygon feature class, change the output name to Interpolated_Polygons. Make sure to include the underscore.

    Areal Interpolation Layer To Polygons tool with parameters filled

  5. Click Run.

    A polygon layer is added to the map.

  6. On the ribbon, on the Map tab, in the Navigate group, click the Full Extent button to return to the default view of the map.

    The Full Extent button on the Map tab of the ribbon

  7. In the Contents pane, drag the Interpolated_Polygons layer below the Powiaty_Seniors outlines layer.

    The Interpolated_Polygons layer dragged to below the Powiaty_Seniors outlines layer

  8. Turn off Areal Interpolation.

    You now have a value for percentage of seniors in every polygon.

    The Contents pane and map with Powiaty_Seniors outlines and Interpolated_Polygons as the only visible layers

    Although you have the real values for most of those polygons, you only want to use the predicted values for 10 of them. You will select the 10 polygons with missing values and use the Calculate Field tool to add values for those polygons alone.

  9. Right-click Interpolated_Polygons and choose Attribute Table.

    The attribute table appears. It contains all of the data from the Powiaty_Seniors layer and it also has three new fields: Included, Predicted, and Standard Error.

    Included, Predicted, and Standard Error columns in the attribute table

  10. Double-click the header for the Percent Seniors column to sort it.

    The header of the Percent Seniors column in the attribute table

    Now, all the empty records are at the top of the table. Next, you'll replace these <Null> values with the data from the Predicted field.

  11. Select all the rows with missing senior data.
    Note:

    To select multiple rows, click the row number for the first record, then either press the Shift key and click the row number for the last record, or drag the cursor across the row numbers you want to select. You can also use the Select By Attributes tool.

    Rows where Percent Seniors is Null selected in the attribute table

  12. At the top of the attribute table, click the Calculate button.

    The Calculate button at the top of the attribute table. 10 rows are selected

    The Calculate Field tool opens in a pop-up window. The field calculation will only be applied to the selected rows.

  13. For Field Name, choose Percent Seniors.

    Calculate Field in the Geoprocessing pane, with Field Name set to Percent Seniors

  14. In the Fields list, scroll down and double-click Predicted.

    The PercentSeniors = box populates with !Predicted!. This expression will copy the values from the Predicted field into the Percent Seniors field. However, the Predicted values are stored as decimal proportions, while Percent Seniors holds percentages. To convert them, you'll multiply the values by 100.

  15. After !Predicted!, type * 100.

    PercentSeniors = box set to !Predicted! * 100

  16. Click Apply.
  17. In the attribute table, click the Show Selected Records button.

    Show Selected Records button

    The <Null> values in the Percent Seniors column have been replaced. The unselected rows remain unchanged.

    The attribute table showing new Percent Seniors values in the ten selected rows

  18. At the top of the attribute table, click Clear to clear the selection.

    The Clear button at the top of the attribute table

  19. Close the attribute table.
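
The select-then-calculate pattern in steps 11 through 16 amounts to a conditional update: only rows with a missing Percent Seniors value receive the Predicted value multiplied by 100. A minimal sketch in plain Python, assuming the table is a list of dictionaries; the row contents and the function name are invented for illustration, and this does not call the Calculate Field tool itself.

```python
def fill_missing_percent(rows, target="PercentSeniors", source="Predicted"):
    """Fill rows whose target field is None with source * 100.

    Rows that already have a value are left unchanged.
    Returns the number of rows that were filled.
    """
    filled = 0
    for row in rows:
        if row[target] is None:
            # Predicted is a proportion (0 to 1); the target field stores percent.
            row[target] = row[source] * 100
            filled += 1
    return filled


# Hypothetical records; None stands in for <Null> in the attribute table.
rows = [
    {"Name": "Powiat A", "PercentSeniors": 17.2, "Predicted": 0.169},
    {"Name": "Powiat B", "PercentSeniors": None, "Predicted": 0.181},
    {"Name": "Powiat C", "PercentSeniors": 15.4, "Predicted": 0.158},
]

filled = fill_missing_percent(rows)
print(f"Rows filled: {filled}")
for row in rows:
    print(row["Name"], round(row["PercentSeniors"], 1))
```

Checking for the missing value inside the loop plays the same role as the selection in the attribute table: known values are never overwritten by predictions.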

Symbolize the map

Finally, you'll symbolize the new layer to match the original one. Instead of setting the symbology parameters one by one, you'll import them from the Powiaty_Seniors layer.

  1. In the Contents pane, turn off Powiaty_Seniors outlines and click Interpolated_Polygons to select it.
  2. On the ribbon, on the Feature Layer tab, in the Drawing group, click Import.

    The Import button on the Feature Layer tab of the ribbon

    The Import Symbology window appears.

  3. In the Import Symbology window, for Symbology Layer, choose Powiaty_Seniors.

    The Import Symbology tool with Symbology Layer set to Powiaty_Seniors

  4. Click Apply then click OK.

    The symbology of Interpolated_Polygons now matches that of Powiaty_Seniors, your initial layer, but there are no longer any holes in the data.

    Map of Poland with powiaty colored by percentage of seniors, without any gaps

  5. On the Quick Access Toolbar, click the Save button.

    The Save button on the Quick Access Toolbar

The process of substituting values to replace missing data is called imputation. Often, values are imputed using the average of the remaining dataset. When your data is spatial, you have better options available to you, because you can assume that things that are closer together are more similar than things that are farther apart. In this tutorial, you used areal interpolation to create a continuous surface across Poland to model the percentage of the population that is over 65 years of age. You then sampled from that surface to predict values for the polygons that were missing data.
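
To make the contrast concrete, the sketch below compares imputing with the average of the whole dataset against imputing with the average of nearby features only. It is a deliberately simplified stand-in, not areal interpolation: the neighbor list is supplied by hand, and the region names, values, and function names are all hypothetical.

```python
def impute_global_mean(values):
    """Replace None with the mean of all known values."""
    known = [v for v in values.values() if v is not None]
    mean = sum(known) / len(known)
    return {key: (mean if v is None else v) for key, v in values.items()}


def impute_neighbor_mean(values, neighbors):
    """Replace None with the mean of the known values of listed neighbors."""
    result = dict(values)
    for key, value in values.items():
        if value is None:
            nearby = [values[n] for n in neighbors.get(key, [])
                      if values[n] is not None]
            if nearby:  # leave the gap if no neighbor has a value
                result[key] = sum(nearby) / len(nearby)
    return result


# Hypothetical percentages: lower in the west, higher in the east.
values = {"west1": 14.0, "west2": 15.0, "gap": None, "east1": 20.0, "east2": 21.0}
# Assumed adjacency: the feature with missing data borders only western features.
neighbors = {"gap": ["west1", "west2"]}

by_global_mean = impute_global_mean(values)
by_neighbors = impute_neighbor_mean(values, neighbors)
print("Global mean:", by_global_mean["gap"])
print("Neighbor mean:", by_neighbors["gap"])
```

The global mean pulls the estimate toward distant eastern values, while the neighbor-based estimate stays close to the surrounding western ones, which reflects the assumption that nearby things are more similar.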

Don't forget to tell your map readers that some of the values were imputed. This can be done with labels, a list, or symbology. If your map is included in a report, you can describe the method of imputation.

The Fill Missing Values tool can accomplish the same task. For some datasets, this tool will give better results. For others, geostatistics will be better. It is difficult to know until you have tried both, but if the spatial transition between values is not smooth, Fill Missing Values is recommended.

Note:

Optionally, for an extra challenge, find the Fill Missing Values tool in the Geoprocessing pane and use it to impute the missing values in the Powiaty_Seniors layer. Compare your results to the real values in Powiaty_full_dataset. To access it, open the Catalog pane, expand the Maps folder, and double-click the Full Dataset map.

Read more in Fill Missing Values (Space Time Pattern Mining) and the ArcUser article Dealing with Missing Data.

You can find more tutorials in the tutorial gallery.