Fill gaps in your data with areal interpolation

Interpolate the percentage of seniors

If you know the values for most of the features in your dataset, you can use them to predict continuous values across the entire area. You'll do this to map the spatial distribution of seniors in Poland.

  1. Download the FillGaps project package.
  2. Locate the downloaded file on your computer and double-click it to open the project in ArcGIS Pro. If prompted, sign in using your licensed ArcGIS account.
    Note:

    If you don't have access to ArcGIS Pro or an ArcGIS organizational account, see options for software access.

    The project's map depicts powiaty, which are administrative units similar to counties, in Poland. The polygons are colored to represent the percentage of the population aged 65 years or older. Unfortunately, the data is incomplete. Ten powiaty contain no value for the percentage of seniors.

    Map of Poland with powiaty colored by percentage of seniors

    This spatial data can be found on ArcGIS Living Atlas of the World. The values for the percentage of seniors are from Statistics Poland (the missing values were artificially removed for the purpose of this tutorial).

    Demographic data is often difficult to model with geostatistics because urban areas show dramatically different patterns than rural ones. In this case, however, the spatial variation is relatively smooth, without abrupt breaks, so the data may be appropriate for geostatistics.

  3. On the ribbon, click the Analysis tab. In the Workflows group, click Geostatistical Wizard.

    Geostatistical Wizard button on the ribbon

    The Geostatistical Wizard window appears.

  4. In the Geostatistical Wizard window, under Geostatistical methods, choose Areal Interpolation.

    Areal Interpolation option in the Geostatistical Wizard window

    Most interpolation methods require point data as the input, but areal interpolation uses polygons. In this tutorial, you are using polygons that are nearly complete and fit together like puzzle pieces. You can also use polygons that are widely spaced or overlapping. For example, you may have data representing observations of birds, which is stored in polygons for the ground covered by each observer.

    Note:

    You can read more about this geostatistical method at What is areal interpolation?

    Areal interpolation will process values differently if you declare them as representing averages, rates, or events. You're mapping the percentage of a population over a certain age, which is a rate.

  5. Under Input Dataset 1, for Type, choose Rate. For Source Dataset, choose Powiaty_Seniors.
  6. For Count Field, choose 2017 Senior Population. For Population Field, choose 2017 Total Population.

    Input Dataset 1 parameters

  7. Click Next.

    The next page shows a covariance chart.

    Default covariance chart

    The blue crosses represent your data without any modeling. The blue line represents the model that will be used to predict the percentage of seniors over the entire area. You want to edit parameters of the model until the model line follows the path of the crosses and 90 percent of the crosses fall within the red confidence intervals. Currently, that is not the case.

    Not only does the line fail to follow the crosses closely, but two crosses lie far away from the path. In many situations you won't be able to achieve an ideal model, but you can try to get as close as possible. A good place to start is by making the lag size smaller. Doing so will reduce the area that is searched when sampling to generate the blue crosses.

  8. Under General Properties, for Lag Size, type 12000 and press Enter.

    Lag Size parameter

    The model changes. However, the crosses are now even farther from the confidence intervals.

    Covariance chart with Lag Size set to 12000

    Next, you'll try to improve the model by changing its shape. Stable and K-Bessel models often give the best result, but also take more time to process.

  9. For Model, choose Stable.

    Model parameter

    The model updates. The blue crosses are now much closer to the red confidence intervals, although most crosses do not fall within them.

    Covariance chart with Model set to Stable

    Achieving a perfect model can be difficult or even impossible, especially if you are working with demographic data instead of a natural phenomenon. In this scenario, even though only one of the crosses falls within the confidence intervals, the model line follows the crosses relatively closely. This model isn't perfect, but it is a suitable compromise.

  10. Click Next.

    The next page contains a preview map.

    Preview map

  11. Click different parts of this preview map.

    The map highlights neighboring polygons that will be used to determine the predicted value for the location you clicked. Polygons colored red will be weighted more heavily in the analysis than those colored green.

  12. Click Next.

    The Cross validation page appears. Cross-validation assesses the accuracy of a prediction surface. It does so by removing a single polygon from the dataset and using the remaining data to predict a value within the removed polygon.

    Cross validation page

    The Predicted scatterplot for this model does not look good. Ideally, the red values should follow the trend of the blue and gray lines. Your chart looks more like a random cloud of points. On the other hand, the values listed on the Summary tab look good:

    Summary tab

    The summary numbers should all be close to zero except for Root-Mean-Square Standardized, which should be close to 1. The Root-Mean-Square value of 0.02 means that the predicted proportion of senior citizens will typically be off by about 0.02, or 2 percentage points, from the real value. This is a reasonable margin of error. These values are more indicative of the quality of your model than the scatterplot.

  13. Click Finish. In the Method Report window, click OK.

    An interpolated layer is added to the map.

  14. In the Contents pane, turn off Powiaty_Seniors and turn on Powiaty_Seniors outlines.

    The areas with heavy black outlines are the ones with missing data.

    Interpolated surface beneath powiaty outlines
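
The cross-validation summary can be reproduced by hand to see what each number measures. The following Python sketch uses the conventional definitions of these statistics, which are assumed here to match what the wizard reports. The function name and the sample values are illustrative and are not taken from the tutorial data.

```python
import math


def cross_validation_summary(measured, predicted, std_errors):
    """Summarize prediction errors for paired measured/predicted values.

    Each list holds one entry per polygon that was left out and re-predicted.
    """
    n = len(measured)
    errors = [p - m for p, m in zip(predicted, measured)]
    # Standardized errors divide each error by its predicted standard error.
    standardized = [e / s for e, s in zip(errors, std_errors)]
    return {
        "mean": sum(errors) / n,
        "root_mean_square": math.sqrt(sum(e * e for e in errors) / n),
        "mean_standardized": sum(standardized) / n,
        "root_mean_square_standardized": math.sqrt(
            sum(z * z for z in standardized) / n
        ),
    }


# Illustrative proportions of seniors (not real values from Statistics Poland).
measured = [0.15, 0.17, 0.16, 0.18]
predicted = [0.16, 0.16, 0.17, 0.17]
std_errors = [0.01, 0.01, 0.01, 0.01]

summary = cross_validation_summary(measured, predicted, std_errors)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

In this made-up case the errors cancel out, so the mean is near zero, and each error is about the size of its standard error, so the standardized root-mean-square is near 1.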

Create polygons from the interpolation

The interpolation you created is continuous and ignores the polygon outlines. Geostatistics has smoothed the demographic data to create a gradual surface. While it may not match known data precisely, smooth interpolations like this are often better at predicting unknown values.

Next, you'll convert the continuous interpolation surface into polygons.

  1. On the ribbon, click the Map tab. In the Navigate group, click Bookmarks and choose Kluczborski.

    Kluczborski bookmark in the Map Bookmarks gallery

    The map navigates to Kluczborski powiat, one of the powiaty with missing data.

    Kluczborski powiat on the map

    The Areal Interpolation layer is a geostatistical layer, which means that every location on the map has a slightly different value. Some of the polygons that you need to fill, such as this one, have a wide range of predicted values. You'll convert this predicted surface into a polygon layer with a single predicted value for each powiat.

  2. On the ribbon, click the Analysis tab. In the Geoprocessing group, click Tools.

    Tools button on the ribbon

    The Geoprocessing pane appears.

  3. In the Geoprocessing pane, in the search bar, type Areal Interpolation Layer. In the list of results, click Areal Interpolation Layer To Polygons.

    Areal Interpolation Layer To Polygons tool in the list of search results

  4. For the Areal Interpolation Layer To Polygons tool, enter the following parameters:
    • For Input areal interpolation geostatistical layer, choose Areal Interpolation.
    • For Input polygon features, choose Powiaty_Seniors.
    • For Output polygon feature class, change the output name to Interpolated_Polygons. Make sure to include the underscore.

    Areal Interpolation Layer To Polygons tool parameters

  5. Click Run.

    The Interpolated_Polygons layer is added to the map.

  6. On the ribbon, click the Map tab. In the Navigate group, click the Full Extent button.

    Full Extent button on the ribbon

    The map zooms out to show the full extent of the data.

  7. In the Contents pane, drag the Interpolated_Polygons layer under the Powiaty_Seniors outlines layer.

    Interpolated_Polygons layer in the Contents pane

  8. Turn off the Areal Interpolation layer.

    You now have a value for percentage of seniors in every polygon.

    Map with interpolated polygons
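
If you prefer scripting, the same conversion can likely be run from the Python window in ArcGIS Pro. The following is a minimal sketch, assuming the project is open, the layer names match those used in this section, and the Geostatistical Analyst extension is available. The TOOL_ARGS tuple and the convert_layer_to_polygons helper are hypothetical names introduced for illustration.

```python
# Tool arguments in positional order; the values are the layer names from this tutorial.
TOOL_ARGS = (
    "Areal Interpolation",    # input areal interpolation geostatistical layer
    "Powiaty_Seniors",        # input polygon features
    "Interpolated_Polygons",  # output polygon feature class
)


def convert_layer_to_polygons(args=TOOL_ARGS):
    """Run Areal Interpolation Layer To Polygons; call only where arcpy is installed."""
    # Imported lazily so the sketch can be loaded without ArcGIS Pro.
    import arcpy

    return arcpy.ga.ArealInterpolationLayerToPolygons(*args)
```

Calling convert_layer_to_polygons() with no arguments would use the names above. Pass a different tuple if your layers are named differently.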

Replace missing values with predicted values

You have the real values for most of the powiaty polygons, so you only want to use the predicted values for the 10 with missing values. You'll select these 10 polygons and use the Calculate Field tool to add values for those polygons alone.

  1. In the Contents pane, right-click Interpolated_Polygons and choose Attribute Table.

    Attribute Table option

    The attribute table appears. It contains all of the data from the Powiaty_Seniors layer, but also has three new fields: Included, Predicted, and Standard Error.

    Included, Predicted, and Standard Error columns in the attribute table

  2. Double-click the header for the Percent Seniors column.

    Header of the Percent Seniors column

    The column is sorted. Now, all the empty records (<Null>) are at the top of the table. You'll replace these empty values with the data from the Predicted field.

  3. Click the row number for the first record to select it. Press the Shift key and click the row number of the last record with missing data (row 10).

    All of the records with missing data are selected.

    Rows with missing data

  4. Click the Calculate button.

    Calculate button

    Options to calculate a field appear.

  5. Confirm PercentSeniors is chosen as the field to calculate.

    PercentSeniors field

  6. For Enter an expression to calculate field values, click the Add Fields to Expression button.

    Add Fields to Expression button

  7. In the list of fields, click Predicted.

    The box populates with !Predicted!. This expression takes the values from the Predicted field and writes them to the Percent Seniors field. The values in the Predicted field are stored as decimal proportions, not percentages. To convert them, you'll multiply them by 100.

  8. After !Predicted!, type *100.

    Formula to convert predicted values into percentages

  9. Click Calculate Selected (10).

    Calculate Selected (10) button

    The <Null> values in the Percent Seniors column have been replaced. The unselected rows remain unchanged.

    Attribute table showing Percent Seniors values

  10. At the top of the attribute table, click Clear.

    Clear button

    The selection is cleared.

  11. Close the attribute table.
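
The logic of this field calculation can be sketched in plain Python: only records with no Percent Seniors value receive the predicted proportion multiplied by 100, and all other records keep their original value. The field names PercentSeniors and Predicted come from the attribute table. The function name, the Name key, and the sample values are assumptions made for illustration.

```python
def fill_missing_percent(rows):
    """Return copies of rows with missing PercentSeniors filled from Predicted."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input records are not modified
        if new_row["PercentSeniors"] is None:
            # Predicted is a proportion (0 to 1); convert it to a percentage.
            new_row["PercentSeniors"] = new_row["Predicted"] * 100
        filled.append(new_row)
    return filled


# Illustrative records (values are made up, not taken from the dataset).
rows = [
    {"Name": "Kluczborski", "PercentSeniors": None, "Predicted": 0.17},
    {"Name": "Example powiat", "PercentSeniors": 15.2, "Predicted": 0.16},
]

result = fill_missing_percent(rows)
for row in result:
    print(row["Name"], round(row["PercentSeniors"], 1))
```

The second record keeps its measured value even though a prediction exists for it, which mirrors calculating the field on the selected rows only.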

Symbolize the map

Lastly, you'll symbolize the new layer to match the original one. Instead of setting the symbology parameters one by one, you'll import them from the Powiaty_Seniors layer.

  1. In the Contents pane, turn off Powiaty_Seniors outlines. If necessary, click Interpolated_Polygons to select it.
  2. On the ribbon, click the Feature Layer tab. In the Drawing group, click Import.

    Import button on the ribbon

    The Import Symbology window appears. A message at the top of the window informs you that your data has pending edits that haven't been saved. These edits are the values you calculated in the attribute table. Before you continue, you'll save them.

  3. In the Import Symbology window, click the Save Edits button.

    Save Edits button

    The edits are saved.

  4. For Symbology Layer, choose Powiaty_Seniors.

    Symbology Layer parameter in the Import Symbology window

  5. Click OK.

    The symbology of Interpolated_Polygons now matches that of Powiaty_Seniors, your initial layer, but there are no longer any holes in the data.

    Map of Poland with powiaty colored by percentage of seniors, without any gaps

  6. On the Quick Access Toolbar, click the Save Project button.

    Save Project button on the Quick Access Toolbar

    Note:

    If you receive a message informing you that this project was created using a previous version of ArcGIS Pro and asking you if you want to proceed, click Yes.

The process of substituting values to replace missing data is called imputation. Often, values are imputed using the average of the remaining dataset. When your data is spatial, you have better options available to you, because you can assume that things that are closer together are more similar than things that are farther apart. In this tutorial, you used areal interpolation to create a continuous surface across Poland to model the percentage of the population that is over 65 years of age. You then sampled from that surface to predict values for the polygons that were missing data.
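
To see why proximity matters, compare imputing with the overall average against imputing from nearby values. The following Python sketch uses simple inverse distance weighting as a stand-in for a spatial method. It is not the kriging-based areal interpolation used in this tutorial, and the coordinates, values, and function names are all illustrative.

```python
import math


def mean_impute(values):
    """Impute with the plain average of all known values."""
    return sum(values) / len(values)


def idw_impute(target_xy, known_points, power=2):
    """Impute with an inverse-distance-weighted average of known values.

    known_points is a list of ((x, y), value) pairs.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in known_points:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0:
            return value  # the target coincides with a known point
        weight = 1 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two nearby locations with low values and two distant ones with high values.
known_points = [((0, 0), 14.0), ((1, 0), 15.0), ((10, 0), 22.0), ((11, 0), 23.0)]
target = (0.5, 0)

average_guess = mean_impute([value for _, value in known_points])
spatial_guess = idw_impute(target, known_points)
print(f"average: {average_guess:.2f}, distance-weighted: {spatial_guess:.2f}")
```

The plain average is pulled toward the distant high values, while the distance-weighted estimate stays close to the two neighbors of the missing location.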

Don't forget to tell your map readers that some of the values were interpolated. This can be done with labels, a list, or symbology. If your map is included in a report, you can describe the method of interpolation.

The Fill Missing Values tool can accomplish the same task of interpolating values. For some datasets, this tool will give better results. For others, geostatistics will be better. It is difficult to know until you have tried both, but if the spatial transition between values is not smooth, Fill Missing Values is recommended.

Note:

Optionally, for an extra challenge, find the Fill Missing Values tool in the Geoprocessing pane and use it to interpolate the missing values in the Powiaty_Seniors layer. Compare your results to the real values in the Powiaty_full_dataset, which can be accessed in the Catalog pane.

Read more in Fill Missing Values (Space Time Pattern Mining) and the article Best Practices for Dealing with Missing Data.

You can find more tutorials in the tutorial gallery.