Skip To Content

Perform statistical analysis using R and ArcGIS Pro

In the previous lesson, you enriched your data with additional attributes, including attributes about population. Next, you’ll calculate the crime rate for each location on your map. A crime rate determines how many crimes occur relative to the population. This will allow you to better compare crime counts between areas with vastly different amounts of people, as well as determine how crime rate may be influenced by the other attributes you added to your data.

While you could use the attribute table’s Field Calculator in ArcGIS to determine the number of crimes per 100,000 population, you want to ensure that the crime rates you calculate are statistically robust. You’ll use functions in R to smooth your crime rate.

For this analysis, you’ll use the Empirical Bayes smoothing method. Empirical Bayes smoothing is a rate smoothing technique that uses the population in each of your bins as a measure of confidence in the data, with higher populations lending a high confidence. It then adjusts the areas with a lower confidence towards the mean. This technique will give the crime rates stability.

Bridge your data into R

Next, you'll work in RStudio to perform Empirical Bayes smoothing on your crime rates. Because you have the R-ArcGIS bridge, the data in your ArcGIS Pro project is connected to and accessible from RStudio.

  1. If necessary, open your Crime Analysis project in ArcGIS Pro, and then open RStudio.

    Next, you'll run a command that loads all the functions for the arcgisbinding package. Then you'll run another command that performs a quick check to ensure that the bridge is running correctly and that R recognizes the version of ArcGIS Pro you're using.

  2. In the R console, type the following code and press Enter:
    library(arcgisbinding)
    Note:

    If you receive a message such as Error in library(arcgisbinding): there is no package called ‘arcgisbinding’, ensure that you have correctly installed the R-ArcGIS bridge.

  3. In the R console, type the following code and press Enter:
    arc.check_product()

    The arc.check_product() function causes the RStudio Console to print information regarding your ArcGIS product and license.

    After the arcgisbinding package has been loaded into the RStudio workspace and the connection from R to ArcGIS has been initialized, data from your current project in ArcGIS can be loaded into the RStudio workspace. Shapefiles, feature classes from geodatabases, and tables are all valid arguments to use in the open function.

  4. Use the arc.open() function. For its argument, type the full path to your enriched data subset (San_Francisco_Crimes_Enrich_Subset) and press Enter.
    Note:

    You may have saved your project data to a different location than shown in the code example. If you're copying and pasting, update the path accordingly. To specify paths in RStudio, you may need to replace back slashes " \ " with a double back slash ' \\ ' or you can replace back slashes with a single forward slash " / ".

    For example, replace "C:\San-Francisco\Crime Analysis.gdb\San_Francisco_Crimes_Enrich_Subset" with "C:\\San-Francisco\\Crime Analysis.gdb\\San_Francisco_Crimes_Enrich_Subset" or "C:/San-Francisco/Crime Analysis.gdb/San_Francisco_Crimes_Enrich_Subset".

    enrich_df <- arc.open(path = 'C:/Lessons/San-Francisco/Crime Analysis.gdb/San_Francisco_Crimes_Enrich_Subset')

    The open function returns a new arc.dataset class object, which you stored in the variable enrich_df. This object contains both the spatial information and the attribute information for your ArcGIS data and can now be used in other functions.

    With the arc.select() function, you can choose a subset of attributes from the enrich_df object that you want to use as data for your analysis.

  5. Use the arc.select() function. For the first argument, put enrich_df as the object from which you're making a subset. For the second argument, add a character vector containing the name of each attribute from your dataset that you want in your subset and press Enter.
    enrich_select_df <- arc.select(object = enrich_df, fields = c('OBJECTID', 'SUM_VALUE', 'populationtotals_totpop10', 'wealth_medval_cy', 'wealth_medhinc_cy', 'ownerrenter_renter_cy', 'businesses_n13_bus', 'businesses_n37_bus'))
    Note:

    In RStudio, the arc.select() function does not recognise field aliases and you therefore need to specify actual field names to be used in the subset. The list below shows actual fields names and associated alias names for the San_Francisco_Crimes_Enrich_Subset feature class.

    • populationtotals_totpop10 - 2010 Total Population
    • wealth_medval_cy - 2018 Median Home Value
    • wealth_medhinc_cy - 2018 Median Household Income
    • ownerrenter_renter_cy - 2018 Renter Occupied HUs
    • businesses_n13_bus - Food & Beverage Stores Bus (NAICS)
    • businesses_n37_bus - Food Service/Drinking Estab Bus (NAICS)

    Your enrich_select_df variable now contains an R data frame object with the eight attributes you selected from your full original shapefile in R. These attributes include an ID value, the crime counts for each hexagon bin, and the six attributes with which you enriched your data.

    Finally, you’ll convert your R data frame into a spatial data frame object using the arc.data2sp() function. A spatial data frame object is one of the spatial data classes contained in the sp package. The sp package offers classes and methods for working with spatial data such as points, lines, polygons, pixels, rings, and grids. With this function, you can transfer all of the spatial attributes from your data, including projections, from ArcGIS into R without worrying about a loss of information.

    If you've never used the sp package, you need to install the sp package into your RStudio package library, and load the functions from the sp package into your workspace environment.

  6. Use the library() function to load the sp package into RStudio.
    install.packages("sp")
    library(sp)
  7. Use the arc.data2sp() function. For the first argument, use the enrich_select_df data frame as the object you are converting to an sp object.
    enrich_spdf <- arc.data2sp(enrich_select_df)

Your data has been bridged from ArcGIS Pro to RStudio.

Calculate smoothed crime rates

All the information that you need to start your analysis is in place, but you may notice that the current attribute labels are cryptic and abbreviated as they represent the original field names from the San_Francisco_Crimes_Enrich_Subset feature class . In R, you can change the names of the enriched attributes to make them easier to identify. Once that's complete, you'll perform an Empirical Bayes smoothing on the data.

  1. Create a character vector called col_names. For each item in the vector, provide the new attribute name with which you want to replace the current label.
    col_names <- c("OBJECTID", "Crime_Counts",
    "Population", "Med_HomeValue", "Med_HomeIncome",
    "Renter_Count", "Grocery",
    "Restaurant")
  2. Use the colnames() function. For its argument, select the data attribute of your spatial polygons data frame by using enrich_spdf@data. Assign the col_names vector that you created as the attribute label values to be assigned in place of the original variable names.
    colnames(enrich_spdf@data) <- col_names

    You have updated the column names for the data attribute of your spatial polygons data frame. To check your changes, you can view the first few lines of the data attribute by entering the following command in your R console:

    head(enrich_spdf@data)

    Next, you’ll use the EBest() function to perform Empirical Bayes smoothing on your crime rates. The EBest() function is contained in the spdep package. As before, if you've never worked with the spdep package, you’ll need to install the package before running the library(spdep) line to load the spdep library into your current workspace.

  3. In the console, type the following lines of code to calculate Empirical Bayes smoothed crime rates for each hexagon bin. You can either run each line individually or paste the entire code block and run all the lines at once.
    Note:

    Copying and pasting a code block into the console can sometimes result in a syntax error. If you encounter an error, consider typing the lines of code individually instead.

    install.packages("spdep")
    library(spdep)
    n <- enrich_spdf@data$Crime_Counts
    x <- enrich_spdf@data$Population
    EB <- EBest (n, x)
    p <- EB$raw
    b <- attr(EB, "parameters")$b
    a <- attr(EB, "parameters")$a
    v <- a + (b/x)
    v[v < 0] <- b/x
    z <- (p - b)/sqrt(v)

    The EBest() R function performs a particular type of empirical Bayesian estimation. Generally speaking, whereas more traditional Bayesian methods work with priors before any data values have been observed, empirical Bayesian estimation is an approximation of these techniques, since the beta prior distribution used is estimated from the data. However, the EBest() R function used in this lesson actually uses a modified version of empirical Bayesian estimation with parameter estimates determined by the method of moments. In the code, the variables a and b refer to the method of moments phi and gamma values, respectively. These values are the estimates of the population parameters (population in this case does not refer to the population value of the hexagon bins, but rather to the concept of sample and population data in statistics) and are what we use to perform the smoothing of the rates. Smoothing in this case is performed by calculating the standard score, otherwise known as a z-score. This calculation is performed by subtract the population mean (our gamma value, estimated by the method of moments) from each raw crude rate value and dividing by the standard deviation. The standard deviation is calculated by taking the square root of the variance which cannot be a negative value. Hence the calculations performed regarding the variable v.

    Finally, you'll add your smoothed crime rates as a new attribute to your spatial data frame object.

  4. Create a new attribute called EB_Rate to your spatial polygons data frame by using enrich_spdf@data$EB_Rate and assign to it the numeric vector z, which contains your crime rates for each location.
    enrich_spdf@data$EB_Rate <- z

    Your data now contains a new column called EB_Rate that contains the crime rate values that you calculated above for each hexagon bin.

    Now that you're done in R, you'll return to ArcGIS to explore your newly created crime rates by mapping and analyzing them.

    First, you'll use the arc.sp2data() function to convert your data back from an R spatial data frame object to an ArcGIS file type. You'll then use arc.write() to write your data to your project of choice.

  5. Use the arc.sp2data() function and enter enrich_spdf as its argument. Assign the result to a new variable, such as arcgis_df and press Enter.
    arcgis_df <- arc.sp2data(enrich_spdf)

    Your R spatial polygon data frame has now been converted back to a data frame object and can be written to your ArcGIS project. You can write your data frame object back to your ArcGIS project as a shapefile, table, or feature class in a geodatabase.

  6. Use the arc.write() function. For the first argument, type the path to your SF_Crime geodatabase and name the feature class San_Francisco_Crime_Rates. For the second argument to the arc.write() function, enter the R object that you want to write to ArcGIS. Finally, add a third optional parameter to specify the spatial reference of the object that you're writing to ArcGIS. Press Enter.
    arc.write('C:/Lessons/San-Francisco/SF_Crime.gdb/San_Francisco_Crime_Rates', arcgis_df, shape_info = arc.shapeinfo(enrich_df))

    The spatial reference ensures that the data is projected correctly when you add it to a map.

Using R, you transformed your data into something powerful. You turned your crime counts for each hexagon bin into crime rates that account for changes in population and showed how that impacts the number of crimes that occur. You used Empirical Bayes smoothing to ensure that the rates you created were robust to the amount of information available for each location.

Next, you’ll reexamine your data in ArcGIS Pro so you can visualize the trends and patterns of crime.

Continue analysis in ArcGIS Pro

Now that it's time to bring your R-adjusted data back into ArcGIS, you'll minimize RStudio and maximize your ArcGIS project.

  1. Minimize RStudio and maximize ArcGIS Pro.
  2. In the Catalog pane, right-click your SF_Crime geodatabase and choose Refresh.

    Updated SF_Crime geodatabase

    The SF_Crime geodatabase now includes the San_Francisco_Crime_Rates feature class, which contains the smoothed crime rates.

  3. Right-click the San_Francisco_Crime_Rates feature class and click Add To New, then choose Map.

    The file is added to a map in your project and can be used for analysis.

Identify areas with unusually high crime rates

To see which areas in San Francisco have an unexpectedly high number of crimes given the number of people, you’ll run another hot spot analysis.

The first hot spot analysis you ran identified areas with statistically significant higher numbers of crimes than expected and provided information on how the number of crimes for each location were changing over time. This hot spot analysis identifies statistically significant clusters of high and low crime rates, allowing you to see the areas with usually high numbers of crimes given the population in the area.

  1. In the Geoprocessing pane, search for and open the Optimized Hot Spot Analysis tool.
  2. For Input Features, choose San_Francisco_Crime_Rates.
  3. ForOutput Feature Class, browse to the default project geodatabase, named Crime Analysis.gdb and name the output feature class San_Francisco_Crime_Rates_OHSA.
  4. For Analysis Field, choose EB_Rate.

    By running this tool on the crime rates that you calculated in R contained in the EB_Rate column, the results of this tool will locate the statistically significant spatial clusters of high or low crime rate values.

  5. Click Run.

    Analysis result map

    On your map, the bright red hexagon bins signify areas where there is intense clustering of high values with 99 percent confidence. These are areas where there are unusually high numbers of crimes occurring even when accounting for population. Notice that once the population of each area is considered, there are no statistically significant cold spots (areas of clustering of low crime counts).

  6. Save your project.

In this lesson, you used the R-ArcGIS bridge to transfer your data into R to take advantage of functionality needed to calculate smoothed crime rates. You then transferred your data back into ArcGIS to continue your analysis and to further pinpoint areas in need of extra police resources to reduce the number of crimes occurring.

In the next lesson, you'll perform an exploratory data analysis in R to determine whether the trends in crime rates can be linked to any of the other attributes with which you enriched your data.