Install the R-ArcGIS bridge and start statistical analysis

You'll install the R-ArcGIS bridge and begin analyzing your dataset.

Download R and RStudio

First, you'll download and set up R and RStudio, a free integrated development environment for R. RStudio provides a coding platform with access to CRAN (the Comprehensive R Archive Network, which hosts thousands of R packages), a built-in viewer for charts and graphs, and other features that make working in R easier. (If you already have R and RStudio installed, skip to the next section.)

  1. If necessary, download and install R 3.2.2 or later. Accept all defaults in the installation wizard.
  2. If necessary, download and install RStudio Desktop. Accept all defaults in the installation wizard.

Create an ArcGIS project

Now you'll add data to an ArcGIS project to create a map of San Francisco crimes.

  1. Download the San-Francisco.zip file.
  2. Locate the downloaded file on your computer and extract its contents to a folder named San-Francisco in a location of your choice.
  3. Open the San-Francisco folder.

    The folder contains the SF_Crime geodatabase, which has crime data that you'll add to a map.

  4. Start ArcGIS Pro. If prompted, sign in using your licensed ArcGIS account.
    Note:

    If you don't have ArcGIS Pro or an ArcGIS account, you can sign up for an ArcGIS free trial.

  5. Under New, click Map.
  6. In the Create a New Project window, for Name, type Crime Analysis. For Location, browse to and choose your San-Francisco folder. Uncheck Create a new folder for this project.
  7. Click OK.

    The project is created.

  8. In the Catalog pane, on the Project tab, expand Folders and expand the San-Francisco folder.
  9. Expand the SF_Crime geodatabase, right-click the San_Francisco_Crimes feature class, and choose Add To Current Map.

    Map showing raw data

    The map shows locations where crimes occurred from January 2014 through December 2014 in San Francisco.

Install the R-ArcGIS bridge

Once the R-ArcGIS bridge is installed, you can begin reading and writing data to and from ArcGIS and R. You can also begin running script tools that reference an R script.

  1. On the ribbon, click the Project tab.

    Project tab on the ribbon

  2. Click Options. In the Options pane, under Application, click Geoprocessing.
  3. In the R-ArcGIS Support section, select your desired R home directory.
    Note:

    All versions of R that are installed on your computer will appear in the list. Select R 3.2.2 or a later version. (The example image uses R 3.6.2.)

    Options window

    If you haven't installed the R-ArcGIS bridge, a warning indicates that you need to install the ArcGIS R integration package to connect R with ArcGIS. You can automatically download and install the package, download the package separately, or install the package from a file. If you have previously installed the R-ArcGIS bridge, a message indicates the installed version of your arcgisbinding package, with options to check for updates, download the latest version, or update from a file.

  4. If you do not have the ArcGIS R integration package installed, next to Please install the ArcGIS R integration package, click Install package and choose Install package from the Internet. When asked to confirm the installation, click Yes, and when the installation is complete, click Close.
  5. If you already have the ArcGIS R integration package installed, next to Installed 'arcgisbinding' package version, click the Check for updates button and choose Check package for updates to ensure that you have the latest version of the package.
  6. In the Options window, click OK.
  7. Click the Back button to return to the map.

Aggregate point data by counts within a defined location

At first glance, the map may be overwhelming, and it may be difficult to understand what the data represents. Before you start your analysis, you need to aggregate crime counts by space and time. Aggregation reveals spatial and temporal relationships in your data that may not have been visible previously: it summarizes your crime points into space-time bins, combining the crimes that occurred into counts by the space and time increments of your choosing.

  1. If necessary, open the Geoprocessing pane. (On the Analysis tab, in the Geoprocessing group, click Tools.)
  2. In the search box, type Create Space Time Cube and press Enter.
  3. In the results, click Create Space Time Cube By Aggregating Points to open the tool. Change the following parameters:
    • For Input Features, choose San_Francisco_Crimes.
    • For Output Space Time Cube, browse to your San-Francisco folder and name the output San_Francisco_Crimes_Space_Time_Cube.nc.
    • For Time Field, choose Dates.
    • For Time Step Interval, type 1 and choose Months.
    • For Time Step Alignment, confirm that End time is chosen.
    • For Aggregation Shape Type, choose Hexagon grid.
    • For Distance Interval, type 300 and choose Meters.

    Create Space Time Cube By Aggregating Points tool

    These parameter values specify the size and shape of the space-time bins that you are creating. Because your data is for the year 2014, analyzing crimes by each month is a natural breaking point. Additionally, your department wants to analyze crimes at a local level, so you select a small distance interval value. Hexagon bins are selected because they are preferable in analyses that include aspects of connectivity or movement paths.

  4. Click Run.

    The Create Space Time Cube By Aggregating Points tool creates a netCDF file (.nc), which allows you to view spatial patterns and trends over time. The tool aggregated the 74,760 points in the San_Francisco_Crimes layer into 3,510 hexagons (the polygon bins). Each hexagon represents an area of approximately 78,000 square meters. The Distance Interval and Time Step Interval parameters impact the number of resulting bins and the size of each bin. These values can be chosen based on prior knowledge of the analysis area, or the tool will calculate values for you based on the spatial distribution of your data. You can confirm that the Create Space Time Cube By Aggregating Points tool successfully created the file by checking the San-Francisco folder.
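    As a quick arithmetic check on that figure: if the 300-meter distance interval corresponds to the height of each hexagon (the flat-to-flat distance), the area of a regular hexagon is (√3 / 2) × height², which gives (√3 / 2) × 300² ≈ 77,940 square meters, consistent with the approximately 78,000 square meters reported by the tool.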

Analyze crime hot spots

Next, you'll analyze where statistically significant clusters of crime are emerging and receding throughout the city. Your analysis will help the department anticipate problems and evaluate the effectiveness of resource allocation for their crime prevention measures.

  1. In the Geoprocessing pane, click the Back button. Search for and open the Emerging Hot Spot Analysis tool.
  2. Change the following parameters:
    • For Input Space Time Cube, browse to and choose the San_Francisco_Crimes_Space_Time_Cube.nc file.
    • For Analysis Variable, choose COUNT.
    • For Output Features, browse to your San-Francisco folder and name the output San_Francisco_Crimes_Hot_Spots.shp.

    Emerging Hot Spot Analysis tool

    By using the default value for Neighborhood Distance, you are letting the tool calculate a distance band for you based on the spatial distribution of your data. The Neighborhood Time Step value is set to one time step interval (one month in this case) by default. These settings are ideal for an exploratory analysis; however, if you knew the optimal distance band and time step interval for your analysis, you could set them.

  3. Click Run.

    The tool runs and its results are added to the map. (A warning message informs you of the value that the tool used for the Neighborhood Distance parameter.)

  4. In the Contents pane, turn off the San_Francisco_Crimes layer.

    Emerging Hot Spot Analysis Results map

    Trends in statistically significant hot and cold spots are shown on the map. Red areas indicate that over time there has been clustering of high numbers of crime, and blue areas indicate that over time there has been clustering of low numbers of crime. Each location is categorized based on the trends in clustering over time.

    The dark red hexagon bins are persistent hot spots. These are locations that have been statistically significant hot spots for 90 percent of all of your time slices. However, these locations do not show a discernible increase or decrease in the intensity of clustering of crime counts over time.

    In contrast, the light red hexagon bins with beige outlines are intensifying hot spots. These are locations that have been statistically significant hot spots for 90 percent of all of your time slices. In addition, these are locations where the intensity of clustering of crime counts is increasing over time, and that increase is statistically significant.

    Conversely, the dark blue bins are persistent cold spots. These are areas where crime is statistically, and persistently, less prevalent. The light blue bins with outlines are intensifying cold spots, the inverse of their hot spot counterparts: clusters of low crime counts in these cells are becoming more intense over time. In other words, the cold spots are getting colder.

    The department needs to be especially concerned about the areas where crime is persistent or intensifying. They may move resources to these areas from the places where crime cold spots occur.

  5. Save the project.

You installed the R-ArcGIS bridge, prepared your data for statistical analysis, and started using some of the available tools. Next, you'll add additional attributes to your dataset, allowing you to draw conclusions from your analysis about what factors likely influence the occurrence of crime.


Enhance your data with additional attributes

Previously, you installed the R-ArcGIS bridge and downloaded the data for your statistical analysis. Then, in ArcGIS, you aggregated your data based on areas and times of interest and began to explore temporal trends in your dataset. For the department to better understand what factors influence the prevalence of crime, you'll add additional information.

Add additional attributes to your dataset

Now that you know where crime hot spots are emerging, you'll try to determine why they are emerging. In particular, you'll examine the relationship between an area's crime and its population. Statistical analysis can determine whether the number of crimes occurring in a particular area is influenced by population. In addition, your department is interested in analyzing the presence of certain types of businesses, the prevalence of parks, the amount of public land in each area (hexagon bin), and the median household income and home value, among other factors.

Currently, the hexagon bins in the space time cube layer contain no attribute information suitable for this kind of analysis. You'll run another geoprocessing tool to enrich the layer with relevant attribute information.

Note:

If your version of ArcGIS Pro is earlier than 1.4, you'll need to subset the data into two smaller pieces because the Enrich tool may not run successfully on a dataset with over 1,000 rows.

  1. If necessary, open your Crime Analysis project in ArcGIS Pro.
  2. On the Analysis tab, in the Geoprocessing group, click Environments.

    You'll ensure that output field names will not include the table name of the source from which each field was obtained. This setting is relevant when working with enriched data that consists of fields joined from one or more sources.

  3. In the Environments window, scroll down to the Fields section. Uncheck Maintain fully qualified field names.
    Update Field environment settings
  4. Click OK.
  5. If necessary, open the Geoprocessing pane. Search for and open the Enrich tool.
  6. Change the following parameters.
    Caution:

    This step requires approximately 50 ArcGIS service credits. If you don't have sufficient credits allocated to your ArcGIS organizational account (or if you're not sure), you can use the result data provided in the StepResult folder. Instead of running the tool, copy the San_Francisco_Crimes_Enrich feature class from the Result.gdb into the Crime Analysis.gdb, add that feature class to your map, and skip to the next step.

    • For Input Features, choose San_Francisco_Crimes_Hot_Spots.
    • For Output Feature Class, browse to the default project geodatabase, Crime Analysis.gdb, and name the output feature class San_Francisco_Crimes_Enrich.
    • For Variables, click the plus button. In the Add Variable window, search for and choose the following variables and click OK:
      • 2010 Total Population (Esri 2019)
      • 2019 Median Home Value
      • 2019 Median Household Income
      • 2019 Renter Occupied HUs
      • Food & Beverage Stores Bus (NAICS)
      • Food Service/Drinking Estab Bus (NAICS)
    Note:

    Demographic data is updated periodically, so the available variables and values may differ from those specified in the lesson. If necessary, use the most recent data.

    Enrich Layer with data variables

    Note:

    The specific variables you add are important because their field names are used in the R script you'll run later in the lesson. If your variable names differ from those shown in the example image, you'll need to edit the corresponding lines when you paste the R script, or those lines won't run.

    While not an exhaustive list of the variables that could potentially be linked to crime rates, this list will provide a good start for your analysis.

  7. Click Run.

    The tool runs and the result layer is added to the map.

  8. In the Contents pane, right-click the San_Francisco_Crimes_Enrich layer and choose Attribute Table.
  9. If necessary, scroll to the right in the attribute table until you can see the fields with which you chose to enrich the layer.

    The newly added enrichment fields display in the table with alias names that are more descriptive than the original field names. In the list below, alias names are listed first, followed by the original field names in parentheses.

    The result of the Enrich Layer tool includes the following fields and values (they may not be in the following order):

    • HasData—Indicates whether the Enrich Layer tool found data for the given hexagon bin, with 0 meaning a hexagon had no available data for all of the attributes you selected and 1 meaning a hexagon bin had data for at least one of the attributes you selected. You can use this field to filter your data so that only features with relevant attribute information appear on the map.
    • 2010 Total Population (Esri 2019) (historicalpopulation_tspop10_cy)—Contains the population count per hexagon bin. Some hexagons have a population of 0. A hexagon bin may have a population of 0 because it is located in an industrial area or in a park. The first priority of your department is to reduce crimes in populated areas, so you'll focus only on populated locations.
    • 2019 Median Home Value (wealth_medval_cy)—Contains the median home value per hexagon bin.
    • 2019 Median Household Income (wealth_medhinc_cy)—Contains the median household income value per hexagon bin.
    • 2019 Renter Occupied HUs (ownerrenter_renter_cy)—Contains the number of renter occupied households per hexagon bin.
    • Food & Beverage Stores Bus (NAICS) (businesses_n13_bus)—Contains the count of food and beverage stores located within each hexagon bin.
    • Food Service/Drinking Estab Bus (NAICS) (businesses_n37_bus)—Contains the count of businesses that serve food, beverages, or both located within each hexagon bin.

    You've created a feature class that contains the information needed to perform your analysis, but you now have some data that is not pertinent to your analysis goals. Hexagon bins that do not have information for your attributes of interest do not add any value or new information to help you answer your questions. Additionally, areas that are not populated are not of high priority for your department at this time. As a result, you'll need to trim down your enriched dataset to contain only the information most useful to you.

  10. Close the table.

Prepare your dataset for additional analyses

Next, you'll select the data that is relevant to your analysis and make a subset with only that information. This way, you still have access to all your enriched data should you need it for further analyses, but you can continue your current analysis with only the necessary data.

  1. In the Geoprocessing pane, click the Back button. Search for and open the Select Layer By Attribute tool.
  2. For Input Rows, choose San_Francisco_Crimes_Enrich. For Selection type, confirm that New selection is chosen.
  3. Click New Expression. Create the expression Where HasData is equal to 0.
  4. Click Add Clause and add the expression Or 2010 Total Population (Esri 2019) is equal to 0.
    Note:

    For more information about writing SQL expressions, see SQL reference for query expressions used in ArcGIS.

    Select Layer By Attribute tool

  5. Click Run.

    The tool runs and selects features that have no enriched data, or that have zero population. These may be industrial sites or parks.

  6. Open the attribute table for the San_Francisco_Crimes_Enrich layer.

    The table indicates that 222 of 1,996 rows are selected, meaning they have 0 values for the HasData or 2010 Total Population (Esri 2019) fields. You'll create a new dataset without these selected features so you can focus on features that have data relevant to your analysis.

  7. In the attribute table, click the Switch button.

    Attribute Switch button

    The button swaps the selection from the 222 rows that had no data or no population, to all of the other rows. You should have 1,774 of 1,996 rows selected, and you can now copy the enriched and populated data to its own layer.

  8. In the Geoprocessing pane, click the Back button. Search for and open the Copy Features tool.
  9. For Input Features, choose San_Francisco_Crimes_Enrich.

    When you have specific rows selected, the Copy Features tool only copies those rows into your new feature class result.

  10. For Output Feature Class, browse to Crime Analysis.gdb and name the output feature class San_Francisco_Crimes_Enrich_Subset.
  11. Click Run.

    You now have two layers, San_Francisco_Crimes_Enrich and San_Francisco_Crimes_Enrich_Subset. The former contains the full dataset; the latter contains only the data for populated areas that have enriched attributes.

  12. Close the table. On the ribbon, on the Map tab, in the Selection group, click Clear.
  13. Save your project.

Next, you'll learn how to analyze these attributes in R, and how they may influence the likelihood an area experiences crime.


Perform statistical analysis using R and ArcGIS Pro

Previously, you enriched your data with additional attributes, including attributes about population. Next, you'll calculate the crime rate for each location on your map. A crime rate determines how many crimes occur relative to the population. This will allow you to better compare crime counts between areas with vastly different amounts of people, as well as determine how crime rate may be influenced by the other attributes you added to your data.

While you could use the attribute table's Field Calculator in ArcGIS to determine the number of crimes per 100,000 population, you want to ensure that the crime rates you calculate are statistically robust. You'll use functions in R to smooth your crime rates.

For this analysis, you'll use the Empirical Bayes smoothing method. Empirical Bayes smoothing is a rate-smoothing technique that uses the population in each of your bins as a measure of confidence in the data, with higher populations lending higher confidence. Rates in areas with lower confidence are adjusted toward the mean. This technique stabilizes the crime rates.

Bridge your data into R

Next, you'll work in RStudio to perform Empirical Bayes smoothing on your crime rates. Because you have the R-ArcGIS bridge, the data in your ArcGIS Pro project is connected to and accessible from RStudio.

  1. If necessary, open your Crime Analysis project in ArcGIS Pro. Open RStudio.

    Next, you'll run a command that loads all the functions for the arcgisbinding package. Then you'll run another command that performs a quick check to ensure that the bridge is running correctly and that R recognizes the version of ArcGIS Pro you're using.

  2. In RStudio, in the R console, type the following code and press Enter:
    library(arcgisbinding)
    Note:

    If you receive a message such as Error in library(arcgisbinding): there is no package called 'arcgisbinding', ensure that you have correctly installed the R-ArcGIS bridge.
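    A quick way to verify the installation from the R console itself is the following diagnostic sketch, which uses base R only:

    # TRUE means the arcgisbinding package is installed in this R library
    "arcgisbinding" %in% rownames(installed.packages())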

  3. In the R console, type the following code and press Enter:
    arc.check_product()

    The arc.check_product() function prints information about your ArcGIS product and license to the R console.

    After the arcgisbinding package has been loaded into the RStudio workspace and the connection from R to ArcGIS has been initialized, data from your current project in ArcGIS can be loaded into the RStudio workspace. Paths to shapefiles, feature classes in geodatabases, and tables are all valid arguments to the arc.open() function.

  4. Run the arc.open() function as shown in the following code block. For the argument, type the full path to your enriched data subset (San_Francisco_Crimes_Enrich_Subset) and press Enter.
    Note:

    You may have saved your project data to a different location than shown in the code example. If you're copying and pasting, update the path accordingly. To specify paths in RStudio, you may need to replace each backslash (\) with a double backslash (\\). Alternatively, you can replace backslashes with single forward slashes (/).

    For example, you would replace "C:\San-Francisco\Crime Analysis.gdb\San_Francisco_Crimes_Enrich_Subset" with "C:\\San-Francisco\\Crime Analysis.gdb\\San_Francisco_Crimes_Enrich_Subset" or "C:/San-Francisco/Crime Analysis.gdb/San_Francisco_Crimes_Enrich_Subset".

    enrich_df <- arc.open(path = 'C:/Lessons/San-Francisco/Crime Analysis.gdb/San_Francisco_Crimes_Enrich_Subset')

    The function stores a new arc.dataset class object in the variable enrich_df. This object contains both the spatial information and the attribute information for your ArcGIS data and can now be used in other functions. The variable is listed in RStudio under Data.
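    Optionally, you can preview the geometry type and spatial reference stored with the dataset by using arc.shapeinfo(), the same function you'll use later when writing results back to ArcGIS:

    # Print geometry type and spatial reference details for the opened dataset
    arc.shapeinfo(enrich_df)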

    With the arc.select() function, you can choose a subset of attributes from the enrich_df object that you want to use as data for your analysis.

  5. Run the arc.select() function as shown in the following code block. For the first argument, put enrich_df as the object from which you're making a subset. For the second argument, add a character vector containing the name of each attribute from your dataset that you want in your subset and press Enter.
    enrich_select_df <- arc.select(object = enrich_df, fields = c('OBJECTID', 'SUM_VALUE', 'historicalpopulation_tspop10_cy', 'wealth_medval_cy', 'wealth_medhinc_cy', 'ownerrenter_renter_cy', 'businesses_n13_bus', 'businesses_n37_bus'))
    Note:

    In RStudio, the arc.select() function does not recognize field aliases, so you need to specify the actual field names to be used in the subset. The following list shows the actual field names and their associated aliases for the San_Francisco_Crimes_Enrich_Subset feature class.

    • historicalpopulation_tspop10_cy - 2010 Total Population (Esri 2019)
    • wealth_medval_cy - 2019 Median Home Value
    • wealth_medhinc_cy - 2019 Median Household Income
    • ownerrenter_renter_cy - 2019 Renter Occupied HUs
    • businesses_n13_bus - Food & Beverage Stores Bus (NAICS)
    • businesses_n37_bus - Food Service/Drinking Estab Bus (NAICS)

    Your enrich_select_df variable now contains an R data frame object with the eight attributes you selected from your full original dataset. These attributes include an ID value, the crime counts for each hexagon bin, and the six attributes with which you enriched your data.
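    To confirm the selection, you can optionally inspect the structure of the new data frame:

    # List each column's name, type, and first few values
    str(enrich_select_df)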

    Finally, you'll convert your R data frame into a spatial data frame object using the arc.data2sp() function. A spatial data frame object is one of the spatial data classes contained in the sp package. The sp package offers classes and methods for working with spatial data such as points, lines, polygons, pixels, rings, and grids. With this function, you can transfer all of the spatial attributes from your data, including projections, from ArcGIS into R without worrying about a loss of information.

    If you've never used the sp package, you need to install the sp package into your RStudio package library, and load the functions from the sp package into your workspace environment.

  6. Run the following code to install the sp package (if you haven't already) and load it into RStudio with the library() function.
    install.packages("sp")
    library(sp)
  7. Run the arc.data2sp() function as shown in the following code block. For the first argument, use the enrich_select_df data frame as the object you are converting to an sp object.
    enrich_spdf <- arc.data2sp(enrich_select_df)
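    As an optional check, you can confirm the class of the converted object:

    # Should print "SpatialPolygonsDataFrame", one of the sp package's classes
    class(enrich_spdf)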

Your data has been bridged from ArcGIS Pro to RStudio.

Calculate smoothed crime rates

All the information that you need to start your analysis is in place, but you may notice that the current attribute labels are cryptic and abbreviated, as they are the original field names from the San_Francisco_Crimes_Enrich_Subset feature class. In R, you can rename the enriched attributes to make them easier to identify. Once that's complete, you'll perform Empirical Bayes smoothing on the data.

  1. Run the following code to create a character vector called col_names. For each item in the vector, provide the new attribute name with which you want to replace the current label.
    col_names <- c("OBJECTID", "Crime_Counts",
    "Population", "Med_HomeValue", "Med_HomeIncome",
    "Renter_Count", "Grocery",
    "Restaurant")
  2. Run the colnames() function as shown in the following code block. For its argument, select the data attribute of your spatial polygons data frame by using enrich_spdf@data, and assign the col_names vector you created in place of the original variable names.
    colnames(enrich_spdf@data) <- col_names

    You have updated the column names for the data attribute of your spatial polygons data frame.

  3. Optionally, to check your changes, run the following command:

    head(enrich_spdf@data)

    The first few lines of the data attribute are displayed.

    Next, you'll use the EBest() function to perform Empirical Bayes smoothing on your crime rates. The EBest() function is contained in the spdep package. As before, if you've never worked with the spdep package, you'll need to install the package before running the library(spdep) line to load the spdep library into your current workspace.

  4. In the console, run the following lines of code to calculate Empirical Bayes smoothed crime rates for each hexagon bin. You can either run each line individually or paste the entire code block and run all the lines at once.
    Note:

    Copying and pasting a code block into the console can sometimes result in a syntax error. If you encounter an error, consider typing the lines of code individually instead.

    install.packages("spdep")
    library(spdep)
    n <- enrich_spdf@data$Crime_Counts
    x <- enrich_spdf@data$Population
    EB <- EBest (n, x)
    p <- EB$raw
    b <- attr(EB, "parameters")$b
    a <- attr(EB, "parameters")$a
    v <- a + (b/x)
    v[v < 0] <- b/x
    z <- (p - b)/sqrt(v)

    The EBest() R function performs a particular type of empirical Bayesian estimation. Whereas more traditional Bayesian methods work with priors specified before any data values have been observed, empirical Bayesian estimation approximates these techniques by estimating the prior distribution from the data itself. The EBest() function used in this lesson applies a modified version of empirical Bayesian estimation whose parameter estimates are determined by the method of moments. In the code, the variables a and b hold the method-of-moments phi and gamma values, respectively. These values are estimates of the population parameters (population here refers to the statistical concept of sample versus population, not the population counts of the hexagon bins), and they are what you use to smooth the rates.

    Smoothing is performed by calculating the standard score, otherwise known as a z-score, for each bin: subtracting the population mean (the gamma value, estimated by the method of moments) from each raw crude rate and dividing by the standard deviation, that is, z = (p - b)/sqrt(v). The standard deviation is the square root of the variance v = a + b/x, which cannot be negative; hence the line of code that replaces any negative values of v.
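    If you expect to repeat this smoothing, for example on other crime categories, the same calculation can be wrapped in a small helper function. This is an optional sketch rather than a required lesson step; it assumes the spdep package is already loaded:

    # Empirical Bayes (method-of-moments) z-scores for rates n/x
    eb_zscore <- function(n, x) {
         EB <- EBest(n, x)
         p <- EB$raw                      # raw crude rates
         b <- attr(EB, "parameters")$b    # gamma (mean) estimate
         a <- attr(EB, "parameters")$a    # phi estimate
         v <- a + (b/x)                   # estimated variance per bin
         v[v < 0] <- (b/x)[v < 0]         # variance cannot be negative
         return((p - b)/sqrt(v))          # standard (z) scores
    }
    z <- eb_zscore(enrich_spdf@data$Crime_Counts, enrich_spdf@data$Population)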

    Finally, you'll add your smoothed crime rates as a new attribute to your spatial data frame object.

  5. Run the following line of code to add a new attribute called EB_Rate to your spatial polygons data frame:
    enrich_spdf@data$EB_Rate <- z

    Your data now contains a new column called EB_Rate that contains the crime rate values that you calculated above for each hexagon bin.

    Now that you're done in R, you'll return to ArcGIS to explore your newly created crime rates by mapping and analyzing them.

    First, you'll use the arc.sp2data() function to convert your data back from an R spatial data frame object to an ArcGIS file type. You'll then use arc.write() to write your data to your project of choice.

  6. Run the arc.sp2data() function and enter enrich_spdf as its argument. Assign the result to a new variable, such as arcgis_df.
    arcgis_df <- arc.sp2data(enrich_spdf)

    Your R spatial polygon data frame has now been converted back to a data frame object and can be written to your ArcGIS project. You can write your data frame object back to your ArcGIS project as a shapefile, table, or feature class in a geodatabase.

  7. Run the arc.write() function as shown in the following code block. For the first argument, type the path to your SF_Crime geodatabase and name the feature class San_Francisco_Crime_Rates. For the second argument to the arc.write() function, enter the R object that you want to write to ArcGIS. Finally, add a third optional parameter to specify the spatial reference of the object that you're writing to ArcGIS.
    Note:

    You may need to replace the path in the example code with the path to your SF_Crime geodatabase.

    arc.write('C:/Lessons/San-Francisco/SF_Crime.gdb/San_Francisco_Crime_Rates', arcgis_df, shape_info = arc.shapeinfo(enrich_df))

    The spatial reference ensures that the data is projected correctly when you add it to a map.

    Using R, you transformed your data into something more powerful. You turned the crime counts for each hexagon bin into crime rates that account for differences in population, which affect the number of crimes that occur. You used Empirical Bayes smoothing to ensure that the rates you created are robust to the amount of information available for each location.

    Next, you'll reexamine your data in ArcGIS Pro so you can visualize the trends and patterns of crime.

Continue analysis in ArcGIS Pro

Now that it's time to bring your R-adjusted data back into ArcGIS, you'll minimize RStudio and maximize your ArcGIS project.

  1. Minimize RStudio and maximize ArcGIS Pro.
  2. In the Catalog pane, right-click your SF_Crime geodatabase and choose Refresh.

    Updated SF_Crime geodatabase

    The SF_Crime geodatabase now includes the San_Francisco_Crime_Rates feature class, which contains the smoothed crime rates.

  3. In the SF_Crime geodatabase, right-click the San_Francisco_Crime_Rates feature class, point to Add To New, and choose Map.

    The file is added to a map in your project and can be used for analysis.

Identify areas with unusually high crime rates

To see which areas in San Francisco have an unexpectedly high number of crimes given the number of people, you'll run another hot spot analysis.

The first hot spot analysis you ran identified areas with statistically significantly higher numbers of crimes than expected and provided information on how the number of crimes at each location was changing over time. This hot spot analysis identifies statistically significant clusters of high and low crime rates, allowing you to see the areas with unusually high numbers of crimes given the population in the area.

  1. In the Geoprocessing pane, search for and open the Optimized Hot Spot Analysis tool.
  2. For Input Features, choose San_Francisco_Crime_Rates.
  3. For Output Feature Class, browse to Crime Analysis.gdb and name the output feature class San_Francisco_Crime_Rates_OHSA.
  4. For Analysis Field, choose EB_Rate.

    Because you're running this tool on the crime rates you calculated in R, contained in the EB_Rate column, the results will locate statistically significant spatial clusters of high and low crime rate values.

  5. Click Run.

    Analysis result map

    On your map, the bright red hexagon bins signify areas where there is intense clustering of high values with 99 percent confidence. These are areas where there are unusually high numbers of crimes occurring even when accounting for population. Notice that once the population of each area is considered, there are no statistically significant cold spots (areas of clustering of low crime counts).

  6. Save your project.

You've used the R-ArcGIS bridge to transfer your data into R to take advantage of functionality needed to calculate smoothed crime rates. You also transferred your data back into ArcGIS to continue your analysis and to further pinpoint areas in need of extra police resources to reduce the number of crimes occurring.

Next, you'll perform an exploratory data analysis in R to determine whether the trends in crime rates can be linked to any of the other attributes with which you enriched your data.


Identify attributes that influence crime

Previously, you created a map in ArcGIS Pro showing where unusually high numbers of crimes are occurring, adjusted to account for population. Next, you'll determine which of the other attributes you added may be influencing the prevalence of crime across San Francisco. To do this, you'll use exploratory data analysis tools in R to identify the most significant influences on crime. The most influential attributes can serve as a starting point for the department when it's ready to begin researching possible predictive models in ArcGIS and R for future planning and the proactive distribution of resources.

Create a correlation matrix in R to evaluate attribute relationships

As an important first step in modeling the relationships between crime and the variables you've chosen, you can use exploratory data analysis tools in R. These tools allow you to identify the most likely statistically relevant predictors for your analysis, potentially making future models you build more effective in identifying trends. The exploratory data analysis tool you're going to use is a correlation matrix: an illustrated grid of values that measure the level of association between your added attributes and your population-smoothed crime rates.
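If you haven't built a correlation matrix in R before, the base cor() function shows the idea in miniature. The following example uses R's built-in mtcars dataset rather than the crime data:

    # cor() returns a symmetric matrix of pairwise Pearson coefficients
    round(cor(mtcars[, c("mpg", "wt", "hp")]), 2)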

Note:

The following workflow is derived from a guide created by STHDA, a website dedicated to tutorials for data analysis and data visualization using R, and explains how to improve the appearance of your matrix so you can better communicate the results of your analysis.

  1. If necessary, open your Crime Analysis project in ArcGIS Pro and open RStudio.

    You'll bring your most recent version of the data into R.

  2. In RStudio, run arc.open() to establish the project and dataset you are working with by specifying the path to your San_Francisco_Crime_Rates feature class as the first argument. Store the result in the rate_df variable.
    Note:

    If necessary, be sure to replace the path in the example code block with the path to your SF_Crime geodatabase.

    rate_df <- arc.open('C:/Lessons/San-Francisco/SF_Crime.gdb/San_Francisco_Crime_Rates')
  3. Run arc.select() to choose the variables you want from your data to bring into R. For the first argument, specify the object from which you are selecting attributes, in this case, rate_df. For the second argument, list each attribute that you want to select in a character vector.
    rate_select_df <- arc.select(rate_df, fields = c("OBJECTID", "Crime_Counts", "Population", "Med_HomeValue", "Med_HomeIncome", "Renter_Count", "Grocery", "Restaurant", "EB_Rate"))
  4. Run the arc.data2sp() function to convert your feature class into a spatial data frame object.
    rate_spdf <- arc.data2sp(rate_select_df)

    To enhance the appearance of the correlation matrix, you'll load several function libraries and custom functions that can be found in the correlation matrix tutorial. You'll use the following custom functions:

    • Get lower triangle of the correlation matrix—By default, a correlation matrix returns the correlation coefficient for each pair of attributes twice. This function identifies and returns only the values for the lower triangle of your correlation matrix. All other values are set to NA.
    • Get upper triangle of the correlation matrix—By default, a correlation matrix returns the correlation coefficient for each pair of attributes twice. This function identifies and returns only the values for the upper triangle of your correlation matrix. All other values are set to NA.
    • Reorder correlation coefficients—This function reorders the correlation matrix by correlation coefficient magnitude.

    These functions produce a correlation matrix that is polished, easier to analyze, and ready to be shared with the police department.

  5. Run the following code in your RStudio console to add the custom functions to your workspace:
    # Get lower triangle of the correlation matrix
    get_lower_tri <- function(cormat) {
         cormat[upper.tri(cormat)] <- NA
         return(cormat)
    }
    #
    # Get upper triangle of the correlation matrix
    get_upper_tri <- function(cormat) {
         cormat[lower.tri(cormat)] <- NA
         return(cormat)
    }
    #
    # Reorder the correlation matrix, using correlation between variables as distance
    reorder_cormat <- function(cormat) {
         dd <- as.dist((1 - cormat)/2)
         hc <- hclust(dd)
         return(cormat[hc$order, hc$order])
    }
  6. Run the following code to install (if necessary) and load the reshape2, ggplot2, and ggmap libraries into your workspace.
    install.packages("reshape2")
    library (reshape2)
    install.packages("ggplot2")
    library (ggplot2)
    install.packages("ggmap")
    library (ggmap)
    Note:

    If a package is already installed, you may receive a message that the package will be updated and RStudio restarted. You may choose to update or cancel the package update.

  7. Run the following code to create the correlation matrix:
    corr_sub <- rate_spdf@data[, c("Grocery", "Restaurant", "Med_HomeIncome", "Renter_Count", "Med_HomeValue", "EB_Rate")]
    cormax <- round(cor(corr_sub), 2)
    upper_tri <- get_upper_tri(cormax)
    melted_cormax <- melt(upper_tri, na.rm = TRUE)
    cormax <- reorder_cormat(cormax)
    upper_tri <- get_upper_tri(cormax)
    melted_cormax <- melt(upper_tri, na.rm = TRUE)
    ggheatmap <- ggplot(melted_cormax, aes(Var2, Var1, fill = value)) +
         geom_tile(color = "white") +
         scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0, limit = c(-1, 1), space = "Lab", name = "Pearson\nCorrelation") +
         theme_minimal() + # minimal theme
         theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 12, hjust = 1)) +
         coord_fixed()
    print(ggheatmap)
    ggheatmap +
         geom_text(aes(Var2, Var1, label = value), color = "black", size = 4) +
         theme(
              axis.title.x = element_blank(),
              axis.title.y = element_blank(),
              panel.grid.major = element_blank(),
              panel.border = element_blank(),
              axis.ticks = element_blank(),
              legend.justification = c(1, 0),
              legend.position = c(0.6, 0.7),
              legend.direction = "horizontal") +
         guides(fill = guide_colorbar(barwidth = 7, barheight = 1, title.position = "top", title.hjust = 0.5))
  8. If necessary, click the Plots tab to view the resulting correlation matrix.

    Plots tab

    A correlation matrix helps identify predictors you may want to focus on when deciding what attributes influence the occurrence of crime. The matrix shows values that measure how strongly correlated the attributes are with the desired dependent variable and with one another. Additionally, by identifying attributes with a higher correlation to your dependent variable, you have a better idea of what predictors to try first when attempting to find a predictive model that fits your data well. Your correlation matrix can also be used to identify possible instances of multicollinearity between predictors.

    When a predictor has a positive correlation with your dependent variable, it means that as the magnitude of the predictor increases, the magnitude of the dependent variable increases as well. The higher the correlation coefficient, the stronger this relationship is. With a negative correlation, as the magnitude of the predictor increases, the magnitude of the dependent variable decreases.

    Multicollinearity measures the similarity of two predictors. When a predictive model contains variables that represent the same effect, those variables tend to negate one another, weakening the effect's apparent impact on the response and potentially causing instability in the model. Both of these measures are valuable to consider as you investigate what influences the occurrence of high crime rates.
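    You can sanity-check both ideas directly in the console. The first two lines illustrate correlation sign with toy vectors; the last line checks one pair of your actual predictors, which the matrix below reports at 0.81:

    u <- 1:10
    cor(u, 2 * u + 3)   # exactly 1: the second vector rises with the first
    cor(u, 10 - u)      # exactly -1: the second vector falls as the first rises
    cor(rate_spdf@data$Restaurant, rate_spdf@data$Grocery)   # values near 1 suggest multicollinearity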

    Correlation matrix results

    The cells above and to the right of your EB_Rate variable have fairly light colors and low values, indicating that it is not strongly correlated with the other attributes. In contrast, you can observe some possible instances of multicollinearity among your potential predictors. In particular, the Restaurant and Grocery attributes are strongly correlated, as indicated by the bright red square with a correlation coefficient of 0.81.

    While the correlation matrix can offer clues about what predictors to include in your model, this type of exploratory data analysis is just one part of building a good model. For the purposes of this lesson, you'll use only the correlation matrix to help make decisions. For more information about other exploratory functions in R that are useful for developing predictive models, see this blog.

Based on the results of your correlation matrix, it appears that your department still has some work to do in identifying attributes that influence the occurrence of crime. Ideally, you'd like to see attributes with much larger correlation values with your response variable, so you can be fairly confident that a relationship exists between the two. This result isn't surprising, however: finding good predictors for a particular response variable in a given dataset can be tricky and can require advanced statistical methods to account for possible non-linear terms, spatial trends, and other factors.

Through this lesson, you learned how to install and set up the R-ArcGIS bridge and how to use it to transfer data between ArcGIS and R, and you saw one of the ways R can enhance your ArcGIS workflows through its powerful statistical libraries. You analyzed statistically significant spatiotemporal trends; enriched your data with a wealth of available socioeconomic, demographic, business, and environmental factors; calculated robust crime rates; found hot spots of crime rates; and began to explore relationships that might help explain those patterns. With the R-ArcGIS bridge added to your workflow, you have more possibilities at your fingertips than ever before to assist you and your department as you work to understand crime in San Francisco and what you can do to reduce it.

You can find more lessons in the Learn ArcGIS Lesson Gallery.