In the previous lesson, you installed the R-ArcGIS bridge and downloaded the data for your statistical analysis. Then, in ArcGIS, you aggregated your data by areas and times of interest and began to explore temporal trends in your dataset. To help the department better understand which factors influence the prevalence of crime, you'll now enrich the dataset with additional attributes.
Add additional attributes to your dataset
Now that you know where crime hot spots are emerging, you'll try to determine why they are emerging. In particular, you'll examine the relationship between an area's crime and its population. Statistical analysis can determine if the number of crimes occurring in a particular area is influenced by population. In addition, your department is interested in analyzing the presence of certain types of businesses, as well as the prevalence of parks, the amount of public land in a given area (hexagon bins), the median household income and home value, among other factors.
Currently, the hexagon bins in the space time cube layer contain no attribute information suitable for this kind of analysis. You'll run another geoprocessing tool to enrich the layer with relevant attribute information.
If your version of ArcGIS Pro is earlier than 1.4, you'll need to subset the data into two smaller pieces because the Enrich tool may not run successfully on a dataset with over 1,000 rows.
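The split described in this note can be sketched in Python. This is only an illustration of the arithmetic, not part of the lesson workflow: the function name split_into_chunks and the list of object IDs are hypothetical, and the sketch assumes the input layer has the 1,996 rows reported later for the enriched layer.

```python
from math import ceil


def split_into_chunks(object_ids, max_rows=1000):
    """Split a list of IDs into the fewest near-equal pieces of at most max_rows each."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    if not object_ids:
        return []
    n_chunks = ceil(len(object_ids) / max_rows)   # fewest pieces that respect the limit
    size = ceil(len(object_ids) / n_chunks)       # near-equal piece size, never above max_rows
    return [object_ids[i:i + size] for i in range(0, len(object_ids), size)]


# Hypothetical object IDs standing in for the hexagon bins (assumed 1,996 rows).
ids = list(range(1, 1997))
chunks = split_into_chunks(ids)
print([len(chunk) for chunk in chunks])  # two pieces of 998 rows each
```

Each piece could then be enriched separately and the results combined afterward.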
- In the Analysis tab, Geoprocessing group, click Environments.
- In the Environments pane, locate the Fields parameter settings. Uncheck Maintain fully qualified field names.
This ensures that output field names are not prefixed with the name of the source table each field came from. The setting matters when you work with enriched data, whose fields are joined from one or more sources.
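To illustrate the difference, the following Python sketch strips a table prefix from field names. It assumes a qualified name takes the form table.field; the table name Enrichment and the helper unqualify are hypothetical and serve only to show what the setting avoids.

```python
def unqualify(field_name):
    """Return the part of a field name after the last '.', or the name unchanged."""
    return field_name.rsplit(".", 1)[-1]


# Hypothetical qualified names; "Enrichment" is an assumed source table name.
qualified = ["Enrichment.HasData", "Enrichment.TOTPOP10", "OBJECTID"]
print([unqualify(name) for name in qualified])
```

With the setting unchecked, the output fields already have the short form, so no such cleanup is needed before the fields are used in the R script.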
- Run the Enrich (Analysis Tools) tool with the following parameters.
This step requires approximately 50 ArcGIS service credits. If you don't have sufficient credits allocated to your ArcGIS organizational account (or if you're not sure), you can use the result file provided in the StepResult folder. Instead of running the tool, add San_Francisco_Crimes_Enrich.shp to your map and then skip to the next step.
- For Input Features, choose San_Francisco_Crimes_Hot_Spots.
- For Output Feature Class, browse to the default project geodatabase, named Crime Analysis.gdb, and name the output feature class San_Francisco_Crimes_Enrich.
- For Variables, click the plus. In the Add Variable window, use the search bar to locate and choose 2010 Total Population, 2018 Median Home Value, 2018 Median Household Income, 2018 Renter Occupied HUs, Food & Beverage Stores Bus, and Food Service/Drinking Estab Bus.
Demographic data is updated periodically, so the available variables and values may differ from those specified in the lesson. If necessary, use the most recent data.
The way you add variables matters because it can affect the resulting field names. If your field names differ from those shown, you'll need to edit them in the R script after you paste it, or the affected lines won't run.
While not an exhaustive list of the variables that could potentially be linked to crime rates, this list will provide a good start for your analysis. Next, you'll take a look at the results.
- In the Contents pane, right-click the San_Francisco_Crimes_Enrich layer and choose Attribute Table.
- If necessary, scroll to the right in the attribute table until you can see the last eight columns.
The newly added enrichment fields display in the table with alias names that are more descriptive than the original field names. In the list below, each alias name is followed by its original field name in parentheses.
The result of the Enrich tool includes the following fields and values:
- HasData—Indicates whether the Enrich tool found data for the given hexagon bin. A value of 0 means the hexagon bin had no available data for any of the attributes you selected, and a value of 1 means it had data for at least one of them. You can use this field to filter your data so that only features with relevant attribute information appear on the map.
- 2010 Total Population (populationtotals_totpop10 - 2010)—Contains the population count per hexagon bin. Notice that some hexagons have a population of 0. A hexagon bin may have a population of 0 because it is located in an industrial area or in a park. The first priority of your department is to reduce crimes in populated areas, so you'll focus only on populated locations.
- 2018 Median Home Value (wealth_medval_cy)—Contains the median home value per hexagon bin.
- 2018 Median Household Income (wealth_medhinc_cy)—Contains the median household income value per hexagon bin.
- 2018 Renter Occupied HUs (ownerrenter_renter_cy)—Contains the number of renter occupied households per hexagon bin.
- Food & Beverage Stores Bus (NAICS) (businesses_n13_bus)—Contains the count of food and beverage stores located within each hexagon bin.
- Food Service/Drinking Estab Bus (NAICS) (businesses_n37_bus)—Contains the count of businesses that serve food, beverages, or both located within each hexagon bin.
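Because the field names matter later in the R script, it can help to check them up front. The Python sketch below records the alias and field-name pairs from the list above and reports which expected fields are absent from a layer. The helper missing_fields, the case-insensitive comparison, and the sample field list are assumptions made for illustration.

```python
# Alias -> field name, as listed in this lesson.
ENRICH_FIELDS = {
    "2010 Total Population": "populationtotals_totpop10",
    "2018 Median Home Value": "wealth_medval_cy",
    "2018 Median Household Income": "wealth_medhinc_cy",
    "2018 Renter Occupied HUs": "ownerrenter_renter_cy",
    "Food & Beverage Stores Bus (NAICS)": "businesses_n13_bus",
    "Food Service/Drinking Estab Bus (NAICS)": "businesses_n37_bus",
}


def missing_fields(actual_field_names, expected=ENRICH_FIELDS):
    """Return the aliases whose field names are absent (case-insensitive comparison)."""
    actual = {name.lower() for name in actual_field_names}
    return sorted(alias for alias, field in expected.items()
                  if field.lower() not in actual)


# Hypothetical field list from a layer where one variable was not added.
layer_fields = [
    "OBJECTID", "HasData", "populationtotals_totpop10", "wealth_medhinc_cy",
    "ownerrenter_renter_cy", "businesses_n13_bus", "businesses_n37_bus",
]
print(missing_fields(layer_fields))  # lists "2018 Median Home Value"
```

An empty result would suggest the field names match those the lesson expects; any alias returned points to a name you would need to adjust.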
You've created a feature class that contains the information needed to perform your analysis, but you now have some data that is not pertinent to your analysis goals. Hexagon bins that do not have information for your attributes of interest do not add any value or new information to help you answer your questions. Additionally, areas that are not populated are not of high priority for your department at this time. As a result, you'll need to trim down your enriched dataset to contain only the information most useful to you.
Prepare your dataset for additional analyses
Next, you'll select the data that is relevant to your analysis and make a subset with only that information. This way, you still have access to all your enriched data should you need it for further analyses, but you can continue your current analysis with only the necessary data.
- In the Geoprocessing pane, search for and open the Select Layer By Attribute tool.
- For Input Rows, choose San_Francisco_Crimes_Enrich.
- For Selection type, make sure New selection is chosen.
- To add an expression, click the Add Clause box.
- In the Field box, choose HasData.
- For the expression operator, choose is Equal to and type the value 0 in the value parameter box. Press Enter to complete the expression.
Next, add a second expression.
- Click Add Clause and add the expression Or 2010 Total Population is Equal to 0.
- Press Enter to complete the second expression. Click the green check mark to verify that the syntax of both expressions is valid.
For more information about writing SQL expressions, see SQL reference for query expressions used in ArcGIS.
- Click Run.
The tool runs and selects features that have no enriched data, or that have zero population. These may be industrial sites or parks.
- If necessary, open the attribute table for the San_Francisco_Crimes_Enrich layer.
You see that 231 of 1996 rows are selected, meaning they have 0 values for the HasData or TOTPOP10 fields. You'll create a new dataset without these selected features so you can focus on features that have data relevant to your analysis.
- In the attribute table, click the Switch button.
The button swaps the selection from the 231 rows that had no data or no population, to all of the other rows. You should have 1765 of 1996 rows selected, and you can now copy the enriched and populated data to its own layer.
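The selection and the switch can be reasoned about as a query and its complement. The Python sketch below uses an in-memory SQLite table as a stand-in for the attribute table; the five rows are made-up illustrative values, and the table and column layout is an assumption, not the actual layer schema. It shows that switching a selection of HasData = 0 OR TOTPOP10 = 0 leaves exactly the rows where both fields are nonzero.

```python
import sqlite3

# Made-up rows: (object ID, HasData, TOTPOP10). Values are illustrative only.
rows = [
    (1, 1, 520),
    (2, 1, 0),     # enriched, but unpopulated
    (3, 0, 0),     # no enrichment data
    (4, 1, 1375),
    (5, 1, 88),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrich (oid INTEGER PRIMARY KEY, HasData INTEGER, TOTPOP10 INTEGER)"
)
conn.executemany("INSERT INTO enrich VALUES (?, ?, ?)", rows)


def select_ids(where):
    """Return the object IDs that satisfy a WHERE clause, in ascending order."""
    query = f"SELECT oid FROM enrich WHERE {where} ORDER BY oid"
    return [row[0] for row in conn.execute(query)]


where = "HasData = 0 OR TOTPOP10 = 0"
selected = select_ids(where)                               # the initial selection
switched = select_ids(f"NOT ({where})")                    # the selection after switching
both_nonzero = select_ids("HasData <> 0 AND TOTPOP10 <> 0")

print(selected, switched, switched == both_nonzero)

# The same complement applied to the counts reported in this lesson.
print(1996 - 231)
```

The last line reproduces the count in the lesson: 1,996 total rows minus 231 selected rows leaves 1,765.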
- In the Geoprocessing pane, search for and open the Copy Features tool.
- For Input Features, choose San_Francisco_Crimes_Enrich.
When you have specific rows selected, the Copy Features tool only copies those rows into your new feature class result.
- For Output Feature Class, browse to the default project geodatabase, named Crime Analysis.gdb, and name the output feature class San_Francisco_Crimes_Enrich_Subset.
- Click Run.
You now have two layers, San_Francisco_Crimes_Enrich and San_Francisco_Crimes_Enrich_Subset. The former contains the full dataset, and the latter contains only the hexagon bins that have both enriched attributes and a nonzero population.
- Save your project.
In the next lesson, you'll learn how to analyze these attributes in R and how they may influence the likelihood that an area experiences crime.