Import census tracts and join tables
You will prepare an indicator layer for New York City. You will combine attributes from several existing data sources into one spatial layer. By having all the attributes in the census tracts layer, you can map and analyze the tracts using the additional attributes. You will use existing spatial and tabular data from the American Community Survey (ACS), raster data to measure tree canopy, and proximity analysis to measure access to specific women's resources to build the indicator layer.
Download data and prepare the project
First, you'll download the data that you'll use in the tutorial.
- Download the data that you'll use in the tutorial.
- In Microsoft File Explorer, create a folder on the C:\ drive named IndicatorData.
- Extract the contents of the downloaded .zip file to the IndicatorData folder.
- Start ArcGIS Pro. If prompted, sign in using your licensed ArcGIS organizational account.
Note:
If you don't have access to ArcGIS Pro or an ArcGIS organizational account, see options for software access.
When you open ArcGIS Pro, you're given the option to create a new project or open an existing one. If you've created a project before, you'll see a list of recent projects.
- Under New Project, click Map.
- In the New Project window, for Name, type Indicators, for Location, accept the default folder, and click OK.
Now that you have a project, you will add a folder connection so you can easily access your data.
- View the Catalog pane. If you don't see it, from the View menu, in the Windows group, choose Catalog Pane.
- In the Catalog pane, right-click Folders and choose Add Folder Connection.
- In the Add Folder Connection window, browse to the IndicatorData folder, select it, and click OK.
- Expand the IndicatorData folder to view its contents.
In the folder, there are several .csv and .xlsx files that store attribute information. There is also a shapefile named nyct2020 that you will import into the geodatabase. Once you import the shapefile into the geodatabase as a feature class, it will act as the foundation for the remaining indicator information. Joining the tabular information to the spatial data allows you to analyze and visualize all the information.
Import a shapefile into the geodatabase
You have downloaded the data, created a project, and connected to a folder to access the data. Now you will import the shapefile into the geodatabase.
- In the Catalog pane, expand Databases.
Every project in ArcGIS Pro comes with a default geodatabase, named the same as the project. The project geodatabase is called Indicators.gdb.
- Right-click Indicators.gdb, point to Import and choose Feature Class(es).
The Feature Class To Geodatabase tool opens.
- For Input Features, click the browse button, browse to the IndicatorData folder, click nyct2020, and click OK.
Because you right-clicked the geodatabase and chose to import data into it, the Output Geodatabase parameter is already set to the project geodatabase.
- Click Run.
- In the Catalog pane, expand the Indicators geodatabase.
The shapefile was converted into the geodatabase and stored as a feature class that contains polygons. A feature class is a collection of features that share the same geometry type: point, line, or polygon.
- From the Indicators geodatabase, click the nyct2020 layer and drag it onto the map to add it.
The census tracts appear on the map and are displayed using a default color.
Note:
Your census tracts may display using a different color than what is shown in the image.
- In the Contents pane, click the nyct2020 layer once to select it and click it again to make the name editable.
- Type NY Census Tracts and press Enter.
You have imported a shapefile into a geodatabase feature class, added it to a map, and renamed the layer. Now that you have the foundational spatial dataset in the database, you will add information to it through table joins.
Explore tabular data
Next, you will join ACS data to the census tracts. The ACS data for the entire state of New York is currently in a .csv file, or a nonspatial format. You will join the two datasets based on a common attribute to incorporate the ACS data into the census tracts.
- In the Catalog pane, expand Folders and expand the IndicatorData folder.
The Data4Join.csv file contains the ACS data for the whole state that you want to join to the NY Census Tracts layer.
- Drag the Data4Join.csv file onto the map.
The .csv file appears in the Contents pane under Standalone Tables.
Tables such as a .csv file do not have a spatial component, so they are listed in the Standalone Tables section of the Contents pane. While tabular data doesn't appear on the map by default, you can use it to enhance your feature layers by joining data, or if the table has coordinates, you can display the data based on the coordinates.
Next, you'll inspect the table.
- Right-click the Data4Join.csv table and choose Open.
The table contains many attributes that you can use for mapping. Currently, the table is in .csv format and doesn't have an OBJECTID field, which means you cannot join it with another layer. Also, the GEO_ID field that you will use as the matching field in the join is a different type than the same field in the census tracts layer. To join tables, you must have common fields that have the same data type.
- In the table, click the options button and choose Fields View.
- Locate the GEO_ID field and notice that its Data Type value is Big Integer.
- In the Contents pane, right-click NY Census Tracts and choose Attribute Table.
- As you did with the .csv table, click the options button and choose Fields View.
The GEOID field in the census tracts layer contains the same information as the GEO_ID field in the table, but it is in text format.
The field types must match for a join to work properly. To ensure you can use this table in a join, you will import it into the geodatabase and add and calculate a text field to store the information.
- Close all tables and Fields views.
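The type mismatch described above can be illustrated with a minimal plain-Python sketch. It is not how ArcGIS Pro performs the comparison internally, and the GEOID value is a hypothetical sample, not a value taken from the tutorial data.

```python
# Hypothetical sample values; real GEOIDs come from the tutorial data.
csv_geo_id = 36061000100       # GEO_ID as read from the .csv (integer)
tract_geoid = "36061000100"    # GEOID in the census tracts layer (text)

# An integer never equals a string in Python, so a naive match fails.
naive_match = (csv_geo_id == tract_geoid)

# Converting the integer to text makes the two keys comparable.
converted_match = (str(csv_geo_id) == tract_geoid)

print(naive_match, converted_match)
```

The same idea motivates adding a text field to the imported table: once both key fields hold text, identical identifiers compare as equal.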
Prepare the data for a join
Now that you have identified the need to import the .csv table into your geodatabase and add and calculate a field to use for the join, you will perform these operations to prepare the data appropriately.
- In the Catalog pane, right-click the Indicators geodatabase, point to Import, and choose Table.
- For Input Table, click the drop-down menu and choose Data4Join.csv.
- Click Run.
- In the Catalog pane, in the Indicators geodatabase, right-click Data4Join.csv and choose Rename. Type ACS_Data and press Enter.
- Add the ACS_Data table to the map.
- In the Contents pane, right-click Data4Join.csv and choose Remove.
Before you join the tables, you must add a text field and calculate it.
- Right-click the ACS_Data table, point to Data Design, and choose Fields.
- At the bottom of the fields list, click Click here to add a new field.
- For Field Name, type GEOID, and for Data Type, choose Text.
- On the ribbon, on the Fields tab, in the Manage Edits group, click Save.
- Close the Fields view.
- Open the ACS_Data table.
- Scroll to the end and find the GEOID field. Right-click it and choose Calculate Field.
- In the Expression section, for Fields, double-click GEO_ID to add it to the expression.
You are using the GEO_ID field to populate the GEOID field that you added.
- Click OK.
The field is now of the correct type and populated with the correct information.
- Close the table.
You are now ready to perform the join. In this case, you will join the ACS_Data table to the NY Census Tracts layer to supplement your spatial data.
Join ACS data to the census tracts layer
Now you will join the ACS_Data table to the census tracts. The join matches records in the two datasets on the common GEOID field, which incorporates the ACS attributes into the census tracts.
- In the Contents pane, right-click the NY Census Tracts layer, point to Joins and Relates and choose Add Join.
The Add Join window appears. Here, you can input the parameters for the join, such as the tables involved and the matching fields.
- In the Add Join tool, enter or verify the following parameters:
- For Input Table, verify that NY Census Tracts is selected.
- For Input Field, verify that GEOID is selected.
- For Join Table, verify that ACS_Data is selected.
- For Join Field, verify that GEOID is selected.
- Uncheck Keep all input records.
- For Join Operation, choose Join one to first.
You have entered all the parameters for the join. Next, you will validate the join to ensure it will work properly before you run the tool.
- Click the Validate Join button.
The Message window appears.
There are 2,325 matching records for the join. This is the same as the number of census tracts in the feature layer. Even though the ACS table contains data for the whole state of New York, the join only brings in the records that match the tracts, based on the common fields.
- In the Message window, click Close, and in the Add Join window, click OK.
The join is complete, but there is no visible change on the map. Where you will see the difference is in the layer's attribute table.
- Open the NY Census Tracts attribute table.
- Scroll to the right and notice the ACS_Data fields.
Now all the fields from the ACS_Data table are joined to the census tracts based on the common field.
- Close the table.
- From the Quick Access Toolbar, click Save to save the project.
You have joined the ACS attributes to the census tracts layer. Now you can use these fields for symbology, labeling, and analysis.
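The join behavior used in this section can be sketched in plain Python. This is only an illustration of the one-to-first idea with the option to drop unmatched input records; the function name, the `join_` prefix, and the sample rows are hypothetical, and the Add Join tool's internals may differ.

```python
def join_one_to_first(input_rows, join_rows, key, keep_all=True):
    """Attach the first matching join row to each input row."""
    first_match = {}
    for row in join_rows:
        # setdefault keeps only the first row seen for each key.
        first_match.setdefault(row[key], row)

    joined = []
    for row in input_rows:
        match = first_match.get(row[key])
        if match is None and not keep_all:
            continue  # drop unmatched input rows
        extra = {f"join_{name}": value
                 for name, value in (match or {}).items() if name != key}
        joined.append({**row, **extra})
    return joined


# Hypothetical sample rows; the tutorial layer has 2,325 tracts.
tracts = [{"GEOID": "36061000100"}, {"GEOID": "36061000201"}]
acs = [
    {"GEOID": "36061000100", "median_income": 52000},
    {"GEOID": "36061000100", "median_income": 99999},  # later duplicate, ignored
    {"GEOID": "36001000100", "median_income": 61000},  # not in the tract list
]

result = join_one_to_first(tracts, acs, "GEOID", keep_all=False)
print(len(result), result[0]["join_median_income"])
```

With `keep_all=False`, only tracts that find a match survive, which mirrors unchecking Keep all input records. Rows in the join table that match no tract, such as the statewide records outside the city, are simply never used.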
Export the joined layer
The join exists only virtually in the layer; the joined fields are not stored in the layer's underlying data source. You will export the census tracts layer as a feature class to store the joined fields with the census tract features.
- In the Contents pane, right-click NY Census Tracts, point to Data and choose Export Features.
The Export Features window appears. The Input Features parameter is already set properly because you right-clicked the layer to export it.
- For Output Feature Class, replace the default name with NY_ACS_Tracts.
- Click OK.
- Remove the NY Census Tracts layer from the map.
- In the Contents pane, rename NY_ACS_Tracts to NYC Census Tracts.
Now the census tracts layer you have in the map contains all the attributes from the ACS table and is its own data source. If you share this data source in any form, all the attributes will be present.
- Save the project.
Add and calculate fields
Next, you will add and calculate two fields, both percentage fields to account for education level and reproductive age.
- In the Contents pane, right-click NYC Census Tracts, point to Data Design, and choose Fields.
- Scroll to the bottom and click Click here to add a new field two times.
Two rows appear. Next, you will edit the field properties.
- For the first row, enter the following properties:
- For Field Name, type or copy and paste Bachelors_degree_higher_women.
- For Alias, type or copy and paste Is a bachelor's degree or higher attainable for women?
- For Data Type, choose Double.
- For the second field, enter the following properties:
- For Field Name, type or copy and paste Percent_reproductive_age.
- For Alias, type or copy and paste What percent of women are at reproductive age?
- For Data Type, choose Double.
Note:
The green boxes next to the field indicate there are unsaved changes.
- On the ribbon, on the Fields tab, in the Manage Edits group, click Save.
- Close the Fields view.
To measure the indicators for education and reproductive health, you will calculate the fields to percentages.
- Open the attribute table for NYC Census Tracts and scroll to the end of the table to see the two fields you just added.
- Right-click Is a bachelor's degree or higher attainable for women? and choose Calculate Field.
- In the Expression section, for EducationForWomen =, copy and paste the following expression: (!Women_getting_a_Bachelor_s_Degree_or_higher! / !Total_Female_Population_for_Education!) * 100.
- Click the green check mark to validate the expression and click Apply.
A message window appears and states a warning because not all records have a value. This is fine and you will proceed.
- Close the warning window and click OK.
You have calculated the percentage of women with a bachelor's degree or higher. Next, you will calculate the other field in a similar manner.
- In the attribute table, right-click What percent of women are at reproductive age? and choose Calculate Field.
- In the Expression section, for WomenAtReproductiveAge =, clear the existing expression.
- Copy and paste the following expression: (!Women_at_reproductive_age_15_to_44! / !Total_Female_Population_for__reproductive_health!) * 100.
- Click Apply.
A similar warning appears, which is fine and expected.
- Close the warning window and click OK.
In the United States, higher educational attainment is generally associated with higher incomes. When you look at this table, you may think about whether women living in these areas have good models of success. This success model is measured by median income, educational attainment, and earnings relative to men. The percentage of women of reproductive age can be used to measure the impact of changes in state laws, such as an abortion ban. You can also use the measure to increase outreach for gender-specific health services.
- Close the table and save the project.
You have added and calculated two fields to account for key indicators in your analysis: percentages of women at certain education levels and at reproductive age.
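The percentage calculations above, and the reason some records end up without a value, can be sketched in plain Python. The sample numbers are hypothetical, and treating a zero denominator the same as a missing value is an assumption of this sketch rather than documented tool behavior.

```python
def percent(numerator, denominator):
    """Return numerator / denominator * 100, or None when it cannot be computed."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator * 100


# Hypothetical tract values:
# (women with a bachelor's degree or higher, total female population)
rows = [(250, 1000), (None, 800), (0, 0)]

education_pct = [percent(n, d) for n, d in rows]
print(education_pct)  # [25.0, None, None]
```

Records that come back as `None` here correspond to tracts that would be left without a calculated value.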
You have created an indicator layer from existing data sources, including a shapefile and CSV file. You imported the data into the geodatabase, added fields, joined the data, exported it, and calculated fields. Next, you will create an environmental indicator using raster data.
Use raster data to create a tree canopy layer
In this section, you will prepare an indicator to measure tree canopy. Tree canopy is often used as an environmental indicator and can be paired with other environmental indicators, such as temperature, to provide a fuller picture of the area. Historically, trees have often been unequally distributed across many of America's cities, and tree canopy is a luxury in an urban space such as New York City. You'll use tree canopy as an environmental indicator to understand tree distribution and which women have access to shade.
Explore a land-cover image
You will start by adding land-cover data created from lidar data for New York City. The image is classified into eight types of land cover.
- Return to the project in ArcGIS Pro.
- Go to the IndicatorData folder connection and expand the Land_Cover folder.
This image is a 6-inch resolution land-cover raster dataset for New York City.
- Add the image to the map.
- In the Contents pane, turn off the NYC Census Tracts layer.
This image layer has been classified into eight classes. Next, you'll review the image data by exploring its attribute table.
- Open the attribute table for the NYC_2017_LiDAR_LandCover.img layer.
Notice the eight land-cover classifications that are present in the table. There are 7,446,483,259 cells in the raster classified as Tree Canopy.
When you think of places such as New York City or other urban spaces, you probably think of all the buildings, sidewalks, and busy streets. This reality makes trees and grass a luxury.
- Close the table.
Reclassify the land-cover image
Of the eight land-cover classifications in the image, you are only interested in the Tree Canopy classification. Next, you'll use a geoprocessing tool to reclassify the image and isolate only the cells classified as tree canopy.
- On the ribbon, click the Analysis tab. In the Geoprocessing group, click Tools.
The Geoprocessing pane appears. From here, you can search for tools by name or by the toolbox they are stored in.
- In the Geoprocessing pane, in the Find Tools bar, type reclass. Click the Reclassify (Spatial Analyst Tools) tool.
- In the Reclassify tool, set the following parameters:
- For Input raster, click the drop-down menu and choose NYC_2017_LiDAR_LandCover.img.
- Ensure that Reclass field is set to Class.
- In the Reclassification table, for the Tree Canopy row, leave the value in the New column set to 1. Change the value in the New column for all other classes, except for NODATA, to 0.
- For Output raster, click the Browse button and browse to the IndicatorData folder. For Name, type TreeCanopyNYC.tif and click Save.
Note:
Depending on your system, the Reclassify tool may take up to 20 minutes to complete.
Alternatively, you can download the results data to use the TreeCanopyNYC.tif image file. To use this data instead, download and extract the .zip file to your computer and add it to your project in place of TreeCanopyNYC.tif.
- Click Run.
When the image is finished processing, it appears on the map.
- In the Contents pane, remove the NYC_2017_LiDAR_LandCover.img layer.
The TreeCanopyNYC.tif layer has two classes: Tree Canopy in one class and all other land-cover classifications in the other class. You can use this raster to calculate the tree canopy variable that will serve as the measure for the environmental indicator.
- Save the project.
Next, you will use the Zonal Statistics as Table tool to summarize the amount of tree canopy in each census tract.
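As a recap of the reclassification step, the following plain-Python sketch turns a tiny land-cover grid into a binary tree canopy mask. The class code `1` for Tree Canopy and the toy grid are assumptions for illustration; the real raster and the Reclassify tool operate on far larger data.

```python
NODATA = None
TREE_CANOPY = 1  # assumed class code for Tree Canopy in this toy grid


def reclassify(grid, keep_value):
    """Map keep_value to 1, every other class to 0, and leave NoData unchanged."""
    return [
        [NODATA if cell is NODATA else int(cell == keep_value) for cell in row]
        for row in grid
    ]


# Toy 2 x 3 land-cover grid with one NoData cell.
land_cover = [
    [1, 2, 5],
    [1, 1, NODATA],
]

tree_mask = reclassify(land_cover, TREE_CANOPY)
print(tree_mask)  # [[1, 0, 0], [1, 1, None]]
```

Because tree cells become 1 and everything else becomes 0, summing the mask over any area gives the number of tree canopy cells in that area.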
Summarize tree canopy within each census tract
For the indicator, you're interested in the presence of trees, so a higher value represents more trees, a positive environmental factor. To determine tree cover in each census tract, you will summarize the tree canopy cells based on the census tract polygons.
- In the Geoprocessing pane, click the back button. Search for and open the Zonal Statistics as Table (Spatial Analyst Tools) tool.
This tool will summarize the number of tree canopy cells within each census polygon and provide a count of the total number of cells within each zone (polygon). This will allow you to calculate the percentage of the polygon cells covered with trees.
- In the Zonal Statistics as Table tool, enter the following parameters:
- For Input Raster or Feature Zone Data, choose NYC Census Tracts.
- For Zone Field, choose GEOID [GEOID].
- For Input Value Raster, choose TreeCanopyNYC.tif.
- For Output Table, type TreePixels.
- For Statistics Type, choose Sum.
Note:
Depending on your system, the Zonal Statistics as Table tool may take up to 30 minutes to complete.
Alternatively, you can download the results data, extract the zip file, and add the TreePixels table to your project.
- Click Run.
When the tool completes, the TreePixels table appears in the Contents pane under Standalone Tables.
- Open the TreePixels table.
The table contains two columns of interest: COUNT, which is the total number of pixels within each census tract, and SUM, which is the sum of tree canopy pixels.
You'll calculate the percent of tree canopy for each census polygon using the following formula: PctTreeCanopy = (Sum / Count) * 100.
- In the attribute table, click Calculate.
The Calculate Field tool appears. Previously, you created fields before opening the Calculate Field tool. This time, you will create the field and calculate it simultaneously.
- In the Calculate Field tool, for Field Name (Existing or New), type PctTreeCanopy.
- For Field Type, choose Double (64-bit floating point).
- Under Expression, for PctTreeCanopy =, build the expression (!SUM! / !COUNT!)*100.
- Click OK.
The PctTreeCanopy field appears at the end of the attribute table and is calculated.
The PctTreeCanopy value represents the percentage of the census tract with tree cover and is the measure for the environment indicator.
- Close the TreePixels table, turn off TreeCanopyNYC.tif, and save the project.
You have reclassified a land-cover image to isolate the tree canopy cells that you want to include in the indicator and summarized the tree cover by census tract. Now you know the percentage of tree cover in each census tract in New York City. The TreePixels table is ready to be joined to the NYC Census Tracts layer.
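The zonal summary and the PctTreeCanopy formula can be sketched in plain Python on a toy grid. The zone labels and cell values are hypothetical, and this is only an illustration of the COUNT and SUM statistics, not the Zonal Statistics as Table implementation.

```python
from collections import defaultdict


def zonal_sum_and_count(zone_grid, value_grid):
    """Return {zone: {"COUNT": cells, "SUM": total}} for aligned grids."""
    stats = defaultdict(lambda: {"COUNT": 0, "SUM": 0})
    for zone_row, value_row in zip(zone_grid, value_grid):
        for zone, value in zip(zone_row, value_row):
            if zone is None or value is None:
                continue  # skip cells outside any zone or without data
            stats[zone]["COUNT"] += 1
            stats[zone]["SUM"] += value
    return dict(stats)


# Toy grids: zone labels stand in for tract GEOIDs, values are the 0/1 tree mask.
zones = [["A", "A", "B"],
         ["A", "B", "B"]]
tree_mask = [[1, 0, 0],
             [1, 1, 0]]

table = zonal_sum_and_count(zones, tree_mask)

# PctTreeCanopy = (SUM / COUNT) * 100, as in the Calculate Field expression.
pct_tree_canopy = {z: s["SUM"] / s["COUNT"] * 100 for z, s in table.items()}
print(table)
```

In this toy example, zone A has 2 tree cells out of 3 and zone B has 1 out of 3, so their percentages are roughly 66.7 and 33.3.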
Add an indicator based on proximity
The next indicators you create will measure access to specific things. Oftentimes, organizations are trying to determine where things are located, for example, gender-based resources. Once you identify the locations, the next step is determining access to these locations. Usually, access to something is measured in proximity to that location. You will create point layers that represent the locations of women's facilities. Then you will buffer the facilities by a half-mile to determine proximity to those facilities. Also, you will do the same with eviction locations because studies have shown that Black and brown women are often negatively impacted by forced ejectments. You want to know the areas in New York City where women are experiencing forced ejectments from their homes or rentals.
Create points from a table
You have worked with tabular data throughout this tutorial, but thus far, all of it was nonspatial, or didn't have some type of spatial component, such as coordinates. Next, you will map evictions from a table that contains coordinates of their locations.
- In the Catalog pane, from the IndicatorData folder, add Evictions.csv to the map.
- Open the Evictions.csv table and scroll to the right until you see the Latitude and Longitude fields.
The Latitude and Longitude fields store the coordinates for each eviction. You will use these fields to map the evictions as points on the map.
- Close the table.
- On the ribbon, on the Map tab, in the Layer section, click XY Table To Point.
The XY Table To Point tool appears in the Geoprocessing pane.
- In the XY Table To Point tool, set or verify the following parameters:
- For Input Table, choose Evictions.csv.
- For Output Feature Class, replace the default name with Evictions.
- For X Field, verify that Longitude is selected.
- For Y Field, verify that Latitude is selected.
- For Coordinate System, verify that GCS_WGS_1984 is selected.
The XY Table To Point tool chooses smart parameter defaults based on the field names.
- Click Run.
Note:
You will get a warning about null values and can ignore it.
Next, you will add a table containing women's facilities and map those locations using the same tool.
- From the Catalog pane, add Womens_Facilities.csv to the map.
- On the Map tab, click XY Table To Point.
- In the XY Table To Point tool, set the following parameters:
- For Input Table, choose Womens_Facilities.csv.
- For Output Feature Class, change the name to WomensResources.
- For X Field, choose Location 2.
- For Y Field, choose Location 1.
- For Coordinate System, verify that GCS_WGS_1984 is selected.
- Click Run.
- In the Contents pane, turn off Evictions to see the WomensResources points.
Note:
To see the points better, you can change the color.
You have created two feature layers from nonspatial tables to map important criteria for the indicators.
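Conceptually, creating points from a table means reading a coordinate pair from each row. The following plain-Python sketch shows that idea on hypothetical rows standing in for Evictions.csv. Skipping rows with empty coordinates is a choice made for this sketch; the XY Table To Point tool may handle such rows differently.

```python
import csv
import io

# Hypothetical rows standing in for Evictions.csv.
sample_csv = io.StringIO(
    "Address,Latitude,Longitude\n"
    "1 Example St,40.7128,-74.0060\n"
    "2 Example Ave,,\n"
)

points = []
skipped = 0
for row in csv.DictReader(sample_csv):
    try:
        # X is longitude and Y is latitude, matching the tool parameters.
        x = float(row["Longitude"])
        y = float(row["Latitude"])
    except ValueError:
        skipped += 1  # empty coordinate text cannot be converted
        continue
    points.append((x, y))

print(points, skipped)
```

Note the order of the pair: longitude supplies the X value and latitude supplies the Y value, which is easy to reverse by mistake.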
Filter data to only show certain types of features
Now that you have all the points on the map, you will narrow the focus of your analysis to only include a specific type of eviction. For evictions, you're only interested in ejectments, so you will filter the data to keep only those records. A big part of analysis is narrowing the focus of your data to include only specific things, such as tree canopy cover and ejectments.
- Open the attribute table for Evictions.
- Scroll and locate the Ejectment field.
You'll use this field to make the attribute selection.
- In the table, click Select By Attributes.
- For Where, click the drop-down menu and choose Ejectment.
- For the second drop-down menu, keep is equal to and for the last drop-down menu, choose Ejectment.
- Click OK.
- In the lower-left corner of the table, click Show Selected Records.
Now only the selected records show. There should be 67 records selected. You will switch the selection to select the features that you don't want to use and delete them.
- In the table, click Switch Selection.
Now, 89,835 records that you don't need are selected.
- Click Delete Selection.
- Click Yes to confirm the deletion.
- Click Show All Records.
- Close the table and save the project.
Now the Evictions layer contains only the 67 records that you want to include in your analysis.
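The select, switch, and delete sequence can be sketched in plain Python. The records are hypothetical, and "Possession" is an assumed value for rows that are not ejectments.

```python
# Hypothetical records; "Possession" is an assumed non-ejectment value.
evictions = [
    {"id": 1, "Ejectment": "Ejectment"},
    {"id": 2, "Ejectment": "Possession"},
    {"id": 3, "Ejectment": "Possession"},
    {"id": 4, "Ejectment": "Ejectment"},
]

# Select By Attributes: Ejectment is equal to 'Ejectment'.
selected = {row["id"] for row in evictions if row["Ejectment"] == "Ejectment"}

# Switch Selection: everything that was not selected becomes selected.
switched = {row["id"] for row in evictions} - selected

# Delete Selection: remove the switched records, keeping the ejectments.
remaining = [row for row in evictions if row["id"] not in switched]

print([row["id"] for row in remaining])  # [1, 4]
```

The selected and switched sets always add up to the full table, which is why 67 ejectments and 89,835 other records account for all 89,902 rows in the tutorial data.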
Create walk-time buffers
Next, you will incorporate proximity to the evictions and women's facilities into your analysis. You will create half-mile buffers around the features to represent walking distance.
- In the Geoprocessing pane, search for and open the Pairwise Buffer tool.
- In the Pairwise Buffer tool, set the following parameters:
- For Input Features, choose WomensResources.
- For Output Feature Class, replace the default with ResourcesBuffer.
- For Distance, type 0.5.
- Under Linear Unit, choose US Survey Miles.
- For Method, choose Geodesic (shape preserving).
- For Dissolve Type, choose Dissolve all output features into a single feature.
- Click Run.
- In the Contents pane, ensure that the only visible layers, aside from the basemaps, are WomensResources and ResourcesBuffer.
You have created buffers for the resources points. Next, you will create buffers for the evictions features.
- In the Pairwise Buffer tool pane, which is still open, update the following parameters:
- For Input Features, choose Evictions.
- For Output Feature Class, replace the default with EvictionsBuffer.
- Click Run.
- In the Contents pane, turn off WomensResources and ResourcesBuffer and turn on Evictions and EvictionsBuffer.
You have created layers to represent half-mile buffers around the evictions and women's resources points. Having these buffers allows you to incorporate proximity into your indicator preparation.
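To make the half-mile proximity idea concrete, the following plain-Python sketch tests whether a location falls within half a US survey mile of a facility. It uses the haversine formula on a sphere, which is a simplifying assumption and only approximates the geodesic method chosen in the tool. The coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8           # mean Earth radius (spherical approximation)
US_SURVEY_MILE_M = 5280 * 1200 / 3937  # about 1609.347 meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_half_mile(point, facility):
    """True when point lies inside a half-mile radius of facility."""
    return haversine_m(*point, *facility) <= 0.5 * US_SURVEY_MILE_M


# Hypothetical (longitude, latitude) pairs.
facility = (-74.0060, 40.7128)
near = (-74.0060, 40.7170)  # roughly 470 m north
far = (-74.0060, 40.7300)   # roughly 1.9 km north

print(within_half_mile(near, facility), within_half_mile(far, facility))
```

A buffer polygon is the set of all locations for which this kind of test is true, so overlaying the buffer on census tracts shows which parts of each tract are within walking distance.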
Create indicator tables
Now you are ready to create the indicator tables.
- In the Geoprocessing pane, click the back arrow. Search for and open the Tabulate Intersection tool.
- In the Tabulate Intersection tool, set the following parameters:
- For Input Zone Features, choose NYC Census Tracts.
- For Zone Fields, choose GEOID [GEOID].
- For Input Class Features, choose EvictionsBuffer.
- For Output Table, type EvictionsIndicator.
- For Sum Fields, choose SHAPE_Area.
- Click Run.
In the Contents pane, the EvictionsIndicator table appears under Standalone Tables.
Next, you will create the indicator table for women's resources.
- In the Tabulate Intersection tool, change only the following parameters:
- For Input Class Features, choose ResourcesBuffer.
- For Output Table, change the name to ResourcesIndicator.
- Click Run.
In the Contents pane, the ResourcesIndicator table appears under Standalone Tables.
- Open both indicator tables.
- Click the tab for one of the tables and drag it until you see the options for docking. Dock it to the right of the other table.
Each table contains a PERCENTAGE field that measures access to one of two different things.
Higher percentage values for evictions are bad because they represent forced unhousing of people. On the other hand, access to women's resources is a good measure. Therefore, higher percentages mean increased access to gender-specific services.
- Undock the table, close both tables, and save the project.
Next, you will join the evictions and resources indicator tables to census tracts so you have percentages of each for each tract.
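The PERCENTAGE values in the indicator tables express how much of each tract is covered by a buffer. A minimal plain-Python sketch of that ratio, using axis-aligned rectangles as hypothetical stand-ins for tract and buffer polygons, looks like this. Real polygons require a geometry engine, so this only illustrates the arithmetic.

```python
def rect_area(rect):
    """Area of an (xmin, ymin, xmax, ymax) rectangle; 0 if it is empty."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def overlap(a, b):
    """Intersection rectangle of a and b (may be empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def percentage_covered(zone, class_feature):
    """Percentage of the zone's area that the class feature overlaps."""
    return rect_area(overlap(zone, class_feature)) / rect_area(zone) * 100


# Hypothetical shapes in arbitrary planar units.
tract = (0, 0, 10, 10)            # area 100
buffer_partial = (5, 0, 20, 10)   # overlaps the right half of the tract
buffer_far = (50, 50, 60, 60)     # does not touch the tract

print(percentage_covered(tract, buffer_partial))  # 50.0
print(percentage_covered(tract, buffer_far))      # 0.0
```

A tract with a value of 50 has half of its area within a half-mile of the buffered features, whether those features are ejectments or women's resources.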
Organize the Contents pane
Now that you have all the data that you want for the indicators, you will quickly organize the Contents pane before you join the data. You'll create a group layer to help organize the layers.
- In the Contents pane, press Ctrl and click all the layers except NYC Census Tracts to simultaneously select them.
- Right-click one of the selected layers and choose Group.
This groups all selected layers in a group called New Group Layer.
- Click the name New Group Layer one time to select it and click it again to make it editable.
- For the name, type Working Data.
Next, you will join indicator data.
Join indicator tables to census tracts
You have three indicators in stand-alone tables: TreePixels, EvictionsIndicator, and ResourcesIndicator. To get this information into the census tracts, you will perform three join operations to append the fields from the indicator tables to the census tracts.
- In the Contents pane, right-click NYC Census Tracts, point to Joins and Relates, and choose Add Join.
- In the Add Join tool, enter the following parameters:
- For Input Table, choose NYC Census Tracts.
- For Input Field, choose GEOID [GEOID].
- For Join Table, choose TreePixels.
- For Join Field, choose GEOID.
- Leave Keep all input records checked.
- For Join Operation, choose Join one to first.
- Click OK.
Nothing happens on the map, but the attributes are appended to the NYC Census Tracts table. You will complete the other two joins and explore the table.
Next, you'll repeat the join for the EvictionsIndicator and ResourcesIndicator tables.
- Open the Add Join tool for the NYC Census Tracts layer and enter the following parameters:
- For Input Table, choose NYC Census Tracts.
- For Input Field, choose GEOID (there are many now due to the joins, but any will work).
- For Join Table, choose EvictionsIndicator.
- For Join Field, choose GEOID.
- For Join Operation, choose Join one to first.
- Leave Keep all input records checked.
- Click OK.
Finally, you will join the ResourcesIndicator table to the census tracts.
- Open the Add Join tool for the NYC Census Tracts layer and enter the following parameters:
- For Input Table, choose NYC Census Tracts.
- For Input Field, choose GEOID (there are many now due to the joins, but any will work).
- For Join Table, choose ResourcesIndicator.
- For Join Field, choose GEOID.
- For Join Operation, choose Join one to first.
- Leave Keep all input records checked.
- Click OK.
You have joined all the tables that you need to the NYC Census Tracts layer. Next, you will export the joined layer to its own feature class and clean up the fields in the process.
Export census tracts
The NYC Census Tracts layer now has three tables joined to it. As you did earlier with the ACS join, you'll export the layer to its own data source.
- In the Contents pane, right-click NYC Census Tracts, point to Data, and choose Export Features.
- In the Export Features tool, change the Output Feature Class parameter to Indicators.
When you join data, you append many fields to one table, so you may want to delete some fields or rename some field aliases to make the data clearer. Next, you will clean up the fields before you export the data.
- Expand Fields, check Use Field Alias as Name, and click Edit.
The Field Properties window appears. You'll keep only the fields for the exploratory analysis and rename the indicator fields.
- If necessary, point to the vertical divider next to the Fields section and resize it so you can see the full field aliases.
- In the Fields section, click What's the median income for women? In the Properties section, for Alias, type Median Income Women.
- Using the same workflow, change the alias for each of the following fields as stated:
- Change Are women earning more than men? to Pay Equity.
- Change Is there an abortion ban? Yes or No to Abortion Ban.
- Change Are child marriages legal? Yes or No to Child Marriages.
- Change Percent White Women to White Women.
- Change Percent Black Women to Black Women.
- Change Percent American Indian or Alaska Native Women to AIAN Women.
- Change Percent Asian Women to Asian Women.
- Change Percent Native Hawaiian or Other Pacific Islander Women to NHOPI Women.
- Change Percent Mixed Race Women to Mixed Race Women.
- Change Percent Hispanic or Latino Women to Hispanic or Latino Women.
- Change EducationForWomen to Education.
- Change WomenAtReproductiveAge to Women at Reproductive Age.
- Change PctTreeCanopy to Tree Canopy.
- Change PERCENTAGE (EvictionsIndicator.PERCENTAGE) to Evictions.
- Change PERCENTAGE (ResourcesIndicator.PERCENTAGE) to Gender Based Resources.
Next, you will delete some fields that you don't need.
- In the Fields list, click Total Female Population for Education and click the Remove button.
- In the same manner, remove the following fields:
- Women getting a Bachelor's Degree or higher.
- Total Female Population for reproductive health.
- Women at reproductive age 15 to 44.
- Click OK to close the Field Properties window and click OK again to run the export.
The Indicators layer appears on the map and in the Contents pane.
- Open the attribute table for the Indicators layer and scroll to the right until you see the updated aliases used as the field headers.
Modifying the aliases during the export was a good way to make the table easier to interpret. Now you have all the indicators available in the tracts layer. You can use those indicator fields for symbology, labeling, querying, and analysis.
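The field cleanup can also be thought of as a rename mapping plus a removal list applied to the joined aliases. The following is a minimal Python sketch of that idea, not the Export Features tool itself. The alias names come from the steps above (only a few of the renames are shown), while the `clean_fields` function and the `joined_aliases` sample list are hypothetical names used for illustration.

```python
# A few of the alias changes from the steps above (old alias -> new alias).
ALIAS_RENAMES = {
    "What's the median income for women?": "Median Income Women",
    "Are women earning more than men?": "Pay Equity",
    "Percent White Women": "White Women",
    "PctTreeCanopy": "Tree Canopy",
}

# The fields removed in the steps above.
FIELDS_TO_REMOVE = {
    "Total Female Population for Education",
    "Women getting a Bachelor's Degree or higher",
    "Total Female Population for reproductive health",
    "Women at reproductive age 15 to 44",
}


def clean_fields(aliases):
    """Drop unwanted fields and rename the rest where a mapping exists."""
    cleaned = []
    for alias in aliases:
        if alias in FIELDS_TO_REMOVE:
            continue  # this field is not carried into the export
        cleaned.append(ALIAS_RENAMES.get(alias, alias))
    return cleaned


# Hypothetical sample of aliases as they might appear after the joins.
joined_aliases = [
    "Percent White Women",
    "Total Female Population for Education",
    "PctTreeCanopy",
    "Are women earning more than men?",
]

print(clean_fields(joined_aliases))
```

Keeping the renames and removals in two small lookup structures makes it easy to review the cleanup in one place before applying it.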
You have created point layers from coordinates in tables to map evictions and women's resources. You buffered the evictions and women's resources points by a half-mile and used the buffers to create indicators for each variable. You also performed several joins to get all the indicators into the census tracts layer and exported it to its own feature class. The two indicator tables you created both measure proximity, but they are interpreted in opposite ways. Higher percentages for evictions are a negative measure because they represent forced unhousing, but it is important to highlight areas burdened by this issue. On the other hand, access to women's resources is a positive measure because women have more support in these areas. Next, you'll dig deeper into the data relationships using exploratory data analysis.
Explore the data using charts and symbology
Now that you have all the indicators in one layer, you will explore the variables in a scatter plot matrix to gain a better understanding of their relationships. An important part of conducting any analysis is to evaluate the resulting data after calculations are complete. This will help you determine whether the dataset contains a skewed data distribution, which could affect your analysis, and whether additional adjustments or methods are needed for the most accurate results.
Explore the indicator data
You will create a scatter plot matrix to compare the relationship between each indicator. This is a helpful way to determine positive and negative correlations and the degree or magnitude of those correlations.
- In the Contents pane, right-click the Indicators layer, point to Create Chart, and choose Scatter Plot Matrix.
The Chart Properties pane and an empty chart window appear. When you set properties in the Chart Properties pane, the chart will automatically display and update in the chart window.
- In the Variables section, click Select.
A list of the attributes in the Indicators layer appears. For a scatter plot matrix, you must select at least three variables. One of the variables that you want to explore is Median Income Women, but it does not appear in the list.
- Open the Fields view for the Indicators layer.
- Locate the Median Income Women field and view its Data Type.
The Median Income Women field has a type of Text. You cannot plot a text field in a scatter plot matrix, so you must add a numeric field and calculate it to store the income values.
- Using the skills you have performed in this tutorial, add a field called WomensMedianIncome with an Alias of Womens Median Income and a Data Type of Double.
- Calculate the WomensMedianIncome based on the Median Income Women field.
You can disregard any warnings in the calculation.
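To illustrate what the calculation has to do, here is a minimal Python sketch of converting text values to numbers. It assumes that the warnings come from text values that cannot be read as numbers (for example, empty cells or placeholder symbols); the tutorial does not state this. The `text_to_double` function and the sample values are hypothetical and are not part of the tutorial data.

```python
def text_to_double(value):
    """Convert a text value to a float, or return None if it is not numeric."""
    if value is None:
        return None
    # Assumption: values may carry thousands separators or a currency symbol.
    cleaned = value.strip().replace(",", "").replace("$", "")
    try:
        return float(cleaned)
    except ValueError:
        # Non-numeric text (such as "" or "-") has no numeric equivalent.
        return None


# Hypothetical sample of text values from a median income field.
samples = ["45,250", " 38000 ", "-", "", None]
print([text_to_double(s) for s in samples])
```

Returning `None` for unreadable values keeps those tracts out of the chart instead of plotting them with a misleading zero.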
- In the Chart Properties pane, click Select.
- In the variables list, check the boxes for Pay Equity, Education, and Womens Median Income.
The selected variables are listed.
The variables appear on the scatter plot matrix.
- Under Trend, click Show trend line.
A trend line appears in each scatter plot to indicate the direction of the relationship between the two variables.
- In the Matrix Layout section, for Lower left, verify that Scatterplots is selected, and for Upper right, click the drop-down menu and choose Pearson's r.
The scatter plot matrix allows you to explore many relationships in a single chart. It visualizes the bivariate relationship between the variables you selected. Next, you'll explore the relationship of economic outcomes for white, Black, and Latino women.
- In the Chart Properties pane, for Variables, click Select and check the boxes for White Women, Black Women, and Hispanic or Latino Women.
These mini plots show r-values with diverging colors that correspond to the strength and direction of the relationship.
Next, you'll sort the mini plots.
- In the Chart Properties pane, in the Sort section, click the Sort by drop-down menu and choose Pearson's r. For Target field, choose Womens Median Income, and for Sort direction, choose Descending.
Pearson's r values always fall between -1 and +1. There are three relationships to look for in the scatter plot matrix:
- Positive correlation, values close to +1.
- No correlation, values close to 0.
- Negative correlation, values close to -1.
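The three cases above can be reproduced with a small, standard-library Python sketch of the Pearson's r formula. This is an illustration of the statistic, not the code the chart uses internally. The `pearson_r` function and all the sample values are made up for the example, and the final sort imitates ordering variables by their r value against a target variable.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for five tracts; 'income' plays the role of the target field.
income = [30, 40, 50, 60, 70]
variables = {
    "rises_with_income": [10, 20, 30, 40, 50],
    "falls_with_income": [50, 40, 30, 20, 10],
    "unrelated": [1, -1, 0, -1, 1],
}

# Rank the variables by r against the target, highest first.
ranked = sorted(
    variables,
    key=lambda name: pearson_r(variables[name], income),
    reverse=True,
)
for name in ranked:
    print(name, round(pearson_r(variables[name], income), 2))
```

Real indicator data rarely produces values this clean; most pairs fall somewhere between these extremes.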
Three plots show a positive relationship, with values of 0.8, 0.55, and 0.6. Next, you'll explore the variables for each of the relationships.
- In the chart, click the box with the Pearson's r value of 0.8.
The corresponding scatter plot for Education and Womens Median Income is outlined in the scatter plot matrix.
The plot with a value of 0.8 represents the relationship between the Education and Womens Median Income variables. It is expected that as education increases, income would also increase.
- Click the box with the r-value of 0.55.
The plot for the White Women variable is outlined. There is a strong positive relationship between white women and median income, so as the percentage of white women increases, so does the median income.
- Click the box with the r-value of 0.6.
The plot showing the relationship between the White Women and Education variables is outlined. Based on the chart, as the percentage of white women increases, the percentage of women with a bachelor's degree or higher also increases. Next, you'll explore whether there is a similar relationship for Black women.
- In the chart, click the boxes with the r-values of -0.26 and -0.32.
The plots for Black Women are outlined. The relationships between Black women, income, and education show a negative correlation; therefore, as the percentage of Black women increases, both income and education tend to decrease.
- To explore the relationship between Hispanic or Latino women, income, and education, click the r-values of -0.43 and -0.47.
The relationships between Hispanic or Latino women, income, and education show a negative correlation; therefore, as the percentage of Hispanic or Latino women increases, both income and education tend to decrease.
- Select the box with the r-value of -0.63.
The selected plot represents the relationship between the percentages of Black and white women. The negative value means that as the percentage of one group increases, the percentage of the other tends to decrease. Therefore, it is likely that these two groups often don't live in the same areas.
- Close any open windows except the map. Close the Chart Properties pane and save the project.
You've just explored the data using a scatter plot matrix with Pearson's r values. If you were to use these indicators in an index, you would consider whether they are important to the outcomes and whether the indicator is the focus of the index. For example, you wouldn't include race or ethnicity in the index value calculations; however, you may use these factors to disaggregate the index. Pay equity is another example. It is a variable derived from the income of women relative to their male counterparts. Pay equity provides great insight into gender parity as measured by income, but for an index with the current set of indicators, you may want to exclude it because you already have median income as a variable. It would fit better if you expanded these topic areas into subindices, such as an economics subindex made up of median income, pay equity, and a few other data points.
Map an indicator
Now that you've explored the indicator data using a scatter plot matrix and gained an understanding about the variables, you will display the Indicators layer using bivariate symbology. You'll create a relationship map of education and income. Relationship maps show a visual representation of two variables. This will help you see the interaction of the indicators in more than one dimension, which is often referred to as superdiversity or intersectionality.
- In the Contents pane, right-click the Indicators layer and click Symbology.
The Symbology pane appears.
- For Primary symbology, click the drop-down menu and choose Bivariate Colors.
- For Field 1, choose Education.
- For Field 2, choose Womens Median Income.
- For Method, verify that Quantile is selected.
- For Grid Size, verify that 3 x 3 is selected and keep the Pink-Blue-Purple color scheme.
Next, you'll change the outline color.
- For Template, click the existing color.
- Click the Properties tab. For Outline color, click the existing color and choose Gray 30%.
- For Outline width, change the current value to 0.2 pt.
- Click Apply.
This symbolizes the relationship between education and median income from low to high. Where both education and median income for women are high, those areas are shaded purple. These areas are primarily in Manhattan and a portion of Brooklyn.
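The idea behind a 3 x 3 quantile grid can be sketched in Python: each variable is split into three classes with roughly equal counts, and the two class numbers together identify one of nine cells. This is a simplified illustration and does not necessarily match how ArcGIS Pro computes its class breaks (for example, ties are handled naively here). The function names and the sample values are hypothetical.

```python
def quantile_classes(values, n_classes=3):
    """Assign each value a class from 0 (low) to n_classes - 1 (high),
    with roughly the same number of values in each class."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, index in enumerate(order):
        classes[index] = rank * n_classes // len(values)
    return classes


def bivariate_cells(field1, field2, n_classes=3):
    """Pair the class of each feature in field1 with its class in field2."""
    return list(zip(quantile_classes(field1, n_classes),
                    quantile_classes(field2, n_classes)))


# Made-up values for six tracts.
education = [12, 55, 30, 80, 20, 65]                # percent with a degree
income = [28000, 61000, 40000, 95000, 52000, 33000]  # median income

cells = bivariate_cells(education, income)
for tract, cell in enumerate(cells):
    label = "high-high" if cell == (2, 2) else ""
    print(tract, cell, label)
```

In this sample, only the tract in cell (2, 2) is high on both variables, which corresponds to the corner of the grid where both fields are in their top class.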
- Change the name of the layer to Education x Median Income for Women.
- Save the project.
You've just completed two methods for exploratory data analysis: charts and mapping. Using charts, you can investigate relationship strength and identify indicators to exclude from an index. Typically, these will be highly correlated indicators that can skew index values. Map visualizations allow you to see patterns of multiple indicators, which is key to understanding social processes.
In this tutorial, you were introduced to the geographic approach to racial equity and social justice and applied it to indicator development. You prepared indicator layers using the American Community Survey data to obtain education, pay, and income data. You also learned how to reclassify imagery and calculate tree canopy based on pixels in the polygon tracts. Then, you developed an indicator based on proximity to look at access to gender-based resources. The final step was to perform an exploratory data analysis that you can use to identify highly correlated indicators, which can skew an index.
You can apply this indicator development methodology to other areas of interest around the world and can include data specific to your community. When preparing your own indicators, use data processing and indicators specific to your long-term goals, outcomes, and populations. You can find more on exploratory data analysis in this blog post.
You can find more tutorials in the tutorial gallery.