Explore the data

Exploratory data analysis is one of the most important steps in spatial analysis and modeling, as it enables you to understand your data. In this module, you'll explore several aspects of the data provided for this lesson.

Get set up and examine the analysis criteria

In this section, you'll set up your project, learn about the criteria chosen for the analysis, and examine census-tract-level data.

First, you'll download the project package containing all the data for the lesson and open it in ArcGIS Pro.

  1. Go to the Shade_Equity item page and click Download.

    Download button

    Note:

    Most browsers download to your computer's Downloads folder by default.

  2. Locate the downloaded Shade_Equity.ppkx file on your computer and move it to a location where you can easily find it, such as your Documents folder.
  3. Double-click the Shade_Equity.ppkx file to open the project in ArcGIS Pro. If prompted, sign in using your licensed ArcGIS account.
    Note:

    If you don't have ArcGIS Pro or an ArcGIS account, you can sign up for an ArcGIS free trial.

    The project opens with a map showing the average percentage of tree canopy coverage in the city of Los Angeles at the census tract level.

    Initial overview

    Tree canopy is defined as the area of coverage that a tree provides from its branches and leaves. The data represents the percent coverage of tree canopy within each census tract. For example, a value of 10% would mean that approximately 10 percent of the total census tract has tree canopy cover. On the map, the lighter shades represent lower tree canopy coverage, and the darker shades represent higher tree canopy coverage.

  4. Review the darker-shaded region just north of Santa Monica.

    North of Santa Monica

    This area contains some of the highest tree canopy coverage in the city of Los Angeles. This is a mountainous region, which is largely undeveloped due to the difficulty of building within canyons and on steep mountainous terrain. As a result, it has many more trees than other neighborhoods.

  5. Review the lighter-shaded region south of downtown Los Angeles.

    South downtown Los Angeles

    This area has less than 5 percent canopy coverage.

    The City of Los Angeles could choose to distribute the 90,000 trees across the areas of the map that show low canopy coverage (yellow and light green). However, the City has decided to be more proactive in addressing social and environmental equity issues, and wants to distribute the trees based on three specific objectives: equity, environment, and susceptibility.

    The city chose the following criteria for each objective:

    Objective | Criteria
    Equity | Hispanic or Latino population, Non-Hispanic Black population, households with income below the poverty level
    Environment | Land surface temperature, air quality, annual daily traffic, current tree canopy coverage
    Susceptibility | Seniors (>65 years old), young children (<5 years old), people with asthma, people who commute by bus, people who walk to work

    The populations listed for the Equity objective were selected because they are often affected by inequitable conditions for historical, societal, or economic reasons.

    Note:

    Race and ethnicity are complicated and fluid social constructs. A person may identify with multiple races, for example. The Hispanic or Latino category includes people who identify with Black, White, Asian, or other racial categories, but who also identify with the Hispanic or Latino ethnicity. Going further, it is important to remember that races do not have any basis in biology or genetics.

    While race and ethnicity are imperfect social constructs, they do have tangible consequences in society. Including them in analyses, therefore, can provide important insight into racial disparities.

    Regarding the Environment objective, the focus is on criteria that can be mitigated by a higher tree canopy coverage.

    As for the Susceptibility objective, the idea is that seniors, young children, people with respiratory illnesses, and people who wait in the sun for buses or walk to work will be most adversely impacted by air pollution, heat, and lack of shade.

    Note:

    While this lesson is modeled after an actual project led by the City of Los Angeles, some details concerning the criteria chosen and the analysis workflow have been modified.

    Also, while these 12 criteria are appropriate for the City of Los Angeles, another city might elect to use different or additional criteria.

    You'll now examine the data included in the project package to understand how it relates to the City of Los Angeles' tree planting objectives and criteria.

  6. In the Contents pane, right-click the LA City Tract Data layer and choose Attribute Table.

    Attribute Table option

    The table appears. It contains most of the variables (or attributes) that you will use to model tree planting priorities.

  7. Scroll horizontally to review the variables included in the table.

    Scroll horizontally through the attribute table.

    Most of them correspond to criteria selected by the city.

    Note that there are two variables that pertain to tree canopy: Ave Percent Tree Canopy Coverage, which you already saw used in the first map, and Total Canopy Coverage in Sq Km, which you'll use later in the analysis. In addition, later in the lesson, you'll extract two more variables from the Land Surface Temperature and Air Quality raster layers.

    Note:

    In the variable 2020 Hispanic Population, the term Hispanic is used as a shorthand for Hispanic and Latino population.

    For more information about the source of each variable, check the Learn about data sources section at the end of the lesson.

  8. Close the attribute table.

    Close the attribute table.

In this section, you set up your project, learned about the criteria chosen by the City of Los Angeles, and examined data variables at the census tract level. Next, you'll examine some of the data relationships.

Examine a data relationship with a scatterplot

One objective important to the City of Los Angeles is to plant trees to promote equity. In articles such as Shade and 'Turn Off the Sunshine': Why Shade Is a Mark of Privilege in Los Angeles, the authors report on a considerable lack of shade in low-income neighborhoods. You'll verify whether your data confirms this by creating a scatterplot that shows how median household income relates to tree canopy coverage in the city of Los Angeles.

  1. In the Contents pane, right-click LA City Tract Data, point to Create Chart, and choose Scatter Plot.

    Scatter Plot menu option

    The Chart Properties pane appears.

  2. In the Chart Properties pane, for the X-axis Number parameter, choose Ave Percent Tree Canopy Coverage. For Y-axis Number, choose 2020 Median Household Income (at the bottom of the drop-down list).

    Scatterplot showing the relationship between tree canopy coverage and median income

    The trend line shows a positive relationship (or correlation) and an R² value of 0.49, meaning that about half of the variation in one variable is accounted for by its linear relationship with the other. This indicates that as median household income increases, so does tree canopy coverage. Wealthier census tracts tend to have more trees, and impoverished census tracts tend to have fewer trees. This shows an inequitable tree canopy distribution based on income.

  3. Close the scatterplot and the Chart Properties pane.
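As a rough illustration of what the trend line and R² value summarize, the following sketch fits a least-squares line to a handful of made-up points and computes R². The sample values and the function name `linear_fit_r2` are assumptions for illustration only. They are not taken from the LA City Tract Data layer, and ArcGIS Pro may compute its chart statistics differently.

```python
def linear_fit_r2(xs, ys):
    """Fit y = slope * x + intercept by least squares and return (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical tracts: average percent tree canopy (x) and median household income (y).
canopy_pct = [3.0, 5.0, 8.0, 12.0, 20.0, 30.0]
median_income = [38000, 45000, 52000, 60000, 95000, 120000]

slope, intercept, r2 = linear_fit_r2(canopy_pct, median_income)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, R2={r2:.2f}")
```

A positive slope corresponds to the upward trend line in the chart. An R² closer to 1 means the points sit closer to that line.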

In this section, you explored the relationship between tree canopy coverage and median household income by creating a scatterplot. Next, you'll explore the relation between two variables with a bivariate map.

Explore disparities with a bivariate map

In this section, you'll use a bivariate map to examine where seniors have more or less equitable access to tree canopy in the city of Los Angeles.

The City wants to plant the trees where they are most needed, in terms of social equity and other criteria. However, it also needs to take into account the current tree presence. For instance, if a census tract has a high number of seniors and few trees, the city should plant more trees in that tract to make the situation more equitable. If a census tract has a high number of seniors but also has a high number of trees, the city might not need to plant any more trees in that tract to be equitable to seniors.

You can define three possible categories for census tracts with regard to seniors:

  • High number of seniors
  • Average number of seniors
  • Low number of seniors

Similarly, you can also define three possible categories with regard to trees:

  • High access to trees
  • Average access to trees
  • Poor (inequitable) access to trees

You'll create a bivariate map showing how all these categories combine.

Note:

The Bivariate Colors symbology shows the quantitative relationship between two variables.
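To make the idea of combining three categories for each variable more concrete, here is a minimal sketch that splits two variables into low, average, and high groups and pairs the results. The tract values, the helper names, and the use of simple terciles as cut points are all assumptions for illustration. The Bivariate Colors symbology in ArcGIS Pro may classify the data differently.

```python
def tercile_breaks(values):
    """Return two cut points that split the sorted values into three roughly equal groups."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]


def classify(value, breaks):
    """Return 'low', 'average', or 'high' for a value given two cut points."""
    low_cut, high_cut = breaks
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "average"
    return "high"


# Hypothetical tracts: (label, percent tree canopy, senior population).
tracts = [
    ("A", 2.0, 900),
    ("B", 4.0, 150),
    ("C", 9.0, 500),
    ("D", 11.0, 480),
    ("E", 25.0, 120),
    ("F", 30.0, 950),
]

canopy_breaks = tercile_breaks([t[1] for t in tracts])
senior_breaks = tercile_breaks([t[2] for t in tracts])

# Each tract gets a (canopy class, senior class) pair: one of 3 x 3 = 9 combinations.
combined = {
    label: (classify(canopy, canopy_breaks), classify(seniors, senior_breaks))
    for label, canopy, seniors in tracts
}
for label, (canopy_class, senior_class) in combined.items():
    print(label, "canopy:", canopy_class, "seniors:", senior_class)
```

In this made-up data, tract A combines low canopy with a high number of seniors, which is the kind of combination the bivariate map is meant to reveal.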

  1. In the Contents pane, right-click the LA City Tract Data layer and click Symbology.

    Symbology menu option

    The Symbology pane appears.

  2. For the Primary symbology parameter, choose Bivariate Colors.
  3. For Field 1, keep Ave Percent Tree Canopy Coverage, and for Field 2, choose 2020 Senior Population.

    Symbology parameters

  4. For Color scheme, expand the drop-down list and click Show names. Choose the Olive-Blue-Green 3x3 option.

    Bivariate map color scheme: Olive-Blue-Green 3x3

  5. Click the Color scheme options button and choose Apply to fill and outline.

    Apply to fill and outline option

  6. On the Legend tab, for Orientation, choose High values/Low values.

    Bivariate legend orientation

    The Contents pane reorients the legend.

    Bivariate legend

    • The tracts symbolized with dark green have many seniors and many trees.
    • The white tracts have few seniors and few trees.
    • The bright-blue tracts have many seniors but few trees.
    • The olive tracts have few seniors and many trees.
  7. Review on the map how the different colors are distributed.

    Bivariate map of senior population and average percent tree canopy coverage

    The City will want to give the highest tree planting priority to the tracts with large numbers of seniors and low tree canopy cover percentages (bright-blue color). Similarly, it will want to give the lowest priority to tracts with few seniors and many trees (the olive color on the map).

    The map you just created examines only the senior population and their relation to the tree canopy cover. But, of course, you similarly need to take into account Hispanic residents and their relation to tree canopy cover, people who walk to work and their relation to canopy cover, and so forth. In the next module of the lesson, you'll achieve this by creating a disparity index for every demographic variable.

    Note:

    In this data exploration module, you only looked at one demographic variable at a time (the income level or the senior population, for example). However, in real life, a single individual can belong to several groups, for instance, living in poverty, being a senior citizen, identifying as Hispanic, and being affected by asthma. These combinations of factors, referred to as intersectionality, create complex dynamics and are worth further exploration. Because of time constraints, you won't do that in this lesson. However, it should be highlighted that the analysis method you'll implement later in the lesson does take into account intersectionality dynamics, as it will assign a higher score to the census tracts where several of the variables have a high value.

    Another important observation is that the lesson focuses only on 12 variables, but many others can be taken into consideration. This is especially true for additional demographic variables: other races and ethnicities, economic classes, genders, sexual orientations, and religions, for example. Ideally, you want tree distribution to be equitable to individuals belonging to any combination of these identity markers. In practice, however, a large number of variables complicates the analysis. It is important to work closely with community stakeholders early in the project to identify the full list of possible variables and to decide on a manageable subset, focusing on the most relevant variables to best meet the project's goals and priorities.

  8. On the Quick Access Toolbar, click Save to save the project.

    Save button

In this module, you explored the analysis criteria and corresponding data. You then created a scatterplot and uncovered a positive relationship between tree canopy coverage and median household income. Finally, you created a bivariate map to understand visually the many possible combinations between tree canopy coverage and senior population. Next, you'll prepare your data for the suitability analysis.


Prepare data for analysis

Based on the City's planting objectives of improving social and environmental equity, your ultimate goal is to compute a priority score for every census tract. Based on these scores, you'll decide how many trees will be assigned to each tract. In this module, you'll prepare your data for that analysis. First, you'll compute disparity indices for each of the demographic variables. You'll also summarize the land surface temperature and air quality raster data by census tract to produce the last two environmental variables you need for your analysis.

Find the total sums for demographic variables

For each of the eight demographic variables (Hispanic or Latino population, non-Hispanic Black population, households with income below the poverty level, seniors, young children, people with asthma, people who commute by bus, people who walk to work), you'll compute a disparity index. The disparity index compares the percentage of seniors (or Hispanic people, or people with asthma, and so on) in each census tract to the percentage of tree coverage. The disparity index is the difference between the two percentages. If there is equity, the expectation is that a tract with 2 percent of all seniors in the city of Los Angeles, for example, should have 2 percent of all the existing tree canopy coverage in the city. If the percentage of seniors is much higher than the percentage of existing tree canopy coverage, there is inequity, and the tract should get priority for tree planting. When there is equity, the difference between the percentage values is zero. When there is inequity, the difference is a positive number, and the larger the number is, the higher the priority for planting trees.

Note:

You will want to use disparity index values in your analysis instead of the raw demographic data, because you are not as interested in the populations themselves as in a measure of inequity for those populations.
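The disparity index described above can be sketched as a short function. The city totals and tract values below are hypothetical numbers chosen to keep the arithmetic easy to follow, and `disparity_index` is an illustrative name rather than an ArcGIS function.

```python
def disparity_index(tract_population, city_population, tract_canopy_km2, city_canopy_km2):
    """Tract's share of a population group minus its share of canopy, in percentage points."""
    pct_population = tract_population / city_population * 100
    pct_canopy = tract_canopy_km2 / city_canopy_km2 * 100
    return pct_population - pct_canopy


# Hypothetical city-wide totals.
CITY_SENIORS = 500_000
CITY_CANOPY_KM2 = 100.0

# 2% of the city's seniors and 2% of its canopy: equitable.
equitable = disparity_index(10_000, CITY_SENIORS, 2.0, CITY_CANOPY_KM2)
# 2% of the city's seniors but only 0.5% of its canopy: a deficit of trees.
deficit = disparity_index(10_000, CITY_SENIORS, 0.5, CITY_CANOPY_KM2)
# 0.5% of the city's seniors but 2% of its canopy: a surplus of trees.
surplus = disparity_index(2_500, CITY_SENIORS, 2.0, CITY_CANOPY_KM2)

print(equitable, deficit, surplus)
```

A value near zero indicates equity, a positive value indicates a deficit of canopy relative to the population, and a negative value indicates a surplus.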

To compute the disparity indices, the first step is to find the total sum of values for each demographic variable and for the Total Canopy Coverage in Sq Km variable. For instance, for the households with income below poverty level variable (2018 HHs w Income Below Poverty Level), you'll sum up the households below poverty in all the census tracts to find how many such households there are in the entire city of Los Angeles.

  1. In the Contents pane, right-click the LA City Tract Data layer, and choose Data Engineering.

    Open the Data Engineering tool.

    The data engineering panel appears with the variables listed on the side of the panel.

  2. Press the Ctrl key, highlight the eight demographic variables and Total Canopy Coverage in Sq Km, and drag them to the right side of the panel.
    • 2020 Children Under the Age of 5
    • 2020 Hispanic Population
    • 2020 Non-Hispanic Black Population
    • 2018 HHs w Income Below Poverty Level
    • 2020 Senior Population
    • 2020 Est People Using Prescription Drugs for Asthma
    • 2018 Workers 16+ who Commute by Bus
    • 2018 Workers 16+ who Walk to Work
    • Total Canopy Coverage in Sq Km

    Add variables to be explored in the Data Engineering tool.

  3. Click the Calculate button.

    Calculate button in the Data Engineering tool

    To browse the results more easily, you'll freeze the Field Name column.

  4. Right-click the Field Name column header and click Freeze/Unfreeze.

    Freeze/Unfreeze menu option

  5. Scroll horizontally to review the statistics computed for each field.

    The Sum column is the one you need. It represents the total sum of all census tract values for each variable. For instance, the Sum value for the HHs w Income Below Poverty Level variable is 247,550 (households).

    Sum column in the Data Engineering tool

    Note:

    For ease of use, ensure that the Field Name and Sum columns are next to each other.
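The Sum statistic is simply the column total across all census tracts. As a sketch of the same idea outside ArcGIS Pro, the following code totals two columns of a small, made-up table. The rows, the key names, and the `column_sums` helper are assumptions for illustration, not the layer's real field names or values.

```python
# Hypothetical attribute rows; the keys are illustrative, not the layer's real field names.
tract_rows = [
    {"tract_id": "T1", "households_below_poverty": 288, "canopy_km2": 0.42},
    {"tract_id": "T2", "households_below_poverty": 1026, "canopy_km2": 0.10},
    {"tract_id": "T3", "households_below_poverty": 510, "canopy_km2": 0.25},
]


def column_sums(rows, fields):
    """Return {field: total across all rows}, skipping missing (None) values."""
    return {
        field: sum(row[field] for row in rows if row.get(field) is not None)
        for field in fields
    }


totals = column_sums(tract_rows, ["households_below_poverty", "canopy_km2"])
print(totals)
```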

In this section, you found the total sums for each demographic variable and the canopy cover variable using the Data Engineering tool.

Compute percentages for demographic variables

The next step of the disparity computation is to convert the variable values into percentages. You'll do that by dividing the value in each tract by the total sum and multiplying by 100.

Note:

For instance, there are 247,550 households with income below the poverty level in the entire city of Los Angeles. If a census tract has 1,026 such households, this means that (1,026/247,550) * 100 or 0.41 percent of all households with income below the poverty level in the city of Los Angeles reside in that specific census tract.

You'll use the Calculate Field tool, and start with the variable HHs w Income Below Poverty Level.

  1. On the ribbon, on the Analysis tab, in the Geoprocessing group, click Tools.

    Tools button

    The Geoprocessing pane appears.

  2. In the Geoprocessing pane, search for the Calculate Field tool and open it.

    Calculate Field button

  3. In the Calculate Field tool, for Input Table, choose LA City Tract Data. For Field Name, type pPoverty (p is for percent).
    Note:

    When you provide a field name that doesn't exist, the Calculate Field tool creates the new field for you. However, the default type for the new field is Text, so be sure to change it if appropriate.

  4. For Field Type, choose Float.

    Calculate Field parameters

    You'll now build the expression.

  5. Under Expression, in the Fields list, choose 2018 HHs w Income Below Poverty Level.
    Note:

    The variable is inserted in the Expression text box using its name, households_acshhbpov, rather than its alias.

  6. In the Expression text box, type a divide sign.
  7. In the Data Engineering panel, find the Sum value for the 2018 HHs w Income Below Poverty Level variable. Copy the value (Ctrl+C) and paste it (Ctrl+V) in the Expression text box.
  8. Add a multiplication sign, parentheses, and the 100 value to create the complete expression: (!households_acshhbpov! / 247550) * 100.

    Build the expression to compute percentages.

  9. Click Run.

    You'll review the new pPoverty field in the attribute table.

  10. In the Contents pane, right-click the LA City Tract Data layer, and choose Attribute Table.

    You'll ensure that the table is sorted in a predictable order, by Tract ID.

  11. In the attribute table, if necessary, right-click the Tract ID field name and choose Sort Ascending.

    Sort Ascending option

    The top row should have a Tract ID value of 06037101110.

  12. Scroll horizontally to find the new pPoverty field.
  13. On the first row, verify that the pPoverty field has the value 0.11634013.

    First pPoverty value

    Note:

    If the value is different, there may be an issue in your formula. Go back to the Calculate Field pane and check it. If necessary, in the attribute table, right-click the pPoverty field header and choose Delete. Then run Calculate Field again with the corrected formula.

    You'll now map the pPoverty field to visualize it.

  14. Right-click the LA City Tract Data layer and select Symbology.
  15. In the Symbology pane, choose the following parameter values:
    • For Primary symbology, choose Graduated Colors.
    • For Field, choose pPoverty.
    • For Color scheme, choose Yellow-Green-Blue (continuous).

    Symbology parameters for pPoverty

    The map indicates the proportion of the total 247,550 households with incomes below the poverty level that is present in each census tract.

    Map of percent households living below the poverty level by census tract

    You'll repeat the calculation to create percentages for every other demographic variable and for the tree canopy variable, using this table as a guide:

    Original variable name | New variable name | Expression
    2020 Hispanic Population | pHispanic | (!raceandhispanicorigin_hisppop_cy_1! / 1970773) * 100
    2020 Non-Hispanic Black Population | pNonHspBlack | (!raceandhispanicorigin_nhspblk_cy_1! / 338663) * 100
    2020 Senior Population | pSeniors | (!agedependency_senior_cy! / 511038) * 100
    2020 Children Under the Age of 5 | pYoungChildren | (!F5yearincrements_pop0_cy! / 240187) * 100
    2018 Workers 16+ who Walk to Work | pWalk2Work | (!commute_acswalked! / 66917) * 100
    2018 Workers 16+ who Commute by Bus | pBus2Work | (!commute_acsbus! / 158972) * 100
    2020 Est People Using Prescription Drugs for Asthma | pAsthma | (!healthpersonalcare_mp14088a_b_1! / 126788) * 100
    Total Canopy Coverage in Sq Km | pTreeCanopy | (!CanopyCover_SqKM! / 104.221856528049) * 100

    Using the Hispanic variable as an example, proceed in the following manner for every variable in the table.

  16. At the bottom of the Symbology pane, click the Geoprocessing tab to return to the Geoprocessing pane.
  17. In the Geoprocessing pane, change the Field Name value from pPoverty to pHispanic, and confirm that the Field Type value is Float.
  18. Change Expression to (!raceandhispanicorigin_hisppop_cy_1! / 1970773) * 100.
    Note:

    The value 1,970,773 is the total sum found in the data engineering pane for that variable.

  19. Click Run.

    When you have created all nine percentage fields, you'll review them.

  20. In the attribute table, scroll horizontally to the end of the table and verify that all the fields have been added and populated.

    Populated percentage fields
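The percentage expressions in this section all follow the same pattern: divide the tract value by the city-wide total and multiply by 100. The sketch below applies that pattern to one hypothetical tract. The three totals are copied from the expressions in this section, and the count of 1,026 households matches the worked example earlier in the text. The other tract counts and the helper name `percent_of_city_total` are invented for illustration.

```python
# City-wide totals copied from the Calculate Field expressions in this section.
CITY_TOTALS = {
    "pPoverty": 247550,
    "pHispanic": 1970773,
    "pSeniors": 511038,
}


def percent_of_city_total(tract_value, city_total):
    """Mirror the Calculate Field expression (value / total) * 100."""
    return (tract_value / city_total) * 100


# Hypothetical tract: only the 1,026 households figure comes from the text.
tract_counts = {"pPoverty": 1026, "pHispanic": 3500, "pSeniors": 400}

tract_percents = {
    field: percent_of_city_total(count, CITY_TOTALS[field])
    for field, count in tract_counts.items()
}
for field, pct in tract_percents.items():
    print(f"{field}: {pct:.4f}")
```

Because every field is expressed as a share of its own city-wide total, the resulting percentages can be compared directly with the tree canopy percentage in the next section.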

In this section, you computed the percentages for each demographic variable and the tree canopy cover variable.

Compute disparity indices

Now that you have percentages for each of the demographic fields and tree canopy coverage in square kilometers, you'll compare them to assess equity. You'll subtract the percent of tree canopy coverage from the percent for each demographic variable. When the difference is zero, the distribution is equitable. When the difference is positive, there is a deficit of the current tree canopy coverage distribution in relation to the demographic variable being evaluated, and therefore inequity. Similarly, a negative result indicates a surplus of existing trees.

  1. Close the Data Engineering pane, as you won't need it any longer.

    As before, you'll change the field name and expression in the Calculate Field tool to make this new batch of computations, using the table below as a guide. The result will be eight new fields containing disparity indices for all demographic variables.

    New variable name | Expression
    DisIdxPoverty | !pPoverty! - !pTreeCanopy!
    DisIdxHispanic | !pHispanic! - !pTreeCanopy!
    DisIdxNonHspBlack | !pNonHspBlack! - !pTreeCanopy!
    DisIdxSeniors | !pSeniors! - !pTreeCanopy!
    DisIdxYoungChildren | !pYoungChildren! - !pTreeCanopy!
    DisIdxWalk2Work | !pWalk2Work! - !pTreeCanopy!
    DisIdxBus2Work | !pBus2Work! - !pTreeCanopy!
    DisIdxAsthma | !pAsthma! - !pTreeCanopy!

    Taking the DisIdxPoverty variable as an example, proceed in the following manner for every variable in the table.

  2. In the Geoprocessing pane, change Field Name to DisIdxPoverty, and confirm that Field Type is Float.
  3. Change Expression to !pPoverty! - !pTreeCanopy!.
  4. Click Run.

    When you have created the eight new fields, you'll verify the results.

  5. In the attribute table, verify that the eight new fields have been added correctly.

    Populated disparity index fields

  6. Close the attribute table.

    You'll symbolize the disparity indices for the Non-Hispanic Black population to see how they map out across the city of Los Angeles.

  7. Right-click LA City Tract Data and choose Symbology.
  8. In the Symbology pane, choose the following parameter values:
    • For Primary symbology, keep Graduated Colors.
    • For Field, choose DisIdxNonHspBlack.
    • For Method, choose Standard Deviation.
    • For Color scheme, choose Pink to Green (Continuous).

    Symbolize DisIdxNonHspBlack.

    Note:

    The Standard Deviation method shows how much each feature's attribute value varies from the mean. It emphasizes the values farthest from the mean, whether above or below. You'll reverse the color scheme so that green is associated with the tracts where the share of tree canopy coverage is highest relative to the share of the Non-Hispanic Black population.

  9. In the Symbology pane, on the Classes tab, click More and choose Reverse symbol order.

    Reverse symbol order option in Symbology pane

    The map symbology updates.

    Disparity index for Non-Hispanic Black population and tree canopy coverage in each census tract

    The darkest pink tones on the map represent tracts with the largest disparity values, and therefore the highest inequity. Those areas should get the highest priority for new trees to enable Non-Hispanic Black populations to have more equitable tree access.
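To give a sense of what a standard deviation classification does, the sketch below converts a few made-up disparity values into z-scores (distance from the mean in standard deviations) and labels them. The sample values, the helper names, and the half-standard-deviation thresholds are assumptions for illustration. The class intervals used by the Symbology pane depend on its settings and may differ.

```python
import statistics


def z_scores(values):
    """Return each value's distance from the mean, measured in standard deviations."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    return [(value - mean) / std_dev for value in values]


def label(z):
    """Assign a coarse class; the 0.5 thresholds are an illustrative choice."""
    if z <= -0.5:
        return "below mean"
    if z < 0.5:
        return "near mean"
    return "above mean"


# Hypothetical disparity index values for six tracts.
disparity_values = [-0.30, -0.05, 0.00, 0.02, 0.05, 0.28]

labels = [label(z) for z in z_scores(disparity_values)]
for value, tract_label in zip(disparity_values, labels):
    print(f"{value:+.2f} -> {tract_label}")
```

Values far above the mean correspond to the largest disparities, which are the tracts that stand out most on the map.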

In this section, you computed disparity indices for every demographic variable. You then mapped out the result for the Non-Hispanic Black population as an example. In the next two sections, you'll extract environment variable values from raster data.

Prepare the Air Quality raster data for analysis

While the tree canopy data and traffic data were preprocessed to save you time, you will extract the data for two of the environmental variables yourself to get a sense of the process involved. Such data is often provided in raster form and needs to be processed to be expressed at the census tract level. In this section, you'll process the air quality raster data.

Note:

For more information about the source of this data, see the Learn about data sources section at the end of the lesson.

  1. In the Contents pane, collapse the LA City Tract Data layer and turn it off. Expand the Air Quality layer and turn it on.

    Turn on the Air Quality layer.

    Every pixel in this raster reflects 18 years of PM 2.5 measurements generated from satellite-borne sensors. PM 2.5 stands for particulate matter 2.5, which refers to microscopic airborne particles 2.5 micrometers or smaller in diameter. You can see that the air quality measurement ranges from 3.5 to 17.06 micrograms per cubic meter (µg/m³), with some areas of Los Angeles having a much higher air pollution level (in dark-brown tones) and some a much lower one (in beige tones).

    Air Quality raster

    You'll use the Zonal Statistics as Table tool to summarize the values at the census tract level. The tool will compute the mean of all the raster pixels that fall within each tract.

  2. In the Geoprocessing pane, click the Back button.

    Back button in the Geoprocessing pane

  3. Search for the Zonal Statistics as Table tool and open it.
  4. In the Zonal Statistics as Table tool parameters, choose the following values:
    • For Input raster or feature zone data, choose LA City Tract Data.
    • For Zone field, choose Tract ID.
    • For Input value raster, choose the Air Quality layer.
    • For Output table, type Tract_AvePM25.
    • For the Statistics type parameter, choose Mean.

    Zonal Statistics as Table parameters

  5. Click Run.

    The tool runs and a new table containing the mean values for each census tract is added to the Contents pane. You'll join these values to the LA City Tract Data layer.

  6. In the Geoprocessing pane, click the Back button.
  7. Search for the Join Field tool and open it.
  8. In the Join Field tool parameters, choose the following values:
    • For Input Table, choose LA City Tract Data.
    • For Input Join Field, choose Tract ID.
    • For Join Table, choose Tract_AvePM25.
    • For Join Table Field, choose ID.
    • For Transfer Fields, choose MEAN.

    Join Field parameters

  9. Click Run.

    You'll open the attribute table of the LA City Tract Data layer to view the new mean values.

  10. Right-click the LA City Tract Data layer and choose Attribute Table.
  11. Scroll horizontally to the end of the attribute table.

    A new field named MEAN is now present in the table.

    Mean field added to the LA City Tract Data layer

    You'll change the name for clarity.

  12. Right-click the MEAN field header and choose Fields.

    Fields option

  13. On the Fields tab, in the Field Name column, locate the MEAN field. Double-click the MEAN value and type AvePM25.
  14. Change the Alias value of that same row to Ave PM 2.5.

    Rename the Mean field.

  15. On the ribbon, on the Fields tab, in the Changes group, click Save.

    Save button on the Fields tab

  16. Click the LA City Tract Data attribute table tab, and review the newly renamed Ave PM 2.5 field.

    Ave PM 2.5 field

  17. Close the attribute table and Fields view panes.

    Ave PM 2.5 is the variable that you'll use in your analysis to represent the Air Quality criterion.
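Conceptually, the Mean statistic averages all raster pixels that fall within each zone. The following sketch shows that idea on a tiny grid using plain Python lists. The grids, the zone labels, and the `zonal_mean` helper are made up for illustration. This is not how the Zonal Statistics as Table tool is implemented, and it ignores details such as cell size and partial pixel overlap.

```python
from collections import defaultdict


def zonal_mean(zone_grid, value_grid, nodata=None):
    """Average value_grid cells per zone id; both grids are equal-sized lists of rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for zone_row, value_row in zip(zone_grid, value_grid):
        for zone, value in zip(zone_row, value_row):
            if zone is None or value == nodata:
                continue  # skip cells outside any zone or without a measurement
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# Hypothetical 3 x 3 rasters: which tract each cell belongs to, and its PM 2.5 value.
zone_grid = [
    ["T1", "T1", "T2"],
    ["T1", "T2", "T2"],
    [None, "T2", "T2"],
]
pm25_grid = [
    [10.0, 12.0, 14.0],
    [11.0, 15.0, 16.0],
    [9.0, None, 15.0],
]

means = zonal_mean(zone_grid, pm25_grid)
print(means)
```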

In this section, you extracted information from the Air Quality raster data to create a new variable for your analysis.

Prepare the Land Surface Temperature raster data for analysis

You'll now summarize the temperature information provided by the Land Surface Temperature raster layer.

  1. In the Contents pane, collapse and turn off the Air Quality layer. Expand and turn on the Land Surface Temperature layer.

    Land Surface Temperature turned on

    Every pixel in the Land Surface Temperature raster layer reflects the highest temperatures recorded in LA County on September 5, 2020. This date was chosen because it is the closest date with captured data to Los Angeles' hottest day ever recorded, September 6, 2020. For example, on September 6, 2020, an area near the Woodland Hills neighborhood reached 121 degrees Fahrenheit at 1:30 p.m. The September 5, 2020, values reflect the worst-case scenario to date.

    Land Surface Temperature layer

    On the layer, you can see that the temperatures range from 64.12 degrees (in blue tones, mostly in the ocean) to 128.83 degrees in the hottest areas (dark red).

    Note:

    Land surface temperature (LST) isn't the same as air temperature. NASA defines LST as how hot the surface of the earth would feel to the touch. LST heats up and cools more quickly than air temperature.

  2. In the Geoprocessing pane, click the Back button. Search for and open the Zonal Statistics as Table tool.
  3. In the Zonal Statistics as Table tool parameters, choose the following values:
    • For Input raster or feature zone data, choose LA City Tract Data.
    • For Zone field, choose Tract ID.
    • For Input value raster, choose the Land Surface Temperature layer.
    • For Output table, type Tract_AveLST.
    • For the Statistics type parameter, choose Mean.

    Zonal Statistics as Table parameters

  4. Click Run.

    You'll join the resulting table to the LA City Tract Data layer.

  5. In the Geoprocessing pane, click the Back button. Search for and open the Join Field tool.
  6. For the Join Field tool parameters, choose the following values:
    • For Input Table, choose LA City Tract Data.
    • For Input Join Field, choose Tract ID.
    • For Join Table, choose Tract_AveLST.
    • For Join Table Field, choose ID.
    • For Transfer Fields, choose MEAN.

    Join Field parameters

  7. Click Run.

    You'll open the LA City Tract Data attribute table to review the new attribute and rename it.

  8. Right-click the LA City Tract Data layer and choose Attribute Table.
  9. Scroll horizontally to the end of the attribute table.
  10. Right-click the new MEAN field header and choose Fields.

    Mean field added to the LA City Tract Data layer

  11. On the Fields tab, in the Field Name column, rename the MEAN field name to AveLST, and the Alias to Ave Land Surface Temp.
  12. On the ribbon, on the Fields tab, in the Changes group, click Save.
  13. Click the LA City Tract Data attribute table tab, and review the newly renamed Ave Land Surface Temp field.

    Ave Land Surface Temp

  14. Close the attribute table and Fields view panes.
  15. Press Ctrl+S to save the project.

AveLST is the variable that you'll use in your analysis to represent the Land Surface Temperature criterion.
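For readers who think in code, the two geoprocessing steps above (a zonal mean followed by a field join) can be sketched in plain Python. This is only a conceptual illustration, not how ArcGIS Pro implements the Zonal Statistics as Table or Join Field tools: the tiny grids, the tract IDs, and the function names `zonal_mean` and `join_field` are all made up for the example.

```python
from collections import defaultdict


def zonal_mean(zone_grid, value_grid):
    """Average the value cells that fall in each zone (None = no zone or no data)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for zone_row, value_row in zip(zone_grid, value_grid):
        for zone, value in zip(zone_row, value_row):
            if zone is None or value is None:
                continue  # skip cells outside every tract or without a reading
            totals[zone] += value
            counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


def join_field(rows, key, table, new_field):
    """Copy table[row[key]] onto each row as new_field (None when there is no match)."""
    return [{**row, new_field: table.get(row[key])} for row in rows]


# Hypothetical 3 x 4 rasters: which tract each cell belongs to, and its temperature.
zones = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    [None, "C", "C", "C"],
]
lst = [
    [100.0, 102.0, 110.0, 112.0],
    [104.0, 106.0, 114.0, 116.0],
    [70.0, 90.0, 92.0, 94.0],
]
tracts = [{"TractID": "A"}, {"TractID": "B"}, {"TractID": "C"}]

mean_by_tract = zonal_mean(zones, lst)
tracts = join_field(tracts, "TractID", mean_by_tract, "AveLST")
```

Each tract ends up with a single AveLST value, which is the same shape of result that the renamed field holds in the attribute table.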

In this module, you prepared your data by computing disparity indices for each of your demographic variables. You then extracted the air quality and land surface temperature data at the census tract level to generate the last two environmental variables. All your data is now ready for analysis.


Model tree planting priorities

Your data is now ready for computing tree planting priority scores for all the census tracts. You'll do that with the Business Analyst Suitability Analysis tool. You'll then use those priority scores to decide how many of the 90,000 trees should be planted in each census tract. The Suitability Analysis tool, included in the ArcGIS Business Analyst Desktop extension, is originally meant to find an optimal location based on a list of criteria. In this lesson, you'll use it to compute a tree planting priority score for every census tract, based on your 12 criteria and 3 objectives.

One powerful feature of the Suitability Analysis tool is that it allows you to adjust the weight of each of the criteria to change how much it will influence the final score. In your analysis, you'll weight the variables based on their importance to meet the three objectives of the City of Los Angeles.

Compute priority scores

First, you'll create a suitability analysis layer.

  1. In the Contents pane, collapse the Land Surface Temperature layer and turn it off.
  2. On the ribbon, on the Analysis tab, in the Workflows group, click Business Analysis, and choose Suitability Analysis.

    Access Suitability Analysis from the Analysis tab

    The Make Suitability Analysis Layer tool opens in the Geoprocessing pane.

  3. In the Make Suitability Analysis Layer tool, choose the following values:
    • For Input Features, choose LA City Tract Data.
    • For Layer Name, type Tree Planting Priority.

    Make Suitability Analysis Layer

  4. Click Run.

    The Tree Planting Priority layer is added to the Contents pane. You'll change the symbology so that the layer is one solid color.

  5. In the Contents pane, right-click the Tree Planting Priority layer and choose Symbology.
  6. In the Symbology pane, for Primary symbology, choose Single Symbol.

    Single Symbol symbology for the Tree Planting Priority layer

    The Tree Planting Priority layer updates and is now symbolized with a single color.

    Note:

    The map color chosen by default may vary.

    Next, you'll add your 12 analysis criteria to the Tree Planting Priority suitability layer.

    Note:

    When a suitability layer like Tree Planting Priority is selected in the Contents pane, a Suitability tab appears in the ribbon. The Suitability tab enables you to perform various suitability analysis tasks on the layer.

  7. In the Contents pane, ensure that the Tree Planting Priority layer is selected.
  8. On the ribbon, on the Suitability tab, in the Criteria group, expand the Add Criteria drop-down list, and choose Add Fields from Input Layer.

    Add Fields from Input Layer option

    The Add Field Based Suitability Criteria tool opens in the Geoprocessing pane.

  9. In the Add Field Based Suitability Criteria tool, click the arrow next to Fields to reveal all the fields you can use in your analysis.

    You'll add the fields associated with each of the city's objectives:

    • Equity: DisIdxHispanic, DisIdxNonHspBlack, DisIdxPoverty
    • Susceptibility: DisIdxAsthma, DisIdxBus2Work, DisIdxSeniors, DisIdxWalk2Work, DisIdxYoungChildren
    • Environment: Ave Percent Tree Canopy Coverage, Ave Annual Daily Traffic, AveLST, AvePM2.5
  10. Check the boxes next to the 12 fields listed above and click Add.

    Add Field Based Suitability Criteria parameters

  11. Click Run.

    The Tree Planting Priority layer updates.

    Initial tree planting priority map

    It now includes all 12 criteria and a new Final Score attribute. The map symbology changes to show the Final Score values.

    The Final Score attribute represents the sum of all weighted criteria. By default, the criteria have all been added with an equal weight. You will now adjust these weights to better match your analysis objectives.

  12. On the ribbon, on the Suitability tab, in the Criteria group, click Suitability Criteria.

    Suitability Criteria button

    The Suitability Analysis pane appears, listing all your criteria variables and specifying their current weights.

  13. In the Suitability Analysis pane, verify that all the criteria have an equal weight (8.33333), so that all weights add to 100.

    Suitability Analysis criteria pane

    You'll now choose the new weights. You'll construct the weights so that the City of Los Angeles' three objectives (equity, susceptibility, and environment) carry roughly equal weight, even though each objective includes a different number of variables. You'll also round the weights to make the math easier. The sum of all weights must be 100, so the weights within each objective should sum to about 33. Doing this gives each objective similar influence on the priority scores.

    • The equity objective contains three variables that will each receive a weight of 11.
    • The susceptibility objective contains five variables that will each receive a weight of 7.
    • The environment objective contains four variables that will each receive a weight of 8.
    Note:

    The weight distribution chosen in this analysis is not a general requirement, and you have full control over how you want each variable to be weighted. You can be creative in applying different weights to each individual variable or objective, as long as they sum up to 100.

    It is best practice for government agencies and nonprofit organizations to partner with community stakeholders, advocates, and residents to determine an appropriate weight for each criterion.

    You'll set the weight for the first variable.

  14. For Ave Annual Daily Traffic, click the lock to lock the value. For weight, type 8.

    Change a weight and lock the value.

    Note:

    Locking the value means that the weight will remain as you enter it and won't be adjusted automatically by the tool.

    Similarly, set the weights for the other variables, using the table below as a guide. The variables are listed alphabetically, as they are in the Suitability Analysis pane.

    Objective        Variable                            Weight
    Environment      Ave Annual Daily Traffic            8
    Environment      Ave Land Surface Temp.              8
    Environment      Ave Percent Tree Canopy Coverage    8
    Environment      Ave PM 2.5                          8
    Susceptibility   DisIdxAsthma                        7
    Susceptibility   DisIdxBus2Work                      7
    Equity           DisIdxHispanic                      11
    Equity           DisIdxNonHspBlack                   11
    Equity           DisIdxPoverty                       11
    Susceptibility   DisIdxSeniors                       7
    Susceptibility   DisIdxWalk2Work                     7
    Susceptibility   DisIdxYoungChildren                 7

    Once the weights are set, you'll change the Influence setting where needed. At present, all the influences are positive: the higher a variable's value, the higher the tree planting priority. The influence for Ave Percent Tree Canopy Coverage, however, should be Inverse, because higher priority should be given to tracts with less tree canopy coverage.

  15. Under Ave Percent Tree Canopy Coverage, click Additional Options. For Influence, choose Inverse.

    Inverse option for Influence

    Note:

    The Additional Options settings in the Suitability Analysis pane allow you to tailor each variable's impact on the final score. You have the option to adjust its Influence and Threshold. Influence defines the importance of the value for each variable. If you select Inverse, then the lower the value, the more influential the variable becomes. Conversely, if you select Positive, then the higher the variable value is for a specific area, the more influence it will have on the overall score. Ideal allows you to choose the specific value that will be most important in the range of values for the variable. Anything close to that value will be more influential in the overall score. Threshold defines upper and lower limits to the variable's value range that will impact the overall score. You can set these limits to eliminate any possible outliers that could exaggerate the overall priority score.

    As you changed the weights and other options, the Final Score values automatically updated. Your map should now look like the map below.

    Final Tree Planting Priority Final Score map

    The dark-red areas represent the highest final scores, and they should get the highest priority for tree planting.

    Note:

    As mentioned earlier, one advantage of this approach is that it helps address the issue of intersectional inequity. For instance, a census tract where there is high inequity for residents living in poverty, senior citizens, Hispanic residents, and people affected by asthma will get a higher priority than a census tract where there is inequity only for one of those variables.

    Optionally, you can try changing the weights to see the impact it has on the map.

  16. Close the Suitability Analysis pane.

In this section, you computed tree planting priority scores for every census tract in the City of Los Angeles. Next, based on those scores, you'll compute the number of trees to plant in each tract, so the total number of trees sums up to 90,000.
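The scoring logic described in this section can be approximated in a few lines of Python. The sketch below assumes that each criterion is min-max scaled to the 0 to 1 range, flipped when its influence is Inverse, and combined using weights expressed as fractions of their total. The exact normalization used by the Suitability Analysis tool may differ, and the three sample tracts, the 50/50 demo weights, and the function names are invented for illustration.

```python
# Weights from the table in this section, grouped by objective.
OBJECTIVES = {
    "Equity": {"DisIdxHispanic": 11, "DisIdxNonHspBlack": 11, "DisIdxPoverty": 11},
    "Susceptibility": {
        "DisIdxAsthma": 7, "DisIdxBus2Work": 7, "DisIdxSeniors": 7,
        "DisIdxWalk2Work": 7, "DisIdxYoungChildren": 7,
    },
    "Environment": {
        "Ave Annual Daily Traffic": 8, "AveLST": 8,
        "Ave Percent Tree Canopy Coverage": 8, "AvePM2.5": 8,
    },
}

# Per-objective totals: each should land near one third of 100.
objective_totals = {name: sum(w.values()) for name, w in OBJECTIVES.items()}


def min_max(values):
    """Rescale values to 0..1 (assumed normalization; all zeros if constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def final_scores(rows, weights, inverse=()):
    """Weighted sum of scaled criteria; fields named in `inverse` are flipped."""
    total_weight = sum(weights.values())
    scores = [0.0] * len(rows)
    for field, weight in weights.items():
        scaled = min_max([row[field] for row in rows])
        if field in inverse:
            scaled = [1.0 - s for s in scaled]  # low values now score high
        for i, s in enumerate(scaled):
            scores[i] += s * weight / total_weight
    return scores


# Invented tracts: hotter and less shaded from first to last.
sample = [
    {"AveLST": 100.0, "Ave Percent Tree Canopy Coverage": 20.0},
    {"AveLST": 110.0, "Ave Percent Tree Canopy Coverage": 10.0},
    {"AveLST": 120.0, "Ave Percent Tree Canopy Coverage": 0.0},
]
demo_weights = {"AveLST": 50, "Ave Percent Tree Canopy Coverage": 50}
scores = final_scores(
    sample, demo_weights, inverse={"Ave Percent Tree Canopy Coverage"}
)
```

With these sample values, the hottest tract with the least canopy receives the highest score, which is the behavior the Inverse influence setting is meant to produce.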

Allocate trees based on priority scores

To convert the priority scores into a tree count, you'll first sum up the priority scores for the entire city of Los Angeles. You'll then compute the ratio of each census tract's score over the total sum and multiply it by 90,000 (the total number of trees). For instance, if a tract has a score of 0.77 and the total sum of all the scores is 817.9, the number of trees to be planted in that tract will be (0.77/817.9) * 90000, that is, 84.7 (or 85, rounded to the closest integer).
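The arithmetic in the paragraph above can be written as a small Python function. This is a sketch of the formula only, not the Calculate Field expression you'll enter later; the function name `trees_for_tract` is made up, and it rounds to the nearest integer as the worked example does.

```python
def trees_for_tract(score, score_sum, total_trees=90_000):
    """Trees for one tract: its share of the score total times the tree budget."""
    return round(score / score_sum * total_trees)


# Worked example from the text: a score of 0.77 out of a citywide sum of 817.9.
raw = 0.77 / 817.9 * 90_000           # unrounded share, about 84.7
trees = trees_for_tract(0.77, 817.9)  # rounded to a whole number of trees
```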

  1. Right-click the Tree Planting Priority layer and choose Attribute Table.
  2. In the attribute table, scroll horizontally to the end of the table to locate the Final Score attribute.

    Final Score field

  3. Right-click the Final Score attribute name and choose Statistics.

    Statistics menu option

    A histogram and the Chart Properties pane appear. The Chart Properties pane indicates the sum value of 817.9.

    Sum for Final Score field

    Next, you'll create a new attribute and compute in it the suggested number of trees for each tract.

  4. Close the histogram and Chart Properties pane.
  5. In the Geoprocessing pane, click the Back button. Search for and open the Calculate Field tool.
  6. In the Calculate Field tool, enter the following values:
    • For Input Table, choose Tree Planting Priority.
    • For Field Name, type TreesToPlant.
    • For Field Type, choose Long.

    The Long field type will ensure a whole number of trees (2, not 2.34, for example).

    Calculate Field parameters

  7. Under TreesToPlant =, create the expression (!FinalScore! / 817.9) * 90000.

    Create the expression to calculate the number of trees to plant per census tract.

  8. Click Run.

    You'll review the results.

  9. In the Tree Planting Priority attribute table, examine the values for the TreesToPlant attribute.

    TreesToPlant field

    The number of trees per tract ranges from 6 to 100. The total sum of trees does not amount to 90,000 exactly because of rounding, but it is close enough to make a start at allocating the 90,000 trees to census tracts.

    You'll join the TreesToPlant field to the LA City Tract Data, so all the data is in a single layer. You'll then create a final map that symbolizes the number of trees to plant in each census tract.

  10. Close the attribute table.
  11. In the Geoprocessing pane, click the Back button. Search for and open the Join Field tool.
  12. In the Join Field tool, enter the following values:
    • For Input Table, choose LA City Tract Data.
    • For Input Join Field, choose Tract ID.
    • For Join Table, choose Tree Planting Priority.
    • For Join Table Field, choose Tract ID.
    • For Transfer Fields, choose TreesToPlant.

    Join TreesToPlant field with LA City Tract Data layer.

  13. Click Run.

    The field is added to the LA City Tract Data layer.

  14. In the Contents pane, collapse and turn off the Tree Planting Priority layer. Turn on the LA City Tract Data layer.
  15. Right-click LA City Tract Data and click Symbology.
  16. In the Symbology pane, choose the following parameter values:
    • For Field, choose TreesToPlant.
    • For Method, keep Natural Breaks (Jenks).
    • For Classes, choose 5.
    • For Color scheme, choose Yellow to Green (Continuous).

      TreesToPlant symbology

      The map updates, showing the final allocation for the 90,000 trees.

      Map showing the equitable allocation of 90,000 trees across all census tracts in the city of Los Angeles

  17. Press Ctrl+S to save the project.

In this section, you computed the number of trees that should be allocated to each census tract so that the distribution is equitable and the total number of trees sums up to approximately 90,000.
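Because each tract is rounded independently, the per-tract counts rarely add up to exactly 90,000, as noted earlier. If an exact total matters, one common option (not part of this lesson's workflow) is the largest remainder method: round every share down, then hand the leftover trees to the tracts with the largest fractional parts. The sketch below uses invented scores and a hypothetical function name.

```python
import math


def allocate_exact(scores, total_trees):
    """Largest remainder allocation: counts are proportional and sum to total_trees."""
    score_sum = sum(scores)
    shares = [s / score_sum * total_trees for s in scores]
    counts = [math.floor(x) for x in shares]
    leftover = total_trees - sum(counts)
    # Tracts ordered by how much was lost when rounding down, largest first.
    order = sorted(
        range(len(scores)), key=lambda i: shares[i] - counts[i], reverse=True
    )
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Three invented tracts with equal scores sharing 100 trees.
demo_scores = [1, 1, 1]
independent = [round(s / sum(demo_scores) * 100) for s in demo_scores]
exact = allocate_exact(demo_scores, 100)
```

In this small case, independent rounding gives 33 trees to each tract and leaves one tree unallocated, while the largest remainder method assigns all 100.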

Go further

Determining equitable tree planting locations across a city is an important and in-depth task. Throughout this lesson, you were able to accomplish this by exploring variable relationships, mapping disparity, and calculating disparity indices. You also processed raster data to summarize environmental information at the census tract level. Using the disparity indices and environmental variables, you then developed a suitability analysis model that prioritized census tracts based on their need for increased tree canopy coverage to meet equity, environment, and susceptibility objectives. Finally, you used the results from the suitability analysis to equitably allocate 90,000 trees across the city of Los Angeles.

In this workflow, you focused on allocating a fixed number of planned trees. However, a complete shade equity program might need to consider other elements upstream or downstream of this analysis. For instance, how would you determine the optimal overall tree count for a city? Guidelines for optimal numbers are not straightforward; they vary based on climate, topography, urban densities, and many other factors. Once you've determined the optimal number of trees for each census tract, how would you select the exact location for each tree within its tract? What tree species would you select? What measures should be taken to ensure the survival of newly planted trees? Would there be a program to monitor tree growth and support high survival rates? You might also need to increase the number of trees you plant to compensate for tree mortality rates. How would you answer these and related questions?

So far, you focused on the city of Los Angeles, but next, could you take this approach and apply it to your own city? Do you find that tree canopy and shade equity is a topic for discussion? Could a similar approach help further your city's pledge to address social and racial equity? You now have an overall understanding of the workflow proposed and the type of data and variables required. It's time to take this knowledge and transfer it to your local government or nonprofit organization to help address inequities in your community.

Learn about data sources (optional)

To get started on your own city's equitable tree distribution project, you'll need to gather data for your geographic area. In this section, you'll find some explanations on where the data used in the lesson came from. If you are located in the United States, you'll likely be able to use the same datasets with a focus on your city. If you live in another country, you might need to look for equivalent data sources in your region.

Using the Enrich geoprocessing tool in ArcGIS Pro, you can obtain the demographic data needed to fulfill the susceptibility and equity objectives. The Enrich tool allows you to compile data for both custom and authoritative geographic boundaries (for instance, census tracts in this lesson). Using the tool, you can select from a large list of social, economic, and demographic variables, which reduces the time needed to collect, format, and clean datasets for analysis. While the Enrich tool does use credits, it's typically much less expensive than finding, processing, and validating the data yourself.

The tree canopy and environmental data was acquired from several sources. The canopy data was derived from the ArcGIS Living Atlas USA NLCD Tree Canopy Cover raster layer, which represents the canopy cover percentage for each 30-meter raster cell. The granular raster data was rolled up to the census tract level by averaging the values for all the raster cells within each census tract (Ave Percent Tree Canopy Coverage). This was done with the Zonal Statistics as Table tool, in the same way as the land surface temperature workflow you completed earlier. The average percent tree canopy values were then used to determine the square kilometer coverage of tree canopy within each census tract (Total Canopy Coverage in Sq Km).
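The lesson doesn't spell out how the square kilometer coverage was derived from the percentage. A plausible reading, shown here purely as an assumption, is that each tract's area is multiplied by its average canopy percentage. The function name and the sample tract are hypothetical.

```python
def canopy_area_sq_km(tract_area_sq_km, ave_canopy_percent):
    """Assumed conversion: canopy area = tract area x (percent canopy / 100)."""
    return tract_area_sq_km * ave_canopy_percent / 100.0


# A made-up 2.5 sq km tract with 10 percent average canopy coverage.
area = canopy_area_sq_km(2.5, 10.0)
```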

The air quality data was acquired from the North American Regional Estimate PM 2.5 Air Quality dataset. This dataset represents 18 years of PM 2.5 measurements generated from satellite-borne sensors. The data was compiled in the raster format using the Composite Bands and Cell Statistics tools to generate a single raster layer composed of the 18 years of PM 2.5 data.

Land surface temperature was derived from the ArcGIS Living Atlas Multispectral Landsat image service. The date of a particularly hot day was selected (September 5, 2020), and Band 10, which represents thermal data, was utilized.

Traffic data was also acquired from ArcGIS Living Atlas. The USA Traffic Counts point data was rolled up to the census tract for the city of Los Angeles (with the tool Summarize Within).

Finally, note that the data used in this lesson is projected using NAD 1983 (2011) State Plane California V FIPS 0405 (US Feet), which is well suited for the Los Angeles area. If you want to reproduce this analysis on a different study area, you will need to project the data to the appropriate coordinate system. You can learn more about selecting projections by completing this Learn ArcGIS lesson or reading this blog.

You can find more lessons in the Learn ArcGIS Lesson Gallery.