Explore the mortality rate maps
In a December 2013 article, the New York Times reported that black women in America are far more likely to die of breast cancer than white women. According to the article, this is not because of biological differences but because of a lack of access to healthcare and preventive measures. Your first goal is to confirm that the mortality gap described in the article exists. To do so, you'll map where breast cancer mortality rates for black and white women have been reported. In some parts of the United States, mortality data has been suppressed, or hidden, by the data provider. Data can be suppressed because of high rates of unreported cases or because of privacy concerns.
Map breast cancer mortality
To view the data, you'll create a map of your area of interest in ArcGIS Pro. Then, you'll add mortality data for black and white women and compare it.
- Start ArcGIS Pro. If prompted, sign in using your licensed ArcGIS account.
If you don't have ArcGIS Pro or an ArcGIS account, you can sign up for an ArcGIS free trial.
- Under New, click Map.
- In the Create a New Project window, name the project Breast Cancer Mortality. Save the project in the location of your choice and make sure Create a new folder for this project is checked, and then click OK.
A blank map project opens in ArcGIS Pro. Next, you'll add data.
- On the ribbon, on the Map tab, click Add Data.
- In the Add Data window, click All Portal. In the search bar, type breast cancer owner:Learn_ArcGIS.
- Choose Breast Cancer Mortality and click OK.
The Breast Cancer Mortality layer is added to the map. The default symbology shows the overall mortality rate for breast cancer in the United States. Mortality rate is the number of deaths per 100,000 people and is calculated using the formula Mortality Rate = (Cancer Deaths / Population) × 100,000.
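The mortality rate formula can be expressed as a small Python function. This is only a sketch: the function name is hypothetical, and the counts in the example are made up rather than taken from the Breast Cancer Mortality layer.

```python
def mortality_rate(deaths: float, population: float) -> float:
    """Deaths per 100,000 people: (deaths / population) x 100,000."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so whole-number inputs divide cleanly.
    return deaths * 100_000 / population


# Hypothetical county: 50 deaths in a population of 200,000.
print(mortality_rate(50, 200_000))  # 25.0
```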
Counties symbolized in darker pinks have higher mortality rates, and counties in lighter pinks have lower mortality rates. Counties that aren't filled in show where data has been suppressed by the provider. Data is suppressed when a county has so few cases that reporting them could violate individual privacy.
- In the Contents pane, right-click the Mortality layer. In the menu, click Open Attribute Table.
The attribute table shows all the attribute data contained in the feature layer. There is population data for each county, including breakdowns of black, white, male, and female populations. Scroll right until you see the mortality rate columns. Notice that there are a lot of zeros; this is the placeholder value for the suppressed data.
Next, you'll change the symbology to view and compare the available data for black and white women.
- Close the attribute table.
- In the Contents pane, right-click the Mortality layer and choose Symbology.
The Symbology pane opens. The primary symbology is already set to graduated colors.
- Expand Field and choose Black Mortality.
The symbology draws in the default five classes. Because the symbology is set not to show zeros, the class containing the zero (suppressed) values has no color. Now you'll change the basemap.
- On the ribbon on the Map tab, in the Layer group, click Basemap. Choose Light Gray Canvas.
The counties that have reported data for black women are shown in pink. Most of the data is in the east and southeast of the country. In all other areas, there are large gaps in the data. Counties on this map that aren't filled in show where data has been suppressed by the provider.
- If necessary, in the Contents pane, click the arrow next to the Mortality layer to expand the legend.
The legend indicates that the mortality rates are as high as 90 average annual deaths due to breast cancer per 100,000 women. In Caddo Parish, Louisiana, for example, the mortality rate for black women is 29 for the period 2011–2015, and an average of 19 black women die of breast cancer each year. Although 19 deaths a year from a single type of cancer may seem relatively few, normalizing for population, as the mortality rate does, shows that breast cancer disproportionately affects the black female population in this county.
Before you change the symbology, you'll create a copy of the layer so that you can switch back and forth between the two.
- Right-click Mortality and choose Copy.
- Right-click Map and choose Paste.
An identical layer is added to the Contents pane.
- Click the copy layer's name twice to edit it. Type Black Mortality, and then uncheck the layer to turn it off.
The copy is not a new file saved to your computer; it's a second layer in the map that points to the same data. No new data is created when you copy a layer.
- Right-click Mortality and choose Symbology. In the Symbology pane, expand Field and choose White Mortality.
While there are still gaps in reporting, less data was suppressed for white women. In addition, the legend indicates that the range of mortality rates is lower than for black women, with rates up to 48.3. For comparison, Caddo Parish's mortality rate for white women is 17.2, and on average about 18 white women die of breast cancer there each year.
Although the average annual death counts for the two groups are similar (19 and 18), normalizing each count by the population of its own group shows that black women are more affected by breast cancer.
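To see why normalization matters, you can work backward from the Caddo Parish figures to the population each rate implies. This sketch assumes the published rates are crude rates (deaths ÷ population × 100,000); if the data provider age-adjusts its rates, the implied populations are only rough approximations. The function name is hypothetical.

```python
def implied_population(annual_deaths: float, rate_per_100k: float) -> float:
    """Invert rate = deaths / population * 100,000 to estimate the population."""
    return annual_deaths * 100_000 / rate_per_100k


# Caddo Parish figures quoted above (assumed to be crude rates).
black_pop = implied_population(19, 29)
white_pop = implied_population(18, 17.2)

print(round(black_pop))  # 65517
print(round(white_pop))  # 104651
```

Similar death counts spread over a population roughly 1.6 times larger produce a much lower rate, which is exactly what the normalized statistic reveals and the raw counts hide.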
- On the ribbon, click Save.
Next, you'll look at the classification statistics for black and white women to see whether they confirm a gap in mortality rates. The default classification method is Natural Breaks (Jenks), which groups similar values together and places class breaks where there are relatively large jumps between values, minimizing the variance within each class. Because the breaks depend on each dataset's values, two layers classified this way end up with different classes, which makes them hard to compare. To compare the mortality rates more fairly, you'll use a more uniform classification method and exclude all the counties with suppressed data.
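For intuition, the sketch below implements the Fisher-Jenks form of natural breaks: it sorts the values and uses dynamic programming to find the k classes with the smallest total within-class sum of squared deviations. It is a teaching sketch, not ArcGIS Pro's implementation, which may differ in details; the function name and sample values are hypothetical.

```python
def jenks_breaks(values, k):
    """Upper value of each of k classes, minimizing within-class squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    # Prefix sums give the sum of squared deviations of any slice in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: lowest total cost of splitting data[:j] into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i  # class c covers data[i:j]
    # Walk the split points backward to recover each class's upper value.
    breaks = []
    j = n
    for c in range(k, 0, -1):
        breaks.append(data[j - 1])
        j = split[c][j]
    return breaks[::-1]


print(jenks_breaks([1, 2, 3, 10, 11, 12, 30, 31], 3))  # [3, 12, 31]
```

Because the breaks are fitted to each dataset, running this on the white and black mortality values separately would give two different sets of breaks, which is the comparison problem described above. The nested loops are O(k·n²), fine for small lists but slow for thousands of values in pure Python.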
- If necessary, in the Contents pane, right-click Mortality and choose Symbology.
- For Method, expand the list and choose Quantile.
Quantile is a data classification method that assigns values to classes so that each class contains the same number of values. Because suppressed values are stored as 0, they skew the statistics low. To remove them from the symbology and its statistics, you'll use data exclusion. The zero values will still exist in your data; the query you write only excludes them from the symbology.
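A small example shows how placeholder zeros distort both the mean and quantile breaks, and how excluding them fixes it. The quantile rule here (each class ends at position ceil(n × c / k) in the sorted list) is a common textbook definition; ArcGIS Pro may handle ties and uneven counts differently. The rates are hypothetical.

```python
from statistics import mean


def quantile_breaks(values, k):
    """Upper value of each of k classes holding (roughly) equal counts."""
    data = sorted(values)
    n = len(data)
    # Class c ends at index ceil(n * c / k) - 1 of the sorted list.
    return [data[(n * c + k - 1) // k - 1] for c in range(1, k + 1)]


# Hypothetical mortality rates; 0 is the placeholder for suppressed counties.
rates = [0, 0, 0, 0, 18, 20, 22, 24]
reported = [r for r in rates if r != 0]  # same idea as excluding "is Equal to 0"

print(mean(rates), quantile_breaks(rates, 4))        # 10.5 [0, 0, 20, 24]
print(mean(reported), quantile_breaks(reported, 4))  # 21 [18, 20, 22, 24]
```

With the zeros included, half the classes describe suppressed counties and the mean is cut in half; with them excluded, every class describes reported data.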
- Click the Advanced Symbology Options button.
- Expand Data exclusion and click New Expression.
- Use the query wizard to build the expression White Mortality is Equal to 0.
Features that match this expression (counties with suppressed white mortality data) will now be excluded from the symbology.
- Click Apply.
The excluded features are drawn on the map in a default color. You'll make them transparent again.
- Click the Primary Symbology button.
A new category for excluded values has been added to the symbology pane, but the zero class is still shown.
- For Classes, choose 4.
- Right-click the symbol box for <excluded>.
The Color selector window appears.
- At the top of the Color window, choose No Color.
Now that the suppressed data has been removed, the available data is accurately categorized. To view it, you'll look at the layer's data distribution using a histogram.
- Click the Histogram tab.
The list of classes changes to a histogram of the data for white mortality.
- Click More and choose Show Statistics.
The classification statistics for white mortality are shown in the pane. The Count statistic indicates that there are 1,627 counties with data for white women, which is about 52 percent of all 3,143 United States counties and county equivalents. Additionally, the statistics indicate that the average (mean) breast cancer mortality rate for white women is 21.
Now you'll repeat the process to see the distribution histogram for black women.
- In the Contents pane, uncheck Mortality and check Black Mortality. Click the Black Mortality layer to make it active.
For comparison, you'll change the symbology to the same method as for white mortality.
- For Method, choose Quantile.
- Click the Advanced Symbology Options button and, if necessary, expand the Data exclusion group.
- Click New Expression and build the expression Black Mortality is Equal to 0. Click Apply.
- Click the Primary Symbology button to return and change Classes to 4.
- Right-click the symbol box for <excluded> and choose No Color.
- Click the Histogram tab, click More, and choose Show Statistics.
For black women, there is data for 330 counties (just over 10 percent of all counties and county equivalents). The mean mortality rate for black women is 30.86, nearly 10 deaths per 100,000 higher than for white women.
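The coverage percentages and the gap in means can be checked with a few lines of arithmetic. The inputs are the Count and Mean values reported in the statistics panes; the mean of 21 for white women is taken as given and may be rounded.

```python
TOTAL_COUNTIES = 3_143  # counties and county equivalents in the United States

white_count, white_mean = 1_627, 21.0
black_count, black_mean = 330, 30.86

print(f"White coverage: {white_count / TOTAL_COUNTIES:.1%}")  # 51.8%
print(f"Black coverage: {black_count / TOTAL_COUNTIES:.1%}")  # 10.5%
print(f"Gap in mean rates: {black_mean - white_mean:.2f}")    # 9.86
```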
- Save the map.
While more data is available for white women, the higher mean and the wider range of mortality rates for black women support the article's claim.
Map the mortality rate difference
Previously, you looked at maps that showed mortality rates are generally higher for black women across the country. But because of the large amount of unreported data, you couldn't visually compare where the variation occurs. Next, you'll map the difference in the mortality rates for black and white women for counties that have data for both groups.
Identify dual-data counties
To quantitatively compare mortality rates for black and white women, you need to know what counties have data for both groups. You'll find these counties by creating an SQL query to select counties that meet the criteria.
- If necessary, open your Breast Cancer Mortality project.
- Uncheck Black Mortality and check Mortality to turn the layer on.
- Right-click Mortality and choose Attribute Table.
The attribute table opens to show all the records for the layer. To find counties that have data for both black and white women, you'll select them with an attribute query.
- On the ribbon, on the Table contextual tab, click the View tab.
Contextual tabs appear when the application is in a particular state. The Table tab will only be visible when an attribute table is open.
- In the Selection group, click Select By Attributes.
The Select By Attributes tool opens in the Geoprocessing pane with a query builder similar to the one you used to build the exclusion query earlier. For this analysis, you want only the counties that have mortality data for both black and white women.
- In the Geoprocessing pane, click New Expression. Build the clause White Mortality is Greater Than 0, and then click Add Clause.
- Build the clause And Black Mortality is Greater Than 0. Click Run.
By using the And operator in this query, you've specified that the selected records have to meet the criteria in both queries. The records that meet both queries are highlighted in the attribute table and on the map.
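The And clause keeps a record only when both conditions are true. The sketch below applies the same logic to a few hypothetical records in plain Python; the field names White_Mortality and Black_Mortality are assumptions about the layer's schema, and the county names are made up.

```python
# Hypothetical records; field names are assumed, not confirmed from the layer.
counties = [
    {"NAME": "County A", "White_Mortality": 20.1, "Black_Mortality": 31.4},
    {"NAME": "County B", "White_Mortality": 18.7, "Black_Mortality": 0},  # black data suppressed
    {"NAME": "County C", "White_Mortality": 0, "Black_Mortality": 0},     # both suppressed
    {"NAME": "County D", "White_Mortality": 24.0, "Black_Mortality": 22.5},
]

# Same logic as: White_Mortality > 0 AND Black_Mortality > 0
dual_data = [
    c for c in counties
    if c["White_Mortality"] > 0 and c["Black_Mortality"] > 0
]

print([c["NAME"] for c in dual_data])  # ['County A', 'County D']
```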
- In the attribute table, click Show Selected Records.
Three hundred and twelve counties and county equivalents have provided data for both groups. Next, you'll export these records to a new feature class, which keeps all of their attributes.
- In the Contents pane, right-click the Mortality layer and choose Data. Click Export Features.
The Feature Class to Feature Class tool opens in the Geoprocessing pane.
- For Output Feature Class, type Dual_Data_Counties.
Unless you change the output location, the new feature class will be saved in the default project geodatabase.
- Click Run.
The new layer draws to the map with default symbology.
- Rename the Mortality layer White Mortality and then turn it off. Close the attribute table.
These are the counties for which you can run comparisons.
The default symbol color is assigned randomly, so your layer may draw in a different color.
Calculate mortality rate difference
- Right-click Dual_Data_Counties and open the attribute table for the new layer.
To calculate the difference in mortality rates, you'll create a new column in the table.
- In the attribute table on the Field ribbon, click Add Field.
The Fields tab opens in the same pane as the attribute table. The new field you created is added as the last row and is populated with default values. You'll edit it to hold the values you're going to calculate later.
- On the Fields tab, click the last record in the Field Name column and name your new field Mortality_Rate_Difference.
- For Alias, type Mortality Rate Difference. For Data Type, click the box twice and choose Short.
Setting the field parameters determines how the information is stored and displayed. Field names can't contain spaces, so the words in the name are joined with underscores. That's why you also gave the field a more readable alias, which appears as the column heading in the attribute table.
The data type determines what kind of values the field can store. A Short (short integer) field uses the least storage but accepts only whole numbers from -32,768 to 32,767, so any fractional part of a calculated value is not kept. To store larger numbers or numbers with fractional values, use Long Integer, Float, or Double as the type.
- On the ribbon on the Fields tab, click Save.
- In the attribute table, close the Fields tab.
- On the Dual_Data_Counties tab, scroll to the end of the table to confirm that the Mortality Rate Difference field has been added.
Because Mortality Rate Difference is a new field, the data values are null, or not yet defined. Next, you'll calculate data values based on the data in other columns. For this analysis, you're interested in the difference between black female mortality rates and white female mortality rates. To find the difference, you'll subtract white female mortality rates from black female mortality rates using the Calculate Field tool.
- Right-click the Mortality Rate Difference heading and choose Calculate Field.
The Calculate Field menu opens in the Geoprocessing pane.
- If necessary, for Expression Type, choose Python 3.
- In the Fields box, double-click Black Mortality.
The field is added to the Mortality_Rate_Difference = expression box.
- Double-click the rest of the necessary fields and mathematical operators to build the expression !Black_Mortality! - !White_Mortality!.
- Click Run.
Data values are added to the Mortality Rate Difference column in your table. Most of the numbers in this column are positive, meaning that the mortality rate is higher for black females than for white females. Scattered through the data, though, are zeros and negative numbers. In these counties, the mortality rate is the same or is higher for white females.
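The expression does simple per-row arithmetic. The sketch below mirrors it in plain Python, together with the interpretation of the sign described above; it is ordinary Python rather than the Calculate Field tool itself, and the (black, white) rate pairs and function names are hypothetical.

```python
def rate_difference(black: float, white: float) -> float:
    """Same arithmetic as the expression !Black_Mortality! - !White_Mortality!."""
    return black - white


def describe(diff: float) -> str:
    """Interpret the sign of a difference."""
    if diff > 0:
        return "higher for black women"
    if diff < 0:
        return "higher for white women"
    return "equal"


# Hypothetical (black, white) rate pairs.
for black, white in [(31, 20), (22, 24), (19, 19)]:
    d = rate_difference(black, white)
    print(d, describe(d))
# 11 higher for black women
# -2 higher for white women
# 0 equal
```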
Now that you have calculated the differences, you'll map them.
- Close the attribute table and save the map.
Symbolize the values
Next, you'll symbolize the differences you just calculated to visualize them on the map. The Dual_Data_Counties layer currently shows all its features using a single symbol. You'll symbolize the features by the Mortality Rate Difference field to show where in the United States the gap exists.
- In the Contents pane, right-click the Dual_Data_Counties layer and choose Symbology.
- In the Symbology pane, expand the Primary Symbology menu and choose Graduated Colors.
- For Fields, choose the Mortality_Rate_Difference field.
When you calculated this field, you saw that negative numbers belong to counties with higher mortality rates for white women, zeros to counties with equal mortality rates, and positive numbers to counties with higher mortality rates for black women. You'll reduce the number of classes to three to show these categories on the map.
- For Classes, choose 3.
- For the first value in the Upper value column, click the break value twice and change it to -1. Change the second break value to 0 and leave the third unchanged.
The break lines on the bar chart represent the breaks you have set. With these breaks, the three classes now match your three categories of interest. The default color ramp, however, makes the categories difficult to distinguish on the map.
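Each value in the Upper value column is an inclusive upper bound: a value falls in the first class whose upper value is greater than or equal to it. A minimal sketch of that rule, using the -1, 0, and 44 breaks from this step (44 is the top break shown in the default labels); the function name is hypothetical.

```python
from bisect import bisect_left

LABELS = ["Greater for White Women", "Equal", "Greater for Black Women"]


def difference_class(diff: int, upper_values: list) -> str:
    """Return the label of the first class whose upper value is >= diff."""
    index = bisect_left(upper_values, diff)
    if index == len(upper_values):
        raise ValueError("value exceeds the top break")
    return LABELS[index]


breaks = [-1, 0, 44]
print(difference_class(-3, breaks))  # Greater for White Women
print(difference_class(0, breaks))   # Equal
print(difference_class(12, breaks))  # Greater for Black Women
```

Because Mortality_Rate_Difference is a Short integer field, every value is a whole number, so nothing falls strictly between -1 and 0. In a Double field, a value such as -0.5 would be labeled Equal under these breaks.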
- Right-click the first symbol box (for ≤ -1.0).
The Format Polygon Symbol gallery opens.
- Click the Properties tab and for Color, choose Jadeite.
To see color names, hover over each box.
- Change the second symbol to Medium Yellow and the final symbol to Blackberry.
Next, you'll change the labels for each symbol.
- Under Label, click the ≤ -1.0 label and type Greater for White Women. Change the ≤ 0.0 label to Equal and the ≤ 44.0 label to Greater for Black Women.
The map now shows the counties symbolized with the new colors.
The majority of counties with reported data have higher mortality rates for black women. This map shows where the variation in mortality rates occurs and suggests that the gap in mortality rates is a nationwide problem.
Most of the variation occurs in eastern and southern counties.
- Save the project.
You've analyzed data about mortality rates for black and white women and symbolized it to show the large scope of the problem.
Map the mortality rate ratio
Previously, you mapped the differences in mortality rates for black and white women. Next, you'll determine how wide the mortality rate gap is in each county. You'll measure this gap using a rate ratio, which compares rates (of mortality, in this case) in two groups that differ by a demographic characteristic (race, in this case). The rate for the primary group of interest (black women) is divided by the rate for a comparison group (white women). Using this statistic, you'll perform a hot spot analysis to map where high rate ratios cluster. Based on your results, your cancer research advocacy group will know which areas of the country to focus its campaign on.
Calculate the mortality rate ratio
Now that you've seen the difference in mortality rates, you'll calculate another statistic, the rate ratio. The rate ratio compares how likely an outcome is in one group with how likely it is in another. For example, a rate ratio of 5 in a county would mean that a black woman in that county is five times as likely to die of breast cancer as a white woman.
First, you'll calculate the ratio of deaths due to breast cancer between black women and white women using the following formula:
Rate Ratio = Rate for black women / Rate for white women
Then, you'll use this calculation in a hot spot analysis. Using the rate ratio identifies counties in which the mortality rate for black women is high relative to the rate for white women, rather than counties where the mortality rate is high for both black and white women.
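Applying the formula to the Caddo Parish rates quoted earlier gives a quick worked example. The function name is hypothetical; the check for a zero white rate reflects the fact that suppressed values are stored as 0 (in the dual-data counties both rates are above 0, so it never triggers there).

```python
def rate_ratio(black_rate: float, white_rate: float) -> float:
    """Rate Ratio = rate for black women / rate for white women."""
    if white_rate <= 0:
        raise ValueError("white_rate must be positive (suppressed values are 0)")
    return black_rate / white_rate


# Caddo Parish rates quoted earlier: 29 (black women) and 17.2 (white women).
print(round(rate_ratio(29, 17.2), 2))  # 1.69
```

By these figures, black women in Caddo Parish die of breast cancer at roughly 1.7 times the rate of white women; a ratio of exactly 1 would mean equal rates.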
- If necessary, open your Breast Cancer Mortality project.
- Copy the Dual_Data_Counties layer and paste it in the Contents pane on top of Map.
- Turn off the original Dual_Data_Counties layer. Rename the copy Mortality Rate Ratio.
For this map, the mortality rates for black women are divided by the rates for white women. This value will tell you how much more likely black women are to die from breast cancer than white women. You'll add a new field to the attribute table and calculate the values in the field to show the rate ratio.
- Right-click Mortality Rate Ratio and choose Attribute Table.
- In the attribute table on the Field ribbon, click Add Field.
- On the Fields tab, click the last record in the Field Name column and name your new field Mortality_Rate_Ratio. Change the Data Type setting to Double.
- For Alias, type Mortality Rate Ratio and on the ribbon, click Save. Close the Fields tab.
- Right-click the new Mortality Rate Ratio field and choose Calculate Field.
To find the difference, you subtracted white mortality from black mortality. To find the rate ratio, you'll divide black mortality by white mortality.
- In the Geoprocessing pane, double-click field names and operators to create the expression !Black_Mortality! / !White_Mortality!.
- Click Run.
The rate ratio calculations are added to the new field.
- Right-click the Mortality Rate Ratio header and choose Sort Descending.
Values less than or equal to 1 are counties in which the mortality rate is the same or higher for white women than for black women.
- Close the attribute table.
Next, you'll symbolize the values you calculated.
Symbolize the values
To symbolize the rate ratio, you'll use graduated symbols, in which the size of each county's symbol reflects the magnitude of the value you're mapping. For instance, a county with a higher rate ratio is drawn with a larger symbol than a county with a lower rate ratio. You want the counties in which black women have equal or lower mortality rates to stand out clearly from all others, so you'll give them a different color.
- In the Contents pane, right-click Mortality Rate Ratio and choose Symbology.
- Under Primary symbology, choose Graduated Symbols.
- For Fields, choose Mortality Rate Ratio, and for Classes, choose 5.
- Change Minimum size to 5 pt and Maximum size to 25 pt.
In the previous section, you used three classes because you were symbolizing three categories of mortality rate difference: greater for black women, equal, and greater for white women. This time, you'll symbolize the range of rate ratios, so you'll break the data into five classes. Remember that values less than or equal to 1 are counties in which the mortality rate is the same or higher for white women than for black women.
- In the Classes list, change the values in the Upper value column to 1.0, 1.5, 2.0, 2.5, and 3.5.
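The Upper value column works as a set of "less than or equal to" thresholds. The sketch below assigns a rate ratio to its class with Python's bisect module and spaces symbol sizes evenly between the 5 pt minimum and 25 pt maximum. The even spacing is an assumption about how ArcGIS Pro scales graduated symbols, and the function names and sample ratios are hypothetical.

```python
from bisect import bisect_left

UPPER_VALUES = [1.0, 1.5, 2.0, 2.5, 3.5]


def ratio_class(ratio: float) -> int:
    """Index of the first class whose upper value is >= ratio (0 means <= 1.0)."""
    index = bisect_left(UPPER_VALUES, ratio)
    if index == len(UPPER_VALUES):
        raise ValueError("ratio is above the top break of 3.5")
    return index


def symbol_sizes(min_pt: float, max_pt: float, classes: int) -> list:
    """Evenly spaced sizes from min_pt to max_pt (assumed linear ramp)."""
    step = (max_pt - min_pt) / (classes - 1)
    return [min_pt + i * step for i in range(classes)]


sizes = symbol_sizes(5, 25, len(UPPER_VALUES))
print(sizes)  # [5.0, 10.0, 15.0, 20.0, 25.0]
for r in (0.8, 1.69, 3.2):
    print(r, ratio_class(r), sizes[ratio_class(r)])
```

In the steps that follow, the first class (ratios of 1.0 or less) is given a fixed 7 pt green symbol, so only the remaining four classes keep their graduated sizes.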
The dots on the map redraw to show the classes you specified. Now you'll symbolize the ratios. Based on the rate ratios you calculated, the first symbol will show where mortality rates for white women are higher. Following the same color scheme you used earlier, you'll make this symbol green to differentiate it from the rest of the data.
- Click the first symbol. If necessary, click the Properties tab and use the Color selector window to make it the same color green you used for Mortality Rate Difference.
- Change the symbol Size setting to 7 pt and click Apply.
The symbols for counties where mortality rates are higher for white women are now drawn using symbology similar to the Mortality Rate Difference layer.
- Click the back arrow, and then right-click the remaining symbols and choose the same purple you used for Mortality Rate Difference.
This map shows the rate ratio between average mortality rates for black and white women for the years 2006 to 2010. It was calculated as the number of black women who died per 100,000 black women divided by the number of white women who died per 100,000 white women. Larger purple circles are where black women are more likely to die of breast cancer than white women. Values of 1 or less (shown in green) are where white women are equally or more likely to die of breast cancer than black women.
This part of your investigation shows the higher rates at which black women are dying of breast cancer compared to white women. Calculating rate ratios was not only informative, it was a necessary step to finding hot spots. Next, you'll perform hot spot analysis to find clusters of high and low rate ratio values. Knowing where the clusters are will help you figure out where to target your organization's efforts.
Perform hot spot analysis
Based on your map, most counties with the highest rate ratios appear to be in the south central and eastern coastal parts of the United States. To confirm this, you'll run a hot spot analysis. Hot spot analysis finds statistically significant clusters of high and low values. Where your mortality rate ratios are high and cluster together spatially, you have a hot spot. Cold spots are statistically significant spatial clusters of low rate ratio values. The output from the Hot Spot Analysis tool tells you how confident you can be that the spatial clustering of high or low values is significant. A hot spot with a 99 percent confidence level, for example, means there is only one chance in 100 that a tight spatial cluster of high values is due to random chance. Finding a statistically significant cluster of high values gives you confidence that the clustering is not the result of a random process, but rather suggests that other spatial processes are at work (such as differences in health care, genetics, lifestyles, and so on).
The higher the confidence level, the less likely it is that an observed cluster is the result of random chance.
First, you'll make sure there are no null values in your data. Null values can make the analysis results inaccurate.
- In the Contents pane, double-click Mortality Rate Ratio.
The Layer Properties window appears.
- In the Layer Properties window, click Definition Query.
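Like the data exclusion query you used earlier, a definition query is an SQL expression. Unlike data exclusion, which only affects the symbology, a definition query hides features that don't match it from the layer itself, so they're left out of any analysis that uses the layer.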
- Click New Definition Query. Build the expression Mortality Rate Ratio is Not Null.
- Click Apply and click OK.
Features with null values are now excluded from the layer; the underlying data is unchanged.
- On the ribbon, click the Analysis tab. In the Geoprocessing group, click Tools.
- In the Geoprocessing pane, search for and choose Optimized Hot Spot Analysis (Spatial Statistics Tools).
- For Input Features, choose Mortality Rate Ratio, and for Output Features, type Mortality_Hot_Spots.
- For Analysis Field, choose Mortality Rate Ratio and click Run.
The Hot Spot Analysis tool finds statistically significant clusters of high and low values. Using the Getis-Ord Gi* statistic (pronounced G-i-star), the tool identifies statistically significant hot and cold spot areas. Hot spots, clusters of counties where black women are dying at higher rates than white women, are shown in bright red. Cold spots, clusters of counties where black and white women are dying at similar rates from breast cancer, are shown in blue.
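To make the Gi* statistic less of a black box, here is a minimal sketch of its z-score formula with binary (0/1) neighbor weights, applied to hypothetical values along a line rather than real county polygons. It is not Esri's implementation: Optimized Hot Spot Analysis also chooses a neighborhood scale for you and adjusts significance for multiple testing, and neither step is reproduced here. The function names are illustrative.

```python
import math
from statistics import NormalDist


def gi_star_z_scores(values, neighbors):
    """Getis-Ord Gi* z-score for each feature, using binary weights.

    values:    attribute values, e.g., mortality rate ratios.
    neighbors: neighbors[i] is the set of feature indices in i's
               neighborhood, including i itself (the "star" in Gi*).
    """
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - x_bar ** 2)
    if s == 0:
        raise ValueError("all values are identical; Gi* is undefined")
    z_scores = []
    for i in range(n):
        hood = neighbors[i]
        sum_w = len(hood)   # each neighbor has weight 1
        sum_w2 = len(hood)  # so the sum of squared weights is the same
        local_sum = sum(values[j] for j in hood)
        numerator = local_sum - x_bar * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


def confidence_level(z):
    """Highest of 99/95/90 percent that a two-tailed test on z reaches."""
    for level in (0.99, 0.95, 0.90):
        if abs(z) >= NormalDist().inv_cdf(0.5 + level / 2):
            return level
    return None


# Hypothetical rate ratios for six counties in a row; each county's
# neighborhood is itself plus the counties directly beside it.
ratios = [1, 1, 1, 5, 5, 5]
neighbors = [{max(i - 1, 0), i, min(i + 1, len(ratios) - 1)} for i in range(len(ratios))]
z = gi_star_z_scores(ratios, neighbors)
print([round(v, 2) for v in z])
print(confidence_level(z[4]))  # 0.95
```

A large positive z-score means a feature and its neighbors have higher values than you'd expect by chance (a hot spot); a large negative z-score marks a cold spot. The cutoffs that `confidence_level` derives from the normal distribution (roughly 1.65, 1.96, and 2.58) correspond to the 90, 95, and 99 percent confidence levels.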
For the counties in which data has not been suppressed, there are clear clusters of high and low values. The hot spots are groups of counties in which black women are dying at higher rates than white women. High values are clustered in the south central part of the United States, as you suspected from the previous map. The hot spot in North Carolina, however, differs from your earlier impression that high values were clustered along much of the eastern coast. There is also a cluster of low values in the northeast, where the difference between the rates for black women and white women is smaller or where white women are dying at higher rates than black women. This pattern has been studied previously (see Geographic Variation in Breast Cancer Mortality for White and Black Women: 1986–1995); however, your finding of clusters where black women are dying at higher rates than white women is new.
You can find more lessons in the Learn ArcGIS Lesson Gallery.