Explore SafeGraph social distancing data
Everything changes over time. Cities grow; tax evaluations increase; crime trends increase and decrease. There’s an impermanence to place that can be studied and understood.
Our daily choices and activities change too, and nothing has changed our collective spatial behavior quite like the COVID-19 pandemic. People's behavior has changed in different ways and at different rates across the country. ArcGIS Pro provides tools to understand those changes, and in this lesson, you will use some of them to visualize and understand the impact the pandemic has had on travel patterns in California.
Explore the Social Distancing dataset
First, you'll download and explore the data.
- Go to SafeGraph to register to download the data.
- Download the data.
The project file is 1.2 GB, so the download may take some time.
- After the download has completed, double-click the VisualizeSafeGraphSocialDistancing.ppkx project package.
The project may take a minute to load due to the large amount of data (block group polygons for California for 45 days). If prompted, sign in to your ArcGIS account.
The project has two maps and a local scene. The California map is active.
This map shows the SafeGraph Social Distancing data symbolized by the % devices with stay at home behavior field. Lighter blue areas indicate lower percentages of the sampled devices showing the pattern described as stay-at-home behavior, that is, devices not leaving a 200-meter radius of the home.
This data contains daily slices from May 1, 2020, to June 14, 2020.
Next you’ll create a temporal chart to visualize the trends over this time period.
Create a data clock chart
Temporal charts provide a way to discover trends in data that contains a time value, such as this SafeGraph data. A data clock is a circular chart that divides a larger unit of time into rings and subdivides it by a smaller unit of time into wedges, creating a set of temporal bins. You will use data clock charts to understand trends in several variables based on the day of the week.
- In the Contents pane, right-click the SG Social Distancing polygon layer, click Create Chart, and click Data Clock.
- In the Chart Properties pane, on the Data tab, set the following variables:
- For Date, choose Date.
- For Rings, choose Weeks.
- For Wedges, choose Days of the week.
- For Aggregation, choose Mean.
- For Number, choose % devices with stay at home behavior.
- For Null, choose No Color.
In the data clock, each concentric circle (ring) represents a week, while each circle segment (wedge) represents a day of the week. The date range (May 1, 2020, to June 14, 2020) begins in the center and expands outward. The color of each wedge represents the variable value. The data clock you created represents the mean percentage of devices with stay-at-home behavior across all of the block groups.
This chart shows that the percentage of devices with stay-at-home behavior tends to be highest on Sundays, though that trend is gradually declining.
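The binning behind a data clock can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how ArcGIS Pro builds the chart: the function name `data_clock_bins`, the use of ISO week numbers for the rings, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def data_clock_bins(records):
    """Group (date, value) records into (year, week, weekday) bins and average them.

    The ISO week number stands in for the chart's "Weeks" rings (an assumption);
    the weekday name stands in for the "Days of the week" wedges.
    """
    bins = defaultdict(list)
    for day, value in records:
        iso_year, iso_week, _ = day.isocalendar()
        bins[(iso_year, iso_week, DAY_NAMES[day.weekday()])].append(value)
    # Mean aggregation, matching the Aggregation setting chosen in the lesson.
    return {key: mean(values) for key, values in bins.items()}


# Made-up sample: two hypothetical block groups, 45 days starting May 1, 2020,
# with Sundays given a higher stay-at-home percentage.
start = date(2020, 5, 1)
records = []
for offset in range(45):
    day = start + timedelta(days=offset)
    base = 40.0 if day.weekday() == 6 else 35.0
    for bump in (0.0, 2.0):
        records.append((day, base + bump))

clock = data_clock_bins(records)
for (year, week, name), value in sorted(clock.items())[:3]:
    print(year, week, name, value)
```

Each key in `clock` corresponds to one wedge of one ring, and the stored mean is the value that the wedge color would encode.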
The SafeGraph data includes other fields representing the percentage of devices showing other behavior patterns. The project contains three other data clocks showing the same time frame for other behavior patterns.
- In the Contents pane, double-click the Change in mean % devices with a delivery driver behavior by Days over Weeks chart.
This chart shows that for devices with delivery driver behavior, there is an increase in activity on Fridays. It may be that this was a popular day to order takeout and grocery deliveries.
- In the Contents pane, double-click the Change in mean % devices with full-time worker behavior by Days over Weeks chart.
This chart shows that for devices with full-time worker behavior, Monday through Friday are peak activity days. The percentage values are between 3 percent and 5.5 percent, which is still relatively low. There is a significant drop in travel to work on the U.S. Memorial Day holiday (Monday, May 25, 2020).
- In the Contents pane, double-click the Change in mean % devices with part-time worker behavior by Days over Weeks chart.
This chart shows that for devices with part-time worker behavior, the weekdays saw higher values, but it’s notable that Saturday’s values are gradually increasing. The percentage values are between 5 percent and 7.7 percent, which is relatively low, but higher than full-time workers.
- Close the four data clock charts.
Data clock charts are a good way to visualize cyclical or seasonal data. Next, you'll use a line chart to look at temporal trends in a different way.
Create a line chart
Line charts are frequently used to visualize data. Now, you'll explore change in the % devices with stay at home behavior variable over the date range with a line chart. This will show the temporal trend for the mean of the variable from May 1, 2020, to June 14, 2020.
- In the Contents pane, right-click the SG Social Distancing polygon layer, click Create Chart, and click Line Chart.
- In the Chart Properties pane, on the Data tab, set the following variables:
- For Date or Number, choose Date.
- For Aggregation, choose Mean.
- For Numeric field(s), choose % devices with stay at home behavior.
- Click Apply but do not close the pane.
- In the Time binning options section, click the Interval size button and set the Interval size to 1 and the units to Days.
The resulting line chart shows a noticeable downward trend in the % devices with stay at home behavior variable over the date range, as well as a cyclic, weekly pattern.
As you saw with the data clocks, those cyclic peaks in stay-at-home behavior are typically Sundays. This shows that people are gradually reducing stay-at-home behavior over time, even though the pattern varies by day of the week.
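To see how a weekly cycle and an overall trend can coexist in one series, the sketch below averages values per day (as the line chart does with a 1-day interval) and then applies a 7-day trailing average to smooth out the weekly cycle. The trailing average is an extra step added for illustration and is not part of the lesson; the function names and sample numbers are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def daily_mean(records):
    """Average all values that share a date, returned in date order."""
    by_day = defaultdict(list)
    for day, value in records:
        by_day[day].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


def trailing_average(series, window=7):
    """Average each day with the preceding window - 1 days."""
    days = list(series)
    values = [series[d] for d in days]
    return {
        days[i]: mean(values[i - window + 1:i + 1])
        for i in range(window - 1, len(days))
    }


# Made-up sample: a slow decline plus a Sunday bump, for two block groups.
start = date(2020, 5, 1)
records = []
for offset in range(45):
    day = start + timedelta(days=offset)
    weekly_bump = 5.0 if day.weekday() == 6 else 0.0
    for block_group_offset in (-1.0, 1.0):
        value = 45.0 - 0.2 * offset + weekly_bump + block_group_offset
        records.append((day, value))

daily = daily_mean(records)        # shows both the cycle and the decline
smoothed = trailing_average(daily)  # weekly cycle averaged out
```

Because every 7-day window contains exactly one Sunday, the smoothed series in this sample declines steadily even though the daily series rises every Sunday.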
- Close the line chart and close the Chart Properties pane.
Visualize change using the time slider
The time slider is useful for doing a qualitative, visual assessment of a symbolized data set with a time component. The time slider filters features to show the date range set on the slider. This can help you see trends over time.
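Conceptually, the time slider acts as a moving date filter. The following minimal sketch shows that idea with plain Python dictionaries standing in for features; the field names, the half-open window convention, and the helper names are assumptions for illustration and do not reflect how ArcGIS Pro implements the slider.

```python
from datetime import date, timedelta


def features_in_window(features, window_start, window_end):
    """Keep features whose date falls in [window_start, window_end)."""
    return [f for f in features if window_start <= f["date"] < window_end]


def step_windows(start, end, step):
    """Yield consecutive (window_start, window_end) pairs covering start..end."""
    current = start
    while current < end:
        yield current, min(current + step, end)
        current += step


# Made-up features: two hypothetical block groups over three days.
features = [
    {"block_group": name, "date": date(2020, 5, 1) + timedelta(days=d),
     "pct_home": 40.0 - d}
    for d in range(3)
    for name in ("A", "B")
]

# One "frame" of the animation per 1-day step.
frames = [
    features_in_window(features, window_start, window_end)
    for window_start, window_end in step_windows(
        date(2020, 5, 1), date(2020, 5, 4), timedelta(days=1))
]
```

Playing the slider corresponds to drawing the frames in order; stepping forward or back corresponds to moving one index through the list.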
- On the ribbon, on the Map tab, in the Navigate section, click the Bookmarks button. In the California Bookmarks section, click San Francisco.
The map zooms to San Francisco.
- In the Contents pane, right-click the SG Social Distancing polygon layer, and click Properties.
- Click the Time tab, and in the Layer Time drop-down list, click Each feature has a single time field, and click OK.
The Time Field defaults to Date, which is the correct field in this layer.
On the map, a Time button appears.
- Click the Time button.
- On the Time Slider, click the Play button.
The map plays an animation showing the data for each day. You can click the Pause button to pause the animation, and click the Step Forward and Step Back buttons on either side of it to step between the intervals.
As you view the time-step intervals, you can see changes on this map over time. This is a qualitative visual assessment method that can help to understand trends over time. This is more useful with some datasets than with others. Symbolizing the layer on other variables allows you to visually explore them.
Some polygons in some time intervals are missing because there was a low number of devices recorded in the block group.
For this dataset, it is difficult to quantify the patterns over time by eye, but the animation does show a pulse-like rhythm. A temporal reference for the intervals (such as day of the week) is necessary to make sense of what appear to be cyclic patterns, and the data clock and line charts from earlier in the lesson can help you interpret what you see in this animation.
- Click the Save button to save the project.
You've learned a few methods to assess temporal trends in the SafeGraph social distancing data. The data clock visualizations helped you identify cyclic patterns. You’ve seen that Sundays have the highest percentage of devices staying home, and that the highest percentage of devices traveling for full-time work occurs Monday through Friday. The line chart showed a downward trend in the percentage of devices staying at home. Finally, the time slider helped you visualize how the percentage of devices with stay-at-home behavior varies over space and time.
Data exploration such as this is a first step toward understanding your data. While data exploration is valuable, it is also subjective. Next, you’ll use the Create Space Time Cube By Aggregating Points and Emerging Hot Spot Analysis geoprocessing tools to identify statistically significant trends over time.
Identify hot spot trends over time and space
Now you'll get the data into a format that supports advanced statistical analysis and visualization.
Create a space-time cube by aggregating points
To perform an emerging hot spot analysis and visualize the data as voxels, you need to create a space-time cube from the SafeGraph Social Distancing data.
- Open the project if necessary.
- Click the Pattern Analysis tab to activate the map.
The point layer on this map is derived from the block group layer you used earlier, with attributes tied to the block group centroids.
- On the ribbon, click the Analysis tab, and in the Geoprocessing group, click Tools.
- In the Geoprocessing pane, search for Create Space Time Cube By Aggregating Points and click to open the tool.
The Create Space Time Cube By Aggregating Points tool is located in the Space Time Pattern Mining toolbox.
- In the Create Space Time Cube By Aggregating Points tool pane, set the following parameters:
- For Input Features, choose SG Social Distancing point.
- For Output Space Time Cube, browse to the folder, such as C:\SocialDistancing, where you want to save the output NetCDF file, and specify a name, such as SG_Social_Distancing.
- For Time Field, choose date_range_start_Converted.
- For Time Step Interval, choose 1 Days.
- For Distance Interval, choose 5 Miles.
- In the Summary Fields section, click the Add Many button to add fields.
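- Check the fields to summarize, and click Add. Later steps use the PCTHOME_MEAN_SPATIAL_NEIGHBORS, PCTFULLTIME_MEAN_SPATIAL_NEIGHBORS, and PCTDELIVERY_MEAN_SPATIAL_NEIGHBORS cube variables, so the stay-at-home, full-time work, and delivery driver percentage fields must be included.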
The fields are added to the Summary Fields section. The warning indicator on the section indicates that you must supply more information for each field.
- For each field, update the Statistic value to Mean and the Fill Empty Bins with value to Spatial neighbors.
- Click Run.
The tool runs and creates a space-time cube, which is stored in a NetCDF file in the folder you specified. The output of this tool is not added to your map. The NetCDF file organizes the summary data in a format that you can use to show trends, conduct emerging hot spot analyses, and create visualizations.
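The aggregation the tool performs can be pictured as sorting points into bins by location and time, averaging each bin, and then filling gaps from nearby bins. The sketch below is a simplified illustration only: the function names are hypothetical, coordinates are assumed to be in miles and time in days, and the fill step assumes that "spatial neighbors" means the eight surrounding cells in the same time step, which may differ from the tool's actual neighborhood definition.

```python
from collections import defaultdict
from statistics import mean


def build_cube(points, cell_size, time_step):
    """Bin (x, y, t, value) points into (col, row, step) bins and average them."""
    bins = defaultdict(list)
    for x, y, t, value in points:
        key = (int(x // cell_size), int(y // cell_size), int(t // time_step))
        bins[key].append(value)
    return {key: mean(values) for key, values in bins.items()}


def fill_from_spatial_neighbors(cube, locations, steps):
    """Fill empty bins with the mean of surrounding cells in the same time step."""
    filled = dict(cube)
    for col, row in locations:
        for step in steps:
            if (col, row, step) in cube:
                continue
            neighbors = [
                cube[(col + dc, row + dr, step)]
                for dc in (-1, 0, 1)
                for dr in (-1, 0, 1)
                if (dc, dr) != (0, 0) and (col + dc, row + dr, step) in cube
            ]
            if neighbors:  # leave the bin empty if no neighbor has data
                filled[(col, row, step)] = mean(neighbors)
    return filled


# Made-up points: x and y in miles, t in days, value is a percentage.
points = [
    (1.0, 1.0, 0, 30.0),
    (2.0, 3.0, 0, 40.0),  # same 5-mile cell and day as the point above
    (7.0, 1.0, 0, 50.0),
    (7.0, 2.0, 1, 60.0),
]
cube = build_cube(points, cell_size=5, time_step=1)
filled = fill_from_spatial_neighbors(cube, locations=[(0, 0), (1, 0)], steps=[0, 1])
```

In this sample, the bin for cell (0, 0) on day 1 has no points, so it takes the value of its only populated neighbor for that day.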
Next, you’ll use the output of the tool to do an emerging hot spot analysis.
Perform an emerging hot spot analysis
The Emerging Hot Spot Analysis tool will identify trends in the values of the space-time cube that you just created. This tool categorizes the data in the spatial bins to help you understand and characterize what is occurring over time. You’ll run the tool twice to compare the spatiotemporal patterns in the percentage of devices staying at home and the percentage of devices exhibiting full-time work behavior.
- In the Geoprocessing pane, click the Back button and type emerging in the search box.
- Click Emerging Hot Spot Analysis to open the tool.
- In the Emerging Hot Spot Analysis tool pane, set the following parameters:
- For Input Space Time Cube, choose the output cube from the previous steps, for example, SG_Social_Distancing.nc.
- For Analysis Variable, choose PCTHOME_MEAN_SPATIAL_NEIGHBORS.
- For Output Features, type PctHome_EmergingHotSpotAnalysis.
- Click Run.
- Turn off the SG Social Distancing point layer to view the results of the emerging hot spot analysis.
- Zoom in to the Los Angeles area in southern California.
The cells with blue squares show the new cold spot pattern. These are statistically significant cold spots at the final time step that have never been statistically significant cold spots before. In other words, these parts of Los Angeles have only recently become statistically significant clusters of low stay-at-home percentages.
- Pan the map to see the San Francisco area in northern California.
In contrast to Los Angeles, the San Francisco area shows a preponderance of the diminishing hot spot pattern. These are cells that have been statistically significant hot spots for 90 percent of the time-step intervals, including the final time step. In addition, the intensity of clustering in each time step is decreasing overall and this decrease is statistically significant.
You can read more about how Emerging Hot Spot Analysis works in the documentation.
This shows that people in Los Angeles are generally coming out of their homes more than people in San Francisco, and this pattern is emerging later in the data’s date range. People in San Francisco are gradually reducing their stay-at-home behavior over the time period.
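The Emerging Hot Spot Analysis documentation describes using the Mann-Kendall trend test to decide whether values at a location are increasing or decreasing over time. The following is a minimal, self-contained sketch of that test for a single series, using the normal approximation for the p-value. It is an illustration under those assumptions, not the tool's implementation, and the sample series are made up.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Return (S, z, two-sided p) for a time-ordered series of values."""
    n = len(values)
    s = 0
    # S counts later-minus-earlier increases (+1) and decreases (-1).
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S with the standard correction for tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    variance = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value


# Made-up series: a steadily declining stay-at-home percentage, and a flat one.
declining = [45, 44, 43, 42, 41, 40, 39, 38, 37, 36]
flat = [5, 5, 5, 5]

s_down, z_down, p_down = mann_kendall(declining)
s_flat, z_flat, p_flat = mann_kendall(flat)
```

A negative z with a small p-value indicates a statistically significant downward trend, while a flat series yields z = 0 and no detectable trend.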
Now you'll run the tool again to analyze patterns in the Full Time Work variable.
- In the Contents pane, uncheck the PctHome_EmergingHotSpotAnalysis layer.
- On the ribbon, on the Analysis tab, in the Geoprocessing group, click History.
The History pane lists the tools that you've run.
- In the History pane, double-click Emerging Hot Spot Analysis.
The tool opens with the settings you used before. This is useful when you need to run a tool multiple times with small changes. In this case, you will change the analysis variable and the output name.
- In the Emerging Hot Spot Analysis tool pane, set the following parameters:
- For Analysis Variable, choose PCTFULLTIME_MEAN_SPATIAL_NEIGHBORS.
- For Output Features, type PctFulltime_EmergingHotSpotAnalysis.
- Run the tool.
The results show some interesting patterns.
The areas in red denote persistent hot spots, meaning that these areas have been statistically significant hot spots for full-time work behavior throughout the period.
It’s difficult to understand the cause of this, but these areas have high representation of The Great Outdoors tapestry segment, based on Esri's Tapestry Segmentation system (see the 2020 USA Tapestry Segmentation feature layer). Characteristics of this tapestry segment indicate that the population is largely retired and enjoys outdoor recreation (which could register as work travel behavior).
The areas north of San Francisco and northwest of Carson City have comparatively high percentages of travel to full-time work, and that pattern has stayed consistent through the analysis time period.
In the Los Angeles area, the pattern is different.
Areas east of Los Angeles show as both oscillating cold spots and sporadic cold spots. The oscillating cold spot east of Los Angeles has a statistically significant cold spot for the final time-step interval, but it has been a hot spot in prior intervals. The sporadic cold spot to the east of that area indicates that there were no time periods that were statistically significant hot spots, and less than 90 percent of the time-step intervals have been statistically significant cold spots.
Areas to the east of Los Angeles show less-consistent trends in the percentage of individuals traveling for full-time work. The areas classified as oscillating cold spots have more recently reduced travel for full-time work (but there were time steps where they were a hot spot). The areas farther east of Los Angeles, classified as sporadic cold spots, have never been hot spots for full-time work travel activity during the period of the analysis.
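The cold spot categories described above can be expressed as simple rules over a location's history of per-time-step results. The sketch below encodes only the three cold spot definitions given in this lesson; the function name, the "hot"/"cold"/"none" labels, and the order in which the rules are checked are assumptions, and the real tool distinguishes more categories than this.

```python
def classify_cold_pattern(labels):
    """Classify a location from its per-time-step labels, oldest first.

    Each label is "hot", "cold", or "none" (not statistically significant).
    """
    final = labels[-1]
    earlier = labels[:-1]
    cold_share = labels.count("cold") / len(labels)

    # New: a cold spot at the final step that was never a cold spot before.
    if final == "cold" and "cold" not in earlier:
        return "new cold spot"
    # Oscillating: cold at the final step, but a hot spot at some prior step.
    if final == "cold" and "hot" in earlier and cold_share < 0.9:
        return "oscillating cold spot"
    # Sporadic: never a hot spot, and a cold spot in under 90 percent of steps.
    if "cold" in labels and "hot" not in labels and cold_share < 0.9:
        return "sporadic cold spot"
    return "other"


# Made-up histories for four hypothetical locations.
print(classify_cold_pattern(["none", "none", "none", "cold"]))
print(classify_cold_pattern(["hot", "cold", "none", "cold"]))
print(classify_cold_pattern(["cold", "none", "cold", "none"]))
print(classify_cold_pattern(["hot", "hot", "hot", "hot"]))
```

Histories that could match more than one rule, such as a location that was once hot and is cold for the first time at the final step, are resolved here by rule order, which is a simplification.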
You converted the feature data to a space-time cube stored in the NetCDF data format and used the Emerging Hot Spot Analysis tool to extract statistically significant patterns for two behavior variables over time. In the next section, you'll learn how to view the space-time cube data as a voxel layer.
Visualize the social distancing data as a voxel layer
Now you'll visualize the data as voxels in a 3D scene view.
Visualize data as a multidimensional voxel layer
As you learned earlier, the Create Space Time Cube By Aggregating Points tool will bin and aggregate feature data into a space-time cube stored in the NetCDF data format. This regularly gridded data can be viewed as a multidimensional voxel layer. The voxel layer uses the structure of the NetCDF file format to display 3D data in a new way. You’ll explore the space-time cube visualized as a voxel layer in this section.
- Click the SafeGraph Voxel Layer tab to activate the scene.
- On the ribbon, on the Map tab, click the Add Data drop-down menu, and click Multidimensional Voxel Layer.
- Browse to the folder where you saved your NetCDF file from the previous section, click the SG_Social_Distancing.nc file that you created, and click OK.
- In the Select Variables section, scroll to the bottom of the list and click the Default Variable button for PCTHOME_MEAN_SPATIAL_NEIGHBORS to set it as the default variable, and click OK.
This list allows you to choose the variables to include in the voxel layer.
- In the Contents pane, click the color scheme patch for the SG_Social_Distancing layer.
The current rainbow color scheme doesn't make it easy to differentiate areas with high and low values.
- In the Symbology pane, click the Color scheme drop-down list. Check the Show names option. Scroll down, and choose the Red-Blue (Continuous) color scheme.
- Check the Transparency function check box.
When the Transparency function control opens, no transparency is set. You can add control points across the color gradient to specify how transparent the colors will be.
- Double-click near the middle of the red section of the gradient, about halfway to the bottom.
A control point is added where you double-clicked.
The background of the Transparency function control changes to a checkered grey and white pattern, and the black line connecting the three control points indicates how transparent the color gradient is at any given color. Where the line is close to the top, the colors are more opaque. Where the line is close to the bottom, the colors are more transparent.
- Double-click near the middle of the blue section of the gradient, about halfway to the bottom.
The effect of adding these control points is to make the midrange shades of red and blue semitransparent, while keeping the more extreme values at either end of the gradient more opaque. This makes the more extreme values stand out in the scene view.
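One way to think about the transparency function is as a piecewise-linear curve through the control points. The sketch below assumes straight-line interpolation between points, positions normalized from 0 (one end of the gradient) to 1 (the other end), and opacity from 0 (transparent) to 1 (opaque). The function name and the specific control point positions are hypothetical.

```python
def opacity_at(position, control_points):
    """Linearly interpolate opacity at a position along the color gradient.

    control_points is a list of (position, opacity) pairs with distinct positions.
    """
    points = sorted(control_points)
    if position <= points[0][0]:
        return points[0][1]
    if position >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= position <= x1:
            return y0 + (y1 - y0) * (position - x0) / (x1 - x0)
    return points[-1][1]


# Both ends fully opaque, plus two added points halfway down,
# one in each half of the gradient.
control_points = [(0.0, 1.0), (0.25, 0.5), (0.75, 0.5), (1.0, 1.0)]

for position in (0.0, 0.125, 0.25, 0.5, 1.0):
    print(position, opacity_at(position, control_points))
```

With these points, values at either extreme stay opaque while everything between the two added points is drawn at half opacity, which matches the effect described above.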
- On the ribbon, on the Map tab, in the Navigate section, click the Bookmarks button, and in the SafeGraph Voxel Layer Bookmarks section, click NorCal.
The view zooms to Northern California.
Some areas seem to have a white, cloudy fill. This is because the midrange values of the color scheme are only semitransparent.
In this volumetric visualization, earlier temporal bins are on the bottom, and more recent temporal bins are at the top. Areas with higher stay-at-home percentage are represented in blue, and the higher values are more opaque and darker blue. Areas with the lowest stay-at-home percentage are represented in red, and the lowest values are more opaque and darker red. This helps visualize the higher and lower stay-at-home values.
The voxels in each column correspond to time slices at each location. Since lower voxels are older, you can read up a column to see how behavior has changed at a given location. For example, if a column starts dark blue and gradually transitions to lighter blues, pinks, and dark red, you can assume that at that location there has been a trend to less stay-at-home behavior. Recall from the data clocks and the line chart that there are cycles within each week of more or less stay-at-home behavior, and there are trends overall to less stay-at-home behavior.
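Reading up a column amounts to pulling one location's values out of the cube in time order. The sketch below uses a small nested list indexed as `cube[time][row][col]`; that layout, the function names, and the sample values are assumptions for illustration, and comparing only the first and last values ignores the weekly cycle.

```python
def column_through_time(cube, row, col):
    """Return one location's values from the oldest time slice to the newest."""
    return [time_slice[row][col] for time_slice in cube]


def describe_change(column):
    """Compare the newest value with the oldest one."""
    change = column[-1] - column[0]
    if change < 0:
        return "less stay-at-home behavior"
    if change > 0:
        return "more stay-at-home behavior"
    return "no net change"


# Made-up cube: three time slices of a 2 x 2 grid of stay-at-home percentages.
cube = [
    [[45.0, 40.0], [38.0, 30.0]],  # oldest slice (bottom of the voxel stack)
    [[42.0, 40.0], [39.0, 30.0]],
    [[36.0, 40.0], [41.0, 30.0]],  # newest slice (top of the voxel stack)
]

column = column_through_time(cube, row=0, col=0)
print(column, describe_change(column))
```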
- Use your mouse to pan, zoom, and rotate the display.
For more information on 3D navigation, see the Navigation in 3D help topic.
- In the Symbology pane, on the Transparency function control, click and drag both of the control points that you added all of the way to the bottom.
You can see the transparent section in the data values histogram above the Transparency function control.
- Go back to the NorCal bookmark.
- In the Symbology pane, on the Transparency function control, click and drag the control point at the red end of the color scheme all the way to the bottom.
This emphasizes the voxels where stay-at-home behavior is highest.
- Adjust the control points so that midrange values are semitransparent.
You can see the higher stay-at-home values emphasized in context with the lower, red-symbolized values that are mostly transparent.
Now you'll switch the emphasis to show the areas with the lowest stay-at-home values.
- Adjust the control points so that blue values are fully transparent and the red values become opaque.
This emphasizes the voxels where stay-at-home behavior is lowest.
You've visualized the data as a voxel layer and used the color and transparency controls to emphasize different values in the data. Now you'll create slices through the voxel layer to see cross sections of the data.
Create slices of the voxel layer
The voxel layer shows a volumetric 3D view of the data. For voxels around the edges, you can see the entire time extent of the data. Data values near the interior of the volume tend to be hidden by those around them. You can create slices through the voxel layer to see what is going on inside it.
- In the Contents pane, in the SG_Social_Distancing layer section, right-click Slices and click Create Slice.
The Slice and Section toolbar for working with slices is added to the bottom of the scene view, and the pointer changes to a crosshair.
- Click a red cell on the western edge of the voxel layer, move the pointer to the east, and click again.
The slice appears between the locations that you clicked.
Creating a slice through the voxel layer allows you to see into it.
- On the Slice and Section toolbar, click the Push or Pull button, click the slice, and drag perpendicular to the plane of the slice.
Pushing and pulling the slice through the voxel layer allows you to interactively see slices through the data at different locations and allows you to change the position of a slice.
- In the Voxel Exploration pane, name the slice SF West-East.
- Click and drag the Position slider in the Voxel Exploration pane to move the slice.
- Create another slice, but this time, click near the first slice and move the pointer to the north, away from the first slice, and click again.
Now the data is sliced along the north-south axis.
- On the Slice and Section toolbar, click the Flip button to show the data on the other side of the slice.
- In the Voxel Exploration pane, name the slice SF North-South.
The Voxel Exploration pane controls the slice that is selected in the Contents pane.
- In the Contents pane, in the SG_Social_Distancing layer, in the Slices section, uncheck the SF West-East slice.
Now only the north-south slice through the data is visible.
You've learned how to explore the values in the interior of the voxel layer by creating and managing slices.
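A slice can be thought of as fixing one spatial position and keeping everything else. The sketch below treats the cube as a nested list indexed as `cube[time][row][col]` and extracts axis-aligned cross-sections. The layout and function names are hypothetical, and slices in the scene can be drawn at any angle, so this covers only the simplest case.

```python
def west_east_slice(cube, row):
    """Fix a north-south position; return a time x column cross-section."""
    return [list(time_slice[row]) for time_slice in cube]


def north_south_slice(cube, col):
    """Fix a west-east position; return a time x row cross-section."""
    return [[cells[col] for cells in time_slice] for time_slice in cube]


# Made-up cube: two time slices of a 3 x 3 grid; the center cell is "interior".
cube = [
    [[1, 2, 3], [4, 50, 6], [7, 8, 9]],
    [[1, 2, 3], [4, 60, 6], [7, 8, 9]],
]

# Moving the row or column index is the equivalent of pushing or pulling a slice.
we = west_east_slice(cube, row=1)
ns = north_south_slice(cube, col=1)
print(we)
print(ns)
```

Both cross-sections pass through the center cell, exposing values that would be hidden behind the outer cells in the full volume.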
Show a different variable in the voxel layer
The voxel layer is a way of visualizing the multidimensional space-time cube. You've been exploring the stay-at-home behavior variable, but the original data had variables for other behavior patterns, including delivery driver, full-time work, and part-time work. Now you'll switch to another variable.
- On the ribbon, on the Appearance tab, in the Variable section, click the Variable drop-down arrow, and click PCTDELIVERY_MEAN_SPATIAL_NEIGHBORS.
The new variable is displayed in the voxel layer with a new color scheme.
- Change the symbology of the layer to the red-blue color scheme.
- Change the transparency to show the more extreme values and make the middle values more transparent.
- Explore the data by panning and zooming.
- Use the slices through the voxel layer to see the interior values.
You can also explore other variables, create slices at different angles, and create horizontal slices through the data.
You learned several techniques for exploring the social distancing data that SafeGraph shared, and these techniques can be applied in many other domains and types of data, including crime analysis, demographics analysis, and any other point data set with date values. Data clocks provide a view of cyclic patterns, and line graphs establish overall trends through time. The time slider allows you to visually interpret your data. The Emerging Hot Spot Analysis tool allows you to quantify statistically significant spatiotemporal trends and classify areas based on these trends.
Visualizing trends over time in 3D is a great way to see the entire picture spatially, and many people find it an intuitive way to understand spatiotemporal trends. You used the new voxel layer to visualize variables in your multidimensional space-time cube. You used transparency to emphasize certain values in the data, and used slices through the voxel layer to see the values in the middle of the data volume.
These techniques can help you explore and understand spatiotemporal trends in a complex dataset such as SafeGraph’s Social Distancing metrics. Apply these techniques to your own spatiotemporal data and gain a deeper understanding of trends through qualitative and quantitative analysis.