Download species observation data

The first step is acquiring observation data, or presence data, for your species of interest. In this section, you'll download data from the Global Biodiversity Information Facility (GBIF), which combines observation data from multiple sources for scientific use. Then, you'll add observation data for the European badger (Meles meles) to ArcGIS Pro and ensure that the data types are correct for the types of analysis you want to perform.

Set up your ArcGIS project

First, you'll set up the ArcGIS Pro project where you'll be working with the data. You'll add a layer showing the national boundary of Spain to use for clipping data later in the tutorial.

Start ArcGIS Pro. If prompted, sign in using your licensed ArcGIS organizational account.
Note:
If you don't have access to ArcGIS Pro or an ArcGIS organizational account, see options for software access.
When you open ArcGIS Pro, you're given the option to create a new project or open an existing one. If you've created a project before, you'll see a list of recent projects.
Under New Project, click Map.
In the New Project window, for Name, type EuropeanBadger_Habitat. Leave Location unchanged and confirm that Create a folder for this project is checked.
Click OK.

Next, you'll add a layer to the map showing the boundaries of Spain. This layer will be used to clip and constrain environmental data.
On the ribbon, click the View tab. In the Windows group, choose Catalog Pane.

The Catalog pane appears. The Catalog pane can be used to add items to a project; view, create, and manage items; and get information about item properties.
In the Catalog pane, click the Portal tab and choose Living Atlas.
Search for the Spain Country Boundary layer. Find the Spain Country Boundary feature layer owned by esri_dm and drag the result onto the map.

The layer draws on the map and is added to the Contents pane as ESP_Country.

Download animal observation data from GBIF

Next, you'll download animal observation data from the Global Biodiversity Information Facility (GBIF). GBIF is a global data repository that collects data on where species have been recorded. Data is included from multiple sources, such as iNaturalist, and formatted into a common schema for broad use.

Note: Depending on the protection status of the species, location data may be obscured to prevent poaching or other interference. The European badger is classified by the IUCN Red List as Least Concern, so location data is not obscured.

Go to the GBIF page for Meles meles.
The species overview page appears. This page shows information about the badger including photos that have been submitted with occurrence records, a map of where sightings have occurred, and a description of the animal's activity and ecology.
Log in or sign up for an account.

GBIF data is freely available, but can only be downloaded by users with accounts. This allows GBIF to generate a custom citation for each user and dataset downloaded.
Scroll down and read the Description information, paying attention to the Activity and Biology Ecology sections.
Click the Metrics tab and explore statistics about sightings.

Badgers have been widely sighted across Europe, primarily in the warmer months. In colder climates, badgers hibernate to escape winter weather. To narrow the results, you can filter by country.
In the Occurrences per Country or Area graph, click Spain.

The Table tab opens, showing occurrence records from Spain in tabular form. Before downloading, you'll filter this data to ensure you're only downloading data you can map. First, you'll filter for data licensed with open Creative Commons licensing.
Expand License and check the boxes to select CC0 1.0 and CC BY 4.0.

These license types allow public use of the data. CC0 1.0 denotes data that has been dedicated to the public domain. CC BY 4.0 denotes data that can be shared and adapted as long as the original data source or owner is credited and any changes are indicated.
Expand Location and choose Including coordinates.
Expand the Issues and flags section and review the potential issues with the dataset.

This section lists the potential issues associated with each point. Notice, for example, that most coordinates are rounded. Depending on how you intend to use the results of your analysis, these issues might disqualify the data for your study, require correction, or simply need to be acknowledged as a potential source of error.
At the top of the table, click the Download button.
In the Download Options table, choose Simple.
In the pop-up window, read the GBIF user agreement and citation guidelines, then click Understood.

The data download page appears. Note the information about the citation and other data use guidelines on this page.
Once the data has finished processing, click Download.

A zipped .csv file is downloaded to your computer. A confirmation email with your custom citation is sent to the email address you registered for your GBIF account. You'll use this citation later to cite your data source.
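After you unzip it, the Simple download is a tab-delimited text file even though it carries a .csv extension. As a quick check outside ArcGIS Pro, the following minimal Python sketch reads such a file and keeps only the records that have coordinates. The sample rows are invented for illustration, and the column names are assumed to match the GBIF simple download schema.

```python
import csv
import io

# Invented sample rows standing in for the unzipped GBIF file (tab-delimited).
SAMPLE = (
    "gbifID\tdecimalLatitude\tdecimalLongitude\tyear\tmonth\tday\n"
    "1001\t37.01\t-6.44\t2019\t5\t14\n"
    "1002\t\t\t2020\t11\t2\n"
    "1003\t41.39\t2.15\t2021\t\t\n"
)

def rows_with_coordinates(handle):
    """Return the rows that have both a latitude and a longitude value."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [r for r in reader if r["decimalLatitude"] and r["decimalLongitude"]]

records = rows_with_coordinates(io.StringIO(SAMPLE))
print(len(records), "of 3 sample rows have coordinates")
```

To try this on your own download, you could pass an open file handle for the unzipped file instead of the in-memory sample.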

Extract animal observation data

Next, you'll add the badger observation data to ArcGIS Pro. Once in ArcGIS Pro, you can analyze the data and ensure that the data types are correct for the types of analysis you want to perform. The schema that GBIF uses stores sighting date in three different fields (day, month, and year), and stores them as text fields. To analyze sightings by month, you'll create a date type field combining day, month, and year.

In a file browser, browse to the location where you downloaded the GBIF data.
Unzip the file to your ArcGIS Pro project folder.

By default, new project folders are created in this location: C:\Users\<username>\Documents\ArcGIS\Projects. Depending on where you installed ArcGIS Pro, your file path may be different.

When the file was downloaded, it was named with a string of numbers. You'll rename the file before adding it to your project.
Rename the file Melesmeles_GBIF_[date], substituting in the date you downloaded the file.

You'll add the file as a geodatabase table so you can do some data preparation before mapping.
In ArcGIS Pro, on the ribbon, click the Analysis tab. In the Geoprocessing group, click Tools.

The Geoprocessing pane appears.
In the Geoprocessing pane, search for and open the Table To Geodatabase tool.
For Input Table, click Browse and choose your Melesmeles_GBIF_[date] file.
For Output Geodatabase, click Browse and choose the project's default geodatabase. Click Run.

The table is added to your geodatabase.
In the Catalog pane, click the Project tab. Expand Databases, then expand your EuropeanBadger_Habitat geodatabase and drag the Melesmeles_GBIF_[date] table onto the map.

The table is added to the Contents pane under Standalone Tables. Next, you'll take a look at the data to see what fields you have to work with and what you might need to format for later analysis.
Right-click the Melesmeles_GBIF_[date] table and choose Open.

The table contains the same data you saw in GBIF. You'll use the decimalLatitude and decimalLongitude attributes to map the data. You also want to use the date field for spatiotemporal analysis, but the eventDate field has been saved as a text field. To work with this field, you'll need to have the date in a date format.
At the top of the table, click Calculate.

The Calculate Field tool opens.
In the Calculate Field window, for Field name, type stdTime.

This will allow you to add a new field to store the formatted date.
For Expression Type, choose Arcade, and for Field Type, choose Date.
In the Expression box, build the expression Concatenate($feature.month, "/", $feature.day, "/", $feature.year).
At the bottom of the Expression box, click the green check mark to validate the expression, then click OK.

When the tool has finished running, a window will show the completed status. The Messages will show a warning that some values weren't written because some of the data was missing. You'll ignore these for now.
Close the completed tool window.
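The warning appears because records that are missing a day, month, or year can't form a valid date. As a rough illustration of the same logic in plain Python (not the Arcade expression the tool runs), the sketch below combines the three text values and returns None when any part is missing or invalid. The function name build_date is hypothetical.

```python
from datetime import date

def build_date(year, month, day):
    """Combine text year/month/day values into a date, or None if unusable."""
    try:
        return date(int(year), int(month), int(day))
    except (TypeError, ValueError):
        # Missing (None or empty) or impossible values, such as month 13
        return None

print(build_date("2019", "5", "14"))   # a complete record
print(build_date("2021", "", ""))      # day and month missing
```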

To see the temporal spread of the data, you'll create a calendar heat chart.

Calendar heat charts visualize patterns in temporal data by aggregating incidents into a calendar grid. Calendar grids can be configured to display temporal patterns across the months in a year or across the days in a week.
In the Contents pane, right-click the Melesmeles_GBIF_[date] table, click Create Chart, and choose Calendar Heat Chart.

A blank Chart window appears.
In the Chart Properties pane, for Date, choose the stdTime field.

The chart populates, showing a heat chart of the month and day when sightings took place. Sightings occurred year-round, with more occurring during the cooler fall and winter months.
Close the Melesmeles_GBIF_[date] table and the chart. Save the project.
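The aggregation behind a calendar heat chart can be thought of as counting records per calendar cell. The Python sketch below tallies a handful of invented dates by month and by month and day. It only illustrates the idea and is not how ArcGIS Pro builds the chart.

```python
from collections import Counter
from datetime import date

# Invented sighting dates for illustration
sightings = [
    date(2019, 1, 5),
    date(2020, 1, 20),
    date(2020, 11, 2),
    date(2021, 11, 2),
    date(2021, 11, 30),
]

def count_by_month(dates):
    """Count sightings per calendar month, ignoring the year."""
    return Counter(d.month for d in dates)

def count_by_month_and_day(dates):
    """Count sightings per (month, day) cell, ignoring the year."""
    return Counter((d.month, d.day) for d in dates)

monthly = count_by_month(sightings)
daily = count_by_month_and_day(sightings)
print(monthly.most_common())
```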

Now that you've downloaded Meles meles observation data and brought it into ArcGIS Pro for some initial data preparation, you're ready to map it. Each observation point contains coordinates that you'll use to map the data.

Map presence and pseudo absence points

Species distribution modeling can be done in several different ways using several statistical methods. Many of these methods require both presence and absence data, or in your case, pseudo absence data. They also require environmental data to determine what kinds of climate and habitat conditions are suitable for the animal species. Now that you have presence data, you can generate the pseudo absence or background data and extract environmental attribute data at each location.

Map presence points

First, you'll map badger presence by converting the tabular European Badger data to a feature class. Then, you'll examine the data. Depending on the collection method, some of the data might be overrepresentative of badger locations, such as studies tracking animal movement rather than reporting single sightings.

Note:

If you plan to use an analysis method such as Presence Only Prediction (MaxEnt), the data preparation you'll do in this section is included within the geoprocessing tool. But if you plan to use other analysis methods such as regression, these are necessary steps to prepare your data.

In the Contents pane, right-click the Melesmeles_GBIF_[date] table, point to Create Points From Table, and choose XY Table To Point.
In the XY Table To Point window, set the following parameters and click OK:
- Output Feature Class: EuropeanBadger_points
- X Field: decimalLongitude
- Y Field: decimalLatitude
Once the tool has finished running, the layer will be added to the Contents pane. There's a large grouping of points around Barcelona, and a tight cluster south of Seville.
Note:
At the time this data was downloaded, there were 2,214 observation points that met the licensing and other selection requirements. Your dataset might be different.
Zoom in to the cluster south of Seville.

This cluster of points lies within Doñana National Park and appears to represent animal tracks, which means that each set of points likely represents a single animal. To see how they were collected, you'll open the layer's attribute table.
In the Contents pane, right-click the EuropeanBadger_points layer and choose Attribute Table.
On the ribbon, on the Map tab, in the Selection group, click Select and draw a rectangle on the map around the points within Doñana National Park.
At the bottom of the attribute table, click Show Selected Records.

The table is filtered to show only the selected records. Depending on how you drew your selection, roughly half of the observation points fall within the national park, and you can see that most of the points were gathered through a tracking study. To avoid overrepresenting this area in future analysis, you'll thin these points.
In the Geoprocessing pane, search for and open the Delete Identical tool.

At the top of the tool is a warning that the tool modifies the input dataset. The Delete Identical tool permanently removes points from the feature layer that you'll input but won't modify the Standalone Table.
For Input Dataset, choose EuropeanBadger_points and leave the Use the selected records toggle button turned on.
For Field(s), choose Shape. For XY Tolerance, choose 500 Meters and click Run.

When the tool finishes running, the attribute table will need to be refreshed because some of the selected records are now deleted.
In the Table failed to load window, click OK.
On the ribbon, in the Selection group, click Clear to remove the selection.

There are still many points within the national park, but they've been thinned.
Reopen the Attribute table and check the number of remaining presence points that is listed at the bottom of the table.

Depending on the points you had selected, your number may vary. Generally, you'll want to create the same number of background points as you have observations, so make sure to check your specific points. Now you can create the random sample.
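To make the thinning idea concrete, here is a minimal Python sketch that keeps a point only if it is at least a given distance from every point already kept. It is a simple greedy approach under stated assumptions (coordinates in decimal degrees, great-circle distance on a sphere of radius 6,371 km) and is not the algorithm the Delete Identical tool uses. The sample coordinates are invented.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def thin_points(points, min_distance_m=500):
    """Greedily keep points that are at least min_distance_m from all kept points."""
    kept = []
    for lon, lat in points:
        if all(haversine_m(lon, lat, klon, klat) >= min_distance_m for klon, klat in kept):
            kept.append((lon, lat))
    return kept

# Invented (longitude, latitude) pairs; the second is a few meters from the first
points = [(-6.4400, 37.0100), (-6.4401, 37.0101), (-6.4300, 37.0100), (-6.3000, 37.1000)]
thinned = thin_points(points)
print(len(points), "points thinned to", len(thinned))
```

Because the approach is greedy, the result depends on the order of the input points, which is one reason counts can vary between runs on differently ordered data.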

Generate randomly sampled pseudo absence points

Now that your presence data is ready, you'll generate pseudo absence points. The simplest method is using random generation within the study area. To ensure that presence and pseudo absence points are equally weighted, you'll create the same number of background points as you have presence points.

In the Geoprocessing pane, search for and open the Create Spatial Sampling Locations tool.

The Create Spatial Sampling Locations tool generates sample locations within a continuous study area using simple random, stratified, systematic (gridded), or cluster sampling designs.
Enter the following parameters and click Run:
- Input Study Area: ESP_Country
- Output Features: ESP_randomsample
- Sampling Method: Simple random
- Number of Samples: The number of points in your EuropeanBadger_points table
The layer of random points within the country of Spain is added to the map. This dataset can now be combined with your EuropeanBadger_points layer.
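Simple random sampling within a study area can be sketched as rejection sampling: draw points uniformly in the bounding box and keep those that fall inside the polygon. The following Python example uses an invented triangular study area and a standard ray-casting point-in-polygon test. It is an illustration only; the Create Spatial Sampling Locations tool's implementation may differ, and sampling uniformly in degrees is not exactly equal-area.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon's vertex ring."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_points_in_polygon(polygon, count, seed=None):
    """Draw `count` uniform random points inside the polygon by rejection sampling."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < count:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points

study_area = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # hypothetical triangle
background = random_points_in_polygon(study_area, count=50, seed=42)
print(len(background), "background points generated")
```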
Close the EuropeanBadger_points attribute table.
In the Geoprocessing pane, search for and open the Merge tool.
For Input Datasets, choose ESP_randomsample and EuropeanBadger_points. For Output Dataset, type badger_sample_set.

Within the Merge tool, you can decide what fields to add to the new layer, and you can create new ones. You'll add a new field named Presence that you'll use to differentiate the observation points from the GBIF data and the background points from the random sample.
For Field Map, click the Add Fields drop-down menu and choose Add Empty Field.
Rename the NewField to Presence and press Enter.

By default, the Presence field is set to be a Text field.
Point to the Presence field and click Edit. In the Field Properties window, click Type and choose Short.
In the Field Properties window, click OK, then run the Merge tool.

Note:
The Presence field will have a warning indicating that it's empty.
In the Contents pane, uncheck ESP_randomsample and EuropeanBadger_points to turn the layers off. Right-click the badger_sample_set layer and click Attribute Table.

To distinguish the presence and absence points in your new layer, you'll calculate values for the Presence field. Typically, presence points are shown with a value of 1 and background points are given a value of 0. As you scroll through the table, notice that the merged points have a lot of null data fields. You'll use these null fields to select the background points.
In the table, click Select by Attributes. In the Select By Attributes window, build the expression Where kingdom is null and click Apply.
In the attribute table, scroll until you see the Presence field. Right-click the Presence column name and choose Calculate Field.
In the Calculate Field window, for Presence =, type 0 and click OK.
In the Select By Attributes window, check the Invert Where Clause box and click OK.
Right-click the Presence column name and choose Calculate Field. Build the expression Presence = 1 and click OK.

Now the features are coded with a 1 value for observed presence, and 0 value for pseudo absence.
On the ribbon, click Clear to clear the selection. Close the badger_sample_set attribute table and save the project.
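The select-and-calculate steps above amount to a simple rule: rows without GBIF attributes are background points. A minimal Python sketch with invented rows, where the function name code_presence is hypothetical:

```python
def code_presence(rows):
    """Set Presence to 0 where kingdom is null (background), else 1 (observed)."""
    for row in rows:
        row["Presence"] = 0 if row.get("kingdom") is None else 1
    return rows

# Invented rows: one GBIF observation and one random background point
merged = [
    {"kingdom": "Animalia", "species": "Meles meles"},
    {"kingdom": None, "species": None},
]
coded = code_presence(merged)
print([row["Presence"] for row in coded])
```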

Extract environmental data

Next, you'll locate and prepare environmental variables that might help determine the presence of badgers. Recall from the animal description in GBIF that badgers prefer good vegetation cover within foraging habitats and that, in central Spain, they prefer mid-elevation mountain areas with woodland and pastures and avoid lower elevations. In this tutorial, you'll focus on acquiring and setting up the data that would be needed for species distribution modeling, such as land cover, slope, and elevation.

Download the SpainPortugalElev.zip file to your computer and unzip it to the ArcGIS project folder you're working in.

This file contains two raster images downloaded from USGS EROS Archive - Digital Elevation - Global Multi-resolution Terrain Elevation Data 2010 that have been mosaicked together to cover the whole of Spain, then clipped to the countries of Spain and Portugal. For more information on creating a mosaic dataset, refer to the documentation. You'll use this raster image to create a slope dataset for Spain.

You can access more detailed slope and elevation data from ArcGIS Living Atlas. However, because of data export limitations, which limit exports to 4,000x4,000 pixels at a time, the ArcGIS Living Atlas data isn't the best choice for a study area this large.
In the Catalog pane, click the Project tab and expand the Folder group, then expand the EuropeanBadger_Habitat project folder.
Locate the Spain_GTMED2010 image you unzipped, and drag it onto the map.

Note:
If you're prompted to build pyramids and calculate statistics for the layer, click OK.
In the Contents pane, uncheck the badger_sample_set and ESP_Country layers to turn them off.

The elevation raster draws on the map. You can use this raster to calculate slope, another variable that may help determine badger habitat.
In the Geoprocessing pane, search for and open the Surface Parameters tool.
Enter the following parameters and click Run:
- Input surface raster: SpainPortugalElev.tif
- Output Raster: Spain_Slope
- Input analysis mask: ESP_Country
- Parameter type: Slope
- Local surface type: Quadratic
- Slope measurement: Degree
The Spain_Slope layer is added to the map. It shows slope values in degrees.
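For intuition about what a slope raster contains, the sketch below estimates slope in degrees at the center of a 3x3 elevation window using central differences. This is a deliberately simpler method than the quadratic surface fit selected in the tool, and the elevation values and 100-meter cell size are invented.

```python
import math

def slope_degrees(window, cell_size):
    """Slope at the center of a 3x3 elevation window via central differences."""
    dz_dx = (window[1][2] - window[1][0]) / (2 * cell_size)
    dz_dy = (window[2][1] - window[0][1]) / (2 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))

flat = [[500, 500, 500], [500, 500, 500], [500, 500, 500]]
ramp = [[400, 500, 600], [400, 500, 600], [400, 500, 600]]  # rises 100 m per cell eastward

print(slope_degrees(flat, 100))
print(slope_degrees(ramp, 100))
```

In the ramp example, elevation rises 100 meters over each 100-meter cell, so the rise equals the run.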
The next environmental layer you want to find is land cover. For this, you'll use the European Space Agency's WorldCover 2020 data. WorldCover maps 11 land cover types.
In the Catalog pane, click the Portal tab and choose Living Atlas.
Search for the ESA WorldCover layer and drag it onto the map.

The WorldCover layer draws on the map. This layer contains 11 different land cover classes at 10-meter resolution.

Now that you have your habitat data, you'll use the Extract Multi Values to Points tool to get the raster values for each point location.
In the Geoprocessing pane, search for and open the Extract Multi Values to Points tool.
For Input point features, choose badger_sample_set.
For Input rasters, choose Spain_Slope, Spain_Elevation, and LandCover, and give them the corresponding output field names: slope, elevation, and landcover.

Note:
A number 1 may be appended to the elevation field name. This won't affect your output.

Before running the tool, you'll set the Processing Extent to the Spain country boundary you've been using. Because the WorldCover layer is a global dataset, setting the Processing Extent will allow you to extract only the data you need.
Click the Environments tab.
Expand the Processing Extent group. Click Extent of a Layer and choose the ESP_Country layer.
Click Run.

This tool will take some time to run. When the tool is done, the badger_sample_set layer has three new variables in the attribute table. You'll also add bioclimate data to your sample set.
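Extracting a raster value at a point comes down to converting the point's coordinates to a row and column in the grid. The following is a minimal sketch, assuming a north-up raster described by its upper-left corner and a square cell size. The grid, origin, and function name are hypothetical, and points outside the grid return None.

```python
import math

def sample_raster(grid, x_min, y_max, cell_size, x, y):
    """Return the cell value under (x, y), or None if the point is off the grid."""
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None

# Hypothetical 2x3 elevation grid; upper-left corner at (0, 20), 10-unit cells
elevation = [[100, 110, 120],
             [200, 210, 220]]

print(sample_raster(elevation, 0, 20, 10, 5, 15))    # point in the upper-left cell
print(sample_raster(elevation, 0, 20, 10, 25, 5))    # point in the lower-right cell
print(sample_raster(elevation, 0, 20, 10, 35, 5))    # point outside the grid
```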

Sample multidimensional data

In addition to slope, elevation, and land cover, other variables that may help model badger habitat are bioclimatic. You'll add the Bioclimate Baseline 1970-2000 layer from ArcGIS Living Atlas and sample its values at each of the presence and background points. The Bioclimate Baseline layer provides downscaled estimates of climate and bioclimate variables as monthly means over the period of 1970-2000 based on interpolated station measurements from WorldClim 2.1. There are 19 bioclimate variables provided in this layer, including data on temperature and precipitation. Each variable can be accessed from the Multidimensional Filter.

In the Catalog pane, click the Portal tab and choose Living Atlas.
Search for the Bioclimate Baseline 1970-2000 layer and drag it onto the map.

The Bioclimate Baseline 1970-2000 layer is added to the map. ArcGIS Living Atlas also contains projections into the future for each of the bioclimate variables in the Baseline dataset. Each of the Bioclimate Projections datasets contains SSP2-4.5, SSP3-7.0, and SSP5-8.5 to model potential future conditions depending on greenhouse gas emissions, political and social policy, and other changes. These layers can be swapped out for the Baseline dataset, but will have to be sampled individually.

As before, you want to extract these variables to your badger sample set. To get all of the bioclimate variables, you'll use the Sample tool, which processes each multidimensional slice. Unlike the Extract Multi Values to Points tool, which adds fields to the input points, Sample creates a new table or feature class in your geodatabase. Before you run this tool, you'll need to ensure there's a unique identifier in your badger_sample_set layer that you can use to join the results from the Sample tool back to the badger_sample_set layer.
In the Contents pane, right-click the badger_sample_set layer and choose Attribute Table.

In the table, the OID field acts as the unique identifier. This identifier was created when you converted the .csv file to a geodatabase table. OIDs, and other automatically assigned unique identifiers, can be reset or regenerated, so you'll calculate a new unique field to use when joining the tables together later.
In the attribute table, on the ribbon, click Calculate.
In the Calculate Field window, for Field Name, type joinID. For Field Type, choose Long.

For this calculation, you'll use Python Helpers, which provide commonly used code snippets.
In the Helpers pane, double-click Sequential Number.

The Helper is added to the Code Block field. By default, the sequential numbering will start at 1.
Click OK.

The joinID field is added to the end of the table. Now you're ready to run the Sample tool.
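The Sequential Number helper is, in essence, a counter that returns the next value each time it is evaluated for a row. The sketch below shows an equivalent idea in plain Python. It is not the exact helper code, and the names are illustrative.

```python
import itertools

def make_sequential_number(start=1, interval=1):
    """Return a function that yields start, start + interval, ... on each call."""
    counter = itertools.count(start, interval)
    return lambda: next(counter)

sequential_number = make_sequential_number()
join_ids = [sequential_number() for _ in range(5)]
print(join_ids)
```

Storing these values in a regular field, rather than relying on the object ID, keeps the identifier stable when the data is copied or exported.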
In the Geoprocessing pane, search for and open the Sample tool.
In the Sample tool, enter the following parameters and click Run:
- Input rasters: Bioclimate Baseline 1970-2000
- Input location raster or features: badger_sample_set
- Output table or feature class: badger_sample_bioclimatebase
- Unique ID field: joinID
- Process as multidimensional check box: checked
When the tool is done running, the badger_sample_bioclimatebase table is added to the Contents pane. You'll join it to the badger_sample_set layer.
In the Geoprocessing pane, click the back button. Search for and open the Join Field tool.
Enter the following parameters and click Run:
- Input Table: badger_sample_set
- Input Field: joinID
- Join Table: badger_sample_bioclimatebase
- Join Field: joinID
- Transfer Fields: Select all BC fields
The 19 bioclimate attributes are now added to the sample set. For more information on what each attribute represents and how to use this data, see the source publication from USGS, linked from the item details page in ArcGIS Online.
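Conceptually, the join copies fields from one table to another wherever the key values match. The following is a minimal dictionary-based sketch in Python, using invented rows and values, with BC_01 standing in for the full set of bioclimate fields. It illustrates the idea rather than the Join Field tool's implementation.

```python
def join_field(target_rows, join_rows, key, transfer_fields):
    """Copy transfer_fields from join_rows into target_rows where key matches."""
    lookup = {row[key]: row for row in join_rows}
    for row in target_rows:
        match = lookup.get(row[key])
        for field in transfer_fields:
            # Unmatched rows receive None, similar to a null value
            row[field] = match[field] if match else None
    return target_rows

# Invented sample rows and bioclimate values
samples = [
    {"joinID": 1, "Presence": 1},
    {"joinID": 2, "Presence": 0},
    {"joinID": 3, "Presence": 0},
]
bioclimate = [{"joinID": 1, "BC_01": 16.2}, {"joinID": 2, "BC_01": 12.8}]

joined = join_field(samples, bioclimate, "joinID", ["BC_01"])
print(joined)
```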

Use data engineering

Next, you'll use the Data Engineering tools to explore the data. With the Data Engineering tools in ArcGIS Pro, you can explore, visualize, clean, and prepare your data for analysis. In this section, you'll use Data Engineering tools to better understand the environmental variables you've extracted to your sample set.

In the Contents pane, right-click the badger_sample_set layer and choose Data Engineering.
The Data Engineering view opens. The type of data preparation you choose to do depends on the type of modeling you want to use to create your habitat suitability model. For example, if you're planning to use regression analysis, you can use the Transform tool to transform skewed data to a normal distribution.
In the Fields pane, click the landcover field. Hold the Shift key, and click the last bioclimate field, BC_19.
Drag the selected fields into the empty Statistics pane in the middle of the window.
The environmental data you've collected for your habitat modeling project is added to the Statistics pane.
In the Data Engineering pane, on the ribbon, click Calculate.
Statistics for the fields are calculated, including the mean, unique values, and outliers. You can use these statistics to start identifying patterns in your data.
In the Statistics pane, scroll to the Outliers column.
The field with the most statistical outliers is the slope field.
Right-click the Outliers record for the slope field and choose Select Outliers.
The outliers are selected on the map. A lot of the outlier points are in northern Spain in or near the Cantabrian and Pyrenees Mountains.
It makes sense that there would be steeper terrain here, but there are also a lot of points scattered throughout Spain. To visualize these values, you'll use a histogram.
In the Statistics pane, right-click the histogram for the slope field. Click Open Histogram.
The histogram for the slope field opens. The outliers that you selected are shown on the histogram. Using the histogram, you can see that all the outliers are on the high side, or in areas with steeper slope. Next, you'll look at BC_01, or average annual temperature.
On the Distribution of slope histogram, on the ribbon, click Clear Selection, then close the histogram.
In the Statistics pane, right-click the Chart Preview for BC_01 and choose Open Histogram.
The histogram opens. To understand what temperatures badgers might prefer, you'll select badger presence points.
On the ribbon, click the Map tab. In the Selection group, click Select by Attributes.
In the Select by Attributes window, clear any existing expressions and build the expression Where Presence is equal to 1. Click OK.
The presence points are selected on the map and in the chart.
Based on the chart, it appears that badgers prefer warmer temperatures.
Tip:
If you don't see both the selected data and nonselected data on the chart, on the chart ribbon, in the Filter group, ensure the Selection filter is turned off.
You can use the Data Engineering tools to examine the other bioclimate variables and make changes to the data as needed.
Clear the selection and close the chart.
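A common way to flag outliers like those you selected for the slope field is the interquartile range (IQR) rule: values more than 1.5 times the IQR below the first quartile or above the third quartile. The plain-Python sketch below applies that rule to invented slope values. Whether Data Engineering uses exactly these thresholds or this quartile method is an assumption here.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first/third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Invented slope values in degrees, with one unusually steep location
slopes = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 30.0]
print(iqr_outliers(slopes))
```

With skewed variables such as slope, this rule tends to flag only the high end of the distribution, which matches what the histogram showed.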
The last step before using the data for modeling is to add attribution. Keeping track of the data source is a good idea, and because you downloaded data with CC BY 4.0 licenses, you need to ensure attribution of the dataset.
On the ribbon, click the View tab. In the Windows group, choose Catalog View.
The Catalog view opens. The Catalog view and Catalog pane, which you've worked with so far in this tutorial, have many similarities, but metadata can only be edited in the Catalog view.
In the Catalog view, expand Databases and EuropeanBadger_Habitat.gdb, then click the badger_sample_set layer.
The layer's metadata appears. Currently, the metadata is empty except for the Geoprocessing history, which shows the Join Field and Calculate Field tools you ran.
On the ribbon, on the Catalog tab, in the Metadata group, click Edit.
The metadata editor opens.
Enter the following information in the Metadata pane:
- Title: European Badger Sample Dataset
- Tags: species modeling, Meles meles, European badger
- Summary (Purpose): This dataset was created in the Learn ArcGIS tutorial Sample species and environmental data for distribution modeling to model European badger (Meles meles) habitat in Spain.
- Description (Abstract): The European badger is an important species, providing three main ecosystem services: seed dispersal, topsoil disturbances and microhabitat creation. To model its habitat in Spain, animal observation data was downloaded from GBIF, recorded in the Presence field with a value of 1. Pseudo-absence or background points were generated and merged with the observation data. Environmental data, including slope, elevation, land cover, and bioclimate variables, were extracted to these points.
In the Credits section, enter the unique citation that was generated when you downloaded the GBIF data.
Tip:
This citation can be found either on the Download page or in the confirmation email you received from downloads@gbif.org.
At the bottom of the metadata editor, click New Bounding Box.
Enter the following coordinates:
- West: -17.7532431
- East: 5.6396581
- South: 26.8567504
- North: 44.3051478
On the ribbon, on the Metadata tab, click Save.
Close the metadata editor and save the project.


Now you have a dataset on the European badger that you can use for species distribution modeling. The dataset contains both presence and pseudo absence points as well as environmental data about the slope, elevation, land cover, temperature, and more. This information can be used in models such as MaxEnt or random forest prediction to do species distribution modeling.

Download species observation data: Access species observation data and add it to an ArcGIS Pro project. (45 minutes)
Map presence and pseudo absence points: Create a sample dataset of observation data and background points and add environmental data. (45 minutes)
