Prepare your analysis environment

In this analysis, you'll work with R and libraries within the ArcGIS ecosystem. To manage the language and libraries, you'll create a Conda environment specific to your needs for this project. Conda is an environment manager that is widely used for Python and R. In programming, an environment refers to the set of libraries, packages, languages, and other tools that are used in development. By creating project-specific environments, you can control which versions of libraries and software are used, without changing the preinstalled software on your machine.

Set up Conda and RStudio environments

In ArcGIS Pro, you can create and manage Conda environments that include libraries used by ArcGIS Pro. Within your ArcGIS Pro installation, you'll create a new Conda environment and install Microsoft R Open (MRO) and other libraries.

While it takes time to set up the environment, once it is in place you'll be able to easily perform multiple analyses using the combined power of ArcGIS Pro and R.

  1. On your desktop, create a new folder named WaterBuffaloENFA.
  2. Start ArcGIS Pro. If prompted, sign in using your licensed ArcGIS account.
    Note:

    If you don't have ArcGIS Pro or an ArcGIS account, you can sign up for an ArcGIS free trial.

  3. Click Settings.
  4. Click the Python tab.

    The Python Package Manager opens. The default Conda environment used in ArcGIS Pro is arcgispro-py3 and you cannot edit this environment. You'll clone the original environment and install all packages in the cloned environment.

  5. Click Manage Environments.
  6. Select arcgispro-py3 and click Clone.

  7. In the Clone Environment window, for Name, type water_buffalo_study.
  8. For Folder, browse to the WaterBuffaloENFA folder.

    All files, packages, and libraries for the project need to be in this folder directory.

  9. Click Clone.

    It might take up to a few minutes to clone the environment.

    Note:

    Ensure your path does not contain spaces. If your Desktop path contains spaces, you should create the WaterBuffaloENFA directory in another location.

  10. In the Manage Environments window, select the water_buffalo_study environment and click OK.

    When you switch environments, you are changing the libraries and packages that ArcGIS Pro has access to. You can also now install new packages or languages, such as Microsoft R, into the water_buffalo_study environment. Before you install packages, you'll restart ArcGIS Pro so it can use the new environment.

  11. Close and reopen ArcGIS Pro.
  12. Click Settings and click the Python tab.

    You'll first refresh the environment to ensure the latest ArcGIS Pro packages are installed. Then, you'll add the supplementary packages necessary for your project.

  13. Under Installed Packages, click Refresh.

  14. Click Add Packages. Search for mro-base, and choose mro-base version 3.5.1 or later in the result list. Click Install. If prompted, check the box to agree with the terms and conditions.

    The mro-base package may take up to a few minutes to install.

    The mro-base Conda package installs Microsoft R into the ArcGIS Pro Conda environment. The MRO version of R is preferred because of its performance. However, if you usually use another version of R, your work should not be affected because MRO will be contained in the project-specific Conda environment. Next, you'll set up the R-ArcGIS bridge, which enables better communication between R and ArcGIS Pro.

  15. Click the Options tab. In the Options window, click the Geoprocessing tab.
  16. If necessary, click the drop-down list and ensure the home directory matches the folder where your Conda environment was created. Click Install package then click Install package from the internet. If prompted to install a new version, click Yes.

    Install the arcgis binding.

    Now Microsoft R is installed in your water_buffalo_study Conda environment. Next, you'll set up and install RStudio and connect it to MRO. RStudio is an application which offers a user-friendly interface to write and run R scripts.

  17. Click OK and close ArcGIS Pro.
  18. If you do not have RStudio installed, download and install RStudio Desktop. Accept all defaults in the installation wizard.
    Note:

    The first step of the RStudio installation is to install the default R implementation, even though you already installed mro-base. This is required to satisfy RStudio's default requirements.

  19. Open RStudio Desktop.
  20. On the toolbar, click Tools and click Global Options.

  21. In the Options window, for R version, click Change.
  22. Select Choose a specific version of R and click Browse.
  23. Browse to Desktop > WaterBuffaloENFA > water_buffalo_study > Lib, click the R folder, and click Select folder.

  24. In the Choose R Installation window, click OK.
  25. Close RStudio.

Now that RStudio is configured, you'll begin your analysis in ArcGIS Pro by adding the necessary data layers.
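
The next time RStudio starts, a quick console check can confirm which R installation it is using. This is a minimal sketch that relies only on base R functions; the expectation that the reported folder sits inside water_buffalo_study is an assumption based on the steps above, and the variable names are illustrative.

```r
## Report the folder of the R installation that this session is running
r.home.dir <- R.home()
print(r.home.dir)

## Report the R version string for the same installation
r.version <- R.version.string
print(r.version)

## List the library folders that R searches for installed packages
print(.libPaths())
```

If the printed folder is not inside the WaterBuffaloENFA folder, the R version setting in Global Options is the first thing to recheck.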

Add Living Atlas data

African buffalo do not have many predators, and research indicates their range is determined by substrate characteristics, such as bedrock and soil, which in turn determine plant species and water geographies. You'll add six datasets from Esri's Living Atlas about the regional environmental characteristics and data on local watersheds and buffalo locations.

  1. Open ArcGIS Pro and click Map.
  2. In the Create a New Project window, for Name, type WaterBuffalo.
  3. For Location, browse to your desktop WaterBuffaloENFA folder. Click OK.

    You'll add vector layers for the current distribution of the buffalo observations and the local watersheds.

  4. On the ribbon, on the View tab, in the Windows group, click the Catalog Pane.
  5. In the Catalog pane, click the Portal tab and click All Portal.

  6. Search for and add Elefantes Incomati Watersheds.
    Note:

    To add a layer, right-click and choose Add to Current Map.

  7. Search for and add African Buffalo Locations owner:Learn_ArcGIS.

    The African Buffalo Locations layer contains locations where buffalo have been spotted throughout the region. The Elefantes and Incomati watershed basins will determine the study area for ENFA. They span three countries: South Africa, Mozambique, and Eswatini. Next, you'll add climate data.

  8. Search for World Bioclimates and add it to the map.
    Note:

    Ensure the layers you add are imagery layers. These layers have a specific icon.

    The World Bioclimates layer displays biologically influenced climate information. You'll also include the World Ecological Facets Landform Classes layer, which contains information on landforms such as mountains and plains; the World Lithology layer, which contains information about rock types based on chemistry, mineral composition, and physical properties; the Global Land Cover 1998-2018 layer, which classifies the Earth's surface as agricultural, developed, and so on; the World Distance to Water layer, which provides a rasterized estimate of the distance to water bodies; and the World Population Estimate 2016 layer, calculated by Esri to show population estimates based on each country's available census data and other relevant inputs.

  9. Search for and add these imagery layers:
    • World Ecological Facets Landform Classes
    • World Lithology
    • Global Land Cover 1998-2018
    • World Distance to Water
    • World Population Estimate 2016
      Note:

      Make sure not to choose World Population Density Estimate 2016.

    Next, you'll set up the geoprocessing environment in your project. You currently have climate and environmental data that spans the globe, so you'll set extent settings to only process data within your study area. You also have six rasters with varying cell sizes. For comparable analysis, you'll also set a uniform geoprocessing cell size.

  10. On the ribbon, on the Analysis tab, in the Geoprocessing group, click Environments.

    The Environments pane appears. The current workspace is set to the project geodatabase, WaterBuffalo.gdb, which is where you will output copies of the raster layers.

  11. For Output Coordinate System and Extent, choose Elefantes_Incomati_Watersheds.
  12. Under Raster Analysis, for Cell Size, type 928.

    The largest cell size of the six rasters rounds to 928. When cells are resampled during analysis, it is necessary to use the largest input cell size to ensure accuracy. When this happens, smaller cells are averaged together to create a single cell at the larger dimension.
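
    The averaging described above can be illustrated on a small grid. The following is a minimal sketch in base R, not the resampling routine that ArcGIS Pro runs; the function name block.mean and the 4-by-4 grid are hypothetical, and the actual resampling method depends on the tool and its settings.

```r
## Average square blocks of fine cells into single coarse cells.
## 'fact' is the number of fine cells per coarse cell along each axis.
block.mean <- function(m, fact) {
  stopifnot(nrow(m) %% fact == 0, ncol(m) %% fact == 0)
  ## Coarse row and column index for every fine row and column
  row.block <- rep(seq_len(nrow(m) / fact), each = fact)
  col.block <- rep(seq_len(ncol(m) / fact), each = fact)
  ## Group cells by coarse row and column, then take the mean of each group
  out <- tapply(m, list(row.block[row(m)], col.block[col(m)]), mean)
  unname(out)
}

## Hypothetical 4 x 4 grid of fine cells
fine <- matrix(1:16, nrow = 4, byrow = TRUE)

## Each 2 x 2 block becomes one coarse cell
coarse <- block.mean(fine, 2)
print(coarse)
```

    Here the top-left coarse cell is the mean of the fine cells 1, 2, 5, and 6, which is 3.5.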

    Tip:

    To check the cell size of a raster, open its Layer Properties window. Click the Source tab and expand Raster Information.

  13. Click OK.

    Some feature layers, which were added to the map as online hosted feature layers, need to be in your project's geodatabase for the analysis. This will allow them to be accessible later on when you run your R script. You'll run the Feature Class to Geodatabase tool to copy them.

  14. On the Analysis tab, in the Geoprocessing group, click Tools. Search for and select Feature Class to Geodatabase.
  15. For Input Features, choose African_Buffalo_Locations and Elefantes_Incomati_Watersheds.
  16. For Output Geodatabase, click Browse and choose WaterBuffalo.gdb.
  17. Click Run.

Connect to a geodatabase

For the purpose of this lesson, some raster data has been pre-processed to condense the total analysis time. This intermediary data is available as a File Geodatabase, which you'll download and connect to your project.

  1. Download Focal_Statistics_Results.gdb and unzip it into your project folder WaterBuffaloENFA\WaterBuffalo.
  2. In ArcGIS Pro, open the Catalog pane.
    Note:

    You can access the Catalog pane on the View tab in the Windows group.

  3. On the Project tab, right-click Databases and choose Add Database.
  4. Navigate to the WaterBuffalo folder and click Refresh. Select Focal_Statistics_Results.gdb and click OK.
    Note:

    The Refresh button is to the left of the search box.

    The File Geodatabase is now connected to your project. When performing your analysis in RStudio, the code will have access to the data stored there.

  5. Press Ctrl+S to save the project.

Your Conda and geoprocessing environments are now set up. You'll continue this analysis using R in RStudio. Thanks to the earlier setup steps, RStudio is currently connected to your ArcGIS Pro Conda environment. Because of the R-ArcGIS bridge, RStudio also has access to your data layers in this project.


Analyze environmental data in RStudio

Previously, you installed the R-ArcGIS bridge and added climate and environmental data for your ENFA analysis. Then, in ArcGIS Pro, you set a study area based on the locations of watersheds that intersect with African buffalo observations. Next, you'll build a model using R to determine the probability of buffalo locations using the climate and environmental data as inputs.

Build a probabilistic MaxEnt model

A maximum entropy (MaxEnt) model uses environmental variables and species presence points to explore the possible distribution under set constraints and outputs a prediction of habitat suitability based on the environmental and species variables. In other words, the analysis will seek to answer two questions: (1) In what type of habitat have African buffalos been observed in the past? And (2) where else in the region can you find a similar habitat type? You'll create this model using the raster and maxlike libraries in R.

  1. Open RStudio.
  2. On the toolbar, click File > New File > R Script.

    You'll write a new R script for this species model.

  3. Press Ctrl+S. Save the script as ENFA in your WaterBuffaloENFA folder.

    You'll start by installing the necessary libraries that will enable R to work with geospatial data, and then write a script to import the libraries and initialize the R-ArcGIS bridge.

  4. On the toolbar, click Tools and choose Install Packages.
  5. For Packages, type raster rgdal sf maxlike. Click Install.

    After the libraries are installed, you'll use the first few lines of your script to import them so that you can access their functionality.

  6. In the ENFA.R script window, type the following:
    ## Import R-ArcGIS Bridge
    library(arcgisbinding)
    ## Import Raster library
    library(raster)
    ## Import RGDAL for Raster Manipulation
    library(rgdal)
    ## Import SF Packages
    library(sf)
    ## Import Species Distribution Modeling library
    library(maxlike)
    ## Run Check Product to initialize R-ArcGIS Bridge
    arc.check_product()

    RStudio can run either a single line of code or a block of selected code. To check each line of code individually, place your cursor at the end of each line and click Run. You can also highlight a block and click Run. In this lesson, run your code after you type each chunk.
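
    If one of the library() calls fails, it can help to see which packages are missing before rerunning the script. The following is a minimal sketch using base R only; the variable names are illustrative and are not part of the lesson script.

```r
## Packages that the lesson script loads with library()
required.pkgs <- c("arcgisbinding", "raster", "rgdal", "sf", "maxlike")

## requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
is.installed <- vapply(required.pkgs, requireNamespace, logical(1), quietly = TRUE)

## Keep only the names of the packages that could not be loaded
missing.pkgs <- required.pkgs[!is.installed]

if (length(missing.pkgs) > 0) {
  message("Missing packages: ", paste(missing.pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```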

  7. Select all the code.
  8. On the Script window, click Run (or press Ctrl+Enter).

    Next, you'll set a root directory for the analysis.

  9. Below the previous text, type the following:
    ## Directory that contains the R script
    root.dir <- 'Path-to-project-folder'
    setwd(root.dir)
  10. Replace Path-to-project-folder with the path from the C drive to your project folder on the desktop.

    An example path may look like this: C:/Users/UserName/Desktop/WaterBuffaloENFA/WaterBuffalo. Make sure to replace all backslashes (\) with slashes (/) and to keep the single quotes around the path.
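
    If you copy the path from File Explorer, the backslashes can also be converted in R. This is a minimal sketch; the path is the example from this step, and the variable names are illustrative. Inside an R string, each backslash must be written twice.

```r
## A Windows-style path; each backslash is escaped as \\ in the R string
windows.path <- "C:\\Users\\UserName\\Desktop\\WaterBuffaloENFA\\WaterBuffalo"

## Replace every backslash with a forward slash (fixed = TRUE treats "\\" literally)
r.path <- gsub("\\", "/", windows.path, fixed = TRUE)

print(r.path)
## [1] "C:/Users/UserName/Desktop/WaterBuffaloENFA/WaterBuffalo"
```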

  11. Run the code.

    In the top right is the Environment tab. Whenever you declare a variable in your R code, such as the root.dir variable set to your project folder, it appears here. Next, you'll add the water buffalo presence locations and convert them to a matrix of x and y coordinates.

  12. Below the previous text, type the following:
    ## Read water buffalo presence data and environmental raster
    presence.points.loc <- file.path('WaterBuffalo.gdb', 'African_Buffalo_Locations')
    ## Convert data to spatial dataframe
    presence.points.arc <- arc.select(arc.open(presence.points.loc), sr = 3857)
    ## Get Occurrence Coordinates
    shape_info <- arc.shape(presence.points.arc)
    presence.locs <- cbind(shape_info$x, shape_info$y)
  13. Run the code.

    After the code completes, you'll notice four new variables (presence.locs, presence.points.arc, presence.points.loc, and shape_info) in your Environment tab. Next, you'll read all the environmental rasters stored in Focal_Statistics_Results.gdb. This geodatabase contains rasters that have been previously analyzed for this tutorial.
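
    The cbind call is what turns the separate x and y vectors into one coordinate matrix with a row per observation. The following minimal sketch shows the same step on made-up values; the three coordinate pairs and the .demo names are hypothetical and do not come from the lesson data.

```r
## Hypothetical x and y values, standing in for the result of arc.shape()
shape.info.demo <- list(
  x = c(3500000, 3510000, 3525000),
  y = c(-2900000, -2895000, -2910000)
)

## Bind the two vectors column-wise: column 1 holds x, column 2 holds y
presence.locs.demo <- cbind(shape.info.demo$x, shape.info.demo$y)

## One row per observation and two columns
print(dim(presence.locs.demo))
print(presence.locs.demo)
```

    The number of rows in this matrix is the number of presence points, which the script later reads with dim(presence.locs)[1].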

  14. Below the previous text, type the following:
    ## Read environmental rasters and stack them
    env.rasters.loc <- 'Focal_Statistics_Results.gdb'
    raster.gdb.data <- arc.open(env.rasters.loc)
    list.rasters <- raster.gdb.data@children$RasterDataset
    
    print('Display Environmental Rasters in Geodatabase')
    print(list.rasters)
  15. Run the code.

    When arc.open is called on a geodatabase, such as Focal_Statistics_Results.gdb, it returns a description of the geodatabase, including its contents. The command print(list.rasters) lists all the rasters it contains. You'll store the environmental and climate data layers from this geodatabase in a stack object. In R, a stack object is a collection of raster layers with the same spatial extent and resolution. Storing the rasters in a single stack simplifies the analysis.

  16. Below the previous text, type and run the following code:
    ## Create an R stack of environmental rasters
    raster.id <- 0
    ## Define an empty raster list for environmental rasters
    env.raster.list <- list()
    ## Define an empty list for raster names
    raster.name.list <- list()

    The three variables, raster.id, env.raster.list, and raster.name.list, are added to the Environment tab. Next, you'll loop through each raster in the geodatabase, read the data using arc.raster, convert it to an R raster, and then add it to env.raster.list if it contains enough variation.

  17. Below the previous text, type and run the following code:
    ## Loop through rasters in the geodatabase and add rasters with variation
    
    for (raster.name in list.rasters) {
      raster.dir <- file.path('Focal_Statistics_Results.gdb', raster.name)
      raster.data.arc <- arc.raster(arc.open(raster.dir))
      
      ## Convert arc raster into R spatial raster
      raster.data.R <- as.raster(raster.data.arc)
      ## Count the cells with values greater than zero in the current raster
      pixel.count <- cellStats(raster.data.R > 0, 'sum')
      ## Add raster to the list if there is enough variation
      if (pixel.count >= dim(presence.locs)[1]) {
        raster.id <- raster.id + 1
        ## Standardize the raster: subtract the mean, then divide by the standard deviation
        raster.norm.R <- (raster.data.R - cellStats(raster.data.R, 'mean')) / cellStats(raster.data.R, 'sd')
        env.raster.list[[raster.id]] <- raster.norm.R
        raster.name.list[raster.id] <- raster.name
      }
    }

    In the above code block, you read each raster from the Focal Statistics Results file geodatabase and converted it into the R spatial raster format. You then counted the cells in each raster with values greater than zero. A raster is added to the list only if that count is at least the number of buffalo location points. For example, if there are 100 buffalo observations, then at least 100 cells should contain values greater than zero. Each raster that passes this check is standardized, by subtracting its mean and dividing by its standard deviation, before it is stored.
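
    The two ideas in the loop, keeping only rasters with enough positive cells and rescaling each kept raster to zero mean and unit standard deviation, can be tried on plain vectors. This is a minimal sketch under stated assumptions: the three named vectors stand in for rasters, and their values, names, and the point count of 4 are all hypothetical.

```r
## Hypothetical cell values for three small "rasters", stored as plain vectors
demo.rasters <- list(
  elevation = c(120, 340, 560, 780, 910, 450),
  wetland   = c(0, 0, 0, 0, 7, 0),
  rainfall  = c(300, 420, 510, 390, 610, 480)
)

## Hypothetical number of presence points
n.points <- 4

kept <- list()
for (name in names(demo.rasters)) {
  values <- demo.rasters[[name]]
  ## Count the cells with values greater than zero
  positive.count <- sum(values > 0)
  ## Keep the layer only if it has at least as many positive cells as points
  if (positive.count >= n.points) {
    ## Standardize: subtract the mean first, then divide by the standard deviation
    kept[[name]] <- (values - mean(values)) / sd(values)
  }
}

## The wetland layer has a single positive cell, so it is dropped
print(names(kept))
## Each kept layer now has a mean of about 0 and a standard deviation of 1
print(sapply(kept, sd))
```

    The parentheses around the subtraction matter: without them, R divides the mean by the standard deviation first and subtracts that single number from every cell.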

    Next, you'll convert the list of rasters into a multidimensional raster stack.

  18. Below the previous text, type and run the following code:
    ## Create multi-dimensional raster stack from list of rasters
    env.raster.stack <- stack(env.raster.list)
    names(env.raster.stack) <- raster.name.list

    So far in your code, you've added the coordinates for water buffalo occurrences and added the stack of environmental rasters for the entire study area. Next, you'll use these two datasets to model potential water buffalo occurrence.

  19. Below the previous text, type and run the following code:
    ## Fit a probabilistic MaxEnt Model to Occurrence Data
    ## Build a model formula from the raster names
    math.exp.str <- paste(raster.name.list, collapse = "+")
    math.exp <- as.formula(paste("~", math.exp.str))
    ## Fit the MaxEnt model to occurrence and environmental covariates
    maxEnt.model <- maxlike(math.exp, env.raster.stack, presence.locs, link = "logit", hessian = TRUE, removeDuplicates = TRUE, savedata = TRUE)
    ## Compute the confidence intervals for model coefficients
    confint(maxEnt.model)
    ## Predict water buffalo occurrence probability for study area
    suitability.map <- predict(maxEnt.model)

    This may run for several minutes. If your output contains a warning message for "NaNs produced" you can ignore it. Next, you'll plot the predicted probability of water buffalo occurrences and the current occurrence.
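
    The first two lines of the model code build a one-sided formula from the raster names. The following minimal sketch shows that step in isolation; the three layer names are hypothetical placeholders, not the names of the rasters in the geodatabase.

```r
## Hypothetical raster names, stored in a list as in the lesson script
raster.names.demo <- list("bioclimate", "landform", "dist_water")

## Join the names with "+" to form the right-hand side of the formula
rhs <- paste(raster.names.demo, collapse = "+")
print(rhs)
## [1] "bioclimate+landform+dist_water"

## Prefix "~" and convert the text into a formula object
demo.formula <- as.formula(paste("~", rhs))
print(class(demo.formula))
## [1] "formula"

## The formula refers to one variable per raster name
print(all.vars(demo.formula))
## [1] "bioclimate" "landform"   "dist_water"
```

    Because the formula refers to layers by name, the names assigned to the raster stack need to match the names used here.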

  20. Below the previous text, type and run the following code:
    ## Plot histogram of occurrence probability
    hist(suitability.map)
    ## Plot occurrence probability and overlay observations
    plot(suitability.map)
    points(presence.locs[, 1], presence.locs[, 2])

    On the right, on the Plots tab, a probability map appears, with a scale indicating which locations are most suitable for the species. The green areas have the highest probability to be suitable for African buffalo. Finally, you'll write the probability map to ArcGIS Pro.
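
    The values on the map lie between 0 and 1 because the model was fit with link = "logit", which maps a weighted sum of the covariates to a probability. The following is a minimal sketch of that mapping for a single cell; the coefficients and covariate values are hypothetical and are not output from the fitted model.

```r
## Hypothetical model coefficients (not taken from maxEnt.model)
beta <- c(intercept = -1.0, dist_water = -0.8, landform = 0.5)

## Hypothetical standardized covariate values for one raster cell
covariates <- c(dist_water = 1.2, landform = 0.4)

## Linear predictor: intercept plus the weighted sum of the covariates
eta <- unname(beta["intercept"] + sum(beta[names(covariates)] * covariates))

## Inverse logit: 1 / (1 + exp(-eta)), available in base R as plogis()
prob <- plogis(eta)
print(prob)

## A linear predictor of zero corresponds to a probability of 0.5
print(plogis(0))
## [1] 0.5
```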

  21. Below the previous text, type and run the following code:
    ## Write the probability map to a geodatabase
    write.loc <- file.path('ENFA.gdb', 'suitability_P')
    ## Write the prediction raster
    arc.write(write.loc, suitability.map)

    The script is now complete. If you ran each code chunk as recommended, the analysis is already complete.

    If you encountered any issues running your script, you can download a complete script here. Unzip the file then open it in RStudio.

  22. Press Ctrl+S to save the script.

    Now that your script is saved, you could run it again, either one chunk at a time, or in one go using the shortcut Ctrl+Alt+R.

The suitability map and prediction rasters are now accessible in ArcGIS Pro to continue the analysis. In this section, you converted your data into R-readable formats, conducted a MaxEnt analysis, and then output your results back into ArcGIS Pro. Next, you'll determine the final conservation area that you will recommend for the eco-tourism project.

Define conservation regions

Previously, you used a MaxEnt model to analyze several rasters and buffalo location data. You'll now use the results of that analysis and the Locate Regions geoprocessing tool in ArcGIS Pro to define a contiguous area for preservation.

  1. Return to ArcGIS Pro.

    In the last section of your R script, you output the suitability_P layer to the ENFA.gdb. This layer will be the input for the Locate Regions tool, but you first need to assign it the same coordinate system as the rest of the data.

  2. On the ribbon, on the Map tab, in the Layer group, click the Add Data button.
  3. Navigate to your WaterBuffalo folder, double-click ENFA.gdb and select suitability_P. Then click OK to add it.
    Note:

    If ENFA.gdb does not appear, click the Refresh button.

    When the layer is added, you'll receive a warning that the coordinate system is unknown. You'll fix this by running the Define Projection tool.

  4. On the Analysis tab, in the Geoprocessing group, click Tools. If necessary, in the Geoprocessing pane, click the Back button. Search for and select the Define Projection tool.

    The other layers use the WGS 1984 Web Mercator (Auxiliary Sphere) coordinate system, which is also the coordinate system you'll choose for the new layer.

  5. For Input Dataset or Feature Class, choose suitability_P.
  6. If necessary, for Coordinate System, choose Current Map (it will appear as WGS_1984_Web_Mercator_Auxiliary_Sphere). Click Run.

    Now that the layer has a defined coordinate system, you can run the Locate Regions tool.

  7. In the Geoprocessing pane, click the Back button, and search for and select Locate Regions.
  8. Edit the following parameters:
    • For Input raster, choose suitability_P.
    • For Area units, choose Square map units.
    • For Output raster, type New_Conservation_Region.
    • For Number of regions, type 1.
    • For Region Shape, choose Square.
    • For Shape/Utility tradeoff (%), type 0.
    • For Evaluation Method, choose Highest value.
    • For Distance units, choose Map units.
    • For Input raster or feature of existing regions, choose African_Buffalo_Locations.

    Expand Region growth and search parameters. Then edit the following parameters:

    • For Number of neighbors to use in growth, choose Eight.
    • Ensure Islands not allowed in regions is checked.
  9. Click Run.

    It may take several minutes to run.

    The result indicates the areas that should be conserved. The original data contained patch areas based on observed African buffalo occurrences, but conservation requires planning for a larger contiguous area of suitable habitat. The Locate Regions tool achieves this by growing regions of high probability from the MaxEnt result to define a spatially contiguous area that is as large as possible. This map is now ready to be exported or shared as a web map with planning stakeholders.

In this lesson, you defined specific project parameters using Conda and geoprocessing environments. Then, you gathered relevant environmental and climate data and prepared it for modeling using MaxEnt in RStudio. Finally, you exported your results for final analysis in ArcGIS Pro and established a contiguous conservation area for the African water buffalo.

You can find more lessons in the Learn ArcGIS Lesson Gallery.