Create a training dataset

First, you'll establish a data-driven relationship between ocean measurements at a location and seagrass occurrence using random forest, a supervised machine learning method. To perform this analysis, you'll install the necessary Python libraries and clean the data. You'll start by installing the scikit-learn library using the ArcGIS Pro Python Package Manager, which makes installing Python libraries easy and ensures that their functionality can be used directly from ArcGIS Pro. Next, you'll prepare your data to be used in the predictive analysis. You'll create interpolation surfaces to estimate ocean measurements at 10,000 randomly created coastal locations around the United States.

Add Python packages

First, you'll install the Python libraries you'll use later for machine learning and data analysis. ArcGIS Pro includes a default conda environment, arcgispro-py3. The default conda environment includes several common packages, like ArcPy, SciPy, NumPy, and Pandas, among others. You can add and remove packages from this environment as needed. For this lesson, you will be adding two libraries: scikit-learn, a popular machine learning library, and seaborn, a statistical data visualization library.

  1. Download the SeagrassPrediction zipped folder.
  2. Locate the downloaded file on your computer and unzip it to your Documents folder.
    Note:

    It is important to save the file to your Documents folder because you'll need to use the exact file path in the Python script. If the file doesn't have the right path, the script will fail.

    The file contains an ArcGIS Pro package.

  3. Double-click the SeagrassPrediction.aprx file to open the project in ArcGIS Pro. If prompted, sign in using your licensed ArcGIS account.
    Note:

    If you don't have ArcGIS Pro or an ArcGIS account, you can sign up for an ArcGIS free trial.

  4. On the ribbon, click Project. On the left pane, click Python.

    Open Python Package Manager

  5. Scroll down the list of installed packages. If you already have scikit-learn and seaborn installed, skip to the next section.

    If you don't have scikit-learn and seaborn installed, you'll need to use a clone of the default environment to download them. The default environment can't be modified.

  6. Under Project Environment, click Manage Environments.

    The Manage Environments window opens.

  7. In the Manage Environments window, click Clone Default.

    The clone is typically named arcgispro-py3-clone. It may take a few minutes to clone the environment.

  8. Select the cloned environment and click OK.

    For the new environment to be recognized, you need to restart the program.

  9. Restart ArcGIS Pro.
  10. In the Package Manager, click Add Packages and type scikit in the search box.

    Add the scikit-learn package

  11. Click scikit-learn and click Install. In the Install wizard, accept the license agreement and click Install.
    Note:

    The image shown is an example, and a newer version of the package may be available. Install the latest version of the package. If you can't install scikit-learn, your software might have an earlier version of the package pinned. Under Versions, click the menu and choose 0.18.0, then follow the install procedure as written.

  12. If necessary, in the Conda_uac.exe window, click Run.
  13. When the library is done installing, type seaborn in the search box and click Install. Follow the prompts to install the library.
  14. Click Installed Packages and scroll down to make sure both libraries were added to the list.

    Installed Packages

  15. In the upper left corner, click the back arrow to return to the map view.
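
If you'd like a second check beyond the Installed Packages list, the following minimal sketch can be pasted into the Python window to confirm that both libraries can be found by the active environment. It uses only the Python standard library. The helper name missing_packages is made up for this example and is not part of ArcGIS Pro.

```python
import importlib.util

def missing_packages(names):
    """Return the module names in `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# scikit-learn is imported under the module name "sklearn".
required = ["sklearn", "seaborn"]
missing = missing_packages(required)
if missing:
    print("Not installed in the active environment: " + ", ".join(missing))
else:
    print("All required packages are available.")
```

If a name is reported as missing, check that the cloned environment is the active one and that ArcGIS Pro was restarted after you switched to it.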

Prepare input data

For your analysis, you need to know variables like ocean temperature, salinity, and nutrient concentration to predict the suitability of a location for seagrass growth. As a marine ecologist, you're working with field measurements to build your predictive model. Because field data is not perfect and often has missing values, you'll need to fill in values to complete the raw data before you use it in your analysis.

In the Contents pane, there are four feature classes:

  • EMU_Global_90m: Ecological Marine Unit point data that contains ocean measurements up to a 90-meter water depth.
  • Seagrass_USA: Polygon data for seagrass occurrence. Every polygon in Seagrass_USA is an identified seagrass habitat.
  • US_coastline_shallow: Polygon data for the United States coast that covers bathymetry up to the depth at which seagrass habitats are observed in Seagrass_USA.
  • bathymetry_shallow: Global shallow bathymetry polygon used to predict seagrass globally.
  1. Right-click EMU_Global_90m and choose Attribute Table.

    Open Attribute Table

    The attribute table opens. This feature class contains data from the Ecological Marine Unit dataset, and its attributes are the prediction variables to be used in random forest. Some of these variables include salinity, ocean temperature, and nitrate level. But you can see that this data contains lots of missing values.

    Missing values

    You'll replace the null data using the Fill Missing Values tool. This tool provides estimates for missing values in the data using spatial, spatiotemporal, or temporal neighbors. In this case, you'll estimate each missing value as the average of the 100 nearest neighboring points that have values.

  2. On the ribbon, click the Analysis tab and in the Geoprocessing group, click Tools.

    Tools

    The Geoprocessing pane opens.

  3. In the Geoprocessing pane, search for the Fill Missing Values tool. Choose the first result.

    Fill Missing Values tool

  4. For Input Features, expand the list and choose EMU_Global_90m.
  5. Rename Output Features to EMU_Global_90m_Filled.
  6. For Fields to Fill, click the drop-down arrow and check the following variables:
    • dissO2
    • nitrate
    • phosphate
    • salinity
    • silicate
    • srtm30
    • temp

    Add fields

  7. Click Add, and then for Fill Method, choose Average.
  8. For Conceptualization of Spatial Relationships, choose K nearest neighbors and set Number of Spatial Neighbors to 100.

    Fill Missing Values tool parameters

  9. Click Run.

    After the tool finishes, you will see a warning message at the bottom of the Geoprocessing pane explaining that no missing values were filled in for the Salinity, SRTM30, and Temp attributes. These attributes did not have missing values, but they will be needed later for analysis, so they were included to carry them over to the new output attribute table.

  10. Once the tool is finished, right-click EMU_Global_90m_Filled and open its attribute table.

    Filled values

    The Fill Missing Values tool labels each output field as either filled or unfilled (for example, DISSO2_FILLED or TEMP_UNFILLED). Fields marked as filled include values created by the tool, while fields marked as unfilled contain only original data. For the fields that you filled values for, the tool also creates two more columns with the suffixes _STD and _ESTIMATED. The _STD field shows the standard deviation of the neighboring data points used in estimating the missing value. The _ESTIMATED field shows a 1 if the value was filled by the tool and a 0 if the data already existed. Now you have spatially complete data for the oceanic variables needed.

  11. Close the attribute table.

    Filled data symbology

    The new data has been added to the map. Currently, it is symbolized with two points, a blue circle where new data values were added and an empty circle where there is only original data. You don't care that the data has been added, just that it is available, so you'll change the symbology.

  12. In the Contents pane, right-click EMU_Global_90m_Filled and choose Symbology.
  13. In the Symbology pane, for Symbology, expand the menu and choose Single Symbol.

    Single Symbol

    The layer redraws to show the new symbol. This symbol is randomly chosen, so you'll change it to something more uniform and easily seen.

  14. Next to Symbol, click the example.

    Example symbol

    The gallery opens.

  15. In the symbol gallery, choose Circle 1.

    Circle 1 symbol

  16. Click the Properties tab and set Size to 6 pt. Click Apply.
  17. Close the Symbology pane.
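
To make the averaging behind Fill Missing Values more concrete, here is a minimal sketch of a nearest-neighbor average fill in plain Python. It is not the tool's implementation. It assumes straight-line distances on a flat plane and a population standard deviation, and the function name fill_missing and the sample coordinates are made up for illustration. Each result mirrors the three pieces of information described above: the value, the neighbor standard deviation, and the estimated flag.

```python
import math
import statistics

def fill_missing(points, k):
    """Fill None values with the mean of the k nearest points that have values.

    points: list of (x, y, value) tuples, where value may be None.
    Returns a list of (value, neighbor_std, estimated_flag) tuples.
    """
    known = [(x, y, v) for x, y, v in points if v is not None]
    results = []
    for x, y, v in points:
        if v is not None:
            # Original measurement: nothing to estimate.
            results.append((v, None, 0))
            continue
        # Sort the measured points by straight-line distance to this location.
        nearest = sorted(known, key=lambda p: math.hypot(p[0] - x, p[1] - y))[:k]
        values = [p[2] for p in nearest]
        std = statistics.pstdev(values) if len(values) > 1 else 0.0
        results.append((statistics.mean(values), std, 1))
    return results

# Made-up nitrate measurements; the last point has no value.
sample = [(0, 0, 10.0), (1, 0, 12.0), (0, 1, 14.0), (10, 10, 50.0), (0.5, 0.5, None)]
filled = fill_missing(sample, k=3)
print(filled[-1])
```

With k set to 3, the distant point at (10, 10) is ignored, so the estimate reflects only the local neighborhood. The lesson uses 100 neighbors for the same reason on a much denser dataset.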

Create training data

Next, you'll create the training data that the random forest prediction model will need to form a relationship between seagrass occurrence and ocean conditions. The training dataset will be made up of seven predictor variables (ocean measurements) and one outcome variable (whether a location is a suitable seagrass habitat or not). To be easily accessible to the Python script you'll use later, these predictor variables need to be in a single feature class. You'll create a new feature class of random points, and then you'll add the ocean measurement data to each point.

  1. If necessary, check the boxes for EMU_Global_90m_Filled and Seagrass_USA to turn the layers on.
  2. On the ribbon, click the Map tab. In the Navigate group, click Bookmarks and choose Florida.

    Florida coast

    Most of the EMU_Global_90m_Filled data lies outside the seagrass observation layer. But if you use only the subsample of EMU_Global_90m_Filled that lies within the seagrass polygons, you'll have too few observations. You'll fix this by creating mock locations around the United States coastline and calculating the associated ocean measurements at these locations using EMU_Global_90m_Filled. Use the Create Random Points tool to create a set of random points around the United States coast.

  3. In the Geoprocessing pane, search for the Create Random Points (Data Management) tool.

    Create Random Points tool

  4. For Output Point Feature Class, type USA_Train.
  5. For Constraining Feature Class, expand the menu and choose US_coastline_shallow.
  6. Change Number of Points to 10000 and click Run.

    Create Random Points parameters

    Now you have a new feature class with 10,000 points, which will be helpful in training your random forest model quickly. The problem is that these points don't have attributes. To give these points data, you'll create continuous interpolation surfaces for the EMU_Global_90m_Filled data using empirical Bayesian kriging (EBK) so that you can extract the data from these layers at each USA_Train point. This extraction will let you save the interpolated ocean measurements for each point location.

  7. In the Geoprocessing pane, click the back arrow and search for Empirical Bayesian Kriging. Click the first result.

    Empirical Bayesian Kriging

  8. For Input Features, choose EMU_Global_90m_Filled.
  9. For Z value field, choose TEMP_UNFILLED and set the Output raster to temp.
    Note:

    The temperature attribute may also be listed under its alias, TEMP. If TEMP_UNFILLED isn't listed, check the EMU_Global_90m_Filled attribute table to double-check.

  10. Click Run.
  11. Rerun the tool for each of the remaining ocean measurements:

    Z value field                         Output raster
    DISSO2 (alias: DISSO2_FILLED)         dissO2
    NITRATE (alias: NITRATE_FILLED)       nitrate
    PHOSPHATE (alias: PHOSPHATE_FILLED)   phosphate
    SILICATE (alias: SILICATE_FILLED)     silicate
    SRTM30 (alias: SRTM30_UNFILLED)       srtm30
    SALINITY (alias: SALINITY_UNFILLED)   salinity

    Note:

    Make sure to use the names shown for output, because the code you will use later will look for these specific names. Ensure you don't confuse O and 0.

    Once the Empirical Bayesian Kriging tool is run for all seven ocean measurements, the next step is to extract the values for these measurements at USA_Train locations. All the surfaces should look similar to the one below, which shows the EBK model for nitrate concentration.

    Interpolated raster for nitrates

  12. In the Geoprocessing pane, click the back arrow and search for Extract Multi Values to Points (Spatial Analyst Tools).
  13. For Input point features, choose USA_Train.
  14. Next to Input rasters, click the drop-down arrow to expand the menu and add all seven of the interpolation rasters you just created.

    Extract Multi Values to Points parameters

  15. Click Run.

    This tool uses the interpolation rasters to extract the values of these surfaces at USA_Train locations.
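
As a rough illustration of what extracting raster values at points involves, the sketch below looks up the grid cell that contains each point and records one value per raster. It is a simplified stand-in, not the Spatial Analyst implementation. It assumes every raster shares the same origin and cell size, and the function names and the tiny 2 x 2 grids are invented for the example.

```python
def cell_value(grid, x_min, y_max, cell_size, x, y):
    """Return the value of the grid cell containing (x, y), or None if outside.

    grid is a list of rows, with row 0 at the top (north) edge.
    (x_min, y_max) is the upper left corner of the grid.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None

def extract_multi_values(points, rasters, x_min, y_max, cell_size):
    """Attach one value per raster to each point, keyed by raster name."""
    return [
        {name: cell_value(grid, x_min, y_max, cell_size, x, y)
         for name, grid in rasters.items()}
        for x, y in points
    ]

# Two made-up surfaces that share the same 2 x 2 grid.
rasters = {
    "temp": [[20.0, 21.0], [22.0, 23.0]],
    "salinity": [[34.0, 34.5], [35.0, 35.5]],
}
rows = extract_multi_values([(0.5, 1.5), (1.5, 0.5), (5.0, 5.0)], rasters,
                            x_min=0.0, y_max=2.0, cell_size=1.0)
print(rows)
```

The last point falls outside the grid, so it receives no values. In the lesson, every USA_Train point should fall inside all seven surfaces.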

Create a training label

The last step in creating the training dataset is determining where you already know seagrass grows. All the points in the training data lie within the US_coastline_shallow polygon, which overlaps with the Seagrass_USA polygons. You'll create a new field named Present and run a simple query to determine whether each point overlaps with a Seagrass_USA polygon. If it does, it is given the categorical value 1 to show that there is known seagrass growth at that location. All other points are given the value 0 to show that no seagrass has been observed there. Using this attribute, the machine learning model will be able to learn what combinations of ocean conditions are suitable for seagrass growth.

  1. In the Geoprocessing pane, search for Add Field (Data Management Tools).
  2. For Input Table, choose USA_Train, and for Field Name, type Present.
  3. Set Field Type to Double and click Run.

    Add Field

    This tool created an empty field in the USA_Train feature class named Present.

  4. In the Contents pane, right-click USA_Train and click Attribute Table. Make sure Present was added to the table.

    All the rows in the Present column have null data values. You want to assign points in USA_Train that fall in a seagrass polygon a value of 1, and assign a value of 0 for points that are not in a seagrass polygon. First, you'll change all the entries from null to 0.

  5. In the attribute table, right-click Present and choose Calculate Field.

    Calculate Field

    The Calculate Field menu opens in the Geoprocessing pane.

  6. For Input Table, choose USA_Train, and set Field Name to Present.
  7. Under Expression, for Present =, type 0 and click Run.

    Calculate Field parameters

    Now that the entire field is set to 0, you can find point locations that intersect the seagrass layer.

  8. On the ribbon on the Map tab, in the Selection group, click Select by Location.
  9. In the Geoprocessing pane, for Input Feature Layer, choose USA_Train.
  10. Make sure Relationship is set to Intersect, and for Selecting Features, choose Seagrass_USA and then click Run.

    Select Layer By Location parameters

    This tool selects rows in USA_Train that intersect Seagrass_USA polygons. These are the points that you'll give a value of 1.

  11. In the attribute table, right-click Present and choose Calculate Field. Set the parameters as listed below:
    • Input Table: USA_Train
    • Field Name: Present
    • Expression: Present = 1

    Calculate Present field

  12. Click Run.
  13. After the tool is finished running, on the ribbon on the Map tab, click the Selection group and click Clear.

    Clear selection

  14. Close the attribute table and save the map.
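
The labeling logic in the steps above (set every point to 0, then set points inside a seagrass polygon to 1) can be sketched in plain Python with a ray-casting point-in-polygon test. This is an illustration under simplifying assumptions, not how Select Layer By Location is implemented: polygons have no holes, points exactly on a boundary are not treated specially, and the square habitat and function names are made up.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; the last vertex connects to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def label_points(points, polygons):
    """Return 1 for points inside any polygon and 0 for all others."""
    return [
        1 if any(point_in_polygon(x, y, poly) for poly in polygons) else 0
        for x, y in points
    ]

seagrass = [[(0, 0), (4, 0), (4, 4), (0, 4)]]   # one made-up square habitat
present = label_points([(1, 1), (5, 5), (3.5, 0.5)], seagrass)
print(present)
```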

Next, you'll use the training data to create a model using the random forest classifier.


Perform random forest classification

Previously, you created a training dataset with eight variables that help determine suitability for seagrass habitats. Now that you have prepared your data, you'll use the machine learning libraries you downloaded to create a prediction model. First, you'll check the correlation of the variables to make sure a random forest classification is the best option. Random forest is a supervised machine learning method that requires training, or using a dataset where you know the true answer to fit (or supervise) a predictive model. Then, you'll split the data into two sections, one to train your random forest classifier, and the other to test the results it creates. Based on the accuracy of the results, you can apply the model to the global data you have and save it as a feature class.

Move your spatial data into Python

You'll use the ArcGIS Pro Python console to interact with the spatial training data you created in the previous lesson. First, you'll import the Python libraries that you'll use to build a predictive model and perform machine learning. Then, you'll bring your data into Python by converting it to structures that the libraries can manipulate. In the same way you need to have your data in the correct format, like a shapefile, to be read by ArcGIS, you need to have your data in arrays or data frames that can be read by Python.

  1. If necessary, open your SeagrassPrediction project.
  2. On the ribbon, click the Analysis tab, and in the Geoprocessing group, click Python.

    Open Python console

    The Python console opens within ArcGIS Pro.

  3. In the Python console, paste the following code and press Enter twice to run it.

    Pressing Enter twice after every step will run the code in small pieces.

    from sklearn.ensemble import RandomForestClassifier
    import numpy as NUM
    import arcpy as ARCPY
    import arcpy.da as DA
    import pandas as PD
    import seaborn as SEA
    import matplotlib.pyplot as PLOT
    import arcgisscripting as ARC
    import SSUtilities as UTILS
    import os as OS

    This code imports all the Python libraries you'll need for analysis.

    Loading libraries in the console

  4. Name the feature classes that contain the attributes you will use in your analysis. USA_Train and EMU_Global_90m_Filled are inconvenient to type, so name them inputFC and globalFC, respectively.
    inputFC = r'USA_Train'
    globalFC = r'EMU_Global_90m_Filled'
  5. Define the names of the prediction variables (ocean measurements) and the classification variable (seagrass presence). For each variable, type the attribute names that you want the variable to contain. Finally, create a variable that contains all the attributes you're using. Instead of typing all eight names again, concatenate the previous two variables, or link them together.
    Note:

    Ensure you are typing O or 0 where appropriate.

    predictVars = ['DISSO2', 'NITRATE', 'PHOSPHATE', 'SALINITY', 'SILICATE', 'SRTM30', 'TEMP']
    classVar = ['PRESENT']
    allVars = predictVars + classVar

    These variables will be added to the NumPy array data structure once the feature classes are read into Python. NumPy stands for Numerical Python and is a library for scientific computing. In this workflow, it provides the array structure that holds your feature class data and the numerical functions you'll use to calculate the model's accuracy.

  6. Use the ArcPy function FeatureClassToNumPyArray. For its arguments, type the input table and the field names from the input table that you want to use. Then use the ArcPy Describe function to read in the spatial reference of your training data feature class.

    This function brings your feature classes in ArcGIS Pro into Python as arrays.

    trainFC = DA.FeatureClassToNumPyArray(inputFC, ["SHAPE@XY"] + allVars)
    spatRef = ARCPY.Describe(inputFC).spatialReference

    The fields argument ["SHAPE@XY"] calls the location coordinates for each point in the USA_Train data (trainFC) and concatenates it with the list allVars. Your array now contains the coordinates of all the points in the training data as well as all the attributes individually associated with them. The spatial reference saves the projection of your original data in the metadata so that you can visualize it as a feature class if you export it back to a feature class.

  7. Now that your data is in Python arrays, convert it to a pandas data frame. With the first argument, specify which array you're converting, and with the second argument, define the attributes you want to include.
    data = PD.DataFrame(trainFC, columns = allVars)

    A data frame is a data structure, and pandas is a standard library used to create and reference the structure. Now that all your variables are formatted correctly in Python, you can start using them for analysis.

Choose classification and separate data

The classification scheme is one of the most important parts of creating an accurate prediction model. In statistics, you want to choose a method of processing the numbers that has the least possibility of accidental bias. To make sure that the method you're using, random forest classification, is a good option, you'll create a correlation chart for the variables you're using. Then, you'll separate the data points you created earlier into training and test sets. The script will use the training dataset to fit the model that predicts seagrass occurrence. Then, it will use the test dataset to determine the accuracy of the predictor.
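
The correlation chart is built from Pearson's correlation coefficient, which pandas calculates for you in the steps that follow. As a minimal sketch of what that number measures, the function below computes it by hand for two lists. The measurement values are invented purely to show a perfectly negative and a perfectly positive relationship.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the shared 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

temp = [4.0, 8.0, 12.0, 16.0, 20.0]       # made-up values
diss_o2 = [9.0, 8.0, 7.0, 6.0, 5.0]       # falls as temp rises
nitrate = [1.0, 3.0, 5.0, 7.0, 9.0]       # rises as temp rises
print(pearson(temp, diss_o2))
print(pearson(temp, nitrate))
```

Real ocean measurements will fall somewhere between these two extremes, which is what the colors in the heat map show.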

  1. Use the pandas .astype function to change the data type for easier computation. Then use the .corr() function to calculate the correlation between variables.

    Remember that in the last section, you stored all the attribute data in a pandas data frame named data.

    corr = data.astype('float64').corr()

    This function calculates Pearson's correlation coefficient between each pair of attributes. Correlation coefficients are a way of measuring the relationship between variables. All coefficients have a value between -1 and 1, with -1 showing a perfectly negative correlation (as variable A grows, variable B shrinks) and 1 showing a perfectly positive correlation (as variable A grows, variable B also grows). A correlation coefficient of 0 shows no linear relationship at all.

  2. Use the following code to plot the correlation coefficients as a correlation matrix between variables.

    The .heatmap function from the seaborn library defines the type of chart you want to use. The following arguments specify the appearance of the chart.

    ax = SEA.heatmap(corr, cmap=SEA.diverging_palette(220, 10, as_cmap=True),
    square=True, annot = True, linecolor = 'k', linewidths = 1)
    PLOT.show()

    Correlation chart

    The chart shows the correlation between the variables. High positive correlations are shown in bright red, which is why there is a diagonal line across the center. Each variable is highly correlated with itself. The dark blues show high negative correlations.

    Multiple predictors are highly correlated, either positively or negatively, which makes random forest a good method to use. Random forest can handle predictor variables that are dependent on each other in a way that minimizes bias.

    Now that you've confirmed that random forest is a suitable model for this data, you'll break the training data into two portions using random sampling.

  3. Close the window to continue the workflow.
  4. Define fracNum, the size of the sample you want to take, and then use the .sample function to take a random sample from the training dataset. Use fracNum as the sample size parameter.
    fracNum = 0.1
    train_set = data.sample(frac = fracNum)

    You now have a random sample of 10 percent of the training data. The rest will become the test set.

  5. Create the dataset test_set by using the .drop function to remove all the points from data that have already been assigned to the training dataset.
    test_set = data.drop(train_set.index)

    The seagrass data for the United States was divided into a training dataset and a test dataset. You specified that 10 percent of the available data should go into the training dataset to build your random forest predictor. The remaining 90 percent of the dataset will be used as a test of how accurately the model predicts seagrass occurrence.

  6. Create the variable indicator and use the .factorize function to encode the train_set class values as categorical codes. Set sort to True so that the codes follow the sorted class values: 0 stays 0 and 1 stays 1, regardless of which value happens to appear first in the random sample.
    indicator, _ = PD.factorize(train_set[classVar[0]], sort = True)

    Otherwise, Python would read the data as continuous data, meaning the model could return any value between 0 and 1. In other cases this is the norm, but a potential value of 0.5 or 0.32 wouldn't tell you anything about the presence of seagrass.

  7. Use the print command to show the sizes of the training and test datasets. Type labels for both and concatenate them with the string version of the variable.
    print('Training Data Size = ' + str(train_set.shape[0]))
    print('Test Data Size = ' + str(test_set.shape[0]))

    Print the size of the two datasets

    Note:

    Because you're using a different random sample taken from the data, your results will vary slightly.

    As a next step, you will be training a random forest classifier to form a relationship between your predictors and seagrass occurrence.
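
For reference, the sample-and-drop split from steps 4 and 5 can be sketched without pandas. The version below draws a random 10 percent of row identifiers and treats everything left over as the test set. The function name sample_and_drop, the seed, and the stand-in row identifiers are assumptions made for this example.

```python
import random

def sample_and_drop(row_ids, frac, seed=None):
    """Split row ids into a random sample and everything that is left over."""
    rng = random.Random(seed)
    n_train = round(len(row_ids) * frac)
    train_ids = set(rng.sample(row_ids, n_train))
    train = [i for i in row_ids if i in train_ids]
    test = [i for i in row_ids if i not in train_ids]
    return train, test

row_ids = list(range(10000))          # stands in for the USA_Train rows
train, test = sample_and_drop(row_ids, frac=0.1, seed=42)
print("Training Data Size = " + str(len(train)))
print("Test Data Size = " + str(len(test)))
```

Because the two sets never share a row, the accuracy measured on the test set reflects locations the model has not seen during training.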

Train your random forest classifier

Now that you have split your data, you'll train your random forest classifier using the training data you have created.
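
Before you run scikit-learn's classifier, it may help to see the core idea in miniature: many simple models are each trained on a random resample of the data, and their votes are combined. The sketch below is a heavily simplified stand-in, not scikit-learn's algorithm. It uses one-split decision stumps instead of full trees, and all function names and data values are made up for illustration.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for above in (0, 1):
            # Predict `above` when the value exceeds the threshold, else the other class.
            errors = sum(
                (above if row[feature] > threshold else 1 - above) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return feature, threshold, above

def fit_forest(rows, labels, n_trees, rng):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    forest = []
    n = len(rows)
    for _ in range(n_trees):
        picks = [rng.randrange(n) for _ in range(n)]      # sample with replacement
        feature = rng.randrange(len(rows[0]))             # random feature choice
        forest.append(fit_stump([rows[i] for i in picks],
                                [labels[i] for i in picks], feature))
    return forest

def predict(forest, row):
    """Majority vote across all stumps."""
    votes = Counter(above if row[f] > t else 1 - above for f, t, above in forest)
    return votes.most_common(1)[0][0]

# Made-up data: column 0 is temperature, column 1 is an uninformative value.
rows = [(5.0, 1.0), (6.0, 3.0), (7.0, 2.0), (18.0, 2.5), (20.0, 1.5), (22.0, 3.5)]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels, n_trees=25, rng=random.Random(0))
print([predict(forest, row) for row in rows])
```

A real random forest grows deep trees and considers a random subset of features at every split, which is why the lesson relies on RandomForestClassifier rather than code like this.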

  1. Create the variable rfco to hold a RandomForestClassifier model with 500 trees. Then use the .fit function to train the forest on the training data.
    rfco = RandomForestClassifier(n_estimators = 500, oob_score = True)
    rfco.fit(train_set[predictVars], indicator)
  2. Run the classification again using the test dataset. Create the attribute seagrassPred to store this data with a 1 for occurrence and a 0 for no occurrence.

    The test data is 90 percent of the United States coastal data that was not used to train the model, and will show the accuracy of your prediction.

    seagrassPred = rfco.predict(test_set[predictVars])
  3. Use the results of the classification to check the performance of the model by calculating prediction accuracy and estimation error.
    # .values returns the column as a NumPy array (as_matrix was removed from newer pandas versions)
    test_seagrass = test_set[classVar].values
    test_seagrass = test_seagrass.flatten()
    error = NUM.sum(NUM.abs(test_seagrass - seagrassPred))/len(seagrassPred) * 100
  4. Print the accuracy metrics of your data to make sure the model's predictions are working correctly.
    print('Accuracy = ' + str(100 - NUM.abs(error)) + ' % ')
    print('Locations with Seagrass = ' + str(len(NUM.where(test_seagrass==1)[0])) )
    print('Predicted Locations with Seagrass = ' + str(len(NUM.where(seagrassPred==1)[0])))

    Print accuracy metrics

    The script prints the accuracy, the number of test locations with known seagrass, and the number of test locations predicted to have seagrass. Approximately 95 times out of 100, the model correctly predicted whether seagrass occurs at a test location. With such a high accuracy rate, you can now train this model on the entire United States data and predict global seagrass locations.

  5. Use the .factorize function to create the variable indicatorUSA. As before, set sort to True so that the codes 0 and 1 match the original Present values.
    indicatorUSA, _ = PD.factorize(data[classVar[0]], sort = True)

    As when you created indicator, the .factorize function encodes the data as categorical.

  6. Define the variable rfco as the random forest model you're training using all the data from United States coasts. For the argument n_estimators, specify that you want to create 500 trees.
    rfco = RandomForestClassifier(n_estimators = 500)
    rfco.fit(data[predictVars], indicatorUSA)

    Now that the random forest model, rfco, is trained, you'll apply it to the EMU data for the world's coasts. The process for this is similar to the process you used to format the training data correctly.

  7. Read the global EMU data into Python as an array, and then use the Describe function to save the spatial reference of the feature class.
    globalData = DA.FeatureClassToNumPyArray(globalFC, ["SHAPE@XY"] + predictVars)
    spatRefGlobal = ARCPY.Describe(globalFC).spatialReference
  8. Convert the global data to a pandas data frame and run it through the rfco model to get the global predictions.
    globalTrain = PD.DataFrame(globalData, columns = predictVars)
    seagrassPredGlobal = rfco.predict(globalTrain)
  9. Use the NumPyArrayToFeatureClass function to store the prediction array as a feature class. Name the feature class and specify the geodatabase. Specify the input data and format of the output table.
    Note:

    Make sure to edit the outputDir location to your Documents folder where you unzipped the project. Replace your_username before running this piece of code. To easily find the correct file path, open the geodatabase in a File Explorer window, and copy the entire path.

    nameFC = 'GlobalPrediction'
    outputDir = r'C:\Users\your_username\Documents\SeagrassPrediction\SeagrassPrediction.gdb'
    grassExists = globalData[["SHAPE@XY"]][globalTrain.index[NUM.where(seagrassPredGlobal==1)]]
    ARCPY.da.NumPyArrayToFeatureClass(grassExists, OS.path.join(outputDir, nameFC), ['SHAPE@XY'], spatRefGlobal)
  10. Close the Python console.
  11. Save the map.

You've created a model that predicts whether seagrass occurs at a given coastal location around the globe. The array seagrassPredGlobal contains the prediction for each point as either 1 or 0, and the GlobalPrediction feature class stores the points predicted as 1. A value of 1 indicates suitability as a seagrass habitat, and 0 indicates an unsuitable location for seagrass growth. In modeling seagrass habitats, you are interested in the 1 values, locations where seagrass grows. In particular, you are interested in contiguous patches of locations that have a high density of 1 values.
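
The accuracy calculation used when you tested the model can be restated in plain Python. With labels of 0 and 1, each absolute difference between a known value and a prediction is 1 for a miss and 0 for a hit, so the sum counts the misses. The labels below are made up, and accuracy_report is a hypothetical helper name.

```python
def accuracy_report(actual, predicted):
    """Return (accuracy percent, actual positives, predicted positives)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # With 0/1 labels, each absolute difference is 1 for a miss and 0 for a hit.
    misses = sum(abs(a - p) for a, p in zip(actual, predicted))
    error = misses / len(predicted) * 100
    n_actual = sum(1 for a in actual if a == 1)
    n_predicted = sum(1 for p in predicted if p == 1)
    return 100 - error, n_actual, n_predicted

actual =    [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # made-up test labels
predicted = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]   # made-up model output
accuracy, n_actual, n_predicted = accuracy_report(actual, predicted)
print("Accuracy = " + str(accuracy) + " %")
print("Locations with Seagrass = " + str(n_actual))
print("Predicted Locations with Seagrass = " + str(n_predicted))
```

In this example, the counts of known and predicted seagrass locations are equal even though two predictions are wrong, so matching counts alone don't show that the same locations were identified.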


Evaluate the prediction result

Previously, you predicted seagrass occurrence around the world using the random forest classifier. As an ecologist, you know it is most cost-effective to protect locations with dense seagrass predictions. To find those areas, you'll add your prediction results to the map and use the Kernel Density tool to find locations around the globe where there are large concentrations of predicted seagrass. Finally, you'll insert a layout and add your map so that you can easily export it as a graphic to share.

Create a kernel density surface

First, you'll add the prediction feature class you saved from Python to the map, and then you'll calculate a density surface from it.

  1. If necessary, open your project.
  2. If necessary, click the View tab on the ribbon, and in the Windows group, click Catalog Pane.
  3. In the Catalog pane, click Project and expand Folders to locate the geodatabase.

    Seagrass geodatabase

  4. Right-click SeagrassPrediction.gdb and click Refresh.
  5. After the refresh is complete, expand SeagrassPrediction.gdb and drag GlobalPrediction onto the map.

    GlobalPrediction layer results of points with seagrasses

  6. In the Geoprocessing pane, search for the Kernel Density tool and choose Kernel Density (Spatial Analyst Tools).
  7. For Input point or polyline features, choose GlobalPrediction. For Output raster, type SeagrassHabitats and change the Output cell size to 0.2.

    Kernel Density tool

    To make sure that the density results do not include land, you'll use the bathymetry_shallow layer as a mask.

  8. Click the Environments tab, and for Mask, choose bathymetry_shallow. Click Run.

    The resulting density surface is a new raster containing the density of seagrass predicted along different coasts around the world. The default symbology shows seagrass growth with a purple color ramp. The dark purple areas indicating high concentration of predicted seagrass are visible, but the lighter areas are difficult to see. In the Contents pane, you can also see that the values have been symbolized using classification. The range of values has been divided into categories, or classes, shown with individual boxes. In this case, the classes aren't meaningful, so you'll symbolize the data using Stretch instead.

  9. If necessary, turn off all the layers except for SeagrassHabitats.
  10. Right-click the SeagrassHabitats layer and click Symbology.
  11. In the Symbology pane, expand the Symbology menu and choose Stretch as the symbology type.

    Instead of breaking the data into strict categories, Stretch takes all the values in your data and shows them as a relative place within a range.

  12. In the Symbology pane, expand the Color scheme menu and click Show All. Choose Heat Map 1.

    Heat Map 1 color ramp

    The map redraws to show predicted seagrass locations with the new heat map color ramp.

    Prediction color scheme

    While seagrasses grow in most shallow coastal areas, the areas in reds and yellows are locations that have the right ocean conditions to create large clusters of seagrass growth. Against the current basemap, the clusters don't stand out very well.

  13. On the ribbon, click the Map tab and change the basemap to Light Gray Canvas.

    Light Gray Canvas basemap

    The simple gray of the continents makes the seagrass layer stand out, which is what you want, because your seagrass findings are what this map should emphasize. The hot spots you found also fall on the easternmost and westernmost edges of the map, so you'll change the view to center these clusters.

  14. In the Contents pane, right-click Seagrass and choose Properties.

    Map Properties

  15. In the Map Properties: Seagrass window, click Coordinate Systems and check the Enable wrapping around the date line box. Click OK.

    Enable wrapping around the date line

  16. Pan the map so that the Alaska-Siberia cluster is in the center.

    Map centered on Alaska and Siberia
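
To see roughly what the density and Stretch steps compute, here is a minimal pure-Python sketch. It is not the Kernel Density tool's implementation. It assumes a quartic kernel of the kind Esri documents for the tool, uses a hypothetical handful of points in place of GlobalPrediction, and stands in for the bathymetry_shallow mask with a simple function. The function names, the points, and the search radius are illustrative assumptions; only the 0.2 cell size comes from the steps above.

```python
import math


def quartic_kernel_density(points, radius, xs, ys, mask=None):
    """Return rows of density values evaluated at cell centers.

    points: iterable of (x, y) tuples
    radius: search radius, in the same units as the coordinates
    xs, ys: cell-center coordinates for the columns and rows
    mask:   optional function (x, y) -> bool; cells where it is False get None
    """
    grid = []
    for y in ys:
        row = []
        for x in xs:
            if mask is not None and not mask(x, y):
                row.append(None)  # masked out, e.g. land
                continue
            total = 0.0
            for px, py in points:
                d = math.hypot(px - x, py - y)
                if d < radius:
                    # Quartic kernel: weight falls smoothly to 0 at the radius
                    total += (3.0 / math.pi) * (1.0 - (d / radius) ** 2) ** 2
            row.append(total / radius ** 2)
        grid.append(row)
    return grid


def minmax_stretch(grid):
    """Rescale unmasked values to 0..1 by relative position in the range."""
    values = [v for row in grid for v in row if v is not None]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [
        [None if v is None else (0.0 if span == 0 else (v - lo) / span) for v in row]
        for row in grid
    ]


# Hypothetical example: a 10 x 10 grid with the tutorial's 0.2 cell size
cell = 0.2
xs = [cell * (i + 0.5) for i in range(10)]
ys = [cell * (j + 0.5) for j in range(10)]
points = [(0.5, 0.5), (0.55, 0.45), (0.6, 0.5), (1.5, 1.5)]  # made-up locations

# Stand-in mask: treat everything east of x = 1.8 as land
density = quartic_kernel_density(points, 0.5, xs, ys, mask=lambda x, y: x < 1.8)
stretched = minmax_stretch(density)
```

In the stretched grid, the cell nearest the three clustered points gets the value 1.0 and cells with no points inside the search radius get 0.0, which is the continuous scale a color ramp such as Heat Map 1 is drawn across.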

Insert a layout

Now that you've found the areas that are most important to conserve, you'll put the map in layout form so that you can easily export your work to share in papers and presentations.

  1. On the ribbon, click the Insert tab and expand New Layout. Choose the first option under ANSI – Landscape, Letter.

    Letter layout

    A new layout opens. It is blank so that you can choose which of your maps you want to add.

  2. On the ribbon, on the Insert tab, click the Map Frame button. Choose the Seagrass map.

    Add Map Frame

  3. Click the upper left corner of the layout and draw a rectangle the size of the layout.

    The map frame is added to the layout. It will show any layers that you had turned on in your map.

  4. Right-click the map frame in your layout and choose Activate.

    Activate map frame

    When the map frame is activated, you can zoom and pan the map to the location you want.

  5. Pan the map until it is roughly centered.
  6. On the ribbon, under Activated Map Frame, click the Layout tab. In the Map group, click Close Activation.

    The map frame is deactivated and will stay stationary as you add the rest of the elements, such as a title and scale bar.

  7. On the ribbon, click the Insert tab. In the Text group, click Text (or Rectangle) and then click the map to insert a text box.
  8. If necessary, in the Contents pane, double-click the Text item to open the Format Text pane.
  9. In the Format Text pane, for Text, type GLOBAL SEAGRASS HABITATS and click outside of the text box to apply.

    Format title text

  10. Click the Text Symbol tab and click Properties, and then expand Appearance. Change Font name to Constantia, Font style to Bold, and Size to 24 pt.

    Format title text

  11. Click Apply and center the title at the top of the map.
  12. On the ribbon, on the Insert tab, click Scale Bar and draw a rectangle at the bottom left corner of the map.

    A scale bar is added to the layout.

  13. Drag the scale bar to the lower right corner of the layout.

    Map with scale bar

    Finally, you'll add a legend to give the prediction layer more context.

  14. On the ribbon, on the Insert tab, click Legend. Draw a rectangle at the bottom left of the map.

    By default, the legend shows the layer you currently have symbolized on the map. The text on the legend doesn't tell you anything useful, so you'll remove it.

  15. In the Format Legend pane, expand the Legend Items group and click Show Properties. Under Show, uncheck every box.

    The only thing visible on your legend is the color ramp. Because this is all you're choosing to show, you'll make it larger.

  16. If necessary, expand the Sizing group. Change Patch width to 30 pt and Patch height to 20 pt.

    Change patch size

  17. In the upper left corner of the Format pane, click the back arrow.

    Now you'll add your own labels to the legend.

  18. On the ribbon, on the Insert tab, click Text. Draw a text box at the top of the legend.
  19. In the Format Text pane, for Text, type High Density.
  20. Click the Text Symbol tab and expand Appearance. Change Font name to Constantia and Font style to Bold. Click Apply.
  21. Create another text box named Low Density and format it the same way.
  22. Line the labels up with the top and bottom of the color ramp, respectively.

    Final map layout

    Now that all your layout elements are in place, you'll share the map.

  23. On the ribbon, click the Share tab. In the Export group, click Layout.
  24. Save the map.

You can export the graphic in whatever format and size suits the presentation or paper where you plan to use it. The layout shows several important clusters of probable seagrass growth around the world, which you identified by using random forest classification to find hospitable ocean conditions. These clusters should be researched further and, ultimately, protected.
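
If you would rather script the export than use the Share tab, a sketch with the arcpy.mp module might look like the following. It assumes the layout kept the name Layout (yours may differ), that PDF is an acceptable format, and that 300 dpi is a suitable resolution. The function name and the example paths are hypothetical.

```python
def export_layout_to_pdf(aprx_path, out_pdf, layout_name="Layout", resolution=300):
    """Export one layout from an ArcGIS Pro project to a PDF file."""
    # arcpy ships with ArcGIS Pro; importing it here keeps this module
    # importable on machines where ArcGIS Pro is not installed.
    import arcpy

    project = arcpy.mp.ArcGISProject(aprx_path)
    layouts = project.listLayouts(layout_name)
    if not layouts:
        raise ValueError(f"No layout named {layout_name!r} in {aprx_path}")
    layouts[0].exportToPDF(out_pdf, resolution=resolution)
    return out_pdf


# Example call (hypothetical paths; run from the ArcGIS Pro Python environment):
# export_layout_to_pdf(
#     r"C:\path\to\SeagrassPrediction.aprx",
#     r"C:\path\to\GlobalSeagrassHabitats.pdf",
# )
```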

You can find more lessons in the Learn ArcGIS Lesson Gallery.