Set up the project
To get started, you'll download a project that contains all the data for this tutorial and open it in ArcGIS Pro.
- Download the Boat_Detection package.
A file named Boat_Detection.ppkx is downloaded to your computer.
Note:
A .ppkx file is an ArcGIS Pro project package and may contain maps, data, and other files that you can open in ArcGIS Pro. Learn more about managing .ppkx files in this guide.
- Locate the downloaded file on your computer.
Tip:
In most web browsers, it is downloaded to the Downloads folder.
- Double-click Boat_Detection.ppkx to open it in ArcGIS Pro. If prompted, sign in with your ArcGIS account.
Note:
If you don't have access to ArcGIS Pro or an ArcGIS organizational account, see options for software access. ArcGIS Pro version 3.2 or later is required for Text SAM workflows.
A map appears, centered on the Tuborg Havn neighborhood of Copenhagen, Denmark. An image layer, Tuborg_Havn.tif, displays on top of the topographic basemap.
- Zoom in and pan to examine the imagery.
Observe the numerous boats throughout the marinas and canals. This is aerial imagery that was orthorectified to remove any distortions. It is high resolution—each pixel represents a 20 by 20 centimeter square on the ground—and shows boats and other features quite clearly. It is in the TIFF format with three bands: red, green, and blue, which together form a natural color picture. It has a pixel depth of 8 bits.
Identifying all the boats manually in this image, and even more in all the marinas and canals of Copenhagen, would be time consuming. Instead, you'll use the Text SAM GeoAI model to detect them automatically.
Download the Text SAM model
To use the Text SAM model, you'll first download it to your computer. Text SAM is available on ArcGIS Living Atlas of the World, which is Esri's authoritative collection of GIS data and includes a growing library of deep learning models.
- Open ArcGIS Living Atlas in your web browser.
- On the ArcGIS Living Atlas home page, in the search box, type Text SAM.
- In the list of results, click Text SAM to open its item page.
- On the Text SAM item page, read some of the description and explore the page.
Text SAM is a multipurpose model that can be prompted using free-form text prompts to extract features of various kinds from imagery. For instance, text prompts might be airplane to detect airplanes, panel to detect solar panels, or red car to detect red cars. The output is a polygon layer representing the approximate outline of the objects detected.
The page also includes useful information on the expected input, which should be 8-bit, 3-band RGB imagery.
The model is a good match for the Copenhagen imagery used in this tutorial.
Tip:
You can learn more about Text SAM by reading the Text SAM: Extracting GIS Features Using Text Prompts article and the Text SAM: Use the model guide.
If you want to run this workflow on your own imagery, see the last section of this tutorial on tips to convert your data to the expected input format.
- At the top of the page, under Overview, click Download.
The model file downloads to your computer.
Note:
The model file is 1.75 GB and may take a few minutes to download.
- Locate the downloaded TextSAM.dlpk file on your computer and move it to a folder where you can easily find it, such as C:\GeoAI_models.
Tip:
You can also use Text SAM directly in an ArcGIS Pro geoprocessing tool without saving it first; however, the tool will then download a new copy of the model each time it runs. For that reason, it can be useful to store it locally.
Detect boats using Text SAM
You will now detect the boats present in your Copenhagen image. You'll use the Detect Objects Using Deep Learning geoprocessing tool, providing the copy of the Text SAM model you downloaded as one of its parameters.
Note:
Using the deep learning tools in ArcGIS Pro requires that the correct deep learning libraries are installed on your computer. If they are not installed, save your project, close ArcGIS Pro, and follow the steps in the Get ready for deep learning in ArcGIS Pro instructions. These instructions also explain how to check whether your computer hardware and software can run deep learning workflows, along with other useful tips. Once done, you can reopen your project and continue with the tutorial.
- On the ribbon, on the View tab, in the Windows group, click Geoprocessing.
- In the Geoprocessing pane, in the search box, type Detect Objects Using Deep Learning. In the list of results, click the Detect Objects Using Deep Learning tool to open it.
- In the Detect Objects Using Deep Learning tool, choose the following parameter values:
- For Input Raster, choose Tuborg_Havn.tif.
- For Output Detected Objects, type Detected_Boats.
- For Model Definition, click the Browse button.
You will now retrieve the Text SAM model.
- In the Model Definition window, browse to the location where you saved the Text SAM model, click TextSAM.dlpk, and click OK.
After a few moments, the model arguments load automatically. You will choose a text prompt that corresponds to the objects you want to detect.
- Under Arguments, for text_prompt, type boat.
Tip:
You could add more words to your text prompt, separated by commas, such as boat, yacht, canoe. In this case, however, the single-word boat prompt will give excellent results.
- Locate the batch_size argument.
Deep learning object detection cannot be performed on the entire image at one time. Instead, the tool will cut the image into small pieces known as chips. A batch size of 4 means that the tool will process four image chips at a time. As you run the tool, you may get an out of memory error because your computer doesn't have enough memory for that level of processing. In that case, try decreasing the batch_size value from 4 to 2 or even 1. If you have a powerful computer, you could also increase the batch_size value for faster processing. Changing the batch_size value will not affect the quality of the model, only the efficiency of the model's detection process.
For now, you'll keep the 4 default value.
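The chip-and-batch idea can be sketched in plain Python. The chip size of 1,024 pixels below is purely an illustrative assumption (the tool manages chipping internally, and the actual chip size depends on the model):

```python
import math

def count_chips(width_px, height_px, chip_px=1024):
    """Number of chips needed to tile an image (hypothetical chip size)."""
    return math.ceil(width_px / chip_px) * math.ceil(height_px / chip_px)

def batches(n_chips, batch_size=4):
    """Number of passes the model makes when processing chips in batches."""
    return math.ceil(n_chips / batch_size)

# For an example 8,000 x 6,000 pixel image: 8 x 6 = 48 chips,
# processed in 12 batches of 4 (or 24 batches of 2 if memory is tight).
chips = count_chips(8000, 6000)
print(chips, batches(chips), batches(chips, batch_size=2))
```

Halving the batch size doubles the number of passes but halves the memory needed per pass, which is why lowering batch_size resolves out of memory errors without changing the detections themselves.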
- For the nms_overlap argument, type 0.7.
Sometimes the model detects an object more than once. Non Maximum Suppression (NMS) is an optional process that suppresses duplicate detections: the object detected with the highest confidence is kept, and the others are removed. In the following example image, the boat was detected three times; with NMS, only one of these three polygons will be kept.
The nms_overlap argument determines how much overlap there must be between two detected objects for them to be considered duplicates of each other and for NMS to be applicable. Possible values for that argument are between 0 and 1. For instance, 0.7 means that the overlap should be 70 percent or more.
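To make the idea concrete, here is a minimal, hypothetical sketch of NMS on axis-aligned boxes, using intersection-over-union as the overlap measure (the model's actual polygon-based computation may differ):

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(detections, overlap=0.7):
    """Keep the highest-confidence box among duplicates.

    detections: list of (confidence, box) tuples.
    """
    kept = []
    for conf, box in sorted(detections, reverse=True):
        if all(iou(box, k) < overlap for _, k in kept):
            kept.append((conf, box))
    return kept

# Three overlapping detections of the same boat, as (confidence, box):
dets = [(0.9, (0, 0, 10, 4)), (0.6, (1, 0, 11, 4)), (0.3, (0, 0, 10, 5))]
print(nms(dets))  # only the 0.9 box survives at a 0.7 overlap threshold
```

With a higher threshold such as 0.9, fewer pairs count as duplicates, so more (possibly redundant) boxes survive.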
- Check the box next to Non Maximum Suppression.
With the Text SAM model, you can apply NMS during the Text SAM object detection process (that's the nms_overlap argument) or as a post-processing step (that's the Non Maximum Suppression checkbox). Through trial and error, it was found that the best results for this specific use case are obtained by choosing a high value for the nms_overlap argument (0.7) and by applying the Non Maximum Suppression post-processing option with its default settings.
Note:
Under Non Maximum Suppression, the Max Overlap Ratio parameter specifies the overlap for the post-processing NMS step. Just like nms_overlap, it can vary from 0 to 1. The default value of 0 means that as soon as two polygons have an overlap greater than 0, they will be considered duplicates.
- Keep the default values for all the other arguments.
Note:
For more information about the role of the different arguments, see the Text SAM guide.
- Click the Environments tab.
At this point, you could run the tool as is: it would detect boats over the entire Tuborg_Havn.tif image, which could take from 30 minutes to an hour depending on your computer's specifications. To keep this tutorial brief, you'll only detect boats in a small subset of the input image.
- On the ribbon, on the Map tab, in the Navigate group, click Bookmarks and choose Detection area.
The map zooms in to a smaller area of the Tuborg Havn marina.
- In the Geoprocessing pane, on the Environments tab, under Processing Extent, click the Current Display Extent button.
The Top, Left, Right, and Bottom coordinates update to match the current extent showing on the map.
- Under Processor Type, choose GPU. For GPU ID, type 0.
Note:
For this tutorial, it is assumed that your computer has an NVIDIA GPU. If it doesn't, choose CPU, but realize that the process will take much longer to run. To learn more about GPUs and how they are used for deep learning processes, see the Check for GPU availability section in the Get ready for deep learning in ArcGIS Pro tutorial.
- Accept all other default values and click Run.
You can monitor the tool's progress below the Run button, and you can click View Details to see more information.
After a few minutes, the result layer, Detected_Boats, appears in the Contents pane and on the map. It is a feature layer in which each polygon represents a boat.
Tip:
If you get an out of memory error, try decreasing the batch_size value from 4 to 2 or even 1 and run the process again.
You successfully detected boats in an area of Tuborg Havn using Text SAM.
Note:
The Text SAM deep learning algorithm is not deterministic, so the results may vary slightly each time you run the tool.
Also, the color is assigned at random and may vary.
- On the Quick Access toolbar, click the Save button to save your project.
Style the result layer
You'll now examine the results in the Detected_Boats layer and refine them. First, you'll change the symbology of the layer to better see the detected objects.
- In the Contents pane, click the Detected_Boats symbol to display the Symbology pane.
- In the Symbology pane, if necessary, click the Properties tab.
- Under Appearance, set the following parameters:
- For Color, choose No color.
- For Outline color, choose a bright red, such as Fire Red.
- For Outline width, select 2 pt.
- Click Apply.
The layer updates to the new symbology.
- On the map, zoom in and pan to inspect the Detected_Boats layer.
You can observe that the model was successful in detecting the boats, showing an approximate outline of each boat. However, there are a few cases of false positives—where the model mistakenly found a boat where there is none, as seen in the following image example.
Refine the results
You'll now learn how to refine the results and remove the false positives.
- In the Contents pane, right-click the Detected_Boats layer and choose Attribute Table.
In the Detected_Boats attribute table, each row corresponds to a detected boat feature. There are currently 76 features.
Note:
The number of features you obtained might be slightly different.
You'll focus on the following two fields:
- Confidence—This field indicates with what confidence level the model identified each feature as a boat (as a percentage).
- Shape_Area—This field indicates the area of each feature (in square meters).
You will first examine features that may be too small to be boats.
- Double-click the Shape_Area field name to sort the attribute table by that field.
The features are now listed from smallest to largest area.
- Double-click the row header for the first feature to zoom in to it on the map and examine it.
You can see on the map that this feature is only a few pixels wide and is not a boat.
- Similarly, review a few of the next features in the list to determine what area is large enough to represent actual boats.
An area of 9 square meters seems to be the threshold. Next, you will examine features that have the lowest confidence.
- Double-click the Confidence field name to sort the attribute table by that field.
The features are now listed from lowest to highest confidence. The first features in the list have a confidence of about 20 percent, which is very low.
- Double-click the first few features to zoom in to them and examine them.
- Continue to examine more features to determine at what level of confidence they actually start representing boats.
You find that a confidence of about 28 percent seems to be the threshold. You will now create a copy of the Detected_Boats layer that contains only the features that have a high enough confidence and are large enough to be boats.
Tip:
You could delete the unwanted features manually from the Detected_Boats layer, but it can be useful to keep that layer intact and derive a new layer instead, in case you want to continue exploring your original results.
- On the ribbon, on the Map tab, in the Selection group, click Clear to deselect all features.
- In the Contents pane, right-click Detected_Boats, click Data, and choose Export Features.
- In the Export Features pane, for Output Feature Class, type Detected_Boats_Cleaner.
- Expand Filter, and form the expression Where Confidence is greater than 28.
- Click Add Clause, and form the second expression And Shape_Area is greater than 9.
- Click OK.
The new layer is added to the map.
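The export's filter is equivalent to this small Python sketch. The sample records below are made up; the thresholds of 28 percent and 9 square meters come from the attribute inspection earlier in this section:

```python
# Each record mimics a row of the attribute table:
# Confidence (percent) and Shape_Area (square meters).
detections = [
    {"Confidence": 95.0, "Shape_Area": 24.5},  # clearly a boat: kept
    {"Confidence": 21.0, "Shape_Area": 30.0},  # low confidence: dropped
    {"Confidence": 80.0, "Shape_Area": 4.0},   # too small: dropped
    {"Confidence": 45.0, "Shape_Area": 12.0},  # kept
]

cleaner = [d for d in detections
           if d["Confidence"] > 28 and d["Shape_Area"] > 9]

print(len(cleaner))  # 2 of the 4 sample records survive the filter
```

Both conditions must hold, which is why the Export Features filter joins the two clauses with And.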
- In the Contents pane, click the box next to Detected_Boats to turn the layer off.
- Close the Detected_Boats attribute table to increase the size of the map.
- On the Map tab, click Bookmarks and choose the Detection area bookmark to come back to the full detection extent.
- On the map, review the Detected_Boats_Cleaner layer.
Most of the false positives are now gone.
Note:
Now that you know the optimal confidence threshold for your data is 28 percent, you could change the box_threshold model argument in the Detect Objects Using Deep Learning tool from 0.2 to 0.28 if you run the tool again. This way, false positives with a confidence between 20 and 28 percent are dropped from the results. (The box_threshold argument determines the minimum Confidence value accepted in the output.)
However, there is no equivalent for the area threshold, so that part must remain a post-processing step.
- In the Contents pane, right-click the Detected_Boats_Cleaner layer and choose Attribute Table.
There are now 66 boats left in the layer.
If you were to use Text SAM to detect boats in every Copenhagen neighborhood, the boat count could be summarized in a graph by neighborhood. The Detected_Boats layer could also be used to create a hot spot map showing the boat concentration levels throughout the city. Finally, this analysis could be repeated regularly on new imagery to identify patterns and change over time.
- Press Ctrl+S to save your project.
Apply Text SAM to your own imagery
If you want to apply Text SAM to your own data, below are a few tips to help you be successful.
- Preparing the imagery—The Text SAM model is expecting three-band imagery (red, green, and blue or RGB). If your imagery has more than three bands, you should extract the relevant bands before applying Text SAM. The model also expects the imagery to have an 8-bit pixel depth. If your imagery has a different pixel depth, such as 16 bit, you should convert it to 8 bit. See the Select relevant imagery bands section in the Improve a deep learning model with transfer learning tutorial for step-by-step instructions on how to implement these changes.
- Finding information about your imagery—If you are not sure what your imagery's properties are (such as number of bands, pixel depth, or cell size), in the Contents pane, right-click your imagery layer, and choose Properties. In the Layer Properties window, click the Source tab, and under Raster Information, find the Number of Bands, Cell Size X, Cell Size Y, and Pixel Depth.
- Original workflow—When your imagery is ready for the object detection process, first try using the Text SAM workflow exactly as you learned in this tutorial. This is the simplest approach, and you may immediately obtain high-quality results.
- Changing the cell size—If you are not fully satisfied with your first results, you could try varying the Cell Size value in the Detect Objects Using Deep Learning environment parameters. The cell size (in meters) should be chosen to maximize the visibility of the objects of interest throughout the chosen extent. Consider a larger cell size for detecting larger objects and a smaller cell size for detecting smaller objects. For example, set the cell size for cloud detection to 10 meters, while for car detection, set it to 0.3 meters (30 centimeters). While your input image will not change, the tool will resample the data on the fly during processing. For further information regarding cell size, refer to Multiresolution Object Detection with Text SAM and Pixel size of image and raster data.
- Using a mask—When detecting objects in specific areas of interest, such as boats appearing only in water-covered areas, it can be useful to set a Mask in the Detect Objects Using Deep Learning environment parameters. A mask is a polygon (or raster) layer that delineates the areas of interest for the analysis, for instance, water area boundaries throughout Copenhagen, or perhaps specific marinas if these are the sole targets of your study. When the tool runs, processing will only occur on locations that fall within the mask, saving time and avoiding false positives outside the mask.
- Using an object-specific pretrained model—Besides using Text SAM, there is another time-effective approach to detecting objects with GeoAI in ArcGIS: you can use one of the dozens of pretrained models released by Esri, where each model focuses on a single object type, such as trees, buildings, or solar panels. To learn more, see the Detect objects with a deep learning pretrained model tutorial.
- Land cover and other types of pixel-level classification—Text SAM is meant to detect discrete objects that are relatively compact and distinct from their context (such as boats surrounded by water). If you want to extract land use and land cover (LULC) information, or perform other types of pixel-level classification, you shouldn't use Text SAM (or other similar models, such as SAM). Instead, consider other pretrained models from ArcGIS Living Atlas, such as High Resolution Land Cover Classification – USA or Land Cover Classification (Sentinel-2). To learn more, see the Extract high-resolution land cover with GeoAI tutorial.
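The pixel-depth conversion mentioned in the first tip above can be illustrated with a minimal sketch: a linear rescale of 16-bit values (0 to 65,535) into the 8-bit range (0 to 255). This is a conceptual illustration only; in ArcGIS Pro you would typically perform the conversion with the Copy Raster tool or a raster function, which also offer stretching options:

```python
def to_8bit(value_16bit):
    """Linearly rescale one 16-bit pixel value (0-65535) to 8 bits (0-255)."""
    return round(value_16bit * 255 / 65535)

# The extremes of the 16-bit range map to the extremes of the 8-bit range,
# and a mid-range value lands near the middle.
print(to_8bit(0), to_8bit(65535), to_8bit(32768))
```

A plain linear rescale preserves relative brightness but discards fine radiometric detail; a contrast stretch during conversion can make features such as boats stand out more clearly in the 8-bit result.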
In this tutorial, you downloaded the Text SAM GeoAI model from the ArcGIS Living Atlas website and used it to detect boats in an image. You then used attribute filters to remove false positive features. Finally, you learned a few tips to successfully apply this workflow to your own imagery.
You can find more tutorials like these in the Try deep learning in ArcGIS series.