List and describe datasets with Python

Open a project and review datasets

Before you begin using Python to list and describe datasets, you will download and extract a .zip file of the project data and review the datasets in ArcGIS Pro.

  1. Download the data for this tutorial and extract the contents to a location on your computer.

    The .zip file contains a folder named PythonDesc.

    In this tutorial, the data is shown at C:\Lessons\PythonDesc\. You can use a different folder, but be sure to adjust the paths in the instructions and code that follow.

  2. Start ArcGIS Pro.
  3. In ArcGIS Pro, under New Project, click Start without a template.

    Start without a template.

    A new project called Untitled opens. You will only use ArcGIS Pro to review the data, so there is no need to rename the project and save your work.

  4. If the Catalog pane is not already visible, on the View tab, click Catalog pane.
  5. In the Catalog pane, right-click Folders and click Add Folder Connection.

    Add folder connection.

  6. In the Add Folder Connection window, browse to the location where you extracted the PythonDesc.zip file (C:\Lessons), click the PythonDesc folder, and click OK.
  7. In the Catalog pane, expand Folders and expand PythonDesc.

    PythonDesc folder contents

    The folder contains two shapefiles (bike_routes.shp and watersheds.shp), as well as a text file (bike_racks.csv) and a database table (gardens.dbf). It also contains one geodatabase named DC.gdb.

  8. Expand DC.gdb.

    DC.gdb expanded.

    The geodatabase contains three feature classes (boundary, neighborhoods, and public_schools), one table (car_share_locations), and one feature dataset (Transportation).

  9. Expand Transportation.

    Transportation dataset expanded.

    The feature dataset contains four feature classes (roads, street_lights, traffic_analysis_zones, and traffic_junctions) and one network dataset (traffic).

    The datasets are typical of GIS projects: feature classes and tabular data in different formats, as well as other elements to organize this data. You will use Python code to identify these datasets based on their type and other properties. Notice that these groupings mean that the project data has a multilevel, nested structure.

    Before working with Python code, you will also explore these datasets by looking at the actual files on your computer.

  10. Open Microsoft File Explorer and browse to the C:\Lessons\PythonDesc\ folder, or the other location where you put the PythonDesc folder.

    Folder contents in Windows File Explorer

    File Explorer shows you the files. The two stand-alone tables, bike_racks.csv and gardens.dbf, are single files. The bike_routes and watersheds shapefiles are each composed of multiple files with the same name and different file extensions. The DC geodatabase is a folder with a .gdb file extension.

  11. In File Explorer, expand the DC.gdb folder.

    The geodatabase contents

    There are dozens of files inside this folder. There is no clearly recognizable link between these files and the data elements that are visible when viewing the geodatabase in the Catalog pane in ArcGIS Pro. For example, you cannot identify one or more files that constitute one specific feature class. This is important when using GIS datasets in Python code since it affects how to access and work with datasets in different formats.

  12. You have examined the organization of the data in ArcGIS Pro and in Windows File Explorer. Next, you'll open a Python file and begin to use Python to list and describe this data.

Open a Python script file

In this tutorial, you will write Python code using IDLE. IDLE is a basic Python editor that is included with ArcGIS Pro.

To ensure that your code runs with the version of Python that is installed with ArcGIS Pro, you will use a shortcut to launch IDLE.

If you also have ArcGIS Desktop (ArcMap) installed, the context menu will also include the shortcut Edit with IDLE. Do not use this shortcut; it will open an older version of Python.

  1. In File Explorer, go back up a level to the C:\Lessons\PythonDesc\ folder.
  2. Right-click describe_data.py and click Edit with IDLE (ArcGIS Pro).

    Edit with IDLE (ArcGIS Pro)

    Opening IDLE this way opens the script using the active Python environment of ArcGIS Pro. This ensures the correct version of Python is used. While this tutorial uses IDLE as the code editor, the code also works in other Python editors, such as PyCharm or Spyder, as well as in the Python window or Notebooks inside ArcGIS Pro.

    Note:
    If Edit with IDLE (ArcGIS Pro) is not visible in the context menu, click Start, expand ArcGIS, and click Python Command Prompt. In the Python Command Prompt window, type idle and press Enter. The IDLE (Python 3.7 Shell) appears. Click File and click Open. Browse to and open the describe_data.py file.

    The script opens.

    Script open in IDLE.

    The script includes three lines of code to get started. The first line is import arcpy, which imports the ArcPy package. This ensures the functionality of ArcPy, including functions you will use to list and describe GIS datasets, is available in the script.

    The second line specifies the path where the data resides. This value is assigned to a variable called mypath.

    A single forward slash (/) is used in the path instead of a single backslash (\). In Python, a single backslash is an escape character: when followed by certain other characters, it encodes special behavior, so using it in paths can have unintended consequences. For example, \t means a tab, so a path such as C:\Lessons\toronto.gdb would end up with a tab character in place of \t, causing an error.

    Instead of one forward slash, you can also add the letter r (for raw) before the string (r"C:\Lessons\PythonDesc") or use double backslashes ("C:\\Lessons\\PythonDesc"). All three notations will work.

    The third line sets the workspace. A workspace is the default location for the files you will be working with, such as input and output datasets. The workspace is set to the mypath variable. While it is possible to set the workspace directly without using the mypath variable, there are benefits to using a separate variable for the path where the data resides.

  3. If you saved the PythonDesc folder in a location other than C:\Lessons\PythonDesc\, edit the path in the script to point to the location where you saved the folder.

    For example, if you extracted the data for this tutorial into the folder C:\EsriLessons, you should edit the second line to be as follows:

    mypath = "C:/EsriLessons/PythonDesc"

    If your path is relatively long, you can copy the path from File Explorer and paste it into the script. You can do this by right-clicking the folder (Windows 11) or holding down the Shift key and right-clicking the folder (Windows 10), and clicking Copy as Path. You can then select and delete the old path from the script and press Ctrl+V to paste the new path.

    To use a path as a valid workspace, it must be a Python string, which means it must have quotes around the path. You also need to use one of the three correct styles for paths mentioned above. If you copy a Windows path, it will contain backslashes. You can replace every single backward slash with two backward slashes or a single forward slash. For long paths, the easiest solution is to add an r in front of the string to make it a raw string. This ensures the backward slashes are not read as escape characters.

    Now you'll test your code.

  4. Click File and click Save.
  5. Click Run and click Run Module.

    The IDLE Shell window appears, showing a message that indicates the shell has restarted for your script. The script path and file name are shown. There is a pause of several seconds, and then the IDLE Shell prompt (>>>) returns.

    shell restarts while script runs.

    It may seem as though nothing happened, because the script does not have any instructions that print data to the shell. Later, you will add print statements to your code to return information to the shell.

    The IDLE Shell is also a place where you can type lines of code and have them run immediately.

  6. In the IDLE Shell, after the prompt >>>, type print(arcpy.env.workspace) and press Enter.

    Print the workspace.

    The path to the workspace that you set in the script is printed to the IDLE Shell. The script imported the arcpy module, set the mypath variable equal to a string that contained the path to the tutorial data, and set the workspace for arcpy to that path. By printing the value of arcpy.env.workspace in the shell, you have shown that the script worked.

  7. Close the IDLE Shell window.

With the workspace set, you can reference a dataset by using its file name without the full path.
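
The three path notations described earlier can be compared in plain Python, without ArcPy. The following is a minimal sketch for illustration only: the variable names are made up for this example, and the standard library module ntpath (the Windows flavor of os.path) is used so that the comparison behaves the same on any operating system.

```python
import ntpath  # Windows path rules from the standard library, usable on any OS

# Three ways to write the same Windows path as a Python string.
forward = "C:/Lessons/PythonDesc"      # single forward slashes
raw = r"C:\Lessons\PythonDesc"         # raw string: backslashes are kept as typed
doubled = "C:\\Lessons\\PythonDesc"    # each backslash is escaped by a second one

# The raw and doubled notations produce the identical string.
print(raw == doubled)

# Normalizing the forward-slash version with Windows rules gives the same path.
print(ntpath.normpath(forward) == raw)

# Without the r prefix, \t is read as a tab character. The first backslash
# is doubled here so that only the \t sequence is affected.
broken = "C:\\Lessons\toronto.gdb"
print("\t" in broken)
print(repr(broken))
```

The last two lines show why a pasted Windows path can fail: the string silently contains a tab where the folder separator and the letter t were intended.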

Describe a dataset using Python

Now that the script sets the workspace, you can use Python to describe the properties of a dataset in that workspace.

You will use the da.Describe() function to do this.

  1. In the IDLE editor window, click after the arcpy.env.workspace = mypath line and press Enter to add a new line.
  2. Type the following code and press Enter:

    desc = arcpy.da.Describe("bike_routes.shp")

    Describe bike routes.

    The da.Describe() function is a function of the arcpy.da module, which is used for data access workflows. The function returns a Python dictionary. A Python dictionary consists of pairs of keys and their corresponding values. The keys in the dictionary returned by da.Describe() are the properties of the dataset.

  3. Add a new line after the desc = arcpy.da line, and add the following code and press Enter:

    print(f'baseName: {desc["baseName"]}')

    When the script is run, this will print the value of the property baseName. In the line of code above, desc is the dictionary of properties and baseName is the key, as a string. The expression desc["baseName"] returns the value associated with this key in the dictionary.

    The style of print formatting used here is known as an f-string. F-strings are also called formatted string literals. They are strings prefixed with the letter f and they contain variables in braces. These variables are replaced with their value at run time. In the example code above, double quotes are used around the value baseName to make it a string. Since quotes are also used for f-strings, single quotes are necessary to distinguish them from the double quotes. These different types of quotes can be used interchangeably in Python, provided they are used consistently.

    The following is equally correct:

    print(f"baseName: {desc['baseName']}")

    The complete code so far is as follows:

    import arcpy
    mypath = "C:/Lessons/PythonDesc"
    arcpy.env.workspace = mypath
    desc = arcpy.da.Describe("bike_routes.shp")
    print(f'baseName: {desc["baseName"]}')

    If you stored the data in another location, your mypath line will be different.

  4. Click File and click Save to save your script.

    You can also use the keyboard shortcut Ctrl+S.

  5. Click Run and click Run Module.

    Report basename bike_routes.

    The IDLE Shell window appears with the result:

    baseName: bike_routes

    The base name of the data is the file name without the file extension.

    When learning how to code, it is common to run into errors. A typical error you may encounter when running this script is illustrated in the next step.

  6. Modify the desc = arcpy.da.Describe("bike_routes.shp") line by removing the underscore character from bike_routes.

    desc = arcpy.da.Describe("bikeroutes.shp")

  7. Save and run the script.

    The IDLE Shell shows an error message:

    ValueError: Object: Error in accessing describe

    Value error message

    The error suggests that there is an issue with the da.Describe() function. The reason for this error is that the name of the feature class is not spelled correctly, so it cannot be located. If you encounter this error, double-check that the name is correct. Another possible error is that the workspace has not been set correctly, so make sure to check the path used for your workspace as well.

    Note:
    In addition to da.Describe(), you can also use the regular ArcPy function Describe() to examine the properties of GIS datasets. The syntax is slightly different from da.Describe(). The Describe() function returns a Describe object, and you can use this object to check its properties.

  8. In the IDLE editor window, change the line to fix the feature class name.

    desc = arcpy.da.Describe("bike_routes.shp")

  9. After the print(f'baseName: line, add the following lines:

    print(f'extension: {desc["extension"]}')
    print(f'dataType: {desc["dataType"]}')
    print(f'shapeType: {desc["shapeType"]}')

  10. Save and run the script.

    Results with additional properties printed

    The base name, file extension, data type, and geometry type of the input dataset are printed. It is easy to see in the Catalog pane that bike_routes.shp is a polyline shapefile, but now you can access those properties in a Python script.

    There are many more properties, but this provides a good start to understand each dataset. A complete list of properties can be found in the ArcPy documentation section of the ArcGIS Pro help pages.

    There are many different properties, and in the help pages these are organized in property groups. The Describe property group at the link above includes some general properties for all datasets, including baseName, extension, and dataType. Then there are property groups for specific types of datasets. For example, the FeatureClass property group includes the shapeType property used in the previous code example. Not all datasets will have this property since they may not contain any geometry. Some properties can be a bit difficult to locate when navigating the help pages. For example, the extent property is located in the Dataset property group. The property groups only pertain to the organization of the help pages and have no effect on the code.
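
The ValueError shown earlier for a misspelled dataset name can also be handled in code rather than stopping the script. The following is a minimal sketch, not part of the tutorial script. The helper name describe_or_none and the stand-in function fake_describe are hypothetical; the stand-in only imitates the error message shown above so that the example runs without ArcPy. In ArcGIS Pro, you would pass arcpy.da.Describe as the first argument instead.

```python
def describe_or_none(describe, name):
    """Return the properties dictionary for name, or None if it cannot be described.

    describe is the function that does the work; in ArcGIS Pro you would
    pass arcpy.da.Describe here.
    """
    try:
        return describe(name)
    except ValueError as err:
        # The tutorial shows this error type for a misspelled dataset name.
        print(f"Could not describe {name!r}: {err}")
        return None


def fake_describe(name):
    """Hypothetical stand-in for arcpy.da.Describe, for illustration only."""
    known = {"bike_routes.shp": {"baseName": "bike_routes", "extension": "shp"}}
    if name not in known:
        raise ValueError("Object: Error in accessing describe")
    return known[name]


good = describe_or_none(fake_describe, "bike_routes.shp")
bad = describe_or_none(fake_describe, "bikeroutes.shp")
print(good["baseName"])
print(bad)
```

Passing the describe function as a parameter is a design choice made here only so that the sketch can be tried without an ArcGIS Pro installation.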

Next, you will replace the reference to the dataset to use the script to explore other datasets.

Describe other datasets

Now that you've got the script describing the bike_routes.shp dataset, you'll modify it to describe other data.

  1. Modify the desc = arcpy.da.Describe("bike_routes.shp") line to replace bike_routes.shp with watersheds.shp.

    desc = arcpy.da.Describe("watersheds.shp")

  2. Save and run the script.

    Watershed shapefile results

  3. Modify the desc = arcpy.da.Describe("watersheds.shp") line to replace watersheds.shp with bike_racks.csv.

    desc = arcpy.da.Describe("bike_racks.csv")

  4. Save and run the script.

    Key error on the CSV file

    The IDLE Shell shows an error message.

    KeyError: 'shapeType'

    This means there was an error using the dictionary because the dictionary key shapeType did not exist. When you consider what shapeType means, this makes sense. A text file does not have geometry, so it cannot be categorized as Point, Polyline, or Polygon. The property is missing from the dictionary since it is not meaningful for CSV files.

    The dictionary returned by da.Describe() only includes the keys and values that are meaningful for a given data type.

    You can prevent the error from occurring by performing a check.

  5. Click in the script after the print(f'dataType: {desc["dataType"]}') line and press Enter to add a new line.

    Add a new line.

  6. After the print(f'dataType: line, add the following line:

    if "shapeType" in desc:

    The line contains an if statement. If statement lines include a test, in this case, determining whether the string "shapeType" is in the dictionary desc. After the test, if statements are followed by a colon. The next line is the first line of a block of code that is run conditionally, if that test evaluates to True. The lines in the code block must be indented to tell Python that they belong together and should only run if the test is true.

  7. Click at the beginning of the last line and add four spaces to indent the line.

    Indent the last line.

    The final two lines of code are as follows:

    if "shapeType" in desc:
        print(f'shapeType: {desc["shapeType"]}')

  8. Save and run the script.

    CSV file results

    The results for the CSV file print with no error. The first three properties are printed, then the if statement tests whether the dictionary contains "shapeType", and since it does not, the last line does not run.
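
Besides the membership test used above, Python dictionaries offer the get() method, which returns a fallback value when a key is missing. The following sketch compares the two approaches. The two dictionaries are hypothetical stand-ins for what da.Describe() returns; they contain only a few keys, and the fallback text "no geometry" is made up for this example.

```python
# Hypothetical stand-ins for dictionaries returned by arcpy.da.Describe().
# Only a few keys are included, and the values are illustrative.
shapefile_desc = {"baseName": "bike_routes", "extension": "shp", "shapeType": "Polyline"}
csv_desc = {"baseName": "bike_racks", "extension": "csv"}

for desc in (shapefile_desc, csv_desc):
    # Membership test, as in the tutorial script: print only if the key exists.
    if "shapeType" in desc:
        print(f'shapeType: {desc["shapeType"]}')

    # dict.get() returns the fallback value instead of raising KeyError.
    shape = desc.get("shapeType", "no geometry")
    print(f'{desc["baseName"]}: {shape}')
```

The membership test suits cases where a missing property should produce no output at all, while get() is convenient when every dataset should produce a line.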

    How can you see which properties are available for a given dataset?

    You can see the properties by printing the entire dictionary.

    Next, you'll see how to print the properties.

Print all of the properties for a dataset

You've seen how to print specific keys and values for a dictionary, and how to test whether a specific key is in the dictionary. Now you'll see how to print the whole dictionary.

  1. Select the last five lines of the script.

    This includes the lines that print the first three properties, plus the if statement and the final print line.

  2. With the lines of code selected, click Format, then click Comment Out Region.

    Comment Out Region

    The selected lines now have double pound signs before them. A leading # converts a line to a comment for Python, so these lines will not run when the code runs.

    You can also do this by typing a pound sign at the start of each line, but the menu option in IDLE allows you to do this for multiple lines in one step.

    You can reverse this by selecting the lines and, from the Format menu, clicking Uncomment Region.

  3. Add a new line of code at the bottom of the script and remove any indentation.
  4. Add the following lines:

    for k, v in desc.items():
        print(f"{k}: {v}")

    Loop to print all of the properties.

    The first of the two new lines starts a for loop. For loops take a set of inputs and a block of indented code and run the code block on each of the inputs. In this case, the for loop iterates over the key and value pairs returned by calling the items method on the desc dictionary. See the Python documentation for more information about looping over dictionaries.

    The second of the two lines is indented. This line is the only line of the code block of the for loop. Each time the loop is run, this line prints a formatted string containing the values of the k and v variables (the key and value), separated by a colon.

  5. Save and run the script.

    Result of running the list key value loop on the CSV file

    The code iterates through all the items in the dictionary and prints the keys and values.

    Printing the dictionary allows you to see the properties for a given data type.
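
When the printed dictionary is long, it can help to sort the keys or to compare the keys of two datasets directly. The following is a small sketch of both ideas, using hypothetical stand-in dictionaries in place of real da.Describe() results, which contain many more keys.

```python
# Hypothetical stand-ins for two dictionaries returned by arcpy.da.Describe();
# real dictionaries contain many more keys.
shapefile_desc = {"baseName": "bike_routes", "extension": "shp", "shapeType": "Polyline"}
csv_desc = {"baseName": "bike_racks", "extension": "csv"}

# Sorting the keys makes a long property list easier to scan.
for key in sorted(shapefile_desc):
    print(f"{key}: {shapefile_desc[key]}")

# Dictionary key views support set operations, so a difference shows which
# properties one dataset has that the other lacks.
only_in_shapefile = sorted(shapefile_desc.keys() - csv_desc.keys())
shared = sorted(shapefile_desc.keys() & csv_desc.keys())
print(only_in_shapefile)
print(shared)
```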

    Next, you will look at some of the elements inside the geodatabase.

Describe geodatabase feature classes

Now that you've seen how to examine the properties of file-based shapefiles and CSV tables, you'll use Python to examine properties of items in a geodatabase. You'll use the set of print statements you used before, rather than listing all of the properties.

  1. Select the five lines of code that were previously commented out, click Format, and click Uncomment Region.
  2. Comment out the final two lines of code that were used to iterate over the dictionary.
  3. Modify the mypath = "C:/Lessons/PythonDesc" line to add /DC.gdb.

    mypath = "C:/Lessons/PythonDesc/DC.gdb"

  4. Modify the desc = arcpy.da.Describe("bike_racks.csv") line to describe the boundary feature class.

    desc = arcpy.da.Describe("boundary")

    Elements inside a geodatabase do not have a file extension, so they are only distinguished based on their data type.

    Describe the boundary feature class.

  5. Save and run the script.

    Describe boundary script result.

    The second property is the file extension. While this property is a valid key in the dictionary, the value is empty. Since empty file extensions are common for geodatabase elements, you can add another if statement to only print the line for extension if one is present.

  6. Replace the print(f'extension: {desc["extension"]}') line with the following two lines:

    if desc["extension"] != "":
        print(f'extension: {desc["extension"]}')

    Conditional extension display lines

    The comparison operator != means “not equal to”. Valid comparison operators in Python are == (equal to), != (not equal to), < (less than), <= (less than or equal to), > (greater than) and >= (greater than or equal to). Be aware that =! and <> seem like they should work too, but they are not valid comparison operators and will produce a syntax error.

    The first of these two lines checks whether the value associated with the "extension" key is not equal to an empty string (two quotation marks with nothing between them). If this evaluates as True, the line to print the formatted string with the extension key name and the associated value for that key is run.

  7. Save and run the script.

    Conditional extension code results

    The code runs and checks whether the value of the extension key is not an empty string. Since the value is empty, the test evaluates to False, so the indented line in the code block does not run and the extension line is not printed. The code then continues with the next line outside the code block, printing the dataType and the shapeType.

    You can explore the other elements inside the geodatabase in a similar manner by changing the name of the item in the da.Describe() function.

  8. Modify the desc = arcpy.da.Describe("boundary") line to replace boundary with car_share_locations.

    desc = arcpy.da.Describe("car_share_locations")

  9. Save and run the script.

    Describe car share locations table

    The car_share_locations item is a geodatabase table. It has no extension and no entry for shapeType, so those lines are not printed.

  10. Modify the desc = arcpy.da.Describe("car_share_locations") line to describe the Transportation item.

    desc = arcpy.da.Describe("Transportation")

  11. Save and run the script.

    Transportation describe result

    Transportation is a feature dataset within the geodatabase. A feature dataset contains data elements that share a common coordinate system. A feature dataset can be used as a workspace.

  12. Modify the mypath = "C:/Lessons/PythonDesc/DC.gdb" line to add /Transportation.

    mypath = "C:/Lessons/PythonDesc/DC.gdb/Transportation"

    You can now describe the feature classes and other elements that reside inside the feature dataset.

  13. Modify the desc = arcpy.da.Describe("Transportation") line to describe the Traffic item.

    desc = arcpy.da.Describe("Traffic")

    Code to describe the Traffic network dataset.

  14. Save and run the script.

    Traffic network description

    Traffic is a network dataset. Network datasets are used to model transportation networks. They are created from source features, which can include simple features (lines and points) and turns, and they store the connectivity of the source features. When you perform a network analysis, it is always done on a network dataset.

    There are other data types that can be used in da.Describe(), but the items you've described cover some of the most typical kinds of GIS datasets. Describing each dataset one by one, however, can be cumbersome. It is helpful to be able to create a list of the various datasets available in a workspace without having to type out their individual names.
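
The two checks added in this section (skip an empty extension, skip a missing shapeType) can be gathered into one reusable function. The following is a sketch under stated assumptions: the function name summarize is hypothetical, and the two dictionaries are stand-ins whose dataType and shapeType values are assumed for illustration rather than copied from real output.

```python
def summarize(desc):
    """Build the lines the tutorial script prints for one properties dictionary."""
    lines = [f'baseName: {desc["baseName"]}']
    # Geodatabase elements have an empty extension, so skip the line for them.
    if desc["extension"] != "":
        lines.append(f'extension: {desc["extension"]}')
    lines.append(f'dataType: {desc["dataType"]}')
    # Tables and other elements without geometry have no shapeType key.
    if "shapeType" in desc:
        lines.append(f'shapeType: {desc["shapeType"]}')
    return lines


# Hypothetical stand-ins; the dataType and shapeType values are assumed.
boundary = {"baseName": "boundary", "extension": "",
            "dataType": "FeatureClass", "shapeType": "Polygon"}
car_share = {"baseName": "car_share_locations", "extension": "",
             "dataType": "Table"}

for desc in (boundary, car_share):
    print("\n".join(summarize(desc)))
    print()
```

Returning a list of lines instead of printing inside the function keeps the formatting logic separate from the output, which makes the same function usable for a report or a log file.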

List files in a workspace

A common task in Python scripting is to work with multiple datasets. Typing the name of each individual dataset is cumbersome and time consuming. ArcPy includes several functions to create an inventory of datasets. These functions typically return the datasets as a Python list, which can be used for further processing.

  1. You will start with a new script.
  2. In the IDLE editor window, click File and click Save As.

    The current script file name is describe_data.py. You will use Save As to create a new script to continue your work, while keeping the work you've done so far in the describe_data.py file so you'll have it for future reference.

  3. On the Save As dialog box, type list_data.py and click Save.
  4. In the list_data.py script window, select all but the first three lines, and press the Delete key.

    Delete all but the first three lines.

  5. Edit the mypath = line to remove /DC.gdb/Transportation from the path.

    mypath = "C:/Lessons/PythonDesc"

    This will leave the path to the base folder for the tutorial data.

  6. After the arcpy.env.workspace line, add the following two lines:

    files = arcpy.ListFiles()
    print(files)

    The first of these lines creates a new variable named files and sets it equal to the result of calling the ListFiles function. Functions are followed by a list of parameters, enclosed in parentheses. In this case, the ListFiles() function does not require any input parameters, so the parentheses are empty.

    The second line prints the list.

    List and print files.

    The code should be as follows:

    import arcpy
    mypath = "C:/Lessons/PythonDesc"
    arcpy.env.workspace = mypath
    files = arcpy.ListFiles()
    print(files)

  7. Save and run the script.

    List data ListFiles first results

    This is a list of the files in the PythonDesc folder. Python lists are enclosed in square brackets.

    The ListFiles function returns the files in the current workspace. That means if the workspace has not been set using arcpy.env.workspace, there is nothing for the function to list, and the script prints None.

    The function has no required parameters, and it automatically uses the current workspace. The files listed are very similar to those visible in File Explorer. There is one exception. A geodatabase is a folder in File Explorer but is included in the result of ListFiles(). As a result, DC.gdb is shown here as a file, even though it is technically a folder that contains files.

    You can limit the search to return only specific file types.

  8. Edit the files = arcpy.ListFiles() line to add "*.csv" inside the parentheses.

    files = arcpy.ListFiles("*.csv")

    The ListFiles function can take an optional parameter, called a wildcard, that allows you to specify a pattern that the names in the result must match. The asterisk represents zero or more unspecified characters, so this wildcard search will return any file name that ends with the .csv extension.

  9. Save and run the script.

    The script prints the list of matching results. In this case, that's a single item list.

    ['bike_racks.csv']

    This approach works for any number of files, and is a good way to get all the files of the same file type as a list and to perform the same task on each file.

    You can also list other file types, such as .xlsx, .dbf, and so on, or you can match other parts of the name string. For example, using the wildcard string "*bike*" returns a list of all the file names that include bike: ['bike_racks.csv', 'bike_routes.dbf', 'bike_routes.prj', 'bike_routes.sbn', 'bike_routes.sbx', 'bike_routes.shp', 'bike_routes.shx']

  10. Modify the mypath = "C:/Lessons/PythonDesc" line to include DC.gdb.

    mypath = "C:/Lessons/PythonDesc/DC.gdb"

  11. Remove the "*.csv" parameter from the ListFiles function.

    files = arcpy.ListFiles()

  12. Save and run the script.

    The list result appears.

    These are the same files you can see in the DC.gdb folder using File Explorer. The ListFiles() function is not a useful way to examine the contents of a geodatabase, because the datasets do not correspond to individual files. Fortunately, there is a function for listing feature classes.
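
The wildcard matching used with ListFiles() earlier in this section can be tried out in plain Python. The following sketch uses fnmatchcase from the standard library as a stand-in; it is not how ArcPy implements wildcards, but the asterisk has the same meaning of zero or more characters. The list contains some of the file names from the PythonDesc folder, and the helper name matching is hypothetical.

```python
from fnmatch import fnmatchcase

# Some of the file names from the PythonDesc folder, as listed in this tutorial.
names = [
    "bike_racks.csv", "bike_routes.dbf", "bike_routes.prj", "bike_routes.sbn",
    "bike_routes.sbx", "bike_routes.shp", "bike_routes.shx", "DC.gdb",
    "gardens.dbf", "watersheds.shp",
]


def matching(names, pattern):
    """Return the names that match a wildcard pattern (* = zero or more characters)."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(matching(names, "*.csv"))   # names ending in .csv
print(matching(names, "*bike*"))  # names containing bike anywhere
print(matching(names, "*.dbf"))   # names ending in .dbf
```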

List feature classes in a workspace

Feature classes are some of the most commonly used GIS dataset types. The ListFeatureClasses() function returns a list of feature classes in the current workspace.

  1. Edit the mypath = line to remove /DC.gdb from the path.

    mypath = "C:/Lessons/PythonDesc"

    This will leave the path to the base folder for the tutorial data.

  2. Edit the files = arcpy.ListFiles() line to use ListFeatureClasses.

    files = arcpy.ListFeatureClasses()

    The code should be as follows:

    import arcpy
    mypath = "C:/Lessons/PythonDesc"
    arcpy.env.workspace = mypath
    files = arcpy.ListFeatureClasses()
    print(files)

  3. Save and run the script.

    The script prints a list with the two feature classes.

    ['bike_routes.shp', 'watersheds.shp']

    The feature classes in this case are shapefiles. This can sometimes be a source of confusion. The term “feature class” is used to describe a homogeneous collection of features, each having the same spatial representation (for example, points, lines, or polygons) and a common set of attributes. The two most common types of feature classes in ArcGIS Pro are shapefiles and geodatabase feature classes.

    The ListFeatureClasses() function works for both shapefiles and geodatabase feature classes, but for a given workspace it only returns one of them. When the workspace is a folder, the function lists shapefiles. When the workspace is a geodatabase, the function lists geodatabase feature classes.

  4. Edit the mypath = line to add /DC.gdb to the path.

    mypath = "C:/Lessons/PythonDesc/DC.gdb"

    This is the path to the file geodatabase.

    List feature classes in the geodatabase.

  5. Save and run the script.

    The script prints a list of three feature classes.

    ['neighborhoods', 'boundary', 'public_schools']

    The list does not include the feature classes inside the Transportation feature dataset, because that is a different workspace.

    You can use a wildcard to filter the results from ListFeatureClasses(). For example, you can obtain all the feature classes that start with a certain letter. The function also allows you to filter by the feature type. The syntax of the ListFeatureClasses() function is as follows:

    ListFeatureClasses ({wild_card}, {feature_type}, {feature_dataset})

    The feature_type parameter of the function allows you to limit the result based on the feature class type.

  6. Edit the files = arcpy.ListFeatureClasses() line to use two parameters ("", "POINT").

    files = arcpy.ListFeatureClasses("", "POINT")

    The first parameter, wild_card, is not used, but because parameters have a prescribed order, this parameter needs to be skipped. The empty string "" acts as a placeholder to indicate that parameter is not used. You can also use the Python keyword None.

    You can also supply parameters referenced by name, in which case there is no need to stick to the original order. These two alternatives are also valid ways to write the line:

    files = arcpy.ListFeatureClasses(None, "POINT")

    files = arcpy.ListFeatureClasses(feature_type="POINT")

    You can also use a number sign as a string ("#") to skip a tool parameter. However, this does not work for nontool functions of ArcPy.

    Python is case-sensitive for the most part, but strings used in ArcPy functions are typically not case-sensitive. As a result, you could also use "Point" or "point" in the parameter.

    The third parameter that the function can take, feature_dataset, is not used and can be left out completely since it comes at the end of the parameter sequence.

  7. Save and run the script.

    The script prints the one point feature class in the workspace.

    ['public_schools']
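
The difference between skipping a parameter by position and naming it by keyword is general Python behavior, and it can be tried without ArcPy. In the following sketch, list_feature_classes is a hypothetical stand-in that copies only the parameter names and order of ListFeatureClasses(); it filters a small in-memory catalog. The point geometry of public_schools comes from this tutorial, while the geometry types of the other two feature classes are assumed for illustration.

```python
# Hypothetical in-memory catalog: feature class name -> geometry type.
# public_schools is the point feature class from the tutorial; the other
# geometry types are assumed for illustration.
CATALOG = {"neighborhoods": "Polygon", "boundary": "Polygon", "public_schools": "Point"}


def list_feature_classes(wild_card=None, feature_type=None, feature_dataset=None):
    """Stand-in that mimics only the parameter order of ListFeatureClasses."""
    # wild_card and feature_dataset are accepted but ignored in this sketch.
    # None and "" both mean that no filter is applied.
    if not feature_type:
        return list(CATALOG)
    wanted = feature_type.lower()  # compare without regard to case
    return [name for name, geom in CATALOG.items() if geom.lower() == wanted]


# Positional calls: a placeholder fills the wild_card slot.
by_position_empty = list_feature_classes("", "POINT")
by_position_none = list_feature_classes(None, "POINT")
# Keyword call: the parameter is named, so no placeholder is needed.
by_keyword = list_feature_classes(feature_type="point")

print(by_position_empty, by_position_none, by_keyword)
```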

List tables and datasets

You've listed files and feature classes. Next, you'll list tables and feature datasets.

  1. Edit the mypath = line to remove /DC.gdb from the path.

    mypath = "C:/Lessons/PythonDesc"

    This will leave the path to the base folder for the tutorial data.

  2. Edit the files = arcpy.ListFeatureClasses(feature_type="POINT") line to be ListTables().

    files = arcpy.ListTables()

    The code should be as follows:

    import arcpy
    mypath = "C:/Lessons/PythonDesc"
    arcpy.env.workspace = mypath
    files = arcpy.ListTables()
    print(files)

    List tables in the folder.

  3. Save and run the script.

    The script prints the list of the two tables in the workspace.

    ['gardens.dbf', 'bike_racks.csv']

  4. Edit the mypath = line to add /DC.gdb to the path.

    mypath = "C:/Lessons/PythonDesc/DC.gdb"

    This sets the path to the geodatabase.

    import arcpy
    mypath = "C:/Lessons/PythonDesc/DC.gdb"
    arcpy.env.workspace = mypath
    files = arcpy.ListTables()
    print(files)

  5. Save and run the script.

    The script prints the list with the single table in the geodatabase.

    ['car_share_locations']

  6. Edit the files = arcpy.ListTables() line to be ListDatasets().

    files = arcpy.ListDatasets()

    import arcpy
    mypath = "C:/Lessons/PythonDesc/DC.gdb"
    arcpy.env.workspace = mypath
    files = arcpy.ListDatasets()
    print(files)

  7. Save and run the script.

    The script prints the list with the single feature dataset in the geodatabase.

    ['Transportation']

  8. Edit the mypath = line to add /Transportation to the path.

    mypath = "C:/Lessons/PythonDesc/DC.gdb/Transportation"

    This sets the path to the Transportation feature dataset.

    import arcpy
    mypath = "C:/Lessons/PythonDesc/DC.gdb/Transportation"
    arcpy.env.workspace = mypath
    files = arcpy.ListDatasets()
    print(files)

  9. Save and run the script.

    The script prints the list with the single network dataset in the Transportation dataset.

    ['traffic']

    The ListDatasets() function works with a variety of data elements, including feature datasets, geometric networks, networks, parcel fabric, raster catalogs, topology, and several others.
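
The steps above edit the mypath line several times. As a minimal sketch, the three workspace levels can be built from a single base path so that only one line needs to change. The base and workspaces names are hypothetical, and posixpath is used only so that the forward slashes match the paths shown in this tutorial.

```python
import posixpath

# Base folder used in this tutorial; adjust if the data is stored elsewhere.
base = "C:/Lessons/PythonDesc"

# One path per level of the nested structure: folder, geodatabase, feature dataset.
workspaces = {
    "folder": base,
    "geodatabase": posixpath.join(base, "DC.gdb"),
    "feature dataset": posixpath.join(base, "DC.gdb", "Transportation"),
}

for level, path in workspaces.items():
    print(f"{level}: {path}")
```

Each path could then be assigned to arcpy.env.workspace in turn before calling a list function such as ListTables() or ListDatasets().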

Now that you have seen how to use list functions to create an inventory of data, you will use the resulting list to describe each element of the list.

Iterate over a list

Creating a list of data is typically the first step in a larger workflow. In most cases, you want to perform some sort of task on each dataset. You will use a Python for loop to iterate over a list.

You will start with a new script.

  1. Click File and click Save As.
  2. Name the new file iterate_data.py.
  3. Edit the mypath = line to remove /Transportation from the path.

    mypath = "C:/Lessons/PythonDesc/DC.gdb"

  4. Edit the files = arcpy.ListDatasets() line to use the ListFeatureClasses function.

    files = arcpy.ListFeatureClasses()

  5. Replace the print(files) line with for file in files: and press the Enter key.

    The colon at the end of the line indicates the start of a block of code. The for loop iterates over the elements in the list, and the indented code in the code block is run for every element. When you press Enter after a line of code ending in a colon, the next line of code is automatically indented.

    In each cycle, the temporary variable file gets assigned one of the names from the list of files in the files variable.

  6. On the indented new line, add the following:

    desc = arcpy.da.Describe(file)

    The da.Describe() function returns a dictionary with the properties of the feature class. This line of code will run for every feature class in the list.

  7. Press Enter and add the following:

    print(desc["baseName"])

    This line should be indented four spaces to match the previous line. It is part of the for loop code block and will run for each feature class in the list.

    This line retrieves and prints the value stored in the desc dictionary using the baseName key.

    The complete code is as follows:

    import arcpy
    mypath = "C:/Lessons/PythonDesc/DC.gdb"
    arcpy.env.workspace = mypath
    files = arcpy.ListFeatureClasses()
    for file in files:
        desc = arcpy.da.Describe(file)
        print(desc["baseName"])

    Print the descriptions.

  8. Save and run the script.

    The script prints the names of the three feature classes on three new lines.

    neighborhoods
    boundary
    public_schools

    You will modify the code to make the results more informative.

  9. Edit the print(desc["baseName"]) line to be name = desc["baseName"].

    Instead of printing the name, this line stores it in the name variable.

    The line should still be indented to match the previous line in the code block so that it runs in every cycle of the loop.

  10. Add the following three lines:

        data = desc["dataType"]
        shape = desc["shapeType"]
        print(f"{name} is a {data} with {shape} geometry")

    This code stores the data type and the geometry values in the data and shape variables.

    The final line creates and prints a formatted string with the contents of the three variables substituted into it.

  11. Save and run the script.

    The script prints the three formatted strings:

    neighborhoods is a FeatureClass with Polygon geometry
    boundary is a FeatureClass with Polygon geometry
    public_schools is a FeatureClass with Point geometry
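
The loop above relies on two plain Python features: iterating over a list and reading values from a dictionary by key. The sketch below shows the same pattern without ArcPy, using hand-written dictionaries as stand-ins for the da.Describe() results. Only the three keys used in this tutorial are included, and the values are copied from the output above.

```python
# Stand-ins for the dictionaries returned by arcpy.da.Describe().
descriptions = [
    {"baseName": "neighborhoods", "dataType": "FeatureClass", "shapeType": "Polygon"},
    {"baseName": "boundary", "dataType": "FeatureClass", "shapeType": "Polygon"},
    {"baseName": "public_schools", "dataType": "FeatureClass", "shapeType": "Point"},
]

report_lines = []
for desc in descriptions:
    # Look up each property by its key, as in the tutorial script.
    name = desc["baseName"]
    data = desc["dataType"]
    shape = desc["shapeType"]
    report_lines.append(f"{name} is a {data} with {shape} geometry")

for line in report_lines:
    print(line)
```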

You've listed the feature classes in the geodatabase, iterated over the list, and printed information about each. Next, you'll add code to keep track of how many feature classes of each type there are.

Track the number of feature class types

You can check the properties of each feature class and use conditional logic to determine what happens. In this section, you will use if statements to count feature classes by geometry type.

  1. Insert a new line before the for file in files: line.
  2. Add the following three lines before the for loop:

    count_point = 0
    count_line = 0
    count_poly = 0

    These lines set three new variables to zero. You will use these variables to keep track of how many feature classes of each geometry type there are.

    They are set to zero before the for loop so that in the for loop you can add to them, depending on the type of the feature class. If you defined and set them to zero inside the for loop, each time the loop ran, you would lose the data from the previous cycle.

  3. Inside the for loop, delete the last four lines.

    Set the counters and cut the lines in the for loop.

  4. Inside the for loop, after desc = arcpy.da.Describe(file), add the following lines:

    if desc["shapeType"] == "Point":
        count_point += 1

    Counter increments for points

    The if line should be indented to the same level as the desc line.

    The if line checks to determine whether the value in the desc dictionary stored at the shapeType key is equal to the string Point.

    In Python, the double equals sign, ==, is used to check for equality, while the single equals sign is used to assign values to variables.

    The line is followed by a colon, and the next line is indented, because it is a new code block.

    If the statement is false, nothing happens. If the statement is true, the code block that follows the if statement is run.

    The code block adds one to the value stored in the count_point variable. When the loop runs, it checks each item from the list of feature classes to determine whether it is a Point feature class, and if it is, it adds one. When the loop finishes running, the counter variable will contain the count of the number of Point feature classes.

    You will add two more if statements to check for other geometry types.

  5. At the end of the last line of code, press Enter.

    The new line is aligned with the existing indentation in the current block of code. However, you want the next if statement to be aligned at the same level of indentation with the previous if statement.

  6. Press the Backspace key.

    This adjusts the indentation to align the new line with the previous if statement.

  7. Add the following lines of code:

    if desc["shapeType"] == "Polyline":
        count_line += 1
    if desc["shapeType"] == "Polygon":
        count_poly += 1

    The three if statements are aligned.

    You can adjust the indentation using the Backspace key to remove indentation or you can add four spaces to add indentation.

    Now, for every item in the files list, the item is described, the dictionary of describe results is stored in the desc variable, and the value stored at the shapeType key is checked to determine whether it matches the string Point, Polyline, or Polygon. When a match occurs, one is added to the counter variable for that shape.

    Now you'll add some lines to print the information.

  8. At the end of the script, add a new line and press Backspace to remove any indentation.
  9. Add the following lines of code:

    print(f"Count of Point feature classes: {count_point}")
    print(f"Count of Polyline feature classes: {count_line}")
    print(f"Count of Polygon feature classes: {count_poly}")

    Completed script to count geometries and print results

    Because the last three lines are not indented, they are not part of the for loop code block and they only run after the loop has completed and the number of each shape type has been recorded.

  10. Save and run the script.

    The script prints the following three lines:

    Count of Point feature classes: 1
    Count of Polyline feature classes: 0
    Count of Polygon feature classes: 2

    This shows one approach to using Python to process data: first get a list of the data, and then use that information to do something with the data. This example simply counts feature classes, but a similar script could be used for other tasks, such as copying all the polygon feature classes or checking the attributes of all point feature classes.
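
As an alternative to three separate counters and if statements, the standard library's collections.Counter can tally any number of geometry types in one pass. This is a sketch of a different technique, not the approach used in this tutorial's script. The shape_types list is a hard-coded stand-in for the shapeType values that da.Describe() returned above.

```python
from collections import Counter

# Stand-in for the shapeType value of each feature class in DC.gdb.
shape_types = ["Polygon", "Polygon", "Point"]

# Counter tallies each distinct value; a key that never occurred counts as zero.
counts = Counter(shape_types)

for shape in ("Point", "Polyline", "Polygon"):
    print(f"Count of {shape} feature classes: {counts[shape]}")
```

A Counter does not need to know the possible geometry types in advance, which may help when a workspace also contains types such as multipoint feature classes.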

You've printed the number of feature classes of the three geometries. Next, you'll see a more compact way to do the same thing.

Get counts using a feature type filter

Another way to get the counts of feature classes in a workspace is to use the ListFeatureClasses() function with a filter by feature type.

  1. Click File and click Save As.
  2. Name the new file filter_by_type.py.
  3. Select the lines between the first three and last three lines of the script, click Format, and click Comment Out Region.

    The middle section of the script is commented out.

  4. Add a new line after the commented out section, but before the print lines.
  5. Add the following lines of code:

    count_point = len(arcpy.ListFeatureClasses(feature_type="POINT"))
    count_line = len(arcpy.ListFeatureClasses(feature_type="POLYLINE"))
    count_poly = len(arcpy.ListFeatureClasses(feature_type="POLYGON"))

    Added lines to count using the feature_type filter

    These three lines set the count_point, count_line, and count_poly variables in a new way.

    Instead of setting them to zero and incrementing their values, each line here uses the ListFeatureClasses() function with the optional feature_type parameter. This function creates a list of feature classes that match the filter parameter. The len() function returns the length of the resulting list, which is the count of feature classes of that type.

  6. Save and run the script.

    The script prints the following three lines:

    Count of Point feature classes: 1
    Count of Polyline feature classes: 0
    Count of Polygon feature classes: 2

    This code solution, excluding the commented out lines, is shorter than the earlier approach that uses da.Describe() to determine the geometry type. However, da.Describe() allows you to examine properties other than the geometry type, such as the number of fields, the spatial extent, and the coordinate system.

  7. Close the script and the IDLE Shell window.
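
The three nearly identical len() lines can also be written as one loop. The sketch below assumes a hypothetical helper, count_by_type(), that accepts any function taking a feature_type keyword. A small fake lister stands in for arcpy.ListFeatureClasses so that the example runs without ArcGIS Pro.

```python
def count_by_type(list_func, feature_types):
    """Return a dictionary mapping each feature type to the number of matches."""
    return {ftype: len(list_func(feature_type=ftype)) for ftype in feature_types}


def fake_lister(feature_type=None):
    """Stand-in for a list function; returns the names from this tutorial's DC.gdb."""
    data = {
        "POINT": ["public_schools"],
        "POLYLINE": [],
        "POLYGON": ["neighborhoods", "boundary"],
    }
    return data[feature_type]


counts = count_by_type(fake_lister, ["POINT", "POLYLINE", "POLYGON"])
for ftype, number in counts.items():
    # title() turns "POLYLINE" into "Polyline" for the printed label.
    print(f"Count of {ftype.title()} feature classes: {number}")
```

With ArcPy available and a workspace set, arcpy.ListFeatureClasses could be passed in place of fake_lister.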

You've seen two approaches for getting the counts of feature classes of different types and printing that information to the IDLE Shell. Next, you'll see an example of how to use both these approaches in a script that lists and describes the contents of a workspace and writes the results to a text file.

Write inventory information to a text file

While printing to the interactive window provides immediate feedback, sometimes it can be more useful to write the results to a text file. A finished script for this is in the PythonDesc folder.

  1. In File Explorer, go to the PythonDesc folder where you extracted the data.
  2. Right-click write_inventory.py and click Edit with IDLE (ArcGIS Pro).

    You will review the script, adjust the path if necessary, and run it to see how it works. The script includes comments as well as empty lines to improve legibility, but these have no influence on the code operation.

    The script starts with comments that include the author, date, and purpose. It is a good practice to add notes such as these to scripts to remind you of the purpose of the script and when it was created. If you share the script, this information will help others understand it and will let them know who to contact if they have questions.

    The first lines of code that run in the script import the arcpy and os modules.

    import arcpy
    import os

    The os module gives you access to operating system-level functions through Python.

    The next section sets some variables for paths and the workspace.

    #Variables for paths and workspace
    root = "C:/Lessons/PythonDesc"
    gdb = "DC.gdb"
    textfile = "inventory.txt"
    arcpy.env.workspace = os.path.join(root, gdb)
    output = os.path.join(root, textfile)

    The section starts with a comment identifying what this part of the code does. This is another useful practice for writing Python code.

  3. If you extracted the PythonDesc folder to a location other than C:/Lessons/PythonDesc, edit the path in the root = "C:/Lessons/PythonDesc" line to point to the location of the folder on your computer.

    Regardless of the root folder, the name of the file geodatabase and the text file name can remain the same, as well as the line of code that sets the arcpy.env.workspace. By using a variable for the root folder path, you only have to update it in one location in the script. The os.path.join() function is used to create complete paths from the root variable and the gdb and textfile variables.

    Next, a new empty text file is created using the open() function and specifying write mode using "w". If a file with this name already exists, it will be overwritten.

    #Create new textfile to write results
    f = open(output, "w")

    The next section of the script uses the ListFeatureClasses() function to get lists of the three geometry types.

    #Create list of feature classes by geometry type
    points = arcpy.ListFeatureClasses("", "POINT")
    lines = arcpy.ListFeatureClasses("", "POLYLINE")
    polygons = arcpy.ListFeatureClasses("", "POLYGON")

    The next section of the script writes context information to the text file using formatted strings. The \n at the end of each of these lines adds an escape character that encodes a new line so the information will be presented as three separate lines in the text file.

    f.write(f"Workspace of interest: {arcpy.env.workspace}\n")
    f.write(f"This workspace contains the following feature classes:\n")
    f.write(f"Count of Point feature classes: {len(points)}\n")

    The third of these lines uses len(points) to insert the length of the points list (the number of point feature classes in the workspace) into the string.

    The next line contains an if statement that checks whether there are items in the points feature class list. If there are no points, nothing happens until the next f.write line. However, if there are point feature classes, the code block after the line runs, and the script writes a new formatted string to the text file.

    if len(points) != 0:
        f.write(f"The names of the Point feature classes are:\n")
        for point in points:
            desc = arcpy.da.Describe(point)
            f.write(f'\t{desc["baseName"]}\n')
    f.write(f"Count of Polyline feature classes: {len(lines)}\n")

    Next, a for loop processes each of the point feature classes in the list.

    Inside the loop, the desc variable is set to contain the dictionary of the arcpy.da.Describe description of the feature class.

    The next line writes a formatted string with the name of the feature class, retrieved from the dictionary using the baseName key. The line is indented by one tab length by using the \t escape character, then a new line is added with the \n. Then the loop repeats for any other point feature classes. When there are no more, the f.write line runs and adds the text describing the number of polyline feature classes to the text file and starts a new line with a \n.

    The rest of this section reports on the polyline and polygon feature classes in the same way.

    
    if len(lines) != 0:
        f.write(f"The names of the Polyline feature classes are:\n")
        for line in lines:
            desc = arcpy.da.Describe(line)
            f.write(f'\t{desc["baseName"]}\n')
    f.write(f"Count of Polygon feature classes: {len(polygons)}\n")
    if len(polygons) != 0:
        f.write(f"The names of the Polygon feature classes are:\n")
        for polygon in polygons:
            desc = arcpy.da.Describe(polygon)
            f.write(f'\t{desc["baseName"]}\n')

    When the loop to process the polygon feature classes finishes, the next line closes the text file.

    f.close()

    The final section uses the os module's startfile function to open the report text file.

    #Open the resulting output file
    os.startfile(output)

  4. Save and run the script.

    The text file opens in Notepad.

    The results are written to the text file using the file write() method. The new line character \n is used to ensure proper formatting with each result on a new line. The tab character \t is used to improve legibility by indenting the names of the feature classes. Both are escape sequences that begin with a backslash.

    Instead of writing to a text file, you can print the results to the interactive window by replacing all the instances of f.write with print. If you wanted to run the script within ArcGIS Pro as a script tool, you could write the information to the geoprocessing tool messages by replacing f.write with arcpy.AddMessage.
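
The file-writing pattern in write_inventory.py can be sketched without ArcPy. The version below is a variation, not the tutorial's script: it uses a with statement so that the file is closed automatically, loops over the three geometry types instead of repeating the block, and writes to a temporary folder. The inventory dictionary is a hard-coded stand-in for the ListFeatureClasses() results.

```python
import os
import tempfile

# Stand-in for the lists returned by ListFeatureClasses() for each geometry type.
inventory = {
    "Point": ["public_schools"],
    "Polyline": [],
    "Polygon": ["neighborhoods", "boundary"],
}

# Write to a temporary folder so the sketch does not touch the tutorial data.
output = os.path.join(tempfile.mkdtemp(), "inventory.txt")

# The with statement closes the file when the block ends, even if an error occurs.
with open(output, "w") as f:
    for shape, names in inventory.items():
        f.write(f"Count of {shape} feature classes: {len(names)}\n")
        if names:  # an empty list is treated as False
            f.write(f"The names of the {shape} feature classes are:\n")
            for name in names:
                # \t indents the name and \n ends the line.
                f.write(f"\t{name}\n")

# Read the file back to confirm what was written.
with open(output) as f:
    text = f.read()
print(text)
```

Using a with statement removes the need for a separate f.close() line, and the single loop means a change to the report wording only has to be made once.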

Review

  • Python code can be used to describe the properties of GIS datasets, such as the data type, file extension, and geometry.
  • ArcPy includes several functions to create lists of datasets. Specific functions are available to create lists of files, datasets, tables, and feature classes.
  • By listing and describing datasets using Python code, you can create a detailed inventory of GIS datasets in a workspace. You can then decide to process each dataset differently based on its characteristics.
  • Python code can be used to write information to text files. This is useful for reporting and for logging errors.

You may also be interested in Python Scripting for ArcGIS Pro and Advanced Python Scripting for ArcGIS Pro by Dr. Paul A. Zandbergen, published by Esri Press.