Construct a data pipeline

To create the final dataset requested by the stakeholders, your data pipeline needs to transform the data by enabling location, removing unnecessary attributes, and calculating new fields.

You'll add the initial table containing information about the capital projects, filter the dataset, create its geometry using coordinates, reproject the data, and calculate a field.

Create a data pipeline

First, you'll sign in to ArcGIS Online and create an empty data pipeline.

  1. Sign in to your ArcGIS organizational account.
    Note:

    If you don't have an organizational account, see options for software access.

  2. On the ribbon, click the app launcher button. Choose Data Pipelines.

    Data Pipelines button

    A browser tab opens to a gallery showing any existing data pipelines you own.

  3. Click Create data pipeline.

    Create data pipeline button

    The Data Pipelines Editor appears. This editing environment allows you to add data inputs, provides access to tools to transform data, and lets you write the processed data out to a feature layer.

    Data Pipelines Editor

    While you are actively working in the Data Pipelines Editor, you are connected to a compute resource as indicated by the Connection details dialogue.

    Connection details dialogue

    Note:

    Using ArcGIS Data Pipelines consumes credits. Credits are used while the editor page's status is Connected. To learn more about credit consumption and ArcGIS Data Pipelines, read Does Data Pipelines charge credits and Compute resources.

Add a CSV table as an input

Now that you have the Data Pipelines Editor open, you'll add your first input dataset. This is the table of capital projects from the DPR. Since this table is frequently updated, an exported table would quickly be out of date. Therefore, you'll access the .csv table directly from its source.

  1. Open the New York City OpenData (NYC OpenData) page for the Capital Project Tracker dataset.

    A browser tab opens to an overview of the Capital Project Tracker dataset. This provides valuable information, such as how often the dataset is updated, when it was last updated, and a description for each column in the .csv table.

    Capital Project Tracker overview page

  2. Click the Export button.

    Export button

    The Export dataset window appears.

  3. In the Export dataset window, click API endpoint.

    API endpoint button

    Note:

    The number of rows you see in this dataset and other datasets throughout this tutorial may vary slightly from the images provided due to changes in this dataset over time.

    The API endpoint allows access to the dataset through a URL.

    A warning appears. It indicates that, by default, you can access only 1,000 rows of this dataset, which has 2,542 rows. Later, you'll increase this limit to 3,000 rows.

    Warning indicating that the default API limit has been exceeded

    While Data Pipelines accepts JSON data formats, you'll change the data format to .csv.

  4. For Data format, choose CSV.

    Data format option

    Next, you'll copy the URL for this dataset.

  5. Click Copy to clipboard.

    Copy to clipboard button

    Now, you'll add this .csv table to your data pipeline as a Public URL input.

  6. On the Editor toolbar, click Inputs. In the Inputs pane, under File, choose Public URL.

    Public URL option

    Note:

    To see the names of the buttons in the Editor toolbar, click the Expand button at the bottom of the toolbar.

    The Add a URL window appears.

  7. Click the URL text box and press Ctrl + V to paste the URL you copied from the NYC OpenData website.

    URL parameter

    The Data format parameter is selected automatically.

    This URL returns at most 1,000 rows from the source dataset. To overcome this limitation, you'll add a URL parameter that raises the limit to 3,000 rows. Because the table currently has 2,542 rows, a limit of 3,000 accommodates the dataset's current size and leaves room for it to grow.

  8. Click once at the end of the URL you pasted. Type ?$limit=3000.

    Updated URL with a limit of 3,000 rows

    The first input is complete.

  9. Click Add.

    The Public URL element is added to the canvas.

    First input element

    The element's name is derived from the name of the .csv you accessed. You'll change its name next.

  10. On the Element action bar, click the Rename button.

    Rename button

  11. In the text box, clear any text and type Capital Project Tracker and press Enter. Expand the element so that the name is visible.

    Renamed element

    Since the element is selected in the canvas, the Public URL pane is open. Here, you can configure or reconfigure any selected element on the canvas.

    Public URL pane

    Now, you'll preview the dataset you added.

  12. In the Public URL pane, click Preview.

    Preview button

    The Preview window appears. It's currently showing the table preview. By previewing your data, you'll know what your data will look like when you run the data pipeline.

    Note:

    You can also preview your data by clicking the Preview button on an element's action bar.

    The top of the table indicates the number of records is 2,554. This may not match the number of rows on the OpenData website. If the numbers do not match, it is because some records contain line breaks inside a field, which splits a single record across multiple lines. In the Public URL pane, you can account for this.

    Table preview

  13. In the Public URL pane, turn on Has multiline data.

    Has multiline data parameter

  14. Click Preview.

    The preview refreshes. The number of records is now 2,542.

    Number of preview records

  15. Scroll through the table to observe the data provided by NYC OpenData.
  16. Click the Map preview button.

    The Map preview button

    Since this is only a table and no geometry has been defined, a map preview is not available. You'll allow for a map preview in a later section.

  17. Click the Schema button.

    Schema button

    This lists all the fields in the dataset and their field types. Throughout the rest of the tutorial you'll use a number of these fields to transform your data, including currentphase, designstart, latitude, and longitude.

    Dataset's schema

  18. Click the Messages button.

    Messages button

    If there were any warnings or errors in your preview dataset, they would be listed here.

  19. Close the preview window.
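The two adjustments made in this section, raising the row limit and handling multiline records, can also be sketched outside the editor. The following Python is a minimal illustration only, not what Data Pipelines runs internally. The endpoint URL, the projectname column, and the sample rows are hypothetical. The sketch assumes the copied URL accepts a $limit query parameter, as in step 8.

```python
import csv
import io
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_row_limit(url, limit):
    """Return url with its $limit query parameter set to limit."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    params["$limit"] = str(limit)
    # safe="$" keeps the dollar sign readable instead of encoding it as %24.
    return urlunsplit(parts._replace(query=urlencode(params, safe="$")))


def count_records(csv_text):
    """Count data records, treating quoted line breaks as part of a field."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    return max(sum(1 for _ in reader) - 1, 0)  # subtract the header row


# Hypothetical endpoint and sample rows, for illustration only.
ENDPOINT = "https://example.com/resource/abcd-1234.csv"
SAMPLE = 'fmsid,projectname\n1,"Playground\nreconstruction"\n2,Path repair\n'

limited_url = with_row_limit(ENDPOINT, 3000)
naive_count = len(SAMPLE.splitlines()) - 1  # counts physical lines
record_count = count_records(SAMPLE)        # counts logical records
```

In this sample, counting physical lines overstates the number of records because one quoted field contains a line break. That is the same kind of mismatch the Has multiline data option is meant to resolve.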

Filter data by attribute

Now that you've added the .csv table to the data pipeline, you'll use a tool element to filter the dataset so that it only shows capital projects whose current phase is construction and that do not have invalid latitude or longitude values (values of 0).

  1. On the Editor toolbar, click Tools.

    Tools button

    The Tools pane appears. The tools listed, by category, let you manipulate the datasets in your data pipeline. You'll add the Filter by attribute tool to remove any row whose current phase is not construction. You'll also filter out any rows that have a latitude or longitude value of 0.

  2. In the Tools pane, under Clean, click Filter by attribute.

    Filter by attribute tool

    An element is added to the canvas. It needs to be connected to an existing element that contains data. Then, it needs to be configured.

  3. Move the Filter by attribute element to the right of the Public URL element.

    Filter by attribute element

  4. In the Filter by attribute pane, under Input dataset, click Dataset. In the Select dataset window, choose Capital Project Tracker.

    Capital Project Tracker option

    The two elements are connected. Data will flow from the .csv file into the Filter by attribute tool when the data pipeline runs.

    Connected Public URL and Filter by attribute elements

    Note:

    You can also connect elements in a data pipeline by dragging the pointer from the output port of one element to the input port of another element.

    Next, you'll configure the filter to exclude any records that have a value of 0 for latitude or longitude and only show those rows that have a current phase value of construction.

  5. In the Filter by attribute pane, click Build new query.

    Build new query button

    The Query builder window appears.

  6. Ensure that Expression is selected and click Next.

    Expression option

  7. For the first expression, set the field to latitude and set the operator to does not equal. For the value, type 0.

    First expression

  8. Click the Expression button.

    Expression button

  9. Write a second expression where longitude does not equal 0.

    Second expression

  10. Add another expression and have it query the rows where currentphase equals construction.
    Note:

    For the value, you can use the drop-down list to select a value rather than type it.

    Third expression

  11. Click Add.

    Next, you'll preview the results.

  12. In the Filter by attribute pane, click Preview.

    The preview window appears. Below the title of this table is a count of the number of records. Previously, there were over 2,500 records. Now, because of the filters you applied, there are fewer than 200.

  13. Scroll through the table and observe the values for the latitude, longitude, and currentphase fields.

    These values meet the criteria of your query.

  14. Close the preview window.

    When the tool element was added to the canvas, it was given the default name of Filter by attribute. You'll change its name to make it more meaningful.

  15. On the Element action bar, click Rename and type Filter for Construction Phase. Resize the element so that the name is visible.

    Renamed filter element

    Before adding additional elements, you'll save your data pipeline.

  16. On the Editor toolbar, click Save and open and choose Save as.

    Save as option

    The Save data pipeline window appears.

  17. For Title, type Capital Projects Data Pipeline.

    Data pipeline title

  18. Click Save.

    The data pipeline is saved.
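The logic of the three expressions can be written as a single predicate. This Python sketch assumes that all three expressions must be true for a row to be kept, and that the values arrive as text, as they would from a .csv file. The sample rows are made up for illustration.

```python
def keep_row(row):
    """Return True when a row passes all three expressions."""
    return (
        float(row["latitude"]) != 0
        and float(row["longitude"]) != 0
        and row["currentphase"] == "construction"
    )


# Hypothetical rows: only the first one satisfies every expression.
rows = [
    {"fmsid": "A1", "latitude": "40.7", "longitude": "-73.9", "currentphase": "construction"},
    {"fmsid": "A2", "latitude": "0", "longitude": "0", "currentphase": "construction"},
    {"fmsid": "A3", "latitude": "40.6", "longitude": "-74.0", "currentphase": "design"},
]
filtered = [row for row in rows if keep_row(row)]
```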

Create point geometry

Next, you'll use the latitude and longitude columns in the filtered dataset to provide this dataset with a geometry that is viewable on a map.

  1. On the Editor toolbar, click Tools. In the Tools pane, under Construct, click Create geometry.

    The Create geometry element is added to the canvas.

  2. Move the Create geometry element to the right of the Filter by attribute element.

    Create Geometry element

  3. Click and drag from the Filter by attribute element's output port to the Create geometry element's input port.

    Connected Filter by attribute and Create geometry elements

    The two elements are connected. Next, you'll configure the Create geometry element. Since this table contains latitude and longitude values, you can create a point geometry.

  4. In the Create geometry pane, for Geometry type, choose Point. For Geometry format, choose XYZ.

    Geometry type and Geometry format parameters

    Additional parameters appear. These parameters are used to determine which fields in your table contain X, Y, and Z values. Your dataset does not have Z values; this parameter will not be used.

  5. For X field, choose longitude. For Y field, choose latitude.

    X field and Y field parameters

  6. Click Preview.
  7. On the preview window, click the Map preview button.

    The locations of the capital projects are visible on the map. You can click on features to see their attributes in a pop-up window.

    Capital project locations on a map

  8. Close the preview window.
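A common mistake when building points from coordinate columns is swapping the two fields. As a rough illustration of the mapping chosen in step 5, this Python sketch builds a GeoJSON-style point, where the X value (longitude) comes first and the Y value (latitude) second. The sample row is hypothetical, and the sketch says nothing about how the Create geometry tool stores geometry internally.

```python
def to_point(row):
    """Build a GeoJSON-style point: X is longitude, Y is latitude."""
    return {
        "type": "Point",
        "coordinates": [float(row["longitude"]), float(row["latitude"])],
    }


# Hypothetical row with text values, as read from a .csv file.
sample_row = {"latitude": "40.7", "longitude": "-73.9"}
point = to_point(sample_row)
```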

Project point data

Your points were created with latitude and longitude values using the WGS 1984 geographic coordinate system. This is not an ideal coordinate system for New York City. You'll project your data to a more appropriate coordinate system.

Note:

If you're not familiar with coordinate systems, read Coordinate Systems: What's the Difference.

  1. On the Editor toolbar, click Tools. In the Tools pane, under Format, click Project geometry.

    The Project geometry element is added to the canvas.

  2. Move the Project geometry element to the right of the Create geometry element.
  3. Click and drag from the Create geometry element's output port to the Project geometry element's input port.

    Connected Create geometry and Project geometry elements

    The two elements are connected. Next, you'll configure the Project geometry element.

  4. In the Project geometry pane, for Spatial reference, click Browse coordinate systems.

    Spatial reference parameter

    For a projected coordinate system, you'll use NAD 1983 (2011) StatePlane New York Long Isl FIPS 3104 (Meters). Its ID number is 6538.

  5. In the Browse coordinate systems window, in the search box, type 6538. Choose NAD 1983 (2011) StatePlane New York Long Isl FIPS 3104 (Meters).

    Browse coordinate systems window

  6. Click Done.

Calculate a new field

As a final step for preparing your initial input dataset, you'll calculate a new field. The dataset contains the designstart field. This records when each project initially began. You'll calculate an additional field that determines the amount of time since each project began in years and days.

  1. On the Editor toolbar, click Tools. In the Tools pane, under Construct, click Calculate field.

    The Calculate field element is added to the canvas.

  2. Move the Calculate field element to the right of the Project geometry element.
  3. Click and drag from the Project geometry element's output port to the Calculate field element's input port.

    Connected Project geometry and Calculate field elements

    The two elements are connected. Next, you'll configure the Calculate field element. You'll start by providing the new field with a name.

  4. In the Calculate field pane, for New field name, type Elapsed_Time.

    New field name parameter

    Note:

    Field names cannot contain special characters, such as spaces.

    Next, you'll choose the type of field this should be. Since this field will contain text and numeric characters, it must be a string field.

  5. For New field type, choose String.

    New field type parameter

    Next, you'll write an expression to calculate the field. This tool uses ArcGIS Arcade expressions to calculate fields.

    Note:

    To learn more about ArcGIS Arcade, read Learn ArcGIS Arcade in Four Easy Steps.

  6. Under Arcade expression, click Author Arcade expression.

    Author Arcade expression button

    The Arcade expression window appears. Here, you can write Arcade expressions to calculate field values. You'll copy and paste code that returns the number of years and days since a capital project's design began.

    Arcade expression window

  7. In the Arcade expression window, clear the sample code.
  8. Copy and paste the following code into the Arcade expression window:

    //Convert the time elapsed since designstart to a count of years and days
    
    //Determine the total number of days
    var TotalDays = DateDiff(now(), $record.designstart, "days")
    
    //Determine the number of days left over after whole years
    var RemainderDays = Floor(TotalDays % 365)
    
    //Determine the number of years
    var RemainderYears = Floor(DateDiff(now(), $record.designstart, "years"))
    
    //Format the final text to account for year(s) and day(s)
    if(RemainderYears == 1 && RemainderDays == 1){
      return RemainderYears + " year and " + RemainderDays + " day"
    }
    else if (RemainderYears == 1 && RemainderDays != 1){
      return RemainderYears + " year and " + RemainderDays + " days"
    }
    else if (RemainderYears != 1 && RemainderDays == 1){
      return RemainderYears + " years and " + RemainderDays + " day"
    }
    else{
      return RemainderYears + " years and " + RemainderDays + " days"
    }

    The Arcade expression

    The desired format of this calculation is X years and Y days. To do this, the expression first determines the number of days since a capital project design started. Since the number of days may be more than one year, the code divides the number of days by 365 and returns the remainder value. This returns the Y value in the desired format. Then, the expression calculates the number of years since the project began. This is the X value in the desired format. The last part of the expression, starting on line 12, formats the years and days text to make them singular or plural based on the number of days or years since the design started.

  9. Click Save.

    The Elapsed_Time field is added to the table and calculated.

  10. In the Calculate field pane, click Preview.
  11. In the preview window, scroll to the Elapsed_Time field.

    The Elapsed_Time field

    For each project, the number of years and days since the project began is recorded in an understandable format.

  12. Close the preview window.

    You'll rename this element to clarify the field that it calculates.

  13. Rename the Calculate field element to Calculate Elapsed Time.
  14. Expand the element so that its full name is visible.

    Updated Calculate field element

    Finally, you'll save your data pipeline.

  15. On the Editor toolbar, click Save and open and choose Save.

So far, you've added a .csv table and begun to transform the capital projects data. You also filtered the data, gave it point geometry using coordinates, reprojected it to an appropriate coordinate system, and calculated a field to provide the elapsed time since a project was designed.
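For readers who want to check the Elapsed_Time logic outside the editor, the following Python sketch mirrors the formatting rules of the Arcade expression. It is a simplification under one stated assumption: both the years and the leftover days are derived from the same day count using 365-day years, whereas the Arcade expression asks DateDiff for the years separately. The dates used are arbitrary examples.

```python
from datetime import date


def elapsed_text(start, today):
    """Format the time between two dates as 'X years and Y days'."""
    total_days = (today - start).days
    # Assumption: a year is treated as 365 days for both values.
    years, days = divmod(total_days, 365)
    year_word = "year" if years == 1 else "years"
    day_word = "day" if days == 1 else "days"
    return f"{years} {year_word} and {days} {day_word}"


one_each = elapsed_text(date(2021, 1, 1), date(2022, 1, 2))
ten_days = elapsed_text(date(2021, 1, 1), date(2021, 1, 11))
```

Choosing the singular or plural word for each unit independently covers the same four cases that the if and else if branches handle in the Arcade expression.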


Perform spatial joins

At this point, the capital project data has been added and partially formatted, but it still needs attribution from other datasets. For each capital project, you need to determine which neighborhood tabulation area and community district it falls within. Both the neighborhood tabulation areas and community districts exist as publicly available polygon datasets. You'll add these two datasets to your data pipeline and use spatial joins to append the neighborhood and district names to each capital project.

Add a GeoJSON as an input

First, you'll add the neighborhood tabulation areas dataset to your data pipeline. They're available on the NYC OpenData website in a GeoJSON format.

  1. Open the NYC OpenData page for the 2020 Neighborhood Tabulation Areas (NTAs) - Tabular dataset.

    A browser tab opens to an overview of the 2020 Neighborhood Tabulation Areas (NTAs) - Tabular dataset. Like the Capital Project Tracker dataset, this page provides an overview of the dataset and how frequently it's updated.

    Neighborhood Tabulation Areas overview page

  2. Click the Export button.

    Export button

    The Export dataset window appears. Since this dataset contains fewer than 1,000 rows, you won't need to change the URL as you did with the Capital Project Tracker dataset.

  3. In the Export dataset window, click API endpoint.

    API endpoint button

    Note:

    The number of rows you see in this dataset may vary from the image above due to changes in this dataset over time.

    The default listed format is JSON; however, to add this dataset to the data pipeline and create a polygon geometry, it makes more sense to use the GeoJSON format.

  4. For Data format, choose GeoJSON.

    GeoJSON option

    Next, you'll copy the URL for this dataset.

  5. Click Copy to clipboard.

    You'll add this GeoJSON to your data pipeline as a Public URL input.

  6. In the Data Pipelines Editor, on the Editor toolbar, click Inputs. In the Inputs pane, under File, choose Public URL.

    Public URL option

    The Add a URL window appears.

  7. For URL, paste the URL you copied from the NYC OpenData website.

    URL parameter

    The Data format parameter is selected automatically.

  8. Click Add.

    The Public URL element is added to the canvas.

    Second input element

    Again, the name is unintuitive. You'll rename this element.

  9. Rename the Public URL element to Neighborhood Tabulation Areas.
  10. Resize the Public URL element so that the full name is visible.
  11. Move the element under the Project geometry element.

    Neighborhood Tabulation Areas dataset

    Next, you'll preview the dataset you added.

  12. In the Public URL pane, click Preview.

    Preview window

    In the preview window, observe the fields provided with this dataset. The ntaname field is the attribute that you'll be adding to the capital projects point layer using a spatial join.

  13. Click the Map preview button.

    A map appears with the neighborhoods drawn as polygons.

    Map preview

  14. Close the preview window.

Project polygon data

Like the capital project points, the neighborhoods GeoJSON uses the WGS 1984 geographic coordinate system. Therefore, you'll add another Project geometry tool to project the neighborhoods to the same state plane zone that you used for the capital project locations. To save time, you'll copy the existing Project geometry element.

  1. On the canvas, select the Project geometry element.
  2. Press Ctrl + C to copy the element.
  3. Press Ctrl + V to paste the element onto the canvas.
  4. Move the Project geometry element to the right of the Public URL element for Neighborhood Tabulation Areas.

    Project geometry element

  5. Click and drag from the Public URL element's output port to the Project geometry element's input port.

    Connected Public URL and Project geometry elements

    The two elements are connected. Because you copied this element, the coordinate system is already selected. Now, the neighborhoods dataset uses the proper coordinate system.

Spatially join the capital projects and the neighborhoods

Now that both of your datasets are using the same coordinate system, you'll add a spatial join to your data pipeline. This spatial join will determine which neighborhood each capital project point falls within and add the neighborhood attributes to the capital project point.

  1. On the Editor toolbar, click Tools. In the Tools pane, under Integrate, click Join.

    The Join element is added to the canvas.

  2. Move the Join element to the right of the Calculate field and Project geometry elements.

    Join element

    Next, you'll connect the Calculate field and the Project geometry elements to the Join element. The Join element has two input ports. The upper input port is for the target dataset. The target dataset is the dataset that will have additional attributes added to it. The lower input port is for the join dataset. This is the dataset that will share its attributes with the target dataset. In this case, you want the capital projects to receive the attributes from the neighborhood dataset. Therefore, the Calculate field element is the target dataset and it will be connected to the upper input port.

  3. Click and drag from the Calculate field element's output port to the Join element's upper input port. Click and drag from the Project geometry element's output port to the Join element's lower input port.

    Connected Calculate field, Project geometry, and Join elements

    The two input elements are connected to the Join element. In the Join pane, the Target dataset and Join dataset are filled out based on the elements you linked.

    Target dataset and Join dataset parameters

    Next, you'll set the Join element to use a spatial relationship.

  4. In the Join pane, under Spatial relationship, turn on Use spatial relationship.

    Use spatial relationship parameter

    Additional parameters appear. The Target geometry and Join geometry parameters are automatically completed. But you still need to choose a Spatial relationship. This defines how the target and join datasets are joined. Since the capital project points fall inside of the neighborhood polygons, you'll use the Intersects relationship.

  5. For Spatial relationship, choose Intersects.

    Spatial relationship parameter

  6. Click Preview.

    The preview window appears. This dataset continues to represent the capital project points. In the table preview, the fields that initially appear are from the capital project points.

  7. In the table preview, scroll to the end of the table and find the ntaname field.

    The ntaname field

    The fields at the far end of the table are the fields from the neighborhoods. Now, for each capital project, you know the neighborhood that it falls within.

    As you perform more joins, the number of fields becomes cumbersome, especially since many of these fields have not been requested by your stakeholders. Later, you'll remove the unnecessary attribute fields.

  8. Close the preview window.

    In the next section, you'll add a second spatial join. To prevent confusion, you'll rename the first Join element.

  9. Rename the Join element to Neighborhood Join.

    Renamed Join element
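Conceptually, this spatial join asks, for each point, which polygon contains it, and then copies that polygon's attributes onto the point. The Python sketch below illustrates the idea with a basic ray-casting test. It is not how the Join tool is implemented. The square polygons, their names, and the point coordinates are hypothetical. Points that lie exactly on a boundary are not handled carefully, and keeping unmatched points with an empty value is an assumption of this sketch rather than a statement about the tool.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon ring?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_neighborhood(points, polygons):
    """Append the ntaname of the first containing polygon to each point."""
    joined = []
    for point in points:
        match = next(
            (poly["ntaname"] for poly in polygons
             if point_in_ring(point["x"], point["y"], poly["ring"])),
            None,  # assumption: unmatched points are kept with no name
        )
        joined.append({**point, "ntaname": match})
    return joined


# Hypothetical target points and join polygons in the same coordinate system.
polygons = [
    {"ntaname": "South", "ring": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    {"ntaname": "North", "ring": [(0, 10), (10, 10), (10, 20), (0, 20)]},
]
points = [
    {"fmsid": "A1", "x": 5, "y": 5},
    {"fmsid": "A2", "x": 5, "y": 15},
    {"fmsid": "A3", "x": 50, "y": 50},
]
joined = join_neighborhood(points, polygons)
```

The target points keep their own attributes and gain a field from the join polygons, which matches the roles of the upper and lower input ports described earlier.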

Add a feature layer as an input

Some stakeholders requested that the final output also contain information about the community district that each capital project falls within. To accomplish this, you'll use another Join element, but first you need the polygon dataset containing the community districts. Instead of using a Public URL input element, you'll add a Feature layer input element because this dataset lives in ArcGIS Online.

  1. On the Editor toolbar, click Inputs. In the Inputs pane, under ArcGIS, choose Feature layer.

    Feature layer option

    The Select a feature layer window appears. You can add datasets from various locations, like ArcGIS Living Atlas or content that you own in ArcGIS Online.

  2. Click My content and choose ArcGIS Online.

    ArcGIS Online option

    Next, you'll search for a publicly available dataset from New York City containing the community districts.

  3. In the search box, type New York City Community District.
  4. Scroll down and find the Community District layer owned by Data Owner.

    Community District feature layer

    Since feature layers may contain multiple sublayers, you'll choose the CommunityDistrict sublayer to add to the data pipeline.

  5. For Community District, click Select layer. Choose CommunityDistrict.

    CommunityDistrict sublayer

  6. Click Add.

    A Feature layer element is added to the canvas.

  7. Move the Feature layer element under the Project geometry element for the neighborhoods dataset.

    CommunityDistrict Feature layer element

    Next, you'll preview the dataset you added.

  8. In the Feature layer pane, click Preview.
  9. Scroll through the preview table. Find the COMMDIST field.

    COMMDIST field

    The COMMDIST field is the attribute that you'll be adding to the capital projects point layer using a spatial join.

  10. Click the Map preview button.

    A map appears with the community districts drawn as polygons.

    Map preview

  11. Close the preview window.

Project a feature layer

The feature layer you added uses the Web Mercator (auxiliary sphere) projected coordinate system. To ensure data accuracy, you'll project this feature layer so that it uses the same coordinate system as the reprojected capital project points.

  1. On the canvas, select one of the Project geometry elements.
  2. Press Ctrl + C to copy the element.
  3. Press Ctrl + V to paste the element onto the canvas.
  4. Move the Project geometry element to the right of the Feature layer element.
  5. Click and drag from the Feature layer element's output port to the Project geometry element's input port.

    Connected Feature layer and Project geometry elements

    The two elements are connected. Now, the community districts dataset uses the proper coordinate system.

Spatially join the capital projects and the community districts

Now that the community districts dataset has been projected, you'll perform a second spatial join to determine which community district each capital project falls within.

  1. On the Editor toolbar, click Tools. In the Tools pane, under Integrate, click Join.

    The Join element is added to the canvas.

  2. Move the Join element to the right of the first Join and Project geometry elements.

    Join element

    Next, you'll connect the first Join and the Project geometry elements to the second Join element. The first Join element will be the target dataset parameter and the Project geometry element will be the join dataset parameter.

  3. Click and drag from the first Join element's output port to the second Join element's upper input port. Click and drag from the Project geometry element's output port to the second Join element's lower input port.

    Connected Project geometry and Join elements

    Next, you'll set the second Join element to use a spatial relationship.

  4. In the Join pane, under Spatial relationship, turn on Use spatial relationship.
  5. For Spatial relationship, choose Intersects.

    Spatial relationship parameter

  6. Click Preview.

    The preview window appears.

  7. In the table preview, scroll to the end of the table.

    Table preview with the COMMDIST field

    The first fields you see are from the capital projects dataset. Next, you see the fields from the neighborhood dataset. Finally, at the end of the table, you see the fields from the community districts dataset. Now, each capital project has information about the community district that it falls within.

  8. Close the preview window.

    Since there is another Join element on the canvas, you'll rename this second Join element for clarity.

  9. Rename the second Join element to Community District Join. Resize the element so that the full name of the element is visible.

    Renamed Join element

  10. Save your data pipeline.

In this module, you added two public polygon layers with attribution that you wanted to add to the capital projects dataset. One dataset is a GeoJSON from the NYC OpenData site, and the other is a feature layer from ArcGIS Online. Then, you projected both datasets and spatially joined them to the capital projects dataset.


Clean the data

After adding data and spatially joining it, you have all the attributes that were requested by the stakeholders in various departments. However, there are many other fields that are unnecessary and make the attribute table difficult to navigate. Additionally, some of the requested fields have names that are difficult to interpret.

Next, you'll clean up the attributes before the results are written to an output dataset.

Select fields

First, you'll select only the fields of interest to your stakeholders. This includes several fields from the capital projects dataset, the Elapsed_Time field you calculated, the ntaname field, and the COMMDIST field.

  1. On the Editor toolbar, click Tools. In the Tools pane, under Clean, click Select fields.

    The Select fields element is added to the canvas.

  2. Move the Select fields element to the right of the second Join element.

    Select fields element

    Next, you'll connect the second Join element to the Select fields element.

  3. Click and drag from the Join element's output port to the Select fields element's input port.

    Connected Join and Select fields elements

    Next, you'll choose which fields you want the output dataset to contain.

  4. In the Select fields pane, under Fields, click Field.

    Fields parameter

    The Select fields window appears. You'll choose the fields that are of interest to your stakeholders. You'll also choose the GEOMETRY field. This field is necessary for you to be able to display your output dataset as points. Otherwise, the output would be a nonspatial feature layer or hosted table.

  5. In the Select fields window, select the following fields:
    • fmsid
    • currentphase
    • GEOMETRY
    • Elapsed_Time
    • ntaname
    • COMMDIST

    Select fields window

  6. Click Done.
  7. In the Select fields pane, click Preview.

    The preview window appears.

    Table preview with only the selected fields

    Instead of having an overwhelming number of fields, your output dataset will only contain these six fields that were requested by the stakeholders.

  8. Click the Map preview button.

    Map of the capital project points

    Since you included the GEOMETRY field in the selected fields, a map of the capital project points is visible.

  9. Close the preview window.
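
For readers who think in code, the effect of the Select fields tool can be sketched in plain Python. This is only an illustration of the idea, not how ArcGIS Data Pipelines is implemented: the sample record, its values, and the `extra_field` column are invented, and `select_fields` is a hypothetical helper.

```python
# Hypothetical sample record; all values and "extra_field" are invented for illustration.
records = [
    {
        "fmsid": "P-001",
        "currentphase": "Construction",
        "GEOMETRY": (-73.97, 40.78),  # stand-in for a point geometry
        "Elapsed_Time": 412,
        "ntaname": "Example Neighborhood",
        "COMMDIST": 107,
        "extra_field": "not requested by stakeholders",
    },
]

# The six fields chosen in the Select fields window.
KEEP = ["fmsid", "currentphase", "GEOMETRY", "Elapsed_Time", "ntaname", "COMMDIST"]


def select_fields(rows, keep):
    """Return new rows that contain only the named fields, in the given order."""
    return [{name: row[name] for name in keep} for row in rows]


selected = select_fields(records, KEEP)
print(list(selected[0].keys()))
```

Leaving `GEOMETRY` out of `KEEP` would still produce rows, but they would carry no location, which mirrors why the tutorial keeps that field.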

Update fields

Now that you have the fields of interest for your stakeholders, you'll change some of their names to make them more readable.

  1. On the Editor toolbar, click Tools. In the Tools pane, under Clean, click Update fields.

    The Update fields element is added to the canvas.

  2. Move the Update fields element to the right of the Select fields element.

    Update fields element

    Next, you'll connect the Select fields and Update fields elements.

  3. Click and drag from the Select fields element's output port to the Update fields element's input port.

    Connected Select fields and Update fields elements

    Next, you'll choose which fields you want to update and configure them. When updating fields, you can update their name and field type. You'll update three of the fields' names.

    The first field you'll update is fmsid. This field originated from the Capital Project Tracker dataset and contains a project identification number.

  4. In the Update fields pane, under Updates, for Field to update, choose fmsid.

    Field to update parameter

    Next, you'll provide an updated name for this field.

  5. For New field name, type Project_ID.

    New field name parameter

    Note:

    As with the Calculate field tool, field names cannot contain special characters, such as spaces.

    The first field has been updated. You'll update two more fields, the ntaname and COMMDIST fields.

    Note:

    If you wanted to change a field's type, like string to integer, you could do that using the New field type parameter.

  6. Click Add.

    Add button

  7. For Field to update, choose ntaname. For New field name, type Neighborhood.

    Field to update and New field name parameters for the neighborhood attribute

  8. Click Add.
  9. For Field to update, choose COMMDIST. For New field name, type Community.

    Field to update and New field name parameters for the community district attribute

  10. Click Preview.

    In the table preview, the column headings have been updated. Your table's field headings are more intuitive for your stakeholders.

    Table with updated field names

  11. Close the preview window.
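
The three renames configured above amount to a simple old-name to new-name mapping. The following plain Python sketch shows that idea under stated assumptions: `update_field_names` is a hypothetical helper, the sample row is invented, and the name check (letters, digits, and underscores only) is an assumed simplification of the tool's actual naming rules.

```python
# Old field name -> new field name, as configured in the Update fields pane.
RENAMES = {
    "fmsid": "Project_ID",
    "ntaname": "Neighborhood",
    "COMMDIST": "Community",
}


def update_field_names(rows, renames):
    """Return new rows with fields renamed; fields not in the mapping are unchanged."""
    for new_name in renames.values():
        # Assumed rule for illustration: only letters, digits, and underscores.
        if not new_name.replace("_", "").isalnum():
            raise ValueError(f"invalid field name: {new_name!r}")
    return [
        {renames.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


# Hypothetical row; values are invented for illustration.
row = {
    "fmsid": "P-001",
    "currentphase": "Construction",
    "GEOMETRY": (-73.97, 40.78),
    "Elapsed_Time": 412,
    "ntaname": "Example Neighborhood",
    "COMMDIST": 107,
}

renamed = update_field_names([row], RENAMES)
print(list(renamed[0].keys()))
```

A name such as `Project ID` (with a space) would be rejected by this sketch, which is in the spirit of the note about special characters.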

Create an output feature layer

Thus far, your data pipeline ingests and transforms your data. As a final step, you'll configure it to write this data to a feature layer.

  1. On the Editor toolbar, click Outputs. In the Outputs pane, under ArcGIS, click Feature layer.

    The Feature layer element is added to the canvas.

  2. Move the Feature layer element to the right of the Update fields element.

    Feature layer element

    Next, you'll connect the Update fields and Feature layer elements.

  3. Click and drag from the Update fields element's output port to the Feature layer element's input port.

    Connected Update fields and Feature layer elements

    Next, you'll configure the output settings for the feature layer that will be created. With ArcGIS Data Pipelines, it's also possible to have the output replace an existing feature layer or add and update features in an existing feature layer.

  4. In the Feature layer pane, under Output settings, ensure that Output method is set to Create.

    Output method parameter

    Next, you'll provide the feature layer with a name.

  5. For Output name, type DPR Capital Projects.

    Output name parameter

  6. Click Preview.

    What you see in the preview window is what will be written to your feature layer when you run the data pipeline.

  7. Close the preview window.
  8. Resize the Feature layer element so that its full name is visible.

    Feature layer element

    Your data pipeline is complete.

    Data pipeline

    If your data pipeline's elements are disorganized, you can use the Auto layout diagram button to reposition them so that the flow of inputs, tools, and outputs is easier to follow.

  9. On the Canvas action bar, click Auto layout diagram.

    Auto layout diagram button

    The elements on the canvas are repositioned.

    Updated data pipeline layout

  10. Save your data pipeline.

In this module, you cleaned up the data created from the previous spatial joins. You removed unnecessary fields and renamed fields whose names were unintuitive. Finally, you set the data pipeline to write the output dataset to a feature layer in your ArcGIS Online organization.


Review the results

Next, you'll run the data pipeline that you created and explore the results. Then, you'll set the data pipeline to run automatically on a schedule to keep the information in ArcGIS Online current.

Run the data pipeline

Now that your data pipeline is complete, you'll run it to create a feature layer.

  1. On the Canvas action bar, click Run.

    Run button

    The Latest run details window appears and opens to the Run details tab. This window provides you with information as the data pipeline runs. It also displays any warnings or errors that occur during processing.

    Latest run details window

    After the data pipeline completes, you'll explore your results. Processing takes about a minute.

  2. In the Latest run details window, click the Output results tab.

    Output results tab

    This tab lists any outputs created by the data pipeline. The DPR Capital Projects feature layer is listed.

    Output results tab with the DPR Capital Projects feature layer listed

    Next, you'll review your feature layer's item details and share the feature layer with your organization.

  3. For the DPR Capital Projects layer, click Options and choose View details.

    View details option

    A browser tab opens to the DPR Capital Projects item details page.

    Item details page

    This page provides information about the feature layer created by the data pipeline. Next, you'll share the results with your organization.

  4. Click the Share button.

    Share button

    The Share window appears.

  5. In the Share window, for Set sharing level, choose Organization.

    Organization option

  6. Click Save.

    The DPR Capital Projects layer is now shared with your organization for others to access. When the data pipeline runs, it will update this feature layer for anyone who has added it to their maps or apps.

    Note:

    When you create a data pipeline, it is stored as an item in your ArcGIS Online account. This item does not need to be shared with your organization for users to access a data pipeline's output feature layer.

    Next, you'll view the result on a map.

  7. Click Open in Map Viewer.

    Open in Map Viewer button

    A map opens and the DPR Capital Projects feature layer is added.

    Map containing the DPR Capital Projects feature layer

  8. Click one of the points.

    A pop-up appears with the attributes that you specified in the data pipeline.

    Pop-up containing attributes

    This dataset is available to be symbolized, analyzed, and configured further for your stakeholders' web maps and apps.

Update the data pipeline

Your data pipeline ran successfully and you now have a feature layer representing DPR Capital Projects. However, the source data updates regularly and your stakeholders want the latest information reflected in their web maps and apps. You'll update the Feature layer output element to replace the DPR Capital Projects feature layer every time the data pipeline runs in the future.

Note:

If your organization's data pipeline only needs to run once, updating the Feature layer element is not necessary.

  1. In the Data Pipelines Editor, close the Latest run details window.
  2. On the canvas, click the Feature layer element representing your output feature layer.
  3. In the Feature layer pane, under Output settings, change Output method to Replace.

    Output method parameter

    The Feature layer parameter appears. This parameter tells the data pipeline which feature layer in your organization to replace when the data pipeline runs in the future.

  4. For Feature layer, click Select layer.

    Feature layer parameter

    The Select a feature layer window appears. You'll select the feature layer you want to replace.

    Caution:

    Be careful when selecting a feature layer to replace. If you choose the incorrect feature layer, data could be irreversibly lost.

  5. Find the DPR Capital Projects feature layer. Click Select layer and choose DPR Capital Projects.

    DPR Capital Projects feature layer

  6. Click Confirm.

    Now, when the data pipeline runs again, it will overwrite the existing feature layer instead of trying to create it a second time, which avoids errors.

  7. Save the data pipeline.
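
The difference between the Create and Replace output methods can be pictured with a small Python model. This is a conceptual sketch, not the ArcGIS API: the `portal` dictionary is a hypothetical stand-in for your organization's hosted layers, keying layers by name is a simplification, and the behavior that Create fails when the layer already exists is an assumption made for illustration.

```python
def write_output(portal, name, rows, method):
    """Write rows to a named layer in a dict that stands in for hosted layers."""
    if method == "create":
        # Assumption: creating a layer that already exists is an error.
        if name in portal:
            raise ValueError(f"layer {name!r} already exists")
        portal[name] = list(rows)
    elif method == "replace":
        # Replace needs an existing target and discards its previous contents.
        if name not in portal:
            raise KeyError(f"layer {name!r} does not exist")
        portal[name] = list(rows)
    else:
        raise ValueError(f"unknown output method: {method!r}")


portal = {}

# First run: the layer does not exist yet, so Create succeeds.
write_output(portal, "DPR Capital Projects", [{"Project_ID": "P-001"}], "create")

# Later runs: Replace overwrites the layer with the latest rows.
write_output(
    portal,
    "DPR Capital Projects",
    [{"Project_ID": "P-001"}, {"Project_ID": "P-002"}],
    "replace",
)
print(len(portal["DPR Capital Projects"]))
```

Because Replace discards whatever the target held before, the model also shows why choosing the wrong layer can lose data.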

Schedule the data pipeline

Since the input datasets are subject to change, you'll schedule the data pipeline to run automatically in the future.

  1. Click ArcGIS Data Pipelines.

    ArcGIS Data Pipelines button

  2. Click Manage scheduling.

    Manage scheduling button

    Now, you'll create a task. A task lets you control how frequently your data pipeline runs.

  3. Click Create task.

    Create task button

    The Create task window appears. Here, you'll choose the data pipeline that you created.

  4. Select Capital Projects Data Pipeline.

    Create task window

  5. Click Next.

    You'll schedule your data pipeline to run automatically, which keeps the target feature layer refreshed with the latest information. Like working in the editor, running a scheduled data pipeline consumes credits. Since this is a tutorial, you'll only run this data pipeline once to conserve credits. However, in a production environment, you might set it to run monthly, daily, or more frequently, based on how often your input datasets update.

    First, you'll give this task a title.

  6. For Title, type DPR Capital Projects Update.

    Title parameter

    You'll have this data pipeline run every 15 minutes.

  7. For Repeat type, choose Minute. For Repeat interval, leave the default value of 15 minutes.

    Repeat type and Repeat interval parameters

    Next, you'll ensure that the data pipeline only runs once.

  8. For End, choose After number of runs. For Number of runs, type 1.

    End and Number of runs parameters

    Note:

    To learn more about scheduling tasks, read Schedule a data pipeline task.

  9. Click Save.

    The task is visible and informs you when it will run next.

    Scheduled task

    Note:

    If you want to edit, pause, or delete a task, click the Options button at the far end of the table. Additionally, you can click the link to view or edit your data pipeline.

    After the task runs, you can see the task run history.

  10. Click the DPR Capital Projects Update task.

    DPR Capital Projects Update task

    The Task runs pane shows the task and the status of completed runs. A green checkmark indicates that the run succeeded. A red hexagon indicates that the run failed.

    Previous task status

    Under Output results, an overview of the results from the data pipeline is shown.

    Output results and Run details
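
The schedule settings used above (a repeat interval in minutes and an end condition based on the number of runs) can be sketched with Python's `datetime` module. This is an illustration only: `planned_runs` is a hypothetical helper, the start time is invented, and the assumption that the first run happens at the start time may not match how ArcGIS Online schedules tasks.

```python
from datetime import datetime, timedelta


def planned_runs(start, interval_minutes, number_of_runs):
    """List run times spaced by a fixed interval, ending after a set number of runs."""
    return [
        start + timedelta(minutes=interval_minutes * i)
        for i in range(number_of_runs)
    ]


# Hypothetical start time, chosen only for illustration.
start = datetime(2024, 1, 1, 9, 0)

# Tutorial settings: repeat every 15 minutes, end after 1 run.
tutorial_runs = planned_runs(start, 15, 1)

# A production-style variation: the same interval, ending after 4 runs.
more_runs = planned_runs(start, 15, 4)

for run_time in more_runs:
    print(run_time.isoformat())
```

With the number of runs set to 1, the interval has no practical effect, which is why the tutorial can leave the default of 15 minutes and still conserve credits.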

In this tutorial, you built a data pipeline that integrates data from various dynamic sources, adds attributes through spatial joins, removes extraneous attributes, renames fields, and writes the results to a feature layer. You also set the data pipeline to run automatically on a schedule. By configuring a data pipeline, you can skip the tedious process of manually manipulating data and updating feature layers every time there's an update to the source data.

You can find more tutorials in the tutorial gallery.