Version: 0.18.3

Manage Data Assets

A Data Asset is a collection of records that you create when you connect to your Data Source. When you connect to your Data Source, you define a minimum of one Data Asset. You use these Data Assets to create the Batch Requests that select the data that is provided to your Expectations.

To learn more about Data Assets, see Data Asset.

Prerequisites

  • You have a GX Cloud Beta account.

  • You have set up GX Cloud and the GX Agent is running.

  • You have a Snowflake account with USAGE privileges on the table, database, and schema you are validating, and you know your password. To improve data security, GX recommends using a Snowflake service account to connect to GX Cloud.

Create a Data Asset

Create a Data Asset to define the data you want GX Cloud to access. Currently, the GX Cloud user interface is configured for Snowflake. To connect to Data Assets on another Data Source, see Connect to a Data Source in the GX OSS documentation.

  1. In GX Cloud, click Data Assets > New Asset.

  2. Click the New Data Source tab and then select Snowflake.

  3. Enter a meaningful name for the Data Source in the Data Source name field.

  4. Optional. To use a connection string to connect to a Data Source, click the Use connection string selector, enter a connection string, and then move to step 6.

  5. Complete the following fields:

    • Username: Enter the username you use to access Snowflake.

    • Account identifier: Enter your Snowflake account or locator information. The locator value must include the geographical region, for example, us-east-1. To locate these values, see Account Identifiers.

    • Password: Enter your Snowflake password.

    • Database: Enter the name of the Snowflake database where the data you want to validate is stored. In Snowsight, click Data > Databases. In the Snowflake Classic Console, click Databases.

    • Schema: Enter the name of the Snowflake schema that contains the table you want to validate.

    • Warehouse: Enter the name of your Snowflake database warehouse. In Snowsight, click Admin > Warehouses. In the Snowflake Classic Console, click Warehouses.

    • Role: Enter your Snowflake role.

  6. Optional. Select Test connection to verify that the connection configuration is correct before you continue. Testing the connection at this point can help you avoid errors and reduce troubleshooting time later.

  7. Click Continue.

  8. Select Table Asset or Query Asset and complete the following fields:

    • Table name: When Table Asset is selected, enter a name for the table you're creating in the Data Asset.

    • Data Asset name: Enter a name for the Data Asset. If you use the same name for multiple Data Assets, each Data Asset must be associated with a unique Data Source.

    • Query: When Query Asset is selected, enter the query that you want to run on the table.

  9. Select the Complete Asset tab to provide all Data Asset records to your Expectations and validations, or select the Batches tab to use subsets of Data Asset records for your Expectations and validations. If you selected the Batches tab, complete the following fields:

    • Split Data Asset by: Select Year to partition Data Asset records by year, select Year - Month to partition Data Asset records by year and month, or select Year - Month - Day to partition Data Asset records by year, month, and day.

    • Column of datetime type: Enter the name of the column containing the date and time data.

  10. Optional. Select Add Data Asset to add additional tables or queries and repeat steps 8 and 9.

  11. Click Finish.

  12. Create an Expectation. See Create an Expectation.
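
To illustrate what the Split Data Asset by options in step 9 do, the following minimal Python sketch groups records into batches by year, by year and month, or by year, month, and day. It is not GX Cloud code. The function name split_records, the record layout, and the column name created_at are hypothetical and exist only for this example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical mapping from the UI option to the datetime parts that form a batch key.
SPLIT_OPTIONS = {
    "Year": ("year",),
    "Year - Month": ("year", "month"),
    "Year - Month - Day": ("year", "month", "day"),
}


def split_records(records, column, split_by):
    """Group records into batches keyed by the selected parts of a datetime column."""
    parts = SPLIT_OPTIONS[split_by]
    batches = defaultdict(list)
    for record in records:
        value = record[column]
        # Build a key such as (2023, 1) for the "Year - Month" option.
        key = tuple(getattr(value, part) for part in parts)
        batches[key].append(record)
    return dict(batches)


# Made-up sample records with a datetime column.
records = [
    {"id": 1, "created_at": datetime(2023, 1, 15)},
    {"id": 2, "created_at": datetime(2023, 1, 20)},
    {"id": 3, "created_at": datetime(2023, 2, 1)},
    {"id": 4, "created_at": datetime(2024, 2, 1)},
]

by_month = split_records(records, "created_at", "Year - Month")
for key, batch in sorted(by_month.items()):
    print(key, [r["id"] for r in batch])
```

With the Complete Asset option, all four records would be provided together. With Year - Month, the two January 2023 records share a batch and the other records each fall into their own batch.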

View Data Asset metrics

Data Asset metrics provide you with insight into the data you can use for your data validations.

  1. In GX Cloud, click Data Assets and then select a Data Asset in the Data Assets list.

  2. Click the Overview tab.

  3. Select one of the following options:

    • If you have not previously generated Data Asset metrics, click Fetch Metrics.

    • If you previously generated Data Asset metrics, click Refresh to refresh the metrics.

Available Data Asset metrics

The following table lists the available Data Asset metrics.

| Column | Description |
| --- | --- |
| Row Count | The number of rows within a Data Asset. |
| Column | A column within your Data Asset. |
| Type | The data storage type in the Data Asset column. |
| Min | For numeric columns, the lowest value in the column. |
| Max | For numeric columns, the highest value in the column. |
| Mean | For numeric columns, the average value within the column. This is determined by dividing the sum of all values in the column by the number of values. |
| Median | For numeric columns, the value in the middle of the column's data. 50% of the values in the column are smaller than or equal to the median, and 50% of the values are higher than or equal to the median. |
| Null % | The percentage of missing values in a column. |
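
The metric definitions above can be sketched in a few lines of Python. This is an illustration of the definitions only, not how GX Cloud calculates metrics. The function name column_metrics and the sample values are hypothetical, and the sketch assumes that missing values are represented as None and are excluded from Min, Max, Mean, and Median.

```python
import statistics


def column_metrics(values):
    """Compute illustrative metrics for one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    # Null %: share of missing values among all rows in the column.
    null_pct = 100.0 * (len(values) - len(present)) / len(values) if values else 0.0
    metrics = {
        "row_count": len(values),
        "null_pct": null_pct,
        "min": None,
        "max": None,
        "mean": None,
        "median": None,
    }
    if present:
        metrics.update(
            min=min(present),
            max=max(present),
            mean=statistics.mean(present),      # sum of values / number of values
            median=statistics.median(present),  # middle value of the sorted data
        )
    return metrics


# Made-up column with one missing value.
price = [10, 20, None, 40, 30]
metrics = column_metrics(price)
print(metrics)
```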

Add an Expectation to a Data Asset column

When you create an Expectation after fetching metrics for a Data Asset, the column names and some values are autopopulated, which simplifies the creation of new Expectations. Data Asset metrics can also help you determine which Expectations might be useful and how they should be configured. You can add the new Expectations to an existing Expectation Suite, or you can create a new Expectation Suite and add the Expectations to it.

  1. In GX Cloud, click Data Assets and then select a Data Asset in the Data Assets list.

  2. Click the Overview tab.

  3. Select one of the following options:

    • If you have not previously generated Data Asset metrics, click Fetch Metrics.

    • If you previously generated Data Asset metrics, click Refresh to refresh the metrics.

  4. Click New Expectation.

  5. Select one of the following options:

    • To add an Expectation to a new Expectation Suite, click the New Suite tab and then enter a name for the new Expectation Suite.

    • To add an Expectation to an existing Expectation Suite, click the Existing Suite tab and then select an existing Expectation Suite.

  6. Select an Expectation type. See Available Expectation types.

  7. Complete the fields in the Create Expectation pane.

  8. Click Save to add the Expectation, or click Save & Add More to add additional Expectations.

Add a Data Asset to an Existing Data Source

Additional Data Assets can only be added to an existing Snowflake Data Source.

  1. In GX Cloud, click Data Assets and then select New Data Asset.

  2. Click the Existing Data Source tab and then select a Snowflake Data Source.

  3. Click Add another Data Asset.

  4. Select Table Asset or Query Asset and complete the following fields:

    • Asset name: Enter a name for the Data Asset. Data Asset names must be unique within a Data Source. If you use the same name for multiple Data Assets, each Data Asset must be associated with a unique Data Source.

    • Table name: When Table Asset is selected, enter a name for the table you're creating in the Data Asset.

    • Query: When Query Asset is selected, enter the query that you want to run on the table.

  5. Optional. Select Add another Data Asset to add additional tables or queries and repeat step 4.

  6. Click Finish.
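
The naming rule in step 4 means that a Data Asset name can be reused only if each use belongs to a different Data Source. The following Python sketch shows one way to check a planned set of names against that rule before you enter them. It is not part of GX Cloud, and the function name find_name_conflicts and the sample names are hypothetical.

```python
def find_name_conflicts(assets):
    """Return (data_source, asset_name) pairs that appear more than once."""
    seen = set()
    conflicts = []
    for pair in assets:
        # A repeated pair means the same name is used twice in one Data Source.
        if pair in seen and pair not in conflicts:
            conflicts.append(pair)
        seen.add(pair)
    return conflicts


# Made-up plan: "orders" is reused across Data Sources (allowed)
# and repeated within snowflake_prod (not allowed).
planned = [
    ("snowflake_prod", "orders"),
    ("snowflake_dev", "orders"),
    ("snowflake_prod", "orders"),
]

print(find_name_conflicts(planned))
```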

Edit Data Source settings

Currently, you can only edit Snowflake Data Source settings.

  1. In GX Cloud, click Data Assets.

  2. Click Manage Data Sources.

  3. Click Edit Data Source for the Snowflake Data Source you want to edit.

  4. If you used a connection string to connect to the Data Source, edit the Data Source connection string, or click the Use connection string selector and edit the following fields:

    • Data Source name: Enter a new name for the Data Source.

    • Username: Enter a new Snowflake username.

    • Account identifier: Enter new Snowflake account or locator information. The locator value must include the geographical region, for example, us-east-1. To locate these values, see Account Identifiers.

    • Password/environment variable: Enter a Snowflake password or ${GX_CLOUD_SNOWFLAKE_PASSWORD}. If you haven't set this variable, see Set up GX Cloud.

    • Database: Enter a new Snowflake database name.

    • Schema: Enter a new schema name.

    • Warehouse: Enter a new Snowflake database warehouse name.

    • Role: Enter a new Snowflake role.

  5. Click Save.

Edit a Data Asset

Currently, you can only edit Snowflake Data Assets.

  1. In GX Cloud, click Data Assets and in the Data Assets list click Edit Data Asset for the Data Asset you want to edit.

  2. Edit the following fields:

    • Table name: Enter a new name for the Data Asset table.

    • Data Asset name: Enter a new name for the Data Asset. If you use the same name for multiple Data Assets, each Data Asset must be associated with a unique Data Source.

  3. Click Save.

Secure your GX API Data Source connection strings

When you use the GX API and not GX Cloud to connect to Data Sources, you must obfuscate your sensitive Data Source credentials in your connection string. Data Source connection strings are persisted in GX Cloud backend storage. Connection strings containing plaintext credentials are stored as plaintext.

  1. Store your credential value as an environment variable by entering export ENV_VAR_NAME=env_var_value in the terminal or adding the command to your ~/.bashrc or ~/.zshrc file. For example:

    Terminal input
    export GX_CLOUD_SNOWFLAKE_PASSWORD=<password-string>

    Prefix environment variable names with GX_CLOUD_.

  2. Create a Data Source connection string using the environment variable name instead of the credential value. For example:

    Example Data Source connection string
    snowflake://<user-name>:${GX_CLOUD_SNOWFLAKE_PASSWORD}@<account-name>/<database-name>/<schema-name>?warehouse=<warehouse-name>&role=<role-name>

    Environment variable names must be enclosed by curly braces and be preceded by a dollar sign. For example: ${GX_CLOUD_SNOWFLAKE_PASSWORD}. Do not use interpolation to add credential values to connection strings.

  3. Use the environment variable to supply the credential value when you run the GX Agent. For example:

    Terminal input
    docker run --rm -e GX_CLOUD_SNOWFLAKE_PASSWORD="<snowflake_password>" -e GX_CLOUD_ACCESS_TOKEN="<user_access_token>" -e GX_CLOUD_ORGANIZATION_ID="<organization_id>" greatexpectations/agent
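
The steps above can be sketched in Python as follows. The sketch assembles a connection string in the format shown in step 2, leaves the password as a ${...} placeholder instead of interpolating the credential value, and reports any referenced environment variables that are not set. The helper names build_connection_string, referenced_variables, and missing_variables are hypothetical, the sample account and object names are made up, and the code does not reflect how GX Cloud or the GX Agent resolves variables internally.

```python
import os
import re
from urllib.parse import quote

# Matches placeholders of the form ${VARIABLE_NAME}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def build_connection_string(user, account, database, schema, warehouse, role,
                            password_var="GX_CLOUD_SNOWFLAKE_PASSWORD"):
    """Build a Snowflake connection string that references the password by variable name."""
    if not password_var.startswith("GX_CLOUD_"):
        raise ValueError("Prefix environment variable names with GX_CLOUD_.")
    # The password is never inserted here; only the ${...} placeholder is written.
    return (
        f"snowflake://{quote(user, safe='')}:${{{password_var}}}"
        f"@{account}/{database}/{schema}"
        f"?warehouse={quote(warehouse, safe='')}&role={quote(role, safe='')}"
    )


def referenced_variables(connection_string):
    """List the environment variable names referenced in a connection string."""
    return PLACEHOLDER.findall(connection_string)


def missing_variables(connection_string, environ=os.environ):
    """List referenced variables that are not defined in the given environment."""
    return [name for name in referenced_variables(connection_string)
            if name not in environ]


# Made-up values for illustration only.
conn = build_connection_string(
    user="gx_user",
    account="myorg-myaccount",
    database="SALES",
    schema="PUBLIC",
    warehouse="COMPUTE_WH",
    role="GX_ROLE",
)
print(conn)
print("Not set:", missing_variables(conn))
```

Running a check such as missing_variables before you start the GX Agent can surface an unset variable early, before a connection attempt fails.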

Delete a Data Asset

  1. In GX Cloud, click Settings > Datasources.

  2. Click Delete for the Data Source and the associated Data Assets you want to delete.

  3. Click Delete.