Create Rulesets

A Ruleset is a collection of validation rules. Rulesets can be used within the Validate step, versioned and shared between Spaces you have access to. Rulesets allow a set of business rules to be created and managed globally then applied to similar data in different Spaces.

For example, you may have a Ruleset (based upon your current business policy) for 'contact information'. This would be a collection of validation rules used on any data containing contact fields. If there are any changes to this policy in the future, you only need to change, test and publish one Ruleset which will update all instances where these validation rules are used.

Creating rules

Within a Ruleset you can define:

  • the names of the columns the validation rules apply to. These can be automatically or manually mapped when the Ruleset is being used in the Validate step (e.g., so that the 'Phone' rule is applied to the 'Phone number' column).
  • the rules and rule groups. This is done in the same way as the Validate step using the Function builder and script editor.

Using the Validation Script editor, you can copy/paste a script and it will automatically create the rule columns.

Use business rules to measure the quality, completeness and accuracy of your data.

A rule is a Function that returns a true or false value, which translates to a pass or fail. This can be either a default Function (like 'Contains' or 'Is Date') or a custom one.

Suggest validation rules

The easiest way to get started is to use Suggest validation rules, which profiles the data to then suggest formulas to be applied to columns as validation rules. Uncheck any suggestions that are not useful and apply to create rules separated into rule groups ready for further changes if required.

In Explore mode, Profile your data to see the Suggest rules action button, which will create a new Workflow containing a Validate step.

Applying a Ruleset

Selecting Apply ruleset allows you to select any Rulesets that have been created and map columns to the rules where the names differ.

Creating rules and groups

On the Validate step, click Rules to view and edit any existing rules and to Add rule or Add group.

Each rule has a name and summary, a parent rule group and a Function definition. The Function can either be a pre-existing Function in Data Studio, or a custom Function created in the Function builder. When building a custom Function, pick one of the relevant columns (or a parameter) as the input value and ensure the Function returns true/false as the result.

Rule groups represent logical collections of validation rules and can have different thresholds set. For example, a group of compliance rules may have a fail threshold of 99%, but a data quality group of rules may use a lower threshold and be increased over time as data and processes are improved.

The Status result column for each rule is based on these thresholds, so Red below the fail threshold, Green at or above the pass threshold, and Amber in between.

Each group has a name and description, pass and fail threshold, plus an optional column to weight the results. As well as counting the number of Passed rows and Failed rows, the results will also include Passed weight and Failed weight columns, which contain a sum of the chosen column for all the passing/failing rows. For example, weighting each rule's results by total sales allows you to prioritize data quality issues for the highest spending customers with the most overall impact.

Ignoring values in validation rules

Rules can be set to ignore or skip values that do not need to be validated, and so should not impact the rule pass rate. For example, where 10 emails are being checked; 5 are valid, 2 invalid and 3 are blank, the pass rate would be 5/7 (rather than 5/10 if all values were considered).

The Ignore Null values checkbox can be selected when first creating a rule using an existing Function. However, values other than null can also be ignored using the Ignore literal value when designing a custom Function.

Validation rule script editor

You can use a scripting language to create new and manage existing validation rules, especially if you're making several changes at once. Click Edit script to open the script editor and make the required changes, such as renaming, re-ordering or changing rule groups.
To open a read-only version, click View script.

You can create Functions using the same functionality in the Function script editor.

A rule is made up of a rule group, group id, rule name, rule id, rule summary and Function in that order. For example:

## Rule_Group_1 {{c1e5ff0d}}
# Rule_1 {{d1f5ff0c}}
-- Summary of rule 1
IsEven(3)

Using the script editor:

  • ## denotes a rule group.
  • # denotes a rule name.
  • -- denotes rule or rule group summary text.
  • Rule id's are non-editable and should not be copied when creating a new rule.
  • Use Ctrl+Space to show available Functions or to suggest a Function according to what you have already typed.
  • Hover over a Function name with the cursor to show the arguments required for that Function.
  • Use Ctrl+Enter to validate and reformat the Function script.

Viewing results

Validation results are available in several formats:

  • Show passing rows contains the rows that pass all of the defined rules.
  • Show failing rows contains the rows that fail at least one validation rule.
  • Show all rows is a combination of the two above, comprising the entire input data alongside the rules and a Pass/Fail result for each.
  • Show results by rule summarizes/aggregates the results for each rule with a count of passes and fails.
    • Show results by rule for analysis is the same data, un-pivoted into rows.
  • Show results by group is similar to the above, but useful for a higher level view of aggregated results when there are many rules.
    • Show results by group for analysis is the same data, un-pivoted into rows.

Similar to the Source and Profile step, the Validate step has a Source metadata dropdown that enables you to include the lineage metadata of input data. This information (such as file name or batch ID) is useful to include as additional columns in the aggregated results.

Convert to Ruleset

If you already have an existing group of validation rules, or to make the creation of rule columns easier (and to ensure automapping), use the Validate step within a Workflow and click Convert to Ruleset. Ensure that the Add & replace with Ruleset checkbox is ticked - this will create the Ruleset from the validation rules and rule columns you've defined replacing the rules within the validate step itself.

Share and version

By default, a Ruleset is restricted to the Space it was created in. To share it with other Spaces, the Ruleset has to be published.

Updating a previously published Ruleset will create a new version in draft status, which can be improved and tested, these changes will take immediate effect within the same space, but in other shared Spaces not until the Ruleset is re-published.

Updating rules

Before updating a Ruleset, we recommend using the Associations feature to check where it's being used. If you introduce new rule columns, these might be unable to be auto-mapped within the Validation steps that use the Ruleset.

A Ruleset used within a Validation step must always be fully mapped. Failed auto-mappings would require you to manually remap the affected Rulesets. Until the Ruleset has been fully remapped, the Workflow would fail to execute (note that you may not have access to edit the Workflows in which the Ruleset has been used).

Deleting rule columns, updating custom functions, or rules contained within the Ruleset could cause one or more of the rules within the Ruleset to become invalid (these would be highlighted in red). Any invalid rules within the Ruleset would render the whole Ruleset invalid - invalid rules would require fixing before any Workflow using the Ruleset can be executed.