Validate and enrich data

Use business rules to measure the quality, completeness and accuracy of your data.

A rule is a Function that returns a true or false value, which translates to a pass or fail. This can be either a default Function (like 'Contains' or 'Is Date') or a custom one.

Suggest validation rules

The easiest way to get started is to use Suggest validation rules, which profiles the data to then suggest formulas to be applied to columns as validation rules. Uncheck any suggestions that are not useful and apply to create rules separated into rule groups ready for further changes if required.

In Explore mode, Profile your data to see the Suggest rules action button, which will create a new Workflow containing a Validate step.

Applying a Ruleset

Selecting Apply ruleset allows you to select any Rulesets that have been created and map columns to the rules where the names differ.

Creating rules and groups

On the Validate step, click Rules to view and edit any existing rules and to Add rule or Add group.

Each rule has a name and summary, a parent rule group and a Function definition. The Function can either be a pre-existing Function in Data Studio, or a custom Function created in the Function builder. When building a custom Function, pick one of the relevant columns (or a parameter) as the input value and ensure the Function returns true/false as the result.

Rule groups represent logical collections of validation rules and can have different thresholds set. For example, a group of compliance rules may have a fail threshold of 99%, but a data quality group of rules may use a lower threshold and be increased over time as data and processes are improved.

The Status result column for each rule is based on these thresholds, so Red below the fail threshold, Green at or above the pass threshold, and Amber in between.

Each group has a name and description, pass and fail threshold, plus an optional column to weight the results. As well as counting the number of Passed rows and Failed rows, the results will also include Passed weight and Failed weight columns, which contain a sum of the chosen column for all the passing/failing rows. For example, weighting each rule's results by total sales allows you to prioritize data quality issues for the highest spending customers with the most overall impact.

Ignoring values in validation rules

Rules can be set to ignore or skip values that do not need to be validated, and so should not impact the rule pass rate. For example, where 10 emails are being checked; 5 are valid, 2 invalid and 3 are blank, the pass rate would be 5/7 (rather than 5/10 if all values were considered).

The Ignore Null values checkbox can be selected when first creating a rule using an existing Function. However, values other than null can also be ignored using the Ignore literal value when designing a custom Function.

Validation rule script editor

You can use a scripting language to create new and manage existing validation rules, especially if you're making several changes at once. Click Edit script to open the script editor and make the required changes, such as renaming, re-ordering or changing rule groups.
To open a read-only version, click View script.

You can create Functions using the same functionality in the Function script editor.

A rule is made up of a rule group, group id, rule name, rule id, rule summary and Function in that order. For example:

## Rule_Group_1 {{c1e5ff0d}}
# Rule_1 {{d1f5ff0c}}
-- Summary of rule 1
IsEven(3)

Using the script editor:

  • ## denotes a rule group.
  • # denotes a rule name.
  • -- denotes rule or rule group summary text.
  • Rule id's are non-editable and should not be copied when creating a new rule.
  • Use Ctrl+Space to show available Functions or to suggest a Function according to what you have already typed.
  • Hover over a Function name with the cursor to show the arguments required for that Function.
  • Use Ctrl+Enter to validate and reformat the Function script.

Viewing results

Validation results are available in several formats:

  • Show passing rows contains the rows that pass all of the defined rules.
  • Show failing rows contains the rows that fail at least one validation rule.
  • Show all rows is a combination of the two above, comprising the entire input data alongside the rules and a Pass/Fail result for each.
  • Show results by rule summarizes/aggregates the results for each rule with a count of passes and fails.
    • Show results by rule for analysis is the same data, un-pivoted into rows.
  • Show results by group is similar to the above, but useful for a higher level view of aggregated results when there are many rules.
    • Show results by group for analysis is the same data, un-pivoted into rows.

Similar to the Source and Profile step, the Validate step has a Source metadata dropdown that enables you to include the lineage metadata of input data. This information (such as file name or batch ID) is useful to include as additional columns in the aggregated results.

Use this step to validate and enrich addresses in bulk using Experian Batch, depending on your license.

If your data has address columns tagged already, this step will automatically pick up all the columns tagged as addresses and list as Selected columns.

If you are using AddressBase Premium data, you can enable key searching by specifying which columns contain UPRNs and UDPRNs.

Select the Reference data that you want to validate addresses against and then select an Address layout to specify how the validated addresses will be returned.

Using Additional data and options you can enrich valid addresses by selecting one or more Additional data elements.

Results columns - This defines what information is returned about how the address was cleansed. You can ether return a simple summary of the cleaning action (Good Match, Unmatched, and so on), or a much more detailed breakdown of the match code.

  • Standard (returns the result code).
  • Detailed (returns the result code and additional metadata including the match success and the confidence of the match)

Find out how to configure Experian Batch for the Validate addresses step.

Result codes

An address cleansed in Data Studio will result in one of the following possible results:

Validation result Description
Verified Correct Experian Batch verified the input address as a good-quality match to a complete address. No corrections or formatting changes were necessary.
Good Match Experian Batch verified the input address as a good-quality match to a complete address, although minor corrections or formatting changes may have been applied.
Good Premise Partial Experian Batch was not able to find a full match to a correct address, but found a good match to premise level by excluding organization or sub-premise details.
Tentative Match Experian Batch found a match to a complete address, but the overall differences between the input and cleaned addresses are significant enough to reduce the confidence in the match.
Multiple Matches Experian Batch found more than one correct address which matched the input address. This means that no single address could be matched with high confidence.
Poor Match Experian Batch found a match to an address, but with low confidence. This often means that the cleaned address is not deliverable.
Partial Match Experian Batch was unable to find a full correct address which matched the input address. This often occurs when the property number is missing from the input address.
Foreign Address Experian Batch could not find a matching address because the input address referred to a different country.
Unmatched Experian Batch was unable to match the input address to any correct address.
Processing Failure The address input has not returned any results and may be of a bad format. Please report this to whoever manages Aperture Data Studio for your organization.
{end-mode}

Validate emails based on the format or domain address.

Select the Email column and pick one of the two available Validation type options:

  • Format Check: Checks whether the value matched a valid email format. Returns either true or false.
    Examples of valid and invalid email formats:

    Format Result
    info@gmail.com Valid
    first.second-name@gmail.com Valid
    first.name+tag@gmail.com Valid
    name@info@gmail.com Invalid
    name"not"right@test.com Invalid
    another.test.com Invalid
    name@incorrect_domain.com Invalid
    com.domain@name Invalid
    first_address@gmail.com, second_address@gmail.com Invalid

    Only one email can be validated at once; lists of emails as seen in the last example will be rejected.

  • Domain Level: Checks whether the value has a domain that exists and is an email server. This option returns both an overall validity result (true or false) in the Email domain: Result column, and additional information in the Email domain: Error column describing the reason for failure. The possible outcomes are:

    Error Result Description
    Valid True Domain exists and is a mail server.
    Bad format False Email format is invalid.
    Invalid domain False Domain validation check failed. The domain may not exist, or may have been flagged as illegitimate, disposable, harmful, nondeterministic or unverifiable.
    Invalid name False Local part validation failed. For example it may have been identified as a spam trap or role account such as "admin@server.com".
    Failed to perform DNS lookup False An error occurred when attempting to perform the DNS lookup.

Domain level validation results are cached with results refreshed every 30 days. The cache validity is configurable in Settings > Workflow steps by changing the Email validation cache validity setting.

Click Show step results to view the results.

Validate global phone numbers using the Validate phone numbers step in Workflows.

There are two types of phone validation available:

  • Format validation – a basic library format check of the phone number.
  • Live validation – provides information about the live status of a phone number and further metadata.

Live validation requires an Experian Phone Validation license. If you have a license already you can find your token in the Self Service Portal. Contact your account manager if you are interested in live phone validation.

Validated phone numbers incur cost in form of credits that can also be monitored in the Self-service portal.
Global coverage is over 240 countries and territories.

Format validation

The phone number format is validated against a library.
Connect the step to the source step and specify the following:

  • Select phone column - specify the column containing the phone numbers you want to validate
  • Select country - pick the country to use for phone validation. You can either select a single country or pick a column with country data. For the latter, please ensure that the country names adhere to ISO 3166-1 alpha-2, ISO 3166-1 alpha-3 or ISO3166 country name standards.
  • Select validation type - choose the validation rule(s) that will be applied:
    • Valid phone: shows True for phone numbers that have been successfully validated against the selected country and False otherwise.
    • Valid phone region: shows True for phone numbers that have been successfully validated against the region of the selected country and False otherwise.
    • Possible phone : shows True if the phone number could be a valid phone number for the selected country based on its format but with a lower confidence than the Valid phone selection. It shows False otherwise.
    • Invalid phone: shows True for phone numbers that have been unsuccessfully validated against the selected country and False otherwise.
    • Invalid phone region: shows True for phone numbers that have been unsuccessfully validated against the region of the selected country and False otherwise.
    • Not possible phone : shows True if the phone number could not be a valid phone number for the selected country based on its format but with a lower confidence than the Valid phone selection. It shows False otherwise.

Click Show step results to view the results. The following columns will be appended to your data:

  • Validation results – shows the result of the validation rule (one column per each applied rule).
  • Phone Country – shows the country for the validated phone number.
  • Phone Number Type – shows the phone type (e.g. mobile or fixed line).

Live validation

The live status of a phone number is checked via a live ping using the Experian Phone Validation product. To ensure access to the service, Experian IP addresses need to be whitelisted.

In order to use our services, you need to ensure that certain IP addresses and address ranges (listed below) are accessible to you (i.e. not blocked by your firewalls).

  • 45.60.31.210
  • 45.60.33.210
  • 45.60.35.210
  • 45.60.37.210
  • 45.60.39.210
  • 45.60.103.210

To enable live validation go to:

  1. Step settings > Validate phone numbers > Create new step settings > Add a name.
  2. Enter your phone validation authentication token.
  3. A summary and description are optional.

Go back to the Validate phone numbers step in the Workflow, open the Live validation options and select the Validate phone numbers v2 step setting from the drop down menu. Live validation options are now available:

When validating data from multiple countries at once choose no selection from the country drop down and ensure phone numbers submitted are in E164 format.

Clicking Show step results will append the result columns to your data.

FAQs

Format validation is suitable for ensuring the phone numbers in your data are in the correct format for a country. It includes information on whether a phone number is a mobile phone or landline.

If you aim to contact your customers by phone or text message you will need to know if the phone number is active and can be reached. It could also be helpful to know if a phone number is disposable. Ensuring phone numbers are accurate and contactable can help with cost efficiencies.

There is currently no limit of numbers that can be submitted. They will be validated in batches of 10,000.

Time of day and number of requests submitted at the same time can influence the speed of returned results, e.g. during peak business hours. For 10,000 phone numbers, the estimated time is a few minutes.

You will have purchased an amount of credits with your phone validation license. The exact cost can differ depending on the returned results. See our phone validation documentation for more details.

To save credits, test your Workflow with a very small number of records or use sample data.

It is possible to build custom steps using the Data Studio SDK. The step can then connect to an external service to validate your data.

Any other Experian Phone Validation questions can be found on the Phone Validation FAQ page.