When combining values from multiple records (i.e. either the Select and complete best record or Merge best values processing method was chosen), we strongly recommend that your harmonization rules are applied to column groups instead of individual columns.
For example, you wouldn't want to use Address Line 1 column from one record and Address Line 2 from another as they could represent two entirely separate addresses.
Instead, you'd group all the address columns together and apply business rules to the group in order to find the best overall address.
Similarly, you may want to group columns such as {Latitude and Longitude}, and {DOBYear, DOBMonth, DOBDay}.
To create a column group, click Additional options in the Harmonize duplicates step then click New group.
There are two ways to specify which columns will be added to the group:
The column group allows you to choose any of the value priority types to apply to all columns in the group together.
You will often want to apply different priority types to different column groups (for example you might favor using mailing addresses from your e-commerce delivery system, but email addresses from a social media system).
All columns not targeted by a column group will be processed by the default processing behavior as defined in the Harmonize duplicates step.
Harmonization models can be applied to each column group and are selected from the record priority setting.
Each column can only be processed once but could be targeted by multiple column groups (e.g. if it has multiple data tags applied). In this case, the priority is given to the first column group in the list, and that column will be ignored by any subsequent column groups.
To make this more obvious, each column group has a list of Affected columns containing the names of all columns targeted by this group. If any columns in this list are also targeted by any column groups higher up the list, then they are shown to be crossed-out because they will be ignored in this instance.
If you have more than one column group, you can move the groups up/down in the list to change the order they will be processed in.
In some cases, it's possible to exclude low quality/incorrect values from the harmonization results. If you specify a Filter column for a column group, then only the records that contain the Filter value in this column will be considered.
For example, you may have an email column in your data but want to make sure that no invalid emails make it into your 'surviving' record.
To achieve this
The resulting harmonization would ignore any values from the 'Email' column where the 'Is Valid Email' column does not contain true, so all emails in the output are valid.
This is used in situations where you would rather have no email address than an invalid one.
Filter columns are applied before the prioritization process takes place (so, for example, the 'most common' priority rule would only count occurrences of valid emails, and ignore all invalid emails).
Another situation where Filter Columns are useful is where a source system outputs an 'inactive' or 'do not use' flag against legacy or outdated records, where you may want to avoid using such data in some or all fields.
If you want to apply more complex criteria when deciding which values should be ignored by harmonization, it's often easiest to add a Transform step earlier in the workflow, to output a new column indicating whether or not each value should be considered.