Data tags let you enrich source data with additional information about the type of data each column contains. Tagged columns improve the user experience by providing sensible defaults and suggestions in Workflow steps such as Find duplicates and Validate addresses. Data Studio can also use machine learning to tag columns automatically.
Tags are saved as part of the Dataset, so any new batch of data added to the Dataset will retain the same tags.
Data tags appear next to the column name in data grids, so if you're not familiar with your data they provide an overview at a glance.
There are two types of data tags: system-defined and user-defined.
Tags in Data Studio are hierarchical: a parent tag (e.g. Address) can have multiple children (e.g. Postal Code, City).
While you can't modify system-defined tags, you can create a child tag (e.g. PO Box) and assign it to a system tag (e.g. Address). Once created, the child tag appears in the User defined list under its system parent tag.
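The hierarchy rules above (parents with children, read-only system tags, user-defined children under a system parent) can be modeled as a small tree. The following is a hypothetical Python sketch for illustration only; the `Tag` class and its methods are invented here and are not Data Studio's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tag:
    """Hypothetical model of a data tag; not Data Studio's actual API."""
    name: str
    system_defined: bool
    parent: Optional["Tag"] = None
    children: list["Tag"] = field(default_factory=list)

    def add_child(self, name: str, system_defined: bool = False) -> "Tag":
        # New children default to user-defined; system_defined=True is only
        # used below to build the built-in hierarchy for the example.
        child = Tag(name=name, system_defined=system_defined, parent=self)
        self.children.append(child)
        return child

    def rename(self, new_name: str) -> None:
        # System-defined tags are read-only in this model.
        if self.system_defined:
            raise PermissionError(f"cannot modify system tag {self.name!r}")
        self.name = new_name

    def user_defined_children(self) -> list[str]:
        # Names that would appear in the "User defined" list under this tag.
        return [c.name for c in self.children if not c.system_defined]

    def path(self) -> str:
        # Full hierarchy path, e.g. "Address > PO Box".
        return self.name if self.parent is None else f"{self.parent.path()} > {self.name}"


# Built-in (system) hierarchy for the example.
address = Tag("Address", system_defined=True)
address.add_child("Postal Code", system_defined=True)
address.add_child("City", system_defined=True)

# A user creates a child tag under the system parent.
po_box = address.add_child("PO Box")

print(address.user_defined_children())  # ['PO Box']
print(po_box.path())                     # Address > PO Box

try:
    address.rename("Location")
except PermissionError as err:
    print(err)  # cannot modify system tag 'Address'
```

In this model the user-defined child can still be renamed, while any attempt to change the system parent is rejected.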
To create a new tag, follow these steps:
The newly created tag will appear in the list.
To manage existing tags:
A data tag can have multiple training datasets. Each training dataset is used to train a fingerprint, which allows Data Studio to learn to recognize data with similar properties and tag it automatically.
To create or view the Training datasets for a data tag:
When creating a new training dataset, you will need to define:
Threshold suggestion
Data Studio trains a fingerprint using the specified source and column. Once trained, the fingerprint can be used to auto tag new Datasets: Data Studio analyzes each input column and tags it if a fingerprint match is found.
Training datasets can be disabled, which excludes their training data from the auto tagging process.
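This section doesn't describe how Data Studio's fingerprints work internally, so the sketch below is only a rough illustration of the train, match, and disable flow, not Data Studio's algorithm. As a stand-in technique it uses a character-shape frequency profile (letters become `A`, digits become `9`) compared with cosine similarity. The `TrainingDataset` class, the `auto_tag` function, and the default threshold of 0.8 are all hypothetical assumptions.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def shape(value: str) -> str:
    # Map letters to "A" and digits to "9", keeping other characters,
    # e.g. "SW1A 1AA" -> "AA9A 9AA".
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))


def fingerprint(values: list[str]) -> Counter:
    # Stand-in "fingerprint": how often each value shape occurs in a column.
    return Counter(shape(v.strip()) for v in values if v.strip())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two shape-frequency profiles (0.0 to 1.0).
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class TrainingDataset:
    """Hypothetical training dataset: a tag plus values from one source column."""
    tag: str
    values: list[str]
    enabled: bool = True


def auto_tag(column: list[str], training: list[TrainingDataset], threshold: float = 0.8) -> set[str]:
    # Tag the column with every tag whose enabled training data matches it.
    column_fp = fingerprint(column)
    tags = set()
    for ds in training:
        if not ds.enabled:
            continue  # disabled training datasets are excluded from auto tagging
        if cosine(column_fp, fingerprint(ds.values)) >= threshold:
            tags.add(ds.tag)
    return tags


training = [
    TrainingDataset("Postal Code", ["SW1A 1AA", "EC1A 1BB", "W1A 0AX"]),
    TrainingDataset("Phone", ["020 7946 0000", "020 7946 0001"], enabled=False),
]

print(auto_tag(["SE1A 2BC", "N1A 3DE"], training))  # {'Postal Code'}
print(auto_tag(["020 7946 0002"], training))        # set(), Phone training is disabled
```

Because the "Phone" training dataset is disabled, a column of phone numbers receives no tag until that dataset is re-enabled.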