Data Quality user documentation

Data Studio offers a number of ways to transform your data in various ways. You can either use:

a Function
the Transform step

Manipulate and configure your data.

This step allows you to add/remove various Functions and arrange or exclude columns.

Clicking the Columns row on any Transform Workflow step will open the panel where you can select the columns you want to work with.

Moving columns

Moving columns allows you to change the order in which they appear in the data. This can be done by selecting the columns and pressing the up/down arrow button next to the columns list.

Columns can also be moved by dragging them up/down the list.

Renaming columns

Rename a column by selecting the column and clicking Rename on the toolbar. You will then be presented with a modal box prompting you to enter the name and an optional summary.

If multiple columns are renamed at the same time, the modal box will provide you with an opportunity to apply a common prefix or suffix to the originating column name. Alternatively, you will be able to rename all columns individually.

Excluding columns

Select the columns you want to exclude and click Exclude in the toolbar. This can also be done by hovering over the column name and clicking the inline icon.

To make columns visible again, select the columns and click the Include button in the toolbar.

This only applies for Workflows here. To exclude columns in Views, you have to use the Columns action instead of the Transform one.

Transforming and creating columns

A column can be transformed by applying one or many Functions to it. These Functions range from a simple action, such as converting a column to uppercase, to more complex multiple column operations.
Whatever it may be, transforming a column can be done to create a new column or to change the data in an existing one.

Suggestions

Before applying any custom transformations, you can use the Suggest option to analyze the data being explored and return some recommended transformations based on its content.

Any of these suggestions can be applied to the data by checking their selection boxes and clicking Apply.

The transformation suggestions can also be created as new columns rather than replacing the original data, by checking New column? in the table and specifying a column name.

Applying a Function

Select the columns
Decide to transform the data in the existing columns, or create new ones for the results
- To change existing columns click Existing column on the toolbar
- To create new columns click New column on the toolbar
Choose an existing Function from the list (these are grouped into expandable categories)
After choosing your Function you will be presented with options to configure it
Click Apply transformation to apply your Function to the columns

Creating an advanced Function

Select the columns
Click Existing column or New column (see Applying a Function above for more info)
Click Create your own function
Create a Function with as many Function steps as you like
Click Apply to apply your Function to the columns

If the column already has a transformation Function applied you will first be prompted to choose whether to 'Append to existing functions' or 'Replace existing functions'.

The data grid at the bottom will show a preview of the results of your Function, but you may need to click Execute.

When transforming or creating multiple columns at once, the Function is applied to each one individually. Selecting the 'Selected columns: (x)' placeholder within a function will insert the selected column into that place in the Function.

Note: to create a single Function that references multiple columns you will need to select each one individually, not use the 'Selected columns: (x)' placeholder.

Duplicating columns

Duplicating columns can be done by selecting the columns and clicking Duplicate on the toolbar. The duplicated column will be placed after the original column and will be renamed with the prefix 'Copy of '.

Duplicating a column does not copy any Functions applied to it, instead it will create a new column using the 'Copy' Function to just copy the data values.

Editing a transformation

Select the column that is already transformed
Click Edit on the toolbar
Either edit the options in the configuration form or start over with a completely new Function

If the column's Function has multiple Function steps then you will be taken to the advanced Function builder to edit your Function.

Removing a transformation

To remove the Functions applied to columns, select them and click Delete on the toolbar.

You can only completely delete a created column rather than just remove the transformation, because created columns can't exist without a Function.

Tip: all of these transformation features can also be done by clicking the relevant inline icon when hovering over a column.

Join two inputs into one based on one or more columns from each input.

To specify the columns you want to join, click Left side columns or Right side columns.

Use the Retention type menus to specify different combinations of join. The top menu is for the first join input and the bottom one for the second join input.

All matches: Performs a standard join.
Singletons only: Return the first match of a one-to-many match and any matches that occur only once. This is a normal relational join with duplicates removed.
Duplicates only: Return all the matches that are one-to-many only . This is a normal relational join excluding matches that match only once.

Click on the Venn diagram to specify the type of join you want to apply:

Inner (center segment of the Venn): Returns all rows where the join condition is met (keys match).
Left outer (left and center segments): Returns all rows from the left-hand input and only the rows from the right-hand input where the join condition is met (keys match).
Left outer without intersection (left segment): A variant on the basic left outer join, which returns only rows from the left-hand input that don't meet the join condition. No rows from the right-hand input are returned.
Right outer (center and right segments): Returns all rows from the right-hand input and only the rows from the left-hand input where the join condition is met (keys match)
Right outer without intersection (right segment only): A variant on the basic right outer join, which returns only rows from the right-hand input that don't meet the join condition. No rows from the left-hand input are returned.
Full outer (all segments of the Venn): Returns all rows in both the left and right inputs. If a row from one side of the join has no match in the input on the other side, the columns from the other input will contain null values.
Full outer without intersection (left and right segments only): Returns all rows in both the left and right inputs, excluding rows where the join condition is met.
Cross join or Cartesian product (no segments of the Venn): Returns all rows from the left input. Each row from the left input is combined with all rows from the right input.

Above the Venn diagram select More options to expose more detailed options for the join.

The Join step provides three matching options which determine how keys will be compared:

Exact match: rows will be joined if key values on the left-hand side and right-hand side are exactly the same, including data type and casing.
Ignore datatype: rows will be joined if keys can be standardized to the same value. A key with the alphanumeric value '1' will be joined with a key that is the integer value 1.
Case-insensitive: rows will be joined if keys match with casing ignored. A key with value 'ABC' will be joined to a key with value 'abc'.

The Do not match nulls option configures how null values are handled in the join. When enabled, left or right column values containing nulls will never match with any other row. This means that rows where the join key contains nulls will never appear in inner join results and will always appear in left/right unmatched results if the selected join type will include left/right unmatched rows.

In the join dialog, click Show step results to view the results of the join. Specify which values from the joined columns to use in any following steps: all the matches, only duplicates or only singletons.

Click Lightbulb icon. to see suggested join keys. Joins will be suggested only where column names match exact on the two side of the join. Click on the suggested join to apply it.

Filter your data by selecting a column and entering a comparison value to match against using an operation like Equals, Contains or In Range. To filter on multiple columns and/or multiple comparison values create an advanced Function that returns a boolean (true or false) value.

Quick filter

In Explore mode, right-click on a value in a column and select one of the options Keep or Remove to show or exclude all rows containing the chosen value.

Filter or Split?

The Split Workflow step is the same as Filter but has the benefit of two outputs to see the rows that pass the filter and the rows that fail the filter.

Grouping will reduce the number of rows in a table.

The default output of a Group step is the Count aggregate, which returns the total count of rows. This count will then be broken down by each unique value in the group column.

Moving columns from ungrouped to grouped will show the distinct/deduplicated values in the column (along with the optional Count of the number of times each combination of values appears in the data).

Aggregates

The Add aggregate button is used to create a new column(s) containing a value or metric for each group.

Give the new aggregate column a name or optional suffix value if applying an aggregate on Multiple columns, then select the type of aggregate to use.

Aggregate	Description
Count	Returns a count of the number of items in each group.
Count excluding nulls	Returns a count of the number of items in each group ignoring null values.
First value	Returns the first value in the column for each group.
First populated value	Returns the first value in the column for each group that is not null.
Last value	Returns the last value (based on input row order) in the column for each group.
Minimum value	Returns the minimum value in the column for each group.
Maximum value	Returns the maximum value in the column for each group.
Grouped values	Returns a comma separated list of all values in each group. The order is as they appear in the input.
Sorted grouped values	Returns a comma separated list of all values in each group sorted A to Z.
Reverse grouped values	Returns a comma separated list of all values in each group sorted Z to A.
Average	Returns the average (mean) of the values in each group.
Median	Returns the median (based on a sort of the values) of the values in each group.
Sum	Returns the sum of the values in each group.
Standard deviation	Returns the standard deviation of the values in each group.
Standard deviation (population)	Returns the standard deviation of the values in each group where it is known that the values represent the whole population.
Variance	Returns the statistical variance of the values in each group.
Variance (population)	Returns the statistical variance of the values in each group where it is known that the values represent the whole population.
Group ID	Returns a numeric value that can be used to identify each group.

Non-numeric values will be excluded (ignored) from numeric aggregation calculations. Zero will be returned by default if the values are all non-numeric.

Interactivity

In Explore mode you can drill into a particular group to see the rows that belong to it by right-clicking and selecting View rows in group.

This ability to drill down into grouped values will be retained if the step is connected to any of the following subsequent steps: Branch, Chart, Export, Filter, Fire event, Output, Sample, Sort, Split, Take Snapshot, Transform and Union.

A Take Snapshot step can be set to retain this interactivity except for the Union step or if the original Source is Interactive - you can't create an Interactive Dataset from an Interactive Dataset.

Sort your data in ascending/descending order.

Move columns from 'unsorted columns' to 'sorted columns', set the sort order (min to max or max to min) for each column using Flip sort direction or the up/down arrow on the right of each 'sorted column'.

Select a 'sorted column' and use the 'move up'/'move down' buttons to change the priority order if sorting on mutliple columns.

Was this helpful?

Previous: Discover and profile data

Next: Validate and enrich data