To further improve data quality, Data Studio can capture lineage metadata. This ensures traceability for a variety of purposes including data governance, auditing and root cause analysis.
Lineage metadata is information about the origin of the batches of data in a Dataset, their transformations and their characteristics. It can be included in Workflow outputs and then used for further integration or processing, so that the metadata can be treated as data for in-depth analysis and record management.
Each Dataset has metadata associated with it which may be automatically captured or defined by the user. The following elements will be captured automatically:
Batch ID
Column name (except for the Source step)
Company
Credential
Database
Dataset
File Name
Host Name
Interface Name
Keyspace
Port
Project
Qualified table name
REST config file
REST sample endpoint
Schema
Server
SID
Source type
Sub filenames
System
Table name
Timestamp
For more details, see default metadata by source type.
User connection properties under advanced settings of a connection to an External system will not be captured as metadata.
Lineage metadata can be included in the output of these Workflow steps:
To provide more granular control, metadata elements can be individually selected from a checkbox list, so that only relevant metadata is displayed. Each metadata property is output as an individual additional column, styled differently to ensure it's not confused with data.
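The effect of selecting metadata properties can be pictured with a small sketch. This is not Data Studio code: the function name, the row layout and the sample values are all assumptions made for illustration. It only shows the idea that each selected property becomes one extra column on every output row, while unselected properties are left out.

```python
from typing import Dict, Iterable, List


def append_lineage_columns(
    rows: List[Dict[str, str]],
    lineage: Dict[str, str],
    selected: Iterable[str],
) -> List[Dict[str, str]]:
    """Return copies of the rows with one extra column per selected property.

    Hypothetical helper: properties that are not selected, or that were not
    captured for this source, are simply not added.
    """
    chosen = {name: lineage[name] for name in selected if name in lineage}
    return [{**row, **chosen} for row in rows]


# Illustrative data only; the values are made up.
rows = [{"id": "1", "city": "Leeds"}, {"id": "2", "city": "York"}]
lineage = {"Batch ID": "17", "Source type": "CSV", "File Name": "customers.csv"}

result = append_lineage_columns(rows, lineage, ["Batch ID", "File Name"])
```

Here `result` keeps the original `id` and `city` columns and gains `Batch ID` and `File Name`; `Source type` is omitted because it was not selected.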
Only Datasets loaded after Data Studio version 2.1.13 will include full lineage metadata.
Data Studio supports JDBC connections. In addition to the metadata that is automatically captured about the source system when data is loaded, you can define custom metadata.
This can be configured in the customjdbc.json file or in datadirectJdbc.json for native drivers.
Custom metadata consists of string key/value pairs. When adding custom metadata, create a new connectionParam prefixed with CUSTOM_. This prefix is required for the parameter to appear as custom metadata in Data Studio.
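As a rough illustration of the prefix rule, the sketch below separates custom metadata from ordinary connection parameters. It assumes the parameters are available as a plain name/value mapping; the function name and the sample parameter names are hypothetical, and the actual layout of customjdbc.json is not shown here. Whether Data Studio displays the key with or without the prefix is not stated, so the sketch leaves the keys unchanged.

```python
from typing import Dict

CUSTOM_PREFIX = "CUSTOM_"  # prefix named in the text above


def custom_metadata(connection_params: Dict[str, str]) -> Dict[str, str]:
    """Keep only the parameters whose name starts with CUSTOM_.

    Hypothetical helper: keys are returned as written, prefix included.
    """
    return {
        name: value
        for name, value in connection_params.items()
        if name.startswith(CUSTOM_PREFIX)
    }


# Illustrative parameters only; the names and values are made up.
params = {
    "host": "db.example.com",
    "CUSTOM_Environment": "Production",
    "CUSTOM_Owner": "Finance",
}

found = custom_metadata(params)
```

In this example `found` holds the two `CUSTOM_` entries and ignores `host`.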
You can define custom metadata at two different levels, when defining:
an External system (e.g. to label a system as "Development" or "Production")
a Dataset
If the same key is defined at both levels, the Dataset-level definition takes precedence. Custom keys that are identical to any of the automatically captured metadata properties are not supported.
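The precedence rule can be sketched as a simple merge. The function name, the way reserved names are passed in and the decision to raise an error on a clash are all assumptions for illustration; the text above only says that such keys are unsupported, not how Data Studio reacts to them.

```python
from typing import Dict, Iterable


def resolve_custom_metadata(
    system_level: Dict[str, str],
    dataset_level: Dict[str, str],
    reserved: Iterable[str] = (),
) -> Dict[str, str]:
    """Merge the two levels, letting Dataset-level values win.

    Hypothetical helper: a key that matches a reserved (automatically
    captured) name raises ValueError, which is this sketch's own choice.
    """
    clashes = sorted((set(system_level) | set(dataset_level)) & set(reserved))
    if clashes:
        raise ValueError(f"unsupported custom keys: {', '.join(clashes)}")
    # Later mappings override earlier ones, so the Dataset level is applied last.
    return {**system_level, **dataset_level}


# Illustrative values only.
system_level = {"CUSTOM_Environment": "Development", "CUSTOM_Owner": "Finance"}
dataset_level = {"CUSTOM_Environment": "Production"}

resolved = resolve_custom_metadata(system_level, dataset_level)
```

Here `resolved["CUSTOM_Environment"]` is `"Production"` because the Dataset-level value overrides the External system value, while `CUSTOM_Owner` is inherited unchanged.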
Single file
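| Metadata | Single file (e.g. CSV) |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential |  |
| Database |  |
| Dataset | Y |
| File Name | Y |
| Host Name |  |
| Interface Name |  |
| Keyspace |  |
| Port |  |
| Project |  |
| Qualified table name |  |
| REST configuration file |  |
| REST endpoint |  |
| Schema |  |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System |  |
| Table name |  |
| Timestamp | Y |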
Multiple tab file
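| Metadata | Multiple tab file (e.g. Excel) |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential |  |
| Database |  |
| Dataset | Y |
| File Name | Y |
| Host Name |  |
| Interface Name |  |
| Keyspace |  |
| Port |  |
| Project |  |
| Qualified table name |  |
| REST configuration file |  |
| REST endpoint |  |
| Schema |  |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames | Y |
| System |  |
| Table name |  |
| Timestamp | Y |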
Oracle
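| Metadata | Oracle |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database |  |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port | Y |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID | Y |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |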
SQL Server
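| Metadata | SQL Server |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database | Y |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port |  |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |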
MySQL
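| Metadata | MySQL |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database | Y |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port | Y |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema |  |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |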
DB2
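| Metadata | DB2 |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database | Y |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port | Y |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |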
Informix
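| Metadata | Informix |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database | Y |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port | Y |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server | Y |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |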
PostgreSQL
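| Metadata | PostgreSQL |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database | Y |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port | Y |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |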
OpenEdge
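| Metadata | OpenEdge |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database | Y |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port | Y |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |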
Sybase
| Metadata | Sybase |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database | Y |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port | Y |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |
Hive
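| Metadata | Hive |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database |  |
| Dataset | Y |
| File Name |  |
| Host Name |  |
| Interface Name |  |
| Keyspace |  |
| Port | Y |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |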
Greenplum
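| Metadata | Greenplum |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database | Y |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port | Y |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |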
Salesforce
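| Metadata | Salesforce |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database |  |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port |  |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |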
MongoDb
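| Metadata | MongoDb |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database | Y |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port | Y |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |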
Redshift
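| Metadata | Redshift |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database | Y |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port | Y |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |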
Cassandra
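| Metadata | Cassandra |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database |  |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace | Y |
| Port | Y |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema |  |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |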
SparkSQL
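| Metadata | SparkSQL |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database | Y |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port | Y |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |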
ServiceCloud
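| Metadata | ServiceCloud |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database |  |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name | Y |
| Keyspace |  |
| Port |  |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |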
Google BigQuery
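| Metadata | Google BigQuery |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database |  |
| Dataset | Y |
| File Name |  |
| Host Name |  |
| Interface Name |  |
| Keyspace |  |
| Port |  |
| Project | Y |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema |  |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |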
AutonomousRestConnector
| Metadata | AutonomousRestConnector |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database |  |
| Dataset | Y |
| File Name |  |
| Host Name |  |
| Interface Name |  |
| Keyspace |  |
| Port |  |
| Project |  |
| Qualified table name | Y |
| REST configuration file | Y |
| REST endpoint | Y |
| Schema |  |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |
Eloqua
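| Metadata | Eloqua |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company | Y |
| Credential | Y |
| Database |  |
| Dataset | Y |
| File Name |  |
| Host Name |  |
| Interface Name |  |
| Keyspace |  |
| Port |  |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |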
Jira
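| Metadata | Jira |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company | Y |
| Credential | Y |
| Database |  |
| Dataset | Y |
| File Name |  |
| Host Name |  |
| Interface Name |  |
| Keyspace |  |
| Port |  |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |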
SalesCloud
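| Metadata | SalesCloud |
| --- | --- |
| Batch ID | Y |
| Column name (not Source step) | Y |
| Company |  |
| Credential | Y |
| Database |  |
| Dataset | Y |
| File Name |  |
| Host Name | Y |
| Interface Name |  |
| Keyspace |  |
| Port |  |
| Project |  |
| Qualified table name | Y |
| REST configuration file |  |
| REST endpoint |  |
| Schema | Y |
| Server |  |
| SID |  |
| Source type | Y |
| Sub filenames |  |
| System | Y |
| Table name | Y |
| Timestamp | Y |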