In addition to file uploads, Aperture Data Studio offers the option to connect directly to External systems.
Data can be loaded from an External system into a Dataset, or an External system can be defined as a target for the Export step.
The supported systems are:
When working with External systems, access to data is managed through valid credentials, which can take various forms such as username/password combinations, API tokens, or keys.
The choice of credential can affect which data is accessible, so different users may obtain different results from the same source.
When interacting with an External system (to either fetch or push data), credentials are pinned to a particular usage of the External system, so that the same set of credentials is used consistently.
When adding a Dataset from an External system, you will need to select a credential to be associated with the Dataset. The credentials available for selection are only those that your user has permission to use.
Once set, this credential will be used whenever the Dataset is refreshed by any user, even for those with no direct permission to access the credentials.
Find out how to create a Dataset from an External system.
Space-level restrictions will still be enforced, so you will need to ensure that the Space to which the Dataset is added is suitably secured. Another user with access to the Space can edit the Dataset to change the credential used, for example to one that they have been given permission to use.
To export data to external systems using the Export step in the workflow designer, the user will need to specify the credential to be used. The credential will need to have the necessary permissions to write to the external database.
Another user with access to the Space can modify the credentials used in the workflow's Export step. Once the credential has been set, this credential will be used whenever the workflow is executed (including scheduled execution), even by those with no direct permissions to access the credentials.
To create an external system with credentials:
There are three credential permission types in Data Studio, as described in the table below.
All permission types allow a user to use the credential to extract (fetch) data from, or to export data back to an associated external system.
Permission type | Description |
---|---|
Use | The user cannot edit/delete the credential and cannot execute SQL queries when importing a Dataset. |
Use with query | The user cannot edit/delete the credential but can execute SQL queries when importing a Dataset. |
Manage | The user can edit/delete the credential and can execute SQL queries when importing a Dataset. |
External system metadata (connection details and credential names) can be exported from and imported into a Space.
After importing an External system's metadata into another environment, you will need to edit the credential to enter the correct password and assign the required user access permissions.
Data Studio provides support for connecting to External systems, which can be used either as data sources or as the target of an export operation. One of these external system types is a DBMS accessible via JDBC.
There are two types of DBMS JDBC driver recognized by Data Studio: the bundled Progress DataDirect drivers, and native (custom) drivers that you deploy yourself.
The two JDBC configuration files, datadirectJdbc.json and customJdbc.json, can be copied into the same directory as server.properties (i.e. the installation root directory) and modified to customize a driver's connection parameters. These configuration files provide the Data Studio UI with information about the JDBC drivers and the format (and properties) required to build a valid connection string for each DBMS. For drivers that have not been configured in either of these files, the user must enter a custom connection string when creating an External system that uses the driver.
Changes made to customJdbc.json are retained when Data Studio is upgraded.
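The exact schema of these files is defined by your Data Studio version, so the snippet below is purely illustrative (every property name in it is hypothetical); it sketches the kind of information an entry typically carries: a display name for the DBMS, a pattern matching the driver file, and a template for the connection string.

```json
{
  "_note": "Illustrative sketch only - property names are hypothetical, not the real customJdbc.json schema",
  "drivers": [
    {
      "name": "PostgreSQL",
      "driverFilePattern": "postgresql.*",
      "connectionStringTemplate": "jdbc:postgresql://<hostname>:5432/<database>"
    }
  ]
}
```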
Data Studio includes a wide range of JDBC drivers in its deployment, allowing you to connect to many external systems out of the box.
You can also download and deploy your own JDBC (Java Database Connectivity) driver, which must be a type 4 driver compatible with Java 8. Once a custom driver is deployed, you can create and manage External system connections just as you do with the bundled drivers. To deploy a custom driver:
The following Progress DataDirect drivers are bundled with Data Studio and can be used by any customer with the appropriate licensing add-on:
DBMS name | Driver file name |
---|---|
Amazon Redshift | DSredshift.jar |
Apache Hive | DShive.jar |
Autonomous REST Connector (REST API data sources) | DSautorest.jar |
Cassandra | DScassandra.jar |
DB2 | DSdb2.jar |
Google BigQuery | DSgooglebigquery.jar |
Greenplum | DSgreenplum.jar |
Informix | DSinformix.jar |
MongoDB | DSmongodb.jar |
Microsoft Dynamics 365 | DSdynamics365.jar |
MySQL | DSmysql.jar |
Oracle | DSoracle.jar |
Oracle Eloqua | DSeloqua.jar |
Oracle Sales Cloud | DSoraclesalescloud.jar |
Oracle Service Cloud | DSrightnow.jar |
PostgreSQL | DSpostgresql.jar |
Progress OpenEdge | DSopenedgewp.jar |
Salesforce | DSsforce.jar |
Snowflake | DSsnowflake.jar |
Spark SQL | DSsparksql.jar |
SQL Server | DSsqlserver.jar |
Sybase | DSsybase.jar |
Full documentation for these drivers can be found on the Progress DataDirect Connectors site, under the relevant source.
The following Native drivers are preconfigured by Data Studio in the file customJdbc.json:
DBMS name | Driver file name pattern |
---|---|
Oracle (Thin client driver) | ojdbc.* |
SQL Server (Microsoft driver) | (ms)?sql[-]?jdbc.* |
MySQL (Connector/J driver) | mysql-connector.* |
DB2 | db2jcc.* |
Informix | ifxjdbc.* |
PostgreSQL | postgresql.* |
Apache Hive | hive-jdbc.* |
MongoDB | mongodb.* |
Amazon Redshift (Amazon Redshift driver) | redshift.* |
dBase | dbf.* |
Derby | derby.* |
Microsoft Dynamics (CData driver) | .*dynamicscrm.* |
Vertica | vertica.* |
MariaDB | mariadb.* |
HyperSQL | hsqldb.* |
SQLite | .*sqlite.*jar |
H2 | h2-.* |
Presto | presto.* |
SAPDB | sapdbc.* |
SAP HANA | ngdbc.* |
Amazon Athena | Athena.* |
CIS with Business Objects | csjdbc.* |
Neo4j | neo4j.* |
Google BigQuery (Magnitude Simba driver) | GoogleBigQueryJDBC.* |
Netezza | nzjdbc.* |
HDP | ddhybrid.jar |
Snowflake | snowflake-jdbc.* |
Native drivers that are not configured (i.e. have no configuration details in customJdbc.json) must have the connection string configured manually in the UI.
In Data Studio, JDBC drivers are associated with an External system. The drivers that are currently known to Data Studio will be listed in the DBMS field when creating an External system with the "JDBC" type.
Once you've selected the DBMS, the fields required to create the connection will depend on how the selected DBMS has been configured in datadirectJdbc.json or customJdbc.json. For unconfigured DBMSs, a field is provided for the connection string that the driver will use; the format of this string can be found in the driver vendor's documentation.
Data Studio will build the connection string for configured DBMSs, although you may occasionally need to override the prebuilt connection string with one of your own.
Additional optional fields are available by pressing the Advanced Settings button:
To test the connection, enter the necessary authentication credentials (such as a username and password) and click Test Connection. The test credentials you enter here are only used for testing the connection and are not saved.
If you receive a Success message, click Finish and add credentials to save the settings and proceed to create connection credentials for this external system connection.
In the following documentation, for the sake of brevity, a driver's connection property key/value pair may appear as PropertyName=Value.
The connection string template is: jdbc:experian:oracle://hostname:port;SID=SID;LoginTimeout=15;UseServerSideUpdatableCursors=true. Additional connection properties are detailed in the DataDirect Oracle JDBC driver documentation.
To connect to an Oracle server using a service name rather than a SID, leave the SID field blank and specify the Oracle service name as a user connection property, for example ServiceName=my.server.com. This property is mutually exclusive with the SID property.
Use the service name to specify connections to an Oracle Real Application Cluster (RAC) because the nodes in a RAC system share a common service name. Service name is also typically used to connect to Pluggable databases (PDBs).
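For illustration, a connection string using a service name might look like the following sketch (the host and service values are hypothetical):

```
jdbc:experian:oracle://dbhost.example.com:1521;ServiceName=sales.example.com;LoginTimeout=15
```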
Error messages:
This error can occur if the Oracle database server is configured as RAC (multiple servers behind a load balancer). You may need to add the LoadBalancing=true and AlternateServers=(server2:server2port) user connection properties to the connection configuration.
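A minimal sketch of such a configuration, with hypothetical hostnames and ports:

```
jdbc:experian:oracle://server1:1521;ServiceName=sales.example.com;LoadBalancing=true;AlternateServers=(server2:1521)
```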
Error:
This error can occur when attempting to preview or load data into Data Studio, if the schema has not been defined in the External system connection and the table exists in more than one schema. To resolve, specify the schema in the connection's advanced settings, to avoid using the Oracle user's default schema.
When attempting to load data from a table, no records are shown and the logs contain "Transliteration failed, reason: invalid UTF8 data". This issue occurs while processing the table, which contains data in a codepage other than UTF-8.
To resolve, set the CodePageOverride connection property, or use the Oracle thin driver as an alternative custom JDBC driver.
The following fields are required for connection:
The connection string template is: jdbc:experian:sqlserver://<hostname>;allowPortWithNamedInstance=true;DatabaseName=<database>;LoginTimeout=15;UseServerSideUpdatableCursors=true;SelectMethod=cursor.
Additional connection properties are detailed in the DataDirect SQL Server JDBC driver documentation.
The hostname field can be used to define a port and instance name.
When specifying the location of the SQL Server instance, provide either an instance name or a port number, never both together.
If you're connecting to Microsoft Azure SQL, two additional connection properties need to be added, because Microsoft Azure SQL requires communication over SSL:

- EncryptionMethod=SSL
- ValidateServerCertificate=false. If set to true, the certificate needs to be available in the database server trust store and must be issued by a trusted Certificate Authority. If set to false, there will be no validation of the certificate.

The SQL Server driver supports several different authentication methods for establishing a connection. The default method is SQL Server authentication with a username and password.
To connect using Windows Authentication (NTLMv2), use the connection property AuthenticationMethod=ntlm2java. When adding credentials, enter the Windows credentials that will be used to connect to the Microsoft SQL Server instance.
To use Azure Active Directory (Azure AD) authentication, set AuthenticationMethod=ActiveDirectoryPassword. In addition, a value must be specified for the HostNameInCertificate property, and the connection must be configured to use SSL.
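Putting these properties together, a connection string for Azure AD authentication might look like the following sketch (the server, database and certificate host values are hypothetical):

```
jdbc:experian:sqlserver://myserver.database.windows.net;DatabaseName=mydb;EncryptionMethod=SSL;ValidateServerCertificate=false;AuthenticationMethod=ActiveDirectoryPassword;HostNameInCertificate=*.database.windows.net
```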
If you receive this error when attempting to make a connection, the login failed. This usually occurs either because the database name is incorrect or because the connection credentials do not have the required permissions.
Ensure that the database name is entered correctly and that the credentials have valid permissions.
The error displays when exporting data from Data Studio to SQL Server. It usually occurs when an export step in a workflow is attempting to INSERT or UPDATE data into a table where the data doesn't match the target schema.
Ensure that the data that you're exporting from Data Studio matches the data types of the target table's schema. You can use the Conversion functions in Data Studio to convert the values in your columns to the intended data type.
The error displays when exporting data from Data Studio to SQL Server. This error can occur when you're trying to export (INSERT) into a SQL table where one of the columns is a computed column.
The solution is to use a Map to target step before the export and leave any computed columns unmapped.
Error when loading data from SQL server into Data Studio. The credentials used may not have the necessary permission to preview and load the data for the specified schema.
Ensure that the credentials used have the required permissions to preview and load data (using SELECT) from the relevant table or view in the specified schema. If you're connecting with a user whose default schema differs from the specified schema, check the Force JDBC schema name setting in the external system's configuration.
This authentication error can happen when the same Windows credentials are used by two different external systems connecting to different domains on the same SQL Server instance. If the error appears when testing the connection, check that you're using ntlm2java as the AuthenticationMethod connection property, and not ntlmjava.
By default, the driver loads datetimeoffset values (datetime with timezone offset) as type VARCHAR, which becomes an alphanumeric data type in Data Studio. To interpret these values as TIMESTAMP and load them as dates instead, set the user connection property FetchTSWTZAsTimestamp=true.
Timestamps are converted to UTC on load into Data Studio.
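For example, the property can be appended to the connection string (the hostname and database below are hypothetical placeholders):

```
jdbc:experian:sqlserver://<hostname>;DatabaseName=<database>;FetchTSWTZAsTimestamp=true
```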
The Microsoft Dynamics 365 driver supports two different methods of authentication. NTLM is typically used when Dynamics is hosted on-premises, and requires a username and password to authenticate; the service URL will look something like https://myonpreminstance.sso3.dynamics365.net/api/data/v9.1/. OAuth is typically used when Dynamics is hosted in the cloud, and requires more configuration.
These settings can be obtained from the Dynamics server:

- Service URL, for example https://sandbox.api.ds.dynamics.com/api/data/v9.1/.
- OAuth authorization URI, for example https://login.microsoftonline.com/1234567-12345-1234-1234-123456789abc/oauth2/v2.0/authorize.
- OAuth token URI, for example https://login.microsoftonline.com/1234567-12345-1234-1234-123456789abc/oauth2/v2.0/token.
- OAuth scope, for example https://sandbox.api.ds.dynamics.com/user_impersonation offline_access.

The template connection string is: jdbc:experian:sforce://<hostname>;LoginTimeout=30;StmtCallLimit=0.
Additional connection properties are detailed in the DataDirect Salesforce JDBC driver documentation.
If your Salesforce schema has changed (e.g. columns/tables have been added or removed), the driver will need to recreate the local schema mapping.
If the schema change is a one-off event, set the SchemaMap property to force the driver to create a new schema map the next time a connection is established, while retaining the old mapping for reference.
If the schema is likely to change regularly, you can configure the driver to always refresh the schema map when establishing a connection by setting CreateMap=forceNew. If you get the error "The specified configuration options do not match the options used to initially create the schema map", you'll have to force the driver to create a new schema map using the SchemaMap property. If you've already defined this property, change it to a new value to force the creation of a new schema map.
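As a sketch (the hostname and schema map path below are hypothetical), these properties can be supplied in the connection string:

```
jdbc:experian:sforce://login.salesforce.com;LoginTimeout=30;CreateMap=forceNew;SchemaMap=C:\maps\sforce_v2.config
```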
To reset your security token via the Salesforce UI, browse to your personal settings and click on Reset My Security Token. An email will be sent with your new token. Some organizations will only allow administrators to reset users' tokens.
You may be using the Salesforce Lightning app hostname. Change the hostname URL to Salesforce Classic: for example, use ap160.salesforce.com instead of ap160.lightning.force.com.
You may see this when making changes to a connection that require the mapping schema to be updated.
Use the SchemaMap connection property. The value for this property is either a name, or the absolute path and name (including the .config extension) of the local schema mapping file.
There is likely to have been a timeout while the driver waits for a response to the web service request. Add the connection property WSTimeout and/or WSRetryCount.
The custom connection property ConfigOptions controls how Salesforce table columns are mapped in the driver:

- To exclude audit columns, set ConfigOptions to (AuditColumns=none).
- To map system column names, set ConfigOptions to (MapSystemColumnNames=1).
- To strip the custom column suffix, set ConfigOptions to (CustomSuffix=strip).
- To stop identifiers being uppercased, set ConfigOptions to (UppercaseIdentifiers=false).

These options can be combined, for example: (AuditColumns=none;MapSystemColumnNames=1;CustomSuffix=strip;UppercaseIdentifiers=false).
The template connection string is: jdbc:experian:hive://<hostname>:9999;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Hive JDBC driver documentation.
To connect to HDInsight, Hostname will be set to the cluster URL, for example mycluster.azurehdinsight.net, and you will need to specify the following connection properties:

- databaseName, set to the name of the internal or external database in your cluster that you want to connect to. If not specified, this defaults to default.
- TransportMode=http.
- EncryptionMethod=SSL.
- HTTPPath, set to the path of the HTTPS endpoint, for example /hive2.

When establishing the connection, use the HDInsight cluster admin name and password to authenticate.
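A sketch of the resulting HDInsight connection string (the cluster name is hypothetical, and the assumption here is that the HTTPS endpoint listens on port 443):

```
jdbc:experian:hive://mycluster.azurehdinsight.net:443;TransportMode=http;EncryptionMethod=SSL;HTTPPath=/hive2;databaseName=default
```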
To avoid Thrift errors when attempting to insert or update rows in a Hive table, try increasing the value of the ArrayInsertSize connection property.
The template connection string is: jdbc:experian:googlebigquery:Project=<project>;Dataset=<dataset>. Additional connection properties are detailed in the DataDirect BigQuery JDBC driver documentation.
The driver supports OAuth 2.0 and service account authentication.
To use OAuth 2.0, add the following user connection properties:

- AuthenticationMethod=oauth2.
- AccessToken: set this to specify the access token you have obtained to authenticate to Google BigQuery.
- RefreshToken: set this to specify the refresh token you have obtained to authenticate to Google BigQuery.
- Set the ClientID property to specify the consumer key for your application.
- Set the ClientSecret property to specify the consumer secret for your application.
- Set the Scope property to specify the OAuth scope, which limits the permissions granted by an access token.

A service account is a type of Google account that represents an application instead of an individual end user. For a successful service account authentication, you need:
To configure the driver to use service account authentication, set the following connection properties:

- AuthenticationMethod=serviceaccount.
- Set the ServiceAccountEmail property to specify your service account's email address.
- Set either the ServiceAccountKeyContent property or the ServiceAccountPrivateKey property:
  - ServiceAccountKeyContent specifies the private key required to authenticate to Google BigQuery. Use this property if you do not want to persist the private key file in your environment.
  - ServiceAccountPrivateKey specifies the full path to the .json or .p12 file that contains the private key. The driver extracts the private key value from the specified file and uses it to authenticate the user to the database. Use this property if it is preferable to persist the private key file.

This error is typically caused by an out-of-date map of the data model. To force the schema to refresh, add the property RefreshSchema=true, or alternatively use CreateMap=session to store the map in memory.
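Putting the service account settings together, a connection string might look like this sketch (the project, dataset, email address and key path are all hypothetical):

```
jdbc:experian:googlebigquery:Project=myproject;Dataset=mydataset;AuthenticationMethod=serviceaccount;ServiceAccountEmail=loader@myproject.iam.gserviceaccount.com;ServiceAccountPrivateKey=/opt/keys/bigquery.json
```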
The template connection string is: jdbc:experian:mysql://<hostname>:3306;DatabaseName=<database>;LoginTimeout=15.
Additional connection properties are detailed in the DataDirect MySQL JDBC driver documentation.
The Progress DataDirect MySQL JDBC driver shipped with Data Studio does not support MySQL Community Server versions (including Amazon Aurora MySQL). To connect to a database based on a MySQL Community version, download the MySQL Connector/J driver and deploy the mysql-connector-java-<version>.jar file as a custom driver.
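When using Connector/J, the connection string follows MySQL's own URL format rather than the jdbc:experian template, for example (hostname and database are placeholders):

```
jdbc:mysql://<hostname>:3306/<database>
```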
The template connection string is: jdbc:experian:postgresql://<hostname>:5432;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect PostgreSQL JDBC driver documentation.
The template connection string is: jdbc:experian:redshift://<hostname>:5439;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Redshift JDBC driver documentation.
The template connection string is: jdbc:experian:jira:;User=<username>;Password=<password>.
Additional connection properties are detailed in the DataDirect Jira JDBC driver documentation.
The Autonomous REST Connector is a JDBC driver that allows Data Studio to connect, without code, to any RESTful API data source to load data.
You can connect to the REST service either directly via an endpoint, or by using a path to a .rest file. If you're using the endpoint directly, default mappings are generated to map JSON structures to relational tables. Alternatively, a .rest file can be created to define endpoints, mappings, and paging.
One of the following authentication settings must be provided: None, Basic, HttpHeader, OAuth2, UrlParameter or Custom.

The template connection string is: jdbc:experian:autorest:Config=<rest-file-path>;Sample=<endpoint>. For additional connection properties and more information on creating a REST file and the authentication options, see the DataDirect Autonomous REST JDBC driver documentation.
The username and password fields can be left blank if alternative authentication methods are used and authentication parameters are provided in the connection parameters.
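As a sketch, connecting directly to a sample endpoint with basic authentication might look like this (the URL is hypothetical, and AuthenticationMethod takes one of the methods listed above):

```
jdbc:experian:autorest:Sample=https://api.example.com/v1/items;AuthenticationMethod=Basic
```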
The template connection string is: jdbc:experian:cassandra://<hostname>:9042;KeyspaceName=<keyspace>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Cassandra JDBC driver documentation.
The template connection string is: jdbc:experian:db2://<hostname>:50000;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect DB2 JDBC driver documentation.
The template connection string is: jdbc:experian:greenplum://<hostname>:5432;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Greenplum JDBC driver documentation.
The template connection string is: jdbc:experian:informix://<hostname>:1526;InformixServer=<server>;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Informix JDBC driver documentation.
The template connection string is: jdbc:experian:mongodb://<hostname>:27017;SchemaDefinition=<schema-definition>;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect MongoDB JDBC driver documentation.
The template connection string is: jdbc:experian:eloqua:Company=<company>;User=<username>;Password=<password>. Additional connection properties are detailed in the DataDirect Eloqua JDBC driver documentation.
The template connection string is: jdbc:experian:oraclesalescloud://<hostname>. Additional connection properties are detailed in the DataDirect Sales Cloud JDBC driver documentation.
The template connection string is: jdbc:experian:oracleservicecloud:LoginHost=<login-host>;InterfaceName=<interface-name>;user=<username>;password=<password>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Service Cloud JDBC driver documentation.
The template connection string is: jdbc:experian:openedge://<hostname>:2003;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect OpenEdge JDBC driver documentation.
The Snowflake driver supports UserID/Password authentication.
The template connection string is: jdbc:experian:snowflake:accountname=<account>;databasename=<database>;partnerapplicationname=<application>;schema=<schema>;warehouse=<warehouse>. Additional connection properties are detailed in the DataDirect Snowflake JDBC driver documentation.
The template connection string is: jdbc:experian:sparksql://<hostname>:9999;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Spark SQL JDBC driver documentation.
The template connection string is: jdbc:experian:sybase://<hostname>:5000;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Sybase JDBC driver documentation.
When loading data into a Dataset from a JDBC External system, users with the correct Credential permissions can specify a SQL query to determine the data to fetch. The query will be executed on the source system, and can be used to specify the columns or filter rows to fetch, or to join, sort, limit or otherwise manipulate the data to be loaded into Data Studio.
Loading data via a SQL query can be beneficial for many reasons:
When a Dataset is loaded from a SQL query, the query remains associated with the Dataset, and can be viewed or edited in the Edit details menu option. The query is used whenever the Dataset is refreshed.
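For illustration, assuming a hypothetical source table customers, a query like the following could be associated with the Dataset (the SQL dialect is that of the source DBMS):

```sql
-- Fetch only the columns and rows needed, sorted at source
SELECT customer_id, name, email, updated_at
FROM customers
WHERE country = 'GB'
  AND updated_at >= '2023-01-01'
ORDER BY updated_at DESC
```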
Examples of use cases for SQL queries when importing a Dataset from a JDBC external system:
To execute a SQL query when importing a Dataset from a JDBC external system:
Data Studio supports connecting to the following cloud-based storage systems:
Data Studio provides connectivity to Amazon S3 Buckets.
To configure, select Amazon S3 as the external system type and configure the following:
Data Studio provides connectivity to Apache Hadoop HDFS.
To configure, select Apache Hadoop HDFS as the external system type and configure the following:
Data Studio provides support for connecting to systems built on Microsoft Azure Blob storage, including Azure Data Lake Storage Gen2.
Data Studio uses access keys to authenticate when making requests to the Azure storage account. To configure a connection to the Data Lake:
- The endpoint, in the format <storage-account-name>.core.windows.net.

To create a new credential:
An SFTP connection allows transferring files to and from other machines running SFTP server software, either on a local network, remotely via the internet, or over a VPN connection. Communications use TCP/IP, usually (but not exclusively) via port 22 on the server machine.
To configure, select SFTP Server as the external system type.
In addition to loading data from and exporting data to cloud storage systems, you can configure a folder in cloud storage as an External system dropzone.
Just like its on-server counterpart, when a file is uploaded to this external system folder, it is automatically loaded into the corresponding Dataset as a new batch of data.
External dropzones are supported for:
If the credential for an external dropzone causes an authentication error, Data Studio will stop polling for new files at the normal frequency.
The credential has to be updated to resume polling; if an extended password expiry means no changes are actually required, saving the credential without making any changes will also resume polling.
You can create a notification for a 'Dropzone authentication failed' event and specify the external system in the event filters.
If authentication fails, Data Studio will delay the next poll attempt by one hour.
Three consecutive failures will reduce the polling frequency further, to every 24 hours.
Updating the credential with valid values will resume polling as normal (every minute).
In cases of password expiry, simply re-saving the credential will also resume polling as normal.
On each authentication failure, Data Studio will send an automatic notification to the Inbox of users with Manage permission for the credential of the affected external dropzone.