In addition to file uploads, Aperture Data Studio offers the option to connect directly to External systems.
Data can be loaded from an External system into a Dataset, or an External system can be defined as a target for the Export step.
The supported systems are:
When working with External systems, access to data is managed through valid credentials, which can take various forms such as username/password combinations, API tokens, or keys.
The choice of credential can affect which data is accessible, so different users may obtain different results from the same source.
When interacting with an External system (to either fetch or push data), credentials are pinned to a particular usage of the External system, so that the same set of credentials is used consistently.
When adding a Dataset from an External system, you will need to select a credential to be associated with the Dataset. The credentials available for selection are only those that your user has permission to use.
Once set, this credential will be used whenever the Dataset is refreshed by any user, even for those with no direct permission to access the credentials.
Find out how to create a Dataset from an External system.
Space-level restrictions will still be enforced, so you will need to ensure that the Space to which the Dataset is added is suitably secured. Another user with access to the Space can edit the Dataset to change the credential used, for example to one that they have been given permission to use.
To export data to external systems using the Export step in the workflow designer, the user will need to specify the credential to be used. The credential will need to have the necessary permissions to write to the external database.
Another user with access to the Space can modify the credentials used in the workflow's Export step. Once the credential has been set, this credential will be used whenever the workflow is executed (including scheduled execution), even by those with no direct permissions to access the credentials.
To create an external system with credentials:
There are three credential permission types in Data Studio, as described in the table below.
All permission types allow a user to use the credential to extract (fetch) data from, or to export data back to an associated external system.
Permission type | Description |
---|---|
Use | The user cannot edit/delete the credential and cannot execute SQL queries when importing a Dataset. |
Use with query | The user cannot edit/delete the credential but can execute SQL queries when importing a Dataset. |
Manage | The user can edit/delete the credential and can execute SQL queries when importing a Dataset. |
External system metadata (connection details and credential names) can be exported from and imported into a Space.
After importing an External system's metadata into another environment, you will need to edit the credential to enter the correct password and assign the required user access permissions.
Data Studio provides support for connecting to External systems, which can be used either as data sources or as the target of an export operation. One of these external system types is a DBMS accessible via JDBC.
There are two types of DBMS JDBC driver recognized by Data Studio: the bundled Progress DataDirect drivers, and native (custom) drivers that you deploy yourself.
The two JDBC configuration files, datadirectJdbc.json and customJdbc.json, can be copied into the same directory as server.properties (i.e. the installation root directory) and modified to customize a driver's connection parameters. These configuration files provide the Data Studio UI with information about the JDBC drivers and the format (and properties) required to build a valid connection string for each DBMS. For drivers that have not been configured in either of these files, the user must enter a custom connection string when creating an External system that uses the driver.
Changes made to customJdbc.json are retained when Data Studio is upgraded.
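The exact schema of these files is defined by your Data Studio version, so the snippet below is purely illustrative (every property name in it is hypothetical); it sketches the kind of information an entry typically carries: a display name for the DBMS, a pattern matching the driver file, and a template for the connection string.

```json
{
  "_note": "Illustrative sketch only - property names are hypothetical, not the real customJdbc.json schema",
  "drivers": [
    {
      "name": "PostgreSQL",
      "driverFilePattern": "postgresql.*",
      "connectionStringTemplate": "jdbc:postgresql://<hostname>:5432/<database>"
    }
  ]
}
```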
Data Studio includes a wide range of JDBC drivers in its deployment, allowing you to connect to many external systems out of the box.
You can also download and deploy your own JDBC (Java Database Connectivity) driver, which must be a type 4 driver compatible with Java 8. Once a custom driver is deployed, you can create and manage External system connections just as you do with the bundled drivers. To deploy a custom driver:
The following Progress DataDirect drivers are bundled with Data Studio and can be used by any customer with the appropriate licensing add-on:
DBMS name | Driver file name |
---|---|
Amazon Redshift | DSredshift.jar |
Apache Hive | DShive.jar |
Autonomous REST Connector (REST API data sources) | DSautorest.jar |
Cassandra | DScassandra.jar |
DB2 | DSdb2.jar |
Google BigQuery | DSgooglebigquery.jar |
Greenplum | DSgreenplum.jar |
Informix | DSinformix.jar |
MongoDB | DSmongodb.jar |
Microsoft Dynamics 365 | DSdynamics365.jar |
MySQL | DSmysql.jar |
Oracle | DSoracle.jar |
Oracle Eloqua | DSeloqua.jar |
Oracle Sales Cloud | DSoraclesalescloud.jar |
Oracle Service Cloud | DSrightnow.jar |
PostgreSQL | DSpostgresql.jar |
Progress OpenEdge | DSopenedgewp.jar |
Salesforce | DSsforce.jar |
Snowflake | DSsnowflake.jar |
Spark SQL | DSsparksql.jar |
SQL Server | DSsqlserver.jar |
Sybase | DSsybase.jar |
Full documentation for these drivers can be found on the Progress DataDirect Connectors site, under the relevant source.
The following Native drivers are preconfigured by Data Studio in the file customJdbc.json:
DBMS name | Driver file name pattern |
---|---|
Oracle (Thin client driver) | ojdbc.* |
SQL Server (Microsoft driver) | (ms)?sql[-]?jdbc.* |
MySQL (Connector/J driver) | mysql-connector.* |
DB2 | db2jcc.* |
Informix | ifxjdbc.* |
PostgreSQL | postgresql.* |
Apache Hive | hive-jdbc.* |
MongoDB | mongodb.* |
Amazon Redshift (Amazon Redshift driver) | redshift.* |
dBase | dbf.* |
Derby | derby.* |
Microsoft Dynamics (CData driver) | .*dynamicscrm.* |
Vertica | vertica.* |
MariaDB | mariadb.* |
HyperSQL | hsqldb.* |
SQLite | .*sqlite.*jar |
H2 | h2-.* |
Presto | presto.* |
SAPDB | sapdbc.* |
SAP HANA | ngdbc.* |
Amazon Athena | Athena.* |
CIS with Business Objects | csjdbc.* |
Neo4j | neo4j.* |
Google BigQuery (Magnitude Simba driver) | GoogleBigQueryJDBC.* |
Netezza | nzjdbc.* |
HDP | ddhybrid.jar |
Snowflake | snowflake-jdbc.* |
Native drivers that are not configured (i.e. have no configuration details in customJdbc.json) must have the connection string configured manually in the UI.
In Data Studio, JDBC drivers are associated with an External system. The drivers that are currently known to Data Studio will be listed in the DBMS field when creating an External system with the "JDBC" type.
Once you've selected the DBMS, the fields required to create the connection will depend on how the selected DBMS has been configured in datadirectJdbc.json or customJdbc.json. For unconfigured DBMSs, a field is provided for the connection string that the driver will use; the format of this string can be found in the driver vendor's documentation.
Data Studio will build the connection string for configured DBMSs, although you may occasionally need to override the prebuilt connection string with one of your own.
Additional optional fields are available by pressing the Advanced Settings button:
To test the connection, enter the necessary authentication credentials (such as a username and password) and click Test Connection. The test credentials you enter here are only used for testing the connection and are not saved.
If you receive a Success message, click Finish and add credentials to save the settings and proceed to create connection credentials for this external system connection.
In the following documentation, for the sake of brevity, a driver's connection property key/value pair may appear as PropertyName=Value.
The connection string template is: jdbc:experian:oracle://hostname:port;SID=SID;LoginTimeout=15;UseServerSideUpdatableCursors=true. Additional connection properties are detailed in the DataDirect Oracle JDBC driver documentation.
To connect to an Oracle server using a service name rather than a SID, leave the SID field blank and specify the Oracle service name as a user connection property, for example ServiceName=my.server.com. This property is mutually exclusive with the SID property.
Use the service name to specify connections to an Oracle Real Application Cluster (RAC) because the nodes in a RAC system share a common service name. Service name is also typically used to connect to Pluggable databases (PDBs).
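For illustration, a connection string using a service name might look like the following sketch (the host and service values are hypothetical):

```
jdbc:experian:oracle://dbhost.example.com:1521;ServiceName=sales.example.com;LoginTimeout=15
```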
Error messages:
This error can occur if the Oracle database server is configured as RAC (multiple servers behind a load balancer). You may need to add the LoadBalancing=true and AlternateServers=(server2:server2port) user connection properties to the connection configuration.
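A minimal sketch of such a configuration, with hypothetical hostnames and ports:

```
jdbc:experian:oracle://server1:1521;ServiceName=sales.example.com;LoadBalancing=true;AlternateServers=(server2:1521)
```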
Error:
This error can occur when attempting to preview or load data into Data Studio, if the schema has not been defined in the External system connection and the table exists in more than one schema. To resolve, specify the schema in the connection's advanced settings, to avoid using the Oracle user's default schema.
When attempting to load data from a table, no records are shown and the logs contain "Transliteration failed, reason: invalid UTF8 data". This issue occurs while processing the table, which contains data in a codepage other than UTF-8.
To resolve, set the CodePageOverride connection property, or use the Oracle thin driver as an alternative custom JDBC driver.
The following fields are required for connection:
The connection string template is: jdbc:experian:sqlserver://<hostname>;allowPortWithNamedInstance=true;DatabaseName=<database>;LoginTimeout=15;UseServerSideUpdatableCursors=true;SelectMethod=cursor.
Additional connection properties are detailed in the DataDirect SQL Server JDBC driver documentation.
The hostname field can be used to define a port and instance name.
When specifying the location of the SQL Server instance, provide either an instance name or a port number, never both together.
If you're connecting to Microsoft Azure SQL, two additional connection properties need to be added, because Microsoft Azure SQL requires communication over SSL:

- EncryptionMethod=SSL
- ValidateServerCertificate=false. If set to true, the certificate needs to be available in the database server trust store and must be issued by a trusted Certificate Authority. If set to false, there will be no validation of the certificate.

The SQL Server driver supports several different authentication methods for establishing a connection. The default method is SQL Server authentication with a username and password.
To connect using Windows Authentication (NTLMv2), use the connection property AuthenticationMethod=ntlm2java. When adding credentials, enter the Windows credentials that will be used to connect to the Microsoft SQL Server instance.
To use Azure Active Directory (Azure AD) authentication, set AuthenticationMethod=ActiveDirectoryPassword. In addition, a value must be specified for the HostNameInCertificate property, and the connection must be configured to use SSL.
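Putting these properties together, a connection string for Azure AD authentication might look like the following sketch (the server, database and certificate host values are hypothetical):

```
jdbc:experian:sqlserver://myserver.database.windows.net;DatabaseName=mydb;EncryptionMethod=SSL;ValidateServerCertificate=false;AuthenticationMethod=ActiveDirectoryPassword;HostNameInCertificate=*.database.windows.net
```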
If you receive this error when attempting to make a connection, the login failed. This usually occurs either because the database name is incorrect or because the connection credentials do not have the required permissions.
Ensure that the database name is entered correctly and that the credentials have valid permissions.
The error displays when exporting data from Data Studio to SQL Server. It usually occurs when an export step in a workflow is attempting to INSERT or UPDATE data into a table where the data doesn't match the target schema.
Ensure that the data that you're exporting from Data Studio matches the data types of the target table's schema. You can use the Conversion functions in Data Studio to convert the values in your columns to the intended data type.
The error displays when exporting data from Data Studio to SQL Server. This error can occur when you're trying to export (INSERT) into a SQL table where one of the columns is a computed column.
The solution is to use a Map to target step before the export and leave any computed columns unmapped.
Error when loading data from SQL server into Data Studio. The credentials used may not have the necessary permission to preview and load the data for the specified schema.
Ensure that the credentials used have the required permissions to preview and load data (using SELECT) from the relevant table or view in the specified schema. If you're connecting with a user whose default schema differs from the specified schema, check the Force JDBC schema name setting in the external system's configuration.
This authentication error can happen when the same Windows credentials are used by two different external systems connecting to different domains on the same SQL Server instance. If the error appears when testing the connection, check that you're using ntlm2java as the AuthenticationMethod connection property, and not ntlmjava.
By default, the driver loads datetimeoffset values (datetime with timezone offset) as type VARCHAR, which becomes an alphanumeric data type in Data Studio. To interpret these values as TIMESTAMP and load them as dates instead, set the user connection property FetchTSWTZAsTimestamp=true.
Timestamps are converted to UTC on load into Data Studio.
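For example, the property can be appended to the connection string (the hostname and database below are hypothetical placeholders):

```
jdbc:experian:sqlserver://<hostname>;DatabaseName=<database>;FetchTSWTZAsTimestamp=true
```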
The Microsoft Dynamics 365 driver supports two different methods of authentication. NTLM is typically used when Dynamics is hosted on-premises, and requires a username and password to authenticate; the service URL will look something like https://myonpreminstance.sso3.dynamics365.net/api/data/v9.1/. OAuth is typically used when Dynamics is hosted in the cloud, and requires more configuration.
These settings can be obtained from the Dynamics server:

- Service URL, for example https://sandbox.api.ds.dynamics.com/api/data/v9.1/.
- OAuth authorization URI, for example https://login.microsoftonline.com/1234567-12345-1234-1234-123456789abc/oauth2/v2.0/authorize.
- OAuth token URI, for example https://login.microsoftonline.com/1234567-12345-1234-1234-123456789abc/oauth2/v2.0/token.
- OAuth scope, for example https://sandbox.api.ds.dynamics.com/user_impersonation offline_access.

The template connection string is: jdbc:experian:sforce://<hostname>;LoginTimeout=30;StmtCallLimit=0.
Additional connection properties are detailed in the DataDirect Salesforce JDBC driver documentation.
If your Salesforce schema has changed (e.g. columns/tables have been added or removed), the driver will need to recreate the local schema mapping.
If the schema change is a one-off event, set the SchemaMap property to force the driver to create a new schema map the next time a connection is established, while retaining the old mapping for reference.
If the schema is likely to change regularly, you can configure the driver to always refresh the schema map when establishing a connection by setting CreateMap=forceNew. If you get the error "The specified configuration options do not match the options used to initially create the schema map", you'll have to force the driver to create a new schema map using the SchemaMap property. If you've already defined this property, change it to a new value to force the creation of a new schema map.
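As a sketch (the hostname and schema map path below are hypothetical), these properties can be supplied in the connection string:

```
jdbc:experian:sforce://login.salesforce.com;LoginTimeout=30;CreateMap=forceNew;SchemaMap=C:\maps\sforce_v2.config
```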
To reset your security token via the Salesforce UI, browse to your personal settings and click on Reset My Security Token. An email will be sent with your new token. Some organizations will only allow administrators to reset users' tokens.
You may be using the Salesforce Lightning app hostname. Change the hostname URL to Salesforce Classic: for example, use ap160.salesforce.com instead of ap160.lightning.force.com.
You may see this when making changes to a connection that require the mapping schema to be updated.
Use the SchemaMap connection property. The value for this property is either a name, or the absolute path and name (including the .config extension) of the local schema mapping file.
There is likely to have been a timeout while the driver waits for a response to the web service request. Add the connection property WSTimeout and/or WSRetryCount.
The custom connection property ConfigOptions controls how Salesforce table columns are mapped in the driver:

- To exclude audit columns, set ConfigOptions to (AuditColumns=none).
- To map system column names, set ConfigOptions to (MapSystemColumnNames=1).
- To strip the custom column suffix, set ConfigOptions to (CustomSuffix=strip).
- To stop identifiers being uppercased, set ConfigOptions to (UppercaseIdentifiers=false).

These options can be combined, for example: (AuditColumns=none;MapSystemColumnNames=1;CustomSuffix=strip;UppercaseIdentifiers=false).
The template connection string is: jdbc:experian:hive://<hostname>:9999;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Hive JDBC driver documentation.
To connect to HDInsight, Hostname will be set to the cluster URL, for example mycluster.azurehdinsight.net, and you will need to specify the following connection properties:

- databaseName, set to the name of the internal or external database in your cluster that you want to connect to. If not specified, this defaults to default.
- TransportMode=http.
- EncryptionMethod=SSL.
- HTTPPath, set to the path of the HTTPS endpoint, for example /hive2.

When establishing the connection, use the HDInsight cluster admin name and password to authenticate.
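A sketch of the resulting HDInsight connection string (the cluster name is hypothetical, and the assumption here is that the HTTPS endpoint listens on port 443):

```
jdbc:experian:hive://mycluster.azurehdinsight.net:443;TransportMode=http;EncryptionMethod=SSL;HTTPPath=/hive2;databaseName=default
```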
To avoid Thrift errors when attempting to insert or update rows in a Hive table, try increasing the value of the ArrayInsertSize connection property.
The template connection string is: jdbc:experian:googlebigquery:Project=<project>;Dataset=<dataset>. Additional connection properties are detailed in the DataDirect BigQuery JDBC driver documentation.
The driver supports OAuth 2.0 and service account authentication.
To use OAuth 2.0, add the following user connection properties:

- AuthenticationMethod=oauth2.
- AccessToken: set this to specify the access token you have obtained to authenticate to Google BigQuery.
- RefreshToken: set this to specify the refresh token you have obtained to authenticate to Google BigQuery.
- Set the ClientID property to specify the consumer key for your application.
- Set the ClientSecret property to specify the consumer secret for your application.
- Set the Scope property to specify the OAuth scope, which limits the permissions granted by an access token.

A service account is a type of Google account that represents an application instead of an individual end user. For a successful service account authentication, you need:
To configure the driver to use service account authentication, set the following connection properties:

- AuthenticationMethod=serviceaccount.
- Set the ServiceAccountEmail property to specify your service account's email address.
- Set either the ServiceAccountKeyContent property or the ServiceAccountPrivateKey property:
  - ServiceAccountKeyContent specifies the private key required to authenticate to Google BigQuery. Use this property if you do not want to persist the private key file in your environment.
  - ServiceAccountPrivateKey specifies the full path to the .json or .p12 file that contains the private key. The driver extracts the private key value from the specified file and uses it to authenticate the user to the database. Use this property if it is preferable to persist the private key file.

This error is typically caused by an out-of-date map of the data model. To force the schema to refresh, add the property RefreshSchema=true, or alternatively use CreateMap=session to store the map in memory.
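Putting the service account settings together, a connection string might look like this sketch (the project, dataset, email address and key path are all hypothetical):

```
jdbc:experian:googlebigquery:Project=myproject;Dataset=mydataset;AuthenticationMethod=serviceaccount;ServiceAccountEmail=loader@myproject.iam.gserviceaccount.com;ServiceAccountPrivateKey=/opt/keys/bigquery.json
```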
The template connection string is: jdbc:experian:mysql://<hostname>:3306;DatabaseName=<database>;LoginTimeout=15.
Additional connection properties are detailed in the DataDirect MySQL JDBC driver documentation.
The Progress DataDirect MySQL JDBC driver shipped with Data Studio does not support MySQL Community Server versions (including Amazon Aurora MySQL). To connect to a database based on a MySQL Community version, download the MySQL Connector/J driver and deploy the mysql-connector-java-<version>.jar file as a custom driver.
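When using Connector/J, the connection string follows MySQL's own URL format rather than the jdbc:experian template, for example (hostname and database are placeholders):

```
jdbc:mysql://<hostname>:3306/<database>
```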
The template connection string is: jdbc:experian:postgresql://<hostname>:5432;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect PostgreSQL JDBC driver documentation.
The template connection string is: jdbc:experian:redshift://<hostname>:5439;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Redshift JDBC driver documentation.
The template connection string is: jdbc:experian:jira:;User=<username>;Password=<password>.
Additional connection properties are detailed in the DataDirect Jira JDBC driver documentation.
The Autonomous REST Connector is a JDBC driver that allows Data Studio to connect, without code, to any RESTful API data source to load data.
You can connect to the REST service either directly via an endpoint, or by using a path to a .rest file. If you're using the endpoint directly, default mappings are generated to map JSON structures to relational tables. Alternatively, a .rest file can be created to define endpoints, mappings, and paging.
One of the following authentication settings must be provided: None, Basic, HttpHeader, OAuth2, UrlParameter or Custom.

The template connection string is: jdbc:experian:autorest:Config=<rest-file-path>;Sample=<endpoint>. For additional connection properties and more information on creating a REST file and the authentication options, see the DataDirect Autonomous REST JDBC driver documentation.
The username and password fields can be left blank if alternative authentication methods are used and authentication parameters are provided in the connection parameters.
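As a sketch, connecting directly to a sample endpoint with basic authentication might look like this (the URL is hypothetical, and AuthenticationMethod takes one of the methods listed above):

```
jdbc:experian:autorest:Sample=https://api.example.com/v1/items;AuthenticationMethod=Basic
```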
The template connection string is: jdbc:experian:cassandra://<hostname>:9042;KeyspaceName=<keyspace>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Cassandra JDBC driver documentation.
The template connection string is: jdbc:experian:db2://<hostname>:50000;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect DB2 JDBC driver documentation.
The template connection string is: jdbc:experian:greenplum://<hostname>:5432;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Greenplum JDBC driver documentation.
The template connection string is: jdbc:experian:informix://<hostname>:1526;InformixServer=<server>;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Informix JDBC driver documentation.
The template connection string is: jdbc:experian:mongodb://<hostname>:27017;SchemaDefinition=<schema-definition>;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect MongoDB JDBC driver documentation.
The template connection string is: jdbc:experian:eloqua:Company=<company>;User=<username>;Password=<password>. Additional connection properties are detailed in the DataDirect Eloqua JDBC driver documentation.
The template connection string is: jdbc:experian:oraclesalescloud://<hostname>. Additional connection properties are detailed in the DataDirect Sales Cloud JDBC driver documentation.
The template connection string is: jdbc:experian:oracleservicecloud:LoginHost=<login-host>;InterfaceName=<interface-name>;user=<username>;password=<password>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Service Cloud JDBC driver documentation.
The template connection string is: jdbc:experian:openedge://<hostname>:2003;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect OpenEdge JDBC driver documentation.
The Snowflake driver supports UserID/Password authentication.
The template connection string is: jdbc:experian:snowflake:accountname=<account>;databasename=<database>;partnerapplicationname=<application>;schema=<schema>;warehouse=<warehouse>. Additional connection properties are detailed in the DataDirect Snowflake JDBC driver documentation.
The template connection string is: jdbc:experian:sparksql://<hostname>:9999;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Spark SQL JDBC driver documentation.
The template connection string is: jdbc:experian:sybase://<hostname>:5000;DatabaseName=<database>;LoginTimeout=15. Additional connection properties are detailed in the DataDirect Sybase JDBC driver documentation.
When loading data into a Dataset from a JDBC External system, users with the correct Credential permissions can specify a SQL query to determine the data to fetch. The query will be executed on the source system, and can be used to specify the columns or filter rows to fetch, or to join, sort, limit or otherwise manipulate the data to be loaded into Data Studio.
Loading data via a SQL query can be beneficial for many reasons:
When a Dataset is loaded from a SQL query, the query remains associated with the Dataset, and can be viewed or edited in the Edit details menu option. The query is used whenever the Dataset is refreshed.
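For illustration, assuming a hypothetical source table customers, a query like the following could be associated with the Dataset (the SQL dialect is that of the source DBMS):

```sql
-- Fetch only the columns and rows needed, sorted at source
SELECT customer_id, name, email, updated_at
FROM customers
WHERE country = 'GB'
  AND updated_at >= '2023-01-01'
ORDER BY updated_at DESC
```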
Examples of use cases for SQL queries when importing a Dataset from a JDBC external system:
To execute a SQL query when importing a Dataset from a JDBC external system:
Data Studio supports connecting to the following cloud-based storage systems:
Data Studio provides connectivity to Amazon S3 Buckets.
To configure, select Amazon S3 as the external system type and configure the following:
Data Studio provides connectivity to Apache Hadoop HDFS.
To configure, select Apache Hadoop HDFS as the external system type and configure the following:
Data Studio provides support for connecting to systems built on Microsoft Azure Blob storage, including Azure Data Lake Storage Gen2.
Data Studio uses access keys to authenticate when making requests to the Azure storage account. To configure a connection to the Data Lake:
- The endpoint, in the format <storage-account-name>.core.windows.net.

To create a new credential:
An SFTP connection allows transferring files to and from other machines running SFTP server software, either on a local network, remotely via the internet, or over a VPN connection. Communications use TCP/IP, usually (but not exclusively) via port 22 on the server machine.
To configure, select SFTP Server as the external system type.
In addition to loading data from and exporting data to cloud storage systems, you can configure a folder in cloud storage as an External system dropzone.
Just like its on-server counterpart, when a file is uploaded to this external system folder, it is automatically loaded into the corresponding Dataset as a new batch of data.
External dropzones are supported for:
If the credential for an external dropzone causes an authentication error, Data Studio will stop polling for new files at the normal frequency.
The credential has to be updated to resume polling; if an extended password expiry means no changes are actually required, saving the credential without making any changes will also resume polling.
You can create a notification for a 'Dropzone authentication failed' event and specify the external system in the event filters.
If authentication fails, Data Studio will delay the next poll attempt by one hour.
Three consecutive failures will reduce the polling frequency further, to every 24 hours.
Updating the credential with valid values will resume polling as normal (every minute).
In cases of password expiry, simply re-saving the credential will also resume polling as normal.
On each authentication failure, Data Studio will send an automatic notification to the Inbox of users with Manage permission for the credential of the affected external dropzone.