Data Quality user documentation | Pushdown Processing

Pushdown Processing is an optimization technique that reduces the volume of data loaded into Data Studio and the time taken to process the data by moving data processing to the data source.

It is an Aperture Data Studio license add-on module that could be useful to customers connecting to databases directly via an External system, depending on their use cases.

The functionality allows parts of a Workflow to be executed directly on the database tables, which brings the following benefits:

Simplicity – the same easy to use drag and drop Workflow builder without needing to know coding or complex SQL.
Security – data does not need to be moved (from the database into Data Studio).
Performance – both not moving/refreshing data and taking advantage of database optimizations when joining, sorting, filtering, etc. data.
Volume – the performance enhancements allow billions of records to be processed efficiently which can broaden use cases.
Lower costs – potentially cheaper in both data storage and server costs since storing less data, and require less processing power for Data Studio.

Results can be viewed in Data Studio or exported downstream as part of the data pipeline like any other Workflow.

Databases supported

The currently supported list of databases, with more to come:

Hive
Oracle
Postgres
Snowflake
SQLite
SQL Server

Workflow steps supported

Compare, Filter, Group, Join, Sample, Sort, Source (metadata: System, Credential, Table), Union
Profile, Transform & Validate - certain Functions

Functions supported

Data Studio Functions like Uppercase or Greater than are mapped to the equivalent SQL-flavor for that specific database.

Add, AddPeriod, After, And, Before, CompareDate, Concatenate, Constant, Divide, EndsWith, Equals, Extract, FirstNonNull, GreaterThan, GreaterThanOrEqual, IfThenElse, Insert, IsAlphabetic, IsAlphanumeric, IsDate, IsDecimal, IsEmpty, IsEven, IsFalse, IsNegative, IsNull, IsOdd, IsPositive, IsTrue, Length, LessThan, LessThanOrEqual, LowerCase, Multiply, NotAlphabetic, NotAlphanumeric, NotDate, NotDecimal, NotEmpty, NotEven, NotFalse, NotNegative, NotNull, NotOdd, NotPositive, NotTrue, Or, Pad, Remainder, RemoveWhiteSpace, Replace, ReplaceFirst, Round, StartsWith, Substr, Subtract, Tag, ToNumber, Trim, UnquoteValue, UpperCase

Not available for SQLite:
MatchesExpression, PowerOfN

Not available for SQLite, Snowflake, Hive:
FormatPattern, Repeat

Not available for SQLite, SQL Server, Snowflake, Hive:
CompactSpaces, FromHexToText, FromTextToHex, RegexReplace, RemoveNoise, RemoveUnprintableCharacters, Soundex, TitleCase

For more advanced users and use cases, it is possible to amend the SQL that is generated. Reasons for doing this might be to trigger a stored procedure or for a (supported) database that uses a non-standard SQL syntax, such as an older version.

To call a stored procedure, you can create a custom Function in Data Studio (that does anything/nothing, maybe just Constant Value) then associate some SQL with that Function.

The Function to SQL mapping requires a file named sql.<dbtype>.user.json be created, where <dbtype> is one of: hive, oracle, postgresql, snowflake, sqlite, sqlserver. This file should be saved in the directory C:\ApertureDataStudio\pushdown which may need to be created. This location assumes you are using defaults, so your JDBC drivers directory is located at C:\ApertureDataStudio\drivers.

If this custom file sql.<dbtype>.user.json exists for a database, the entries within in it will automatically override the default SQL mappings for those Functions.

token is the External label value of a Custom Function or the Native Function name as shown in the Function script editor or Show script window.

sql is the SQL Script to be executed that will be shown in the UI when the user opens Show script. SQL input parameters will pass column data values to the stored procedure, i.e. column named ‘ID’ , or to the Function, i.e. ‘#v1#’ (is Input Value 1, v2, v3, v4 etc.). The number of parameters depends on the Function and if optional checkbox/params are being defined, so ‘MoreThan(InputValue, InvertFalse, IgnoreCaseFalse, ComparisonValue)’.

Enabling Pushdown Processing

Ensure your license has the add-on module (top-right user icon > About Aperture Data Studio).
Go to System > External systems to connect to a (supported) database or edit the credentials of an existing database.
On the credentials screen select Enable pushdown processing.
Click Add Dataset > select a database table to load data from the database or > select Column headers only to get the schema without data.

Using Pushdown Processing

Create a Workflow with a Source step using a dataset from a supported database. (Multiple source steps can be used if in the same database.)
Any Pushdown supported Workflow steps added to the Workflow will show an icon.
To turn off Pushdown for a specific step click the step menu > Edit properties > Block pushdown processing, which will remove the icon from the step (and any subsequent steps).
When the Pushdown Action button is ‘On’, running the Workflow or selecting Show step results will query the database and show the data returned in a grid.

Viewing Pushdown processing step results

Clicking Show step results on a Workflow step will execute the full query on the database, but then limit the number of rows returned. This allows you to see the column schema and a few thousand example records without waiting to transfer millions of records. However, if you do wish to see the full result, there is an option you can click above the grid to re-query and transfer all rows.

FAQs

Data Studio will convert your Workflow into a SQL query which will be sent to the database and the results of the query will be displayed to you in the UI or sent onwards to the next step in your process.

Database administrators might not want additional queries running on the database as they could impact other operations or increase costs.

Top-right user icon > About. Check if ‘Pushdown processing’ is selected and the status is 'Yes'. If it is not shown, then reach out to your Experian contact or to Support.

Reach out to your Experian contact or contact support to discuss.

We are adding support for additional databases based on demand, so reach out to your Experian contact to ensure they are aware of your requirements.

Some Functions don't have a direct translation into SQL. Where possible, Functions are translated to SQL equivalents, but this isn’t always possible.

There are some steps that do not have equivalent database functionality. For example, where a step is referencing a lookup file or address reference data, the database does not have access to this data.

The SQL sent to the database won’t include that step or subsequent steps linked to that step, but will be able to retrieve the results until that point in the Workflow.

The SQL sent to the database won’t include that step with the unsupported function or subsequent steps linked to that step, but will be able to retrieve the results until that point in the Workflow.

Processing time will depend on record volume, Workflow complexity and other environmental factors. However, removing the time spent transferring data and taking advantage of database caching, indexing, and pre-optimization techniques has seen hours reduced to minutes for repeat queries on billions of records returning small volumes of aggregated results.

Generate database credentials that only have access to specific tables and views. Use these credentials in Aperture and select Enable for pushdown.

Credentials can only be used by the users / user groups assigned to them, so create another External system credential with fewer users that has Enable for Pushdown selected.

It is also possible to create a new Space to create your Pushdown processing Workflows and only give selected users / user groups access to the Space.

Yes, you need a JDBC driver to connect to the database - either a custom driver or the JDBC license add-on module.

Was this helpful?

Previous: Use the REST API

Next: Use the SDK

Aperture Data Studio v2

Extend Data Studio functionality

Previous topic:
Monitor data