Pushdown Processing is an optimization technique that reduces the volume of data loaded into Data Studio and the time taken to process the data by moving data processing to the data source.
The functionality allows parts of a Workflow to be executed directly on the database tables, which brings the following benefits:
Results can be viewed in Data Studio or exported downstream as part of the data pipeline like any other Workflow.
The currently supported list of databases, with more to come:
Data Studio Functions like Uppercase or Greater than are mapped to the equivalent SQL-flavor for that specific database.
Add, AddPeriod, After, And, Before, CompareDate, Concatenate, Constant, Divide, EndsWith, Equals, Extract, FirstNonNull, GreaterThan, GreaterThanOrEqual, IfThenElse, Insert, IsAlphabetic, IsAlphanumeric, IsDate, IsDecimal, IsEmpty, IsEven, IsFalse, IsNegative, IsNull, IsOdd, IsPositive, IsTrue, Length, LessThan, LessThanOrEqual, LowerCase, Multiply, NotAlphabetic, NotAlphanumeric, NotDate, NotDecimal, NotEmpty, NotEven, NotFalse, NotNegative, NotNull, NotOdd, NotPositive, NotTrue, Or, Pad, Remainder, RemoveWhiteSpace, Replace, ReplaceFirst, Round, StartsWith, Substr, Subtract, Tag, ToNumber, Trim, UnquoteValue, UpperCase
Not available for SQLite:
MatchesExpression, PowerOfN
Not available for SQLite, Snowflake, Hive:
FormatPattern, Repeat
Not available for SQLite, SQL Server, Snowflake, Hive:
CompactSpaces, FromHexToText, FromTextToHex, RegexReplace, RemoveNoise, RemoveUnprintableCharacters, Soundex, TitleCase
For more advanced users and use cases, it is possible to amend the SQL that is generated. Reasons for doing this might be to trigger a stored procedure or for a (supported) database that uses a non-standard SQL syntax, such as an older version.
To call a stored procedure, you can create a custom Function in Data Studio (that does anything/nothing, maybe just Constant Value) then associate some SQL with that Function.
The Function to SQL mapping requires a file named sql.<dbtype>.user.json
be created, where <dbtype>
is one of: hive, oracle, postgresql, snowflake, sqlite, sqlserver. This file should be saved in the directory C:\ApertureDataStudio\pushdown which may need to be created. This location assumes you are using defaults, so your JDBC drivers directory is located at C:\ApertureDataStudio\drivers.
If this custom file sql.<dbtype>.user.json
exists for a database, the entries within in it will automatically override the default SQL mappings for those Functions.
token is the External label value of a Custom Function or the Native Function name as shown in the Function script editor or Show script window.
sql is the SQL Script to be executed that will be shown in the UI when the user opens Show script. SQL input parameters will pass column data values to the stored procedure, i.e. column named ‘ID’ , or to the Function, i.e. ‘#v1#’ (is Input Value 1, v2, v3, v4 etc.). The number of parameters depends on the Function and if optional checkbox/params are being defined, so ‘MoreThan(InputValue, InvertFalse, IgnoreCaseFalse, ComparisonValue)’.
Clicking Show step results on a Workflow step will execute the full query on the database, but then limit the number of rows returned. This allows you to see the column schema and a few thousand example records without waiting to transfer millions of records. However, if you do wish to see the full result, there is an option you can click above the grid to re-query and transfer all rows.
Data Studio will convert your Workflow into a SQL query which will be sent to the database and the results of the query will be displayed to you in the UI or sent onwards to the next step in your process.
Database administrators might not want additional queries running on the database as they could impact other operations or increase costs.
Top-right user icon > About. Check if ‘Pushdown processing’ is selected and the status is 'Yes'. If it is not shown, then reach out to your Experian contact or to Support.
Reach out to your Experian contact or contact support to discuss.
We are adding support for additional databases based on demand, so reach out to your Experian contact to ensure they are aware of your requirements.
Some Functions don't have a direct translation into SQL. Where possible, Functions are translated to SQL equivalents, but this isn’t always possible.
There are some steps that do not have equivalent database functionality. For example, where a step is referencing a lookup file or address reference data, the database does not have access to this data.
The SQL sent to the database won’t include that step or subsequent steps linked to that step, but will be able to retrieve the results until that point in the Workflow.
The SQL sent to the database won’t include that step with the unsupported function or subsequent steps linked to that step, but will be able to retrieve the results until that point in the Workflow.
Processing time will depend on record volume, Workflow complexity and other environmental factors. However, removing the time spent transferring data and taking advantage of database caching, indexing, and pre-optimization techniques has seen hours reduced to minutes for repeat queries on billions of records returning small volumes of aggregated results.
Generate database credentials that only have access to specific tables and views. Use these credentials in Aperture and select Enable for pushdown.
Credentials can only be used by the users / user groups assigned to them, so create another External system credential with fewer users that has Enable for Pushdown selected.
It is also possible to create a new Space to create your Pushdown processing Workflows and only give selected users / user groups access to the Space.