Workflow trigger execution is the process of triggering the execution of a workflow when a 'watched' file changes. This allows for the unattended execution of workflows when a new version of a file is copied/uploaded.
This process is driven by a configuration file (written in YAML) which, when uploaded to Data Studio, will cause certain source files to be monitored such that when they change, the specified workflows are executed.
The following sample YAML will be used to illustrate the intended behavior:
---
workflows:
  - name: Data Validation
  - sourceTriggers:
    - source: Customer v1
      location: C:\ApertureDataStudio\sampledata
      filenamePattern: Customer V\d+\.csv
      appendExtension: .tmp.#
      replaceSource: true
The order of the keywords is hierarchical: the file has to start with three dashes (---), followed by the keyword workflows:. Next, indented with two spaces and a dash, is the keyword name:, which is the (case-sensitive) workflow name. Lastly, the indented keyword sourceTriggers: starts a new section used to define various parameters (using sub-keywords) for one or more source triggers.
Multiple source triggers are supported; however, the trigger is the logical OR of the data source changes (not AND), meaning that only one data source file has to change in order to trigger the workflow execution.
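For illustration, a single workflow with two source triggers might look like the following sketch (the workflow and source names here are hypothetical); a change to either Source A or Source B would trigger the execution:
---
workflows:
  - name: Example Workflow
  - sourceTriggers:
    - source: Source A
      location: C:\ApertureDataStudio\sampledata
      filenamePattern: source_a\.csv
    - source: Source B
      location: C:\ApertureDataStudio\sampledata
      filenamePattern: source_b\.csv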
The parameters for the workflows keyword are:
name (required)
The parameters for the sourceTriggers keyword are:
source (required)
The name of the source used in the workflow to be triggered. This should be the name as shown in the Available data sources tab in the Workflow Designer (rather than the name of the file on disk - the file extension and underscore characters shouldn't appear).
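For example (file and source names here are hypothetical), a file uploaded as Customer_v1.csv might be shown in the Available data sources tab as Customer v1, and it is that displayed name which should be used:
- source: Customer v1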
location (required)
This defines both the location of the source file named in the previous parameter and the location in which the watched file will reside. It will be either:
the name of a file data store defined in filedatastores.properties, or
the directory path location on disk where the file is stored.
For example, where my filedatastores.properties file defines a data store as Sample\u0020Data\u0020source=c\:/ApertureDataStudio/sampledata, the location may be specified as Sample Data source or as C:\ApertureDataStudio\sampledata. A file path may also be defined for the user's import directory (e.g. C:\ApertureDataStudio\import\5).
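As a sketch, assuming the filedatastores.properties entry above, both of the following are equivalent ways of specifying the same location:
location: Sample Data source
location: C:\ApertureDataStudio\sampledata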
filenamePattern (required)
Defines the name of the file(s) to be watched. You may use a regular expression that has to match a file used as a source input. Note that on Linux, file names are case sensitive.
For example, Customer V\d+\.csv indicates that any .csv file named Customer V{n} (where {n} is an integer) will be used as a trigger. In this case, all of the following will be valid trigger files: Customer V1.csv, Customer V2.csv, Customer V99.csv.
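As a further sketch (these file names are hypothetical), an exact file name can be expressed by escaping the dot so it is treated literally, while a date-stamped pattern would match names such as orders_20240101.csv:
filenamePattern: customer\.csv
filenamePattern: orders_20[0-9]{6}\.csv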
replaceSource (optional, defaults to false if not present)
This is a Boolean value that determines whether the watched file will be used as the new source for the triggered workflow when the watched file matches the defined filename pattern but has a different name from the one defined in the source keyword. Use replaceSource: true if you want the watched file that triggers the workflow to be used as the source when the workflow is automatically executed.
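A minimal sketch based on the first example above: with the trigger below, a new file such as Customer V2.csv arriving in the watched location would both trigger the workflow and become its new source:
- source: Customer v1
  location: C:\ApertureDataStudio\sampledata
  filenamePattern: Customer V\d+\.csv
  replaceSource: true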
appendExtension (optional)
An optional parameter indicating that if the trigger file has the same name as the original source, the original file will be backed up rather than overwritten. The backed-up file will be written to the \data\backups folder in the Data Studio database and appended with the given extension.
If the extension contains a #, the # will be replaced by a number to ensure the uniqueness of the renamed files.
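For instance (an illustrative sketch using the extension from the first example, .tmp.#), if the original source file is Customer V1.csv, successive backups written to the \data\backups folder might be named along these lines, with # replaced by an incrementing number:
Customer V1.csv.tmp.1
Customer V1.csv.tmp.2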
The following example defines multiple workflows with multiple sources. The first workflow in this configuration file has two source triggers defined, meaning the workflow will be executed if either source is updated. The second workflow's source trigger uses another simple filename pattern, this time detecting arriving files that have been timestamped in the format yyyymmdd.
---
workflows:
  - name: My First Workflow
  - sourceTriggers:
    - source: Customer Data
      location: c:\data\customer
      filenamePattern: customer.csv
    - source: Product Data
      location: c:\data\product
      filenamePattern: product_\d+\.csv
  - name: My Second Workflow
  - sourceTriggers:
    - source: Order Data
      location: c:\data\order
      filenamePattern: product_20[0-9]{6}.csv
      replaceSource: true
You can use filedatastores.properties to define folders on the server that are visible to Data Studio users via the UI. Any location defined in this file can store workflow sources and be used as a location for watched files. Here's an example of a filedatastores.properties file:
Sample\u0020Data\u0020source=c\:/ApertureDataStudio/sampledata
Admin\u0020Data\u0020source=c\:/ApertureDataStudio/import/5
Trainee\u0020User\u0020source=c\:/ApertureDataStudio/import/1058
Trigger\u0020source=d\:/ApertureDataStudio/triggers; flatten=true
You upload the YAML file in the same way as any other data file. The only requirement is that the file has to have the .yaml extension.
When a YAML file is uploaded, the FileUploadHandler will parse the file and report any parse errors to the user. The contents of the YAML file will replace the previous upload.
Therefore, to delete a workflow, remove the workflow name from the YAML file and re-upload it. To add a workflow, add it to the YAML file and re-upload it. To modify a workflow, change the details in the YAML file and re-upload it.
All existing triggers can be removed by uploading a YAML file with no workflows:
---
workflows:
The workflow entries in the YAML file will be checked for:
name matches a known workflow (case-sensitive)
source matches a known source name within the workflow
filenamePattern is a valid regular expression
This file will be loaded at server startup so that if the server is shut down and restarted, all previously watched workflows will be reloaded just as if the user had reloaded the YAML file.
All YAML file uploads and the resulting parse actions are reported back to the user in the UI and in the server's log file. The uploads and all workflow executions are audited as usual.
The administrator may verify that the correct workflow triggers have been loaded by clicking on the username in the top menu and selecting Show Workflow Triggers.
When manually uploading a file with the same name as any of the 'watched' files, you will get an option to either overwrite or create a new version of the file.
To ensure the defined trigger continues to work, you have to overwrite the existing file.
A dialog will appear when the job has completed successfully.
You can set up notifications to report on the state of the triggered workflow.
Because workflows are executed asynchronously, if the user is currently logged in, they will see the Job Completed dialog once a workflow has completed executing.