Work like clockwork

Screenshot from Ubuntu

Tutorials and courses on ‘How to install X’ have always puzzled me: they do cover how to install X, but in many cases they are all about running X locally and by hand, and the authors hardly mention getting X into production. This article assumes that: (1) you need to extract data from different sources in batches; (2) you have already done your research and selected Talend (in my case, Talend Open Studio) or Pentaho (in my case, Pentaho Data Integration); (3) you are moving your ETLs from development or test mode to production; (4) you use an Ubuntu server to run the production versions of Talend / Pentaho; (5) you have already installed the prerequisites, such as Pentaho and Java, on the server; (6) you have set up all environment variables.

Set up the Cron Job

To run the ETL jobs on a schedule, I used a crontab, a file that lists the commands cron should run and the times to run them. Each entry follows a fixed format: five time fields (minute, hour, day of month, month, day of week) followed by the command, separated by spaces and/or tabs. For example, I wanted to run my ETL script every day at 11:30 pm, so the cron statement was: 30 23 * * * command.
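To make the field order concrete, here is a small sketch that splits an entry into its five schedule fields and the command. The function name describe_cron is my own, not part of cron; it only handles plain entries like the one above.

```shell
#!/bin/sh
# Split a crontab entry into its five schedule fields and the command.
# describe_cron is a hypothetical helper, not a cron tool.
describe_cron() {
    # Turn off globbing so the literal * fields are not expanded to file names.
    set -f
    set -- $1
    set +f
    if [ "$#" -lt 6 ]; then
        echo "expected five time fields and a command" >&2
        return 1
    fi
    minute=$1 hour=$2 dom=$3 month=$4 dow=$5
    shift 5
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s command=%s\n' \
        "$minute" "$hour" "$dom" "$month" "$dow" "$*"
}

describe_cron '30 23 * * * command'
```

For the example entry, minute is 30 and hour is 23, and the three asterisks mean "every day of the month, every month, every day of the week".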

To schedule ETL jobs as your own (non-root) user, type crontab -e. To schedule them as root, use sudo crontab -e.
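If you would rather not open an editor (for example, when setting up the server from a script), a crontab can also be built from text. The sketch below is one possible approach, not the only one: add_cron_entry is a hypothetical helper that reads an existing crontab on standard input and prints it with one entry appended, unless that exact line is already there.

```shell
#!/bin/sh
# Read an existing crontab on stdin and print it with one entry appended,
# unless an identical line is already present.
# add_cron_entry is a hypothetical helper name.
add_cron_entry() {
    entry=$1
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -F: fixed string (the * fields are not a regex); -x: match the whole line.
    if ! printf '%s\n' "$existing" | grep -Fxq -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}
```

On Ubuntu, the result could then be installed with: crontab -l 2>/dev/null | add_cron_entry '30 23 * * * command' | crontab -. Running it twice leaves a single copy of the entry.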

Schedule Pentaho with Crontab

In my case, Pentaho was installed in the /opt directory, at /opt/pentaho/data-integration, while the Pentaho jobs and transformations were located in the home directory, at /home/ubuntu/repository/project/content_pdi.

Thus, the crontab looked like:

30 23 * * * /opt/pentaho/data-integration/kitchen.sh -file=/home/ubuntu/repository/project/content_pdi/project_1.kjb
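Cron runs the job unattended, so it helps to check the paths before scheduling and to send the output somewhere you can read it later. The following is a sketch under my own assumptions: kitchen_cron_entry is a hypothetical helper, and the log file location is whatever you pass in. It checks that kitchen.sh is executable and the job file is readable, then prints an entry that appends both standard output and errors to the log.

```shell
#!/bin/sh
# Print a crontab entry for a Pentaho job after checking that the paths exist.
# kitchen_cron_entry is a hypothetical helper; the log path is your choice.
kitchen_cron_entry() {
    schedule=$1 kitchen=$2 job_file=$3 log_file=$4
    [ -x "$kitchen" ] || { echo "not executable: $kitchen" >&2; return 1; }
    [ -r "$job_file" ] || { echo "cannot read job file: $job_file" >&2; return 1; }
    # '>> file 2>&1' appends both stdout and stderr of the job to the log.
    printf '%s %s -file=%s >> %s 2>&1\n' "$schedule" "$kitchen" "$job_file" "$log_file"
}
```

Called with '30 23 * * *', the kitchen.sh path and the project_1.kjb path from above, plus a log file of your choice, it prints the same entry as before with the redirection added at the end.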

Schedule Talend with Crontab

When building my Talend job, I extracted the executable files into the folder /home/ubuntu/repository/project/content_talend.

Thus, the crontab looked like:

30 23 * * * /bin/sh /home/ubuntu/repository/project/content_talend/project_1_run.sh
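In production you usually want to know when a job started, what it printed, and whether it failed. One way to get that is a small wrapper around the job script. This is a sketch, not something Talend ships: run_logged is a hypothetical function, and the log file is whatever path you give it.

```shell
#!/bin/sh
# Run a command, append its output to a log file between START/END markers,
# and return the command's exit status.
# run_logged is a hypothetical helper name.
run_logged() {
    log_file=$1
    shift
    printf '%s START %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$log_file"
    # Capture both stdout and stderr of the job in the log.
    "$@" >> "$log_file" 2>&1
    status=$?
    printf '%s END status=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" >> "$log_file"
    return "$status"
}
```

Saved in a script of its own, the crontab entry would then call that script, which in turn runs something like run_logged with a log file and /bin/sh /home/ubuntu/repository/project/content_talend/project_1_run.sh. A non-zero status on the END line tells you the job failed.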

Cron normally picks up a crontab saved with crontab -e on its own. If the changes do not seem to apply, restart the cron service using:

sudo service cron restart
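After the first scheduled time has passed, it is worth checking that cron actually started the job. On Ubuntu, cron normally writes a line to /var/log/syslog each time it runs a command (your server may be configured differently, and reading the file may require sudo). The sketch below uses a hypothetical helper, cron_runs_for, that prints the cron log lines mentioning a given command.

```shell
#!/bin/sh
# Print the cron log lines that mention a given command.
# $1: log file (on Ubuntu, usually /var/log/syslog; an assumption for your server)
# $2: text to look for, such as the job's script name
# cron_runs_for is a hypothetical helper name.
cron_runs_for() {
    grep -F 'CRON[' "$1" | grep -F -- "$2"
}
```

For example, cron_runs_for /var/log/syslog project_1_run.sh should list one line per run of the Talend job; no output means cron has not started it yet.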

And you are done! I hope this article helps you get a working prototype or configure your ETL for production.
