
I have always been puzzled by tutorials and courses on ‘How to install X’: they do explain how to install X, but in many cases it is all about running X locally and manually, and the authors hardly mention getting X into production.

This article assumes that: (1) you need to extract data from different sources in batches; (2) you have already done your research and selected Talend (in my case, Talend Open Studio) or Pentaho (in my case, Pentaho Data Integration); (3) you are moving your ETL jobs from development or test to production; (4) you use an Ubuntu server to run the production versions of Talend / Pentaho; (5) you have already installed the prerequisites, such as Pentaho and Java, on the server; (6) you have set up all environment variables.

Set up the Cron Job

To schedule ETL jobs as a non-root user, type crontab -e. To edit the root user's crontab instead, use sudo crontab -e.
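If you would rather add an entry from a script than through the interactive editor, one common approach is to pipe a merged crontab back into crontab -. The sketch below is only an illustration under assumptions: merge_cron_entry is a hypothetical helper name, and the schedule and script path in the usage comment are placeholders, not paths from this setup.

```shell
#!/bin/sh
# merge_cron_entry ENTRY
# Reads an existing crontab on stdin and prints it back,
# appending ENTRY at the end unless an identical line is already there.
# (Hypothetical helper for illustration; not part of cron itself.)
merge_cron_entry() {
    entry=$1
    existing=$(cat)

    # Print the existing lines unchanged, if there are any.
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi

    # -F: fixed string, -x: whole-line match, so the * fields are taken literally.
    if ! printf '%s\n' "$existing" | grep -Fxq -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Possible usage (placeholder path, adjust to your own job):
#   crontab -l 2>/dev/null \
#     | merge_cron_entry '30 23 * * * /bin/sh /path/to/your_job_launcher.sh' \
#     | crontab -
```

Because the helper skips an entry that is already present, running the same deployment script twice should not duplicate the job.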

Schedule Pentaho with Crontab

The crontab entry below runs the Pentaho job every day at 23:30:

30 23 * * * /opt/pentaho/data-integration/ -file=/home/ubuntu/repository/project/content_pdi/project_1.kjb
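A crontab line starts with five schedule fields (minute, hour, day of month, month, day of week), followed by the command. As a rough sketch of how such a line breaks down, the hypothetical helper describe_cron_line below splits an entry into its parts; the sample command in the usage comment is a placeholder, not the Pentaho launcher.

```shell
#!/bin/sh
# describe_cron_line LINE
# Splits a crontab entry into its five schedule fields and the command.
# (Hypothetical helper for illustration only.)
describe_cron_line() {
    # read splits on whitespace and does not glob-expand the * fields;
    # the last variable receives the rest of the line (the command).
    printf '%s\n' "$1" | {
        read -r minute hour dom month dow command
        printf 'minute=%s hour=%s day_of_month=%s month=%s day_of_week=%s\n' \
            "$minute" "$hour" "$dom" "$month" "$dow"
        printf 'command=%s\n' "$command"
    }
}

# Possible usage (placeholder command):
#   describe_cron_line '30 23 * * * /bin/echo nightly run'
```

With minute 30, hour 23, and an asterisk in the remaining three fields, the job is scheduled for 23:30 on every day of every month.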

Schedule Talend with Crontab

The crontab entry below runs the Talend job every day at 23:30:

30 23 * * * /bin/sh /home/ubuntu/repository/project/content_talend/
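Cron runs jobs unattended, so nothing is printed to a terminal when a job fails. One option is to wrap the command so that its output and exit code land in a log file. The sketch below is a minimal example under assumptions: run_logged is a hypothetical function name, and the log and launcher paths in the usage comment are placeholders.

```shell
#!/bin/sh
# run_logged LOG_FILE COMMAND [ARGS...]
# Runs COMMAND, appends its stdout and stderr to LOG_FILE between
# timestamped START/END markers, and returns COMMAND's exit code.
# (Hypothetical wrapper for illustration only.)
run_logged() {
    log_file=$1
    shift

    printf '%s START %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$log_file"

    # Run the job itself; capture the exit code before anything else runs.
    "$@" >> "$log_file" 2>&1
    status=$?

    printf '%s END exit=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" >> "$log_file"
    return "$status"
}

# Possible usage inside a wrapper script called from crontab
# (placeholder paths, adjust to your own job):
#   run_logged /path/to/logs/etl.log /bin/sh /path/to/your_job_launcher.sh
```

A non-zero exit= value in the log is then a quick way to spot a failed nightly run.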

Cron normally picks up a crontab saved through crontab -e on its own. If the new schedule does not seem to take effect, restart the cron service:

sudo service cron restart

And you are done! I hope this article helps you get a working prototype or move your ETL jobs into production.

I’m a data consultant specializing in ETL, reporting, BI, dashboarding, and analytics.