Tuesday, February 26, 2013

Exporting a job that can be run from the command line

This tutorial shows how to use the open source ETL tool Talend Open Studio to create a job that takes three input parameters and generates a result that depends on those three inputs.


1. Introduction

Before I start this tutorial I want to thank Ian Mayo for sponsoring this article in support of PlanetMayo Open Source projects.

This document presents the steps for creating a Talend job that builds a CSV file from another CSV file by copying all the rows and adding an extra column called "tag" that contains a constant value. The job takes three parameters :
- Input file : the path of the input file
- Output file : the path of the output file
- attribute : the value of the "tag" column.
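
To make the goal concrete before opening Talend, here is a minimal Python sketch of what the finished job is expected to do. It is not what Talend generates (Talend produces Java); the function name add_tag_column is hypothetical, and the ";" field separator is an assumption that you should adjust to match your own file.

```python
import csv


def add_tag_column(input_file, output_file, attribute, delimiter=";"):
    """Copy every row of input_file to output_file, appending a constant "tag" column.

    Returns the number of data rows written (the header is not counted).
    The ";" default delimiter is an assumption, not something the job requires.
    """
    with open(input_file, newline="", encoding="utf-8") as src, \
         open(output_file, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter)

        header = next(reader, None)
        if header is None:
            return 0  # empty input: nothing to copy

        # The output header is the input header plus the extra column.
        writer.writerow(header + ["tag"])

        count = 0
        for row in reader:
            # Every data row receives the same constant value.
            writer.writerow(row + [attribute])
            count += 1
        return count
```

The three arguments correspond to the three job parameters listed above: the two file paths and the constant written into the "tag" column.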

To create this job we will follow these steps :
- Creating input and output files,
- Creating the parameters,
- Creating the Talend job.

2. Input and Output files :

2.1 Input file:


To create the input file go to "metadata" --> "File delimited". 






Right-click on it and choose "create file delimited". In the window that appears, enter a name for the file.

















Click "Next" and browse to the input file by clicking on "browse".

















Choose the file and then click on "Open".

















Then click "Next".

















Check "Set heading rows as column names" so that Talend uses the first row as column names and does not read it as data. Click "Next".

















Choose a name for the generated schema, then click "Finish".















 

 

2.2 Output file:

As mentioned earlier, the output file is the same as the input file except that it has an extra column called "tag". The simplest approach is therefore to duplicate the input file metadata and add the extra column to its schema.

To do that, right-click on the input file created earlier, choose "Duplicate", and name the copy "issue_1_output".











Right-click on the created output file and choose "Edit file delimited". A window will appear.

















Click "Next" without modifying anything. On the following step, change the path of the file so that it points to the output file, then click "Finish".

















Next, expand the output file metadata, right-click on its schema, and choose "Edit schema".
Add a third column called "tag" to the schema as shown below, then press "Finish".














 

 

3. The parameters :

To create the parameters, we are going to create what Talend calls a "Context group".
To do that, in the "Repository view", right-click on "Contexts" and choose "Create context group". In the window that appears, type a name for the context, "issue_1_context", then click "Next".















Add three parameters as shown below. Then go to the "Values as table" tab and enter default values for those parameters.
















The first and second parameters are of type "File". To set their values, browse your file system and choose the files. Enter a default value for the third parameter, for example "TOTO" as below, then press "Finish".
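
A context group can be thought of as a small record of named values with defaults that can be overridden for a given run. The Python sketch below illustrates that idea only; the class name Issue1Context and the relative default paths are hypothetical, while the three field names match the context parameters used later in this tutorial.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Issue1Context:
    """Illustrative stand-in for the "issue_1_context" context group."""
    input_file: str = "issue_1_input.csv"    # hypothetical default path
    output_file: str = "issue_1_output.csv"  # hypothetical default path
    attribute: str = "TOTO"                  # default value of the "tag" column


# The defaults play the role of the values entered in "Values as table".
default_context = Issue1Context()

# A run may override any subset of the parameters; the rest keep their defaults.
run_context = replace(default_context, attribute="TITI")
```

The defaults are used when the job is run as is, and any of them can be replaced at run time, which is what the exported job does at the end of this tutorial.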












 

 

4. The job :

Now let's create our job.
On the "Repository view" right click on "Job Designs" and choose "Create job".
Give a name to this job and press "Finish". The designer window will appear.





















Now, from the "Repository view", drag and drop the three objects created before : the input file as a "tFileInputDelimited", the output file as a "tFileOutputDelimited", and the context.
From the "Palette view", drag and drop a "tMap" component. Then link the three components in the designer as shown below.









Double click on the "tMap" to open the mapping table.
Drag and drop the columns "first_name" and "last_name" from the left table to the right table like below.
For the "tag" column we will insert the value of the context parameter. To do that, type "context.attribute" as its expression, as shown below. Then close the mapping table by pressing "Ok".
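
The mapping configured in the "tMap" amounts to a per-row transformation. As a rough Python illustration (not the Java code Talend generates), the function below copies the two input columns and fills "tag" from the context. The name map_row and the sample row are hypothetical.

```python
from types import SimpleNamespace


def map_row(row, context):
    """Mimic the tMap: pass two columns through and fill "tag" from the context."""
    return {
        "first_name": row["first_name"],
        "last_name": row["last_name"],
        "tag": context.attribute,  # same expression as typed in the tMap
    }


# Hypothetical sample data and context value, for illustration only.
context = SimpleNamespace(attribute="TOTO")
mapped = map_row({"first_name": "John", "last_name": "Doe"}, context)
```

Every output row gets the same "tag" value because the expression reads the context and not the input row.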













The final thing to do is to open the "issue_1_output" component properties and check "include header" so that the header row is written to the output file, as shown below.










You can now run your job and check the output file.










 

 

5. Making the job dynamic :

Now that our job works, we are going to make the paths to the input and output files dynamic.
This is very simple : we replace the file paths with the values of the context parameters.
To do that, go to the properties of the input and output file components and click on the "File name/stream" field. In the window that appears, choose "Change to built-in property" and press "Ok", as shown below.


Then, in the "File name/stream" field, type context.input_file for the input file and context.output_file for the output file. Enter them as expressions, without surrounding quotation marks.









You can run the job again. The result is the same, but this time the paths of the files are taken from the context parameters and not from the metadata.

6. Exporting the job :

Now we are going to export the job as a batch file and see how to run it with parameters outside Talend.
The first thing to do is to close the job designer. Then, in the "Repository view", right-click on our job "Issue_1_job" and choose "Export job". In the window that appears, choose the destination of the generated export and click "Finish" without modifying anything else.

















Go to where you placed your export, unzip it, and look for the file "Issue_1_job_run.bat". If you double-click this batch file, the job runs.

Now we are going to modify this batch file so that it passes our parameters to the job dynamically. To do that, open the batch file in any text editor and modify it as shown below :


Then to run our job all we have to do is to call it from a command line like this :

run.bat "C:\Users\ELHASSMU\Desktop\issue 1\issue_1_input.csv" "C:\Users\ELHASSMU\Desktop\issue 1\issue_1_output.csv" "TITI"

Here the first parameter, "C:\Users\ELHASSMU\Desktop\issue 1\issue_1_input.csv", is the path of the input file; the second parameter, "C:\Users\ELHASSMU\Desktop\issue 1\issue_1_output.csv", is the path of the output file; and the final parameter, "TITI", is the value of the "tag" column. The paths are quoted because they contain a space.
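
How the three positional arguments reach the job depends on the edited batch file. One plausible mapping, assuming the launcher forwards them through Talend's "--context_param name=value" option, can be sketched in Python as follows. The function name build_job_command is hypothetical, and the sketch only builds the argument list; it does not run anything.

```python
def build_job_command(launcher, input_file, output_file, attribute):
    """Build an argument list that overrides the three context parameters.

    Assumption: the exported launcher accepts "--context_param name=value"
    pairs and passes them on to the job.
    """
    return [
        launcher,
        "--context_param", f"input_file={input_file}",
        "--context_param", f"output_file={output_file}",
        "--context_param", f"attribute={attribute}",
    ]


command = build_job_command(
    "Issue_1_job_run.bat",
    r"C:\Users\ELHASSMU\Desktop\issue 1\issue_1_input.csv",
    r"C:\Users\ELHASSMU\Desktop\issue 1\issue_1_output.csv",
    "TITI",
)
```

Keeping each value as its own list element avoids quoting problems with the space in "issue 1" if the list is later handed to something like subprocess.run.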

You can run the job again using the modified batch file and look at the new output file :













You can see that the "tag" column now contains the value "TITI", which means that the job now takes its values from the command line and not from the context defaults.
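
If you prefer to check the result without opening the file, a few lines of Python can list the distinct values of the "tag" column. This is only a convenience sketch: the function name tag_values is hypothetical and the ";" separator is an assumption to adapt to your file.

```python
import csv


def tag_values(output_file, delimiter=";"):
    """Return the set of distinct values found in the "tag" column."""
    with open(output_file, newline="", encoding="utf-8") as f:
        # DictReader uses the first row (the header) as column names.
        return {row["tag"] for row in csv.DictReader(f, delimiter=delimiter)}
```

After the run above, calling tag_values on the output file should return a set containing only "TITI".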

7. Conclusion

That's all. I hope this tutorial was clear and helps you improve your use of Talend Open Studio.
Do not hesitate to leave any feedback, suggestions or criticism.
