New Article

Talend Certification Tips - Tip #002 : Know your tMap Joins

Hi Talend Addicts,

Ready to learn more about the Talend great ETL tool!

This post is part of the new series discussing the most important things (Tips) to know for anybody who want to tackle the Talend Open Studio for Data Integration Certification Exam.

Today's Tip is: Tip #002: Know your tMap Joins!

tMap is one, if not THE, most important component in a Talend Job. Personally I have never seen any Talend Job without at least one occurence of this component, and believe it or not many questions around this component on the certification exam!

the tMap can be used to (obviously) map data from input to output, but even more.
  • Map data - already said that, but very important, so I say it again!
  • Filter input data
  • Filter output data
  • Transform data
  • Join data
In today's post I'll focus on the last but not least one - JOIN Data

So let's kick this off with some examples; and see what are the different types of joins you can build with Talend tMap component.

Setting up the Stage

We will use here a very simple example. Two tables:
  • Companies table: to store companies:
    • Id of the company (Primary Key)
    • Name of the company
  • Employees table: to store employees:
    • Id of the employee (Primary Key)
    • Name of the employee
    • CompanyId: Id of the company (Foreign Key)
And the data will look like following:


As you can see, example shows all possible use cases:
  • A company with only one employee
  • A company with more than one employee
  • A company with no employees
  • And an employee not related to any company.
Join #1: Cartesian Product
  • Result : 
    • Get all possible combinations
  • How:
    • Company data is set as the Main input
    • Employee data is set as the lookup input
    • tMap link: no link between inputs
    • Match Model: N/A
    • Join Model: N/A




Join #2: Left Outer Join - First Match
  • Result : 
    • Get all companies with or without employees
    • The first matched employee is returned
  • How
    • Company data is set as the Main input
    • Employee data is set as the lookup input
    • tMap link: company.Id = employee.companyId
    • Match Model: First Match
    • Join Model: Left Outer Join



Join #3: Left Outer Join - All Matches
  • Result : 
    • Get all companies with or without employees
    • All the matched employees are returned
  • How:
    • Company data is set as the Main input
    • Employee data is set as the lookup input
    • tMap link: company.Id = employee.companyId
    • Match Model: All Matches
    • Join Model: Left Outer Join



Join #4: Inner Join- First Match
  • Result : 
    • Get only companies with employees
    • The first matched employee is returned
  • How:
    • Company data is set as the Main input
    • Employee data is set as the lookup input
    • tMap link: company.Id = employee.companyId
    • Match Model: Unique Match
    • Join Model: Inner Join

Join #5: Inner Join- All Matches
  • Result : 
    • Get only companies with employees
    • All the matched employees are returned
  • How:
    • Company data is set as the Main input
    • Employee data is set as the lookup input
    • tMap link: company.Id = employee.companyId
    • Match Model: All Matches
    • Join Model: Inner Join



Conclusion
Hopefully you got the idea, and more especially you got how the tMap can be so powerful and therefore so important while building any Talend Job and during the certification exam.

Comments