Talend Open Studio vs Pentaho Kettle, a comparison

Talend Open Studio (TOS) and Pentaho Data Integration (Kettle) are two comprehensive and widely used Open Source ETL tools. ETL (Extract, Transform, Load) is the process in which you extract data from different sources (database, file, web, etc.), apply some transformations (value replacements, computations, etc.) and finally load the data into a destination (database, file, web, etc.). ETL is a key process in Business Intelligence and data warehouse management.

Furthermore, ETL tools are very useful in common data management tasks, when you need to move or process large amounts of data.
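
To make the three ETL stages concrete, here is a minimal sketch in plain Python. It is not how Kettle or Talend work internally; the CSV content, the `sales` table and the function names are all assumptions made up for this illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real job would read a file, a database or a web service.
SOURCE_CSV = """name,country,amount
Alice,it,10.50
Bob,IT,4.25
Carla,fr,7.00
"""

def extract(text):
    """Extract: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise the country code and convert the amount to a number."""
    return [(r["name"], r["country"].upper(), float(r["amount"])) for r in rows]

def load(rows, conn):
    """Load: write the transformed rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(conn):
    """Run the three stages and return the total amount per country."""
    load(transform(extract(SOURCE_CSV)), conn)
    return conn.execute(
        "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
    ).fetchall()

if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

An ETL tool gives you the same three stages as graphical components, plus connectors, logging and error handling that a hand-written script would have to provide itself.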

 

Pentaho Kettle and Talend Open Studio

Pentaho Data Integration (PDI, also known as Kettle) is the default ETL tool in Pentaho's ecosystem. With a very intuitive graphical editor (Spoon) you can define procedures that are stored in XML format, either as files or in a database repository. Procedures can be run by the Kettle runtime in different ways: with the command line utilities (Pan for transformations, Kitchen for jobs), through a small server (Carte), or directly from the IDE (Spoon).

Talend Open Studio is the ETL tool developed by Talend, a company focused on data integration and data management solutions. Talend uses a user-friendly and comprehensive IDE (similar to Pentaho Kettle's) to design the procedures. Procedures can be tested inside the IDE and translated into Java (or Perl) code. You can always view and edit the generated Java (or Perl) code, which gives you great control and flexibility.

Both are very good, user-friendly and cross-platform (Java-based) tools. The main difference is that Kettle is an interpreter of ETL procedures in XML format, while Talend Open Studio is a code generator (Java or Perl).
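
The interpreter versus code generator distinction can be illustrated with a toy example in Python. The XML layout below is invented for this sketch and is not the real Kettle or Talend file format; the function names are hypothetical as well.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical procedure definition; NOT the real Kettle or Talend format.
PROCEDURE_XML = """
<procedure>
  <step op="multiply" value="2"/>
  <step op="add" value="3"/>
</procedure>
"""

# Each operation has a source-code symbol (for generation) and a function (for interpretation).
OPS = {"multiply": ("*", operator.mul), "add": ("+", operator.add)}

def parse_steps(xml_text):
    """Read the (op, value) pairs from the XML definition."""
    root = ET.fromstring(xml_text)
    return [(s.get("op"), float(s.get("value"))) for s in root.findall("step")]

def interpret(steps, rows):
    """Interpreter style: walk the step list again for every row."""
    out = []
    for x in rows:
        for op, value in steps:
            x = OPS[op][1](x, value)
        out.append(x)
    return out

def generate_source(steps):
    """Code generator style: emit source code once; it can be read and edited."""
    expr = "x"
    for op, value in steps:
        expr = f"({expr} {OPS[op][0]} {value!r})"
    return f"def process(x):\n    return {expr}\n"

def compile_generated(source):
    """Compile the generated source and return the resulting function."""
    namespace = {}
    exec(source, namespace)
    return namespace["process"]

if __name__ == "__main__":
    steps = parse_steps(PROCEDURE_XML)
    print(interpret(steps, [1.0, 2.0]))
    print(generate_source(steps))
```

Both paths produce the same results. The difference is that the interpreter needs the runtime and the definition at execution time, while the generated code is a standalone artifact that no longer needs the definition.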

 

Learning curve, ease of use, documentation

Both Pentaho Kettle and Talend Open Studio are user-friendly tools. With a good background in data and database management (JDBC, SQL, file formats, programming basics, etc.), becoming productive is a matter of days.

Kettle and Talend both come with a graphical tool that makes things quick and easy to do. These tools help you design and test reliable ETL procedures quickly.

[Image: Pentaho Data Integration (Kettle)]

[Image: Talend Open Studio]

 

The Pentaho Kettle IDE is somewhat simpler to learn but slightly less feature-rich than Talend's. Talend Open Studio is a little more difficult to understand, but once you get familiar with the IDE you can enjoy the great flexibility and power of this tool.

One of Talend Open Studio's requirements is that you define the correct schema of the data you are going to process, and the IDE helps you a lot with this. Kettle has fewer constraints here, so you can build procedures a little more quickly. Still, metadata definition in Talend is an important feature because it helps make procedures more reliable in production environments.
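
As a rough illustration of why an explicit schema makes procedures more reliable, the Python sketch below converts raw string values to declared types and fails early on bad data. The `SCHEMA` mapping and the `apply_schema` function are hypothetical and do not reflect how Talend represents metadata.

```python
# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"id": int, "name": str, "amount": float}

def apply_schema(row, schema):
    """Convert a raw row (all strings) to typed values, failing early on bad data."""
    missing = set(schema) - set(row)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    typed = {}
    for column, expected_type in schema.items():
        try:
            typed[column] = expected_type(row[column])
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"column {column!r}: cannot convert {row[column]!r}"
            ) from exc
    return typed

if __name__ == "__main__":
    print(apply_schema({"id": "7", "name": "Alice", "amount": "10.5"}, SCHEMA))
```

Without such a check, a malformed value would travel through the whole procedure and surface only at load time, or not at all.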
 

Talend Open Studio and Pentaho Kettle are both user-friendly and well documented, and both have strong community support. Talend Open Studio takes a little more effort to get familiar with, but once you get started you can enjoy the great power and potential of the tool.



Reliability, maturity, support

In Open Source, enterprise support, the developer community and real-world implementations matter. Talend and Pentaho are strong, well-known companies with strong community support. Open Source Business Intelligence is growing, and real-world applications are becoming more and more common.

In this scenario Talend and Pentaho offer some of the most widely used Open Source ETL tools. They are growing in real-world and mission-critical implementations, despite the competition from commercial tools.

Talend as a company is more focused on data integration and data management solutions, while Pentaho is focused on Business Intelligence. Talend Open Studio is very actively developed by Talend, which is investing in a rich data integration ecosystem, while Kettle is an important project for Pentaho but a little less developed and extended than Talend's tool.

Pentaho's and Talend's solutions are very reliable, mature and fast-growing products. Real-world enterprise implementations are becoming common in both cases. Support services are available via subscription, and books and direct consulting services are available as well.

Components, technology and features

Talend Open Studio is an Eclipse-based Java tool. The procedures you design in the graphical editor can be compiled to Java bytecode or translated into Perl scripts. In the case of Java, you can take advantage of the whole Java ecosystem with ease.

Components and features are very comprehensive, mixing general-purpose tools and specific ones. Talend has a specific set of RDBMS components alongside the generic ones, so you can quickly take control of the advanced features of a specific DB vendor. You can store definitions in repositories, which helps as projects grow.

Pentaho Data Integration (Kettle) is developed in Java (the Spoon editor uses SWT). Kettle is an interpreter of procedures written in XML format. Its features and components are a little less comprehensive than Talend's, but you can find everything you need to build complex ETL procedures. Kettle provides a JavaScript engine (as well as a Java one) to take control of data manipulation in depth.

Talend is more feature-rich and has a more flexible technology than Kettle. With Talend you can use the full Java (or Perl) ecosystem with ease and use vendor-specific DB functions instead of generic ones. Kettle is a good tool as well, with more ETL-focused features.

Performance benchmarks

As a code generator, Talend Open Studio translates procedures into compact and fast Java (or Perl) code.

Kettle is an interpreter of ETL procedures written in XML format. Kettle provides a JavaScript engine to take control of data processing in depth. A Java engine is also provided, but the JavaScript one is more common. The interpreted nature of Kettle sometimes makes it slower than Talend in some procedures.

You can find some interesting benchmarks on Kettle and Open Studio at the following links:

In the Open Source version of Kettle it is easy to deploy procedures in clustered environments. These (and more) features are also available in the subscription version of Talend Open Studio.
 

Talend procedures, when compiled to Java bytecode, are compact and fast. Pentaho Kettle procedures are interpreted and in some situations can be slower than Talend's. This depends on which components you use and how you use them. Talend's compiled code and strong metadata enforcement help you be efficient in common scenarios, while in Kettle performance depends more on the components you choose.

Deployment and integration in BI platforms

Talend Open Studio (TOS) is a generic ETL tool well supported by the SpagoBI and JasperServer BI platforms. You can compile procedures into small Java packages or Perl scripts. This makes procedures really easy to deploy and run outside a BI platform.

Kettle (PDI) is the default tool in the Pentaho Business Intelligence Suite and is tightly integrated in its ecosystem. When you need to run Kettle procedures outside the Pentaho platform, you need to install the full Kettle environment or some of its core libraries.

With Kettle it is easy to deploy procedures in clustered environments and store them in database tables. In Talend you get these and more features with the subscription version. Like Talend, Kettle has a more feature-rich subscription version.

Talend is very easy to deploy as a standalone Java or Perl application, and is the default tool in the SpagoBI and JasperServer platforms. Kettle is the default tool for the Pentaho Business Intelligence Suite and, when used standalone, requires the Kettle core environment to run procedures.

Conclusions

Both products are well-known, reliable and user-friendly Open Source tools. A commercial version with additional support and features is also available for each.

Kettle is the default ETL tool for the Pentaho Business Intelligence Suite. It is easy to learn and very common because of its integration in Pentaho. Kettle has a simple environment focused on ETL procedure development.

Talend is a more general-purpose and comprehensive tool, used by default in JasperServer and SpagoBI. You can deploy procedures in Pentaho or in standalone applications with little effort if you need to.

The main difference is that Kettle is an interpreter, while Talend is a code generator. Kettle is Pentaho's tool focused on ETL procedures, while Talend Open Studio is part of a wider ecosystem of data integration and data management solutions. In this scenario Talend products are more actively developed and more platform independent.

Pentaho Kettle is very easy to use and is a good solution in Pentaho environments. If you need a tool that can help you with general-purpose data integration tasks, Talend is an excellent choice, very flexible and powerful.

You can download Talend Open Studio at http://www.talend.com/download.php?src=HomePage, and Pentaho Data Integration (Kettle) at http://sourceforge.net/projects/pentaho/files/. Commercial versions and services are available at Talend's and Pentaho's web sites.

 

