If you are a developer or a manager who is looking for a centralized version control system for Talend ETL jobs which can be achieved using Microsoft TFS, you have clicked on the right blog!
I have penned this blog with the intention of outlining general guidelines and rules surrounding development standards specific to the Talend Open Studio integration with Microsoft TFS.
Although these standards should be adhered to whenever possible, they are subject to interpretation, and specific situations may require deviation from some or all of these rules. They should be viewed as a general guide when making changes or additions to the data warehousing physical data model.
What are Talend Open Studio and TFS all about, you ask? Allow me to explain.
What is Talend Open Studio?
Talend Open Studio is an open source ETL tool used for data integration. It offers hundreds of components, along with built-in connectors for multiple data sources such as RDBMS, Excel, and SaaS applications. A user drags and drops these components and connects them to create ETL jobs. The main advantage of Open Studio is that it is free of cost.
What is TFS?
Microsoft TFS stands for Team Foundation Server, which helps teams manage their work and their code. TFS combines version control, issue tracking, and application lifecycle management. Version control plays a vital role in every development project: it provides a history to fall back on, improves visibility, helps teammates collaborate, and accelerates product delivery.
With the integration described here, developers can maintain version control for Talend Open Studio ETL jobs and work in parallel through TFS.
Before I take you through the steps, let me elaborate on the ‘Why’ and the ‘How’ of Nitor Infotech’s methodology of achievement.
Nitor Infotech’s Methodology of Achievement
1] Why? –
A question that is often asked while choosing to take a new step is “Why?”
Why use TFS when SVN and GIT are already available for Talend version control?
Answer: It is helpful for developers or managers who are looking for a centralized version control system for Talend ETL jobs, which can be achieved using TFS.
Also, if Visual Studio is already used for project development, TFS2010 comes free with the MSDN subscription download releases. The same tool can then maintain versions of the ETL jobs as well, keeping everything under one roof instead of requiring different tools for one project.
2] How? –
1. Fundamental understanding: It almost goes without saying, but understanding how something works in the first place is crucial. The key problem was to find a solution that had not been implemented before, so the process of identifying one was based entirely on first principles. During this process, we determined the essential objects that could be useful for running Talend jobs in combination with TFS.
2. Logical Thought Processing: After you've figured out the fundamentals, all that's left is to figure out how to put them into action. We started with the project config file to test TFS versioning, and eventually found and added all the pertinent key Metaconfig files that were stored in various folder structures.
3. Validation & Verification: The evidence gathered through real-time testing can help to reduce the risk of procuring new technologies or other innovations. Following the development, several scenarios were tested and validated across all stakeholders to assure stability and accuracy.
Steps to follow for Talend Open Studio Integration with TFS
Here are the steps you should follow to achieve Talend Open Studio integration with TFS. Let’s begin!
1. Mapping TFS with Local environment
a. Open Visual Studio -> Continue without code.
b. Connect to the TFS server
c. Initially the local path will not be mapped.
d. Map this local path to the TFS path to get periodic changes.
e. Alternative: One can create a new workspace and map the source path to the local path separately.
2. Open the Talend application.
3. Click on ‘Manage connection’.
4. Map the root directory, i.e., the workspace path, to the local folder that corresponds to the TFS root level of the Talend structure.
This location will be the root level for new project creation as well as the importing of all existing Talend projects from TFS.
Note: This is a one-time procedure. Talend may ask to restart the application. Click OK.
5. Once the workspace is mapped to the desired path, we can create a new project or click on ‘Import project’ to get an existing project. Get the latest version from TFS before importing the existing project.
6. After clicking ‘Import project’, browse and select the root directory of the existing project. Enter a project name that’s identical to the existing one and click on ‘Finish’.
7. If the project is already imported, it will not allow you to import, and you will see a pop-up message – “This project name already exists”.
8. On reopening Talend, select an existing project. Click on ‘Finish’.
This project is imported to the workspace we mapped with Talend. All jobs inside the existing project will be in the process folder.
Note: By default, TFS keeps all files and folders read-only. So, while opening Talend jobs, an error may occur, such as “An error occurred (File /ETL/process/etl1_0.1.properties is read-only.)”
To overcome this issue, the developer must check out the project folder from TFS before opening the jobs.
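Before opening Talend, it can be handy to see which files in the local project folder are still read-only, since those are the ones that will trigger the error above. The following is only a minimal sketch, not part of the original procedure: the function name `find_read_only_files` is hypothetical, and it simply inspects the write-permission bit of each file on disk. It does not talk to TFS at all.

```python
import stat
from pathlib import Path


def find_read_only_files(project_root):
    """Return the files under project_root whose write-permission bit is cleared.

    Hypothetical helper for illustration: it only reads file attributes on the
    local disk and knows nothing about the TFS check-out state itself.
    """
    read_only = []
    for path in Path(project_root).rglob("*"):
        # S_IWRITE is cleared for files marked read-only.
        if path.is_file() and not (path.stat().st_mode & stat.S_IWRITE):
            read_only.append(path)
    return sorted(read_only)


# Example call (the path is an assumption; use your own mapped workspace):
# for f in find_read_only_files(r"C:\TalendWorkspace\ETL"):
#     print(f)
```

If the list is not empty after a check out, the check out probably did not cover the whole project folder.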
9. A developer can work on respective jobs once the check out is done.
10. Once new jobs are created locally or existing jobs are edited, they need to be pushed to the TFS server. To do this, check in the pending changes of the project folder.
When checking in, we check in files only from the following folders, because the job can run from the initially set up files in all the other folders.
(Note: The folder structure identified so far is the minimum required for check-in. Other folders may need to be included depending on requirements and on the jobs being created.)
11. One can also check in a newly created folder structure inside these folders by right-clicking on the TFS folder structure and selecting the option ‘Add Items to the Folder…’. Then select only the folders to be checked in to source control.
12. Click ‘Next’ and include or exclude files as per requirement. Then click on ‘Finish’.
13. A best practice for avoiding merge conflicts and loss of other developers' work is to check in only the files of the job you are working on.
(For instance, if a user only makes changes to job ETL2, they only need to check in the files pertaining to job ETL2, which might exist in the aforementioned folders. If no changes exist in the context and routes folders, the user only needs to check in the following files in the process folder to the server.)
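To make the "only the files of your own job" rule easier to follow, the relevant files can be listed before opening the check-in dialog. The sketch below rests on assumptions: it supposes that job files are named `<job>_<version>.<extension>` (as the file name in the read-only error message earlier suggests), and it looks only in the process, context and routes folders named in this post. The function name `files_for_job` is hypothetical.

```python
from pathlib import Path

# Folders mentioned in this post; an assumption, adjust to your project layout.
JOB_FOLDERS = ("process", "context", "routes")


def files_for_job(project_root, job_name, folders=JOB_FOLDERS):
    """Collect files whose names start with '<job_name>_' inside the given folders.

    Assumes the '<job>_<version>.<ext>' naming pattern. A job whose name is a
    prefix of another job's name (ETL2 and ETL2_copy, say) would also match,
    so review the result before checking in.
    """
    root = Path(project_root)
    matches = []
    for folder in folders:
        base = root / folder
        if not base.is_dir():
            continue  # folder not present in this project
        for path in base.rglob(f"{job_name}_*"):
            if path.is_file():
                matches.append(path)
    return sorted(matches)


# Example call (hypothetical path and job name):
# for f in files_for_job(r"C:\TalendWorkspace\ETL", "ETL2"):
#     print(f)
```

The printed list is only a checklist for the manual check-in; it does not check anything in by itself.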
Important note:
Two users can access, edit, and check in two separate jobs under a project simultaneously, and version control handles the differences. However, if two users attempt to check in the same job files after editing, a conflict error is thrown. The user who checks in last must then decide whether to keep their local version, keep the server version, or manually merge the local and server files. As these include important files generated during job creation, there is a high risk of work loss for either developer.
So, it is mandatory for each developer to work on independent Talend jobs.
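Since each developer checks in only their own job's files, some teams may prefer to script the get, check-out and check-in actions instead of clicking through Visual Studio. The sketch below is an assumption-laden illustration, not the procedure used in this post: it assumes the `tf` command-line client that ships with Visual Studio is on the PATH and is run from inside a mapped workspace. The helper names `tf_command` and `run_tf` are hypothetical, and by default nothing is executed; the command is only printed.

```python
import subprocess


def tf_command(action, item_path, comment=None, recursive=False):
    """Build the argument list for 'tf get', 'tf checkout' or 'tf checkin'."""
    if action not in ("get", "checkout", "checkin"):
        raise ValueError(f"unsupported action: {action}")
    args = ["tf", action, item_path]
    if recursive:
        args.append("/recursive")  # apply to the folder and everything below it
    if action == "checkin" and comment:
        args.append(f"/comment:{comment}")
    return args


def run_tf(args, dry_run=True):
    """Print the command; run it only when dry_run is False."""
    print(" ".join(args))
    if not dry_run:
        subprocess.run(args, check=True)


# Example (hypothetical path): check out one job's properties file.
# run_tf(tf_command("checkout", r"C:\TalendWorkspace\ETL\process\ETL2_0.1.properties"))
```

Keeping the dry run as the default makes it easy to review exactly which files a command would touch before anything is sent to the server.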
I’d like to clarify that we can have multiple repositories in TFS, each containing several projects, to help maintain Talend jobs. A shared repository comes with a few limitations, but these can be overcome with good collaboration and communication within the team. In short, better version control brings a remarkable advantage in project management and product delivery.
I hope my blog has helped you with some general guidelines and rules connected to the development standards specific to the Talend Open Studio integration with Microsoft TFS. Feel free to write to us at Nitor Infotech with your thoughts about this integration.