Imagine you are making drastic changes to your existing data factory, say by changing naming conventions or handing your development resources over to a new customer. In situations like these, you may need to create a new data factory to develop in, or at least point your existing data factory at a new repository.
All of this can be done with Azure Data Factory’s Git configuration, which links a factory to a Git repository (Azure DevOps or GitHub) so that its pipelines, datasets, linked services and other resources are stored and versioned as JSON files in a branch. This makes it straightforward to manage the contents of your existing data factory.
In this blog, I will show you how to deploy an existing data factory repository branch to a new environment using this Git configuration.
But before I begin, here is a prerequisite that will help you with this migration: you must first create the environment-specific resources (such as global parameters, private endpoints and integration runtimes) in the new data factory. That way, when you point the new factory at your branch, these manually created resources are added to the branch automatically.
An added benefit of this is that it resolves the error “Pipeline is not found, please publish first”.
Now, without further ado, here’s how you do it.
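Under the hood, the Git configuration is stored on the factory resource as a `repoConfiguration` object. The Python sketch below builds that object for an Azure DevOps repository, assuming the property names of the `Microsoft.DataFactory/factories` ARM schema. The organization, project and repository names are hypothetical placeholders; only the collaboration branch name, ProdRepo, comes from this walkthrough.

```python
import json


def build_repo_configuration(
    account_name: str,
    project_name: str,
    repository_name: str,
    collaboration_branch: str,
    root_folder: str = "/",
) -> dict:
    """Return a repoConfiguration object for an Azure DevOps Git repository.

    Property names follow the Microsoft.DataFactory/factories ARM schema
    (FactoryVSTSConfiguration); all values are supplied by the caller.
    """
    return {
        "type": "FactoryVSTSConfiguration",
        "accountName": account_name,  # Azure DevOps organization
        "projectName": project_name,  # Azure DevOps project
        "repositoryName": repository_name,
        "collaborationBranch": collaboration_branch,
        "rootFolder": root_folder,  # folder that holds the ADF JSON files
    }


if __name__ == "__main__":
    # Hypothetical names; only "ProdRepo" comes from the walkthrough.
    repo_config = build_repo_configuration(
        account_name="contoso",
        project_name="DataPlatform",
        repository_name="adf-migration",
        collaboration_branch="ProdRepo",
    )
    # Shape of the block inside the factory's "properties".
    print(json.dumps({"repoConfiguration": repo_config}, indent=2))
```

For a GitHub repository, the type would be `FactoryGitHubConfiguration` and `projectName` is not used. As far as we know, the portal's “Import existing resources to repository” checkbox is not part of this object; it is a one-time import performed while you set up the configuration.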
Because we checked the “Import existing resources to repository” option during Git configuration, the resources that already existed in the factory are imported into the repository.
As a result, two new folders appear in the release branch.
In our case, we named this branch “ProdRepo”.
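To check what the import actually put into the branch, you can inventory a local clone of it. The sketch below assumes ADF's usual layout of one folder per resource type (`pipeline`, `dataset`, `linkedService` and so on) with one `<resourceName>.json` file per resource. The folder list is our assumption and may need adjusting for your factory; if you configured a root folder other than `/`, pass that subfolder as the path.

```python
import sys
from pathlib import Path

# Folder names ADF typically uses for resource JSON files in Git.
# This list is an assumption; extend it for other resource types you use.
RESOURCE_FOLDERS = (
    "pipeline",
    "dataset",
    "linkedService",
    "trigger",
    "dataflow",
    "integrationRuntime",
    "factory",
)


def inventory(repo_root: str) -> dict[str, list[str]]:
    """Map each resource folder present under repo_root to its resource names."""
    root = Path(repo_root)
    found: dict[str, list[str]] = {}
    for folder in RESOURCE_FOLDERS:
        path = root / folder
        if path.is_dir():
            # Each resource is stored as <resourceName>.json.
            found[folder] = sorted(p.stem for p in path.glob("*.json"))
    return found


if __name__ == "__main__":
    # Usage: python inventory.py <path to a local clone of the branch>
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for folder, names in inventory(target).items():
        print(f"{folder}: {len(names)} resource(s): {', '.join(names)}")
```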
Then click Publish to deploy the branch to the data factory.
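Which branch Publish writes to can be checked from the collaboration branch itself. ADF uses `adf_publish` by default, and, as we understand it, an optional `publish_config.json` file with a `publishBranch` key in the root folder of the collaboration branch overrides that default. The helper below is a minimal sketch of that lookup against a local clone; the function name is our own.

```python
import json
import sys
from pathlib import Path

DEFAULT_PUBLISH_BRANCH = "adf_publish"


def publish_branch(root_folder: str) -> str:
    """Return the branch that Publish writes ARM templates to.

    Looks for an optional publish_config.json in the collaboration branch's
    root folder; without it (or without the key), ADF's default applies.
    """
    config_path = Path(root_folder) / "publish_config.json"
    if not config_path.is_file():
        return DEFAULT_PUBLISH_BRANCH
    # utf-8-sig tolerates a byte-order mark if an editor added one.
    with config_path.open(encoding="utf-8-sig") as f:
        config = json.load(f)
    return config.get("publishBranch", DEFAULT_PUBLISH_BRANCH)


if __name__ == "__main__":
    # Usage: python publish_branch.py <root folder of a ProdRepo clone>
    print(publish_branch(sys.argv[1] if len(sys.argv) > 1 else "."))
```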
a) As we know, the “adf_publish” branch is the default publish branch in ADF, and here the “ProdRepo” branch is the collaboration branch.
b) Publishing deploys everything from the collaboration branch “ProdRepo” to the live data factory and saves the corresponding ARM templates in the adf_publish branch, so that the data factory can run independently without pointing to any collaboration branch.
c) After publishing, even if we disconnect the collaboration branch (through the data factory’s Git configuration), we can still see all the pipelines, datasets, linked services and other resources in ADF.
d) This is because everything has now been published (with the ARM templates stored in the adf_publish branch), so the factory no longer needs the collaboration branch, ProdRepo.
e) Publishing everything helps you resolve the error “Pipeline is not found, please publish first”.
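To confirm that a pipeline really made it into the published output, you can inspect the ARM template that publishing writes to `adf_publish`, typically `<factoryName>/ARMTemplateForFactory.json`. The sketch below assumes ADF's usual resource naming of the form `[concat(parameters('factoryName'), '/<name>')]`; the helper names and the sample pipeline name are hypothetical.

```python
import json
import re
import sys
from collections import Counter

PIPELINE_TYPE = "Microsoft.DataFactory/factories/pipelines"
# Extracts "CopySales" from "[concat(parameters('factoryName'), '/CopySales')]".
NAME_PATTERN = re.compile(r"'/([^']+)'\)\]$")


def resource_counts(template: dict) -> Counter:
    """Count resources in an exported ARM template by short type name."""
    counts: Counter = Counter()
    for resource in template.get("resources", []):
        # "Microsoft.DataFactory/factories/pipelines" -> "pipelines"
        counts[resource.get("type", "").rsplit("/", 1)[-1]] += 1
    return counts


def is_pipeline_published(template: dict, pipeline_name: str) -> bool:
    """Return True if the template contains a pipeline with this name."""
    for resource in template.get("resources", []):
        if resource.get("type") != PIPELINE_TYPE:
            continue
        match = NAME_PATTERN.search(resource.get("name", ""))
        if match and match.group(1) == pipeline_name:
            return True
    return False


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python check.py <factoryName>/ARMTemplateForFactory.json <pipeline>
    with open(sys.argv[1], encoding="utf-8-sig") as f:
        arm_template = json.load(f)
    print(dict(resource_counts(arm_template)))
    print("published" if is_pipeline_published(arm_template, sys.argv[2]) else "missing")
```

A pipeline reported as missing here has most likely not been published yet, which is the situation behind the “Pipeline is not found” error.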
And there you have it! You are now equipped to move your existing data factory repositories to a new environment, with Git configuration in your arsenal.
Write to us at Nitor Infotech if you want to learn more about how you can manage copious amounts of data with our data engineering services.