In this article, we will cover the approaches used to migrate data from MongoDB to Redshift.
MongoDB is an open-source NoSQL database that stores data in JSON-like documents using a document-oriented data model. Data fields can vary from document to document; MongoDB does not enforce a fixed schema, so data does not have to follow any particular format.
The Amazon Redshift data warehouse is essentially an enterprise-class, relational database query and management system that achieves efficient storage and optimum query performance through massively parallel processing, columnar data storage, and efficient, targeted data compression encoding schemes.
There are two ways to replicate data from MongoDB to Redshift: (a) writing custom ETL code and (b) using a ready-to-use data integration platform. We will cover the steps involved in writing custom code to load data from MongoDB to Redshift, as well as the limitations of that approach.
For the purpose of demonstration, assume that we need to move the 'products' collection, which holds the product details of a manufacturing company, into Redshift.
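To make the example concrete, a document in this hypothetical products collection might look like the sketch below. The field names match the Redshift table created later in this article, but the exact shape of your documents will differ:

```python
# A hypothetical document from the 'products' collection.
# All field names and values here are assumptions for this example.
product = {
    "sku": "ABC-123",
    "title": "Widget",
    "description": "A sample widget",
    "manufacture_details": {"model_number": "W-1", "release_date": "2020-01-15"},
    "shipping_details": {"weight": 350, "width": 10, "height": 10, "depth": 4},
    "quantity": 99,
    "pricing": {"price": 5.75, "retail": 7.00},
}

# Nested fields such as manufacture_details are one reason a flattening
# or stringification step is usually needed before loading into a flat
# relational table in Redshift.
print(sorted(product.keys()))
```

Note that since MongoDB does not enforce a schema, other documents in the same collection could carry different fields entirely.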
Two cases should be taken into consideration while transferring data:
Let us take a look at both scenarios:
An export file of the required MongoDB collection has to be generated using the mongoexport command, as follows:
Open a command prompt and navigate to the directory containing the MongoDB database tools to run the mongoexport command.
Run the mongoexport command to generate the output file for the products collection.
mongoexport --host localhost -u 'username' -p 'password' --db mongodb --collection products --out D:\work\articles\products.json
Note that there might be numerous transformations needed before loading this data into Redshift. Achieving this with hand-written code can become extremely hard to maintain. A tool that provides an easy environment to write transformations might work better for you.
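As a minimal sketch of such a transformation, the snippet below reads a mongoexport output file (one JSON document per line) and serializes nested objects into strings so each field fits a flat VARCHAR column. The file paths, the `flatten_record` helper, and the decision to drop `_id` are all assumptions for this example:

```python
import json

def flatten_record(record):
    """Serialize nested dicts/lists to JSON strings so each field
    fits a flat VARCHAR column in Redshift (a sketch, not a full ETL)."""
    out = {}
    for key, value in record.items():
        if key == "_id":  # drop Mongo's internal id for this sketch
            continue
        if isinstance(value, (dict, list)):
            out[key] = json.dumps(value)
        else:
            out[key] = value
    return out

def transform_export(in_path, out_path):
    """mongoexport writes one JSON document per line; transform each line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(json.dumps(flatten_record(json.loads(line))) + "\n")

# Example on a single in-memory record:
flat = flatten_record({"_id": "xyz", "sku": "ABC-123", "pricing": {"price": 5.75}})
print(flat)  # the nested pricing dict is now a JSON string
```

Real pipelines would also need type coercion, error handling, and schema drift checks, which is where dedicated transformation tooling earns its keep.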
Files on a local machine can be uploaded to AWS in several ways, some of which are given below.
One way is to use the file upload utility of the S3 console, which is an intuitive option.
You can also achieve this with the AWS CLI, which provides simple commands to upload files to an S3 bucket from your local machine.
As a prerequisite, you need to install and configure the AWS CLI. You can read the AWS CLI user guide to learn more about installation.
Run the following command to upload the file to S3 from the local machine:
aws s3 cp D:\work\articles\products.json s3://s3bucket011/products.json
Next, create a table in Redshift to hold the collection's data:
CREATE TABLE sales.products (
  sku VARCHAR(100),
  title VARCHAR(100),
  description VARCHAR(500),
  manufacture_details VARCHAR(1000),
  shipping_details VARCHAR(1000),
  quantity BIGINT,
  pricing VARCHAR(100)
);
After running this query, an empty table will be created in Redshift. To check this, run the following query:
SELECT * FROM sales.products;
Once the file is in S3, load it into the table with the COPY command:
COPY sales.products FROM 's3://s3bucket011/products.json'
IAM_ROLE 'role_arn' FORMAT AS JSON 'auto';
Confirm that the data has loaded successfully by running the following query:
SELECT * FROM sales.products LIMIT 10;
This should return the records inserted from the products file.
Limitations of the Custom ETL Scripts Approach:
The process outlined above is brittle, error-prone and, more often than not, hard to implement and maintain, which may impact the consistency and availability of your data in Redshift.
There is an easier way to replicate data from MongoDB to Redshift.
A ready-to-use data integration solution can help you migrate this data without writing any code. This is what the process looks like when done through a tool:
(a) Full Dump and Load (b) Incremental load for append-only data (c) Incremental load for mutable data
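To illustrate the difference between these load modes, here is a minimal sketch of incremental record selection using a watermark. The `updated_at` field and the in-memory records are assumptions for this example; real tools track watermarks per collection and persist them between runs:

```python
def incremental_batch(records, last_watermark):
    """Select records changed since the last sync (incremental load).

    For append-only data the watermark can be an insertion timestamp or a
    monotonically increasing id; for mutable data, an 'updated_at' field
    must be maintained so that modified rows are picked up again.
    """
    batch = [r for r in records if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in batch), default=last_watermark)
    return batch, new_watermark

# Hypothetical records; a full dump would simply ship all of them.
records = [
    {"sku": "A", "updated_at": 100},
    {"sku": "B", "updated_at": 205},
    {"sku": "C", "updated_at": 310},
]

batch, wm = incremental_batch(records, last_watermark=200)
print(len(batch), wm)  # only records changed after 200; watermark advances
```

A full dump and load corresponds to starting with `last_watermark` at zero; subsequent runs ship only the delta, which is what keeps replication fast and near real-time.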
That's it! You are all set. Your data integration platform will take care of gathering your data incrementally and uploading it seamlessly from MongoDB to Redshift in real time.
In addition to this, a data integration platform lets you bring in data from various other sources: databases, cloud applications, SDKs, and more. This will future-proof your data integration setup and provide you with the flexibility to immediately replicate data from any source into Redshift.
Reach out to us at Nitor Infotech to learn more about migrating data and witness how easy it is to load data from MongoDB to Redshift along with several other sources and accelerate the process of generating powerful analytical workflows.