MongoDB to Redshift- Data Migration | Nitor Infotech
Big Data | 27 Nov 2020 |   10 min

MongoDB to Redshift-Data Migration

We will cover various approaches used to perform data migration from MongoDB to Redshift in this article.

A Brief Overview of MongoDB and Redshift

MongoDB is an open-source NoSQL database that stores data as JSON-like documents using a document-oriented data model. Fields can vary from one document to the next: MongoDB does not enforce a fixed schema by default, so documents in the same collection need not share a structure.
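Because fields can vary by document, it can help to survey which fields actually occur before designing a relational table. The following is a minimal Python sketch of that idea; the sample documents and their field names are made up for illustration and do not come from a real collection.

```python
from collections import Counter


def field_coverage(documents):
    """Count in how many documents each top-level field appears."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.keys())
    return dict(counts)


# Hypothetical sample documents: the second one has no "pricing" field.
sample = [
    {"sku": "A-1", "title": "Widget", "pricing": {"list": 10}},
    {"sku": "A-2", "title": "Gadget", "quantity": 5},
]

coverage = field_coverage(sample)
print(coverage)
```

Fields that appear in only some documents are candidates for nullable columns in the target table.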

Amazon Redshift is an enterprise-class relational data warehouse that achieves efficient storage and fast query performance through massively parallel processing, columnar data storage, and targeted data compression encodings.

Approaches to transfer data from MongoDB to Redshift

There are two ways to replicate data from MongoDB to Redshift:

  1. Using a ready-to-use Data Integration Platform
  2. Writing custom ETL code with the help of the mongoexport utility

We will be covering the steps involved in writing custom code to load data from MongoDB to Redshift as well as its limitations.

Transfer Data from MongoDB to Redshift using Custom Code

For the purpose of demonstration, assume that we need to move the ‘products’ collection, which holds the product details of a manufacturing company, into Redshift.

Two cases should be taken into consideration while transferring data:

  1. Move the data for a one-time load into Redshift.
  2. Incrementally load data into Redshift, which applies only when the data volume is high.

Let us take a look at both the scenarios:

One Time Load

First, export the required MongoDB collection to a JSON file using the mongoexport command, as follows:

Open a command prompt and go to the path below to run the mongoexport command:

C:\Program Files\MongoDB\Server\4.0\bin

Run the mongoexport command to generate the output file for the products collection:

mongoexport --host localhost -u "username" -p "password" --db mongodb --collection products --out D:\work\articles\products.json
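If the export needs to be scripted rather than typed by hand, the same mongoexport call can be wrapped in a few lines of Python. This is only a sketch: the helper names build_mongoexport_args and export_collection are hypothetical, and the host, credentials, database and path are the placeholder values used in the command above.

```python
import subprocess


def build_mongoexport_args(host, username, password, db, collection, out_path):
    """Return the mongoexport argument list for one collection."""
    return [
        "mongoexport",
        "--host", host,
        "-u", username,
        "-p", password,
        "--db", db,
        "--collection", collection,
        "--out", out_path,
    ]


def export_collection(args):
    """Run mongoexport; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(args, check=True)


args = build_mongoexport_args(
    "localhost", "username", "password",
    "mongodb", "products", r"D:\work\articles\products.json",
)
# export_collection(args)  # run only where mongoexport is installed and on PATH
```

Passing the arguments as a list avoids shell quoting problems with paths or passwords that contain spaces.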

Note that numerous transformations might be needed before this data can be loaded into Redshift. Implementing them all in custom code quickly becomes hard to maintain, so a tool that provides an easy environment for writing transformations might work better for you.
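As an illustration of the kind of transformation involved, the sketch below keeps only the columns of the target table and serialises nested values as JSON text so that they fit into VARCHAR columns. It assumes that the document keys match the lower-cased column names of the table created later in this article, which may not hold for your data; the function names and the sample document are hypothetical.

```python
import json

# Lower-cased column names of the target table (assumed to match document keys).
COLUMNS = ["sku", "title", "description", "manufacture_details",
           "shipping_details", "quantity", "pricing"]


def to_row(doc):
    """Keep only the target columns; serialise nested values as JSON text."""
    row = {}
    for col in COLUMNS:
        value = doc.get(col)  # missing fields become None (JSON null)
        if isinstance(value, (dict, list)):
            value = json.dumps(value, separators=(",", ":"))
        row[col] = value
    return row


def transform_lines(lines):
    """Turn mongoexport output (one JSON document per line) into cleaned JSON lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.dumps(to_row(json.loads(line)))


# Hypothetical exported line, including the _id field that mongoexport emits.
exported = ['{"_id":{"$oid":"5f0c"},"sku":"A-1","title":"Widget",'
            '"pricing":{"list":10},"quantity":3}']
cleaned = list(transform_lines(exported))
row = json.loads(cleaned[0])
```

Even this small step shows why the logic grows quickly: every nested or optional field needs its own handling rule.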

  • Upload the .json file generated above to an S3 bucket

Files from a local machine can be uploaded to AWS in many ways, two of which are given below.

One way is to use the file upload utility of S3, which is an intuitive option.
You can also use the AWS CLI, which provides simple commands to upload the file to the S3 bucket from your local machine.

As a pre-requisite, you need to install and configure AWS CLI. You can read the user guide to learn more about installing AWS CLI.

Run the following command to upload the file into S3 from the local machine

aws s3 cp D:\work\articles\products.json s3://s3bucket011/products.json
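Before uploading, it may be worth checking that every line of the export is valid JSON, since a malformed line would otherwise only surface later during the load. The sketch below does that check and then builds the same aws s3 cp call as an argument list. The function names are hypothetical, and the bucket and key are the placeholders used above.

```python
import json


def count_valid_documents(lines):
    """Count JSON documents, raising ValueError with the line number on bad input."""
    count = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {number} is not valid JSON: {exc.msg}") from exc
        count += 1
    return count


def build_s3_cp_args(local_path, bucket, key):
    """Return the argument list for `aws s3 cp` to s3://bucket/key."""
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key}"]


# In practice the lines would come from the exported file, for example:
#     with open(path, encoding="utf-8") as handle:
#         doc_count = count_valid_documents(handle)
sample_lines = ['{"sku": "A-1"}', "", '{"sku": "A-2"}']
doc_count = count_valid_documents(sample_lines)
cp_args = build_s3_cp_args(r"D:\work\articles\products.json",
                           "s3bucket011", "products.json")
```

The document count is also useful afterwards, as a figure to compare against the number of rows loaded into Redshift.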

Create the table schema before loading the data into Redshift:

CREATE TABLE sales.products (
    sku VARCHAR(100),
    title VARCHAR(100),
    description VARCHAR(500),
    manufacture_details VARCHAR(1000),
    shipping_details VARCHAR(1000),
    quantity BIGINT,
    pricing VARCHAR(100)
);
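Redshift measures VARCHAR lengths in bytes rather than characters, so multi-byte text can overflow a column that looks wide enough. A minimal sketch of a pre-load check against the widths declared above follows; the function name and the sample row are hypothetical.

```python
# Declared VARCHAR widths from the table definition, in bytes.
VARCHAR_WIDTHS = {
    "sku": 100,
    "title": 100,
    "description": 500,
    "manufacture_details": 1000,
    "shipping_details": 1000,
    "pricing": 100,
}


def oversized_fields(row):
    """Return {column: byte_length} for string values that exceed their column width."""
    too_long = {}
    for column, width in VARCHAR_WIDTHS.items():
        value = row.get(column)
        if isinstance(value, str):
            size = len(value.encode("utf-8"))
            if size > width:
                too_long[column] = size
    return too_long


# 60 characters, but each one takes two bytes in UTF-8.
sample_row = {"sku": "A-1", "title": "\u00e9" * 60, "description": "short"}
problems = oversized_fields(sample_row)
print(problems)
```

Running a check like this over the exported file gives an early indication of whether the column widths need to be increased.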

After running the statement, an empty table will be created in Redshift. To check this, run the following query:

SELECT * FROM sales.products;

  • Load the data from S3 into Redshift using the COPY command

COPY dev.sales.products FROM 's3://s3bucket011/products.json'
IAM_ROLE 'ROLE_ARN' FORMAT AS JSON 'auto';
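With the 'auto' option, COPY matches top-level JSON keys to column names, and the match is case-sensitive against Redshift's lower-case column names. The sketch below approximates that matching for a single document so that mismatches can be spotted before the load. It is a simplified model based on the documented behaviour, not a reimplementation of COPY, and the function name and sample document are hypothetical.

```python
def preview_auto_mapping(doc, columns):
    """Approximate how the keys of one document line up with table columns."""
    keys = set(doc)
    cols = set(columns)
    return {
        "loaded": sorted(keys & cols),         # key and column names match exactly
        "ignored_keys": sorted(keys - cols),   # no column with this exact name
        "null_columns": sorted(cols - keys),   # no key in the document for this column
    }


# Hypothetical document: "Title" differs from the column "title" only by case.
doc = {"_id": {"$oid": "5f0c"}, "sku": "A-1", "Title": "Widget", "quantity": 3}
mapping = preview_auto_mapping(doc, ["sku", "title", "quantity"])
print(mapping)
```

A key such as "Title" would need to be renamed to lower case during transformation, or handled with a different COPY option, to end up in the title column.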

Confirm that the data has loaded successfully by running the following query:

SELECT * FROM sales.products LIMIT 10;

This should return the records loaded from the products file.

Limitations of the Custom ETL Scripts Approach:

  1. The custom ETL script method works well when data needs to be moved once or in batches, but it becomes extremely tedious if data needs to be copied from MongoDB to Redshift in real time.
  2. If you are dealing with large volumes of data, an incremental load needs to be performed. Incremental load (change data capture) is difficult because additional steps are needed to achieve it.
  3. Transforming data before loading it into Redshift is extremely difficult to achieve.
  4. When writing code to extract a subset of data, scripts can break as the source schema changes or evolves, resulting in data loss.
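To give a sense of the additional steps that an incremental load involves, the sketch below builds a filter that could be passed to mongoexport through its --query option so that only documents changed after the last load are exported. It assumes that every document carries a timestamp field, here called updated_at, which is a hypothetical name that the application would have to maintain. The watermark date is only an example.

```python
import json
from datetime import datetime, timezone


def build_incremental_query(last_loaded_at, field="updated_at"):
    """Return a mongoexport --query filter for documents changed after the watermark."""
    iso = last_loaded_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps({field: {"$gt": {"$date": iso}}})


# Example watermark: the time of the previous successful load.
watermark = datetime(2020, 11, 27, 0, 0, 0, tzinfo=timezone.utc)
query = build_incremental_query(watermark)
print(query)
```

A filter like this picks up inserted and updated documents but not deleted ones, and the watermark itself has to be stored and advanced after every run, which is part of what makes the approach laborious.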

The process described above is brittle, error-prone and, more often than not, hard to implement and maintain, which may affect the consistency and availability of your data in Redshift.

There is an easier way to replicate data from MongoDB to Redshift.

A ready-to-use data integration solution can help you migrate this data without writing any code. This is what the process looks like when done through a tool:

  • Connect to your MongoDB.
  • Select a replication mode:

    (a) Full dump and load
    (b) Incremental load for append-only data
    (c) Incremental load for mutable data

  • For every collection in MongoDB, select the table in Redshift to which it needs to be copied.

That’s it! You are all set. Your Data Integration Platform will take care of gathering your data incrementally and uploading it seamlessly from MongoDB to Redshift in real time.

In addition, a Data Integration Platform lets you bring in data from a variety of sources: databases, cloud applications, SDKs, and more. This will future-proof your data integration setup and give you the flexibility to replicate data from any source into Redshift immediately.

Reach out to us at Nitor Infotech to learn more about migrating data and see how easy it is to load data from MongoDB to Redshift, along with several other sources, and accelerate the process of generating powerful analytical workflows.

Nitor User

   
