Setting up Dev Endpoint using Apache Zeppelin with AWS Glue
Software Engineering | 02 Apr 2021 |   7 min

AWS Glue is a powerful, fully managed service that relieves you of the hassle of maintaining infrastructure. It is hosted by AWS and provides serverless ETL: it generates the ETL code in Python or Scala and executes it in a Spark environment.

AWS Glue provisions all the required resources (a Spark cluster) at runtime to execute the Spark job. This takes about 7-10 minutes, and only then does your actual ETL code start executing. To reduce this wait, AWS Glue provides development endpoints, which can be connected to Apache Zeppelin (provisioned with the Spark environment) so that you can interactively run, debug, and test ETL code before deploying it as a Glue job or scheduling the ETL process.

To set up the Dev Endpoint on AWS Glue successfully, let us first go through its prerequisites:

  1. An IAM role for the Glue Dev Endpoint with the necessary policies, e.g., AWSGlueServiceRole.
  2. A table in the Glue Data Catalog and the necessary connection.
  3. Working knowledge of VPC networking, security groups, and so on. For example, the Dev Endpoint requires a security group that allows inbound traffic on port 22, since we need that for SSH tunneling.
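
To make the third prerequisite concrete, here is a minimal sketch of how you might check a security group's inbound rules for port 22. It assumes the rules are given as a list of dictionaries shaped like the "IpPermissions" field that the EC2 API returns for a security group. The function name allows_ssh and the sample rules are hypothetical, and the check deliberately ignores which source addresses each rule permits.

```python
def allows_ssh(ip_permissions, port=22):
    """Return True if any inbound rule covers the given TCP port."""
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            return True
        if protocol != "tcp":
            continue
        from_port = rule.get("FromPort")
        to_port = rule.get("ToPort")
        if from_port is None or to_port is None:
            continue
        if from_port <= port <= to_port:
            return True
    return False


# Hypothetical rule set, shaped like one security group's "IpPermissions".
example_rules = [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "203.0.113.10/32"}]},
]

print(allows_ssh(example_rules))
```

In practice you would also want to confirm that the rule's source range includes the machine you will be tunneling from.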

With the prerequisites in place, we can move on to setting up the Dev Endpoint on AWS Glue. Here’s how it goes:

  • Create an SSH key pair:
    • You can create it using the PuTTYgen tool, or under AWS EC2 -> Network & Security -> Key Pairs.
    • You need the public key, which for an RSA key is a single line beginning with ssh-rsa.
    • You will need the private key in .ppk format. PuTTYgen produces a .ppk file directly, whereas EC2 gives you a .pem file that you will have to convert to .ppk using PuTTYgen's conversion tool.
    • On Linux, make sure that the key file has permissions of 400 or 600. On Windows, follow these steps:
      • Right-click the PEM file > Properties > Security > Advanced.
      • Make sure that you are the owner of the file and that inheritance is disabled.
  • Spin up the AWS Glue Dev Endpoint:
    • Create it by going to AWS Glue -> Dev endpoints -> Add endpoint.
    • In Development endpoint name, enter any name; in IAM Role, select the role you created; then click Next.
    • Choose “Skip Networking Information” if you have S3 data stores; otherwise, select one of the remaining two options according to your instances or security groups. Click Next.
    • Paste the public key from your key pair and click Next.
    • Review the details and click Finish.
    • Let the process run and wait until the Provisioning status shows “READY”.
  • SSH tunnel for Glue Dev Endpoint: When your dev endpoint is provisioned, check that its network interface has a public address attached to it and make note of it (e.g., ec2-xx-xxx-xxx-xx.us-west-2.compute.amazonaws.com).
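
If you would rather not keep refreshing the console, the wait for the “READY” status can be scripted. The sketch below is only an illustration: wait_until_ready and its fetch_endpoint argument are hypothetical names, and the "FAILED" status check is an assumption about how a failed provisioning is reported. The "Status" and "PublicAddress" keys follow the shape of the DevEndpoint structure returned by the Glue API.

```python
import time


def wait_until_ready(fetch_endpoint, poll_seconds=30, max_attempts=40):
    """Poll until the endpoint reports READY and return its public address.

    fetch_endpoint is a caller-supplied function (hypothetical) that returns
    a dict with at least a "Status" key and, once ready, "PublicAddress".
    """
    for attempt in range(max_attempts):
        endpoint = fetch_endpoint()
        status = endpoint.get("Status")
        if status == "READY":
            return endpoint.get("PublicAddress")
        if status == "FAILED":
            raise RuntimeError("Dev endpoint provisioning failed")
        if attempt < max_attempts - 1:
            # Wait before asking again; provisioning takes several minutes.
            time.sleep(poll_seconds)
    raise TimeoutError("Dev endpoint did not become READY in time")


# With boto3 installed and credentials configured, fetch_endpoint could be
# built like this ("my-dev-endpoint" is a placeholder name):
#   glue = boto3.client("glue")
#   fetch = lambda: glue.get_dev_endpoint(
#       EndpointName="my-dev-endpoint")["DevEndpoint"]
#   print(wait_until_ready(fetch))
```

Passing the lookup in as a function keeps the polling logic independent of boto3, so it can be tried out with a stub before pointing it at a real endpoint.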

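a. Create the SSH tunnel using PuTTY: under Connection > SSH > Tunnels, add a local forward from source port 9007 to destination 169.254.76.1:9007; under Connection > SSH > Auth, select your .ppk private key; then open a session to glue@ followed by the public address of your Dev Endpoint, on port 22.

b. Alternatively, create the tunnel using the ssh command-line client, forwarding local port 9007 to 169.254.76.1:9007 on the Dev Endpoint as the glue user.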
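
For the command-line route, the following sketch assembles the ssh invocation in Python. It assumes the forwarding target 169.254.76.1:9007 and the glue user name from AWS's own Zeppelin tutorial for Dev Endpoints; the helper names, key file name, and host are placeholders. Note that the OpenSSH client expects the private key in PEM/OpenSSH format rather than .ppk, and that the permission helper is meaningful on Linux and macOS only.

```python
import os
import stat


def restrict_key_permissions(key_path):
    """Make the private key readable by its owner only (mode 400)."""
    os.chmod(key_path, stat.S_IRUSR)


def build_tunnel_command(key_path, public_address, local_port=9007):
    """Return the ssh argument list that forwards local_port to the endpoint."""
    forward = "{0}:169.254.76.1:9007".format(local_port)
    return [
        "ssh",
        "-i", key_path,   # private key in PEM/OpenSSH format
        "-N",             # no remote command, port forwarding only
        "-T",             # do not allocate a pseudo-terminal
        "-L", forward,    # local port forward to the endpoint
        "glue@" + public_address,
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own key file and endpoint address.
    key = "glue-dev-endpoint.pem"
    host = "ec2-xx-xxx-xxx-xx.us-west-2.compute.amazonaws.com"
    print(" ".join(build_tunnel_command(key, host)))
    # To open the tunnel for real, call restrict_key_permissions(key) and then:
    #   import subprocess
    #   subprocess.run(build_tunnel_command(key, host), check=True)
```

The command keeps running for as long as the tunnel is open, so leave that terminal window alone while you work in Zeppelin.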

  • Zeppelin Notebook:
    1. Download Zeppelin Notebook version 0.7.3.
    2. Unzip the file and copy the folder to the C: drive.
    3. Start Zeppelin (for example, with bin\zeppelin.cmd on Windows) and go to localhost:8080 in your browser.
    4. In the top-right corner, click anonymous > Interpreter, search for spark, and click edit.
    5. Check Connect to existing process.
    6. Set host to localhost and port to 9007.
    7. Under properties, set master to yarn-client.
    8. Remove the spark.executor.memory and spark.driver.memory properties if they exist.
    9. Click Save, then OK.
  • Write your first Glue program:
    1. Under Notebook, create a new note and select the Spark interpreter.
    2. Write your program and execute it.
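
As a starting point for that first program, here is a small sketch that reads a Data Catalog table (the second prerequisite) into a DynamicFrame and prints its row count and schema. The function name preview_catalog_table and the database and table names are hypothetical. The Glue and Spark imports sit inside the function because those modules are available on the Dev Endpoint rather than in a plain local Python installation.

```python
def preview_catalog_table(database, table_name):
    """Load a Data Catalog table as a DynamicFrame and print a summary."""
    # Imported lazily: these modules exist on the Glue Dev Endpoint,
    # not in a plain local Python installation.
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    frame = glue_context.create_dynamic_frame.from_catalog(
        database=database, table_name=table_name
    )
    print("Row count: {0}".format(frame.count()))
    frame.printSchema()
    return frame


# In a Zeppelin paragraph that starts with %pyspark, call it with your own
# catalog names (the ones below are hypothetical placeholders):
# preview_catalog_table("my_database", "my_table")
```

If the paragraph returns a row count and a schema, the tunnel, the interpreter settings, and the IAM role are all working together.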

And with that, your AWS Glue Dev Endpoint is up and running. Obviously, this just scratches the surface. There is a lot more to learn, but I’ll cover that in another blog.

Reach out to us at Nitor Infotech to learn more about how you can deploy AWS Glue to simplify your ETL work. And if you’re in the mood for some light reading, take a look at the blog I wrote about AWS Redshift PartiQL.


Shubham Muneshwar

Lead Data Engineer

Shubham is an AWS-certified professional and is highly proficient in designing cloud-based architectures using AWS. He has significant experience in designing dashboard reporting systems and building Python-orchestrated frameworks. Shubham is a passionate coder and works on different technical stacks as the situation demands. He is a big fan of cricket, Sufi music, and automating almost everything.

   
