Setting up Dev Endpoint using Apache Zeppelin with AWS Glue | Nitor Infotech
Software Engineering | 02 Apr 2021 |   7 min

Setting up Dev Endpoint using Apache Zeppelin with AWS Glue


AWS Glue is a fully managed, serverless ETL service hosted by AWS, which relieves you of the hassle of maintaining the underlying infrastructure. It generates ETL code in Python or Scala and executes it in an Apache Spark environment.

Each time a Glue job runs, AWS Glue provisions the required resources (a Spark cluster) at runtime. This takes around 7-10 minutes before your actual ETL code starts executing. To avoid this wait during development, AWS Glue provides development endpoints. A dev endpoint is provisioned with a Spark environment, and you can connect Apache Zeppelin to it to interactively run, debug, and test ETL code before deploying it as a Glue job or scheduling the ETL process.

Before setting up a Dev Endpoint on AWS Glue, make sure the following prerequisites are in place:

  1. An IAM role for the Glue Dev Endpoint with the necessary policies attached, e.g., the AWSGlueServiceRole managed policy.
  2. A table in the Glue Data Catalog and any connection it needs.
  3. I am assuming that you know your way around VPC networking, security groups, etc. For example, the Dev Endpoint requires a security group that allows inbound traffic on port 22, since the SSH tunnel uses it.
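The IAM role from the first prerequisite can also be created from the command line. The following is a minimal sketch, assuming you use the AWS CLI: it builds the trust policy that lets the Glue service assume the role, plus the two CLI commands that create the role and attach the AWSGlueServiceRole managed policy. The role name is a hypothetical placeholder, and depending on where your data lives, the role may need further permissions (for example, access to your S3 buckets).

```python
import json
import shlex
from typing import List

# AWS managed policy mentioned in the prerequisites.
GLUE_SERVICE_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def glue_trust_policy() -> dict:
    """Trust policy that allows the AWS Glue service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_role_commands(role_name: str) -> List[List[str]]:
    """AWS CLI argv lists that create the role and attach the managed policy."""
    trust = json.dumps(glue_trust_policy())
    return [
        ["aws", "iam", "create-role", "--role-name", role_name,
         "--assume-role-policy-document", trust],
        ["aws", "iam", "attach-role-policy", "--role-name", role_name,
         "--policy-arn", GLUE_SERVICE_POLICY_ARN],
    ]


if __name__ == "__main__":
    # "GlueDevEndpointRole" is a hypothetical role name.
    for argv in create_role_commands("GlueDevEndpointRole"):
        print(shlex.join(argv))  # shell-quoted, ready to paste into a POSIX shell
```

Each list can also be passed straight to subprocess.run(argv, check=True) if you prefer to run the commands from Python.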

Now we can move on to setting up the Dev Endpoint on AWS Glue. Here’s how it goes:

  • Create an SSH Key Pair
  • You can create it using the PuTTYgen tool, or under AWS EC2 -> Network & Security -> Key Pairs.
  • You need the public key in OpenSSH format: a single line that starts with the key type (for example, ssh-rsa) followed by a long base64 string. If you created the key pair in EC2, you can derive the public key from the .pem file with PuTTYgen or with ssh-keygen -y -f key.pem.
  • If you connect with PuTTY, you need the private key in .ppk format. PuTTYgen saves it in .ppk format directly, whereas EC2 gives you a .pem file, which you have to convert into .ppk using PuTTYgen’s conversion feature. The OpenSSH client uses the .pem file as is.
  • On Linux, make sure that the private key file has permissions of 400 or 600 (for example, chmod 400 key.pem). On Windows, follow these steps instead:
    • Right-click the PEM file > Properties > Security > Advanced.
    • Make sure that you are the owner of the file and that inheritance is disabled.
  • Spin up the AWS Glue Dev Endpoint:
    • Go to AWS Glue -> Dev endpoints -> Add endpoint.
    • For Development endpoint name, enter any name; for IAM role, select the role you created. Click Next.
    • Choose “Skip networking information” if your data stores are in Amazon S3; otherwise, choose one of the other two options to match your VPC, subnets, and security groups. Click Next.
    • Paste the public key from your key pair and click Next.
    • Review the details and click Finish.
    • Wait until the Provisioning status shows READY.
  • SSH tunnel for Glue Dev Endpoint: When your dev endpoint is provisioned, check that its network interface has a public address attached to it and make note of it (e.g., ec2-xx-xxx-xxx-xx.us-west-2.compute.amazonaws.com).
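The console steps for spinning up the endpoint can also be scripted. The following is a minimal sketch, assuming the boto3 Glue client (boto3.client("glue")), whose create_dev_endpoint and get_dev_endpoint calls correspond to Add endpoint and the status check. It first sanity-checks that the public key is a single OpenSSH-format line, then creates the endpoint without networking information (the S3-only case), polls until the status is READY, and returns the public address you need for the SSH tunnel. The endpoint name, role ARN, and timing values are placeholders, and the client is passed in so the logic can be exercised without AWS credentials.

```python
import base64
import struct
import time

# Only RSA keys are accepted here (an assumption of this sketch; widen if needed).
ALLOWED_KEY_TYPES = {"ssh-rsa"}


def check_openssh_public_key(line: str) -> str:
    """Return the stripped key line if it looks like a valid OpenSSH public key."""
    parts = line.strip().split()
    if len(parts) < 2 or parts[0] not in ALLOWED_KEY_TYPES:
        raise ValueError("expected a line like 'ssh-rsa <base64> [comment]'")
    blob = base64.b64decode(parts[1], validate=True)  # binascii.Error is a ValueError
    if len(blob) < 4:
        raise ValueError("key blob too short")
    # The blob begins with a 4-byte big-endian length followed by the key type name.
    (type_len,) = struct.unpack(">I", blob[:4])
    if blob[4:4 + type_len] != parts[0].encode("ascii"):
        raise ValueError("key type inside the blob does not match the prefix")
    return line.strip()


def create_dev_endpoint_and_wait(glue, name, role_arn, public_key,
                                 poll_seconds=30.0, timeout_seconds=1800.0):
    """Create a Glue dev endpoint, wait until it is READY, return its public address.

    `glue` is expected to behave like boto3.client("glue").
    """
    glue.create_dev_endpoint(
        EndpointName=name,
        RoleArn=role_arn,
        PublicKey=check_openssh_public_key(public_key),
    )
    deadline = time.monotonic() + timeout_seconds
    while True:
        endpoint = glue.get_dev_endpoint(EndpointName=name)["DevEndpoint"]
        status = endpoint.get("Status")
        if status == "READY":
            # PublicAddress is only present for endpoints created outside a VPC.
            return endpoint.get("PublicAddress")
        if status == "FAILED":
            raise RuntimeError(endpoint.get("FailureReason", "provisioning failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{name} is still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

For example, with glue = boto3.client("glue"), calling create_dev_endpoint_and_wait(glue, "my-dev-endpoint", "arn:aws:iam::<account-id>:role/<role-name>", open("key.pub").read()) would return an address in the ec2-xx-xxx-xxx-xx.us-west-2.compute.amazonaws.com form. All of these names are hypothetical.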

a. Create an SSH tunnel using PuTTY:

b. Using SSH:
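With the OpenSSH client, the tunnel forwards local port 9007, which Zeppelin connects to in the Zeppelin setup, to the dev endpoint. Below is a minimal sketch that builds the command, assuming the form used in the AWS Glue documentation’s Zeppelin tutorial: ssh -i <private-key> -NTL 9007:169.254.76.1:9007 glue@<public-address>. Please double-check the 169.254.76.1 target and the glue user against the current documentation. The sketch also rejects key files readable by group or others, mirroring the 400/600 permission requirement, since OpenSSH refuses such keys. A small helper then checks whether the local end of the tunnel is listening.

```python
import os
import socket
import stat
from typing import List

LOCAL_PORT = 9007
# Remote target from the AWS Glue docs' Zeppelin tutorial (verify for your setup).
REMOTE_TARGET = "169.254.76.1:9007"


def build_tunnel_command(key_path: str, public_address: str,
                         local_port: int = LOCAL_PORT) -> List[str]:
    """Return the ssh argv for the Zeppelin tunnel after checking key permissions."""
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    # Permission bits are only meaningful on POSIX systems.
    if os.name == "posix" and mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{key_path} has mode {oct(mode)}; run chmod 400 on it")
    # -N: no remote command, -T: no terminal, -L: local port forward.
    return ["ssh", "-i", key_path,
            "-NTL", f"{local_port}:{REMOTE_TARGET}",
            f"glue@{public_address}"]


def tunnel_is_open(host: str = "localhost", port: int = LOCAL_PORT,
                   timeout: float = 2.0) -> bool:
    """True if something accepts TCP connections on the local end of the tunnel.

    This only shows that a local listener exists, not that the endpoint answers.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

You could start the tunnel with subprocess.Popen(build_tunnel_command("key.pem", "ec2-xx-xxx-xxx-xx.us-west-2.compute.amazonaws.com")) and keep that process running while you work in Zeppelin; tunnel_is_open() should then return True.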

  • Zeppelin Notebook:
    1. Download Apache Zeppelin version 0.7.3.
    2. Unzip the file and copy the folder to the C: drive.
    3. Start Zeppelin (for example, with bin\zeppelin.cmd on Windows) and go to http://localhost:8080.
    4. In the top-right corner, click anonymous > Interpreter, search for spark, and click edit.
    5. Check Connect to existing process.
    6. Set host to localhost and port to 9007.
    7. Under Properties, set master to yarn-client.
    8. Remove the spark.executor.memory and spark.driver.memory properties if they exist.
    9. Click Save, then OK.
  • Write your first Glue program:
    1. Under Notebook, click Create new note and select spark as the default interpreter.
    2. Write your ETL code in a paragraph and run it.
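As a first program, you could read a Data Catalog table through the dev endpoint. The sketch below assumes it runs in a %pyspark paragraph (part of Zeppelin’s Spark interpreter group), where sc is the SparkContext that Zeppelin provides and the awsglue library is available on the endpoint. The database and table names are hypothetical placeholders for the table from the prerequisites. The function takes the GlueContext as an argument, so its logic does not depend on a live endpoint.

```python
def preview_catalog_table(glue_context, database: str, table_name: str,
                          rows: int = 5) -> int:
    """Read a Data Catalog table as a DynamicFrame, print its schema and a few
    rows, and return the number of records."""
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database=database, table_name=table_name
    )
    dyf.printSchema()
    # Convert to a Spark DataFrame to use the familiar show() preview.
    dyf.toDF().show(rows)
    return dyf.count()


# In a Zeppelin %pyspark paragraph on the dev endpoint (hypothetical names):
#   from awsglue.context import GlueContext
#   glue_context = GlueContext(sc)
#   preview_catalog_table(glue_context, "my_database", "my_table")
```

Once this works interactively, the same code can move into a Glue job script.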

And with that, your AWS Glue Dev Endpoint is up and running with Zeppelin. Obviously, this was just scratching the surface. There is a lot more to learn, but I’ll cover that in another blog.

Reach out to us at Nitor Infotech to learn more about how you can use AWS Glue to simplify your ETL work. And if you’re in the mood for some light reading, take a look at the blog I wrote about AWS Redshift PartiQL.

Shubham Muneshwar

Lead Data Engineer

Shubham is an AWS certified professional and is highly proficient in designing cloud-based architectures using AWS. He has significant experience in designing dashboard reporting systems and building Python-orchestrated frameworks. Shubham is a passionate coder and works on different technical stacks as the situation demands. He is a big fan of cricket, Sufi music and automating almost everything.
