Rushikesh Pawar
Trainee Software Engineer
Rushikesh Pawar is a Trainee Software Engineer at Nitor Infotech. He is a passionate software engineer specializing in data engineering, ade...

Are you in search of a solution that offers high-performance, column-oriented, real-time analytics? How about a data store that can handle large volumes of data and provide lightning-fast insights? Well, Apache Druid can do it all for you. Before proceeding with this blog, I strongly recommend that you read my previous blog about Apache Druid to get a complete overview of its features, architecture, and comparisons with other open-source database management systems.

Done reading? Great!

In this blog, you will dive into the world of Apache Druid and walk through the step-by-step process of installing and setting up this cutting-edge technology. You will also explore data ingestion and learn how to bring data into Apache Druid for analysis.

By the end of this blog, you will have a working single-server Apache Druid deployment that ingests real-time data from Kafka, ready to handle the real-time analytical needs of your business.

Prerequisites before installation

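Before you start, make sure Java and Python are installed on your machine and that you have a running Kafka broker with a topic to ingest from. Got your basics ready? Awesome! You are now set to embark on this journey of installation and data ingestion with Apache Druid and discover how it can revolutionize your data analytics workflows.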


Deploying Apache Druid on a single server and connecting it to Kafka for real-time data ingestion can be achieved by following a few steps.

Let’s explore these steps in the next section!

14 Steps to Deploy Apache Druid with Kafka for Real-Time Data Ingestion

Step 1: Install Java

Ensure that Java is installed on your system, as it is required to run Apache Druid.


Step 2: Verification

Verify that both Java and Python are installed by checking their versions.
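If you prefer to script this check, here is a minimal sketch. It assumes the executables are called java and python3 and are on your PATH; the helper name tool_version is our own, not part of Druid.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def tool_version(cmd: Sequence[str]) -> Optional[str]:
    """Return the first line of a tool's version output, or None if it is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    # "java -version" prints to stderr while "python3 --version" prints to stdout,
    # so both streams are combined before taking the first line.
    result = subprocess.run(list(cmd), capture_output=True, text=True, timeout=30)
    output = (result.stdout + result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    for name, cmd in (("Java", ["java", "-version"]), ("Python", ["python3", "--version"])):
        version = tool_version(cmd)
        print(f"{name}: {version if version is not None else 'NOT FOUND'}")
```

A "NOT FOUND" line means the tool is either missing or not on your PATH, so go back to Step 1 before continuing.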


Step 3: Get Apache Druid

Download the Apache Druid tar file from the official website.
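The download can also be scripted. This sketch assumes that releases are kept under the Apache archive layout shown in ARCHIVE_BASE; confirm the exact URL and the current version on the official download page, since mirrors and paths can change.

```python
import urllib.request
from pathlib import Path

# Assumption: every Druid release is available under this archive layout.
ARCHIVE_BASE = "https://archive.apache.org/dist/druid"


def druid_tarball_url(version: str) -> str:
    """Build the download URL for the binary tarball of a Druid release."""
    return f"{ARCHIVE_BASE}/{version}/apache-druid-{version}-bin.tar.gz"


def download_druid(version: str, dest_dir: str = ".") -> Path:
    """Download the tarball into dest_dir and return its local path."""
    url = druid_tarball_url(version)
    target = Path(dest_dir) / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(target))
    return target
```

For example, download_druid("27.0.0") would fetch apache-druid-27.0.0-bin.tar.gz into the current directory; substitute the release you actually want.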


Step 4: Extract the downloaded file

Extract the contents of the downloaded tar file to a directory on your system.
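Extraction is usually done with tar on the command line, but the same step can be sketched in Python with the standard tarfile module. The helper below is hypothetical; it assumes the release tarball contains a single top-level apache-druid-<version> directory and returns that directory so you can use it as DRUID_HOME.

```python
import tarfile
from pathlib import Path


def extract_druid(tarball: str, dest_dir: str = ".") -> Path:
    """Extract a .tar.gz archive and return the directory it unpacked to."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as tar:
        # Record the top-level entries so the install directory can be reported.
        top_level = {Path(m.name).parts[0] for m in tar.getmembers() if Path(m.name).parts}
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe paths during extraction.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    # A release tarball normally holds exactly one apache-druid-<version> folder.
    if len(top_level) == 1:
        return dest / top_level.pop()
    return dest
```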


Step 5: Set Environment Variables

Set the JAVA_HOME and DRUID_HOME environment variables in your ~/.bashrc file on Linux so that they point to the Java and Druid installation directories, respectively. Then run source ~/.bashrc (or open a new terminal) so the variables take effect.
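The lines you add to ~/.bashrc are plain export statements. The sketch below generates them; the two paths in the usage example are hypothetical, so replace them with your actual Java and Druid directories.

```python
import shlex
from pathlib import Path
from typing import Optional


def bashrc_exports(java_home: str, druid_home: str) -> str:
    """Return export lines for ~/.bashrc; values are shell-quoted in case a path has spaces."""
    return (
        f"export JAVA_HOME={shlex.quote(java_home)}\n"
        f"export DRUID_HOME={shlex.quote(druid_home)}\n"
    )


def append_to_bashrc(block: str, bashrc: Optional[Path] = None) -> None:
    """Append the export block to ~/.bashrc (or to another file, if one is given)."""
    path = Path(bashrc) if bashrc is not None else Path.home() / ".bashrc"
    with path.open("a", encoding="utf-8") as fh:
        fh.write("\n# Apache Druid\n" + block)
```

For example, bashrc_exports("/usr/lib/jvm/java-11-openjdk-amd64", "/opt/apache-druid") produces two export lines that you can paste into ~/.bashrc or pass to append_to_bashrc.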


Step 6: Start Druid

Start Apache Druid by running the “start-micro-quickstart” script (bin/start-micro-quickstart in the Druid directory). This configuration is sized for a machine with 4 CPUs and 16 GB of RAM.

Once Druid has started, open the Druid web console by copying the provided link into your browser (by default, http://localhost:8888).
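Druid takes a little while to come up. The sketch below launches the script and then polls the /status/health endpoint, which answers true once the service is healthy. It assumes the router listens on the default port 8888; the helper names are our own.

```python
import subprocess
import time
import urllib.request
from pathlib import Path


def start_micro_quickstart(druid_home: str) -> subprocess.Popen:
    """Launch bin/start-micro-quickstart from DRUID_HOME in the background."""
    script = Path(druid_home) / "bin" / "start-micro-quickstart"
    return subprocess.Popen([str(script)], cwd=druid_home)


def wait_for_druid(base_url: str = "http://localhost:8888",
                   timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll /status/health until it returns 'true' or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/status/health", timeout=interval) as resp:
                if resp.read().decode().strip() == "true":
                    return True
        except OSError:
            pass  # Not up yet: connection refused, timeout, or an HTTP error.
        time.sleep(interval)
    return False
```

A typical sequence is start_micro_quickstart(os.environ["DRUID_HOME"]) followed by wait_for_druid(); when it returns True, the web console should load.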


Step 7: Load Data

In the Druid web console, navigate to the “load data” section and choose “start a new streaming spec”.


Step 8: Connect to Kafka (here data is consumed from Kafka)

Select Apache Kafka as the data source and then click on Connect data.
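If “Connect data” cannot reach Kafka, first check that the broker port is open from the Druid machine. This is a hypothetical helper that only tests TCP reachability of each host:port. It does not prove the endpoint is a Kafka broker, and Kafka's advertised listeners can still cause problems after the first connection.

```python
import socket
from typing import Dict


def check_bootstrap_servers(bootstrap_servers: str, timeout: float = 3.0) -> Dict[str, bool]:
    """Try a plain TCP connection to each host:port in a comma-separated list."""
    results: Dict[str, bool] = {}
    for entry in (s.strip() for s in bootstrap_servers.split(",")):
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        try:
            # Strip brackets so IPv6 entries like [::1]:9092 also work.
            with socket.create_connection((host.strip("[]"), int(port)), timeout=timeout):
                results[entry] = True
        except (OSError, ValueError):
            results[entry] = False
    return results
```

For example, check_bootstrap_servers("localhost:9092") returns {"localhost:9092": True} when a broker is listening locally on the default Kafka port.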


Step 9: Configuration

Specify the Bootstrap Servers and Kafka Topic details. Click “Apply” and then “Next” to proceed.
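Behind the scenes, the console turns these fields into the ioConfig section of a Kafka supervisor spec. The sketch below builds an equivalent section; the helper name is our own, and the exact spec the console generates may include additional fields.

```python
def kafka_io_config(bootstrap_servers: str, topic: str,
                    use_earliest_offset: bool = True) -> dict:
    """ioConfig section of a Kafka supervisor spec for plain JSON messages."""
    return {
        "type": "kafka",
        # Same value you type into "Bootstrap servers", e.g. "host1:9092,host2:9092".
        "consumerProperties": {"bootstrap.servers": bootstrap_servers},
        "topic": topic,
        "inputFormat": {"type": "json"},
        # True: start from the earliest available offset; False: only new messages.
        "useEarliestOffset": use_earliest_offset,
    }
```

For example, kafka_io_config("localhost:9092", "sensor-readings") describes a hypothetical topic named sensor-readings on a local broker.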


Step 10: Data Parsing

Once a sample of the data loads, check that the parser settings match the data format, which in this case is JSON.

Disable the “Parse Kafka metadata” option and click Apply to view the data in tabular format. Then click Next.
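The “Parse Kafka metadata” toggle decides which inputFormat ends up in the spec. Here is a sketch of the two variants as we understand them; treat the field names as an assumption and compare them against the spec the console shows you.

```python
def kafka_input_format(parse_kafka_metadata: bool) -> dict:
    """Return the inputFormat for JSON messages, with or without Kafka metadata."""
    if not parse_kafka_metadata:
        # Option disabled: only the JSON message value is parsed into columns.
        return {"type": "json"}
    # Option enabled: the "kafka" input format wraps the value format and can
    # also expose the message key, headers, and timestamp as extra columns.
    return {"type": "kafka", "valueFormat": {"type": "json"}}
```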


Step 11: Data transformation

After clicking ‘Next’ a few times, you will reach the data transformation options.

In the data transformation phase, you can transform columns. Here, you will add a new column named “temp_F”. To do this, click “Add column transform”, where you’ll be prompted to enter details such as the column name.

Keep the default type as “expression” and write an expression that calculates the values for the new column.

In this example, the expression converts Celsius to Fahrenheit. Once the expression is defined, the new column is added to the dataset.
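Here is a sketch of the resulting transformSpec plus a reference formula to sanity-check the values in the preview. The source column name temp_C is a hypothetical placeholder for whatever your Celsius field is called. Multiplying by 1.8 rather than writing 9 / 5 keeps the arithmetic floating-point even if the source column is stored as a long.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Reference formula for checking the values Druid computes: F = C * 1.8 + 32."""
    return celsius * 1.8 + 32


def temp_f_transform_spec(source_column: str = "temp_C") -> dict:
    """transformSpec with one expression transform that adds the temp_F column."""
    return {
        "transforms": [
            {
                "type": "expression",
                "name": "temp_F",
                # A double constant avoids integer division on long columns.
                "expression": f"{source_column} * 1.8 + 32",
            }
        ]
    }
```

For example, a row with temp_C = 100 should show temp_F = 212 in the preview.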

Step 12: Data segmentation

Next, select the segmentation criteria, such as the segment granularity (the time span each segment covers), that Druid uses to create data segments.
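One way to reason about segment granularity is to estimate how many rows each time chunk will hold. The sketch below uses a rough rule of thumb that is our own assumption, not an official formula: pick the coarsest granularity whose estimated row count stays at or below about 5 million rows, the default maxRowsPerSegment for Kafka ingestion.

```python
# Candidate granularities, ordered from finest to coarsest, with their length in seconds.
GRANULARITY_SECONDS = {
    "minute": 60,
    "fifteen_minute": 15 * 60,
    "thirty_minute": 30 * 60,
    "hour": 60 * 60,
    "day": 24 * 60 * 60,
}


def suggest_segment_granularity(rows_per_second: float, target_rows: int = 5_000_000) -> str:
    """Return the coarsest granularity whose estimated rows per segment fit target_rows."""
    choice = "minute"  # fall back to the finest option for very high ingest rates
    for name, seconds in GRANULARITY_SECONDS.items():
        if rows_per_second * seconds <= target_rows:
            choice = name
    return choice


def granularity_spec(segment_granularity: str, query_granularity: str = "none",
                     rollup: bool = False) -> dict:
    """granularitySpec section of the dataSchema."""
    return {
        "segmentGranularity": segment_granularity,
        "queryGranularity": query_granularity,
        "rollup": rollup,
    }
```

For example, a stream of about 1,000 rows per second gives roughly 3.6 million rows per hour, so suggest_segment_granularity(1000) returns "hour".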


Step 13: Finalize and Submit

Click ‘Next’ through the remaining screens, then click the ‘Submit’ button.

Once data ingestion is complete, navigate to the “Datasources” tab in the Druid web console to view details of the ingested data source.
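Clicking ‘Submit’ posts a supervisor spec to Druid. If you want to automate this, the sketch below sends a minimal spec to the supervisor endpoint through the quickstart router. All names in example_kafka_spec (topic, datasource, columns) are hypothetical, and the spec your console generates may contain more settings.

```python
import json
import urllib.request

DRUID_URL = "http://localhost:8888"  # assumed default router address of the quickstart


def example_kafka_spec() -> dict:
    """A minimal Kafka supervisor spec; every name in it is a placeholder."""
    return {
        "type": "kafka",
        "spec": {
            "ioConfig": {
                "type": "kafka",
                "consumerProperties": {"bootstrap.servers": "localhost:9092"},
                "topic": "sensor-readings",
                "inputFormat": {"type": "json"},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
            "dataSchema": {
                "dataSource": "sensor_data",
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {
                    "dimensions": [
                        "sensor",
                        {"type": "double", "name": "temp_C"},
                        {"type": "double", "name": "temp_F"},
                    ]
                },
                "transformSpec": {
                    "transforms": [
                        {"type": "expression", "name": "temp_F", "expression": "temp_C * 1.8 + 32"}
                    ]
                },
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
        },
    }


def supervisor_request(spec: dict, base_url: str = DRUID_URL) -> urllib.request.Request:
    """Build the POST request that creates (or updates) a supervisor from a spec."""
    return urllib.request.Request(
        f"{base_url}/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_supervisor(spec: dict, base_url: str = DRUID_URL) -> dict:
    """Send the spec to Druid and return the parsed JSON response."""
    with urllib.request.urlopen(supervisor_request(spec, base_url), timeout=30) as resp:
        return json.loads(resp.read())


def supervisor_status(supervisor_id: str, base_url: str = DRUID_URL) -> dict:
    """Fetch the status report of a running supervisor."""
    url = f"{base_url}/druid/indexer/v1/supervisor/{supervisor_id}/status"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read())
```

For a Kafka supervisor, the ID is normally the datasource name, so supervisor_status("sensor_data") would report on the example above.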


Step 14: Data Exploration

Navigate to the “Query” tab in the Druid web console to explore and query the ingested data.
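The same queries you run in the “Query” tab can be sent to Druid's SQL endpoint. This is a sketch assuming the default router address; the datasource sensor_data and its columns are hypothetical and match the earlier examples only by assumption.

```python
import json
import urllib.request


def sql_request(query: str, base_url: str = "http://localhost:8888") -> urllib.request.Request:
    """Build a POST to the Druid SQL endpoint; resultFormat 'object' returns one dict per row."""
    body = {"query": query, "resultFormat": "object"}
    return urllib.request.Request(
        f"{base_url}/druid/v2/sql",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(query: str, base_url: str = "http://localhost:8888") -> list:
    """Execute the query and return the rows as a list of dicts."""
    with urllib.request.urlopen(sql_request(query, base_url), timeout=60) as resp:
        return json.loads(resp.read())


# Latest five rows of the hypothetical datasource, including the derived temp_F column.
LATEST_ROWS = """
SELECT __time, sensor, temp_C, temp_F
FROM "sensor_data"
ORDER BY __time DESC
LIMIT 5
"""
```

With Druid running, run_sql(LATEST_ROWS) returns the same rows the console would display for that query.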


That’s it!

By following the 14 steps above, you will successfully deploy Apache Druid with Kafka for real-time data ingestion.

As a recap, here are a few important things to keep in mind when installing Druid:

  • Ensure Python and Java are installed.
  • Configure environment variables like DRUID_HOME and JAVA_HOME.
  • Launch Druid with the quickstart configuration that matches your machine's CPU and memory.
  • Choose partitioning and segmentation criteria based on your data volume and velocity to avoid problems such as too many small segments or oversized ones.

In a nutshell, Apache Druid is a powerful tool that helps businesses make better decisions using real-time data. It’s fast, scalable, and flexible, making it ideal for tasks like interactive analytics, operational monitoring, and personalized recommendations. With its ability to handle both historical and real-time data, Apache Druid is transforming how businesses use data to drive success.

Now, it’s time to unleash the power of Apache Druid and unlock the full potential of your data analytics workflows. Feel free to reach out to Nitor Infotech with your thoughts about this blog.

Till then, happy exploring!
