Just in case the term is new to you, ETL comes from data warehousing and stands for Extract-Transform-Load. It covers the process of extracting data from multiple source systems, transforming it, and loading it into the data warehouse.
In this blog, I’d like to share what I know about:
Let’s start!
ETL testing ensures that data extracted from multiple systems is complete, transformed as per business rules, and loaded accurately into the data warehouse.
Let’s now turn to the approach.
ETL testing mainly focuses on the quality of the data that gets loaded from the source into multiple data layers. The flow looks like this:
Source -> Staging -> ODS -> Data Mart
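As a minimal sketch of what checking data across these layers can look like, the query below reconciles row counts for a single load date; the schema, table, and column names (stg.Orders, ods.Orders, mart.FactOrders, LoadDate) are hypothetical.

```sql
-- Hypothetical example: reconcile row counts across layers for one load date.
SELECT 'Staging' AS layer, COUNT(*) AS row_count
FROM stg.Orders
WHERE LoadDate = '2024-01-31'
UNION ALL
SELECT 'ODS', COUNT(*)
FROM ods.Orders
WHERE LoadDate = '2024-01-31'
UNION ALL
SELECT 'Data Mart', COUNT(*)
FROM mart.FactOrders
WHERE LoadDate = '2024-01-31';
-- The three counts should match, or differ only by rows rejected per business rules.
```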
Here are the main categories of ETL testing:
1. Metadata testing
Ensure that the table names, column names, data types, data lengths, constraints, indexes, and relationships of all source and target tables are as per the mapping document.
Example: In MS SQL, we can verify this by querying INFORMATION_SCHEMA views such as TABLES, COLUMNS, and TABLE_CONSTRAINTS.
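A minimal sketch of such a check is shown below; the target schema and table name (dbo.DimCustomer) are assumptions, and the output would be compared against the mapping document.

```sql
-- Hypothetical example: pull column metadata for an assumed target table (dbo.DimCustomer)
-- and compare it against the mapping document.
SELECT c.TABLE_NAME,
       c.COLUMN_NAME,
       c.DATA_TYPE,
       c.CHARACTER_MAXIMUM_LENGTH,
       c.IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.TABLE_SCHEMA = 'dbo'
  AND c.TABLE_NAME = 'DimCustomer'
ORDER BY c.ORDINAL_POSITION;
```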
2. Integrity testing
Ensure that the relationship between parent and child tables is maintained as expected after every data refresh, i.e., every foreign key value in the child table maps to a primary key in the parent table.
Example: Check the count of unmatched (orphaned) rows in the foreign key table by running queries on the source and target data.
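A minimal sketch of an orphan-row check follows; the fact and dimension table names and the CustomerKey column are assumptions.

```sql
-- Hypothetical example: count child rows whose foreign key has no matching parent row.
SELECT COUNT(*) AS orphan_rows
FROM dbo.FactSales AS f
LEFT JOIN dbo.DimCustomer AS c
       ON f.CustomerKey = c.CustomerKey
WHERE c.CustomerKey IS NULL;
-- Expected result after every data refresh: 0
```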
3. Data comparison testing
Design SQL queries to compare data from source to target tables. As the source data is present in heterogeneous tables or CSV files, we need to design an ETL package that fetches all source data and loads it as-is into combined source tables. These can then be compared effectively with the target tables.
For a large amount of data, we can compare unique or NULL values in key columns and use aggregate functions like MIN, MAX, and AVG to compare aggregation logic.
Example: We can verify this by using EXCEPT queries. In the case of a huge amount of data, we can focus the comparison on the most important columns.
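A minimal sketch of both techniques is below; the table and column names (src.CombinedOrders, mart.FactOrders, CustomerID, OrderID, OrderAmount) are assumptions.

```sql
-- Hypothetical example: rows present in the combined source but missing or different in the target.
SELECT CustomerID, OrderID, OrderAmount FROM src.CombinedOrders
EXCEPT
SELECT CustomerID, OrderID, OrderAmount FROM mart.FactOrders;

-- And the reverse direction: rows in the target that are not in the source.
SELECT CustomerID, OrderID, OrderAmount FROM mart.FactOrders
EXCEPT
SELECT CustomerID, OrderID, OrderAmount FROM src.CombinedOrders;

-- For very large volumes, compare aggregates on key columns instead of full rows;
-- run the same query against mart.FactOrders and compare the two result sets.
SELECT COUNT(*)                   AS row_count,
       COUNT(DISTINCT CustomerID) AS distinct_customers,
       MIN(OrderAmount)           AS min_amount,
       MAX(OrderAmount)           AS max_amount,
       AVG(OrderAmount)           AS avg_amount
FROM src.CombinedOrders;
```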
4. Data transformation testing
Ensure that the data loaded into the target system is as per the business transformation rules. For each transformation rule, develop SQL queries on the source tables and compare the results with the data mart measures. For effective test coverage, we can add permutations and combinations of source data to cover all scenarios.
Example: Verify business rules/data validation rules by developing SQL queries, as in the sketch below.
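This is a minimal sketch under an assumed rule (net amount = gross amount minus discount, aggregated per customer); the tables and columns (src.CombinedOrders, mart.CustomerSales, GrossAmount, DiscountAmount, NetAmount) are hypothetical.

```sql
-- Hypothetical example: recompute a measure from the source per the assumed rule
-- and flag any customer whose data mart value does not match.
SELECT s.CustomerID,
       SUM(s.GrossAmount - s.DiscountAmount) AS expected_net_amount,
       m.NetAmount                           AS actual_net_amount
FROM src.CombinedOrders AS s
JOIN mart.CustomerSales AS m
  ON m.CustomerID = s.CustomerID
GROUP BY s.CustomerID, m.NetAmount
HAVING SUM(s.GrossAmount - s.DiscountAmount) <> m.NetAmount;
-- Expected result: no rows; every row returned is a potential transformation defect.
```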
5. Data quality testing
Ensure that only the expected data is loaded into the target tables, and that unwanted data is eliminated or replaced with valid data as per the business requirements.
Example: NULL key values, duplicate records in the source data, and invalid data types in fields (e.g., alphabetic characters in a decimal field).
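The checks below are a minimal sketch of those three cases; the table and column names (src.CombinedOrders, CustomerID, OrderID, OrderAmountRaw) are assumptions.

```sql
-- Hypothetical data quality checks.

-- 1. NULL key values in the source.
SELECT COUNT(*) AS null_keys
FROM src.CombinedOrders
WHERE CustomerID IS NULL;

-- 2. Duplicate records on the business key.
SELECT CustomerID, OrderID, COUNT(*) AS duplicate_count
FROM src.CombinedOrders
GROUP BY CustomerID, OrderID
HAVING COUNT(*) > 1;

-- 3. Invalid data types, e.g. values that cannot be cast to DECIMAL.
SELECT OrderAmountRaw
FROM src.CombinedOrders
WHERE OrderAmountRaw IS NOT NULL
  AND TRY_CAST(OrderAmountRaw AS DECIMAL(18, 2)) IS NULL;
```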
Now that you know about the types of ETL testing, let’s examine its methodologies.
The goal of unit testing is to verify mapping fields and business rules at an early stage of the development cycle, in the feed or staging layer.
Test Scenarios:
The goal of integration testing is to ensure that end-to-end workflows or the ETL package execute and produce the expected outcomes.
Test Scenarios:
The goal of system testing is to ensure that all transformation logic is implemented as per the business requirements. It focuses mainly on data validation.
Test Scenarios:
If you would like to read about more methodologies, read our blog.
Allow me to tell you about bugs in ETL testing.
Here is a list of possible bugs you may come across:
Bugs like these can be avoided if best practices are followed. Let’s take a look at some…
Let’s now look at key challenges in ETL testing.
Here are some of the major obstacles:
Now it’s time to get acquainted with tools used in ETL testing.
It is an automated ETL testing tool that leverages analytics for data validation: a smart data testing solution that automates data validation and ETL testing.
It is an analyst-oriented tool. As soon as data is loaded, it highlights duplicate and NULL values.
iCEDQ is an ETL testing platform designed to identify data issues in and across structured and semi-structured data. It automates end-to-end ETL testing with complete accuracy and increased coverage.
Well, there you have it! Allow me to wrap up my ideas…
ETL testing is about ensuring the accuracy, correctness, and quality of the data behind BI reports, which helps end users make accurate decisions at the right time. An end-to-end testing process plays a crucial role in ensuring that extracted data is loaded into the data warehouse as per the expected business transformation rules.
Reach out to us with your thoughts about this blog. Visit us at Nitor Infotech to learn about our quality engineering offerings.
Subscribe to our fortnightly newsletter!