A Data Quality Framework for Your DWH

Author

Hasan Caliskan

Managing Partner

How can we ensure data quality in a structured manner?

Conclusion

How can we ensure data quality in a structured manner?

Data Warehouse projects have three layers with clear differences in requirements:

Data ingestion
Data transformation
Data delivery.

This three-layered approach helps us categorize the data quality actions and prioritize them. Let's look at them one by one.

If you are using a different data platform that does not offer secure data sharing, you can explore the possibility of the mobile app data vendor pushing the data into your cloud infrastructure from where it can be easily ingested into your data warehouse.

A Framework for data quality

‍

Step 1: Ingestion

Ingestion is the point of data entry to the data warehouse. Different sources make the data available in various ways, such as transactional databases, transferred files, API calls, in-cloud data shares, etc.

The data that arrives in the data warehouse environment is source data. We have no control over the creation of the source data; it may be external or internal, but still from outside the data warehouse. Therefore, regardless of the delivery method or where the data is coming from, there are standard steps to ensure quality at data ingestion.

In this layer, we inspect the source data to ensure it is fresh, complete, and as expected. We can check expectations around shape, uniqueness, and size. We can also check content such as permitted values, means, averages, and distribution of measures.

Treating the source data as “immutable” and keeping it intact is critical. We treat the source data almost like the items in an evidence room. We need to be able to say: "This is how we received the data." We don’t transform the data at this stage; we keep it intact. This approach has multiple advantages. For example, the original data will be readily available in data transformation updates or bug fixes. Another example is uncovering issues with the source data. With the source data intact, it is easier to trace the issue to the source, communicate with the data owners, and resolve the source data issue.

The data quality activities in the ingestion step ensure the data quality going into the Data Warehouse.

Step 2: Transformation

After receiving the source data and checking it against our expectations, we build the data warehouse. Every view and table of our data warehouse is built afresh with the source data and our code that transforms it. Our data warehouse project (code) performs this step. With the 205 Data Lab approach (and most commonly in the modern data stack), this step is performed primarily by dbt.

The transformation step is quite different from the ingestion step. Unlike the source data, where we have no control over what was received, we control the transformation here. We run the transformation as code.

Since the transformation is code-based, we expect it to run the same way every single time. Once we adequately test that the code performing the data transformation works as expected, we deploy the code and do not expect errors of the sort we tested for. Occasionally we may encounter (if we are good, rarely) scenarios that we have yet to account for. We will change the code, test, and deploy in those cases. And we will not expect the same error again. The individual models (tables and views) will be accurate if the code is “tested and deployed.”

Therefore, the approach in this step is to have controlled deployment processes and practices and catch any bugs and errors early on, before deployment.

Step 3: Delivery

At the beginning of the transformation, there are the source models that we discussed above. At the end of the transformation, models will be consumed by other platforms, usually called "mart" models.

We will expose these models to reporting and visualization tools such as Looker (or Tableau, Mode, Sigma, or another). Alternatively, they could be used by machine learning models, operational analytics tools, or other applications that consume the transformed data.

For these models, similar to the source models, there are quality expectations at the destination. The requirements from the Data warehouse are similar: Operational and semantic checks on the data and ensuring what is promised is what is actually delivered.

We have outlined the three stages of the Data Warehouse with different data quality requirements: Ingestion, transformation, and delivery. Each step requires a different approach and a different set of tools. Below is our recommended checklist of data quality activities in each layer: ingestion, transformation, and delivery.

Our recommended checklist:

Please see below our data quality recommended checklist aligned with the three steps in the data warehouse projects.

Ingestion	Transformation	Delivery
Staging Models for All Entities: Stage models with a unique_key tested for “not null” and “unique.	Dbt Style Best Practices Code Style (facilitated with a linter tool)	Standard Delivery Tests: Correctness of shape
Standard Ingestion Tests: No nulls for required columns Correctness of data types Recency and Freshness tests Correctness of shape (expected columns are there)	Standard Transformation Tests: Unique key for every model Not null and unique tests for every unique key Column value assertions: relationships, accepted value for booleans, and category variables, not null tests (where required)	Anomaly Checks: Volume of data, distribution of KPIs (e.g., mean, average, standard deviance)
Snapshotted Source Data	Quality Control for new and existing models	Blue / Green Deployment
Anomaly Checks: Volume of data Distribution of KPIs (e.g., mean, average, standard deviance)	CI/CD guardrails: Version control PR template, PR checks before deployment (“Slim CI”)	Version-controlled BI tools’ best practices
Permission Management for differentiated access to raw, transformed, and mart data
Monitoring the transformation project
Alerts integrated with (email, slack)
Documentation

We are also including some of the visualization elements in the scope due to the significant interdependencies

To be continued:

There are an increasing number of tools available to improve data quality. We intend to capture our most up-to-date assessment based on our experience in this blog. With additional blog pieces to follow, we’ll continue expanding on these three stages by providing examples of different tools, best practices, and practical checklists we use.

Stay tuned!

Conclusion

By leveraging bulk data from providers like data.ai, B2B companies selling to mobile app publishers and developers can optimize their sales efforts and achieve scalable results. Implementing a data-driven approach, integrating it with CRM systems, and continuously refining the strategy based on feedback will ensure that your sales team targets high-potential apps and publishers, ultimately leading to increased revenue and success. By moving away from manual and inefficient processes, you'll be able to achieve greater consistency and effectiveness in your sales efforts.

Are you interested in exploring this approach for your company and need implementation help? With over five years of experience working with mobile app data, our team can help you implement this type of solution, even if you don't have a data warehouse or access to internal data resources. Start a conversation with us today, and let us help your team sell more effectively and efficiently!

2 min

How to structure data quality actions in your cloud data warehouse?

Author

Hasan Caliskan

Table of contents

How can we ensure data quality in a structured manner?

A Framework for data quality

Step 1: Ingestion

Step 2: Transformation

Step 3: Delivery

Our recommended checklist:

To be continued:

Conclusion

How can we ensure data quality in a structured manner?

A Framework for data quality

Step 1: Ingestion

Step 2: Transformation

Step 3: Delivery

Our recommended checklist:

To be continued:

Stay tuned!

Conclusion

Building Trustworthy Artificial Intelligence: Anjali Bansal on the Critical Role of Governance

Data With a Human Touch: How Analytics Is Powering Every Layer of Modern Event Management

Bridging Strategy and Execution: CRM Insights from Both Sides of the Table with Ryan McGuire

How NYC Uses Data to Shape Fairer Laws

The ETL Build vs. Buy Decision: Finding the Middle Path for Data Integration

What AI Can’t Fix: Lessons from a Cross-Industry Data Leader

Scaling Data Science and Experimentation: Lessons from Shuling Ding’s Journey at Tubi

Bridging R&D and Data Science – A Conversation with Maxwell Dylla

Data Leaders Uncovered: Tackling Pharmaceutical Data Challenges with Prem K. Narasimhan

Data Leaders Uncovered: Data-Driven Excellence in Retail

Introducing "Data Leaders Uncovered"

Why Your CRM Is No Longer Your Single Source of Truth

The Fractional Shift: The Future of Data Teams

Unlocking Your GTM Potential: Why Investing in a Data Warehouse is the Next Step

Blue-Green Deployment: Say Goodbye to Disruptions In Your Data Warehouse

How to Leverage Bulk Mobile App Data for B2B Sales Success

AI-Powered Periodical Management Reports :

Don’t Miss Out On Future Articles

2 min

How to structure data quality actions in your cloud data warehouse?

Author

Hasan Caliskan

Table of contents

How can we ensure data quality in a structured manner?

A Framework for data quality

Step 1: Ingestion

Step 2: Transformation

Step 3: Delivery

Our recommended checklist:

To be continued:

Conclusion

How can we ensure data quality in a structured manner?

A Framework for data quality

Step 1: Ingestion

Step 2: Transformation

Step 3: Delivery

Our recommended checklist:

To be continued:

Stay tuned!

Conclusion

Related Posts

Building Trustworthy Artificial Intelligence: Anjali Bansal on the Critical Role of Governance

Data With a Human Touch: How Analytics Is Powering Every Layer of Modern Event Management

Bridging Strategy and Execution: CRM Insights from Both Sides of the Table with Ryan McGuire

How NYC Uses Data to Shape Fairer Laws

The ETL Build vs. Buy Decision: Finding the Middle Path for Data Integration

What AI Can’t Fix: Lessons from a Cross-Industry Data Leader

Scaling Data Science and Experimentation: Lessons from Shuling Ding’s Journey at Tubi

Bridging R&D and Data Science – A Conversation with Maxwell Dylla

Data Leaders Uncovered: Tackling Pharmaceutical Data Challenges with Prem K. Narasimhan

Data Leaders Uncovered: Data-Driven Excellence in Retail

Introducing "Data Leaders Uncovered"

Why Your CRM Is No Longer Your Single Source of Truth

The Fractional Shift: The Future of Data Teams

Unlocking Your GTM Potential: Why Investing in a Data Warehouse is the Next Step

Blue-Green Deployment: Say Goodbye to Disruptions In Your Data Warehouse

How to Leverage Bulk Mobile App Data for B2B Sales Success

AI-Powered Periodical Management Reports :

Don’t Miss Out On Future Articles