The ETL Build vs. Buy Decision: Finding the Middle Path for Data Integration

The ETL Vendor Surprise: A Wake-Up Call

On March 1, 2025, one of our clients received an unwelcome surprise in their inbox. A major ETL vendor had notified them that their current plan would be retired and that, unless they took action, they would be automatically migrated to a more expensive tier.

The result? Their monthly invoice increased by over 50% overnight. No change in data volume. No additional features used. Just a policy change that dramatically affected their ETL costs.

This scenario raises an immediate question many data teams face: Should they have built their own ETL solution from the start, or is there a better approach to data integration?

ETL Choices Are Not Binary

When faced with unexpected ETL vendor changes, it's tempting to swing to extremes. Either double down on vendor dependency ("That's just the cost of doing business with ETL tools") or reject external tools entirely ("We'll build our own data pipelines from scratch").

But the wisest approach to ETL and data integration often lies somewhere in between.

Rethinking ETL Build vs. Buy as a Spectrum

The traditional "build vs. buy" framing suggests a binary choice for ETL solutions, but modern data integration decisions exist on a spectrum with various hybrid approaches available:

There is a spectrum of choices available.

⦁ Pure Buy/Configured Buy: Fully managed solutions with minimal to moderate customization. We view these as variations of the same approach, as even "pure buy" solutions require some configuration. Within this category, using multiple ETL tools for different connectors based on pricing efficiency is a valid strategy.

⦁ Hybrid Approach: Open-source frameworks with commercial support or specialized tools for specific use cases

⦁ Supported Build: Custom solutions built on managed infrastructure or with key components from vendors

⦁ Pure Build: Completely custom, in-house development from the ground up

⦁ Mixed Strategy: Employing multiple approaches simultaneously for different data sources or use cases. For example, using vendor solutions for standard SaaS connectors while building custom pipelines for high-volume proprietary systems.

The goal isn't to pick one approach for your entire stack, but to strategically position each component along this spectrum based on your specific needs. Many organizations find that a mixed strategy—using different approaches for different data sources—provides the optimal balance of efficiency, control, and cost-effectiveness.

Our Data Warehouse-First Philosophy

As a data warehouse-first company, our approach emphasizes getting data into the warehouse in its raw form first, then performing transformations within the warehouse environment. This philosophy shapes our perspective on the ETL landscape in several important ways.

Illustrative diagram of 205 Data Lab’s Data Warehouse–First approach, where data from tools like Salesforce and others flows into Snowflake, transformations occur centrally, and clean data is then shared back with applications and platforms.

We deliberately avoid using the transformation capabilities of ETL tools, despite vendors often highlighting these features as differentiators. In our experience, transformations are better performed inside the warehouse using SQL and dedicated transformation tools like dbt. This approach provides better visibility, version control, testing capabilities, and keeps business logic in a centralized, accessible location rather than scattered across various ETL pipelines.

By focusing on the "EL" rather than the "T" in ETL, we can evaluate tools more clearly on their core data movement capabilities without being distracted by transformation features we don't intend to use. This allows us to potentially combine what might traditionally be seen as separate "pure buy" and "configured buy" options into a single category, as the configuration we require relates primarily to connection settings rather than complex transformation logic.

This warehouse-first approach also enables us to mix and match different ETL tools based on pricing efficiency for specific connectors. Since we're not deeply embedding business logic in these tools, switching between them for different data sources becomes more feasible, reducing vendor lock-in and optimizing costs.
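For illustration, here is a minimal Python sketch of that split between extraction-and-load and in-warehouse transformation. The connection object, the hypothetical fetch_salesforce_accounts() extractor, the table and column names, the placeholder style, and the SQL dialect are all assumptions for the sketch, not a prescription for any particular warehouse or driver.

```python
# Minimal sketch of an "EL first, transform in the warehouse" flow.
# Connection, extractor, tables, and SQL dialect are hypothetical placeholders.

def load_raw(wh_conn, rows):
    """EL step: land source rows as-is in a raw staging table, no transformation."""
    cur = wh_conn.cursor()
    cur.executemany(
        "INSERT INTO raw.salesforce_accounts (id, name, industry, extracted_at) "
        "VALUES (%s, %s, %s, CURRENT_TIMESTAMP)",
        rows,
    )
    wh_conn.commit()

def transform_in_warehouse(wh_conn):
    """T step: plain SQL executed inside the warehouse.
    In practice this logic would live in a version-controlled dbt model."""
    cur = wh_conn.cursor()
    cur.execute("""
        CREATE OR REPLACE TABLE analytics.dim_accounts AS
        SELECT id            AS account_id,
               INITCAP(name) AS account_name,
               LOWER(industry) AS industry
        FROM raw.salesforce_accounts
    """)
    wh_conn.commit()

# rows = fetch_salesforce_accounts()   # hypothetical extractor
# load_raw(wh_conn, rows)
# transform_in_warehouse(wh_conn)
```

Because the extraction step carries no business logic, the load_raw function (or the vendor connector standing in for it) can be swapped without touching the transformation SQL that lives in the warehouse.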

A Decision Framework for Finding Balance

When should you build, and when should you buy? This question deserves more nuance than the binary choice it appears to present.

Any Hidden Costs of Building Your Own ETL Stack?

Strategic value should be your primary consideration. Commodity functionality that doesn't differentiate your business is typically best purchased. These are the standard data pipelines that every company needs but don't create competitive advantage. In contrast, data integration that directly supports your core business differentiators often benefits from custom building. When your unique business logic creates value that competitors can't easily replicate, the control of a custom solution may justify the investment.

The complexity of your requirements also guides this decision. Standard data movement patterns with predictable transformations are well-served by vendor solutions that have refined these common scenarios. However, when your business requires unique logic, unusual data manipulations, or complex interdependencies between systems, off-the-shelf tools may create more friction than they solve. Custom solutions can be tailored to your specific complexity rather than fighting against the assumptions built into vendor products.

Consider also how frequently your requirements change. Stable business processes with predictable data needs benefit from the reliability of established vendor solutions. Rapidly evolving businesses with frequently changing data sources, transformations, or destinations may find the adaptability of custom solutions more valuable. The more often your requirements shift, the more you'll benefit from the flexibility to change without negotiating with a vendor's roadmap.

Your team's expertise significantly influences the build-vs-buy balance. Organizations with limited engineering resources or data experience will naturally lean toward vendor solutions that provide managed services and support. Companies with strong data engineering teams may find that building plays to their strengths and provides more satisfying work for their teams. However, even technical organizations should consider whether their specialized talent is best deployed building data pipelines or focused on more differentiating work.

Scale considerations become increasingly important as data volumes grow. Small to medium data volumes are efficiently handled by most vendor solutions without prohibitive costs. At massive scale, the economics often shift toward custom solutions that can be optimized for your specific patterns and integrated with your infrastructure. The largest organizations frequently find that building becomes more cost-effective past certain volume thresholds.

Timeline pressure also influences this decision. When immediate results are needed to meet business deadlines, vendor solutions offer faster implementation with pre-built components. Organizations that can invest in longer-term solutions may find that the upfront time cost of building pays dividends through lower ongoing costs and better alignment with business needs.

Integration needs round out this framework. If standard connectors to common systems satisfy your requirements, vendor solutions offer tremendous value through their connector libraries. When your business relies on complex, custom integrations to proprietary systems or requires unusual data transformations between sources and destinations, custom solutions can be designed specifically for these unique integration challenges.

Building Resilience Through Strategic Flexibility

The key to resilience in your data stack isn't avoiding vendors or committing to them entirely—it's maintaining the flexibility to adjust your approach as circumstances change.

Building this flexibility begins with modularity. Design your data stack so components can be replaced independently without disrupting the entire system. This means creating clear boundaries between different parts of your pipeline and avoiding monolithic architectures that entangle multiple functions.

Equally important are abstraction layers between critical systems. These interfaces act as buffers that reduce switching costs when you need to replace a component. For example, a well-designed data access layer can shield your applications from changes in underlying warehouse structure or ETL processes.
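As a concrete, deliberately simplified illustration, the Python sketch below shows one way such an abstraction layer might look. The Extractor interface, the vendor SDK call, and the method names are all hypothetical; the point is that pipelines depend on the interface, not on any particular vendor.

```python
# Hypothetical abstraction layer between pipelines and their data movers.
from typing import Iterable, Protocol

class Extractor(Protocol):
    def extract(self, source: str, since: str) -> Iterable[dict]:
        """Return records from `source` changed after timestamp `since`."""
        ...

class VendorExtractor:
    """Adapter around a hypothetical vendor SDK; vendor specifics stay in here."""
    def __init__(self, client):
        self._client = client

    def extract(self, source: str, since: str) -> Iterable[dict]:
        return self._client.sync(source, updated_after=since)  # hypothetical SDK call

class InHouseExtractor:
    """A custom pipeline can implement the same interface."""
    def extract(self, source: str, since: str) -> Iterable[dict]:
        yield from ()  # placeholder: query internal APIs or databases here

def run_pipeline(extractor: Extractor, source: str, since: str) -> int:
    """Pipelines only see the interface, never the vendor."""
    rows = list(extractor.extract(source, since))
    # ...load `rows` into the warehouse...
    return len(rows)
```

Replacing the vendor then means writing one new adapter rather than rewriting every pipeline that consumes its data.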

Continuous skill investment in your team is often overlooked but vital. Even when using vendor solutions, maintain enough in-house expertise to understand how they work and how you might replace them if necessary. This knowledge prevents complete dependency and keeps options open when vendor changes occur.

Vendor diversification provides additional protection against disruption. Relying on a single provider for all your data needs concentrates risk—when they change pricing or deprecate features, your entire pipeline is affected. Spreading critical functions across multiple vendors or mixing vendor tools with custom components creates a more resilient architecture.

Finally, make regular evaluation a discipline. The build-vs-buy balance isn't set once and forgotten. As your organization grows, data volumes increase, and business needs evolve, the optimal position on the spectrum will shift. Schedule periodic reviews of each component to assess whether its current implementation still makes sense for your evolving needs.

Not All ETL Tools Are the Same

If you prefer the "buy" option, keep in mind that there are significant differences between ETL tools that go well beyond feature checklists. These differences can dramatically impact your long-term success and satisfaction.

The distinction between ETL and reverse ETL functionality is fundamental yet often overlooked. Traditional ETL tools excel at moving data into your warehouse, creating a central repository for analysis. Reverse ETL platforms take the opposite approach, activating your warehouse data by pushing it to operational systems where it drives business processes. Few vendors truly excel at both directions. Your organization likely needs bidirectional data movement, so consider whether you're better served by one versatile tool or by combining specialized solutions for each direction.

Pricing structures vary dramatically across the ETL landscape, and these differences become more consequential as you scale. Some vendors charge based on the number of connectors, others on computing resources consumed, and increasingly common are consumption-based models like Monthly Active Rows (MAR). What begins as an affordable solution can quickly become cost-prohibitive as your data volumes grow. The most problematic pricing models penalize success—the more value you extract from your data, the more expensive the tool becomes. Look for transparent pricing that aligns costs with the value you're receiving.
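To see how quickly consumption-based pricing can track growth, here is a back-of-the-envelope sketch. The rate and row volumes below are entirely hypothetical; substitute your vendor's actual price sheet and your own growth projections.

```python
# Hypothetical linear MAR pricing: a flat rate per million monthly active rows.
def monthly_cost(active_rows: int, rate_per_million: float = 500.0) -> float:
    return active_rows / 1_000_000 * rate_per_million

for rows in (2_000_000, 10_000_000, 50_000_000):
    print(f"{rows:>12,} active rows -> ${monthly_cost(rows):,.0f}/month")
```

Even under this simple linear model, a 25x growth in active rows means a 25x invoice; tiered or connector-based models change the curve but not the underlying dynamic.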

Connector availability often features prominently in marketing materials, with vendors showcasing impressive libraries of pre-built integrations. However, the depth and reliability of those connectors matter far more than their quantity. A tool advertising 200+ connectors offers little value if the specific ones you need lack critical functionality or require constant maintenance. Evaluate connectors based on their depth of field mapping options, handling of incremental syncs, and ability to manage the specific API requirements of your most important data sources.

Perhaps the most overlooked factor when selecting an ETL tool is the potential for lock-in and the associated switching costs. Today's convenient solution can become tomorrow's expensive constraint. Assess factors like proprietary formats, the ability to export configurations, and whether the tool creates dependencies that would be difficult to replicate elsewhere. The best vendors provide clear migration paths both into and out of their platforms, demonstrating confidence in their value proposition rather than relying on high switching costs to retain customers.

Locked in?

Architectural differences between ETL tools reflect their origins and intended use cases. Some platforms are optimized for high-volume, batch-oriented workflows, while others excel at low-latency, real-time data movement. Some are truly cloud-native, designed from the ground up for distributed processing, while others are adaptations of on-premise solutions with the limitations that entails. These architectural choices may not be immediately apparent but will significantly impact performance as your data needs grow.

As data regulations tighten globally, governance capabilities have shifted from nice-to-have features to essential requirements. ETL tools vary dramatically in how they handle data governance, lineage tracking, and compliance documentation. Some provide comprehensive audit trails that track every transformation and access point, while others leave these concerns largely to the user. For organizations in regulated industries, these differences can determine whether a tool is viable regardless of its other capabilities.

The ETL tool that best serves your organization depends not just on features and cost, but on alignment with your specific data challenges, team capabilities, and growth trajectory. Take the time to thoroughly evaluate options against your unique requirements rather than following market trends or defaulting to the most popular option.

Conclusion: Finding Your ETL Middle Path

Every organization's ideal position on the ETL build-vs-buy spectrum will be different, influenced by team capabilities, business priorities, and growth stage.

The key is to approach data integration decisions thoughtfully rather than reactively. When an ETL vendor surprises you with changes, use it as an opportunity to reassess your position on the spectrum—not to abandon strategic thinking altogether.

By embracing the middle path for your ETL strategy, you can build a data integration approach that combines the best of vendor efficiency and custom flexibility, positioning your organization for sustainable data management no matter what surprises vendors might have in store.

Essential ETL Requirements to Address If You Are Set on Building Your Own ETL Tool:

⦁ Incremental Data Syncs (Change Data Capture - CDC): Syncing only new or changed records from operational databases or SaaS tools. Challenge: Requires custom logic to track changes, avoid duplication, and ensure data integrity (a minimal high-water-mark sketch appears after this list).

⦁ Implementing the Correct Sync Type: Insert, Update, Upsert, Delete: Keeping destination tables aligned with source systems, even when deletions or partial updates occur. Challenge: Hard to manage without built-in support for record identity and merge strategies.

⦁ Handling API Rate Limits and Pagination: Pulling large volumes of data from APIs like Salesforce or HubSpot. Challenge: Requires manual implementation of throttling, backoff, and pagination handling (see the second sketch after this list).

⦁ Automatic Schema Change Handling: Adapting to new fields in source systems without breaking the sync process. Challenge: Manual pipelines often fail silently or require rework when schemas evolve.

⦁ Job Scheduling and Orchestration: Running sync jobs in a logical, dependency-aware sequence. Challenge: Cron jobs lack visibility, dependency control, and retry support.

⦁ Error Handling, Retries, and Alerting: Automatically retrying transient failures and notifying stakeholders on persistent issues. Challenge: DIY solutions rarely include robust error management or alerting.

⦁ Data Volume and Load Management: Syncing large datasets without overwhelming source systems. Challenge: Requires partitioning, batching, and load-balancing logic that's hard to build and maintain.

⦁ Secure Credential and Connection Management: Storing and rotating secrets securely across environments. Challenge: Custom approaches often pose compliance or security risks.

⦁ Audit Logging and Sync History: Tracking sync history for visibility and debugging. Challenge: Manual logs are often incomplete or siloed.

⦁ A Resource on Standby: Keeping someone available to respond quickly when errors and bugs surface. Challenge: Requires high-availability coverage, ideally in the same time zone.
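To give a feel for the first two items above, here is a minimal Python sketch of an incremental sync driven by a high-water mark, finished with a warehouse-side MERGE so updates overwrite and new records insert. The connections, table names, placeholder style, and MERGE dialect are hypothetical, and real CDC also has to handle deletes, late-arriving updates, and clock skew, which this sketch ignores.

```python
# Hypothetical incremental sync: high-water mark + MERGE upsert.
def incremental_sync(source_conn, wh_conn):
    wh_cur = wh_conn.cursor()

    # 1. Find the newest record already loaded into the warehouse.
    wh_cur.execute(
        "SELECT COALESCE(MAX(updated_at), '1970-01-01') FROM analytics.orders"
    )
    high_water_mark = wh_cur.fetchone()[0]

    # 2. Pull only rows changed since that mark from the source system.
    src_cur = source_conn.cursor()
    src_cur.execute(
        "SELECT id, status, amount, updated_at FROM orders WHERE updated_at > %s",
        (high_water_mark,),
    )
    changed = src_cur.fetchall()
    if not changed:
        return 0

    # 3. Stage the changed rows, then MERGE so existing rows update and new rows insert.
    wh_cur.execute("TRUNCATE TABLE staging.orders")
    wh_cur.executemany(
        "INSERT INTO staging.orders (id, status, amount, updated_at) "
        "VALUES (%s, %s, %s, %s)",
        changed,
    )
    wh_cur.execute("""
        MERGE INTO analytics.orders AS t
        USING staging.orders AS s ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET
            status = s.status, amount = s.amount, updated_at = s.updated_at
        WHEN NOT MATCHED THEN INSERT (id, status, amount, updated_at)
            VALUES (s.id, s.status, s.amount, s.updated_at)
    """)
    wh_conn.commit()
    return len(changed)
```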
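And for the API-facing items (rate limits, pagination, retries), here is a similarly simplified sketch built on the requests library. The endpoint, parameter names, and response shape are hypothetical; every SaaS API has its own pagination scheme and rate-limit signals.

```python
# Hypothetical paginated extraction with rate-limit backoff and bounded retries.
import time
import requests

def fetch_all(url, api_key, page_size=100, max_retries=5):
    records = []
    params = {"limit": page_size, "offset": 0}
    headers = {"Authorization": f"Bearer {api_key}"}

    while True:
        for attempt in range(max_retries):
            resp = requests.get(url, headers=headers, params=params, timeout=30)
            if resp.status_code == 429:          # rate limited: back off and retry
                time.sleep(2 ** attempt)
                continue
            resp.raise_for_status()              # surface persistent errors loudly
            break
        else:
            raise RuntimeError(f"Rate limited {max_retries} times in a row; giving up")

        page = resp.json().get("results", [])
        records.extend(page)
        if len(page) < page_size:                # short page means we've reached the end
            return records
        params["offset"] += page_size
```

Sketches like these cover the happy path only; the remaining items on the list (orchestration, alerting, credential management, audit logging, on-call coverage) are where most of the long-term maintenance cost of a homegrown tool accumulates.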

