The myth of the single source of truth
Why chasing one golden dataset wastes time, creates frustration, and distracts from the real goal: trusted and usable data.
We need a single source of truth. Simple.
One place where everyone can find everything they need, with the same numbers and the same definitions. A golden dataset that eliminates confusion and powers every decision with confidence, from dashboards to ML models.
No more debates about whether revenue has grown by 3% according to Finance or 5% according to Marketing. No more analysts rebuilding the same queries in slightly different ways. No more arguing about which dashboard or system to trust. In theory, a single source of truth solves everything.
Unfortunately, the uncomfortable reality is that the “single source of truth” rarely works the way we think. After building and rebuilding data warehouses, refining semantic layers, and curating “perfect” data marts on top of other data marts that, for some reason, were deemed not good enough, people still argue about the numbers, tables, and dashboards. Worse, teams waste considerable time debating which “version” should be the source of truth.
Unlike physics or mathematics, the truth in data depends on the context in which we view it. Chasing the illusion of a universal dataset that everyone can use for any initiative distracts from the right objectives: ensuring that the right people have the correct data they can actually use.
Why the “single source” rarely works
The idea of a single, canonical dataset often feels like the answer to fragmented data sources. But in practice, it frequently leads to frustration, endless rebuilds, and data quality issues.
Business definitions are not universal
For example, what does “active customer” mean? Finance, product, and marketing will each give you a different answer. It is not as simple as checking whether the “active” flag is true in the database! Trying to force a rigid definition creates friction and ignores the context in which each team operates.
And even if you agree on a single definition today, the business will evolve, and so will the context. A year later, the definition no longer applies. But since nobody dares to change the carefully crafted “single source”, teams create their own side pipelines and datasets. Back to chaos.
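To make the conflict concrete, here is a minimal sketch of how three teams might reasonably define “active customer” for the same record. All field names, thresholds, and functions are illustrative assumptions, not a real schema:

```python
from datetime import date, timedelta

# Hypothetical customer record; field names and values are illustrative only.
customer = {
    "last_login": date.today() - timedelta(days=10),
    "last_purchase": date.today() - timedelta(days=200),
    "has_open_subscription": True,
}

def is_active_finance(c):
    # Finance: anyone with a billable subscription.
    return c["has_open_subscription"]

def is_active_product(c):
    # Product: anyone who logged in within the last 30 days.
    return (date.today() - c["last_login"]).days <= 30

def is_active_marketing(c):
    # Marketing: anyone who purchased within the last 90 days.
    return (date.today() - c["last_purchase"]).days <= 90

# Same customer, three defensible answers.
print(is_active_finance(customer))    # True
print(is_active_product(customer))    # True
print(is_active_marketing(customer))  # False
```

None of these definitions is wrong; each is correct for the question its team is asking. That is precisely why forcing one of them on everyone creates friction.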
Tools and layers multiply
Engineering teams also contribute to the complexity of this challenge. From raw data to transformation pipelines, semantic layers, BI dashboards, and AI models, there are several places where numbers are defined and redefined. Even if you manage to build a golden dataset, by the time the numbers reach the dashboard, the “truth” has been reshaped three times. And because delivery tends to be slow, backlogs grow, and business teams that can’t wait fall back on spreadsheets that are out of control.
To summarize, building a single source of truth requires significant effort, often for minimal payoff, and leaves you perpetually one step behind the business.
The real goal: usable and trusted data
The goal of “single source of truth” initiatives is to create trust in the data people use to make decisions. And trust doesn’t necessarily come from centralization. It comes from usability, clarity, and alignment with business needs.
Empower domain ownership
Instead of centralizing everything, let domain teams own their slice of the truth. Of course, give them standards, tooling, and governance to make it usable across the organization. Otherwise, you will end up with another type of chaos, and you will be back thinking that centralizing everything is the solution. A federated model often scales better than a single, monolithic dataset.
And if domains want a different definition of “active customer”, that is acceptable. Just document it. Lineage, metadata, and plain-language documentation often do more for trust than another round of centralization. What matters is that everyone knows which number they are using and why.
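One lightweight way to make “document it” actionable is a metric registry that records each domain’s definition, owner, and source alongside the number itself. This is a sketch under assumed names (the tables, owners, and registry shape are all hypothetical), not a prescription for any particular catalog tool:

```python
# Hypothetical metric registry: each domain documents its own definition
# instead of fighting over a single canonical one.
METRIC_DEFINITIONS = {
    ("finance", "active_customer"): {
        "definition": "Customer with a billable subscription this month.",
        "owner": "finance-data@company.example",      # illustrative contact
        "source_table": "finance.subscriptions",      # illustrative table name
    },
    ("marketing", "active_customer"): {
        "definition": "Customer with a purchase in the last 90 days.",
        "owner": "marketing-data@company.example",
        "source_table": "marketing.orders",
    },
}

def describe(domain: str, metric: str) -> str:
    """Return the documented definition so consumers know which number they use."""
    entry = METRIC_DEFINITIONS[(domain, metric)]
    return f"{domain}/{metric}: {entry['definition']} (owner: {entry['owner']})"

print(describe("finance", "active_customer"))
```

The point is not the data structure; it is that the divergence is written down, owned, and discoverable, so two teams quoting different numbers can see why in seconds.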
Data as a product
Documentation is the most common weakness I have seen over the years. Teams tend to deliver datasets and tables without guidance, so users misinterpret the data and trust erodes. And no, adding a one-sentence description to a table is not documentation!
Treat datasets the same way you’d treat a customer-facing product. Who’s using them? What’s confusing? What’s missing? Collect feedback, and iterate. Think usability before elegance: is it discoverable, queryable, and self-explanatory for the people who need it? A dataset that 80% of your teams actually use, even if imperfect, creates more value than the perfect data mart that never gets adopted.
To summarize, let’s not forget the ultimate goal of providing data to the business: users need answers to their questions, grounded in their specific context. Building a fancy golden dataset is an engineering ambition, not a business requirement.
Conclusion
The pursuit of a single source of truth often means centralization and frequently creates frustration for both engineers and business users. In practice, there will always be multiple truths, shaped by context, time, and business perspective. Trusted data is about context and adoption.
When organizations shift the goal from one version of the data to trusted data products, they unlock far more value. Engineers can stop chasing migrations, analysts can stop debating whose numbers are “right,” and business stakeholders can focus on making decisions.
Ask yourself:
Do your teams (both engineering and business) understand the definitions behind the data?
Do they trust what they are using to make decisions? And if not, why?
Are you continuously fighting the same data quality battles?
The challenge you are facing may not be building a single dataset, but making your data usable. And very often, what’s missing is documentation and a data-product approach. So the question isn’t whether you can centralize truth, but whether your teams can trust and act on the data they have.
Are you building for control, or for confidence?