Most companies treat the proverb "data is the new oil" as gospel. They try to capture and store as much data as possible in hopes of achieving better business outcomes. But if adding more data doesn't result in better decisions, you should switch to a knowledge-first approach—one that provides useful data with context to solve real-world problems.
Our increasing ease of storing and analyzing data has led businesses to stockpile data without knowing how to use it, and the repercussions have been evident. More data does not guarantee better business decisions. Too much data often leads to “paralysis by analysis”—the inability to make decisions because we’re too busy analyzing everything. It also makes meaningful comparisons between options difficult, leading to poor decisions.
A recent report from NewVantage Partners found that 97.0% of participating organizations are investing in data initiatives, yet only 39.7% reported that they were managing data as an enterprise business asset. Barely over a quarter, 26.5%, reported that they had created a data-driven organization.
Why is this the case? Where is the disconnect between collecting more and more data and making it the backbone of a company?
It is essential to consider the following question when setting up the systems governing decision-making within an organization:
Are we accessing the best, most contextual form of data for decision-making?
Making Sense of Data by Turning It into Knowledge
Data, a set of raw and unfiltered stimuli (facts, symbols, or observations), carries no inherent value because it lacks context. Once context, structure, or meaning is added, data becomes information. Knowledge is the synthesis of information over time together with its associated context: the ability to apply information to answer a question, which makes it more valuable than discrete information alone.
According to the DIKW (data, information, knowledge, wisdom) pyramid, these form a hierarchy, with each level building on the one below. This pyramid becomes especially relevant when considering the way businesses approach their most crucial decisions: the level of the pyramid at which a business operates when making its most important decisions can significantly affect the outcome.
As an example, suppose you are a US telecom operator targeting a young audience, and you have the following data derived from the logging records of leading mobile network operators (source: Statista):
Data: In 2021, the US had 498.91 million wireless subscriber connections.
On its own, this figure tells you only subscriber volume. To make it useful, you must find out who, when, and how (the context). These are the questions that still need to be answered if you are looking to grow your business.
After digging through this data set, suppose you find that of the 498.91 million wireless subscribers, 58% belong to the 18-30 age group and actively use the service between 5 p.m. and 10 p.m.
| Age Group | Percentage of Subscribers |
| --- | --- |
| 18-30 | 58% |
Information: The 18-30 age group accounts for roughly 289.37 million wireless subscriber connections (58% of 498.91 million), with heavy usage from 5 to 10 p.m.
Taking this a step further, you could build a knowledge layer by understanding how young audiences use their mobile phones and services.
Knowledge: Young audiences expect streaming services and digital payments to be integrated into their mobile plans to benefit from flexibility and affordability.
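The data → information → knowledge progression above can be sketched in a few lines of code. The figures come from the example; the segment rules and function names are illustrative assumptions, not a real operator's logic:

```python
# Data: a raw, context-free figure (2021 US wireless connections, millions).
TOTAL_SUBSCRIBERS_M = 498.91

# Information: the same figure enriched with context (segment, share, usage window).
young_share = 0.58
information = {
    "segment": "18-30",
    "subscribers_millions": round(TOTAL_SUBSCRIBERS_M * young_share, 2),
    "peak_usage": ("17:00", "22:00"),
}

# Knowledge: applying the information to answer a business question.
def recommended_plan_features(info: dict) -> list[str]:
    """Map a segment profile to plan features (illustrative rule, not real policy)."""
    features = []
    if info["segment"] == "18-30":
        features += ["streaming bundle", "digital payments", "flexible billing"]
    return features

print(information["subscribers_millions"])   # 289.37
print(recommended_plan_features(information))
```

Each step adds context rather than volume: the raw count alone supports no decision, while the final function encodes an answer to "what should we offer this segment?"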
With the help of this knowledge layer, you can concentrate your efforts on the right channels and communicate directly with your target audience. A knowledge-first approach uses abductive reasoning to choose the right channels, the most effective strategies, and the correct communication.
A deeper understanding of young audiences’ preferences reduces uncertainty and the need for experimentation, enabling you to profit directly from new revenue streams like mobile money, digital payments, and streaming services. In a data-first approach, you would still need tests to confirm your efforts (like ROAS on multiple paid campaigns), which would reduce your ability to create and deliver higher value within the same time frame.
A knowledge-first approach introduces a continuum of understanding in which ingested data is continuously distilled into knowledge, which, when combined with intuition and experience, leads to better decisions. It is like hitting the bulls-eye on the first shot instead of walking your aim onto the target over multiple iterations. By putting data into context and turning it into knowledge and insights, you can improve the efficiency and effectiveness of decision-making and arrive at actionable insights faster. After all, actionable insights—insights that are relevant, specific, contextual, and data-backed—are the ultimate goal of gathering data within a business.
Common Issues with the Conventional Approach to Data
With the advent of data warehouses and data lakes, the amount of data that businesses collect has grown exponentially, and the value we place on data has consistently evolved. Businesses are now moving past the question of whether they should collect their data and how, towards the question of how this data can be used for the best possible outcomes. In this journey, there are several obstacles that they battle.
Loss in Semantic Meaning of Data
As businesses collect large amounts of data, they risk losing sight of the semantic meaning behind that data.
Semantics is the mapping between the data and its relation to the real world. It involves the meaning behind and the practical use of the data being referred to.
A common issue with the way data is collected today is the lapse in focus on its context. Any data captured without a full understanding of the context, people, and relationships involved runs the risk of not providing a complete picture.
Insights derived from such data become divorced from the real-world meaning of what is being analyzed.
Sidelining of End Users from the Data Journey
In the Data Mesh Radio podcast, Juan Sequeda, Principal Scientist at data.world, discusses this issue as follows:
“If people are not able to answer their questions even if you’ve integrated data or put it into a lake, then you have not been successful. Success needs to be defined by the people who need to answer a question…anybody can compare products on Amazon and make a decision about it. We need to start treating data that way.”
Because of the exponential growth in the amount of data available to businesses, there has been a misplaced focus on the amount of data collected. This treats the entirety of business data as a technical problem. The focus should rather be on the people who use this data and what they would need to utilize it best, which is more of a people problem.
The desired outcome of all data activities should always be actionable insights, not merely churning through as much of the data arriving in the warehouse or data lake as possible.
What does this misplaced focus imply?
End users of the data product are gradually being removed from the equation.
Overcollection and Overcomplexity
“The more, the better” is a simplistic yet unfortunately common way of looking at data. People might think that more data means better segmentation, richer information, and more detailed results.
American computer scientist and Google’s former Research Director Peter Norvig has been quoted as saying, “More data beats clever algorithms, but better data beats more data,” and we couldn’t agree more. While more data can certainly mean better training of models, more detailed results, etc., the accumulation of large amounts of unstructured and undefined data can lead to the formation of data swamps that data teams struggle to wade through.
A data swamp can be formed due to the lack of governance, structure, or processes surrounding the data. Having clear metadata before data is put into a data lake is a must to reduce complexity, give complete context to the data, and reduce the chances of forming a data swamp.
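One way to make the metadata requirement concrete is a gate that rejects any dataset arriving without the required context. This is a minimal sketch; the required fields and function names are assumptions, not the API of any particular catalog or lake product:

```python
# Metadata every dataset must carry before it may enter the lake (illustrative set).
REQUIRED_METADATA = {"owner", "source", "description", "refresh_cadence"}

def validate_metadata(meta: dict) -> list[str]:
    """Return the required metadata fields that are missing, sorted for stable output."""
    return sorted(REQUIRED_METADATA - meta.keys())

def ingest(dataset_name: str, meta: dict) -> bool:
    """Admit a dataset only if its context is complete; otherwise reject it."""
    missing = validate_metadata(meta)
    if missing:
        print(f"rejected {dataset_name}: missing {missing}")
        return False
    print(f"ingested {dataset_name}")
    return True

# A dataset with partial context is turned away instead of becoming swamp material.
ingest("subscriber_logs", {"owner": "telco-analytics", "source": "network logs"})
```

The point of the gate is that incomplete context is caught at the door, when the producer can still supply it, rather than discovered months later by a confused analyst.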
How can such issues with data be addressed? How can data be utilized in a way that gives it the maximum context, hence putting knowledge before the data itself?
These are questions whose answers we are still exploring along with the rest of the data world, but here are a few recommendations.
Contextualize Data Using Knowledge Graphs
Data in an organization is never an isolated entity. It involves a background of people, context, and relationships.
- Context: If I see a piece of data, what does it mean? What exactly is it measuring?
- People: Who is the owner of the data, who makes the decisions based on this data, and which teams are involved?
- Relationships: How is one piece of data connected to others?
In the bid to collect more and more data, this background or semantic meaning of the data can be lost, which can end up making the data murkier than it ought to be.
Knowledge graphs are one way that the data world has been attempting to solve this problem. Modern-day data catalogs powered by knowledge graphs integrate knowledge and data at a large scale in the form of a graph model. In Gartner’s view, knowledge graphs are the future of data catalogs, as they can efficiently model, explore, and query complex data with complex interrelationships across silos.
By representing data in a graph that depicts the relationship between and context behind each piece of data, organizations can ensure that each piece of data is automatically linked with its context. Data is not restricted to silos or verticals but forms an organization-wide continuous graph, even supporting heterogeneous data formats of the future.
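A knowledge graph can be sketched, at its simplest, as a set of subject-predicate-object triples that travel with the data. This toy version uses only the standard library; the entity and relation names are illustrative, not drawn from any real catalog:

```python
# Each triple links a piece of data to its context, people, or related data.
triples = [
    ("subscriber_count", "measures", "active wireless connections"),
    ("subscriber_count", "owned_by", "network-analytics team"),
    ("subscriber_count", "segmented_by", "age_group"),
    ("age_group", "informs", "plan_design"),
]

def query(subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# What context travels with the subscriber_count metric?
for s, p, o in query(subject="subscriber_count"):
    print(f"{s} --{p}--> {o}")
```

Real systems use RDF stores or graph databases rather than an in-memory list, but the principle is the same: context, ownership, and relationships are queryable alongside the data instead of living in someone's head.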
Get the Right Balance Between Centralization and Decentralization
As companies grow, finding the right degree of centralization of data can be a challenge. Highly centralized data can cause significant delays, reduce visibility for the users of the data, and go against the principles of data democratization that organizations strive towards. On the other hand, excessive decentralization can lead to data silos and security problems.
An organization must strike the right balance between the two, which can be a very company-specific balance. The following are a few questions that would need to be answered:
- Who are the main stakeholders? What do they use the data for?
- Are there different categories of stakeholders?
- How much collaboration is required between the various teams?
- How much data fluency do these various stakeholders have?
- How would the system need to be scaled?
…among others. Based on the answers to these questions, a balance can be struck that suits how the company uses its data. An important step is to establish clear definitions of data, track data usage metrics, and make data easy to access. Certain core data concepts might need to be standardized across the organization, while others can remain flexible. This makes it possible to use the data as widely as possible while still treating it as a product with a clear definition.
Form a Continuum of Data
Does this scenario sound familiar? Your team has a business question that needs to be answered, so it reaches out to a data analyst or data scientist. The question is framed very specifically so that you can receive the most accurate data that has to do with the question. But once the team achieves the intended goal using this very specific dataset, it is forgotten. The next business question requires a new set of data, and you repeat the entire process once again.
This direct approach can seem convenient from a myopic point of view. A business question is answered as soon as it arises in a straightforward manner. No extra effort is required.
However, this approach leads to an inflexible, piecemeal way of looking at data. The data in the warehouse or lake holds no meaning of its own; it is simply a tool molded to short-term requirements, existing only as the answer to a specific question. Answering a business question is precisely the moment when business users, technical users, and data practitioners come together, yet at the end of the project, the business loses all of that valuable shared context.
Rather, the knowledge gained should be used as an accelerant to drive data quality for future projects. Going back to the DIKW pyramid, the one-off approach restricts valuable business data to the bottom two levels. If we wish to treat data as the resource it is, a more well-rounded approach is to treat it as a continuum of knowledge.
This means that once any data is transformed to extract valuable knowledge from it, it isn’t discarded but instead used as a base to build more knowledge. This would lead to the formation of a knowledge layer, which would avoid redundancy and repetition due to the overlap of business questions and divergent data types. Instead, it would encourage the recycling of data analysis, treating past analyses as resources to construct this layer and build upon it.
Combine Data and Intuition
Data and intuition are often pitted against each other, one cast as the champion of knowledge and the other as its antithesis. However, is it really possible to separate all human impulses from business decisions, and if so, should it be done?
If valid decisions could be made by any individual with access to the pertinent data, the background or experience of the people making crucial decisions would not matter at all, as long as we presented the data to them accurately. However, this is not the case. Experts in a field are sought after not because they can read the data the best but because their years of experience and knowledge have created a unique intuition in their field that is immensely valuable. This intuition is not based on a lack of data; rather, it is the culmination of years of data and the lessons learned from it.
As Peter Drucker says, we must “believe in intuition only if you discipline it.”
Let us consider an example to understand it better.
In 2013, Airbnb decided to revamp its listings page. They designed a version with many improvements that users had agreed were qualitatively better, like a bigger emphasis on the map showing all the listings and a more visual display. Yet, the data showed that the new version of the page was not performing up to the mark. With surface-level reliance on this data, the team should have gone back to the original version of the page or reduced the changes on it.
However, their intuition told them that the upgraded version of the page should ideally perform much better than it was, so they decided to look deeper into the context. They realized that the new version was performing poorly only on Internet Explorer, as an important click-through action was broken on IE. Fixing this issue pushed the performance to the expected levels.
This example illustrates the value of using data together with its context, while allowing business intuition to work alongside it.
Wrapping It All Up…
The way that data is treated has constantly been evolving over the past decade, and it continues to do so even now. Recent concepts like data mesh, data as a product, and knowledge graphs highlight the need data practitioners feel to improve, or even revolutionize, the way data is thought of and utilized.
As we continue to mature in our data journeys, we must make a conscious effort to increase not just the volume and sources of data but also the efficiency and effectiveness of the way that it is handled, involving the people in the process, so that it satisfies its true purpose—to answer business questions in a reliable way.
Each organization may approach its data journey differently, depending on its requirements. All the suggestions in this blog, whether about treating data as a constantly improving product or allowing intuition and data to co-exist and amplify each other, tie back to the same purpose: treating data as a reliable foundation for knowledge rather than a mere resource to be collected or hoarded. Hoarding alone leaves businesses stuck at the bottom rung of the DIKW ladder, instead of enabling data to become a gateway to true knowledge.