
The top 3 fallacies of (Big) Data value extraction

I have discovered many new things over the past two and a half years, from the business value of location intelligence to the high complexity of building digital representations of the physical world.

It all used to be quite basic. The "digital maps" of the 1990s served limited use cases, helping users get from A to B without needing to consult a physical map. In the last 30 years, enabled by rapid development and huge amounts of data, a very high degree of sophistication has evolved. Today, a typical transportation and logistics use case sounds something like this:

"Give me the best route across four stops considering all truck restrictions, minimize fuel consumption based on road geometry and traffic, respect timing constraints for each one of the stop-overs. Give me also the cheapest lunch options no more than five minutes away from my second stop and inform the transportation manager when I enter the pre-determined geofence of my last stop."

It sounds like a whole new world. Straightforward as it may seem, our "new basic" model involves huge amounts of data that need to be kept accurate, rich, and fresh to remain meaningful. Location intelligence is quickly becoming another "Big Data business" with phenomenal opportunities to deliver value to customers.

Yet, through discussions with companies across industries, I realize there are still a lot of people sitting on large amounts of data without knowing what to do with it, let alone what underlying value it may represent. Many things are getting in the way of data monetization projects. Some are, in my opinion, quite basic misconceptions. Here are my (current) top 3 fallacies of Big Data value extraction from these discussions.


The future is open data

There are two types of promoters of open data: the Ideologists and the Free Riders. The Ideologists are the ones with a strong belief that the “free model” is a necessity and the world can only be a better place when things are in the public domain.

Personally, I can side with the Ideologists, at least partially. Some data have huge value to society. Anonymized health data, for instance, when aggregated and given a timestamp and a location stamp, have the power to improve public health by accelerating research in disease prevention and treatment.

The Free Riders, on the other hand, are usually the ones selling services on top of open data and largely avoiding the costs to acquire, handle, improve, store and make data accessible. Given the (extremely) high costs of producing data, the economics of open data do not hold in most cases. There has to be an incentive for companies or entities investing in high-quality data production to continue doing so.

As the joke says, when something on the internet is free, you are the product. When it comes to data, it’s worth looking deeper at the quality, ownership, and party responsible for bearing the costs in the data value chain.

Data is the new oil. Has anyone ever heard of free oil?

We need increasing amounts of data to be competitive

Not always. There’s a critical threshold for any given type of data, but lack of sufficient data is rarely the main reason data monetization projects get stuck. These are more common contributing factors:

  • Lack of skills: not only the obvious scarcity of data scientists, but also lack of business acumen, communication skills, etc.
  • Tendency toward overly complex models
  • Asking the wrong questions/unclear business objectives
  • Lack of infrastructure and budget

Granted, tasks like pattern recognition (e.g. image, voice, video) require a lot of samples to train machine learning models. But in many business contexts, it’s just as much about data quality, diversity, and availability. Today, many business problems are better addressed by getting the right data than by collecting more of the same.

One example I love is the Traffic Gaaye (Cow in Hindi) application, developed by Videocon and Mobiiworld. We know cows cause traffic jams on Delhi streets. Realizing that the cows themselves had unique ways to avoid congestion, the teams decided to track them and ingest the data to provide alternative routes. This reduced the average travel time by 15 minutes and turned the cows from part of the problem into part of the solution. It is the right data, not more of the same data probes, that makes a real difference.

In order to innovate, we need to protect our data and keep it for ourselves

In the best case, this is only a half-truth.

In 2015, three competing automotive companies (BMW, Audi, and Daimler) jointly acquired HERE Technologies. What is remarkable is that they decided to share vehicle sensor data as part of a common platform benefitting everyone. HERE started publishing the first set of platform services based on this shared monetization strategy shortly thereafter.

In this case, aggregating originally competing sources of data enabled more reliable services than any single player could create on its own. Innovation improves because it relies on a common platform, and differentiation remains possible via the end-user services, which still need to be conceived, designed, and delivered.
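One way to see why pooled data can be more reliable is a small statistical sketch. The fleet names and speed readings below are hypothetical, invented purely for illustration; they are not HERE data and this is not a description of how any real platform works. The sketch assumes every contributor measures the same quantity (speed on one road segment) and that readings are independent, in which case the standard error of the average shrinks as the number of readings grows.

```python
import math
import statistics

# Hypothetical speed readings (km/h) for one road segment,
# one list per contributing company. Values are made up.
fleet_speeds = {
    "fleet_a": [42.0, 47.5, 39.0, 45.0],
    "fleet_b": [44.0, 41.5, 48.0, 43.5, 46.0],
    "fleet_c": [40.5, 45.5, 43.0],
}


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def pooled(sources):
    """Concatenate the readings of every contributor into one list."""
    return [speed for readings in sources.values() for speed in readings]


# Uncertainty of the average speed when each company uses only its own data.
per_fleet_error = {name: standard_error(r) for name, r in fleet_speeds.items()}

# Uncertainty when all three companies share their readings.
pooled_error = standard_error(pooled(fleet_speeds))

for name, err in per_fleet_error.items():
    print(f"{name}: standard error {err:.2f} km/h")
print(f"pooled: standard error {pooled_error:.2f} km/h")
```

With these made-up numbers, the pooled estimate has a smaller standard error than any single fleet's estimate. Real shared platforms would also have to handle differences in sensor quality, coverage, and bias between contributors, which this sketch ignores.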

Breaking data silos enables services that were not possible before. And data silos exist everywhere, even within a company’s different departments.

Innovation needs to be user-centric. It comes from better services enabled by breaking data silos.

So, the next time your data monetization initiative seems to be going nowhere, look under the hood and study the potential misconceptions your company may be facing!

In short:

  1. Data is not free. Data monetization is key to a sustainable business.
  2. Quality vs. quantity. The right data, not just more data.
  3. Break silos to unlock innovation.
