Discover why data diversity is the lifeblood of business in the information age.
We are increasingly able to put technologies such as machine learning and predictive analytics to work, and breakthroughs in AI architecture and engineering are partly to thank for this.
What has really made it possible, though, is the vast increase in the amount of data we are able to generate, store and examine, thanks to the Internet of Things and our always-online world.
Of course, businesses using information to make better decisions is nothing new. What’s different these days is the scale of available information (the volume of data generated by the world’s businesses doubles every 1.2 years) and the opportunities it provides to innovate.
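To put that doubling rate in perspective, here is a back-of-the-envelope sketch in Python. It takes the 1.2-year doubling figure at face value and assumes growth is smoothly exponential, which is a simplification. The function name `growth_factor` is purely illustrative.

```python
def growth_factor(years: float, doubling_time_years: float = 1.2) -> float:
    """Return how many times larger the data volume is after `years`,
    assuming it doubles every `doubling_time_years` (an assumed, idealized model)."""
    return 2 ** (years / doubling_time_years)


if __name__ == "__main__":
    # Print the multiple of the starting volume after a few horizons.
    for years in (1, 5, 10):
        print(f"After {years} years: about {growth_factor(years):.1f}x the starting volume")
```

Under those assumptions, a decade of doubling every 1.2 years works out to roughly a 320-fold increase in volume.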
Businesses that build their operations on their ability to collect and use data are disrupting industries every day. Amazon, Uber and Airbnb all grew by operationalizing big data generated through their own core activities in the markets they have each come to dominate. Data ranging from website clickstreams to vehicle fuel consumption is used to tailor services to customer demands and streamline business processes. Now, though, innovators seeking out the next cutting edge are looking beyond the data immediately available from their primary activities and operations.
When it comes to data, as with all other aspects of business, diversity is critically important. This is true in a meta-sense, as non-representative data sets are less likely to yield workable insights than those which cover all facets of the issue under investigation. It’s also true in terms of the variety of data available.
Variety has always been one of the fundamental “V’s” of Big Data, alongside volume, velocity and the various others that have been added over the years. Today, with the sheer variety of data sets available, it’s more critical than ever, as insight can often be found in unexpected places.
To launch a truly diverse data-driven strategy, the key is to think beyond the data that an organization already has readily available, or that which would be simplest for it to collect. Thanks to breakthroughs in technologies such as image analysis and natural language processing, meaning can be extracted automatically from video, handwriting, recorded speech and the text of emails and social media posts.
In the healthcare field this has made it possible for robots to efficiently and accurately diagnose patients based on data from medical scan images correlated with their doctors’ handwritten notes. In certain situations, machines have proven able to do this at least as accurately as the best humans, and at lightning speed.
And in marketing, advertisers are developing methods of better understanding their customers’ lives and habits by analyzing how, when and where their products are talked about, photographed and posted to social media.
This messy, scrambled-up, unstructured data in fact makes up over 90% of the data generated worldwide. The rest is nice, orderly structured data, often generated by machines talking to each other and making logs. Structured data is made up of numbers which can easily be slotted into charts and tables and analyzed with simple mathematics. As well as making up the majority of the volume of our data, unstructured data very probably holds the majority of the so-far undiscovered insights. Getting at them isn’t always easy, but the rewards for those with the initiative and imagination to try are potentially enormous.
Other sources of data which are increasingly being mined for insight include location data such as GPS tracking and satellite imagery. Putting this information to work would have been beyond the budget of any but the largest organizations just a decade ago. Now farmers across the globe are becoming used to the idea of combining satellite and meteorological data to determine the optimum timing and placement of crops, and retailers track the movements of shoppers using near-field communications that interact with their smartphones. Thanks to the emergence of ‘as-a-service’ infrastructure provision, start-ups anywhere in the world can access diverse data sets, crunch them, and learn from them.