Predictions for 2014, Round 2: More Big Data, Data Quality, and Analytic Convergence

By Jake Freivald | December 17, 2013

Dilbert understood data quality issues back in 2008. (We only used half of the image to make sure we complied with Fair Use laws -- check out the full comic at

Let's pick up where we left off on Friday. Our first three predictions for 2014 concerned Big Data, Data Discovery, and InfoApps:
1. The term “Big Data” will get smaller, but data volumes will continue to go up. 
2. Isolated data discovery will hit the wall. Integrated data discovery will pick up steam.
3. Analytic apps will be the biggest driver of user adoption in 2014.
Here are four more predictions for 2014, covering Big Data, data quality, and data analytics.
4. Machine-generated data (including from the “Internet of Things”) will grow faster than any other Big Data source used for analytics purposes. (This is especially true in manufacturing and healthcare.)
Many people who want to discuss Big Data are focused on unstructured, human-generated data, such as social media. We'll continue to see people addressing those issues.
However, we believe that 2014 will see even more interest in machine-generated data. We're already seeing a massive increase in the amount of machine-generated data flooding into our enterprises. Each record is usually structured and carries only a tiny amount of information: location and temperature readings from a refrigerated truck, for instance, or readings from sensors on thousands of parts of an airplane.
But there are an awful lot of these records, and they often have to be handled in real time. Moreover, because each record holds so little information, the records are only useful once they're reconciled with other information, such as product master data or purchase orders.
Analysts are going to feel like they're being buried in sand, and they're going to be looking for ways to make castles.
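To make the reconciliation point concrete, here is a minimal Python sketch that joins tiny sensor records to the business data that gives them meaning. Everything in it is hypothetical: the `Reading` and `Shipment` record types, the `reconcile` function, and the sample truck and purchase-order values are illustrations, not part of any product. It assumes shipment master data can be looked up by truck ID; readings with no match are set aside rather than guessed at.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical machine-generated record: tiny, structured, low information."""
    truck_id: str
    timestamp: str  # ISO-8601 text, kept as a string for simplicity
    lat: float
    lon: float
    temp_c: float


@dataclass(frozen=True)
class Shipment:
    """Hypothetical master/order data describing what a truck is carrying."""
    purchase_order: str
    product_sku: str
    max_temp_c: float


def reconcile(readings, shipments):
    """Join each reading to its shipment by truck ID; return (matched, unmatched).

    Each matched row combines sensor and business fields and flags readings
    above the product's temperature limit.
    """
    matched, unmatched = [], []
    for r in readings:
        s = shipments.get(r.truck_id)
        if s is None:
            # No master record: the reading can't be interpreted yet.
            unmatched.append(r)
            continue
        matched.append({
            "purchase_order": s.purchase_order,
            "product_sku": s.product_sku,
            "timestamp": r.timestamp,
            "temp_c": r.temp_c,
            "too_warm": r.temp_c > s.max_temp_c,
        })
    return matched, unmatched


# Hypothetical sample data: one known truck, one truck missing from master data.
shipments = {"TRK-1": Shipment("PO-1001", "SKU-ICE", -15.0)}
readings = [
    Reading("TRK-1", "2014-01-02T08:00:00", 40.7, -74.0, -18.0),
    Reading("TRK-1", "2014-01-02T08:05:00", 40.7, -74.0, -12.5),
    Reading("TRK-9", "2014-01-02T08:05:00", 41.0, -73.9, 3.0),
]
matched, unmatched = reconcile(readings, shipments)
for row in matched:
    print(row["purchase_order"], row["timestamp"], row["too_warm"])
print("unmatched:", len(unmatched))
```

In a real pipeline the lookup would more likely be a streaming join against a master data store than an in-memory dict, but the shape of the problem is the same: the reading alone says "-12.5 degrees," and only the join says which purchase order is at risk.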
5. The title “Data Steward” will become as hot as “Data Scientist”.
Data Scientists became a hot commodity as people warmed up to the idea of Big Data. Everyone from Forbes to McKinsey and Co. has talked about the need for people who can make sense out of huge volumes of tough-to-manage data.
The need for more Data Scientists won't go away anytime soon; however, more and more companies are going to see that there's a pressing need for *good* data for the scientists to analyze. That's not a scientific function, but an operational one; it doesn't need someone who knows how to crunch numbers, but someone who knows how to get high-quality numbers to crunch.
For that reason, we expect Data Stewards (businesspeople who understand data and how to cleanse it in the course of a business process) to become more important. A quick perusal of job-seeker sites shows roughly one Data Steward posting for every five Data Scientist postings; expect that ratio to get a lot closer to one-to-one.
6. Data quality will have its day – data quality issues will more than double in prominence and effort.
With more data analysis going on all the time -- especially with Big Data analytics -- more problems with our data will surface. It's only a matter of time before people start to emphasize the quality of the data needed to get higher-quality results.
This isn't a new problem. Data quality has always been important. The Dilbert cartoon excerpted above (cut off to stay within Fair Use) comes from 2008.
But with the increased emphasis on making data-driven decisions, the issue will gain prominence, and we believe it will get twice as much emphasis in 2014 as it got this year.
7. Analytic convergence – the convergence of predictive analytics, data discovery, GIS, and other forms of analytics – will lead to analytic automation through machine learning, smart ETL, and other automated processes.
One of the most interesting things about being a technologist in this industry right now is the number of different technologies that are all moving in the same direction. Two examples:
a) Very large data sets often aren't very useful by themselves. Analysts need to extract the most interesting subsets to analyze. With the convergence of statistical analytics (e.g., our RStat product) and data extraction and ETL capabilities (e.g., our DataMigrator product), analysts will be able to use the precision of predictive analytics to help determine which data sets should be extracted.
b) Machine learning technologies are taking large segments of data and clustering them into pre-aggregated groups for data scientists to analyze. When this gets done quickly and the results get applied automatically to maps and data discovery tools, it changes how quickly data scientists can get off the ground and how many projects they can handle in a year.
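The clustering idea in example (b) can be sketched with plain k-means, one common clustering technique. The post doesn't say which algorithm any particular product uses, so treat this only as an illustration. The `kmeans` and `summarize` functions and the sample points (think of them as hypothetical two-feature sensor readings) are assumptions made for the sketch. The output is the kind of pre-aggregated group, a count and a center per cluster, that could then feed a map or a data discovery tool.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D points; returns (centroids, labels).

    Initial centroids are simply the first k points, which keeps this sketch
    deterministic but is not a good initialization for real data.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


def summarize(points, k):
    """Hypothetical pre-aggregation: one row per cluster with its size and center."""
    centroids, labels = kmeans(points, k)
    return [
        {"cluster": c, "count": labels.count(c), "centroid": centroids[c]}
        for c in range(k)
    ]


# Hypothetical readings forming two obvious groups.
points = [(1.0, 1.0), (10.0, 10.0), (1.2, 0.8), (0.9, 1.1), (10.5, 9.5), (9.8, 10.2)]
summary = summarize(points, k=2)
for group in summary:
    print(group["cluster"], group["count"], group["centroid"])
```

Production tools would use a tuned library implementation and a smarter initialization, but the payoff the post describes is visible even here: an analyst starts from two summarized groups instead of six raw rows, and the savings grow with the data.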
Okay, we've made some predictions, and we're going out on a limb on some of them. What do you think? Anyone placing bets?