Which of the Following Characteristics About Big Data Is Not True?

In the world of big data and big analytics, what is the answer to the age-old question, “Which of the following characteristics about big data is true?” It’s a concern of mine, and I recently had an interesting exchange about it with a successful entrepreneur who was sitting in front of me at a conference. He was indeed a big data and big analytics guy, and he was quite agitated by my question, which was whether or not he was on the right side of the trend; literally, if he wasn’t, then he would be “the next Ericsson.” To his surprise, I told him that was a silly comparison, because it only suggested that there might be an Ericsson, not that there would be only one.

Still, my point was one that shouldn’t surprise anyone. Big data is here to stay, and we’re going to need a whole lot more creativity, experience, and patience to figure out what its characteristics are going to be over time. After all, as the internet evolves and new technologies are developed, we need to evolve with them, not against them.

So, I asked again, “Which of the following characteristics about big data is not true?” And this time, he agreed with me. “No, it’s not a positive thing,” he said. “It’s the same as saying that because X happened before Y, it’s a negative thing.” Indeed, that may very well be true, but it certainly isn’t the right way to think about it.

I’ve often been described as a data scientist, even though I’m a software engineer. Data science involves lots of reading, studying, and writing. Data scientists often have published research, and they use mathematical methods to explore the data they are studying. They process and interpret big data differently than regular scientists do. It’s also important to note that many regular scientists now use some form of big data in their work.

However, one characteristic that I very seldom see mentioned in the media or in academic circles is what I call the second wave of big data. In my view, it is even more important than the first wave, and it applies even if you are a regular researcher. Data scientists, like journalists, researchers, and software engineers, all need to pay close attention to the second wave of big data.

“What is the second wave of big data?” It’s information that is open to a wide variety of interpretations. If you ask a data scientist what the second wave of big data is, they will likely tell you it is unprocessed, unorganized, and sometimes even uninteresting. That may well be true, but it comes with a lot of advantages as well as limitations. In other words, information is big data only in the right sense of the word.

For instance, when I was in graduate school, we spent many a summer running data analysis experiments on a large number of subjects. We collected lots of information, of course, but we really didn’t have any idea what to do with it until we were asked to go into the field and try to interpret the findings. That was quite difficult and took time. Another example is having patients’ blood pressure measured regularly over a period of time; that kind of data is certainly less messy and easier to interpret. However, there are still many potential explanations for why a patient’s blood pressure measurement may change from time to time.

The truth is, “big data” is a broad term that can apply to many different situations. That is one reason why we have special computer programs and training programs designed specifically to help people interpret big data. People in financial markets, for instance, may be interested in understanding which of the following characteristics about big data is not true. They might want to know whether their investment strategies are working, or learn how to improve them; or perhaps they want to know whether there are any substantial relationships between variables. The first step is to acknowledge that all of these are possible and important, and then to build computer programs and training curricula around those possibilities.
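To make the question about relationships between variables a little more concrete, here is a minimal sketch in Python of one common way to check for one, the Pearson correlation coefficient. This is only an illustration, not a description of any particular program mentioned above: the function name `pearson` and the two lists of returns are hypothetical, and the numbers are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, made-up figures used purely for illustration
strategy_returns = [0.01, -0.02, 0.015, 0.03, -0.01]
market_returns = [0.008, -0.015, 0.01, 0.025, -0.012]

if __name__ == "__main__":
    r = pearson(strategy_returns, market_returns)
    print(f"correlation: {r:.3f}")
```

A value near 1 or -1 suggests a strong linear relationship and a value near 0 suggests little or none, but a correlation on its own does not explain why two variables move together, which is where the interpretation discussed above still has to come in.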