What Is Big Data?
What is Big Data? Wikipedia describes big data, roughly, as data sets that are too large or too complex to be dealt with by traditional data-processing software. According to IDEA magazine (pages 14 & 15), “When systems involving hundreds of millions of objects are combined, the resulting information is often much more complex than can be studied or analyzed using conventional information processing technologies.” In a related vein, “…an analysis of data using computers will always need trained computer software specialists.” Interpreting, predicting, and ultimately controlling the value generated by this vast, ever-growing collection of data sets is hard; it is very difficult to design a method of action that truly fits the data, which is why so many in the IT community are continually looking for better tools to analyze big data.
How does Big Data fit into the current discussion about agile software development and its demand for fast, robust, adaptable systems? We often hear a senior manager at a large software company complain to his teammates that his software projects are moving too slowly and that he wants new rules to speed them up. He is looking for velocity boosts along the way, not established processes. If you asked him what big data is, he might say three things:
First, he would say that big data analytics is necessary because: “Other people have been talking about this stuff for a long time but nothing has happened. We just don’t have the skills in the group.” Second, he would point out that there is no inherent structure in unstructured data, so it must be manipulated and then treated like a structured document. Finally, he would argue that because no formal methodology exists to manage it, managers must rely on computer skills and judgment to make sense of it all.
I think this last statement is the biggest fallacy out there, and it is also the most dangerous. There is nothing inherent or deep about unstructured data. Structured content management (SCM) was developed decades ago and works very well for some things. What is commonly called big data is actually just unstructured information, and that information is increasingly important due to technological change. It has to be managed and named in a structured way for managers to use it cost-effectively, and for them to feel good about the progress they are making.
So what is big data? In my view, it is predictive analytics applied in a new way to unstructured data. It combines data mining, behavioral science, and traditional database techniques to make predictions about customer trends, motivations, and organizational needs. For example, managers could use this information to forecast where customer dollars are likely to move next, where demand is likely to increase next year, where supply will peak next year, and even where next quarter’s revenue will land.
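As a minimal sketch of that predictive side, the revenue forecast could be as simple as fitting a least-squares trend line to past quarters and projecting one step ahead. The quarterly figures here are hypothetical, and real predictive analytics would fold in far more signals than a single trend:

```python
# Minimal sketch: fit a least-squares trend line y = a*x + b to past
# quarterly revenue (hypothetical numbers) and project the next quarter.

def fit_trend(values):
    """Return slope a and intercept b of the least-squares line."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def forecast_next(values):
    """Project the value one step past the observed series."""
    a, b = fit_trend(values)
    return a * len(values) + b

quarterly_revenue = [100.0, 110.0, 125.0, 135.0]  # hypothetical quarters
print(forecast_next(quarterly_revenue))  # projects the fifth quarter
```

The point is not the arithmetic but the shape of the workflow: historical data in, a fitted model, and a forward-looking number a manager can act on.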
Traditional databases usually deal with fixed data in a fixed format. You can pull back hundreds or even thousands of rows at a time, but you must either work through them one by one (serial processing) or split the work up, send filtered partial results over multiple connections, and merge them again (parallel processing). Streaming platforms, on the other hand, deal with large amounts of unprocessed data in real time, so the result of a streaming computation is a fast, continuously updated approximation of what a traditional batch query would eventually return. This makes streaming much more efficient than traditional methods for time-sensitive work.
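The batch-versus-streaming contrast can be sketched with a single statistic computed both ways. The batch version scans all stored rows at once; the streaming version updates a running result as each event arrives, so an answer is available at any moment and matches the batch result once the same events have been seen. The event values are made up for illustration:

```python
# Batch: compute the mean by scanning every stored row at once.
def batch_mean(rows):
    return sum(rows) / len(rows)

# Streaming: maintain a running mean that is updated per event,
# so the current answer is available after every arrival.
class StreamingMean:
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value

    @property
    def value(self):
        return self.total / self.count

events = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]  # arbitrary event stream

stream = StreamingMean()
for e in events:
    stream.update(e)  # the answer is current after each event

print(stream.value == batch_mean(events))  # both agree on the full stream
```

The design choice is the trade-off described above: the batch query needs all rows in hand before it can answer, while the streaming aggregate answers continuously and converges to the same result.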
Streaming is also used in several industries, such as retail, where fast analysis is needed. The process does not involve sorting through huge data sets after the fact; instead, incoming events are matched against specific criteria within a very short window to get the desired result. Common use cases include coupons that must be redeemed quickly to save money and recommendations that must be surfaced while the customer is still engaged. This leverages the ability to obtain information in real time and analyze it using natural language processing (NLP), alongside previously collected data sets. With NLP, these incoming “instances” can be labeled, analyzed, and refined (or smoothed) according to pre-established rules, ultimately producing an even more relevant result.
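The labeling step can be sketched with a toy rule-based matcher, a deliberately simple stand-in for full NLP. The rule names, keyword lists, and sample events below are all invented for illustration:

```python
# Toy sketch: label incoming text events against pre-established rules,
# a rule-based stand-in for the NLP labeling described above.

RULES = {
    "coupon": ["coupon", "discount", "promo"],
    "recommendation": ["recommend", "suggest", "you may like"],
}

def label(event):
    """Return the set of rule names whose keywords appear in the event."""
    text = event.lower()
    return {name for name, keywords in RULES.items()
            if any(keyword in text for keyword in keywords)}

stream = [
    "20% discount at checkout today only",
    "We recommend these items based on your history",
    "Order shipped",
]

for event in stream:
    print(event, "->", sorted(label(event)))
```

In a real deployment the keyword match would be replaced by a trained classifier, but the pipeline shape is the same: an event arrives, it is labeled against rules, and only the relevant instances move on for refinement.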
Another example of leveraging traditional data sources to generate useful insights into what is happening in the marketplace comes from marketing, testing, and launch environments. Here, data mining enables marketers and testers to evaluate customer experiences before launching a new product or service. This leverages the ability to obtain data sets from all angles: behavioral, survey, and interaction. It can also measure key performance indicators against previously measured metrics, as well as derive metrics from new and unique customer experiences. Ultimately, this allows for the development of improved customer experiences, which lead to more repeat customers, and ultimately to higher gross sales and margins.
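Measuring KPIs against previously measured metrics can be sketched as a simple lift calculation between a baseline and a pre-launch test group. The metric names and numbers here are hypothetical:

```python
# Sketch: compare a test group's KPIs against previously measured
# baseline metrics, as in the marketing/testing/launch scenario above.
# All names and figures are invented for illustration.

baseline = {"conversion_rate": 0.020, "repeat_purchase_rate": 0.150}
test     = {"conversion_rate": 0.025, "repeat_purchase_rate": 0.180}

def lift(before, after):
    """Relative change of each KPI versus its baseline value."""
    return {k: (after[k] - before[k]) / before[k] for k in before}

for metric, change in lift(baseline, test).items():
    print(f"{metric}: {change:+.0%}")
```

A report like this is what lets a team decide, before a full launch, whether the new experience actually moves the indicators it was meant to move.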