What is Big Data? “Big data” refers to any collection of data that is extremely large in volume, extremely complex in nature, or both. Big data analytics, in turn, has come to describe the approaches used to analyze and mine such collections. More specifically, these approaches enable analysts and researchers to make sense of unstructured data in an efficient and meaningful manner.
In simple terms, big data often means unstructured data in the form of unprocessed raw material such as internet logs, social network postings, or even real-time auction sales and purchase orders. However, big data does not pertain only to massive and extremely complex databases. It also covers the analysis of smaller, less structured data sets. In fact, a good deal of private-sector investment has gone into systems that provide analytics and mining capabilities for small and unstructured data sets. Therefore, while traditional business intelligence (BI) tools built on SQL and relational databases such as Oracle continue to dominate the analytics landscape, unstructured data is rapidly becoming an integral part of the analytics toolkit.
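To illustrate how raw material such as internet logs can be turned into something analyzable, here is a minimal Python sketch. It assumes the logs follow the Common Log Format written by many web servers. The sample lines, the pattern, and the names parse_log and hits_per_path are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical sample lines in Common Log Format; not taken from a real server.
RAW_LOG = """\
203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /products/42 HTTP/1.1" 200 2326
203.0.113.9 - - [10/Oct/2023:13:55:40 +0000] "GET /products/42 HTTP/1.1" 200 2326
this line is malformed and will be skipped
203.0.113.5 - - [10/Oct/2023:13:56:01 +0000] "POST /cart HTTP/1.1" 302 512
"""

# One named group per field we want to keep from each log line.
LINE_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log(text):
    """Turn raw log text into a list of dicts, skipping lines that do not match."""
    records = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # Unparseable line: ignore rather than fail.
        record = match.groupdict()
        record["status"] = int(record["status"])
        records.append(record)
    return records


records = parse_log(RAW_LOG)

# A first, very simple "insight": how often each path was requested.
hits_per_path = Counter(record["path"] for record in records)
print(hits_per_path.most_common())
```

Once the lines are parsed into records, the same data can be loaded into a database or a data frame and queried like any structured source.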
So, what is big data in the context of BIS? BIS, used here to mean big data analytics, revolves around the ability to mine unstructured data sets for insights. Often, data analysis is performed without the involvement of the user. As a result, users may be unable to interpret the results or may miss key information. With dedicated big data analytics tools, however, analysts and researchers can extract the relevant information and interpret it in a meaningful manner. In short, big data analytics gives analysts and researchers a greater capability to understand complex and often unorganized data sets.
One popular application of BIS is Netflix, which collects rich analytics from viewers’ connected devices to track user behavior, including the time spent watching and the total number of episodes watched over a period of days. Netflix works with both structured and unstructured data, although it is said to benefit most from unstructured data. With this in mind, Netflix is reported to use both Kibana and Spark, two of the most popular analytics tools today. Kibana is an open source tool for visualizing and exploring data stored in Elasticsearch. Apache Spark is an open source cluster computing framework, originally developed at the University of California, Berkeley, that lets developers process and analyze large amounts of data, including data streams, efficiently.
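The kind of metrics described above, time spent watching and episodes watched over a period of days, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Netflix's actual pipeline: the event tuples, the field layout, and the function summarize_viewing are hypothetical, and a production system would run a comparable aggregation on a distributed engine instead of in a single process.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical viewing events: (user_id, episode_id, day, minutes_watched).
EVENTS = [
    ("u1", "s1e1", date(2024, 1, 1), 42),
    ("u1", "s1e2", date(2024, 1, 1), 40),
    ("u1", "s1e2", date(2024, 1, 2), 5),
    ("u2", "s1e1", date(2024, 1, 3), 44),
    ("u1", "s1e3", date(2024, 1, 9), 41),  # Falls outside the 7-day window below.
]


def summarize_viewing(events, start, days):
    """Per user: total minutes and distinct episodes watched in [start, start + days)."""
    end = start + timedelta(days=days)
    minutes = defaultdict(int)
    episodes = defaultdict(set)
    for user_id, episode_id, day, minutes_watched in events:
        if start <= day < end:
            minutes[user_id] += minutes_watched
            episodes[user_id].add(episode_id)  # A set counts each episode once.
    return {
        user: {"minutes": minutes[user], "episodes": len(episodes[user])}
        for user in minutes
    }


summary = summarize_viewing(EVENTS, date(2024, 1, 1), days=7)
for user, stats in sorted(summary.items()):
    print(user, stats)
```

Counting distinct episodes with a set means that a viewer who resumes the same episode on a later day is not counted twice, while the minutes from both sessions still add up.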
Another popular application of BIS is Facebook’s fan pages, which users can browse on their PCs and mobile devices. However, because users browsing other users’ pages are not necessarily logged in, it is difficult for Facebook to track how the site is being used. Facebook uses data from the “audio and video streams” and from the “effects and interactions” of the pages being visited to generate statistics about user behavior. This data is then used to provide advertisers with ads that are tailored to each user and therefore more likely to convert.
Amazon also uses structured and unstructured data analysis techniques to monitor its e-commerce site, relying on a system referred to as the Mechanical Data Store. The Mechanical Data Store collects and stores product and customer information, including product descriptions, prices, shipping options, user demographics, and location. Amazon’s data analysis tools are said to be based on Jaroark, described as an open source framework that allows programmers to manage and process complex data sets efficiently.
eBay uses sensors and web feeds to monitor the performance of its auction platform. It also uses data mining techniques to create customized questionnaires that ask customers detailed questions about their purchases, browsing habits, shopping preferences, and the factors that influenced their purchases. In this respect, eBay is following a trend similar to Facebook’s: companies are building tools that allow them to understand and better serve their customers. Amazon has also reportedly launched Web Intelligence, a proprietary tool that allows the company to create personalized user interfaces for its web applications. Amazon Web Services, a division of the e-commerce giant, is reportedly developing web analytics software that will allow users to “customize and segment” webpages.
Google offers tools such as Google Trends, which shows how search interest in a topic changes over time and across regions, to help gauge the interests of people around the world. Facebook, too, is using data to determine where its ads are most effective and is experimenting with other features, such as allowing users to upload data sets to prove they are “active”. Google is clearly aware that its stance on user privacy affects how much money it can make from its social media efforts. Many large businesses are investing heavily in these endeavors. Twitter and Facebook have very large numbers of users, and for Google the question of how much user data is worth, especially when measured in clicks, is clearly at the forefront. As the Internet becomes ever more central to all aspects of our society, and as more organizations choose to filter and profile information by specific interests and demographics, the era of what is called Big Data is upon us.