Which of the following sources is likely to produce big data the fastest? The most obvious answer, and perhaps the easiest to calculate, is a data rate: divide the total amount of data a source has produced by the period over which it was produced. This quick calculation gives an approximate measure of how quickly each source feeds data into the warehouse. Obviously, the result will vary with the size of the data warehouses and the volume of data they contain.
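The calculation above can be sketched in a few lines. The source names and figures below are purely illustrative assumptions, not real measurements:

```python
# Estimate each source's data rate as volume / time, then compare.
# All names and numbers are hypothetical.

def data_rate_gb_per_day(total_gb: float, days: float) -> float:
    """Average rate at which a source produces data."""
    return total_gb / days

sources = {
    "sales_order_entry": data_rate_gb_per_day(120.0, 30),  # 120 GB over 30 days
    "online_customers": data_rate_gb_per_day(900.0, 30),   # 900 GB over 30 days
}

fastest = max(sources, key=sources.get)
print(fastest, sources[fastest])  # → online_customers 30.0
```

Whichever source shows the highest rate is, by this simple measure, the one producing big data the fastest.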
Data Warehouse Aggregations. When considering which of the following sources is likely to produce big data the fastest, the granularity of the individual data warehouses must be taken into account. Data warehouse aggregations combine different dimensions of the same data source into a single view. For example, a customer purchase order may be represented as a set of tabular rows sorted by each item's date of purchase. Once the sales order is completed, those rows contain not only the date of purchase but also the product name, quantity ordered, stock ID, store name, and more.
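The tabular view described above might look like the following sketch, where each completed order contributes rows carrying several dimensions. The field names and values are assumptions for illustration:

```python
# Hypothetical order rows: each row carries the dimensions named in the text
# (date of purchase, product name, quantity, stock ID, store name).
orders = [
    {"date": "2024-03-02", "product": "Widget", "qty": 3,
     "stock_id": "W-17", "store": "Downtown"},
    {"date": "2024-03-01", "product": "Gadget", "qty": 1,
     "stock_id": "G-04", "store": "Mall"},
]

# Sort the rows by each item's date of purchase, as in the example.
rows = sorted(orders, key=lambda r: r["date"])
print([r["product"] for r in rows])  # → ['Gadget', 'Widget']
```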
Aggregations. Aggregations make it increasingly challenging to identify the fastest way to process data. There are at least two reasons for this: first, the overall size of the database, and second, the speed with which new aggregations can be added to or removed from the collection. As previously mentioned, the finer the granularity of the data warehouse collection, the harder it is to maintain an accurate answer to which source is producing big data the fastest.
The traditional sources used by companies like Sears, Microsoft, Cisco, Priceline, and others are simply too large to query quickly for useful information. However, with more time available to query and update data, analysts can now more efficiently compare collective results across the candidate sources: sales order entry clerks, online customers, retail stores, and vendors. When weighing these sources, it is important to remember that the accuracy of a returned aggregate value depends on how accurately the analysts managed the underlying sales order and purchase entry data.
Sales order entry clerks. This comparison assumes that all of the clerks' data are updated consistently. In practice, sales order entry clerks may have access to sales data only on a specific day, in a particular month, or on a particular date. In that case, it takes a significant number of queries to determine which source yields a reliable aggregate value: one set of queries against the sales order entry clerks' data and another against the vendors' data. Even then, at least one additional query against each is needed to arrive at a second set of aggregated values for comparison. And although it would be more accurate to also track how often the two sets of aggregated values were refreshed (which takes further time), the data still do not stay consistent.
Data warehouses. Analytics data warehouses allow an analyst to query multiple sets of data simultaneously. However, they have limitations that may offset their significant advantages. First, it may not be feasible for a data warehouse to provide the kind of aggregate value (aggregated per month, per quarter, or per year) relevant to questions such as how many manufacturing orders a factory and its customers place, or how frequently those orders are performed. It also may not be feasible for data warehouses to maintain the consistency needed for meaningful analytics.
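A minimal sketch of the per-month aggregate value mentioned above, using plain dictionaries. The order dates and amounts are invented for illustration:

```python
# Roll hypothetical order amounts up into per-month aggregate values.
from collections import defaultdict

orders = [
    ("2024-01-15", 250.0),
    ("2024-01-28", 100.0),
    ("2024-02-03", 75.0),
]

monthly = defaultdict(float)
for date, amount in orders:
    month = date[:7]          # "YYYY-MM" prefix of an ISO date
    monthly[month] += amount

print(dict(monthly))  # → {'2024-01': 350.0, '2024-02': 75.0}
```

The same loop could bucket by `date[:4]` for per-year aggregates; a real warehouse would precompute these rollups rather than scan raw rows on every query.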
In addition to in-memory analytics, cloud computing offers two other forms: application hosting and grid computing. An application here is any collection of programs that can be executed in the cluster, whether on the server or on the client machine. Grid computing, by contrast, refers to the large-scale use of servers to process the requests of hundreds of client machines. The distributed nature of grid computing allows large volumes of data to be processed in a short period, making it well suited to large data sets and time series.
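Grid computing spreads work across many machines. A single-machine analogy with a worker pool sketches the idea; the data, partitioning scheme, and worker count below are assumptions, not any specific grid framework's API:

```python
# Split a large data set into partitions and process them concurrently,
# mimicking how a grid distributes work across nodes.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """The work each 'node' performs on its slice of the data."""
    return sum(chunk)

data = list(range(1_000))
chunks = [data[i:i + 250] for i in range(0, len(data), 250)]  # 4 partitions

with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))

print(total)  # → 499500
```

Each partition is processed independently, so adding more workers (or, in a real grid, more machines) shortens the wall-clock time for large inputs.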
The key to making data-driven decision making possible is the accurate analysis of both sources. In-memory analytics remains the fastest option, while cloud computing delivers broader intelligence; together they make data-driven decision making possible for businesses of all sizes. Data visualization then presents the results in a way that traditional database management systems simply could not.