How to Analyze Big Data
In previous posts on “How To Analyze Big Data Using Data Sets” we introduced several applications of cluster analysis, such as mean square and the binomial tree. We reviewed several tools for statistical inference using ensemble data and explained how to analyze and interpret results from cluster analysis. This post provides an update on how to analyze big data using these tools.
Researchers at the Massachusetts Institute of Technology have developed a new way of analyzing complex, hierarchical data sets. Psychologists can pose new research questions against many complex data sets, including ISSP, LSAT, PISA, and the GLOBE Project, to evaluate statistical associations. Airline data, by contrast, is not very individualized, but it qualifies as big data because it is drawn from many sources. Such data could be used, for example, to help predict airline prices.
It is sometimes necessary to combine multiple measurements using traditional methods. For instance, many surveys ask participants how much they want to earn, what kind of car they would prefer, or even what gender they would like. Surveys typically ask people to rank their preferences across a number of choices, and analysts then calculate measures such as expected earnings, salary, and convenience. One way to analyze this data is to evaluate expected earnings as a function of the ranking of the items. For instance, someone who wants to earn more than $6,000 a year might also rank making a regular commute to work higher than commuting by car. Combining these two measures of desirability yields a single measure called a “desirability quality score,” which expresses a person’s preferences about money and work more strongly than his or her opinions about cars or gender.
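As a rough illustration of combining two such measures, the sketch below rescales a respondent's ranks to a 0-to-1 score and takes a weighted average. The function names, the rescaling rule, the example ranks, and the weights are all assumptions made for this example; the post does not define how a “desirability quality score” is actually computed.

```python
def rank_to_score(rank, n_items):
    """Rescale a rank (1 = most preferred) to a score between 0 and 1."""
    if not 1 <= rank <= n_items:
        raise ValueError("rank must be between 1 and n_items")
    if n_items == 1:
        return 1.0
    return (n_items - rank) / (n_items - 1)


def composite_score(scores, weights):
    """Weighted average of per-measure scores (weights need not sum to 1)."""
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Hypothetical respondent: ranks "earnings" 1st and "commute" 3rd out of 5 items.
earnings_score = rank_to_score(1, 5)
commute_score = rank_to_score(3, 5)

# Assumed weights: earnings counts twice as much as commute.
desirability = composite_score([earnings_score, commute_score], [2.0, 1.0])
print(round(desirability, 3))
```

Changing the weights changes how strongly each preference is expressed in the combined score, so any real analysis would need to justify them.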
In addition, many psychological studies use a family of statistical techniques called multivariate analysis. This involves analyzing several variables at once, often across large numbers of data points, using a range of statistical methods. The goal is to identify relationships among variables, that is, to determine which characteristics are related to one another and which are unrelated. For instance, if someone wants to know how likely he or she is to have frequent attacks of anxiety, depression, or ulcers, many factors may relate to the likelihood of experiencing these problems and to how prone someone is to them.
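One simple way to look for related and unrelated variables is to compute a pairwise correlation for every pair and keep the pairs above a cutoff. The sketch below does this with the Pearson coefficient in plain Python. The variable names, the made-up scores, and the 0.5 cutoff are assumptions for illustration only, and real multivariate analysis covers many methods beyond pairwise correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


def related_pairs(data, threshold=0.5):
    """Return (name, name, r) for variable pairs with |r| >= threshold."""
    names = list(data)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(data[first], data[second])
            if abs(r) >= threshold:
                pairs.append((first, second, round(r, 3)))
    return pairs


# Made-up questionnaire scores for six respondents (illustrative only).
survey = {
    "anxiety": [2, 4, 5, 7, 8, 9],
    "depression": [1, 3, 5, 6, 8, 9],
    "shoe_size": [9, 7, 10, 8, 9, 7],
}
pairs_found = related_pairs(survey)
print(pairs_found)
```

With these invented numbers, only the anxiety and depression scores clear the cutoff. A correlation of this kind shows association, not which factor causes which.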
However, none of this analysis is possible without a substantial source of information. Many psychologists rely on respondents’ self-report, which is readily available but not always a complete or reliable source of data. It also takes a long time to collect and analyze large amounts of information. Self-reporting has a further problem: people often misreport their own experiences in order to fit a certain description. As a result, there are inherent weaknesses in many attempts to collect empirical psychological data on diverse topics.
Another major drawback is that in large samples even very weak relationships can reach statistical significance. This means that even when a relationship is statistically significant, there is a reasonably good chance that it is not clinically significant. As a result, conclusions drawn from this type of data may be misleading.
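The gap between statistical and clinical significance can be seen by holding a very weak correlation fixed and increasing the sample size. The sketch below is a minimal illustration: the function name, the value r = 0.02, and the sample sizes are assumptions chosen for the example, and the p-value uses a normal approximation to the t distribution that is only reasonable for large samples.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value for a Pearson correlation r from n pairs.

    Uses t = r * sqrt((n - 2) / (1 - r**2)) and treats t as standard normal,
    a simplifying assumption that is only reasonable when n is large.
    """
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return math.erfc(abs(t) / math.sqrt(2))


weak_r = 0.02  # r**2 = 0.0004, i.e. 0.04% of the variance explained
for n in (100, 10_000, 1_000_000):
    p = correlation_p_value(weak_r, n)
    print(f"n={n:>9,}  p={p:.3g}")
```

The correlation is the same size in every row; only the sample size changes. Reporting an effect size alongside the p-value is one common way to keep the two ideas separate.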
Fortunately, advances in computing technology have given researchers more reliable ways to analyze large data sets. Computational methods are especially useful for psychologists who need to study large amounts of data without losing statistical power. Computer scientists and software developers have created many tools that help psychologists make sense of large numbers of standardized questionnaires. Some tools allow analysts to adjust parameters so that they can detect a strong relationship between variables. This allows researchers to determine whether a relationship exists and, if so, what type of relationship it is.
In short, analytic methods help psychologists make sense of complex data in an easy and systematic fashion. They have made analyzing research data much easier and more efficient. As technology continues to improve, we will undoubtedly see many more improvements in data analysis. In the meantime, it will remain necessary for psychologists to use the analytic methods available today. Only by using these methods can a psychologist judge with confidence whether or not their results are accurate.