Thinking with Data
Francis Bacon and Wise Data Analysis

by Paul F. Velleman, Ph.D.

We often analyze data because we hope to understand the world better. Statistical thinking guides (or should guide) our quest. But to think statistically, we must engage in habits of mind that are simply not natural for humans.

"Aha!" I hear from some of you, "I knew it was unnatural!" But you may be surprised to learn that philosophers and scientists have known for more than 400 years that this kind of thinking is unnatural.

As you'll see in what follows, my source for centuries-old wisdom is Sir Francis Bacon writing in his crowning work, Novum Organum. Bacon is widely regarded as the founder, along with his contemporary Galileo, of modern scientific thought. He was far ahead of his time.

Seven Unnatural Habits of Mind

In each of the brief essays that will appear in this newsletter, I plan to discuss one of the seven unnatural habits that are required for sound statistical thinking. When we are aware of these habits, we can more effectively free ourselves of the problems they bring. The seven unnatural habits:
Let's start today with Thinking Critically.

Think Critically

What does it mean for a data analyst to think critically? We should start by keeping in mind our goal of letting the data tell us what they know. Far too often, we probe our data with a specific question we had in mind without stopping to ask the data if they have anything else interesting to say.

Yes, I know there's traditional stuff about the so-called "scientific method" in which you may only ask the data to answer the question you formulated before you collected the data. I consider such restrictions foolish and unscientific. So do most working scientists I know.

There are risks to inviting the data to speak. In particular, we should check the data's credentials. Are the data competent to tell us anything about the world?

The starting point for this check is to be sure we know the "W's". As we were taught in elementary school about a good news story, we must know the Who, What, When, Where, and Why of the data.

If we don't know the individuals about whom (or about which) we have data, we can't understand anything the data say. The Who may not be people; they may be calendar quarters, or companies, or sales records, or earthquakes. But we should be clear about who is included in our data and (sometimes more important) who was excluded.

The What identifies what was actually measured or recorded. Often these are the variable names. If the measurements are quantitative, we should also know the measurement units.

The When and Where help us to judge whether the data are relevant to our circumstances. Knowing Why and how the data were collected can help us be alert for biases.

A key tool, perhaps the key tool, for checking the data's credentials is data display. For categorical data we look at bar charts and pie charts, and relate two variables with contingency tables. For individual quantitative variables, we can make histograms or probability plots and consider the distribution, center, and spread.
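
As a rough companion to the displays just described, here is a minimal sketch in Python of the numbers that sit behind a contingency table and a histogram. It assumes a small invented table of sales records: the field names `region`, `segment`, and `sales_usd`, and every value in it, are hypothetical and do not come from any real dataset. It is one way to do this under those assumptions, and it is meant to be used alongside a display, not in place of one.

```python
from collections import Counter
from statistics import mean, median, quantiles

# Hypothetical records, invented for illustration only.
# The Who: individual sales records. The What: region, segment, sales in US dollars.
records = [
    {"region": "East", "segment": "Retail", "sales_usd": 120.0},
    {"region": "East", "segment": "Wholesale", "sales_usd": 310.0},
    {"region": "West", "segment": "Retail", "sales_usd": 95.0},
    {"region": "West", "segment": "Retail", "sales_usd": 105.0},
    {"region": "East", "segment": "Retail", "sales_usd": 130.0},
    {"region": "West", "segment": "Wholesale", "sales_usd": 2900.0},
]


def contingency_table(rows, row_key, col_key):
    """Count the cases that fall into each (row category, column category) pair."""
    return Counter((r[row_key], r[col_key]) for r in rows)


def summarize(values):
    """Report center (mean, median) and spread (IQR, min, max) for one quantitative variable."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "iqr": q3 - q1,
        "min": min(values),
        "max": max(values),
    }


table = contingency_table(records, "region", "segment")
summary = summarize([r["sales_usd"] for r in records])

# Print the two-way counts, then the one-variable summary.
for (region, segment), count in sorted(table.items()):
    print(f"{region:5s} {segment:10s} {count}")
print(summary)
```

In this made-up example the mean sits far above the median. That is the kind of signal that should send us back to a histogram to see which cases are responsible.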
For groups, boxplots offer simple comparisons. For pairs of variables, we can make scatterplots and consider the direction of the relationship, whether it is straight or not, and how strong it is. For more variables, rotating plots and dynamic graphics can show us much of the structure.

In all of these displays we can look for clusters or subgroups and for outliers, which would call for further investigation. Early in a data analysis, outliers may be errors in recording or transcription. It can be particularly helpful to identify individual cases in displays using the query tool.

The challenge of critical thinking is that it calls on you to doubt your data even if you collected the data yourself. Critical thinking requires creativity. You must think about things that are not in front of you and imagine ways in which things might have gone wrong. In effect, you should go looking for trouble.

What Bacon Tells Us

Did Sir Francis know how hard critical thinking is? Here's Bacon on the subject:

"The human understanding when it has once adopted an opinion (either as being the received opinion or as being agreeable to itself) draws all things else to support and agree with it. And though there be a greater number and weight of instances to be found on the other side, yet these it either neglects and despises, or else by some distinction sets aside and rejects; in order that by this great and pernicious predetermination the authority of its former conclusions may remain inviolate."

If his language seems Elizabethan, it's because, well, he was Elizabethan. What his linguistic flourishes boil down to is this: don't rely on what you thought before you looked at the data. Stand back and let the data speak.

Graphics, and especially interactive graphics, are the tool of choice for this first step in a successful data analysis. But they'll keep showing up in future essays as well.