Is Your Problem Really Statistics?

Date: 2023-02-13

Summary

I follow up on Sérgio Moreira’s thought-provoking blog post with a long rant on philosophy of science and the research process. If you’re a student struggling with your thesis, I included a checklist at the end to help you pinpoint what exactly it is that you’re having trouble with. If you’re an experienced researcher but have never heard of the new experimentalism, consider reading the severe testing section. I hope you find those sections interesting enough to read the rest of the post.

Are You Really Struggling With Statistics?

This post was inspired by, and aims to follow up on, this post from Sérgio Moreira’s Blog. You should really go read the original; it’s a good piece of concise writing on this topic. The gist of the post (if I’m reading it correctly) is that many people think their problem is with statistics when in fact it lies elsewhere.

Acknowledgments and Disclaimer

As I said, I aim to build on that post. This is neither a critique, a review, nor a comment; it’s intended to be a follow-up. This post is my own doing, so don’t fault Sérgio for my failings. That said, I was obviously influenced by Sérgio’s approach to statistics. I was also greatly inspired by my co-advisor Leonel Garcia-Marques’s views on philosophy of science, and by my advisor Sara Hagá’s wise remarks on these (and other) topics. Without them I wouldn’t be where I am today. I am actually working with Leonel and Sara on a paper that aims to draw insights from the philosophy of science to shine some light on the replication crisis debates. This being said, I hope to act like a good DJ and make a compelling remix of their hits, sprinkling my personal touch here and there. But I do run the risk of having a bad performance as a DJ and ruining the source material…

Severe Testing

I’ve been fascinated with the new experimentalism school of philosophy of science ever since Leonel introduced me to it. I’m particularly fond of Deborah Mayo’s work, reconciling and upgrading Fisher’s and Neyman-Pearson’s frequentist statistics into her own error statistics. I’m actually even more of a fan of her concept (or her take) of severe testing. This is a lot to unpack, especially if you’ve never heard of these topics. The gist is that for new experimentalists, when experiments are independent from the theories they are testing, proponents of opposing theories are forced to agree on their results, though not on how to interpret them. Moreover, future theories are forced to account for the findings of those experiments. In that sense, experiments are said to have a life of their own (see Hacking’s, and Mayo’s, many writings). Mayo’s concept of severe testing builds on the work of Popper, and later philosophers like Lakatos. To avoid reviewing the history of ideas in philosophy of science here, let me redirect you to Chalmers (2013), which is an amazing textbook on the subject.

Back to Mayo’s severe testing, then. To Mayo, experiments are only informative to the extent that they pose severe tests of their hypotheses. My students usually get the concept of a severe test very quickly when I ask them: “What would you think if I told you I was giving you a very severe test tomorrow?”. Then, I go on to ask: “What could you say of a student who aced that test?”. The best response I got so far was “That the student cheated!”. If we disregard cheating, though, we see that if students ace a test they were very likely to fail unless they really knew their stuff, then they must really know said stuff. Contrast that with a student who did great on a very easy test. Students are often tempted to suggest that that student probably didn’t know much about the subject. However, we actually don’t know that. If a test is really easy, students who don’t know their stuff will pass, but students who really know their stuff will pass too. This leads us to conclude that severe tests are informative, while non-severe tests are anything but. An experiment can be said to be severe if it satisfies Mayo’s severity requirement: the predicted result is very unlikely unless the hypothesis is true. In those cases, if researchers find their predicted result, and that result was very unlikely were the hypothesis false, we have strong evidence in favor of the hypothesis being true.
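The exam analogy can be sketched numerically. Here is a toy Python illustration of why a pass on a severe test is informative while a pass on an easy test is not; all the pass rates are made-up numbers for the sake of the example, not anything from Mayo’s work:

```python
# Toy illustration of the severity idea (all numbers are hypothetical).
# A test is severe when passing it is very unlikely unless the student
# really knows the material.

def informativeness(p_pass_if_knows, p_pass_if_not):
    """Likelihood ratio of passing: how much a pass favors 'knows their stuff'."""
    return p_pass_if_knows / p_pass_if_not

# Easy test: almost everyone passes, knowledgeable or not.
easy = informativeness(p_pass_if_knows=0.99, p_pass_if_not=0.90)

# Severe test: passing is very unlikely unless you know your stuff.
severe = informativeness(p_pass_if_knows=0.80, p_pass_if_not=0.05)

print(f"easy test ratio:   {easy:.1f}")    # close to 1: a pass tells us little
print(f"severe test ratio: {severe:.1f}")  # large: a pass is strong evidence
```

The same ratio logic is what makes a predicted experimental result evidentially strong: the less likely the result would be were the hypothesis false, the more a positive result counts in its favor.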

I’m sorry for the lecture…and I haven’t even gotten to the point I wanted to make… I’ll speed up now. What I really like about Mayo’s work is that she makes it crystal clear that the severity of a study is a product of the entire research process. Meaning: (1) the severity of a great study design can be ruined by a sloppy data analysis; (2) a great data analysis is useless if the experiment wasn’t severe to begin with; and (3) the validity of any interpretation of the results lies in how unlikely those results would be unless the hypothesis, and only that hypothesis, is true. This approach makes it abundantly clear that compromises made along the research process cannot truly be fixed down the line. It also shows that to do great research we have to strive to formulate good hypotheses, perform appropriate analyses, and make sensible interpretations.

Why the Problem Usually Is Not Statistics

For the problem in a given study to lie only in its statistical analysis, its hypotheses have to be well formulated, and the study design has to pose a severe, theory-independent test of those hypotheses. For that to be the case, discounting a stroke of amazing luck, the researchers have to have a good knowledge of the literature and some mastery of research methods. In my experience, if you really do know the literature, if you know your hypotheses, and you know why you tested them the way you did, you also have a pretty good idea of what you have to find, statistically speaking, to support your hypotheses. How else could you have arrived at a good study design? Having a good study design implies that not only do you know your hypotheses, you also know how to operationalize and test them in very concrete terms. In that case, you usually know whether you expect the control group to score higher or lower than the experimental group on what you measured. You probably know that you expect them to score higher or lower on average. At that point, you know you need a way of testing for differences in those means. I’d wager it is much harder to know all of the above than to figure out what statistical test/model you can use to test for that difference. It seems very unlikely for you to have had such strong training in methods yet be so confused about statistics. If that is your case, please do not take this as an insult or critique. Not only do I believe you’re in the minority, I also believe you’ve tackled the hard part, and you’ll be able to figure out the statistics sooner than you think. What I think is much more likely is that if we are having trouble understanding what statistical test to run, we are probably also having a hard time grasping our study design and/or our hypotheses. To be clear, I do recognize data analysis as a related but separate domain. It does take time and effort to understand how to statistically model and test your hypotheses.

However, statistics is famous among psychologists for being considered hard (and I do think it can be). Thus, I’ve seen people be quick to assume they must be having a problem with it. But isn’t having a good grasp of research methods equally challenging? Is understanding a complex literature that much easier than knowing how to statistically compare some means?
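To make the “comparing some means” step concrete, here is a minimal sketch of where the design knowledge above lands you: two groups, an expected direction, and a test for a difference in means. The data are invented, and I compute Welch’s t statistic by hand with Python’s standard library just to show how small this final step is once the design is clear:

```python
# Minimal sketch: once the design tells you to expect the experimental
# group to score higher on average than the control group, the statistical
# question reduces to comparing two means. Data below are made up.
from statistics import mean, variance

control      = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2]
experimental = [5.0, 4.8, 5.3, 4.6, 5.1, 4.9]

def welch_t(a, b):
    """Welch's t statistic for a difference in means (unequal variances)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(b) - mean(a)) / se

t = welch_t(control, experimental)
print(f"mean difference: {mean(experimental) - mean(control):.2f}, t = {t:.2f}")
```

In practice you would of course let your calculator of choice (JASP, R, SPSS, etc.) compute this, along with degrees of freedom and a p-value; the point is that the hard part was knowing which two means to compare and why.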

Complex Skills

I think we should think of philosophy of science, research methods, and data analysis as complex skills. Complex in the sense that each of them can be decomposed into several other skills. Knowing a literature implies knowing its history and its present debates. It implies knowing the abstract theories, and the concrete ways in which people have tested them. Likewise, understanding research methods means understanding abstract concepts like randomization and sampling, but also having a knack for telling when a procedure becomes so boring that participants stop paying attention.

Mastering data analysis also means understanding distributions, modeling, statistical inference, and the like. But to analyze your data you also need to remember what that three-letter acronym you used to name a column means. You usually have to know how to use some sort of calculator, be it JASP, R, Python, SPSS, Excel, etc… You have to clean your data file (e.g., maybe you left some test responses in there), you may have to wrangle (reformat) the rows and columns to fit what your calculator is expecting…and so on.
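A tiny sketch of that unglamorous layer of the skill, in plain Python: decoding a cryptic column name, dropping leftover test responses, and reshaping values into what the calculator expects. The column names and rows are entirely hypothetical:

```python
# Toy example of data cleaning/wrangling (all names and values are made up).

raw = [
    {"pid": "test1", "rt_ms": 532, "acc": 1},   # a leftover test response
    {"pid": "p01",   "rt_ms": 640, "acc": 1},
    {"pid": "p02",   "rt_ms": 712, "acc": 0},
]

# "rt_ms" was our cryptic acronym: reaction time in milliseconds.
clean = [
    {"participant": row["pid"], "reaction_time_s": row["rt_ms"] / 1000}
    for row in raw
    if not row["pid"].startswith("test")        # drop pilot/test runs
]

print(clean)
```

None of this is statistics in the inferential sense, yet all of it stands between you and your analysis, which is exactly why it is so easy to mislabel where the struggle is.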

The problem with these skills being complex is that it makes it really easy for us to feel like we’re struggling with one thing when we’re actually struggling with another. For example, some people really do understand how they want to analyze their data; they just don’t know how to do it in their chosen calculator. Other times, people are actually proficient users of their calculators; they just have no idea how to statistically test their hypotheses. But wires can get even more crossed. People may be under the impression they need to use a given model while failing to understand how they’re going to analyze their data with it. Maybe they’re right to be confused, because they might actually need another model, as no one is forcing them to use the one they’re trying to use. If you think I’m arrogantly listing the mistakes others have made…you clearly don’t know me… I’ve made all of the above and many more.

How Can We Navigate This Complexity?

I won’t lie. Doing good research is hard. It is complex. We just have to continue to find ways of dealing with the complexity. I believe it helps to break things down into tiny pieces.

It’s the Circle of Life… I Mean, Research


I also believe it helps to think of research as a very iterative/circular process. I’ve learned that any piece of scientific writing goes through way, way, way more drafts/iterations than I could imagine as a student (shout out to the WriteOn Workshop and Sara for teaching me that). The same can be said of the entire research process, including data analysis. Sometimes you’re planning a study and have a good idea of how you’re going to analyze the data; then you change the procedure, and your analysis plan changes accordingly. Other times, you read a groundbreaking paper that makes you rethink your theoretical argument, and you revise everything accordingly. Whatever the case, just don’t expect to move from the literature review to interpreting your results and drawing insights for your field in a linear fashion.

Knowledge Checklist (The Map)

Below is my attempt at creating a map to guide you through the complexity. The idea is that you try to answer each question as best as you can. Whenever you feel like you don’t really know the answer, do your best to find it before continuing: read the literature, meet with your collaborators/advisors, and ask your colleagues for help.

When you feel like you’re struggling with a question about your research/thesis, grab this checklist and go through it, even if you have done so before. Maybe something changed in your design since you first looked at it. Maybe you have followed all the steps and now you want to do follow-up analyses or studies. Maybe you just forgot some details and you’re now struggling with something you weren’t at first. Whatever the case may be, just grab the checklist and iterate through it.

Do I Really Understand?

My Research Question?

My Study Design?

How I Operationalized the Hypotheses?

The Models I Want to Compare?

Note: this assumes you’re following a model comparison approach (Judd et al., 2017; also mentioned in Sérgio’s post).
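For readers unfamiliar with the model comparison approach, here is a hedged sketch of its core move in Python: compare a compact model (one grand mean for everyone) against an augmented model (one mean per group) by how much error each leaves unexplained. The data and the exact presentation are my own invention for illustration, not taken from Judd et al.:

```python
# Sketch of model comparison: does distinguishing the groups reduce error?
# All data are made up.
from statistics import mean

control      = [4.1, 3.8, 4.5, 4.0]
experimental = [5.0, 4.8, 5.3, 4.6]
scores = control + experimental

def sse(data, prediction):
    """Sum of squared errors of a single-value prediction."""
    return sum((y - prediction) ** 2 for y in data)

# Compact model: everyone predicted by the grand mean.
sse_compact = sse(scores, mean(scores))

# Augmented model: each group predicted by its own mean.
sse_augmented = sse(control, mean(control)) + sse(experimental, mean(experimental))

# Proportional reduction in error: how much the group distinction buys us.
pre = (sse_compact - sse_augmented) / sse_compact
print(f"PRE = {pre:.2f}")
```

The appeal of this framing is that “which test do I run?” becomes “which two models am I comparing?”, which ties the statistics directly back to the hypotheses.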

Do I Understand Statistical Inference?

Note: I’m not getting into the nuances and debates of statistical inference here, or frequentism vs Bayesianism. I do side with Mayo’s error statistics, and recommend her book “Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars”. Regardless, I think this checklist generally applies.

The Structure of My Dataset?

How to Compare the Models in My Chosen Calculator?

How Do I Interpret My Results?

Thank you

Thank you so much for reading!

If you would like to give some feedback please open an issue on this blog’s GitLab.