Please Stop Paying to Teach SPSS

Date: 2023-03-24

Summary

I make the case that universities should not pay for bulk SPSS licenses for students, nor build statistics curricula around SPSS. My main points boil down to this: doing so amounts to unpaid advertising and creates vendor lock-in, when there are plenty of better alternatives. Contrary to popular expectations, I will not argue for replacing SPSS with R in all courses. Instead, I’ll even suggest relying solely on interactive simulations and other teaching aids, and teaching no software at all, in introductory stats courses. Ok… Those simulations can be built as R/Shiny apps, but they don’t have to…

Why Universities Shouldn’t Pay to Teach Software

As the title of this post suggests, the main point I wish to get across is that there’s little reason for universities to continue to buy bulk licenses for SPSS. To be clear, I see how for some institutions who have researchers who feel more productive in SPSS there might be a reason to buy a few licenses for them, but not a bulk license for all students.

I Have Nothing Against IBM

I’d just like to take a second to clarify that I have nothing against IBM, nor do I have anything against people making a profit out of running an IT business. I also don’t have a problem with public institutions buying services from private entities. In fact, I actually like IBM and appreciate all they have done for the IT industry. IBM currently owns Red Hat, one of the biggest businesses built on building, selling, and managing Linux distributions. If you know me, you know I have a very strong preference for free and open source software (FOSS), and I do believe science should be built on FOSS, not on proprietary software. However, my arguments for not basing curricula around SPSS would hold even if SPSS became FOSS.

Teaching is Advertising

The first thing I think that universities forget is that by teaching a given software they are advertising it. They are implicitly conferring their more or less prestigious seal of approval. Students may then perceive the software as being an industry standard or of academic rigor.

Teaching Leads to Vendor Lock-in

Teaching a particular program, and only that particular program, at an educational institution can do more than advertise that program. It can lead students to be locked in to that particular application. A student who only learned one piece of software at their university is less likely to go through the trouble of learning a new one. Students are also more likely to look for employment opportunities that specify knowledge of that program as a requirement, and are more likely to shy away from employers asking for expertise in a competing product. This can lead either to students spending their time (which is scarcer when they’re on the job hunt) learning new software, or to employers having to change their software stack to accommodate the preferences of the talent pool. I reckon that if Windows weren’t taught in elementary and high school it wouldn’t have the monopoly it has today… This is all to say that universities do more than advertise the software they use: they train new generations of professionals on it, and can change market preferences. In turn, this means companies have a tremendous vested interest in getting their software taught at universities. That’s why they offer it at discount prices, or even for free, for student use.

I believe we’re at a point where the value proposition of SPSS being taught in introductory courses is so strongly tipped in favor of IBM’s interests, versus those of students or universities, that universities should be getting paid by IBM to teach it, not the other way around. Whatever discount IBM is giving your university, it is not enough. Your university shouldn’t be spending a dime on student licenses; it should be making top dollar for continuing to stick with SPSS when there are so many alternatives. Obviously, I’m speaking in an economic sense. If the goal is teaching excellence and arming students with the tools for the future, then SPSS should not be taught at all, or should be taught in conjunction with its alternatives.

There Are Free Alternatives

Speaking of alternatives… Not only are there many alternatives to SPSS, there are free and open source (FOSS) alternatives. As we will see, these alternatives are not only “free as in free beer”, they are also “free as in free speech”.

Free as in Free Beer

FOSS software can be, and frequently is, free of cost for the user. It can be provided for free by the developers to anyone who wishes to use it under the terms of a FOSS license. When there are programs such as JASP or Jamovi that are completely free of costs, I find it hard to see the value proposition in SPSS. They sometimes offer even more features than SPSS, and are (arguably) easier to use and more visually appealing. Note that with SPSS being a paid application not only should it be better than the free alternatives, it should be sufficiently better as to be worth the cost. I appreciate that may have been true once, but I find it so hard to see how that is still the case.

Free as in Freedom

As Stallman explains, FOSS software need not be “free as in free beer” but it must be “free as in freedom”. More specifically, it must give users the freedom to inspect a program’s source code, make modifications to the program, and share those modifications with others. This means that, unlike proprietary software, FOSS software can be easily audited without breaking any laws. More importantly, anyone is free to improve a FOSS application and share the improved version. Most FOSS projects are also open to contributions, meaning you can share your improvements with the original authors, who can then integrate them into the official project. Universities employ some of the most (academically) qualified people. This means many of them could probably improve the FOSS applications they rely on for teaching, and share those improvements. Thus, if universities replace SPSS with a FOSS alternative, they may be able to improve that alternative, making it even better and more suitable for the use cases at their institution.

Addressing Arguments in Favor of SPSS

I reckon it’s worth taking the time to address some of the arguments I’ve heard in favor of SPSS. Feel free to disagree with my counter-points. Also, feel free to open an issue on this blog’s GitLab with your feedback. Please note that I’m addressing the arguments from the point of view of what I already proposed in the previous section, that paying for bulk SPSS licenses for students is hardly justified today. Hence, my responses are to be taken as counter-points to paying for bulk licenses, and only teaching SPSS, ignoring its alternatives.

SPSS is Easy and User-Friendly

This is among the many points I believe could have been true in the past, but no longer holds today. In my experience, SPSS takes about one or two semesters to teach and learn. Claiming that software which takes one or two semesters at a higher education institution to learn is user-friendly seems questionable at best. Imagine if someone said: “You should install app X on your phone/computer! It’s so easy to use, you just have to enroll in a course at my university to learn how to use it!”.

You might be tempted to reply: “And doesn’t R take as long, if not longer?”. Yet you would be mistaken in thinking I’m claiming otherwise, or in thinking that R is the only alternative to SPSS. Have you taken a look at JASP or Jamovi? Aren’t they easier than, or at least as easy as, SPSS? With those being free options (thus infinitely cheaper than SPSS), how much easier would SPSS have to be to justify the cost of the bulk licenses?

SPSS is the Gold Standard

Another point that might have been true in the past but no longer holds. Cutting-edge statistical models, tests, and tools are published as R packages, or maybe in other programming languages; they are not built as SPSS modules first. Running more complex models like linear mixed models or structural equation models is often impossible in SPSS, and relatively straightforward in R. You might say there’s also AMOS, but now we’re talking about teaching yet another piece of paid software to complement SPSS. This increases the cost to institutions, a cost you must compare with 0 (i.e., the cost of the free alternatives) and justify.
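
To give a rough idea of what “relatively straightforward” looks like, here is a minimal sketch of a linear mixed model with a random intercept per participant. The nlme package (which ships with standard R installations) and its bundled Orthodont dataset are my choices for illustration only; other packages would do just as well.

```r
library(nlme)  # recommended package bundled with standard R installations

# Orthodont: repeated measurements (distance) on the same children at several ages
# Fixed effect of age, random intercept for each child (Subject)
mixed_model <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# Full output: fixed effects, random-effect variances, fit statistics
summary(mixed_model)

# Just the fixed-effect estimates (intercept and slope for age)
fixef(mixed_model)
```

The model formula reads much like the one for an ordinary regression, with one extra argument describing the grouping structure.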

Our Researchers All Use SPSS

I see more and more researchers using R or some other software. Even in the past, I’ve seen researchers in my department using STATISTICA, not SPSS. Moreover, if some researchers feel more productive in SPSS, institutions can always buy licenses just for them. They don’t need to buy a bulk license for all students.

There Are More Resources to Teach SPSS

This may actually be the case if we just count the number of published textbooks, I don’t know… Yet, it may not even be true if you look at the number of books published in the last five years. Regardless, I don’t think that’s the point. No course built on SPSS lists every textbook about SPSS in existence in its references. Thus, the question is not whether there are more resources on SPSS than on X. The question is whether you can find one or two textbooks for X of equal or greater quality than the ones you had for SPSS. Today, I believe this is the case for most software that you could insert in X’s place.

Our Instructors Only Know SPSS

Is it though? How many people teaching statistics don’t know, or couldn’t learn R if you gave them the time? Is there really no one at your institution that can teach R? Would it be so terrible if you hired someone to teach R, even if only to your staff?

Addressing Arguments Against R

As I’ve said, I’m not even making the case for always replacing SPSS with R. In some introductory courses, I believe the best bet is to not teach statistical software at all. Still, I would like to take the time to address some of the criticism I’ve heard about R.

R is Hard

Let’s take a look at how you compute a linear regression in R.

# Linear regression on R
regression <- lm(dependent_variable_name ~ independent_variable_name,
                 dataset_variable_name)

# To see the output
summary(regression)

Let’s say you want to model the impact of people’s grade at the end of high school (GPA in the US, many other things in Europe) on their current income. Assuming you have participants’ grades in a column named “grade”, their income in a column named “income”, and you named your dataset “dataset” (all reasonable names for those variables, I’d say), you’d compute the linear regression with the following R code:

regression <- lm(income ~ grade, dataset)

summary(regression)

I know I’m biased, but I think those look like two very human-readable lines of code, even for someone not trained in R. How many menus and submenus would you have to click through to do that on SPSS? You could say that to write that R code you need to know a lot about programming. I don’t think that’s entirely true. Technically to write that code you just need to know how to create a variable, and use the lm() and the summary() functions. You do have to learn how to do that, but I never said R was a self-evident truth, I just said R isn’t that hard. To do a linear regression in SPSS you also have to know what menus to click. You might say that SPSS has a visual interface, that the buttons/menus have labels that you can read and understand, but if you think about it, you have to have some experience to know what submenus lie beneath each menu. You also have to be taught that.
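
If you want to try those two lines without a real dataset at hand, here is a minimal sketch using simulated data. All the numbers (sample size, grading scale, slope, noise) are invented purely for illustration; only the lm() and summary() calls come from the example above.

```r
# Simulated data: every number below is made up for illustration only
set.seed(42)
n <- 200
grade  <- runif(n, min = 10, max = 20)              # hypothetical final grades
income <- 1000 + 50 * grade + rnorm(n, sd = 200)    # hypothetical income with noise
dataset <- data.frame(grade = grade, income = income)

# The same two lines as before, now runnable
regression <- lm(income ~ grade, data = dataset)
summary(regression)

# Just the estimated intercept and slope
coef(regression)
```

Because the data were generated with a known slope, students can check how close the estimate gets to it, and see what happens when they change the sample size or the noise.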

R is Just a Trend

R has been around for over 29 years. It was heavily inspired by the S language, which first appeared over 47 years ago. Even if using R for data analysis in psychology turns out to be a trend, the idea that you can use computer programming to do math, statistics, and data analysis doesn’t seem to be going anywhere. Students who learn R, or any other programming language for that matter, will have a much easier time learning a new programming language. However, students who do not know any programming language will have to learn programming from scratch, perhaps in more stressful environments than their university.

It’s Hard Finding a Good Textbook for R

I believe this might have been true in the past, but it hardly seems the case now. Moreover, I find that there are a lot of free (as in free beer, and sometimes as in freedom) resources on R, including entire books. Here’s the short list of my favorites:

Any book by PsyTeachR, particularly Data Models.
Data Analysis: A Model Comparison Approach to Regression, ANOVA, and Beyond (it’s not about R but has examples in R).
R for Data Science (more for data wrangling and graphics, less for models and tests).

I’ve also found these very cool-looking books, that teach fundamental concepts of philosophy of science and research methods using R (the first also uses STATA and Python), but I haven’t explored them in depth:

Teaching R Takes Time Away from Teaching Stats

I completely agree. However, that is time you spend teaching your students one of the most in-demand skills for today’s world and for the future—programming. Still, for some courses and scenarios that trade-off may not be worth making so I’ll propose an alternative.

Arguments for Not Teaching any Software

To be clear, I’m not suggesting teaching stats to psych students with only pen, paper, and a calculator. I’m proposing using live simulations in the classroom. Shout out to Armando Machado for making a similar suggestion a few years ago at the APPE annual conference. I believe at the time Armando used a spreadsheet to make his demonstration. My proposal is to use more visually appealing interfaces and renderings of the simulation. Take a look at the many Shiny apps out there, for example the apps from the QHELP project. You don’t even have to rely on R/Shiny apps; you can use JASP, or any other application/website/etc. that showcases the concepts you want to teach. Let your students get an intuitive feel for what data look like under the null hypothesis, under their hypothesis, or under competing hypotheses. Show them just how easy or how hard it is to get false negatives and false positives under different conditions. Let them see, graphically, what it looks like when data do not meet model assumptions. Make them think about what it means for their model, for their hypothesis test, and for the conclusions they can and can’t take away from it. Read scientific papers with them and break down the statistical sections: what is the rationale behind the analysis, what statistical hypothesis did each test actually test, what are reasonable explanations for that statistical finding, etc… Guiding students through this process, actually teaching critical thinking about data analysis, seems like a more appropriate goal for an introductory course on statistics than teaching what buttons to click in SPSS.
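
The engine behind this kind of simulation can be sketched in a few lines of R. This is a minimal sketch, not a polished teaching app: the function name, sample sizes, and effect size are all made up for illustration, and a real classroom version would wrap something like this in a visual interface.

```r
# Simulate many two-group experiments and return the share that come out "significant"
# (hypothetical helper; names and defaults are illustrative assumptions)
simulate_rejection_rate <- function(n_per_group, true_difference,
                                    n_sims = 2000, alpha = 0.05) {
  p_values <- replicate(n_sims, {
    control   <- rnorm(n_per_group, mean = 0, sd = 1)
    treatment <- rnorm(n_per_group, mean = true_difference, sd = 1)
    t.test(treatment, control)$p.value
  })
  mean(p_values < alpha)
}

set.seed(123)

# Null hypothesis is true: every "significant" result is a false positive
false_positive_rate <- simulate_rejection_rate(n_per_group = 30, true_difference = 0)

# A real difference of half a standard deviation: every miss is a false negative
power <- simulate_rejection_rate(n_per_group = 30, true_difference = 0.5)
false_negative_rate <- 1 - power

false_positive_rate
false_negative_rate
```

Changing n_per_group or true_difference and rerunning lets students see for themselves how sample size and effect size change the odds of each kind of error.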

Regardless of what software you teach, JASP, Jamovi, R, Python, SAS, etc., it will always be something students have to learn in addition to statistical theory and data analysis practice. In my experience, students end up mixing up the difficulties they have learning the tools with the challenges they face learning actual statistics. The mix-up works both ways. Sometimes they think they have a problem with the tool, when their problem is mastering the theory. Other times they feel like they are having a hard time understanding how to model the data, when they are just struggling with how to compute that model in a given piece of software. If you decide to not teach any specific software in introductory stats courses, and concentrate on the theory, practice, and critical thinking, those mix-ups won’t happen.

Now, this might be when some people say: Well, if we don’t teach students any statistical software, how will they be able to perform the statistical analyses for their studies? The thing is that I’m only proposing this for entry-level statistics courses. In my experience, students aren’t really expected to perform statistical analyses outside of statistics courses until they work on their master’s thesis. They may have assignments during their bachelor’s where they collect data and report the results, but I’ve never seen them be deducted points for not performing statistical hypothesis tests. In Europe, where university degrees are typically more specialized, the curricula for degrees in psychology tend to feature more than one statistics course. For instance, I had two courses on statistics, one in each semester of my freshman year. I do think the courses should be more spaced out in time, and there should be more of them, but that’s beside the point here. The point here is that you can avoid teaching any software in the first course, and choose something besides SPSS for the second.

If you want students to leave your course with some programming skills, you can teach them R, Python, Julia, or some other language. If you just want to show them an easy tool to perform their analyses, teach them JASP or Jamovi. With the money that you save on expensive licenses you might even be able to afford hiring someone to teach your doctoral students, and/or your faculty, more advanced tools and workflows, like automatically generating reports with R+RMarkdown, or with Python+Jupyter, etc…

Thank you

Thank you so much for reading!

If you would like to give some feedback please open an issue on this blog’s GitLab.