Date: 2023-03-24
I make the case for why universities should not pay for bulk SPSS
licenses for students, nor build statistics curricula around SPSS. My
main points boil down to this: doing so amounts to unpaid advertising
and creates vendor lock-in, when there are plenty of better
alternatives. Contrary to popular expectations, I will not argue for
replacing SPSS with R in all courses. Instead, I’ll even suggest relying
solely on interactive simulations and other teaching aids, and teaching
no software at all in introductory stats courses. Ok… Those simulations
can be built as R/Shiny apps, but they don’t have to be…
As the title of this post suggests, the main point I wish to get across
is that there’s little reason for universities to continue to buy bulk
licenses for SPSS. To be clear, I see how institutions whose researchers
feel more productive in SPSS might have a reason to buy a few licenses
for them, but not a bulk license for all students.
I’d just like to take a second to clarify that I have nothing against
IBM, nor do I have anything against people making a profit out of
running an IT business. I also don’t have a problem with public
institutions buying services from private entities. Note that
I actually like IBM and appreciate all they have done for the IT
industry. IBM currently owns Red Hat, one of the biggest businesses
built on building, selling, and managing Linux distributions. If you
know me, you know I have a very strong preference for free and
open source software (FOSS), and I do believe science should be built on
FOSS, not on proprietary software. However, my arguments for not basing
curricula around SPSS would hold even if SPSS became FOSS.
The first thing I think that universities forget is that by teaching a given software they are advertising it. They are implicitly conferring their more or less prestigious seal of approval. Students may then perceive the software as being an industry standard or of academic rigor.
Teaching a particular program, and only that particular program, at an educational institution can do more than advertise that program. It can lead students to be vendor-locked to that particular application. A student who only learned one piece of software at their university is less likely to go through the trouble of learning a new one. Students are also more likely to look for employment opportunities that specify knowledge of that program as a requirement, and more likely to shy away from employers asking for expertise in a competing product. This can lead either to students spending time (which is scarcer when they’re on the job hunt) learning new software, or to employers having to change their software stack to accommodate the preferences of the talent pool. I reckon that if Windows weren’t taught in elementary and high school it wouldn’t have the monopoly it has today… This is all to say that universities do more than advertise the software they use: they train new generations of professionals on it, and can change market preferences. In turn, this means companies have a tremendous vested interest in getting their software taught at universities. That’s why they offer it at discount prices, or even for free, for student use.
I believe we’re at a point where the value proposition of SPSS being
taught in introductory courses is so strongly tipped in favor of IBM’s
interests, versus those of students or universities, that universities
should be getting paid by IBM to teach it, not the other way around.
Whatever discount IBM is giving your university, it is not enough. Your
university shouldn’t be spending a dime on student licenses; it should
be making top dollar for continuing to stick with SPSS when there are so
many alternatives. Obviously, I’m speaking in an economic sense. If the
goal is teaching excellence and arming students with the tools for the
future, then SPSS should not be taught at all, or should be taught in
conjunction with its alternatives.
Speaking of alternatives… Not only are there many alternatives to SPSS,
there are free and open source (FOSS) alternatives. As we will see,
these alternatives are not only “free as in free beer”, they are also
“free as in free speech”.
FOSS software can be, and frequently is, free of cost for the user. It
can be provided for free by the developers to anyone who wishes to use
it under the terms of a FOSS license. When there are programs such as
JASP or Jamovi that are completely free of cost, I find it hard to see
the value proposition in SPSS. They sometimes offer even more features
than SPSS, and are (arguably) easier to use and more visually appealing.
Note that, with SPSS being a paid application, it should not only be
better than the free alternatives, it should be sufficiently better to
be worth the cost. I appreciate that may have been true once, but I find
it hard to see how that is still the case.
As Stallman explains, FOSS software need not be “free as in free beer”,
but it needs to be “free as in freedom”. More specifically, it must give
users the freedom to inspect a program’s source code, make modifications
to the program, and share those modifications with others. This means
that, unlike proprietary software, FOSS software can be easily audited
without breaking any laws. More importantly, anyone is free to improve a
FOSS application and share the improved version. Most FOSS projects are
also open to contributions, meaning you can share your improvements with
the original authors, who can then integrate the improvements into the
official project.
Universities employ some of the most (academically) qualified people.
This means many of them could probably improve the FOSS applications
they rely on for teaching, and share those improvements. Thus, if
universities replace SPSS with a FOSS alternative, they may be able to
improve those alternatives, making them even better and more suitable
for the use-cases at those institutions.
I reckon it’s worth taking the time to address some of the arguments
I’ve heard in favor of SPSS. Feel free to disagree with my
counter-points. Also, feel free to open an issue on this blog’s
GitLab with your feedback. Please note that I’m addressing the arguments
from the point of view of what I already proposed in the previous
section: that paying for bulk SPSS licenses for students is hardly
justified today. Hence, my responses are to be taken as counter-points
to paying for bulk licenses and teaching only SPSS while ignoring its
alternatives.
This is among the many points I believe could have been true in the
past, but no longer hold today. From my experience, SPSS takes about
one or two semesters to teach and learn. Claiming that software which
takes one or two semesters at a higher education institution to learn is
user-friendly seems questionable at best. Imagine if someone said: “You
should install app X on your phone/computer! It’s so easy to use, you
just have to enroll in a course at my university to learn how to use
it!”.
You might be tempted to reply: “And doesn’t R take as long, if not
longer?”. Yet, you would be mistaken in thinking I’m claiming otherwise,
or in thinking that R is the only alternative to SPSS. Have you taken a
look at JASP or Jamovi? Aren’t they easier than, or as easy as, SPSS?
With those being free options (thus infinitely cheaper than SPSS), how
much easier would SPSS have to be to justify the cost of the bulk
licenses?
Another point that might have been true in the past but no longer holds.
Cutting-edge statistical models, tests, and tools are published as R
packages, or maybe in other programming languages; they are not built as
SPSS modules first. Running more complex models, like linear mixed
models or structural equations, is often impossible in SPSS and
relatively straightforward in R. You might say there’s also AMOS, but
now we’re talking about teaching yet another paid software to complement
SPSS. This increases the cost to institutions, a cost you must compare
with 0 (i.e., the cost of the free alternatives) and justify.
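To give a feel for what “relatively straightforward” can look like, here is a minimal sketch of a linear mixed model in R. It is only an illustration under stated assumptions: it uses the nlme package (normally bundled with standard R installations) and its built-in Orthodont example dataset, neither of which is mentioned elsewhere in this post. Other packages, such as lme4, are common choices too.

```r
# Assumption: the nlme package is available (it usually ships with R).
library(nlme)

# Orthodont is an example dataset included in nlme: a jaw measurement
# ("distance") taken at several ages for each child ("Subject").
# Fixed effect: age. Random effect: a separate intercept per child.
mixed_model <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# Full model output
summary(mixed_model)

# Just the fixed-effect estimates (intercept and slope for age)
fixef(mixed_model)
```

The model formula follows the same pattern as an ordinary regression, with one extra argument describing the grouping structure.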
I see more and more researchers using R or some other software. Even
in the past, I’ve seen researchers in my department using STATISTICA,
not SPSS. Moreover, if some researchers feel more productive in SPSS,
institutions can always buy a license just for them; they don’t need to
buy a bulk license for all students.
This may actually be the case if we just count the number of published
textbooks, I don’t know… Yet, it may not even be true if you look at
the number of books published in the last five years. Regardless,
I don’t think that’s the point. No course built on SPSS lists every
textbook about SPSS in existence in its references. Thus, the question
is not whether there are more resources on SPSS than on X. The question
is whether you can find one or two textbooks for X of equal or greater
quality than the ones you had for SPSS. Today, I believe this is the
case for most software that you could insert in X’s place.
Is it though? How many people teaching statistics don’t know R, or
couldn’t learn it if you gave them the time? Is there really no one at
your institution who can teach R? Would it be so terrible if you hired
someone to teach R, even if only to your staff?
As I’ve said, I’m not even making the case for always replacing SPSS
with R. In some introductory courses, I believe the best bet is to not
teach statistical software at all. Still, I would like to take the
time to address some of the criticism I’ve heard about R.
Let’s take a look at how you compute a linear regression in R.
    # Linear regression on R
    regression <- lm(dependent_variable_name ~ independent_variable_name, dataset_variable_name)
    # To see the output
    summary(regression)
Let’s say you want to model the impact of people’s grade at the end of
high school (GPA in the US, many other things in Europe) on their
current income. Assuming you have participants’ grades in a column named
“grade”, their income in a column named “income”, and you named your
dataset “dataset” (all reasonable names for those variables, I’d say),
you’d compute the linear regression with the following R code:
    regression <- lm(income ~ grade, dataset)
    summary(regression)
I know I’m biased, but I think those look like two very human-readable
lines of code, even for someone not trained in R. How many menus and
submenus would you have to click through to do that in SPSS? You could
say that to write that R code you need to know a lot about
programming. I don’t think that’s entirely true. Technically, to write
that code you just need to know how to create a variable and how to use
the lm() and summary() functions. You do have to learn how to do
that, but I never said R was a self-evident truth, I just said R
isn’t that hard. To do a linear regression in SPSS you also have to
know what menus to click. You might say that SPSS has a visual
interface, that the buttons/menus have labels that you can read and
understand, but if you think about it, you need some experience
to know what submenus lie beneath each menu. You also have to be taught
that.
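If you want to try the regression yourself without having real data at hand, the sketch below builds a made-up dataset first. Everything about the data is an assumption for illustration only: the sample size, the 10 to 20 grading scale, and the income figures are invented, and the simulated relationship says nothing about real grades or incomes. Only the lm() and summary() calls mirror the example in the text.

```r
# Simulated data for illustration only; all numbers are invented.
set.seed(123)  # makes the random numbers reproducible
n <- 200

# Hypothetical final grades on a 10-20 scale
grade <- round(runif(n, min = 10, max = 20), 1)

# Hypothetical income that rises with grade, plus random noise
income <- 800 + 60 * grade + rnorm(n, mean = 0, sd = 150)

dataset <- data.frame(grade = grade, income = income)

# The same two lines discussed in the text
regression <- lm(income ~ grade, dataset)
summary(regression)

# Only the estimated intercept and slope
coef(regression)
```

Because the data are simulated with a known slope, students can compare the estimate in the output with the value that generated the data.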
R has been around for over 29 years. It was heavily inspired by the S
language, which first appeared over 47 years ago. Even if using R for
data analysis in psychology were to fall out of fashion, the idea that
you can use computer programming to do math, statistics, and analyze
data doesn’t seem to be going anywhere. Students who learn R, or any
other programming language for that matter, will have a much easier time
learning a new programming language. However, students who do not know
any programming language will have to learn programming from scratch,
perhaps in more stressful environments than their university.
I believe this might have been true in the past, but it hardly seems the
case now. Moreover, I find that there are a lot of free (as in free
beer, and sometimes as in freedom) resources on R, including entire
books. Here’s the short list of my favorites:
Any book by PsyTeachR, particularly Data Models.
Data Analysis: A Model Comparison Approach to Regression, ANOVA, and Beyond (it’s not about R but has examples in R).
R for Data Science (more for data wrangling and graphics, less for models and tests).
I’ve also found these very cool-looking books that teach fundamental
concepts of philosophy of science and research methods using R (the
first also uses STATA and Python), but I haven’t explored them in
depth:
I completely agree. However, that is time you spend teaching your students one of the most in-demand skills for today’s world and for the future—programming. Still, for some courses and scenarios that trade-off may not be worth making so I’ll propose an alternative.
To be clear, I’m not suggesting teaching stats to psych students with
only pen, paper, and a calculator. I’m proposing using live simulations
in the classroom. Shout out to Armando Machado for making a similar
suggestion a few years ago at the APPE annual conference. I believe at
the time Armando used a spreadsheet to make his demonstration. My
proposal is for using more visually appealing interfaces and renderings
of the simulation. Take a look at the many Shiny apps out there, for
example the apps from the QHELP project. You don’t even have to rely on
R/Shiny apps; you can use JASP, or any other application, website, etc.
that showcases the concepts you want to teach. Let your students get an
intuitive feel for what data look like under the null hypothesis, under
their hypothesis, or under competing hypotheses. Show them just how easy
or how hard it is to get false negatives and false positives under
different conditions. Let them see, graphically, what it looks like when
data do not meet model assumptions. Make them think about what it means
for their model, for their hypothesis test, and for the conclusions they
can and can’t take away from it. Read scientific papers with them and
break down the statistical sections: what is the rationale behind the
analysis, what statistical hypothesis did each test actually test, what
are reasonable explanations for that statistical finding, etc… Guiding
students in this process, actually teaching critical thinking about data
analysis, seems like a more appropriate goal for an introductory course
on statistics than teaching what buttons to click in SPSS.
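As a rough idea of what such a classroom simulation could look like, here is a minimal sketch in base R (no Shiny needed). It is my own illustration, not a reconstruction of Armando’s demonstration or of any QHELP app. The helper name simulate_p_value, the group size of 20, the effect of half a standard deviation, and the 0.05 threshold are all assumptions chosen for the example.

```r
set.seed(42)  # makes the simulation reproducible

# Hypothetical helper: simulate two groups, run a t-test, return the p-value.
simulate_p_value <- function(n_per_group, true_difference) {
  control   <- rnorm(n_per_group, mean = 0, sd = 1)
  treatment <- rnorm(n_per_group, mean = true_difference, sd = 1)
  t.test(treatment, control)$p.value
}

n_simulations <- 2000

# Scenario 1: the null hypothesis is true (no difference between groups).
p_null <- replicate(n_simulations,
                    simulate_p_value(n_per_group = 20, true_difference = 0))

# Scenario 2: there is a true difference of half a standard deviation.
p_effect <- replicate(n_simulations,
                      simulate_p_value(n_per_group = 20, true_difference = 0.5))

# Share of "significant" results when nothing is going on (false positives)
false_positive_rate <- mean(p_null < 0.05)

# Share of non-significant results when there is an effect (false negatives)
false_negative_rate <- mean(p_effect >= 0.05)

# Side-by-side histograms of the p-values in each scenario
par(mfrow = c(1, 2))
hist(p_null, breaks = 20, main = "No true effect", xlab = "p-value")
hist(p_effect, breaks = 20, main = "True effect", xlab = "p-value")
```

Changing n_per_group or true_difference and rerunning lets students see for themselves how sample size and effect size change how often a test misses a real effect.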
Regardless of what software you teach, JASP, Jamovi, R, Python, SAS,
etc., it will always be something students have to learn in addition to
statistical theory and data analysis practice. From my experience,
students end up mixing the difficulties they have learning the tools
with the challenges they face learning actual statistics. The mix-up
works both ways. Sometimes they think they have a problem with the tool,
when their problem is mastering the theory. Other times they feel like
they are having a hard time understanding how to model the data, when
they are just struggling with how to compute that model in a given
software. If you decide to not teach any specific software in
introductory stats courses, and concentrate on the theory, practice, and
critical thinking, those mix-ups won’t happen. Now, this might be when
some people say: “Well, if we don’t teach students any statistical
software, how will they be able to perform the statistical analysis for
their studies?” The thing is that I’m only proposing this for
entry-level statistics courses. From my experience, students aren’t
really expected to perform statistical analyses outside of statistics
courses until they work on their master’s thesis. They may have
assignments during their bachelor’s where they collect data and report
the results, but I’ve never seen them be deducted points for not
performing statistical hypothesis tests. In Europe, where university
degrees are typically more specialized, the curricula for degrees in
psychology tend to feature more than one statistics course. For
instance, I had two courses on statistics, one in each semester of my
freshman year. I do think the courses should be more spaced in time, and
there should be more of them, but that’s beside the point here. The
point here is that you can avoid teaching any software in the first
course, and choose something besides SPSS for the second. If you want
students to leave your course with some programming skills, you can
teach them R, Python, Julia, or some other language. If you just want to
show them an easy tool to perform their analysis, teach them JASP or
Jamovi. With the money that you save on expensive licenses you might
even be able to afford hiring someone to teach your doctoral students,
and/or your faculty, more advanced tools and workflows, like
automatically generating reports with R+RMarkdown, or with
Python+Jupyter, etc…
Thank you so much for reading!
If you would like to give some feedback please open an issue on this blog’s GitLab.