Year: 2017

This is Gabriel’s rant about R from the first episode.

I’m not going to talk about things that are necessary features of the language. I get that the flexibility of an object-oriented language that can handle relational data requires a certain learning curve. But it doesn’t have to be as painful as R. I may be a Stata user, so I come with the expectation that there’s just a single flat-file database in memory, but I also know Python, so I know damn well an object-oriented language doesn’t have to be this infuriating.

  • A tutorial culture that doesn’t represent a realistic workflow. Every R book or MOOC I have ever read or taken spends the first few hours on stuff like vectors, matrices, and data frames before showing you basic workflow things like loading a file from disk and getting summary statistics for the variables. If R’s tutorial culture were Rick James, it would be stomping its muddy boots on your couch shouting “fuck your learning curve.”
  • A documentation culture that doesn’t represent a realistic workflow. The examples in manual entries are more likely to start with a bunch of colons and commas than read.table().
  • A documentation culture that doesn’t show output, just a script. If I could get it to work, I wouldn’t be reading the documentation. Knowing what I should be looking for would be really helpful.
  • Output, we don’t need no output. We don’t have to show you any stinking output.
  • The default arguments are ridiculous. Every time I loop over a bunch of filepaths or URLs, I cry to the heavens in anguish that paste() defaults to inserting a space between every piece (sep = " ") unless you explicitly tell it not to.
  • No consistent naming convention: camelCase vs. _ vs. .
  • The default functions are awful and everyone knows it, which is why there’s a library to replace every function in base R with one that runs faster and has a simpler syntax. And people are shocked that you would run a “for” loop, like in any other language, instead of vectorizing.
  • RStudio encourages you to save session memory, which in turn encourages really bad habits for reproducibility.
  • CRAN is such a pain to deal with that library developers post their code to GitHub, which is a pain for users to install, especially if there are dependencies. Keeping your R libraries working is like using Linux in 1995. I got so annoyed with getting the update on an R library to work that I found it easier to just learn Python.
  • Even when you call output, the default output is ridiculous. When I look at a model fit object I wonder, where the fuck are the standard errors? I know this is why you do the summary of the lm object, not the lm object itself, but the fact that it takes an extra step to get meaningful output is another example of bad default assumptions that are gratuitously hostile to the user.
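
Two of the defaults complained about above are easy to see in a few lines of R. This is only a minimal sketch: the directory and file names are made up for illustration, and mtcars is the example dataset bundled with R. It shows paste() putting a space between its arguments unless sep is overridden, and the gap between printing an lm object and printing the coefficient table from its summary.

```r
# paste() separates its arguments with a space unless sep is set;
# paste0() is the same call with sep = "".
base_dir  <- "data"         # hypothetical directory name
file_name <- "survey.csv"   # hypothetical file name

spaced  <- paste(base_dir, "/", file_name)            # spaces creep into the path
joined  <- paste(base_dir, "/", file_name, sep = "")  # no spaces
joined0 <- paste0(base_dir, "/", file_name)           # shorthand for sep = ""
path    <- file.path(base_dir, file_name)             # builds the path with "/"

print(spaced)
print(joined)

# Printing an lm object shows only the call and the coefficients.
fit <- lm(mpg ~ wt, data = mtcars)
print(fit)

# The standard errors live in the summary; coef() extracts its
# coefficient table (estimate, std. error, t value, p value).
coef_table <- coef(summary(fit))
print(coef_table)
```

paste0() and file.path() are the usual ways around the first problem; for the second, coef(summary(fit)) pulls out just the table that contains the standard errors.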