In this workshop we’ll use R in the extremely useful RStudio software. For the most part we’ll work interactively, meaning we’ll type commands straight into the R console in RStudio (usually a window on the left or lower left) and get our results there too (usually in the console or in a window on the right).
Panels like the ones below mimic the interaction with R: they first show the thing to type into R and, below it, the result R calculates.
Let’s look at how R works by using it for its most basic job - as a calculator:
3 + 5
[1] 8
12 * 2
[1] 24
1 / 3
[1] 0.3333333
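A few more calculator-style expressions may be worth trying. This is only a small sketch: the `^` operator and the use of brackets for grouping are not shown in the panels above, but they behave as in ordinary arithmetic.

```r
# Multiplication happens before addition, as in ordinary arithmetic.
3 + 5 * 2

# Brackets change the order: the addition now happens first.
(3 + 5) * 2

# ^ raises a number to a power.
2 ^ 10
```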
Fairly straightforward: we type in the expression and we get a result. That’s how this whole book will work; you type the stuff in and get answers out. It’ll be easiest to learn if you go ahead and type in the examples one by one. Try to resist the urge to use copy and paste. Typing longhand really encourages you to look at what you’re entering.
As far as the R output itself goes, it’s really straightforward - it’s just the answer with a [1] stuck on the front. This [1] is an index: it tells us the position in the output of the first item on that line. Often R will return long lists of numbers and it can be helpful to have this extra information.
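To watch the index marker change, print something longer than one line. A small sketch, assuming the `:` operator (not introduced above) to build a run of whole numbers. Where the lines break depends on how wide your console is, so no output is shown here.

```r
# 101:130 builds the whole numbers from 101 to 130, which is 30 items.
# When printed, each output line starts with the position of its first item.
101:130
```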
We can save the output of operations for later use by giving it a name using the assignment symbol <-. Read this symbol as ‘gets’, so x <- 5 reads as ‘x gets 5’. These names are called variables, because the value they are associated with can change.
Let’s give five a name, x, then refer to the value 5 by its name. We can then use the name in place of the value. In the jargon of computing we say we are assigning a value to a variable.
x <- 5
x
[1] 5
x * 2
[1] 10
y <- 3
x * y
[1] 15
This is of course of limited value with single numbers but is of great value when we have large datasets, as the whole thing can be referred to by the variable.
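As a rough illustration of a name standing for more than one number, the sketch below uses c() to combine several values into one object. The name `weights` and the numbers themselves are made up for the example.

```r
# c() combines values into a single object; these numbers are invented.
weights <- c(12.1, 15.3, 9.8, 11.4)

# The whole set of values is used through its name.
weights * 2

# Assigning again replaces the old value, which is why these names
# are called variables.
weights <- c(20, 30)
weights
```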
At the top level, R is a simple language with two types of thing: functions and objects. As a user you will use functions to do stuff, and get back objects as an answer. Functions are easy to spot: they are a name followed by a pair of brackets. For example, mean() is the function for calculating a mean and sqrt() is the function for calculating a square root. The options (or arguments) for the function go inside the brackets:
sqrt(16)
[1] 4
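Functions can take more than one argument, and arguments can be given by name. A minimal sketch using round(), a built-in function not mentioned above, with its `digits` argument:

```r
# One argument, given by position.
sqrt(16)

# Two arguments: the number to round, then digits given by name.
rounded <- round(3.14159, digits = 2)
rounded
```

Naming an argument makes it clear which option you are setting, which helps once functions have many of them.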
Often the result from a function will be more complicated than a single number. Often it will be a vector (a simple list of values), like the result of the rnorm() function, which returns random numbers drawn from a normal distribution:
rnorm(100)
[1] 1.47256024 1.29169511 0.99957669 -1.66669109 0.79719193 0.26782022
[7] 2.04272994 -0.66643966 0.06911632 1.04225305 0.40957072 0.64616185
[13] -0.71041108 -2.35416697 0.55662415 0.05560853 -0.62985993 -1.01870056
[19] -0.11297347 0.04062896 0.30076103 0.09424395 0.54194803 -0.54649898
[25] 0.21811435 -0.81209495 0.86208899 -1.73925410 0.77490316 0.76434979
[31] 0.36255158 -0.51893828 0.44959494 -0.32742936 -0.95607301 0.16432137
[37] 1.96002830 -1.39588296 0.50174841 -0.80843610 -0.45222729 1.45419624
[43] 1.95726099 0.97958970 1.08837813 -0.61544725 -0.57110233 -1.29231021
[49] 0.16928352 1.21372021 -0.93426950 -0.66557444 0.37423921 0.36012935
[55] -0.92146694 -0.62514742 -0.02364334 1.57371632 -1.73492403 0.32237212
[61] 0.85176996 -0.93129223 -0.64389288 0.27515111 -0.45861833 -0.67906038
[67] -0.14333792 -0.19566556 0.43172204 -0.17091497 0.54435125 0.30873434
[73] 2.22696549 -0.59101557 0.27137745 -1.16324901 -1.35604531 0.92102623
[79] 1.04220231 3.25626997 0.13889997 0.80740752 -0.96502148 -1.28569584
[85] 1.28231013 0.19344992 -0.20726918 -0.32611486 -0.96013222 0.95912294
[91] 0.44894869 0.43097114 -0.58019851 0.94117703 0.54648849 -1.65314011
[97] 0.45281453 -0.85377981 -2.14442343 0.12486076
We can combine objects, variables and functions to do more complex stuff in R. Here’s how we get the mean of 100 random numbers.
numbers <- rnorm(100)
mean(numbers)
[1] -0.07812037
Here we created a vector object with rnorm(100) and assigned it to the variable numbers. We then used the mean() function, passing it the variable numbers. The mean() function returned the mean of the hundred random numbers.
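Because rnorm() draws fresh random numbers every time, the mean you get will differ from the one shown above. If you want the same numbers on every run, one option is to fix the random seed first with set.seed(), a function not covered above. The seed value 42 and the names `first_mean` and `second_mean` are arbitrary choices for this sketch.

```r
# Fix the seed, draw 100 numbers and take their mean.
set.seed(42)
numbers <- rnorm(100)
first_mean <- mean(numbers)

# Resetting the same seed gives the same 100 numbers again.
set.seed(42)
numbers <- rnorm(100)
second_mean <- mean(numbers)

# TRUE when both runs produced the same mean.
first_mean == second_mean
```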
One of the more common objects that R uses is a dataframe. The dataframe is a rectangular table-like object that contains data; think of it like a spreadsheet tab. Like the spreadsheet, the dataframe has rows and columns, the columns have names, and different columns can hold different types of data. Here’s a little one:
names age score
1 Guido 24 97.69001
2 Marty 45 75.11871
3 Alan 11 48.89457
Usually we get a dataframe by loading in data from an external source or as a result from functions. Occasionally we’ll want to make one by hand, which can be done with various functions, data.frame() being the most common.
data.frame(
  names = c("Guido", "Marty", "Alan"),
  age = c(24, 45, 11),
  score = runif(3) * 100
)
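Once a dataframe has a name, its columns can be pulled out and passed to functions. The sketch below assigns the same dataframe to a variable called `people` (a name chosen here for illustration) and uses `$`, nrow() and str(), none of which are introduced above. The score column comes from runif(), so it will be different on every run.

```r
# Build the dataframe and give it a name so we can refer to it later.
people <- data.frame(
  names = c("Guido", "Marty", "Alan"),
  age = c(24, 45, 11),
  score = runif(3) * 100  # three random numbers between 0 and 100
)

# How many rows does it have?
nrow(people)

# $ pulls out a single column as a vector.
people$age

# A column can be passed to a function like any other vector.
mean(people$age)

# str() gives a compact summary of the columns and their types.
str(people)
```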
Many of the tools we use will come in R packages, little nuggets of code that group related functions together. Installing new packages can be done using the Packages pane of RStudio or the install.packages() function. When we wish to use that code we load the package with the library() function:
library(somepackage)
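Here is how the two steps might look together. This is a sketch only: "somepackage" is a placeholder rather than a real package, so the install line is commented out. The tools package is used for the loading step because it ships with R and needs no installation. Its file_ext() function is just a convenient thing to call.

```r
# Install once per computer (needs an internet connection).
# "somepackage" is a placeholder name, so this line is left commented out.
# install.packages("somepackage")

# Load a package in every session where you want to use its functions.
library(tools)

# Functions from the package are now available by name.
extension <- file_ext("results.csv")
extension
```

Installing is a one-off job, while library() has to be run again each time you start a new R session.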
R provides a command, called ?, that will display the documentation for functions. For example ?mean will display the help for the mean() function.
?mean
As in all programming languages, the internal documentation in R is written with some assumption that the reader is familiar with the language. This can be a pain when you are starting out, as the help will seem a bit obscure at times. Don’t worry about this: usually the Examples section will give you a good idea of how to use the function, and as your experience grows more things will make sense.
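Two related helpers may be useful alongside ?. Neither is mentioned above, so treat this as a sketch: example() runs the code from a help page's Examples section, and args() shows the arguments a function accepts.

```r
# Run the code from the Examples section of the help page for mean().
example(mean)

# Show the arguments a function accepts without opening its help page.
rnorm_args <- args(rnorm)
rnorm_args
```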