Appendix B — R Fundamentals

B.1 About this chapter

  1. Questions:
  • How do I use R?
  2. Objectives:
  • Become familiar with R syntax
  • Understand the concepts of objects and assignment
  • Get exposed to a few functions
  3. Keypoints:
  • R’s capabilities are provided by functions
  • R users call functions and get results

B.2 Working with R

In this book we’ll use R interactively — for the most part we’ll type code and get our results straight back.

Panels like the ones below mimic an R session: each shows the code to type into R, followed by the result that R calculates.

Let’s look at how R works by using it for its most basic job, as a calculator:

 3 + 5
[1] 8
 12 * 2
[1] 24
 1 / 3
[1] 0.3333333

Fairly straightforward: we type in an expression and we get a result. That’s how this whole book will work: you type the code in and get answers out. It’ll be easiest to learn if you work through the examples one by one, typing them in yourself. Try to resist the urge to use copy and paste; typing longhand really encourages you to look at what you’re entering.

The R output itself is just as straightforward: it’s the answer with a [1] stuck on the front. The number in square brackets is the position, within the whole output, of the first item on that line. Often R will return long vectors of numbers that wrap over several lines, and this extra information makes it easier to keep track of where you are.
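To see why the bracketed index is useful, here is a small sketch that prints a vector long enough to wrap onto several lines. The seq() call and the variable name tens are illustrative choices only. How many values fit on each line depends on the width of your console, so your line breaks may differ.

```r
# Thirty values: 10, 20, ..., 300
tens <- seq(10, 300, by = 10)

# Printing a long vector wraps it over several lines; each line starts
# with the position (index) of its first value in square brackets
tens

# length() confirms how many items there are in total
length(tens)
```

If the second line of output starts with, say, [14], then the first value on that line is the 14th item in the vector.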

B.3 Variables

We can save the output of operations for later use by giving it a name using the assignment symbol <-. Read this symbol as ‘gets’, so x <- 5 reads as ‘x gets 5’. These names are called variables, because the value they are associated with can change.

Let’s give the value 5 a name, x, then refer to the value by its name. We can then use the name in place of the value. In the jargon of computing, we say we are assigning a value to a variable.

 x <- 5
 x
[1] 5
 x * 2
[1] 10
y <- 3
x * y
[1] 15

This is of course of limited value with just numbers but is of great value when we have large datasets, as the whole thing can be referred to by the variable.
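Because a variable is only a name attached to a value, assigning to the same name again replaces the old value. The short sketch below illustrates this; the name total is an arbitrary choice for the example.

```r
x <- 5       # x gets 5
x <- x + 1   # x gets the old value of x plus 1, so x is now 6
x

total <- x * 10  # store a result under a new name for later use
total
```

Typing a variable's name on its own, as with x and total here, prints its current value.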

B.3.1 Using objects and functions

At the top level, R is a simple language with two types of thing: functions and objects. As a user you will use functions to do things and get back objects as answers. Functions are easy to spot: they are a name followed by a pair of brackets. For example, mean() is the function for calculating a mean and sqrt() is the function for calculating a square root. The options (or arguments) for the function go inside the brackets:

sqrt(16)
[1] 4
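Functions can take more than one argument, and arguments can be given by name. The sketch below uses round(), a base R function not mentioned elsewhere in this chapter, purely as an illustration.

```r
# One argument: the number to take the square root of
sqrt(25)

# Two arguments, separated by a comma; the second is given by name
round(3.14159, digits = 2)

# A function call can sit inside another call: the inner result
# becomes the argument of the outer function
sqrt(round(15.9))
```

Naming arguments, as with digits = 2, makes the code easier to read and means the order in which they are written matters less.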

Often the result from a function will be more complicated than a single number. Often it will be a vector (a simple list of values), like the result of the rnorm() function, which returns a vector of normally distributed random numbers:

rnorm(100)
  [1]  0.13725117  1.15053953 -1.33514972 -0.41653701  2.06054035 -0.64588954
  [7] -1.55580753 -0.34325024 -1.06532094 -0.85723154 -0.60491960  0.68708687
 [13]  1.16429981 -1.58280868 -0.06037991  1.04775118 -1.22182536  0.17955351
 [19]  1.96792870 -0.73514506 -0.09245996 -2.04115814  0.09115589 -2.06875953
 [25]  0.60749711 -1.19057064 -2.19101725 -0.64303166 -1.07811457 -0.92496780
 [31]  1.81465783 -0.94847780  0.16486733  0.73411231  0.01831619  0.18433989
 [37]  0.34624682  0.52086691 -1.17377913 -0.51081437  2.59981144  2.84956334
 [43] -1.61104386  0.54909245 -0.31273217 -0.81219821 -1.07918712  1.21725194
 [49] -1.37796637  0.80540159  0.04986691 -0.51480601  0.58306548 -0.32214528
 [55] -1.53681526  1.66393602 -0.03295588 -0.21601134 -0.04159231 -0.80956955
 [61]  0.10258682  0.86826709  0.82465292 -0.94333946  0.83528209 -1.56309582
 [67]  1.12888050 -0.54376884 -0.59515111  0.18579468 -1.94447273 -0.66489446
 [73]  1.43824821 -0.40627747  0.86534480  0.18928178  0.39606071 -1.30585674
 [79] -0.80631299 -0.29543123  0.61903555 -0.35615739  0.89861686  0.02386714
 [85]  0.84839066  0.51521997 -1.16629143 -1.44182224 -3.26291551 -0.08128962
 [91] -0.06038791  0.25437045 -0.72213962 -1.43201581 -0.36028975 -1.79616649
 [97]  0.81709007  0.64610256 -1.40352731 -0.20261978

We can combine objects, variables and functions to do more complex things in R. Here’s how we get the mean of 100 random numbers:

numbers <- rnorm(100)
mean(numbers)
[1] -0.02735934

Here we created a vector object with rnorm(100) and assigned it to the variable numbers. We then used the mean() function, passing it the variable numbers. The mean() function returned the mean of the hundred random numbers. Because the numbers are random, your result will differ from the one shown.
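If you want the same random numbers on every run, one option is to fix the starting point of the random number generator with set.seed(). This is a minimal sketch; the seed value 42 is an arbitrary choice, and sd() and length() are shown only as further examples of functions that accept a vector.

```r
# Fix the random number generator's starting point so that rnorm()
# produces the same numbers on every run (42 is an arbitrary choice)
set.seed(42)

numbers <- rnorm(100)

# Several functions can summarise the same vector
mean(numbers)    # the average; close to 0 for standard normal draws
sd(numbers)      # the standard deviation; close to 1
length(numbers)  # how many values the vector holds
```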

B.4 Dataframes

One of the more common objects that R uses is the dataframe. The dataframe is a rectangular, table-like object that contains data; think of it like a spreadsheet tab. Like a spreadsheet, the dataframe has rows and columns, the columns have names, and different columns can hold different types of data. Here’s a little one:

  names age    score
1 Guido  24 26.74395
2 Marty  45 24.45194
3  Alan  11 78.69251

Usually we get a dataframe by loading data from an external source or as the result of a function. Occasionally we’ll want to make one by hand, which can be done with various functions, data.frame() being the most common.

data.frame(
  names = c("Guido", "Marty", "Alan"),
  age = c(24,45,11),
  score = runif(3) * 100
)
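Once a dataframe is stored in a variable we can ask questions of it. The sketch below rebuilds the small table with fixed scores so the results are repeatable; the variable name people is an illustrative choice.

```r
# Same columns as the example above, but with fixed scores
people <- data.frame(
  names = c("Guido", "Marty", "Alan"),
  age = c(24, 45, 11),
  score = c(26.7, 24.5, 78.7)
)

nrow(people)       # number of rows
ncol(people)       # number of columns
people$age         # the $ symbol pulls out one column as a vector
mean(people$age)   # a column can be passed to functions like any vector
```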

B.5 Packages

Many of the tools we use come in R packages, little bundles of code that group related functions together. To use the code in a package we load it with the library() function:

library(somepackage)
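Since somepackage is only a placeholder name, here is a runnable sketch that uses tools, a package that ships with R. Packages that do not ship with R need a one-off install with install.packages() before library() can load them; that line is left commented out because there is nothing real to install. The file name results.csv is hypothetical and the file does not need to exist.

```r
# install.packages("somepackage")  # one-off download; placeholder name, so commented out

# Load a package that is installed alongside R itself
library(tools)

# Functions from the package can now be called by name
file_ext("results.csv")

# The package::function form calls a function without loading the whole package
tools::file_path_sans_ext("results.csv")
```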

B.6 Using R Help

R provides a command called ? that displays the documentation for a function. For example, ?mean will display the help for the mean() function.

?mean

As in all programming languages, the internal documentation in R is written on the assumption that the reader is already familiar with the language. This can be a pain when you are starting out, as the help will seem a bit obscure at times. Don’t worry about this: the Examples section will usually give you a good idea of how to use the function, and as your experience grows more of it will make sense.
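When a full help page feels like too much, a lighter option is args(), which prints only a function's argument names and default values. The sketch below applies it to rnorm() and then uses the names it reports; the values 100 and 5 are arbitrary.

```r
# args() prints a function's argument names and default values,
# a quick alternative to reading the Usage section of its help page
args(rnorm)

# Those names can then be used when calling the function
rnorm(3, mean = 100, sd = 5)

# help() is the long form of ?; the line below would open the same page
# as ?rnorm (commented out because it opens a viewer rather than returning a value)
# help("rnorm")
```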

Roundup

  • R is an excellent and powerful statistical computing environment

For you to do

Work through the exercises below. Type your own answer to each one in R before checking it against the solution shown.

B.6.1 Variables

Write the R code required to add two plus two.

2 + 2

Now create two variables a and b, each containing the number 2, and add those.

a <- 2
b <- 2
a + b

B.6.2 Vectors

Variables can hold more than single numbers. Try assigning a vector of 16 random uniform numbers from the runif() function to a variable called rand.

rand <- runif(16)

And vectors can be operated on as a single thing — sometimes the operation returns a single number, sometimes a vector. Here are some R vectors:

a <- c(1, 2, 3)
b <- c("a", "b")
d <- c(a, b)

What is the result of mean(a)?

  • 6
  • 2
  • 3

What is the result of a + 2?

  • 3, 4, 5
  • 1, 2, 3, 2

What is the result of d * 2?

  • 3, 4, 5
  • 2, 4, 6
  • Error …
  • 2, 4, 6, NA, NA

B.6.3 Dataframes

Dataframes are rectangular, spreadsheet-style objects that hold data we want to analyse. A small one called small_df is loaded for you; the str() function tells us about a dataframe. Use str() on small_df.

str(small_df)

What does the output tell us about the column names?

  • It has chromosome data in it
  • It is text-based (CHaRacter) data
  • It has numeric data in it

B.6.4 Using R help

Use the help() function (or ?) to answer these questions.

Does the mean() function compute…

  • the geometric mean
  • the average
  • the arithmetic mean
  • the harmonic mean
  • the (trimmed) arithmetic mean

What does the c of the c() function stand for?

  • concatenate
  • convolve
  • combine
  • complete

Do you fully understand what the apply() function does?

  • Yes!
  • No
  • No, but that’s ok: I’ve read the help, it’s a bit opaque, but some googling will help