# Chapter 5 The Fundamentals of R

## 5.1 Four Fundamentals

The essence of R:

``````R <- c(1:4)
R``````
``##  1 2 3 4``

(See Vectors later).

• Vector-based: R operates on whole vectors at once rather than one element at a time

[Two] reasons to use R for Data Science:

• Designed for data: R can manipulate big data sets
• Graphics Are Graspable: people understand graphical data

[Three] fundamental principles of R per John Chambers:

• Objects: Everything that exists in R is an object
• Functions: Everything that happens in R is a function call
• Interfaces: interfaces to other software are an integral part of R

[Four] ways of programming R:

• Command line: entering R commands in a terminal
• Source file: running a set of commands from a saved file
• R GUI: available for Mac, Windows, and Linux
• Code chunks in RStudio: allows debugging as you write

## 5.2 Basic Maths

R has all the basic mathematical functions:

``1 + 1``
``##  2``
``1 + 2 + 3``
``##  6``
``3 * 7 * 2``
``##  42``
``4 / 3``
``##  1.333333``

R obeys the standard order of mathematical operations (PEMDAS):

1. Parentheses ( )
2. Exponents ^
3. Multiplication *
4. Division /
5. Addition +
6. Subtraction -

``(2 ^ 5) + (2 * 5)``
``##  42``

The use of white space between operators is recommended.
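
A few more expressions show how the order of operations changes a result. This is a small sketch; the variable names and values are chosen only for illustration:

```r
# Multiplication binds tighter than addition
a <- 2 + 3 * 4      # 14
# Parentheses override the default order
b <- (2 + 3) * 4    # 20
# Exponentiation binds tighter than a leading minus sign
c1 <- -2^2          # -4
c2 <- (-2)^2        # 4
```

When in doubt, adding parentheses makes the intended order explicit.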

## 5.3 Variables

Unlike statically-typed languages such as C++, R does not require variable types to be declared. An R variable can represent any data type or R object, such as a function, result, or graphical plot. R variables can be redeclared.

• Variable names can contain alphanumeric characters, periods `.`, and underscores `_`
• but cannot start with a number or an underscore
• Variable names are case sensitive
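
The following sketch illustrates redeclaring a variable with a different type, and case sensitivity. The names `v`, `total`, and `Total` are hypothetical and used only for this example:

```r
# The same name can hold values of different types over time
v <- 42
class(v)        # "numeric"
v <- "forty-two"
class(v)        # "character"
v <- sqrt       # a function is an object too
class(v)        # "function"

# Names are case sensitive: these are two distinct variables
total <- 1
Total <- 2
total == Total  # FALSE
```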

### 5.3.1 Assigning variables

R variable assignment operators are `<-` (default) and `=` (acceptable).

``````x <- 2
x``````
``##  2``
``````y = 5
y``````
``##  5``

You can also assign left-to-right with `->`, but variables are not often assigned that way.

``````7 -> z
z``````
``##  7``

Assignment operations can be chained to assign one value to multiple variables:

``````a <- b <- 42
a``````
``##  42``
``b``
``##  42``

You can also use the built-in `assign` function:

``````assign("q", 4)
q``````
``##  4``

### 5.3.2 Removing variables

`rm(variablename)` removes a variable.

``rm(q)``
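
As a rough illustration, `exists()` can confirm that a variable is gone after `rm()`, and `assign()` can be paired with `get()` when names are built at run time. The names `tmpValue` and `item1` to `item3` are hypothetical:

```r
tmpValue <- 10            # hypothetical variable name
exists("tmpValue")        # TRUE
rm(tmpValue)
exists("tmpValue")        # FALSE

# Build variable names programmatically with assign() and read them with get()
for (i in 1:3) {
  assign(paste0("item", i), i * 10)
}
get("item2")              # 20
```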

## 5.4 Data Types

R has four main data types:

• Numeric
• Character (a.k.a Nominal)
• Date
• Logical

You can check the type of a variable with `class(variablename)`:

``````x <- "eh?"
x``````
``##  "eh?"``
``class(x)``
``##  "character"``
``````y <- 99
y``````
``##  99``
``class(y)``
``##  "numeric"``

### 5.4.1 `Numeric` data types

Numeric data includes both integers and decimals — positive, negative, and zero — similar to `float` or `double` in other languages. A numeric value stored in a variable is automatically assumed to be numeric in R.

You can test whether data is numeric with `is.numeric()`:

``is.numeric(y)``
``##  TRUE``

And if it’s an integer with `is.integer()`:

``is.integer(y)``
``##  FALSE``

The result is `FALSE` because R stores whole numbers as `numeric` by default; to create an integer, append `L` to the value:

``````y <- 99L
is.integer(y)``````
``##  TRUE``

R promotes `integers` to `numeric` when needed.
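
A short sketch of when promotion happens; the variable `i` is used only for illustration:

```r
i <- 5L
class(i)          # "integer"
class(i * 2L)     # "integer": both operands are integers
class(i + 2.5)    # "numeric": the integer is promoted
class(i / 2L)     # "numeric": division always returns a double
i / 2L            # 2.5
```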

### 5.4.2 `Character` data types

R handles Character data in two primary ways: as `character` and as `factor`. They are treated differently:

``````x <- "data"
x``````
``##  "data"``
``class(x)``
``##  "character"``

and

``````y <- factor("data")
y``````
``````##  data
## Levels: data``````

The `levels` are attributes of that factor.

To find the number of characters in a `character` (or `numeric`) value:

``nchar(x)``
``##  4``

This does not work for `factor` data.
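
One possible workaround, sketched here with a hypothetical factor `f`, is to convert the factor to `character` before counting:

```r
f <- factor(c("data", "science", "data"))
levels(f)                  # "data" "science"
nlevels(f)                 # 2
# nchar(f) would raise an error, so convert to character first
nchar(as.character(f))     # 4 7 4
```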

### 5.4.3 `Date` data types

R has numerous types of dates. `Date` and `POSIXct` are the most useful.

``````date1 <- as.Date("2018-03-28")
date1``````
``##  "2018-03-28"``
``class(date1)``
``##  "Date"``
``as.numeric(date1)``
``##  17618``

and

``````date2 <- as.POSIXct("2018-03-28 10:45")
date2``````
``##  "2018-03-28 10:45:00 PDT"``
``class(date2)``
``##  "POSIXct" "POSIXt"``
``as.numeric(date2)``
``##  1522259100``

Using `as.numeric` also changes the underlying type:

``class(date1)``
``##  "Date"``
``class(as.numeric(date1))``
``##  "numeric"``
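
Because a `Date` is stored as a number of days, simple arithmetic works on it. The sketch below uses hypothetical variables `d1` and `d2`:

```r
d1 <- as.Date("2018-03-28")
d2 <- d1 + 7                       # add seven days
format(d2)                         # "2018-04-04"
as.numeric(d2 - d1)                # 7
format(d1, "%Y")                   # "2018"
# as.numeric() counts days since 1970-01-01 for a Date
as.numeric(as.Date("1970-01-10"))  # 9
```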

### 5.4.4 `Logical` data types

`Logical`s can be either `TRUE` (`T` or `1`) or `FALSE` (`F` or `0`). `T` and `F` are not recommended as they are simply shortcuts to `TRUE` and `FALSE` and can be overwritten, causing woe, anguish, mayhem, and rioting. (`TRUE` or `F`?)

Logical data types have a similar test function `is.logical()`:

``````k <- TRUE
class(k)``````
``##  "logical"``
``is.logical(k)``
``##  TRUE``
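
The sketch below shows logicals behaving as 1 and 0 in arithmetic, and why the `T` shortcut is fragile. The vector `flags` is hypothetical:

```r
# TRUE and FALSE behave as 1 and 0 in arithmetic
TRUE + TRUE                        # 2
flags <- c(TRUE, FALSE, TRUE, TRUE)
sum(flags)                         # 3: number of TRUE values
mean(flags)                        # 0.75: proportion of TRUE values

# T is an ordinary variable and can be reassigned; TRUE cannot
T <- FALSE
isTRUE(T)                          # FALSE
rm(T)                              # remove the override so T is TRUE again
isTRUE(T)                          # TRUE
```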

## 5.5 Data Structures

R data structures are containers for data elements:

• Vectors – collections of elements of the same type
• Matrices – rectangular containers of elements of the same type
• Data Frames – collections of vectors of possibly different types, all of the same length
• Arrays – vectors with dimensions, holding elements of the same type
• Lists – containers for elements of any type

### 5.5.1 Vectors

Vectors are the heart of R; it is a vectorised language. An R `Vector` is:

A collection of elements of the same type.

Operations are applied to each element of a vector without the need to loop through them. This separates R from other programming languages and makes it most suited to manipulation and graphical presentation of data.

Vectors do not have a dimension: there is no `column` or `row` vector. Unlike `mathematical vectors` there is no difference between column or row orientation.
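
To make the contrast with an explicit loop concrete, here is a minimal sketch that computes squares both ways. The names `v`, `squaresLoop`, and `squaresVec` are hypothetical:

```r
v <- c(1, 2, 3, 4)

# Element-by-element loop, as it might be written in other languages
squaresLoop <- numeric(length(v))
for (i in seq_along(v)) {
  squaresLoop[i] <- v[i]^2
}

# Vectorised equivalent: one expression, no explicit loop
squaresVec <- v^2

identical(squaresLoop, squaresVec)   # TRUE
```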

#### 5.5.1.1 Creating a vector

Vectors are created with `c`, meaning “combine”:

``````x <- c(1, 2, 3, 4, 5, 6, 7, 8)
x``````
``##  1 2 3 4 5 6 7 8``

Operations are applied to all elements at once:

``x + 2``
``##   3  4  5  6  7  8  9 10``
``x - 3``
``##  -2 -1  0  1  2  3  4  5``
``x * 2``
``##   2  4  6  8 10 12 14 16``
``x / 4``
``##  0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00``
``x^2``
``##   1  4  9 16 25 36 49 64``
``sqrt(x)``
``##  1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427``

#### 5.5.1.2 Vector creation shortcuts

``1:8``
``##  1 2 3 4 5 6 7 8``
``8:1``
``##  8 7 6 5 4 3 2 1``
``-3:4``
``##  -3 -2 -1  0  1  2  3  4``
``4:-3``
``##   4  3  2  1  0 -1 -2 -3``

#### 5.5.1.3 Accessing vector elements

Any element of a `Vector` can be directly accessed using [square brackets] to point to it:

``x``
``##  1 2 3 4 5 6 7 8``
``x[4]``
``##  4``
``x[8]``
``##  8``
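
Square brackets also accept ranges, negative positions, and logical conditions. The following sketch uses a hypothetical vector `v`:

```r
v <- 1:8
v[4]          # 4: a single element (indexing starts at 1)
v[2:4]        # 2 3 4: a range of positions
v[c(1, 8)]    # 1 8: specific positions
v[-1]         # 2 3 4 5 6 7 8: everything except the first element
v[v > 5]      # 6 7 8: elements that satisfy a condition
```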

#### 5.5.1.4 Counting within Vectors

You can check the length of a vector:

``x``
``##  1 2 3 4 5 6 7 8``
``length(x)``
``##  8``
``y``
``````##  data
## Levels: data``````
``length(y)``
``##  1``
``length(x + y)``
``## Warning in Ops.factor(x, y): '+' not meaningful for factors``
``##  8``

and count the number of characters in each element of a character vector:

``````q <- c("One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight")
q``````
``##  "One"   "Two"   "Three" "Four"  "Five"  "Six"   "Seven" "Eight"``
``nchar(q)``
``##  3 3 5 4 4 3 5 5``

#### 5.5.1.5 Combining Vectors

Two vectors of the same or different length can be combined:

##### 5.5.1.5.1 Vectors of the same length

``````x <- 1:8
x``````
``##  1 2 3 4 5 6 7 8``
``````y <- -3:4
y``````
``##  -3 -2 -1  0  1  2  3  4``
``x + y``
``##  -2  0  2  4  6  8 10 12``
``x - y``
``##  4 4 4 4 4 4 4 4``
``x * y``
``##  -3 -4 -3  0  5 12 21 32``
``x / y``
``````##  -0.3333333 -1.0000000 -3.0000000        Inf  5.0000000  3.0000000
##   2.3333333  2.0000000``````
``x^y``
``````##     1.0000000    0.2500000    0.3333333    1.0000000    5.0000000
##    36.0000000  343.0000000 4096.0000000``````

##### 5.5.1.5.2 Vectors of different lengths

For two `vectors` of different lengths, the shorter vector is recycled; R issues a warning when the longer length is not a multiple of the shorter:

``x + c(1, 2)``
``##   2  4  4  6  6  8  8 10``
``x + c(1, 2, 3)``
``````## Warning in x + c(1, 2, 3): longer object length is not a multiple of
## shorter object length``````
``##   2  4  6  5  7  9  8 10``
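
Recycling can be pictured as repeating the shorter vector until it matches the longer one. This sketch makes that explicit with `rep()`; the names `v`, `short`, and `recycled` are hypothetical:

```r
v <- 1:8
short <- c(1, 2)

# Recycling repeats the shorter vector until it matches the longer one
recycled <- rep(short, length.out = length(v))
recycled                               # 1 2 1 2 1 2 1 2
identical(v + short, v + recycled)     # TRUE
```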

#### 5.5.1.6 Comparison of two Vectors

``````x <- c(1:8)
x``````
``##  1 2 3 4 5 6 7 8``
``x > 5``
``##  FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE``
``````y <- c(3:10)
y``````
``##   3  4  5  6  7  8  9 10``
``x > y``
``##  FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE``

The `all()` function tests whether all elements are `TRUE`:

``````x <-  10:1
y <-  -4:5
x``````
``##   10  9  8  7  6  5  4  3  2  1``
``y``
``##   -4 -3 -2 -1  0  1  2  3  4  5``
``all(x < y)``
``##  FALSE``

The `any()` function tests whether any element is `TRUE`:

``any(x < y)``
``##  TRUE``
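
To find out which elements satisfy a comparison, or how many do, `which()` and `sum()` are a possible next step. The vectors `p` and `r` are hypothetical and hold the same values as the example above:

```r
p <- 10:1
r <- -4:5
p < r            # FALSE for the first eight positions, TRUE for the last two
which(p < r)     # 9 10: positions where the comparison is TRUE
sum(p < r)       # 2: number of TRUE results
```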


#### 5.5.1.7 Factor Vectors

`Factors` are an important concept in R. `Factors` contain `levels`, which are the unique values of that `factor` variable.

``q``
``##  "One"   "Two"   "Three" "Four"  "Five"  "Six"   "Seven" "Eight"``
``````qFactor <- as.factor(q)
qFactor``````
``````##  One   Two   Three Four  Five  Six   Seven Eight
## Levels: Eight Five Four One Seven Six Three Two``````

Note that the order of `levels` does not matter unless the `ordered` argument is set to `TRUE`:

``````factor(x=c("High School", "Doctorate", "Masters", "College"),
levels=c("High School", "College", "Masters", "Doctorate"),
ordered=TRUE)``````
``````##  High School Doctorate   Masters     College
## Levels: High School < College < Masters < Doctorate``````
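
Once a factor is ordered, its values can be compared by rank. A minimal sketch, storing the factor above in a hypothetical variable `edu`:

```r
edu <- factor(c("High School", "Doctorate", "Masters", "College"),
              levels = c("High School", "College", "Masters", "Doctorate"),
              ordered = TRUE)
edu[2] > edu[3]      # TRUE: Doctorate ranks above Masters
edu[1] < edu[4]      # TRUE: High School ranks below College
as.integer(edu)      # 1 4 3 2: position of each value in the levels
max(edu)             # Doctorate
```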

### 5.5.2 Matrices

A familiar mathematical structure, `matrices` are essential to statistics.

A `Matrix` is a rectangular structure of rows and columns in which every element is of the same type, often all numerics.

`Matrices` can be acted upon similarly to `Vectors`, with PEMDAS-style element-by-element addition, subtraction, multiplication, division, and equality.

#### 5.5.2.1 Creating a Matrix

``````A <- matrix(1:12, nrow=3)
A``````
``````##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12``````

Any element of a `matrix` can be directly accessed using [square bracket] co-ordinates:

``A[2,3]``
``##  8``
``A[3,4]``
``##  12``

#### 5.5.2.2 Dimensions of a Matrix

``nrow(A)``
``##  3``
``ncol(A)``
``##  4``
``dim(A)``
``##  3 4``

#### 5.5.2.3 Adding Matrices

``A``
``````##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12``````
``````B <-  matrix(13:24, nrow=3)
B``````
``````##      [,1] [,2] [,3] [,4]
## [1,]   13   16   19   22
## [2,]   14   17   20   23
## [3,]   15   18   21   24``````
``A + B``
``````##      [,1] [,2] [,3] [,4]
## [1,]   14   20   26   32
## [2,]   16   22   28   34
## [3,]   18   24   30   36``````

#### 5.5.2.4 Multiplying Matrices

``A * B``
``````##      [,1] [,2] [,3] [,4]
## [1,]   13   64  133  220
## [2,]   28   85  160  253
## [3,]   45  108  189  288``````
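
Note that `*` multiplies element by element. For the matrix product of linear algebra, R provides the `%*%` operator, which needs conformable dimensions. The sketch below transposes `B` with `t()` purely so that the shapes match; the names `elementwise` and `product` are hypothetical:

```r
A <- matrix(1:12, nrow = 3)
B <- matrix(13:24, nrow = 3)

elementwise <- A * B          # 3 x 4: each element times its counterpart
product <- A %*% t(B)         # 3 x 3: rows of A times columns of t(B)

dim(elementwise)              # 3 4
dim(product)                  # 3 3
product[1, 1]                 # 430, the sum of the first row of A * B
```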

#### 5.5.2.5 Logical querying

``A == B``
``````##       [,1]  [,2]  [,3]  [,4]
## [1,] FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE``````

#### 5.5.2.6 Naming rows and columns

``````colnames(A) <- c("A1", "A2", "A3", "A4")
rownames(A) <- c("First", "Second", "Third")
A``````
``````##        A1 A2 A3 A4
## First   1  4  7 10
## Second  2  5  8 11
## Third   3  6  9 12``````
``A["First", "A2"]``
``##  4``
``A[1,2]``
``##  4``

Two special `vectors`, `letters` and `LETTERS`, can be used to name matrix columns or rows with lowercase and UPPERCASE letters:

``````C <- matrix(21:40, nrow=2)
colnames(C) <- LETTERS[1:10]
rownames(C) <- c(letters[1:2])
C``````
``````##    A  B  C  D  E  F  G  H  I  J
## a 21 23 25 27 29 31 33 35 37 39
## b 22 24 26 28 30 32 34 36 38 40``````

### 5.5.3 Dataframes

The `data.frame` is perhaps the primary reason for R’s growing popularity as a powerful, focussed, and flexible language for use in all aspects of Data Science.

A `data.frame` is a rectangular collection of vectors, all of the same length but possibly of different data types.

A `Data Frame` looks like an Excel spreadsheet in that the data is organised into columns and rows. In statistical terms, each column is a variable while each row contains specific observations. Similar to a Matrix only in that it is also rectangular, a `data.frame` is a much more flexible and comprehensive data structure.

#### 5.5.3.1 Creating a Dataframe

Using the existing functions:

``(x <- 8:1)``
``##  8 7 6 5 4 3 2 1``
``(y <- -3:4)``
``##  -3 -2 -1  0  1  2  3  4``
``(q <- c("One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight"))``
``##  "One"   "Two"   "Three" "Four"  "Five"  "Six"   "Seven" "Eight"``

The simplest way of creating a `Dataframe` is with the `data.frame()` function:

``````theDF <- data.frame(x, y, q)
theDF``````
``````##   x  y     q
## 1 8 -3   One
## 2 7 -2   Two
## 3 6 -1 Three
## 4 5  0  Four
## 5 4  1  Five
## 6 3  2   Six
## 7 2  3 Seven
## 8 1  4 Eight``````

This creates an 8x3 `data.frame` consisting of three `vectors`. The column names are taken from the names of the `vectors`.

To assign names to the `vectors`:

``````theDF <- data.frame(First=x, Second=y, Third=q)
theDF``````
``````##   First Second Third
## 1     8     -3   One
## 2     7     -2   Two
## 3     6     -1 Three
## 4     5      0  Four
## 5     4      1  Five
## 6     3      2   Six
## 7     2      3 Seven
## 8     1      4 Eight``````

To assign names to the rows:

``````rownames(theDF) <- c("One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight")
theDF``````
``````##       First Second Third
## One       8     -3   One
## Two       7     -2   Two
## Three     6     -1 Three
## Four      5      0  Four
## Five      4      1  Five
## Six       3      2   Six
## Seven     2      3 Seven
## Eight     1      4 Eight``````

#### 5.5.3.2 Examining a Dataframe

The `nrow()`, `ncol()`, `dim()`, `rownames()`, and `names()` functions are available to investigate its properties:

``(nrow(theDF))``
``##  8``
``(ncol(theDF))``
``##  3``
``(dim(theDF))``
``##  8 3``
``(rownames(theDF))``
``##  "One"   "Two"   "Three" "Four"  "Five"  "Six"   "Seven" "Eight"``
``(names(theDF))``
``##  "First"  "Second" "Third"``

Elements of any `vector` of a `data.frame` can be directly accessed using the `$` or `[row, col]` operators:

``(theDF$Second)``
``##  -3 -2 -1  0  1  2  3  4``
``(theDF[7, 3])``
``````##  Seven
## Levels: Eight Five Four One Seven Six Three Two``````

To specify an entire row, leave out the column specification; to specify an entire column, leave out the row specification:

``(theDF[2, ])``
``````##     First Second Third
## Two     7     -2   Two``````
``(theDF[, 2])``
``##  -3 -2 -1  0  1  2  3  4``

To specify more than one row or column, use a `vector` of indices:

``(theDF[3:5, 2:3])``
``````##       Second Third
## Three     -1 Three
## Four       0  Four
## Five       1  Five``````

To specify multiple columns by name, use a `character vector` of the column names:

``(theDF[, c("First", "Third")])``
``````##       First Third
## One       8   One
## Two       7   Two
## Three     6 Three
## Four      5  Four
## Five      4  Five
## Six       3   Six
## Seven     2 Seven
## Eight     1 Eight``````

To find the `class` of the entire `data.frame`:

``(class(theDF))``
``##  "data.frame"``

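or the `class` of any `vector` (the `factor` result below reflects R versions before 4.0.0, where `data.frame()` converted character vectors to factors by default):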

``(class(theDF$Third))``
``##  "factor"``
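
To check every column at once, `sapply()` with `class` is one option. The sketch below also sets `stringsAsFactors` explicitly so the result does not depend on the R version; the data frames `df` and `df2` are hypothetical:

```r
df <- data.frame(First = 8:1,
                 Third = c("One", "Two", "Three", "Four",
                           "Five", "Six", "Seven", "Eight"),
                 stringsAsFactors = TRUE)
sapply(df, class)      # First: "integer", Third: "factor"

# With stringsAsFactors = FALSE the text column stays character
df2 <- data.frame(First = 8:1, Third = as.character(8:1),
                  stringsAsFactors = FALSE)
class(df2$Third)       # "character"
```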

#### 5.5.3.3 Displaying a Dataframe

`data.frames` can be small, large, big, huge, or ginormous, depending on their size. The `head()` and `tail()` functions print only the first or last few rows, or the number of rows you set:

``(head(theDF))``
``````##       First Second Third
## One       8     -3   One
## Two       7     -2   Two
## Three     6     -1 Three
## Four      5      0  Four
## Five      4      1  Five
## Six       3      2   Six``````
``(head(theDF, n=5))``
``````##       First Second Third
## One       8     -3   One
## Two       7     -2   Two
## Three     6     -1 Three
## Four      5      0  Four
## Five      4      1  Five``````
``(tail(theDF, n=5))``
``````##       First Second Third
## Four      5      0  Four
## Five      4      1  Five
## Six       3      2   Six
## Seven     2      3 Seven
## Eight     1      4 Eight``````

### 5.5.4 Arrays

An `Array` is a multidimensional Vector whose elements are all the same type. Its dimensions are stored in the `dim` attribute and can be named with `dimnames`.

#### 5.5.4.1 Creating `Arrays`

To create an `Array`, pass a `dim` vector: its first element is the number of rows, the second the number of columns, and the remaining elements are the sizes of the outer dimensions, here `row`, `column`, `number of arrays`:

``````theArray <- array(1:12, dim = c(2, 3, 2))
theArray``````
``````## , , 1
##
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12``````

#### 5.5.4.2 Accessing Arrays

Individual elements of an `Array` are accessed using square brackets similar to a `Vector` but in this case by `[row, column, array #]`.

``theArray[1, , ]``
``````##      [,1] [,2]
## [1,]    1    7
## [2,]    3    9
## [3,]    5   11``````
``theArray[2, , ]``
``````##      [,1] [,2]
## [1,]    2    8
## [2,]    4   10
## [3,]    6   12``````
``theArray[1, , 1]``
``##  1 3 5``
``theArray[1, , 2]``
``##   7  9 11``
``theArray[, , 1]``
``````##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6``````
``theArray[, , 2]``
``````##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12``````
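
As a sketch of how `dimnames` can be used, the array below has the same shape and values as `theArray` but with named dimensions. The array `arr` and all of the names are hypothetical:

```r
arr <- array(1:12, dim = c(2, 3, 2),
             dimnames = list(c("r1", "r2"),          # hypothetical names
                             c("c1", "c2", "c3"),
                             c("first", "second")))
dim(arr)                       # 2 3 2
arr[2, 3, 1]                   # 6: row 2, column 3, first matrix
arr["r2", "c3", "first"]       # 6: the same element by name
arr[1, 2, 2]                   # 9
```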

### 5.5.5 Lists

`Lists` are used to store any number of items of any type: all `numeric` or all `character` vectors, or a mix of them; complete `data.frames`; and even other `lists`.

#### 5.5.5.1 Creating Lists

`Lists` are created with the `list()` function. Each argument to the function becomes an element of the list:

``list(1, 2, 3)``
``````## [[1]]
##  1
##
## [[2]]
##  2
##
## [[3]]
##  3``````

Single-element lists can contain multi-element vectors:

``list(c(1, 2, 3))``
``````## [[1]]
##  1 2 3``````

Here’s a two-element list with the second element a five-element `vector`:

``````list1 <- list(c(1, 2, 3), 3:7)
list1``````
``````## [[1]]
##  1 2 3
##
## [[2]]
##  3 4 5 6 7``````

A two-element `list` with the first element an `array`, the second element a ten-element `vector`:

``````list2 <- list(theArray, 1:10)
list2``````
``````## [[1]]
## , , 1
##
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
##
##
## [[2]]
##    1  2  3  4  5  6  7  8  9 10``````

#### 5.5.5.2 Creating Empty Lists

Empty `lists` of a determined length are created using the `vector()` function:

``(emptyList <- vector(mode = "list", length = 4))``
``````## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
## NULL
##
## [[4]]
## NULL``````

Note: Enclosing an expression in round brackets displays the results immediately after execution.
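
A pre-sized empty list is typically filled in afterwards, for example in a loop. This is a minimal sketch; the list `results` and its contents are hypothetical:

```r
results <- vector(mode = "list", length = 3)
for (i in seq_along(results)) {
  results[[i]] <- seq_len(i)     # element i holds the vector 1..i
}
lengths(results)                 # 1 2 3
results[[3]]                     # 1 2 3
```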

#### 5.5.5.3 Naming Lists

`Lists` can have names, and each element of a `list` can have a unique name:

``names(list2)``
``## NULL``
``(names(list2) <- c("The Array", "The Vector"))``
``##  "The Array"  "The Vector"``
``list2``
``````## $`The Array`
## , , 1
##
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
##
##
## $`The Vector`
##    1  2  3  4  5  6  7  8  9 10``````
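
Named elements can then be retrieved by name. The sketch below uses a small hypothetical list `lst` to contrast double brackets, `$`, and single brackets:

```r
lst <- list(1:3, letters[1:2])
names(lst) <- c("The Numbers", "The Letters")   # hypothetical names

lst[["The Numbers"]]       # 1 2 3: the element itself
lst$`The Letters`          # "a" "b": backticks are needed because of the space
lst[[1]][2]                # 2: second value of the first element
class(lst["The Numbers"])  # "list": single brackets return a sub-list
```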

#### 5.5.5.4 Naming List Elements

Names can also be assigned to `list` elements during creation using name-value pairs. A `list` nested inside another `list` can be named in the same way:

``(list3 <- list(theARR=theArray, theVECT=1:10, List3=list2))``
``````## $theARR
## , , 1
##
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
##
##
## $theVECT
##    1  2  3  4  5  6  7  8  9 10
##
## $List3
## $List3$`The Array`
## , , 1
##
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
##
##
## $List3$`The Vector`
##    1  2  3  4  5  6  7  8  9 10``````

#### 5.5.5.5 Adding To A List

New elements can be added to a `list` by assigning to a `numeric` or `named` index that does not yet exist:

``length(list3)``
``##  3``

Adding a `numeric` index:

``````list3[[4]] <- 11
length(list3)``````
``##  4``
``list3``
``````## $theARR
## , , 1
##
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
##
##
## $theVECT
##    1  2  3  4  5  6  7  8  9 10
##
## $List3
## $List3$`The Array`
## , , 1
##
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
##
##
## $List3$`The Vector`
##    1  2  3  4  5  6  7  8  9 10
##
##
## [[4]]
##  11``````

Adding a `named` index:

``````list3[["AddedElement"]] <- 12:16
length(list3)``````
``##  5``
``list3``
``````## $theARR
## , , 1
##
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
##
##
## $theVECT
##    1  2  3  4  5  6  7  8  9 10
##
## $List3
## $List3$`The Array`
## , , 1
##
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
##
##
## $List3$`The Vector`
##    1  2  3  4  5  6  7  8  9 10
##
##
## [[4]]
##  11
##