Damian Pavlyshyn
Logical values (TRUE or FALSE) and missing values (NAs)
Basic R objects:
Vectors: create them with the c() function, or with the : shortcut for a run of consecutive integers:
## [1] "a" "b" "c"
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100
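The code behind the two outputs above isn't shown. Here is a minimal sketch that reproduces them, assuming the vectors were built as c("a", "b", "c") and 1:100 (the variable names are our own):

```r
# A character vector built with c()
abc <- c("a", "b", "c")
print(abc)

# The integers 1 to 100, built with the `:` shortcut
one_to_hundred <- 1:100
print(one_to_hundred)
```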
R treats everything as a vector: even individual numbers are just length-one vectors. This is why we see [1] after every output of a single number:
## [1] 2
The “c” in c() stands for “concatenate”. What we think of as building a vector out of numbers, R considers to be concatenating a bunch of length-one vectors into a single vector.
This means that we can concatenate longer vectors in the same way:
## [1] 1 2 3 10 20 -1 -2
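Here is a sketch of both ideas. The exact vectors concatenated in the original are a guess, chosen to match the output above:

```r
# Even a single number is a vector, of length one
print(length(2))   # [1] 1

# Concatenating three shorter vectors into one longer vector
combined <- c(c(1, 2, 3), c(10, 20), c(-1, -2))
print(combined)    # the values 1 2 3 10 20 -1 -2
```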
Here is a vector called even, holding the even numbers from 2 to 100:
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
## [20] 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76
## [39] 78 80 82 84 86 88 90 92 94 96 98 100
How can we get the odd numbers from 1 to 100 from even?
## [1] 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
## [26] 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99
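The original code isn't shown. One possible answer, assuming even was built with seq(), is to subtract 1 from every element at once, because arithmetic on a vector applies to each element:

```r
# Even numbers from 2 to 100, counting up in steps of 2
even <- seq(2, 100, by = 2)

# Subtracting 1 from each element turns 2, 4, ..., 100 into 1, 3, ..., 99
odd <- even - 1
print(odd)
```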
To extract a subset of elements by their indices, put a vector of indices in square brackets.
Warning: Unlike many programming languages, R numbers the first element of a vector 1 rather than 0!
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
## [20] 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76
## [39] 78 80 82 84 86 88 90 92 94 96 98 100
## [1] 2
## [1] 6 8 10 12 14
## [1] 6 10
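The index vectors below are inferred from the three outputs above: position 1, positions 3 to 7, and positions 3 and 5. Treat them as a sketch rather than the lecture's exact code:

```r
even <- seq(2, 100, by = 2)

first  <- even[1]         # R counts from 1: the first element, 2
middle <- even[3:7]       # elements 3 through 7: 6 8 10 12 14
picked <- even[c(3, 5)]   # elements 3 and 5 only: 6 10
print(first)
print(middle)
print(picked)
```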
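Note: even[3,5] does not have a vector in the square brackets. The comma makes R read 3 and 5 as a row index and a column index, as for a matrix, so it will not work! Write even[c(3,5)] instead.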
To extract all except a few indices, put a negative sign before the vector of indices
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
## [20] 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76
## [39] 78 80 82 84 86 88 90 92 94 96 98 100
## [1] 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42
## [20] 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80
## [39] 82 84 86 88 90 92 94 96 98 100
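Here is a sketch of negative indexing that drops the first two elements, matching the output above. The parentheses matter: in R, unary minus binds more tightly than :, so -1:2 means (-1):2.

```r
even <- seq(2, 100, by = 2)

# -(1:2) is c(-1, -2): "everything except positions 1 and 2"
rest <- even[-(1:2)]
print(rest)

# Without parentheses, -1:2 is c(-1, 0, 1, 2); R refuses to mix
# negative and positive indices, so even[-1:2] is an error
print(-1:2)
```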
If we index using a vector of logical values (TRUE or FALSE), this will extract all elements of the original vector corresponding to the TRUE indices:
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
## [20] 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76
## [39] 78 80 82 84 86 88 90 92 94 96 98 100
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE
## [1] 2 4 6 8 10 12 14 16 18
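The comparison that produced the logical vector isn't shown. even < 20 is a guess that matches it: nine TRUEs, then FALSEs. Comparing a vector with a number gives one TRUE/FALSE per element, which we can then use as an index:

```r
even <- seq(2, 100, by = 2)

is_small <- even < 20     # one TRUE/FALSE per element of even
small <- even[is_small]   # keep only the elements where is_small is TRUE
print(small)              # equivalently: even[even < 20]
```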
Use the length() function to figure out how many elements there are in a vector:
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
## [20] 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76
## [39] 78 80 82 84 86 88 90 92 94 96 98 100
## [1] 50
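Here is a short sketch. Because R counts from 1, length() also gives the position of the last element:

```r
even <- seq(2, 100, by = 2)

n <- length(even)
print(n)          # [1] 50
print(even[n])    # the last element, 100
```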
What happens if we try to extract an invalid index?
## numeric(0)
No error thrown!!
(In fact, this returns an empty vector with the same type as the elements of the original vector, which can be useful information.)
## [1] NA
Again, no error thrown!!
(This makes less sense: R hands back a missing value instead of complaining.)
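The invalid indices used above aren't shown. This sketch assumes they were 0, which matches no position because R counts from 1, and 51, which is one past the end of even:

```r
even <- seq(2, 100, by = 2)

# Index 0 selects nothing: an empty vector of the same type as even
empty <- even[0]
print(empty)       # numeric(0)

# A position past the end silently gives a missing value
past_end <- even[51]
print(past_end)    # [1] NA
```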
If we try to create a vector from elements of different types, type coercion happens.
## [1] "1" "2" "a"
It’s not always obvious how R will decide to do the type coercion, so I don’t recommend relying on this (and besides, if you are trying to use the numbers 1 and 2, and the string “a” in a single vector, something has probably already gone wrong!)
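This sketch reproduces the output above. For atomic vectors, R coerces everything to the most flexible type present, in the order logical, integer, double, character. That is why the numbers become strings:

```r
mixed <- c(1, 2, "a")
print(mixed)          # [1] "1" "2" "a"
print(typeof(mixed))  # [1] "character"

# Lower down the same ladder, a logical becomes a number (TRUE -> 1)
print(c(TRUE, 2))
```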
It’s almost always safe and sensible to coerce integers into floating point numbers, though. You may not even have realised that you are doing type coercion when typing something like:
## [1] 1.0 2.0 3.5
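We can't see what was typed above, and one caveat is worth knowing. A bare literal such as 1 is already a double in R. Integers come from the L suffix (1L) or from ranges like 1:2. Here is a sketch where integer-to-double coercion really does happen:

```r
ints <- 1:2               # `:` produces integers
print(typeof(ints))       # [1] "integer"

nums <- c(ints, 3.5)      # the integers are coerced to doubles
print(typeof(nums))       # [1] "double"
print(nums)               # [1] 1.0 2.0 3.5

print(typeof(1))          # [1] "double": no L suffix, so not an integer
```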
Matrices: two-dimensional analogs of vectors
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
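The matrix A used below isn't constructed on the slide. This sketch reproduces it, assuming matrix() with its default column-by-column filling:

```r
# Fill a 3-row matrix with 1 to 12, column by column (the default, byrow = FALSE)
A <- matrix(1:12, nrow = 3)
print(A)
print(dim(A))   # [1] 3 4: rows, then columns
```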
Indexing: put the rows you want before the comma, columns you want after the comma
## [1] 4
What does A[c(1,3), c(2,4)] return?
## [,1] [,2]
## [1,] 4 10
## [2,] 6 12
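Here is a sketch of both lookups on the same A. The single element returned earlier (4) is assumed to be A[1, 2], the only position holding that value:

```r
A <- matrix(1:12, nrow = 3)

print(A[1, 2])               # row 1, column 2: [1] 4
sub <- A[c(1, 3), c(2, 4)]   # rows 1 and 3, crossed with columns 2 and 4
print(sub)
```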
To extract whole rows (or columns), we just leave the column (or row) specification blank.
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 3 6 9 12
## [,1] [,2]
## [1,] 4 10
## [2,] 5 11
## [3,] 6 12
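Here is a sketch of both forms on the same A. One related detail: asking for a single row or column drops the result down to a plain vector unless we pass drop = FALSE.

```r
A <- matrix(1:12, nrow = 3)

rows_1_3 <- A[c(1, 3), ]   # blank column spec: keep every column
cols_2_4 <- A[, c(2, 4)]   # blank row spec: keep every row
print(rows_1_3)
print(cols_2_4)

# A single row comes back as a plain vector...
print(A[1, ])
# ...unless drop = FALSE keeps it as a 1-by-4 matrix
print(A[1, , drop = FALSE])
```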
Lists: collections of key-value pairs, created with the list() function. Use [[ or $ notation to refer to a specific key-value pair:
## [1] "Honda"
## [1] "Fit" "CR-V" "Odyssey"
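The list behind these outputs isn't shown. This minimal sketch uses hypothetical key names (make and models) to reproduce them. Note the difference from single brackets, which return a smaller list rather than the value itself:

```r
# Hypothetical list: the key names make and models are our own choice
car <- list(make = "Honda", models = c("Fit", "CR-V", "Odyssey"))

print(car[["make"]])       # [1] "Honda"
print(car$models)          # the three model names
print(class(car["make"]))  # [1] "list": single brackets keep the list wrapper
```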
The most general definition of a data table is simply a 2-dimensional array of data (ok, that’s not especially enlightening).
It is good practice to have a standard format for all data tables so that we can compare and combine them, and write software that works generically. This specification is
In this example, each row encodes the 2016 presidential election results of a single county.
The columns show the various quantities, or variables, that were measured in that county.
## fips_cod county total dem gop other dem_prop gop_prop
## 1 26041 Delta County 18467 6431 11112 924 0.3482428 0.6017220
## 2 48295 Lipscomb County 1322 135 1159 28 0.1021180 0.8767020
## 3 01127 Walker County 29243 4486 24208 549 0.1534042 0.8278220
## 4 48389 Reeves County 3184 1659 1417 108 0.5210427 0.4450377
## 5 56017 Hot Springs County 2535 400 1939 196 0.1577909 0.7648915
## 6 20043 Doniphan County 3366 584 2601 181 0.1734997 0.7727273
## other_prop state state_name
## 1 0.05003520 MI Michigan
## 2 0.02118003 TX Texas
## 3 0.01877372 AL Alabama
## 4 0.03391960 TX Texas
## 5 0.07731755 WY Wyoming
## 6 0.05377302 KS Kansas
The same data printed as a tibble, which lists each column's type and truncates long values:
## # A tibble: 6 x 11
## fips_cod county total dem gop other dem_prop gop_prop other_prop state
## <chr> <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 26041 Delta Cou~ 18467 6431 11112 924 0.348 0.602 0.0500 MI
## 2 48295 Lipscomb ~ 1322 135 1159 28 0.102 0.877 0.0212 TX
## 3 01127 Walker Co~ 29243 4486 24208 549 0.153 0.828 0.0188 AL
## 4 48389 Reeves Co~ 3184 1659 1417 108 0.521 0.445 0.0339 TX
## 5 56017 Hot Sprin~ 2535 400 1939 196 0.158 0.765 0.0773 WY
## 6 20043 Doniphan ~ 3366 584 2601 181 0.173 0.773 0.0538 KS
## # ... with 1 more variable: state_name <chr>
Its structure, as summarised by str():
## 'data.frame': 3112 obs. of 11 variables:
## $ fips_cod : chr "26041" "48295" "01127" "48389" ...
## $ county : chr "Delta County" "Lipscomb County" "Walker County" "Reeves County" ...
## $ total : int 18467 1322 29243 3184 2535 3366 510940 78264 24661 8171 ...
## $ dem : int 6431 135 4486 1659 400 584 298353 40967 3412 1093 ...
## $ gop : int 11112 1159 24208 1417 1939 2601 193607 35191 20655 6863 ...
## $ other : int 924 28 549 108 196 181 18980 2106 594 215 ...
## $ dem_prop : num 0.348 0.102 0.153 0.521 0.158 ...
## $ gop_prop : num 0.602 0.877 0.828 0.445 0.765 ...
## $ other_prop: num 0.05 0.0212 0.0188 0.0339 0.0773 ...
## $ state : chr "MI" "TX" "AL" "TX" ...
## $ state_name: chr "Michigan" "Texas" "Alabama" "Texas" ...
Consider the following simple data frame that counts the total number of votes for the two major parties:
## votes_dem votes_gop
## 1 486351 91189
## 2 318 211
## 3 5904 10239
Now let’s look at its structure:
## 'data.frame': 3 obs. of 2 variables:
## $ votes_dem: num 486351 318 5904
## $ votes_gop: num 91189 211 10239
## [1] TRUE
## [1] 486351 318 5904
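This sketch rebuilds the data frame (the variable name votes is our own). We can't tell which call printed TRUE above. is.list() is one that would, because a data frame is stored as a list of equal-length columns. That is also why $ pulls out a column:

```r
votes <- data.frame(
  votes_dem = c(486351, 318, 5904),
  votes_gop = c(91189, 211, 10239)
)
str(votes)

print(is.list(votes))    # [1] TRUE
print(votes$votes_dem)   # the votes_dem column as a plain numeric vector
```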
Load the fueleconomy package with library(fueleconomy), which loads a tibble called vehicles:
## # A tibble: 6 x 12
## id make model year class trans drive cyl displ fuel hwy cty
## <dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 13309 Acura 2.2CL/~ 1997 Subcom~ Autom~ Front-~ 4 2.2 Regu~ 26 20
## 2 13310 Acura 2.2CL/~ 1997 Subcom~ Manua~ Front-~ 4 2.2 Regu~ 28 22
## 3 13311 Acura 2.2CL/~ 1997 Subcom~ Autom~ Front-~ 6 3 Regu~ 26 18
## 4 14038 Acura 2.3CL/~ 1998 Subcom~ Autom~ Front-~ 4 2.3 Regu~ 27 19
## 5 14039 Acura 2.3CL/~ 1998 Subcom~ Manua~ Front-~ 4 2.3 Regu~ 29 21
## 6 14040 Acura 2.3CL/~ 1998 Subcom~ Autom~ Front-~ 6 3 Regu~ 26 17
We’ll load the dataset directly from the course website using read_csv() from the readr package (in later lectures we’ll see how to load files from your hard drive):
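library(readr)  # provides read_csv()

elections <- read_csv(
  "http://web.stanford.edu/class/stats32/assets/lecture-2/2016-presidential-election-county-results.csv",
  # one letter per column: c = character, i = integer, d = double
  col_types = "cciiiidddcc"
)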
This is the data set from earlier in the lecture:
## # A tibble: 6 x 11
## fips_cod county total dem gop other dem_prop gop_prop other_prop state
## <chr> <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 26041 Delta Cou~ 18467 6431 11112 924 0.348 0.602 0.0500 MI
## 2 48295 Lipscomb ~ 1322 135 1159 28 0.102 0.877 0.0212 TX
## 3 01127 Walker Co~ 29243 4486 24208 549 0.153 0.828 0.0188 AL
## 4 48389 Reeves Co~ 3184 1659 1417 108 0.521 0.445 0.0339 TX
## 5 56017 Hot Sprin~ 2535 400 1939 196 0.158 0.765 0.0773 WY
## 6 20043 Doniphan ~ 3366 584 2601 181 0.173 0.773 0.0538 KS
## # ... with 1 more variable: state_name <chr>
fueleconomy: Package information on CRAN (https://cran.r-project.org/web/packages/fueleconomy/index.html)
Optional material