6 Terminology
Each language has its own jargon, and R is no exception. These are some of the most common terms, with their meanings and representation:
6.1 Vector
An ordered collection, usually of numbers. E.g., x <- c(3,1,4,1,5,9). The ‘c’ in
the function c() can be thought of as ‘collection’ or ‘column’; R's own documentation describes c() as combining values into a vector.
- character vectors: store strings of characters.
- logical vectors: a collection of FALSE and TRUE values. E.g.:
## [1] 6 5 4 3 2 1 0
## [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE
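The commands that produced the output above are not shown. The following is a minimal sketch that reproduces it, assuming the vector was built with rev() and then compared against 3. Both choices are assumptions, and the character vector s is a hypothetical addition for comparison.

```r
# Assumed reconstruction: the original commands are not shown in the text.
x <- rev(0:6)              # numeric vector 6 5 4 3 2 1 0
x
small <- x < 3             # logical vector: TRUE where the element is below 3
small
s <- c("alpha", "beta")    # hypothetical character vector, for comparison
class(x); class(small); class(s)   # the type of each vector
```

A comparison such as x < 3 is applied element by element, which is why the result has the same length as x.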
- rev() returns a reversed version of its argument.
A missing value is represented by NA. For example:
## [1] FALSE FALSE TRUE FALSE FALSE
- is.na() returns TRUE for missing values.
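The vector behind the output above is not shown. As an illustration, the sketch below uses a hypothetical vector y with one missing value in the third position, and shows how NA propagates through a calculation unless it is removed explicitly.

```r
# Hypothetical data: the vector used in the text is not shown.
y <- c(2, 7, NA, 1, 8)
is.na(y)                # FALSE FALSE TRUE FALSE FALSE
sum(is.na(y))           # number of missing values: 1
mean(y)                 # NA, because one element is missing
mean(y, na.rm = TRUE)   # 4.5, the mean of the four observed values
```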
6.2 Dataframes
A dataframe is a collection of vectors of equal length, stored as columns, which can be of different types. Usually, each row holds one observation, with different aspects of that observation in different columns.
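As a small illustration, the sketch below builds a dataframe directly with data.frame(). The names people, name, age and member are hypothetical and do not come from the text.

```r
# Hypothetical example data, not from the text.
people <- data.frame(
  name   = c("Ana", "Ben", "Chen"),   # text column (character in R >= 4.0)
  age    = c(31, 45, 27),             # numeric column
  member = c(TRUE, FALSE, TRUE)       # logical column
)
str(people)    # one line per column, showing its type
nrow(people)   # 3 observations (rows)
ncol(people)   # 3 variables (columns)
```

Each row describes one person, and each column records a different aspect of that person.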
6.3 Factor
A categorical variable in a dataframe may be considered a factor, and each of its categories a level.
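# Build three categorical columns and one numeric column
c1 = rep(letters[1:2], each = 4)
c2 = rep(c('Yes', 'No'), each = 2, times = 2)
c3 = rep(c('English','Japanese'), times = 4)
c4 = rnorm(8)
mydata = cbind(c1,c2,c3)
# stringsAsFactors = TRUE is needed in R >= 4.0 for the text columns to become factors
mydata = as.data.frame(mydata, stringsAsFactors = TRUE)
mydata = cbind(mydata, c4)
mydata
## c1 c2 c3 c4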
## 1 a Yes English 0.34134757
## 2 a Yes Japanese -0.63737743
## 3 a No English -0.01157832
## 4 a No Japanese -0.24766104
## 5 b Yes English -0.72545116
## 6 b Yes Japanese -1.14623683
## 7 b No English 0.35646986
## 8 b No Japanese -0.37289745
mydata$c1
## [1] a a a a b b b b
## Levels: a b
- rep(x) replicates the values in x.
- cbind() combines its arguments by columns.
- as.data.frame(x) coerces x into a dataframe.
- $ is used to access a factor in a dataframe.
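To make the relation between a factor and its levels concrete, here is a minimal sketch using a hypothetical vector answers (not from the text). It assumes the default behaviour of factor(), which sorts the levels alphabetically.

```r
# Hypothetical example, not from the text.
answers <- c("Yes", "No", "Yes", "Yes")
f <- factor(answers)   # convert a character vector into a factor
levels(f)              # "No" "Yes" (sorted alphabetically by default)
nlevels(f)             # 2
table(f)               # counts per level: No 1, Yes 3
as.integer(f)          # 2 1 2 2, the internal integer codes
```

Internally a factor stores one integer code per element plus the vector of level labels, which is why the levels are printed separately from the values.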
6.4 Indexing
A row of a dataframe can be selected using square brackets []:
## c1 c2 c3 c4
## 3 a No English -0.01157832
A column may be selected the same way:
## [1] Yes Yes No No Yes Yes No No
## Levels: No Yes
Row and column indexing can be combined:
## c1 c3
## 1 a English
## 3 a English
## 5 b English
## 7 b English
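The indexing commands behind the three outputs in this section are not shown. The sketch below gives one plausible set of commands that would return the same rows and columns. It is an assumption, not the original code. The dataframe is rebuilt with data.frame() so the block runs on its own, and set.seed() is added only for reproducibility, so the values in c4 will differ from those printed in the text.

```r
# Rebuild a dataframe shaped like mydata (c4 values will differ from the text).
set.seed(1)
mydata <- data.frame(
  c1 = rep(letters[1:2], each = 4),
  c2 = rep(c("Yes", "No"), each = 2, times = 2),
  c3 = rep(c("English", "Japanese"), times = 4),
  c4 = rnorm(8),
  stringsAsFactors = TRUE
)

mydata[3, ]                       # row 3, all columns
mydata[, 2]                       # column 2 (c2), returned as a factor
mydata[c(1, 3, 5, 7), c(1, 3)]    # rows 1, 3, 5, 7 of columns c1 and c3
# The same rows can also be chosen by a logical condition instead of by position:
mydata[mydata$c3 == "English", c("c1", "c3")]
```

Inside the brackets, the part before the comma selects rows and the part after it selects columns; leaving either side empty selects everything along that dimension.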