# 6 Terminology

Each language has its own jargon, and R is no exception. These are some of the most common terms, with their meanings and representations:

## 6.1 Vector

A vector is an ordered collection of values of a single type, usually numbers. E.g., x <- c(3,1,4,1,5,9). The ‘c’ in the function c() stands for combine; it can also be thought of as collection or column.

• character vectors: store strings of characters.
• logical vectors: a collection of FALSE and TRUE values. E.g.:
a = 0:6
b = rev(a)
b
## [1] 6 5 4 3 2 1 0
c = a>b
c
## [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
• rev() returns a reversed version of its argument.

• A missing value is represented by the symbol NA. For example:

d = c(1,2,NA,4,5)
is.na(d)
## [1] FALSE FALSE  TRUE FALSE FALSE

• is.na() returns TRUE for missing values.
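As a small sketch of the vector types listed above, the snippet below builds a character vector and then uses is.na() to count and drop missing values. The names `fruits`, `n_missing` and `d_clean` are made up for illustration.

```r
# Character vector: each element is a string (the values are illustrative).
fruits = c('apple', 'banana', 'cherry')
class(fruits)    # reports the type of the vector
length(fruits)   # number of elements

# A logical vector can be used to index another vector.
d = c(1, 2, NA, 4, 5)
n_missing = sum(is.na(d))   # TRUE counts as 1, so this counts the NAs
d_clean = d[!is.na(d)]      # keep only the non-missing values
d_clean
```

Negating is.na() with ! is one common way to filter out missing values before further calculations.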

## 6.2 Dataframes

A dataframe is a collection of equal-length vectors, stored as columns, which can be of different types. Usually, each row holds one observation, and each column holds a different aspect of that observation.
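A minimal sketch of this idea, assuming a made-up dataframe called `students` with one row per observation and one column per aspect:

```r
# Hypothetical example data: one row per student.
students = data.frame(
  name   = c('Ana', 'Ben', 'Chloe'),   # character column
  age    = c(23, 31, 27),              # numeric column
  passed = c(TRUE, FALSE, TRUE)        # logical column
)
str(students)    # shows the type of each column
nrow(students)   # number of observations (rows)
ncol(students)   # number of variables (columns)
```

Every column must have the same length, but each may have its own type.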

## 6.3 Factor

A categorical variable in a dataframe may be considered a factor, and each of its categories a level.

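c1 = rep(letters[1:2], each = 4)              # a a a a b b b b
c2 = rep(c('Yes', 'No'), each = 2, times = 2) # Yes Yes No No, repeated twice
c3 = rep(c('English','Japanese'), times = 4)  # alternating languages
c4 = rnorm(8)                                 # 8 random draws; values differ between runs

mydata = cbind(c1,c2,c3)                      # character matrix with 3 columns
# stringsAsFactors = TRUE is needed in R >= 4.0 to turn the text columns into factors
mydata = as.data.frame(mydata, stringsAsFactors = TRUE)
mydata = cbind(mydata, c4)                    # add the numeric column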
mydata
##   c1  c2       c3          c4
## 1  a Yes  English  0.34134757
## 2  a Yes Japanese -0.63737743
## 3  a  No  English -0.01157832
## 4  a  No Japanese -0.24766104
## 5  b Yes  English -0.72545116
## 6  b Yes Japanese -1.14623683
## 7  b  No  English  0.35646986
## 8  b  No Japanese -0.37289745
mydata$c1
## [1] a a a a b b b b
## Levels: a b

• rep(x) replicates the values in x.
• cbind() combines its arguments by columns.
• as.data.frame(x) coerces x into a dataframe.
• $ is used to access a column of a dataframe by name (here, the factor c1).
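The relationship between a factor and its levels can be inspected with a few base R helpers. The sketch below uses an illustrative factor named `answers` rather than the dataframe above:

```r
# Illustrative factor built directly from a character vector.
answers = factor(c('Yes', 'Yes', 'No', 'No', 'Yes'))
levels(answers)    # the distinct categories, sorted by default
nlevels(answers)   # how many categories there are
table(answers)     # number of observations in each level
```

table() is a quick way to see how the observations are distributed across the levels.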

## 6.4 Indexing

A row of a dataframe can be selected using square brackets []:

mydata[3,] # select all columns of the third row.
##   c1 c2      c3          c4
## 3  a No English -0.01157832

A column may be selected the same way:

mydata[,2] # select all rows of the second column.
## [1] Yes Yes No  No  Yes Yes No  No
## Levels: No Yes

Row and column indexing can be combined:

mydata[c(1,3,5,7),c(1,3)] # select every other row (1, 3, 5, 7) of the first and third columns.
##   c1      c3
## 1  a English
## 3  a English
## 5  b English
## 7  b English
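Square brackets also accept column names and logical conditions, not only positions. The sketch below uses a small stand-in dataframe `df` (its column names mirror mydata, but the values are made up) so that the results do not depend on random numbers:

```r
# Small stand-in dataframe; the values are illustrative.
df = data.frame(
  c1 = rep(c('a', 'b'), each = 2),
  c4 = c(0.5, -1.2, 0.3, -0.7)
)
df[, 'c4']               # select a column by name instead of by position
df[df$c4 > 0, ]          # keep only the rows where c4 is positive
df[df$c1 == 'b', 'c4']   # combine a row condition with a column name
```

Indexing by name tends to be more robust than indexing by position when columns are added or reordered.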