R Programming

Different from S? Sweave

01 Background

Resources

📖 R for Data Science ⭐️⭐️⭐️⭐️⭐️
📖 The Book of R
📖 The Art of R Programming
📹 Data Science: Foundations using R Specialization by Coursera ⭐️⭐️⭐️
🌐 R for Beginners by CRAN
https://r-graph-gallery.com/

Limited in big data, too big to load in memory

02 Core Concepts

Key Terms

Facets
Geoms
Asethetic
Atomic: (1) character (2) number (real) (3) integer (4) complex (5) logical Graphs tutorial
Attributes: (1) names, dimnames (2) dimensions (matrices, arrays) (3) class (4) length (5) other user-defined attributes/metadata
Coercion Implicit / Explicit
Matrices
Factors Ordered / Unordered
Missing Values NaN is Na, but Na is not NaN
Data Frames
Reading Data
- read.table and read.csv for tabular
- source for reading R code files
Dput, Dget, Dumping
Subsetting
Partial Matching
https://www.storybench.org/getting-started-data-visualization-r-using-ggplot2/
cacher package
Plotting base, lattice, ggplot2, tidyverse

dir
?args  # helper
args() # determine the arguments of a

”...” argument used heavily in generic functions

Lexical Scoping

Complete Cases Times and Dates

as.Date for dates
as.POSIXct and as.POSIXlt for times
strptime can be used to coerce to the above

Loop Functions (to make looping more easy on the command line)

lapply look over a list and evaluate a function on each element
sapply same as lapply but try to simplify the result
apply apply function over the margins of array
tapply apply function over subsets of a vector
mapply multivariate version of lappy

Factor Variables

Debugging Tools - Basic Tools

traceback
debug
browser
trace
recover

str Function

compactly display the internal structure of an R object
an alternative to summary

Simulation

rnorm generate random Normal variates with a given mean and standard deviation
dnorm evaluate the Normal probability (with a given mean/SD) at a point (or vector of points)
pnorm evaluate the cumulative distribution function for a Normal distribution
rpois generate random Poisson variates with a given rate

R Profiler

system.time()
Rprof
summaryRprof()

Good folder hiearchy

Project.Rproj
Data/
Scripts/
Output/

Knitr

Rpubs

R Markdown

Subsetting

X[,1:5] # all rows for columns 1 to 5
X[1:5,] # rows 1 to 5 for all columns
X[(x$var1 <= 3), ] # All rows where value at column var1 is <= 3
 
# Deal with missing values
X[which(X$var2 > 8), ] # return indices are greater than 8, does not return Na's
 
# Sort
sort(X$var1)
X[order(X$var2, X$var3), ] # order by var2, then var3
 
# Ordering with plyr
 
 
# Adding rows and columns
X$var4 <- rnorm(5)
Y <- cbind(X, rnnom(5)) # add a column to the right hand side of X
 
 
# Check for missing values
sum(is.na(X$var1)) # number of missing values in column var1
any(is.na(X))
colSums(is.na(X)) # number of Na's per column
all(colSums(is.na(X))) # number of Na's in entire data set

Cross tabs - similar to x table in Excel

> data("UCBAdmissions")
> DF <- as.data.frame(UCBAdmissions)
> summary(DF)
      Admit       Gender   Dept       Freq      
 Admitted:12   Male  :12   A:4   Min.   :  8.0  
 Rejected:12   Female:12   B:4   1st Qu.: 80.0  
                           C:4   Median :170.0  
                           D:4   Mean   :188.6  
                           E:4   3rd Qu.:302.5  
                           F:4   Max.   :512.0  
> xt <- xtabs(Freq ~ Gender + Admit, data = DF)
> xt
        Admit
Gender   Admitted Rejected
  Male       1198     1493
  Female      557     1278

Flat Tables

🧠

Explorer

R Programming

R Programming

01 Background

02 Core Concepts

Graph View

Table of Contents

Backlinks