R Programming
Different from S? Sweave
01 Background
Resources
- 📖 R for Data Science ⭐️⭐️⭐️⭐️⭐️
- 📖 The Book of R
- 📖 The Art of R Programming
- 📹 Data Science: Foundations using R Specialization by Coursera ⭐️⭐️⭐️
- 🌐 R for Beginners by CRAN
- https://r-graph-gallery.com/
Limitation with big data: R loads data into memory, so data sets too large for RAM cannot be loaded directly.
02 Core Concepts
Key Terms
- Facets
- Geoms
- Aesthetic
- Atomic classes: (1) character (2) numeric (real) (3) integer (4) complex (5) logical
- Graphs tutorial
- Attributes: (1) names, dimnames (2) dimensions (matrices, arrays) (3) class (4) length (5) other user-defined attributes/metadata
- Coercion: Implicit / Explicit
- Matrices
- Factors: Ordered / Unordered
- Missing Values: NaN counts as NA (is.na(NaN) is TRUE), but NA does not count as NaN (is.nan(NA) is FALSE)
- Data Frames
- Reading Data
read.table and read.csv for tabular data; source for reading R code files
- Dput, Dget, Dumping
- Subsetting
- Partial Matching
- https://www.storybench.org/getting-started-data-visualization-r-using-ggplot2/
cacher package
- Plotting
base, lattice, ggplot2, tidyverse
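A few of the key terms above (atomic classes, coercion, missing values, factors) are easier to remember with a quick console check. This is a minimal base R sketch; the variable names are made up for illustration.

```r
# Atomic classes
class("a")      # "character"
class(1.5)      # "numeric"
class(1L)       # "integer"
class(1 + 2i)   # "complex"
class(TRUE)     # "logical"

# Implicit coercion: c() promotes mixed inputs to the most general class
mixed <- c(1.7, "a")   # becomes character
flags <- c(TRUE, 2)    # becomes numeric, TRUE turns into 1

# Explicit coercion: values that cannot be converted become NA (with a warning)
nums <- suppressWarnings(as.numeric(c("1", "2", "x")))

# Missing values: NaN counts as NA, but NA does not count as NaN
is.na(NaN)    # TRUE
is.nan(NA)    # FALSE

# Factors: unordered vs ordered
f  <- factor(c("yes", "no", "yes"))
of <- factor(c("low", "high", "low"), levels = c("low", "high"), ordered = TRUE)
of < "high"   # comparisons only make sense for ordered factors
```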
dir
?args # help page for args
args() # show the arguments of a function
"..." argument: used heavily in generic functions
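As a rough illustration of the "..." argument, the sketch below forwards extra arguments to another function. The wrapper name mean_and_n is hypothetical; mean and paste are base R.

```r
# Hypothetical wrapper: anything passed via ... is handed on to mean()
mean_and_n <- function(x, ...) {
  c(mean = mean(x, ...), n = length(x))
}

res <- mean_and_n(c(1, 2, NA, 4), na.rm = TRUE)   # na.rm travels through ...

# Arguments that come after ... in a signature must be named in full
args(paste)                           # shows: function (..., sep = " ", ...)
joined <- paste("a", "b", sep = "-")  # sep cannot be matched by position
```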
Lexical Scoping
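A small sketch of what lexical scoping means in practice: a function looks up free variables in the environment where it was defined. The function name make_power is made up for this example.

```r
# make_power returns a function; n is a free variable inside the inner function
make_power <- function(n) {
  function(x) x^n
}

square <- make_power(2)
cube   <- make_power(3)

square(4)   # 16
cube(2)     # 8

# Each closure keeps its own defining environment with its own n
ls(environment(square))       # "n"
get("n", environment(cube))   # 3
```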
Complete Cases
Times and Dates
- as.Date for dates
- as.POSIXct and as.POSIXlt for times
- strptime parses character strings into POSIXlt, which can then be converted to the others
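The date and time classes can be tried out as follows. This is only a sketch: the example timestamps are arbitrary, and tz = "UTC" is set explicitly so the results do not depend on the local time zone.

```r
# Dates are stored as days since 1970-01-01
d <- as.Date("1970-01-02")
unclass(d)    # 1

# POSIXct stores seconds since the epoch; POSIXlt is a list of components
ct <- as.POSIXct("2012-10-25 06:00:00", tz = "UTC")
lt <- as.POSIXlt(ct)
lt$hour       # 6

# strptime parses a character string using a format specification
parsed <- strptime("10/01/2012 10:40", "%d/%m/%Y %H:%M", tz = "UTC")
class(parsed)     # "POSIXlt" "POSIXt"
as.Date(parsed)   # drop the time part

# Date arithmetic knows about leap years
gap <- as.Date("2012-03-01") - as.Date("2012-02-28")   # 2 days
```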
Loop Functions (to make looping easier on the command line)
- lapply: loop over a list and evaluate a function on each element
- sapply: same as lapply, but try to simplify the result
- apply: apply a function over the margins of an array
- tapply: apply a function over subsets of a vector
- mapply: multivariate version of lapply
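A quick sketch of each loop function on toy data (all inputs below are made up for illustration):

```r
x <- list(a = 1:4, b = c(10, 20))

l_res <- lapply(x, mean)   # always returns a list
s_res <- sapply(x, mean)   # simplified to a named numeric vector

m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)   # margin 1 = rows
col_sums <- apply(m, 2, sum)   # margin 2 = columns

vals <- c(1, 2, 3, 4, 5, 6)
grp  <- c("a", "a", "b", "b", "b", "a")
grp_means <- tapply(vals, grp, mean)   # mean of vals within each group

# mapply walks over several arguments in parallel: rep(1, 3), rep(2, 2), rep(3, 1)
reps <- mapply(rep, 1:3, 3:1)
```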
Factor Variables
Debugging Tools - Basic Tools
- traceback
- debug
- browser
- trace
- recover
str Function
- compactly display the internal structure of an R object
- an alternative to summary
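To see how str differs from summary, both can be run on the same object. The data frame here is a made-up example.

```r
df <- data.frame(id = 1:3, score = c(2.5, 3.5, 4.0))

str(df)       # one line per column: type and first few values
summary(df)   # per-column descriptive statistics instead

# str also works on other objects, such as functions and lists
str(lm)
str(list(a = 1, b = "x"))
```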
Simulation
- rnorm: generate random Normal variates with a given mean and standard deviation
- dnorm: evaluate the Normal probability density (with a given mean/SD) at a point (or vector of points)
- pnorm: evaluate the cumulative distribution function for a Normal distribution
- rpois: generate random Poisson variates with a given rate
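A short sketch of the simulation functions. The means, rates, and seed are arbitrary; set.seed is used so the random draws can be reproduced.

```r
set.seed(1)                        # fix the random number stream
x <- rnorm(5, mean = 10, sd = 2)   # five Normal draws

dnorm(0)       # density of the standard Normal at 0, equal to 1 / sqrt(2 * pi)
pnorm(0)       # 0.5: half the probability mass lies below the mean
pnorm(1.96)    # roughly 0.975

counts <- rpois(5, lambda = 3)     # five Poisson counts with rate 3

# Resetting the seed reproduces the same draws
set.seed(1)
x_again <- rnorm(5, mean = 10, sd = 2)
identical(x, x_again)   # TRUE
```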
R Profiler
system.time(), Rprof(), summaryRprof()
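A rough sketch of how these fit together, assuming R was built with profiling support (the usual case). The workload is arbitrary and only there to give the profiler something to sample.

```r
# system.time: time a single expression
timing <- system.time(for (i in 1:1e5) sqrt(i))
timing["elapsed"]   # wall-clock seconds

# Rprof: sample the call stack while code runs, writing to a file
prof_file <- tempfile()
Rprof(prof_file)
invisible(replicate(10, sort(runif(1e6))))
Rprof(NULL)         # stop profiling

# summaryRprof: tabulate where the time was spent
prof <- summaryRprof(prof_file)
head(prof$by.self)  # time spent in each function itself
```

system.time is enough when it is already known which expression is slow; the profiler helps when it is not.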
Good folder hierarchy
Project.Rproj
Data/
Scripts/
Output/
Knitr
Rpubs
R Markdown
Subsetting
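The snippets in this section use a data frame X that is not defined in these notes. One possible setup, purely an assumption for trying the commands out (five rows, columns var1 to var3, two missing values in var2):

```r
set.seed(13435)
X <- data.frame(var1 = sample(1:5),
                var2 = sample(6:10),
                var3 = sample(11:15))
X <- X[sample(1:5), ]     # shuffle the rows
X$var2[c(1, 3)] <- NA     # introduce missing values

# Logical indexing keeps NA rows; wrapping the condition in which() drops them
with_na    <- X[X$var2 > 8, ]
without_na <- X[which(X$var2 > 8), ]
```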
X[,1:5] # all rows for columns 1 to 5
X[1:5,] # rows 1 to 5 for all columns
X[(X$var1 <= 3), ] # all rows where the value in column var1 is <= 3
# Deal with missing values
X[which(X$var2 > 8), ] # rows where var2 > 8; which() returns indices and drops NAs
# Sort
sort(X$var1)
X[order(X$var2, X$var3), ] # order by var2, then var3
# Ordering with plyr
# Adding rows and columns
X$var4 <- rnorm(5)
Y <- cbind(X, rnorm(5)) # add a column to the right hand side of X
# Check for missing values
sum(is.na(X$var1)) # number of missing values in column var1
any(is.na(X))
colSums(is.na(X)) # number of NAs per column
all(colSums(is.na(X)) == 0) # TRUE if there are no NAs in the entire data set
Cross tabs - similar to a pivot table in Excel
> data("UCBAdmissions")
> DF <- as.data.frame(UCBAdmissions)
> summary(DF)
      Admit       Gender   Dept       Freq
 Admitted:12   Male  :12   A:4   Min.   :  8.0
 Rejected:12   Female:12   B:4   1st Qu.: 80.0
                           C:4   Median :170.0
                           D:4   Mean   :188.6
                           E:4   3rd Qu.:302.5
                           F:4   Max.   :512.0
> xt <- xtabs(Freq ~ Gender + Admit, data = DF)
> xt
        Admit
Gender   Admitted Rejected
  Male       1198     1493
  Female      557     1278
Flat Tables
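A flat table prints a multi-way contingency table in a single compact 2-D layout. A minimal sketch with ftable from base R on the same UCBAdmissions data (the choice of row and column variables is arbitrary):

```r
data("UCBAdmissions")

# Gender and Dept go down the rows, Admit across the columns
ft <- ftable(UCBAdmissions, row.vars = c("Gender", "Dept"), col.vars = "Admit")
ft

dim(ft)   # 12 rows (2 genders x 6 departments) by 2 columns
```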