Spring 2018
Lectures
TuTh 12:30-2pm, 60 Evans
Labs
Thursday 2-4pm, 332 Evans
Thursday 4-6pm, 332 Evans
Instructor | Office | Email | Office Hours |
---|---|---|---|
Will Fithian | 301 Evans | wfithian@berkeley.edu | Tu 2-3pm, W 11am-12pm in Evans 301 |
GSI | Email | Office Hours |
---|---|---|
Kevin Attiyeh | kattiyeh@berkeley.edu | M 8-10am, Tu 9-11am in Evans 428 |
There is no book for this class. Instead, we have provided extensive lecture notes. If you would like some additional optional reading, you can try the following books.
- The Statistical Sleuth: A Course in Methods of Data Analysis by Ramsey and Schafer
- Introductory Statistics with R by Peter Dalgaard
Neither of these books covers all of the topics we will cover, and neither shares the perspective and focus of this class: in particular, they do not make extensive use of bootstrapping and resampling methods. But for students who want some additional structure or help with R, these books may be useful and should be at the right level for this class.
Online R Resources
The recommended way to do assignments is to download RStudio onto your own computer and edit files locally, if your computer can run it.
If you have a Chromebook or no laptop, you can instead use mybinder, an online environment for running RStudio. Be sure to download any file you are editing before you quit the session!
In addition, the following references may be helpful:
- Try-R: A friendly, interactive resource for learning the basics of R
- Base R cheat sheet from RStudio
- Longer R cheat sheet from CRAN
- Official tutorial from CRAN
- Google search to answer questions about specific functions / packages.
There are many more options! If you find a good one that isn’t on the list, let me know.
Syllabus
Note: links will only work once the material is posted (i.e., as the semester progresses). To get access to the reading material in advance, visit the sites from previous semesters (there is only small variation from year to year).
Week | Description | Chapter | Lab Link | Assignment Due |
---|---|---|---|---|
01 | Boxplots, discrete distributions, intro to continuous distributions | 01 | Lab 1 | |
02 | Continuous distributions, density curves, density estimation | 01 | Lab 2 | |
03 | Permutation test, t-test and assumptions | 02 | Lab 3 | Hw1 Due (F) |
04 | More on assumptions, type I error, multiple testing, Bonferroni corrections | 02 | Lab 4 | |
05 | Confidence intervals, Bonferroni corrections, review simple regression | 02, 03 | Lab 5 | Hw2 Due (W) |
06 | Polynomial regression, loess curves | 03 | Lab 6 | |
07 | Finish loess curves, smooth density plots, pairs plots, alluvial plots, mosaic plots | 03, 04 | Lab 7 | Hw3 Due (M) |
08 | Heatmaps, hierarchical clustering, PCA | 04 | Midterm Review | Hw4 Due (W) |
09 | Midterm (M). Finish PCA, start multiple regression | 04, 05 | Lab 8 | Project 1 Due (F) |
10 | Multiple linear regression: fitting and interpretation, fitted values, residuals, multiple R-squared, residual degrees of freedom and residual standard error | 05 | Lab 9 | |
Spring Break | | | | |
11 | Multiple regression with categorical explanatory variables and interactions, Inference in multiple regression: F-tests via the anova function | 05 | Lab 10 | |
12 | Inference in multiple regression: t-tests, standard errors, confidence intervals and prediction intervals. Variable selection in linear regression. Regression diagnostics | 05 | Lab 11 | Hw5 Due (M) |
13 | The classification problem and logistic regression, interpretation in terms of odds, binary predictions via confusion matrices, precision and recall, deviance, variable selection via AIC | 06 | Lab 12 | Hw6 Due (W) |
14 | Regression trees, classification trees and Random Forests | 07 | Lab 13 | Project 2 Due (F) |
15 | Reading and recitation week: no class | | | Project 3 Due (F) |