| Meeting | Day | Time | Location |
|---|---|---|---|
| Class | Tuesday | 04:30–07:20 PM | Savery 140 |
| Lab | Friday | 01:30–03:20 PM | Savery 121 |
This course is the first of a two-quarter introductory quantitative methods sequence for PhD students in the social sciences. Students are introduced to the full stack of skills needed to conduct modern social science data analysis, including data visualization, data wrangling, programming for data analysis, reproducible research, probability and statistical inference, linear regression, and causal inference. Students will learn these skills while applying them to their own research problem. The objective of this course is to provide students with a hands-on introduction to modern social science data analysis and a base of skills for further study of quantitative methods. The next course in the sequence, POLS 503, covers linear models and causal inference methods in more detail.
By the end of the course, students will be able to:
- Complete a research project demonstrating mastery of statistical data analysis from exploratory analysis to inference to modeling
- Visualize data and statistical models using R
- Import and clean tabular data for use in statistical analysis using R
- Write clean, reusable, and reliable R code using current best practices
- Produce reproducible research using literate programming (R Markdown), version control (git and GitHub), and current best practices
- Explain the difference between descriptive, predictive, and causal statistical questions
- Explain the difference between estimation and hypothesis testing that can be used to make causal claims from observational data
- Conduct, interpret, and communicate results from $t$-tests, $\chi^2$ tests, and linear regression
- Explain limitations of causal inference using observational data (selection on observables) and methods for making causal inference with observational data (difference-in-differences, before-and-after, regression discontinuity)
- Feel empowered working with data
The course is suitable for students with a wide range of prior exposure to statistics and mathematics. No prior statistical or programming experience is necessary, and no mathematics is assumed beyond arithmetic, algebra, and elementary calculus.
The most important prerequisite is a willingness to work hard on possibly unfamiliar material.
Because students taking this course have heterogeneous backgrounds, the topics and structure of the course are designed so that students across a wide range of technical skill levels are likely to find something useful in it.
There are three main types of assignments:
- Weekly homework: Learning data analysis requires practice, so there will be weekly homework assignments. See the assignments page for details.
- Research project: Students will complete a research project. The expectation is that students will work on the project throughout the quarter and apply concepts and skills to that project soon after covering them in class. See the project page for more details.
- Reading assignments: Students are expected to come to class prepared. I have chosen textbooks that are accessible, so we will not spend valuable class time summarizing assigned readings. Instead, we will use class time for more value-added learning activities. As part of that, before each class students will provide feedback and questions on the readings, which will be used to guide in-class discussion.
Students should have a laptop that they can bring to both class and lab, since we will integrate computing with learning data analysis and statistics throughout the course.
This course will use R, which is a free and open-source programming language primarily used for statistics and data analysis. We will also use RStudio, which is an easy-to-use interface to R. Instructions to install or upgrade R are here.
To support reproducible research, this course will also use git (through RStudio) for version control; git is like “track changes” for a directory of files. Homework assignments will be distributed and submitted via GitHub, a website that hosts git repositories. If that did not make sense, don’t worry; we’ll cover it in the course.
Students will have access to a DataCamp classroom, which you can use for additional practice.
This course will primarily rely on the following texts:
- Imai, Kosuke. 2017. Quantitative Social Science: An Introduction. Princeton University Press.
- Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science. O’Reilly Media. (free online)
Quantitative Social Science (QSS) will provide the outline of the statistical topics that we will cover. R for Data Science (R4DS) covers R programming for data visualization and wrangling.
Additionally, these texts are supplemented by the following:
- QSS Tidyverse Code: R4DS and this course use a set of R packages known as the tidyverse, but QSS does not. This resource supplements QSS with tidyverse versions of its code.
- R for Data Science Solutions: solutions to help you when working through the exercises in R for Data Science.
There will be some additional readings as indicated in the schedule.
Students will be evaluated on the whole of their work in this course.
For this course, grades on the 4.0 scale have the following interpretation:
| Grade | Interpretation |
|---|---|
| 3.7 | Somewhat below average |
| 3.6 | Not up to expectations |
| ≤ 3.5 | Way below expectations |
Ask and answer questions about the content of the course on our Slack channel. If you have a question about a topic, someone else likely has the same question. Posting questions and answers publicly allows us all to learn from them.
Reserve emails to the instructors for personal matters.
A summary of changes to the syllabus and schedule is posted in the CHANGELOG.
Beyond what the teaching team can provide, there are several resources on campus that you can go to for assistance with data, computing, and statistical problems:
- Center for Social Science Computing and Research (CSSCR) has a drop-in statistical consulting center in Savery 119. They provide consulting on statistical software, e.g., R. Go there for software- or data-related questions.
- CSSS Statistical Consulting provides general statistical consulting. Go there for questions about statistical methods.
- eScience Data Science Office Hours
Science should be open, and this course builds upon other openly licensed material, so unless otherwise noted, all materials for this class are licensed under a Creative Commons Attribution 4.0 International License.
If you find any typos or other issues on this page, or on any other page of the site, go to the issues page, click the “New Issue” button to create a new issue, and describe the problem.