Data Analysis I (Data Wrangling and EDA)
Due Date: February 6, 2018 4:30 pm
Exact instructions for directory organization will follow.
The purpose of this assignment is to encourage you to find, load, clean, and explore your data before continuing to analysis.
- State your research question and the research design you expect to use to answer it.
- What are your data sources? Cite each data set appropriately, using its name or title, authors, version, creation date, and persistent identifier (e.g., a DOI) if available, or a URL if not.
Load, describe, and clean your data, explaining the steps you take where appropriate. Some questions to consider:
- Check for missing values. Some data sets encode missing values as numeric codes, which are often extremely large or small values (e.g., -999).
- What are the types (string, numeric, integer, date, etc.) of the variables, and what range of values do you anticipate each can take? Do the values in the data fall within that range?
- What is the unit of observation of your data? What will the unit of observation be for your analysis? Do you need to merge datasets? If so, what is their common identifier?
- If possible, produce summary statistics and appropriate plots of the distributions of the variables you expect to use as the response and explanatory variables.
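The checks in the list above can be sketched in a few lines of code. This is a minimal, standard-library Python sketch, not a required method: the sentinel codes (-999, 99999), the income and population tables, and the `county_id` identifier are all hypothetical. In practice a library such as pandas is more common, and the real missing-value codes should come from your data set's documentation. The sketch shows that an uncleaned sentinel appears as an out-of-range value, then replaces sentinels with `None`, computes summary statistics, and merges two tables on a common identifier.

```python
import statistics

# Hypothetical sentinel codes standing in for missing numeric values;
# take the real codes from your data set's codebook.
SENTINELS = {-999, 99999}


def clean_numeric(values, sentinels=SENTINELS):
    """Replace sentinel codes with None so they are treated as missing."""
    return [None if v in sentinels else v for v in values]


def out_of_range(values, low, high):
    """Return the non-missing values outside the anticipated range [low, high]."""
    return [v for v in values if v is not None and not (low <= v <= high)]


def merge_on(left, right, key):
    """Inner-join two lists of dict records on a common identifier.

    Assumes `key` is unique in `right`; rows of `left` with no match are dropped,
    and columns present in both tables take the value from `right`.
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def summarize(values):
    """Summary statistics over the non-missing values (assumes at least one)."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }


# Hypothetical county-level tables sharing the identifier "county_id".
income = [
    {"county_id": 1, "income": 52000},
    {"county_id": 2, "income": -999},  # sentinel: income not reported
    {"county_id": 3, "income": 61000},
    {"county_id": 4, "income": 47000},
]
population = [
    {"county_id": 1, "pop": 12000},
    {"county_id": 3, "pop": 8000},
    {"county_id": 4, "pop": 15000},
]

raw = [r["income"] for r in income]
print(out_of_range(raw, 0, 500000))  # the uncleaned sentinel shows up: [-999]
incomes = clean_numeric(raw)
print(summarize(incomes))            # n=3, missing=1
merged = merge_on(income, population, "county_id")
print(len(merged))                   # 3: county 2 has no population record
```

Checking ranges before replacing sentinels is a deliberate choice here: a value like -999 in an income column is an obvious range violation, which is often how undocumented missing-value codes are discovered. Note also that an inner join silently drops unmatched rows, so it is worth comparing row counts before and after any merge.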
What issues did you encounter in getting and cleaning the data? What issues remain?