Clean, Tidy Data Sets
The data sets below are tidy and ready to visualize and model. I filtered each one down to a useful subset, selected the most important variables and gave them descriptive names, and reordered factor levels in a meaningful way.
Original Data Sources
- For replication data sets, I recommend starting with the Dataverse archives for AJPS [web] and PSRM.
- For truly raw data to practice wrangling, I recommend (ordered from least to most difficult) Donald Trump’s tweets [GitHub], the World Bank’s World Development Indicators [GitHub], Google political ads data [web, Dropbox], and 10 million dyadic events [Dataverse].
- For data on international politics, I recommend COW [web], DESTA [web], and Matt Fuhrmann’s data on nuclear weapons [web].
- For data on political institutions, I recommend Polity IV [web], DD [web], Freedom House [web], and DES [web].
- For US state politics, I recommend the Correlates of State Policy data set [web], which combines variables from many projects into a single, enormous collection.
- For data on legislator ideology, I recommend NOMINATE [web] and the American Legislatures Project [web].
- For data on human rights, I recommend Human Rights Scores [web], PTS [web], CIRI [web], and ITT [web].
- For survey data, I recommend the ANES [web], CCES [web], CSES [web], and the World Values Survey [web].
- For data from randomized experiments, I recommend TESS [web].
- Google now offers a dedicated search engine for data sets.
- Let’s go meta: PolData is a data set of data sets.