This is a cleaned version of the replication data from Clark and Golder (2006).

The data are at the election level: each row of the data set represents one election.

# load packages
library(tidyverse)

# quick look at the data frame cg (the call that loads cg is not shown here)
glimpse(cg)
## Observations: 555
## Variables: 10
## $ country              <chr> "Albania", "Albania", "Albania", "Argenti...
## $ year                 <dbl> 1992, 1996, 1997, 1946, 1951, 1954, 1958,...
## $ average_magnitude    <dbl> 1.00, 1.00, 1.00, 10.53, 10.53, 4.56, 8.1...
## $ eneg                 <dbl> 1.106929, 1.106929, 1.106929, 1.342102, 1...
## $ enep                 <dbl> 2.190, 2.785, 2.870, 5.750, 1.970, 1.930,...
## $ upper_tier           <dbl> 28.57, 17.86, 25.80, 0.00, 0.00, 0.00, 0....
## $ en_pres              <dbl> 0.00, 0.00, 0.00, 2.09, 1.96, 1.96, 2.65,...
## $ proximity            <dbl> 0.00, 0.00, 0.00, 1.00, 1.00, 0.20, 1.00,...
## $ social_heterogeneity <fct> Bottom 3rd of ENEG, Bottom 3rd of ENEG, B...
## $ electoral_system     <fct> Single-Member District, Single-Member Dis...

# Variable Descriptions

## country: Country Name

• Coding: The name of the country that held the election.
• Type: character

## year: Year

• Coding: The year of the election.
• Type: double

## average_magnitude: Average District Magnitude

• Coding: The average (or mean) of the district magnitude (the number of seats available in the district) across all the districts in the country. For the U.S. House of Representatives, the average magnitude is one, because every district elects a single member (i.e., each district has a magnitude of one). In Israel, the average magnitude is 120, because the country forms a single national district with 120 seats. [ctk: add another specific example here.]
• Type: double
# histogram
qplot(average_magnitude, data = cg)
## stat_bin() using bins = 30. Pick better value with binwidth.
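
To make the coding of average_magnitude concrete, here is a minimal sketch that computes the mean district magnitude for the two cases described above and for one made-up case. The vector names are hypothetical, the count of 435 House districts is added here for illustration, and the third country is invented; none of these values come from the data set.

```r
# Each vector holds one entry per district: the number of seats in that district.
us_house <- rep(1, 435)      # 435 single-member districts
israel   <- 120              # one national district with 120 seats

# Hypothetical mixed system (invented for illustration):
# ten single-member districts plus two multi-member districts.
mixed <- c(rep(1, 10), 5, 15)

mean(us_house)  # 1
mean(israel)    # 120
mean(mixed)     # 30 seats / 12 districts = 2.5
```

Because the measure is a simple mean over districts, a few large districts can pull the average well above the magnitude of the typical district.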

## enep: The Effective Number of Electoral Parties

• Coding: Calculated as $$ENEP_j = \dfrac{1}{\sum_{i = 1}^n v_{ij}^2}$$, where $$ENEP_j$$ represents the effective number of electoral parties in election $$j$$ and $$v_{ij}$$ represents the vote share (as a proportion) for party $$i$$ in election $$j$$. For the details of this measure, see Clark and Golder (2006).
• Type: double
# histogram
qplot(enep, data = cg)
## stat_bin() using bins = 30. Pick better value with binwidth.
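
The ENEP formula above can be sketched in a few lines of R. The function name effective_number() and the vote shares are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the replication data.

```r
# Effective number of parties: 1 divided by the sum of squared vote shares.
# `shares` is a numeric vector of vote shares expressed as proportions.
effective_number <- function(shares) {
  1 / sum(shares^2)
}

# Two equally sized parties count as exactly two effective parties.
effective_number(c(0.5, 0.5))       # 2

# Invented election with three parties of unequal size.
effective_number(c(0.5, 0.3, 0.2))  # 1 / 0.38, about 2.63
```

The measure weights parties by their size, so three parties of unequal size count as fewer than three effective parties.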

## eneg: The Effective Number of Ethnic Groups

• Coding: Calculated as $$ENEG_j = \dfrac{1}{\sum_{i = 1}^n p_{ij}^2}$$, where $$ENEG_j$$ represents the effective number of ethnic groups in the country when election $$j$$ occurred and $$p_{ij}$$ represents the proportion of the population falling into ethnic group $$i$$ when election $$j$$ occurred. For the details of this measure, see Clark and Golder (2006) or Clark, Golder, and Golder (ctk), chapter ctk.
• Type: double
# histogram
qplot(eneg, data = cg)
## stat_bin() using bins = 30. Pick better value with binwidth.
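
As a rough illustration of how ENEG behaves at its extremes, the sketch below applies the formula above to three invented populations. The function name eneg_from_proportions() and all of the proportions are hypothetical.

```r
# ENEG: 1 divided by the sum of squared population proportions.
# `p` is a numeric vector with one proportion per ethnic group.
eneg_from_proportions <- function(p) {
  1 / sum(p^2)
}

eneg_from_proportions(1)             # 1: a single group
eneg_from_proportions(rep(0.25, 4))  # 4: four equally sized groups
eneg_from_proportions(c(0.9, 0.1))   # 1 / 0.82, about 1.22
```

A perfectly homogeneous population gives the minimum value of one, n equally sized groups give n, and a population dominated by one group stays close to one.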

## electoral_system: The Type of Electoral System

• Source: This variable is created from average_magnitude.
• Coding:
• "Single-Member District": when average_magnitude = 1.
• "Small-Magnitude PR": when 1 < average_magnitude $$\leq$$ 7.
• "Large-Magnitude PR": when average_magnitude > 7.
• Type: factor
# bar plot
qplot(electoral_system, data = cg)
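
The coding rules for electoral_system can be sketched as a small recoding function. This is one possible reconstruction under stated assumptions, not necessarily how the variable was built: the function name classify_system() and the order of the factor levels are assumptions, and values below one (which the rules do not cover) are returned as NA.

```r
# Recode average district magnitude into the three categories listed above.
classify_system <- function(average_magnitude) {
  label <- ifelse(
    average_magnitude == 1, "Single-Member District",
    ifelse(
      average_magnitude > 1 & average_magnitude <= 7, "Small-Magnitude PR",
      ifelse(average_magnitude > 7, "Large-Magnitude PR", NA_character_)
    )
  )
  # The level order is an assumption made for this sketch.
  factor(label, levels = c("Single-Member District",
                           "Small-Magnitude PR",
                           "Large-Magnitude PR"))
}

# A magnitude of exactly 7 falls in the small-magnitude category.
classify_system(c(1, 4.56, 7, 10.53, 120))
```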

## social_heterogeneity: Terciles of ENEG

• Source: This variable is created from eneg.
• Coding:
• "Bottom 3rd of ENEG"
• "Middle 3rd of ENEG"
• "Top 3rd of ENEG"
• Type: factor
# bar plot
qplot(social_heterogeneity, data = cg)
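
One way to split a variable into terciles is to cut it at its one-third and two-thirds quantiles. The sketch below does this for a made-up vector of ENEG values. The exact procedure used to build social_heterogeneity is not documented here, so the use of quantile() and cut(), the handling of values on a boundary, and the example values are all assumptions.

```r
tercile_labels <- c("Bottom 3rd of ENEG", "Middle 3rd of ENEG", "Top 3rd of ENEG")

# Invented ENEG values, for illustration only.
eneg <- c(1.0, 1.1, 1.3, 1.5, 2.0, 2.4, 3.1, 4.0, 5.2)

# Cut points at the minimum, the 1/3 and 2/3 quantiles, and the maximum.
breaks <- quantile(eneg, probs = c(0, 1/3, 2/3, 1))

# include.lowest = TRUE keeps the minimum value in the bottom category.
social_heterogeneity <- cut(eneg, breaks = breaks,
                            labels = tercile_labels, include.lowest = TRUE)

table(social_heterogeneity)  # three values in each category
```

With heavily tied data, two quantiles can coincide and cut() will stop with an error about non-unique breaks, so this approach needs a check on the break points before it is applied to real data.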

## Other Variables

See Clark and Golder (2006) for the definitions of upper_tier, en_pres, and proximity.