This is a cleaned version of the replication data from Clark and Golder (2006).

The data are **election level**, so that each row of the data set represents one election.

```
# load packages
library(tidyverse)
# load data
cg <- read_rds("parties.rds") %>%
  glimpse()
```

```
## Observations: 555
## Variables: 10
## $ country <chr> "Albania", "Albania", "Albania", "Argenti...
## $ year <dbl> 1992, 1996, 1997, 1946, 1951, 1954, 1958,...
## $ average_magnitude <dbl> 1.00, 1.00, 1.00, 10.53, 10.53, 4.56, 8.1...
## $ eneg <dbl> 1.106929, 1.106929, 1.106929, 1.342102, 1...
## $ enep <dbl> 2.190, 2.785, 2.870, 5.750, 1.970, 1.930,...
## $ upper_tier <dbl> 28.57, 17.86, 25.80, 0.00, 0.00, 0.00, 0....
## $ en_pres <dbl> 0.00, 0.00, 0.00, 2.09, 1.96, 1.96, 2.65,...
## $ proximity <dbl> 0.00, 0.00, 0.00, 1.00, 1.00, 0.20, 1.00,...
## $ social_heterogeneity <fct> Bottom 3rd of ENEG, Bottom 3rd of ENEG, B...
## $ electoral_system <fct> Single-Member District, Single-Member Dis...
```

`country`

: Country Name
- Coding: The name of the country that held the election.
- Type: character

`year`

: Year
- Coding: The year of the election.
- Type: double (whole-number values)

`average_magnitude`

: Average District Magnitude
- Coding: The average (or mean) of the district magnitude (the number of seats available in the district) across all the districts in the country. For the U.S. House of Representatives, the average magnitude is one, because every member is elected from a single-member district (i.e., a magnitude of one). In Israel, the average magnitude is 120, because the country is a single national district that elects 120 members. [ctk: add another specific example here.]
- Type: double
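
The averaging described above can be sketched in a few lines of R. The three-district country and the object names are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the data set.

```r
# Hypothetical country with three districts electing 3, 5, and 10 seats.
district_magnitudes <- c(3, 5, 10)
avg_magnitude <- mean(district_magnitudes)
avg_magnitude

# A legislature elected entirely from single-member districts averages to one,
# no matter how many districts there are (435 is used here for illustration).
smd_magnitude <- mean(rep(1, 435))
smd_magnitude

# A single national district with 120 seats averages to 120.
national_magnitude <- mean(120)
national_magnitude
```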

```
# histogram
qplot(average_magnitude, data = cg)
```

``## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.``

`enep`

: The Effective Number of Electoral Parties
- Coding: Calculated as \(ENEP_j = \dfrac{1}{\sum_{i = 1}^n v_{ij}^2}\), where \(ENEP_j\) represents the effective number of electoral parties in election \(j\) and \(v_{ij}\) represents the **vote share** (as a proportion) for party \(i\) in election \(j\). For the details of this measure, see Clark and Golder (2006).
- Type: double
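
As a minimal sketch of the formula above, the helper below computes the effective number from a vector of shares. The function name `effective_number()` and the vote shares are hypothetical, and the sketch ignores any special handling of residual "other" categories that the original coding may apply.

```r
# Hypothetical helper: inverse of the sum of squared shares.
# `shares` must be proportions (e.g., 0.4), not percentages (e.g., 40).
effective_number <- function(shares) {
  stopifnot(is.numeric(shares), all(shares >= 0), all(shares <= 1))
  1 / sum(shares^2)
}

# Two equally sized parties behave like exactly two parties.
two_equal <- effective_number(c(0.5, 0.5))
two_equal

# Four unequal parties (invented shares) count as fewer than four.
four_unequal <- effective_number(c(0.4, 0.3, 0.2, 0.1))
four_unequal
```

Because the shares are squared, small parties contribute little to the denominator, so the measure falls below the raw count of parties whenever the shares are unequal.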

```
# histogram
qplot(enep, data = cg)
```

``## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.``

`eneg`

: The Effective Number of Ethnic Groups
- Coding: Calculated as \(ENEG_j = \dfrac{1}{\sum_{i = 1}^n p_{ij}^2}\), where \(ENEG_j\) represents the effective number of ethnic groups in the country when election \(j\) occurred and \(p_{ij}\) represents the proportion of the population falling into ethnic group \(i\) when election \(j\) occurred. For the details of this measure, see Clark and Golder (2006) or Clark, Golder, and Golder (ctk), chapter ctk.
- Type: double
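
As a worked illustration of this formula, consider a hypothetical country with two ethnic groups that make up 80% and 20% of the population (these proportions are invented for the example and do not come from the data):

\[
ENEG_j = \dfrac{1}{0.8^2 + 0.2^2} = \dfrac{1}{0.64 + 0.04} = \dfrac{1}{0.68} \approx 1.47
\]

The result is well below two because the larger group dominates the sum of squared proportions.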

```
# histogram
qplot(eneg, data = cg)
```

``## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.``

`electoral_system`

: The Type of Electoral System
- Source: This variable is created from `average_magnitude`.
- Coding:
  - `"Single-Member District"`: when `average_magnitude` = 1.
  - `"Small-Magnitude PR"`: when 1 < `average_magnitude` \(\leq\) 7.
  - `"Large-Magnitude PR"`: when `average_magnitude` > 7.
- Type: factor
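
One way to reproduce these categories is with base R's `cut()`. This is a sketch under stated assumptions, not necessarily how the variable was actually built: the magnitudes are hypothetical, and the first interval would also capture any value below one, which should not occur for a valid average magnitude.

```r
# Hypothetical average magnitudes, including the boundary values 1 and 7.
average_magnitude <- c(1, 1.5, 7, 7.01, 120)

# With right = TRUE the intervals are (-Inf, 1], (1, 7], and (7, Inf),
# which matches "= 1", "1 < x <= 7", and "> 7" for magnitudes of at least one.
electoral_system <- cut(
  average_magnitude,
  breaks = c(-Inf, 1, 7, Inf),
  labels = c("Single-Member District", "Small-Magnitude PR", "Large-Magnitude PR"),
  right = TRUE
)

data.frame(average_magnitude, electoral_system)
```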

```
# bar plot
qplot(electoral_system, data = cg)
```

See Clark and Golder (2006) for the definitions of `upper_tier`, `en_pres`, and `proximity`.

`social_heterogeneity`

: Terciles of ENEG
- Source: This variable is created from `eneg`.
- Coding: `"Bottom 3rd of ENEG"`, `"Middle 3rd of ENEG"`, or `"Top 3rd of ENEG"`.
- Type: factor
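
A tercile split of this kind can be sketched with `quantile()` and `cut()`. The `eneg` values below are hypothetical, and the exact cutoffs used to build the variable in the data set are an assumption here; a different quantile definition or tie-handling rule could place borderline elections in a different group.

```r
# Hypothetical ENEG values for nine elections.
eneg <- c(1.0, 1.1, 1.2, 1.5, 1.8, 2.0, 2.5, 3.0, 4.0)

# Cutoffs at the minimum, the 1/3 and 2/3 quantiles, and the maximum.
cutoffs <- quantile(eneg, probs = c(0, 1 / 3, 2 / 3, 1))

# include.lowest = TRUE keeps the minimum value in the bottom group.
social_heterogeneity <- cut(
  eneg,
  breaks = cutoffs,
  labels = c("Bottom 3rd of ENEG", "Middle 3rd of ENEG", "Top 3rd of ENEG"),
  include.lowest = TRUE
)

table(social_heterogeneity)
```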