Final Inter-Coder Reliability

Each news item in the dataset was coded by two coders (see coding procedure). After both coders had coded the material independently, they held a consensus discussion to resolve dissenting codes. The final dataset used for the statistical analysis consists of the codes agreed upon in the consensus discussion.

To assess the coding quality, we extracted the codes of the individual coders before the consensus discussion and calculated inter-coder agreement based on the framework presented by Klein (2018). The tables below contain the results of the reliability analysis for each of the observed variables. The material needed to reproduce the reliability calculations is attached below.
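
The coefficients reported below (percent agreement, Brennan and Prediger's Kappa, Cohen/Conger's Kappa, Scott/Fleiss' Pi, Gwet's AC, and Krippendorff's Alpha) correspond to the default output of the Stata command kappaetc, which accompanies Klein (2018). As a minimal sketch of the calculation, assuming the pre-consensus codes are stored in a file reliability_data.dta with placeholder variables coder1_tone and coder2_tone:

    * Install the kappaetc package accompanying Klein (2018) from SSC
    ssc install kappaetc

    * Load the pre-consensus codes of both coders
    * (file name and variable names are placeholders)
    use reliability_data.dta, clear

    * Unweighted agreement coefficients for one item-level variable (tone)
    kappaetc coder1_tone coder2_tone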

Columns: (1) Tone (item level); (2) Opposing positions (item level); (3) Civil society, citizen, expert presence (actor level); (4) Opposition speaker presence (actor level); (5) Religious multiperspectivalness (actor level)

                              (1)       (2)       (3)       (4)       (5)
Percent Agreement             .94       .91       .88       .92       .92
Brennan and Prediger's Kappa  .92       .86       .87       .91       .92
Cohen/Conger's Kappa          .74       .82       .85       .79       .82
Scott/Fleiss' Pi              .74       .82       .85       .79       .82
Gwet's AC                     .94       .88       .87       .91       .92
Krippendorff's Alpha          .74       .82       .85       .79       .82
N                             1,700     1,699     10,968    10,968    10,968

Note: Reliability analyses for all variables are based on the first and second coder during the main stage of the content analysis. The constructs civil society presence, citizen presence, and expert presence were captured by a single variable (see codebook). All estimates are unweighted.
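
Assuming, for illustration, that the coder variables for one unit of analysis are stored side by side in the same dataset, the unweighted estimates above could be reproduced by calling kappaetc once per variable; the variable stubs in this sketch are placeholders:

    * Unweighted coefficients, one kappaetc call per coded variable
    * (item-level and actor-level variables stem from different units
    * of analysis and would be processed in their respective datasets)
    foreach v in tone opposing_positions cce_presence opposition_presence rel_multi {
        kappaetc coder1_`v' coder2_`v'
    }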

Columns: (1) In-group reference (justification level); (2) Out-group reference (justification level); (3) Common good reference (justification level); (4) DQI index [in-group, out-group, common good reference] (justification level)

                              (1)       (2)       (3)       (4)
Percent Agreement             .96       .90       .92       .92
Brennan and Prediger's Kappa  .91       .81       .85       .73
Cohen/Conger's Kappa          .58       .62       .53       .61
Scott/Fleiss' Pi              .58       .62       .53       .61
Gwet's AC                     .95       .87       .91       .86
Krippendorff's Alpha          .58       .62       .53       .61
N                             4,867     4,867     4,867     4,867

Note: Reliability analyses for all variables are based on the first and second coder during the main stage of the content analysis. The estimates for the DQI index are based on ordinal weighting as proposed by Gwet (2014, pp. 91-92).
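
The ordinally weighted estimates for the DQI index could be obtained through kappaetc's wgt() option; a sketch with placeholder variable names, assuming wgt(ordinal) corresponds to the ordinal weights of Gwet (2014, pp. 91-92):

    * Ordinally weighted coefficients for the DQI index
    * (variable names are placeholders)
    kappaetc coder1_dqi coder2_dqi, wgt(ordinal)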

Columns: (1) Valence (actor reference level); (2) Recognition (actor reference level); (3) Outrage (actor reference level); (4) Responsiveness (actor reference level)

                              (1)       (2)       (3)       (4)
Percent Agreement             .84       .97       .98       .95
Brennan and Prediger's Kappa  .78       .94       .95       .92
Cohen/Conger's Kappa          .74       .69       .77       .68
Scott/Fleiss' Pi              .74       .65       .74       .71
Gwet's AC                     .79       .97       .97       .94
Krippendorff's Alpha          .75       .71       .79       .64
N                             3,863     3,863     3,863     3,864

Note: Reliability analyses for all variables are based on the first and second coder during the main stage of the content analysis. For the actor-reference level variables, coders had to code references between actors appearing in the text. A reference is identified by the text, the reference-giving actor, and the referred-to actor, i.e., there can be at most two references for any pair of actors (one in each direction) in a given text.
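
Because each actor reference is identified by the combination of news item, reference-giving actor, and referred-to actor, this identification can be verified directly in the coded data; a sketch with placeholder variable names:

    * Each actor reference should be uniquely identified by the news item,
    * the reference-giving actor, and the referred-to actor; isid exits
    * with an error if this combination does not uniquely identify the
    * observations (variable names are placeholders)
    isid item_id giving_actor_id referred_actor_id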

Benchmark Scale (Landis & Koch, 1977)

< 0.00        Poor
0.00 - 0.20   Slight
0.21 - 0.40   Fair
0.41 - 0.60   Moderate
0.61 - 0.80   Substantial
0.81 - 1.00   Almost Perfect
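
For illustration, a reported coefficient can be mapped onto the Landis and Koch (1977) benchmarks with a nested cond() expression; the sketch below uses the Krippendorff's Alpha of .74 obtained for tone:

    * Classify a coefficient according to the Landis and Koch (1977) scale
    local alpha = .74
    local bench = cond(`alpha' < 0, "Poor", ///
                  cond(`alpha' <= .20, "Slight", ///
                  cond(`alpha' <= .40, "Fair", ///
                  cond(`alpha' <= .60, "Moderate", ///
                  cond(`alpha' <= .80, "Substantial", "Almost Perfect")))))
    display "Krippendorff's Alpha = `alpha' -> `bench'"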

Material

References

Gwet, K. L. (2014). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters (4th ed.). Advanced Analytics, LLC.

Klein, D. (2018). Implementing a general framework for assessing interrater agreement in Stata. The Stata Journal, 18(4), 871–901. https://doi.org/10.1177/1536867X1801800408

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310