University of Mannheim | Mediated Contestation in Comparative Perspective project | Documentation - Item Selection - EITM-based - Thematic Classification

Online Coding

This section describes the online coding process used to code the items preselected by the topic modeling. The goal of this step was to identify a target number of thematically relevant items. The graphic below gives an overview of the different steps:

Figure: Overview of the online coding process.

In the coding process, a human coder was presented with a preselected item and assigned it a relevance score on a 4-point scale (not relevant, uncertain not relevant, uncertain relevant, relevant). Items rated as not relevant were discarded. Items coded as uncertain not relevant, uncertain relevant, or relevant were subjected to a second coding decision, in which another coder rated the item as either not relevant or relevant. If the two coders agreed, the item was either passed on to the pool of relevant items (both relevant) or discarded (both not relevant). If the coders disagreed, the item was passed to a third coder who served as a tiebreaker.

To facilitate this complex online coding process, we designed, developed, and implemented an online coding tool called the [Blanked] Relevance Coder (BRC).

Round-Robin-Based Binning

Figure: Classification process overview (the round-robin-based binning process).

The collected news items were partitioned into bins of similar relevance score, from which items were then randomly drawn and presented to the coders. Bins were created for each country and media type. Each bin had the same target number of items n_b across all of its contributing sources, i.e. equal-frequency binning was used. The process can also be seen as a clustering of items with similar relevance scores for each source.

First, the list of news items collected for each source was sorted by the assigned relevance score. Then a bin frequency n_b (in our case 2000) was set. After that, the generator drew one news item from each source in a round-robin manner until the number of items in the bin was equal to or larger than n_b. If a source had no more items, it was skipped for that round and all following rounds. The last round was always completed, which means that the bin could be slightly larger than n_b.

The graphic shows an example for a bin size of n_b = 10. The rows represent the sources with their lists of news items (i_j with j = 1, ..., n_s, where n_s is the number of items for source s), sorted by relevance score. The example bin was constructed from four sources (s₁ to s₄). The columns represent the rounds of the round-robin generator. In rounds one and two, items from all four sources were added to the bin. In round three, source s₂ had run out of material and was skipped, so only items from the remaining three sources were added. The resulting bin in this example consisted of 11 items.

Finally, each list of selected items within a bin was randomly sorted.
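
The binning procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of the actual tool: the function names, the (source, item) tuples, and the use of deques are assumptions made for the example, and the per-source lists are assumed to be already sorted by relevance score. The example reproduces the four-source case with n_b = 10.

```python
import random
from collections import deque
from typing import Deque, Dict, List, Tuple

Item = Tuple[str, str]  # (source name, item id); hypothetical representation


def draw_bin(queues: Dict[str, Deque[str]], n_b: int) -> List[Item]:
    """Fill one bin by taking one item per source per round."""
    bin_items: List[Item] = []
    while len(bin_items) < n_b and any(queues.values()):
        # A started round is always finished, so the bin may exceed n_b.
        for source, queue in queues.items():
            if queue:  # exhausted sources are skipped
                bin_items.append((source, queue.popleft()))
    return bin_items


def example_bin(seed: int = 0) -> List[Item]:
    # Four sources; each list is assumed to be pre-sorted by relevance score.
    queues = {
        "s1": deque(f"i{j}" for j in range(1, 6)),
        "s2": deque(f"i{j}" for j in range(1, 3)),  # only two items
        "s3": deque(f"i{j}" for j in range(1, 6)),
        "s4": deque(f"i{j}" for j in range(1, 6)),
    }
    items = draw_bin(queues, n_b=10)
    random.Random(seed).shuffle(items)  # final random ordering within the bin
    return items


print(len(example_bin()))  # 11: rounds of 4 + 4 + 3 items
```

Calling draw_bin repeatedly on the same queues would yield the subsequent bins, each holding items of lower rank than the one before.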

First Coding Round

The items were grouped into packages of 20. These work packages were then deployed to a central cloud-based file storage. Each coder was presented with a list of sources and could retrieve the next package for each source from this list. Then they were presented with a coding interface, where they were able to assign codes on a 4-point scale for each item, i.e. not relevant, uncertain not relevant, uncertain relevant, or relevant. When all items were coded, the coding results were uploaded to a cloud-based database.

To reduce the bias that arises when a single coder codes too much of the material for one country and media type, or for one single source, the interface enforced several rules.

The coders were stopped from:

retrieving more than 50% of packages from a single source.
retrieving more than 50% of packages for a country and media type.
contributing to the coding of more than 50% of the relevant material from a single source.
contributing to the coding of more than 50% of the relevant material for a country and media type.

Additionally, the interface stopped a coder from retrieving more material if:

no more uncoded material of a source was left.
the target for the sampling of a source was reached.
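
As an illustration of how such retrieval rules might be checked, the following sketch combines the package-share limit for a single source with the two stop conditions. All names and parameters are hypothetical, and the handling of the boundary (a coder may reach, but not exceed, 50%) is an assumption; the actual interface may have implemented these checks differently.

```python
def may_retrieve_package(
    retrieved_by_coder: int,
    total_packages: int,
    uncoded_packages: int,
    relevant_items: int,
    target: int,
    max_share: float = 0.5,
) -> bool:
    """Return True if a coder may fetch one more package of a source."""
    if uncoded_packages == 0:
        return False  # no uncoded material of this source is left
    if relevant_items >= target:
        return False  # the sampling target of this source is reached
    # One more package must not push the coder above the allowed share.
    return retrieved_by_coder + 1 <= max_share * total_packages


print(may_retrieve_package(4, 10, 3, 10, 38))  # True: 5 of 10 packages
print(may_retrieve_package(5, 10, 3, 10, 38))  # False: 6 of 10 packages
```

The limits per country and media type, and those on the share of relevant material, could be checked in the same way with the corresponding counts.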

For the sampling target, the desired sample size of 100 for a country and media type was divided by the number of sources available for that country and media type. The result was then multiplied by 1.5 so that additional items were available in case other sources could not reach their target or items were removed during validation.
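
A short worked example of this calculation follows. The function name and the rounding are assumptions: the text does not state how fractional targets were handled, so the sketch rounds up.

```python
import math
from fractions import Fraction


def sampling_target(n_sources: int, sample_size: int = 100,
                    buffer: Fraction = Fraction(3, 2)) -> int:
    """Per-source target: (sample_size / n_sources) * buffer, rounded up."""
    if n_sources < 1:
        raise ValueError("at least one source is required")
    # Rounding up is an assumption; exact fractions avoid float artifacts.
    return math.ceil(Fraction(sample_size, n_sources) * buffer)


print(sampling_target(4))  # 100 / 4 * 1.5 = 37.5, rounded up to 38
print(sampling_target(5))  # 100 / 5 * 1.5 = 30
```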

Second Coding Round

After the package coding, the coders validated each other's codes in a validation step. A coder's items that were potentially relevant, i.e. coded as uncertain not relevant, uncertain relevant, or relevant, were added to a validation pool. All other coders were then presented with these items and asked to code them again on a 2-point scale, as either not relevant or relevant.

The following three cases are possible after the first validation round. Both coders have

agreed and marked the item as relevant: the item is added to the pool of relevant items for sampling.
agreed and marked the item as not relevant: the item is discarded.
disagreed: the item is added back to the validation pool and presented to a third coder.

The items in dispute were then presented to a third coder, who again coded the item on a 2-point scale as not relevant or relevant.

When the majority of coders has marked the item as

relevant, the item is added to the pool of relevant items for sampling.
not relevant, the item is discarded.
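
The decision logic of the two coding rounds can be summarized in a small sketch. The names are hypothetical, and one point is an assumption not stated in the text: to compare the first coder's 4-point code with the second coder's 2-point code, "uncertain relevant" and "relevant" are counted as a vote for relevant, and "uncertain not relevant" as a vote for not relevant.

```python
from enum import Enum
from typing import Optional


class FirstCode(Enum):
    NOT_RELEVANT = 0
    UNCERTAIN_NOT_RELEVANT = 1
    UNCERTAIN_RELEVANT = 2
    RELEVANT = 3


def decide(first: FirstCode, second: Optional[bool] = None,
           third: Optional[bool] = None) -> str:
    """Return the status of an item; True means 'relevant' on the 2-point scale."""
    if first is FirstCode.NOT_RELEVANT:
        return "discard"  # never enters the validation pool
    if second is None:
        return "needs_second_coder"
    # Assumed mapping of the 4-point code to a binary vote.
    first_vote = first in (FirstCode.UNCERTAIN_RELEVANT, FirstCode.RELEVANT)
    if first_vote == second:
        return "relevant" if second else "discard"
    if third is None:
        return "needs_third_coder"
    # The first two votes differ, so the third vote is the majority.
    return "relevant" if third else "discard"


print(decide(FirstCode.UNCERTAIN_RELEVANT, second=True))     # relevant
print(decide(FirstCode.RELEVANT, second=False))              # needs_third_coder
print(decide(FirstCode.RELEVANT, second=False, third=True))  # relevant
```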

Material

Results of the Online Coding [XLSX]