
protection level propagation
Open, Needs Triage, Public

Description

Problem:
When a contentious topic shows up in the news, vandalism on the related Wikipedia articles usually follows pretty quickly. The article might then be protected for some time to prevent further vandalism. A pattern we are now seeing is that people then move over to Wikidata and continue their vandalism there. (This can in turn lead to vandalism showing up in the article anyway if it uses the data.) This happens especially when an infobox has "edit on Wikidata" links or something similar.
We need to find a way to make this attack vector less of a problem.

Possible solutions:

  • Find a way to protect the Wikidata item automatically if at least X of the articles in its sitelinks section are protected (a rough sketch of such a check is below).
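
A minimal sketch of what such a check could look like, using only the public MediaWiki Action API. The threshold X, the example item, and the site-id-to-URL mapping are illustrative assumptions for this sketch, not an implementation plan:

```
# Sketch: count how many of an item's sitelinked pages are edit-protected.
# Threshold, example item, and the site-id -> URL mapping are illustrative only.
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def sitelinks(item_id):
    """Return {site_id: page_title} for the item, e.g. {'enwiki': 'Iraq'}."""
    r = requests.get(WIKIDATA_API, params={
        "action": "wbgetentities",
        "ids": item_id,
        "props": "sitelinks",
        "format": "json",
    }).json()
    links = r["entities"][item_id].get("sitelinks", {})
    return {site: data["title"] for site, data in links.items()}

def is_edit_protected(site_id, title):
    """True if the page carries any edit protection. Only plain '<lang>wiki'
    site ids are handled here; a real version would use the full site matrix."""
    if not site_id.endswith("wiki") or site_id in (
            "commonswiki", "specieswiki", "metawiki", "wikidatawiki"):
        return False  # simplification for this sketch
    lang = site_id[:-len("wiki")].replace("_", "-")
    api = f"https://{lang}.wikipedia.org/w/api.php"
    r = requests.get(api, params={
        "action": "query",
        "prop": "info",
        "inprop": "protection",
        "titles": title,
        "format": "json",
    }).json()
    page = next(iter(r["query"]["pages"].values()))
    return any(p["type"] == "edit" for p in page.get("protection", []))

def count_protected_sitelinks(item_id):
    return sum(is_edit_protected(site, title)
               for site, title in sitelinks(item_id).items())

X = 4  # illustrative threshold
if count_protected_sitelinks("Q796") >= X:  # Q796 (Iraq) as an example
    print("candidate for (semi-)protection on Wikidata")
```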

BDD
GIVEN
AND
WHEN
AND
THEN
AND

Acceptance criteria:

Open questions:

Event Timeline

It's not necessarily contentious topics showing up in the news. These days we're experiencing an unstoppable wave of vandalism on common entities, with the aim of playing not only with Wikipedia but with all kinds of tools and services based on Wikidata. Vandalizing Wikidata has become a popular game, sometimes motivated by memes, with many cases growing out of social media, especially with Google and Siri as targets. Today the former chair of Wikimedia Chile and former Executive Director of Wikimedia Argentina suggested blocking all unregistered Wikidata users in Chile as the best solution to stop vandalism in his country and the resulting bad press. Last week I semi-protected all the current sovereign states after Iraq's label was changed to "Iran". Some complaints were also raised because of this apparently trivial change, which was soon propagated to the Wikipedias while its reversion was not, leaving lots of biographies with a nonsensical cause of death.

I don't think this is just a trend; there's no reason to think the vandalism will stop at some point unless Wikidata stops being used in the Wikipedias, in Google, in Siri and in other tools, which is obviously something we don't want.

This is neither a small feat nor a pleasant measure, but maybe the best solution is to force users to register and complete a brief tutorial (for example, some instructions and a couple of test edits on certain low-impact items) before editing. Such a tutorial would make some vandals abandon their now-less-funny aims while helping good-faith users. At this point I don't think the community can deal with anonymous edits on highly interlinked entities any longer. @Lydia_Pintscher, I'd love to read your thoughts on this and know whether you see this as a solution or would like different measures to be applied. We might also want to expose the problem on the project chat and/or via an RfC soon.


Obviously the current situation can't go on, so we need to come up with smart ways to improve it, like the one proposed here. I think we should only resort to closing off the project more once we've explored enough other options. So I don't oppose it completely, but it'd pain me to do it without trying a lot of other things first. More ideas are welcome for what those things could/should be.

It might be understood from my wording that the measure I propose would mean closing off the project more, but I don't think that's the case. I don't think the obligation to register reduces the set of people from whom we accept contributions; or at least, we should consider that the administrative actions we currently have to apply reduce this set of potential contributors more significantly.

As an alternative, I can only think of forcing users to register just to edit labels, descriptions and aliases (T189412 is related, though not the same). According to the data (2015) from "Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis", 95% of the vandalism on labels/descriptions/aliases was carried out by unregistered users, and only the remaining 5% by registered users. Of the total number of bad edits by unregistered users, 63% were made on labels/descriptions/aliases, 31% on sitelinks, 5% on statements and 1% on "misc". I still think that forcing users to register to edit anything would be a better approach, especially because these values would change dramatically, but I also wanted to mention an alternative (or... well, a half-alternative).

Maybe some sort of rate limit/captcha on high-visibility items?
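
To illustrate the rate-limit half of this suggestion only (MediaWiki's actual throttling and captcha triggering work differently), a per-IP token bucket scoped to a hypothetical list of high-visibility items could look roughly like this; the item set and the limits are invented for the example:

```
# Illustration only: a per-IP token bucket for anonymous edits to
# high-visibility items. The item set and limits are made up for the example.
import time
from collections import defaultdict

HIGH_VISIBILITY_ITEMS = {"Q796", "Q30"}  # hypothetical "watched" items
CAPACITY = 3                              # allow a burst of 3 edits
REFILL_PER_SEC = 1 / 600                  # then roughly one edit per 10 minutes

_buckets = defaultdict(lambda: {"tokens": CAPACITY, "ts": time.monotonic()})

def allow_anon_edit(ip, item_id):
    """Return True if an anonymous edit from `ip` to `item_id` is allowed."""
    if item_id not in HIGH_VISIBILITY_ITEMS:
        return True
    b = _buckets[ip]
    now = time.monotonic()
    b["tokens"] = min(CAPACITY, b["tokens"] + (now - b["ts"]) * REFILL_PER_SEC)
    b["ts"] = now
    if b["tokens"] >= 1:
        b["tokens"] -= 1
        return True
    return False  # over the limit: show a captcha or reject the edit
```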

Either that or do pending changes, but the latter would definitely need community consensus and could be controversial.

Related IMO

If we want to do this in some automated way, as described in the current description, then we should look at what different values of X would mean for Wikidata items right now.
E.g. given that 4 of an item's sitelinked pages are protected on their client wikis, protect the Wikidata item: how many items would end up protected?

Maybe a first step for this, or a positive step in any case, would be to let humans see which Wikimedia pages are protected directly from the corresponding Wikidata entity. This could consist of including an icon like https://commons.wikimedia.org/wiki/File:Semi_protect.svg next to each interwiki link. What do you think?

Yeah that could be a first step indeed.

Is there any evidence of a link between protected Wikipedia pages and vandalism in Wikidata? The sample of edits I looked at showed no correlation (I checked 10 items vandalised by IPs from Spanish-speaking countries and 9 of them were not protected on the Spanish Wikipedia). That's a very small sample, but I'm not aware of any other data.

I do think it's a good idea to hide links for editing Wikidata on protected Wikipedia pages, but automatically protecting Wikidata items based on the number of protected sitelinks seems like something more complicated that won't necessarily be very effective.

> If we want to do this in some automated way, as described in the current description, then we should look at what different values of X would mean for Wikidata items right now.
> E.g. given that 4 of an item's sitelinked pages are protected on their client wikis, protect the Wikidata item: how many items would end up protected?

Would you be able to do some stats for that? (I guess the number of protected pages linked to an item, and how many items there are with each such number.)
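
Assuming the per-item counts of protected sitelinks have already been collected somehow (from a dump, the database, or a script like the sketch in the description), turning them into that statistic is simple. A toy illustration with invented input:

```
# Sketch of the requested statistic: for each threshold X, how many items
# would be auto-protected. The input numbers here are invented.
from collections import Counter

def threshold_table(protected_counts, max_x=10):
    """protected_counts: dict of item id -> number of protected sitelinks."""
    histogram = Counter(protected_counts.values())
    total = len(protected_counts)
    for x in range(1, max_x + 1):
        affected = sum(n for count, n in histogram.items() if count >= x)
        print(f"X = {x:2d}: {affected} of {total} items would get protected")

threshold_table({"Q796": 5, "Q30": 2, "Q1": 0, "Q2": 1})  # toy input
```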

> Maybe a first step for this, or a positive step in any case, would be to let humans see which Wikimedia pages are protected directly from the corresponding Wikidata entity. This could consist of including an icon like https://commons.wikimedia.org/wiki/File:Semi_protect.svg next to each interwiki link. What do you think?

I like that idea.

What I think we really need is a page protection level which doesn't accidentally break things and create weird behaviour for other wikis. Semi-protection can prevent page moves/deletions from being reflected on the Wikidata item (breaking the interwiki links for the page) and stop people from adding interwiki links. If we had a protection level which didn't do that, we could safely protect a lot more highly visible items (e.g. any item used more than X times by other projects), regardless of whether any corresponding Wikipedia pages are protected.

@Lydia_Pintscher not sure this is really a campsite task? Should probably be tackled as a hike or trailblaze?