Make a SpecialPage to show stats on blocked IP (ranges) that attempt to edit
Open, Lowest, Public

Description

In order to evaluate collateral damage caused by IP blocks, with emphasis on range blocks, it would be nice to access information on blocked users who click the edit button.

(Information being the number of times the particular IP/range has attempted to edit)

Event Timeline

Rjd0060 raised the priority of this task from to Needs Triage.
Rjd0060 updated the task description.
Rjd0060 changed Security from none to None.
Rjd0060 added subscribers: Rjd0060, Snowolf, Legoktm and 4 others.

I imagine this is WMF Analytics' area?

I don't believe so. To clarify, it would be helpful for users (or some subset of users) to be able to view this information in real time, so that blocks can be adjusted to prevent as much collateral damage as possible.

Krenair renamed this task from Generate stats on blocked IP (ranges) that attempt to edit to Make a SpecialPage to show stats on blocked IP (ranges) that attempt to edit. Dec 17 2014, 11:57 PM
Aklapper triaged this task as Lowest priority. Dec 19 2014, 3:04 PM

@Rjd0060 Why did you add Wikimedia-Site-requests to this? AFAICS, this is about adding a new MediaWiki special page, not about changing a configuration on a Wikimedia wiki.

If there are concerns about identifying people against ranges, I wonder whether something like this could be handled through a mechanism like abuse filters. They offer a level of privacy: you get either the username or the IP address (depending on whether the user is logged in) and the page they attempted to edit. A checkuser could always be run to identify the specific range.

I think it would not be too difficult to set up some monitoring by emitting events to Prometheus and charting them in Grafana. We'd use the block ID as the key (and maybe concatenate composite block IDs with an underscore?) and increment the counter for each edit attempt that is blocked due to an IP or range block. There's already a central place for recording this in WikimediaEvents BlockUtils.php.
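A rough sketch of the keying and counting described above, in Python for illustration only. This is not the WikimediaEvents code: the function names, the block type strings, and the use of `collections.Counter` as a stand-in for a labelled Prometheus counter are all assumptions.

```python
from collections import Counter

# Hypothetical block type names; the real values used by MediaWiki may differ.
IP_BLOCK_TYPES = {"ip", "range"}


def block_key(block_ids):
    """Build the counter key: a single block ID, or composite IDs joined by '_'."""
    return "_".join(str(block_id) for block_id in block_ids)


def record_blocked_attempt(counter, block_ids, block_type):
    """Increment the counter only for attempts stopped by an IP or range block."""
    if block_type not in IP_BLOCK_TYPES:
        return
    counter[block_key(block_ids)] += 1


# Counter stands in for a metrics counter with the block key as its label.
attempts = Counter()
record_blocked_attempt(attempts, [1234], "range")
record_blocked_attempt(attempts, [1234], "range")
record_blocked_attempt(attempts, [1234, 5678], "ip")   # composite block
record_blocked_attempt(attempts, [42], "user")         # ignored: not an IP block
```

In a real setup the increment would go to whatever metrics client is already in use, and Grafana would chart the per-key totals.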

While it's not as convenient as a special page (or being able to see the data directly alongside the block in Special:BlockList or via an API query) it would be better than nothing, IMHO.

That said, per T78840#1201560, I imagine we would need a legal/privacy review before we allowed public access to blocked edit attempts associated with an IP or IP range.

If we want to record this data in a MediaWiki managed DB table so that it's visible to only certain user groups, that will be more difficult, due to the scale of daily blocked edit attempts due to IP/range blocks.

If the number of blocked edit attempts is huge but the number of blocks is not that huge, maybe it could live in memcache? Memcache keys can get evicted so it's not exactly reliable but maybe better than nothing and simple to set up.
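To make the trade-off concrete, here is a toy model of a per-block counter in a cache that evicts keys. It is only a sketch: `EvictingCounterStore` is a made-up class, and its simple least-recently-used eviction is a simplified stand-in for what a real memcached server does under memory pressure.

```python
from collections import OrderedDict


class EvictingCounterStore:
    """Toy cache of counters that drops the least recently used key when full."""

    def __init__(self, max_keys):
        self.max_keys = max_keys
        self._data = OrderedDict()

    def incr(self, key):
        # Re-inserting the key moves it to the "most recently used" end.
        value = self._data.pop(key, 0) + 1
        self._data[key] = value
        if len(self._data) > self.max_keys:
            self._data.popitem(last=False)  # evict the least recently used key
        return value

    def get(self, key):
        # An evicted key looks exactly like a block with no hits.
        return self._data.get(key, 0)


store = EvictingCounterStore(max_keys=2)
for _ in range(3):
    store.incr("block:1")
store.incr("block:2")
store.incr("block:3")  # the cache is full, so block:1 and its 3 hits are dropped
```

The failure mode is silent: after eviction the counter for a block restarts from zero, so a reader cannot tell a quiet block from one whose count was lost.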

I suppose the heavy-handed approach would be to aggregate event data in the standard data pipeline (Spark etc.) and write it into some database (Cassandra?) with a thin web API that exposes it to MediaWiki, which can then apply permission management. That seems like far more effort than it's worth.

If the goal is just to track very busy blocks, you could also take a sampling approach where MediaWiki increments some DB table with a 0.1% chance on each blocked edit.
Or maybe even adaptive sampling, where if the block currently has N hits you sample at a rate of roughly 1/N (e.g. no sampling for 0-9 hits, 10% sampling for 10-99 hits, 1% sampling for 100-999 hits, etc.). That requires loading the block hit counter before doing the sampling, but the block itself already needs to be loaded/cached, so there is not too much difference there.
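The adaptive sampling idea could look roughly like the following Python sketch. The function names and the plain dict standing in for the DB table are hypothetical. One detail is an assumption not stated above: when a hit is sampled, the counter is increased by the inverse of the sampling rate rather than by 1, so that the stored value stays an estimate of the true hit count.

```python
import random


def sample_weight(current_hits):
    """Return 1 for 0-9 hits, 10 for 10-99 hits, 100 for 100-999 hits, and so on.

    The sampling probability is 1 / weight.
    """
    weight = 1
    while current_hits >= weight * 10:
        weight *= 10
    return weight


def record_blocked_edit(counters, block_id, rng=random):
    """Maybe record one blocked edit; return True if the counter was written."""
    hits = counters.get(block_id, 0)
    weight = sample_weight(hits)
    # rng.random() is in [0, 1), so this is true with probability 1 / weight.
    if rng.random() * weight < 1:
        # Assumption: add `weight` instead of 1 to keep the estimate unbiased.
        counters[block_id] = hits + weight
        return True
    return False


counters = {}  # stand-in for the DB table, keyed by block ID
for _ in range(10):
    record_blocked_edit(counters, "block:1")  # below 10 hits, every edit is written
```

With this scheme the number of writes per block grows roughly with the logarithm of its hit count, at the cost of a noisier count for very busy blocks.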