
Tag edits made in the WikiEditor (2010 wikitext editor)
Closed, Resolved, Public

Description

This task involves introducing a new change tag that would enable us to distinguish edits made with the 2010 wikitext editor (read: Extension:WikiEditor) from edits made with the other editing interfaces we currently track with explicit change tags. [i]

Note: currently, all edits made with the legacy wikitext source editors, along with edits made with other tools/interfaces (like Huggle and HotCat), are given a shared Other tag.

Requirements

  1. Implement a new change tag that enables people to view and filter edits made with the "2010 wikitext editor" within a Superset dashboard like: https://superset.wikimedia.org/r/768.
    • Where "edits made with the 2010 wikitext editor" refers to edits made with Extension:WikiEditor
    • Where "view and filter edits" means people can apply filters like the following to edits tagged with this new 2010 wikitext editor tag:
      • platform
      • project_family
      • user_is_bot
      • is_reverted
      • namespace_is....
      • user_edit_count_bucket IN...
  2. The new "change tag" referenced in "1." should be named wikieditor
  3. Within Superset, the wikieditor change tag should be labeled 2010 wikitext editor
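To make "view and filter" concrete, here is a minimal Python sketch that filters edit records by the tag plus the dimensions listed in requirement 1. The `Edit` record, its field names (chosen to mirror the filter names), and the bucket labels are hypothetical. The real data sits behind the Superset dashboard, and its schema is not described in this task.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Edit:
    """Hypothetical edit record; field names mirror the filters listed above."""
    rev_id: int
    tags: Set[str]
    platform: str
    project_family: str
    user_is_bot: bool
    is_reverted: bool
    namespace: int
    user_edit_count_bucket: str


def filter_edits(
    edits: Iterable[Edit],
    tag: str = "wikieditor",
    platform: Optional[str] = None,
    project_family: Optional[str] = None,
    user_is_bot: Optional[bool] = None,
    is_reverted: Optional[bool] = None,
    namespaces: Optional[Set[int]] = None,
    edit_count_buckets: Optional[Set[str]] = None,
) -> List[Edit]:
    """Return edits carrying `tag` that match every filter that is not None."""
    checks = [
        lambda e: platform is None or e.platform == platform,
        lambda e: project_family is None or e.project_family == project_family,
        lambda e: user_is_bot is None or e.user_is_bot == user_is_bot,
        lambda e: is_reverted is None or e.is_reverted == is_reverted,
        # "namespace_is ..." style filter: membership in a set of namespaces.
        lambda e: namespaces is None or e.namespace in namespaces,
        # "user_edit_count_bucket IN ..." style filter.
        lambda e: edit_count_buckets is None
        or e.user_edit_count_bucket in edit_count_buckets,
    ]
    return [e for e in edits if tag in e.tags and all(c(e) for c in checks)]


# Made-up sample records for illustration only.
sample = [
    Edit(1, {"wikieditor"}, "desktop", "wikipedia", False, False, 0, "100-999 edits"),
    Edit(2, {"visualeditor"}, "desktop", "wikipedia", False, False, 0, "100-999 edits"),
    Edit(3, {"wikieditor"}, "desktop", "wikipedia", True, False, 0, "1000+ edits"),
]
print([e.rev_id for e in filter_edits(sample, user_is_bot=False, namespaces={0})])  # [1]
```

Leaving a filter as `None` means "don't filter on this dimension", which matches how dashboard filters are usually left unset.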

Considerations

This section contains the [known] tradeoffs we are accepting in this initial approach. The information will help people depending on data from the wikieditor change tag know what it does and does not mean.

  • Bots that are A) loading the edit page in a JS-executable environment and B) submitting edits using said page would "receive" the wikieditor tag.
  • Bots that include the edit form parameter within the URL they're using to execute a POST call (read: publish an edit) will "receive" the wikieditor tag
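Both considerations follow from how the tag is triggered, as described later in this thread. The WikiEditor JavaScript must run on the edit page, and the server tags a submitted edit that carries a particular form parameter. The Python sketch below models only that server-side decision. The field name `wikieditorUsed` and its value `yes` are assumptions chosen for illustration and have not been confirmed from the patch. `wpTextbox1` is the edit form's text field.

```python
from typing import Mapping, Set

# Hypothetical name/value of the form field that the WikiEditor JavaScript is
# assumed to add to the edit form before it is submitted.
TRIGGER_FIELD = "wikieditorUsed"
TRIGGER_VALUE = "yes"
TAG = "wikieditor"


def tags_for_submission(form: Mapping[str, str], via_api: bool) -> Set[str]:
    """Return the change tags this sketch would add to a saved edit."""
    if via_api:
        # API edits (bots, Twinkle, HotCat, ...) never go through the edit form.
        return set()
    if form.get(TRIGGER_FIELD) == TRIGGER_VALUE:
        return {TAG}
    return set()


# Browser with JS and the toolbar enabled: the script added the field.
browser = {"wpTextbox1": "new text", TRIGGER_FIELD: TRIGGER_VALUE}
# Bot POSTing the edit form directly, without the field.
plain_bot = {"wpTextbox1": "new text"}
# Bot POSTing the edit form and deliberately including the field.
mimic_bot = {"wpTextbox1": "new text", TRIGGER_FIELD: TRIGGER_VALUE}

print(tags_for_submission(browser, via_api=False))    # {'wikieditor'}
print(tags_for_submission(plain_bot, via_api=False))  # set()
print(tags_for_submission(mimic_bot, via_api=False))  # {'wikieditor'}
```

The server sees only the submitted fields, so it cannot tell a real browser session from a client that reproduces the field. That is the tradeoff the considerations above accept.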

Open questions

  • 1. What performance metrics could introducing this new tag impact? E.g. Edit response time.
  • 2. When and how should the Editing Team evaluate the introduction of this new change tag's impact on the performance metrics defined in "1."?

Use cases

  • As originally articulated by @Whatamidoing-WMF...
    • "When I look at RecentChanges, I want to know which editing interface is being used for each edit. This can help me learn about other editing environments that I might want to try. It makes it possible for me to give accurate and relevant information (it's no good telling someone to 'click the button in the toolbar' if they've set their prefs to the 2003WTE, which has no toolbar). It will also help me understand the ecosystem, and not assume that everyone uses the same tools that I do."
  • As a WMF Product Team that is considering making a change to one of MediaWiki's core editing interfaces, I want to know the proportion of edits that are likely to be impacted by this change, so that I can decide whether and how to proceed with making it.
    • Example: Realtime Preview and deciding about whether to impose a fixed width constraint on people using the 2010 and/or 2003 wikitext editors.

Minimal test case

Verify the tag is added as expected

  1. Log in
  2. Visit https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-editing
  3. Verify the Enable the editing toolbar setting is enabled
  4. Open the source editor. E.g. visit https://en.wikipedia.org/w/index.php?title=User:PPelberg_(WMF)/sandbox&action=edit .
  5. Publish a change
  6. Visit: https://en.wikipedia.org/w/index.php?hidebots=1&hidecategorization=1&hideWikibase=1&tagfilter=wikieditor&limit=500&days=30&title=Special:RecentChanges&urlversion=2
  7. Verify the edit you published in "Step 5." appears
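Step 6 uses a Special:RecentChanges URL filtered by the tag, and the same URL appears in the negative test below. The sketch below rebuilds that URL from its query parameters using Python's standard library, which makes it easy to change the tag or the time window. The parameter values are copied from the URL in step 6. `urlencode` percent-encodes the colon in the page title, and MediaWiki decodes it as usual.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://en.wikipedia.org/w/index.php"


def recent_changes_url(tag: str = "wikieditor", days: int = 30, limit: int = 500) -> str:
    """Build a Special:RecentChanges URL that lists only edits carrying `tag`."""
    params = {
        "hidebots": 1,            # hide edits by bot-flagged accounts
        "hidecategorization": 1,  # hide category membership changes
        "hideWikibase": 1,        # hide Wikidata changes
        "tagfilter": tag,         # keep only edits with this change tag
        "limit": limit,
        "days": days,
        "title": "Special:RecentChanges",
        "urlversion": 2,
    }
    return BASE + "?" + urlencode(params)


url = recent_changes_url()
print(url)
print(parse_qs(urlsplit(url).query)["tagfilter"])  # ['wikieditor']
```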

Verify the tag is NOT added when the toolbar is disabled

  1. Log in
  2. Visit https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-editing
  3. Verify the Enable the editing toolbar setting is disabled
  4. Open the source editor. E.g. visit https://en.wikipedia.org/w/index.php?title=User:PPelberg_(WMF)/sandbox&action=edit .
  5. Publish a change
  6. Visit: https://en.wikipedia.org/w/index.php?hidebots=1&hidecategorization=1&hideWikibase=1&tagfilter=wikieditor&limit=500&days=30&title=Special:RecentChanges&urlversion=2
  7. Verify the edit you published in "Step 5." does NOT appear

Done

  • 1. The answers to all ===Open questions are documented
  • 2. All ===Requirements are implemented
  • 3. The definition of the wikieditor tag is documented in the appropriate (to be determined) place(s) so that people depending on data from this tag know what it does and does not mean

i. We are currently using explicit change tags to track edits made with the following editing interfaces: VisualEditor, 2017 wikitext editor, and Switched from VisualEditor to wikitext editor. See: https://superset.wikimedia.org/r/773.

Event Timeline

Change 587228 had a related patch set uploaded (by Esanders; owner: Esanders):
[mediawiki/extensions/WikiEditor@master] Tag WikiEditor edits with a hidden tag

https://gerrit.wikimedia.org/r/587228

We think it is very important to gather this information, but obviously this will add a lot of tags to the database. Is that likely to be a problem?

CC @Krinkle, @Ladsgroup

The change_tag table is in much better shape now and can handle such additions, especially if it's for a limited period of time (after which we stop tagging and remove the tags from the database, as we did with the hhvm and php7 tags). But if we want to do it forever, it might cause issues, especially for Commons. My question here would be: do you need it on a case-by-case basis, or do you need statistics? If you need statistics, this can emit an EventLogging event that we can feed into turnilo.wikimedia.org and examine through Hadoop, which has petabytes of free space.

Or, if you need it to be public and want to see it on a case-by-case basis but don't care much about historical data, this can be an extra column in the recentchanges table that gets cleaned up automatically after a month.

HTH

if we want to do it forever, it might cause issues, especially for Commons.

Most Commons edits are either uploads or data tags right now, right? Those two types of change are not going to be affected by this, and neither will HotCat/etc. API-based gadget changes, which seem to make up a huge number of their changes.

Ultimately we want to be able to compare editor usage, ideally with tags, so that it can be analysed in the same way as the other editors (VE/NWE/mobile web/mobile app). Currently in Superset we can cross-reference these editor tags with other metrics (namespace, user tenure, project, etc.).

if we want to do it forever, it might cause issues, especially for Commons.

Most Commons edits are either uploads or data tags right now, right? Those two types of change are not going to be affected by this, and neither will HotCat/etc. API-based gadget changes, which seem to make up a huge number of their changes.

That's true. My point was that Commons is under real storage pressure, but you're right.

Since it doesn't involve API edits (gadgets and bots), I think we would be fine then. Personally, though, I prefer a more robust solution (like an extra column in the RC table with values like 'the app', 'wikitext editor 2017', etc.), but that can be done later and migrated easily.

Ultimately we want to be able to compare editor usage, ideally with tags, so that it can be analysed in the same way as the other editors (VE/NWE/mobile web/mobile app). […]

Can you elaborate on how this relates to the Edit schema (Schema:EditAttemptStep)? Using EventLogging seems more suitable in terms of performance and scale for this kind of information.

In theory it would be the same as "saveSuccess" in EditAttemptStep, but without tying the tags to specific edits, I don't think Superset would allow us to break down the results in the same way (user tenure, namespace, etc.).

The EventLogging data in Druid (as seen from Superset and Turnilo) is aggregated and indeed doesn't easily cross-reference to other things. However, the EventLogging data is first stored in Hadoop (as seen from Hive and stat100x) where it is much richer and should allow you to do anything you need. Possibly adding a field to the Edit schema if needed.

I tried to quickly, unscientifically estimate how much extra tagging this would do.

From just a quick look at recent changes, it seems like about 50% of all edits are already being tagged. My method was to go to Special:RecentChanges?uselang=qqx (to reveal hidden tags) and search for "tag-list-wrapper". On that page right now, out of 500 edits, I see 225 with tags (some have multiple) – a lot of tools apply tags now (VisualEditor, various mobile interfaces, various gadgets…).

So it seems like the worst case is doubling the number of tags, which would be significant, but probably won't cause a huge problem instantly.
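The worst-case estimate above can be written out as arithmetic. As a simplifying assumption, the new tag is taken to land only on edits that currently carry no tag. The 225 tagged edits are treated as a lower bound on existing tag rows, since some edits carry several tags. Under those assumptions, the 500-edit sample gives an upper bound on how much the number of tag rows could grow.

```python
sample_edits = 500
tagged_edits = 225
untagged_edits = sample_edits - tagged_edits  # 275

# At least one tag row per tagged edit (some edits have several tags).
existing_rows_min = tagged_edits
# Worst case under the stated assumption: every untagged edit gets wikieditor.
new_rows_max = untagged_edits

growth_upper_bound = (existing_rows_min + new_rows_max) / existing_rows_min
print(f"tagged share: {tagged_edits / sample_edits:.0%}")  # tagged share: 45%
print(f"growth factor <= {growth_upper_bound:.2f}")        # growth factor <= 2.22
```

Edits with multiple existing tags push the real factor below this bound, which is consistent with the "roughly doubling" estimate.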

We could try harder to estimate this better, but I'd like to propose something different – how about we just deploy the patch for a short time (maybe a day or a week), then see the real effect it has, and make decisions on whether we can keep it based on that?

ppelberg updated the task description.

Note: I've updated the task description with – what I see as – the ===Requirements and ===Open questions that will guide this work. @NRodriguez: is there anything you perceive as unexpected in and/or missing from the task description?

Thanks Peter! It all looks great!

Is there a reason that we are opting for bucketing both the 2010 and 2003 editors into one tag?

In the future, we will consider adding additional granularity to enable us to distinguish edits made with the WikiEditor aka "2010 wikitext editor" from edits made with the default MediaWiki editor, which is an HTML <textarea>. @ppelberg to file a ticket for this work before marking this task as "Resolved."

(My guess is that it may be technically complex to separate them, and if that's the reason, that makes sense. But if that is not the case, I think separating them would be awesome for our purposes.)

Thanks Peter! It all looks great!

Wonderful. Thank you for giving it a read.

Is there a reason that we are opting for bucketing both the 2010 and 2003 editors into one tag?
(My guess is that it may be technically complex to separate them, and if that's the reason, that makes sense. But if that is not the case, I think separating them would be awesome for our purposes.)

Great instinct. The reason is as you described: distinguishing between edits made with the 2010 and 2003 wikitext editors is more complex.

Reason: it's difficult to reliably tag edits made with the 2003 wikitext editor because using it does not require JavaScript, which we depend on to assign tags. [i]


i. @DLynch: can you please correct anything I might've misstated here?

Is there a reason that we are opting for bucketing both the 2010 and 2003 editors into one tag?

Reason: it's difficult to reliably tag edits made with the 2003 wikitext editor because using it does not require JavaScript, which we depend on to assign tags. [i]

The patch-as-written will only tag edits made by the 2010 wikitext editor. The 2003 editor will still be left untagged.

There are a few states here to consider:

  1. 2010 editor was used.
  2. 2010 editor wasn't used because the editor had JavaScript disabled
  3. 2010 editor wasn't used because the editor has it disabled by preference
  4. A bot of some sort has made an automated edit by POSTing directly to the edit form (they'd have to get an edit token but that's doable)

The current patch will only trigger in the first case. It requires that the JavaScript added by the 2010 editor is added to the page and runs.

We could add further logging code that doesn't use JS that would catch the second case. However, telling the difference between the third and fourth case would be challenging. I don't think we can guarantee splitting the 2003 editor out from automated edits. (Though we could make it such that an automated edit would need to deliberately be hard to distinguish.)

Honestly, I'd recommend putting it off and/or getting someone to query how many people actually have usebetatoolbar = false set as a preference. The current patch gets us the 2010 editor, and anything else would just add to that.
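The suggested query could look roughly like the Python sketch below. MediaWiki generally stores only non-default user preferences, in the `user_properties` table (`up_user`, `up_property`, `up_value`). The sketch runs against an in-memory SQLite stand-in with made-up rows. It assumes a disabled toolbar is stored as `up_value` `'0'` or `''`, which should be checked against the real data on a replica before the count is trusted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_properties ("
    " up_user INTEGER, up_property TEXT, up_value TEXT,"
    " PRIMARY KEY (up_user, up_property))"
)
# Made-up sample rows for illustration only.
rows = [
    (1, "usebetatoolbar", "0"),
    (2, "usebetatoolbar", "1"),
    (3, "usebetatoolbar", ""),
    (4, "gender", "female"),
]
conn.executemany("INSERT INTO user_properties VALUES (?, ?, ?)", rows)

# Count users who have explicitly turned the 2010 editor toolbar off.
# Assumption: a false boolean preference is stored as '0' or ''.
(disabled,) = conn.execute(
    "SELECT COUNT(DISTINCT up_user) FROM user_properties"
    " WHERE up_property = 'usebetatoolbar' AND up_value IN ('0', '')"
).fetchone()
print(disabled)  # 2
```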

Please make sure that if:

(a) I use the 2010 wikitext editor as my default (or the 2017 WTE or WikEd, both of which settings override the 2010WTE setting), and
(b) I post a message on someone's user talk page using Twinkle,

that this does not get marked as me using the 2010 wikitext editor. That type of edit should be marked as using Twinkle only, since I won't even have seen the page, much less used an editing window.

The patch as-written requires that you be on the editing page so that JavaScript can run and flag the edit as made with the 2010 editor. That sort of edit via Twinkle, unless it's doing something really weird with iframes, will still fall into the untagged set.

Thanks so much for all the thorough documentation! Excited to hear that the 2010 and 2003 editors will remain separate tags.

The patch-as-written will only tag edits made by the 2010 wikitext editor. The 2003 editor will still be left untagged.

Tagging edits made with the 2010 wikitext editor while leaving edits made with the 2003 editor untagged is a worthwhile improvement.

As a result, I've done the following:

  • Created T295340 as a follow-up to tag edits made with the 2003 editor.
  • Edited the requirements in the task description to reflect that we will not be tagging edits made with the 2003 editor (aka the "default MediaWiki editor, which is an HTML <textarea>") as part of this ticket.

Before moving this ticket to "Ready to be worked on," @DLynch would it be accurate for me to think the following?

  1. As the patch is currently written, all bots that make automated edits (e.g. Twinkle as @Whatamidoing-WMF raised in T249038#7437961) will NOT be tagged as having been made with the 2010 wikitext editor because the change tag https://gerrit.wikimedia.org/r/587228 implements will only be applied to edits made directly with the 2010 wikitext editor's interface (read: the editor is loaded within a browser window)
  2. Bots that make automated edits by POSTing directly to the edit form will be tagged as having been made with the 2010 wikitext editor because they are ostensibly making edits directly with the 2010 wikitext editor's interface. Note: this is case "4." T249038#7433589.
  3. While important to consider, the second and third cases/states you mentioned in T249038#7433589 are not necessarily applicable in this context because here we are concerned with tagging edits people make and publish with the 2010 wikitext editor and these two cases deal with scenarios where people would not have had an opportunity to use the 2010 wikitext editor.

i. Cases:

  1. 2010 editor wasn't used because the editor had JavaScript disabled
  2. 2010 editor wasn't used because the editor has it disabled by preference

(I'm assigning this task over to @DLynch to address the questions in T249038#7491317)

The new "change tag" referenced in "1." should be named 2010 wikitext editor

Note that the tags are currently based on the software names, not the names that we (the WMF) have used to help disambiguate the editors; so what we call the 2017 editor is actually tagged as visualeditor-wikitext. The extension that we call the 2010 editor is wikieditor so that is what the tag is called in the current version of the patch. It would probably be odd to call the tag 2010 wikitext editor as nowhere in the WikiEditor extension is there any reference to this name.

When the tags are imported into Superset, they can be renamed to their WMF names (as happened with the 2017WTE label).
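The renaming amounts to a lookup from the software-based tag name to the display label used in Superset, as in the Python sketch below. The `visualeditor-wikitext` and `wikieditor` tag names come from this discussion, and the labels come from the task description. The `visualeditor` and `visualeditor-switched` tag names are assumptions, and the sketch is not taken from the dashboard configuration. Unknown tags fall back to Other, in line with the note in the task description.

```python
from typing import Dict

# Software-based change tag name -> label shown in Superset.
# "visualeditor" and "visualeditor-switched" are assumed tag names.
SUPERSET_LABELS: Dict[str, str] = {
    "visualeditor": "VisualEditor",
    "visualeditor-wikitext": "2017 wikitext editor",
    "visualeditor-switched": "Switched from VisualEditor to wikitext editor",
    "wikieditor": "2010 wikitext editor",
}


def superset_label(tag: str) -> str:
    """Map a change tag to its Superset display label; unknown tags become 'Other'."""
    return SUPERSET_LABELS.get(tag, "Other")


print(superset_label("wikieditor"))   # 2010 wikitext editor
print(superset_label("mobile edit"))  # Other
```

Keeping the tag itself named after the software (wikieditor) and applying the WMF name only at display time means the label can change later without rewriting stored tags.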

  1. As the patch is currently written, all bots that make automated edits will NOT be tagged as having been made with the 2010 wikitext editor

Yes, assuming those bots are just making direct API calls to do their edits.

  2. Bots that make automated edits by POSTing directly to the edit form will be tagged as having been made with the 2010 wikitext editor

Maybe. If they're literally just doing a POST to the appropriate URL, they'll only be tagged if they deliberately include the form parameter that triggers the tagging. (Which they could.) If they're actually loading the edit page in a JS-executable environment and then submitting it then they would get tagged.

  3. While important to consider, the second and third cases/states you mentioned in T249038#7433589 are not necessarily applicable in this context

Probably, yes. I just wanted to call out the possible scenarios.

The new "change tag" referenced in "1." should be named 2010 wikitext editor

Note that the tags are currently based on the software names, not the names that we (the WMF) have used to help disambiguate the editors; so what we call the 2017 editor is actually tagged as visualeditor-wikitext.

This is information I did not know until now; thank you for sharing it, @Esanders.

The extension that we call the 2010 editor is wikieditor so that is what the tag is called in the current version of the patch.

Understood. Keeping the tag called wikieditor sounds good to me; I've updated the requirements in the task description to reflect this.

When the tags are imported into Superset, they can be renamed to their WMF names (as happened with the 2017WTE label).

Noted. I've added this step as a requirement to the task description's ===Requirements section.

  1. As the patch is currently written, all bots that make automated edits will NOT be tagged as having been made with the 2010 wikitext editor

Yes, assuming those bots are just making direct API calls to do their edits.

Noted.

  2. Bots that make automated edits by POSTing directly to the edit form will be tagged as having been made with the 2010 wikitext editor

Maybe. If they're literally just doing a POST to the appropriate URL, they'll only be tagged if they deliberately include the form parameter that triggers the tagging. (Which they could.)

@DLynch: this is helpful context. I've added the cases you're describing above to the task description's newly-created ===Considerations section. Can you please review that section to make sure it is accurate?

If they're actually loading the edit page in a JS-executable environment and then submitting it then they would get tagged.

Understood. I've also added this note to the task description's newly-created ===Considerations section.

  3. While important to consider, the second and third cases/states you mentioned in T249038#7433589 are not necessarily applicable in this context

Probably, yes. I just wanted to call out the possible scenarios.

Change 587228 merged by jenkins-bot:

[mediawiki/extensions/WikiEditor@master] Tag WikiEditor edits with a hidden tag

https://gerrit.wikimedia.org/r/587228

The edit I published (using the 2010 wikitext editor) at 15:39 shows up in the recent changes filtered by the wikieditor tag:

[Screenshot 2021-12-23 at 16.40.36.png]

The later edit at 15:41 does not show up in the recent changes filtered by wikieditor:

[Screenshot 2021-12-23 at 16.49.48.png]

because I disabled the Enable the editing toolbar option before making that edit:

[Screenshot 2021-12-23 at 16.42.18.png]

The tagging has resulted in slow queries on enwiki (T298225), and querying for the 'wikieditor' tag via Special:RecentChanges or otherwise is disabled for the moment (it will always return 0 results). The tags are still being added, and can be queried in replica databases. We'll solve this better in January.
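For querying the replicas directly, the tag is reached through MediaWiki's normalized change tag tables. `change_tag_def` holds tag names (`ctd_id`, `ctd_name`), and `change_tag` links them to recent changes and revisions (`ct_rc_id`, `ct_rev_id`, `ct_tag_id`). The Python sketch below runs the join against an in-memory SQLite stand-in with made-up rows and a reduced set of columns. On a real replica, the same SQL would need date or ID limits, and the schema should be checked against the deployed MediaWiki version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE change_tag_def (ctd_id INTEGER PRIMARY KEY, ctd_name TEXT UNIQUE);
CREATE TABLE change_tag (
    ct_id INTEGER PRIMARY KEY,
    ct_rc_id INTEGER,
    ct_rev_id INTEGER,
    ct_tag_id INTEGER REFERENCES change_tag_def (ctd_id)
);
-- Made-up sample data for illustration only.
INSERT INTO change_tag_def VALUES (1, 'wikieditor'), (2, 'visualeditor');
INSERT INTO change_tag (ct_rc_id, ct_rev_id, ct_tag_id) VALUES
    (10, 100, 1), (11, 101, 2), (12, 102, 1);
""")

# Revisions carrying the wikieditor tag, resolved through the tag definition.
rev_ids = [
    row[0]
    for row in conn.execute(
        """SELECT ct_rev_id FROM change_tag
           JOIN change_tag_def ON ct_tag_id = ctd_id
           WHERE ctd_name = ?
           ORDER BY ct_rev_id""",
        ("wikieditor",),
    )
]
print(rev_ids)  # [100, 102]
```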