Accept more coordinate formats (e.g. 5°30') in VisualEditor map dialog
Open, Needs Triage, Public

Description

I would like to use all common formats in the dialog:

  • 51.5432
  • 51.5432°
  • 52° 30.45
  • 52° 30.45'
  • 53° 30' 30.78
  • 53° 30' 30.78"

Including:

  • Taking care of localization/internationalization (decimal point, comma etc., possibly even other number symbols)
  • Optional whitespace
  • With different possible Unicode characters, especially for the ' and " characters
  • With or without the trailing unit symbol (compare the paired examples above)
  • Relevant for deciding which formats need to be supported: https://en.wikipedia.org/wiki/Wikipedia:Obtaining_geographic_coordinates
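
For illustration only, a lenient parser for the formats listed above could be sketched in Python as follows. The function name `parse_coordinate`, the symbol sets, and the validation rules are assumptions made for this sketch, not code from Kartographer, UploadWizard, or Wikidata. Non-ASCII digits, hemisphere letters, and latitude/longitude range checks are deliberately left out.

```python
import re

# Illustrative, non-exhaustive sets of characters seen for each unit symbol.
DEG = "°º˚"
MIN = "'′’ʹ"
SEC = "\"″”ʺ"
NUM = r"([0-9]+(?:[.,][0-9]+)?)"  # accepts a decimal point or a decimal comma


def _sep(symbols):
    # Boundary between two components: the unit symbol (spaces optional)
    # or plain whitespace. Requiring one avoids splitting "1234" into parts.
    return rf"(?:\s*[{symbols}]\s*|\s+)"


COORD = re.compile(
    rf"^\s*([+\-−]?)\s*{NUM}"
    rf"(?:{_sep(DEG)}{NUM}"
    rf"(?:{_sep(MIN)}{NUM}\s*(?:[{SEC}]|[{MIN}]{{2}})?|\s*[{MIN}]?)"
    rf"|\s*[{DEG}]?)\s*$"
)


def parse_coordinate(text):
    """Return decimal degrees, or None if the text is not recognised."""
    m = COORD.match(text)
    if m is None:
        return None
    sign, *parts = m.groups()
    values = [float(p.replace(",", ".")) for p in parts if p is not None]
    # Only the last component may carry a fraction,
    # and minutes and seconds must stay below 60.
    if any(v != int(v) for v in values[:-1]) or any(v >= 60 for v in values[1:]):
        return None
    degrees = sum(v / 60 ** i for i, v in enumerate(values))
    return -degrees if sign in ("-", "−") else degrees


for sample in ["51.5432", "51,5432°", "52° 30.45'", "53°30′30.78″", "1234'"]:
    print(repr(sample), "->", parse_coordinate(sample))
```

Rejecting inputs such as `52.5° 30'` (a fraction before the last component) is one possible design choice; a frontend with a live-preview could arguably be more forgiving.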

Out of scope:

  • Precision adjustment

Open design questions

  • How to display the normalized output (which will be visible in the wikitext)?
  • How to display errors, and what should the error text be for malformed inputs that are still not accepted?
  • Switch to one input field?

Event Timeline

If you implement this, please put the code in MediaWiki core (or Kartographer), so that it can also be used in UploadWizard.

We pulled this into our current WMDE-TechWish-Sprint-2022-02-02 to move it forward, focusing purely on a technical investigation. For example:

  • Can we reuse existing code from Wikidata?
  • Can we find other FOSS libraries that do something similar? Maybe not to pull it in as a dependency, but at least to learn from it.
  • Should it be a service, or is it fine to do this exclusively in the user's browser?

Change 761323 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Kartographer@master] [WIP] Accept more coordinate formats in VisualEditor dialog

https://gerrit.wikimedia.org/r/761323

Change 762834 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/UploadWizard@master] Several improvements to coordinate parsing code

https://gerrit.wikimedia.org/r/762834

We just decided that we want to avoid "magic" normalization happening in the frontend. Instead, the wikitext part should be as relaxed as possible, and Maps (Kartographer) should normalize in the backend. See the subticket created.

While I understand the idea, it leaves a series of unsolved issues behind:

  • I assume the word "backend" refers to making e.g. <mapframe latitude="50° 30'" … /> work. There is already a ticket for this, see T129713: Allow human-readable coordinates input.
  • I would like to point out that it's still the same "magic" with the same issues, no matter where it happens.
  • How should the user paste 50° 30' 10" into latitude="…" when both the single as well as the double quote character would create a conflict?
  • How can we make the map dialog in VisualEditor understand these formats and show the position on the map as a live-preview? This would require either some kind of reusable parse API, or duplicating the parser in the frontend (i.e. merging https://gerrit.wikimedia.org/r/761323).
  • From a UX perspective, a parser in the frontend can be much more relaxed because the live-preview allows the user to immediately see the result. A parser in the backend should not do this, but behave a lot closer to the proven "garbage in, garbage out" rule.
  • A change to the behavior of the <mapframe>/<maplink> tags is effectively a change to what is allowed (and persisted) in the wikitext. This cannot be changed later without potentially breaking existing usages, and must be a lot more robust because of this. Questions like "should weird stuff like 50 30.5 10.5' be allowed?" or "what if something with an N is wrongly posted as longitude?" become critical right away. To be safe, the backend would need to reject everything that's even vaguely ambiguous, dramatically reducing the usefulness of the feature. The frontend could do a lot more guesswork.
  • Following this argument, it seems both are needed: the backend can accept a few very well-defined formats. The frontend can accept a lot more, parse the input, and re-format it into one of the formats the backend accepts. But if we go this route, why not simply make the frontend re-format everything into the same decimal format?
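
The quoting conflict raised in the list above can be made concrete with a small Python sketch. It only shows what happens to the attribute text; whether the tag handler would decode the escaped entities back into quote characters is an assumption this sketch does not verify, and the other <mapframe> attributes are omitted.

```python
import html

raw = "50° 30' 10\""  # what a user would copy from a reference

# Naive interpolation: the value's own double quote ends the attribute early,
# leaving an odd number of double quotes in the tag.
broken = f'<mapframe latitude="{raw}" />'

# Escaping both quote characters keeps the attribute well-formed,
# but produces something nobody would type by hand.
escaped = f'<mapframe latitude="{html.escape(raw, quote=True)}" />'

print(broken)
print(escaped)
```

Switching the attribute to single quotes does not help either, because the minute symbol then collides instead.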

I suggest we discuss this again in story time.

I suggest we discuss this again in story time.

Sounds good! Seems like there is a lot here to go through.

I would like to point out that it's still the same "magic" with the same issues, no matter where it happens.

The difference between these types of magic is significant, though. What we realized during the last conversation was that backend magic can be completely transparent to the user. If the best reference for an entity's coordinates provides the deg/arcmin/arcsec format and an editor copy-pastes it into the wikitext, then the coordinates keep that form even if Kartographer normalizes them internally. The next editor can easily verify the coordinate because it's in the same format as the reference. I believe the frontend magic we've been discussing would store a normalized coordinate (decimal lat/lon), which differs from what the user entered. Doing anything else here (like normalizing in the frontend but storing the non-normalized input in the source) is possible, but would add a lot of complexity that we haven't considered yet.

Backend magic is more consistent with how content is usually treated. For example, here's a template for weight, which accepts various archaic units and shows metric equivalents.

I like your suggestion that the frontend could do more guesswork, but maybe you can give a concrete example?

I tried to explain it a bit better below, mostly for future reference and interested parties to read. Discussion should probably continue in story time.

The magic code is the same, no matter where it lives. The difference is whether the user can see parts of the magic happening, and influence it if necessary.

An extreme example is when someone inputs "01/02/2022" as a date. Being "completely transparent", as I understand it, means we store this as is. This allows us to proceed without the need to make a (possibly wrong) decision: is this January 2nd or February 1st? Problem: When do we make this decision? Based on what (additional) information? Can we leave the system in an undecided state for a while? Is it ok to make a quick and possibly wrong decision right now, and possibly change it later? It doesn't look like this would work for the use case we are talking about here. The backend can only accept unambiguous formats. Otherwise we would store content that can't be consumed by 3rd parties without using the same code as we do (which is a problem we actually have, but don't want to make worse).
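
To make the date example concrete, here is a minimal Python sketch using only the standard library. It shows that the stored string alone cannot settle the question: both readings parse without error, so the decision has to come from somewhere else. The variable names are chosen for this illustration.

```python
from datetime import datetime

text = "01/02/2022"

# Both interpretations are syntactically valid for the same input.
day_first = datetime.strptime(text, "%d/%m/%Y").date()
month_first = datetime.strptime(text, "%m/%d/%Y").date()

print("day first:  ", day_first.isoformat())
print("month first:", month_first.isoformat())
print("ambiguous:  ", day_first != month_first)
```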

Doing what's ultimately guesswork as a separate step in a UI with a live-preview (as seen e.g. on Wikidata) puts that last, critical decision into the hands of the user, who (hopefully) has additional information. Bonus points for recording that additional information. That's what Wikidata's qualifiers and references do.

Templates are not "backend", as far as I'm concerned.

Change 762834 merged by jenkins-bot:

[mediawiki/extensions/UploadWizard@master] Several improvements to coordinate parsing code

https://gerrit.wikimedia.org/r/762834