Wikipedia:Wikidata/2018 State of affairs

This page is intended as preparation for a sitewide RfC about the role of Wikidata on enwiki. Before an RfC can be held, it seems like a good idea to list a few things here. The section on "uses" should be pretty straightforward; the sections on benefits and disadvantages should be somewhat factual (no "I love it" or "I hate it"), but not necessarily 100% objective ("Wikidata is easier to edit / harder to edit" can both be valid points of view). Please don't remove entries from either section unless they are patently unhelpful or untrue. If necessary, you can always use Wikipedia talk:Wikidata/2018 State of affairs ;-)

Current uses of Wikidata on enwiki

Mainspace

Data use

  • Lists: Some lists were generated by ListeriaBot which draws data from Wikidata (e.g. List of female Egyptologists), and all changes to the lists made in enwiki were overwritten when the bot ran. Wikidata data control has been disabled. Technical support in Wikipedia for list generation is tracked at phab:T67626 (the designs are not final so it is unknown whether this support will have the same flaw as ListeriaBot).
  • Subheads or mini-leads:
    • In mobile view, the Wikidata description is inserted as an italicised subhead (e.g. London in the mobile view of German Wikipedia has the subhead "Hauptstadt des Vereinigten Königreichs" (Capital of the United Kingdom)). It isn't visible to desktop users, and if one wishes to edit it, it is not obvious that it comes from Wikidata. This has suffered from vandalism: for example, in mobile view Social history of viruses used to be shown with the subhead "essay writing for deadly diseases". Editing of descriptions from mobile is tracked at phab:T90765. This use of Wikidata was rejected by en-WP and does not appear on mobile web views of en-WP.
    • In the Android and iOS apps, the Wikidata description is also inserted as an italicised subhead.
  • Templates: Some templates and their fields display data from Wikidata. Sometimes this data must be explicitly requested in the article (this is called "opt-in") and sometimes this data is populated automatically, subject to a Wikipedia override (this is described as "opt-out" data).
  • Inter-language links: Inter-language links are provided through Wikidata but can be overridden using local links
  • Navigation: Wikidata descriptions are used widely as a brief description of the contents in search results.
  • Related pages: Wikidata descriptions are used for the suggested links at the bottom of articles in mobile web and the apps.

Wikidata categories

Many articles have one or more hidden tracking categories generated by the templates used in them. These categories usually fall into the following groups:

  • A value differs between Wikidata and Wikipedia: indicates Wikipedia or Wikidata pages which may need to be updated.
  • A value is the same between Wikidata and Wikipedia: indicates Wikipedia pages which could have data removed in favor of Wikidata, or which could be left alone as they work as is.
  • No value is present in Wikipedia: indicates Wikipedia pages which do not have data.
  • No value is present in Wikidata: indicates Wikidata pages which could be updated based on the Wikipedia data.
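
The four groups above amount to a simple comparison between a locally supplied value and the value held on Wikidata. The following is a minimal Python sketch of that comparison, not the code the templates actually use; the function name `tracking_category` and the choice to return nothing when both sides are empty are assumptions made for illustration.

```python
from typing import Optional


def tracking_category(local: Optional[str], wikidata: Optional[str]) -> Optional[str]:
    """Pick one of the four tracking groups for a single field.

    `local` is the value written in the Wikipedia article (None if absent);
    `wikidata` is the value held on Wikidata (None if absent).
    Returning None when both are absent is an assumption of this sketch.
    """
    if local is None and wikidata is None:
        return None
    if local is None:
        return "No value is present in Wikipedia"
    if wikidata is None:
        return "No value is present in Wikidata"
    # Compare after trimming whitespace; real templates may normalise differently.
    if local.strip() == wikidata.strip():
        return "A value is the same between Wikidata and Wikipedia"
    return "A value differs between Wikidata and Wikipedia"
```

For instance, `tracking_category("1872", "1873")` falls into the "differs" group, which is the case most likely to need a human check on one project or the other.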

Links to a Wikidata entry

Instead of bluelinks/redlinks, some articles link some terms through an interwiki link to Wikidata, which is slightly paler blue than a true bluelink.

Module WikidataIB: Values fetched from Wikidata using Module:WikidataIB where there is no corresponding article on the English Wikipedia have a link to the Wikidata entry with a marker and tooltip, rather than a redlink.
At the end of a field there can be an "edit icon" with a tooltip and a link to the corresponding statement in the subject's entry at Wikidata. This can be disabled like this:
The documentation for Module:WikidataIB explains the basics, but is not yet updated with the |noicon= and |onlysourced= parameters.

Filtering returned values from Wikidata

If you paste {{#invoke:Sandbox/RexxS/WdRefs|seeRefs}} into any section of an article and preview it, you'll see all of the statements held on Wikidata for that article, along with the references (if any) for each statement. As of 2017, about 50% of the statements are unreferenced, 25% are referenced to Wikipedia, and 25% are referenced to other sources (chart). However, the Wikidata item itself frequently contains links to external sources that might be used to verify the information.

Module:WikidataIB allows an article to set a filter which rejects any values not sourced to something better than "Wikipedia".

At present, a value is rejected if it has no source or is sourced only to "Imported from ... Wikipedia". Other kinds of sourcing could be added to the rejection list on request, although for rare cases, using a local value is easier.
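
The filtering rule described above can be sketched in a few lines. This is an illustrative Python model only, not the Lua code of Module:WikidataIB; the `Statement` class, the helper names, and the sample reference strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    """Hypothetical stand-in for one Wikidata statement and its references."""
    value: str
    references: List[str] = field(default_factory=list)


def is_wikipedia_import(reference: str) -> bool:
    # Assumption: a Wikipedia-only reference reads like "Imported from ... Wikipedia".
    text = reference.lower()
    return text.startswith("imported from") and "wikipedia" in text


def only_sourced(statements: List[Statement]) -> List[Statement]:
    """Keep statements with at least one reference better than a Wikipedia import.

    Unreferenced statements are dropped too, because any() over an empty
    reference list is False.
    """
    return [
        s for s in statements
        if any(not is_wikipedia_import(r) for r in s.references)
    ]
```

Under this sketch, a statement whose only reference is a Wikipedia import is treated exactly like an unreferenced one, which matches the two rejection cases named above.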

Other namespaces

Visibility of a Wikidata change

If a Wikidata item is used in a Wikipedia page in some way, either directly or via a transcluded template, then changing it will change the Wikipedia article. An editor who has their watchlist set to show Wikidata edits will see the change appear on their watchlist, except that:

  • If the Wikidata item is used in a transcluded template and not in the article, then the change will not be visible unless the template is also watched.
  • If the last edit to the Wikidata item was minor, then it will not appear in the watchlist; instead the previous edit appears in the watchlist.

The article history will not show edits to Wikidata that affect the article.

Perceived benefits of using Wikidata on enwiki

  • Easier to connect enwiki pages to correlated pages on other projects.
    • I've been using it to connect copyright license templates on Commons to enwiki's, making copyright cleanup easier.
  • It is easier to train new users on Wikidata
    • No knowledge of wiki-syntax is necessary
    • Citations are more robustly added
    • There is more "auto-fill"
  • Where implemented in templates, it can reduce the size of a page's wikitext and lessen the need to understand templates and their parameters. An example is the use of {{authority control}} to display data with a parameter-less template.
  • A newly created article could use a Wikidata-aware infobox to help provide an overview of the key facts available on that subject in other Wikipedias at a glance.
  • Whatever new fact we add to en-wp can be made available to all the other Wikipedias and vice-versa.
  • Potential for creating more dedicated, focused and more easily understood edit interfaces for data.
  • Using data for visualisations, such as this graph of the number of painting items per 10-year bucket on Wikidata. This can be used for all sorts of Wikiprojects and checks for various lists.
    • We are almost at 200,000 painting items on Wikidata, but English Wikipedia has been illustrated with way more than that. Hopefully they will all be added to Wikidata soon. Meanwhile, subsets of paintings can also be shown per painter or collection, so e.g. of all the paintings by Rachel Ruysch currently on Wikidata with location information, we can show them on a world map here.
    • In cycling, the algorithm Cycling race works in around 25 different Wikipedias, always with the same calls, and has nine functions for infoboxes and tables. A sourcing requirement was introduced when the Wikidata Cycling project was created.
  • Lessens the need to manually maintain interwiki links and helps prevent false or broken links. An example is {{Wikisource author}}, which checks whether the link actually exists; when a target page is moved at enWP or at enWS, pages automatically sync to the new locations.
  • Data that are used across many articles, such as population data or current representatives / ruling parties etc can be updated en masse in far fewer steps using Wikidata, negating the need to update each Wikipedia article individually.
  • It makes editing article text easier as complex template calls (e.g. Infoboxes) that normally occupy the first screen of wikitext, with article text hidden below, are moved out of the wikitext and into wikidata.
  • It's easier for bots to enter data into Wikidata than to write it directly into Wikipedia. This makes it easier for various stakeholders to interact with Wikimedia. A representative of the U.S. Census Bureau noted that they are interested in contributing data directly to Wikidata. With StrepHit and WikiFactMine we have automatic text-mining that can bring data into Wikimedia in a new way.
  • By cross-checking and merging info with Wikidata, we can identify and fix anomalies in that info (e.g., incorrect or vandalised birth or death dates).

Perceived disadvantages

  • Wikidata does not have policies equivalent to WP:V and WP:BLP. To the extent that content in Wikidata does not comply with Wikipedia's policies, it cannot be used in Wikipedia. See 2018 RfC close.
  • Very significant differences in both editing and community standards between projects. It is difficult enough to follow enwiki behavioural and editorial expectations. Given the sometimes-fractious interactions between enwiki and Commons, there is plenty of reason for enwiki editors to be very hesitant to start editing (and in particular removing) data on Wikidata.
  • Changes there influence content here, but don't appear in any meaningful way on our watchlist (when the "Wikidata" box is checked) and not at all in the enwiki page history.
  • Wikidata is not as inviting to untrained new users as Wikipedia
    • Distracts from enwiki's efforts to be welcoming
    • Conceptually, any database is more difficult to understand or describe than an encyclopedia
    • Not much culture of conversation between enwp and Wikidata communities, as compared to for example enwp and Commons
  • Wikidata is built for software, not for people. Wikidata's "structured data" fields can make it difficult or impossible to enter information. If reliable sources say someone was "born in 1872 or 1873", or if a time-lapse image has a date of "March 1, 2016 - July 1, 2016", Wikidata hides the save button when you try to enter that information. Wikidata makes it impossible to save, and the user is left confused with no visible way to proceed.
  • Fear of something new
    • Its interfaces are in development and subject to change, so knowledge gained about using it could become obsolete
    • Wikidata's limitations are not well documented in any popular layperson summaries
  • Patrolling on enwiki is not integrated with patrolling on Wikidata
    • Many people understand and trust enwiki's quality review process
    • Many people doubt Wikidata's quality review process
  • Fear of corruption
    • Data management is a major corporate sector - should we be cautious moving in this space to keep it community managed?
  • Wikidata content is of variable reliability. By its nature (scooping data from multiple wiki-projects) it cannot be treated as a reliable source, given the differing standards across all wiki-projects (and Wikidata's own requirements).
  • Combined with the above points, this means that articles' infoboxes can change to include invalid data, sourced solely to Wikipedia, and the only way to know about it is to physically look at the page; and since it's not "vandalism", no Wikidata recent changes patroller will ever stop it.
  • Parts of an article need editing here, parts of an article need editing there, in a completely different editing environment
  • Different notability and sourcing standards
  • Added maintenance to keep things the same on both environments
  • Layout generated on Wikidata may violate our guidelines (see e.g. wikiproject cycling discussion)
  • Increased complexity for users trying to correct an error in the article; no realistic way to do so without going to another project, if the information to be edited is not actually on enwiki but is instead represented only by a "property". Particularly confusing in infoboxes where Wikidata is used to fill parameters.
  • Increased complexity for new users, particularly if they wind up on another interface.
  • Concerns of circular sourcing. Wikidata's indiscriminate bot sourcing from Wikipedia, from other unreliable sources, and mass import-export with other databases gives false or unreliable information a false appearance of authority. It can become difficult or impossible to trace the origin of claims.
  • Authority Control: in theory a good thing, we get a lot of links to authoritative sites which we didn't have before. Apart from the name (no "authority control" is being done, but the name seems to suggest that a check with or by these sites has been done), the template often introduces links which we would not allow in External links or Further reading on that same article, as they offer nothing additional and have little to no extra information. I'll list some examples at the talk page here.
  • We can block links from being added to enwiki (through the blacklist); what if these links are inserted from Wikidata (in infoboxes or so)?
  • We can block or note certain kinds of edits through edit filters, but not when the same kinds of edits are being made in Wikidata and then automatically displayed here.
  • Edits via Wikidata bypass page protection, and allow blocked users to evade their block.
    • If we can't stop unwanted edits or editors, we have a problem, and we open up a new way for vandals, spammers, POV fighters, ... to sneakily change enwiki
  • A lot of added visual clutter in e.g. infoboxes (see George Auriol for an example)
  • Wikidata is unstable and its governance is immature, and the en-wiki community puts articles at risk when it commits to using any given WD field in an article. For example, consider field definitions and the use of bots. Wikidata field definitions can be changed by any editor, without discussion (example), and as far as I have seen, there are no processes for controlling that or even discussing it in any centralized place where database design issues are considered. In addition, users at Wikidata run bots that sweep data from sources into targeted fields in Wikidata, and while there is a process to authorize such runs, the discussions I have seen are cursory and rubberstamp-like. (An example of the results of such a bot run was discussed at WT:MED, here: entries from a database that records in vitro activity of chemicals were swept into a "drugs used to treat" field in WD entries about diseases, leading to nonsense.) Each of those two things is bad enough, but consider them together. If the en-wiki community agrees to accept field X being used in certain infoboxes, that field definition can be changed and entirely new (and incorrect) data can be swept into that field by bots throughout Wikidata, and then pass through to many WP articles.

Prior and current discussions about the use of Wikidata

Interwikilinking

General

Topical

Article-specific

Talk:List of women linguists

Template-specific
