Talk:Community Wishlist Survey 2022/Generate Audio for IPA

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by TheDJ (talk | contribs) at 17:02, 17 September 2023 (→‎Please scrape this "Phonos" immediately: Reply). It may differ significantly from the current version.

Project Announcement and Feedback

Contributors who engaged with this Wish's proposal

Rollo Rosewood Akathelollipopman Eptalon Noé Xavier Dengra Akathelollipopman Noé Pigsonthewing Ainali Modest Genius Pigsonthewing 1234qwer1234qwer4 Nachtbold Xaosflux Femkemilene Wskent Bischnu Akathelollipopman Vis M Yodin Matě MrMeAndMrMe UV Daud I.F. Argana Huji Sdkb Ottawajin Lectrician1 Tmv Tranhaian130809 Celerias Meiræ Spiros71 NguoiDungKhongDinhDanh Javiermes Aca Dexxor Ed6767 Lollipoplollipoplollipop Omnilaika02 ToBeFree


Thank you for all of your feedback and for engaging with the original proposal for this wish. I wanted to make you aware that we have begun our work on this wish and, if your capacity allows, we would love any input you have on our Open Questions as well as our initial investigations into the engines.

Here's a corpus of IPA audio we have tested. Please let us know if you have any words you would like to test in this testing corpus. We will work on adding those words to our corpus!
Here's a technical investigation of the IPA options and the languages supported by each option.


Thanks again for engaging with this impactful wish and for participating on the wishlist.
Best, NRodriguez (WMF) (talk) 18:01, 20 May 2022 (UTC)Reply

Contributors who engaged with this Wish's proposal

Nw520 Pelagic Wostr Gusfriend Ali Imran Awan TheInternetGnome Minorax Man77 NightWolf1223 HynekJanac L235 Libcub Teratix Penalba2000 JAn Dudí Lrkrol Sadads Bencemac Mbkv717 Stwalkerster Dave Braunschweig Trey314159 Labdajiwa Thingofme Pppery Hià Paradise Chronicle Serg! Camillu87 Geertivp Amorymeltzer Aimwin66166 Rotavdrag Paucabot WikiAviator Daniel Case Wutsje Ninepointturn Bilorv Pi.1415926535 DarwIn Feoffer Tomastvivlaren Kpjas SD0001 Lambsbridge Paul2520 Waldyrious Bestoernesto Michael Barera Vulphere Ericliu1912 Emaus KnowledgeablePersona Beta16 Bodhisattwa Pbsouthwood DaxServer Cybularny Quiddity Sunpriat Gaurav Jl sg Evrifaessa Valerio Bozzolan Brainulator9

NRodriguez (WMF) (talk) 18:08, 20 May 2022 (UTC)Reply

Open Questions

Can you help us build out the corpus of IPA words we will use to test the different libraries?

  • Have any tonal languages been included? I don’t think I see Swedish or any Chinese language, for example, but maybe there are some tonal languages in the corpus that I don’t recognize. Also, does the current corpus include unusual consonants or vowels? I have tested eSpeak myself and know that it cannot handle Cantonese (it cannot pronounce the syllabic m; I tried to figure out how to fix it but there’s really no documentation). Al12si (talk) 14:44, 12 November 2022 (UTC)Reply

Do you know of any open source libraries that we should consider while we investigate our options?

Do you see any risks to introducing the video files inside the reader experiences?

  • "Video"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:21, 27 May 2022 (UTC)Reply
    I believe this is regarding the software extension used to play media files. There's a specific task for making the player display in a desirable way, at phab:T122901 (versus the full audio-player as currently used at d:shibboleth, or the icon+"listen" links as used at w:Shibboleth).
    The only risk I see is making sure the design is good: I.e. everyone (incl. screenreaders?) can access the audio-clip without leaving the page, but also still have access to the file/license info if desired. (@TheDJ:FYI) HTH. Quiddity (talk) 17:27, 27 May 2022 (UTC)Reply
  • I think the main issue with this feature is that it could present a false standard accent, making English projects sound more USA-centred, French projects more France-centred, Spanish projects more Madrid-centred, and so on. A written transcription can be prototypical, with approximate sounds for each consonant and vowel; audio can't. Audio fixes one version, with subtle traits such as vowel length, height and openness, pitch, and others. There is no generic or neutral pronunciation. One way to deal with this issue may be to offer several audio clips for each IPA transcription, with regional distinctions. Combined with a preference that lets users hear their own local variety first, this could be interesting and less oppressive. Anyway, I am interested in this feature and I really hope you will make your UX tests public -- Noé (talk) 15:55, 7 November 2022 (UTC)Reply

Let us know any other thoughts you may have on the initial problem statement...

The Wikivoyages have phrasebooks. They don't use IPA – see voy:en:Wikivoyage:Phrasebook article template#Pronunciation guide for the English version; the other languages are similar – but it might be a useful source of words, and it's possible that getting IPA-based audio would encourage people to add IPA there. In the past, we've talked about both the value of IPA to some readers and need for audio (specifically, being able to hear the IPA without loading another page or covering up the text you're reading). Whatamidoing (WMF) (talk) 18:15, 30 May 2022 (UTC)Reply

Google Cloud dependency?

Is it the case that this feature is dependent on closed-source software in the Google Cloud, or is it independent and self-hosted? HLHJ (talk) 16:56, 15 October 2022 (UTC)Reply

Currently, yes. The open source solutions we found only supported a handful of languages, and didn't sound remotely as accurate as Google's TTS service. Rest assured this is all done through the backend, and even then through a proxy, so no user data ever gets to Google. Longer-term we hope to switch back to open source once language support and quality are good enough. That is being tracked at phab:T317274. MusikAnimal (WMF) (talk) 03:13, 17 November 2022 (UTC)Reply

Schedule

@MusikAnimal (WMF) and @Whatamidoing (WMF) and @NRodriguez (WMF), can you please fill in/update Community Wishlist Survey 2022/Generate Audio for IPA#Release timeline? —TheDJ (talk • contribs) 12:35, 23 November 2022 (UTC)Reply

@TheDJ: I've made a start and will do some poking ~TheresNoTime-WMF (talk) 20:43, 23 November 2022 (UTC)Reply

Am I missing something?

Am I misunderstanding something? I have just tried this in my af.wiktionary sandbox and the markup:

<phonos ipa="ˈbɜːrmɪŋəm" text="test" lang="en-GB" />

is pronounced as "test"

Both of these alternatives:

<phonos ipa="'bɜːrmɪŋəm" text="" lang="en-GB" />
<phonos ipa="'bɜːrmɪŋəm" lang="en-GB" />

generate an error: "The generated audio appears to be empty. The given IPA may be invalid, or is not supported by the engine. Using the 'text' parameter may help.".

How can a user ensure that the IPA is parsed and pronounced? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:04, 2 February 2023 (UTC)Reply

@Pigsonthewing: the "text" parameter is not a label, but is the written word in the language that is specified in the lang= parameter. See also mw:Help:Extension:Phonos. What is the word you are trying to produce? I can try to show you an example. — xaosflux Talk 19:53, 2 February 2023 (UTC)Reply
@Pigsonthewing think I figured it out, see testwiki:Birmingham, is that what you were trying to achieve? — xaosflux Talk 20:02, 2 February 2023 (UTC)Reply
Thank you, but no. My point is that the template is not - apparently - parsing the IPA, but the value of the "text" parameter. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:10, 2 February 2023 (UTC)Reply
@Pigsonthewing I think the documentation needs a lot of work and opened phab:T328705 about it. — xaosflux Talk 20:48, 2 February 2023 (UTC)Reply

Could we have a response, here, please, from User:NRodriguez (WMF), User:Whatamidoing (WMF), User:MusikAnimal (WMF), User:TheresNoTime-WMF, or one of the other WMF folk working on this? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:26, 26 February 2023 (UTC)Reply

Try ˈbɜːmɪŋəm or (even though en-GB is based on a non-rhotic accent) ˈbɜːɹmɪŋəm. The list of accepted phonemes is here and <r> is not one of them. Nardog (talk) 01:25, 27 February 2023 (UTC)Reply
Accepted by whom? The IPA I quoted above was copied from en:Birmingham. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:51, 27 February 2023 (UTC)Reply
By Google's text-to-speech engine, which Phonos relies on. So the description of Phonos as IPA-to-audio is somewhat misleading—it's really text-to-speech that sometimes accepts IPA as a bonus. The Google TTS supports IPA as input for only a subset of all supported languages (18 out of 53 to be exact). It also accepts not IPA but Pinyin and Jyutping for Mandarin and Cantonese. I've been advocating for renaming ipa="" and making it optional and supporting other phoneme schemes (Pinyin, Jyutping, and X-SAMPA), but they haven't made it clear they're doing it, which is super weird because doing so allows them to support with no extra cost 35 more languages, which include the 2nd, 6th, 7th, 8th, 9th, and 10th most widely spoken languages. Nardog (talk) 13:42, 27 February 2023 (UTC)Reply
@Pigsonthewing: as Nardog mentions, the voice models provided by Google (our currently-selected text-to-speech engine) only support certain phonemes and as such will "fall back" to reading the text parameter if an unsupported phoneme is provided in the IPA.
Unfortunately, we don't know how Google's voice models are implemented, but the current standard seems to be VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech)[1] — as we skip the step of phonemization (converting text to phonemes) by directly supplying the phonemes in the IPA, we need to ensure we input only phonemes which the voice model has been trained on. Additionally, when a model is trained, we don't always know the exact use certain phonemes are assigned — ə for example, is often used in at least 3 conflicting ways.
When building a tool such as this, we are limited by both the international phonetic alphabet (something I had only recently learnt from an impromptu chat with computational linguist Dr. Angus Andrea Grieve-Smith can be considered to "fall short of the ideal consistent representation that was sold to people"[2]) and the publicly available voice models.
As an aside, I recently spoke to Alan Pope, on whom a fairly robust voice model has been trained[3] — his blog post on the matter is a wonderful read for anyone interested in this part of the process! Of note is his voice models' supported phonemes.
I hope this goes a little way to highlighting the complexity, and resultant limitations, of what we're trying to do and I'd be more than happy to answer any further questions you may have. — TheresNoTime-WMF (talk • they/them) 14:39, 27 February 2023 (UTC)Reply
P.S. Way out of scope here, but wouldn't it be awesome to train our own voice model using a dataset provided by LinguaLibre? — TheresNoTime-WMF (talk • they/them) 14:54, 27 February 2023 (UTC)Reply
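The fall-back behaviour described above can be sketched roughly as follows. This is a minimal illustration and not how Phonos or Google's engine is actually implemented: the allow-list `SUPPORTED_EN_GB`, the helper names, and the idea of checking symbols before sending a request are all assumptions made for the example. The only standard piece is the SSML `<phoneme alphabet="ipa" ph="...">` element, which is the usual way to hand IPA to a text-to-speech engine that accepts SSML.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical allow-list, for illustration only. The real list of accepted
# phonemes is engine- and language-specific and is not reproduced here.
SUPPORTED_EN_GB = set("ˈbmŋɜɪəɹː")


def unsupported_symbols(ipa, supported):
    """Return the symbols of `ipa` missing from `supported`, in order, without repeats."""
    missing = []
    for symbol in ipa:
        if symbol not in supported and symbol not in missing:
            missing.append(symbol)
    return missing


def build_request(ipa, text, supported):
    """Return ("ipa", ssml) if every symbol is supported, else ("text", text)."""
    if not unsupported_symbols(ipa, supported):
        # quoteattr() adds the surrounding quotes and escapes the attribute value.
        ssml = "<speak><phoneme alphabet=\"ipa\" ph=%s>%s</phoneme></speak>" % (
            quoteattr(ipa),
            escape(text),
        )
        return "ipa", ssml
    # Fall back to reading the written word instead of the transcription.
    return "text", text


print(unsupported_symbols("ˈbɜːrmɪŋəm", SUPPORTED_EN_GB))
print(build_request("ˈbɜːmɪŋəm", "Birmingham", SUPPORTED_EN_GB))
```

Under this sketch the transcription copied from en:Birmingham is rejected because of the plain r, and the ASCII apostrophe used as a stress mark in the second and third examples is rejected as well. With an empty text parameter the fall-back has nothing to read, which is consistent with the "audio appears to be empty" error reported above.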
Though it is a common misconception that the IPA is "the ideal consistent representation"—so common that my enwp user page dedicates a section to it—it was never sold as such by the IPA (the association) itself. It was already telling you to "leave out everything that can be explained once for all" in 1904!
Out of curiosity, can you tell me what the three conflicting ways in which Google uses ə are? It might simply be that they correctly understand what a phoneme is: an abstract category encompassing multiple sounds (aka phones) in complementary distribution. But if not, it has implications for template implementation when it's rolled out to major wikis. Nardog (talk) 15:56, 28 February 2023 (UTC)Reply
Maybe we should say the opposite, that Wikipedia doesn’t know what a phoneme is. The telling thing is that on Wikipedia most IPA is notated as phonetic, not phonemic. I have no idea who made this decision and why. Al12si (talk) 01:30, 23 March 2023 (UTC)Reply
Yes, ok, it's complex, but the Wish is called Generate Audio for IPA and the team claimed that they were working on that when attempting to cover the total failure of the Wishlist system some months ago. Theklan (talk) 21:42, 10 July 2023 (UTC)Reply

Not what was required

The proposal was for an IPA-to-audio renderer. It is apparent that what is being built is largely a plain-text-to-audio renderer. This is not what was requested, nor what is required. Rendering a text value will not allow anyone to know whether the IPA is correct, nor what the IPA is intended to sound like. It will not allow comparison of two different IPA representations of the same text lexeme. If an IPA-to-audio renderer is not possible, the request should have been - and indeed still should be - declined. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:06, 22 June 2023 (UTC)Reply

@Pigsonthewing You mentioned "It is apparent that what is being built is largely a plain-text-to-audio renderer." Is this bold conclusion based solely on the update posted today, 22 June 2023? Or is it based on something you have observed so far, including on the pilot wikis? Please let me know, so this can be cleared up.
This project is still about Generating Audio for IPA. ––– STei (WMF) (talk) 13:59, 22 June 2023 (UTC)Reply
Both today's update and the section above this one. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:31, 22 June 2023 (UTC)Reply
And also the current usage and examples. The deployment status is not about IPA rendering; it is about an inline player, which is another wish. Theklan (talk) 21:32, 10 July 2023 (UTC)Reply
That never made sense anyway. The vast majority of IPA transcriptions are phonemic or allophonic transcriptions, which are language-specific and convey only selective information about exact articulatory configurations, omitting specifics that are either predictable according to the phonology of the language or irrelevant to the discussion at hand (see Handbook of the IPA, pp. 29–30). That means speech synthesis that directly derives audio from symbols is not an option (I guess unless you painstakingly recreate all the omitted parts in input for the audio to accompany each simpler, more legible transcription). So the only way that's humanly possible is language-specific text-to-speech. And it so happens that the only kinds of text-to-speech that don't sound horrendous are machine-trained ones, which typically accept IPA as input for only a portion of the supported languages (Google's, which CommTech initially went by, supports it for less than a half of all supported languages).
Then there are competing conventions. As the Handbook (p. 30) points out, /iː/ and /ɪ/, /iː/ and /i/, and /i/ and /ɪ/ are all valid ways to represent the vowels in heed and hid that are all "in accord with the principles of the IPA". So you can't tell whether /i/ is supposed to sound like the vowel in heed or hid just by looking at it. That means, even if you know what language is being transcribed, you can never tell if the resultant audio is correct without knowing the underlying context and conventions.
The very premise of the CWS wish was an untenable one, which is why I didn't vote for it and I suspect why (AFAICS) nobody who is actually a frequent editor of IPA transcriptions did. But CommTech didn't know that when they began working on it. Nardog (talk) 16:45, 22 June 2023 (UTC)Reply
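The point about competing conventions can be made concrete with a small lookup. This is only a sketch of the argument: the three pairings are the ones cited from the Handbook in the comment above, while the labels "A", "B" and "C" and the function name are invented for the example.

```python
# Three ways of writing the vowels of "heed" and "hid", each described above
# as being in accord with the principles of the IPA. The labels are arbitrary.
CONVENTIONS = {
    "A": {"heed": "iː", "hid": "ɪ"},
    "B": {"heed": "iː", "hid": "i"},
    "C": {"heed": "i", "hid": "ɪ"},
}


def possible_words(symbol):
    """Map each convention label to the word whose vowel `symbol` denotes under it."""
    result = {}
    for label, mapping in CONVENTIONS.items():
        for word, written in mapping.items():
            if written == symbol:
                result[label] = word
    return result


print(possible_words("i"))
```

Looking up i returns hid under one convention and heed under another, so a renderer that sees only the symbol, without knowing which convention the transcriber followed, cannot tell which vowel to produce.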
"If an IPA-to-audio renderer is not possible, the request should have been - and indeed still should be - declined.". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:10, 22 June 2023 (UTC)Reply
You asked, as a reader, for a feature that made reading IPA redundant. You proposed automatic generation of audio from IPA, which is infeasible, as the means to accomplish it. That doesn't mean there aren't other means that can make reading IPA redundant for readers, like human editors manually inputting a prompt to generate audio, judging its quality, and adding it. Nardog (talk) 19:05, 22 June 2023 (UTC)Reply
I did not. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:26, 22 June 2023 (UTC)Reply
You didn't what? And whether my summary of your proposal is accurate or not, voters and CommTech certainly seem to have interpreted it that way. Nardog (talk) 01:59, 23 June 2023 (UTC)Reply
Sorry, but we are discussing here about a wish called "Generate Audio for IPA", which is not being done. Also, in the discussion we had this year about the lack of wishes fulfilled, the WMF team said that the "Generate Audio for IPA" was coming. Which is not. Theklan (talk) 21:34, 10 July 2023 (UTC)Reply
I've been watching this ad-nauseam over the last couple of months... and there are a few editors here who are disproportionally represented and attempting to influence what this feature should or should not be. I heavily advise inviting those who voted for the feature to give their opinion on what they want, with the information and experience that has been collected, as otherwise what has been built will likely not be accepted by those who asked for it.
Secondly, while personally I fear this is going to turn into a tool to fight the American vs British vs Canadian English wikiwars, I think it's important to realise that the general public probably won't care at all about IPA. It's my opinion that they only need a pronunciation and the whole IPA business can be removed from the lead as far as they are concerned. So even if we have gathered better feedback from more than the 4 people on this page, it is probably worth it to ask the general public what THEY want.
All in all, this seems a very good demonstration of why the Community Wishlist survey should be limited to smaller projects instead of these massive complicated projects that generally make it into the top 10 and why editors should not be doing product development. —TheDJ (talk • contribs) 13:42, 28 June 2023 (UTC)Reply
"the general public probably won't care at all about IPA": That's exactly why I advocated for making Phonos about generic text-to-speech rather than strictly about IPA-to-audio, which they turned down on the grounds that it was "not in the roadmap". It's alarming to me that they're still saying it's "about Generating Audio for IPA" despite the fact that, according to this page, the project is supposed to address readers' inability to read IPA markup, so generic TTS that supports more languages would clearly be a better solution. I hope they only mean that the CWS project is about IPA-to-audio and the Language team picks it up to make something that makes more sense. Nardog (talk) 16:36, 28 June 2023 (UTC)Reply
  • I voted for this, and think the primary benefit is that readers may want to know how a word should be pronounced. Many projects have spent considerable effort annotating these words with IPA - so an IPA-->sound solution could be useful, but I think the core benefit to the reader is just being able to hear the word without contributors recording and uploading audio files manually for each word to be announced. So perhaps an IPA renderer isn't being delivered, and maybe one day it could be - but working on a text-to-audio rendering solution isn't useless. — xaosflux Talk 14:26, 29 June 2023 (UTC)Reply
    It doesn't have to be either-or. If whatever engine you're relying on supports IPA for some languages, go for it, but it makes no sense to then preclude all other supported languages from being heard. Nardog (talk) 17:37, 29 June 2023 (UTC)Reply

@User:NRodriguez (WMF): Please see mw:Help talk:Extension:Phonos. The announcement has faulty examples. The "help page" is misleading. What are "some engines"? What is this extension supposed to do? The predominant effects I can see are inappropriate error messages and useless tracking categories. Community_Wishlist_Survey_2022/Reading/IPA_audio_renderer. Taylor 49 (talk) 21:27, 16 September 2023 (UTC)Reply

Please scrape this "Phonos" immediately

Yesterday I switched the pronunciation template at Swedish Wiktionary to Phonos. I had to partially revert the change due to dysfunctionality. Most likely I will remove it completely. I propose to scrap Phonos completely. Reasons:

  • it's dysfunctional: if "ipa=" is fed in but "file=" not then it causes an error and puts the page into a tracking cat, it cannot "read" IPA
  • it does not provide anything beyond the capabilities of the old templates
  • the look/layout is bad and hard to improve
  • it uses the "Google API" phab:T317274 (I do not want to end up with public WMF wikis accessible from ChromeBook only and only after logging into "your" Google account after having consented to Google's TOS; also, the attitude "let's bet on proprietary software until free software is available and good enough" is inherently wrong: it has been applied again and again during the past 25 years, and the outcome was again and again bad (MNG vs Macromedia, Theora vs Q264, ...); there is no need to have public WMF wikis dependent on (and paying) Google)
  • it converts Vorbis files to MP3 phab:T346508 (there is really no reason to do so, waste of resources, and promotion of proprietary "technologies")
  • the documentation is incomprehensible, the announcements cross-posted to all wikis have faulty examples, and it's obscure what the "PhonosInlineAudioPlayerMode" does or how to enable or disable it
  • difficult to invoke from Lua: it has to be launched through the hacky "extensionTag", leaving behind "striptease markers"

@User:Nardog @User:Pigsonthewing @User:Xaosflux @User:TheresNoTime-WMF @User:TheDJ @User:Theklan @User:Al12si @User:Whatamidoing (WMF) @User:NRodriguez (WMF) @User:STei (WMF) @User:MusikAnimal (WMF) @[[User:Noé 1]] @User:HLHJ @User:Samwilson @User:Quiddity: I mean it should be deprecated on all WMF wikis, and deactivated on all WMF wikis soon after. Taylor 49 (talk) 15:46, 17 September 2023 (UTC)Reply

As the Status Updates section makes clear, installations of Phonos on WMF wikis are in the inline audio player mode so ipa= is not available, and the Language team plans to expand the offering of open language services with Text-to-Speech, creating a stable technological foundation for projects such as the IPA Audio Renderer, which indicates it won't rely on Google when/if the IPA-to-audio generation becomes available. Nardog (talk) 15:49, 17 September 2023 (UTC)Reply
These comments mostly make me want to just not work on MediaWiki. —TheDJ (talk • contribs) 17:02, 17 September 2023 (UTC)Reply