
Many common CJK ideographs show as tofu in Chromium
Open, Needs Triage | Public | BUG REPORT

Description

Steps to reproduce: In Chromium, open https://en.m.wiktionary.org/wiki/%E5%8F%95#Derived_characters .

Observed behaviour: all derived characters display as tofu.

Expected behaviour: only derived characters without a font glyph display as tofu.

Cause: Due to a Chromium issue, if the first CJK ideograph in a block of text tofus, then *every* CJK ideograph tofus. Possible workarounds include ensuring a common CJK ideograph comes first, or putting a transparent CJK ideograph at the start of the block of text.
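As a rough illustration of when either workaround might be applied, the sketch below checks whether the first CJK ideograph in a string lies outside the Basic Multilingual Plane. This is only a heuristic and an assumption on my part: being in a supplementary plane is a proxy for "likely to lack a font glyph", not a test of actual font coverage. The function name firstIdeographIsRare and the range tables are hypothetical and not part of MediaWiki or Chromium.

```typescript
// Hypothetical helper: plane membership is only a stand-in for "has no glyph".
// Ideograph blocks in the Basic Multilingual Plane (usually covered by fonts).
const BMP_IDEOGRAPHS: Array<[number, number]> = [
  [0x3400, 0x4dbf], // CJK Unified Ideographs Extension A
  [0x4e00, 0x9fff], // CJK Unified Ideographs
];

// Planes 2 and 3, which hold Extension B and the later ideograph extensions.
const SUPPLEMENTARY_IDEOGRAPHS: Array<[number, number]> = [
  [0x20000, 0x2ffff], // Supplementary Ideographic Plane
  [0x30000, 0x3ffff], // Tertiary Ideographic Plane
];

function inRanges(cp: number, ranges: Array<[number, number]>): boolean {
  return ranges.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// Returns true when the first ideograph found in `text` is in a supplementary
// plane. Non-ideographs (Latin letters, spaces, punctuation) are skipped.
function firstIdeographIsRare(text: string): boolean {
  let i = 0;
  while (i < text.length) {
    const cp = text.codePointAt(i);
    if (cp === undefined) {
      break;
    }
    if (inRanges(cp, BMP_IDEOGRAPHS)) {
      return false;
    }
    if (inRanges(cp, SUPPLEMENTARY_IDEOGRAPHS)) {
      return true;
    }
    // Code points above U+FFFF occupy two UTF-16 code units.
    i += cp > 0xffff ? 2 : 1;
  }
  return false;
}
```

A caller could use a true result as the signal to reorder the text or to prepend a hidden common ideograph. Leading Latin characters are deliberately skipped, because the cause described above concerns the first CJK ideograph rather than the first character.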

Noticed by @AAlhazwani-WMF ; @NBaca-WMF helped with the analysis.

Event Timeline

@dchan and I did some experimentation. We found a few things:

  • Adding Latin letters before a non-rendering CJK character (for example, prefixing x, or x followed by a space) made the text render correctly the first time but not afterwards. This strongly suggests that the bug is stateful.
  • Adding a known-good CJK character before it seems to work.
  • We were able to force good behaviour with a ::before pseudo-element containing a known-good CJK character, styled so that it is rendered but not visible, takes up no space and (as is the default for ::before content) is not selectable:
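```html
<style>
    h1 > span > a::before {
        /* A known-good CJK ideograph, so the first ideograph in the block has a glyph */
        content: '啜';
        /* Laid out but not painted */
        visibility: hidden;
        /* Zero-sized inline box, so it takes up no space */
        width: 0px;
        height: 0px;
        overflow: hidden;
        display: inline-block;
    }
</style>
```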
  • I was also able to reproduce the behaviour in VS Code, because it's an Electron app, which is built on Chromium. Perhaps due to different rendering logic, adding an x before a problem character anywhere in a line immediately caused the rest of the line to fail to render.
  • I'm on Version 126.0.6478.127 (Official Build) (arm64) on an M3 MacBook Pro. @dchan, who is on Linux, had issues with display: inline-block when we used a non-CJK character in the content property.

Here is a video of the same issue affecting a VisualEditor session:

When the rare CJK character U+2CFA7 is input at the start of a paragraph, it renders as tofu (which is expected), but it also causes subsequent non-rare CJK characters to render as tofu (which is unexpected). At first just the very next character is affected, but when the rare character is cut then pasted again, it affects all characters in the paragraph up until the first style change (which in this example is a link).