CJK Ideograph Variation or "Where's Wally in CJK"

KP Mawhood · August 2016

An author has requested three CJK glyph variations, but I cannot find them in the IVD charts. Is there anywhere I can look to find these, before I create them as alternates? The glyph shape is critical to the text (see below). Thanks

The archaic writing of the names of the trigrams are generally the same or in phonetically or graphical related versions of those found in hexagram texts. A few examples include, Qian 乾 is written <Glyph 1>;… Zhen 震 is written as <Glyph 2>. Two regional variants that also appear in the Wangjiatai Guicang include Li 離 is written as Luo 羅, Kan 坎 is written as <Glyph 3> (read as Lao 勞).[1] So as not to confuse readers, the trigrams in this study are referred to by their traditional names.

Glyph 1, Qian 乾 (U+4E7E)

Glyph 2, Zhen 震 (U+9707)

Glyph 3, Kan 坎 (U+574E)

Belleve Invis · August 2016

The [⿰𠦝𠂉] is a variant of 倝. Kangxi Zidian:

《集韻》居案切，音幹。日始出，光倝倝也。本作〈一〉，俗作𣉙，別作𧹳。◎按《說文》倝獨爲部，《集韻》从𠦝从入，今《字彙》附入人部，非。《說文》倝从屮（𡴐）作。

[⿱炏衣] comes from 殷周金文集成引得, a book about early Chinese characters.
I cannot find any source for <Glyph 2>. Searching for 辰 and 口 in CHISE IDS Find does not return this character, and searching the variants of 震 on Glyphwiki likewise returns nothing.

KP Mawhood · August 2016

Thank you! I just did a lot of learning.

Glyph 1 – u501d? So this is a close match – but, a new glyph would need to be created as a variation? Do I understand that correctly?

Glyph 2 – I found 唇 via chars-to-containers.txt, then ebag_s030-104 on Glyphwiki. The TTF download encodes it at U+3013, the geta mark. Is that OK, or would you have a different recommendation?

Glyph 3 – u2c541? This seems to be almost perfect.

David W. Goodrich · September 2016

The fundamental issue here is that you seek modern type for characters transcribed more than 2,000 years ago from a text already centuries old. In that period, variant forms were common, and occur even between snippets of the same text "duplicated" among the Wangjiatai bamboo strips. CJK Extension B included some archaic forms when it added 43,000 chars. to Unicode's second plane, blowing through the limit of double-byte encoding. Ext B fonts have been widely available for a decade, but as you noted in another thread they may not show reliably on the WWW. CJK Extensions C, D, and E are nowhere near B in size, but they seem to include a higher percentage of archaic chars. in modern form. Unfortunately, very few fonts include them. A project involving excavated texts from before the Han dynasty is almost certain to require custom chars.: Unicode isn't going to catch up with what we already have anytime soon, fonts lag, and new discoveries will continue.
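To make the talk of planes and extensions concrete, here is a minimal Python sketch that reports which block a code point from this thread falls in. The ranges are the published Unicode block boundaries; the short labels and the names `CJK_BLOCKS` and `block_of` are my own, and the table deliberately lists only the blocks discussed here.

```python
# Block boundaries as published by Unicode; labels are abbreviated by hand.
CJK_BLOCKS = [
    (0x3400, 0x4DBF, "CJK Ext A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0x20000, 0x2A6DF, "CJK Ext B"),
    (0x2A700, 0x2B73F, "CJK Ext C"),
    (0x2B740, 0x2B81F, "CJK Ext D"),
    (0x2B820, 0x2CEAF, "CJK Ext E"),
]


def block_of(cp):
    """Return the label of the listed block containing cp, or None."""
    for low, high, name in CJK_BLOCKS:
        if low <= cp <= high:
            return name
    return None


def plane_of(cp):
    """Plane 0 is the BMP; plane 2 holds Extensions B through E."""
    return cp >> 16


# Code points mentioned in this thread.
for cp in (0x4E7E, 0x501D, 0x2099D, 0x2C541, 0x3013):
    print(f"U+{cp:04X}  plane {plane_of(cp)}  {block_of(cp)}")
```

Anything reported in plane 2 needs a font with coverage beyond the BMP, which is where the display problems mentioned above tend to start.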

If your project is print only, then obviously the issues of encoding and pre-defined variations are irrelevant: you might as well encode your 3 glyphs as A, B, and C. Electronic publishing complicates the choices. U+3013 could serve as a short-hand for a single missing char., but each additional one would require its own font: it would be safer to code them in a single font up in the main Private Use Area (still within the first plane). One of the advantages of publishing in PDF format is that it allows fonts to be embedded, guaranteeing availability. However, depending on your other fonts there may already be a bunch of stuff up there; if any of those share your PUA codes confusion can ensue should clumsy processing zap some of the PDF's internals (I’ve seen files where adding sticky notes with OSX Preview caused problems).
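A minimal Python sketch of that PUA bookkeeping follows. It assumes you have already collected the set of PUA code points used by your other fonts; the `taken` values and the glyph names below are hypothetical placeholders, and `assign_pua` is simply a name I made up.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # main Private Use Area, first plane


def assign_pua(glyph_names, taken=frozenset(), start=PUA_START):
    """Give each glyph the next free PUA code point, skipping those in `taken`."""
    mapping = {}
    cp = start
    for name in glyph_names:
        while cp in taken:  # step over code points other fonts already use
            cp += 1
        if cp > PUA_END:
            raise ValueError("ran out of Private Use code points")
        mapping[name] = cp
        cp += 1
    return mapping


# Hypothetical: another font in the document already occupies U+E000 and U+E001.
taken = {0xE000, 0xE001}
for name, cp in assign_pua(["glyph1", "glyph2", "glyph3"], taken).items():
    print(f"{name} -> U+{cp:04X}")
```

Keeping all three glyphs in one font and one such table means a single place to check when a collision is suspected.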

InDesign offers the workaround of outlining individual characters, effectively turning them into in-line graphics: no font, no font issues. So why not start out with in-line graphics? The obvious disadvantage is that you will need to adjust the size and positioning yourself. This is not so hard: once you have adjusted the first you can copy and paste elsewhere, including via InDesign's search-and-replace.

Unlike a character in a font, an image can carry metadata, such as identification and cataloging information: tagging Waldo will help curious readers avoid blind alleys. The XMP Description and Keyword fields can be stuffed with, say, "Wangjiatai strip no. NNN, Guicang prognostication ABC, variant of 倝, Kangxi radical 人, components 十日十人", etc. Currently, this works most efficiently for SVG-format images in publications based on HTML5, such as ePub3, where SVG support is built in. The advantage to publishers is that authors using OpenOffice Writer or LibreOffice Writer can insert high-quality SVG images in their manuscripts instead of the usual screen-res *.jpg and *.png files. Authors, meanwhile, gain the ability to organize and search collections of character-images by essential details.
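As a rough illustration of tagging an SVG, the Python sketch below writes a description and keywords into the SVG `<metadata>` element as RDF with Dublin Core fields, which is the vocabulary XMP uses for Description and Keywords. It is a simplified block under my own assumptions: real XMP tooling may expect an `x:xmpmeta` wrapper around the RDF, the path data is a dummy triangle, and `glyph_svg` is a made-up helper name. The "NNN" and "ABC" placeholders are left exactly as above.

```python
import xml.etree.ElementTree as ET

NS = {
    "svg": "http://www.w3.org/2000/svg",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# SVG becomes the default namespace; rdf and dc keep their usual prefixes.
ET.register_namespace("", NS["svg"])
ET.register_namespace("rdf", NS["rdf"])
ET.register_namespace("dc", NS["dc"])


def q(prefix, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return "{%s}%s" % (NS[prefix], tag)


def glyph_svg(path_d, description, keywords):
    """Return SVG text with a Dublin Core description and keyword list."""
    svg = ET.Element(q("svg", "svg"), {"viewBox": "0 0 1000 1000"})
    meta = ET.SubElement(svg, q("svg", "metadata"))
    rdf = ET.SubElement(meta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    # dc:description is a language alternative; dc:subject is a bag of keywords.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "description")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = description
    bag = ET.SubElement(ET.SubElement(desc, q("dc", "subject")), q("rdf", "Bag"))
    for kw in keywords:
        ET.SubElement(bag, q("rdf", "li")).text = kw
    ET.SubElement(svg, q("svg", "path"), {"d": path_d})
    return ET.tostring(svg, encoding="unicode")


text = glyph_svg(
    "M0 0 L1000 0 L500 1000 Z",  # dummy outline, not a real glyph
    "Wangjiatai strip no. NNN, Guicang prognostication ABC, variant of 倝",
    ["Kangxi radical 人", "components 十日十人"],
)
print(text)
```

Because the metadata travels inside the file, a folder of such SVGs can be searched by description or keyword with any XML-aware tool.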

Currently, getting an SVG image into a PDF via InDesign requires passing through another vector format, likely Adobe Illustrator, though Adobe reportedly plans to re-introduce SVG. Another limitation is that for now PDF allows "object-level metadata" only for raster images, not vector: clicking on a raster image with Acrobat Pro's Edit Object tool opens access to the XMP metadata, but that choice disappears for vectors. The long-delayed PDF format 2.0 may address this, but for now only raster images can show their own metadata inside PDFs. Any HTML5 browser can enlarge an SVG and turn it into pixels, though a screen-grab will lose the metadata; vector-art software can preserve the XMP while offering more precise rasterizing.

When it comes to searching the on-line databases of Chinese characters, the "components" mentioned above can be quicker than the usual method, radical and stroke count. Glyph1 is a good example. Belleve Invis points out that the Kangxi dictionary sees this as a variant of 倝. The Kangxi dictionary is organized on the basis of 214 "radicals," recurring combinations of strokes. The radical for 倝 is 人, [shape-shifting] Kangxi no. 9, “person,” plus 8 strokes (for components 十日十; these in fact form Ext B character U+2099D, but it is neither a radical nor a common component). The Shuowen dictionary, compiled some 2,000 years ago, used 540 radicals, one of which happens to be 倝. Which is to say, that cluster of strokes was seen as defining a group of characters way back in the Han, and though use of the radical did not survive the test of time some characters did, including your 乾. However, modern dictionaries and look-up systems scatter these among Kangxi radicals 人, 乙, 方, 月, 舟, etc., so tracking them down efficiently requires searching by components. I didn't find Glyph1, so I assume it needs a custom char., a “radical” form of 倝, modeled on 乾 less 乙.

For Glyph2, your Glyphwiki char. seems a good match. However, I generally avoid Glyphwiki chars., because their style is heavier than common Chinese fonts (a "regular" weight of Mincho 明 style, instead of the "light" common for Chinese Ming 明 typefaces). Also, when you open them up in a font editor Glyphwiki forms seem less tidy.

The components 火火衣 lead quickly to Glyph3, as well as a chart of its evolution. Your "almost perfect" match swaps the bamboo radical (竹) for the two fires (火). Swapping radicals works in some circumstances -- the Wangjiatai "duplicates" referenced above swap bamboo and grass. But here I expect your author really wants the two fires -- and not in the 裧 configuration (U+88E7).
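For anyone curious how a component search differs from a shape match, here is a toy Python sketch. The three-entry table is hand-entered from this thread purely for illustration (it is not CHISE or Glyphwiki data, and the IDS strings are my approximations); `find_by_components` and `with_layout` are made-up names. It shows why 火火衣 turns up both 裧 and Glyph 3, and why the layout then has to be checked separately.

```python
from collections import Counter

# Toy data: character -> (approximate IDS, flattened components).
TABLE = {
    "倝": ("⿰𠦝人", ["十", "日", "十", "人"]),
    "裧": ("⿰衤炎", ["衣", "火", "火"]),
    "<Glyph 3>": ("⿱炏衣", ["火", "火", "衣"]),
}


def find_by_components(wanted, table=TABLE):
    """Return entries whose components include every wanted one, with repeats."""
    need = Counter(wanted)
    # Counter subtraction keeps only positive counts, so an empty result
    # means nothing wanted is missing.
    return [ch for ch, (_ids, parts) in table.items() if not (need - Counter(parts))]


def with_layout(chars, operator, table=TABLE):
    """Keep only the entries whose IDS starts with the given layout operator."""
    return [ch for ch in chars if table[ch][0].startswith(operator)]


hits = find_by_components(["火", "火", "衣"])
print(hits)                    # both the 裧 configuration and Glyph 3
print(with_layout(hits, "⿱"))  # only the top-over-bottom arrangement
```

Order of components does not matter to the search, which is exactly why it is fast for discovery and why the final choice still needs an eye on the configuration.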

Good luck!
David
