Vietnamese diacritics with lowercase i

Bendy's picture

I found this on the Vietnamese Alphabet page on Wikipedia:

The lowercase letter i should retain its dot even when accented. (However, this detail is often lost in computers and on the Internet, due to the obscurity of Vietnamese specialty fonts and limitations of encoding systems.)

Can anyone confirm or falsify this odd state of affairs?

lunde's picture

I have a couple takes on this...

My first thought is that while the statement on Wikipedia may be true, it may have come about due to the lack of a dot-less "i" on early typesetting machines, thus forcing one or more tone accents to be added to a standard "i."

My second thought is that Romanization and transliteration systems are designed to preserve the intent of the language in written form, and to make doing so easier, not harder. When using modern fonts, using a dotted "i" with accents falls into the realm of making things harder, not easier. (Romanization is the term I use when the system is the primary method for representing the language in written form, and transliteration is the term I use when it is a method to represent the native script of a language using a different script, such as using the Latin script to represent Japanese.)

Related to my second thought, consider Pinyin. It is the standard method of transliterating Chinese (in China), and is sometimes seen with special forms of the "a" and "g" characters, along with a seemingly special form of the acute accent to represent the rising tone, in which the bottom-left tip is heavier than the upper-right one. For the "a" and "g" forms, this convention follows the forms typically seen in italic faces, although the face is upright. In any case, both of these convention-like characteristics clearly conspire to make Pinyin harder to use, thus violating one of the important attributes of a transliteration system.

I guess what I am trying to say, in a somewhat round-about way, is that the statement on Wikipedia may be true and false, for the reasons I stated above. I have the TCVN 5712:1993 in my office, which may shed some light on this issue, but I suspect that it uses the dot-less "i" form.

Dr. Ken Lunde
Senior Computer Scientist, CJKV Type Development
Adobe Systems Incorporated
lunde@adobe.com

Bendy's picture

Thanks, Ken, for your full and well-formulated reasoning.

I agree that using a dotted form would be harder. Though I am attempting to incorporate Vietnamese support in my font, its orthography is somewhat unfamiliar to me. Since I've been unable to find examples of a dotted i with diacritics, and since my feeling is that Vietnamese is already ungainly enough with its compound diacritics, I am planning to omit the tittle for the time being.

Bendy's picture

On a related note, do you know any online resources dealing specifically with Vietnamese diacritic design? Currently I'm referring to Diacritics Project and such useful tips as on this page. Specifically I'm wondering about the scale and weight of horn and hook.

lunde's picture

I am unaware of such resources. Your best bet is to reference high-quality typefaces that support Vietnamese by design. Two that immediately come to mind are Minion Pro and Myriad Pro.

John Hudson's picture

I don't think I have ever seen Vietnamese text with the dot preserved when a tone mark is added. As Ken notes, there may at one stage have been technical limitations that did not support dotless i, leading to the belief among some users that the dot should be preserved. A parallel case would be the French typewriter that lacked uppercase accents, leading many people to the conclusion that accents are properly omitted on uppercase letters, despite the continuous evidence of 500 years of French typography and publishing. In these situations, I recommend looking for examples of older, handset typography, books from reputable publishers, and also good quality vernacular handlettering in signs etc. In the case of Vietnamese, you might also investigate what Alexandre de Rhodes SJ did with i when he invented the system.

Michel Boyer's picture

There are many PDF samples of Vietnamese fonts on the vnTeX site: http://vntex.sourceforge.net/fonts/samples/

Michel

Added: and here is a PDF that gives examples of use in mathematical texts: http://ctan.org/tex-archive/info/Free_Math_Font_Survey/vn/survey-vn.pdf

blokland's picture

Ken: Your best bet is to reference high-quality typefaces that support Vietnamese by design.

And sometimes solutions that show up can be quite different (uni1EBF, Minion Pro / Arial Unicode MS):

Bendy's picture

One of the resources I read said that the right aligned grave is becoming more common but still not really 'correct'.

Unfortunately my versions of Minion Pro and Myriad Pro don't support Vietnamese.

BTW that Arial base glyph really doesn't strike me as well designed. It looks all bumpy and warped.

Thanks Michel for your pdf links. Looks like they've forgotten to space/kern the horn in the Palatino in your TEX link. Interesting nonetheless, thank you.

Michel Boyer's picture

You can also look at SIL fonts. Here is a sample from Gentium Basic. Charis and Doulos also have .VN glyphs for Vietnamese.

lunde's picture

I have something intriguing to report.

I just opened TCVN 5712:1993, which is the Vietnamese standard for the legacy (non-Unicode) encoding for Quoc ngu (the name of the Latin-based Vietnamese script). The code charts for the VN1, VN2, and VN3 encodings clearly show the use of a dot-less "i" for the characters that have accents above. But, and quite interestingly, the accompanying text uses a font that includes the dot. It is quite inconsistent. The body text consistently uses the dotted "i" form, but the headings are inconsistent. For example, the title for Section 6 includes the word "ki" with an acute accent, and the "i" is dotted. Then, at the top of page 7, the title for the VN1 code chart includes the same word, and in a similar font, but using the dot-less "i" form.

Anyone want some scans?

Bendy's picture

Wow! Thanks for researching some more on this.

I wonder how these anomalies could have made it into such an important reference document. I'd be intrigued to see a scan.

In Michel's image of Gentium above it looks like some kind of language feature will do a glyph substitution to make the upper diacritics shrink for Vietnamese. But isn't it strange to have the full size version at all, or is there some other language that would use U+1EA5 and U+1EA7?
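For anyone who wants to check what those two code points are, here is a minimal sketch using Python's standard unicodedata module. The helper name describe is made up for illustration. It only reports what the Unicode character database records (a name and a canonical decomposition); it says nothing about how any particular font arranges the marks.

```python
import unicodedata

def describe(code_point):
    """Return (U+ label, Unicode name, canonical decomposition) for a code point.

    `describe` is a hypothetical helper, not part of any font tool.
    """
    ch = chr(code_point)
    return (
        f"U+{code_point:04X}",
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
    )

# The two code points asked about in the post above.
for cp in (0x1EA5, 0x1EA7):
    print(*describe(cp), sep=" | ")
```

The names are generic (a letter with a circumflex and an acute or grave) and do not mention Vietnamese, so the code point identifies the letter and its marks but not whether the marks are stacked or set side by side. That choice is left to the font.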

lunde's picture

Here are the scans of pp 6 and 7 of TCVN 5712:1993:

page 6
page 7

Michel Boyer's picture

In Michel’s image of Gentium above it looks like some kind of language feature will do a glyph substitution to make the upper diacritics shrink for Vietnamese.

Yes, for the script latn here are some relevant glyphs associated with the languages dflt (default) and VIT (Vietnamese).

Another interesting thing is that the ligature sub i acutecomb by iacute is a required ligature in Charis and Doulos. One might have hoped to get an "i" with a dot and an acute with that combination, but that does not work either (I can't get it with Gentium either).

Michel
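The substitution of i plus acutecomb by iacute has a parallel at the text level, which can be sketched with Python's standard unicodedata module. This is only an illustration of Unicode normalization, not of what Charis, Doulos, or Gentium do internally, and the helper name compose is made up. As far as the Unicode Standard goes, "i" is a soft-dotted letter whose dot is expected to disappear under a mark above, and text that needs a visible dot (Lithuanian is the usual example) adds an explicit U+0307 COMBINING DOT ABOVE. Whether a given font renders that sequence as hoped is a separate question.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE
DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I

def compose(*parts):
    """Join the parts and apply NFC (canonical composition).

    `compose` is a hypothetical helper name used only for this sketch.
    """
    return unicodedata.normalize("NFC", "".join(parts))

def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in text]

# i + combining acute composes to the single precomposed character U+00ED.
print(code_points(compose("i", ACUTE)))

# Dotless i + combining acute has no precomposed form, so it stays a sequence.
print(code_points(compose(DOTLESS_I, ACUTE)))

# i + explicit dot above + acute also stays a three-code-point sequence.
print(code_points(compose("i", DOT_ABOVE, ACUTE)))
```

So at the encoding level the dotted-and-accented form is a different sequence from plain i plus acute, and a font would need its own handling for it.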

lunde's picture

Sorry, the Page 7 URL was wrong, and the forum won't let me edit it.

Here it is:

Page 7

blokland's picture

Ken, thank you very much for the scans.

Ben: But isn’t it strange to have the full size version at all, or is there some other language that would use U+1EA5 and U+1EA7?

This beats me too. I can’t figure out what the advantage is of assigning a Unicode code point to what is basically a ‘wrong’ or at least an unexpected character representation, and the offering of alternative ‘correct’ versions. I could well be overlooking something here though, so perhaps someone can explain?

John Hudson's picture

Frank, there is sometimes a discrepancy between a language-specific diacritic form and a generic accent handling model. Much of the time, one knows the target language(s) for a font, but as one moves into larger fonts for widespread use one needs to start thinking about generic mechanisms for languages that one might not be aware of. Sometimes this means deliberately disabling language-specific diacritic forms as default representations, as in the SIL fonts, and relegating those forms to specific OT language systems. [Another approach would be an idea that Adam Twardoch put forward on the OT developer discussion list some time ago: register a generic language system tag, so that 'dflt' behaviour could still represent the majority or most common usage. So, for instance, the Vietnamese forms could be the default rendering, but a more generic mark placement would be accessible via this language system tag.]

Another instance of this problem is the Czech and Slovak 'caron' diacritics that take an apostrophe-like form. These are the normal default forms for the precomposed Unicode diacritic characters, but what happens when a user wants a more generic representation, i.e. an actual hacek sign above a d, l, or t? Given SIL's global and minority language mandate, it makes sense that they favour generic accent positioning as a default rendering.

blokland's picture

John, thanks for the clear explanation. It is an interesting approach in the SIL fonts.

I have been participating on the OT list only since early this year, so I actually did not know about Adam’s idea concerning a generic language system tag.

hdang's picture

"The lowercase letter i should retain its dot even when accented"

I don't know where it comes from, but as a Vietnamese, this is the first time I have heard of it :-) In Viet Nam, I never see this kind of typeface in printed media (in fact, I have seen it only 2-3 times, in some very, very old typefaces).

On Vietnamese diacritic design, I think the type designer should understand how Vietnamese words are constructed.
First of all, there are 29 "basic" characters in the alphabet:

a ă â b c d đ e ê g h i k l m n o ô ơ p q r s t u ư v x y

and perhaps 9+1 "composite characters":

CH GH GI KH NG NGH NH PH TH TR

(in some old dictionaries, these are treated as independent letters of the alphabet)

Vietnamese words are built by composing a consonant (1 or more characters) + a vowel (single or composite) and a tone mark (accent). (This is the logic by which we learn to read and write Vietnamese!)

Some notes follow from this logic:

- There is only one accent. Some consider Vietnamese a double-accent language, but that is the view of people who do not speak Vietnamese.

- The accent belongs to the entire word, not to an individual character. This is important for legibility: the accent should be easy to distinguish from the other characters, because it is recognized on its own. ề is not in the alphabet list; normally its 2 components are read separately, in 2 different phases.

For this reason, in my view, all of the Vietnamese-enabled Adobe typefaces (Minion Pro, Myriad Pro, Arno Pro, Garamond Premier Pro) are bad for Vietnamese (e.g. Minion Pro in the image in blokland's post).
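The letter-plus-tone logic in the post above can be sketched in a few lines of Python using the standard unicodedata module. This is only an illustration under stated assumptions: the function name split_letter_and_tone is made up, the English labels for the five tone marks are mine, and the alphabet is copied from the 29-letter list in the post. Unicode's own decomposition treats ề as three pieces (e, circumflex, grave); the sketch regroups them into the two pieces described above, the alphabet letter ê and the tone mark.

```python
import unicodedata

# The 29 basic letters listed in the post, normalized to precomposed form.
ALPHABET = set(
    unicodedata.normalize(
        "NFC", "a ă â b c d đ e ê g h i k l m n o ô ơ p q r s t u ư v x y"
    ).split()
)

# Combining marks used for tones (the English labels are an assumption).
TONE_MARKS = {
    "\u0300": "grave",
    "\u0301": "acute",
    "\u0303": "tilde",
    "\u0309": "hook above",
    "\u0323": "dot below",
}

def split_letter_and_tone(ch):
    """Split one precomposed character into (alphabet letter, tone label or None).

    Hypothetical helper: decompose fully, pull out the tone mark,
    then recompose whatever remains into the base alphabet letter.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    tones = [TONE_MARKS[c] for c in decomposed if c in TONE_MARKS]
    rest = "".join(c for c in decomposed if c not in TONE_MARKS)
    letter = unicodedata.normalize("NFC", rest)
    return letter, (tones[0] if tones else None)

for ch in "ềởệ":
    letter, tone = split_letter_and_tone(ch)
    print(ch, "=", letter, "+", tone, "| letter in alphabet:", letter in ALPHABET)
```

In each case the part that remains after removing the tone mark is one of the 29 letters, which matches the view that the circumflex, breve, and horn belong to the letter while the tone mark is a separate component.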

hrant's picture

This is extremely interesting.

I would suggest that even if we [have to] give in to the Vietnamese script being ridiculously Latinized*, we might still liberate it from the yoke of Latinized typographic conventions.

* BTW, how is the Nôm revival going?

Hoang, I would ask that you please start a new thread with the facts and opinions you express above, so we can try to raise Vietnamese typesetting to the next level.

hhp

Bendy's picture

Hoang, let me see if I understand correctly what you're saying.

Your example /ề/ is made of one letter and one accent. The letter /ecircumflex/ and the /grave/ accent are two components, not three. In the same way /ohorn/ is not just a separate vowel from /o/, but a separate vowel letter — the orthography just happens to look similar. Putting a hook on the vowel letter modifies the tone, and the letter /ohornhook/ is also two components, not three.

Therefore, keeping the accents separate from the base letters, breves, circumflexes and horns is ideal.

The examples with the grave to the left of the circumflex are not great because they make the circumflex look like an accent.

Is that right??

Jongseong's picture

Thanks, Hoang. You cannot separate type design from language, and basic knowledge of how the Vietnamese alphabet works is essential for good design.

I have some experience of transcribing Vietnamese proper names into the Korean alphabet, so I am familiar with the basic letters used in Vietnamese. Korean doesn't use tones, so we ignore the accents that mark tones altogether, just as tonal marks in Pinyin are usually ignored for transcribing Chinese proper names for the English language. But the transcription table between Vietnamese and Korean treats each of the basic Vietnamese letters separately, even those that look like accented variations of the same Latin letter. For instance, the Vietnamese "o" and "o horn" map onto different vowels in Korean.

I am interested to hear your opinions on the accents in other designs that support Vietnamese, such as the SIL typefaces that you can see samples of in this thread. Are they acceptable? Could they be improved?

David W. Goodrich's picture

"... just as tonal marks in Pinyin are usually ignored for transcribing Chinese proper names for the English language." While this may be true today, it is at least in part an artifact of the near-universal absence of the required characters from fonts (with most of the exceptions being relatively recent system fonts). In very many contexts tones are essential for distinguishing pinyin syllables, including the very common surname 王 Wa2ng from another family name 汪 Wa1ng. Hopefully, the shrinking of the planet and the spread of Unicode will allow us to improve our habits for rendering the languages of other peoples.

Jongseong's picture

Sorry to go slightly off-topic, but I doubt that the diacritics and special characters used in scholarly transcriptions of languages that don't use the Latin script will catch on in everyday use.

It's somewhat different for languages that use Latin-based scripts, including Vietnamese—in those cases, there exist unambiguous local spellings like Málaga, Düsseldorf, Chişinău, and Điện Biên Phủ. But there tend to be competing systems of transcriptions for languages that don't use the Latin script, and not everyone can be expected to easily determine which spelling is the most scientific. For everyday use it ultimately boils down to simplicity.

Everyday use entails simplifying and ignoring important distinctions in the original language. Speakers of English or Korean will not feel the need to distinguish between 王 Wa2ng and 汪 Wa1ng when they are speaking their own language (indeed, the idea of tones distinguishing syllables will be foreign to most), so why go through the trouble of marking them differently, unless we assume a specialized audience?
