I'm a little confused about /Tcomma and /Tcedilla
According to Adobe Latin 3:
http://blogs.adobe.com/typblography/latin_charsets/Adobe_Latin_3.html
Unicode | Character | Name | Description |
0162 | Ţ | Tcommaaccent | LATIN CAPITAL LETTER T WITH CEDILLA |
021A | Ț | uni021A | LATIN CAPITAL LETTER T WITH COMMA BELOW |
Why are they using the /Tcommaaccent "name" for the cedilla glyph? Is there a reason for that, or is it just a bug?
When consulting the Unicode chart, I see "commas" on both glyphs...
http://www.fileformat.info/info/unicode/char/162/index.htm
http://www.fileformat.info/info/unicode/char/21a/index.htm
The only difference is that the first shows a Sans font, and the second one shows a Serif font.
So far I have been using:
/uni0162 0162 for Tcedilla
/uni021A 021A for Tcomma
/uni0163 0163 for tcedilla
/uni021B 021B for tcomma
/uni015E 015E for Scedilla
/uni0218 0218 for Scomma
/uni015F 015F for scedilla
/uni0219 0219 for scomma
Is that OK? Can anyone confirm or clarify?
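One way to cross-check a list like this is against the Unicode character database. The following is only a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the `GLYPHS` table and the `uni_name` helper are hypothetical names introduced for illustration, not part of any font tool.

```python
import unicodedata

# Glyph names from the list above, paired with the code points they encode.
# (Hypothetical table for illustration only.)
GLYPHS = {
    "uni0162": 0x0162, "uni021A": 0x021A,
    "uni0163": 0x0163, "uni021B": 0x021B,
    "uni015E": 0x015E, "uni0218": 0x0218,
    "uni015F": 0x015F, "uni0219": 0x0219,
}

def uni_name(codepoint):
    """Build a 'uniXXXX' glyph name from a BMP code point."""
    return f"uni{codepoint:04X}"

def describe(codepoint):
    """Return the Unicode character name for a code point."""
    return unicodedata.name(chr(codepoint))

for glyph, cp in GLYPHS.items():
    # The glyph name should agree with the code point it is mapped to.
    assert glyph == uni_name(cp)
    print(glyph, chr(cp), describe(cp))
```

This only confirms that each uni-name matches its code point and shows which Unicode name (cedilla or comma below) belongs to it; it says nothing about how a given font draws the glyph.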
Thanks in advance
Comments
The background here is that Unicode initially unified encoding of the Turkish S with cedilla and the Romanian S with comma accent, but then later decided to disunify them, keeping the existing U+015E and U+015F for the Turkish diacritic and adding the new U+0218 and U+0219 for the Romanian diacritic. [Ignore the T diacritic for a moment.]
However, the old 8-bit codepages covering Romanian, a lot of Romanian fonts, and Romanian keyboard drivers, had all used the unified encodings with S cedilla, so there was a lot of existing Romanian text using those characters instead of the new ones. Hence the need to provide {locl} mapping from the S cedilla characters to the S comma accent preferred glyph form for Romanian.
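The {locl} mapping described above works at the glyph level and leaves the underlying characters untouched. Where the text itself can be re-encoded, a character-level counterpart might look like the sketch below. It assumes Python 3 and assumes the input is known to be Romanian (Turkish text legitimately uses the S cedilla characters and must not be converted); `CEDILLA_TO_COMMA` and `to_comma_below` are hypothetical names.

```python
# Map the legacy (unified) cedilla characters to the disunified
# comma-below characters. Only appropriate for Romanian text.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # S cedilla -> S comma below
    "\u015F": "\u0219",  # s cedilla -> s comma below
    "\u0162": "\u021A",  # T cedilla -> T comma below
    "\u0163": "\u021B",  # t cedilla -> t comma below
})

def to_comma_below(text):
    """Re-encode legacy Romanian cedilla characters as comma-below ones."""
    return text.translate(CEDILLA_TO_COMMA)

if __name__ == "__main__":
    legacy = "\u015Ftiin\u0163\u0103"  # a Romanian word typed with the legacy cedilla encodings
    print(to_comma_below(legacy))
```

Unlike a glyph substitution, this changes what the characters are, so searching and sorting see the comma-below code points afterwards.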
Now, about the T diacritic...
Unlike the S diacritic, whose unified encoding was used by two different languages with different preferred forms, the T diacritic was only used in Romanian. So although Unicode provided a new, disunified encoding for it as well as for the S diacritic, for a while Adobe, Tiro and others were using the comma accent form for both the old and new T diacritic encodings, since that was the form preferred for the only language using either encoding. A few years ago, though, through Microsoft's regional subsidiaries, we received feedback from Romanian users that in the event of failure of the {locl} feature, i.e. when software didn't do the glyph level substitution, it was preferable for both S and T diacritics to have the cedilla form instead of one having the cedilla and one the comma accent. In other words, the inconsistency is considered more objectionable than the incorrect diacritic form. So since then we've followed the Unicode character name for these diacritic characters (but not for the Baltic 'cedilla' diacritics, which all properly take the comma accent).
http://kitblog.com/2008/10/romanian_diacritic_marks.html
I keep meaning to do that, but always end up with the usual cedilla already implemented by the time I get around to the commaaccent.
Some Romanian/Moldovan speakers say they do not want the locl feature substituting the comma form for the cedilla form. This kind of substitution only extends the confusion between ţ and ț or ş and ș: it changes what the characters look like, not what they actually are (which is bound to break something at some point or another).
In AGLFN 1.7 there is no tcommaaccent or scommaaccent anymore, only uni-names.
See http://sourceforge.net/projects/aglfn.adobe/files/ with the comment in aglfn.txt:
- removed mappings for commaaccent names. These should now be assigned "uni" names.
For the Baltic cedilla letters with commas, they are also used in other languages, transcription systems or transliteration systems where a proper cedilla is required.
Having the comma below and cedilla identical seems nice on paper, but doesn't really help identify characters which is important on the computer.
In a European context, I've not found any instances in which these characters should be displayed with a cedilla, and for the most part font developers are making Latin fonts for European language support. It's also worth noting that Unicode explicitly annotates these characters as 'Latvian', and the 'WITH CEDILLA' naming is acknowledged as incorrect (but cannot be changed because Unicode character names are normative and covered by stability agreements).
In Unicode these decompose to base letters with combining cedilla, not with combining comma below. It is not a naming mistake but a blurry unification of the comma below with the cedilla, like it was for t and s with comma below or cedilla. It’s only in Version 3.0 that some cedillas were changed to look like commas to accommodate Latvian and Livonian.
See http://www.unicode.org/L2/L2013/13037r-cedillas-and-commas-below.pdf (it's publicly available now).
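The decomposition point can be checked directly against the Unicode character database. This is a minimal sketch, assuming Python 3 and its standard `unicodedata` module; `canonical_parts` is a hypothetical helper name, and the sample characters are just a few of the letters discussed in this thread.

```python
import unicodedata

COMBINING_CEDILLA = 0x0327
COMBINING_COMMA_BELOW = 0x0326

def canonical_parts(ch):
    """Return the canonical decomposition of ch as a list of code points."""
    return [int(h, 16) for h in unicodedata.decomposition(ch).split()]

# Latvian letters named WITH CEDILLA, plus the Romanian/Turkish pairs.
SAMPLES = "\u0122\u0136\u013B\u0145\u015E\u0162\u0218\u021A"

for ch in SAMPLES:
    base, mark = canonical_parts(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"U+{base:04X} + U+{mark:04X} ({unicodedata.name(chr(mark))})")
```

Characters named WITH CEDILLA decompose with U+0327, including the Latvian ones usually drawn with a comma, while only the disunified U+0218 to U+021B decompose with U+0326.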
Some ISO and DIN transliterations use the cedilla and sometimes the comma with d, n, t, k, etc. In those transliterations diacritics are supposed to look like what they are; otherwise you don’t know what you're transliterating anymore.
Thanks for reminding me of the Marshallese use, and of Eric Muller's memorandum.
Unicode inherited the problem from the earlier ISO/IEC 8859-4 or -10, ECMA-94 or -144, or code pages, where the characters with cedilla in their names had a cedilla sometimes and a comma below other times in the reference documents.
Disunification for Romanian was done poorly; doing the right thing the wrong way or at the wrong time doesn’t help.
What you looked at was Fileformat.info, which shows what it finds in some fonts; sometimes it’s right but sometimes it’s wrong.
(Whether it is acceptable (and acceptable to whom?) is another question....)