What character set do you usually reach for as a default when you start a new font?
Comments
-
Came upon this link at Linotype's site:
OpenType Character Sets – OpenType Std
Follow the link trail that starts on that page and you'll see Linotype's charsets and what they label them, plus the progression of language coverage from set to set, and notes on when and why the sets were chosen. Too bad there are no lists like .enc files or .nam files. But they do give you a sum total of the chars in the set, and a clear image of all the glyphs in the set. So, if you wanted, you could piece it together. Or ask them for lists, which is what I'm going to do. If I get 'em, I'll post them.
Also, the link that @Fernando Díaz provided in his comment is worth a bookmark:
http://underware.nl/latin_plus/
Underware has obviously done some thorough research. Hover over a character in the chart and you get a popup that gives you details about the character. It even tells you how many languages use that particular character.
Nice presentation. Very complete.
John Hudson said: BTW, on the subject of how to indicate what scripts/languages a font explicitly supports, in Windows 10 Microsoft has adopted Apple's 'meta' font table with <dlng> and <slng> tags for 'design language(s)'...
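For anyone curious what those two tags look like on disk, here is a rough Python sketch that reads them from the raw bytes of a 'meta' table. It assumes the published layout (a 16-byte header, then 12-byte tag/offset/length records, with the values stored as comma-separated text). The names `parse_meta`, `split_tags` and `build_meta` are made up for this example, and `build_meta` exists only to fabricate a sample table to read back.

```python
import struct

TEXT_TAGS = {"dlng", "slng"}  # the two tags mentioned above; both hold plain text


def parse_meta(table: bytes) -> dict:
    """Return {tag: value} from the raw bytes of a 'meta' table.

    Assumed layout: header of four uint32 values (version, flags,
    reserved, dataMapsCount), then one 12-byte record per entry
    (tag, dataOffset, dataLength). Offsets count from the table start.
    """
    _version, _flags, _reserved, count = struct.unpack_from(">4L", table, 0)
    result = {}
    for i in range(count):
        tag_bytes, offset, length = struct.unpack_from(">4sLL", table, 16 + 12 * i)
        tag = tag_bytes.decode("ascii")
        payload = table[offset:offset + length]
        # Only decode the tags we know are text; leave anything else as bytes.
        result[tag] = payload.decode("utf-8") if tag in TEXT_TAGS else payload
    return result


def split_tags(value: str) -> list:
    """Split a comma-separated tag string such as 'Latn, Grek' into a list."""
    return [part.strip() for part in value.split(",") if part.strip()]


def build_meta(entries: dict) -> bytes:
    """Hypothetical helper: pack {tag: text} into a table, for demonstration only."""
    header_size = 16 + 12 * len(entries)
    records, payload = b"", b""
    for tag, text in entries.items():
        data = text.encode("utf-8")
        records += struct.pack(">4sLL", tag.encode("ascii"),
                               header_size + len(payload), len(data))
        payload += data
    return struct.pack(">4L", 1, 0, 0, len(entries)) + records + payload


sample = build_meta({"dlng": "Latn", "slng": "Latn, Grek, Cyrl"})
info = parse_meta(sample)
print(split_tags(info["dlng"]))
print(split_tags(info["slng"]))
```

With a real font you would first pull the table bytes out of the file using whatever font library you prefer; that step is left out here.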
The problem with the Unicode charts is that they don't indicate junk glyphs. If you comply with Unicode ranges, you end up including a lot of garbage, wasting time and bloating fonts. I think there should be characters in each range which are officially flagged as optional. That way each range could be calculated as complete, functionally complete or incomplete.
-
The problem with the Unicode charts is that they don't indicate junk glyphs. If you comply with Unicode ranges, you end up including a lot of garbage, wasting time and bloating fonts. I think there should be characters in each range which are officially flagged as optional.
Optional to what? It's a character encoding standard.
-
@Richard -- The Linotype sets are available on a thread in the Glyphsapp Forum, here: http://tinyurl.com/jhuga3m -- It is a plist file with the Linotype encoding inside Item 5>Item 2.
-
Optional to what? It's a character encoding standard.
Let's say Latin Extended A 0100-017F. For that range to be supported, all glyphs are assumed to be present. At least that's been my experience. When a client is adhering to some technical standard and they require Latin Extended A, they don't just mean some of it. For that particular range, there are a few junk glyphs but not too many. But Latin Extended B has loads of never-going-to-be-used trash. Sure, I can try to convince a client that certain glyphs are worthless but it would be better if they could officially be deemed chaffy. As it is now, when a client requires Latin Extended B, I have to include glyph rubbish, including those idiotic ring acutes, knowing full well that they'll never, ever be used.
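To make the "complete / functionally complete / incomplete" idea concrete, here is a small Python sketch. There is no official optional list, so the `MY_OPTIONAL` set below is a personal, hypothetical pick purely for illustration, and `classify_block` is a made-up name.

```python
import unicodedata


def classify_block(font_codepoints, first, last, optional=()):
    """Classify a font's coverage of the range first..last (inclusive).

    Unassigned code points (general category 'Cn') are ignored, so only
    characters that actually exist in the range count against the font.
    """
    assigned = {cp for cp in range(first, last + 1)
                if unicodedata.category(chr(cp)) != "Cn"}
    missing = assigned - set(font_codepoints)
    if not missing:
        return "complete"
    if missing <= set(optional):
        # Everything missing is on the "optional" list.
        return "functionally complete"
    return "incomplete"


# Hypothetical optional picks for Latin Extended A, for illustration only.
MY_OPTIONAL = {0x0149, 0x017F}

full_font = set(range(0x0100, 0x0180))
lean_font = full_font - {0x017F}      # skips one "optional" character
broken_font = full_font - {0x0141}    # skips Lslash, which is not optional

for name, cps in [("full", full_font), ("lean", lean_font), ("broken", broken_font)]:
    print(name, classify_block(cps, 0x0100, 0x017F, MY_OPTIONAL))
```

The whole argument, of course, is about who gets to decide what goes in that optional set.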
-
Well, you won't hear any argument from me that client procurement requirements are often daft when it comes to character sets, but it's hard to blame them. Most companies don't have script and language experts, or even text processing experts who really understand what Unicode is, or have the resources to spend two years, like Brill did, carefully documenting their needs and planning a major font project. So it's easier for them to simply point to what look, to them, like discrete blocks of Unicode characters and say, 'We need these'. [Of course, the blocks aren't really discrete: sometimes the casing pair for a character might be in a different block.] Some of the blocks, including Latin Extended B, are subdivided into labeled sections, and that can be useful in helping clients understand what they might or might not actually need.
If there were secondary documentation that mapped Unicode characters to specific uses, and this documentation were reliable and suitably endorsed — whether as a de facto or de jure standard —, then it would probably be easy to steer clients to this. [Corporate character subsets like WGL4 became de facto standards for some clients; heck, I've even had people say to me, 'We want the Helvetica World set'.] I don't think such a thing belongs in Unicode itself, though, because there has to be a commitment to the essential equality of all characters within that standard. As soon as you start saying that some characters are essential and others are optional, you start penalising lesser-used languages and specific communities.
-
As soon as you start saying that some characters are essential and others are optional, you start penalising lesser-used languages and specific communities.
It's certainly hard to say officially, "your language isn't worth supporting" but designating some characters as historical when they're not used in current language wouldn't be so controversial. And there are some unlikely characters that are superfluous to normal language usage that could be flagged as optional like interrobang or ring acutes without much argument. I don't think such a list will ever be produced but it sure would be nice.
-
I sympathize with your pain, Ray. But this has pretty much always been the case when working specifically for a client, to one extent or another, right? You often have to dig beneath the original brief to help the client figure out what the real need is.
And you have to show them that you know what you’re talking about (or at least have resources to call on ;-).
I have had to do this with clients — find out what languages, specifically, they are wanting to make sure they can support, or what markets specifically they are hoping to push into, now and in the foreseeable future, and help frame up a character set accordingly. And I explain to them the frivolity of the interrobang, for example, and why they might not want to pay me to draw the damn thing.
If they still want the whole block, then you charge for it, right?
-
George Thomas said: @Richard -- The Linotype sets are available on a thread in the Glyphsapp Forum, here: http://tinyurl.com/jhuga3m -- It is a plist file with the Linotype encoding inside Item 5>Item 2.
Addendum: Just looked again at the Linotype support docs. The W1G set is the same as the Windows Glyph List 4.0, and that's been unchanged since 2007 or so and I have that list. That simplifies things.
-
Ray Larabie said:
As soon as you start saying that some characters are essential and others are optional, you start penalising lesser-used languages and specific communities.
It's certainly hard to say officially, "your language isn't worth supporting" but designating some characters as historical when they're not used in current language wouldn't be so controversial. And there are some unlikely characters that are superfluous to normal language usage that could be flagged as optional like interrobang or ring acutes without much argument. I don't think such a list will ever be produced but it sure would be nice.
-
We tend to think in terms of official standards bodies instead of languages. Perhaps this is just knee-jerk. Perhaps we need to name and generate lists that cover languages.
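A language-first list could be as simple as the Python sketch below. The three entries are only the extra letters of each basic alphabet beyond A-Z (no loanword accents, no punctuation), so treat `SAMPLE_ORTHOGRAPHIES` as illustrative data rather than an authoritative source; the function names are made up too.

```python
import string

# Illustrative only: extra alphabet letters beyond A-Z for three languages.
SAMPLE_ORTHOGRAPHIES = {
    "German": "äöüßÄÖÜ",
    "Norwegian": "æøåÆØÅ",
    "Polish": "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ",
}

BASIC_LATIN = set(string.ascii_letters)


def missing_for(font_chars, language):
    """Return the characters the font lacks for one language, sorted."""
    required = BASIC_LATIN | set(SAMPLE_ORTHOGRAPHIES[language])
    return sorted(required - set(font_chars))


def supported_languages(font_chars):
    """Return the languages whose required characters are all present."""
    return sorted(lang for lang in SAMPLE_ORTHOGRAPHIES
                  if not missing_for(font_chars, lang))


sample_font = BASIC_LATIN | set("æøåÆØÅäöüßÄÖÜ")
print(supported_languages(sample_font))
print("".join(missing_for(sample_font, "Polish")))
```

The hard part is not the code but agreeing on what goes into each language's list.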
-
@Richard Fink The list does contain W1G; it's Item 3 in the Item 2 subgroup. The only possible drawback to some is that it uses the Glyphsapp naming convention although that shouldn't make a difference. They don't end up in the final font.
-
George Thomas said: @Richard Fink The list does contain W1G; it's Item 3 in the Item 2 subgroup. The only possible drawback to some is that it uses the Glyphsapp naming convention although that shouldn't make a difference. They don't end up in the final font.
Chris Lozos said: We tend to think in terms of official standards bodies instead of languages. Perhaps this is just knee-jerk. Perhaps we need to name and generate lists that cover languages.
-
All of the Linotype lists are useful as a reference but not much more since the coverage is limited.
-
The WGL4 set, when originally published in the late 90s, helpfully distinguished 'core' characters, necessary for language support, from optional characters such as the line- and boxdraw characters that are really only relevant for terminal emulator fonts (they were included in WGL4 because Microsoft is one of the companies that actually cares about terminal emulator fonts and how to build and correctly identify them in the system). Unfortunately, this useful feature was removed when the WGL4 set was updated and republished.
-
@Richard Fink
Glyphs which are used in less popular languages aren't the main issue. There are lots of glyphs, scattered across the Unicode chart, which are deprecated, historical or for academic use only. Under academic use, I include characters which are only used for biblical transliteration... there are a lot of those. Which is fine, but it's not necessary for every font. Maximize language coverage/reduce waste.
-
The ISO 10646 also has subsets defined for this purpose:
- MES-1 and MES-2 (Multilingual European Subset)
- Modern European Subset
- Contemporary Lithuanian Letters
- Basic Japanese
- Japanese Non Ideographic Extension
- Common Japanese
- Multilingual Latin Subset
-
The W1G set is the same as the Windows Glyph List 4.0,
Even if you’re just talking about alphabetic chars and not miscellaneous symbols, as I recall from when I investigated this, there are some minor differences.
For instance, the W1G set includes the “historic” ѢѣѲѳѴѵ Cyrillic chars for pre–1918 reform spelling. And the W1G also seems to specify the legacy 0x0162/163 Tcedilla chars of questionable value, which I believe the WGL4 does not. (Not entirely sure about this.)
The W1G also specifies a more complete set of encoded inferior/superior figures and signs.
-
@Kent Lew: Tcedilla is used in Gagauz and in some romanization systems. Besides, there are still a lot of existing Romanian texts that use it even if Tcomma should be used instead. Where the W1G is wrong regarding Tcedilla, on http://www.linotype.com/5801/european-ot-character-set-w1g.html, is the shape: if Scedilla has a cedilla, so should Tcedilla. For the historic Cyrillic characters ѢѣѲѳѴѵ, I remember Maxim Zhukov pointing out that even though they stopped being used in Russia in 1918, they are still used by the Russian Orthodox Church and Russian diaspora outside of Russia.
-
Richard Fink said: Addendum: Just looked again at the Linotype support docs. The W1G set is the same as the Windows Glyph List 4.0, and that's been unchanged since 2007 or so and I have that list. That simplifies things.
-
John Hudson said: The WGL4 set, when originally published in the late 90s, helpfully distinguished 'core' characters, necessary for language support, from optional characters such as the line- and boxdraw characters that are really only relevant for terminal emulator fonts (they were included in WGL4 because Microsoft is one of the companies that actually cares about terminal emulator fonts and how to build and correctly identify them in the system). Unfortunately, this useful feature was removed when the WGL4 set was updated and republished.
At Windows Glyph List 4 on Github.
I always wondered how and why the box drawing chars got there. So you're saying those were special purpose? No grand plan? Terminal emulation? Good to know.
It sounds like your character sets have become rather far-ranging. (I'm going to double back and click the link you provided earlier.)
Ray Larabie said: @Richard Fink
Glyphs which are used in less popular languages aren't the main issue. There are lots of glyphs, scattered across the Unicode chart, which are deprecated, historical or for academic use only. Under academic use, I include characters which are only used for biblical transliteration... there are a lot of those. Which is fine, but it's not necessary for every font. Maximize language coverage/reduce waste.
I don't blame you a bit for wanting to stick with characters that allow modern readers of a language to understand the meaning of what they are reading with nothing more, nothing less. And everything else off in a different category of "special purpose" or "optional" or "historical" or whatever.
BTW - @TimAhrens posted somewhere - not on Typedrawers - and I went looking for his post and I've come up dry so far - he posted that he's made a close study of this issue and there was a list of characters that, at least, he considers superfluous. Tim, if you're out there, weigh in.
I'm going to give the ISO sets a fresh look. In light of Unicode, they are obsolete. But that doesn't mean they were incorrect. Thanks.
Denis Moyogo Jacquerye said: The ISO 10646 also has subsets defined for this purpose
Frode Bo Helland said: @Richard Fink Many of the languages Latin Plus claims to support are missing required characters or listed with wrong orthographies. Many of the sources do not support their conclusions.
Golly, I feel better now about being confused.
And I can't believe it was six years ago, but I remember when web fonts first arrived and @Ethan Dunham was putting Font Squirrel together and we would Skype about which direction the site and web fonts, in general, were taking. At the time, he was still thinking MacRoman as some kind of default set but I managed to dislodge that idea and make sure he focused on language coverage - a prominent and useful feature you'll find on Font Squirrel to this day. Ask the average web developer what's meant by saying that a font conforms to the Latin-3 character set and he'll look at you like you have two heads.
-
Latin Plus claims no “language specific characters”:
When I first extracted their list months ago I interpreted that to mean that it shares glyphs with other languages, with nothing specific to Hopi alone. In looking at the list just now that appears to be the case.
As for Tokelauan, the list has all the needed glyphs except for the stacked accents. I'm assuming those are special-purpose for linguistics which is why they are omitted. Omniglot does not list them. Geonames.de doesn't list the language, for reasons not known to me.
-
John Hudson said: Well, you won't hear any argument from me that client procurement requirements are often daft when it comes to character sets, but it's hard to blame them. Most companies don't have script and language experts, or even text processing experts who really understand what Unicode is....
-
Frode,
Leaving out the Western "keyboard" characters - meaning ASCII and beyond - is not a part of my logic. I think it's a big mistake to leave them out of any font.
On this page:
http://underware.nl/latin_plus/character_set/
The basic Western chars are lumped together under the category: "Language non-specific" characters, whatever that means to the folks at Underware.
-
@Frode Bo Helland There is only one list, at the download link. I don't see any in the list that have the basic character set.
Their list really should include the accented glyph names for all languages in the list that use them. But then that could complicate things further because other sites such as Omniglot don't even mention them.
-
@Frode Bo Helland I don't have an explanation for the additional glyphs, but it wouldn't hurt to ask someone at Underware why they are there. Loanwords, maybe? I honestly don't know.
-
@Frode Bo Helland I counted 24 accented glyphs for Norwegian in the list, yet other sources indicate there should be only 18-20. That's why it appears there are too many to me.
I agree with you there are likely errors or omissions in that list, and I have found errors or omissions in other sources too. Working on adding to my character set has made me wish for an ultimate authority, but I'm not sure if that is even possible.
-
Thanks Frode. That's the info I had on Norwegian so I'm good on that.
-
I think I might have derailed this thread but I think this topic is very important. I imagine there are a lot of new type designers who are curious about extending Latin language coverage.
I just want to mention a few characters that I consider academic: IPA, Pinyin and Esperanto. The reason I categorize academic characters separately is so I can skip them in display fonts.
If you've ever explored Latin Extended B, you'll have noticed that there are loads of historical IPA characters. If you're new to type design and you haven't bothered with combining accents because it looks like a bother, I've come up with a reduced set. If you eliminate the IPA-only combining accents, you're left with only 17 characters*.
0300 grave
0301 acute
0302 circumflex
0303 tilde
0304 macron
0306 breve
0307 dot accent
0308 dieresis
030A ring
030B double acute
030C caron
0313 comma (like gcommaaccent)
0323 dot below
0326 comma below
0328 ogonek
0337 slash (like Oslash)
0338 slash (like oslash)
I'm not sure if these are required if you're already including a Vietnamese set.
0309 hook above
031B horn
* circumflex below is used for the Venda language but there are locations for those characters in the Latin Extended Additional Range.
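If you want to check a reduced set like this against a Unicode range, a short Python sketch along these lines might help. It leans on Unicode's canonical decompositions (NFD), which is an assumption about how you build composites and not necessarily how the glyphs are drawn. `REDUCED_MARKS` is the 17-mark list above, and the function names are made up.

```python
import unicodedata

# The 17 combining marks from the list above.
REDUCED_MARKS = {
    0x0300, 0x0301, 0x0302, 0x0303, 0x0304, 0x0306, 0x0307, 0x0308, 0x030A,
    0x030B, 0x030C, 0x0313, 0x0323, 0x0326, 0x0328, 0x0337, 0x0338,
}
VIETNAMESE_EXTRAS = {0x0309, 0x031B}  # hook above, horn


def buildable(codepoint, marks=REDUCED_MARKS):
    """True if the character canonically decomposes into a base letter
    followed only by combining marks found in `marks`."""
    parts = unicodedata.normalize("NFD", chr(codepoint))
    if len(parts) < 2:
        return False  # no canonical decomposition at all
    return all(ord(ch) in marks for ch in parts[1:])


def marks_outside_list(first, last, marks=REDUCED_MARKS):
    """Combining marks used by decompositions in first..last but not in `marks`."""
    used = set()
    for cp in range(first, last + 1):
        parts = unicodedata.normalize("NFD", chr(cp))
        used.update(ord(ch) for ch in parts[1:] if unicodedata.combining(ch))
    return used - marks


print(buildable(0x00E9))   # eacute
print(buildable(0x01FA))   # Aring with acute
print(buildable(0x1EDF), buildable(0x1EDF, REDUCED_MARKS | VIETNAMESE_EXTRAS))
print([f"{cp:04X}" for cp in sorted(marks_outside_list(0x0100, 0x017F))])
```

Run over Latin Extended A, this flags U+0327 (cedilla): the canonical decompositions of Scedilla, Tcedilla and the Latvian comma-accent letters all go through it, so it may be worth adding if you build composites this way. Slashed letters such as Oslash have no canonical decomposition, so 0337 and 0338 never show up in this kind of check.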