With Unicode and the growing demand for multilingual support, many character sets have been rendered obsolete (MacRoman, ANSI) or insufficient (ISO 8859-1 and Windows 1252, to name just two "Western" sets).
As with any software, defaults matter a lot. For example: what would you recommend a newbie type designer load up in their font editor when making their first font?
(I'm asking because I've been writing HTML web font test pages. Some of them test for compliance with major character sets like WGL4 and Adobe Latin 4, to name two. These pages also take a different approach to testing than anything else currently available. Among other benefits, they are easier for type designers to insert into their workflow because they sit in your local hard drive's file system. No web server necessary. They will be offered as open source and free of charge. So the smarter you make me, the smarter I can make tools for you.)

So, what set do you pick first, out of habit?
Comments
Why shy away from Vietnamese? You're the second type designer to tell me that.
Creating your own encoding is the best solution, but first you must investigate the signs you are designing and try them in context. It's also always good to ask people who use that language whether what you are doing is OK. This is a good website for that:
http://diacritics.typo.cz/
For people who are starting out, I would recommend reading this:
https://foundry.myfonts.com/guides/#character-sets
And perhaps, if you feel like it, use the Underware Latin Plus encoding, which you can download here:
http://underware.nl/latin_plus/
Good luck!
Thanks for the links. I was unaware of the last two.
https://www.microsoft.com/typography/otspec/os2.htm#ur
- Adobe Latin 2
Or more often:
- Adobe Latin 3, possibly omitting Vietnamese
- Adobe Cyrillic 3
- monotonic Greek
Why leave out Vietnamese? Because in AL-3, including Vietnamese adds 90 characters. That is without any small caps or alternate forms or anything. The horn accent is a tricky one to get right, and it is attached rather than floating. So the cost/benefit for Vietnamese is tough.
Not quite. The codepage bitfield is for registering what legacy 8-bit codepages the font supports, which is then used by some software — notably RichEdit clients on Windows — to make guesses about font fallback situations. The legacy nature of these bits means that they're not useful for fonts that support scripts and languages that never had 8-bit codepages, and the heavy-handed nature of some software relying on the bits means that some fonts lie about what codepages they support: for instance, people making fonts to support the Arabic language may claim Windows CP 1256 support even if their font does not support the ASCII subset or Farsi and Urdu characters, simply because this is the only way to get the font to work in some software and not fall back to an Arabic system font.
The section of the spec you link to is the Unicode range bitfield. It has been stuck at the same version of Unicode for many years; I can't remember which. There's semi-regular conversation about rev'ing the spec to define bits for blocks that have been added to Unicode since then, but it doesn't seem to be a high priority, and there's equally regular conversation about ignoring these bits altogether. Apart from the large number of blocks not included, the Unicode range bits have no standard requirement for complete or minimal coverage of a block. Hence, it is left to font developers to decide whether two characters from the Greek block, used as symbols in a Latin-only font, constitute support of the Greek block for Unicode range bit purposes. Font editing software tends to err on the side of inclusion, so even if only one character from a block is present the bit may be automatically set.
In the case of the codepage bits, in theory one should only claim support if a font supports the entire 8-bit codepage, although some fonts lie for the reasons discussed above. In the case of the Unicode range bits, anything between a single character and full block coverage may be indicated by the bit setting, and only a total absence of characters from a claimed block constitutes a clear error.
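As a small illustration of how software might read these bits, here is a sketch in Python. The bit numbers are a tiny subset of the assignments in the OpenType OS/2 specification (bit 0 = Basic Latin, bit 7 = Greek and Coptic, bit 9 = Cyrillic, and so on); a real tool would read the four ulUnicodeRange words out of a font file, e.g. with fontTools, rather than take a bare integer.

```python
# Decode a few bits of the OS/2 ulUnicodeRange1 field.
# The bit assignments below follow the OpenType OS/2 spec, but this
# is a small illustrative subset only -- the real field spans 128 bits.
UNICODE_RANGE1_BITS = {
    0: "Basic Latin",
    1: "Latin-1 Supplement",
    2: "Latin Extended-A",
    3: "Latin Extended-B",
    7: "Greek and Coptic",
    9: "Cyrillic",
}

def claimed_blocks(ul_unicode_range1: int) -> list:
    """Return the names of the (known) blocks whose bits are set."""
    return [name for bit, name in sorted(UNICODE_RANGE1_BITS.items())
            if ul_unicode_range1 & (1 << bit)]

# A font claiming Basic Latin, Latin Extended-A and Greek support:
value = (1 << 0) | (1 << 2) | (1 << 7)
print(claimed_blocks(value))
# ['Basic Latin', 'Latin Extended-A', 'Greek and Coptic']
```

Note that, per the discussion above, a set bit tells you nothing about how much of the block is actually covered.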
For font developers, codepage support may be a reasonable starting place, with the caveats that a) even the simplest Latin fonts these days will tend to support multiple codepages, and b) not everything that a user might want for a given language exists in codepages. So, for example, there are a few Cyrillic characters needed for European languages that were not included in the old 8-bit Windows Cyrillic codepage; they are included in WGL4, however. At least, though, codepage support is a solid technical target.
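A coverage check against a target like this is easy to script. In the sketch below, the codepoints are a tiny hypothetical sample, not a real codepage or WGL4 definition, and in practice the font's coverage would come from its cmap table (for instance via fontTools) rather than a hand-written set.

```python
# Compare a font's covered codepoints against a target character set.
# Both sets here are tiny hypothetical samples, for illustration only.
REQUIRED = {0x0041, 0x00C0, 0x0100, 0x0490}  # sample target codepoints

def missing_chars(font_codepoints, required):
    """Return, sorted, the target codepoints the font does not cover."""
    return sorted(set(required) - set(font_codepoints))

# A font that covers all but one of the sample targets:
font_cmap = {0x0041, 0x00C0, 0x0100}
print([f"U+{cp:04X}" for cp in missing_chars(font_cmap, REQUIRED)])
# ['U+0490']
```

An empty result means the font meets the target, which is what makes codepage-style sets a solid technical goal.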
For font developers, supporting entire Unicode blocks can involve a huge amount of work creating often obscure and graphically complicated characters that are unsuitable for inclusion in a particular design and unlikely ever to be used. Unicode includes a large number of characters of historical and specialist use only. I've had to make quite a lot of fonts containing such characters because of the nature of my clients — software companies who need to be able to display any Unicode character that may ever occur in text, and specialist publishers who actually produce texts involving such characters. Unless you have such clients, it's simply a waste of your life to spend time creating glyphs for such characters.
I used to fret about vertical metrics for Vietnamese but now I just stack 'em up. You can let those accents go past the emsquare rather than trying to squeeze them in. I can't imagine people use default leading when setting Vietnamese type anyway.
My Extended Cyrillic excludes historical characters.
0243 Ƀ is obscure but I include it because some people use it as a Bitcoin symbol.
You can see an example of something I recently worked on here.
I used to think that way too. The presumption is that the users of the target language are the ones buying fonts. But I think it's all about making fonts for effective localization. Popular Android and iOS apps tend to be localized. If you're an app developer, you don't want to have to license a separate font for Greek or Vietnamese. You're going to look for a font that covers as much as possible.
Everybody who's responded so far is an old hand at font making, and everybody's got their own formula. Me too. But what I'd like to understand is why you made the decisions you made. OK, I understand that Vietnamese can be seen as a burden with next to no payoff for the effort. But is that evidence-based? Ray's extended Cyrillic includes historical characters. Now, I haven't yet worked through the Adobe Cyrillic set, but does that set include historical characters that make their way into Thomas Phinney's fonts? (Don't know, but I'm guessing not.)
Michael Jarboe, what character set is "Extended Latin"? Who defines that one? Not any major industry player that I know of. Did you put it together for yourself?
Web fonts - fonts packed up to travel over the network - are my main interest, as some of you know. And traveling light is good. The smaller the file size, the better. So I'm trying to put some of these characters on trial for their lives, so to speak. And also divide the characters into those that are more suitable for a print (desktop) font than for the web.
I think Ray makes a great point about giving developers maximum coverage in a single font. But I'm wondering about the file sizes to achieve that.
Do apps install the fonts they use upon installation? Or do they pull from the network when you open the app? I don't know.
(If somebody could clue me in on that, I won't complain.)
Generally not, but, well, sometimes you gotta do what you gotta do.
For a long time, I told clients that if we're going to claim codepage support, then we have to actually support the codepage, and they were mostly okay with that because they understood there may be software dependencies. These days, when Unicode is the norm in most places and 8-bit codepage mapping is not as critical as it used to be, some of the 'younger' clients — by which I mean companies like Google — seem perfectly happy to make non-Latin fonts without an ASCII subset. I've not checked the OS/2 tables in such fonts to see what codepage bits are set, but I wouldn't be surprised to find they're lying, and nor would I blame them.
Microsoft still ask us for at least one complete Windows legacy codepage in each font, even in fonts for scripts that are 'pure Unicode', i.e. that had no Windows 8-bit support. Typically, this means that every font we make for Microsoft supports at least CP 1252 (Win ANSI, Western Europe), which has led to a number of one-off Latin designs to accompany Ethiopic or Javanese or Arabic.
Vietnam’s per-capita income is $5,700, and it’s tied with Zimbabwe for first place in software piracy rates. I consider that ample evidence that Vietnam is not an economy I should develop for unless somebody else pays up front.
There are around 15 million Nigerian smartphone users and their many languages only require a few extra glyphs to support, some of which are included in the Vietnamese range. iOS and Android apps can use embedded TTF font data without installing. App developers can use a mix of embedded fonts and OS fonts for localization. For example, a developer might embed a Latin/Greek/Cyrillic font and fall back on OS fonts for Chinese and Japanese.
I can see merit in that statement, and also in what @Ray Larabie is writing about smartphone app developers as a target audience. But the majority of professional type designers and font developers in “the west” – as well as users on this site – do primarily service the graphic design industry. While this industry is potentially smaller than the web or smartphone apps, it is a real industry with a decades-long tradition of licensing fonts. It spends enough money to keep many of us gainfully employed. It is also the industry that a lot of us came out of (even though @John Hudson has often written IIRC, for example, that he is not a graphic designer, and prefers to design for typographers who are not graphic designers). In other words, our focus should not be a surprise.
One reason that several font foundries do not include Vietnamese in their off-the-shelf fonts is that adding Vietnamese characters later on down the line represents a potential future revenue stream. Several years ago, I worked at one of the big old foundries, and we occasionally got orders to create custom fonts that supported Vietnamese.
Even in the libre font market, many designers hope that corporations will hire them to extend the fonts they have already published. Certainly I would add Vietnamese support to libre fonts I have designed, or even libre fonts other designers have produced, if a customer was willing to pay me the appropriate fee.
I suspect, also, that every designer does their own cost/benefit analysis. We are willing to work on a specific family of fonts for a certain number of months or years, but eventually, one needs to release the products. Fonts are never finished, no matter who publishes them. Both commercial font makers and libre designers update their fonts over time, usually based on market demands that grow over time.
James Montalbano - I think you left out the word "easy" from your post, yes?
The consensus among those who've responded is that each has his or her own set worked out that's done the job, and you add and subtract characters from that personal default as needed. As for the choice of characters - well, certainly language coverage plays a big part, as does client request or some other motivating factor.
I remain open to any input.... what I've gotten from this thread has helped me understand, so I thank those who have responded so far.
More?
My own default char. set is SIAS-Lat-Eu-2 or SIAS-Lat-Eu-3. The latter also embraces Azeri and Vietnamese.
I have wondered for a long time now why there is no established industry standard for general character sets, a matter so crucial. And I also wonder whether any of the relevant academic institutions have ever been touched in the foggiest possible way by the thought of doing work on that matter.
Like Mark's example, it includes all of Latin Extended-A and a few things from Latin Extended-B. It is then adjusted depending on the typeface; considering the inclusion or exclusion of lining figures, oldstyle figures, small caps, standard and discretionary ligatures, numerators, denominators, superiors, inferiors, case-sensitive forms, symbols, etc., it can change quite greatly. That is why I always create a custom encoding: there are so many variables, and I can order all the glyphs in an organized way that makes sense to me.
Beyond that, you get into symbols and emoji, and the question there becomes: how often do people need those symbols? And in web browsers, at least, remember there's a luxury that you don't have in the graphic arts: a stack of fallback fonts can be specified in the web page's style sheet that will, most probably, have the symbol you've omitted from your font. It might not match your font stylistically, and the metrics of the fallback font probably won't match yours, but the symbol will display.
I'm certainly not concerned about what characters folks like James, Mark or Ray or you, Andreas, put in a font. You know what you're aiming for. However, at least some of the folks who license those fonts are concerned about what languages are supported. On the web, especially, web fonts have really put "The World" into the "World Wide Web" and language support is an important thing to think about.
What I would like to see, beginning with Latin-based languages, are just a few character sets defined, based on Unicode, that progressively support a greater and greater number of languages. Based on the number of speakers and any other relevant factors. The Adobe Latin sets do that, to some degree, but Adobe does not consider them normative and thus "a standard" of any kind. (There's more to be said about the Adobe sets, but I'm not going into it here.)
Also - consider this - there actually are character sets defined in both the HTML4 recommendation from the W3C and the HTML5 recommendation. The character list in HTML5 is big and has a lot of symbols. But it IS part of a recognized industry "standard". And, in fact, all current browsers support that standard, which maps Unicode code points to "human friendly" names like &amp;rarr; (for 'rightwards arrow') or &amp;nbsp; (for non-breaking space).
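For what it's worth, Python's standard html module implements the same HTML5 named character reference table, so the mapping from friendly name to code point is easy to demonstrate (a quick sketch):

```python
import html

# HTML5 defines named character references that map "human friendly"
# names to Unicode code points; html.unescape applies that table.
print(html.unescape("The arrow &rarr; is U+2192"))
assert html.unescape("&rarr;") == "\u2192"  # RIGHTWARDS ARROW
assert html.unescape("&nbsp;") == "\u00a0"  # NO-BREAK SPACE
```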
Well, I'm on the job, Andreas.
Also, I think Microsoft's model for supporting languages is a good model to follow for anybody. They got there first. I'm thinking of writing up a little study of how Microsoft goes about handling language support for Windows and Office, for its different constituencies. Just sayin'.
[HT to @Adam Twardoch]