Tool for language support testing
Jure Kožuh
Posts: 5
Hello
Does anybody know a tool that would check a font file and define which languages the font supports?
Comments
-
All of the major font editing tools (FontLab, Glyphs, RoboFont, FontForge) will give you this ability, as will font management software such as FontExplorer and Suitcase Fusion/UTS.
Pablo Impallari's online font testing tool is incredibly useful in this respect as well:
http://www.impallari.com/testing/
-
Glyphs app is good at this for fonts it can open (or for fonts developed with it), though I don't know if that's the kind of tool you're looking for.
-
Hey, thanks for the response. I was thinking about a tool that could give you an actual list of supported languages, not just the definition of Basic, Western EU, ... Or maybe I need to find lists of languages that fit into Basic, Western European, Central European and South Eastern European (the definitions in Glyphs).
-
The user and all related content has been deleted.
-
This is the default OSX font app you are talking about?
-
The user and all related content has been deleted.
-
Will need to reinstall, thank you!
-
In FontBook you can see such a list when you select "Show Font Info" in the Preview menu
FontBook just builds that list based on whatever code pages the font claims to support. I once checked off Arabic and FontBook claimed the font supported it. It also claims most Latin fonts support Greek because font apps check that one off by default.
-
The user and all related content has been deleted.
-
To approach this another way: does anyone know of any resource, online or off, that lists the glyph/diacritic requirements of Latin-based languages, broken out by language? I feel like MS must have this somewhere in their typography site, but I can't find it.
-
Here are some interesting links that might work for you:
http://www.eki.ee/letter/chardata.cgi?lang=de+German&script=latin
http://www.evertype.com/alphabets/ (downloadable PDFs)
https://github.com/typekit/speakeasy (source for Speakeasy)
I also found mention of John Hudson's work with MS on Sylfaen c. 2000, later revised into that could be a good reference font.
-
Thanks, George. That Eesti Keele link is tremendous.
-
Some more resources:
omniglot.com/writing/langalph.htm#latin
en.wikipedia.org/wiki/Alphabets_derived_from_the_Latin
I’d suggest not relying on only one of the sources mentioned either by myself or Max Phillips, as I found discrepancies between the various sources in a few of the less common orthographies.
Depending on the orthography, I’d recommend looking it up in at least two or three of the sources just to make sure you’ve covered it completely.
-
Thanks, James and Stephen!
-
There is a small tool on unifont.org that can actually examine fonts: www.unifont.org/fontaine
-
Given that this question keeps coming up, maybe we could split up the work of doing something about it. A group of people could each pick a chunk of the Latin languages listed in Omniglot and plug them into a spreadsheet, which we could post online as a CSV file.
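As a rough illustration of how such a shared CSV could be used, here is a minimal Python sketch. The column names (`language`, `required`) and the sample rows are hypothetical and are not taken from Omniglot or any published data set. The font's character set is passed in as a plain set of code points; in practice it would have to be read from the font's cmap table by some other tool.

```python
import csv
import io

# Hypothetical CSV layout: one row per language, with the non-ASCII
# characters the language needs listed in a single column.
SAMPLE_CSV = """language,required
German,ÄÖÜäöüß
Slovenian,ČŠŽčšž
Icelandic,ÁÐÉÍÓÚÝÞÆÖáðéíóúýþæö
"""

def load_requirements(text):
    """Return {language: set of required code points} from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["language"]: {ord(ch) for ch in row["required"]} for row in reader}

def supported_languages(font_codepoints, requirements):
    """List languages whose required code points are all in the font."""
    return sorted(
        lang for lang, needed in requirements.items()
        if needed <= font_codepoints
    )

def missing_characters(font_codepoints, requirements, language):
    """Return the required characters of `language` that the font lacks."""
    return sorted(chr(cp) for cp in requirements[language] - font_codepoints)
```

Reporting the missing characters, not just a yes/no answer, is a design choice that would make disagreements between sources easier to spot.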
-
There is a small tool on unifont.org that can actually examine fonts: www.unifont.org/fontaine
Dave Crossland and Vitaly Volkov are working on a Python version of it
https://github.com/davelab6/pyfontaine
I was thinking about a tool that could give you an actual list of supported languages
DTL OT Master does that
-
Besides code to analyze the font, the other problem is the quantity and quality of the language data feeding into it.
We've been working on this problem at Extensis as part of WebINK's new dynamic subsetting infrastructure. I've been building the language data files myself, after we first incorporated every definition from Fontaine and Speakeasy, then WinANSI, MacRoman, Adobe's standard character set definitions, and then adding still more. The Latin and Cyrillic coverage is increasingly good.
We will be releasing our data file as open source.
The data structure is a slightly modified version of Fontaine's. It would be trivial to modify it for use in Fontaine. It could be converted for use in Speakeasy with a bit more work.
I've mentioned this in passing, at least privately to one or two folks, but as it is getting pretty large and increasingly extensive, and has had many corrections, it seems to be nearly time to release it.
-
@tphinney AWESOME! Should be easy to have pyFontaine operate on whatever data you publish.
-
Yes, I hope so. That was the idea.
BTW, we did three things with our data structure. One was to put everything in a single file. Having a zillion separate files was just getting too unwieldy.
We also invoke the notion of a "parent" character set. So we are mostly concerned with what characters are required beyond that base set. So Latin-based* languages generally take English as a "parent" as it has a basic character set without any accented letters, and most Latin-based languages drop at most a very few of those letters for their alphabet.
* Yes, I know that many languages using the Latin writing system are not in fact based on Latin. I'm using "Latin-based" as a shorthand for "languages written with the Latin writing system." No offense intended to any language.
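To make the "parent" idea concrete, here is a minimal Python sketch. The dictionary layout, the function name, and the Slovenian entry are assumptions made for illustration; they are not the actual Extensis data or format.

```python
import string

# Hypothetical data: each entry lists only the code points it needs
# beyond its parent. "English" is the root and has no parent.
LANGUAGES = {
    "English": {
        "parent": None,
        "codepoints": {ord(c) for c in string.ascii_letters},
    },
    "Slovenian": {
        "parent": "English",
        # U+010C, U+010D, U+0160, U+0161, U+017D, U+017E (Č č Š š Ž ž)
        "codepoints": {0x010C, 0x010D, 0x0160, 0x0161, 0x017D, 0x017E},
    },
}

def required_codepoints(name, languages=LANGUAGES):
    """Union of a language's own code points and those of all its ancestors."""
    required = set()
    seen = set()
    while name is not None:
        if name in seen:
            raise ValueError(f"circular parent chain at {name!r}")
        seen.add(name)
        entry = languages[name]
        required |= entry["codepoints"]
        name = entry["parent"]
    return required
```

Note that this simple inheritance also pulls in the few parent letters a language does not actually use, which matches the trade-off described above.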
The third and most interesting change was an idea I had when wrestling with the problem of using a single data set for a couple of distinct purposes: the potential for two levels of codepoint coverage for any language. The base level is the codepoints it must have for us to claim it has language support. Then there are additional codepoints which we consider "nice to have," such that if you are doing something like subsetting a font down for that language coverage, you would want to include them. Quite possibly you'd want to include them when making a new font as well.
As an example, you might not require the new hryvnia currency symbol to be present to say that a font supports Ukrainian, but you'd sure want to include it when subsetting a font down for just Ukrainian, or when building a new Ukrainian-supporting font today.
As you might guess from the above example, making reasonable data for each language includes thinking about things such as currency symbols, quote marks and other characters that were not always considered in the Fontaine and Speakeasy data sets.
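The two levels of coverage can be sketched in a few lines of Python. This uses the code points from the Ukrainian entry quoted in this post and ignores the "Cyrillic" parent for brevity; the function names are hypothetical and this is not the WebINK implementation.

```python
# Code points from the Ukrainian entry quoted in this post.
UKRAINIAN_REQUIRED = {0x02BC, 0x0404, 0x0406, 0x0407, 0x0454,
                      0x0456, 0x0457, 0x0490, 0x0491}
UKRAINIAN_NICE_TO_HAVE = {0x20B4}  # hryvnia sign

def supports(font_codepoints, required):
    """A font 'supports' the language if every required code point is present."""
    return required <= font_codepoints

def subset_codepoints(font_codepoints, required, nice_to_have):
    """Code points to keep when subsetting: required plus any extras the font has."""
    return (required | nice_to_have) & font_codepoints
```

A font without U+20B4 still passes `supports`, while a font that has it keeps it in the subset.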
BTW, here is what the Ukrainian entry currently looks like, for those who are curious about the data structure:
<language name="Ukrainian" abbreviated-name="UKR" parent="Cyrillic">
  <scanning-codepoints>
    0x02BC,0x0404,0x0406,0x0407,0x0454,
    0x0456,0x0457,0x0490,0x0491
  </scanning-codepoints>
  <subsetting-codepoints>
    0x20B4
  </subsetting-codepoints>
</language>
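For anyone who wants to experiment with an entry like this, here is a minimal Python sketch that reads it with the standard library. The dictionary keys and function names are my own assumptions, and only the single-entry shape shown above is handled, not the full data file.

```python
import xml.etree.ElementTree as ET

ENTRY = """
<language name="Ukrainian" abbreviated-name="UKR" parent="Cyrillic">
  <scanning-codepoints>
    0x02BC,0x0404,0x0406,0x0407,0x0454,
    0x0456,0x0457,0x0490,0x0491
  </scanning-codepoints>
  <subsetting-codepoints>
    0x20B4
  </subsetting-codepoints>
</language>
"""

def parse_codepoints(text):
    """Parse a comma-separated list of code points into a set of ints."""
    # int(token, 0) accepts "0x"-prefixed hexadecimal as well as plain
    # decimal (decimal values must not have leading zeros).
    return {int(tok.strip(), 0) for tok in (text or "").split(",") if tok.strip()}

def parse_language(xml_text):
    """Turn one <language> element into a plain dictionary."""
    elem = ET.fromstring(xml_text)
    return {
        "name": elem.get("name"),
        "abbreviated_name": elem.get("abbreviated-name"),
        "parent": elem.get("parent"),
        "scanning": parse_codepoints(elem.findtext("scanning-codepoints")),
        "subsetting": parse_codepoints(elem.findtext("subsetting-codepoints")),
    }
```

Calling `parse_language(ENTRY)` gives a dictionary whose `"scanning"` set holds the nine required code points and whose `"subsetting"` set holds U+20B4.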
-
eek, full XML. Would be nicer as JSON, no?
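For comparison, here is a sketch of what one entry might look like as JSON, written in Python. The key names and the choice to store code points as hex strings are assumptions (JSON has no hexadecimal number literal); this is not a format published by Extensis or used by Fontaine or Speakeasy.

```python
import json

CODEPOINT_KEYS = ("scanning-codepoints", "subsetting-codepoints")

# Hypothetical in-memory form of the Ukrainian entry from the thread.
ENTRY = {
    "name": "Ukrainian",
    "abbreviated-name": "UKR",
    "parent": "Cyrillic",
    "scanning-codepoints": [0x02BC, 0x0404, 0x0406, 0x0407, 0x0454,
                            0x0456, 0x0457, 0x0490, 0x0491],
    "subsetting-codepoints": [0x20B4],
}

def to_json(entry):
    """Serialize an entry, writing code points as '0xXXXX' strings."""
    out = dict(entry)
    for key in CODEPOINT_KEYS:
        out[key] = [f"0x{cp:04X}" for cp in entry[key]]
    return json.dumps(out, indent=2)

def from_json(text):
    """Load an entry and turn the hex strings back into integers."""
    data = json.loads(text)
    for key in CODEPOINT_KEYS:
        data[key] = [int(cp, 16) for cp in data[key]]
    return data
```

Keeping the values as hex strings preserves the readability argument for hexadecimal, at the cost of one conversion step when loading.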
-
Reviewing the Fontaine and Speakeasy data formats again (it's been a long time since we started on this project!) I am realizing that we are pretty much our own animal now. Our starting point was actually closer to the Speakeasy format....
Except they mostly expressed all their Unicode values in decimal, and we went with hexadecimal for compatibility with... well just about every other font-related tool on the planet. Our format actually allows decimal as well, we just mostly avoid it.
-
I wanted to say that we finally released the data file last week!
It's linked (and described) in this blog post: http://blog.webink.com/custom-font-subsetting-for-faster-websites/
-
With regard to a tool for mapping glyph-to-language, it might be interesting for anyone considering this to review the history of the WRIT project (Microsoft & Tiro, 1997–98, presented at ATypI Lyon, originally published in the ATypI journal).
As Thomas says, the big issue is the quantity and quality of data.
-
Not to mention that there are some intriguing questions to be asked about where to draw the line, which are likely dependent on the use cases one has for the data.
-
I'm very curious about proper templating for script support, which blends into proper templating for registered feature support per script, which leads to big fonts that need subsetting. And also, as someone who has lots of fonts floating about, I am interested in emerging merging techniques.
I don't think any of the lists of glyphs by script is worth much unless it is either coming from Apple or MS, or a specific document in use, though the other lists are always cool?
What Extensis is doing sounds pretty neat! On this one issue Thomas reports: "So Latin-based* languages generally take English as a "parent"..."
Are not most all subsets regardless of script "taking" what we used to call ASCII, as the base "language" beside that for which the subsetting is intended? And ASCII, as I know it, doesn't actually cover English, does it?
Also, I'm sort of getting the idea that subsetting depends on the size of use, the script, then matched to one of several levels of composition, if you follow. That way one can divide "what is needed" from "what would be nice" much more definitively?
-
Thomas, how do you approach subsetting of unencoded glyphs, e.g. smallcaps, ligatures, stylistic variants? Are you parsing GSUB tables to track glyphs that map back to character level subset inclusions, or are you relying on glyph name parsing?
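To illustrate the first of those two options, here is a toy Python sketch of a glyph closure: starting from the glyphs mapped from the subset's characters, keep adding any glyph that a substitution can produce from glyphs already kept. The rule list and glyph names are made up, the rules are a simplified stand-in for real GSUB lookups, and nothing here describes how Extensis actually does it.

```python
# Hypothetical substitution rules: (input glyphs, output glyphs).
RULES = [
    (("a.sc",), ("a.sc.alt",)),      # stylistic variant of a small cap
    (("a",), ("a.sc",)),             # small caps
    (("f", "i"), ("f_i",)),          # ligature
    (("zhe-cy",), ("zhe-cy.sc",)),   # not reachable from a Latin-only seed
]

def glyph_closure(seed_glyphs, substitutions):
    """Expand a glyph set with every glyph reachable through substitutions.

    A rule applies only when all of its input glyphs are already kept.
    The loop repeats until a full pass adds nothing new.
    """
    kept = set(seed_glyphs)
    changed = True
    while changed:
        changed = False
        for inputs, outputs in substitutions:
            if set(inputs) <= kept and not set(outputs) <= kept:
                kept |= set(outputs)
                changed = True
    return kept
```

Iterating to a fixed point matters because one substitution's output can be another's input, as with `a` to `a.sc` to `a.sc.alt` here.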
-
John, the "WRIT project" link is not working for me.