Need to merge duplicate glyphs within a font into single unified glyphs
I am working on a scalable bitmap-like (a.k.a. pixelated) outline TTF font that sources its glyphs from a HEX plain-text file (like the one Unifoundry’s Unifont project uses), which was originally derived from a machine-generated BDF font. Neither HEX nor BDF can map a single glyph (like “-”) to two or more codepoints that are expected to share it (like U+002D HYPHEN-MINUS, U+00AD SOFT HYPHEN, U+2010 HYPHEN, and U+2011 NON-BREAKING HYPHEN), because both formats are strictly one glyph per character. As a result, my SFD project always ends up with an unnecessary number of exact-duplicate glyphs.
I want to trim down the number of stored glyphs within my font project to an acceptable minimum of unique glyphs (allowing multiple encoding slots for certain glyphs like “-” from the example above) and thus reduce the final font size without decreasing Unicode coverage, but I don’t know if there is some automated, Perl-scriptable way for FontForge to detect all exact glyph duplicates within a font and merge/unify them all into single unique glyphs encoded to multiple characters. (I do not have the patience to manually check one-by-one all cases of glyph duplication in my font.)
Any help here would be greatly appreciated. Thankees!
Comments
-
For TrueType fonts, you can use composite glyphs. The easiest way is to try to determine if some glyphs have the same outlines, and then leave one of them and replace the rest with components from this glyph. The same goes for glyphs with accents.
It can be easily done with Python, but with Perl... Idk
-
Viktor Rubenko said: For TrueType fonts, you can use composite glyphs. The easiest way is to try to determine if some glyphs have the same outlines, and then leave one of them and replace the rest with components from this glyph. The same goes for glyphs with accents. It can be easily done with Python, but with Perl... Idk
Thankee! (hopefully)
-
A fairly straightforward solution in Python; you can convert it to Perl, or just run this in Python and store the result elsewhere to process further. I downloaded this unifont.hex from GitHub, but you can use the one you have if it's in the same format ("Unicode value:hex string"). The result is a list of Unicode codepoints that have equal hex strings.
with open('unifont.hex') as f:
    data = f.readlines()

# 1. make a dictionary of hexstring:unicode
hexdict = {}
for line in data:
    ucode, hstr = line.strip().split(':')
    if hstr in hexdict:
        hexdict[hstr].append(ucode)
    else:
        hexdict[hstr] = [ucode]

# 2. filter out single unicodes
hexdict = {key: hexdict[key] for key in hexdict if len(hexdict[key]) > 1}

# 3. list the combined unicodes
for key in hexdict:
    print(' '.join(hexdict[key]))
.. and the first lines of the result look like this:
0000 2400
0001 2401
0002 2402
0003 2403
... (etc.)
As expected, the entry starting with the hyphen contains a few more characters:
002D 00AD 2012 2013 2212
You see /hyphen, /dischyphen, /figuredash, /endash, and /minus here.
-
Thankees. I will give it a try and possibly inform Paul Hardy of Unifoundry himself about this, since this would be greatly helpful to his project.
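As a possible next step after the grouping script above, the duplicate groups can be turned into a merge plan: keep one glyph per bitmap and record which other codepoints should point at it. The sketch below assumes the same "codepoint:hex string" line format. The function name plan_merges, the choice of the lowest codepoint as the canonical one, and the shortened sample bitmaps are illustrative assumptions, not part of Unifont or FontForge.

```python
from collections import defaultdict


def plan_merges(hex_lines):
    """Return (unique, aliases) for lines shaped like 'CODEPOINT:HEXSTRING'.

    unique  maps each canonical codepoint to its bitmap string.
    aliases maps every duplicate codepoint to the canonical codepoint
            whose glyph it could share.
    """
    groups = defaultdict(list)
    for line in hex_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        code, bitmap = line.split(":", 1)
        groups[bitmap.upper()].append(int(code, 16))

    unique = {}
    aliases = {}
    for bitmap, codes in groups.items():
        canonical = min(codes)  # assumption: lowest codepoint keeps the glyph
        unique[canonical] = bitmap
        for code in codes:
            if code != canonical:
                aliases[code] = canonical
    return unique, aliases


# Hypothetical, shortened bitmaps; real Unifont hex strings are longer.
sample = [
    "002D:0000007E0000",
    "00AD:0000007E0000",
    "2212:0000007E0000",
    "0041:183C66667E66",
]
unique, aliases = plan_merges(sample)
for dup, canon in sorted(aliases.items()):
    print(f"U+{dup:04X} -> U+{canon:04X}")
```

The aliases mapping could then drive whatever tool applies the extra encodings to the surviving glyphs; that step depends on the font editor or library used and is outside this sketch.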
-
In Perl you also have these options:
1. Use the CPAN module Font::TTF.
It supports only the TTF file format. It can read, manipulate, and write the tables of a font. It takes a lot of time to get into its internals.
2. Use the command-line utility ttx that ships with fontTools to dump the font to XML. Then manipulate the ttx file with your favorite XML module and convert it back to TTF.
I have a similar problem when repairing amateurish historical fonts: duplicate glyphs, glyphs with a wrong code point, code points in the PUA.
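The ttx route in option 2 can be sketched in a few lines, shown here in Python to match the script earlier in the thread (the same idea carries over to a Perl XML module). The sketch only re-points cmap entries from duplicate glyph names to the glyph being kept. The XML fragment is a trimmed, hypothetical stand-in for a real ttx dump, and the glyph names and the helper repoint_cmap are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical fragment of a ttx dump; a real file has many more tables.
TTX_SAMPLE = """<ttFont>
  <cmap>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x2d" name="hyphen"/>
      <map code="0xad" name="uni00AD"/>
      <map code="0x2010" name="uni2010"/>
    </cmap_format_4>
  </cmap>
</ttFont>"""


def repoint_cmap(root, replacements):
    """Point cmap entries at the kept glyph; return how many entries changed.

    replacements maps a duplicate glyph name to the glyph name to keep.
    """
    changed = 0
    # Only touch <map> elements inside cmap subtables.
    for entry in root.findall("cmap/*/map"):
        new_name = replacements.get(entry.get("name"))
        if new_name is not None:
            entry.set("name", new_name)
            changed += 1
    return changed


root = ET.fromstring(TTX_SAMPLE)
changed = repoint_cmap(root, {"uni00AD": "hyphen", "uni2010": "hyphen"})
names = [m.get("name") for m in root.findall("cmap/*/map")]
print(changed, names)
```

This leaves the now-unreferenced duplicate glyphs in the other tables of the dump (glyph order, outlines, metrics); removing them is a separate step that this sketch does not attempt.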