Need to merge duplicate glyphs within a font into single unified glyphs

LVargas
LVargas Posts: 5
edited May 2020 in Technique and Theory

I am working on a scalable bitmap-like (a.k.a. pixelated) outline TTF font that sources its glyphs from a plain-text HEX file (like the one Unifoundry’s Unifont project uses), which was itself generated from a machine-generated BDF font. Because HEX and BDF are both strictly one-glyph-per-character formats, neither can map a single glyph (like “-”) to two or more codepoints that are meant to share it (like U+002D HYPHEN-MINUS, U+00AD SOFT HYPHEN, U+2010 HYPHEN, and U+2011 NON-BREAKING HYPHEN), so my SFD project always ends up with an unnecessary number of exact-duplicate glyphs.

I want to trim down the number of stored glyphs within my font project to an acceptable minimum of unique glyphs (allowing multiple encoding slots for certain glyphs like “-” from the example above) and thus reduce the final font size without decreasing Unicode coverage, but I don’t know if there is some automated, Perl-scriptable way for FontForge to detect all exact glyph duplicates within a font and merge/unify them all into single unique glyphs encoded to multiple characters. (I do not have the patience to manually check one-by-one all cases of glyph duplication in my font.)

Any help here would be greatly appreciated. Thankees!

Comments

  • Viktor Rubenko
    Viktor Rubenko Posts: 119
    edited May 2020
    For TrueType fonts, you can use composite glyphs. The easiest way is to try to determine if some glyphs have the same outlines, and then leave one of them and replace the rest with components from this glyph. The same goes for glyphs with accents.
    It can be easily done with Python, but with Perl... Idk
  • LVargas
    LVargas Posts: 5
    edited May 2020
    “For TrueType fonts, you can use composite glyphs. The easiest way is to try to determine if some glyphs have the same outlines, and then leave one of them and replace the rest with components from this glyph. The same goes for glyphs with accents.”
    Uh, I am not seeking to do composite glyphs or simplify accented glyphs. What I am seeking is detecting exact-duplicate glyphs like the four hyphen examples I gave above (which share the same glyph without adding new components to it): replace four hyphen glyphs with just one glyph (and map the same glyph to four distinct codepoints), and repeat the same for other groups of characters sharing glyphs (with no additional components).
    “It can be easily done with Python, but with Perl... Idk”
    Well, the only Python I know is what I learned when dealing with VapourSynth (a video frame-editing-&-serving framework often used with VirtualDub2), but if you show me some example Python code for your glyph-compositing case, I could see whether something there applies to my case, and if I see a way to rewrite it in Perl, even better. (It would not be the first time I translate Python code to Perl: I once did that when rewriting code for mapping a non-Unicode BDF font to Unicode before applying the Unifont scripts to convert it to HEX and then to outlined TTF.)

    Thankee! (hopefully)
  • Theunis de Jong
    Theunis de Jong Posts: 112
    edited May 2020
    A fairly straightforward solution in Python; you can convert it to Perl, or just run this in Python and store the result elsewhere to process further.
    I downloaded this unifont.hex from GitHub but you can use the one you have, if it's in the same format ("Unicode value:hex string"). The result is a list of Unicode codepoints which have equal hex strings.
    with open('unifont.hex') as f:
        data = f.readlines()

    # 1. make a dictionary of hexstring:unicode
    hexdict = {}
    for line in data:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing on the split
        ucode, hstr = line.split(':')
        hexdict.setdefault(hstr, []).append(ucode)
    # 2. filter out single unicodes
    hexdict = {key: codes for key, codes in hexdict.items() if len(codes) > 1}
    # 3. list the combined unicodes
    for codes in hexdict.values():
        print(' '.join(codes))
    
    ... and the first lines of the result look like this:
    0000 2400
    0001 2401
    0002 2402
    0003 2403
    ... (etc.)
    
    As expected, the entry starting with the hyphen contains a few more characters:
    002D 00AD 2012 2013 2212
    
    Here you see /hyphen, /dischyphen, /figuredash, /endash, and /minus (U+2012 is the figure dash and U+2013 is the en dash).
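
    To get from that list to what the original question asks for (one stored glyph, several codepoints), each group can be turned into a "duplicate codepoint -> codepoint to keep" map. This is only a minimal sketch, assuming the same "Unicode value:hex string" format; the function name canonical_map, the choice of the lowest codepoint as the glyph to keep, and the short sample hex strings are all illustrative assumptions, not part of the Unifont tools.

```python
def canonical_map(lines):
    """Map each duplicate codepoint to the lowest codepoint with the same bitmap."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        ucode, hstr = line.split(':')
        # Normalize hex case so 'ff' and 'FF' compare as the same bitmap.
        entries.append((int(ucode, 16), hstr.upper()))

    first_seen = {}  # hex string -> codepoint whose glyph is kept
    remap = {}       # duplicate codepoint -> codepoint whose glyph is kept
    for cp, hstr in sorted(entries):
        if hstr in first_seen:
            remap[cp] = first_seen[hstr]
        else:
            first_seen[hstr] = cp
    return remap


# Made-up, shortened hex strings purely for illustration.
sample = ["002D:00FF00", "0041:182442", "00AD:00ff00", "2212:00FF00"]
for dup, keep in sorted(canonical_map(sample).items()):
    print(f"{dup:04X} -> {keep:04X}")
```

    The remaining step, attaching the extra codepoints to the kept glyph and deleting the duplicates, depends on the font tool. FontForge's Python scripting appears to expose alternate encodings through a glyph's altuni attribute, but that is worth checking against its scripting documentation before relying on it.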

  • LVargas
    LVargas Posts: 5
    Thankees. I will give it a try and possibly inform Paul Hardy of Unifoundry himself about this, since this would be greatly helpful to his project.
  • In Perl you also have these options:

    1. use the CPAN module Font::TTF

    It supports only the TTF file format. It can read, manipulate, and write the tables of a font, but it takes a lot of time to learn its internals.

    2. use the command-line utility ttx, which comes with fontTools. Convert the font to a TTX (XML) file, manipulate that with your favorite XML module, and convert it back to TTF.

    I have a similar problem when repairing amateurish historical fonts: duplicate glyphs, glyphs with a wrong code point, and code points in the PUA.
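
    The XML route in option 2 could look roughly like the sketch below, written in Python rather than Perl only to match the code earlier in this thread. It assumes the usual TTX layout in which each cmap subtable holds map elements with code and name attributes; the function name remap_cmap and the glyph names in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def remap_cmap(ttx_xml, keep):
    """Point duplicate glyph names in every cmap subtable at the glyph to keep.

    keep: dict mapping a duplicate glyph name to the name of the glyph to keep.
    Returns the modified XML text and the number of map entries changed.
    """
    root = ET.fromstring(ttx_xml)
    changed = 0
    for cmap in root.iter("cmap"):
        for entry in cmap.iter("map"):
            name = entry.get("name")
            if name in keep:
                entry.set("name", keep[name])
                changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Tiny hand-written fragment standing in for a real TTX dump.
sample = """<ttFont>
  <cmap>
    <tableVersion version="0"/>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x2d" name="hyphen"/>
      <map code="0xad" name="uni00AD"/>
      <map code="0x2010" name="uni2010"/>
    </cmap_format_4>
  </cmap>
</ttFont>"""

new_xml, changed = remap_cmap(sample, {"uni00AD": "hyphen", "uni2010": "hyphen"})
print(changed, "cmap entries now point at the kept glyph")
```

    This only rewrites the character map. The duplicate glyphs themselves are still present in the other tables (glyph order, outlines, metrics), and removing them is a separate step that this sketch does not attempt.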