Bulk Ligature Adding

Stuart Sandler · December 2020

Hello all! I've recently come into an odd project need whereby I need to replace a large number of unique words with a ligature to prevent unsavory words from appearing when typed. My initial thought was to accomplish this using the liga feature, but as a regular Glyphs user I can't find a fast 'bulk' way to add ligature feature code, unless I'm just not familiar with where to look for this panel.

For fun I converted the font to TTX but I couldn't see where to add the ligature feature in an obvious way.

Rest assured I've already warned the client that the entries need to be whole, specific words rather than partial words; otherwise too many partial matches would ruin longer words that contain them.

Any suggestions or insight is welcomed and appreciated! Kindly advise and thank you in advance for your time and assistance.

Nick Shinn · December 2020

The <rlig> feature would be more censorious, as it can’t be turned off, while <liga> can.

Are you familiar with the <ignore> coding?
You would have to use that, putting spaces/punctuation on either side of “shit”, for instance, to enable “mishit”. But it could get complex…
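
For anyone who wants to try the whole-word idea, here is a rough Python sketch that writes out the `ignore sub` rules for a list of words. It is only a sketch under assumptions: the class name `@LETTER`, the replacement glyph `censored`, and the helper names are made up for illustration, every character is assumed to map to a glyph of the same name, and the emitted feature code has not been compiled against a real font.

```python
# Sketch: emit contextual rules that censor a word only when it stands alone.
# Assumptions (not from the thread): a glyph named "censored" exists, and
# @LETTER lists every letter that could sit directly next to a word.
import string


def whole_word_rules(word, target="censored"):
    """Return the FEA rules for one word, with each glyph marked by '."""
    marked = " ".join(f"{ch}'" for ch in word)
    return [
        f"ignore sub @LETTER {marked};",  # a letter before: part of a longer word
        f"ignore sub {marked} @LETTER;",  # a letter after: part of a longer word
        f"sub {marked} by {target};",     # otherwise replace the whole word
    ]


def build_lookup(words, name="censor_words"):
    """Wrap the rules for all words in a single named lookup."""
    letters = " ".join(string.ascii_letters)
    lines = [f"@LETTER = [{letters}];", f"lookup {name} {{"]
    for word in words:
        lines += ["    " + rule for rule in whole_word_rules(word)]
    lines.append(f"}} {name};")
    return "\n".join(lines)


print(build_lookup(["dirty"]))
```

With rules like these a word embedded in a longer one (the "mishit" case) should be left alone, because the `ignore` line matches first. Punctuation and digits are deliberately not in `@LETTER`, so a word followed by a comma would still be replaced.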

Reminds me of this.

Stuart Sandler · December 2020

Thanks @Nick Shinn! Actually, I just sorta figured out that if you uncheck "Generate feature automatically" in the ligature feature panel, you can just type the ligatures in there.

Georg Seifert · December 2020

First you need to add a glyph that should be shown instead of the offending word. Let's say you call it "censored".
Then you need to go into the Font Info window and select the Features tab. There you add a feature called "rlig" (as Nick suggested).

In the text view on the top right, you enter code like this:

sub d i r t y by censored;

Add one line for each word.

Stuart Sandler · December 2020

Thanks gang! Interesting . . . So what about cases or alternates?

sub [T t] [N n] [T t] by exclam;

or more specifically if somebody uses a S or an s or an $ or a 5 in place of an s?

Stuart Sandler · December 2020

Also, as before, I continue to get the overflow errors because, well, I'm adding 413 ligatures that all use [X x] type substitutes:

In feature 'rlig' ligature substitution rules cause an offset overflow (0x48bf12) to a lookup subtable

Craig Eliason · December 2020

This sure feels like a problem that should be solved at a level other than the font.

André G. Isaak · December 2020

Further to Nick's point, bear in mind that it looks really bad if (e.g.) "Constitution of the United States" ends up displaying as "Cons[CENSORED]ution of the United States".

Stuart Sandler · December 2020

This thread is the only thing I was able to find but I wasn't sure how to add a reference within the font file for liga or rlig to look it up - https://forum.glyphsapp.com/t/makeotf-gsub-offset-overflow/3144

FWIW @Craig Eliason I fully agree, it's simply a path we're exploring to see what is possible and FWIW, if this thread can help somebody in the future, it's worth discussing in public.

Aaron Bell · December 2020

Might this be the https://www.thepolitetype.com project?

I agree with @Craig Eliason. While this seems like a great idea, you'll be constantly chasing after all the possible variants. And what about the various letter-like symbols? Those will go to a fallback font, so any OpenType substitution won't work.

If I were to pursue this approach, though, I think what I would do is put together the full list of words that you want to process, and then a list of possible variants for each letter. And then write a python script that parses through each word and builds the substitution based on the letter variants.

So something like:

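badwords = ["a", "ab"]

# Glyph class of look-alike variants for each letter.
lookup = {
    "a": "[A a four]",
    "b": "[B b eight]",
}

for word in badwords:
    # Build one ligature rule per word, one class per letter.
    sub = "sub"
    for letter in word:
        sub = sub + " " + lookup[letter]
    sub = sub + " by censored;"
    print(sub)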

This'll output:

sub [A a four] by censored;
sub [A a four] [B b eight] by censored;

Stuart Sandler · December 2020

@Aaron Bell great insight, thank you! This is not associated with that project but a private client we're doing some research for.

The problem isn't creating the substitutions but rather attempting to compile the font and not get an overflow error.

To be sure, by no means is the attached file comprehensive or even intended to be anything more than a starting point for the effort for testing but I've already got the feature code which can be added as an rlig or liga feature

Again, it just seems to be too much to allow the font to compile, and I'm not sure how to build this into the font as a lookup referenced from liga or rlig and nested somewhere else in the font.

Aaron Bell · December 2020

@Stuart Sandler Maybe you just need to break it up into smaller lookups? Cascadia Code has 945 lines of code in the `calt` feature and we don't get overflow issues.
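
A rough Python sketch of what "smaller lookups" could look like when the rules are generated by script. The chunk size of 100, the lookup naming scheme, and the helper names are arbitrary choices for illustration, and whether this actually avoids the overflow will depend on the compiler.

```python
# Sketch: spread a long list of ligature rules over several named lookups.
# Assumption: the chunk size is arbitrary; tune it for your compiler.


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def feature_with_lookups(rules, feature="rlig", size=100):
    """Return FEA text with the rules split into lookups inside one feature."""
    lines = [f"feature {feature} {{"]
    for index, chunk in enumerate(chunked(rules, size)):
        name = f"{feature}_{index}"
        lines.append(f"    lookup {name} {{")
        lines += [f"        {rule}" for rule in chunk]
        lines.append(f"    }} {name};")
    lines.append(f"}} {feature};")
    return "\n".join(lines)


words = ["dirty", "dirt", "dim"]
rules = [f"sub {' '.join(word)} by censored;" for word in words]
print(feature_with_lookups(rules, size=2))
```

The lookups run one after another in the order they are written, so a word handled in an earlier lookup is already replaced by the time a later lookup sees the text.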

Simon Cozens · December 2020

fontTools has much better support for juggling lookups around to avoid overflows than makeotf does. You may have more luck converting your Glyphs file to OTF with fontmake, rather than getting Glyphs to do the export.

Ray Larabie · December 2020

Make sure you have an exception for chardonnay as I've seen that one trip up swear filters.

Nick Shinn · December 2020

We need a LOL button for you, Ray!

Johannes Neumeier · December 2020

If you opt for fontmake, you could use it to extract UFOs from your Glyphs sources, write a script (e.g. Python or bash) to add all the substitutions in bulk to the feature file (it is a plain-text file in the UFO), or better yet generate those substitutions from your blacklist of words, and then use fontmake to compile the OTFs.

Also make sure that longer substitution sequences come before shorter ones that partially match, otherwise the short one matching first will end the run. I'm not sure if each word should be its own lookup, so that all parts of combinations of blacklisted words get replaced, otherwise the first match will end the run, no?
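
As a sketch of that workflow, the following Python appends generated rules to a UFO's `features.fea`, longest words first. The UFO name `Example.ufo`, the glyph name `censored`, and the helper names are placeholders, and the demo writes into a temporary directory rather than a real font source.

```python
# Sketch: append generated ligature rules to the features.fea of a UFO.
# Assumptions: the UFO path is a stand-in, and a "censored" glyph exists.
import tempfile
from pathlib import Path


def rules_longest_first(words, target="censored"):
    """One ligature rule per word, longer words before shorter ones."""
    ordered = sorted(set(words), key=lambda w: (-len(w), w))
    return [f"sub {' '.join(w)} by {target};" for w in ordered]


def append_feature(ufo_dir, words, feature="rlig"):
    """Append a feature block to <ufo_dir>/features.fea and return its path."""
    body = "\n".join("    " + rule for rule in rules_longest_first(words))
    fea = Path(ufo_dir) / "features.fea"
    existing = fea.read_text(encoding="utf-8") if fea.exists() else ""
    block = f"feature {feature} {{\n{body}\n}} {feature};\n"
    fea.write_text(existing + "\n" + block, encoding="utf-8")
    return fea


# Demonstrate on a throwaway directory standing in for a real UFO.
with tempfile.TemporaryDirectory() as tmp:
    ufo = Path(tmp) / "Example.ufo"
    ufo.mkdir()
    print(append_feature(ufo, ["dirt", "dirty"]).read_text(encoding="utf-8"))
```

The UFO could then be compiled with something like `fontmake -u Example.ufo -o otf`.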

All in all, this is just so the wrong place to implement this, and a rather naive request on the part of the client.

Georg Seifert · December 2020

You don’t need to do any of the UFO stuff. If fontmake is able to compile the fea, you could add the code inside the .glyphs file (manually or by script) and use glyphsLib + fontmake to compile the font.

Stuart Sandler · December 2020

I was already there, gang! I actually got fontmake installed and attempted to compile the Glyphs file, but it kept crapping out. In part it may have been the commands I was using to generate an OTF file, since I'm not as familiar with Terminal as I'd like to be.

I'm using > fontmake file.glyphs --output otf

The error is signspace is required

Thomas Phinney · December 2020

Are you sure the error isn’t about “designspace” rather than “signspace”?

Simon Cozens · December 2020

fontmake -g file.glyphs

(Yes, this interface is terrible.)
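
In case it helps others hitting the same error: `-g` tells fontmake the input is a Glyphs source, and `-o` selects the output format. A small Python sketch that assembles the command; the helper name `fontmake_command` is made up, and `file.glyphs` is the placeholder from the posts above.

```python
# Sketch: build the fontmake command line for a Glyphs source.
# Assumption: fontmake is installed and on the PATH when the command is run.
import shlex


def fontmake_command(glyphs_file, output_format="otf"):
    """Return the argument list for compiling a .glyphs file."""
    return ["fontmake", "-g", glyphs_file, "-o", output_format]


cmd = fontmake_command("file.glyphs")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```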

Oliver Weiss (Walden Font Co.) · December 2020

While I respect the effort this requires, and know nothing about the client’s requirements, this seems like a sisyphean task. You’ll never be able to catch all the words you would want to, while also avoiding false positives. And unless this is to be used for English only, how would you address other languages? I believe the old adage goes “any given word in any language means something dirty in at least one other language”. Like others have said, the structure of a font is not designed to perform this type of task; it’s better handled a level up, where you can invoke dictionaries, etc.

Stuart Sandler · December 2020

And as expected, still unable to compile the font LOL!

Regardless, I'm happy to look the fool if it means we've taken this quest as far as we can and have determined it's indeed a fool's errand, which appears to be the case.

To be sure, at this point we have abandoned the effort due to an inability to compile as many variations as we deemed necessary into the ligature feature.

It's one thing to restrict lowercase words, another to restrict a combination of upper- and lowercase words, and entirely another to also consider the substitution of a $ for an S or a v for a u. With that many substitutions to consider, the bloat this creates makes the font simply unable to compile.
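
To put a rough number on that bloat: a ligature rule written with glyph classes gets expanded by the compiler into every concrete glyph sequence, so the rule count per word is the product of the class sizes. A small Python sketch of the arithmetic, with a made-up variant table (the glyph names and the helper `variant_count` are illustrative only):

```python
# Sketch: count how many concrete ligatures one word expands to.
# Assumption: the variant table below is invented for illustration.
from math import prod

VARIANTS = {
    "s": ["S", "s", "dollar", "five"],
    "u": ["U", "u", "V", "v"],
}


def variant_count(word, variants=VARIANTS):
    """Product of the number of accepted glyphs at each position."""
    # Letters without listed look-alikes still have upper and lower case.
    return prod(len(variants.get(ch, (ch.upper(), ch))) for ch in word)


for word in ["dirty", "sus"]:
    print(word, variant_count(word))
```

Five letters with only upper and lower case already give 2^5 = 32 sequences; three letters with four variants each give 4^3 = 64. Multiply that across hundreds of words and the size of the lookup grows very quickly.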

I imagine in a future version of OpenType, there may be some method that allows the embedding of a giant list of look-ups that can be packed into the font file itself without causing any compiling or speed of rendering issues for the end user. Today in 2020, not so much.

Thanks to all for their expertise and assistance!

Oliver Weiss (Walden Font Co.) · December 2020

Personally, I don't believe a font should have the level of awareness you're hoping for. Imagine a world where you have to choose typefaces not only by their design merits, but also by what text they will or won't let you set? Imagine a Comic Sans that won't let you set "big words"? Or an Apple or Microsoft font that adds trademark symbols regardless of context, and won't abide their competitor's name? Never mind the possibility to politicize type, or otherwise make it carry an agenda. To me, this is a very bad idea.

Simon Cozens · December 2020

Please note that what I’m about to suggest is purely out of perverse technical interest. As others have mentioned, even if the concept could work, which I doubt, it probably shouldn’t work.

So it sounds like you think the problem is tractable with lowercase letters but gets into overflows with uppercase and substitutions. Let’s see if there’s a clever way to get around that:

Create some empty glyphs called wascap, wasdollar, wasv etc.

Set their category to nonspacing mark.

Before your censorship lookup, use a one-to-many rule to substitute A with a wascap, B with b wascap, dollar with s wasdollar and so on.

Now comes the clever bit: in your censorship lookup, add “lookupflag IgnoreMarks;”. The was... glyphs will now be skipped over and you have reduced the character set of your search space to characters you care about, while still maintaining the information required to reconstruct the glyph string again afterwards.

Finally, add a lookup which reverses the first one, turning a wascap into A, s wasdollar into dollar and so on.
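
The steps above could be sketched in Python roughly as follows. This only generates feature code as text and has not been compiled or tested in a shaping engine. The lookup names, the helper names, and the replacement glyph `censored` are made up, and the sketch assumes the `wascap` and `wasdollar` glyphs exist, are empty, and are categorised as marks in the font source.

```python
# Sketch: generate the decompose / censor / recompose lookups as FEA text.
# Assumptions: glyphs wascap, wasdollar and censored exist in the font, and
# the was* glyphs are classified as marks so that IgnoreMarks skips them.
import string


def decompose_pairs():
    """Map each 'disguised' glyph to (plain lowercase base, marker glyph)."""
    pairs = {u: (u.lower(), "wascap") for u in string.ascii_uppercase}
    pairs["dollar"] = ("s", "wasdollar")
    return pairs


def build_fea(words):
    pairs = decompose_pairs()
    out = ["lookup decompose {"]
    out += [f"    sub {src} by {base} {mark};" for src, (base, mark) in pairs.items()]
    out += ["} decompose;", "", "lookup censor {", "    lookupflag IgnoreMarks;"]
    # Longer words first, so a shorter word cannot pre-empt a longer match.
    out += [f"    sub {' '.join(w)} by censored;"
            for w in sorted(words, key=len, reverse=True)]
    out += ["} censor;", "", "lookup recompose {"]
    out += [f"    sub {base} {mark} by {src};" for src, (base, mark) in pairs.items()]
    out += ["} recompose;", "", "feature rlig {",
            "    lookup decompose;", "    lookup censor;", "    lookup recompose;",
            "} rlig;"]
    return "\n".join(out)


print(build_fea(["dirty", "dirt"]))
```

The censor lookup now only needs one lowercase rule per word, however many disguised spellings the first lookup folds into it.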

I am sure this is probably a very useful technique for some class of problem, even if this particular font doesn’t get off the ground.

Wolf Böse · January 2021

I am imagining the font replacing potty-mouth words with other, more charming words: sunny, glorious, sweet, and so on… However, I also agree with @Craig Eliason.

Simon Cozens · January 2021

A font which replaces swearing with innocuous words is still a misguided censorship approach.

On the other hand, a font which replaces innocuous words with swearing would be a design concept.

PabloImpallari · January 2021

413 are not many ligatures. Hindi fonts have many more and they work just fine.
So entering the list as normal standard ligatures should work. For example:
sub f l by f_l;
sub f i by f_i;

My humble guess is that the overflow error comes from somewhere else.
Maybe it is related to the letter order in each word of your list of ligatures.
I mean, maybe there is a long word that already includes one of the shorter words.
For example, you may have a word like "rat" and another word like "ratatouille". You get the idea? And maybe the ligature interpreter gets confused and explodes with the overflow error.
Make sure you detect those cases; if you have any, they should be in a particular order. I don't remember now if it's the shorter or the longer word that must come first in the list.
You will need a script to check for that.
Hope it helps.
I'm curious to see what other people think. Can this be the source of the overflow error, or am I just on too much chardonnay?
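
A minimal Python sketch of such a check, listing every word that is contained in a longer word on the same list. The word list and the helper name `contained_pairs` are made up for illustration.

```python
# Sketch: find words on the list that occur inside longer words on the list.


def contained_pairs(words):
    """Return (short, long) pairs where `short` is a substring of `long`."""
    unique = sorted(set(words), key=lambda w: (len(w), w))
    return [
        (short, longer)
        for i, short in enumerate(unique)
        for longer in unique[i + 1:]
        if short in longer
    ]


words = ["rat", "ratatouille", "dirt", "dirty", "cat"]
for short, longer in contained_pairs(words):
    print(f"{short!r} is contained in {longer!r}")
```

The pairs it reports are the ones whose relative order in the feature code is worth checking.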

Sami Artur Mandelbaum · January 2021

Hi Stuart, there is a font similar to what you want: https://www.thepolitetype.com/#try-it
This font has 2000 lookups and can change 1800 bad words.
I ran some tests, and most probably this font was compiled with Microsoft VOLT.
Usually my complex fonts compile in 2 or 3 seconds.
This font took 5 minutes to compile.
Take a look.
Sami

Stuart Sandler · January 2021

Thanks @PabloImpallari and @Sami Artur Mandelbaum, but the primary issue wasn't the ability to replace a few bad words when they are spelled the typical way, in all lowercase, using the liga feature. It was getting those 1800 bad words to substitute when they are set in MiXeD case, or when the letter 'u' is replaced with the letter 'v', or the letter 's' is replaced with a '5' or a '$'. That leads to cases where the feature simply cannot compile, because it's trying to take all the variant forms into account.

Your solutions are welcomed even though the project ended up simply abandoning this approach.

Ray Larabie · January 2021

Could you make a class for each letter? Like @s contains S, s, 5, $; @a contains A, a, 4, Delta, Lambda, Alpha, etc. Then sub @a @s @s for b u t t ...or whatever. Digits might be a problem because someone's phone number containing 455 might end up with butt in the middle.
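
A rough Python sketch of the class idea, generating the class definitions and one rule per word. The class contents, the replacement glyph `censored`, and the helper names are invented for illustration. Note that the compiler still expands the classes into every concrete sequence, so this tidies the source but may not shrink the compiled lookup.

```python
# Sketch: write one glyph class per letter and build rules from the classes.
# Assumption: the class contents and the "censored" glyph are placeholders.

CLASSES = {
    "s": ["S", "s", "five", "dollar"],
    "a": ["A", "a", "four"],
}


def class_definitions(classes):
    """Return FEA class definitions such as '@a = [A a four];'."""
    return [f"@{letter} = [{' '.join(glyphs)}];"
            for letter, glyphs in sorted(classes.items())]


def class_rule(word, classes, target="censored"):
    """Use the class for a letter when one exists, else the plain glyph."""
    parts = [f"@{ch}" if ch in classes else ch for ch in word]
    return f"sub {' '.join(parts)} by {target};"


print("\n".join(class_definitions(CLASSES)))
print(class_rule("sass", CLASSES))
```

Leaving digits out of the classes would sidestep the phone-number problem, at the cost of missing the spellings that use them.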
