Proof of concept: Geheimsprache

I'd like to put this up for discussion: http://yanone.de/typedesign/code/geheimsprache/
Is it of any use?

Comments

  • it's a fun idea! if you're interested in crypto and want to learn about how to break your own implementation, i suggest you look at the matasano crypto challenges.

    i suspect the value in geheimsprache is primarily in obfuscating email addresses and telephone numbers so they won't be crawled but can be viewed by regular folks. the challenge is that the 'attacker' has access to the rebuilt font (and the table), but even if they didn't and only had access to the ciphertext, you're still victim to the predictability of the language you're communicating in. sign up for the challenge, you'll be surprised how fun this is. seriously!

    i suspect you're right that most crawlers won't go to the effort, but your friends might groan when they copy and paste and try to send an email to ҸÏ×╥╡ĪңЬOõңOÏaO╇ĪĪΞÏңЬụ.

    i think if i were publishing a blog that contained sensitive topics or a diary i wanted my friends to be able to read, but that i wanted to be uncrawlable, i might go in for this, but i might also be irritated a few years down the road if the only copy is an Internet Archive post of gibberish when my server goes belly up. who knows. i could easily see publishers thinking this was a great idea for stymying instapaper-like products until the support emails came in.
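
The point above about "the predictability of the language" can be illustrated with a small sketch. It assumes the scrambled font behaves like a plain one-to-one letter substitution. The `make_key` and `encode` helpers are hypothetical stand-ins, not Geheimsprache's actual code. The sketch shows that letter frequencies survive such a substitution, which is what lets an attacker guess the mapping from the ciphertext alone.

```python
import random
import string
from collections import Counter


def make_key(seed):
    """Build a random one-to-one letter substitution.

    Hypothetical stand-in for a shuffled font mapping (an assumption,
    not the real Geheimsprache scheme).
    """
    rng = random.Random(seed)
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_lowercase, shuffled))


def encode(text, key):
    """Substitute every lowercase letter; leave everything else untouched."""
    return "".join(key.get(ch, ch) for ch in text.lower())


def letter_counts(text):
    """Count only the letters of a text."""
    return Counter(ch for ch in text if ch.isalpha())


def frequency_profile(text):
    """Letter counts sorted from most to least common, ignoring which letter is which."""
    return sorted(letter_counts(text).values(), reverse=True)


plaintext = "meet me at the typeface release event near the street"
key = make_key(seed=2014)
ciphertext = encode(plaintext, key)

# The most frequent symbol in the ciphertext stands for the most frequent
# plaintext letter, so the frequency ranking leaks part of the mapping.
plain_top = letter_counts(plaintext).most_common(1)[0][0]
cipher_top = letter_counts(ciphertext).most_common(1)[0][0]
```

With enough text, the same comparison works for the second and third most common letters, and for common letter pairs.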
  • yanone
    Or simply a way to create better-looking CAPTCHAs.
  • yanone
    What my brother just told me is making the idea look really stupid:

    If you print that page to a PDF, you can select and copy/paste the text, and it will copy as human-readable text, not the transmitted garbage.
    I tried this in Acrobat Reader and Mac's Preview.app.
    Do they have OCR built in?
  • That's awesome. There's probably a much simpler explanation than OCR. Just a guess, but I'd assume the text in the PDF is being encoded from something like the CID table or the glyph names rather than the Unicode values.
  • PabloImpallari
    edited March 2014
    Mmmmm... interesting...... let's see....
    - In the scrambled webfont, the Unicode values are shuffled (but the glyph names remain correct).
    - In the generated PDF, the subsetted/embedded font is correctly encoded again.
    How can that happen?
    I'm guessing that the software in charge of creating the PDF is using the glyph names to re-encode the auto-generated subsetted/embedded font that ends up inside the PDF.
    I doubt it has anything to do with OCR.

    If you try shuffling both the Unicode values AND the glyph names, the PDF-generating algorithm should not be able to re-encode the font... I guess....
  • Yes, scramble the glyph names as well.
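
The guess in the posts above (that the PDF software recovers the text from glyph names) can be sketched with plain dictionaries. This is a toy model under stated assumptions, not a description of what Acrobat or Preview actually do. `NAME_TO_CHAR` is a four-entry stand-in for a glyph-name lookup such as the Adobe Glyph List, and `recover_text`, `scrambled_cmap`, and the `glyph00N` names are hypothetical.

```python
# Tiny stand-in for a glyph-name-to-character lookup (assumption: the PDF
# software consults something like the Adobe Glyph List).
NAME_TO_CHAR = {"a": "a", "b": "b", "c": "c", "at": "@"}

# Scrambled webfont: the transmitted characters are shuffled, but each one
# still points at a glyph whose name describes the shape that is drawn.
scrambled_cmap = {"x": "a", "q": "b", "@": "c", "k": "at"}


def recover_text(transmitted, cmap, name_to_char):
    """Rebuild readable text by looking up each glyph's name.

    Glyphs whose names are not in the lookup come back as "?".
    """
    out = []
    for ch in transmitted:
        glyph_name = cmap[ch]
        out.append(name_to_char.get(glyph_name, "?"))
    return "".join(out)


transmitted = "xqk@"  # rendered by the font as the shapes a, b, @, c

# Glyph names intact: the readable text falls out of the name lookup.
readable = recover_text(transmitted, scrambled_cmap, NAME_TO_CHAR)

# Glyph names scrambled too: replace every name with a meaningless one.
renamed = {"a": "glyph001", "b": "glyph002", "c": "glyph003", "at": "glyph004"}
obscured_cmap = {ch: renamed[name] for ch, name in scrambled_cmap.items()}
hidden = recover_text(transmitted, obscured_cmap, NAME_TO_CHAR)
```

In this model, renaming the glyphs defeats the name lookup, which matches the suggestion to scramble the glyph names as well. It would not stop OCR, or anyone who rebuilds the mapping by looking at the glyph shapes.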