Hashtag Steganography


Steganography (/ˌstɛɡəˈnɒɡrəfi/) is the practice of concealing a file, message, image, or video within another file, message, image, or video.

I recently saw someone tweeting the hashtag #ManchesُterDerby

Do you see an odd character in the middle? It's an Arabic Damma (U+064F) - a vowel character. Although it comes after the "s" in Manchester, it appears after the "t" because it is a Right-To-Left (RTL) character.
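If you can't see the stray character, you can ask Python to point it out. This is only a minimal sketch using the standard `unicodedata` module; the `describe` helper is a name made up for this example, and it simply reports every non-ASCII character it finds.

```python
import unicodedata

def describe(text):
    """List (code point, Unicode name, general category) for each non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
        if ord(ch) > 127
    ]

# The hashtag from the tweet, with the damma written as an escape so it is visible here.
hashtag = "#Manches\u064fterDerby"
print(describe(hashtag))  # [('U+064F', 'ARABIC DAMMA', 'Mn')]
```

The `Mn` category means "nonspacing mark", which is how Unicode classifies this vowel sign.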

Yet, if you click on the hashtag with the extra character, you get through to the same page as if you had visited the regular #ManchesterDerby page.

Twitter website showing a hashtag.

Are there standards for hashtags?

In 2010, when Twitter was still in its infancy, I wrote a blog post on emerging Hashtag Standards.

Saying "these two strings of characters are equivalent" is a surprisingly hard problem in computer science. Is Café the same as cafe? In Twitter-land, they are identical. Case is ignored, accented characters are decomposed, and normalisation is applied to produce equivalence.

In over-simplified terms, all accents, diacritics, and modifiers are ignored. So #Ŕöméø is equivalent to #Romeo.

And, in this case, "◌ُ" is ignored.
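To make that over-simplification concrete, here is a rough sketch of what such folding could look like. It is an assumption for illustration, not Twitter's actual algorithm: the hypothetical `fold` function casefolds the text, applies NFKD decomposition, and throws away every character in a Unicode "mark" category.

```python
import unicodedata

def fold(tag):
    """Casefold, decompose with NFKD, then drop all combining marks.

    An illustrative guess at hashtag folding, not Twitter's real implementation.
    """
    decomposed = unicodedata.normalize("NFKD", tag.casefold())
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )

print(fold("Caf\u00e9") == fold("cafe"))                          # True
print(fold("Manches\u064fterDerby") == fold("ManchesterDerby"))   # True
```

One caveat: this sketch would not turn ø into o, because U+00F8 has no Unicode decomposition. Whatever Twitter does to make #Ŕöméø match #Romeo evidently goes further than this.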

Steganography

The art of hiding secret messages in otherwise innocuous texts has a long history. Let's bring it into the modern day. Suppose we want to track whether someone has control of a Twitter account - we could ask them to tweet a seemingly innocent hashtag with invisible characters in it.

For example, the diacritic "dot bẹlow" is likely to go unnoticed on a dirty screen. Did you spot it in the previous sentence?

Or, for absolute invisibility, use ͏ . Did you see that character? Nope! It is the combining grapheme joiner (U+034F). Completely invisible to the user, but it appears in the URL. For example #T͏esting links to twitter.com/hashtag/t%CD%8Festing
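You can reproduce that URL yourself. The sketch below percent-encodes a hashtag's UTF-8 bytes with the standard library's `urllib.parse.quote`. The `hashtag_url` helper is hypothetical, and the lowercasing is an assumption made only to match the link shown above.

```python
from urllib.parse import quote

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER: renders as nothing at all

def hashtag_url(tag):
    """Build a hashtag link, percent-encoding any non-ASCII characters as UTF-8."""
    # Lowercasing is an assumption, chosen to match the example link in the post.
    return "twitter.com/hashtag/" + quote(tag.lower())

print(hashtag_url("T" + CGJ + "esting"))  # twitter.com/hashtag/t%CD%8Festing
print(hashtag_url("Testing"))             # twitter.com/hashtag/testing
```

U+034F encodes to the two bytes CD 8F in UTF-8, which is why the invisible character shows up as `%CD%8F` in the address bar.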

You can read Tom Ross's blog post about checking for information leakage using hidden characters to understand more of the theory.

Use in Twitter Marketing campaigns

If you visit the page of a hashtag with an ignored character, something interesting happens. Hitting the "Tweet" button pre-fills your message with the hashtag. Not the normalised tag, but the one with hidden characters.

Try it now! Visit #Ŕöméø and you'll see all sorts of different #Romeo Tweets, but hit the Tweet button and see what happens.

A marketing campaign could give out identical looking hashtags to influencers - for example:

  • Alice #Campaig%CD%8Fn
  • Bob #Camp%CD%8Faign
  • Eve #C%CD%8Fa%CD%8Fm%CD%8Fp%CD%8Fa%CD%8Fi%CD%8Fgn

By seeing which of those subtly-different-but-semantically-identical hashtags is used the most, it might be possible to see which influencer has the biggest reach.
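Here is one way the bookkeeping for such a campaign might look. It is a sketch under assumptions: the helper names (`mark`, `fingerprint`, `tally`) are invented for this example, and it takes for granted that you can collect the raw hashtag text from tweets with the hidden characters intact. Each influencer's tag is identified by the positions of its invisible joiners.

```python
from collections import Counter

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, invisible when rendered

def mark(tag, positions):
    """Insert a CGJ after each character whose index is in `positions`."""
    return "".join(ch + CGJ if i in positions else ch for i, ch in enumerate(tag))

def fingerprint(tag):
    """Return the indices of the visible characters that are followed by a CGJ."""
    found = []
    visible = -1  # index of the most recent visible character
    for ch in tag:
        if ch == CGJ:
            found.append(visible)
        else:
            visible += 1
    return tuple(found)

# The three variants from the list above.
variants = {
    "Alice": mark("Campaign", {6}),            # Campaig + CGJ + n
    "Bob": mark("Campaign", {3}),              # Camp + CGJ + aign
    "Eve": mark("Campaign", set(range(6))),    # CGJ after each of C, a, m, p, a, i
}
by_fingerprint = {fingerprint(tag): name for name, tag in variants.items()}

def tally(observed_tags):
    """Count how often each influencer's variant appears in the observed hashtags."""
    return Counter(by_fingerprint.get(fingerprint(t), "unknown") for t in observed_tags)

seen = [variants["Alice"], variants["Alice"], variants["Bob"], "Campaign"]
print(tally(seen))
```

A plain, unmarked #Campaign lands in the "unknown" bucket, so organic use of the tag doesn't get credited to anyone.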

Where next?

Hidden characters can be used for steganography in text and hashtags. You can use them to track who has copy-n-pasted specific versions of a text document, or who has clicked on a specific link to tweet out some information.

But is there anything more socially useful you can think to do with them?

