Eric Partington

Lua Parser for Punycode/IDN Homograph Attack

Blog Post created by Eric Partington on Apr 24, 2017

Sean Lim has done awesome work writing a Lua parser to detect potential IDN/homograph attacks and has asked me to post it for him.




In the past couple of days, you’ve probably read about the phishing attack that is “almost impossible to detect”.


Essentially, attackers are replacing ASCII characters in web domains with similar-looking Unicode characters for their phishing websites, e.g. www.аррӏе.com, which on the wire is in fact a Punycode-encoded ASCII hostname whose label begins with 'xn--'.
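To make the encoding concrete, here is a minimal Python sketch (not the attached Lua parser) that converts a single domain label to and from its Punycode form using Python's built-in `punycode` codec. The helper names `to_punycode_label` and `from_punycode_label` are made up for this illustration, and the look-alike label is assumed to be the Cyrillic code points U+0430 U+0440 U+0440 U+04CF U+0435.

```python
def to_punycode_label(label: str) -> str:
    """Return the ASCII form of one domain label ('xn--' prefix + Punycode)."""
    if label.isascii():
        return label  # plain ASCII labels are left untouched
    return "xn--" + label.encode("punycode").decode("ascii")


def from_punycode_label(label: str) -> str:
    """Decode an 'xn--' label back to Unicode; other labels pass through."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


# Cyrillic look-alike of "apple", written with escapes so the
# code points are explicit (assumed to match the example domain).
spoofed = "\u0430\u0440\u0440\u04cf\u0435"
print(to_punycode_label(spoofed))  # prints xn--80ak6aa92e
```

The functions work on one label at a time, so a full hostname would need to be split on '.' first.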


I’ve attached a parser which decodes Punycode-encoded domains and raises an alert when it spots a suspicious homograph (based on a predefined blacklist of Unicode characters). No guarantees at all on the efficiency or reliability, but it seems to work pretty well, and improving coverage is just a matter of growing the blacklist.


The parser writes into:

risk.suspicious='possible idn homograph attack'


(you can change this in the parser by editing a few lines if you want to write into one of the new analysis.x keys for consistency)




The parser looks at the proportion of decoded Unicode characters that are on the blacklist and fires when it exceeds the 0.75 threshold, i.e. if it recognizes more than 75% of the Unicode characters in the domain as blacklisted ones, the alert will trigger.
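The threshold check can be sketched in Python as follows. This is only an illustration of the idea, not the logic of the attached Lua parser: the blacklist below is a small hypothetical sample, and the names `HOMOGLYPH_BLACKLIST`, `blacklisted_fraction` and `is_possible_homograph` are invented for the example. It assumes the fraction is taken over the non-ASCII characters found in the decoded 'xn--' labels.

```python
# Hypothetical sample blacklist: Cyrillic letters that resemble Latin ones.
HOMOGLYPH_BLACKLIST = {
    "\u0430",  # looks like Latin a
    "\u0435",  # looks like Latin e
    "\u043e",  # looks like Latin o
    "\u0440",  # looks like Latin p
    "\u0441",  # looks like Latin c
    "\u0445",  # looks like Latin x
    "\u0443",  # looks like Latin y
    "\u0456",  # looks like Latin i
    "\u04cf",  # looks like Latin l
}
THRESHOLD = 0.75  # value taken from the post


def blacklisted_fraction(hostname: str) -> float:
    """Fraction of decoded non-ASCII characters that are on the blacklist."""
    non_ascii = []
    for label in hostname.lower().split("."):
        if not label.startswith("xn--"):
            continue
        try:
            decoded = label[4:].encode("ascii").decode("punycode")
        except UnicodeError:
            continue  # malformed Punycode: skip this label
        non_ascii.extend(ch for ch in decoded if ord(ch) > 127)
    if not non_ascii:
        return 0.0
    hits = sum(1 for ch in non_ascii if ch in HOMOGLYPH_BLACKLIST)
    return hits / len(non_ascii)


def is_possible_homograph(hostname: str) -> bool:
    """True when more than 75% of the Unicode characters are blacklisted."""
    return blacklisted_fraction(hostname) > THRESHOLD
```

With this sample blacklist, `is_possible_homograph("www.xn--80ak6aa92e.com")` is true because every decoded character is a listed look-alike, while a legitimate IDN such as `xn--bcher-kva.com` (bücher) stays below the threshold.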

Updates to the parser:

· Fixed a bug in the Punycode decoding which caused incorrect decoding of characters past the first Unicode codepoint

· Added a few more homoglyphs


The parser is the higher-fidelity method of detecting these potential phishing attacks. A slightly more brute-force method is to use an application rule with existing packet meta. Assuming that well-known TLDs such as com, org and net would be targeted for phishing, and that the hostnames of interest start with 'xn--', we can create an application rule like the following:


name="possible idn homograph hostname" rule=" contains'xn--' && tld='com','org','net'" alert=analysis.session order=198 type=application
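For checking a list of hostnames offline, the same condition can be approximated in a few lines of Python. This is a rough sketch: the function name `matches_app_rule` is invented here, and the TLD is assumed to be simply the last dot-separated label, which may differ from how the product derives its tld meta.

```python
TARGET_TLDS = {"com", "org", "net"}  # TLDs named in the rule


def matches_app_rule(hostname: str) -> bool:
    """Approximate the rule: hostname contains 'xn--' and has a targeted TLD."""
    host = hostname.lower().rstrip(".")  # normalize case, drop a trailing dot
    tld = host.rsplit(".", 1)[-1]        # assumption: TLD is the last label
    return "xn--" in host and tld in TARGET_TLDS
```

Like the rule itself, this flags every Punycode hostname under those TLDs, including legitimate internationalized domains, so hits still need to be validated.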

To help close the loop, there is also a context menu item that lets you right-click on hostnames and TLDs to see what the original domain might be, so you can validate the potential impact.