
Identifying the country of origin for a malware PE executable

Blog Post created by RSA Admin Employee on Aug 22, 2012

REPOST - ORIGINALLY POSTED NOVEMBER 25, 2010

 

Have you ever wondered how people writing reports about malware can say where the malware was likely developed?

 

Sometimes you get totally lucky and log files created by the malware will help answer the question. Given the following line from a log:

11/16/2009 6:41:48 PM –>  Hook instalate lsass.exe

 

We can use Google Translate’s “language detect” feature to help us determine the language used:

[Screenshot: Google Translate language detection result]

Of course, it’s not often we get THAT lucky!

 

A more interesting method is to examine a structure within the executable file itself, known as the Resource Directory. For the purpose of this post, I will not be describing the Resource Directory structure. It’s a complicated beast, so I will save it for later posts that actually warrant or require a low-level understanding of it. Suffice it to say, the Resource Directory is where embedded resources like bitmaps (used in GUI graphics), file icons, etc. are stored. The structure is frequently compared to the layout of files on a file system, although I think it’s insulting to file systems to say such a thing. For those more graphically inclined, I took the following image from http://www.devsource.com/images/stories/PEFigure2.jpg.

[Image: diagram from the URL above]

For the sake of example, here are some images showing just a few of the resources embedded inside notepad.exe (using CFF Explorer from http://www.ntcore.com/exsuite.php):

[Screenshots: resources embedded in notepad.exe, viewed in CFF Explorer]

Now it’s important to note that an executable may have only a few or even zero resources, especially in the case of malware. Consider the following example showing a recent piece of malware with only a single resource called “BINARY.”

[Screenshot: malware sample with a single resource named BINARY]

Moving on, let’s look at another piece of malware. Below, we see that it has five resource directories.

[Screenshot: malware sample with five resource directories]

We could pick any of the five for this analysis, but I’ll pick RCData – mostly because it’s typically an interesting directory to examine when reverse engineering malware. (This is because RCData defines a raw data resource for an application. Raw data resources permit the inclusion of any binary data directly in the executable file.) Under RCData, we see three separate entries:

[Screenshot: the three entries under RCData]

The first one to catch my eye is the one called IE_PLUGIN. I’ll show a screenshot of it below, but I am saving the subject of executables embedded within executables for a MUCH more technical post in the near future (when it’s not 1:30 am and I actually feel like writing more!).

[Screenshot: contents of the IE_PLUGIN resource]

Going back to the entry structure itself, the IE_PLUGIN entry will have at least one Directory Entry underneath it to describe the size(s) and offset(s) to the data contained within that resource. I have expanded it as shown next:

[Screenshot: IE_PLUGIN entry expanded to show its Directory Entry]
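For readers who want a rough feel for what a tool like CFF Explorer is doing here, the following is a minimal Python sketch of walking the three levels of the resource tree (type, name, language). It assumes you have already extracted the raw bytes of the resource section, which is a step this sketch does not cover. The function name `walk_resources` is made up for illustration, and named entries are reported as `None` rather than being resolved to their strings.

```python
import struct


def walk_resources(rsrc: bytes):
    """Yield (type_id, name_id, lang_id) for each leaf in a raw resource section.

    Assumption: `rsrc` starts at the root resource directory. Entries that
    are identified by a string name instead of a number are reported as None.
    """

    def entries(dir_offset):
        # Directory header is 16 bytes; the last two 16-bit fields are the
        # counts of named entries and ID entries.
        named, ids = struct.unpack_from("<HH", rsrc, dir_offset + 12)
        for i in range(named + ids):
            name, target = struct.unpack_from("<II", rsrc, dir_offset + 16 + 8 * i)
            ident = None if name & 0x80000000 else name  # high bit: named entry
            is_dir = bool(target & 0x80000000)           # high bit: subdirectory
            yield ident, is_dir, target & 0x7FFFFFFF

    for type_id, is_dir, type_off in entries(0):
        if not is_dir:
            continue
        for name_id, is_dir, name_off in entries(type_off):
            if not is_dir:
                continue
            for lang_id, is_dir, _ in entries(name_off):
                if not is_dir:  # leaf: points at the data entry (offset, size)
                    yield type_id, name_id, lang_id
```

For a resource like IE_PLUGIN under RCData, this would yield a tuple with type 10 (the numeric ID for RCData), `None` for the name, and the language ID as the third element.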

And that’s where things get interesting, at least as it relates to answering the question at the start of this post. Notice the ID: 1055. That’s our money shot for helping to determine what country this binary was compiled in. Or, more specifically, the default locale (language ID) of the computer used to compile this binary. Those IDs have very legitimate uses. For example, you can have the same dialog in English, French, and German localized forms, and the system will choose the dialog to load based on the thread’s locale. However, when resources are added to the binary without explicitly setting them to different locale IDs, those resources will be assigned the default locale ID of the compiler’s computer.
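As a small aside, a Windows language ID packs two pieces of information into 16 bits: the low 10 bits are the primary language and the upper 6 bits are the sublanguage (regional variant). The sketch below splits an ID into those two parts. The function name `split_langid` is just an illustrative choice.

```python
def split_langid(langid: int):
    """Split a 16-bit Windows language ID into (primary, sublanguage)."""
    primary = langid & 0x3FF            # low 10 bits: primary language
    sublanguage = (langid >> 10) & 0x3F  # upper 6 bits: regional variant
    return primary, sublanguage


print(split_langid(1055))  # (31, 1)  Turkish
print(split_langid(1033))  # (9, 1)   English, United States
print(split_langid(2057))  # (9, 2)   English, Great Britain
```

This is why regional variants of the same language sit 1024 apart in decimal: 1033 and 2057 share primary language 9 and differ only in the sublanguage.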

 

So in the example above, what does 1055 mean?

 

It means this piece of malware was likely developed (or at least compiled) in Turkey.

 

How do we know that one resource wasn’t added with a custom ID? Because we see the same ID when looking at almost all the other resources in the file (anything with an ID of zero just means “use the default locale”):

[Screenshot: other resources in the file showing the same ID]
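The same sanity check can be sketched in a few lines of Python: tally the language IDs seen across all resources, ignore zero, and see which one dominates. The function name and the sample list are made up for illustration and are not taken from the sample analyzed here.

```python
from collections import Counter


def dominant_language(lang_ids):
    """Return (lang_id, count) for the most common non-zero ID, or None."""
    counts = Counter(i for i in lang_ids if i != 0)  # 0 means "use the default"
    if not counts:
        return None
    return counts.most_common(1)[0]


# Hypothetical IDs collected from one file's resources.
print(dominant_language([1055, 1055, 0, 1055, 1033]))  # (1055, 3)
```

A single stray ID among many matching ones is more likely to be a deliberately localized resource than evidence about the build machine.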

In this case, we are also lucky enough to have other strings in the binary (once unpacked) to help solidify the assertion that this binary is from Turkey. One such string is “Aktif Pencere,” which Google Translate’s language detection shows as:

[Screenshot: Google Translate language detection result for “Aktif Pencere”]

However, as you can see, this technique is very useful even when no strings are present – in logs or the binary itself.

 

So is this how default locale identification normally works for binaries (e.g., non-malware executable files)?

 

Not exactly. The above techniques are generally used with malware (if the malware even has exposed resources), but not generally with normal/legitimate binaries. Consider the following legitimate binary. What is the source locale for the following example?

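[Screenshot: resources of a legitimate binary, with green, orange, red, and blue boxes marking groups of resources]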

As you see in the green box, we have some cursor resources with the ID for the United States. (I’m including a lookup table at the bottom of this post.) In the orange box, there are additional cursor resources with the ID for Germany. In the red box is RCData, like we examined before, but all of these resources have the ID specifying the default language of the computer executing the application.

 

As it turns out, the normal value to examine is the ID for the Version Information Table resource (in the blue box). In the case above, it’s the Czech Republic. The Version Information Table contains the “metadata” you normally see depicted in locations like this:

[Screenshot: file version metadata as displayed by Windows]

In the above screenshot, Windows is identifying the source/target locale as English, and specifically, United States English (as opposed to UK English, Australian English, etc.). That information is not stored within the Version Information Table, but rather is determined by the ID of the Version Information Table.

 

However, in malware, the Version Information table is almost always stripped or mangled, as is the case with our original example from earlier:

[Screenshot: stripped or mangled version information in the original malware example]

Because of that, the earlier techniques are more applicable to malware.
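To make the distinction concrete, here is a minimal Python sketch of the "legitimate binary" case: look for a version information resource (numeric resource type 16) and report its language ID. The input format, a list of (type ID, language ID) pairs, and the function name are assumptions made for illustration.

```python
RT_VERSION = 16  # numeric resource type for version information


def version_info_language(resources):
    """Return the language ID of the first version resource, or None.

    `resources` is assumed to be an iterable of (type_id, lang_id) pairs.
    None means the version information is missing or language-neutral.
    """
    for type_id, lang_id in resources:
        if type_id == RT_VERSION and lang_id != 0:
            return lang_id
    return None


# Hypothetical layout similar to the legitimate binary described above:
# cursors tagged 1033 and 1031, RCData tagged 0, version info tagged 1029.
print(version_info_language([(1, 1033), (1, 1031), (10, 0), (16, 1029)]))  # 1029
```

When this returns `None`, as it typically would for malware with stripped version information, the IDs on the remaining resources are all there is to go on.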

 

Below, I’m including a table to help you translate Resource Entry IDs to locales (sorted by decimal ID number).

 

Locale | Language | LCID string | Decimal ID | Codepage
Arabic – Saudi Arabia | ar | ar-sa | 1025 | 1256
Bulgarian | bg | bg | 1026 | 1251
Catalan | ca | ca | 1027 | 1252
Chinese – Taiwan | zh | zh-tw | 1028 | -
Czech | cs | cs | 1029 | 1250
Danish | da | da | 1030 | 1252
German – Germany | de | de-de | 1031 | 1252
Greek | el | el | 1032 | 1253
English – United States | en | en-us | 1033 | 1252
Spanish – Spain (Traditional) | es | es-es | 1034 | 1252
Finnish | fi | fi | 1035 | 1252
French – France | fr | fr-fr | 1036 | 1252
Hebrew | he | he | 1037 | 1255
Hungarian | hu | hu | 1038 | 1250
Icelandic | is | is | 1039 | 1252
Italian – Italy | it | it-it | 1040 | 1252
Japanese | ja | ja | 1041 | -
Korean | ko | ko | 1042 | -
Dutch – Netherlands | nl | nl-nl | 1043 | 1252
Norwegian – Bokmål | nb | no-no | 1044 | 1252
Polish | pl | pl | 1045 | 1250
Portuguese – Brazil | pt | pt-br | 1046 | 1252
Raeto-Romance | rm | rm | 1047 | -
Romanian – Romania | ro | ro | 1048 | 1250
Russian | ru | ru | 1049 | 1251
Croatian | hr | hr | 1050 | 1250
Slovak | sk | sk | 1051 | 1250
Albanian | sq | sq | 1052 | 1250
Swedish – Sweden | sv | sv-se | 1053 | 1252
Thai | th | th | 1054 | -
Turkish | tr | tr | 1055 | 1254
Urdu | ur | ur | 1056 | 1256
Indonesian | id | id | 1057 | 1252
Ukrainian | uk | uk | 1058 | 1251
Belarusian | be | be | 1059 | 1251
Slovenian | sl | sl | 1060 | 1250
Estonian | et | et | 1061 | 1257
Latvian | lv | lv | 1062 | 1257
Lithuanian | lt | lt | 1063 | 1257
Tajik | tg | tg | 1064 | -
Farsi – Persian | fa | fa | 1065 | 1256
Vietnamese | vi | vi | 1066 | 1258
Armenian | hy | hy | 1067 | -
Azeri – Latin | az | az-az | 1068 | 1254
Basque | eu | eu | 1069 | 1252
Sorbian | sb | sb | 1070 | -
FYRO Macedonia | mk | mk | 1071 | 1251
Sesotho (Sutu) | - | - | 1072 | -
Tsonga | ts | ts | 1073 | -
Setsuana | tn | tn | 1074 | -
Venda | - | - | 1075 | -
Xhosa | xh | xh | 1076 | -
Zulu | zu | zu | 1077 | -
Afrikaans | af | af | 1078 | 1252
Georgian | ka | - | 1079 | -
Faroese | fo | fo | 1080 | 1252
Hindi | hi | hi | 1081 | -
Maltese | mt | mt | 1082 | -
Sami Lappish | - | - | 1083 | -
Gaelic – Scotland | gd | gd | 1084 | -
Yiddish | yi | yi | 1085 | -
Malay – Malaysia | ms | ms-my | 1086 | 1252
Kazakh | kk | kk | 1087 | 1251
Kyrgyz – Cyrillic | - | - | 1088 | 1251
Swahili | sw | sw | 1089 | 1252
Turkmen | tk | tk | 1090 | -
Uzbek – Latin | uz | uz-uz | 1091 | 1254
Tatar | tt | tt | 1092 | 1251
Bengali – India | bn | bn | 1093 | -
Punjabi | pa | pa | 1094 | -
Gujarati | gu | gu | 1095 | -
Oriya | or | or | 1096 | -
Tamil | ta | ta | 1097 | -
Telugu | te | te | 1098 | -
Kannada | kn | kn | 1099 | -
Malayalam | ml | ml | 1100 | -
Assamese | as | as | 1101 | -
Marathi | mr | mr | 1102 | -
Sanskrit | sa | sa | 1103 | -
Mongolian | mn | mn | 1104 | 1251
Tibetan | bo | bo | 1105 | -
Welsh | cy | cy | 1106 | -
Khmer | km | km | 1107 | -
Lao | lo | lo | 1108 | -
Burmese | my | my | 1109 | -
Galician | gl | - | 1110 | 1252
Konkani | - | - | 1111 | -
Manipuri | - | - | 1112 | -
Sindhi | sd | sd | 1113 | -
Syriac | - | - | 1114 | -
Sinhala; Sinhalese | si | si | 1115 | -
Amharic | am | am | 1118 | -
Kashmiri | ks | ks | 1120 | -
Nepali | ne | ne | 1121 | -
Frisian – Netherlands | - | - | 1122 | -
Filipino | - | - | 1124 | -
Divehi; Dhivehi; Maldivian | dv | dv | 1125 | -
Edo | - | - | 1126 | -
Igbo – Nigeria | - | - | 1136 | -
Guarani – Paraguay | gn | gn | 1140 | -
Latin | la | la | 1142 | -
Somali | so | so | 1143 | -
Maori | mi | mi | 1153 | -
HID (Human Interface Device) | - | - | 1279 | -
Arabic – Iraq | ar | ar-iq | 2049 | 1256
Chinese – China | zh | zh-cn | 2052 | -
German – Switzerland | de | de-ch | 2055 | 1252
English – Great Britain | en | en-gb | 2057 | 1252
Spanish – Mexico | es | es-mx | 2058 | 1252
French – Belgium | fr | fr-be | 2060 | 1252
Italian – Switzerland | it | it-ch | 2064 | 1252
Dutch – Belgium | nl | nl-be | 2067 | 1252
Norwegian – Nynorsk | nn | no-no | 2068 | 1252
Portuguese – Portugal | pt | pt-pt | 2070 | 1252
Romanian – Moldova | ro | ro-mo | 2072 | -
Russian – Moldova | ru | ru-mo | 2073 | -
Serbian – Latin | sr | sr-sp | 2074 | 1250
Swedish – Finland | sv | sv-fi | 2077 | 1252
Azeri – Cyrillic | az | az-az | 2092 | 1251
Gaelic – Ireland | gd | gd-ie | 2108 | -
Malay – Brunei | ms | ms-bn | 2110 | 1252
Uzbek – Cyrillic | uz | uz-uz | 2115 | 1251
Bengali – Bangladesh | bn | bn | 2117 | -
Mongolian | mn | mn | 2128 | -
Arabic – Egypt | ar | ar-eg | 3073 | 1256
Chinese – Hong Kong SAR | zh | zh-hk | 3076 | -
German – Austria | de | de-at | 3079 | 1252
English – Australia | en | en-au | 3081 | 1252
French – Canada | fr | fr-ca | 3084 | 1252
Serbian – Cyrillic | sr | sr-sp | 3098 | 1251
Arabic – Libya | ar | ar-ly | 4097 | 1256
Chinese – Singapore | zh | zh-sg | 4100 | -
German – Luxembourg | de | de-lu | 4103 | 1252
English – Canada | en | en-ca | 4105 | 1252
Spanish – Guatemala | es | es-gt | 4106 | 1252
French – Switzerland | fr | fr-ch | 4108 | 1252
Arabic – Algeria | ar | ar-dz | 5121 | 1256
Chinese – Macau SAR | zh | zh-mo | 5124 | -
German – Liechtenstein | de | de-li | 5127 | 1252
English – New Zealand | en | en-nz | 5129 | 1252
Spanish – Costa Rica | es | es-cr | 5130 | 1252
French – Luxembourg | fr | fr-lu | 5132 | 1252
Bosnian | bs | bs | 5146 | -
Arabic – Morocco | ar | ar-ma | 6145 | 1256
English – Ireland | en | en-ie | 6153 | 1252
Spanish – Panama | es | es-pa | 6154 | 1252
French – Monaco | fr | - | 6156 | 1252
Arabic – Tunisia | ar | ar-tn | 7169 | 1256
English – Southern Africa | en | en-za | 7177 | 1252
Spanish – Dominican Republic | es | es-do | 7178 | 1252
French – West Indies | fr | - | 7180 | -
Arabic – Oman | ar | ar-om | 8193 | 1256
English – Jamaica | en | en-jm | 8201 | 1252
Spanish – Venezuela | es | es-ve | 8202 | 1252
Arabic – Yemen | ar | ar-ye | 9217 | 1256
English – Caribbean | en | en-cb | 9225 | 1252
Spanish – Colombia | es | es-co | 9226 | 1252
French – Congo | fr | - | 9228 | -
Arabic – Syria | ar | ar-sy | 10241 | 1256
English – Belize | en | en-bz | 10249 | 1252
Spanish – Peru | es | es-pe | 10250 | 1252
French – Senegal | fr | - | 10252 | -
Arabic – Jordan | ar | ar-jo | 11265 | 1256
English – Trinidad | en | en-tt | 11273 | 1252
Spanish – Argentina | es | es-ar | 11274 | 1252
French – Cameroon | fr | - | 11276 | -
Arabic – Lebanon | ar | ar-lb | 12289 | 1256
English – Zimbabwe | en | - | 12297 | 1252
Spanish – Ecuador | es | es-ec | 12298 | 1252
French – Cote d’Ivoire | fr | - | 12300 | -
Arabic – Kuwait | ar | ar-kw | 13313 | 1256
English – Philippines | en | en-ph | 13321 | 1252
Spanish – Chile | es | es-cl | 13322 | 1252
French – Mali | fr | - | 13324 | -
Arabic – United Arab Emirates | ar | ar-ae | 14337 | 1256
Spanish – Uruguay | es | es-uy | 14346 | 1252
French – Morocco | fr | - | 14348 | -
Arabic – Bahrain | ar | ar-bh | 15361 | 1256
Spanish – Paraguay | es | es-py | 15370 | 1252
Arabic – Qatar | ar | ar-qa | 16385 | 1256
English – India | en | en-in | 16393 | -
Spanish – Bolivia | es | es-bo | 16394 | 1252
Spanish – El Salvador | es | es-sv | 17418 | 1252
Spanish – Honduras | es | es-hn | 18442 | 1252
Spanish – Nicaragua | es | es-ni | 19466 | 1252
Spanish – Puerto Rico | es | es-pr | 20490 | 1252

 

Gary Golomb
