Posted by: realsecurity | September 4, 2008

Analyzing a malicious pdf – Troj/PDFJs-A

I picked up a copy of a malicious PDF a week or so ago that was trying to infect a workstation. Let’s crack it open and see what’s inside.

Virus Total

MD5: bccb814a5bcba72be31cdaf4e8805a7b
Filename: pdf.pdf

Simply running the file command on the pdf returns the following: pdf.pdf: PDF document, version 1.4

Running strings on pdf.pdf returns a few interesting pieces of information:

/Creator (Scribus — The application used to publish the PDF
/Producer (Scribus PDF Library
/CreationDate (D:20080815213135) — Creation date of document
/ModDate (D:20080815213135)
/Filter /FlateDecode — a method for compressing the pdf
/JavaScript — self explanatory, the pdf seems to have javascript in it
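The same first-pass triage can be scripted. Below is a minimal Python sketch, not the tooling used in this post: it pulls printable runs out of the raw bytes (roughly what strings does) and counts a few keywords. The keyword list and the function names are my own illustrative choices.

```python
import re

# Illustrative keyword list (an assumption, not exhaustive).
SUSPICIOUS_KEYS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/FlateDecode"]

def printable_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII, roughly what the strings tool prints."""
    pattern = re.compile(rb"[ -~]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")

def count_keywords(data: bytes) -> dict:
    """Count how often each keyword occurs in the raw file bytes."""
    return {key.decode("ascii"): data.count(key) for key in SUSPICIOUS_KEYS}
```

To try it on the sample, read the file with open("pdf.pdf", "rb").read() and pass the bytes to both functions. A non-zero /JavaScript count is a hint to dig further, not proof of malice.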

Knowing that the pdf is compressed, we can uncompress it with pdftk using the following:

pdftk pdf.pdf output pdf.output uncompress
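If pdftk is not at hand, FlateDecode streams can usually be inflated with zlib directly. The following is a rough Python sketch under simplifying assumptions: it finds streams with a regular expression rather than parsing the PDF properly, and it ignores every filter other than Flate. The function name is hypothetical.

```python
import re
import zlib

# Naive stream finder: everything between "stream" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(pdf_bytes: bytes) -> list:
    """Return the decompressed body of every stream that zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not Flate data, or encoded with a different filter: skip it.
            continue
    return results
```

This is only a convenience for quick looks. A real parser, or pdftk as used above, handles object streams, chained filters, and malformed files far better.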

With the file uncompressed, running strings on it again will yield some additional data:

function kgvy(zrb){var mpgs="";for(zviz=0;zviz<zrb.length;zviz+=2){mpgs+=(String.fromCharCode(parseInt(zrb.substr(zviz,2),16)));}eval(mpgs);}[truncated]

This is obfuscated JavaScript that should be easy to turn into readable text using SpiderMonkey.
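Reading the kgvy function, all it does is walk its argument two hex digits at a time, turn each pair into a character, and eval the result. That decoding step can be reproduced without executing anything. A small Python sketch (the function name is mine):

```python
def decode_hex_pairs(encoded: str) -> str:
    """Mirror kgvy's loop: two hex digits at a time, each mapped to one character."""
    return "".join(
        chr(int(encoded[i:i + 2], 16))
        for i in range(0, len(encoded), 2)
    )
```

Passing the long hex string that the PDF hands to kgvy should give the same text that the print-instead-of-eval trick produces.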

Change the eval call to print and execute the script with SpiderMonkey (SM). This runs the decoder and prints the script in its readable form.

function ooyS1YUR()
var jKts_E9h = 0x0c0c0c0c;
var i0a7eJNL = unescape(%u4343%u4343%u0feb%u335b%u66c9%u80b9%u8001%uef33 +

The code above contains the beginning of some shellcode, which is recognizable by its %u???? escapes. Further down in the JavaScript there is a function called Collab.collectEmailInfo. A few quick Google searches show that there is a known exploit that takes advantage of it. See CVE-2007-5659.

Now that we have the shellcode, we can easily find out what it’s doing. First we must get rid of all the concatenation characters (" + "). A find-and-replace in a text editor is an easy way to accomplish this.
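The cleanup and the conversion to raw bytes can also be scripted. Here is a minimal Python sketch, assuming the shellcode consists only of %uXXXX escapes (single-byte %XX escapes are ignored); the function name is hypothetical. Each %uXXXX is one 16-bit value that ends up little-endian in memory, so %u0feb becomes the bytes eb 0f.

```python
import re
import struct

def unescape_to_bytes(js_fragment: str) -> bytes:
    """Convert %uXXXX escapes into the bytes they occupy in memory on x86."""
    # Drop quotes, plus signs, and whitespace left over from string concatenation.
    cleaned = re.sub(r"[\"'+\s]", "", js_fragment)
    words = re.findall(r"%u([0-9a-fA-F]{4})", cleaned)
    # Pack each 16-bit value little-endian, which swaps the two bytes.
    return b"".join(struct.pack("<H", int(word, 16)) for word in words)
```

The resulting bytes can be written to a file and loaded into a disassembler or debugger.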

With the shellcode in a readable format, send it to shellcode2exe. We now have a .exe as output.

Simply running strings on the new exe shows the URL of the next stage malware:


Since I’m the curious type, I’d like to know how this second stage malware is downloaded by the shellcode. Strings only showed 1 DLL being referenced by the shellcode, which is very strange. More DLLs are required to download and execute this file from

Running the shellcode in OllyDbg makes this quite easy. Simply stepping through the program, we encounter a series of calls to locations in memory that are dynamically populated. This is done to evade AV detection, since AV heuristics look for suspicious API calls when deciding whether to flag a file. Because these APIs are resolved at runtime, the malware is much stealthier.

For brevity I have only included the particularly interesting stack contents.

Now things are much clearer: the shellcode will download the file using URLDownloadToFile from urlmon.dll, execute it with WinExec from kernel32.dll, and then probably delete itself using DeleteFile.

For more PDF decoding techniques, check out this article by Maarten Van Horenbeeck of the SANS Internet Storm Center.

