Attackers can also abuse the HTTP standard. The GET method is intended for requesting information, and the POST method is intended for sending information. Since it’s intended for requests, the GET method provides a limited amount of space for data (typically around 2KB). Spyware regularly includes instructions on what it wants to collect in the URI path or query of an HTTP GET, rather than in the body of the message. Similarly, in a piece of malware observed by the authors, all information from the infected host was embedded in the User-Agent fields of multiple HTTP GET requests. The following two GET requests show what the malware produced to send back a command prompt followed by a directory listing:
58, which is its ASCII decimal representation. This is the raw static data that is invaluable to signature creation.

Each of the initial 4 random bytes can ultimately be translated into a decimal number of 0 through 255. The regular expression ([1-9]|1[0-9]|2[0-5]){0,1}[0-9] covers the number range 0 through 259, and the {4} indicates four copies of that pattern. Recall that the square brackets ([ and ]) contain the symbols, and the curly brackets ({ and }) contain a number that indicates the number of repetitions of the preceding elements.

The figure shows an IDA Pro graph of a sample parsing routine that looks for a Comment field in a web page. The design is typical of a custom parsing function, which is often used in malware instead of something like a regular expression library. Custom parsing routines are generally organized as a cascading pattern of tests for the initial characters. Each small test block will have one line cascading to the next block, and another line going to a failure block, which contains the option to loop back to the start.
The line forming the upper loop on the left of the figure shows that the current line failed the test and the next line will be tried. This sample function has a double cascade and loop structure, and the second cascade looks for the characters that close the Comment field. The individual blocks in the cascade show the characters that the function is seeking. In this case, those characters are <!-- in the first loop and --> in the second. In the block between the cascades, there is a function call that tests the contents that come after the <!--. Thus, the command will be processed only if the contents in the middle match the internal function and both sides of the comment enclosure are intact.
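The double cascade and loop structure can be sketched in Python. This is only an illustration of the control flow, not the malware's actual code: the names `matches_at`, `contents_ok`, and `find_command` are hypothetical, and the middle test (checking for the `adsrv?` prefix discussed later in this section) is an assumed stand-in for the internal function.

```python
from typing import Optional

OPEN = "<!--"       # characters sought by the first cascade
CLOSE = "-->"       # characters sought by the second cascade
PREFIX = "adsrv?"   # assumed rule for the test between the cascades


def matches_at(text: str, pos: int, marker: str) -> bool:
    """Test the marker one character at a time, like a cascade of small blocks."""
    for offset, expected in enumerate(marker):
        if pos + offset >= len(text) or text[pos + offset] != expected:
            return False  # failure block: this position does not match
    return True


def contents_ok(line: str, body_start: int) -> bool:
    """Hypothetical stand-in for the function called between the cascades."""
    return line[body_start:].lstrip().startswith(PREFIX)


def find_command(page: str) -> Optional[str]:
    """Return the first comment body that passes all three tests, else None."""
    for line in page.splitlines():          # outer loop: try the next line
        pos = 0
        while pos < len(line):
            if not matches_at(line, pos, OPEN):      # first cascade
                pos += 1
                continue
            body_start = pos + len(OPEN)
            if contents_ok(line, body_start):        # test in the middle
                end = body_start
                while end < len(line) and not matches_at(line, end, CLOSE):
                    end += 1                         # second cascade
                if end < len(line):
                    return line[body_start:end].strip()
            pos = body_start
    return None


page = "<html>\n<!-- not a command -->\n<!-- adsrv?bG9uZ3NsZWVw -->\n</html>"
command = find_command(page)  # "adsrv?bG9uZ3NsZWVw"
```

A comment that lacks the closing `-->`, or whose contents fail the middle test, yields no command, which mirrors the requirement that both sides of the enclosure be intact.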
Table 14-7. Sample Malware Commands
Command example | Base64 translation | Operation |
---|---|---|
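longsleep | bG9uZ3NsZWVw | Sleep for 1 hour |
superlongsleep | c3VwZXJsb25nc2xlZXA= | Sleep for 24 hours |
shortsleep | c2hvcnRzbGVlcA== | Sleep for 1 minute |
run, followed by an argument | Begins with cnVu | Download and execute a binary on the local system |
connect, followed by an argument | Begins with Y29ubmVj | Use a custom protocol to establish a reverse shell |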
One approach to creating signatures for this backdoor is to target the full set of commands known to be used by the malware (including the surrounding context). Content expressions for the five commands recognized by the malware would contain the following strings:
    <!-- adsrv?bG9uZ3NsZWVw -->
    <!-- adsrv?c3VwZXJsb25nc2xlZXA= -->
    <!-- adsrv?c2hvcnRzbGVlcA== -->
    <!-- adsrv?cnVu
    <!-- adsrv?Y29ubmVj
The last two expressions target only the static part of the commands (run and connect), and since the length of the argument is not known, they do not target the trailing comment characters (-->).
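To see which command each content expression corresponds to, the Base64 portion can be decoded with Python's standard library. This is a small sketch for checking the strings by hand; the helper name `decode_static_part` is made up for this example.

```python
import base64

# The five content expressions listed above.
EXPRESSIONS = [
    "<!-- adsrv?bG9uZ3NsZWVw -->",
    "<!-- adsrv?c3VwZXJsb25nc2xlZXA= -->",
    "<!-- adsrv?c2hvcnRzbGVlcA== -->",
    "<!-- adsrv?cnVu",
    "<!-- adsrv?Y29ubmVj",
]


def decode_static_part(expression: str) -> str:
    """Strip the comment bracketing and prefix, then Base64-decode the rest."""
    encoded = expression.split("adsrv?", 1)[1]
    encoded = encoded.replace("-->", "").strip()
    return base64.b64decode(encoded).decode("ascii")


decoded = [decode_static_part(e) for e in EXPRESSIONS]
# ['longsleep', 'superlongsleep', 'shortsleep', 'run', 'connec']
```

The last entry decodes to `connec` rather than `connect` because Base64 encodes three input bytes as four output characters. The final `t` falls into a group shared with the argument, so the expression stops at the last group that is fully static.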
While signatures that use all of these elements will likely find this precise piece of malware, there is a risk of being too specific at the expense of robustness. If the attacker changes any part of the malware—the command set, the encoding, or the command prefix—a very precise signature will cease to be effective.
Previously, we saw that different parts of the command interpretation were in different parts of the code. Given that knowledge, we can create different signatures to target the various elements separately.
The three elements that appear to be in distinct functions are comment bracketing, the fixed adsrv? with a Base64 expression following, and the actual command parsing. Based on these three elements, a set of signature elements could include the following (for brevity, only the primary elements of each signature are included, with each line representing a different signature):
    pcre:"/<!-- adsrv\?([a-zA-Z0-9+\/=]{4})+ -->/"
    content:"<!-- "; content:"bG9uZ3NsZWVw -->"; within:100;
    content:"<!-- "; content:"c3VwZXJsb25nc2xlZXA= -->"; within:100;
    content:"<!-- "; content:"c2hvcnRzbGVlcA== -->"; within:100;
    content:"<!-- "; content:"cnVu"; within:100; content:"-->"; within:100;
    content:"<!-- "; content:"Y29ubmVj"; within:100; content:"-->"; within:100;
These signatures target the three different elements that make up a command being sent to the malware. All include the comment bracketing. The first signature targets the command prefix adsrv? followed by a generic Base64-encoded command. The rest of the signatures target a known Base64-encoded command without any dependency on a command prefix.
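The effect of splitting the signatures can be tried out offline with a rough Python emulation. This sketch rests on several assumptions: `content_chain` only approximates how Snort links content matches with `within` (each later string must lie entirely within 100 bytes after the end of the previous match), the names `CHAINS` and `matching_signatures` are invented for the example, and the `banner?` prefix and the `wipe` command are hypothetical changes an attacker might make.

```python
import re
from typing import List, Sequence

# First signature: fixed prefix followed by generic Base64.
GENERIC = re.compile(r"<!-- adsrv\?([a-zA-Z0-9+/=]{4})+ -->")

# Remaining signatures: known commands, independent of the prefix.
CHAINS = {
    "longsleep": ["<!-- ", "bG9uZ3NsZWVw -->"],
    "superlongsleep": ["<!-- ", "c3VwZXJsb25nc2xlZXA= -->"],
    "shortsleep": ["<!-- ", "c2hvcnRzbGVlcA== -->"],
    "run": ["<!-- ", "cnVu", "-->"],
    "connect": ["<!-- ", "Y29ubmVj", "-->"],
}


def content_chain(payload: str, contents: Sequence[str], within: int = 100) -> bool:
    """Approximate a chain of content matches linked by within."""
    def match_from(index: int, start: int, limit: int) -> bool:
        if index == len(contents):
            return True
        needle = contents[index]
        pos = payload.find(needle, start, limit)
        while pos != -1:
            end = pos + len(needle)
            if match_from(index + 1, end, end + within):
                return True
            pos = payload.find(needle, pos + 1, limit)  # try a later occurrence
        return False

    return match_from(0, 0, len(payload))


def matching_signatures(payload: str) -> List[str]:
    """Return the names of all emulated signatures that fire on the payload."""
    hits = ["generic"] if GENERIC.search(payload) else []
    hits += [name for name, chain in CHAINS.items() if content_chain(payload, chain)]
    return hits


original = "<html><!-- adsrv?bG9uZ3NsZWVw --></html>"
new_prefix = "<html><!-- banner?bG9uZ3NsZWVw --></html>"   # hypothetical prefix change
new_command = "<html><!-- adsrv?d2lwZQ== --></html>"        # hypothetical "wipe" command

results = [matching_signatures(p) for p in (original, new_prefix, new_command)]
# [['generic', 'longsleep'], ['longsleep'], ['generic']]
```

In this emulation, each single change still leaves one signature firing, which is the property the separate signatures are meant to provide.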
Since we know the parsing occurs in a separate section of the code, it makes sense to target it independently. If the attacker changes one part of the code or the other, our signatures will still detect the unchanged part.
Note that we are still making assumptions. The new signatures may be more prone to false positives. We are also assuming that the attacker will most likely continue to use comment bracketing, since comment bracketing is a part of regular web communications and is unlikely to be considered suspicious. Nevertheless, this strategy provides more robust coverage than our initial attempt and is more likely to detect future variants of the malware.
Let’s revisit the signature we created earlier for beacon traffic. Recall that we combined every possible element into the same signature:
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"TROJAN Malicious Beacon "; content:"User-Agent: Mozilla/4.0 (compatible\; MSIE 7.0\; Windows NT 5.1)"; content:"Accept: * / *"; uricontent:"58"; content:!"|0d0a|referer:"; nocase; pcre:"/GET \/([12]{0,1}[0-9]{1,2}){4}58[0-9]{6,9}58(4[89]|5[0-7]|9[789]|10[012]){8} HTTP/"; classtype:trojan-activity; sid:2000002; rev:1;)
This signature has a limited scope and would become useless if the attacker made any changes to the malware. A way to address different elements individually and avoid rapid obsolescence is with these two targets:
Target 1: User-Agent string, Accept string, no referrer
Target 2: Specific URI, no referrer
This strategy would yield two signatures:
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"TROJAN Malicious Beacon UA with Accept Anomaly"; content:"User-Agent: Mozilla/4.0 (compatible\; MSIE 7.0\; Windows NT 5.1)"; content:"Accept: * / *"; content:!"|0d0a|referer:"; nocase; classtype:trojan-activity; sid:2000004; rev:1;)

alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"TROJAN Malicious Beacon URI"; uricontent:"58"; content:!"|0d0a|referer:"; nocase; pcre:"/GET \/([12]{0,1}[0-9]{1,2}){4}58[0-9]{6,9}58(4[89]|5[0-7]|9[789]|10[012]){8} HTTP/"; classtype:trojan-activity; sid:2000005; rev:1;)
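As a rough check of how the two targets behave independently, the signature logic can be approximated in Python. This is a sketch under stated assumptions rather than a Snort emulator: the rule options are reduced to plain string and regular expression tests, the function names are invented here, and the sample requests (including the URI digits and the `www.example.com` host) are made-up values chosen only to satisfy or break each pattern.

```python
import re

# Strings copied from the two signatures above.
USER_AGENT = "User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
ACCEPT = "Accept: * / *"
URI_PATTERN = re.compile(
    r"GET /([12]{0,1}[0-9]{1,2}){4}58[0-9]{6,9}58"
    r"(4[89]|5[0-7]|9[789]|10[012]){8} HTTP/"
)


def has_no_referer(request: str) -> bool:
    """Approximates content:!"|0d0a|referer:"; nocase;"""
    return "\r\nreferer:" not in request.lower()


def matches_ua_accept(request: str) -> bool:
    """Target 1: User-Agent string, Accept string, no referrer."""
    return USER_AGENT in request and ACCEPT in request and has_no_referer(request)


def matches_uri(request: str) -> bool:
    """Target 2: specific URI, no referrer."""
    uri = request.split("\r\n", 1)[0].split(" ")[1]
    return ("58" in uri and has_no_referer(request)
            and URI_PATTERN.search(request) is not None)


def build_request(uri: str, user_agent: str = USER_AGENT) -> str:
    """Build a hypothetical request for testing; not captured traffic."""
    return f"GET {uri} HTTP/1.1\r\n{user_agent}\r\n{ACCEPT}\r\nHost: www.example.com\r\n\r\n"


# Made-up URI: four numbers, "58", seven digits, "58", eight two-digit codes.
beacon_uri = "/" + "10" * 4 + "58" + "1234567" + "58" + "48" * 8

original = build_request(beacon_uri)
new_user_agent = build_request(beacon_uri, USER_AGENT.replace("MSIE 7.0", "MSIE 8.0"))
new_uri = build_request("/index.html")

for name, request in [("original", original),
                      ("new User-Agent", new_user_agent),
                      ("new URI", new_uri)]:
    print(name, matches_ua_accept(request), matches_uri(request))
```

With these sample requests, the unmodified beacon triggers both checks, while a variant that changes only the User-Agent or only the URI still triggers the other one.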