Fingerprinting Phishers
Over the past couple of months, the number of phishing attacks targeting my client's customers has increased tremendously. They began to ask me: "why us?"
I haven't answered that question yet. There are still a number of theories, and very little evidence to sort through. But I have made some progress in addressing the "who is attacking us?" question.
First, there is the bait-message. This is the email that is sent out in the hope of finding appropriate targets. Each of these can be investigated as a spam campaign. They have their spam relays, their target list, and their subset of subject lines, and they may or may not use permutations of the message body.
I think it's possible that the people managing the spam campaign are separate from those managing the actual phishing attack. It's possible that separate phishing groups could employ a single spamming outfit. That's just a theory at the moment.
Secondly, there is the hook-site. This is where the link in the bait-message initially takes the victim. The hook-site may also be the collection-site, but it could forward the victim on to a separate collection server. This technique is especially common in cases where a phisher has a network of collection sites.
The use of a network of sites is an identifying quality of a phisher. I argue that given a set of phishing attacks, one can partition them to identify certain habits or the modus operandi of the criminal actor. This actor may be an individual or a group.
There are two main ways that I use to build these partitions or clusters. The first is to compare how the hook-site or collection-site is built. By collecting copies of the phishing sites during an investigation and keeping them on hand, an investigator can go back and identify "repeat offenders." By comparing the fake website to the target firm's original site, you can examine any changes that the criminal applied. You could also approximately date when the site was copied, if you have a suitable change-control process on your web content.
Clusters and habits can also be detected in the URL used for the hook-site. How the criminal compromises, purchases, or otherwise acquires the hosting space can be evident in this URL. Are they creating suspiciously long domain names (implying they control the DNS), or are they using dotted directories in an attempt to hide the space from visual detection? Are the sites hosted off of cgi-bin space, or in directories of a BBS application? All of these qualities can be used to cluster a number of attacks into a smaller set of attackers.
Clustering by where a hook-site or collection-site is hosted can sometimes illuminate a pattern; I did not find this to be the case in this population of URLs. I did find some interesting correspondences in the registrar used for some of the domains, which appeared to indicate an issue in the registrar's validation policies.
In an attempt to automate the detection and classification, I wrote some routines that calculate the "lexical distances" between the URLs used in the attacks. Then we built clusters based on arbitrary thresholds on these distances to see if the system was any better at classifying similar attacks than the humans. Needless to say, the trained human analyst will outperform my pathetic Perl script any day of the week, but the analysts did find it helpful. Which is what it's all about.
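As a rough illustration of the URL qualities described above, here is a minimal Python sketch that flags a few of them. It is not the tooling used in this investigation. The function name `url_features`, the feature names, and the 40-character hostname threshold are all assumptions made for the example, and the BBS-directory check is left out because it depends on which applications you see in your own data.

```python
from urllib.parse import urlsplit


def url_features(url, long_host_threshold=40):
    """Return a few yes/no traits of a hook-site URL.

    The threshold and the feature names are illustrative assumptions,
    not values taken from the investigation.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Non-empty path segments, e.g. "/.x/login.html" -> [".x", "login.html"]
    segments = [s for s in parts.path.split("/") if s]
    return {
        # Very long hostnames suggest the attacker controls the DNS.
        "long_hostname": len(host) >= long_host_threshold,
        # Segments starting with "." are hidden from a casual directory listing.
        "dotted_segment": any(s.startswith(".") for s in segments),
        # Content served from a cgi-bin directory.
        "cgi_bin": "cgi-bin" in segments,
    }
```

Calling `url_features("http://example.com/.hidden/login.html")` returns a dictionary in which only `dotted_segment` is true. Attacks that share the same combination of traits are candidates for the same cluster.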
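For readers who want to try the idea, here is a minimal sketch in Python rather than the original Perl. It assumes that "lexical distance" means Levenshtein edit distance normalized by the longer URL's length, and it groups URLs with a simple single-linkage rule. Both choices, the function names, and the 0.3 threshold are assumptions for illustration; the original routines may have worked differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to the range 0..1 by the longer string's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_urls(urls, threshold=0.3):
    """Single-linkage grouping: a URL joins (and merges) every cluster
    that holds at least one member within the threshold distance."""
    clusters = []
    for url in urls:
        merged, rest = [url], []
        for cluster in clusters:
            if any(normalized_distance(url, m) <= threshold for m in cluster):
                merged.extend(cluster)
            else:
                rest.append(cluster)
        clusters = rest + [merged]
    return clusters
```

Passing a list of hook-site URLs to `cluster_urls` returns a list of groups. The threshold is arbitrary, as it was in the original experiment, so expect to tune it against what an analyst would call "the same attacker."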
Sadly, identifying clusters and forming a behavioral fingerprint of a criminal is a long way from identifying said criminal.
kliston -AT- isc sans org