The Week to Top All Weeks
I recently experienced the week to top all weeks. I work for a small ISP in the US. A couple of weeks ago we started having unusual problems with our border router. Our network guys began investigating what appeared to be a hardware issue. The router was checked and nothing obvious was found. Continued investigation suggested the problem might be a memory overflow, so all of the settings on the device were checked, along with the OS version. Again, nothing obvious turned up. A call was made to the hardware vendor; diagnostics were run and settings were checked. All looked fine.
When the problem continued and began to affect our entire network (all of the servers, all of the routers), a deeper investigation was started. We set up Wireshark captures to try to determine what was causing our network issues. To our surprise and dismay, we discovered that the problem was originating from one of our internal (Linux) web servers. We were seeing a huge number of packets coming from the server, which was essentially causing an "internal DoS". This server is a web hosting server for about 14 different companies. (Luckily the server is scheduled for decommissioning, so only 14 of the original 145 companies remained.) We discovered that one customer had a web page with a security hole in their PHP code. We closed the hole and monitored the server closely all night and into the next day (no sleep for me that night). All looked good and we believed that we had stopped the culprits cold.
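For anyone who has to run the same kind of hunt, the capture side of it doesn't require anything fancy. The sketch below shows the general shape of it; the interface and file names are placeholders, not our actual setup:

    # capture everything on the LAN-facing interface for later analysis (eth0 is a placeholder)
    tcpdump -i eth0 -s 0 -w suspect.pcap
    # summarize the busiest IP conversations in the capture to find the top talkers
    tshark -r suspect.pcap -q -z conv,ip | head -40

A conversation summary like that is what makes an unexpected internal top talker stand out almost immediately.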
Unfortunately that was not the case. Mid-afternoon the next day we started seeing a return of the DoS-type activity. We again started up the sniffer, along with some additional tools on the server. We sat and watched the traffic and captured the IP address that the activity was bound for. We immediately blocked that IP address with iptables on the server and on the firewall, dropping all traffic to and from it. Continuing to monitor the server, we discovered that about five minutes later the activity started again. We rebooted the server and blocked the new IP on both the server and the router. We continued to do this, thinking that eventually they would give up, all the while scanning the files on the server, the log files, and the config files to see if we could pinpoint the exact source. They obviously knew that we were on to them, and they were attempting to win the battle of wills and wits. Things got worse. The final straw was when the problem IP became 127.0.0.1. Luckily, at about the same time the IP address changed, we discovered one particular directory that appeared to be running a script of some kind. We immediately changed the permissions on that directory to 000 so that nothing in it could be executed, and that stopped the attack.
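The blocking and lock-down steps themselves are simple; what follows is only a generic sketch of the idea, with a placeholder IP address and path rather than the real ones from the incident:

    # drop everything to and from the destination the flood was bound for (placeholder IP)
    iptables -I INPUT  -s 203.0.113.50 -j DROP
    iptables -I OUTPUT -d 203.0.113.50 -j DROP
    # take the abused directory out of play entirely
    chmod 000 /var/www/customer-site/suspect-dir

As we found out, chasing destination addresses one at a time was a losing game; it was the permission change on the directory that finally stopped the traffic.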
Now to figure out what the culprit was. We continued digging through the files for this particular site and discovered that the customer had put up a shopping cart using Zen Cart. The version of Zen Cart they were running was vulnerable to the Zen Cart remote code execution exploit, and indeed they had been compromised. We changed permissions on the entire directory structure for the domain to 000 and notified the customer that their site was down and had been exploited. I explained to the customer that the exploit was installed in the same directory as the PayPal transactions from his site. He explained that he uses Zen Cart because it is easy to use and he (with no knowledge of how it works) is able to update the cart himself without having to pay a web designer. He said that he had received an email about the Zen Cart vulnerability, but he didn't understand what it meant, so he just forgot about it. (I don't think that will happen again.)
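If you ever have to dig through a docroot the same way, a couple of quick checks narrow things down fast. This is only a rough sketch with placeholder paths, not our exact forensic process:

    # list PHP files modified recently under the compromised docroot (path is a placeholder)
    find /var/www/customer-site -name '*.php' -mtime -7 -ls
    # look for the obfuscation calls that dropped PHP shells usually lean on
    grep -rIl --include='*.php' -E 'eval\(base64_decode|gzinflate\(' /var/www/customer-site

Recently modified files that the customer can't account for, or anything matching those patterns, is where to start reading.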
I spent the rest of the week researching the exploit and going through the logs, the other web sites on the server, and the OS and config files, checking to see whether anything else had been impacted by this exploit. Luckily the exploit affected only the one site, and all of the rest of the sites remain secure and intact. I am diligent about the security of our network and servers. I review log files every day for all of the servers under my umbrella. I believe it is because of this diligence that the damage was limited to just one web site.
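Most of that review is plain grep work. Something along these lines, assuming a standard Apache combined-format access log and a placeholder log path, is usually enough to surface suspicious request patterns:

    # count which client IPs were POSTing to the site, and to which URLs (placeholder log path)
    awk '$6 == "\"POST" {print $1, $7}' /var/log/httpd/customer-site-access.log \
        | sort | uniq -c | sort -rn | head -20

A handful of IPs hammering one script they have no business touching tends to jump right out of a summary like that.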
I am now in the process of trying to determine what applications on the other hosted domains may cause us issues. (We have multiple servers and a few hundred domains.) We simply host the domains; we do no design work. In the past we made the incorrect assumption that our customers were using due diligence in designing their web sites (and you all know what happens when you ASSUME anything). The actual event took about 30 hours from discovery to resolution. The investigation and paperwork took another four days. It was a grueling five-plus days, but I have come through it with the confidence that we were successful in shutting the culprits down. The lesson learned for us is that we need to set some ground rules for the companies that host with us. We need to monitor network traffic in real time, much more closely than we have in the past. And we need to make sure that the things our customers are doing don't have an adverse effect on us again.
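As a starting point for that inventory, a check like the one below can at least flag which docroots contain a Zen Cart install and what version it claims to be. It assumes Zen Cart's usual includes/version.php layout and a /var/www base path, both of which may differ in your environment:

    # find Zen Cart installs across all hosted docroots and print their version defines
    find /var/www -type f -path '*/includes/version.php' -exec grep -H 'PROJECT_VERSION' {} +

The same idea works for other packaged web applications; the point is simply to know what is installed before the next advisory lands.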
The good news is that things have returned to "normal". The investigation is complete, the incident report is written, and now I am playing catch-up with the things that didn't get done during those five-plus days. Once I am caught up, I will be working on mitigation steps to prevent this type of incident from happening again.
Deb Hale
Long Lines, LLC
Comments
jdimpson
Dec 13th 2010
Feeding that data into something like rrdtool would give you throughput graphs for each user's outgoing traffic, and that would have immediately shown you which user account this DoS traffic was coming from. Or a more restrictive iptables ruleset could have prevented it altogether.
I'd also use rrdtool to graph total traffic in/out of every server, maybe sourcing the data from a network switch's per-port counters via SNMP, or maybe obtaining it from the servers' network interface stats (e.g. from ifconfig, or from SNMP again). Then any abnormalities should be apparent and more easily pinpointed.
iptables also has a '--log-uid-owner' for its '-j LOG' target which can be useful, especially when debugging/testing a ruleset like this, or when you need to identify the UNIX user ID generating certain traffic. But make sure to rate-limit any logging, such as with '-m limit --limit 1/sec' so you don't flood the log.
Steven Chamberlain
Dec 13th 2010
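A minimal sketch of the per-interface graphing described in the comment above, with the step, data-source name, and file names chosen purely for illustration:

    # one COUNTER data source sampled every 5 minutes, averages kept for two weeks
    rrdtool create eth0-out.rrd --step 300 \
        DS:outoctets:COUNTER:600:0:U RRA:AVERAGE:0.5:1:4032
    # feed it the interface's transmit byte counter (read from /sys here; SNMP ifOutOctets works the same way)
    rrdtool update eth0-out.rrd N:$(cat /sys/class/net/eth0/statistics/tx_bytes)
    # render the last 24 hours
    rrdtool graph eth0-out.png --start -86400 \
        DEF:out=eth0-out.rrd:outoctets:AVERAGE LINE1:out#0000ff:"bytes/s out"

Run the update from cron every five minutes and an outbound spike like the one in this incident shows up as an obvious step in the graph.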
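And a minimal sketch of the per-user matching and rate-limited logging mentioned above. UID 1001 and the log prefix are made-up examples, and the owner match is only valid in the OUTPUT and POSTROUTING chains:

    # log packets generated by UID 1001, at most one entry per second,
    # and include the owning UID in each log line
    iptables -A OUTPUT -m owner --uid-owner 1001 -m limit --limit 1/sec \
        -j LOG --log-prefix "uid1001-out: " --log-uid-owner
    # a second rule with no jump target only accumulates packet/byte counters for that UID,
    # readable with 'iptables -L OUTPUT -v -x' and easy to feed into rrdtool
    iptables -A OUTPUT -m owner --uid-owner 1001

A pair of rules like this per customer UID gives both a log trail and counters that can be graphed, which is the kind of early warning the comment is describing.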