Some of you might have seen my tweets as I worked on this problem, starting from this one. This post is a clearer explanation of everything I did, because 140 characters (even across multiple tweets) are not enough to convey the message.
How I realized there was a problem
Yesterday, I wrote a blog post and it did not get saved to CouchDB immediately. I thought the problem was load on my server caused by traffic (I should have realized this wasn't the reason, because my blog gets around 20 visitors per day).
When the post had still not appeared a couple of hours later, I figured something was wrong. I tried to ssh into my machine and observed that it was very slow: the CPU was pegged at 100% and memory consumption was at 70%.
But due to a personal commitment, I did not troubleshoot the problem then, and I didn't suspect anything malicious. My initial feeling was that it was probably caused by memory leaks in Apache. How wrong I was!
This morning I had forgotten about the problem until I saw a mail from my hosting service, Gandi, warning me to take action against a botnet hosted on my VPS. I was shocked! Was this a mistake, or was I looking at a phishing mail? Then I recollected yesterday's problem and thought they were probably right.
How I fixed it
I had a tough time logging in today as well, but once I was in, the first thing I did was to stop the network. With that, the load on the machine dropped dramatically, but memory usage was still around 70%. Then I shut down nginx, Apache HTTP Server, Postfix and Nagios one by one, noting whether any of them was the culprit. Memory usage fell visibly, but there was still something running (or sleeping).
Next, I ran 'top' to check all running processes and noticed that nearly 100 instances of a binary called 'Xploit' were running. I killed them, but they respawned within a second. I also noticed another binary called 'pscan2' that was scanning a lot of ports on the network. So now I could confirm that my VPS was botted. But I wondered how the bots got in, especially since I was upgrading packages promptly and running a firewall. More on the actual break-in later in the post.
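For anyone who wants to script the "lots of copies of one binary" check rather than eyeball 'top', here is a minimal sketch. It is not what I ran at the time. The function names and the threshold of 20 are assumptions chosen for illustration, and a busy web server can legitimately run many workers with the same name, so treat a hit as a prompt to look closer rather than as proof.

```python
import subprocess
from collections import Counter


def count_process_names(ps_output: str) -> Counter:
    """Count how often each command name appears (one name per line)."""
    names = (line.strip() for line in ps_output.splitlines())
    return Counter(name for name in names if name)


def repeated_processes(ps_output: str, threshold: int = 20) -> dict:
    """Return command names that appear at least `threshold` times.

    The default threshold is an arbitrary illustrative value.
    """
    counts = count_process_names(ps_output)
    return {name: n for name, n in counts.items() if n >= threshold}


def current_process_names() -> str:
    """Return one command name per running process, without a header line."""
    result = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling repeated_processes(current_process_names()) on a live machine returns a dictionary of heavily repeated command names and their counts.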
Then I ran 'netstat -plant' to check all open TCP ports and saw tens of connections to port 22 (the reason ssh was so slow when I tried to log in) in the SYN_SENT state (because I had shut down network access). Then I ran 'lsof' to see which application was opening so many connections to port 22 and saw something like this.
/home2/server/ /a/Xploit ....
(Note the 'space' in the pathname; it was intentional on the botnet author's part.) Neat! So the botnet had found its way into the home directory of the Apache process. I copied the botnet directory over to my local box and then deleted it on the VPS. I also cleaned /tmp/ and /var/tmp/ to be on the safe side and rebooted my VPS.
While the VPS was rebooting, I started digging through the botnet code (yes, quite a lot of C, Perl and Bash source code) on my local box and learnt that the main objectives of this botnet were:
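A directory whose name is just a space is easy to miss in a normal listing, which is presumably why it was used. Here is a minimal sketch of how one could hunt for such names. The function name is my own invention, and it only flags names that are blank or padded with whitespace, so it will not catch other hiding tricks such as dot-prefixed names.

```python
import os


def find_whitespace_names(root: str) -> list:
    """Return paths under `root` whose last component is blank or
    has leading/trailing whitespace (for example a directory named ' ')."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # A name that changes when stripped is padded or entirely blank.
            if name != name.strip():
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Printing each result with repr() makes the otherwise invisible space show up inside the quotes.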
- To bruteforce SSH logins on random IPs.
- To make the botted machine an open relay to send spam messages.
It looked like a fairly complicated botnet because it had at least a couple of objectives. The password dictionary used for bruteforcing was very simple, something like the below:
Some of the funny ones were like:
After breaking in, the zombie computer would announce itself on an IRC network. My machine had announced itself on the tampa.fl.us.undernet.org IRC server and was being commanded from there.
How the machine was broken into
OK, now to the point of how the machine was compromised. Nearly a couple of months back, I was getting lots of spam comments on my site, and I was fairly sure they were manual (because breaking a reCAPTCHA is not that easy). But I was curious as to how this was happening. I also noticed that there were lots of port scans around port 25 (SMTP) at the time. So I created a dummy user and ran a trivial site, with Apache running with that dummy user's privileges, and observed that most of the comments posted came from a single country and within a few seconds of each other. The user agent, however, seemed to suggest a valid browser (probably a headless browser, who knows?). I had collected a lot of data about this when suddenly, one fine day, the network scans and the spam comments vanished (check symantec for an approximate timeline).
I got busy later, as I was relocating and leaving my job, and forgot to delete the dummy user with the stupidly simple password (yes, one from the list that I provided). The rest, as they say, is history.
As a corrective action, I have changed all passwords, and this time they are very difficult to guess and not based on any dictionary word. I have also configured PAM to prompt me for a password change every month and to reject common dictionary words.
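To make the idea of "reject common dictionary words" concrete, here is a minimal sketch of such a check. It is not how PAM implements it and not my actual configuration. The function name, the minimum length of 12 and the word list in the usage note are all hypothetical, and the word list is not the botnet's dictionary.

```python
def is_weak_password(password: str, dictionary: set, min_length: int = 12) -> bool:
    """Return True if the password is too short or is a dictionary word,
    optionally followed by digits (for example 'password123')."""
    if len(password) < min_length:
        return True
    candidate = password.lower()
    if candidate in dictionary:
        return True
    # Catch the common habit of appending digits to a dictionary word.
    stripped = candidate.rstrip("0123456789")
    return stripped in dictionary
```

With a made-up word list such as {"password", "admin", "test"}, the call is_weak_password("Password12345", {"password", "admin", "test"}) returns True, because stripping the trailing digits leaves a dictionary word.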
I had read zillions of articles on common passwords like this (and a lot that I can't remember) and had laughed my head off at the stupidity of people, but I turned out to be as stupid as them! A very valuable lesson learnt, one that I can never forget in my life. I would recommend that you too follow good password management techniques.
PS: If anyone is interested in having the botnet code and logs, please mail me and I shall send them over once I am convinced that you have no mala fide intentions.