I thought that for my first meaningful post, I should share some *NIX knowledge with you guys (however many there may be). Some of these tools are new for me too.
The problem that I think most people run into when learning or using more *NIX tools is not that the information isn’t there (man pages are your friend), but that it is often presented without context. I’ll show you some quick little *NIX commands that I used to troubleshoot a few things on a server running multiple Rails applications.
I was recently having problems with two processes on a server that were taking too long to finish and were hogging resources on my machine. The first is an ugly mess of a process that bulk-imports a 40K+ record CSV file (arghh…3rd party vendors!) into a relational database (PostgreSQL). The second is a daily incremental system backup run by my hosting provider (Rackspace).
So what can you do to investigate how hard your disk(s) are working? Well, the standard *NIX tool iostat goes a long way:
iostat
iostat will then spit out some pretty nice metrics on CPU usage and on I/O for each of your disks and partitions:
Note: The output of these tools might vary in style and column ordering between *NIX distributions. The output shown in this post is from a Red Hat Enterprise Linux (RHEL) machine.
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.07    0.59   13.49    7.42    0.00   73.44
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              55.04        49.13       241.79  499975863 2460564420
sda1              0.00         0.04         0.00     439390      10130
sda2              0.24         3.34         1.69   34011331   17192522
sda3              0.73        13.17        12.33  133976082  125519952
sda4              0.00         0.00         0.00         18          0
sda5             54.06        32.58       227.76  331548234 2317841816
iostat gives you a summary of transfers per second (tps), reads, and writes for each disk and partition on your server. You can also run iostat 1, which prints a new report every second (a bit like tailing a log) so you can watch how those stats change. Keep in mind that the first report always shows averages since the system booted; each later report covers only the preceding second. Adding the 1 argument may seem minor, but it comes in handy for seeing how the stats change (if at all) as certain processes are killed. That was a gem for me.
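If you are curious where iostat's per-device numbers come from, here is a minimal sketch that recomputes rough tps, Blk_read/s and Blk_wrtn/s figures from two samples of /proc/diskstats, the kernel's block I/O counters on Linux. It is an illustration, not a replacement for iostat. It assumes the modern /proc/diskstats layout (field 3 is the device name, fields 4 and 8 are completed reads and writes, fields 6 and 10 are 512-byte sectors read and written); older 2.6 kernels used a shorter line for partitions, so partition figures would be off there. The helper names disk_snapshot and disk_rates are made up for this example.

```shell
#!/bin/sh
# Sketch: iostat-style per-device rates from /proc/diskstats (Linux only).
# Assumed field layout: $3 = device, $4 = reads completed, $6 = sectors read,
# $8 = writes completed, $10 = sectors written. Sectors are 512 bytes, which
# is what iostat calls a "block". Helper names are hypothetical.

# disk_snapshot: print "device transfers sectors_read sectors_written" per device.
disk_snapshot() {
  awk '{ print $3, $4 + $8, $6, $10 }' /proc/diskstats
}

# disk_rates BEFORE AFTER SECONDS: per-second deltas between two snapshots,
# printed as "device tps Blk_read/s Blk_wrtn/s".
disk_rates() {
  printf '%s\n--\n%s\n' "$1" "$2" | awk -v t="$3" '
    $0 == "--" { after = 1; next }                 # separator between snapshots
    !after { x[$1] = $2; r[$1] = $3; w[$1] = $4; next }
    ($1 in x) {
      printf "%s %.2f %.2f %.2f\n", $1, ($2 - x[$1]) / t, ($3 - r[$1]) / t, ($4 - w[$1]) / t
    }'
}
```

To take a one-second sample, run before=$(disk_snapshot); sleep 1; after=$(disk_snapshot); disk_rates "$before" "$after" 1. Passing the snapshots as strings keeps the arithmetic separate from /proc, so you can feed it saved samples as well.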
You might also want to see how I/O looked on your server earlier today. For that you can use sar:
sar
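By default, sar reports today's CPU utilization, including %iowait (the percentage of time the CPUs were idle while the system had outstanding disk I/O requests), averaged over 10-minute intervals. That interval comes from the sysstat cron job, which collects a sample every 10 minutes by default on RHEL.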
12:00:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM       all     11.90      0.00      1.98      8.67      0.00     77.45
12:20:01 AM       all      0.89      0.00      0.27      0.85      0.00     97.99
02:10:01 PM       all      4.08      0.00      1.00      0.36      0.00     94.56
Average:          all      4.21      0.01      9.68     22.81      0.00     63.29
This came in handy for me since the processes I was investigating were started by cron jobs that run daily at a low-traffic time (usually early in the morning for me).
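When the slowdown happens at a known time of day, it can help to pull the worst interval out of the sar report instead of eyeballing it. The sketch below is a small shell function (peak_iowait is a made-up name) that reads sar -u output on standard input and prints the sample line with the highest %iowait. It assumes each data line ends with the CPU %user %nice %system %iowait %steal %idle columns shown above and counts fields from the end of the line, so it should cope with both 12-hour (AM/PM) and 24-hour timestamps.

```shell
#!/bin/sh
# peak_iowait (hypothetical helper): read `sar -u` output on stdin and print
# the sample line with the highest %iowait. Assumes data lines end with
#   CPU %user %nice %system %iowait %steal %idle
# so CPU is field NF-6 and %iowait is field NF-2 regardless of time format.
# The "Average:" summary, headers, and short lines (e.g. the "Linux ..."
# banner or RESTART markers) are skipped.
peak_iowait() {
  awk 'NF >= 7 && $(NF - 6) == "all" && $1 != "Average:" {
         v = $(NF - 2) + 0                       # %iowait as a number
         if (!found || v > max) { max = v; line = $0; found = 1 }
       }
       END { if (found) print line }'
}
```

For example, sar -u | peak_iowait checks today. To look at an earlier day, point sar at that day's data file, e.g. sar -u -f /var/log/sa/sa15 | peak_iowait (on RHEL the daily files live in /var/log/sa and are named by day of the month; 15 is just an example). sar's -s and -e options can also narrow the report to a time window, such as sar -u -s 00:00:00 -e 06:00:00.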
Lastly, I wanted to make sure that neither process had been stopped (paused) outright, as opposed to just running slowly. If you want to make sure that a process with a specific Process ID (PID) is not stopped, you can always send it a continue signal like so:
kill -s CONT <PID>
Up until recently, I’d only used the kill tool to, well… kill processes! But to my delight, and hopefully yours too, kill can also send the SIGCONT signal, which resumes a process that was stopped, whether intentionally or unintentionally. If the process is already running, SIGCONT is harmless unless the program installs its own handler for it.
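Rather than sending SIGCONT blindly, you can first check whether the process is actually stopped. The sketch below is Linux-specific and rests on a few assumptions: it reads the one-letter state from /proc/<PID>/stat, where T means stopped by a signal, and the helper names proc_state and resume_if_stopped are made up for this example. If you prefer ps, ps -o stat= -p <PID> reports the same state letter, possibly followed by extra flags.

```shell
#!/bin/sh
# proc_state PID (hypothetical helper): print the one-letter scheduler state
# of PID from /proc (Linux only), e.g. R = running, S = sleeping,
# D = uninterruptible sleep (usually waiting on disk I/O), T = stopped.
proc_state() {
  line=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
  rest=${line##*') '}   # drop "pid (comm) "; comm may contain spaces or parens
  printf '%s\n' "${rest%% *}"
}

# resume_if_stopped PID (hypothetical helper): send SIGCONT only if PID is
# currently stopped; otherwise just report its state.
resume_if_stopped() {
  state=$(proc_state "$1") || { echo "no such process: $1" >&2; return 1; }
  case $state in
    T) kill -s CONT "$1" && echo "resumed $1" ;;
    *) echo "$1 is not stopped (state $state)" ;;
  esac
}
```

Usage is simply resume_if_stopped <PID>. A D state here would point back at the disk (the process is blocked on I/O rather than stopped), which ties in with the iostat and sar checks above.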
So that’s it for now. I hope you guys find these tools useful. And remember, if you ever need to know more about any of these tools, just check the man pages (e.g. man kill). Till next time, happy *NIXing!