I run the Exim SMTP server on my VPS to accept external mail connections. When a large influx of mail arrives, usually as many small messages, tens of exim processes are started to handle it. Most of the time this is fine, but if the host's IO system is also under heavy load, I am sometimes mistaken for a resource abuser because of the number of exim processes and the associated high load average, and my VPS gets suspended.
I got my VPS unsuspended every time after explaining to the provider that I am not the abuser (generally I use less than half of the memory I am entitled to, and per-process CPU utilization is almost always 0). Still, I want to prevent this from happening in the future, so I wrote this load control script in ksh.
As most performance problems on a VPS are the result of IO slowdown, my script uses dd to detect when that occurs. A sketch of the script follows.
# Time a 40KB write; date +%s only resolves whole seconds
T1=$(date +%s)
dd if=/dev/urandom of=/home/lc/testio.file bs=4k count=10
T2=$(date +%s)
(( DD_TIME = T2 - T1 ))
I write a 40KB file and record the elapsed time, in whole seconds, in the variable DD_TIME. If the write takes 2 seconds or more to complete, IO performance is obviously abysmal, and it is better to stop the server process temporarily.
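One caveat with this measurement: without a flush, dd may return as soon as the data reaches the page cache, before it touches the disk. A possible refinement, as a minimal sketch only, is to ask dd to flush before it exits. The helper name `probe_io` and the scratch path are hypothetical, and `conv=fsync` is assumed to be supported by the local dd (GNU dd has it), so the sketch falls back to a plain write if it is not.

```shell
# Sketch: time a small write and report the whole seconds elapsed.
# probe_io is a hypothetical helper name, not part of my original script.
probe_io() {
    probe_file=$1
    t1=$(date +%s)
    # conv=fsync asks dd to flush the data to disk before exiting;
    # fall back to a plain write if this dd rejects the option.
    dd if=/dev/urandom of="$probe_file" bs=4k count=10 conv=fsync 2>/dev/null ||
        dd if=/dev/urandom of="$probe_file" bs=4k count=10 2>/dev/null
    t2=$(date +%s)
    rm -f "$probe_file"          # clean up the scratch file
    echo $(( t2 - t1 ))
}

# Usage: the scratch path here is an assumption
DD_TIME=$(probe_io "${TMPDIR:-/tmp}/testio.$$")
echo "dd took ${DD_TIME}s"
```

The result is still in whole seconds, so the 2-second threshold keeps the same meaning.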
# Exit 0 if a process with PID $PROC_PID appears in the ps output, 1 otherwise
ps ax |
awk -v pid="$PROC_PID" 'BEGIN { rc = 1 }
$1 == pid { rc = 0 }
END { exit rc }'
The variable PROC_PID holds the process ID of the last server process, usually read from a PID file. If the server process is still running, the pipeline above exits with 0; otherwise it exits with 1.
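An alternative to scanning the ps output is `kill -0`, which performs the existence check without delivering a signal. This is a swapped-in technique, not what my script does, and the helper name `is_running` is hypothetical. It also assumes the script runs as a user allowed to signal the server (for example root); otherwise `kill -0` fails even when the process exists.

```shell
# Sketch: check for a live process with "kill -0" instead of parsing ps.
# is_running is a hypothetical helper name.
is_running() {
    # Reject empty, zero, or non-numeric input so kill never sees a bad argument
    # (PID 0 would address the whole process group).
    case $1 in
        ''|0|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 checks existence and permission without sending anything.
    kill -0 "$1" 2>/dev/null
}

# Usage: the current shell is certainly alive.
if is_running "$$"; then
    echo "process $$ is running"
fi
```

Like the ps version, the function returns 0 when the process is running and non-zero otherwise, so it can feed the same test on `$?`.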
# $? is the exit code of the ps/awk pipeline: 0 means the server is running
if [ $? -eq 0 ]; then
    if [ "$DD_TIME" -gt 1 ]; then
        kill "$PROC_PID"
    fi
else
    if [ "$DD_TIME" -lt 2 ]; then
        /usr/bin/ionice -c 2 -n 7 /usr/exim/bin/exim -bd
    fi
fi
If the server is running and IO is slow, I kill the server process. If the server is not running and IO speed has returned to normal (dd takes less than 2 seconds to write), I start the server process again, lowering its IO priority with ionice (best-effort class, lowest priority level).
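To make the two thresholds easier to see side by side, the same decision can be written as a small function that only reports what to do. This is just a sketch for illustration: `decide_action` and `THRESHOLD` are hypothetical names, and the function prints an action instead of killing or starting anything.

```shell
# Sketch: the stop/start decision as a side-effect-free function.
# decide_action and THRESHOLD are hypothetical names.
THRESHOLD=2   # seconds; a dd run this long or longer counts as slow IO

# $1 = exit code of the process check (0 means the server is running)
# $2 = whole seconds dd took
decide_action() {
    if [ "$1" -eq 0 ]; then
        # Server is running: stop it only when IO is slow.
        if [ "$2" -ge "$THRESHOLD" ]; then echo stop; else echo none; fi
    else
        # Server is stopped: restart it only when IO has recovered.
        if [ "$2" -lt "$THRESHOLD" ]; then echo start; else echo none; fi
    fi
}

# Usage
decide_action 0 3   # server running, slow IO: prints "stop"
decide_action 1 0   # server stopped, fast IO: prints "start"
```

A caller would then map "stop" to the kill command and "start" to the ionice/exim command line.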
I run the script every 3 minutes, so it does not impose much IO load on the host node. I hope this script is of some use in solving a common problem.