Load Average

From Noah.org
Revision as of 23:27, 5 January 2009 by Root


What does Load Average from `uptime` mean?

On a single-user system that is not doing much, you should usually see a Load Average below 0.10. On a single core, anything below 1.0 is OK. If your Load Average hovers around 2.0 all day, then you need another CPU or one twice as fast. Those numbers are for a single core; on a multi-core system you have to divide the Load Average by the number of cores. If you have two dual-core CPUs (4 cores), then a Load Average of 4.0 is the upper range of "OK". As a rule of thumb, 1.0 per core is the ceiling, not a target; you should keep some breathing room. If my Load Average stays above 0.50 per core for most of the day, then I start to plan for extra capacity.
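The per-core rule of thumb above is simple arithmetic: divide the load by the core count and compare it against 0.50 and 1.0. The sketch below assumes Python's `os.getloadavg()` (Unix only) and `os.cpu_count()`; the labels "ok", "plan capacity" and "overloaded" are hypothetical names for the thresholds in the text. Note that `os.cpu_count()` counts logical CPUs, so hyperthreads are treated as cores here.

```python
import os

# Per-core thresholds from the rule of thumb in the text.
PLAN_CAPACITY = 0.50  # sustained load per core at which to start planning
CEILING = 1.00        # load per core above which processes queue for a CPU

def load_per_core(load: float, cores: int) -> float:
    """Divide a Load Average by the number of cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores

def assess(load: float, cores: int) -> str:
    """Classify a load figure with the per-core rule of thumb."""
    per_core = load_per_core(load, cores)
    if per_core < PLAN_CAPACITY:
        return "ok"
    if per_core <= CEILING:
        return "plan capacity"
    return "overloaded"

if __name__ == "__main__" and hasattr(os, "getloadavg"):
    # os.getloadavg() returns the 1-, 5- and 15-minute averages (Unix only).
    # os.cpu_count() counts logical CPUs, so hyperthreads count as cores here.
    cores = os.cpu_count() or 1
    for label, value in zip(("1 min", "5 min", "15 min"), os.getloadavg()):
        per_core = load_per_core(value, cores)
        print(f"{label:>6}: {value:.2f} total, {per_core:.2f} per core, {assess(value, cores)}")
```

Since the capacity rule is about load sustained "for most of the day", the 15-minute figure is the most useful one to feed into `assess`; a single 1-minute spike says little.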

Load Average, as given by `uptime`, is the average number of processes that are running or waiting to run. `uptime` actually prints three values: the averages over the last 1, 5, and 15 minutes. They are exponentially damped moving averages, so recent activity weighs more than older activity. A single CPU can only do one thing at a time, so you can think of Load Average as the number of processes that are bumping into each other to get a slice of the CPU. On a single CPU, a load above 1.0 means that some processes are forced to wait before they can get a turn. Another way to think about it is the number of CPUs you would need to handle the current load. If your Load Average goes to 2.0, then the kernel could have handled all requests without forcing anyone to wait if your CPU were twice as fast, or if you had two CPUs. This is a rule of thumb. Load is actually much more complex and differs between UNIX systems; Linux, for example, also counts processes in uninterruptible sleep (usually waiting on disk I/O), so heavy I/O can raise the load even while the CPU is mostly idle.
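To make "average over the last minute" concrete, here is a floating-point sketch of an exponentially damped moving average, the kind of average Linux uses for load. The assumptions: one sample every 5 seconds (as Linux does), and plain floats instead of the kernel's fixed-point arithmetic. The function names are hypothetical. A window of 60, 300, or 900 seconds gives the 1-, 5-, or 15-minute figure.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (Linux samples about every 5 s)

def decay_factor(window_seconds: float, interval: float = SAMPLE_INTERVAL) -> float:
    """Weight the old average keeps at each sample: e^(-interval / window)."""
    return math.exp(-interval / window_seconds)

def update(load: float, runnable: int, window_seconds: float) -> float:
    """One step of the exponentially damped moving average."""
    e = decay_factor(window_seconds)
    return load * e + runnable * (1.0 - e)

def simulate(runnable_counts, window_seconds: float = 60.0, start: float = 0.0) -> float:
    """Feed one runnable-process count per sample and return the final average."""
    load = start
    for n in runnable_counts:
        load = update(load, n, window_seconds)
    return load

if __name__ == "__main__":
    # One CPU-bound process starting on an idle box: after 60 s (12 samples)
    # the 1-minute average has only reached 1 - 1/e, about 0.63.
    print(round(simulate([1] * 12), 2))
    # Two processes competing for a long time: the average settles at 2.0.
    print(round(simulate([2] * 200), 2))
```

This is why a process that has hogged the CPU for exactly one minute does not show a 1-minute load of 1.0 yet, and why the 15-minute figure reacts much more slowly than the 1-minute one.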

Load is not the same as CPU usage. If a single process is using 99% of the CPU cycles, your load will still hover around 1.0. To see both CPU usage and Load Average, run `top` or `procinfo`. You might have hundreds of processes, but most of them should be idle (waiting for something to do, or waking up from time to time to do something). Every shell window you have open, and even the web browser you are using to read this, is not doing much of anything. On my system, `ps ax | wc -l` shows that I have 150 processes, but my Load Average is 0.13 and my CPU idle time is 95%, so despite the number of processes my CPU is not working hard at all. To see this for yourself, run `top` and note all the processes listed. Now press 'i'. This hides all the idle processes. You should see only one or two processes now, and `top` will likely be one of them.
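The gap between "hundreds of processes" and "a load near zero" can be seen by counting processes by state. This sketch assumes Linux, where each `/proc/[pid]/stat` file holds the state as a one-letter field after the command name: `R` is running or runnable, `S` is sleeping, `I` is an idle kernel thread, and `D` is uninterruptible sleep (which Linux also counts toward load). The helper names are hypothetical.

```python
import os
from collections import Counter

def state_from_stat(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/[pid]/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the LAST ')' before reading the state.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]

def count_states(stat_lines) -> Counter:
    """Tally process states from an iterable of stat lines."""
    return Counter(state_from_stat(line) for line in stat_lines)

def read_proc_stats(proc: str = "/proc"):
    """Yield the stat line of every process (Linux only)."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                yield f.read()
        except OSError:
            continue  # the process exited while we were scanning

if __name__ == "__main__" and os.path.isdir("/proc/self"):
    states = count_states(read_proc_stats())
    total = sum(states.values())
    print(f"{total} processes: {states['R']} running or runnable, "
          f"{states['D']} in uninterruptible sleep, "
          f"{states['S'] + states['I']} sleeping or idle")
```

As with `top` after pressing 'i', the script usually finds itself among the few `R` processes, since it is running while it scans.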

One easy way to get a feel for CPU usage is to look at the CPU's %idle. That's the share of time the CPU has spent doing nothing. Subtract that value from 100% to see how much time the CPU has spent doing something useful. Figuring out what "useful" means can be a lot more complex. Linux breaks CPU time down into user, nice, system (kernel), idle, iowait, irq, softirq, and steal (time a virtual CPU spent waiting while the hypervisor ran something else).
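The categories above come from counters that Linux exposes in `/proc/stat`: cumulative clock ticks per state, on the first line for all CPUs combined. A percentage only makes sense as the difference between two samples. This sketch assumes that column layout (user, nice, system, idle, iowait, irq, softirq, steal, then guest and guest_nice) and ignores the guest columns, since guest time is already included in user and nice. The function names are hypothetical.

```python
import os
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line: str) -> dict:
    """Parse a 'cpu ...' line from /proc/stat into named tick counters."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer columns; treat missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))

def breakdown(before: dict, after: dict) -> dict:
    """Percentage of elapsed CPU time spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}

def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()  # first line is the all-CPU total

if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = parse_cpu_line(read_cpu_line())
    time.sleep(1.0)
    second = parse_cpu_line(read_cpu_line())
    pct = breakdown(first, second)
    for name in FIELDS:
        print(f"{name:>8}: {pct[name]:5.1f}%")
    print(f"not idle: {100.0 - pct['idle']:5.1f}%")
```

Whether iowait counts as "useful" is a judgement call: it is time the CPU sat idle with disk I/O outstanding, which is why tools such as `top` and `mpstat` report it separately from %idle.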

mpstat

mpstat -P ALL 4

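In `mpstat -P ALL 4`, `-P ALL` reports every processor separately along with the all-CPU average, and 4 is the reporting interval in seconds. Very short update intervals (around 1 second or less) might not be as accurate. It's better to use as long an averaging period as practical to smooth out noise. The activity of mpstat itself will raise the results in the intr/s column.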
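For a rough idea of what a per-processor report involves, here is a sketch that reads every `cpu`/`cpuN` line of Linux's `/proc/stat` twice, `interval` seconds apart, and prints each processor's %idle over that window. This is an assumption-laden illustration, not how `mpstat` is implemented; it reports only %idle, and the function names are hypothetical. A longer `interval` averages over more ticks, which is the noise-smoothing effect described above.

```python
import os
import time

def parse_proc_stat(text: str) -> dict:
    """Map 'cpu', 'cpu0', 'cpu1', ... to (idle_ticks, total_ticks)."""
    result = {}
    for line in text.splitlines():
        if not line.startswith("cpu"):
            continue  # skip intr, ctxt, btime and other lines
        name, *fields = line.split()
        # Use the first 8 counters; guest time is already in user/nice.
        ticks = [int(v) for v in fields[:8]]
        result[name] = (ticks[3], sum(ticks))  # the 4th counter is idle
    return result

def idle_percent(before: dict, after: dict) -> dict:
    """Per-CPU %idle between two parsed snapshots."""
    out = {}
    for name, (idle1, total1) in after.items():
        if name not in before:
            continue  # a CPU came online between the two samples
        idle0, total0 = before[name]
        elapsed = total1 - total0
        out[name] = 100.0 * (idle1 - idle0) / elapsed if elapsed else 0.0
    return out

def sample(interval: float = 4.0, path: str = "/proc/stat") -> dict:
    """Take two snapshots `interval` seconds apart and return per-CPU %idle."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return idle_percent(before, after)

if __name__ == "__main__" and os.path.exists("/proc/stat"):
    for cpu, idle in sorted(sample(4.0).items()):
        print(f"{cpu:>6}: {idle:5.1f}% idle")
```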

procstat

This tool is a little old. See mpstat.