Performance
RCU
Read-copy update (RCU) is a synchronization mechanism that allows reads to proceed concurrently with updates. This contrasts with conventional locking primitives, which enforce mutual exclusion around marked critical sections. RCU imposes close to zero overhead on readers; the cost of synchronization falls on the updaters.
profiling and tracing
- DTrace: DTrace is a performance analysis tool originally written for Solaris and since ported to Linux and BSD. It features a scripting language called D.
- perf: Linux profiling with performance counters
- SystemTap: SystemTap is a system for instrumenting live Linux kernels and user-space processes.
- LTTng (Linux Trace Toolkit Next Generation): includes a kernel tracer, a userspace tracer, and LTTV (LTT Viewer).
- ktap: a script-based dynamic tracing tool similar to SystemTap and DTrace. ktap does not depend upon GCC, debug symbols, a modified kernel, or kernel modules. It is suitable for embedded development. It supports x86, ARM, PPC, and MIPS.
- oprofile: a system-wide profiler for Linux using Linux Kernel Performance Events Subsystem based on CPU hardware performance counters.
- ftrace: Kernel function tracer built into Linux. It is controlled through debugfs (mount -t debugfs nodev /sys/kernel/debug). The kernel must be compiled with ftrace support. To check, look at your kernel config file, usually with cat /boot/config-$(uname -r). If the kernel was built with IKCONFIG, use scripts/extract-ikconfig to extract the config from a kernel image, or use zcat /proc/config.gz to read the config of the live kernel.
- pytimechart: a GUI viewer for kernel traces.
- GDB: not a profiler per se, but the venerable GNU debugger, which allows you to see inside a program as it executes or after it crashes (core dumps).
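The kernel config check described in the ftrace item can be scripted. The following is a minimal sketch, not a definitive test: the function names `kernel_config` and `has_function_tracer` are made up for illustration, and it assumes that `CONFIG_FUNCTION_TRACER=y` is the option you care about (other tracers have their own config options).

```shell
#!/bin/sh
# Print the kernel build configuration from the first available source.
# An explicit file argument (hypothetical, for offline checks) takes priority.
kernel_config() {
    if [ -n "$1" ]; then
        cat "$1"
    elif [ -r /proc/config.gz ]; then
        zcat /proc/config.gz
    elif [ -r "/boot/config-$(uname -r)" ]; then
        cat "/boot/config-$(uname -r)"
    else
        echo "no kernel config found" >&2
        return 1
    fi
}

# Succeed if the function tracer is built in (CONFIG_FUNCTION_TRACER=y).
has_function_tracer() {
    kernel_config "$1" | grep -q '^CONFIG_FUNCTION_TRACER=y$'
}

# Usage:
#   has_function_tracer && echo "function tracer is available"
```

If the check fails, the config may simply be stored somewhere this sketch does not look, so treat a negative result as a hint rather than proof.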
Dogtail automated GUI testing
dstat
`dstat` is one of the more valuable tools for monitoring system performance. The output columns can be easily customized.
The default options are -cdngy. The following are options I commonly use. Many others are described in the man page.
    -c  --cpu    system, user, idle, wait, hardware interrupt, software interrupt
    -d  --disk   disk read, write
    -f  --full   full listing when using certain options (--cpu, --int, --disk, --net, --swap)
    -g  --page   page in, out
    -i  --int    interrupts (see also --full option, -I option, and review /proc/interrupts)
    -l  --load   load average
    -m  --mem    memory used, buffers, cache, free
    -n  --net    network receive, send
    -r  --io     I/O read, write
    -s  --swap   swap used, free
    -y  --sys    system interrupts, context switches
        --vm     vm hard pagefaults, soft pagefaults, allocated, free
`dstat` also has many Python plugins stored in /usr/share/dstat/.
Some statistics require the lm-sensors package. Run `sensors-detect` after installing.
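As an illustration of combining the options above, here is a small sketch that logs a few columns to a CSV file. The wrapper name `dstat_log` and the `DSTAT` override variable are hypothetical. It also uses `-t` (timestamps) and `--output`, which are dstat options not listed above.

```shell
#!/bin/sh
# Log time, CPU, disk, memory, and network columns to a CSV file.
# DSTAT is a hypothetical override hook so the command can be stubbed.
DSTAT=${DSTAT:-dstat}

dstat_log() {
    # $1: CSV file to write, $2: delay in seconds, $3: number of samples
    "$DSTAT" -t -c -d -m -n --output "$1" "$2" "$3"
}

# Usage: one sample per second for five minutes
#   dstat_log /tmp/dstat.csv 1 300
```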
tools
    aptitude -q -y install iozone3 stress cpuburn sysstat iotop smem powertop hardinfo hddtemp \
        dbench sysbench phoronix-test-suite iperf netperf netperfmeter \
        google-perftools \
        stressapptest \
        ceph-test \
        memtester \
        posixtest \
        fio lmbench
- MMTests: a configurable test suite that runs a number of common workloads of interest to MM developers.
- LTP (Linux Test Project): a test suite composed of various third-party tests. It is not available as a package for Ubuntu, but it may be downloaded as source.
- Autotest: Fully Automated Testing Under Linux. This is primarily for testing the Linux kernel.
mpstat
The mpstat command (part of the sysstat package) displays activity statistics for each CPU, along with an average across all CPUs.
    # mpstat -P ALL 1 3
    Linux 3.8.0-35-generic (vmh-dev-9)  2014-03-26  _x86_64_  (4 CPU)

    17:02:00  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
    17:02:01  all    0.26    0.00    0.00    0.00    0.00    0.00    0.77    0.00   98.97
    17:02:01    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
    17:02:01    1    1.04    0.00    0.00    0.00    0.00    0.00    1.04    0.00   97.92
    17:02:01    2    0.00    0.00    0.00    0.00    0.00    0.00    1.03    0.00   98.97
    17:02:01    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
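Output like the sample above is easy to filter with awk. This sketch prints the CPUs whose %idle falls below a threshold. The function name `busy_cpus` is made up, and it assumes the layout shown in the sample: a 24-hour timestamp in the first field, the CPU number in the second, and %idle in the last column.

```shell
#!/bin/sh
# Read `mpstat -P ALL` output on stdin and print "CPU %idle" for every
# individual CPU whose idle percentage is below the given threshold.
busy_cpus() {
    # Rows with a purely numeric second field are per-CPU data rows;
    # this skips the banner, the column header, and the "all" row.
    awk -v limit="$1" '$2 ~ /^[0-9]+$/ && $NF + 0 < limit + 0 { print $2, $NF }'
}

# Usage:
#   mpstat -P ALL 1 3 | busy_cpus 50
```

Locales that print 12-hour timestamps add an AM/PM field, which shifts the CPU column; the field numbers would need adjusting in that case.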
sysbench
File IO testing.
    mkdir sysbench-testrun.0
    cd sysbench-testrun.0
    # Prepare 16 files, each 1GB in size.
    sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --init-rng=on --num-threads=16 prepare
    sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --init-rng=on --num-threads=16 run
    sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --init-rng=on --num-threads=16 cleanup
CPU performance testing.
sysbench --test=cpu --cpu-max-prime=20000 run
CPU thread testing.
sysbench --test=threads --num-threads=64 --thread-yields=100 --thread-locks=2 run
Mutex testing.
    sysbench --test=mutex --mutex-locks=100000 --num-threads=1024 --memory-oper=read run
    sysbench --test=mutex --memory-oper=write --mutex-locks=100000 --num-threads=1024 run
    sysbench --test=mutex --mutex-locks=100000 --num-threads=1024 run
    sysbench --test=mutex --memory-oper=read
    sysbench --test=mutex --memory-oper=write
OLTP (database) testing.
    sysbench --test=oltp
not used so much
The fio tool does not have a lot of documentation, but it looks interesting. The homepage is just a git repository: fio. Under Ubuntu install the fio package. For documentation see /usr/share/doc/fio/, especially /usr/share/doc/fio/examples/.
lmbench is ancient (its homepage is nearly 20 years old!), but it still works.
mibench is a small suite of benchmarks used to test various tasks that might be of interest to embedded systems. It hasn't been touched in over a decade. At least a few of the launch scripts expect the current working directory to be in the PATH.
SPLASH-2 is for testing shared address space memory systems. Sounds like multi-threaded or clustered computing test tools.
sysbench
Sysbench benchmarks are broken down into three steps: prepare, run, and cleanup. The prepare step will create sample data files for subsequent stages. The files will be named in the form test_file.NN where NN is an integer starting with 0.
    sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --num-threads=16 prepare
    sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --num-threads=16 run
    sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --num-threads=16 cleanup
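Since the three steps share the same options, they can be wrapped in one function so the option list is written once and the data files are removed even when the run step fails. This is only a sketch: `fileio_bench` and the `SYSBENCH` override variable are hypothetical names, and it uses the same legacy `--test=` syntax as the commands above.

```shell
#!/bin/sh
# Run the sysbench fileio prepare/run/cleanup steps with one option list.
# SYSBENCH is a hypothetical override hook so the command can be stubbed.
SYSBENCH=${SYSBENCH:-sysbench}

fileio_bench() {
    # $1: total size (e.g. 16G), $2: number of files and of threads
    set -- --test=fileio "--file-total-size=$1" "--file-num=$2" \
           --file-test-mode=rndrw "--num-threads=$2"
    "$SYSBENCH" "$@" prepare || return 1
    status=0
    "$SYSBENCH" "$@" run || status=$?
    # Always remove the test_file.NN data files, even if the run failed.
    "$SYSBENCH" "$@" cleanup
    return "$status"
}

# Usage:
#   fileio_bench 16G 16
```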
drive IO testing and performance measurement
Basic read and write speed testing. The options used below report operations per second (IOPS), not bytes/sec. These options also favor sequential streaming of large blocks of data.
iozone -a -s 1048576 -g 1G -i 0 -i 1 -O
Bonnie++ tests will work OK with the defaults, but you do have to set the user with -u. Running as root is not normally recommended. Note that bonnie is the same as bonnie++ (bonnie is a symlink to bonnie++). The output of bonnie++ is stupidly difficult to read, and there is no way to fix this. It also dumps out CSV data, which is even harder to read without a spreadsheet.
bonnie -u root:root
drive stress testing
The stress command generates pure loads. It does not attempt to measure how the system handles this. You can combine this with other tools to get performance measurements.
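This generates write load on whichever filesystem holds the current directory: each --hdd worker repeatedly writes and unlinks a temporary file there. stress does not take a device argument, so first change into a directory on the drive you want to load (for example, a filesystem on /dev/sda). While this is running you may want to run "iostat /dev/sda 1 300" in a different window.
    stress --hdd 10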
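Because stress only generates load, a measurement tool has to run alongside it. The sketch below starts a load generator in the background and samples statistics until the load exits. The function name `measure_under_load` is made up, and the commands in the usage line are just one plausible pairing.

```shell
#!/bin/sh
# Run a load generator in the background and sample statistics until
# it exits. Both commands are passed as strings and are deliberately
# left unquoted below so that they split into command plus arguments.
measure_under_load() {
    load_cmd=$1      # e.g. "stress --hdd 10 --timeout 60"
    sample_cmd=$2    # e.g. "iostat -d 1"
    $load_cmd >/dev/null 2>&1 &
    load_pid=$!
    $sample_cmd &
    sample_pid=$!
    wait "$load_pid" || true            # block until the load finishes
    kill "$sample_pid" 2>/dev/null || true
    wait "$sample_pid" 2>/dev/null || true
    return 0
}

# Usage:
#   measure_under_load "stress --hdd 10 --timeout 60" "iostat -d 1"
```

Giving stress a timeout keeps the whole measurement bounded; without one, the sampler would run until you interrupt it.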
CPU stress and burn
Install the Ubuntu package cpuburn. For each CPU core your system has, run one instance of `burnP6` (for Intel P6 processors). Monitor the CPU usage and system load using `htop` or the tool of your choice. Monitor the temperature using `sensors` or some ACPI tool.
    burnP6 &
    burnP6 &
    burnP6 &
    burnP6 &
    watch -n1 sensors
    killall burnP6
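Rather than typing one `burnP6 &` per core, the instances can be started in a loop sized by `nproc`. This is a sketch under a few assumptions: `burn_all_cores` and the `BURN_CMD` override are hypothetical names, and the burn runs for a fixed number of seconds instead of until you stop watching `sensors`.

```shell
#!/bin/sh
# Start one burner per online CPU core, wait, then stop them all.
# BURN_CMD is a hypothetical override; the default matches the text above.
BURN_CMD=${BURN_CMD:-burnP6}

burn_all_cores() {
    seconds=$1
    cores=$(nproc)
    pids=""
    i=0
    while [ "$i" -lt "$cores" ]; do
        $BURN_CMD &
        pids="$pids $!"
        i=$((i + 1))
    done
    echo "started $cores burners:$pids"
    sleep "$seconds"
    # Stop only the burners started here, instead of `killall burnP6`.
    kill $pids 2>/dev/null || true
    wait $pids 2>/dev/null || true
    return 0
}

# Usage: burn for five minutes while watching `sensors` in another window
#   burn_all_cores 300
```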
System Platform Testing
Autotest is a fully automated test suite designed to test the entire Linux platform. It is based on a large collection of third-party testing tools such as dbench, iozone, stress, sysbench, and lots more.