Difference between revisions of "Performance"
m (→dstat) |
m (→clear cache) |
||
(37 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
[[Category:Engineering]] | [[Category:Engineering]] | ||
+ | [[Category:Performance]] | ||
See also | See also | ||
* [[Disk_Performance_Tuning]] | * [[Disk_Performance_Tuning]] | ||
+ | |||
+ | == RCU== | ||
+ | |||
+ | Read-copy update (RCU) is a synchronization mechanism that allows reads to occur concurrently with updates. This is in contrast to conventional locking primitives that ensure mutual exclusion to marked critical sections. RCU has zero overhead to readers. | ||
+ | |||
+ | == profiling and tracing == | ||
+ | |||
+ | *[https://perf.wiki.kernel.org/index.php/Main_Page perf]: Linux profiling with performance counters. Sometimes called '''Perf Events'''. | ||
+ | *[ftrace]: Kernel function tracer built into Linux which uses '''Perf Events'''. Interfaces to the '''debugfs''' ('''mount -t debugfs nodev /sys/kernel/debug'''). Requires kernel be compiled with '''ftrace''' support (see your kernel config file -- usually '''cat /boot/config-$(uname -r)''' or if build with '''ikconfig''' then use '''scripts/extract-ikconfig''' to extract a config from a kernel image or use '''zcat /proc/config.gz''' to extract the config from a live kernel). | ||
+ | *[http://www.linux-kvm.org/page/Perf_events KVM Perf Events]: Perf tracing in the KVM module. | ||
+ | *[https://sourceware.org/systemtap/wiki SystemTap]: SystemTap is a system for instrumenting live Linux kernels and user-space processes. | ||
+ | *[http://lttng.org/ LLTNG] Linux Trace Toolkit Next Generation]: includes a kernel tracer, userspace tracer, and LLTV (LLT Viewer). | ||
+ | *[http://dtrace.org DTrace]: DTrace is a performance analysis tool originally for Solris, but ported to Linux and BSD. Features a scripting language called '''D'''. | ||
+ | *[http://www.ktap.org/ ktap]: a script-based dynamic tracing tool similar to Systemtap and Dtrace. '''KTap''' does not depend upon GCC, debug symbols, modified kernel or kernel modules. It is suitabl for embedded development. It supports x86, Arm, PPC, and MIPS | ||
+ | *[http://oprofile.sourceforge.net/news/ oprofile]: a system-wide profiler for Linux using '''Linux Kernel Performance Events Subsystem''' based on CPU hardware performance counters. | ||
+ | *[https://github.com/tardyp/pytimechart pytimechart]: a GUI viewer for kernel traces. | ||
+ | *[http://www.bootchart.org/ BootChart]: a tool for performance analysis and visualization of the GNU/Linux boot process. | ||
+ | *[https://git.kernel.org/cgit/linux/kernel/git/axboe/blktrace.git/tree/README blktrace]: Block IO kernel subsystem tracer. http://smackerelofopinion.blogspot.com/2009/10/block-io-layer-tracing-using-blktrace.html | ||
+ | *strace, ltrace, latrace: The classic system call, library call, application library call tracers. http://ltrace.org/ http://people.redhat.com/jolsa/latrace/index.shtml | ||
+ | *[https://sourceware.org/gdb/wiki/HomePage GDB]: not a profiler per se, but the venerable GNU debugger, which allows you to see inside a program as it executes or after it crashes ('''core dumps'''). | ||
+ | |||
+ | === other tools that I'm not sure where they belong === | ||
+ | *[http://www.latencytop.org/ latencytop]: a tool for identifying latency. | ||
+ | *[https://github.com/poelzi/ulatencyd ulatencyd]: a daemon to minimize latency on a linux system using cgroups | ||
+ | *mtrace: Malloc Trace memory debugger from the GNU C library. | ||
+ | *gprof: GNU Profiler from GNU | ||
+ | *gcov: GNU code coverage from the GNU C Compiler project. | ||
+ | |||
+ | == file this somewhere (KVM Perf Events) == | ||
+ | <pre> | ||
+ | Often you want event counts after running a benchmark: | ||
+ | |||
+ | $ sudo mount -t debugfs none /sys/kernel/debug | ||
+ | $ sudo ./perf stat -e 'kvm:*' -a sleep 1h | ||
+ | ^C | ||
+ | Performance counter stats for 'sleep 1h': | ||
+ | |||
+ | 8330 kvm:kvm_entry # 0.000 M/sec | ||
+ | 0 kvm:kvm_hypercall # 0.000 M/sec | ||
+ | 4060 kvm:kvm_pio # 0.000 M/sec | ||
+ | 0 kvm:kvm_cpuid # 0.000 M/sec | ||
+ | 2681 kvm:kvm_apic # 0.000 M/sec | ||
+ | 8343 kvm:kvm_exit # 0.000 M/sec | ||
+ | 737 kvm:kvm_inj_virq # 0.000 M/sec | ||
+ | 0 kvm:kvm_page_fault # 0.000 M/sec | ||
+ | 0 kvm:kvm_msr # 0.000 M/sec | ||
+ | 664 kvm:kvm_cr # 0.000 M/sec | ||
+ | 872 kvm:kvm_pic_set_irq # 0.000 M/sec | ||
+ | 0 kvm:kvm_apic_ipi # 0.000 M/sec | ||
+ | 738 kvm:kvm_apic_accept_irq # 0.000 M/sec | ||
+ | 874 kvm:kvm_set_irq # 0.000 M/sec | ||
+ | 874 kvm:kvm_ioapic_set_irq # 0.000 M/sec | ||
+ | 0 kvm:kvm_msi_set_irq # 0.000 M/sec | ||
+ | 433 kvm:kvm_ack_irq # 0.000 M/sec | ||
+ | 2685 kvm:kvm_mmio # 0.000 M/sec | ||
+ | |||
+ | 3.493562100 seconds time elapsed | ||
+ | |||
+ | The perf tool is part of the Linux kernel tree in tools/perf. | ||
+ | </pre> | ||
+ | |||
+ | == Dogtail automated GUI testing == | ||
== dstat == | == dstat == | ||
`dstat` is one of the more valuable tools for monitoring system performance. The output columns can be easily customized. | `dstat` is one of the more valuable tools for monitoring system performance. The output columns can be easily customized. | ||
+ | <pre> | ||
+ | dstat -cdngypilmr --vm | ||
+ | </pre> | ||
+ | Show each CPU separately: | ||
+ | <pre> | ||
+ | dstat -fcdngypilmr --vm | ||
+ | </pre> | ||
The default options are '''-cdngy'''. The following are options I commonly use. Many other are described in the manpage. | The default options are '''-cdngy'''. The following are options I commonly use. Many other are described in the manpage. | ||
Line 31: | Line 101: | ||
<pre> | <pre> | ||
− | apttitude -q -y install iozone3 stress cpuburn sysstat iotop hddtemp | + | apttitude -q -y install iozone3 stress cpuburn sysstat iotop smem powertop hardinfo hddtemp \ |
+ | dbench sysbench phoronix-test-suite iperf netperf netperfmeter \ | ||
+ | google-perftools \ | ||
+ | stressapptest \ | ||
+ | ceph-test \ | ||
+ | memtester \ | ||
+ | posixtest \ | ||
+ | fio lmbench | ||
+ | </pre> | ||
+ | |||
+ | ;MM Tests: [http://www.csn.ul.ie/~mel/projects/mmtests/ MM Tests]. MMTests is a configurable test suite that runs a number of common workloads of interest to MM developers. | ||
+ | ;LPT: [http://ltp.sourceforge.net Linux Test Project] This is a test suite composed of various third-party tests. This test suite is not available as a package for Ubuntu. It may be downloaded as a source. | ||
+ | ;Autotest: [http://autotest.github.io Fully Automated Testing Under Linux]. This is primarily for testing the Linux kernel. | ||
+ | |||
+ | === mpstat === | ||
+ | |||
+ | The '''mpstat''' command displays statistics about the CPU queue and the activity of each CPU. | ||
+ | <pre> | ||
+ | # mpstat -P ALL 1 3 | ||
+ | Linux 3.8.0-35-generic (vmh-dev-9) 2014-03-26 _x86_64_ (4 CPU) | ||
+ | |||
+ | 17:02:00 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle | ||
+ | 17:02:01 all 0.26 0.00 0.00 0.00 0.00 0.00 0.77 0.00 98.97 | ||
+ | 17:02:01 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 | ||
+ | 17:02:01 1 1.04 0.00 0.00 0.00 0.00 0.00 1.04 0.00 97.92 | ||
+ | 17:02:01 2 0.00 0.00 0.00 0.00 0.00 0.00 1.03 0.00 98.97 | ||
+ | 17:02:01 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 | ||
+ | </pre> | ||
+ | |||
+ | === sysbench === | ||
+ | |||
+ | File IO testing. | ||
+ | <pre> | ||
+ | mkdir sysbench-testrun.0 | ||
+ | cd sysbench-testrun.0 | ||
+ | # Prepare 16 files, each 1GB in size. | ||
+ | sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --init-rnd --num-threads=16 prepare | ||
+ | sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --init-rnd --num-threads=16 run | ||
+ | sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --init-rnd --num-threads=16 cleanup | ||
+ | </pre> | ||
+ | |||
+ | CPU performance testing. | ||
+ | <pre> | ||
+ | sysbench --test=cpu --cpu-max-prime=20000 run | ||
+ | </pre> | ||
+ | |||
+ | CPU thread testing. | ||
+ | <pre> | ||
+ | sysbench --test=threads --num-threads=64 --thread-yields=100 --thread-locks=2 run | ||
+ | </pre> | ||
+ | |||
+ | Mutex testing. | ||
+ | <pre> | ||
+ | sysbench --test=mutex --mutex-locks=100000 --num-threads=1024 --memory-oper=read run | ||
+ | sysbench --test=mutex --memory-oper=write --mutex-locks=100000 --num-threads=1024 run | ||
+ | sysbench --test=mutex --mutex-locks=100000 --num-threads=1024 run | ||
+ | sysbench --test=mutex --memory-oper=read | ||
+ | sysbench --test=mutex --memory-oper=write | ||
+ | </pre> | ||
+ | |||
+ | # OLTP (database) | ||
+ | <pre> | ||
+ | sysbench --test=mutex | ||
+ | </pre> | ||
+ | |||
+ | === not used so much === | ||
+ | |||
+ | The '''fio''' tool does not have a lot of documentation, but it looks interesting. The homepage is just a git repository: [http://git.kernel.dk/?p=fio.git;a=tree fio]. Under Ubuntu install the '''fio''' package. For documentation see '''/usr/share/doc/fio/''', especially '''/usr/share/doc/fio/examples/'''. | ||
+ | |||
+ | '''lmbench''' is ancient (its homepage is nearly 20 years old!), but it still works. | ||
+ | |||
+ | [http://www.eecs.umich.edu/mibench/ mibench] It is a small suite of benchmarks used to test various tasks that might be of interest to embedded systems. This hasn't been touched in over a decade. At least a few of the launch scripts expect the current working directory to be in the PATH. | ||
+ | |||
+ | '''SPLASH-2''' for testing shared address space memory systems. Sounds like multi-threaded or clustered computing test tools. | ||
+ | |||
+ | == sysbench == | ||
+ | |||
+ | Sysbench benchmarks are broken down into three steps: prepare, run, and cleanup. The '''prepare''' step will create sample data files for subsequent stages. The files will be named in the form '''test_file.NN''' where '''NN''' is an integer starting with '''0'''. | ||
+ | <pre> | ||
+ | sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --num-threads=16 prepare | ||
+ | sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --num-threads=16 run | ||
+ | sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --num-threads=16 cleanup | ||
</pre> | </pre> | ||
Line 66: | Line 217: | ||
killall burnP6 | killall burnP6 | ||
</pre> | </pre> | ||
+ | |||
+ | == System Platform Testing == | ||
+ | |||
+ | [https://github.com/autotest/autotest Autotest] is a fully automated test suite designed to test the entire Linux platform. It is based on a large collection third-party testing tools such as dbench, iozone, stress, sysbench, and lots more. | ||
+ | |||
+ | == clear cache == | ||
+ | |||
+ | (flush cache, dump cache, drop cache, drop caches) | ||
+ | |||
+ | Linux kernel caches may make certain performance tests difficult to interpret. When doing performance testing you want to start with the same clear state each time to take into account the performance penalty of populating a cache the first time. Luckily Linux provides a way to free caches. The following commands will clear the given kernel caches. | ||
+ | ;pagecache: echo 1 > /proc/sys/vm/drop_caches | ||
+ | ;slab and objects (includes dentries and inodes): echo 2 > /proc/sys/vm/drop_caches | ||
+ | ;all caches -- slab, objects, and pagecache: echo 3 > /proc/sys/vm/drop_caches | ||
+ | |||
+ | See also: https://www.kernel.org/doc/Documentation/sysctl/vm.txt |
Latest revision as of 14:47, 29 May 2014
See also: Disk_Performance_Tuning
RCU
Read-copy update (RCU) is a synchronization mechanism that allows reads to occur concurrently with updates. This contrasts with conventional locking primitives, which enforce mutual exclusion for marked critical sections. RCU imposes very little overhead on readers; in non-preemptible kernels the read-side primitives compile to almost nothing.
profiling and tracing
- perf: Linux profiling with performance counters. Sometimes called Perf Events.
- ftrace: Kernel function tracer built into Linux. The same tracepoints are also available to Perf Events. Its control files live in debugfs (mount -t debugfs nodev /sys/kernel/debug). The kernel must be compiled with ftrace support; check your kernel config file, usually with cat /boot/config-$(uname -r). If the kernel was built with IKCONFIG, use scripts/extract-ikconfig to extract the config from a kernel image, or zcat /proc/config.gz to read the config of the running kernel.
- KVM Perf Events: Perf tracing in the KVM module.
- SystemTap: SystemTap is a system for instrumenting live Linux kernels and user-space processes.
- LTTng (Linux Trace Toolkit Next Generation): includes a kernel tracer, userspace tracer, and LTTV (LTT Viewer).
- DTrace: DTrace is a performance analysis tool originally for Solaris, but ported to Linux and BSD. Features a scripting language called D.
- ktap: a script-based dynamic tracing tool similar to SystemTap and DTrace. ktap does not depend upon GCC, debug symbols, a modified kernel, or kernel modules. It is suitable for embedded development. It supports x86, ARM, PPC, and MIPS.
- oprofile: a system-wide profiler for Linux using Linux Kernel Performance Events Subsystem based on CPU hardware performance counters.
- pytimechart: a GUI viewer for kernel traces.
- BootChart: a tool for performance analysis and visualization of the GNU/Linux boot process.
- blktrace: Block IO kernel subsystem tracer. http://smackerelofopinion.blogspot.com/2009/10/block-io-layer-tracing-using-blktrace.html
- strace, ltrace, latrace: The classic system call, library call, application library call tracers. http://ltrace.org/ http://people.redhat.com/jolsa/latrace/index.shtml
- GDB: not a profiler per se, but the venerable GNU debugger, which allows you to see inside a program as it executes or after it crashes (core dumps).
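The ftrace entry above says to check the kernel config for ftrace support. A minimal sketch of that check follows; the function names are hypothetical, and it assumes the usual option names CONFIG_FTRACE, CONFIG_FUNCTION_TRACER, and CONFIG_FUNCTION_GRAPH_TRACER appear in an uncompressed config file.

```shell
#!/bin/sh
# Print the ftrace-related options found in a kernel config file.
# Usage: ftrace_config_lines /path/to/config
ftrace_config_lines() {
    grep -E '^CONFIG_(FTRACE|FUNCTION_TRACER|FUNCTION_GRAPH_TRACER)=' "$1"
}

# Succeed only if the function tracer is built in (=y) in the given config.
has_function_tracer() {
    grep -q '^CONFIG_FUNCTION_TRACER=y' "$1"
}
```

For example, has_function_tracer "/boot/config-$(uname -r)" && echo "function tracer available". On a kernel that only exposes /proc/config.gz, decompress it to a temporary file with zcat first.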
other tools that I'm not sure where they belong
- latencytop: a tool for identifying latency.
- ulatencyd: a daemon to minimize latency on a Linux system using cgroups.
- mtrace: Malloc Trace memory debugger from the GNU C library.
- gprof: the GNU profiler, part of GNU Binutils.
- gcov: code coverage tool from the GNU Compiler Collection (GCC).
file this somewhere (KVM Perf Events)
Often you want event counts after running a benchmark:

$ sudo mount -t debugfs none /sys/kernel/debug
$ sudo ./perf stat -e 'kvm:*' -a sleep 1h
^C
 Performance counter stats for 'sleep 1h':

           8330  kvm:kvm_entry            #      0.000 M/sec
              0  kvm:kvm_hypercall        #      0.000 M/sec
           4060  kvm:kvm_pio              #      0.000 M/sec
              0  kvm:kvm_cpuid            #      0.000 M/sec
           2681  kvm:kvm_apic             #      0.000 M/sec
           8343  kvm:kvm_exit             #      0.000 M/sec
            737  kvm:kvm_inj_virq         #      0.000 M/sec
              0  kvm:kvm_page_fault       #      0.000 M/sec
              0  kvm:kvm_msr              #      0.000 M/sec
            664  kvm:kvm_cr               #      0.000 M/sec
            872  kvm:kvm_pic_set_irq      #      0.000 M/sec
              0  kvm:kvm_apic_ipi         #      0.000 M/sec
            738  kvm:kvm_apic_accept_irq  #      0.000 M/sec
            874  kvm:kvm_set_irq          #      0.000 M/sec
            874  kvm:kvm_ioapic_set_irq   #      0.000 M/sec
              0  kvm:kvm_msi_set_irq      #      0.000 M/sec
            433  kvm:kvm_ack_irq          #      0.000 M/sec
           2685  kvm:kvm_mmio             #      0.000 M/sec

    3.493562100  seconds time elapsed

The perf tool is part of the Linux kernel tree in tools/perf.
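Most of the counters in a listing like this are often zero, so it can help to keep only the events that fired and sort them by count. This is a rough sketch, not part of perf: the function name nonzero_kvm_events is hypothetical, and the parsing assumes each line starts with the count followed by the event name, as in the sample output above.

```shell
#!/bin/sh
# Read perf stat output on stdin and print "count event" for every kvm:*
# event with a non-zero count, largest count first.
nonzero_kvm_events() {
    awk '{ gsub(/,/, "", $1) }                      # drop thousands separators, if any
         $2 ~ /^kvm:/ && $1 + 0 > 0 { print $1, $2 }' |
        sort -k1,1nr
}
```

perf stat writes its report to stderr, so redirect it when piping, for example: sudo ./perf stat -e 'kvm:*' -a sleep 10 2>&1 | nonzero_kvm_events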
Dogtail automated GUI testing
dstat
`dstat` is one of the more valuable tools for monitoring system performance. The output columns can be easily customized.
dstat -cdngypilmr --vm
Show each CPU separately:
dstat -fcdngypilmr --vm
The default options are -cdngy. The following are options I commonly use. Many others are described in the manpage.
-c --cpu    system, user, idle, wait, hardware interrupt, software interrupt
-d --disk   disk read, write
-f --full   full listing when using certain options (--cpu, --int, --disk, --net, --swap)
-g --page   page in, out
-i --int    interrupts (see also --full option, -I option, and review /proc/interrupts)
-l --load   load average
-m --mem    memory used, buffers, cache, free
-n --net    network receive, send
-r --io     I/O read, write
-s --swap   swap used, free
-y --sys    system interrupts, context switches
--vm        vm hard pagefaults, soft pagefaults, allocated, free
`dstat` also has many Python plugins stored in /usr/share/dstat/.
Some statistics require the lm-sensors package. Run `sensors-detect` after installing.
tools
aptitude -q -y install iozone3 stress cpuburn sysstat iotop smem powertop hardinfo hddtemp \
    dbench sysbench phoronix-test-suite iperf netperf netperfmeter \
    google-perftools \
    stressapptest \
    ceph-test \
    memtester \
    posixtest \
    fio lmbench
- MM Tests: MMTests is a configurable test suite that runs a number of common workloads of interest to memory-management (MM) developers.
- LTP: Linux Test Project. This is a test suite composed of various third-party tests. It is not available as a package for Ubuntu, but it can be downloaded and built from source.
- Autotest: Fully Automated Testing Under Linux. This is primarily for testing the Linux kernel.
mpstat
The mpstat command reports utilization statistics for each CPU individually and for all CPUs combined.
# mpstat -P ALL 1 3
Linux 3.8.0-35-generic (vmh-dev-9)   2014-03-26   _x86_64_   (4 CPU)

17:02:00  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest   %idle
17:02:01  all   0.26   0.00   0.00    0.00   0.00   0.00   0.77   0.00   98.97
17:02:01    0   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00  100.00
17:02:01    1   1.04   0.00   0.00    0.00   0.00   0.00   1.04   0.00   97.92
17:02:01    2   0.00   0.00   0.00    0.00   0.00   0.00   1.03   0.00   98.97
17:02:01    3   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00  100.00
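As a rough sketch, the per-CPU rows in output like the sample above can be filtered with awk to list the CPUs whose %idle falls below a threshold. The function name busy_cpus is hypothetical. The parsing assumes 24-hour timestamps, so that the CPU number is the second field, and %idle as the last column, as in the sample.

```shell
#!/bin/sh
# Read mpstat -P ALL output on stdin and print "cpu idle%" for each per-CPU
# sample whose %idle is below the given limit. The "all" summary rows and the
# trailing "Average:" rows are skipped.
# Usage: mpstat -P ALL 1 3 | busy_cpus 98
busy_cpus() {
    awk -v limit="$1" '
        $1 != "Average:" && $2 ~ /^[0-9]+$/ && $NF + 0 < limit + 0 { print $2, $NF }
    '
}
```

With the sample output above and a limit of 98, only CPU 1 (97.92% idle) is reported.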
sysbench
File IO testing.
mkdir sysbench-testrun.0
cd sysbench-testrun.0
# Prepare 16 files, each 1 GB in size.
sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --init-rng=on --num-threads=16 prepare
sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --init-rng=on --num-threads=16 run
sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --init-rng=on --num-threads=16 cleanup
CPU performance testing.
sysbench --test=cpu --cpu-max-prime=20000 run
CPU thread testing.
sysbench --test=threads --num-threads=64 --thread-yields=100 --thread-locks=2 run
Mutex testing.
sysbench --test=mutex --mutex-locks=100000 --num-threads=1024 run
Memory testing. The --memory-oper option belongs to the memory test, not the mutex test.
sysbench --test=memory --memory-oper=read run
sysbench --test=memory --memory-oper=write run
OLTP (database) testing. The OLTP test needs database connection options, which are not covered here. List the available options with:
sysbench --test=oltp help
not used so much
The fio tool does not have a lot of documentation, but it looks interesting. The homepage is just a git repository: fio. Under Ubuntu install the fio package. For documentation see /usr/share/doc/fio/, especially /usr/share/doc/fio/examples/.
lmbench is ancient (its homepage is nearly 20 years old!), but it still works.
mibench is a small suite of benchmarks for tasks that might be of interest on embedded systems. It hasn't been touched in over a decade. At least a few of the launch scripts expect the current working directory to be in the PATH.
SPLASH-2 is a benchmark suite for shared-address-space (shared-memory) multiprocessor systems, so it is aimed at multi-threaded parallel workloads.
sysbench
Sysbench benchmarks are broken down into three steps: prepare, run, and cleanup. The prepare step will create sample data files for subsequent stages. The files will be named in the form test_file.NN where NN is an integer starting with 0.
sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --num-threads=16 prepare
sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --num-threads=16 run
sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --num-threads=16 cleanup
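Because all three steps must be given identical options, it can be convenient to keep the options in one place. The following is a minimal sketch of that idea, not something sysbench provides: the names fileio_bench, SYSBENCH, and FILEIO_OPTS are hypothetical, and the options are the ones used above.

```shell
#!/bin/sh
# Command to run; override it (e.g. SYSBENCH=echo) for a dry run.
SYSBENCH="${SYSBENCH:-sysbench}"
FILEIO_OPTS="--test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --num-threads=16"

# Run prepare, run, and cleanup with the same options.
# cleanup still runs when the benchmark step fails, so test files are removed.
fileio_bench() {
    # $SYSBENCH and $FILEIO_OPTS are deliberately unquoted so that they are
    # split into separate words (POSIX sh and bash behavior).
    $SYSBENCH $FILEIO_OPTS prepare || return 1
    $SYSBENCH $FILEIO_OPTS run
    status=$?
    $SYSBENCH $FILEIO_OPTS cleanup
    return $status
}
```

Run it from an empty scratch directory on the filesystem under test, since the test files are created in the current working directory.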
drive IO testing and performance measurement
Basic read and write speed testing. The options used below test for IOPS, not bytes/sec (the -O flag makes iozone report operations per second). These options also favor sequential streaming of large blocks of data.
iozone -a -s 1048576 -g 1G -i 0 -i 1 -O
Bonnie++ tests work fine with the defaults. You do have to set the user. Note that running as root is not normally recommended. Note also that bonnie is the same program as bonnie++ (bonnie is a symlink to bonnie++). The output of bonnie++ is difficult to read. It also prints a line of CSV data, which is even harder to read without a spreadsheet; the bon_csv2txt and bon_csv2html helpers shipped with bonnie++ can reformat that line.
bonnie -u root:root
drive stress testing
The stress command only generates load; it does not measure how the system handles that load. Combine it with other tools to get performance measurements.
This generates I/O stress on a drive. The --hdd workers write and unlink temporary files in the current working directory, so first change to a directory on the drive under test (here, a filesystem on /dev/sda). While this is running you may want to run "iostat -d sda 1 300" in a different window.
stress --hdd 10
CPU stress and burn
Install the Ubuntu package cpuburn. For each CPU core your system has, run one instance of `burnP6` (for Intel P6 processors). Monitor the CPU usage and system load using `htop` or the tool of your choice. Monitor the temperature using `sensors` or some ACPI tool.
burnP6 &
burnP6 &
burnP6 &
burnP6 &
watch -n1 sensors
killall burnP6
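The four burnP6 lines above fit a four-core machine. A small sketch that starts one instance per core instead is shown here; the names start_burners and BURN_CMD are hypothetical, and it assumes nproc (from GNU coreutils) is available to count the cores.

```shell
#!/bin/sh
# Command to start on each core; defaults to burnP6 from the cpuburn package.
BURN_CMD="${BURN_CMD:-burnP6}"

# Start the given number of background instances (default: one per core).
start_burners() {
    count="${1:-$(nproc)}"
    i=0
    while [ "$i" -lt "$count" ]; do
        $BURN_CMD &
        i=$((i + 1))
    done
    echo "started $count instances of $BURN_CMD"
}
```

Call start_burners, watch the temperature with watch -n1 sensors, and stop the load with killall burnP6 as before.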
System Platform Testing
Autotest is a fully automated test suite designed to test the entire Linux platform. It is based on a large collection of third-party testing tools such as dbench, iozone, stress, sysbench, and many more.
clear cache
(flush cache, dump cache, drop cache, drop caches)
Linux kernel caches can make performance test results difficult to interpret. When doing performance testing, start each run from the same clean state so that the cost of populating a cache for the first time is accounted for consistently. Luckily, Linux provides a way to free caches. The following commands, run as root, clear the given kernel caches. Dirty objects cannot be freed, so run sync first.
- pagecache: echo 1 > /proc/sys/vm/drop_caches
- slab and objects (includes dentries and inodes): echo 2 > /proc/sys/vm/drop_caches
- all caches -- slab, objects, and pagecache: echo 3 > /proc/sys/vm/drop_caches
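For repeated benchmark runs, the commands above can be wrapped in a small helper that also flushes dirty data first. This is only a sketch: the names drop_caches and DROP_CACHES_FILE are hypothetical, and the helper must run as root to write to the real control file.

```shell
#!/bin/sh
# Control file to write to; overridable so the helper can be tried safely.
DROP_CACHES_FILE="${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"

# Usage: drop_caches [1|2|3]
#   1 = pagecache, 2 = slab objects (dentries, inodes), 3 = both (default)
drop_caches() {
    mode="${1:-3}"
    case "$mode" in
        1|2|3) ;;
        *) echo "usage: drop_caches [1|2|3]" >&2; return 2 ;;
    esac
    sync                                  # write out dirty pages so they can be dropped
    echo "$mode" > "$DROP_CACHES_FILE"
}
```

Calling drop_caches before each benchmark run gives every run the same cold-cache starting point.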
See also: https://www.kernel.org/doc/Documentation/sysctl/vm.txt