Performance


See also

RCU

Read-copy update (RCU) is a synchronization mechanism that allows reads to occur concurrently with updates, in contrast to conventional locking primitives that enforce mutual exclusion around critical sections. RCU read-side primitives have essentially zero overhead.

profiling and tracing

  • perf: Linux profiling with performance counters. Sometimes called Perf Events. (A minimal usage sketch follows this list.)
  • ftrace: kernel function tracer built into Linux which uses Perf Events. Interfaces through debugfs (mount -t debugfs nodev /sys/kernel/debug). Requires the kernel to be compiled with ftrace support (check your kernel config file -- usually cat /boot/config-$(uname -r); if the kernel was built with ikconfig, use scripts/extract-ikconfig to extract a config from a kernel image, or zcat /proc/config.gz to read the config from a live kernel).
  • KVM Perf Events: Perf tracing in the KVM module.
  • SystemTap: a system for instrumenting live Linux kernels and user-space processes.
  • LTTng (Linux Trace Toolkit Next Generation): includes a kernel tracer, a userspace tracer, and LTTV (the LTT Viewer).
  • DTrace: a performance analysis tool originally for Solaris, but ported to Linux and BSD. Features a scripting language called D.
  • ktap: a script-based dynamic tracing tool similar to SystemTap and DTrace. ktap does not depend on GCC, debug symbols, a modified kernel, or kernel modules, which makes it suitable for embedded development. It supports x86, ARM, PowerPC, and MIPS.
  • oprofile: a system-wide profiler for Linux that uses the kernel performance events subsystem and CPU hardware performance counters.
  • pytimechart: a GUI viewer for kernel traces.
  • BootChart: a tool for performance analysis and visualization of the GNU/Linux boot process.
  • blktrace: Block IO kernel subsystem tracer. http://smackerelofopinion.blogspot.com/2009/10/block-io-layer-tracing-using-blktrace.html
  • strace, ltrace, latrace: The classic system call, library call, application library call tracers. http://ltrace.org/ http://people.redhat.com/jolsa/latrace/index.shtml
  • GDB: not a profiler per se, but the venerable GNU debugger, which allows you to see inside a program as it executes or after it crashes (core dumps).
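
As a minimal sketch of the first two tools above -- perf and ftrace -- here is a typical workflow. The program name ./my-program is a placeholder, and the debugfs paths are the standard ftrace interface:

perf record -g -- ./my-program    # sample the program with call graphs (./my-program is a placeholder)
perf report                       # browse the recorded profile

mount -t debugfs nodev /sys/kernel/debug                   # as root, if not already mounted
echo function > /sys/kernel/debug/tracing/current_tracer   # select the kernel function tracer
echo 1 > /sys/kernel/debug/tracing/tracing_on              # start tracing
sleep 5                                                    # collect for a few seconds
echo 0 > /sys/kernel/debug/tracing/tracing_on              # stop tracing
less /sys/kernel/debug/tracing/trace                       # view the trace buffer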

other tools that I'm not sure where they belong

  • latencytop: a tool for identifying latency.
  • ulatencyd: a daemon to minimize latency on a Linux system using cgroups.
  • mtrace: Malloc Trace memory debugger from the GNU C library.
  • gprof: the GNU profiler, from GNU binutils.
  • gcov: GNU code coverage from the GNU C Compiler project.

file this somewhere (KVM Perf Events)

Often you want event counts after running a benchmark:

$ sudo mount -t debugfs none /sys/kernel/debug
$ sudo ./perf stat -e 'kvm:*' -a sleep 1h
^C
 Performance counter stats for 'sleep 1h':

           8330  kvm:kvm_entry            #      0.000 M/sec
              0  kvm:kvm_hypercall        #      0.000 M/sec
           4060  kvm:kvm_pio              #      0.000 M/sec
              0  kvm:kvm_cpuid            #      0.000 M/sec
           2681  kvm:kvm_apic             #      0.000 M/sec
           8343  kvm:kvm_exit             #      0.000 M/sec
            737  kvm:kvm_inj_virq         #      0.000 M/sec
              0  kvm:kvm_page_fault       #      0.000 M/sec
              0  kvm:kvm_msr              #      0.000 M/sec
            664  kvm:kvm_cr               #      0.000 M/sec
            872  kvm:kvm_pic_set_irq      #      0.000 M/sec
              0  kvm:kvm_apic_ipi         #      0.000 M/sec
            738  kvm:kvm_apic_accept_irq  #      0.000 M/sec
            874  kvm:kvm_set_irq          #      0.000 M/sec
            874  kvm:kvm_ioapic_set_irq   #      0.000 M/sec
              0  kvm:kvm_msi_set_irq      #      0.000 M/sec
            433  kvm:kvm_ack_irq          #      0.000 M/sec
           2685  kvm:kvm_mmio             #      0.000 M/sec

    3.493562100  seconds time elapsed

The perf tool is part of the Linux kernel tree in tools/perf. 
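
To see which kvm tracepoints are available before running perf stat as above, you can list them (the quoting protects the glob from the shell):

perf list 'kvm:*'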

Dogtail automated GUI testing

dstat

`dstat` is one of the more valuable tools for monitoring system performance. The output columns can be easily customized.

dstat -cdngypilmr --vm

Show each CPU separately:

dstat -fcdngypilmr --vm

The default options are -cdngy. The following are options I commonly use; many others are described in the manpage. An example run with a sample interval and count follows the list.

-c --cpu    system, user, idle, wait, hardware interrupt, software interrupt
-d --disk   disk read, write
-f --full   full listing when using certain options (--cpu, --int, --disk, --net, --swap)
-g --page   page in, out
-i --int    interrupts (see also --full option, --I option, and review /proc/interrupts)
-l --load   load average
-m --mem    memory used, buffers, cache, free
-n --net    network receive, send
-r --io     I/O read, write
-s --swap   swap used, free
-y --sys    system interrupts, context switches
   --vm     vm hard pagefaults, soft pagefaults, allocated, free
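
dstat also accepts an optional delay and count after the options. For example, this samples every 5 seconds, 12 times; the second form additionally logs the run to a CSV file via the --output flag (the filename here is just an example):

dstat -cdngy 5 12
dstat -cdngy --output dstat-run.csv 5 12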

`dstat` also has many Python plugins stored in /usr/share/dstat/.

Some statistics require the lm-sensors package. Run `sensors-detect` after installing.
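
For example (sensors-detect asks interactive questions; the defaults are safe answers):

sudo sensors-detect
sensors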

tools

aptitude -q -y install iozone3 stress cpuburn sysstat iotop smem powertop hardinfo hddtemp \
    dbench sysbench phoronix-test-suite iperf netperf netperfmeter \
    google-perftools \
    stressapptest \
    ceph-test \
    memtester \
    posixtest \
    fio lmbench
MM Tests
MMTests is a configurable test suite that runs a number of common workloads of interest to MM (memory management) developers.
LTP
The Linux Test Project is a test suite composed of various third-party tests. It is not available as an Ubuntu package; download it as source.
Autotest
Fully automated testing under Linux, primarily for testing the Linux kernel.

mpstat

The mpstat command displays activity statistics for each CPU, broken down into user, system, iowait, irq, steal, guest, and idle time.

# mpstat -P ALL 1 3
Linux 3.8.0-35-generic (vmh-dev-9) 	2014-03-26 	_x86_64_	(4 CPU)

17:02:00     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
17:02:01     all    0.26    0.00    0.00    0.00    0.00    0.00    0.77    0.00   98.97
17:02:01       0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
17:02:01       1    1.04    0.00    0.00    0.00    0.00    0.00    1.04    0.00   97.92
17:02:01       2    0.00    0.00    0.00    0.00    0.00    0.00    1.03    0.00   98.97
17:02:01       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

sysbench

File IO testing. Sysbench benchmarks are broken down into three steps: prepare, run, and cleanup. The prepare step creates the sample data files used by the run step; they are named test_file.NN, where NN is an integer starting at 0.

mkdir sysbench-testrun.0
cd sysbench-testrun.0
# Prepare 16 files, each 1GB in size.
sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --init-rng=on --num-threads=16 prepare
sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --init-rng=on --num-threads=16 run
sysbench --test=fileio --file-total-size=16G --file-num=16 --file-test-mode=rndrw --init-rng=on --num-threads=16 cleanup

CPU performance testing.

sysbench --test=cpu --cpu-max-prime=20000 run

CPU thread testing.

sysbench --test=threads --num-threads=64 --thread-yields=100 --thread-locks=2 run

Mutex testing.

sysbench --test=mutex --mutex-locks=100000 --num-threads=1024 --memory-oper=read  run
sysbench --test=mutex --memory-oper=write --mutex-locks=100000 --num-threads=1024 run
sysbench --test=mutex --mutex-locks=100000 --num-threads=1024 run

sysbench also includes an OLTP (database) test mode; see --test=oltp in the manpage.

not used so much

The fio tool does not have a lot of documentation, but it looks interesting. The homepage is just a git repository: fio. Under Ubuntu install the fio package. For documentation see /usr/share/doc/fio/, especially /usr/share/doc/fio/examples/. A minimal command-line invocation is sketched below.
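
fio normally reads a job file, but jobs can also be specified entirely on the command line. A minimal sketch (the job name, size, and runtime here are arbitrary; fio writes its test files in the current directory):

fio --name=randrw-test --rw=randrw --size=256M --bs=4k --runtime=30 --time_based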

lmbench is ancient (its homepage is nearly 20 years old!), but it still works.

MiBench is a small suite of benchmarks covering tasks of interest to embedded systems. It hasn't been touched in over a decade, and at least a few of the launch scripts expect the current working directory to be in the PATH.

SPLASH-2 is for testing shared-address-space memory systems. It sounds like a set of multi-threaded or clustered computing test tools.


drive IO testing and performance measurement

Basic read and write speed testing. The options used below test for IOPS, not bytes/sec, and favor sequential streaming of large blocks of data.

iozone -a -s 1048576 -g 1G -i 0 -i 1 -O
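
Drop the -O flag to have iozone report throughput in KB/sec instead of operations per second:

iozone -a -s 1048576 -g 1G -i 0 -i 1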

Bonnie++ tests work OK with the defaults, but you do have to set the user (running as root is not normally recommended). Note that bonnie is the same as bonnie++ (bonnie is a symlink to bonnie++). The output of bonnie++ is stupidly difficult to read, and there is no way to fix this. It also dumps out CSV data, which is even harder to read without a spreadsheet.

bonnie -u root:root
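
The bonnie++ package does ship bon_csv2html, which at least turns the CSV into an HTML table. A sketch, assuming the CSV is the last line of bonnie's output (results.html is an arbitrary name):

bonnie -u root:root | tail -n 1 | bon_csv2html > results.html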

drive stress testing

The stress command generates pure load; it does not attempt to measure how the system handles it, so combine it with other tools to get performance measurements.

This generates disk stress with 10 writer processes. Note that stress takes no device argument; it writes its scratch files in the current working directory, so cd to a filesystem on the drive you want to exercise (here, /dev/sda) first. While this is running you may want to run "iostat 1 300 /dev/sda" in a different window.

stress --hdd 10

CPU stress and burn

Install the Ubuntu package cpuburn. For each CPU core your system has, run one instance of `burnP6` (for Intel P6-class processors). Monitor the CPU usage and system load using `htop` or the tool of your choice, and monitor the temperature using `sensors` or some ACPI tool. A loop that launches one instance per core is sketched after the example below.

burnP6 &
burnP6 &
burnP6 &
burnP6 &
watch -n1 sensors
killall burnP6
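
Instead of starting burnP6 by hand once per core, a small loop can launch one instance per core (nproc is from GNU coreutils):

for i in $(seq $(nproc)); do burnP6 & done
watch -n1 sensors
killall burnP6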

System Platform Testing

Autotest is a fully automated test suite designed to test the entire Linux platform. It is based on a large collection of third-party testing tools such as dbench, iozone, stress, sysbench, and many more.

clear cache

(flush cache, dump cache, drop cache, drop caches)

Linux kernel caches can make performance tests difficult to interpret. When doing performance testing you want to start from the same cold state each run, so that the penalty of populating a cache the first time is accounted for. Luckily, Linux provides a way to free caches; the following commands clear the given kernel caches.

pagecache
echo 1 > /proc/sys/vm/drop_caches
slab objects (includes dentries and inodes)
echo 2 > /proc/sys/vm/drop_caches
all caches -- pagecache and slab objects
echo 3 > /proc/sys/vm/drop_caches
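
Note that drop_caches only frees clean cache entries, so run sync first to write out dirty pages. Also, the redirection is performed by the shell, so sudo on echo alone is not enough; use tee or a root shell:

sync
echo 3 | sudo tee /proc/sys/vm/drop_caches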

See also: https://www.kernel.org/doc/Documentation/sysctl/vm.txt