Table of Contents
主要来自redis 作者的文章: http://redis.io/topics/latency
这篇文章介绍了很多工具, iostat, vmstat
测量:
redis-cli --latency -h `host` -p `port`
基准(不可能好于这个数):
$ ./redis-cli --intrinsic-latency 100 Max latency so far: 573 microseconds. Max latency so far: 695 microseconds. Max latency so far: 919 microseconds. Max latency so far: 1606 microseconds. Max latency so far: 3191 microseconds. Max latency so far: 9243 microseconds. Max latency so far: 9671 microseconds. Here we have an intrinsic latency of 9.7 milliseconds: this means that we can't ask better than that to Redis.
可能的原因:
The typical latency of a 1 GBits/s network is about 200 us, while the latency with a Unix domain socket can be as low as 30 us.
Please note vanilla Redis is not really suitable to be bound on a single CPU core. Redis can fork background tasks that can be extremely CPU consuming like bgsave or AOF rewrite. These tasks must never run on the same core as the main event loop.
from Redis 2.4 we use threads in Redis in order to perform some slow I/O operations in the background, mainly related to disk I/O,
建议:
查看swap:
ning@ning-laptop ~/test$ cat /proc/18941/smaps | grep Swap Swap: 0 kB Swap: 0 kB Swap: 0 kB Swap: 0 kB Swap: 0 kB Swap: 156 kB
write() + fdatasync()
Both the write(2) and fdatasync(2) calls can be source of latency. For instance write(2) can block both when there is a system wide sync in progress, or when the output buffers are full and the kernel requires to flush on disk in order to accept new writes.
close 也会导致flush.
fdatasync 可能从几个ms到几s. 所以redis2.4 尽可能在另一个线程里面做fdatasync.
When appendfsync is set to the value of no Redis performs no fsync. In this configuration the only source of latency can be write(2).
测量:
sudo strace -p $(pidof redis-server) -T -e trace=fdatasync -f
因为fdatasync 是在另一个线程, 所以需要加-f
However since write(2) is also used in order to write data to the client sockets this will likely show too many things unrelated to disk I/O. Apparently there is no way to tell strace to just show slow system calls so I use the following command:
strace -f -p $(pidof redis-server) -T -e trace=fdatasync,write 2>&1 | grep -v '0.0' | grep -v unfinished
建议:
用这个, 我们在做copy 的时候, 可以观察到:
[pid 24734] write(42, "*4\r\n$5\r\nhmset\r\n$37\r\np-lc-d687791"..., 272475) = 272475 <0.036430> [pid 24738] <... fdatasync resumed> ) = 0 <2.030435> [pid 24738] <... fdatasync resumed> ) = 0 <0.012418> [pid 24734] write(42, "*4\r\n$5\r\nHMSET\r\n$37\r\np-lc-6787211"..., 73) = 73 <0.125906> [pid 24738] <... fdatasync resumed> ) = 0 <4.476948> [pid 24734] <... write resumed> ) = 294594 <2.477184> (2.47s)
此时输出:
$ ./_binaries/redis-cli --latency-history -h 10.38.114.60 -p 2000 min: 0, max: 223, avg: 1.24 (1329 samples) -- 15.01 seconds range min: 0, max: 2500, avg: 3.46 (1110 samples) -- 15.00 seconds range (这里观察到2.5s) min: 0, max: 5, avg: 1.01 (1355 samples) -- 15.01 seconds range
watchdog 输出:
[24734] 07 Jul 10:54:41.006 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis. [24734 | signal handler] (1404701682) --- WATCHDOG TIMER EXPIRED --- bin/redis-server *:2000(logStackTrace+0x4b)[0x443bdb] /lib64/tls/libpthread.so.0(__write+0x4f)[0x302b80b03f] /lib64/tls/libpthread.so.0[0x302b80c420] /lib64/tls/libpthread.so.0(__write+0x4f)[0x302b80b03f] bin/redis-server *:2000(flushAppendOnlyFile+0x76)[0x43f616] bin/redis-server *:2000(serverCron+0x325)[0x41b5b5] bin/redis-server *:2000(aeProcessEvents+0x2b2)[0x416a22] bin/redis-server *:2000(aeMain+0x3f)[0x416bbf] bin/redis-server *:2000(main+0x1c8)[0x41dcd8] /lib64/tls/libc.so.6(__libc_start_main+0xdb)[0x302af1c4bb] bin/redis-server *:2000[0x415b1a] [24734 | signal handler] (1404701682) --------
if the database contains has many many keys expiring in the same second, and this keys are at least 25% of the current population of keys with an expire set,
2, 3, 5 我们都遇到了.