Permalink: 2014-09-09 21:29:40 by ning in redis tags: all

最近发现hhvm和redis的交互偶尔hang住(这里连接没有设置超时), 分析有三种可能性:

  1. server端返回错误的response. 比如返回一个超大的 length:

    $10240000\r\nabcd\4\n
    
客户端读到这个大数, 但是发现没有这么多字节要读, 于是就一直等着读
  1. client解析的时候出错, 如果读到正确的 $12\r\n , 解析出错溢出, 被解析成一个超大的数, 也会一直等.
  2. 由于网络问题造成hang住(比如FIN包丢失)

当然, 要绕过这个问题最简单的办法是:

  1. 请求加读写超时
  2. 给每个线程加个超时重启机制 (php就可以这么干).

鉴于其它client很少发现类似问题(不过其它client通常都是连接超时, hhvm没有设置), 而且hhvm redis库代码质量较低(之前已经发现了bug), 首先怀疑是client解析出错.

这篇分析做下面几个事情:

  1. 尝试复现hang住的问题
  2. 通过一个fake-redis稳定复现解析错误.
  3. 检查hhvm Redis/fgets中的bug
  4. 构造hang住的case.
  5. 如何fix.

1   尝试问题复现

由于之前发现 hgetall比较容易hang住, 写了这样一个php来测试:

<?php
set_time_limit(0);

$r = new Redis();
$r->setOption('OPT_READ_TIMEOUT', 0.0);
$conn = $r->connect('10.81.19.14', 4000, 0.1 );

$key = 'kkkkkkkkk';
for ($i=0; $i<3000; $i++) {
    $value = str_pad( $i, $i%100, "\r\n" );

    $r->hMset($key, array("field-$i" => $value));
}

for($i=0; $i<10000000; $i++) {
    $v = $r->hgetall($key);
    $len = count($v);

    if ($i%100 == 0) {
        echo "$i: $len\n";
    }
}

hhvm_backend/bin/hhvm_backend -c hhvm_backend/conf/hhvm.conf a.php

发现不会hang住, 但是经常报错:

[Tue Sep  9 12:58:00 2014] [hphp] [54093:7f598d376340:0:000955] []
Fatal error: Uncaught exception 'RedisException' with Message 'protocol error, got 'o' as reply type byte' in :
Stack trace:
#0 (): Redis->sockReadData()
#1 (): Redis->processMapResponse()
#2 /home/ning/hhvm/a.php(26): Redis->__call()
#3 {main}

这个应该是解析过程中发生的某种错误, 我们先研究这个问题.

2   fake-redis 复现

粗略看了hhvm Redis库的代码, 发现读response的地方处理不好, 所以我们构造一个fake-redis用于测试, fake-redis在response的时候, 每次写一个字节, sleep 100ms .

https://github.com/idning/fake-redis/blob/master/fake-redis.py

(这个测试估计可以让多数网络程序出bug, 包括lighttpd, openssl等)

应用这个 fake-redis, 做了下面测试

2.1   测试1

c.php:

$r = new Redis();

$conn = $r->connect('127.1', 9999, 0.1 );

echo "set\n";
echo $r->set('k', 'abc');
echo "get\n";
echo $r->Get('k');
echo "done\n";

会出问题:

$ hhvm_backend/bin/hhvm_backend -c hhvm_backend/conf/hhvm.conf c.php
set
get
[Tue Sep  9 22:10:57 2014] [hphp] [54178:7f539ac6f340:0:000001] []
Fatal error: Uncaught exception 'RedisException' with Message 'protocol error, got 'K' as reply type byte' in :
Stack trace:
#0 (): Redis->sockReadData()
#1 (): Redis->processSerializedResponse()
#2 hhvm/c.php(11): Redis->__call()
#3 {main}

加了strace打算看下原因, 发现ok了:

$ strace -oa.txt hhvm_backend/bin/hhvm_backend -c hhvm_backend/conf/hhvm.conf c.php
set
1get
abcdone

后来仔细分析了strace, 发现客户端会经常用100ms 去poll(更详细的分析参考后面):

recvfrom(7, "O", 1, MSG_PEEK|MSG_DONTWAIT, NULL, NULL) = 1
fcntl(7, F_GETFL)                       = 0x2 (flags O_RDWR)
poll([{fd=7, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}], 1, 100) = 1
getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
recvfrom(7, "O", 8192, MSG_DONTWAIT, NULL, NULL) = 1

因为server端也是sleep 100ms, 所以用了strace后, cliet慢了一点, server端的response已经来了, 正好就ok了.

修改 sleep 为150ms, 这时候就能稳定复现, 每次结果都一样:

$ strace -oa.txt hhvm_backend/bin/hhvm_backend -c hhvm_backend/conf/hhvm.conf c.php
set
NRedis: type=+, resp=
get
[Wed Sep 10 10:43:44 2014] [hphp] [14966:7f42e70f4340:0:000001] []
Fatal error: Uncaught exception 'RedisException' with Message 'protocol error, got 'O' as reply type byte' in :
Stack trace:
#0 (): Redis->sockReadData()
#1 (): Redis->processSerializedResponse()
#2 /home/ning/hhvm/c.php(56): Redis->__call()
#3 {main}

strace如下:

recvfrom(7, "O", 1, MSG_PEEK|MSG_DONTWAIT, NULL, NULL) = 1
fcntl(7, F_GETFL)                       = 0x2 (flags O_RDWR)
poll([{fd=7, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}], 1, 100) = 1
getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
recvfrom(7, "O", 8192, MSG_DONTWAIT, NULL, NULL) = 1
recvfrom(7, 0x7fff6c432e40, 1, 66, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
fcntl(7, F_GETFL)                       = 0x2 (flags O_RDWR)
poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 100) = 0
recvfrom(7, 0x7fa647bf5000, 8192, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)

2.2   小结

到了这里, 我们很容易模拟fgets不符合预期的情况了, 通过读代码, 发现主要是fgets的问题.

3   Redis库的问题

3.1   Mock 测试

我们通过继承Redis,来看看到底发生了什么(这个php被编译在二进制里面了, 每次修改都要重新编译, 不如继承, 方便调试):

class NRedis extends Redis{

  protected function processBooleanResponse() {
    if ($this->mode === self::ATOMIC) {
      $resp = $this->sockReadData($type);
      echo "NRedis: type=$type, resp=$resp\r\n";

      return ($type === self::TYPE_LINE) AND ($resp === 'OK');
    }
    $this->multiHandler[] = [ 'cb' => [$this,'processBooleanResponse'] ];
    if (($this->mode === self::MULTI) && !$this->processQueuedResponse()) {
      return false;
    }
    return $this;
  }

  public function hMSet($key, array $pairs) {
    $args = [$this->prefix($key)];
    foreach ($pairs as $k => $v) {
      $args[] = $k;
      $args[] = $this->serialize($v);
    }
    $this->processArrayCommand('HMSET', $args);
    return $this->processBooleanResponse();
  }
}


$ strace -oa.txt hhvm_backend/bin/hhvm_backend -c hhvm_backend/conf/hhvm.conf c.php
set
NRedis: type=+, resp=
get
[Wed Sep 10 10:43:44 2014] [hphp] [14966:7f42e70f4340:0:000001] []
Fatal error: Uncaught exception 'RedisException' with Message 'protocol error, got 'O' as reply type byte' in :
Stack trace:
#0 (): Redis->sockReadData()
#1 (): Redis->processSerializedResponse()
#2 /home/ning/hhvm/c.php(56): Redis->__call()
#3 {main}

发现第一个set请求, server 端的返回是 +OK\r\n, 但是客户端读到 + 就返回了, 此时返回的type是 + , resp为空, 所以接下来读get请求的response的时候, 就读到了 'O', 这就是上面这个异常的原因.

这里 fgets 返回了 + , 还没遇到 \n 就返回, 这是 fgets 的问题, 我们后面再分析

3.2   修复

既然fgets有问题, 我们可以换一个比较保守的readLine实现:

class NRedis extends Redis{
  ...
  protected function sockReadLine() {
   if (!$this->checkConnection()) {
     return false;
   }

   $line = '';
   while(1) {
     $c = fgetc($this->connection);
     $line .= $c;
     if (substr($line, -2) == "\r\n") {
       $line = substr($line, 0, -2);
       return $line;
     }
   }

   return false;
  }
}

使用修复版的NRedis, 测试OK:

$ strace -oa.txt hhvm_backend/bin/hhvm_backend -c hhvm_backend/conf/hhvm.conf c.php
set
NRedis: type=+, resp=OK
1get
abcdone

4   fgets问题

4.1   Mock 测试

利用前面的fake-redis server, 写了这样一个测试代码:

$ cat d.php
<?php

$conn = fsockopen('127.1', 9999, $errno, $errstr, 0.1);

fwrite($conn, "*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n");
$r = fgets($conn);
var_dump($r);

fwrite($conn, "*2\r\n$3\r\nGET\r\n$1\r\nk\r\n");

$r = fgets($conn);
var_dump($r);

$r = fgets($conn);
var_dump($r);

4.1.1   情况1

server端不sleep时, 返回一切正常:

$ hhvm_backend/bin/hhvm_backend -c hhvm_backend/conf/hhvm.conf d.php
string(5) "+OK
"
string(4) "$1
"
string(3) "v
"

4.1.2   情况2

server端每发送1个byte sleep 150ms:

$ strace -oa.txt hhvm_backend/bin/hhvm_backend -c hhvm_backend/conf/hhvm.conf d.php
string(1) "+"
string(1) "O"
string(1) "K"

每次 fgets 只能读到一个字节.

4.2   fgets bug

http://php.net/manual/en/function.fgets.php

php.net上fgets 的文档:

Reading ends when length - 1 bytes have been read, or a newline (which is included in the return value), or an EOF (whichever comes first). If no length is specified, it will keep reading from the stream until it reaches the end of the line.

有三种情况这个函数返回:

  1. 读到1行
  2. 读到eof
  3. 读到(length-1) 字节

4.2.1   看实现

4.2.1.1   fgets

implement of fgets hphp/runtime/ext/ext_file.cpp

Variant f_fgets(const Resource& handle, int64_t length /* = 0 */) {
  if (length < 0) {
    throw_invalid_argument("length (negative): %" PRId64, length);
    return false;
  }
  CHECK_HANDLE(handle, f);
  String line = f->readLine(length);
  if (!line.isNull()) {
    return line;
  }
  return false;
}

hphp/runtime/base/file.cpp:readLine:

1   String File::readLine(int64_t maxlen /* = 0 */) {
2     size_t current_buf_size = 0;
3     size_t total_copied = 0;
4     char *ret = nullptr;
5     for (;;) {
6       int64_t avail = bufferedLen();
7       if (avail > 0) {
8         int64_t cpysz = 0;
9         bool done = false;
10
11        char *readptr = m_buffer + m_readpos;
12        const char *eol;
13        const char *cr;
14        const char *lf;
15        cr = (const char *)memchr(readptr, '\r', avail);
16        lf = (const char *)memchr(readptr, '\n', avail);
17        if (cr && lf != cr + 1 && !(lf && lf < cr)) {
18          /* mac */
19          eol = cr;
20        } else if ((cr && lf && cr == lf - 1) || (lf)) {
21          /* dos or unix endings */
22          eol = lf;
23        } else {
24          eol = cr;
25        }
26
27        if (eol) {
28          cpysz = eol - readptr + 1;
29          done = true;
30        } else {
31          cpysz = avail;
32        }
33        if (maxlen > 0 && maxlen <= cpysz) {
34          cpysz = maxlen;
35          done = true;
36        }
37
38        current_buf_size += cpysz + 1;
39        if (ret) {
40          ret = (char *)realloc(ret, current_buf_size);
41        } else {
42          ret = (char *)malloc(current_buf_size);
43        }
44        memcpy(ret + total_copied, readptr, cpysz);
45
46        m_position += cpysz;
47        m_readpos += cpysz;
48        maxlen -= cpysz;
49        total_copied += cpysz;
50
51        if (done) {
52          break;                              ////////////////////////// 1: eol 或者读到了maxlen
53        }
54      } else if (eof()) {
55        break;                                /////////////////////////  2: eof
56      } else {
57        if (m_buffer == nullptr) {
58          m_buffer = (char *)malloc(CHUNK_SIZE);
59          m_bufferSize = CHUNK_SIZE;
60        }
61        m_writepos = filteredReadToBuffer();
62        m_readpos = 0;
63        if (bufferedLen() == 0) {
64          break;                              /////////////////////////  3: 一次读返回空.
65        }
66      }
67    }
68
69    if (total_copied == 0) {
70      assert(ret == nullptr);
71      return String();
72    }
73
74    ret[total_copied] = '\0';
75    return String(ret, total_copied, AttachString);
76  }

看这个长函数, 有几个情况会返回, 我在代码中标了1, 2, 3. 根据fgets的定义, 1,2两处返回是合理的, 3不应该返回, 而应该继续读, 这就是 fgets的bug

这个函数的另一个问题是, 如果一个socket上发来 abc\r\n, 但是一次读只读到 abc\r 的时候, 它就会返回以 \r 结尾的一行, 这个问题, Redis.php已经针对它做了专门的适配, 这里不讨论:

protected function sockReadLine() {
  if (!$this->checkConnection()) {
    return false;
  }
  $line = fgets($this->connection);
  if (substr($line, -2) == "\r\n") {
    $line = substr($line, 0, -2);
  } else if (substr($line, -1) == "\r") {             //就是这里.
    $line = substr($line, 0, -1);
    $lf = fgetc($this->connection);
    if ($lf === false) {
      // A response must terminate with both CR and LF. Refuse to guess.
      return false;
    } else if ($lf !== "\n") {
      throw new RedisException("Protocol error: CR not followed by LF");
    }
  }

  return $line;
}
4.2.1.2   readImpl

上面的readLine会调用filteredReadToBuffer 来read more bytes, 实际上就是调用readImpl(), 下面我们分析readImpl可能返回空的情况:

int64_t File::filteredReadToBuffer() {
  int64_t bytes_read = readImpl(m_buffer, CHUNK_SIZE);
  if (LIKELY(m_readFilters.empty())) {
    return bytes_read;
  }

readImpl for Socket: hphp/runtime/base/socket.cpp::readImpl:

1   int64_t Socket::readImpl(char *buffer, int64_t length) {
2     assert(m_fd);
3     assert(length > 0);
4
5     IOStatusHelper io("socket::recv", m_address.c_str(), m_port);
6
7     int recvFlags = 0;
8     if (m_timeout > 0) {
9       int flags = fcntl(m_fd, F_GETFL, 0);  // ning: we will get into this.
10      if ((flags & O_NONBLOCK) == 0) {
11        if (!waitForData()) {
12          m_eof = true;
13          return 0;
14        }
15        recvFlags = MSG_DONTWAIT; // polled, so no need to wait any more
16      }
17    }
18
19    int64_t ret = recv(m_fd, buffer, length, recvFlags);
20    if (ret == 0 || (ret == -1 && errno != EWOULDBLOCK)) {
21      m_eof = true;
22    }
23    return (ret < 0) ? 0 : ret;
24  }
25
26  bool Socket::waitForData() {
27    m_timedOut = false;
28    while (true) {
29      struct pollfd fds[1];
30      fds[0].fd = m_fd;
31      fds[0].events = POLLIN|POLLERR|POLLHUP;
32      if (poll(fds, 1, m_timeout / 1000)) {
33        socklen_t lon = sizeof(int);
34        int valopt;
35        getsockopt(m_fd, SOL_SOCKET, SO_ERROR, (void*)(&valopt), &lon);
36        if (valopt == EINTR) continue;
37        return valopt == 0;
38      } else {
39        m_timedOut = true;
40        return true;
41      }
42    }
43    return false;
44  }

Socket::readImpl 会进入11行的waitForData函数(从strace结果看, 这里m_timeout是设为100*1000的):

recvfrom(7, 0x7fff5be32f00, 1, 66, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
fcntl(7, F_GETFL)                       = 0x2 (flags O_RDWR)
poll([{fd=7, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}], 1, 100) = 1

我们可以先不管m_timeout怎么设置的, 这个值设为多大, 或者设为0, 都不影响分析, 看waitForData函数, 如果poll超时后没有收到事件, 就会设置m_timedOut=true, 并且return true.

所以此时 readImpl 进入 19行recv, 明显是返回 EAGAIN 的:

poll([{fd=7, events=POLLIN|POLLERR|POLLHUP}], 1, 100) = 0
recvfrom(7, 0x7f3a3d9e2000, 8192, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)

EAGAIN就是EWOULDBLOCK, 所以这里没有进入21行, 而是进入23行, 返回 ret = 0 .

4.2.2   小结

所以, 这应该认为是 File::readLine 的一个bug, 建议的修改是这样:

$ git diff
diff --git a/hphp/runtime/base/file.cpp b/hphp/runtime/base/file.cpp
index b180301..3d216bd 100644
--- a/hphp/runtime/base/file.cpp
+++ b/hphp/runtime/base/file.cpp
@@ -611,9 +611,9 @@ String File::readLine(int64_t maxlen /* = 0 */) {
       }
       m_writepos = filteredReadToBuffer();
       m_readpos = 0;
-      if (bufferedLen() == 0) {
-        break;
-      }
+      //if (bufferedLen() == 0) {
+        //break;
+      //}
     }
   }

这需要重新编译hhvm, 所以我没有测试.

5   模拟hang住的情况

最后, 知道bug的原因, 怎么模拟hang住呢, 我们需要构造合法的返回, 被这个bug的解析代码解析认为符合协议, 同时又解析出错。

因为fgets在server端sleep 200ms 的时候就会返回当前已经读到的 部分, 构造如下response:

$10\r\n$123456789\r\n
  ^
  |
  返回 '$1' 后sleep, 此时client读到 '$1' 认为是读到了一行, 并且认为这个response的阵阵大小是1, 接下来读到 '0\r\n', 所以client 会认为服务器返回了 '0' 这个长度为1的字符串.
  接下来, 下一个get请求来的时候, 会读到 ``$123456789\r\n`` 这一行, 它会认为这个response有123456789这么多字节, 就会一致尝试去读, 当然就一直hang住了.

模拟这个响应的代码:

def handle_get(self, argv):
    key = argv[1]
    if key == 'special-key-for-hhvm':

        # $10\r\n$123456789\r\n
        self.request.sendall('$1')
        time.sleep(.15)
        self.request.sendall('0\r\n$123456789\r\n')

    elif key in store:
        self.reply_bulk(store[key])
    else:
        self.reply_bulk(None)

这是一个完全合法的get请求的response, 表达的值为 '$123456789', 客户端测试代码:

$r = new Redis();

$conn = $r->connect('127.1', 9999, 0.1 );

$t = $r->get('special-key-for-hhvm');
var_dump($t);
$t = $r->get('special-key-for-hhvm');
var_dump($t);

$ strace -oa.txt hhvm_backend/bin/hhvm_backend -c hhvm_backend/conf/hhvm.conf c.php
string(1) "0"
--- 这里hang住
[Wed Sep 10 11:37:39 2014] [hphp] [52895:7ffc15e5c340:0:000001] []
Fatal error: Maximum execution time of 30 seconds exceeded

使用修改了 sockReadLine 后的NRedis代码, 就ok:

$ strace -oa.txt hhvm_backend/bin/hhvm_backend -c hhvm_backend/conf/hhvm.conf c.php
string(10) "$123456789"
string(10) "$123456789"

6   小结

  1. fgets有bug, 修复方法参考正文
  2. Redis类最好不要用有bug的fgets代码, 自己实现一个sockReadLine即可.
  3. 不排除server端(redis/twemproxy) 也有类似bug的可能性.

7   更新

fgets其实没有bug, hhvm在这里的行为和php是一致的, 就是:

超时后返回已读到的部分, 实际上, 这时候应该通过stream_get_meta_data() 获取是否超时信息:

$conn = fsockopen('127.1', 9999, $errno, $errstr, 0.1);
//$conn = fsockopen('127.1', 9999);

fwrite($conn, "*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n");
$r = fgets($conn);
var_dump($r);

fwrite($conn, "*2\r\n$3\r\nGET\r\n$1\r\nk\r\n");

$r = fgets($conn);
var_dump($r);
$info = stream_get_meta_data($conn);
var_dump($info);


$r = fgets($conn);
var_dump($r);
$info = stream_get_meta_data($conn);
var_dump($info);

输出如下:

string(1) "+"
string(1) "O"
array(10) {
  ...
  ["timed_out"]=>
  bool(true)
  ["eof"]=>
  bool(false)
  ["wrapper_data"]=>
  NULL
}
string(1) "K"
array(10) {
  ["timed_out"]=>
  bool(true)
  ["blocked"]=>
  bool(false)
  ["eof"]=>
  bool(false)
  ["wrapper_data"]=>
  NULL
}

接口会输出timeout.

所以fgets应该认为没bug, 修改Redis库即可.

Comments