Table of Contents
today we got a core of twemproxy:
$ gdb -c core.14420 ./bin/nutcracker (gdb) bt #0 0x000000302af2e2ed in raise () from /lib64/tls/libc.so.6 #1 0x000000302af2fa3e in abort () from /lib64/tls/libc.so.6 #2 0x0000000000419c82 in nc_assert (cond=0x444dc0 "!TAILQ_EMPTY(&send_msgq) && nsend != 0", file=0x444aa8 "nc_message.c", line=745, panic=1) at nc_util.c:308 #3 0x000000000040d0d6 in msg_send_chain (ctx=0x553090, conn=0x55b380, msg=0x0) at nc_message.c:745 #4 0x000000000040d568 in msg_send (ctx=0x553090, conn=0x55b380) at nc_message.c:820 #5 0x00000000004059af in core_send (ctx=0x553090, conn=0x55b380) at nc_core.c:173 #6 0x0000000000405ffe in core_core (arg=0x55b380, events=65280) at nc_core.c:301 #7 0x0000000000429297 in event_wait (evb=0x5652e0, timeout=389) at nc_epoll.c:269 #8 0x000000000040606f in core_loop (ctx=0x553090) at nc_core.c:316 #9 0x000000000041b109 in nc_run (nci=0x7fbfffea80) at nc.c:530 #10 0x000000000041b20d in main (argc=14, argv=0x7fbfffecc8) at nc.c:579 (gdb) f 3 #3 0x000000000040d0d6 in msg_send_chain (ctx=0x553090, conn=0x55b380, msg=0x0) at nc_message.c:745 745 ASSERT(!TAILQ_EMPTY(&send_msgq) && nsend != 0); (gdb) l 740 if (msg == NULL) { 741 break; 742 } 743 } 744 745 ASSERT(!TAILQ_EMPTY(&send_msgq) && nsend != 0); 746 747 conn->smsg = NULL; 748 749 n = conn_sendv(conn, &sendv, nsend);
it is caused by this ASSERT at nc_message.c:745,
this core is not always raise when we do the same request again,
if we use -v 9 to debug, it never raise.
use gdb, we found that the msg we want to send is:
(gdb) p * (send_msgq.tqh_first ) $6 = {c_tqe = {tqe_next = 0x0, tqe_prev = 0x0, trace = {lastfile = 0x444e28 "nc_request.c", lastline = 291, prevfile = 0x444e28 "nc_request.c", prevline = 270}}, s_tqe = {tqe_next = 0x0, tqe_prev = 0x0, trace = {lastfile = 0x444e28 "nc_request.c", lastline = 302, prevfile = 0x444e28 "nc_request.c", prevline = 279}}, m_tqe = {tqe_next = 0x0, tqe_prev = 0x7fbfffe8c0, trace = { lastfile = 0x444aa8 "nc_message.c", lastline = 712, prevfile = 0x444aa8 "nc_message.c", prevline = 196}}, id = 6515, peer = 0x810cc0, owner = 0x55ae00, tmo_rbe = {left = 0x0, right = 0x0, parent = 0x0, key = 0, data = 0x0, color = 0 '\0'}, mhdr = {stqh_first = 0x82ecc0, stqh_last = 0x82ecc8}, mlen = 0, state = 0, pos = 0x82eaf4 ":0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n:0\r\n", token = 0x0, parser = 0x42713b <redis_parse_rsp>, result = MSG_PARSE_OK, pre_splitcopy = 0x427e1d <redis_pre_splitcopy>, post_splitcopy = 0x427f8a <redis_post_splitcopy>, pre_coalesce = 0x42816a <redis_pre_coalesce>, post_coalesce = 0x428557 <redis_post_coalesce>, type = MSG_RSP_REDIS_INTEGER, key_start = 0x0, key_end = 0x0, vlen = 0, end = 0x0, narg_start = 0x0, narg_end = 0x0, narg = 0, rnarg = 0, rlen = 0, integer = 0, frag_owner = 0x0, nfrag = 0, frag_id = 0, err = 0, error = 0, ferror = 0, request = 0, quit = 0, noreply = 0, done = 0, fdone = 0, first_fragment = 0, last_fragment = 0, swallow = 0, redis = 1}
this is the only msg( and the last msg) in send_msgq, mlen=0 ,
and it's peer request is a fragment of a delete cmd:
(gdb) p * (send_msgq.tqh_first -> peer ) $5 = {c_tqe = {tqe_next = 0x75e140, tqe_prev = 0x55b400, trace = {lastfile = 0x444e28 "nc_request.c", lastline = 270, prevfile = 0x0, prevline = 0}}, s_tqe = {tqe_next = 0x0, tqe_prev = 0x0, trace = { lastfile = 0x444e28 "nc_request.c", lastline = 302, prevfile = 0x444e28 "nc_request.c", prevline = 279}}, m_tqe = {tqe_next = 0x0, tqe_prev = 0x0, trace = {lastfile = 0x444aa8 "nc_message.c", lastline = 758, prevfile = 0x444aa8 "nc_message.c", prevline = 712}}, id = 4515, peer = 0x5fe2c0, owner = 0x55b380, tmo_rbe = {left = 0x0, right = 0x0, parent = 0x0, key = 0, data = 0x0, color = 0 '\0'}, mhdr = {stqh_first = 0x6c1090, stqh_last = 0x6c1098}, mlen = 47, state = 0, pos = 0x6c0eef "*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_393307809209"..., token = 0x0, parser = 0x421279 <redis_parse_req>, result = MSG_PARSE_OK, pre_splitcopy = 0x427e1d <redis_pre_splitcopy>, post_splitcopy = 0x427f8a <redis_post_splitcopy>, pre_coalesce = 0x42816a <redis_pre_coalesce>, post_coalesce = 0x428557 <redis_post_coalesce>, type = MSG_REQ_REDIS_DEL, key_start = 0x6c0ed2 "rrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nD"..., key_end = 0x6c0eed "\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092"..., vlen = 0, end = 0x0, narg_start = 0x6c0ec0 "*2\r\n$3\r\ndel\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_566362775949"..., narg_end = 0x6c0ec2 "\r\n$3\r\ndel\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_56636277594915"..., narg = 2, rnarg = 0, rlen = 0, integer = 0, frag_owner = 0x75e4f0, nfrag = 0, frag_id = 1256, err = 0, error = 0, ferror = 0, request = 1, quit = 0, noreply = 0, done = 1, fdone = 1, first_fragment = 0, last_fragment = 1, swallow = 0, redis = 1}
the orig msg is:
(gdb) p * (send_msgq.tqh_first -> peer ->frag_owner ) $23 = {c_tqe = {tqe_next = 0x0, tqe_prev = 0x0, trace = {lastfile = 0x444e28 "nc_request.c", lastline = 291, prevfile = 0x444e28 "nc_request.c", prevline = 270}}, s_tqe = {tqe_next = 0x0, tqe_prev = 0x0, trace = {lastfile = 0x444e28 "nc_request.c", lastline = 302, prevfile = 0x444e28 "nc_request.c", prevline = 279}}, m_tqe = {tqe_next = 0x5f9740, tqe_prev = 0x551f80, trace = { lastfile = 0x444aa8 "nc_message.c", lastline = 360, prevfile = 0x444aa8 "nc_message.c", prevline = 758}}, id = 4514, peer = 0x0, owner = 0x55b380, tmo_rbe = {left = 0x0, right = 0x0, parent = 0x0, key = 0, data = 0x0, color = 1 '\001'}, mhdr = {stqh_first = 0x0, stqh_last = 0x75e5c8}, mlen = 34, state = 27, pos = 0x74f15f "$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n"..., token = 0x0, parser = 0x421279 <redis_parse_req>, result = MSG_PARSE_FRAGMENT, pre_splitcopy = 0x427e1d <redis_pre_splitcopy>, post_splitcopy = 0x427f8a <redis_post_splitcopy>, pre_coalesce = 0x42816a <redis_pre_coalesce>, post_coalesce = 0x428557 <redis_post_coalesce>, type = MSG_REQ_REDIS_DEL, key_start = 0x74f142 "rrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\ncoun"..., key_end = 0x74f15d "\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3"..., vlen = 0, end = 0x0, narg_start = 0x74ed80 "*2\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r", narg_end = 0x74ed82 "\r\n$3\r\nDEL\r\n$27\r\nrrr_kkk_3933078092091789232\r\n$27\r\nrrr_kkk_5663627759491506624\r\n*3\r\n$3\r\nDEL\r", narg = 3, rnarg = 1, rlen = 0, integer = 0, frag_owner = 0x75e4f0, nfrag = 2, frag_id = 1256, err = 0, error = 0, ferror = 0, request = 1, quit = 0, noreply = 0, done = 1, fdone = 1, first_fragment = 1, last_fragment = 0, swallow = 0, redis = 1}
we found that it's a DEL cmd with narg=3. the orig cmd is:
DEL rrr_kkk_3933078092091789232 rrr_kkk_5663627759491506624
so we try to repetition this bug with lot's of MULTI-DEL.
a MULTI-DEL cmd will split into many DEL in nutcracker,
response will coalesce. (all fragment except the first one will be rewrite to empty msg), following pic show the split and coalesce of msg:
ok, let's check this code:
static rstatus_t msg_send_chain(struct context *ctx, struct conn *conn, struct msg *msg) { ... TAILQ_INIT(&send_msgq); array_set(&sendv, iov, sizeof(iov[0]), NC_IOV_MAX); nsend = 0; for (;;) { ASSERT(conn->smsg == msg); TAILQ_INSERT_TAIL(&send_msgq, msg, m_tqe); for (mbuf = STAILQ_FIRST(&msg->mhdr); mbuf != NULL && array_n(&sendv) < NC_IOV_MAX && nsend < limit; mbuf = nbuf) { nbuf = STAILQ_NEXT(mbuf, next); if (mbuf_empty(mbuf)) { continue; } mlen = mbuf_length(mbuf); if ((nsend + mlen) > limit) { mlen = limit - nsend; } ciov = array_push(&sendv); ciov->iov_base = mbuf->pos; ciov->iov_len = mlen; nsend += mlen; } if (array_n(&sendv) >= NC_IOV_MAX || nsend >= limit) { break; } msg = conn->send_next(ctx, conn); if (msg == NULL) { break; } } //log_debug(LOG_VVERB, "conn->client: %d, len_sendv: %d nsend: %d", conn->client, array_n(&sendv), nsend); ASSERT(!TAILQ_EMPTY(&send_msgq) && nsend != 0); conn->smsg = NULL; n = conn_sendv(conn, &sendv, nsend);
conn_send send no more than NC_IOV_MAX(128) pieces in this function,
if the first fragment of MULTI-DEL response is send on last batch. and this is the last msg in send queue, the next call of msg_send_chain will got nsend == 0:
following img show such a case:
the test case with test-twemproxy:
def test_multi_delete_20140525(): conn = redis.Redis('127.0.0.5', 4100) cnt = 126 keys = ['key-%s'%i for i in range(cnt)] pipe = conn.pipeline(transaction=False) pipe.mget(keys) pipe.delete(*keys) print pipe.execute()
see: https://github.com/idning/test-twemproxy/blob/master/test_redis/test_del.py#L56-L63
we do not need this comment here:
ASSERT(!TAILQ_EMPTY(&send_msgq) && nsend != 0);
and we should eat all the fragment of MULTI-DEL in msg_send_chain
current master (8a4f5c0)can not pass following cases:
test_redis.test_mget_mset.test_mget_special_key ... ERROR test_redis.test_mget_mset.test_mget_special_key_2 ... ERROR