The initial cwnd is 2:
    ning@ning-laptop ~/idning-source/kernel/linux-2.6.6/net/ipv4$ ack cwnd tcp_ipv4.c
    2071: tp->snd_cwnd = 2;
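The initial RTO is 3 s, which is why 3-second delays show up so often. From /home/ning/idning-source/kernel/linux-3.6.2/Documentation/networking/ip-sysctl.txt (note that the rto_* entries quoted here are actually the SCTP sysctls under /proc/sys/net/sctp/; for TCP these values are compile-time constants such as TCP_TIMEOUT_INIT, and in kernels 3.1 and later TCP_TIMEOUT_INIT is 1 s per RFC 6298, with 3 s kept only as a fallback):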
    rto_initial - INTEGER
        The initial round trip timeout value in milliseconds that will be used
        in calculating round trip times. This is the initial time interval
        for retransmissions.
        Default: 3000

    #define TCP_TIMEOUT_INIT ((unsigned)(3*HZ))  /* RFC 1122 initial RTO value */

    rto_max - INTEGER
        The maximum value (in milliseconds) of the round trip timeout. This
        is the largest time interval that can elapse between retransmissions.
        Default: 60000

    rto_min - INTEGER
        The minimum value (in milliseconds) of the round trip timeout. This
        is the smallest time interval the can elapse between retransmissions.
        Default: 1000
Looking at the code:
    /* On each retransmission the RTO is doubled (x2), capped at TCP_RTO_MAX: */
    icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);

    /* Calculate rto without backoff.  This is the second half of Van
     * Jacobson's routine referred to above.
     */
    static inline void tcp_set_rto(struct sock *sk)
    {
        const struct tcp_sock *tp = tcp_sk(sk);
        /* Old crap is replaced with new one. 8)
         *
         * More seriously:
         * 1. If rtt variance happened to be less 50msec, it is hallucination.
         *    It cannot be less due to utterly erratic ACK generation made
         *    at least by solaris and freebsd. "Erratic ACKs" has _nothing_
         *    to do with delayed acks, because at cwnd>2 true delack timeout
         *    is invisible. Actually, Linux-2.4 also generates erratic
         *    ACKs in some circumstances.
         */
        inet_csk(sk)->icsk_rto = __tcp_set_rto(tp);

        /* 2. Fixups made earlier cannot be right.
         *    If we do not estimate RTO correctly without them,
         *    all the algo is pure shit and should be replaced
         *    with correct one. It is exactly, which we pretend to do.
         */

        /* NOTE: clamping at TCP_RTO_MIN is not required, current algo
         * guarantees that rto is higher.
         */
        tcp_bound_rto(sk);
    }
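To see why one lost segment produces the familiar 3 s stall, and why repeated losses get much worse, here is a small user-space sketch of the doubling rule above. It assumes an initial RTO of 3 s (the 2.6-era TCP_TIMEOUT_INIT) and a cap of 120 s (assumed here to match the kernel's TCP_RTO_MAX); the macro and function names are hypothetical, and only the backoff itself is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values in milliseconds: 3 s initial RTO (2.6-era
 * TCP_TIMEOUT_INIT) and a 120 s cap (assumed TCP_RTO_MAX). */
#define INIT_RTO_MS 3000u
#define MAX_RTO_MS  120000u

/* RTO armed before retransmission number n (n = 0 is the first timeout).
 * Each timeout doubles the RTO and clamps it at max_ms, mirroring
 * icsk_rto = min(icsk_rto << 1, TCP_RTO_MAX). */
static uint32_t backoff_rto_ms(uint32_t init_ms, uint32_t max_ms, unsigned n)
{
    uint32_t rto = init_ms < max_ms ? init_ms : max_ms;
    for (unsigned i = 0; i < n; i++) {
        uint32_t doubled = rto << 1;
        rto = doubled < max_ms ? doubled : max_ms;
    }
    return rto;
}

/* Time (ms) from the original transmission until retransmission number k
 * (k >= 1) is sent, assuming every earlier attempt was lost too. */
static uint32_t retransmit_at_ms(uint32_t init_ms, uint32_t max_ms, unsigned k)
{
    assert(k >= 1);
    uint32_t t = 0;
    for (unsigned i = 0; i < k; i++)
        t += backoff_rto_ms(init_ms, max_ms, i);
    return t;
}
```

With these values the retransmissions go out 3 s, 9 s and 21 s after the original send, and from the 7th retransmission on every wait is the 120 s cap.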
Computing the current RTO:
    static inline u32 __tcp_set_rto(const struct tcp_sock *tp)
    {
        return (tp->srtt >> 3) + tp->rttvar;
    }
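__tcp_set_rto() works on scaled fixed-point state: srtt is stored multiplied by 8 (hence the >> 3), and rttvar already holds roughly four times the mean deviation, so the sum corresponds to the textbook RTO = SRTT + 4 * RTTVAR. Below is a floating-point sketch of that textbook estimator (RFC 6298, alpha = 1/8, beta = 1/4, clock-granularity term omitted). The struct and function names are hypothetical, and the 200 ms floor and 120 s ceiling used in the example are assumptions modeled on Linux's TCP_RTO_MIN/TCP_RTO_MAX rather than anything shown above.

```c
#include <assert.h>

/* Hypothetical user-space RTO estimator following RFC 6298. */
struct rto_est {
    int    have_sample; /* 0 until the first RTT measurement */
    double srtt_ms;     /* smoothed RTT */
    double rttvar_ms;   /* RTT variation (mean deviation) */
    double rto_ms;      /* current retransmission timeout */
};

static void rto_init(struct rto_est *e, double initial_rto_ms)
{
    e->have_sample = 0;
    e->srtt_ms = 0.0;
    e->rttvar_ms = 0.0;
    e->rto_ms = initial_rto_ms; /* used until the first sample arrives */
}

/* Feed one RTT measurement r_ms and recompute the RTO,
 * clamped to [min_ms, max_ms]. */
static void rto_sample(struct rto_est *e, double r_ms, double min_ms, double max_ms)
{
    assert(r_ms > 0.0 && min_ms <= max_ms);
    if (!e->have_sample) {
        /* First measurement: SRTT = R, RTTVAR = R/2, so RTO = 3R. */
        e->srtt_ms = r_ms;
        e->rttvar_ms = r_ms / 2.0;
        e->have_sample = 1;
    } else {
        /* RTTVAR is updated using the old SRTT, then SRTT itself. */
        double err = r_ms - e->srtt_ms;
        if (err < 0.0)
            err = -err;
        e->rttvar_ms = 0.75 * e->rttvar_ms + 0.25 * err; /* beta = 1/4 */
        e->srtt_ms = 0.875 * e->srtt_ms + 0.125 * r_ms;  /* alpha = 1/8 */
    }
    double rto = e->srtt_ms + 4.0 * e->rttvar_ms;
    if (rto < min_ms)
        rto = min_ms;
    if (rto > max_ms)
        rto = max_ms;
    e->rto_ms = rto;
}
```

For example, a first sample of 1000 ms gives SRTT = 1000, RTTVAR = 500 and RTO = 3000 ms; a 30 ms cross-data-center RTT gives 90 ms, which the assumed 200 ms floor raises to 200 ms.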
tcp_set_rto is called from here:
    static void tcp_valid_rtt_meas(struct sock *sk, u32 seq_rtt)
    {
        tcp_rtt_estimator(sk, seq_rtt);
        tcp_set_rto(sk);
        inet_csk(sk)->icsk_backoff = 0;
    }

    /* Read draft-ietf-tcplw-high-performance before mucking
     * with this code. (Supersedes RFC1323)
     */
    static void tcp_ack_saw_tstamp(struct sock *sk, int flag)
    {
        /* RTTM Rule: A TSecr value received in a segment is used to
         * update the averaged RTT measurement only if the segment
         * acknowledges some new data, i.e., only if it advances the
         * left edge of the send window.
         *
         * See draft-ietf-tcplw-high-performance-00, section 3.3.
         * 1998/04/10 Andrey V. Savochkin <saw@msu.ru>
         *
         * Changed: reset backoff as soon as we see the first valid sample.
         * If we do not, we get strongly overestimated rto. With timestamps
         * samples are accepted even from very old segments: f.e., when rtt=1
         * increases to 8, we retransmit 5 times and after 8 seconds delayed
         * answer arrives rto becomes 120 seconds! If at least one of segments
         * in window is lost... Voila.                      --ANK (010210)
         */
        struct tcp_sock *tp = tcp_sk(sk);

        tcp_valid_rtt_meas(sk, tcp_time_stamp - tp->rx_opt.rcv_tsecr);
    }
Note:
    /* RTTM Rule: A TSecr value received in a segment is used to
     * update the averaged RTT measurement only if the segment
     * acknowledges some new data, i.e., only if it advances the
     * left edge of the send window.
     */
In other words:
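    A =SYN=> B
    B =SYN=> A
    A =PSH=> B
    B =ack=> A
    A =PSH=> B
    B =ack=> A

Now B sends data to A for the first time. Every segment A has sent so far carries the same acknowledgment number, so none of them advanced B's send window and B has no RTT sample yet. If this packet is lost, B's RTO is therefore still the initial 3 s:

    B =PSH=> A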
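The RTTM rule boils down to one check on every incoming ACK: does it move snd_una forward? The sketch below uses hypothetical struct and function names, models only the sampling decision (not the RTO update), and assumes, as in the scenario above, that B leaves the handshake without an RTT sample. It shows why B is still on its initial RTO when its first data segment is lost.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, stripped-down sender state: only what the RTTM rule needs. */
struct sender {
    uint32_t snd_una;     /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;     /* next sequence number to be sent */
    unsigned rtt_samples; /* RTT samples taken so far */
    uint32_t last_rtt_ms; /* most recent sample */
    uint32_t rto_ms;      /* stays at its initial value without samples */
};

/* Sequence-space comparison that tolerates 32-bit wraparound,
 * in the style of the kernel's after(). */
static int seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Handle an ACK whose timestamp echo is tsecr_ms, received at now_ms.
 * Returns 1 if the ACK was used as an RTT sample. */
static int on_ack(struct sender *s, uint32_t ack, uint32_t now_ms, uint32_t tsecr_ms)
{
    /* RTTM rule: only an ACK that acknowledges new data, i.e. advances
     * snd_una (without acking data never sent), yields a sample. */
    if (!seq_after(ack, s->snd_una) || seq_after(ack, s->snd_nxt))
        return 0;
    s->snd_una = ack;
    s->last_rtt_ms = now_ms - tsecr_ms;
    s->rtt_samples++;
    return 1;
}
```

In B's situation snd_una equals snd_nxt, so each ACK from A leaves snd_una unchanged, rtt_samples stays 0, and a lost first data segment is retransmitted only after the initial RTO.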
Do the 5-second pauses correspond with Zero Window announcements from the client?
Bandwidth Delay Product (BDP), i.e. bandwidth x RTT.
Typical ping output within the same data center:
    $ ping 10.65.19.52
    PING 10.65.19.52 (10.65.19.52) 56(84) bytes of data.
    64 bytes from 10.65.19.52: icmp_seq=0 ttl=59 time=0.766 ms
    64 bytes from 10.65.19.52: icmp_seq=1 ttl=59 time=0.746 ms
    64 bytes from 10.65.19.52: icmp_seq=2 ttl=59 time=0.884 ms
    64 bytes from 10.65.19.52: icmp_seq=3 ttl=59 time=0.747 ms
Across data centers:
    $ ping 10.220.109.45
    PING 10.220.109.45 (10.220.109.45) 56(84) bytes of data.
    64 bytes from 10.220.109.45: icmp_seq=0 ttl=58 time=28.8 ms
    64 bytes from 10.220.109.45: icmp_seq=1 ttl=58 time=28.7 ms
    64 bytes from 10.220.109.45: icmp_seq=2 ttl=58 time=28.6 ms
    64 bytes from 10.220.109.45: icmp_seq=3 ttl=58 time=28.6 ms
The cross-data-center case is a typical high-latency scenario.
Formula 1:
BDP = link_bandwidth * RTT
The TCP window size should be set to the BDP or 2 x BDP. Taking window_size = BDP gives formula 2:
throughput = window_size / RTT
For the case above, on a gigabit network the bandwidth is 125 MB/s:
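BDP = 1000 Mbps * 0.030 s / 8 = 3.75 MB, so the window should be set to about 3.75 MB. The default TCP window on Linux 2.6 is 110 KB; by formula 2, throughput = window_size / RTT = 110 KB / 0.03 s ≈ 3.6 MB/s. With a 3.75 MB window, formula 2 gives throughput = window_size / RTT = 3.75 MB / 0.03 s = 125 MB/s, i.e. the full link bandwidth.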
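Formulas 1 and 2 are easy to check with a few lines of code. This sketch uses decimal units (1 Mbps = 10^6 bit/s) together with the 30 ms RTT and 110 KB default window quoted above; the function names are hypothetical.

```c
#include <assert.h>

/* Formula 1: bandwidth-delay product in bytes for a link of `mbps`
 * megabits per second and a round-trip time of `rtt_s` seconds. */
static double bdp_bytes(double mbps, double rtt_s)
{
    assert(mbps > 0.0 && rtt_s > 0.0);
    return mbps * 1e6 * rtt_s / 8.0;
}

/* Formula 2: at most one window can be in flight per RTT, so a
 * window-limited connection cannot exceed window / RTT bytes per second. */
static double max_throughput_bytes_per_s(double window_bytes, double rtt_s)
{
    assert(window_bytes > 0.0 && rtt_s > 0.0);
    return window_bytes / rtt_s;
}
```

bdp_bytes(1000, 0.030) is 3.75e6 bytes. A 110 KB window (110 * 1024 bytes) over 30 ms caps out at about 3.75e6 bytes/s (roughly 3.6 MiB/s) regardless of link speed, while a 3.75e6-byte window allows 1.25e8 bytes/s, the full gigabit rate.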
A short connection here means one that transfers little data, so the connection is often closed before slow start has even finished.
Start a web server on 10.220.109.45, port 8080, and download a 256K file from it on 10.36.115.52.
Sender: 10.220.109.45.8080 (passive open)
Receiver: 10.36.115.52 (active open)
Capture on the sender (passive open; the first packet is the SYN it receives):
    11:12:08.045299 IP 10.36.115.52.55281 > 10.220.109.45.8080: S 2126508121:2126508121(0) win 5840 <mss 1460,sackOK,timestamp 1890669490 0,nop,wscale 7>
    11:12:08.045376 IP 10.220.109.45.8080 > 10.36.115.52.55281: S 2227406332:2227406332(0) ack 2126508122 win 5792 <mss 1460,sackOK,timestamp 1396920409 1890669490,nop,wscale 7>
    11:12:08.074157 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 1 win 46 <nop,nop,timestamp 1890669518 1396920409>
    11:12:08.074233 IP 10.36.115.52.55281 > 10.220.109.45.8080: P 1:213(212) ack 1 win 46 <nop,nop,timestamp 1890669518 1396920409> : client sends the request (GET xxx)
    11:12:08.074239 IP 10.220.109.45.8080 > 10.36.115.52.55281: . ack 213 win 54 <nop,nop,timestamp 1396920438 1890669518> : server acknowledges the request
    11:12:08.074647 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 1:1449(1448) ack 213 win 54 <nop,nop,timestamp 1396920439 1890669518>
    11:12:08.074652 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 1449:2897(1448) ack 213 win 54 <nop,nop,timestamp 1396920439 1890669518>
    11:12:08.074655 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 2897:4345(1448) ack 213 win 54 <nop,nop,timestamp 1396920439 1890669518> : 3 packets sent back to back because init_cwnd=3; now waiting
    11:12:08.103542 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 1449 win 69 <nop,nop,timestamp 1890669548 1396920439> : 1 ACK received, cwnd += 1 => 4
    11:12:08.103548 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 4345:5793(1448) ack 213 win 54 <nop,nop,timestamp 1396920468 1890669548>
    11:12:08.103550 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 5793:7241(1448) ack 213 win 54 <nop,nop,timestamp 1396920468 1890669548>
    11:12:08.103591 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 2897 win 91 <nop,nop,timestamp 1890669548 1396920439> : 1 ACK received, cwnd += 1 => 5
    11:12:08.103595 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 7241:8689(1448) ack 213 win 54 <nop,nop,timestamp 1396920468 1890669548>
    11:12:08.103598 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 8689:10137(1448) ack 213 win 54 <nop,nop,timestamp 1396920468 1890669548>
    11:12:08.103639 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 4345 win 114 <nop,nop,timestamp 1890669548 1396920439> : 1 ACK received, cwnd += 1 => 6
    11:12:08.103643 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 10137:11585(1448) ack 213 win 54 <nop,nop,timestamp 1396920468 1890669548>
    11:12:08.103645 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 11585:13033(1448) ack 213 win 54 <nop,nop,timestamp 1396920468 1890669548> : 6 packets sent back to back
    11:12:08.132462 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 5793 win 137 <nop,nop,timestamp 1890669577 1396920468> : 1 ACK received, cwnd += 1 => 7
    11:12:08.132467 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 13033:14481(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.132469 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 14481:15929(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.132558 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 7241 win 159 <nop,nop,timestamp 1890669577 1396920468> : 1 ACK received, cwnd += 1 => 8
    11:12:08.132563 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 15929:17377(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.132565 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 17377:18825(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.132654 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 8689 win 182 <nop,nop,timestamp 1890669577 1396920468> : 1 ACK received, cwnd += 1 => 9
    11:12:08.132658 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 18825:20273(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.132661 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 20273:21721(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.132663 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 10137 win 204 <nop,nop,timestamp 1890669577 1396920468> : 1 ACK received, cwnd += 1 => 10
    11:12:08.132666 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 21721:23169(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.132668 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 23169:24617(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.132670 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 11585 win 227 <nop,nop,timestamp 1890669577 1396920468> : 1 ACK received, cwnd += 1 => 11
    11:12:08.132673 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 24617:26065(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.132675 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 26065:27513(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.132678 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 13033 win 250 <nop,nop,timestamp 1890669577 1396920468> : 1 ACK received, cwnd += 1 => 12
    11:12:08.132680 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 27513:28961(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.132683 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 28961:30409(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577> : 12 packets sent back to back within about 0.2 ms
    11:12:08.161358 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 14481 win 272 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.161362 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 30409:31857(1448) ack 213 win 54 <nop,nop,timestamp 1396920525 1890669606>
    11:12:08.161365 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 31857:33305(1448) ack 213 win 54 <nop,nop,timestamp 1396920525 1890669606>
    11:12:08.161406 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 15929 win 295 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.161409 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 33305:34753(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161412 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 34753:36201(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161502 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 17377 win 318 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.161506 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 36201:37649(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161509 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 37649:39097(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161511 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 18825 win 340 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.161514 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 39097:40545(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161516 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 40545:41993(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161597 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 20273 win 363 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.161602 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 41993:43441(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161604 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 43441:44889(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161606 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 21721 win 385 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.161609 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 44889:46337(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161611 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 46337:47785(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161693 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 23169 win 408 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.161697 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 47785:49233(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161700 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 49233:50681(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161702 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 24617 win 431 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.161705 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 50681:52129(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161708 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 52129:53577(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161710 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 26065 win 453 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.161713 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 53577:55025(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161715 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 55025:56473(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161717 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 27513 win 476 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.161720 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 56473:57921(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161722 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 57921:59369(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161725 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 28961 win 499 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.161727 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 59369:60817(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161730 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 60817:62265(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161732 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 30409 win 521 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.161735 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 62265:63713(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.161737 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 63713:65161(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606> : 24 packets sent back to back within about 0.4 ms
    11:12:08.190703 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 31857 win 544 <nop,nop,timestamp 1890669635 1396920525>
    11:12:08.190708 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 65161:66609(1448) ack 213 win 54 <nop,nop,timestamp 1396920555 1890669635>
    ...
Capture on the receiver (active open; it sends the SYN first):
    11:12:08.060202 IP 10.36.115.52.55281 > 10.220.109.45.8080: S 2126508121:2126508121(0) win 5840 <mss 1460,sackOK,timestamp 1890669490 0,nop,wscale 7>
    11:12:08.089056 IP 10.220.109.45.8080 > 10.36.115.52.55281: S 2227406332:2227406332(0) ack 2126508122 win 5792 <mss 1460,sackOK,timestamp 1396920409 1890669490,nop,wscale 7>
    11:12:08.089069 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 1 win 46 <nop,nop,timestamp 1890669518 1396920409>
    11:12:08.089133 IP 10.36.115.52.55281 > 10.220.109.45.8080: P 1:213(212) ack 1 win 46 <nop,nop,timestamp 1890669518 1396920409> : request sent
    11:12:08.118014 IP 10.220.109.45.8080 > 10.36.115.52.55281: . ack 213 win 54 <nop,nop,timestamp 1396920438 1890669518> : ACK arrives one RTT later
    11:12:08.118448 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 1:1449(1448) ack 213 win 54 <nop,nop,timestamp 1396920439 1890669518> : the response arrives right behind it
    11:12:08.118452 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 1449 win 69 <nop,nop,timestamp 1890669548 1396920439> : each segment is ACKed immediately, but the other side only receives the ACK about 15 ms later
    11:12:08.118493 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 1449:2897(1448) ack 213 win 54 <nop,nop,timestamp 1396920439 1890669518>
    11:12:08.118497 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 2897 win 91 <nop,nop,timestamp 1890669548 1396920439>
    11:12:08.118529 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 2897:4345(1448) ack 213 win 54 <nop,nop,timestamp 1396920439 1890669518> : 3 packets received
    11:12:08.118532 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 4345 win 114 <nop,nop,timestamp 1890669548 1396920439>
    11:12:08.147370 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 4345:5793(1448) ack 213 win 54 <nop,nop,timestamp 1396920468 1890669548>
    11:12:08.147378 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 5793 win 137 <nop,nop,timestamp 1890669577 1396920468>
    11:12:08.147444 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 5793:7241(1448) ack 213 win 54 <nop,nop,timestamp 1396920468 1890669548>
    11:12:08.147446 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 7241 win 159 <nop,nop,timestamp 1890669577 1396920468>
    11:12:08.147499 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 7241:8689(1448) ack 213 win 54 <nop,nop,timestamp 1396920468 1890669548>
    11:12:08.147516 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 8689:10137(1448) ack 213 win 54 <nop,nop,timestamp 1396920468 1890669548>
    11:12:08.147520 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 8689 win 182 <nop,nop,timestamp 1890669577 1396920468>
    11:12:08.147522 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 10137 win 204 <nop,nop,timestamp 1890669577 1396920468>
    11:12:08.147563 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 10137:11585(1448) ack 213 win 54 <nop,nop,timestamp 1396920468 1890669548>
    11:12:08.147566 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 11585 win 227 <nop,nop,timestamp 1890669577 1396920468>
    11:12:08.147569 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 11585:13033(1448) ack 213 win 54 <nop,nop,timestamp 1396920468 1890669548> : 6 packets received
    11:12:08.147572 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 13033 win 250 <nop,nop,timestamp 1890669577 1396920468>
    11:12:08.176267 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 13033:14481(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.176274 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 14481 win 272 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.176308 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 14481:15929(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.176311 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 15929 win 295 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.176384 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 15929:17377(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.176388 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 17377 win 318 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.176391 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 17377:18825(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.176393 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 18825 win 340 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.176460 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 18825:20273(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.176463 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 20273 win 363 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.176493 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 20273:21721(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.176496 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 21721 win 385 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.176581 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 21721:23169(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.176584 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 23169 win 408 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.176587 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 23169:24617(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.176589 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 24617 win 431 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.176592 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 24617:26065(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.176595 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 26065 win 453 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.176597 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 26065:27513(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.176599 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 27513 win 476 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.176602 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 27513:28961(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577>
    11:12:08.176604 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 28961 win 499 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.176607 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 28961:30409(1448) ack 213 win 54 <nop,nop,timestamp 1396920497 1890669577> : 12 packets received
    11:12:08.176609 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 30409 win 521 <nop,nop,timestamp 1890669606 1396920497>
    11:12:08.205168 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 30409:31857(1448) ack 213 win 54 <nop,nop,timestamp 1396920525 1890669606>
    11:12:08.205174 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 31857 win 544 <nop,nop,timestamp 1890669635 1396920525>
    11:12:08.205226 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 31857:33305(1448) ack 213 win 54 <nop,nop,timestamp 1396920525 1890669606>
    11:12:08.205230 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 33305 win 566 <nop,nop,timestamp 1890669635 1396920525>
    11:12:08.205248 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 33305:34753(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205250 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 34753 win 589 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205293 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 34753:36201(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205296 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 36201 win 612 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205312 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 36201:37649(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205315 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 37649 win 634 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205367 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 37649:39097(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205369 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 39097 win 657 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205372 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 39097:40545(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205374 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 40545 win 680 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205376 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 40545:41993(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205378 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 41993 win 702 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205423 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 41993:43441(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205425 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 43441 win 725 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205433 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 43441:44889(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205436 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 44889 win 747 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205481 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 44889:46337(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205484 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 46337 win 770 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205542 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 46337:47785(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205545 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 47785 win 793 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205547 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 47785:49233(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205549 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 49233 win 815 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205569 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 49233:50681(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205572 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 50681 win 838 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205574 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 50681:52129(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205576 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 52129 win 861 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205579 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 52129:53577(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205581 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 53577 win 883 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205616 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 53577:55025(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205618 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 55025 win 906 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205661 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 55025:56473(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205664 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 56473 win 928 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205667 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 56473:57921(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205669 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 57921 win 951 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205671 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 57921:59369(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205674 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 59369 win 974 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205676 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 59369:60817(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205678 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 60817 win 996 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205680 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 60817:62265(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205683 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 62265 win 1019 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205685 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 62265:63713(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606>
    11:12:08.205687 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 63713 win 1042 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.205723 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 63713:65161(1448) ack 213 win 54 <nop,nop,timestamp 1396920526 1890669606> : 24 packets received
    11:12:08.205725 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 65161 win 1064 <nop,nop,timestamp 1890669635 1396920526>
    11:12:08.234511 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 65161:66609(1448) ack 213 win 54 <nop,nop,timestamp 1396920555 1890669635>
    11:12:08.234517 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 66609 win 1087 <nop,nop,timestamp 1890669664 1396920555>
    11:12:08.234552 IP 10.220.109.45.8080 > 10.36.115.52.55281: . 66609:68057(1448) ack 213 win 54 <nop,nop,timestamp 1396920555 1890669635>
    11:12:08.234554 IP 10.36.115.52.55281 > 10.220.109.45.8080: . ack 68057 win 1109 <nop,nop,timestamp 1890669664 1396920555>
    11:12:08.234619 IP 10.220.109.45.8080 > 10.36.115.52.55281: P 68057:69505(1448) ack 213 win 54 <nop,nop,timestamp 1396920555 1890669635>
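Within a burst the sender emits roughly 60 packets per millisecond (12 packets in about 0.2 ms, 24 in about 0.4 ms, judging by the timestamps above). If the congestion window were not the limit, a single 30 ms RTT could carry about 60 * 30 = 1800 packets, i.e. 1448 * 1800 = 2606400 bytes (about 2.6 MB), roughly ten times the whole 256K file.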
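The win field of the sender's packets in the tcpdump output above stays at 54. This is due to window scaling: the handshake packets carry wscale 7 (the SYN for the client's windows, the SYN-ACK for the server's), meaning a scale factor of 2^7 = 128, so the real window is 54 * 128 = 6912 bytes.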
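Window scaling is just a left shift. The minimal sketch below (the helper name is hypothetical) reproduces these numbers. It assumes the shift count is the one the segment's sender announced in its own SYN or SYN-ACK, and that the segment is not itself a SYN, because the window field of SYN segments is never scaled (RFC 7323).

```c
#include <assert.h>
#include <stdint.h>

/* Effective window in bytes for a non-SYN segment: the 16-bit win field
 * shifted left by the wscale value its sender announced during the
 * handshake. RFC 7323 says a shift count above 14 is treated as 14. */
static uint32_t effective_window(uint16_t win_field, unsigned wscale)
{
    if (wscale > 14)
        wscale = 14;
    return (uint32_t)win_field << wscale;
}
```

effective_window(54, 7) gives the sender's 6912 bytes, and effective_window(46, 7) gives 5888 bytes, the receiver's window in its first ACK.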
We can see that the receiver's win value starts at 5888 (46 * 128) and then grows steadily.
Question 1: How does this initial win value relate to the tcp_rmem size? Our default rmem is set to 2.5M (a program can change it with setsockopt(SO_RCVBUF), but curl does not appear to do so).
    $ cat /proc/sys/net/ipv4/tcp_rmem
    4096    2500000 5000000
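For reference, this is how a program overrides the tcp_rmem default for a single socket with setsockopt(SO_RCVBUF). It is a Linux-oriented sketch with a hypothetical helper name. According to socket(7), the kernel doubles the requested value to leave room for bookkeeping overhead, caps it by net.core.rmem_max, and getsockopt() reports the doubled value. According to tcp(7), the size must be set before connect() or listen() to affect the window scale chosen for the connection. On Linux, setting SO_RCVBUF explicitly also disables receive-buffer autotuning for that socket.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket, request a receive buffer of `bytes`, and return
 * the size the kernel actually applied (Linux reports the doubled value),
 * or -1 on error. On success the descriptor is stored in *out_fd.
 * Call this before connect()/listen() so the SYN advertises a matching
 * window scale. */
static int tcp_socket_with_rcvbuf(int bytes, int *out_fd)
{
    assert(bytes > 0 && out_fd != NULL);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) < 0) {
        close(fd);
        return -1;
    }

    int actual = 0;
    socklen_t len = sizeof actual;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0) {
        close(fd);
        return -1;
    }

    *out_fd = fd;
    return actual;
}
```

On a typical Linux machine, requesting 65536 bytes reports 131072. The exact figure depends on net.core.rmem_max.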
Question 1.1: How are /proc/sys/net/ipv4/tcp_rmem and /proc/sys/net/core/rmem_default related?
The documentation says that this value overrides net.core.rmem_default:
    tcp_rmem - vector of 3 INTEGERs: min, default, max
        min: Minimal size of receive buffer used by TCP sockets.
        It is guaranteed to each TCP socket, even under moderate memory
        pressure.
        Default: 1 page

        default: initial size of receive buffer used by TCP sockets.
        This value overrides net.core.rmem_default used by other protocols.
        Default: 87380 bytes. This value results in window of 65535 with
        default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit
        less for default tcp_app_win. See below about these variables.
    ning@ning-laptop ~/idning-source/kernel/linux-2.6.9/net/ipv4$ ack tcp_rmem
    tcp_ipv4.c
    2086: sk->sk_rcvbuf = sysctl_tcp_rmem[1];
    2621: .sysctl_rmem = sysctl_tcp_rmem,

    tcp.c
    281:int sysctl_tcp_rmem[3] = { 4 * 1024, 87380, 87380 * 2 };
    284:EXPORT_SYMBOL(sysctl_tcp_rmem);
    2296: sysctl_tcp_rmem[0] = PAGE_SIZE;
    2297: sysctl_tcp_rmem[1] = 43689;
    2298: sysctl_tcp_rmem[2] = 2 * 43689;

    sysctl_net_ipv4.c
    526: .procname = "tcp_rmem",
    527: .data = &sysctl_tcp_rmem,
    528: .maxlen = sizeof(sysctl_tcp_rmem),
Question 2: Why does it keep growing?
For the 256K file, the average speed turns out to be far lower than when transferring a large file:
    $ time wget 'http://10.220.109.45:8080/30M.data'
    real    0m1.228s
    user    0m0.008s
    sys     0m0.093s

    speed = 30M/1.228 = 24MB/s

    $ time wget 'http://10.220.109.45:8080/256K.data'
    real    0m0.208s
    user    0m0.001s
    sys     0m0.004s

    speed = 256K/0.2 = 1.28MB/s
Short connection:
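    time    sender                                              receiver
    00ms                                                        sends SYN
    15ms    receives SYN, sends back SYN-ACK
    30ms                                                        receives SYN-ACK, sends request
    45ms    receives request, sends data (3 packets initially)
    60ms                                                        receives 3 packets
    90ms                                                        receives 6 packets
    120ms                                                       receives 12 packets
    150ms                                                       receives 24 packets
    180ms                                                       receives 48 packets
    210ms                                                       receives 89 packets; all 182 packets received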
Long (persistent) connection:
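    time    sender                                              receiver
    00ms                                                        (handshake already done) sends request
    15ms    receives request, sends data (assumed 96 packets)
    30ms                                                        receives 96 packets
    60ms                                                        receives 96 packets; transfer complete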
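Both timelines can be reproduced by counting flights. The sketch below (function names are hypothetical) assumes no loss, one flight per RTT, a cwnd that doubles every RTT up to an optional cap, and the 1448-byte segments seen in the capture. It idealizes slow start and does not model the kernel's per-ACK accounting.

```c
#include <assert.h>

/* Number of segments needed for `bytes` of payload with `mss`-byte segments. */
static int segments_for(int bytes, int mss)
{
    assert(bytes >= 0 && mss > 0);
    return (bytes + mss - 1) / mss;
}

/* Round trips needed to deliver `total` segments when the first flight is
 * `init_cwnd` segments and cwnd doubles every RTT, never exceeding
 * max_cwnd (for example a receive-window limit). Assumes no losses. */
static int slow_start_rounds(int total, int init_cwnd, int max_cwnd)
{
    assert(total >= 0 && init_cwnd > 0 && max_cwnd >= init_cwnd);
    int rounds = 0, sent = 0, cwnd = init_cwnd;
    while (sent < total) {
        sent += cwnd; /* one full flight goes out per RTT */
        rounds++;
        cwnd = (cwnd <= max_cwnd / 2) ? 2 * cwnd : max_cwnd;
    }
    return rounds;
}
```

segments_for(256 * 1024, 1448) is 182. The short connection needs slow_start_rounds(182, 3, 1 << 20) = 6 data round trips plus one for the handshake, i.e. 7 * 30 ms = 210 ms. The long connection needs slow_start_rounds(182, 96, 96) = 2 round trips, i.e. 60 ms.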
Once slow start has grown cwnd to its limit, the speed levels off at a steady value.
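(Note that a retransmission timeout puts the connection back into slow start. Among loss events, only a timeout does this; loss repaired by fast retransmit does not. Separately, a connection that has been idle for a long time also restarts slow start, which newer kernels control with net.ipv4.tcp_slow_start_after_idle.)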
    /proc/sys/net/ipv4/tcp_vegas_alpha
    /proc/sys/net/ipv4/tcp_vegas_beta
    /proc/sys/net/ipv4/tcp_vegas_cong_avoid
    /proc/sys/net/ipv4/tcp_vegas_gamma