Permalink: 2014-02-11 10:29:10 by ning in redis tags: all

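Three ways to expand capacity:

  1. Migrate a redis instance managed by redis-mgr to a machine with more memory.
  2. Build a new cluster and migrate the data to it.
  3. Similar to 2: migrate one business's data (keys with a given prefix) from the old cluster to the new one.

This post covers approach 1.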

1   Two methods:

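  • Cold migration
    • Steps
      1. Copy the rdb/aof files.
      2. Set up the new master-slave pair.
      3. Update the twemproxy config and restart it.
    • Problem
      • Writes that reach the old master/slave after the files are copied are lost.
  • Hot migration
    • Scenarios:
      1. Cluster maintenance (e.g. taking some machines offline).
      2. Cluster expansion, e.g. 2*32 instances originally deployed on 16 machines can be spread out to 32 or 64 machines.
    • Steps, assuming the old master/slave are m and s, and the new ones are n1 and n2:
      1. Set up n1, run n1 SLAVEOF m, and wait for the sync to finish.
      2. Kill s.
      3. Kill m; n1 becomes the master.
      4. Set up n2 and run n2 SLAVEOF n1.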
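The hot-migration steps above can be sketched as a small plan generator. This is a minimal illustration, not redis-mgr code: `Instance` and `hot_migration_plan` are hypothetical names, the addresses in the usage are made up, and the `wait`/`kill` lines are placeholders for operator actions. Only the `redis-cli -h ... -p ... SLAVEOF host port` lines are real redis-cli syntax.

```python
from typing import List, NamedTuple


class Instance(NamedTuple):
    """host:port of a redis instance (hypothetical helper type)."""
    host: str
    port: int


def hot_migration_plan(m: Instance, s: Instance,
                       n1: Instance, n2: Instance) -> List[str]:
    """List the hot-migration steps for old master/slave m, s and new n1, n2.

    Lines starting with 'redis-cli' are real redis-cli invocations; the
    'wait' and 'kill' lines stand for operator actions.
    """
    cli = "redis-cli -h {0.host} -p {0.port}"
    return [
        # 1. n1 replicates the old master; wait for the initial sync.
        f"{cli.format(n1)} SLAVEOF {m.host} {m.port}",
        f"wait until {n1.host}:{n1.port} reports master_link_status:up",
        # 2. Stop the old slave first, so n1 is the only replica left.
        f"kill {s.host}:{s.port}",
        # 3. Stop the old master; n1 has to become master
        #    (e.g. via a sentinel failover, or SLAVEOF NO ONE by hand).
        f"kill {m.host}:{m.port}",
        # 4. n2 replicates the new master.
        f"{cli.format(n2)} SLAVEOF {n1.host} {n1.port}",
    ]


if __name__ == "__main__":
    old_m, old_s = Instance("10.0.0.1", 22000), Instance("10.0.0.2", 23000)
    new1, new2 = Instance("10.0.0.3", 22000), Instance("10.0.0.4", 23000)
    for step in hot_migration_plan(old_m, old_s, new1, new2):
        print(step)
```

Killing s before m leaves n1 as the only replica of m, so a sentinel failover would have only n1 to promote; that reading of the step order is an inference, not something stated above.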

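If the service is down and cannot be brought back up (it is unlikely that master and slave are both down for good), cold migration is the only option.

The rest of this post describes hot migration.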

2   How to do it in redis-mgr

  1. Write the operation in the config:

    cluster0 = {
        'migrate' : [
            'host1:port=>host2:port',
        ],
    }
    
  2. Edit the redis section of the cluster config directly.

  3. Use a command:

    ./bin/deploy.py cluster0 migrate xxxx  xxx
    

In the end I chose option 3.

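3   Implementation details: steps of master-slave sync

I killed a slave in the existing cluster, restarted it, and watched how the master and slave behave through the redis INFO command: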

  1. The slave starts and loads its local aof file. When loading finishes:

    used_memory_peak_human CHANGE FROM 2.32G to 2.33G
    
  2. It receives a SLAVEOF command from sentinel:

    master_host CHANGE FROM  to 127.0.0.5
    master_port CHANGE FROM  to 22000
    master_link_status CHANGE FROM  to down
    slave_repl_offset CHANGE FROM  to -1
    

    Meanwhile, on the master side:

    slave0 CHANGE FROM  to ip=127.0.0.5,port=23000,state=wait_bgsave,offset=0,lag=0
    

    The master starts a bgsave, and the slave waits for it (wait_bgsave).

  3. The master finishes the bgsave and starts sending data:

    master:

    slave0 CHANGE FROM ip=127.0.0.5,port=23000,state=wait_bgsave,offset=0,lag=0 to ip=127.0.0.5,port=23000,state=send_bulk,offset=0,lag=0
    
  4. After the rdb transfer completes, the slave loads the received data:

    master_sync_left_bytes CHANGE FROM 47165401 to 0
    mem_fragmentation_ratio CHANGE FROM 1.01 to 1.21
    used_memory_human CHANGE FROM 2.33G to 18.79M
    
    
    Then it starts loading the new db:
    used_memory_human CHANGE FROM 18.79M to 116.79M
    
  5. Loading completes:

    aof_rewrite_in_progress CHANGE FROM 0 to 1
    master_link_status CHANGE FROM down to up
    slave_repl_offset CHANGE FROM -1 to 995178543
    used_memory_human CHANGE FROM 2.29G to 2.33G
    loading CHANGE FROM 1 to 0
    

    At this point, on the master:

    slave0 CHANGE FROM ip=127.0.0.5,port=23000,state=send_bulk,offset=0,lag=0 to ip=127.0.0.5,port=23000,state=online,offset=0,lag=1
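The `X CHANGE FROM a to b` lines read like the output of a tool that polls INFO and diffs successive snapshots; that tool is not shown in the post. A minimal sketch of such a diff, with hypothetical names `parse_info` and `diff_info`, working on INFO text that has already been fetched (for example the output of `redis-cli INFO`):

```python
from typing import Dict, List


def parse_info(text: str) -> Dict[str, str]:
    """Parse redis INFO text into a flat dict of raw value strings.

    Section headers ('# Replication') and blank lines are skipped; each
    'key:value' line is split on the first ':' only, so values such as
    'slave0:ip=...,port=...' keep their content intact.
    """
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


def diff_info(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Report changed fields as 'key CHANGE FROM a to b', sorted by key.

    A field missing on one side is shown as an empty string.
    """
    changes = []
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key, ""), new.get(key, "")
        if before != after:
            changes.append(f"{key} CHANGE FROM {before} to {after}")
    return changes
```

Calling `diff_info` on two snapshots taken a second apart reproduces the style of the observations above, including the empty "FROM" side for fields that only appear once the instance becomes a slave.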
    

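So:

  1. Signs that the base data sync is complete
    • On the master, the corresponding slave shows state=online
    • On the slave, master_link_status = up
  2. Signs that real-time replication has caught up
    • On the master, the slave shows lag=0
    • The master's master_repl_offset equals the slave's slave_repl_offset
  3. How to get progress information:
    • slave memory / master memory
    • Compare db0:keys on master and slave.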
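The three checks above can be written as one function over INFO output already parsed into dicts (key to raw value string). A sketch under assumptions: `parse_kv`, `find_slave` and `sync_status` are hypothetical names, and the slave is located among the master's `slaveN` lines by its `ip` and `port`.

```python
from typing import Dict


def parse_kv(value: str) -> Dict[str, str]:
    """Split an INFO value like 'ip=1.2.3.4,port=23000,state=online' into a dict."""
    return dict(item.split("=", 1) for item in value.split(",") if "=" in item)


def find_slave(master_info: Dict[str, str], host: str, port: int) -> Dict[str, str]:
    """Return the master's slaveN fields for host:port, or {} if absent."""
    for key, value in master_info.items():
        # Match slave0, slave1, ... but not slave_repl_offset etc.
        if key.startswith("slave") and key[5:].isdigit():
            fields = parse_kv(value)
            if fields.get("ip") == host and fields.get("port") == str(port):
                return fields
    return {}


def sync_status(master_info: Dict[str, str], slave_info: Dict[str, str],
                host: str, port: int) -> Dict[str, object]:
    """Apply the base-sync, caught-up and progress checks to parsed INFO dicts."""
    slave = find_slave(master_info, host, port)
    # 1. Base data synced: master sees state=online, slave sees the link up.
    base_synced = (slave.get("state") == "online"
                   and slave_info.get("master_link_status") == "up")
    # 2. Caught up: lag=0 and the replication offsets agree.
    caught_up = (base_synced and slave.get("lag") == "0"
                 and master_info.get("master_repl_offset")
                 == slave_info.get("slave_repl_offset"))
    # 3. Progress: slave/master ratios of used_memory and db0 key count.
    master_keys = int(parse_kv(master_info.get("db0", "")).get("keys", 0))
    slave_keys = int(parse_kv(slave_info.get("db0", "")).get("keys", 0))
    return {
        "base_synced": base_synced,
        "caught_up": caught_up,
        "memory_ratio": int(slave_info["used_memory"])
        / max(int(master_info["used_memory"]), 1),
        "keys_ratio": slave_keys / master_keys if master_keys else 1.0,
    }
```

On a busy master both offsets keep moving, so an exact match may hold only momentarily; a real check would likely poll or tolerate a small gap.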

4   Summary

The final implementation is this command:

migrate src dst : migrate a redis instance to another machine

Steps:

pre_check,
force_src_be_slave,
deploy_dst,
add_dst_as_slave,
cleanup,
sentinel_reset,
update_config,
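A sketch of how a step list like this could be driven, stopping at the first failure so that later steps such as `cleanup` (which, per the log below, stops the source instance) never run after an earlier step has failed. The step names come from the list above; `run_steps` and the callables passed to it are hypothetical.

```python
import logging
from typing import Callable, List, Tuple

log = logging.getLogger("migrate")

# Step order as listed above; the callables supplied for them are stand-ins.
STEP_NAMES = ["pre_check", "force_src_be_slave", "deploy_dst",
              "add_dst_as_slave", "cleanup", "sentinel_reset", "update_config"]


def run_steps(steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run (name, callable) steps in order and return the completed names.

    The first exception is logged and re-raised, so no later step runs.
    """
    done: List[str] = []
    for name, step in steps:
        log.info("%s", name)
        try:
            step()
        except Exception:
            log.exception("step %s failed; completed so far: %s", name, done)
            raise
        done.append(name)
    return done
```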

Usage:

$ ./bin/deploy.py cluster0 migrate cluster0-22000:127.0.0.5:23000:/tmp/r/redis-23000 cluster0-22000:127.0.0.5:50015:/tmp/r/redis-50015
...
2014-02-27 19:21:58,667 [MainThread] [INFO] deploy [redis:127.0.0.5:50015]
2014-02-27 19:21:59,774 [MainThread] [INFO] [redis:127.0.0.5:50015] start ok in 0.19 seconds
2014-02-27 19:21:59,775 [MainThread] [NOTICE] add_dst_as_slave
2014-02-27 19:21:59,790 [MainThread] [INFO] [redis:127.0.0.5:50015] /home/ning/idning-github/redis/src/redis-cli -h 127.0.0.5 -p 50015 SLAVEOF 127.0.0.5 22000
OK
2014-02-27 19:21:59,801 [MainThread] [INFO] [redis:127.0.0.5:50015]: {'used_memory': '342432', 'master_link_status': 'down', 'slave_repl_offset': '-1'}
2014-02-27 19:22:00,811 [MainThread] [INFO] [redis:127.0.0.5:50015]: {'used_memory': '342464', 'master_link_status': 'down', 'slave_repl_offset': '-1'}
2014-02-27 19:22:01,820 [MainThread] [INFO] [redis:127.0.0.5:50015]: {'used_memory': '363456', 'master_link_status': 'up', 'slave_repl_offset': '5998625'}
2014-02-27 19:22:01,821 [MainThread] [NOTICE] cleanup
2014-02-27 19:22:02,156 [MainThread] [INFO] [redis:127.0.0.5:23000] stop ok in 0.11 seconds
2014-02-27 19:22:02,156 [MainThread] [NOTICE] sentinel_reset
2014-02-27 19:22:02,165 [MainThread] [NOTICE] update_config
2014-02-27 19:22:02,166 [MainThread] [INFO] AppendConfig:cluster0['migration'] = []
2014-02-27 19:22:02,166 [MainThread] [INFO] AppendConfig:cluster0['migration'].append('cluster0-22000:127.0.0.5:23000:/tmp/r/redis-23000=>cluster0-22000:127.0.0.5:50015:/tmp/r/redis-50015')
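The log shows `add_dst_as_slave` checking the new slave about once a second until `master_link_status` turns `up`. A minimal version of that wait loop, assuming a hypothetical `get_info` callable that returns the slave's INFO as a dict; the timeout is an added safeguard that is not visible in the log.

```python
import time
from typing import Callable, Dict


def wait_for_link_up(get_info: Callable[[], Dict[str, str]],
                     interval: float = 1.0,
                     timeout: float = 3600.0) -> Dict[str, str]:
    """Poll get_info() until master_link_status is 'up'; return that snapshot.

    Raises TimeoutError if the link is still not up after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        info = get_info()
        # Report the same three fields the log above prints on each poll.
        print({k: info.get(k) for k in
               ("used_memory", "master_link_status", "slave_repl_offset")})
        if info.get("master_link_status") == "up":
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError("slave did not finish the initial sync in time")
        time.sleep(interval)
```

Passing the check as a callable keeps the loop independent of how INFO is fetched (redis-cli, a client library, or canned snapshots in a test).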

It modifies conf.py, appending the replacement record at the end:

cluster0['migration'] = []
cluster0['migration'].append('cluster0-22000:127.0.0.5:23000:/tmp/r/redis-23000=>cluster0-22000:127.0.0.5:50015:/tmp/r/redis-50015')
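Appending Python statements to conf.py keeps the change visible in the config itself. A sketch of that append with a hypothetical `append_migration` helper. One assumption to flag: it writes the `cluster0['migration'] = []` initialiser only if that exact line is not already present, so a second migration does not reset the list; whether redis-mgr does the same is not shown here.

```python
import os


def append_migration(conf_path: str, cluster: str, src: str, dst: str) -> None:
    """Append a 'src=>dst' migration record to a Python config file.

    Assumption: the "<cluster>['migration'] = []" initialiser is written
    only once, so earlier records survive later migrations.
    """
    existing = ""
    if os.path.exists(conf_path):
        with open(conf_path) as f:
            existing = f.read()

    init_line = f"{cluster}['migration'] = []"
    new_lines = []
    if init_line not in existing.splitlines():
        new_lines.append(init_line)
    entry = f"{src}=>{dst}"
    # !r renders the entry as a quoted Python string literal.
    new_lines.append(f"{cluster}['migration'].append({entry!r})")

    with open(conf_path, "a") as f:
        if existing and not existing.endswith("\n"):
            f.write("\n")
        f.write("\n".join(new_lines) + "\n")
```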

The next time redis-mgr operates on this cluster, the old instance's entry is replaced by the new instance:

$ ./bin/deploy.py cluster0 status
2014-02-27 19:24:24,815 [MainThread] [NOTICE] start running: ./bin/deploy.py -v cluster0 status
2014-02-27 19:24:24,820 [MainThread] [NOTICE] status redis
2014-02-27 19:24:24,825 [MainThread] [INFO] [redis:127.0.0.5:22000] uptime 29815 seconds
2014-02-27 19:24:24,831 [MainThread] [INFO] [redis:127.0.0.5:50015] uptime 145 seconds
...
2014-02-27 19:24:24,893 [MainThread] [NOTICE] status master-slave
cluster0-22000 [redis:127.0.0.5:22000] <- 127.0.0.5:50015
cluster0-22001 [redis:127.0.0.5:22001] <- 127.0.0.5:23001
cluster0-22002 [redis:127.0.0.5:22002] <- 127.0.0.5:23002
cluster0-22003 [redis:127.0.0.5:22003] <- 127.0.0.5:23003
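A sketch of that substitution, assuming the cluster's redis section is a list of `name:host:port:path` strings in the same format as the `migrate` arguments. `RedisSpec`, `parse_spec` and `apply_migrations` are hypothetical names, and following chains of migrations (a=>b, then b=>c) is a design choice of this sketch rather than documented redis-mgr behaviour.

```python
from typing import Dict, List, NamedTuple


class RedisSpec(NamedTuple):
    name: str   # replication group, e.g. cluster0-22000
    host: str
    port: int
    path: str   # deploy directory


def parse_spec(spec: str) -> RedisSpec:
    """Parse 'name:host:port:path'; the path keeps any further ':'."""
    name, host, port, path = spec.split(":", 3)
    return RedisSpec(name, host, int(port), path)


def apply_migrations(specs: List[str], migrations: List[str]) -> List[str]:
    """Replace every migrated instance spec with its final destination."""
    mapping: Dict[str, str] = {}
    for entry in migrations:
        src, sep, dst = entry.partition("=>")
        if not sep:
            raise ValueError(f"bad migration entry: {entry!r}")
        mapping[src] = dst

    result = []
    for spec in specs:
        seen = set()
        # Follow a=>b=>c chains; `seen` stops at a cycle instead of looping.
        while spec in mapping and spec not in seen:
            seen.add(spec)
            spec = mapping[spec]
        result.append(spec)
    return result
```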
