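MySQL: How server_id Filtering Works in a Dual-Master Setup

baguai | 2023-5-6 10:41 | Personal category: MySQL Learning | System category: User Practice | database, MySQL

About the author: Gao Peng (高鹏), pen name 八怪 (Baguai), is the author of the book 《深入理解MySQL主从原理》 and runs the public account “MySQL学习”, where he regularly shares interesting cases and source code walkthroughs.

I recently ran into a problem related to this topic and took some brief notes. Please forgive any mistakes. The code referenced is from version 5.7.29.

1. The Question

Have you ever run a dual-master setup? One point worth thinking about: when each server also writes the events it replicates into its own binlog (log_slave_updates=ON), could an event travel from the master to the slave, then be sent back from the slave to the original master and applied there again? That would be an infinite loop.

Obviously this does not happen. The header of every event stores a server_id, so events can be filtered by that server_id. So roughly how is this done? Let's take a look.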
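To make "the header stores a server_id" concrete, here is a minimal sketch that pulls the server_id out of a raw event header. It assumes the common 19-byte v4 binlog header layout (timestamp, type, server_id, event length, next position, flags, all little-endian). The helper names `read_u32_le`, `event_server_id` and `demo_header_parse` are made up for illustration and are not MySQL functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed v4 event header layout (little-endian fields):
// timestamp(4) | type(1) | server_id(4) | event_length(4) | next_pos(4) | flags(2)
constexpr std::size_t kHeaderLen = 19;
constexpr std::size_t kServerIdOffset = 5;

// Hypothetical helper: decode a 4-byte little-endian unsigned integer.
inline std::uint32_t read_u32_le(const unsigned char *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: server_id of the server that originally wrote the event.
inline std::uint32_t event_server_id(const unsigned char *header) {
  return read_u32_le(header + kServerIdOffset);
}

// Builds a fake header whose server_id field is 12345 (0x3039) and parses it back.
inline bool demo_header_parse() {
  unsigned char header[kHeaderLen] = {0};
  header[kServerIdOffset] = 0x39;      // low byte
  header[kServerIdOffset + 1] = 0x30;  // next byte
  return event_server_id(header) == 12345u;
}
```

Because the field records the server that first generated the event, it stays the same as the event is relayed, which is what lets the original master recognize its own events when they come back.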

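2. Walking Through the Process

First, in the IO thread (queue_event), the server_id is taken from the header of each event that is read, and the following condition is checked: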

(s_id == ::server_id && !mi->rli->replicate_same_server_id)

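There are two conditions here, and both must hold.

s_id == ::server_id: ::server_id is the local server ID, while s_id is the server_id read from the header of the received event. This is the key filter condition.
!mi->rli->replicate_same_server_id: replicate_same_server_id is normally left unset, and enabling it together with log_slave_updates is rejected with the following error: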

using --replicate-same-server-id in conjunction with --log-slave-updates is impossible,
it would lead to infinite loops in this server.
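The two conditions can be restated as a small standalone predicate. This is only a simplified sketch of the check quoted above: the function name `is_own_event_to_skip` and its parameters are illustrative, and the real queue_event code handles more cases than this.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the IO-thread check; not the actual MySQL function.
// Returns true when the event originated on this server and the
// replicate_same_server_id option is off, i.e. the event should be skipped.
inline bool is_own_event_to_skip(std::uint32_t event_server_id,
                                 std::uint32_t local_server_id,
                                 bool replicate_same_server_id) {
  return event_server_id == local_server_id && !replicate_same_server_id;
}
```

For example, on a server with server_id 1, an event carrying server_id 1 is skipped, while an event carrying server_id 2 is not.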

If this condition holds, the event is skipped and is not written to the relay log. The source comment reads:

    /*
      Do not write it to the relay log.
      a) We still want to increment mi->get_master_log_pos(), so that we won't
      re-read this event from the master if the slave IO thread is now
      stopped/restarted (more efficient if the events we are ignoring are big
      LOAD DATA INFILE).
      b) We want to record that we are skipping events, for the information of
      the slave SQL thread, otherwise that thread may let
      rli->group_relay_log_pos stay too small if the last binlog's event is
      ignored.
      But events which were generated by this slave and which do not exist in
      the master's binlog (i.e. Format_desc, Rotate & Stop) should not increment
      mi->get_master_log_pos().
      If the event is originated remotely and is being filtered out by
      IGNORE_SERVER_IDS it increments mi->get_master_log_pos()
      as well as rli->group_relay_log_pos.
    */

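Although the IO thread skips events whose server_id matches the local one, two things still have to be taken care of:

The position the IO thread has fetched up to must be advanced
The position the SQL thread has read up to must be advanced

In other words, both positions still need to be updated.

So how exactly is this done? As follows:

IO thread (queue_event)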

    {
      mi->set_master_log_pos(mi->get_master_log_pos() + inc_pos);
      memcpy(rli->ign_master_log_name_end, mi->get_master_log_name(), FN_REFLEN);
      rli->ign_master_log_pos_end= mi->get_master_log_pos();
    }
    rli->relay_log.signal_update(); // the slave SQL thread needs to re-check

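In this way the IO thread advances its own fetch position and leaves a marker for the SQL thread, namely ign_master_log_pos_end and ign_master_log_name_end. It then wakes up the SQL thread to do its work.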
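The IO-thread side can be modelled with a toy state struct. This is a rough sketch under heavy simplification: `Event`, `IoThreadState` and `queue_event_sketch` are hypothetical names, the relay log is a plain vector, and details such as replicate_same_server_id, locking and the signalling of the SQL thread are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy event: only the fields this sketch needs.
struct Event {
  std::uint32_t server_id;  // originating server
  std::uint64_t length;     // bytes the event occupies in the master binlog
};

// Toy model of the state touched by the IO thread.
struct IoThreadState {
  std::string master_log_name;
  std::uint64_t master_log_pos = 0;
  std::string ign_master_log_name_end;   // empty means "no pending marker"
  std::uint64_t ign_master_log_pos_end = 0;
  std::vector<Event> relay_log;          // stand-in for the relay log file
};

// Sketch of the skip path: the fetch position always advances, but an event
// that originated locally is not stored; a marker is left for the SQL thread.
inline void queue_event_sketch(IoThreadState &st, const Event &ev,
                               std::uint32_t local_server_id) {
  st.master_log_pos += ev.length;
  if (ev.server_id == local_server_id) {
    st.ign_master_log_name_end = st.master_log_name;
    st.ign_master_log_pos_end = st.master_log_pos;
    return;  // nothing is written to the relay log
  }
  st.relay_log.push_back(ev);
}
```

The point of the sketch is that the skipped event still moves master_log_pos forward, so a restarted IO thread would not request it again.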

SQL thread

The SQL thread advances its executed position based on that marker, mainly by constructing a Rotate_log_event (in next_event), as follows:

    if (rli->ign_master_log_name_end[0])
    {
      /* We generate and return a Rotate, to make our positions advance */
      DBUG_PRINT("info",("seeing an ignored end segment"));
      ev= new Rotate_log_event(rli->ign_master_log_name_end,
                               0, rli->ign_master_log_pos_end,
                               Rotate_log_event::DUP_NAME); // build a Rotate event to perform the skip
      rli->ign_master_log_name_end[0]= 0;
      mysql_mutex_unlock(log_lock);
      if (unlikely(!ev))
      {
        errmsg= "Slave SQL thread failed to create a Rotate event "
                "(out of memory?), SHOW SLAVE STATUS may be inaccurate";
        goto err;
      }
      ev->server_id= 0; // don't be ignored by slave SQL thread; server_id is set to 0 here
      DBUG_RETURN(ev);
    }

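The SQL thread then updates its executed position. In other words, although events with the same server_id are skipped, the executed position still has to be updated. The call stack looks like this: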

#0  Relay_log_info::set_group_master_log_pos (this=0x70bf130, log_pos=3435) at /home/mysql/soft/percona-server-5.7.29-32/sql/rpl_rli.h:1028
#1  0x0000000001850825 in Relay_log_info::inc_group_relay_log_pos (this=0x70bf130, log_pos=3435, need_data_lock=false) at /home/mysql/soft/percona-server-5.7.29-32/sql/rpl_rli.cc:1202
#2  0x00000000017c4ec0 in Rotate_log_event::do_update_pos (this=0x7ffdec015730, rli=0x70bf130) at /home/mysql/soft/percona-server-5.7.29-32/sql/log_event.cc:6974
#3  0x000000000184659e in Log_event::update_pos (this=0x7ffdec015790, rli=0x70bf130) at /home/mysql/soft/percona-server-5.7.29-32/sql/log_event.h:1220
#4  0x000000000183454a in apply_event_and_update_pos (ptr_ev=0x7ffdf416c780, thd=0x7ffdec000970, rli=0x70bf130) at /home/mysql/soft/percona-server-5.7.29-32/sql/rpl_slave.cc:4982
#5  0x00000000018353da in exec_relay_log_event (thd=0x7ffdec000970, rli=0x70bf130) at /home/mysql/soft/percona-server-5.7.29-32/sql/rpl_slave.cc:5358
#6  0x000000000183c14b in handle_slave_sql (arg=0x700ade0) at /home/mysql/soft/percona-server-5.7.29-32/sql/rpl_slave.cc:7626

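This works because the Rotate_log_event constructed earlier carries the ignored position. Executing that Rotate_log_event sets the position the SQL thread has executed up to.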
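The SQL-thread side can be sketched in the same toy style. The names `SyntheticRotate`, `SqlThreadState`, `next_event_sketch` and `apply_rotate_sketch` are hypothetical; the sketch only mirrors the idea of turning the pending marker into a synthetic rotate event with server_id 0 and then applying it to move the executed position.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy stand-in for the Rotate event built from the ignore marker.
struct SyntheticRotate {
  std::string log_name;
  std::uint64_t pos;
  std::uint32_t server_id;
};

// Toy model of the state touched by the SQL thread.
struct SqlThreadState {
  std::string ign_master_log_name_end;   // empty means "no pending marker"
  std::uint64_t ign_master_log_pos_end = 0;
  std::string group_master_log_name;     // executed master binlog file
  std::uint64_t group_master_log_pos = 0;
};

// If a marker is pending, build the synthetic rotate and clear the marker.
// Returns false when there is nothing to do.
inline bool next_event_sketch(SqlThreadState &st, SyntheticRotate &out) {
  if (st.ign_master_log_name_end.empty()) return false;
  // server_id 0 so the synthetic event is not itself filtered out
  out = SyntheticRotate{st.ign_master_log_name_end, st.ign_master_log_pos_end, 0};
  st.ign_master_log_name_end.clear();
  return true;
}

// Applying the synthetic rotate only moves the executed position.
inline void apply_rotate_sketch(SqlThreadState &st, const SyntheticRotate &ev) {
  st.group_master_log_name = ev.log_name;
  st.group_master_log_pos = ev.pos;
}
```

After one round of next_event_sketch followed by apply_rotate_sketch, the executed position equals the position the IO thread recorded in the marker, even though no real event was applied.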

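So although events whose server_id is that of the local node are skipped, both the IO thread's read position and the SQL thread's executed position are still advanced. And because the filtering is done by the IO thread, events with the same server_id as the local server are never written to the relay log.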
