Bug#23604483 GCOLS: MINIMAL BINLOG_ROW_IMAGE LEAD TO CORRUPTION/CRASH ON SLAVE
========
Problem:
========
The slave crashes while applying an UPDATE row event on a table with
virtual generated columns when the event came from a master running
with the BINLOG_ROW_IMAGE=MINIMAL setting.
=========
Analysis:
=========
=============================================================
read_set/write_set bits requirement for GCol computation:
=============================================================
Stored generated columns are computed during an operation and the value is
stored inside the storage engine. The stored value is reused by the next
operation on that column. Virtual generated column values, unlike stored
generated columns, are always computed on the fly and are not stored
anywhere. However, if there is an index on a virtual generated column, that
index is maintained by the storage engine. The code required to compute
virtual generated columns on the fly is added to all the handler functions
(ha_rnd_pos, ha_rnd_next, ha_index_read_map, etc.). For performance
reasons, a virtual column is computed only when the caller explicitly asks
for it by setting the column's flag in the read_set bitmap.
The mark_generated_columns() function is used to set the read_set/write_set
flags that are required for generated column modifications
(insert/update/delete).
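The read_set gating described above can be pictured with a small toy model. This is only a sketch and not MySQL code: ToyTable, ToyRow, toy_mark_generated_columns() and toy_read_row() are hypothetical names, and the virtual column is assumed to be v = a + b.

```cpp
#include <bitset>

// Toy model, not MySQL code: base columns a and b, virtual column v = a + b.
enum ToyColumnId { TOY_COL_A = 0, TOY_COL_B = 1, TOY_COL_V = 2, TOY_NUM_COLS = 3 };

struct ToyRow {
  int a;
  int b;
  int v;            // value of the virtual column, meaningful only if v_computed
  bool v_computed;  // false when the caller did not request v
};

struct ToyTable {
  std::bitset<TOY_NUM_COLS> read_set;  // columns requested by the caller
  int stored_a;                        // the single stored row: only base columns
  int stored_b;
};

// Hypothetical stand-in for the role the text gives mark_generated_columns():
// request the virtual column together with the base columns it depends on.
void toy_mark_generated_columns(ToyTable &t) {
  t.read_set.set(TOY_COL_A);
  t.read_set.set(TOY_COL_B);
  t.read_set.set(TOY_COL_V);
}

// Hypothetical stand-in for a handler read call: base columns always come
// back, but the virtual column is evaluated only if its read_set bit is set.
ToyRow toy_read_row(const ToyTable &t) {
  ToyRow row = {t.stored_a, t.stored_b, 0, false};
  if (t.read_set.test(TOY_COL_V)) {
    row.v = row.a + row.b;
    row.v_computed = true;
  }
  return row;
}
```

In this model a read without the marking step returns a row whose v_computed is false, which is the state the slave ends up in later in this analysis.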
=========================================================================
'InnoDB' storage engine behaviour/expectation in case of DMLs (update operations)
=========================================================================
When a DML is the ongoing operation and an exclusive lock is acquired on
the table (which is true for all DMLs), fetch_all_key is set to true, i.e.,
the storage engine ignores table->read_set and fetches all the column values.
If a primary key (or any key that identifies the tuple uniquely) is provided
to the storage engine, it retrieves all column values from the clustered index
and fills the tuple structure (struct record) with them,
irrespective of what is in the table->read_set bitmap.
Also, the storage engine expects all the column values to be ready in before_image
before ha_update_row is called, so that the value of a column from the image
can be used as the key to find the secondary index entry in case that entry
needs to be updated.
=============================================================
RBR behaviour when BINLOG_ROW_IMAGE=MINIMAL and setting of
read_set/write_set bits in that setting:
=============================================================
In MySQL row-based replication, each row change event contains two images, a
“before” image whose columns are matched against when searching for the row
to be updated, and an “after” image containing the changes.
In case of MINIMAL setting, for the before image, it is necessary only that
the minimum set of columns required to uniquely identify rows is logged.
If the table containing the row has a primary key, then only the primary key
column or columns are written to the binary log. In the replication flow, the
read_set and write_set bitmaps are also used to decide which columns
need to be packed and written into the binary log.
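As a rough illustration of the MINIMAL before image, the sketch below picks the columns to log from a hypothetical column list. ImgColumn and toy_minimal_before_image() are made-up names, and the fallback to all columns when no primary key exists is an assumption of this sketch rather than something stated above.

```cpp
#include <string>
#include <vector>

// Hypothetical column descriptor, for illustration only.
struct ImgColumn {
  std::string name;
  bool in_primary_key;
};

// Returns the column names a MINIMAL before image would carry in this toy
// model: only the primary key columns when a primary key exists.
// Assumption: with no primary key, every column is kept so the row can
// still be identified.
std::vector<std::string> toy_minimal_before_image(
    const std::vector<ImgColumn> &cols) {
  std::vector<std::string> pk_cols;
  std::vector<std::string> all_cols;
  for (const ImgColumn &c : cols) {
    all_cols.push_back(c.name);
    if (c.in_primary_key) pk_cols.push_back(c.name);
  }
  return pk_cols.empty() ? all_cols : pk_cols;
}
```

For a table (id PRIMARY KEY, a, v) this yields just id, so the virtual column v is absent from the before image that reaches the slave.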
=============================
Analysis on the server crash
=============================
Only DDLs and DMLs are replicated (SELECTs are never replicated).
As explained above, storage engines always return all the column
values in case of DMLs (i.e., fetch_all_key is always true for DMLs).
So the pure replication flow (RBR on the slave does not go through the regular
optimizer layer) never sets read_set bits to decide what needs to be retrieved
from the storage engine, because it was getting all the column values from the
storage engine irrespective of what is in the 'read_set' bitmap.
But virtual generated columns are not maintained by the storage engine,
and their retrieval/computation depends on their bit in the read_set bitmap,
so they are not computed here. Hence the virtual column value is not
filled in the before_image structure when the RBR logic calls a handler
function (in this example, ha_rnd_pos).
Later, when RBR calls ha_update_row and the storage engine wants to update the
secondary index created on the virtual column (which is maintained by the
storage engine), it tries to find the secondary index entry for the given
virtual column value (which is NULL) and crashes there.
====
Fix:
====
Call mark_generated_columns() before calling any handler function to
retrieve the before_image. This function makes sure that all
the read_set/write_set bits required to compute/update
virtual columns are set.
The binlog_prepare_row_image() function, which is called from the
binlogging functions (binlog_update_row() and binlog_delete_row()),
takes care of removing these spurious fields, which are required during
execution but not needed for binlogging. In case of inserts, there are no
spurious fields (all the columns are required to be written into the binlog).
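The ordering of the fix (mark, fetch the before image, update, then prune for binlogging) can be sketched with another toy model. None of this is the server's code: IdxTable, IdxImage and the toy_* functions are hypothetical, the virtual column is assumed to be v = a * 2 with a secondary index on it, and a false return stands in for the crash.

```cpp
#include <bitset>
#include <set>

// Toy model, not MySQL code: columns id (primary key), a, and an indexed
// virtual column v = a * 2.
enum IdxColumnId { IDX_COL_ID = 0, IDX_COL_A = 1, IDX_COL_V = 2, IDX_NUM_COLS = 3 };

struct IdxImage {
  int id;
  int a;
  int v;
  bool v_valid;  // false when v was never computed
};

struct IdxTable {
  std::bitset<IDX_NUM_COLS> read_set;
  int id;
  int a;
  std::set<int> index_on_v;  // secondary index on the virtual column
};

// Fetch the before image; v is computed only if its read_set bit is set.
IdxImage toy_fetch_before_image(const IdxTable &t) {
  IdxImage img = {t.id, t.a, 0, false};
  if (t.read_set.test(IDX_COL_V)) {
    img.v = t.a * 2;
    img.v_valid = true;
  }
  return img;
}

// Update the row and its index entry. Returns false when the old index
// entry cannot be located (the situation in which the real server crashed).
bool toy_update_row(IdxTable &t, const IdxImage &before, int new_a) {
  if (!before.v_valid || t.index_on_v.erase(before.v) == 0) return false;
  t.a = new_a;
  t.index_on_v.insert(new_a * 2);
  return true;
}

// Apply path: with mark_first set, the virtual column bit is set before the
// fetch (stand-in for calling mark_generated_columns() first).
bool toy_apply_update(IdxTable &t, int new_a, bool mark_first) {
  if (mark_first) t.read_set.set(IDX_COL_V);
  IdxImage before = toy_fetch_before_image(t);
  return toy_update_row(t, before, new_a);
}

// Stand-in for the pruning role the text gives binlog_prepare_row_image():
// under MINIMAL, keep only the primary key bit for the before image.
std::bitset<IDX_NUM_COLS> toy_prune_for_binlog(std::bitset<IDX_NUM_COLS> read_set) {
  std::bitset<IDX_NUM_COLS> pk_only;
  pk_only.set(IDX_COL_ID);
  return read_set & pk_only;
}
```

In this sketch the update fails without the marking step and succeeds with it, while the extra bit set for execution is dropped again before the image is logged.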
Along with the fix, the patch removes some code that was added
for testing purposes (in all three paths: INDEX_SCAN, TABLE_SCAN and
HASH_SCAN):
- // Temporary fix to find out why it fails [/Matz]
- memcpy(m_table->read_set->bitmap, m_cols.bitmap, (m_table->read_set->n_bits + 7) / 8);