Skip to content

Commit c39e7df

Browse files
author
Venkatesh Duggirala
committed
Bug#23604483 GCOLS: MINIMAL BINLOG_ROW_IMAGE LEAD TO CORRUPTION/CRASH ON SLAVE
======== Problem: ======== Slave crashes when it is trying to apply an UPDATE row event (with virtual generated columns) that came from a Master with BINLOG_ROW_IMAGE=MINIMAL setting. ========= Analysis: ========= ============================================================= read_set/write_set bits requirement for GCol computation: ============================================================= Stored generated columns are computed during an operation and the value is stored inside storage engine. The computed value will be used incase of next operation on that column. But Virtual generated column values, unlike storged generated columns, computed always on the fly and will not be stored anywhere. But if there is an index on top of virtual generated columns, it will be maintained by Storage engine. The code that is required to compute the Virtual generated columns on the fly is added to all the handler functions (ha_rnd_pos, ha_rnd_next, ha_index_read_map and etc.,). But due to performance reasons, it will be computed only when the caller explictly asked to do so by setting it's flag in read_set bitmap. mark_generated_columns() function is used to set read_set/write_set flags that are required to do generated columns modifications (insert/update/delete). ========================================================================= 'Innodb' Storage engine behaviour/expecation in case of DMLs (update operations) ========================================================================= In case DMLs are the ongoing operations and exclusive lock is acquired on the table (which is true for all DMLs), fetch_all_key is set to true i.e., storage engine ignores table->read_set and fetches all the column values. If a primary key (or the keys that can retrieve the tuple uniquely) is provided to storage engine, storage engine will retrieve all column values from the clustered index and fills the tuple struture (struct record) with all the column values irrespective of what is there in table->read_set bitmap. Also, Storage engine expects all the column values are ready in before_image before calling ha_update_row, so that the value of a column from the image can be used as key to find second index entry in case the index entry needs to be updated. ============================================================= RBR behaviour when BINLOG_ROW_IMAGE=MINIMAL and setting of read_set/write_set bits in that setting: ============================================================= In MySQL row-based replication, each row change event contains two images, a “before” image whose columns are matched against when searching for the row to be updated, and an “after” image containing the changes. In case of MINIMAL setting, for the before image, it is necessary only that the minimum set of columns required to uniquely identify rows is logged. If the table containing the row has a primary key, then only the primary key column or columns are written to the binary log. In the replication flow, read_set and write_set bitmaps are also used to decide what all the columns needs to be packed to write into the binary log. ============================= Analysis on the server crash ============================= Only DDLs and DMLs are replicated (SELECTs are never replicated). With the above explanation, storage engines always return all the column values in case of DMLs (i.e., fetch_all_key is true always for DMLs). So in pure replication flow (i.e., RBR on slave does not go through regular optimzer layer) never sets read_set bits to decide what needs to be retrieved from the storage engine as it was getting all the column values from the storage engine irrespective of what is there in 'read_set' bitmap. But since virtual generated columns are not maintained by storage engine and the retriveal/computation of virtual generated columns are depends on its bit value in read_set bitmap, it is not computed now. Hence it is not filled in before_image struture when RBR logic called handler function(in this example, ha_rnd_pos). Later when RBR called ha_update_row and storage engine wanted to update secondary index created on virtua column(which is maintained by storage engine), it tries to find secondary index entry for the given virtual column value (which is NULL) and crashes there. ==== Fix: ==== Call mark_generated_columns() before calling any handler functions to retrieve the before_image. This function will make sure to set all the read_set/write_set bits that are required to compute/update virutal columns. binlog_prepare_row_image() function, which will be called from binlogging functions (binlog_update_row() and binlog_delete_row()) will take care of removing these spurious fields required during execution but not needed for binlogging. In case of inserts, there are no spurious fields (all the columns are required to be written into the binlog) Along with the fix, the patch is removing some code that was added for testing purpose (in all three paths, INDEX_SCAN, TABLE_SCAN and HASH_SCAN). - // Temporary fix to find out why it fails [/Matz] - memcpy(m_table->read_set->bitmap, m_cols.bitmap, (m_table->read_set->n_bits + 7) / 8);
1 parent dedc8b3 commit c39e7df

File tree

1 file changed

+20
-10
lines changed

1 file changed

+20
-10
lines changed

sql/log_event.cc

Lines changed: 20 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -9995,9 +9995,6 @@ int Rows_log_event::do_index_scan_and_update(Relay_log_info const *rli)
99959995
if ((error= unpack_current_row(rli, &m_cols)))
99969996
goto end;
99979997

9998-
// Temporary fix to find out why it fails [/Matz]
9999-
memcpy(m_table->read_set->bitmap, m_cols.bitmap, (m_table->read_set->n_bits + 7) / 8);
10000-
100019998
/*
100029999
Trying to do an index scan without a usable key
1000310000
This is a valid state because we allow the user
@@ -10459,9 +10456,6 @@ int Rows_log_event::do_table_scan_and_update(Relay_log_info const *rli)
1045910456
prepare_record(table, &m_cols, FALSE);
1046010457
if (!(error= unpack_current_row(rli, &m_cols)))
1046110458
{
10462-
// Temporary fix to find out why it fails [/Matz]
10463-
memcpy(m_table->read_set->bitmap, m_cols.bitmap, (m_table->read_set->n_bits + 7) / 8);
10464-
1046510459
/** save a copy so that we can compare against it later */
1046610460
store_record(m_table, record[1]);
1046710461

@@ -10834,6 +10828,26 @@ int Rows_log_event::do_apply_event(Relay_log_info const *rli)
1083410828
get_general_type_code() == binary_log::UPDATE_ROWS_EVENT)
1083510829
bitmap_intersect(table->read_set,&m_cols);
1083610830

10831+
/*
10832+
Call mark_generated_columns() to set read_set/write_set bits of the
10833+
virtual generated columns as required in order to get these computed.
10834+
This is needed since all columns need to have a value in the before
10835+
image for the record when doing the update (some storage engines will
10836+
use this for maintaining of secondary indexes). This call is required
10837+
even for DELETE events to set write_set bit in order to satisfy
10838+
ASSERTs in Field_*::store functions.
10839+
10840+
binlog_prepare_row_image() function, which will be called from
10841+
binlogging functions (binlog_update_row() and binlog_delete_row())
10842+
will take care of removing these spurious fields required during
10843+
execution but not needed for binlogging. In case of inserts, there
10844+
are no spurious fields (all generated columns are required to be written
10845+
into the binlog).
10846+
*/
10847+
10848+
if (m_table->vfield)
10849+
m_table->mark_generated_columns(get_general_type_code() == binary_log::UPDATE_ROWS_EVENT);
10850+
1083710851
bitmap_set_all(table->write_set);
1083810852

1083910853
/* WRITE ROWS EVENTS store the bitmap in m_cols instead of m_cols_ai */
@@ -12636,10 +12650,6 @@ Update_rows_log_event::do_exec_row(const Relay_log_info *const rli)
1263612650
DBUG_DUMP("old record", m_table->record[1], m_table->s->reclength);
1263712651
DBUG_DUMP("new values", m_table->record[0], m_table->s->reclength);
1263812652

12639-
// Temporary fix to find out why it fails [/Matz]
12640-
memcpy(m_table->read_set->bitmap, m_cols.bitmap, (m_table->read_set->n_bits + 7) / 8);
12641-
memcpy(m_table->write_set->bitmap, m_cols_ai.bitmap, (m_table->write_set->n_bits + 7) / 8);
12642-
1264312653
m_table->mark_columns_per_binlog_row_image();
1264412654
error= m_table->file->ha_update_row(m_table->record[1], m_table->record[0]);
1264512655
if (error == HA_ERR_RECORD_IS_THE_SAME)

0 commit comments

Comments
 (0)