Skip to content

Commit 7711c65

Browse files
author
Joao Gramacho
committed
BUG#24763579: BINLOG SENDER IS GENERATING UNWANTED FSEEKS ON BINARY
LOG FILE Problem ======= The Binlog_sender is, some times, generating unwanted fseeks into the binary log file being replicated to a salve. Analysis ======== The Binlog_sender does the following "per event" to be sent to a slave: 1) Peek the event size; 2) Adjust the packet buffer to fit the whole event, based on its size; 3) Read the (full) event into the buffer; 4) Send the event to the slave. Detailing (1) a little bit, it: 1.1) Saves current IO_CACHE position; 1.2) Read the event reader (currently 19 bytes) into a temporary buffer; 1.3) Get the event size from the temporary buffer; 1.4) Change back the IO_CACHE position to the beginning of the event; Detailing (3): 3.1) Read the event reader (currently 19 bytes) from the IO_CACHE into a temporary buffer; 3.2) Get the event size from the temporary buffer; 3.3) Append the temporary buffer (with the event header) to the packet buffer; 3.4) Calculate the remaining event data to be read (event size - header size); 3.5) Append the remaining event data from the IO_CACHE directly to the packet buffer. The binary log file IO_CACHE uses an internal 8K buffer for caching the real file access. It reads 8K blocks from the real file when the read request is less than 8K in size. When the event header is fully stored within an 8K block of the file, the IO_CACHE guarantees that the read(of 19 bytes) and the fseek(to current position minus 19) will not require additional reads from the real file. When the event header is crossing the boundary of an 8K block, the IO_CACHE copies the part that is in the current 8K block, reads the next 8K block from the file and copies the rest of the event header. The "problem" is that, in this case, the repositioning to the beginning of the event will require to read the previous 8K block again from the real file. This will lead to a hit at "COUNT_MISC" counter at "performance_schema.file_summary_by_event_name" for the event_name 'wait/io/file/sql/binlog'. Fix === Reused the header loaded when "peeking" the event size (suppressing the fseek) when reading the full event into the packet buffer.
1 parent bfbfae3 commit 7711c65

File tree

5 files changed

+185
-24
lines changed

5 files changed

+185
-24
lines changed
Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
include/master-slave.inc
2+
Warnings:
3+
Note #### Sending passwords in plain text without SSL/TLS is extremely insecure.
4+
Note #### Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
5+
[connection master]
6+
include/stop_dump_threads.inc
7+
CREATE TABLE t1 (c1 TEXT(8192));
8+
Inserting a reference transaction to calculate the content size
9+
Filling binary log up to the border of the 8192 block
10+
include/assert.inc [The binary log is at the expected border of a 8192 block]
11+
Filling the binary log 100 transactions
12+
TRUNCATE performance_schema.file_summary_by_event_name;
13+
[connection slave]
14+
include/start_slave_io.inc
15+
[connection master]
16+
include/sync_slave_io_with_master.inc
17+
[connection master]
18+
include/assert.inc [COUNT_MISC for "wait/io/file/sql/binlog" should be = 5]
19+
[connection slave]
20+
include/start_slave_sql.inc
21+
[connection master]
22+
DROP TABLE t1;
23+
include/rpl_end.inc
Lines changed: 121 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,121 @@
1+
# ==== Purpose ====
2+
#
3+
# This test case test verifies that no unwanted fseeks are done by dump threads
4+
# reading the binary log file sequentially. Dump threads should not change
5+
# COUNT_MISC after started dumping a binary log file, until moving to dumping
6+
# another binary log file.
7+
#
8+
# This test will create a set of transactions which event headers would,
9+
# sometimes, cross the IO_CACHE block size. After running the workload, the test
10+
# case will verify the P_S counter COUNT_MISC for "wait/io/file/sql/binlog"
11+
# EVENT_NAME at performance_schema.file_summary_by_event_name. In a single
12+
# binary log file case, the counter should not be more than 5 (as there are some
13+
# real file operations opening the binary log file and pointing it to the first
14+
# event position).
15+
#
16+
# ==== Related Bugs and Worklogs ====
17+
#
18+
# BUG#24763579 BINLOG SENDER IS GENERATING UNWANTED FSEEKS ON BINARY LOG FILE
19+
#
20+
# It was noticed that the binlog sender was reading the event reader (19 bytes)
21+
# to get the event size and was doing a "fseek" on the IO_CACHE to the beginning
22+
# of the event, as the event would be fully read later, after adjusting the
23+
# network packet size to fit the whole event.
24+
#
25+
# As the IO_CACHE read the files using an 8K block (for reads smaller than 8K),
26+
# when the event header crossed the block boundaries, the "fseek" needed to
27+
# be performed on file level (and not only at IO_CACHE buffer level), generating
28+
# a call to the OS that resulted in an unwanted wait.
29+
#
30+
31+
# This test case was designed to run with SBR
32+
--source include/have_binlog_format_statement.inc
33+
--let $rpl_skip_start_slave= 1
34+
--source include/master-slave.inc
35+
--source include/stop_dump_threads.inc
36+
37+
##################################################
38+
# Some variable to help calculating the workload #
39+
##################################################
40+
# IO_CACHE block size
41+
--let $block_size= 8192
42+
# How many bytes of the header should be at the end of a block
43+
--let $border= 2
44+
# We must fit many events into a block size
45+
--let $event_factor= 8
46+
47+
################################################
48+
# Preparing the master for the larger workload #
49+
################################################
50+
--eval CREATE TABLE t1 (c1 TEXT($block_size))
51+
52+
# Get current binlog position
53+
--let $init_master_pos= query_get_value(SHOW MASTER STATUS, Position, 1)
54+
55+
--let $ref_size= `SELECT $block_size DIV $event_factor`
56+
--let $content= `SELECT REPEAT('x', $ref_size)`
57+
--disable_query_log
58+
--echo Inserting a reference transaction to calculate the content size
59+
--eval INSERT INTO t1 VALUES ("$content")
60+
--enable_query_log
61+
62+
# Calculate the "transaction payload" (all but the insert content)
63+
--let $master_pos= query_get_value(SHOW MASTER STATUS, Position, 1)
64+
--let $trx_base_size= `SELECT ($master_pos - $init_master_pos) - $ref_size`
65+
66+
# Prepare a transaction to left the binlog at the border of a block
67+
--let $content_size= `SELECT (($block_size - $border) - $trx_base_size) - $master_pos`
68+
--let $content= `SELECT REPEAT('x', $content_size)`
69+
--disable_query_log
70+
--echo Filling binary log up to the border of the $block_size block
71+
--eval INSERT INTO t1 VALUES ("$content")
72+
--enable_query_log
73+
74+
--let $master_pos= query_get_value(SHOW MASTER STATUS, Position, 1)
75+
--let $assert_text= The binary log is at the expected border of a $block_size block
76+
--let $assert_cond= $border = ($block_size - ( $master_pos % $block_size ))
77+
--source include/assert.inc
78+
79+
# Prepare a transaction content for the large workload
80+
--let $content_size= `SELECT (($block_size DIV $event_factor) - $trx_base_size)`
81+
--let $content= `SELECT REPEAT('x', $content_size)`
82+
83+
######################
84+
# The large workload #
85+
######################
86+
--let $counter=100
87+
--echo Filling the binary log $counter transactions
88+
--disable_query_log
89+
while ($counter)
90+
{
91+
--eval INSERT INTO t1 VALUES ("$content")
92+
--dec $counter
93+
}
94+
--enable_query_log
95+
96+
################################################
97+
# Sync slave I/O thread and check the counters #
98+
################################################
99+
TRUNCATE performance_schema.file_summary_by_event_name;
100+
101+
# Sync slave I/O thread
102+
--source include/rpl_connection_slave.inc
103+
--source include/start_slave_io.inc
104+
--source include/rpl_connection_master.inc
105+
--source include/sync_slave_io_with_master.inc
106+
107+
# Check the COUNT_MISC counter
108+
--source include/rpl_connection_master.inc
109+
# In a first run, the counter tops 5, but running mtr --repeat
110+
# is making the other than first rounds to top 12
111+
--let $assert_text= COUNT_MISC for "wait/io/file/sql/binlog" should be = 5
112+
--let $assert_cond= [ SELECT COUNT_MISC = 5 FROM performance_schema.file_summary_by_event_name WHERE EVENT_NAME = "wait/io/file/sql/binlog" ]
113+
--let $extra_debug_eval= COUNT_MISC FROM performance_schema.file_summary_by_event_name WHERE EVENT_NAME = "wait/io/file/sql/binlog"
114+
--source include/assert.inc
115+
116+
# Cleanup
117+
--source include/rpl_connection_slave.inc
118+
--source include/start_slave_sql.inc
119+
--source include/rpl_connection_master.inc
120+
DROP TABLE t1;
121+
--source include/rpl_end.inc

sql/log_event.cc

Lines changed: 24 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -1128,12 +1128,14 @@ int Log_event::read_log_event(IO_CACHE* file, String* packet,
11281128
mysql_mutex_t* log_lock,
11291129
enum_binlog_checksum_alg checksum_alg_arg,
11301130
const char *log_file_name_arg,
1131-
bool* is_binlog_active)
1131+
bool* is_binlog_active,
1132+
char *event_header)
11321133
{
11331134

11341135
ulong data_len;
11351136
int result=0;
1136-
char buf[LOG_EVENT_MINIMAL_HEADER_LEN];
1137+
char local_buf[LOG_EVENT_MINIMAL_HEADER_LEN];
1138+
char *buf= event_header != NULL ? event_header : local_buf;
11371139
uchar ev_offset= packet->length();
11381140
DBUG_ENTER("Log_event::read_log_event(IO_CACHE *, String *, mysql_mutex_t, uint8)");
11391141

@@ -1143,20 +1145,27 @@ int Log_event::read_log_event(IO_CACHE* file, String* packet,
11431145
if (log_file_name_arg)
11441146
*is_binlog_active= mysql_bin_log.is_active(log_file_name_arg);
11451147

1146-
if (my_b_read(file, (uchar*) buf, sizeof(buf)))
1148+
/* If the event header wasn't passed, we need to read it. */
1149+
if (buf == local_buf)
11471150
{
1148-
/*
1149-
If the read hits eof, we must report it as eof so the caller
1150-
will know it can go into cond_wait to be woken up on the next
1151-
update to the log.
1152-
*/
1153-
DBUG_PRINT("error",("my_b_read failed. file->error: %d", file->error));
1154-
if (!file->error)
1155-
result= LOG_READ_EOF;
1156-
else
1157-
result= (file->error > 0 ? LOG_READ_TRUNC : LOG_READ_IO);
1158-
goto end;
1151+
if (my_b_read(file, (uchar*) buf, sizeof(buf)))
1152+
{
1153+
/*
1154+
If the read hits eof, we must report it as eof so the caller
1155+
will know it can go into cond_wait to be woken up on the next
1156+
update to the log.
1157+
*/
1158+
DBUG_PRINT("error",("my_b_read failed. file->error: %d", file->error));
1159+
if (!file->error)
1160+
result= LOG_READ_EOF;
1161+
else
1162+
result= (file->error > 0 ? LOG_READ_TRUNC : LOG_READ_IO);
1163+
goto end;
1164+
}
11591165
}
1166+
else
1167+
DBUG_PRINT("info",("Skipped reading the event header. Using the provided one."));
1168+
11601169
data_len= uint4korr(buf + EVENT_LEN_OFFSET);
11611170
if (data_len < LOG_EVENT_MINIMAL_HEADER_LEN ||
11621171
data_len > max(current_thd->variables.max_allowed_packet,
@@ -1169,7 +1178,7 @@ int Log_event::read_log_event(IO_CACHE* file, String* packet,
11691178
}
11701179

11711180
/* Append the log event header to packet */
1172-
if (packet->append(buf, sizeof(buf)))
1181+
if (packet->append(buf, LOG_EVENT_MINIMAL_HEADER_LEN))
11731182
{
11741183
DBUG_PRINT("info", ("first packet->append failed (out of memory)"));
11751184
/* Failed to allocate packet */

sql/log_event.h

Lines changed: 13 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -676,8 +676,7 @@ class Log_event
676676
my_bool crc_check);
677677

678678
/*
679-
This function will read the common header into the buffer and
680-
rewind the IO_CACHE back to the beginning of the event.
679+
This function will read the common header into the buffer.
681680
682681
@param[in] log_cache The IO_CACHE to read from.
683682
@param[in/out] header The buffer where to read the common header. This
@@ -688,10 +687,8 @@ class Log_event
688687
inline static bool peek_event_header(char *header, IO_CACHE *log_cache)
689688
{
690689
DBUG_ENTER("Log_event::peek_event_header");
691-
my_off_t old_pos= my_b_safe_tell(log_cache);
692690
if (my_b_read(log_cache, (uchar*) header, LOG_EVENT_MINIMAL_HEADER_LEN))
693691
DBUG_RETURN(true);
694-
my_b_seek(log_cache, old_pos); // rewind
695692
DBUG_RETURN(false);
696693
}
697694

@@ -703,14 +700,18 @@ class Log_event
703700
@param[in] log_cache The IO_CACHE to read from.
704701
@param[out] length A pointer to the memory position where to store
705702
the length value.
703+
@param[out] header_buffer An optional pointer to a buffer to store
704+
the event header.
706705
707706
@returns false on success, true otherwise.
708707
*/
709708

710-
inline static bool peek_event_length(uint32* length, IO_CACHE *log_cache)
709+
inline static bool peek_event_length(uint32* length, IO_CACHE *log_cache,
710+
char *header_buffer)
711711
{
712712
DBUG_ENTER("Log_event::peek_event_length");
713-
char header[LOG_EVENT_MINIMAL_HEADER_LEN];
713+
char local_header_buffer[LOG_EVENT_MINIMAL_HEADER_LEN];
714+
char *header= header_buffer != NULL ? header_buffer : local_header_buffer;
714715
if (peek_event_header(header, log_cache))
715716
DBUG_RETURN(true);
716717
*length= uint4korr(header + EVENT_LEN_OFFSET);
@@ -732,6 +733,9 @@ class Log_event
732733
@param[in] checksum_alg_arg the checksum algorithm
733734
@param[in] log_file_name_arg the log's file name
734735
@param[out] is_binlog_active is the current log still active
736+
@param[in] event_header the actual event header. Passing this
737+
parameter will make the function to skip
738+
reading the event header.
735739
736740
@retval 0 success
737741
@retval LOG_READ_EOF end of file, nothing was read
@@ -745,7 +749,9 @@ class Log_event
745749
mysql_mutex_t* log_lock,
746750
binary_log::enum_binlog_checksum_alg checksum_alg_arg,
747751
const char *log_file_name_arg= NULL,
748-
bool* is_binlog_active= NULL);
752+
bool* is_binlog_active= NULL,
753+
char *event_header= NULL);
754+
749755
/*
750756
init_show_field_list() prepares the column names and types for the
751757
output of SHOW BINLOG EVENTS; it is used only by SHOW BINLOG

sql/rpl_binlog_sender.cc

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1048,9 +1048,10 @@ inline int Binlog_sender::read_event(IO_CACHE *log_cache, enum_binlog_checksum_a
10481048
DBUG_ENTER("Binlog_sender::read_event");
10491049

10501050
size_t event_offset;
1051+
char header[LOG_EVENT_MINIMAL_HEADER_LEN];
10511052
int error= 0;
10521053

1053-
if ((error= Log_event::peek_event_length(event_len, log_cache)))
1054+
if ((error= Log_event::peek_event_length(event_len, log_cache, header)))
10541055
goto read_error;
10551056

10561057
if (reset_transmit_packet(0, *event_len))
@@ -1069,7 +1070,8 @@ inline int Binlog_sender::read_event(IO_CACHE *log_cache, enum_binlog_checksum_a
10691070
packet is big enough to read the event, since we have reallocated based
10701071
on the length stated in the event header.
10711072
*/
1072-
if ((error= Log_event::read_log_event(log_cache, &m_packet, NULL, checksum_alg)))
1073+
if ((error= Log_event::read_log_event(log_cache, &m_packet, NULL, checksum_alg,
1074+
NULL, NULL, header)))
10731075
goto read_error;
10741076

10751077
set_last_pos(my_b_tell(log_cache));

0 commit comments

Comments
 (0)