Commit dfc4420

DOCSP-8771: Initial sync survives transient network errors
1 parent 7f93fad commit dfc4420

3 files changed

+56
-8
lines changed

source/core/replica-set-sync.txt

Lines changed: 22 additions & 8 deletions
@@ -63,14 +63,28 @@ To perform an initial sync, see
 Fault Tolerance
 ~~~~~~~~~~~~~~~
 
-To recover from transient network or operation failures, initial sync
-has built-in retry logic.
-
-.. versionchanged:: 3.4
-
-   MongoDB 3.4 improves the initial sync retry logic to be more resilient to
-   intermittent failures on the network.
-
+If a secondary performing initial sync encounters a *non-transient*
+network error during the sync process, the secondary restarts the
+initial sync process from the beginning.
+
+Starting in MongoDB 4.4, a secondary performing initial sync can attempt
+to resume the sync process if interrupted by a *transient* network
+error. The sync source must also run MongoDB 4.4 to support resumable
+initial sync. If the sync source runs MongoDB 4.2 or earlier, the
+secondary must restart the initial sync process as if it encountered a
+non-transient network error.
+
+By default, the secondary tries to resume initial sync for 24 hours.
+MongoDB 4.4 adds the
+:parameter:`initialSyncTransientErrorRetryPeriodSeconds` server
+parameter for controlling the amount of time the secondary attempts to
+resume initial sync. If the secondary cannot successfully resume the
+initial sync process during the configured time period, it selects a new
+healthy source from the replica set and restarts the initial
+synchronization process from the beginning.
+
+The secondary attempts to restart the initial sync up to ``10`` times
+before returning a fatal error.
 
 .. _replica-set-replication:
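
The resume-or-restart behavior described in the added Fault Tolerance text can be modeled roughly as follows. This is an illustrative sketch only, not MongoDB's implementation: ``SyncError``, ``next_action``, and their arguments are hypothetical names, and the way the retry window and restart counter are compared (inclusive bounds) is an assumption. Only the 24-hour default and the limit of 10 restarts come from the text.

```python
from dataclasses import dataclass

# Values taken from the documentation text in this commit.
DEFAULT_RETRY_PERIOD_SECONDS = 86400  # 24 hours
MAX_RESTART_ATTEMPTS = 10


@dataclass
class SyncError:
    """Hypothetical description of one interruption during initial sync."""
    transient: bool
    outage_seconds: float = 0.0  # how long a transient outage lasts


def next_action(error, source_supports_resume, restarts_so_far,
                retry_period_seconds=DEFAULT_RETRY_PERIOD_SECONDS):
    """Return 'resume', 'restart', or 'fatal' for one interruption.

    source_supports_resume stands in for "the sync source runs MongoDB 4.4".
    """
    can_resume = (
        error.transient
        and source_supports_resume
        # Assumption: an outage exactly as long as the window still resumes.
        and error.outage_seconds <= retry_period_seconds
    )
    if can_resume:
        return "resume"
    # Otherwise the secondary restarts from the beginning, up to the limit.
    if restarts_so_far >= MAX_RESTART_ATTEMPTS:
        return "fatal"
    return "restart"
```

Under these assumptions, a one-minute transient outage against a 4.4 sync source yields ``"resume"``, while the same outage against an older source, or an outage longer than the retry period, yields ``"restart"``.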

source/reference/parameters.txt

Lines changed: 12 additions & 0 deletions
@@ -1961,6 +1961,18 @@ Replication Parameters
    The specified value must be greater than or equal to 0, with 0 to
    disable warnings.
 
+.. parameter:: initialSyncTransientErrorRetryPeriodSeconds
+
+   .. versionadded:: 4.4
+
+   *Type*: integer
+
+   *Default*: 86400
+
+   The amount of time in seconds a secondary performing initial sync
+   attempts to resume the process if interrupted by a transient
+   network error. The default value is equivalent to 24 hours.
+
 .. parameter:: oplogInitialFindMaxSeconds
 
    .. versionadded:: 3.6
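
Because ``initialSyncTransientErrorRetryPeriodSeconds`` is expressed in seconds, a small helper can make the conversion from hours explicit. The sketch below is hypothetical: ``retry_period_arg`` is not part of any MongoDB tool, and it assumes the parameter can be supplied at startup through the server's general ``--setParameter name=value`` option, which this commit does not state.

```python
PARAMETER = "initialSyncTransientErrorRetryPeriodSeconds"


def retry_period_arg(hours):
    """Build a --setParameter argument for a retry period given in hours.

    Assumption: the parameter accepts a whole number of seconds at startup.
    """
    seconds = int(hours * 3600)
    return f"--setParameter {PARAMETER}={seconds}"


# The documented default of 24 hours corresponds to 86400 seconds.
print(retry_period_arg(24))
```

The printed argument reproduces the documented default; passing a smaller value, such as ``retry_period_arg(1)``, would shorten the window in which the secondary tries to resume before restarting initial sync.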

source/release-notes/4.4.txt

Lines changed: 22 additions & 0 deletions
@@ -338,6 +338,28 @@ documentation, see :parameter:`tlsX509ExpirationWarningThresholdDays`.
 Replica Sets
 ------------
 
+Resumable Initial Sync
+~~~~~~~~~~~~~~~~~~~~~~
+
+Starting in MongoDB 4.4, a secondary performing initial sync can attempt
+to resume the sync process if interrupted by a *transient* network
+error. The sync source must also run MongoDB 4.4 to support resumable
+initial sync. If the sync source runs MongoDB 4.2 or earlier, the
+secondary must restart the initial sync process as if it encountered a
+non-transient network error.
+
+By default, the secondary tries to resume initial sync for 24 hours.
+MongoDB 4.4 adds the
+:parameter:`initialSyncTransientErrorRetryPeriodSeconds` server
+parameter for controlling the amount of time the secondary attempts to
+resume initial sync. If the secondary cannot successfully resume the
+initial sync process during the configured time period, it selects a new
+healthy source from the replica set and restarts the initial
+synchronization process from the beginning.
+
+Prior to MongoDB 4.4, the secondary would restart the entire initial
+sync process if it encountered an error during the process.
+
 Rollback Directory
 ~~~~~~~~~~~~~~~~~~
