Operational Defect Database

BugZero found this defect 2537 days ago.

MongoDB | 365726

[SERVER-28376] ReplSetTest.initiate() should call awaitReplication before requesting node 0 to step up as primary

Last update date:

8/15/2017

Affected products:

MongoDB Server

Affected releases:

No affected releases provided.

Fixed releases:

3.4.4

3.5.6

Description:

Info

During ReplSetTest initialization, when elections happen while adding nodes to the replica set, it is possible for node 0 (the intended primary node after initialization) to lag the other nodes before it is asked to step up. https://github.com/mongodb/mongo/blob/231e760c744013fe68fe863a7d0315148c69047a/src/mongo/shell/replsettest.js#L771 replsettest.js 771 this.stepUp(this.nodes[0]); 772 } 773   774 // Setup authentication if running test with authentication 775 if ((jsTestOptions().keyFile) && cmdKey == 'replSetInitiate') { 776 master = this.getPrimary(); 777 jsTest.authenticateNodes(this.nodes); 778 } 779 }; 780   781 this.stepUp = function(node) { 782 this.awaitSecondaryNodes(); 783 this.awaitNodesAgreeOnPrimary(); 784 if (this.getPrimary() === node) { 785 return; 786 } This may result in the step up phase of ReplSetTest,initiate() failing to elect node 0 with a "candidate's data is staler than mine" reason. [js_test:replsetprio1] 2017-03-15T13:03:02.788+0000 d20010| 2017-03-15T13:03:02.787+0000 I REPL [conn2] stepping down from primary, because a new term has begun: 2 [js_test:replsetprio1] 2017-03-15T13:03:02.788+0000 d20010| 2017-03-15T13:03:02.787+0000 I REPL [replExecDBWorker-2] transition to SECONDARY ... [js_test:replsetprio1] 2017-03-15T13:03:02.802+0000 d20010| 2017-03-15T13:03:02.799+0000 D REPL_HB [conn2] Received heartbeat request from mci-osx1010-10.build.10gen.cc:20011, { replSetHeartbeat: "testSet", configVersion: 2, from: "mci-osx1010-10.build.10gen.cc:20011", fromId: 1, term: 2 } [js_test:replsetprio1] 2017-03-15T13:03:02.802+0000 d20010| 2017-03-15T13:03:02.799+0000 D REPL_HB [conn2] Processed heartbeat from mci-osx1010-10.build.10gen.cc:20011 and generated response, { ok: 1.0, state: 2, v: 2, hbmsg: "", set: "testSet", term: 1, durableOpTime: { ts: Timestamp 1489582961000|2, t: 1 }, opTime: { ts: Timestamp 1489582961000|2, t: 1 } }   ... [js_test:replsetprio1] 2017-03-15T13:03:03.315+0000 d20010| 2017-03-15T13:03:03.315+0000 D REPL_HB [ReplicationExecutor] Sending heartbeat (requestId: 52) to mci-osx1010-10.build.10gen.cc:20012, { replSetHeartbeat: "testSet", configVersion: 2, from: "mci-osx1010-10.build.10gen.cc:20010", fromId: 0, term: 1 } [js_test:replsetprio1] 2017-03-15T13:03:03.316+0000 d20010| 2017-03-15T13:03:03.315+0000 D REPL_HB [ReplicationExecutor] Sending heartbeat (requestId: 53) to mci-osx1010-10.build.10gen.cc:20011, { replSetHeartbeat: "testSet", configVersion: 2, from: "mci-osx1010-10.build.10gen.cc:20010", fromId: 0, term: 1 } ... [js_test:replsetprio1] 2017-03-15T13:03:03.318+0000 d20010| 2017-03-15T13:03:03.316+0000 D REPL_HB [ReplicationExecutor] Received response to heartbeat (requestId: 52) from mci-osx1010-10.build.10gen.cc:20012, { ok: 1.0, state: 2, v: 2, hbmsg: "", set: "testSet", term: 2, primaryId: 0, durableOpTime: { ts: Timestamp 1489582961000|2, t: 1 }, opTime: { ts: Timestamp 1489582961000|2, t: 1 }, operationTime: Timestamp 1489582982000|1 } [js_test:replsetprio1] 2017-03-15T13:03:03.318+0000 d20010| 2017-03-15T13:03:03.316+0000 I REPL [ReplicationExecutor] Member mci-osx1010-10.build.10gen.cc:20012 is now in state SECONDARY [js_test:replsetprio1] 2017-03-15T13:03:03.318+0000 d20010| 2017-03-15T13:03:03.316+0000 D REPL_HB [ReplicationExecutor] Scheduling heartbeat to mci-osx1010-10.build.10gen.cc:20012 at 2017-03-15T13:03:08.316Z [js_test:replsetprio1] 2017-03-15T13:03:03.319+0000 d20010| 2017-03-15T13:03:03.316+0000 D REPL_HB [ReplicationExecutor] Received response to heartbeat (requestId: 53) from mci-osx1010-10.build.10gen.cc:20011, { ok: 1.0, electionTime: new Date(6397710192368156673), state: 1, v: 2, hbmsg: "", set: "testSet", term: 2, primaryId: 1, durableOpTime: { ts: Timestamp 1489582961000|2, t: 1 }, opTime: { ts: Timestamp 1489582961000|2, t: 1 }, operationTime: Timestamp 1489582982000|1 } [js_test:replsetprio1] 2017-03-15T13:03:03.319+0000 d20010| 2017-03-15T13:03:03.316+0000 I REPL [ReplicationExecutor] Member mci-osx1010-10.build.10gen.cc:20011 is now in state PRIMARY [js_test:replsetprio1] 2017-03-15T13:03:03.319+0000 d20010| 2017-03-15T13:03:03.316+0000 D REPL_HB [ReplicationExecutor] Scheduling heartbeat to mci-osx1010-10.build.10gen.cc:20011 at 2017-03-15T13:03:08.316Z ... [js_test:replsetprio1] 2017-03-15T13:03:07.781+0000 d20010| 2017-03-15T13:03:07.781+0000 I COMMAND [conn11] Received replSetStepUp request [js_test:replsetprio1] 2017-03-15T13:03:07.781+0000 d20010| 2017-03-15T13:03:07.781+0000 I REPL [conn11] Starting an election due to step up request [js_test:replsetprio1] 2017-03-15T13:03:07.782+0000 d20010| 2017-03-15T13:03:07.781+0000 I REPL [conn11] conducting a dry run election to see if we could be elected [js_test:replsetprio1] 2017-03-15T13:03:07.782+0000 d20010| 2017-03-15T13:03:07.781+0000 I REPL [ReplicationExecutor] VoteRequester(term 2 dry run) received a no vote from mci-osx1010-10.build.10gen.cc:20011 with reason "candidate's data is staler than mine"; response message: { term: 2, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0, operationTime: Timestamp 1489582984000|1 } [js_test:replsetprio1] 2017-03-15T13:03:07.783+0000 d20010| 2017-03-15T13:03:07.782+0000 I REPL [ReplicationExecutor] VoteRequester(term 2 dry run) received a no vote from mci-osx1010-10.build.10gen.cc:20012 with reason "candidate's data is staler than mine"; response message: { term: 2, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0, operationTime: Timestamp 1489582984000|1 } [js_test:replsetprio1] 2017-03-15T13:03:07.783+0000 d20010| 2017-03-15T13:03:07.782+0000 I REPL [ReplicationExecutor] not running for primary, we received insufficient votes [js_test:replsetprio1] 2017-03-15T13:03:07.783+0000 d20010| 2017-03-15T13:03:07.782+0000 I COMMAND [conn11] replSetStepUp request failed :: caused by :: CommandFailed: Election failed.

Top User Comments

xgen-internal-githook commented on Fri, 31 Mar 2017 17:21:21 +0000: Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'} Message: SERVER-28376 split ReplSetTest.initiate() into 3 functions. ReplSetTest.initiateWithAnyNodeAsPrimary() ReplSetTest.initiateWithNodeZeroAsPrimary() ReplSetTest.initiate() - alias for initiateWithNodeZeroAsPrimary() (cherry picked from commit cc9c461904392d5bf71d815853d13065f6b91629) Branch: v3.4 https://github.com/mongodb/mongo/commit/451630c29ca4478097373c0d27c68e65a2089efb xgen-internal-githook commented on Fri, 31 Mar 2017 16:42:18 +0000: Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'} Message: SERVER-28376 fixed replsets/auth2.js to not clobber global 'rs' variable. (cherry picked from commit 0501d9eedc19ee1d4bf35525fb25d0e5a50caa49) Branch: v3.4 https://github.com/mongodb/mongo/commit/8ebb459615c85fda6b4bb077e0a8dc88436f7f83 xgen-internal-githook commented on Fri, 31 Mar 2017 16:37:59 +0000: Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'} Message: SERVER-28376 remove references to unsupported ReplSetTest.initiate() initiateTimeout option. This option was removed in SERVER-27810. (cherry picked from commit 2e189a57db00b291b171d5a2323700d6f57cd471) Branch: v3.4 https://github.com/mongodb/mongo/commit/ede51fda3a16f7aa3de35579f3cafe886f138a4c xgen-internal-githook commented on Wed, 29 Mar 2017 12:37:27 +0000: Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'} Message: SERVER-28376 ReplSetTest.stepUp() calls awaitReplication() before sending replSetStepUp command Branch: master https://github.com/mongodb/mongo/commit/f1fb0600a328f0792fa84bea78e368831e4af669 xgen-internal-githook commented on Wed, 29 Mar 2017 11:52:28 +0000: Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'} Message: SERVER-28376 fixed jstests/ssl and jstests/sslSpecial tests to not require node 0 as primary Branch: master https://github.com/mongodb/mongo/commit/710cefca98db7c8f7736a79cebca855b99509235 xgen-internal-githook commented on Wed, 29 Mar 2017 04:11:21 +0000: Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'} Message: SERVER-28376 remove requirement for node 0 of every shard to be primary in ShardingTest initialization Branch: master https://github.com/mongodb/mongo/commit/ff6116630ad77b2afd348bbf81c8af2eccfa8163 xgen-internal-githook commented on Wed, 29 Mar 2017 03:05:00 +0000: Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'} Message: SERVER-28376 Make authorization tests not require node 0 to be primary after ReplSetTest initialization Branch: master https://github.com/10gen/mongo-enterprise-modules/commit/95c7a8365964ef62be3f8af7d4e5436046e0b3b3 xgen-internal-githook commented on Wed, 29 Mar 2017 03:03:18 +0000: Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'} Message: SERVER-28376 split ReplSetTest.initiate() into 3 functions. ReplSetTest.initiateWithAnyNodeAsPrimary() ReplSetTest.initiateWithNodeZeroAsPrimary() ReplSetTest.initiate() - alias for initiateWithNodeZeroAsPrimary() Branch: master https://github.com/mongodb/mongo/commit/cc9c461904392d5bf71d815853d13065f6b91629 xgen-internal-githook commented on Wed, 29 Mar 2017 01:57:34 +0000: Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'} Message: SERVER-28376 fixed replsets/auth2.js to not clobber global 'rs' variable. Branch: master https://github.com/mongodb/mongo/commit/0501d9eedc19ee1d4bf35525fb25d0e5a50caa49 xgen-internal-githook commented on Wed, 29 Mar 2017 01:23:53 +0000: Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'} Message: SERVER-28376 remove references to unsupported ReplSetTest.initiate() initiateTimeout option. This option was removed in SERVER-27810. Branch: master https://github.com/mongodb/mongo/commit/2e189a57db00b291b171d5a2323700d6f57cd471

Additional Resources / Links

Share:

BugZero Risk Score

Coming soon

Status

Closed

Have you been affected by this bug?

cost-cta-background

Do you know how much operational outages are costing you?

Understand the cost to your business and how BugZero can help you reduce those costs.

Discussion

Login to read and write comments.

Have you ever...

had your data corrupted from a

VMware

bug?

Search:

...