BugZero found this defect 166 days ago.
Data sources
All data on this page is proprietary to BugZero® or gathered from public sources
3/14/2024
MongoDB Server
7.3.0-rc0
8.0.0-rc0
7.3.0-rc2
During the investigating of BF-30194, a profile_lookup.js failure which happens when running as part of fcv_upgrade_downgrade_replica_sets_jscore_passthrough, it was uncovered that commands other than setFeatureCompatibilityVersion can be emitted implicitly as part of FCV upgrade/downgrade procedures. Tests marked as {}require_profiling are already excluded from fcv_upgrade_downgrade_sharding_jscore_passthrough suites due to mongos lacking the system.profile collection, so this could only happen on fcv_upgrade_downgrade_replica_sets_jscore_passthrough. For BF-30194 in particular, an unaccounted collMod command happens during the time profiling is enabled which increments the command counter on the collection targeted by the test. This later Looking at the latest available evergreen failure at the moment of investigating, we can reconstruct the following timeline from the logs: [CODE] Profiler gets enabled with the filter "filter":{"command.setFeatureCompatibilityVersion":{"$not":{"$exists":true}},"db":"test"}}}} [j2:prim] {"t": {"$date":"2023-11-29T09:30:37.095+00:00"} ,"s":"I", "c":"COMMAND", "id":48742, "ctx":"conn320","msg":"Profiler settings changed","attr":{"from": {"level":0,"slowms":100,"sampleRate":1} ,"to":{"level":1,"slowms":100,"sampleRate":1,"filter":{"command.setFeatureCompatibilityVersion":{"$not":{"$exists":true}}}},"db":"test"}} [Problem] A collMod on collection foreign happens right after [j2:prim] {"t": {"$date":"2023-11-29T09:30:37.103+00:00"} ,"s":"I", "c":"COMMAND", "id":5324200, "ctx":"conn319","msg":"CMD: collMod","attr":{"cmdObj": {"collMod":"foreign"} }} [CODE] The test query containing $lookup executes: [j2:prim] {"t": {"$date":"2023-11-29T09:30:37.103+00:00"} ,"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn320","msg":"Slow query","attr":{"type":"command","isFromUserConnection":true,"ns":"test.local","collectionType":"normal","appName":"MongoDB Shell","command":{"aggregate":"local","pipeline":\\\\\{"$lookup":{"from":"foreign","as":"res","localField":"a","foreignField":"a"}},"cursor":{},"lsid":{"id":{"$uuid":"d493d41d-7190-4a35-9a12-165c9f1cd823"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1701250236,"i":10}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test"},"planSummary":"COLLSCAN","planningTimeMicros":233,"keysExamined":0,"docsExamined":4,"cursorExhausted":true,"numYields":0,"nreturned":3,"queryHash":"9EB1333D","planCacheKey":"9B882338","queryFramework":"sbe","reslen":398,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":1}},"Global":{"acquireCount": {"r":1} }},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"remote":"127.0.0.1:58381","protocol":"op_msg","durationMillis":4}} [CODE] Some time later the profiler gets sopped [j2:prim] {"t": {"$date":"2023-11-29T09:30:37.116+00:00"} ,"s":"I", "c":"COMMAND", "id":48742, "ctx":"conn320","msg":"Profiler settings changed","attr":{"from":{"level":1,"slowms":100,"sampleRate":1,"filter":{"command.setFeatureCompatibilityVersion":{"$not":{"$exists":true}}}},"to":{"level":0,"slowms":100,"sampleRate":1,"filter":{"command.setFeatureCompatibilityVersion":{"$not":{"$exists":true}}}},"db":"test"}} [CODE] Then the test fails on the foreign command count assertion: [js_test:profile_lookup] 2023-11-29T09:30:37.137Z assert: [1] != [2] are not equal [js_test:profile_lookup] doassert@src/mongo/shell/assert.js:20:14 [js_test:profile_lookup] assert.eq@src/mongo/shell/assert.js:176:17 [js_test:profile_lookup] @jstests\aggregation\sources\lookup\profile_lookup.js:57:8 [js_test:profile_lookup] uncaught exception: Error: [1] != [2] are not equal : [js_test:profile_lookup] doassert@src/mongo/shell/assert.js:20:14 [js_test:profile_lookup] assert.eq@src/mongo/shell/assert.js:176:17 [js_test:profile_lookup] @jstests\aggregation\sources\lookup\profile_lookup.js:57:8 [js_test:profile_lookup] Error: [1] != [2] are not equal [js_test:profile_lookup] failed to load: jstests\aggregation\sources\lookup\profile_lookup.js [js_test:profile_lookup] exiting with code -3 The test assumed that the pipeline will be the sole command involving collection foreign between the profiler start / stop calls, and therefore expected the command count on foreign to be equal to 1 (classic engine). This however wasn't the case due to the additional `collMod` command.
xgen-internal-githook commented on Thu, 14 Mar 2024 22:29:52 +0000: Author: {'name': 'Adi Zaimi', 'email': 'adizaimi@yahoo.com', 'username': 'adizaimi'} Message: SERVER-83879: Omit running profile_lookup.js in fcv up/down suite (#18267) SERVER-83879: Omit running profile_lookup.js in fcv up/down suite GitOrigin-RevId: a9e2a6516cfb9da91262c1b82c0c254539d27749 Branch: v7.4test https://github.com/mongodb/mongo/commit/f018a8eb3e381355989c36cfdcbadf0e9d1ce538 xgen-internal-githook commented on Fri, 26 Jan 2024 23:57:29 +0000: Author: {'name': 'Adi Zaimi', 'email': 'adizaimi@yahoo.com', 'username': 'adizaimi'} Message: SERVER-83879: Omit running profile_lookup.js in fcv up/down suite (#18267) SERVER-83879: Omit running profile_lookup.js in fcv up/down suite GitOrigin-RevId: a9e2a6516cfb9da91262c1b82c0c254539d27749 Branch: v7.3 https://github.com/mongodb/mongo/commit/3debbda0ff675fde1f80099d7e994f70b0757d78 xgen-internal-githook commented on Thu, 25 Jan 2024 21:28:36 +0000: Author: {'name': 'Adi Zaimi', 'email': 'adizaimi@yahoo.com', 'username': 'adizaimi'} Message: SERVER-83879: Omit running profile_lookup.js in fcv up/down suite (#18267) SERVER-83879: Omit running profile_lookup.js in fcv up/down suite GitOrigin-RevId: 0b80a1d04f647c056d160fd7c563c3b33ecb594c Branch: master https://github.com/mongodb/mongo/commit/85f557da72250d0c384c20439e5872569b6d9ea9 huayu.ouyang commented on Tue, 5 Dec 2023 12:16:08 +0000: catalin.sumanaru@mongodb.com I'm happy to work on it, but I believe this passthrough suite is actually owned by the Cluster Scalability team (I only worked on the previous ticket since it was a quick fix)