Operational Defect Database

BugZero found this defect 2411 days ago.

MongoDB | 406990

[SERVER-30266] ShardServerCatalogCacheLoader may drop collections causing failures in test validation

Last update date:

10/30/2023

Affected products:

MongoDB Server

Affected releases:

3.5.10

Fixed releases:

3.5.12

Description:

Info

validateCollections (in validate_collections.js) calls listCollections and then calls validate on each returned collection in turn. However, ShardServerCatalogCacheLoader can drop a collection between the listCollections and validate calls, in which case validate fails with a namespace-not-found error.
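The race described above can be illustrated with a minimal sketch. It is not the code in validate_collections.js: `validateAllCollections` and `stubDb` are hypothetical names, and `db` is only assumed to expose `getCollectionInfos()` and `runCommand()` the way the mongo shell's database handle does. The stub simulates a collection that is dropped after it was listed.

```javascript
// Sketch only. `db` is assumed to expose getCollectionInfos() and runCommand()
// like the mongo shell's database handle; validateAllCollections is a
// hypothetical name, not the function in validate_collections.js.
const NAMESPACE_NOT_FOUND = 26;  // numeric code of the NamespaceNotFound error

function validateAllCollections(db, skipOnNamespaceNotFound) {
    const failures = [];
    const skipped = [];
    for (const info of db.getCollectionInfos()) {
        const res = db.runCommand({validate: info.name});
        if (res.ok === 1 && res.valid !== false) {
            continue;  // collection still exists and validated cleanly
        }
        if (res.ok !== 1 && res.code === NAMESPACE_NOT_FOUND && skipOnNamespaceNotFound) {
            // The collection was dropped after it was listed; tolerate it.
            skipped.push(info.name);
            continue;
        }
        failures.push({name: info.name, result: res});
    }
    return {failures, skipped};
}

// Hypothetical stub: collection "b" is listed but gone by the time validate runs.
const stubDb = {
    getCollectionInfos: () => [{name: "a"}, {name: "b"}],
    runCommand: (cmd) => cmd.validate === "b"
        ? {ok: 0, code: NAMESPACE_NOT_FOUND, errmsg: "ns not found"}
        : {ok: 1, valid: true},
};

console.log(validateAllCollections(stubDb, true));   // "b" is skipped
console.log(validateAllCollections(stubDb, false));  // "b" is reported as a failure
```

With the skip flag set, the dropped collection is recorded as skipped rather than failing the hook. With it unset, the same drop surfaces as a validation failure, which is the behaviour this ticket reports.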

Top User Comments

xgen-internal-githook commented on Sat, 12 Aug 2017 15:28:03 +0000:
Author: Eddie Louie (username: elouie99, email: eddie.louie@mongodb.com)
Message: SERVER-30266 Enable TestData.skipValidationOnNamespaceNotFound by default to bypass collection validation error due to namespace not found for sharding and replicaset tests
Branch: master
https://github.com/mongodb/mongo/commit/2b4bac239373acec5c0388aac8fe584dbf611eca

max.hirschhorn@10gen.com commented on Fri, 11 Aug 2017 13:58:03 +0000:
eddie.louie, given that running the validateCollections hook against replica sets and sharded clusters represents the more common case, I think we should flip the default value of TestData.skipValidationOnNamespaceNotFound to true (when the option isn't being specified at all). One way to do this would be to change jsTest.options().skipValidationOnNamespaceNotFound to return TestData.hasOwnProperty("skipValidationOnNamespaceNotFound") ? TestData.skipValidationOnNamespaceNotFound : true.
So that we don't lose out on all coverage of checking that the "listCollections" command response is up-to-date with the contents of the catalog, I'd recommend setting TestData.skipValidationOnNamespaceNotFound explicitly to false, as part of the ValidateCollections hook's shell_options.global_vars section, in the following YAML test suites.
$ rg -l -g '*core*' MongoDFixture buildscripts/resmokeconfig/
buildscripts/resmokeconfig/suites/core_auth.yml
buildscripts/resmokeconfig/suites/core_minimum_batch_size.yml
buildscripts/resmokeconfig/suites/core.yml
buildscripts/resmokeconfig/suites/core_ese.yml
buildscripts/resmokeconfig/suites/core_op_query.yml
buildscripts/resmokeconfig/suites/dur_jscore_passthrough.yml
buildscripts/resmokeconfig/suites/session_jscore_passthrough.yml
I'd also recommend doing the same for concurrency.yml and jstestfuzz.yml.

renctan commented on Tue, 25 Jul 2017 17:04:27 +0000:
max.hirschhorn Oh, you're right. I meant to say skipValidationOnNamespaceNotFound. I copied and pasted the wrong variable. The reason I think this should also be applied to replication suites is that secondaries can still be replicating when the test is about to shut down. That means there is a chance it will hit the same issue of trying to call validate on a collection that was just dropped.

max.hirschhorn@10gen.com commented on Fri, 21 Jul 2017 20:58:11 +0000:
"It looks like we need to set skipValidationOnInvalidViewDefinitions to true for all suites that have sharding/replication after we hooked override_methods/validate_collections_on_shutdown.js to tests."
renctan, given that the ShardServerCatalogCacheLoader is dropping the collection, I think we'd want to set skipValidationOnNamespaceNotFound to true instead. Also, does "sharding/replication" mean that the test is using replica-set shards, or what is the reason that replication is involved?

renctan commented on Fri, 21 Jul 2017 20:09:11 +0000:
It looks like we need to set skipValidationOnInvalidViewDefinitions to true for all suites that have sharding/replication after we hooked override_methods/validate_collections_on_shutdown.js to tests.
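The default flip proposed in the 11 Aug 2017 comment can be sketched as a standalone function. This is an illustration of the quoted expression only: `resolveSkipValidationOnNamespaceNotFound` is a hypothetical name, and the plain object passed in stands for the shell's TestData global. How the actual commit wires this into jsTest.options() is not shown here.

```javascript
// Sketch of the proposed default: an explicitly specified value wins,
// otherwise the option defaults to true.
// resolveSkipValidationOnNamespaceNotFound is a hypothetical helper name.
function resolveSkipValidationOnNamespaceNotFound(testData) {
    const key = "skipValidationOnNamespaceNotFound";
    return Object.prototype.hasOwnProperty.call(testData, key)
        ? testData[key]
        : true;
}

// Option not specified at all: falls back to the new default.
console.log(resolveSkipValidationOnNamespaceNotFound({}));
// A suite that sets the option explicitly to false keeps the strict check.
console.log(resolveSkipValidationOnNamespaceNotFound({skipValidationOnNamespaceNotFound: false}));
```

Checking for the property's presence rather than its truthiness is what lets a suite opt back into strict validation by setting the value to false.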


Status

Closed
