Operational Defect Database

BugZero found this defect 236 days ago.

MongoDB | 2452021

Changing the timeseries granularity/bucketing values can cause tenant migration and logical initial sync to fail.

Last update date:

3/12/2024

Affected products:

MongoDB Server

Affected releases:

No affected releases provided.

Fixed releases:

No fixed releases provided.

Description:

Info

Running a collMod command to change the timeseries granularity during tenant migration or logical initial sync can cause those data migration protocols to fail with the following error:

"error":{"code":72,"codeName":"InvalidOptions","errmsg":"Invalid transition for timeseries.granularity. Can only transition from 'seconds' to 'minutes' or 'minutes' to 'hours'."}}}

The error is expected, because both tenant migration and logical initial sync apply oplog entries to inconsistent data. We need to ignore the error if the oplog application mode is kInitialSync or kUnstableRecovering, just like SERVER-80301. The fix would be to add InvalidOptions to the collMod ignore list.

Regarding the fix, I'm considering whether it's the correct approach to catch these errors individually and ignore them for the kInitialSync oplog application mode. In the future, we may encounter similar cases, and waiting for build failures or production issues before addressing them doesn't seem ideal. I'm thinking of a solution where we simply ignore any errors when applying oplog entries in kInitialSync mode. However, I'm unsure about the safety of this approach and believe it might require further investigation.

(Attached a repro for the initial sync case.)

EDIT (11/10/2023): Modifying the bucket values during a concurrent tenant migration or logical initial sync will cause the migration or initial sync to fail.

[j1:rs1:prim] | 2023-11-09T02:27:20.103+00:00 D1 TENANT_M 4886005 [TenantMigrationRecipientService-4] "TenantOplogApplier::_finishShutdown","attr":{"protocol":0,"migrationId":{"uuid":{"$uuid":"adc565c0-0844-4ea2-a36d-b76c4699bdfc"}},"error":"InvalidOptions: Timeseries 'bucketMaxSpanSeconds' needs to be equal or greater to transition"}
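The failure mode and the proposed ignore-list fix can be illustrated with a small toy model. This is only a sketch, not server code: the enum, the helper names (`coll_mod_granularity`, `apply_coll_mod_entry`), and the simplified validation rule (reject any move to a finer granularity) are assumptions made for illustration. Only the error code 72 (InvalidOptions) and the two mode names come from the ticket.

```python
from enum import Enum, auto

# Ordering implied by the error message: seconds -> minutes -> hours.
GRANULARITY_ORDER = {"seconds": 0, "minutes": 1, "hours": 2}

INVALID_OPTIONS = 72  # error code quoted in the ticket


class OplogApplicationMode(Enum):
    # Hypothetical enum; member names echo the modes named in the ticket.
    INITIAL_SYNC = auto()
    UNSTABLE_RECOVERING = auto()
    SECONDARY = auto()


class CommandError(Exception):
    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def coll_mod_granularity(collection, new_granularity):
    """Toy stand-in for collMod: reject a move to a finer granularity.

    Assumption: the real server's validation is more detailed than this.
    """
    current = collection["granularity"]
    if GRANULARITY_ORDER[new_granularity] < GRANULARITY_ORDER[current]:
        raise CommandError(
            INVALID_OPTIONS,
            f"Invalid transition for timeseries.granularity: "
            f"{current!r} -> {new_granularity!r}",
        )
    collection["granularity"] = new_granularity


# Hypothetical ignore list: errors tolerated while data is known to be inconsistent.
IGNORABLE_COLLMOD_ERRORS = {INVALID_OPTIONS}
LENIENT_MODES = {
    OplogApplicationMode.INITIAL_SYNC,
    OplogApplicationMode.UNSTABLE_RECOVERING,
}


def apply_coll_mod_entry(collection, new_granularity, mode):
    """Apply one collMod oplog entry; return False if it was skipped."""
    try:
        coll_mod_granularity(collection, new_granularity)
    except CommandError as err:
        if mode in LENIENT_MODES and err.code in IGNORABLE_COLLMOD_ERRORS:
            return False  # stale entry; the cloned data already reflects a later state
        raise
    return True
```

In this model, a collection cloned after the source had already reached 'hours' rejects a replayed 'minutes' collMod. With the ignore list, that entry is skipped in the two lenient modes and still raises in any other mode, which is narrower than the "ignore every error during kInitialSync" alternative discussed above.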

Top User Comments

suganthi.mani commented on Fri, 10 Nov 2023 22:54:32 +0000: Considering whether to delay addressing this issue, I think it's better to fix it sooner. This problem affects logical initial sync. I agree that we retry a failed initial sync a certain number of times, and that the retry should succeed for the issue described in this ticket. But it's important to note that logical initial syncs are expensive and that we drop all the cloned collections before retrying, which wastes effort. Additionally, this problem causes test failures in evergreen, and we need to address it to reduce noise. It also raises the question of why this issue isn't caught by the initial sync test suites. Aren't we testing timeseries workloads in those suites? gregory.noma@mongodb.com

steven.vannelli commented on Tue, 3 Oct 2023 15:49:41 +0000: Backlogging this ticket for now since this is not a high priority for the team. This won't be a problem in future versions for tenant migrations.

Status

Backlog