Operational Defect Database

BugZero found this defect 348 days ago.

MongoDB | 2359557

listIndexes should report expireAfterSeconds as a float

Last update date:

3/8/2024

Affected products:

MongoDB Server

Affected releases:

No affected releases provided.

Fixed releases:

7.3.0-rc0

Description:

Info

Per a conversation with Max just now, it seems chunk migration uses `listIndexes` to fetch index stats, which (again, per Max) reports expireAfterSeconds as an int. This value, though, is internally a floating-point, which can trigger rounding problems. This seems to have caused REP-2567, in which the source cluster’s shards built the `x_before_epoch` index with different expireAfterSeconds values. That happened because ttl_index_options.js sends a UNIX timestamp. Assumedly that value, these days, is just high enough that rounding errors will happen, where previously they may not have.

Top User Comments

xgen-internal-githook commented on Thu, 11 Jan 2024 02:40:05 +0000: Author: {'name': 'mongodt', 'email': '146988481+mongodt@users.noreply.github.com', 'username': 'mongodt'} Message: SERVER-77828 Ensure expireAfterSeconds is stored as integer type on-disk (#17983) GitOrigin-RevId: ee7b5dbe682a4381d1445b7fdb2dc5f3766dd199 Branch: master https://github.com/mongodb/mongo/commit/62dcd52509c0bfecd50fbbccf8fbf86d04281c45 xgen-internal-githook commented on Tue, 12 Sep 2023 14:37:13 +0000: Author: {'name': 'Dan Larkin-York', 'email': 'dan.larkin-york@mongodb.com', 'username': 'dhly-etc'} Message: Revert "SERVER-77828 Ensure expireAfterSeconds is stored as integer type on-disk" This reverts commit 07d91ec750e7ad0942f7bd535606f486a3647118. Branch: master https://github.com/mongodb/mongo/commit/555e288f0ccc22a2ce80e28bad4cd30cfb0a4697 xgen-internal-githook commented on Tue, 12 Sep 2023 14:13:41 +0000: Author: {'name': 'Dan Larkin-York', 'email': 'dan.larkin-york@mongodb.com', 'username': 'dhly-etc'} Message: Revert "SERVER-77828 Ensure expireAfterSeconds is stored as integer type on-disk" This reverts commit 07d91ec750e7ad0942f7bd535606f486a3647118. Branch: v7.1 https://github.com/mongodb/mongo/commit/39b086c4b28a9d07a603cfacbd68fb902a0e4d2a xgen-internal-githook commented on Fri, 25 Aug 2023 21:27:03 +0000: Author: {'name': 'Dan Larkin-York', 'email': 'dan.larkin-york@mongodb.com', 'username': 'dhly-etc'} Message: SERVER-77828 Ensure expireAfterSeconds is stored as integer type on-disk Branch: master https://github.com/mongodb/mongo/commit/07d91ec750e7ad0942f7bd535606f486a3647118 JIRAUSER1258161 commented on Wed, 2 Aug 2023 13:40:22 +0000: max.hirschhorn@mongodb.com I'll be digging into this in the next few weeks. The plan you suggested (coerce on create + upgrade rewrite) sounds like it might be an option, but I'll have to chase down the details to be sure. As to why it cares about the current time value, there's an explanation in the comment with the code you linked. The restriction only applies to clustered collections, to handle the fact that time series collections are clustered by OID, which has a 32-bit unsigned timestamp. It's definitely janky, and could almost certainly be handled another way (e.g. by doing the restriction at TTL-delete time rather than index creation time. I suspect though that changing this doesn't have a great deal of value for most customers, as it would certainly be uncommon to have an expireAfterSeconds value that large. max.hirschhorn@10gen.com commented on Wed, 2 Aug 2023 00:23:53 +0000: dan.larkin-york@mongodb.com, connie.chen@mongodb.com, has a path forward on SERVER-77828 been defined? I will reiterate that I do not consider the problem here to reside in how chunk migration fetches the index specifications from the donor shard. My expectation for SERVER-77828 is for the Storage Execution team to do the following: Change the create, createIndexes, and collMod commands to implicitly truncate their expireAfterSeconds to an integral value. The TTL monitor thread calls BSONElement::safeNumberLong() and so this truncation was already happening for how the expiry is applied. Perform a one-time FCV upgrade transition to rewrite any stored expireAfterSeconds values in the local _mdb_catalog.wt as an integral value. Separately it would be nice to receive an explanation for why the expireAfterSeconds validation cares about the current Unix epoch time value. It doesn't make immediate sense to me why a create command at time T can fail with InvalidOptions yet at time T+1 can succeed without issue. What am I missing? JIRAUSER1272372 commented on Fri, 9 Jun 2023 15:08:06 +0000: Actually, int32 should round-trip safely through an IEEE 754 double: #include #include int main() { int32_t myval = INT32_MAX; double valf = myval; int32_t myval2 = valf; printf("numbers: %d -> %f -> %d\n", myval, valf, myval2); return 0; } So, as long as `Math.floor(Date.now() / 1000) + 1000` (from ttl_index_options.js) fits within an int32, it shouldn’t explain the disparity seen in REP-2567. max.hirschhorn@10gen.com commented on Tue, 6 Jun 2023 14:17:48 +0000: Chunk migration should fetch expireAfterSeconds as a float I prefer we change the listIndexes command to return the true value stored in the local mdb catalog and have asked this ticket be assigned to the Storage Execution team for this reason. If we aren't going to have the createIndexes command use the NewIndexSpec struct to coerce the index specification to have an integral expireAfterSeconds then I don't see a reason for the listIndexes command to do the coercion. The implemented behavior is misleading. For example, running a createIndexes command with the output of the listIndexes command can return an ok:0 error response rather than an ok:1 "all indexes already exist" response.

Steps to Reproduce


Additional Resources / Links

Share:

BugZero® Risk Score

What's this?

Coming soon

Status

Closed

Learn More

Search:

...