Operational Defect Database

BugZero found this defect 185 days ago.

MongoDB | 2501728

Compound Wildcard Index bounds reverts to [MinKey, MaxKey] when filter contains fields outside CWI

Last update date:

3/14/2024

Affected products:

MongoDB Server

Affected releases:

7.0.1

8.0.0-rc0

Fixed releases:

No fixed releases provided.

Description:

Info

I'm observing a weird behaviour with CWI's, where queries containing filters beyond those covered by the CWI, results in the `indexBounds` to revert to '[MinKey, MaxKey]`. Say I have CWI: db.cwi.createIndex({ TenantId: 1, 'CustomFields.$**': 1, Date: 1 }) If I query only using fields contained in the CWI: db.cwi.explain().find({ TenantId: 1, 'CustomFields.Prop': 2, Date: { $gte: ISODate('2023-01-01'), $lte: ISODate('2024-01-01') }, }) I get this winning query plan: { "stage": "FETCH", "planNodeId": new NumberInt("2"), "inputStage": { "stage": "IXSCAN", "planNodeId": new NumberInt("1"), "keyPattern": { "TenantId": new NumberInt("1"), "$_path": new NumberInt("1"), "CustomFields.Prop": new NumberInt("1"), "Date": new NumberInt("1") }, "indexName": "TenantId_1_CustomFields.$**_1_Date_1", "isMultiKey": false, "multiKeyPaths": { "TenantId": [ ], "$_path": [ ], "CustomFields.Prop": [ ], "Date": [ ] }, "isUnique": false, "isSparse": true, "isPartial": false, "indexVersion": new NumberInt("2"), "direction": "forward", "indexBounds": { "TenantId": [ "[1, 1]" ], "$_path": [ "[\"CustomFields.Prop\", \"CustomFields.Prop\"]" ], "CustomFields.Prop": [ "[2, 2]" ], "Date": [ "[new Date(1672531200000), new Date(1704067200000)]" ] } } } But if I add an extra criteria, say ` { Foo: 'Bar' } `: db.cwi.explain().find({ TenantId: 1, 'CustomFields.Prop': 2, Date: { $gte: ISODate('2023-01-01'), $lte: ISODate('2024-01-01') }, Foo: 'Bar' }) I suddenly get a query plan where the index bounds on `CustomFields.Prop` and `Date` are both `[MinKey, MaxKey]`: { "stage": "FETCH", "planNodeId": new NumberInt("2"), "filter": { "$and": [ { "CustomFields.Prop": { "$eq": new NumberInt("2") } }, { "Date": { "$lte": new ISODate("2024-01-01T00:00:00.000Z") } }, { "Date": { "$gte": new ISODate("2023-01-01T00:00:00.000Z") } }, { "Foo": { "$eq": "Bar" } } ] }, "inputStage": { "stage": "IXSCAN", "planNodeId": new NumberInt("1"), "keyPattern": { "TenantId": new NumberInt("1"), "$_path": new NumberInt("1"), "Date": new NumberInt("1") }, "indexName": "TenantId_1_CustomFields.$**_1_Date_1", "isMultiKey": false, "multiKeyPaths": { "TenantId": [ ], "$_path": [ ], "Date": [ ] }, "isUnique": false, "isSparse": true, "isPartial": false, "indexVersion": new NumberInt("2"), "direction": "forward", "indexBounds": { "TenantId": [ "[1, 1]" ], "$_path": [ "[MinKey, MaxKey]" ], "Date": [ "[MinKey, MaxKey]" ] } } } Note the index bounds and the `filter` on the outer fetch stage which is suddenly doing the work of filtering on `CustomFields.Prop` and `Date`. I see no reason why this is the case, an optimal query plan would be as before with the CWI filtering by all fields it covers, and then only a filter on the fetch stag on ` { Foo: 'Bar' } `.

Top User Comments

JIRAUSER1275839 commented on Tue, 27 Feb 2024 13:32:14 +0000: ivan.fefer@mongodb.com Awesome, thanks for creating a much simpler reproduction, I'm sure that will make it much easier to debug and diagnose. 🙈 Coming from outside-in, it's so much a black-box it can be hard at times figuring out which heuristics are affecting what, and working backwards from an outcome and to why the query planner choose a certain way. Honestly it would be so great if there were more ressources on the heuristics employed and even more ways of seeing what factors are being included in choosing one plan over another. All those times where we can observe a badly chosen plan, we often can see the (in our human minds) correct plan among the rejected plans, and we often are able to force it with hint and observe drastically improved performance - thus often leading us to make assumptions about "why did it get confused for this query". But I'm super excited for the potential of progress on the behaviour we are seeing with CWI's, as I'm hoping to be able to replace those 20 indexes with a single CWI freeing up a lot of index and working memory in turn (lots of overlap between those 20 indexes compared to a single CWI). JIRAUSER1270969 commented on Tue, 27 Feb 2024 13:25:09 +0000: Thanks for the response. Currently I am clarifying some CWI behavior and get back to you on that. That seems to be different issue, so pulling 7.0.6-rc0 probably won't fix it. > it seems like it's mainly focusing on which index provides any results the quickest, and not factoring in all the filtering applied in later stage That is false, we consider the whole query when picking an index, of course. Sorry to hear that our optimizer giving you this impression. I have created a simple script that reproduces the problem: repro.js JIRAUSER1275839 commented on Tue, 27 Feb 2024 13:16:42 +0000: ivan.fefer@mongodb.com it could potentially be the bounds which end up making the CWI unpalatable for the query planner? Would also explain why forcing it ends up with the same bounds which perform poorly. Any way of pulling a 7.0.6-rc0 docker image, then I could test whether the same behaviour can be observed for that version also? Also, let me know if you want the test data I'm using as mentioned it's a rather large gzip dump of just shy of 1GB (we use it for synthetic tests when evaluating the performance of ideas we have without using real customer data). JIRAUSER1275839 commented on Tue, 27 Feb 2024 12:53:18 +0000: Not knowing the internals of how query plans are chosen, but entirely anecdotaly based on my experience through using MongoDB in production, it seems like it's mainly focusing on which index provides any results the quickest, and not factoring in all the filtering applied in later stages or tracking the number of false-positives any given plan is producing and then favoring not only plans with fast output from an index, but also those with a low false-positive percentage. I often see slow queries where MongoDB is choosing indexes which match the query shape very poorly but typically indexes which I expect it to quickly be able to start getting data out of, but where the vast majority is false-positives filtered after a FETCH stage later. Thus we see it often using the same index for many different queries even though we have many optimized indexes for each field we are capable of filtering on. It's even gotten to a place where we have considered doing some analysis of the cardinality of our fields and then baking into our own code a selection-heuristic added as a `hint` to our MongoDB query based on which filters we know we are applying and which of these have the highest cardinality and thus should shave of the most false-positives, rather than relying on MongoDB being able to pick a winning plan. But sorry, that's off topic - but somewhat related. 🙈 JIRAUSER1270969 commented on Tue, 27 Feb 2024 12:50:22 +0000: Thanks for the response. I see that in your provided query we actually consider CWI as an valid alternative, but we fail to generate correct index boundaries for it: { "stage": "FETCH", "filter": { "$and": [ { "Metadata.CustomProp1": { "$eq": "sunt" } }, { "State": { "$eq": "Success" } } ] }, "inputStage": { "stage": "IXSCAN", "keyPattern": { "OrganisationId": 1, "$_path": 1, "DateTime": -1 }, "indexName": "OrganisationId_1_Metadata.$**_1_DateTime_-1", "isMultiKey": false, "multiKeyPaths": { "OrganisationId": [], "$_path": [], "DateTime": [] }, "isUnique": false, "isSparse": false, "isPartial": false, "indexVersion": 2, "direction": "forward", "indexBounds": { "OrganisationId": ["[BinData(3, D99052341C37B742B3FB25C927042153), BinData(3, D99052341C37B742B3FB25C927042153)]"], "$_path": ["[MinKey, MaxKey]"], "DateTime": ["[MaxKey, MinKey]"] } } } JIRAUSER1275839 commented on Tue, 27 Feb 2024 12:48:09 +0000: Even if I force it to use the index I want, it ends up doing the same [MinKey,MaxKey] on all the fields other than `OrganisationId` as with the other index, so it ends up being just as slow. db.Conversation.explain('allPlansExecution').find({ OrganisationId: UUID('345290d9-371c-42b7-b3fb-25c927042153'), 'Metadata.CustomProp1': 'sunt', State: 'Success' }).hint({OrganisationId: 1, 'Metadata.$**': 1, DateTime: -1}) So it seems like it's not allowing itself to optimally use the index, perhaps some safe-guard against edge cases which might occur, or similar things but which might be applying themselves in cases where they are not present. In this case, I would argue that of course it should be able to use the CWI, with bounds based on OrganisationId and `Metadata.CustomProp1`, and then just [MinKey, MaxKey] on DateTime, with a filter on the proceeding FETCH stage on `State` which filters out all the false-positives provided from the CWI. I can't see any false-negatives not being included here, which would preclude using the CWI. JIRAUSER1275839 commented on Tue, 27 Feb 2024 12:35:22 +0000: Hi Ivan, I have re-created my tests using a sample database I have with 250k documents which have similar form to our real data, with similar cardinality of fields etc., I can provide it for your reference if there is somewhere I can provide you the shy of 1GB gzipped dump, I'd prefer not to make it publicly available, even though it's filled with auto-generated data. The documents contained within have a field `Metadata` which has 20 fields which today are covered by 20 individual indexes. To test with real data, I have replaced those 20 indexes (Metadata.CustomProp[1-10] and Metadata.CustomNumberProp[1-10]) with a single CWI: db.Conversation.createIndex({ OrganisationId: 1, 'Metadata.$**': 1, DateTime: -1 }) Then if I query only using fields covered by the CWI, I get: db.Conversation.explain('allPlansExecution').find({ OrganisationId: UUID('345290d9-371c-42b7-b3fb-25c927042153'), 'Metadata.CustomProp1': 'sunt' }) with result: query_base.json From above, I note that it picked the expected index `OrganisationId_1_Metadata.$**_1_DateTime_-1`. Now I add a field not covered by the CWI, but also not covered by any other index (this is a low-cardinality field where 99,9% of all documents have the wanted value, so false-positives are extremely low, so it's better to not index or include in compound indexes and just filter after a `FETCH` stage the few if any false positievs). db.Conversation.explain('allPlansExecution').find({ OrganisationId: UUID('345290d9-371c-42b7-b3fb-25c927042153'), 'Metadata.CustomProp1': 'sunt', State: 'Success' }) And we observe: query_with_extra_field.json In this case, it ended up picking `OrganisationId_1_Subject_1_DateTime_-1`, a basically useless index as it's just doing [MinKey,MaxKey] on all fields other than `OrganisationId` anyway and then doing basically all the filtering as a filter in the FETCH stage afterwards. The data set only contains one OrganisationId so it's effectively a collscan in this case. This matches the behaviour we are seeing in production. It seems very consistent about not wanting to use CWI's as soon as any field not contained in the CWI is included in the filter. Whereas if I go back to a regular compound index, it's more happy to use it. But our collections are multi-tenancy so we need compound indexes to effectively fetch data for a given tenant only. I'll see if I can find a docker image of 7.0.6 and whether I can reproduce it with the same dataset in that database. JIRAUSER1270969 commented on Tue, 27 Feb 2024 11:55:16 +0000: Hello Michael, It would be great if you can attach .explain("allPlansExecution") output for the queries in question. Thank you. One possible explanation is that because there is no field Foo with the value Bar in the collection, the query planner can't pick the correct index because both correct and wrong indexes match 0 documents. Could you please try the same query, but using more realistic predicate? Also, in 7.0.6 we implemented some heuristics to improve tie breaking between indexes in such cases. Unfortunately, 7.0.6 is only available in a form of release candidate, so it is not recommended for production use, but you can test if this problem is resolved there (it should be). JIRAUSER1275839 commented on Mon, 22 Jan 2024 08:10:32 +0000: Is there any further info I can provide to help identify the reason behind this? We really would love to deploy CWI's in our production code, as we have a set of 20 indexes which all could be replaced by a single CWI - but when we tried doing this the performance tanked, and looking into the profiler output I observed that MongoDB was not using the CWI under any circumstance. This led me to try and investigate why this was, and at least I came up with this behaviour which I found weird. It could explain why it wasn't being used as we typically have a lot of criteria in our filters many of which would not be covered by the CWI, but the CWI should be able to shave off a huge amount of false-positives. JIRAUSER1275839 commented on Thu, 16 Nov 2023 10:40:21 +0000: If I "dumb" down the index to a regular wildcard index only, i.e.: db.cwi.createIndex({ 'CustomFields.$**': 1 }) And run the same query, I get this winning query plan: { "stage": "FETCH", "planNodeId": new NumberInt("2"), "filter": { "$and": [ { "Date": { "$lte": new ISODate("2024-01-01T00:00:00.000Z") } }, { "Date": { "$gte": new ISODate("2023-01-01T00:00:00.000Z") } }, { "Foo": { "$eq": "Bar" } }, { "TenantId": { "$eq": new NumberInt("1") } } ] }, "inputStage": { "stage": "IXSCAN", "planNodeId": new NumberInt("1"), "keyPattern": { "$_path": new NumberInt("1"), "CustomFields.Prop": new NumberInt("1") }, "indexName": "CustomFields.$**_1", "isMultiKey": false, "multiKeyPaths": { "$_path": [ ], "CustomFields.Prop": [ ] }, "isUnique": false, "isSparse": true, "isPartial": false, "indexVersion": new NumberInt("2"), "direction": "forward", "indexBounds": { "$_path": [ "[\"CustomFields.Prop\", \"CustomFields.Prop\"]" ], "CustomFields.Prop": [ "[2, 2]" ] } } } Which is what I would expect, it uses the wildcard index to filter all that it can, and then do the rest as a filter on the fetch stage. So it would seem to perhaps be related to CWI's.

Steps to Reproduce

Additional Resources / Links

Original Vendor Announcement

BugZero® Risk Score

What's this?

Coming soon

Status

Investigating

Search:

...