Operational Defect Database

BugZero found this defect 2444 days ago.

MongoDB | 394128

[SERVER-29647] Avoid moving $match to be before $sort + $limit

Last update date:

10/30/2023

Affected products:

MongoDB Server

Affected releases:

No affected releases provided.

Fixed releases:

3.4.6

3.5.10

Description:

Info

Background — $sort + $match When $match follows $sort it gets moved to be ahead of the $sort. This makes sense because it reduces the N in the O(NlogN) of the sort, ie. the aggregation: > db.foo.explain().aggregate( [ { $group: { _id: '$a' } }, { $sort: { b: 1 } }, { $match: { c: 1 } } ] ) can be more efficiently implementated (with identical results) by: > db.foo.explain().aggregate( [ { $group: { _id: '$a' } }, { $match: { c: 1 } }, { $sort: { b: 1 } } ] ) And indeed both of these aggregations have the same explain plan: > db.foo.explain().aggregate( [ { $group: { _id: '$a' } }, { $sort: { b: 1 } }, { $match: { c: 1 } } ] ) { "stages" : [ { "$cursor" : { "query" : {   }, "fields" : { "a" : 1, "_id" : 0 }, "queryPlanner" : { "plannerVersion" : 1, "namespace" : "test.foo", "indexFilterSet" : false, "parsedQuery" : {   }, "winningPlan" : { "stage" : "COLLSCAN", "direction" : "forward" }, "rejectedPlans" : [ ] } } }, { "$group" : { "_id" : "$a" } }, { "$match" : { "c" : 1 } }, { "$sort" : { "sortKey" : { "b" : 1 } } } ], "ok" : 1 } > db.foo.explain().aggregate( [ { $group: { _id: '$a' } }, { $match: { c: 1 } }, { $sort: { b: 1 } } ] ) { "stages" : [ { "$cursor" : { "query" : {   }, "fields" : { "a" : 1, "_id" : 0 }, "queryPlanner" : { "plannerVersion" : 1, "namespace" : "test.foo", "indexFilterSet" : false, "parsedQuery" : {   }, "winningPlan" : { "stage" : "COLLSCAN", "direction" : "forward" }, "rejectedPlans" : [ ] } } }, { "$group" : { "_id" : "$a" } }, { "$match" : { "c" : 1 } }, { "$sort" : { "sortKey" : { "b" : 1 } } } ], "ok" : 1 } Background — $sort + $limit When $sort is followed by $limit (a common use case), the $limit gets pushed into the $sort stage. This allows a more efficient top-k algorithm to be used, which is usually O(Nlogk) (which reduces to O(N) if k is small) — compared to O(NlogN) to naively sort the documents and then apply the limit. eg. in the following aggregation, the maximum document output by $group can be found after a single pass: > db.foo.explain().aggregate( [ { $group: { _id: '$a' } }, { $sort: { b: 1 } }, { $limit: 1 } ] ) { "stages" : [ { "$cursor" : { "query" : {   }, "fields" : { "a" : 1, "_id" : 0 }, "queryPlanner" : { "plannerVersion" : 1, "namespace" : "test.foo", "indexFilterSet" : false, "parsedQuery" : {   }, "winningPlan" : { "stage" : "COLLSCAN", "direction" : "forward" }, "rejectedPlans" : [ ] } } }, { "$group" : { "_id" : "$a" } }, { "$sort" : { "sortKey" : { "b" : 1 }, "limit" : NumberLong(1) } } ], "ok" : 1 } Problem — $sort + $limit + $match Now consider when a $match occurs after $sort + $limit. This table shows the number of documents processed after the $group stage, with a $limit of 1, for the best case ($match matches no documents) and worst case ($match matches all documents): aggregation best case worst case [ { $group: { _id: '$a' } }, { $match: { c: 1 } }, { $sort: { b: 1 } }, { $limit: 1 } ] N 2N + 1 [ { $group: { _id: '$a' } }, { $sort: { b: 1 } }, { $limit: 1 }, { $match: { c: 1 } } ] N N + 2 This shows that for $sort + $limit: 1 it is better (and never worse) for the $match to appear afterwards. For larger k, this may not be the case, and for k = N, the problem degenerates to sorting without a limit (and hence the $match is better off before the $sort). Unfortunately, the behaviour of the server is to always move $match ahead of the $sort + $limit (even though sometimes this is worse): > db.foo.explain().aggregate( [ { $group: { _id: '$a' } }, ... { $sort: { b: 1 } }, { $limit: 1 }, ... { $match: { c: 1 } } ] ) { "stages" : [ { "$cursor" : { "query" : {   }, "fields" : { "a" : 1, "_id" : 0 }, "queryPlanner" : { "plannerVersion" : 1, "namespace" : "test.foo", "indexFilterSet" : false, "parsedQuery" : {   }, "winningPlan" : { "stage" : "COLLSCAN", "direction" : "forward" }, "rejectedPlans" : [ ] } } }, { "$group" : { "_id" : "$a" } }, { "$match" : { "c" : 1 } }, { "$sort" : { "sortKey" : { "b" : 1 }, "limit" : NumberLong(1) } } ], "ok" : 1 } Conclusion Ideally there are two changes here: Avoid moving $match ahead of a $sort with limit. Move $match to appear after a $sort with limit of 1. The second could be optional, since the first would allow users to manually adjust their aggregation (based on their N and k). The server should avoid moving $match ahead of a $sort with limit. This change may be harder if the $match move happens before the $limit is coalesced into the $sort.

Top User Comments

xgen-internal-githook commented on Mon, 26 Jun 2017 20:42:58 +0000: Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'} Message: SERVER-29647 Fix $match swapping to avoid moving before $sort-$limit. (cherry picked from commit 8b8845703f0c92bafca58ce5d9f36fd2327301d1) Conflicts: jstests/core/views/views_aggregation.js Branch: v3.4 https://github.com/mongodb/mongo/commit/8bc18f2367d616ef1e68b3be4acfaa1d2e8daa6e xgen-internal-githook commented on Fri, 23 Jun 2017 20:01:03 +0000: Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'} Message: SERVER-29647 Fix $match swapping to avoid moving before $sort-$limit. Branch: master https://github.com/mongodb/mongo/commit/8b8845703f0c92bafca58ce5d9f36fd2327301d1 asya commented on Mon, 19 Jun 2017 16:50:52 +0000: Right, since $match can't be moved ahead of $limit ever, it also can't be moved in front of sort-with-limit. kevin.pulo@10gen.com commented on Mon, 19 Jun 2017 07:33:43 +0000: Nice counter-example! I've updated the "Conclusion" section of the description accordingly. charlie.swanson commented on Thu, 15 Jun 2017 19:46:55 +0000: Missed the second point in your conclusion: Move $match to appear after a $sort with limit of 1. I think this is also incorrect, as it could possibly change the sort order. charlie.swanson commented on Thu, 15 Jun 2017 19:44:39 +0000: I think this is actually even simpler. I think it is a straight-up bug to move a $match before a $sort if the $sort has absorbed a $limit: (mongod-3.4.4) test> db.bar.aggregate([{$sort: {x: -1}}, {$limit: 1}, {$match: {x: {$mod: [2, 1]}}}]) { "_id": ObjectId("5942e317b8328df32ca4fdb2"), "x": 3 } (mongod-3.4.4) test> db.bar.aggregate([ {$sort: {x: -1}}, {$limit: 1}, {$project: {y: "$x"}} /* renames x to trick $match to stay after $sort */, {$match: {y: {$mod: [2, 1]}}} ]) // No results

Additional Resources / Links

Share:

BugZero Risk Score

Coming soon

Status

Closed

Have you been affected by this bug?

cost-cta-background

Do you know how much operational outages are costing you?

Understand the cost to your business and how BugZero can help you reduce those costs.

Discussion

Login to read and write comments.

Have you ever...

had your data corrupted from a

VMware

bug?

Search:

...