Operational Defect Database

BugZero found this defect 2780 days ago.

MongoDB | 322216

Text search uses excessive memory

Last update date:

3/11/2024

Affected products:

MongoDB Server

Affected releases:

3.2.1

3.2.10

3.4.0-rc0

Fixed releases:

No fixed releases provided.

Description:

Info

As in SERVER-18926 and SERVER-18961 create a collection containing 50 M documents totaling about 4 GB in size with a full-text index, then do a simple search for a single word on the full-text index that returns all the documents. Total memory allocated excluding WT cache is roughly the size of the collection. The top four allocating stacks, accounting for most of the excess: heapProfile stack44: { 0: "tc_malloc", 1: "mongo::mongoMalloc", 2: "mongo::BSONObj::copy", 3: "mongo::BSONObj::getOwned", 4: "mongo::WorkingSetMember::makeObjOwnedIfNeeded", 5: "mongo::TextOrStage::addTerm", 6: "mongo::TextOrStage::readFromChildren", 7: "mongo::TextOrStage::work", 8: "mongo::TextMatchStage::work", 9: "mongo::TextStage::work", 10: "mongo::PlanExecutor::getNextImpl", 11: "mongo::PlanExecutor::getNext", 12: "mongo::FindCmd::run", 13: "mongo::Command::run", 14: "mongo::Command::execCommand", 15: "mongo::runCommands", 16: "mongo::assembleResponse", 17: "mongo::MyMessageHandler::process", 18: "mongo::PortMessageServer::handleIncomingMsg", 19: "0x7f89ec7466aa", 20: "clone" } heapProfile stack41: { 0: "tc_new", 1: "mongo::WorkingSet::allocate", 2: "mongo::IndexScan::work", 3: "mongo::TextOrStage::readFromChildren", 4: "mongo::TextOrStage::work", 5: "mongo::TextMatchStage::work", 6: "mongo::TextStage::work", 7: "mongo::PlanExecutor::getNextImpl", 8: "mongo::PlanExecutor::getNext", 9: "mongo::FindCmd::run", 10: "mongo::Command::run", 11: "mongo::Command::execCommand", 12: "mongo::runCommands", 13: "mongo::assembleResponse", 14: "mongo::MyMessageHandler::process", 15: "mongo::PortMessageServer::handleIncomingMsg", 16: "0x7f89ec7466aa", 17: "clone" } heapProfile stack46: { 0: "tc_new", 1: "void std::vector >::_M_emplace_back_aux", 2: "mongo::IndexScan::work", 3: "mongo::TextOrStage::readFromChildren", 4: "mongo::TextOrStage::work", 5: "mongo::TextMatchStage::work", 6: "mongo::TextStage::work", 7: "mongo::PlanExecutor::getNextImpl", 8: "mongo::PlanExecutor::getNext", 9: "mongo::FindCmd::run", 10: "mongo::Command::run", 11: "mongo::Command::execCommand", 12: "mongo::runCommands", 13: "mongo::assembleResponse", 14: "mongo::MyMessageHandler::process", 15: "mongo::PortMessageServer::handleIncomingMsg", 16: "0x7f89ec7466aa", 17: "clone" } heapProfile stack45: { 0: "tc_new", 1: "mongo::TextOrStage::addTerm", 2: "mongo::TextOrStage::readFromChildren", 3: "mongo::TextOrStage::work", 4: "mongo::TextMatchStage::work", 5: "mongo::TextStage::work", 6: "mongo::PlanExecutor::getNextImpl", 7: "mongo::PlanExecutor::getNext", 8: "mongo::FindCmd::run", 9: "mongo::Command::run", 10: "mongo::Command::execCommand", 11: "mongo::runCommands", 12: "mongo::assembleResponse", 13: "mongo::MyMessageHandler::process", 14: "mongo::PortMessageServer::handleIncomingMsg", 15: "0x7f89ec7466aa", 16: "clone" } By experiment it appears that the amount of memory used is proportional (possibly roughly equal in size) to the number of documents returned.

Top User Comments

JIRAUSER1253472 commented on Mon, 24 Jul 2023 15:30:49 +0000: Is this possibly related? https://jira.mongodb.org/browse/SERVER-79244 david.storch commented on Fri, 11 Nov 2016 22:06:39 +0000: All four stacks which Bruce pasted above are allocations made in order to setup the ScoreMap data structure maintained by the TextOrStage: https://github.com/mongodb/mongo/blob/r3.4.0-rc3/src/mongo/db/exec/text_or.h#L151-L152 This data structure maps from each matching document's RecordId to a pair containing a copy of the corresponding document and its text score. We have to keep a copy of the document since during query yields the storage engine is allowed to free the memory housing the storage subsystem's copy. So it is indeed the case that text queries currently require memory proportional to the size of the result set. This behavior is baked into the current implementation of text search execution. It would require a significant overhaul to fix this in all cases. The good news is that we only need to maintain the ScoreMap structure in order to support computation of text search relevance scores. We hold onto information about documents seen so far so that we can adjust the relevance score when we find a new index key for a document we've already seen. This means that if the query does not request the text score, there is no need to maintain the ScoreMap. This is part of the feature request tracked in related ticket SERVER-26833.

Steps to Reproduce

Additional Resources / Links

Original Vendor Announcement

BugZero® Risk Score

What's this?

Coming soon

Status

Backlog

Search:

...