Operational Defect Database

BugZero found this defect 172 days ago.

MongoDB | 2512419

Full Text Search incorrectly matches partial ID-like phrases

Last update date:

3/11/2024

Affected products:

MongoDB Server

Affected releases:

6.0.5

Fixed releases:

No fixed releases provided.

Description:

Info

It’s my understanding that $text searches do not match partial words or phrases. However, when I search my MongoDB collection using a text index, it returns documents that only partially match an ID-like search query. For example, this search:

{ $text: { $search: '"23-X-1"' } }

matches this document:

{ "value": "123-X-1 Hello" }

Especially since the search term is quoted, I’d expect it not to match: the phrase "23-X-1" doesn’t appear isolated anywhere in the value, only as part of the longer phrase "123-X-1".
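The distinction the reporter draws can be sketched in plain JavaScript. This is an illustration only and makes no claim about how MongoDB implements $text: the helper names (tokenize, substringMatch, wholeTokenMatch) are hypothetical, and the tokenizer simply splits on anything that is not a letter or digit.

```javascript
// Illustration only: NOT MongoDB's implementation.
// Hypothetical tokenizer: lowercase, then split on any run of
// characters that are not ASCII letters or digits.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Partial matching: the phrase may start or end in the middle of a word.
function substringMatch(text, phrase) {
  return text.toLowerCase().includes(phrase.toLowerCase());
}

// Whole-token matching: every token of the phrase must equal a
// consecutive token of the text.
function wholeTokenMatch(text, phrase) {
  const hay = tokenize(text);
  const needle = tokenize(phrase);
  if (needle.length === 0) return false;
  for (let i = 0; i + needle.length <= hay.length; i++) {
    if (needle.every((tok, j) => hay[i + j] === tok)) return true;
  }
  return false;
}

const value = "123-X-1 Hello";
console.log(substringMatch(value, "23-X-1"));   // true
console.log(wholeTokenMatch(value, "23-X-1"));  // false
console.log(wholeTokenMatch(value, "123-X-1")); // true
```

Under these assumptions, the reported result is what substring matching would produce, while the reporter expects whole-token behavior, where "23" never equals the token "123".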

Top User Comments

JIRAUSER1274755 commented on Fri, 29 Dec 2023 14:43:22 +0000: Hi Team, There was some initial confusion on my part as to what the issue was here, but it looks like the user is reporting that tokenization is not applying to numerical values and is uncertain as to whether or not that is intentional. The end result of the current behavior (confirmed, and reproducible with a simple JS script) is that strings that contain numbers and dashes are partially matched when perhaps they should not be (the user's example being 123-X-1 when $text is given "23-X-1" to search for). Is this intended?

JIRAUSER1274755 commented on Fri, 29 Dec 2023 14:39:53 +0000: Hello eric@ericmakesapps.com, Thank you for explaining in more detail, that does clarify your report. As for the behavior that you've reported, I have confirmed that using your instructions I was able to replicate said behavior. I'm going to move this ticket over to the appropriate team to confirm whether or not this behavior is expected.

JIRAUSER1275311 commented on Tue, 26 Dec 2023 21:49:14 +0000: In this particular case, the search term is an "Exact Phrase", so tokenization of search terms should not come into play here. This issue is more about matches being found seemingly using partial word matching, whereas the fact that #15090 is still open strongly implies that it should not match based on partial word matching, as contrasted with Atlas deployments that can support partial (or fuzzy) searching for text searches. The crux is, I’d not expect the search term to be found since it’s not present in the text being searched, even when the text being searched is tokenized by whitespace and/or punctuation. It seems to split apart numbers (it finds a match for 23 in the number 123, but that should not be the case without partial matching, right?). I don’t know if I’m explaining it very well, but let me know if that is clear(er).

JIRAUSER1274755 commented on Tue, 26 Dec 2023 21:33:04 +0000: Hi eric@ericmakesapps.com, Apologies for the incomplete link list, the documentation that I was referring to is as follows:

https://www.mongodb.com/docs/manual/core/text-search-operators/#std-label-text-search-operators-on-premises
https://www.mongodb.com/docs/manual/core/link-text-indexes/#std-label-text-search-on-premises
https://www.mongodb.com/docs/manual/core/indexes/index-types/index-text/#std-label-index-type-text
https://www.mongodb.com/docs/manual/tutorial/text-search-in-aggregation/#std-label-text-agg

As noted in our "Perform a Text Search" documentation, tokenization is performed on the provided terms (which itself links to further documentation).

JIRAUSER1275311 commented on Tue, 26 Dec 2023 20:50:30 +0000: Is anyone alive around here? This was incorrectly closed, as far as I can tell. rhea.thorne@mongodb.com says the documentation page mentions tokenization, as if that should cause the text search to match partial phrases, but that doesn’t really seem to be the case.

JIRAUSER1275311 commented on Sat, 16 Dec 2023 14:31:10 +0000: Can you please reopen this, or link to the exact part of the documentation that talks about this tokenization on text search that causes it to match partial words, contrary to the known current behavior?

JIRAUSER1275311 commented on Fri, 15 Dec 2023 14:53:27 +0000: I mean, I read through the whole documentation before creating this ticket. It obviously is how it currently behaves, but I wouldn't say it’s expected. Everything in the documentation is geared towards "words" and "phrases", nowhere implying that it could or would match a partial word. In fact, there's an open work item about adding the ability to match partial words expressly for that reason (#15090). This definitely leads me to believe that $text should not match partial words in the content.

JIRAUSER1274755 commented on Fri, 15 Dec 2023 14:43:27 +0000: Hello eric@ericmakesapps.com, Thank you for your report. The behavior that you've described is the intended behavior. $text searches tokenize your search items, and will search the given fields for appearances of said search items. You can read more about this on our documentation page. At this time, I'll be closing this ticket as the behavior described is intended.

Steps to Reproduce

1. Add a document to a collection, something like { "value": "123-X-1 Hello" }
2. Add a text index on the value property
3. Perform a text search with something like { $text: { $search: '"23-X-1"' } }
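The steps above can be sketched as a small mongosh helper. This is a minimal sketch, not the script used by the MongoDB team: the function name reproduce and the collection name "textSearchRepro" are hypothetical placeholders, and the helper expects the shell's db object as its argument. Note that it drops the named collection first, so it should only be pointed at a scratch database.

```javascript
// Minimal reproduction sketch for mongosh. Pass in the shell's `db` object.
// "textSearchRepro" is a hypothetical scratch collection name.
function reproduce(db, collectionName = "textSearchRepro") {
  const coll = db.getCollection(collectionName);

  // Start from an empty collection so earlier runs do not affect the result.
  coll.drop();

  // Step 1: insert the sample document from the report.
  coll.insertOne({ value: "123-X-1 Hello" });

  // Step 2: add a text index on the "value" field.
  coll.createIndex({ value: "text" });

  // Step 3: run a quoted (phrase) $text search for the partial ID.
  return coll.find({ $text: { $search: '"23-X-1"' } }).toArray();
}

// In mongosh: printjson(reproduce(db));
```

According to the report, on MongoDB Server 6.0.5 the returned array contains the inserted document, whereas the reporter expects it to be empty.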

Status

Investigating