Operational Defect Database

BugZero found this defect 2408 days ago.

MongoDB | 406834

[SERVER-30261] too many connections to a mongod instance will botch performance

Last update date:

11/7/2017

Affected products:

MongoDB Server

Affected releases:

No affected releases provided.

Fixed releases:

No fixed releases provided.

Description:

Info

In some workloads there are many connections to one or more mongod instances running as a shard. We then start seeing these errors in dmesg:

TCP: request_sock_TCP: Possible SYN flooding on port 10105. Sending cookies. Check SNMP counters.
TCP: request_sock_TCP: Possible SYN flooding on port 10104. Sending cookies. Check SNMP counters.

After the workload is gone, the mongod instance responds very slowly even to very simple queries. If we restart the mongod instances, the problem goes away. We have increased somaxconn, the TCP SYN backlog, and TCP memory (rmem, wmem), but the issue is not fixed. Is this old ticket still related? https://jira.mongodb.org/browse/SERVER-2554 Thank you
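The dmesg messages above come from the kernel's listen-queue overflow path. As a minimal sketch (the address and port are illustrative), the key point is that the kernel caps the backlog passed to listen(2) at net.core.somaxconn, so the effective queue is min(application backlog, somaxconn) — raising somaxconn, the SYN backlog, or TCP memory cannot widen the queue if the application passes a small fixed backlog, as the strace output later in this ticket shows mongod doing:

```python
import socket

# The effective accept queue is min(backlog, net.core.somaxconn); if the
# application hardcodes a small backlog (the strace in this ticket shows
# mongod calling listen(7, 128)), sysctl tuning alone cannot enlarge it.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(128)  # same fixed backlog observed for mongod

# A sudden burst of connection attempts beyond the effective backlog is
# what triggers "Possible SYN flooding ... Sending cookies" in dmesg.
host, port = srv.getsockname()
client = socket.create_connection((host, port))
conn, peer = srv.accept()
print("accepted connection from", peer)
client.close()
conn.close()
srv.close()
```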

Top User Comments

thomas.schubert commented on Fri, 29 Sep 2017 18:46:50 +0000:
Hi thestick613, We haven't heard back from you for some time, so I'm going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket. Regards, Kelsey

ramon.fernandez commented on Thu, 14 Sep 2017 20:30:41 +0000:
thestick613, we have not been able to reproduce this issue with the binaries we distribute. Have you by any chance compiled your own binaries? If not, I'm afraid without the information Mark requested above we'll have to close this ticket. Thanks, Ramón.

thestick613 commented on Tue, 22 Aug 2017 20:16:57 +0000:
Moving to 3.4 from 3.2 improved the performance significantly. We still get the syncookie error, but the server is more stable. I suspect this is because of the new replication engine in 3.4.

thestick613 commented on Tue, 22 Aug 2017 20:14:57 +0000:
Hello, you can reproduce this with a brand-new mongo setup; there is no need for any log files or diagnostic.data. See this comment. I am using Ubuntu 16.04.2 LTS, mongod db version v3.4.6 (git version: c55eb86ef46ee7aede3b1e2a5d184a7df4bfb5b5), and a generic Ubuntu kernel: Linux mongo-rs1 4.4.0-83-generic #106-Ubuntu SMP Mon Jun 26 17:54:43 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux.

mark.agarunov commented on Tue, 22 Aug 2017 17:34:40 +0000:
Hello thestick613, We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the following: The complete log files from mongod and mongos. Please archive and upload the $dbpath/diagnostic.data directory. Please provide the version of MongoDB being used. Please provide the operating system, version, and kernel version being used. Thanks, Mark

mark.agarunov commented on Tue, 8 Aug 2017 17:36:19 +0000:
Hello thestick613, Thank you for the report. Unfortunately I have not yet been able to reproduce this issue, so I would like to request some additional information to better diagnose the behavior. For all affected nodes, please provide the following: The complete log files from mongod and mongos. Please archive and upload the $dbpath/diagnostic.data directory. Please provide the version of MongoDB being used. Please provide the operating system, version, and kernel version being used. Thanks, Mark

thestick613 commented on Sat, 22 Jul 2017 09:10:26 +0000:
This script manages to generate the kernel message "TCP: request_sock_TCP: Possible SYN flooding on port 10104. Sending cookies. Check SNMP counters.":

```python
from gevent import monkey
monkey.patch_all()

import pymongo
import gevent
import random

from multiprocessing import Process


def one_connection():
    # gevent.sleep(random.random() * 4)
    pm = pymongo.MongoClient('10.10.10.100:10104')
    # gevent.sleep(random.random() * 4)
    for j in range(100):
        pm.admin.command({'ping': 1})
        gevent.sleep(0.1)


def on_core(per_cpu_threads):
    tasks = []
    # gevent.sleep(random.random() * 4)
    for i in range(per_cpu_threads):
        tasks.append(gevent.spawn(one_connection))
    gevent.joinall(tasks)


if __name__ == "__main__":
    tcpus = 72
    per_cpu_threads = 10
    print(tcpus)

    procs = []
    for coreid in range(0, 3 * tcpus):
        procs.append(Process(target=on_core, args=(per_cpu_threads,)))

    for proc in procs:
        proc.start()

    for proc in procs:
        proc.join()
```

If you uncomment the sleeps, the peak number of connections is still 4320, but there is no more SYNCOOKIE warning.

thestick613 commented on Fri, 21 Jul 2017 21:36:16 +0000:
strace shows me the limit of 128 is still there, which is too low: [pid 18057] listen(7, 128) = 0

thestick613 commented on Fri, 21 Jul 2017 20:51:24 +0000:
We have been having this problem on 3.2; I upgraded to 3.4 today. We see it on the shard servers, not on the mongos, but maybe if we tune down the mongos instances they will be easier on the mongod instances. Our application makes a lot of connections via mongos instances, but it also connects individually to each shard for some read-only queries. When restarting a mongod instance, the former secondary that gets promoted to primary also generates this error. The sudden burst of connections seems to be the problem, not the number.

ramon.fernandez commented on Fri, 21 Jul 2017 19:28:38 +0000:
What version of MongoDB are you using? Depending on which version you are on, you may be able to use the knobs in SERVER-25027 to tweak connection pooling in mongos to better suit your needs / node capacity.
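The SERVER-25027 knobs mentioned in the last comment are setParameter options on mongos. A minimal sketch of how they might appear in a mongos config file — the values shown are placeholders for illustration, not recommendations:

```yaml
setParameter:
  # Number of task-executor connection pools (SERVER-25027)
  taskExecutorPoolSize: 4
  # Bounds on the connections each pool keeps to a given mongod host
  ShardingTaskExecutorPoolMinSize: 1
  ShardingTaskExecutorPoolMaxSize: 100
```

Lower pool bounds can dampen the sudden connection bursts toward the shard mongod instances that the reporter identifies as the trigger.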


Status

Closed
