Operational Defect Database

BugZero found this defect 2383 days ago.

MongoDB | 417804

[SERVER-30728] Low Azure socket timeout may cause initial sync failure

Last update date:

10/27/2017

Affected products:

MongoDB Server

Affected releases:

No affected releases provided.

Fixed releases:

No fixed releases provided.

Description:

Info

Hi Team,

We are running a MongoDB instance on an Azure VM with default settings. We notice that the Azure VM tends to close a socket connection if it has been inactive for several minutes. When we try to initial sync from MongoDB on the Azure VM to another replica set member, syncing always fails: the connection is dropped when there is no network traffic for several minutes (e.g., while the starting instance is building an index), and initial sync starts all over.

A sample log snippet:

[building index here...]
2017-08-17T19:06:30.632+0800 I NETWORK [rsSync] Socket recv() errno:10053 An established connection was aborted by the software in your host machine. [***.***.***.***:*****]
2017-08-17T19:06:30.632+0800 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_ERROR] server [***.***.***.***:*****]
2017-08-17T19:06:30.632+0800 I NETWORK [rsSync] DBClientCursor::init call() failed
2017-08-17T19:06:30.640+0800 E REPL [rsSync] 13386 socket error for mapping query
2017-08-17T19:06:30.640+0800 E REPL [rsSync] initial sync attempt failed, 9 attempts remaining
2017-08-17T19:06:35.641+0800 I REPL [rsSync] initial sync pending
2017-08-17T19:06:35.643+0800 I REPL [ReplicationExecutor] syncing from: <HOSTNAME>:*****
2017-08-17T19:06:36.454+0800 I REPL [rsSync] initial sync drop all databases
2017-08-17T19:06:36.454+0800 I STORAGE [rsSync] dropAllDatabasesExceptLocal 14
2017-08-17T19:06:43.928+0800 I REPL [rsSync] initial sync clone all databases

For a MongoDB client, this can be resolved by setting MaxConnectionIdleTime, but there seems to be no way to configure the same for replica sets. As a result, Azure users who do not tweak OS settings will find it hard to sync data to another replica set member outside the Azure VM.

Can we have an option to either specify a max connection idle time for replica sets, or make initial sync not fail completely on a single connection failure?

Top User Comments

ramon.fernandez commented on Thu, 14 Sep 2017 21:03:39 +0000:
Thanks for the update wekurtz, and glad to hear you've found a solution. I've adjusted the issue summary to make it easier for others to find and I'm going to close it. Regards, Ramón.

wekurtz commented on Fri, 25 Aug 2017 00:50:46 +0000:
Team - per the solution above I'm fine with closing this issue.

wekurtz commented on Mon, 21 Aug 2017 01:25:34 +0000:
This appears to relate to the Azure TCP idle timeout setting, which is only 4 minutes by default. A workaround is to increase the Azure timeout to 30 minutes in Azure PowerShell:

Add-AzureRmAccount
$p = Get-AzureRmPublicIpAddress
$p.IdleTimeoutInMinutes = 30
Set-AzureRmPublicIpAddress -PublicIpAddress $p

By doing so I've eliminated disconnections for my database.

wekurtz commented on Fri, 18 Aug 2017 07:16:03 +0000:
Hi Ramón, thank you for the prompt reply. I'm on 3.2 currently. Let me upgrade to 3.4.7 to see if I can replicate this error.

ramon.fernandez commented on Fri, 18 Aug 2017 06:57:13 +0000:
wekurtz, what version of MongoDB are you running? It seems Azure may have some settings that account for the behavior you're seeing, and it would be useful for us to know whether the most recent production release (3.4.7) exhibits the behavior you describe. Thanks, Ramón.
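The comments attribute the drops to a 4-minute idle timeout. As a rough illustration of the "tweaking OS settings" alternative mentioned in the description, the sketch below enables TCP keepalive on a socket so that probes are sent before that timeout elapses. This is not how mongod is configured (it does not expose this function, and the ticket was resolved by raising the Azure timeout instead); the function name `enable_keepalive`, the 120-second idle value, and the probe interval and count are all assumptions chosen for the example. The per-socket tuning options are platform-specific, so they are applied only where Python exposes them.

```python
import socket

# Idle timeout reported in the comments (4 minutes by default).
AZURE_IDLE_TIMEOUT_SECONDS = 4 * 60


def enable_keepalive(sock, idle_seconds=120, interval_seconds=10, probe_count=3):
    """Turn on TCP keepalive so an idle connection still carries traffic.

    Hypothetical helper for illustration only. idle_seconds must be shorter
    than the idle timeout, otherwise the connection could be dropped before
    the first probe is sent.
    """
    if idle_seconds >= AZURE_IDLE_TIMEOUT_SECONDS:
        raise ValueError("keepalive idle time must be below the idle timeout")

    # Portable switch: ask the OS to send keepalive probes on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket tuning is not available on every platform, so each option
    # is set only if this Python build defines it (e.g. on Linux).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probe_count)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

For a server process such as mongod, the equivalent change would be made through system-wide keepalive settings rather than in application code, which is why the reporter describes it as an OS-level tweak.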


Status

Closed
