Sunday, December 20, 2015

Windows TCP buffering

After changing the send buffer size for FTP upload, I decided to check whether the SFTP upload mentioned in part 1 suffered from the same limitation. To my surprise, SFTP uploads were able to push data continuously, even though the send buffer size was not set anywhere in my code.
An experiment showed that after an SFTP upload the SendBufferSize property usually contained a value different from the original 8KB; for FTP upload, the value always remained at 8KB. The only apparent difference in code was the application buffer size - the SFTP application was sending ~32KB batches, while the FTP application was always sending 8KB batches.
I increased the batch size on the FTP application side. Immediately it became apparent that the entire batch was transmitted over the wire before waiting for an ACK, which improved the send speed. Furthermore, with a batch size of 32KB the SendBufferSize was also automagically raised behind the scenes, further improving the send speed.
I couldn't find any documentation for the observed SendBufferSize changes. The effect of the application send buffer on transmission speed was described in this SO question and this MS Knowledge Base article. To summarize, if the application send buffer is the same size as or larger than the socket send buffer, only 2 pending send operations are allowed at a time. However, the socket can buffer whatever the application passes, even when the application buffer is larger than SendBufferSize, so using a larger application buffer can speed up transfers the same way a larger socket buffer can.
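A minimal Python sketch of the "larger application buffer" idea, under the assumption that each batch is handed to the socket in a single send call. The original code was .NET; the function names and the 32KB default here are illustrative only, and the demo uses a local socket pair rather than a real FTP data connection.

```python
import socket
import threading

def send_in_batches(sock, data, batch_size=32 * 1024):
    """Hand data to the socket in batch_size slices; return the number of batches."""
    view = memoryview(data)
    batches = 0
    for offset in range(0, len(view), batch_size):
        # sendall blocks until the whole slice has been accepted by the socket.
        sock.sendall(view[offset:offset + batch_size])
        batches += 1
    return batches

def demo(total=100_000, batch_size=32 * 1024):
    """Push `total` zero bytes through a local socket pair and count what arrives."""
    sender, receiver = socket.socketpair()
    received = bytearray()

    def reader():
        while len(received) < total:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    thread = threading.Thread(target=reader)
    thread.start()
    batches = send_in_batches(sender, bytes(total), batch_size)
    sender.close()
    thread.join()
    receiver.close()
    return batches, len(received)
```

Whether a bigger batch actually helps depends on the OS; the behavior described above was observed on Windows and may not carry over elsewhere.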

Saturday, December 19, 2015

Efficient file transmission, part 2

Right after I fixed the SFTP performance problem mentioned in part 1, I found that another feed, this time using FTP, was gathering a backlog. Again bandwidth was not an issue and latency was on the order of 40 ms, but somehow pushing even the smallest files took no less than 5 seconds. Equipped with Wireshark and a lot of enthusiasm, I started looking into the problem.
FTP is a much less sophisticated protocol than SFTP. It creates a dedicated TCP connection for every file transfer. The entire file is transferred over that connection, and closing the connection marks the end of the file. All commands are exchanged over the so-called control connection. The normal sequence of events for an FTP upload (after establishing the control connection and logging in) is:
  • Client sends PASV command
  • Server responds with an address, to which the client should upload
  • Client connects to that address and issues a STOR command over the control connection
  • Client starts pushing the file through the data connection
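The second step above can be sketched in Python. The PASV reply carries the data address as six comma-separated numbers, four for the IPv4 address and two for the port (high byte first), as defined in RFC 959. The function name is made up for illustration; this is not the code from the application discussed here.

```python
import re

def parse_pasv_reply(reply):
    """Extract (host, port) from a reply like '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)'."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("no address found in PASV reply: %r" % reply)
    numbers = [int(group) for group in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError("PASV field out of range: %r" % reply)
    host = ".".join(str(n) for n in numbers[:4])
    # The port is sent as two bytes, most significant first.
    port = numbers[4] * 256 + numbers[5]
    return host, port
```

Note that the result is already an IP address, so no name lookup of any kind should be needed before connecting to it.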
The first thing I noticed when examining a Wireshark trace was a two-second gap between receiving the PASV response and sending the SYN packet; during that gap, two NBNS packets were sent to the server, with no response. NBNS stands for NetBIOS Name Service, and it was completely out of place at that point, because PASV returns an IP address, which does not require any lookups.
I launched the application in the Visual Studio debugger, and during the 2-second pause I hit "Break all" to see what was happening. I found that the application was using Dns.Resolve to convert a string to an IPAddress object. The method has been obsolete since .NET 2.0, and the documentation suggests using Dns.GetHostEntry instead. GetHostEntry (like Resolve) has an interesting behavior that was not documented until .NET 3.5: when you pass an IP string to the method, it first performs a reverse name lookup to get the host name, then performs a name lookup to get the IP addresses associated with that name. Since we only wanted an IPAddress object, I changed the code to use Dns.GetHostAddresses, which does not query name servers when passed an IP literal. This saved 2 seconds on every transfer.
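The same idea in a Python sketch (not the .NET code from the application): try to parse the string as an IP literal first, and only fall back to the resolver when that fails. The helper name is hypothetical.

```python
import ipaddress
import socket

def to_ip_address(text):
    """Return an IP address object for text, avoiding any lookup for IP literals."""
    try:
        # Pure string parsing; no name server is contacted.
        return ipaddress.ip_address(text)
    except ValueError:
        pass
    # Not a literal: fall back to a forward lookup and take the first result.
    infos = socket.getaddrinfo(text, None)
    return ipaddress.ip_address(infos[0][4][0])
```

For the address returned by PASV, only the first branch should ever run.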

The next interesting thing I noticed was that the client was sending data in 8KB batches; it sent a batch, then waited a long time for the server to acknowledge it. Only then did it send another 8KB batch, which meant that most of the time the client was waiting for an acknowledgement from the server.
The receive window size captured in the traces showed plenty of free space; for a brief moment I suspected the congestion window, until I read that the congestion window doubles every round-trip time (during slow start) as long as there are no packet losses, and the traces showed none. Then I found that the default send buffer size in Windows is 8KB.
I modified the send buffer size to 1MB. After that the client was no longer waiting for acknowledgements, but instead was sending data until the receive window was full; then the server advertised a larger receive window, enabling the client to send even more data, until the network started dropping packets.
Final result: files with average size of 500KB that previously took 5+ seconds to upload, now were sent in less than a second over the same link.
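For reference, a Python sketch of raising the socket send buffer through the standard SO_SNDBUF option (in .NET the equivalent is the SendBufferSize property mentioned in the Windows TCP buffering post). The 1MB default mirrors the value used above; the helper name is illustrative, and the OS is free to adjust or cap the requested size, so the sketch reads the effective value back.

```python
import socket

def make_socket_with_send_buffer(size=1024 * 1024):
    """Create a TCP socket, request a send buffer of `size` bytes, return (socket, effective size)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set the option before connecting so it applies from the first byte sent.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return sock, effective
```

The effective value may differ from the requested one (Linux, for example, doubles it and applies a system-wide maximum), so it is worth logging rather than assuming.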

Friday, December 18, 2015

Efficient file transmission, part 1

Recently I faced a problem where the tool we use to push files over SFTP started gathering a backlog, even though there was plenty of bandwidth available. The link had a pretty high latency, so the first thing I checked was TCP window scaling.
TCP can limit the bandwidth available on long fat networks (LFNs). The receive window limits the amount of data that can be sent over the link but not yet acknowledged. In the original specification the receive window was limited to 64KB. Throughput on any TCP link that does not employ TCP window scaling is therefore limited to 64KB divided by the round-trip latency - so if the round-trip latency is 0.1 seconds, the maximum transfer rate with a 64KB window is 640KB/s. With window scaling, the receive window can be resized up to 1GB.
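The arithmetic above, as a small Python sketch. The function names are illustrative; the only assumption is that at most one full window can be in flight per round trip.

```python
def max_throughput(window_bytes, rtt_seconds):
    """Upper bound on throughput (bytes/s) when one window is sent per round trip."""
    return window_bytes / rtt_seconds

def window_needed(bandwidth_bytes_per_s, rtt_seconds):
    """Window size (bytes) needed to keep a link of the given bandwidth full."""
    return bandwidth_bytes_per_s * rtt_seconds

# 64KB window, 0.1 s round trip: about 640KB/s, as in the text.
limit_kb_per_s = max_throughput(64 * 1024, 0.1) / 1024
```

The second function is the same relation turned around: it gives the bandwidth-delay product, which is the window a receiver must advertise for the link to stay busy.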
I found that TCP window scaling was implemented in Windows 2000. Linux kernels since 2004 implement dynamic receive buffers, which are based on TCP window scaling. Since we use more recent OSes, I was pretty certain that window scaling was not a problem here. Using Wireshark I was able to confirm that indeed, the SYN packets advertised a non-zero scale factor.
A quick look at the code revealed that the protocol implementation we had required an immediate acknowledgement of every write message; the SFTP specification says that it is allowed to send multiple messages and receive the acknowledgements later. Instead of waiting for an acknowledgement of every message, I started sending the entire file, then receiving all acknowledgements. This resulted in a massive speedup.
Driven by the desire to measure the improvement, I created a really large file, uploaded it using the old method, started uploading it using the new method, and... the transfer deadlocked. A TCP socket can only buffer a limited amount of data, and since I never retrieved the acknowledgements, at some point the server blocked on sending one. When the server blocked, it stopped processing my messages, which in turn blocked the sender.
Unfortunately checking if there's an acknowledgement available for reading in the buffer using the library we had was not an easy task, so I ended up limiting the number of unacknowledged messages to a known safe value, which still yielded a decent performance improvement.
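A Python sketch of that compromise, assuming acknowledgements arrive in the order the writes were sent. The callables send_write and read_ack stand in for whatever the SFTP library provides and are hypothetical, as is the default limit of 16.

```python
from collections import deque

def upload_with_window(chunks, send_write, read_ack, max_outstanding=16):
    """Send chunks while keeping at most max_outstanding writes unacknowledged.

    Returns the peak number of unacknowledged writes, for inspection.
    """
    outstanding = deque()
    peak = 0
    for request_id, chunk in enumerate(chunks):
        if len(outstanding) >= max_outstanding:
            # Window is full: drain the oldest acknowledgement before sending more.
            read_ack(outstanding.popleft())
        send_write(request_id, chunk)
        outstanding.append(request_id)
        peak = max(peak, len(outstanding))
    # Collect the acknowledgements still in flight at the end of the file.
    while outstanding:
        read_ack(outstanding.popleft())
    return peak
```

With max_outstanding=1 this degenerates to the original wait-for-every-acknowledgement behavior; an unbounded value reproduces the deadlock-prone version.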

Thursday, December 3, 2015

Playtime: labyrinth

A simple labyrinth generator.
The above labyrinth is generated using regular depth-first search. The result does not present any real challenge, as it contains one long path with very short branches. It is perfectly suitable to teach your 4-year-old to use the keyboard arrows.
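A minimal Python sketch of depth-first labyrinth generation, not the generator's actual code. It assumes a rectangular grid of cells and returns the set of carved passages between neighbouring cells; the names and the seed parameter are illustrative.

```python
import random

def generate_labyrinth(width, height, seed=None):
    """Carve a labyrinth with iterative depth-first search; return the set of passages."""
    rng = random.Random(seed)
    visited = {(0, 0)}
    passages = set()
    stack = [(0, 0)]
    while stack:
        x, y = stack[-1]
        neighbours = [
            (nx, ny)
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in visited
        ]
        if not neighbours:
            # Dead end: backtrack to the previous cell.
            stack.pop()
            continue
        nxt = rng.choice(neighbours)
        # Each passage is an unordered pair of adjacent cells.
        passages.add(frozenset(((x, y), nxt)))
        visited.add(nxt)
        stack.append(nxt)
    return passages
```

Because the search always extends the current path until it gets stuck, the result tends to have one long corridor with short side branches, which matches the observation above.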