Wednesday, October 10, 2018

TCP war story 3: excessive packet reordering

So there is this user who says that suddenly our server started responding very slowly. Earlier a page would load in a second; now it takes 4-5 minutes to load, he says. We run some checks, and the server is snappy as always. So we ask him for a Fiddler capture. Indeed, the loading times are high. We replay the same requests, and they are super fast here. We tell the user to go to network support. Network support says ping is good, packet loss is zero, no other users are complaining, so it must be a problem with the application.

How can there be a problem with the network if ping is good and there's no packet loss?

Well, apparently there can.

We asked the user for a packet capture. This wasn't his first encounter with network support, so he knew how to use Wireshark. Good for us. So we open the pcap file and immediately notice a lot of black, the color Wireshark's default rules use for TCP trouble such as out-of-order segments and duplicate ACKs. Every second or so a packet arrives 5 milliseconds late, and other packets arrive before it. The TCP stack reacts correctly by sending duplicate ACKs, and once the late packet arrives, there's one cumulative ACK for everything. But by then it's too late: 3+ duplicate ACKs have already been sent. Once the server receives these, it treats them as a sign of packet loss, leaves slow start, and applies congestion control (CTCP in our case), so the send rate is halved.
To make things worse, the MSS is only 587 bytes, and the RTT is on the order of 220 milliseconds. The resulting transfer rate is about 20 KB/s, far below the 1 Gbit/s that is normally available. At that rate only about 4.4 KB, or 7 to 8 segments, are in flight per round trip.
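To illustrate why a single late packet is enough to trigger this, here is a minimal sketch of a receiver that sends one cumulative ACK per arriving segment. It is a simplified model, not a real TCP stack: segments are numbered 0, 1, 2, ... instead of using byte sequence numbers, delayed ACKs and SACK are ignored, and the function names are made up for this example.

```python
def receiver_acks(arrival_order):
    """Return the cumulative ACK sent after each arriving segment.

    Segments are numbered 0, 1, 2, ...; the ACK value is the number of
    the next segment the receiver is still waiting for.
    """
    received = set()
    next_expected = 0
    acks = []
    for seq in arrival_order:
        received.add(seq)
        # Advance past every segment that is now contiguous.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def max_duplicate_acks(acks):
    """Largest number of times an ACK value is repeated after it first appears."""
    best = run = 0
    prev = None
    for ack in acks:
        run = run + 1 if ack == prev else 0
        best = max(best, run)
        prev = ack
    return best


# Segment 2 is delayed and arrives after segments 3, 4 and 5.
order = [0, 1, 3, 4, 5, 2, 6]
acks = receiver_acks(order)
dup = max_duplicate_acks(acks)
print("ACKs sent:", acks)
print("duplicate ACKs:", dup, "-> fast retransmit:", dup >= 3)
```

In this model the three segments that overtake the late one each produce a duplicate ACK, which is exactly the threshold at which a standard sender assumes loss, even though nothing was actually lost.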

How do you convince the network support team that it's a problem that they need to fix? Well, I'm still trying to figure it out.

Thursday, September 20, 2018

TCP war story 2: overzealous SYN defense

Experimenting further with the load-balanced system, we found that while a few simultaneous connections got multi-megabyte throughput, running 10 or more connections at the same time resulted in some of the connections being very slow.

Running tcpdump revealed that the slow connections received SYN cookies; the SYN/ACK packet did not contain the window scaling option, so the receive window was limited to 64KB, again limiting the throughput to 640KB/s.

The TTL on the SYN/ACK packet was different from the TTL on all other packets on the connection; this allowed us to determine that the SYN/ACK did not come from the server, but was sent by a firewall along the way.

The firewall was a Checkpoint device configured with very eager SYN defense settings. After adjusting these settings, the problem was eliminated and we were finally able to enjoy fast transfers on all connections.

TCP war story 1: F5 BIGIP load balancer

Recently I ran into trouble using an F5 load balancer; it was configured with the standard TCP profile, and provided great performance within a data center. However, transfers crossing WAN boundaries had their throughput severely limited.

I don't normally deal with network devices, so it was a surprise for me when I found that the load balancer with a TCP profile is in fact a proxy. The TCP profile limits the send buffer size to 64 KB, and therefore limits throughput to 64KB / RTT, in our case (with an RTT of about 100 ms) 640KB/s.

After raising the send buffer size to 1MB we were able to get 10MB/s transfers, which were acceptable for our uses. Fortunately the F5 device had sufficient memory to support that buffer size.
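The arithmetic behind these numbers can be sketched as follows. The 100 ms RTT is an assumption: it is the value implied by 64KB giving 640KB/s, not a measured figure, and the function name is made up for this example.

```python
def window_limited_throughput(window_bytes, rtt_seconds):
    """Upper bound on throughput (bytes/s) when at most window_bytes
    can be in flight during one round trip."""
    return window_bytes / rtt_seconds


KB = 1024
MB = 1024 * KB
rtt = 0.1  # assumed: 100 ms, implied by 64KB -> 640KB/s

for window in (64 * KB, 1 * MB):
    rate = window_limited_throughput(window, rtt)
    print(f"{window // KB:5d} KB buffer -> {rate / KB:6.0f} KB/s")
```

The same bound applies to the 64KB receive window in the SYN cookie story above: whichever of the send buffer and the receive window is smaller caps the data in flight.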

Only later did I find that F5 also supports fastL4 mode, in which it does not act as a proxy, but rather as a regular router. In that mode the send buffer is controlled by the server directly. This reduces the memory requirements, allowing F5 to serve more connections, and shifts the responsibility for throughput to the application.

Thursday, September 13, 2018

[MSVC] Linking to DLLs when no .lib is available

Reference:
https://stackoverflow.com/a/16127548/7707617

Step 1) Generate exports file
>dumpbin /exports libcurl.dll > libcurl.exports

Step 2) Edit exports file to leave just the word EXPORTS in the first line and function names in the following lines. The result is a .def file.

Step 3) Create lib from def file:
>lib /def:libcurl.def /out:libcurl.lib

Step 4) Pass the resulting lib file to the linker as usual.
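The manual editing in step 2 can be scripted. Below is a minimal sketch that assumes the usual dumpbin layout, where each export row reads "ordinal hint RVA name" (forwarded exports have no RVA column). The function name and the sample text are made up for illustration; the sample is hand-written, not real dumpbin output.

```python
import re

# An export row: ordinal, hint, optional RVA, then the function name.
# Header and summary lines do not match this shape and are skipped.
EXPORT_ROW = re.compile(
    r"^\s+\d+\s+[0-9A-Fa-f]+\s+(?:[0-9A-Fa-f]{8,16}\s+)?([A-Za-z_?@$]\S*)"
)


def exports_to_def(dumpbin_text):
    """Turn `dumpbin /exports` output into the text of a .def file."""
    names = []
    for line in dumpbin_text.splitlines():
        match = EXPORT_ROW.match(line)
        if match:
            names.append(match.group(1))
    return "EXPORTS\n" + "".join(f"{name}\n" for name in names)


# Hand-written sample in the assumed format (RVAs are invented).
sample = """\
    ordinal hint RVA      name

          1    0 00001000 curl_easy_cleanup
          2    1 00001040 curl_easy_init
"""

def_text = exports_to_def(sample)
print(def_text)
```

To use it on a real file, read libcurl.exports into a string, pass it to exports_to_def, and write the result to libcurl.def. It is worth checking the output by eye, since exports without names or with decorated names may need manual attention.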

Thursday, July 19, 2018

Libcurl and slow uploads on Windows

Recently I started using libcurl 7.60 to upload files from Windows machines to Amazon cloud using HTTP POST. The uploads don't perform well. They are limited by the send buffer used by Windows.
On Windows 2008R2 (and probably on more recent versions as well) the system only puts on the wire the bytes it has buffered, and this is limited by SO_SNDBUF (default 8KB) or by the send buffer used by the application code (CURL_MAX_WRITE_SIZE, 16KB), whichever is higher. In the case of curl uploads, the system buffers 16KB, then waits for an acknowledgement from the other end before buffering and sending the next batch. This is readily visible in Wireshark traces.

Starting with Windows 7 / 2008R2, Windows implements send buffer autotuning. This is well described here.

Theoretically, on these systems the send buffer should be automatically adjusted to optimize throughput. I ran a couple of experiments to confirm that, and found that the send buffer is only adjusted if the socket is blocking and the application buffer size is reasonably large (in my experiment 16KB was too small, but 20KB was sufficient); the buffer stays at 8192 bytes if the socket is nonblocking, regardless of the application buffer size.

Curl uses nonblocking sockets, and switching to blocking would break existing functionality. In order to use the optimal buffer size, it would need to periodically update SO_SNDBUF to the value reported by SIO_IDEAL_SEND_BACKLOG_QUERY, possibly using SIO_IDEAL_SEND_BACKLOG_CHANGE to learn when that value changes.
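The SO_SNDBUF half of that idea can be sketched in a few lines. This is only an illustration, not the patch itself: Python's socket module does not, as far as I know, expose the SIO_IDEAL_SEND_BACKLOG ioctls, so the "ideal" value below is a hard-coded hypothetical stand-in for what the query would return, and the helper name is made up.

```python
import socket


def set_send_buffer(sock, size_bytes):
    """Request a send buffer of size_bytes and return the size the OS reports.

    The reported size may differ from the request (Linux, for example,
    reports a doubled value).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.setblocking(False)  # nonblocking, like the sockets curl uses
    before = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    ideal = 256 * 1024  # hypothetical stand-in for the ideal send backlog
    after = set_send_buffer(s, ideal)
    print(f"SO_SNDBUF before: {before}, after: {after}")
```

In a real client this update would be repeated during the transfer, since the ideal value changes as the connection's bandwidth and latency estimates evolve.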

I sent a proof-of-concept patch to the curl-library mailing list. With some luck the patch will find its way to the next curl release.

Wednesday, June 13, 2018

RDP session hijacking (without password / as administrator)

  1. Get session ID of the session you want to connect to
  2. Get PsExec
  3. Run CMD under System account from admin CMD: psexec -i -s -d cmd
  4. Run tscon.exe <sessionID>
Sources:
  1. PsExec
  2. Getting a CMD prompt as SYSTEM in Windows Vista and Windows Server 2008
  3. RDP hijacking — how to hijack RDS...

Wednesday, March 21, 2018

Bulk loading data to SQL server

Source: tab-separated file
Destination: table with the same structure as the file
Command:
bcp <table> in <file> -c -S <server> -U <user>

Other options:
-t : column separator (tab is the default)
-r : row separator (default: newline)
-f : format file - can be used to import files that do not match the list of columns in DB table
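When the import has to be run from a script, the command line above can be assembled programmatically. The sketch below only builds the argument list; the function name and the example table, file, server and user values are hypothetical, and the -f option is left out to keep it short.

```python
def bcp_import_args(table, data_file, server, user,
                    column_sep=None, row_sep=None):
    """Build the argument list for a character-mode `bcp ... in` import."""
    args = ["bcp", table, "in", data_file, "-c", "-S", server, "-U", user]
    if column_sep is not None:
        args += ["-t", column_sep]  # override the default tab separator
    if row_sep is not None:
        args += ["-r", row_sep]     # override the default newline separator
    return args


# Hypothetical names, for illustration only.
default_args = bcp_import_args("mydb.dbo.mytable", "data.tsv", "myserver", "me")
csv_args = bcp_import_args("mydb.dbo.mytable", "data.csv", "myserver", "me",
                           column_sep=",")
print(" ".join(default_args))
print(" ".join(csv_args))
```

The resulting list can be handed to subprocess.run on a machine where bcp is installed. Passing a list rather than a single string avoids quoting problems with separator characters.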