History of HTTP
Since HTTP/1.0, HTTP has been a request-response protocol: the client sends a request to the server, and the server sends back a response. The request contains at least a request method (GET, POST, etc.) and the path of the requested resource, and can contain additional headers and a request body. The response contains at least a status code, and can contain additional headers and a response body.
HTTP/1.0 used a single TCP connection per request. The client opened a connection, sent the request, read the response and closed the connection. This scheme was inefficient for a few reasons:
- TCP connections start out in a so-called "slow start" state: the transfer rate is artificially limited at first and grows as more data is transferred. Since HTTP/1.0 opens a fresh TCP connection for each request, every request suffers through slow start.
- Using encryption (TLS) makes things even worse. TLS requires large amounts of CPU during connection establishment, and more connections mean more CPU usage.
HTTP/1.1 addressed these points by reusing connections: the client can send multiple requests over the same connection, and the server answers each of them over that same connection. This required a change to the HTTP protocol: both the client and the server must now declare where each message body ends, either by sending a Content-Length header, by using a Transfer-Encoding header (chunked encoding), or by using methods and status codes that are known to carry no body. Other than this, the protocol looks exactly like before.
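To make the framing rule concrete, here is a small sketch (the function and sample bytes are my own, not from any library) that splits two back-to-back HTTP/1.1 responses arriving on one connection, using only the Content-Length header to find where each body ends:

```python
# Sketch: splitting pipelined HTTP/1.1 responses on a single connection.
# Content-Length tells the client exactly where one body ends, so the
# next response can begin immediately after it.

def split_responses(data: bytes) -> list[tuple[bytes, bytes]]:
    """Return (header_block, body) pairs parsed from a raw byte stream."""
    responses = []
    while data:
        head, _, rest = data.partition(b"\r\n\r\n")
        length = 0
        for line in head.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value)
        responses.append((head, rest[:length]))
        data = rest[length:]  # whatever follows is the next response
    return responses

stream = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
          b"HTTP/1.1 404 Not Found\r\nContent-Length: 3\r\n\r\nnop")
for head, body in split_responses(stream):
    print(head.split(b"\r\n")[0], body)
```

A real parser must also handle chunked transfer encoding and bodiless responses; this sketch covers only the Content-Length case described above.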
HTTP/1.1 improved the transfer speeds a lot, but still left some room for improvement:
- responses had to be sent in the order in which the requests were received. So, if the client sent multiple requests and generating a response for the first one took time, the connection could not be used for transferring other data and sat idle (head-of-line blocking).
- while multiple requests could in theory be sent back-to-back (pipelining), many servers handled pipelined requests incorrectly due to implementation bugs. As a result, clients usually wait for a response before sending a follow-up request, again leaving the connection idle.
HTTP/2 addressed these limitations by introducing multiplexing. Multiplexing means that it is now possible to send multiple streams over a single TCP connection. Each stream enables data transfer in both directions, and the connection can alternate between different streams at any time. Each request/response exchange is done on its own stream.
HTTP/2 can fully utilize a TCP link. Both the client and the server can send data over any stream at any time, so the connection is only idle when there is nothing to send on any of the active streams.
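To make "streams" concrete: every HTTP/2 frame starts with a fixed 9-byte header that names the stream it belongs to (RFC 9113), which is what lets one TCP connection interleave many exchanges. A minimal decoder sketch:

```python
import struct

# Sketch: decoding the fixed 9-byte HTTP/2 frame header (RFC 9113).
# Layout: 24-bit payload length, 8-bit type, 8-bit flags,
# 1 reserved bit + 31-bit stream identifier.

def parse_frame_header(header: bytes) -> dict:
    length_hi, length_lo, ftype, flags, stream = struct.unpack("!BHBBI", header)
    return {
        "length": (length_hi << 16) | length_lo,  # 24-bit payload length
        "type": ftype,                            # e.g. 0x0 DATA, 0x1 HEADERS
        "flags": flags,                           # e.g. 0x1 END_STREAM on DATA
        "stream_id": stream & 0x7FFFFFFF,         # top bit is reserved
    }

# A DATA frame with a 16-byte payload and END_STREAM set, on stream 5:
hdr = parse_frame_header(bytes([0, 0, 16, 0x0, 0x1]) + (5).to_bytes(4, "big"))
print(hdr)
```

Because the stream identifier is right there in the header, a sender can put a frame for any stream on the wire whenever it has data, which is the mechanism behind the full-utilization claim above.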
Full TCP speed was still not enough for the authors of HTTP/3; TCP itself leaves some room for improvement:
- before any data can be exchanged, the connection goes through a 3-way handshake, which takes one round-trip time to complete,
- if any TCP packet is lost, data in subsequent packets cannot be delivered until the lost packet is retransmitted and received (head-of-line blocking at the transport level),
- there are many other points where a new protocol could improve upon TCP, I will discuss them later.
Compared to HTTP/2, HTTP/3 offers only cosmetic changes. The big change comes from replacing TCP with QUIC as the underlying protocol.
QUIC protocol
QUIC replaces TCP as the underlying transport for HTTP/3. Like TCP, it offers reliable in-order delivery. Unlike TCP, QUIC is always encrypted, and it natively supports multiple data streams: where HTTP/2 had to implement its own multiplexing, HTTP/3 delegates it to the QUIC layer. One advantage of multiplexing at the QUIC layer is that data loss on one stream does not block data delivery on the other streams.
Importantly though, everything in QUIC is encrypted end to end, including packet numbers, acknowledgements, and reset packets. This limits the options for the network devices to interfere with QUIC traffic, or to ossify on a specific QUIC version.
Compared to TCP, I find the following differences interesting:
Path MTU detection (PMTUD)
With TCP, MTU detection can be performed using one of the following methods:
- The SYN packet includes a Maximum Segment Size (MSS) option. This option can be modified by routers along the way, and the recipient calculates the maximum transmission unit from the received MSS.
- If a router on the path receives a packet larger than it can forward without fragmentation, it drops the packet and sends an ICMP message ("Fragmentation Needed" / "Packet Too Big") back to the sender with the maximum supported size.
Neither of these methods is authenticated: when an endpoint receives an MSS option or an ICMP packet, it has no way to determine whether it is authentic or forged.
There have been cases where ICMP packets were used to trick TCP endpoints into using very small packet sizes, and as a result many implementations ignore or block ICMP packets. This can lead to a situation where the TCP stack selects an MTU larger than the network supports, and the connection breaks as soon as the connected parties try to send data.
With QUIC, MTU detection can be performed using one of the following methods:
- The handshake is performed using 1200-byte datagrams, and fails if the network does not support this datagram size.
- Support for larger datagram sizes is probed: if a packet of a given size is acknowledged by the peer, that size is supported. ICMP packets may be used to pick the packet sizes to probe, but they cannot push the packet size below 1200 bytes.
- An endpoint can advertise the maximum datagram size it is willing to accept. If this number is modified in transit, the connection fails.
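The probing step can be sketched as a search over acknowledged packet sizes. This is a simplification of DPLPMTUD (RFC 8899) under stated assumptions: `path_accepts` stands in for sending a padded probe packet and waiting for its acknowledgement, and the hypothetical path here supports 1400-byte datagrams:

```python
# Sketch of QUIC-style datagram size probing. A real stack sends padded
# PING probes and treats an ACK as "size supported", a loss as "too big".

QUIC_MIN_DATAGRAM = 1200  # the handshake already validated this size

def path_accepts(size: int, path_mtu: int = 1400) -> bool:
    # Stand-in for a real probe: the probe is ACKed only if it arrives.
    return size <= path_mtu

def probe_mtu(upper: int = 1500) -> int:
    lo, hi = QUIC_MIN_DATAGRAM, upper
    while lo < hi:                 # binary search over candidate sizes
        mid = (lo + hi + 1) // 2
        if path_accepts(mid):
            lo = mid               # probe acknowledged: size is supported
        else:
            hi = mid - 1           # probe lost: try something smaller
    return lo

print(probe_mtu())  # settles on the largest acknowledged size
```

Note how the search never drops below 1200 bytes: a forged ICMP message can at most influence which sizes get probed, not force the connection under the validated minimum.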
Connection resilience
When running TCP+TLS, a corruption of a single bit is usually enough to terminate the connection. QUIC on the other hand is able to detect and discard a corrupted packet, and continue processing non-corrupted packets. Once the handshake is completed, it is practically impossible for a third party to create a QUIC packet that would cause connection termination.
Connection closure
TCP options to close a connection are limited to:
- closing the sending side of the connection
- resetting the connection
This works well enough in many cases, but not when one peer needs to send a final message and abruptly close the connection while the other peer is actively sending. In that case, the connection is usually reset and the final message is lost.
QUIC separates closing the stream from closing the connection. For closing the stream, it offers the following options:
- closing the sending side of the stream
- resetting the sending side of the stream
- notifying the peer of closing the receiving side of the stream
And for closing the connection:
- closing the connection with an error message
- sending a stateless reset, used only when the peer is sending over a connection that no longer exists
- timing out after a negotiated period of inactivity
Congestion control
TCP only offers limited information to the congestion controller:
- last acknowledged sequence number is always available
- optionally, the endpoints can negotiate support for selective acknowledgements (SACKs) to acknowledge data received out of order (supported by most implementations). SACKs can be reneged: a receiver may discard data it previously acknowledged, forcing the sender to retransmit it.
- optionally, the endpoints can support timestamps (only supported by some implementations) to indicate the order in which packets were transmitted.
- optionally, the endpoints can negotiate support of ECN. This has to be supported by the devices on the path, and historical bugs hampered its adoption.
QUIC gives its congestion controller much richer information:
- QUIC acknowledges packets, not byte sequence numbers. When a packet is retransmitted and later acknowledged, it is clear whether the acknowledgement applies to the original packet, to the retransmission, or to both.
- Packets are always acknowledged, even in the presence of packet loss.
- Acknowledgements cannot be reneged: once acknowledged, data may not be discarded.
- Acknowledgements carry timing information: it is always clear whether an acknowledgement was delayed by the peer, and by how much.
- ECN support is detected per path, and ECN information is used only when no bugs are detected.
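To illustrate why the timing information matters, here is a simplified RTT calculation in the spirit of RFC 9002 (the function and the numbers are illustrative, not taken from a real stack). Because each acknowledgement reports how long the peer deliberately held it back, that delay can be subtracted instead of inflating the RTT estimate:

```python
# Sketch: using the acknowledgement's reported delay when sampling RTT,
# simplified from the procedure in RFC 9002.

def rtt_sample(send_time: float, ack_time: float, ack_delay: float) -> float:
    latest_rtt = ack_time - send_time
    # Subtract the peer's reported delay only if it leaves a plausible value.
    return latest_rtt - ack_delay if ack_delay < latest_rtt else latest_rtt

# Packet sent at t=0.000 s, ACK arrives at t=0.050 s, and the peer reports
# holding the ACK for 0.020 s -> the network round trip was about 0.030 s.
print(rtt_sample(0.000, 0.050, 0.020))
```

TCP without timestamps cannot make this correction, so a delayed ACK looks identical to a slower network path.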
Handshake improvements
DoS prevention: when a TCP server deals with a flood of TCP SYN packets, it starts sending SYN cookies. They permit the server to defer allocating state for a connection until the client address is confirmed. However, SYN cookies lose information about TCP extensions present in the SYN packet, such as the MSS or window scale options.
When a QUIC server deals with a flood of initial packets, it starts sending retry packets. They also permit the server to defer allocating state for a connection until the client address is confirmed. They do not lose any information, but they cost one round trip time.
Timing improvements: a TCP + TLS handshake costs at least 1 RTT before application data can flow (1 RTT for TCP, plus 0 RTT for TLS 1.3 when resuming a session, or 1 RTT otherwise). QUIC combines the transport and TLS handshakes and, when resuming a session, can carry application data in the very first packet, making it a true 0-RTT handshake.
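The savings can be made concrete with a bit of arithmetic (a sketch; real handshakes also add server processing time and certificate validation):

```python
# Back-of-the-envelope time to the first application byte on a 50 ms
# round-trip path. The numbers are illustrative only.
RTT = 0.050

tcp_tls13_fresh = RTT + RTT  # TCP handshake, then a 1-RTT TLS 1.3 handshake
tcp_tls13_resumed = RTT      # TCP handshake, TLS 1.3 0-RTT resumption
quic_fresh = RTT             # one combined transport + TLS handshake
quic_resumed = 0.0           # 0-RTT: data rides in the first flight

print(tcp_tls13_fresh, tcp_tls13_resumed, quic_fresh, quic_resumed)
```

Even in the resumption case, TCP cannot do better than one round trip, because its own handshake carries no application data.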
MTU validation: QUIC sends 1200-byte datagrams during handshake, validating that the path supports this datagram size.
Path migration
A TCP connection is tied to the two addresses it was initiated between. Changing either address requires establishing a new connection and, in the case of TLS, performing a new handshake.
A QUIC connection is also established between two given addresses, but the client address can change at any time without losing connection state; the new path must first be validated to lift the anti-amplification limit. Changing the server address still requires a new connection, just as with TCP + TLS.