HTTP Compression - boost your server’s speed

Improve latency and cost by enabling web-server compression

Published in ITNEXT, Mar 20, 2021

We all work hard to make an impact, and once in a while there’s an opportunity to make a HUGE impact with very little effort. After reading this post, you’ll be able to make a quick change to your web servers and improve your site’s performance!

REST APIs (built on top of HTTP) are among the most popular kinds of API. If you work at a SaaS company, you probably expose one as an external API for your customers, and you might also use HTTP internally between your services. At Dynamic Yield we serve thousands of HTTP requests per second, and we look for easy wins such as latency improvements and cost optimizations. Compression is one that helped us, simply by reducing the number of bytes on the wire.

Most modern web browsers support compression schemes such as gzip and Brotli out of the box and ask for them by default.

It turns out that most web-server frameworks don’t honor the “Accept-Encoding: gzip” header by default, so we need to actively enable compression. We will cover a few frameworks below.

Enabling compression improves latency and reduces cost because fewer bytes are transferred over the network. Here is what we get:

Lower latency internally, as measured by the Target Response Time metric of the AWS load balancer.
Lower round-trip time, as fewer bytes need to be transferred to the end user over the internet.
Cost reduction: cloud providers charge for data transfer (in, out, across availability zones, etc.), so fewer bytes means a lower bill.

If you search for the “latency numbers every programmer should know”, you’ll find that compressing data is much faster than sending packets over the network. That implies that compressing the data before sending it can reduce overall latency.
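As a rough illustration of that tradeoff, the sketch below compresses a synthetic JSON payload with Python’s standard gzip module and reports the size reduction and the time spent. The payload shape, the field names, and the `build_payload` and `measure` helpers are made up for this demonstration; real numbers depend on your data and hardware.

```python
import gzip
import json
import time


def build_payload(n_items: int = 1000) -> bytes:
    """Build a synthetic JSON API response (hypothetical shape, for illustration only)."""
    items = [
        {"id": i, "name": f"product-{i}", "in_stock": i % 2 == 0, "price": 9.99}
        for i in range(n_items)
    ]
    return json.dumps({"items": items}).encode("utf-8")


def measure(payload: bytes) -> dict:
    """Compress the payload once and report sizes and elapsed time."""
    start = time.perf_counter()
    compressed = gzip.compress(payload)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {
        "original_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "saved_ratio": 1 - len(compressed) / len(payload),
        "compress_ms": elapsed_ms,
    }


if __name__ == "__main__":
    stats = measure(build_payload())
    print(
        f"{stats['original_bytes']} -> {stats['compressed_bytes']} bytes "
        f"({stats['saved_ratio']:.0%} saved) in {stats['compress_ms']:.2f} ms"
    )
```

Comparing the reported compression time with the time your network needs to move the saved bytes gives a first estimate of whether compression pays off for a given endpoint.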

Processed Bytes improvement while applying compression (Image by author)

Response Time improvement while applying compression (Image by author)

You can see in the graphs above that we process ~80% fewer bytes per minute after enabling compression, and our latency improved by ~15%!

Real-World Server Examples

Let’s see several examples in different frameworks/languages:

Tornado (Python)

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("x" * 1024)

def make_app():
    return tornado.web.Application([
        (r"/", MainHandler),
    ], compress_response=True)

if __name__ == "__main__":
    app = make_app()
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()

Tornado (tested on version 6.1) is a Python web framework and asynchronous networking library. The snippet above was copied from their “Hello, world” example with slight changes:

Tornado applies compression only when the content length is 1024 bytes or more. Thus, the “Hello, world” string was replaced by “x” * 1024 for this demonstration.
compress_response=True was added to the Application’s constructor.

Now let’s test it! First, let’s use curl to send a request without compression:

~ $ curl -vs http://127.0.0.1:8888
> GET / HTTP/1.1
> Host: 127.0.0.1:8888
>
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
< Content-Length: 1024
< Vary: Accept-Encoding
<
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Now let’s add the --compressed flag (to simulate a web browser):

~ $ curl -vs --compressed http://127.0.0.1:8888
> GET / HTTP/1.1
> Host: 127.0.0.1:8888
> Accept-Encoding: deflate, gzip
>
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
< Content-Length: 29
< Vary: Accept-Encoding
< Content-Encoding: gzip
<
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

As you can see above, the “Accept-Encoding: deflate, gzip” header was added to the request and our server responded with the “Content-Encoding: gzip” header. As a result, our content was compressed to “Content-Length: 29” (compared to “Content-Length: 1024” in the first request).
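The exchange above can be sketched in a few lines of plain Python. This is not Tornado’s actual implementation, only a simplified model of the negotiation: the `accepts_gzip` and `negotiate` helpers are hypothetical names, header lookup is a plain case-sensitive dict, and the 1024-byte threshold simply mirrors the one described for Tornado.

```python
import gzip
from typing import Dict, Tuple

MIN_LENGTH = 1024  # mirrors the threshold described for Tornado above


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value lists gzip with a non-zero q-value."""
    for part in accept_encoding.split(","):
        name, _, params = part.strip().partition(";")
        if name.strip().lower() == "gzip":
            q = params.replace(" ", "").lower()
            if q.startswith("q="):
                # "gzip;q=0" means the client explicitly refuses gzip.
                try:
                    return float(q[2:]) > 0
                except ValueError:
                    return False
            return True
    return False


def negotiate(body: bytes, request_headers: Dict[str, str]) -> Tuple[bytes, Dict[str, str]]:
    """Compress the body only if it is large enough and the client asked for gzip."""
    headers = {"Vary": "Accept-Encoding"}
    accept = request_headers.get("Accept-Encoding", "")
    if len(body) >= MIN_LENGTH and accepts_gzip(accept):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return body, headers


if __name__ == "__main__":
    for request_headers in ({}, {"Accept-Encoding": "deflate, gzip"}):
        _, response_headers = negotiate(b"x" * 1024, request_headers)
        print(request_headers, "->", response_headers)
```

The `Vary: Accept-Encoding` header is set in both branches, as in the curl output, so that caches keep compressed and uncompressed variants apart.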

Aiohttp (Python)

AIOHTTP (tested on version 3.7.4) is an Asynchronous HTTP Server for asyncio. The snippet below was copied from their Server example with a minor change; we added a compression middleware to compress the response before sending it back to the user:

from aiohttp import web

@web.middleware
async def compression_middleware(request, handler):
    response = await handler(request)
    response.enable_compression()
    return response

async def handle(request):
    return web.Response(text="x" * 128)

app = web.Application(middlewares=[compression_middleware])
app.add_routes([web.get('/', handle)])

if __name__ == '__main__':
    web.run_app(app)

Note that there’s no minimum content length here, so every response will be compressed (you can, however, apply a minimum length rule by adding a condition in the compression_middleware function above).

Express (Node.js)

Express (tested on version 4.17.1) is a fast, unopinionated, minimalist web framework for Node.js. The snippet below was copied from the hello-world example and adjusted to use the compression middleware. Enabling compression is recommended on Express’s production best practices page.

const compression = require('compression')
const express = require('express')
const app = express()
app.use(compression())
const port = 3000

app.get('/', (req, res) => {
    res.send("x".repeat(1024))
})

app.listen(port, () => {
    console.log(`Example app listening at http://localhost:${port}`)
})

As with Tornado in the first example, the Express compression middleware also has a minimum threshold of 1024 bytes, which you can tune when instantiating it. Unlike Tornado’s curl output above, the Content-Length header is missing from the response, but we do see the Content-Encoding: gzip header:

~ $ curl -vs --compressed http://127.0.0.1:3000
> Accept-Encoding: deflate, gzip
>
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
< Content-Encoding: gzip
<
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
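A plausible explanation for the missing Content-Length header noted above (an inference, not something the output itself proves) is that the middleware compresses the body as a stream, so the final size is not known when the headers are sent. The Python sketch below illustrates the idea with the standard zlib module; `gzip_stream` is a hypothetical helper, not part of Express or of the compression package.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def gzip_stream(chunks: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Yield gzip-encoded pieces while the input chunks are still arriving."""
    # wbits = 16 + MAX_WBITS tells zlib to emit a gzip header and trailer.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # the compressor may buffer input and return nothing yet
            yield out
    # Only after the last chunk is the total compressed size known.
    yield compressor.flush()


if __name__ == "__main__":
    parts = [b"x" * 256 for _ in range(4)]
    encoded = b"".join(gzip_stream(parts))
    print(len(encoded), gzip.decompress(encoded) == b"".join(parts))
```

Because the total length is only known after the final flush, a streaming server cannot announce it up front and typically falls back to chunked transfer encoding.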

Ruby on Rails

Ruby on Rails (tested on version 5.2.4.5) is a server-side web application framework written in Ruby. Adding the Deflater middleware will do the magic here:

require_relative 'boot'
require 'rails/all'

Bundler.require(*Rails.groups)

module Myapp
  class Application < Rails::Application
    config.load_defaults 5.2
    config.middleware.insert_after ActionDispatch::Static, Rack::Deflater
  end
end

Adding compression middleware (Image by author)

Real-World Client Examples

Enabling server-side compression is sufficient if your clients are web browsers. If you use HTTP to communicate between your own internal components, you need to explicitly ask the server for a compressed response by adding the “Accept-Encoding: deflate, gzip” request header.

Elasticsearch Client (Python)

Elasticsearch is a distributed, RESTful search and analytics engine. If you are using the Python Elasticsearch client, you can enable compression when instantiating the client:

from elasticsearch import Elasticsearch
es = Elasticsearch(hosts, http_compress=True)

Other clients

The simplest way to ask for a compressed response is to add the “Accept-Encoding: deflate, gzip” header field to the request. Some clients do that by default, and most of them also decompress the response automatically for you.
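Python’s standard urllib is an example of a client that neither sends the header nor decompresses the response for you. Below is a minimal sketch of doing both by hand; `fetch` and `decode_body` are hypothetical helper names, and the “deflate” branch assumes the zlib-wrapped format that the HTTP specification describes.

```python
import gzip
import urllib.request
import zlib


def decode_body(raw: bytes, content_encoding: str) -> bytes:
    """Undo the encoding named in the Content-Encoding response header."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        # Assumes a zlib-wrapped stream; some servers send raw deflate instead.
        return zlib.decompress(raw)
    return raw  # no encoding (or an unknown one): hand the bytes back untouched


def fetch(url: str, timeout: float = 5.0) -> bytes:
    """GET a URL, asking for a compressed response and decoding it if one arrives."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "deflate, gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
        return decode_body(raw, response.headers.get("Content-Encoding", ""))
```

With the Tornado example from earlier running locally, `fetch("http://127.0.0.1:8888")` should return the 1024 uncompressed bytes even though far fewer bytes travelled over the wire.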

Back to our services after a week

A picture is worth a thousand words, so here are some screenshots of our Load Balancers:

Observations

Compression level: gzip supports compression levels from 1 (fastest, least compression) to 9 (slowest, most compression). Tuning the level is a tradeoff between the time it takes to compress and the resulting size. Compression is CPU-intensive work, so make sure you’re monitoring CPU utilization as well. In Elasticsearch, for instance, you can configure it via http.compression_level in the HTTP settings. Most web servers and frameworks expose a similar setting.
Brotli is the (relatively) new kid in town, initially released in 2013, whereas gzip has been around since the early ’90s. While many benchmarks show that Brotli can compress better than gzip, configuring Brotli in your web-server framework might be trickier and require a bit more work.
We need to distinguish between static content, which can be compressed in advance with a higher compression level, and dynamic content such as our RESTful API, which generates a different response for each request and needs to be served as fast as possible.
Compression is most effective for users with limited bandwidth or a slow internet connection, since that is where the reduction in transferred bytes is felt the most.
If you’re using a CDN, make sure it is configured properly and supports the compression algorithm as well. If you’re using a reverse proxy (nginx, for example), you might want to delegate the gzip encoding to it.
Follow REST principles and best practices and use GET whenever you want to fetch data. With GET you can benefit from the HTTP 304 Not Modified response code, which avoids re-sending unchanged data and ultimately saves time and money.
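The compression-level tradeoff from the first observation can be explored with a short experiment. The sketch below uses Python’s zlib module (the same DEFLATE algorithm that gzip wraps) on a synthetic JSON payload; `sample_payload` and `compare_levels` are hypothetical helpers, and the timings will vary by machine and data.

```python
import json
import time
import zlib


def sample_payload(n_items: int = 2000) -> bytes:
    """Synthetic JSON body (hypothetical fields, for illustration only)."""
    rows = [
        {"id": i, "sku": f"sku-{i:05d}", "tags": ["a", "b"], "score": i * 0.5}
        for i in range(n_items)
    ]
    return json.dumps(rows).encode("utf-8")


def compare_levels(payload: bytes, levels=(1, 6, 9)) -> dict:
    """Map each level to (compressed size in bytes, elapsed milliseconds)."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(payload, level)
        elapsed_ms = (time.perf_counter() - start) * 1000
        results[level] = (len(compressed), elapsed_ms)
    return results


if __name__ == "__main__":
    payload = sample_payload()
    print(f"original: {len(payload)} bytes")
    for level, (size, ms) in compare_levels(payload).items():
        print(f"level {level}: {size} bytes in {ms:.2f} ms")
```

Running it against a sample of your own responses is a cheap way to decide whether a higher level is worth the extra CPU for dynamic content, or should be reserved for static assets compressed in advance.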

Summary

In this post, we saw how easy it is to enable compression support for HTTP web servers and clients. There’s a reason why all modern web browsers ask for a compressed response by default. Although it comes at the cost of higher CPU utilization, there’s nothing like a fast user experience on your website.