Gzip Performance for Go Webservers

In our previous performance test, we looked at high-throughput compression, suited for log files, map-reduce data, etc. This post deals with webserver performance, where we investigate which gzip package is best suited for a Go webserver.

Update (18 Sept 2015): Level 1 compression has been updated to a faster version, so the difference to deflate level 1 has gotten smaller. The benchmark sheet has been updated, so some of the numbers below may no longer match it exactly.

First of all, a good deal of you might not want to use gzip at your backend level at all. A common setup is to use a reverse proxy like nginx in front of your web service, which handles compression and TLS. This has the advantage that compression is handled in one place, and the reverse proxy usually has better compression performance than Go. The disadvantage is that it can place a heavy load on the reverse proxy, and it is usually a lot easier to scale the number of backend application servers than the number of reverse proxies.

For this test I have compiled a webserver test-set. It is rather small and contains a number of html pages, css, minified javascript, svg and json files. These filetypes are selected on purpose since they are good compression targets. I have not included pre-compressed formats like jpeg, png or mp3, since compressing these is just a waste of CPU cycles, and I would not expect you to attempt it in practice.

I have added the results to the same sheet as the previous results. To see them, switch to the “Web content” tab.

Click to open the results.


In this benchmark the test set is run 50 times, which gives a total of 27,400 files, with a total of 240,800,700 bytes for each test and an average file size of a little more than 8KB. Source code of the test program is available. It is important to note that the benchmark runs sequentially, so only one core is ever used. That means you can multiply the files per second by the number of physical cores in your server.

The standard library gzip has a peak single-core performance of 2,664 files/second. Performance doesn’t degrade significantly until after level 4, but compression improves a little with each step. Based on this test set I would not recommend using any level above 5, since the gains beyond that are very small.
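
If you want to pin the level rather than use the default, both the standard library and my package (which keeps the same API) let you pass it to gzip.NewWriterLevel. A minimal sketch, using level 5 as suggested above:


package main

import (
	"compress/gzip" // or the API-compatible "github.com/klauspost/compress/gzip"
	"io"
	"log"
	"os"
)

// compressTo gzips everything from src into dst at the given level.
func compressTo(dst io.Writer, src io.Reader, level int) error {
	gz, err := gzip.NewWriterLevel(dst, level)
	if err != nil {
		return err // invalid level
	}
	if _, err := io.Copy(gz, src); err != nil {
		gz.Close()
		return err
	}
	// Close flushes any remaining data and writes the gzip footer.
	return gz.Close()
}

func main() {
	// Level 5 looks like the sweet spot for this kind of content.
	if err := compressTo(os.Stdout, os.Stdin, 5); err != nil {
		log.Fatal(err)
	}
}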

The pgzip package is confirmed to be a poor fit for these file sizes. As the documentation also states: do not use pgzip for a webserver, only for big files.


[Chart: compression speed on the vertical axis, total compression ratio on the horizontal axis]

My modified gzip package is consistently around 1.6 times faster than the standard library. At levels below 5 it has better compression. At higher levels it compresses about 0.5% worse than the standard library at the same level. If this is a concern, use a higher compression level, which gives both better compression and speed. Below level 3 there is little speedup, and above level 5 there is no big benefit, so level 3, 4 or 5 seems to be the sweet spot.

Update: The new “Snappy” level 1 compression offers the best speed available (2.2x the standard library), at a slight loss of compression. This seems like a very good option for heavily loaded servers.

The cgo-based cgzip package performs really well here. Except at level 1, it has around 30-50% more throughput than the modified library, with the best compression at levels 4 and up. At level 1 it is clearly the fastest, with only a 0.5% deficit in compression size. Levels 2 to 5 seem to be the best options. So cgzip remains the best option for webservers if cgo/unsafe is an option for you, but the gap to a pure Go solution has narrowed.

Re-using Gzip Writers


An important performance consideration is that there is considerable overhead in creating a new Writer for your compressed content. When dealing with small data sizes like these, it can eat significantly into your performance.

Let’s look at the relevant code for a very simple http wrapper, taken from here.


// Wrap a http.Handler to support transparent gzip encoding.
func NewHandler(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		[...]
		gz := gzip.NewWriter(w)
		defer gz.Close()
		h.ServeHTTP(&gzipResponseWriter{Writer: gz, ResponseWriter: w}, r)
	})
}

The important aspect here is that a Writer is created on EVERY request. In some simple tests this reduces throughput by approximately 35%. A very simple way to avoid this is to re-use Writers. That can however be tricky, since each web request runs on a separate goroutine, and we want many of them to run at the same time.

This is however a very good use case for a sync.Pool, which lets you re-use Writers between goroutines without any pain. Here is how the modified handler looks:


// Create a Pool that contains previously used Writers and
// can create new ones if we run out.
var zippers = sync.Pool{New: func() interface{} {
	return gzip.NewWriter(nil)
}}

// Wrap a http.Handler to support transparent gzip encoding.
func NewHandler(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		[...]

		// Get a Writer from the Pool
		gz := zippers.Get().(*gzip.Writer)

		// When done, put the Writer back into the Pool.
		// Deferred calls run last-in-first-out, so Close below runs first.
		defer zippers.Put(gz)

		// We use Reset to set the writer we want to use.
		gz.Reset(w)
		defer gz.Close()

		h.ServeHTTP(&gzipResponseWriter{Writer: gz, ResponseWriter: w}, r)
	})
}

This is a simple change and in web server workloads it will give you much better performance.

Edit: For a more complete package, take a look at the httpgzip package by Michael Cross.

Which types should I compress?

As we have covered earlier, simply gzipping all content is a waste of CPU cycles both on the server and the client, so spotting the correct content to gzip is important. The easiest way to spot compressible content is to look at the mime-types of your content.

Here is a small whitelist that is safe to start with. You can add more if there are specific types you use on your server.

application/javascript
application/json
application/x-javascript
application/x-tar
image/svg+xml
text/css
text/html
text/plain
text/xml
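
To illustrate, checking a response against that whitelist can be a simple map lookup on the media type, with any parameters (such as a charset) stripped off first. This is only a sketch; the shouldGzip helper is a name I use here, not part of any package:


import "mime"

// compressible lists mime-types that are worth gzipping. Extend as needed.
var compressible = map[string]bool{
	"application/javascript":   true,
	"application/json":         true,
	"application/x-javascript": true,
	"application/x-tar":        true,
	"image/svg+xml":            true,
	"text/css":                 true,
	"text/html":                true,
	"text/plain":               true,
	"text/xml":                 true,
}

// shouldGzip reports whether a response with the given Content-Type
// header value is a good compression target.
func shouldGzip(contentType string) bool {
	// ParseMediaType strips parameters like "; charset=utf-8" and
	// lower-cases the media type.
	mt, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return compressible[mt]
}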


Precompress your input


While I risk stating the obvious, the best way to avoid compression cost on your server is to do it in advance on static content.

Static content can be pre-compressed, and when creating a static file server I usually add functionality where you can place a .gz file alongside the uncompressed input, which can then be served directly from disk.

  1. Client requests “/static/file.js”
  2. Server checks if client accepts gzip encoding
  3. If so, see if there is a “/static/file.js.gz”
  4. If so, set Content-Encoding and serve the gzipped file directly from disk

All stages should of course have appropriate fallbacks. For pre-compressing your static content I can recommend the 7-zip compressor, which produces slightly smaller files with maximum compression settings.
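
As an illustration, a bare-bones handler implementing those steps could look like the sketch below. The servePrecompressed name is mine, and a real version also needs path sanitising, proper Accept-Encoding parsing and the other fallbacks mentioned above:


import (
	"mime"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

// servePrecompressed serves name from dir, preferring a pre-compressed
// ".gz" sibling when the client accepts gzip.
func servePrecompressed(w http.ResponseWriter, r *http.Request, dir, name string) {
	// Naive Accept-Encoding check; see the comments below about parsing it properly.
	if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
		gzName := filepath.Join(dir, name+".gz")
		if _, err := os.Stat(gzName); err == nil {
			// Set Content-Type from the original name, otherwise the
			// file server would sniff the payload as a gzip archive.
			if ctype := mime.TypeByExtension(filepath.Ext(name)); ctype != "" {
				w.Header().Set("Content-Type", ctype)
			}
			w.Header().Set("Content-Encoding", "gzip")
			w.Header().Add("Vary", "Accept-Encoding")
			http.ServeFile(w, r, gzName)
			return
		}
	}
	// Fall back to the uncompressed file.
	http.ServeFile(w, r, filepath.Join(dir, name))
}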

In the final part I will show a near-constant-time compressor that heavily trades compression for more predictable speed.


  • In your final example with sync.Pool, you defer gz.Close and then put gz back into the pool. Won’t this have the effect of putting a closed gz back in the pool, resulting in an error the next time another goroutine attempts to use it?

    • Probably not since Reset is called on the retrieved gz.

      • Klaus Post

        It does create a race, since the Writer could be handed to another go-routine before Close is finished, so it really needs to be fixed.

        • Doesn’t gz.Close() block before zippers.Put(gz) is run?

    • Klaus Post

      Thanks for noticing! I have fixed the sample.

  • Dmitriy Domashevskiy

    For pre-compressing your static content i would recommend https://github.com/google/zopfli

    • Klaus Post

      Yes, that is very good. And is almost consistently the smallest.

      There are also some older, and usually less efficient, tools: “AdvanceCOMP Deflate Compression Utility” (advdef), based on the 7-zip library; “DeflOpt”, rather good, but no source available AFAICT; “zRecompress”, also based on 7-zip AFAICT.

    • Klaus Post

      Very nice, although it is fixed at “BestCompression”. It would be nice if that value was exposed as a variable, so it could be changed at initialization.

  • Michael Cross

    Klaus, I’m a bit late to the party but thank you for this blog post. I’d not properly seen the utility of sync.Pool before this.

    I went looking for an “on the fly” gzip http.Handler for my own web sites on Godoc but I found the existing solutions, while mostly working, all had one or more of the following deficiencies:

    – Did not use sync.Pool as per your recommendation
    – Did not properly parse the Accept-Encoding header as per RFC 2616
    – Did not handle the Range header correctly (so corrupted resuming partial downloads)
    – Compressed 0 byte bodies to > 0 bytes
    – Compressed already compressed data
    – Did not correctly set Content-Type if the wrapped handler omitted doing so
    – Did not allow setting compression level
    – Did not correctly handle the wrapped handler setting Content-Encoding itself

    Hence I’ve just had a stab at making one myself that attempts to fix all of the above. It’s at https://xi2.org/x/httpgzip. Seeing as it was inspired by your blog post, if you want an AUTHOR credit let me know!

    • Klaus Post

      Really nice!

      Great to have a good solid implementation of a http compression handler.

      Is there any reason you are not using my gzip compressor? It should increase throughput by ~1.7 to 2 times.

      You are of course welcome to put in a note. If it is alright with you, I will also put your library in the main post.

      • Michael Cross

        Thanks Klaus. I have added your gzip implementation as a build tag (go install -tags klauspost). That makes it nice and easy to use if you want the speed but keeps the stdlib as the default for now.

        It was also a good idea to add your gzip package since it showed up a problem with my tests. They were testing response sizes to determine gzipped status, which can vary of course from implementation to implementation. Now fixed and done properly!

        I’ve credited you and linked back to this blog on the Godoc page for httpgzip. Feel free to link to my package in the main post if you feel it’s of interest.

        • Klaus Post

          I don’t personally like build tags, since it requires a specific build method for users of your library. Of course it is your library, so it is your choice.

          Another question, if you don’t mind. Is there a reason for choosing GPL? It will keep many from being able to use it, since they would be forced to re-license to GPL.