Optimising Website Performance on Apache with Cache Control Settings

When tuning a website running on Apache for optimal performance, there are two places to look. The first is picking the right combination of settings for the Multi-Processing Module (MPM) you have selected; the second is tweaking the various settings that affect the caching and compression of web application content.

Optimising Apache Multi-Processing Module Settings

The first strategy involves deciding whether to use the prefork or worker MPM and tweaking the settings for the minimum, maximum and spare number of threads/processes. The optimal values for these directives depend on the server/application workload profile, such as a large number of short-lived requests (common with Web 2.0 sites) or a smaller number of long-running requests, and they interact directly with other settings such as KeepAliveTimeout.
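
As a rough sketch of the directives involved, the fragment below shows the process/thread settings for each MPM alongside the keep-alive settings they interact with. The numbers are illustrative starting points only, not recommendations, and the directive names are the older ones (MaxClients, MaxRequestsPerChild), which newer Apache releases still accept under renamed equivalents.

```apache
# Illustrative values only; tune against your own traffic and memory budget.

# prefork: one process per connection
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild 1000
</IfModule>

# worker: several threads per process
<IfModule mpm_worker_module>
    StartServers          2
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      25
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>

# Keep-alive settings: a long timeout ties up a process/thread per idle client
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5
```

Only the block matching the MPM that is actually loaded takes effect, so both can live in the same file.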

This brief post will focus on the second strategy for tweaking your web application performance on Apache, namely the settings affecting the caching of your web content.

Tweaking Cache Control Settings on Apache

HTTP/1.1 has a rich mechanism for controlling caching that can greatly improve the performance of your application and lessen the load on your server. The essential problem is letting web browser caches and proxy caches know what can, and what can't, be cached. The more that can be cached, and therefore served to the client without contacting the web server for the content, the less your server needs to do and the quicker the user gets a response to their requests, so everyone wins!

The trick, of course, is determining what to cache, since changes made to content that has been cached will not be seen by users who are served the cached copy. The new content will only be visible once their cached copy has expired. Sorting out what can, and can't, be cached is not as difficult as it sounds. Most web applications and frameworks concentrate on the cache settings that affect the dynamic creation of pages, i.e. the HTML that gets sent to browsers. It is up to the web developers to take responsibility for setting the appropriate headers in code when it comes to the caching of the text/html pages of their applications.

The other assets used on the page, such as images, CSS and JavaScript files, usually have no programmatic way of having their caching-related HTTP/1.1 headers set, and are usually not optimised by web designers and application developers. Luckily, Apache allows the web server administrator to set reasonable caching headers for these objects. These settings can be applied at the server, virtual host, application or even directory level, so you can tweak to your heart's content :).
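
To make the idea of levels concrete, here is a minimal sketch assuming mod_expires is loaded. The host name and paths are hypothetical; the point is only that a server-wide default can be overridden for one directory inside one virtual host.

```apache
# Server-wide default (main server configuration)
ExpiresActive On
ExpiresDefault "access plus 5 minutes"

<VirtualHost *:80>
    # Hypothetical host and paths, for illustration only
    ServerName www.example.com
    DocumentRoot "/var/www/example"

    # Longer lifetime for this host's static assets directory only
    <Directory "/var/www/example/static">
        ExpiresDefault "access plus 1 week"
    </Directory>
</VirtualHost>
```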

HTTP/1.1 Headers Affecting Caching

HTTP/1.1 aims to reduce unnecessary round trips by letting the cache know whether content can be cached and how subsequent requests for the cached content should be handled.

There are two approaches to returning cached content. The cache can make the decision alone by examining the expiration date of the asset and returning the content if it is "fresh", or it can revalidate the object with the origin server before sending it to the requesting client. Revalidating is much faster than requesting the whole object again, as the cache performs a simple "ping" (a conditional request) to determine whether the content has changed. When content has expired or is considered stale, validation can be used as a first measure to determine whether the content needs to be fetched again.

The headers affecting caching of web content are:

  • ETag - An E(ntity) Tag is an identifier for a specific version of a resource served up by the web server. Every response with an ETag header allows the cache to store the content and to "ping" the web server on any subsequent request for the resource to determine if its cached content is still valid. If it is, the server responds with a 304 Not Modified status, which confirms the asset can be served from the cache; if not, the server sends the new content. (ETags are like version numbers for URLs.) A 304 response is much more efficient than sending the content down the wire again.
  • Expires - The Expires header informs the cache when the content should be considered stale and refreshed from the origin server. When a request for a resource with an Expires header is received, the cache can respond without contacting the web server at all if the resource has not expired. This eliminates the "ping" that is needed for the ETag header above, but it also means that if your asset changes before the expiration date, users will see the old content until that date passes. Frequently this header is given a date far in the past, not to indicate how long the content can be cached, but to prevent the asset from being cached or to force the cache to revalidate before returning cached content.
  • Cache-Control - This header was introduced in HTTP/1.1 and has several directives that affect caching. They are:
    • max-age - which, like the Expires header, informs the cache how long the object is valid for. This is specified as a period of time in seconds, while the Expires header is expressed as a date. max-age overrides Expires if there is a conflict.
    • public - means the response can be cached by a proxy server and served to any browser requesting the resource, i.e. public pages.
    • private - allows for the caching of pages for a specific user. Think of your Facebook profile page. Only you should see it. This directive allows it to be cached but only be accessible to the specific user, i.e. in your browser cache.
    • no-cache - oddly, means the resource can be cached but must be revalidated with the origin server before being served up, typically using a validator such as ETag or Last-Modified.
    • no-store - Do not cache this resource at all.
    • must-revalidate - The content can be cached, but once it is stale according to Expires/max-age it must be revalidated before being served. In some cases caches are allowed to serve stale content, and this tells the cache never to do that.
    • others - the complete list can be found at the W3.org web site.
  • Last-Modified - This header indicates the last date the resource was modified, or at least when the server thinks it was last modified. The cache can use this to ask the origin server whether the resource has changed since this date (an If-Modified-Since request). If it has not, the origin server responds with a 304 status, as with ETag, and the resource can be served from the cache. ETags are newer, and often both are specified in headers. ETags are usually generated from a combination of the file's modification timestamp, inode number and so on, which means that the same resource will have a different ETag on each machine when it is served from a server farm in a load-balanced scenario. That makes such ETags pretty useless for caching content served this way, and Last-Modified should be used instead in this case. ETags are safe for shared hosting/single-server setups.
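
As a hedged illustration of how the Cache-Control directives above combine, the sketch below sets them explicitly with mod_headers. The URL paths are hypothetical and the lifetimes are arbitrary; it is one way of expressing the public/private/no-store distinction, not a recommended policy.

```apache
# Requires mod_headers. Paths are hypothetical.

# Shared assets: any cache, including proxies, may keep them for one hour
<Location "/static">
    Header set Cache-Control "public, max-age=3600"
</Location>

# Per-user pages: only the user's own browser cache may keep them,
# and it must revalidate before every reuse
<Location "/account">
    Header set Cache-Control "private, max-age=0, must-revalidate"
</Location>

# Sensitive responses: no cache may store them at all
<Location "/checkout">
    Header set Cache-Control "no-store"
</Location>
```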

Apache Directives Affecting HTTP Headers

It is clear from the above that there are multiple ways of obtaining the caching you desire. Either you want the cache to serve up content until it expires, or you want it to check with you first before serving it up, just in case it has changed. Some of the duplication among the header settings is due to Cache-Control being introduced in HTTP/1.1, which was supposed to replace some of the older headers; however, both co-exist today.

As with most Apache directives the following can be "filtered" with regular expressions or by placing them in restricted containers to affect only selected resources. The example we give here can be applied at a server, host or directory level.
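
For instance, a regular-expression container can restrict caching headers to stylesheets and scripts only. This is a minimal sketch assuming mod_expires and mod_headers are both loaded; the one-week lifetime is an arbitrary illustrative value.

```apache
# Apply only to files whose names end in .css or .js
<FilesMatch "\.(css|js)$">
    ExpiresActive On
    ExpiresDefault "access plus 1 week"
    Header merge Cache-Control "public"
</FilesMatch>
```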

The modules that affect cache settings in Apache are:

  • Expires (mod_expires) - This module sets the Expires header on content selected for the setting. It is kind enough to also set the max-age directive of Cache-Control, so there should never be any conflict between the two. The module can be applied either to selected MIME types only or to all content with exceptions: basically a conservative or an optimistic approach to the Expires header. Being too optimistic can be bad if you end up caching content that shouldn't be cached. The two most used directives from this module are (see the mod_expires documentation for more info):
    • ExpiresDefault "access plus 1 month"
    • ExpiresByType image/png "access plus 1 month 15 days 2 hours", or ExpiresByType image/jpeg M604800
  • Headers (mod_headers) - This module can be used to set the remaining Cache-Control directives, using environment variables to control which content they apply to, e.g. (see the mod_headers documentation for more info):
    • Header merge Cache-Control no-cache
  • Core (FileETag) - This directive controls how ETags are generated for static files. Its options let you select which file attributes go into the ETag identifier: the inode number, the modification time and the size. Leaving out the inode gives an identifier that stays the same for the same file across a server farm, e.g. (see the core documentation for more info):
    • FileETag All
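
For the load-balanced case mentioned earlier, a minimal sketch of the FileETag options looks like this. It assumes the same file has the same modification time and size on every server in the farm, which depends on how you deploy content.

```apache
# Build ETags from modification time and size only, leaving out the
# inode number, which differs from server to server.
FileETag MTime Size

# Alternative: switch file ETags off and rely on Last-Modified instead.
# FileETag None
```

Newer Apache releases already use MTime Size as the default, so this matters mostly on older installations or where FileETag All has been set explicitly.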

Apache Cache Settings Example

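# Build ETags from all file attributes (inode, modification time, size)
FileETag All

# Enable mod_expires; by default content expires 120 seconds after access
ExpiresActive On
ExpiresDefault A120
# Images may be cached for a day, Flash for ten days
# (image/jpeg covers both .jpg and .jpeg files)
ExpiresByType image/gif "access plus 1 day"
ExpiresByType image/jpeg "access plus 1 day"
ExpiresByType image/png "access plus 1 day"
ExpiresByType application/x-shockwave-flash "access plus 10 days"

# Flag image requests, then mark those responses as publicly cacheable
SetEnvIf Request_URI \.gif$ image-request
SetEnvIf Request_URI \.jpg$ image-request
SetEnvIf Request_URI \.jpeg$ image-request
SetEnvIf Request_URI \.png$ image-request
Header merge Cache-Control public env=image-request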
 
These directives can appear in the server, virtual host, directory or .htaccess context of your Apache configuration, allowing for fine-grained control when combined with additional scope restriction via environment variables or matching MIME types.
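
As one example of the .htaccess context, a sketch along the following lines could sit in an application's directory. It assumes the enclosing configuration permits these directives (AllowOverride Indexes FileInfo, or All), and the MIME types and lifetimes are illustrative; adjust them to what your server actually sends.

```apache
# .htaccess sketch: guarded so a missing module does not cause an error
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType text/css "access plus 1 week"
    ExpiresByType application/javascript "access plus 1 week"
</IfModule>

<IfModule mod_headers.c>
    # Let shared proxy caches store everything served from this directory
    Header merge Cache-Control "public"
</IfModule>
```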

Faster Web Apps, Happier Users and Fewer Resources Used!

Well, those are the basics of setting your HTTP headers for cache control. You can test the result of your efforts by using Firebug in Firefox or by pressing Ctrl-Shift-J in Chrome. Have a look at the headers sent with images before and after the addition of caching directives. Check the response time before and after too!
