When tuning a web site running on Apache for optimal performance, there are two places to look. The first is picking the right combination of settings for the multi-processing module (MPM) you have selected, and the second is tweaking the various settings that affect the caching and compression of web application content.
Optimising Apache Multi-Processing Module Settings
The first strategy involves deciding whether to use the prefork or worker MPM and tweaking the settings for the minimum, maximum and spare number of threads or processes. The optimal values for these directives depend on the server's workload profile, such as a large number of short-lived requests (common with Web 2.0 sites) or a smaller number of long-running requests, and are directly related to other settings such as KeepAliveTimeout.
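As a rough illustration of the directives involved, here is a sketch of what the two MPM blocks might look like. The numbers are placeholders rather than recommendations, and the directive names assume Apache 2.4 (MaxRequestWorkers was called MaxClients in 2.2).

```apache
# Prefork MPM: one single-threaded process per connection.
# All values below are illustrative placeholders.
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxRequestWorkers   150
</IfModule>

# Worker MPM: several processes, each running multiple threads.
<IfModule mpm_worker_module>
    StartServers          2
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      25
    MaxRequestWorkers   150
</IfModule>

# Idle keep-alive connections tie up a process or thread for this many
# seconds, so this value interacts with the limits above.
KeepAlive On
KeepAliveTimeout 5
```

Only the block matching the MPM that is actually loaded takes effect, so both can live in the same file.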
This brief post focuses on the second strategy for tweaking your web application's performance on Apache, namely the settings affecting the caching of your web content.
Tweaking Cache Control Settings on Apache
HTTP/1.1 has a rich mechanism for controlling caching that can greatly improve the performance of your application and lessen the load on your server. The essential problem is letting web browser caches and proxy caches know what can, and what can't, be cached. The more that can be cached, and therefore served to the client without contacting the web server, the less your server needs to do and the quicker users get a response to their requests, so everyone wins!
The trick, of course, is determining what to cache, since changes made to content that has been cached will not be seen by users who are served the cached copy. The new content will only become visible once their cached copy expires. Sorting out what can, and what can't, be cached is not as difficult as it sounds. Most web applications and frameworks concentrate on the cache settings that affect dynamically created pages, i.e. the HTML that gets sent to browsers. It is up to the web developers to take responsibility for setting the appropriate headers in code when it comes to caching the text/html pages of their applications.
HTTP/1.1 Headers Affecting Caching
HTTP/1.1 aims to reduce unnecessary round trips by letting the cache know if content can be cached and how subsequent requests for the cached content should be handled.
There are two approaches to returning cached content. The cache can make the decision alone by examining the expiration date of the asset and returning it if it is still "fresh", or it can revalidate the object with the origin server before sending it to the requesting client. Revalidating is much faster than requesting the whole object again, as the cache sends a small conditional request (a simple "ping") to determine whether the content has changed. When content has expired or is considered stale, validation can be used as a first measure to determine whether the content needs to be fetched again.
The headers affecting caching of web content are:
- ETag - An E(ntity)Tag is an identifier for a specific version of a resource served up by the web server. Every response with an ETag header allows the cache to store the content and to "ping" the web server on any subsequent request for the resource (using an If-None-Match request header) to determine whether its cached content is still valid. If it is, the server responds with a 304 Not Modified status, which confirms the asset can be served from the cache. (ETags are like version numbers for URLs.) This is much more efficient than sending the content down the wire again.
- Expires - The Expires header informs the cache when the content should be considered stale and refreshed from the origin server. When a request for a resource with an Expires header is received, the cache can respond without contacting the web server at all if the resource has not expired. This eliminates the "ping" that is needed for the ETag header above, but it also means that if your asset changes before the expiration date, users will see the old content until that date passes. In practice this header is frequently given a date far in the past, either to prevent the asset from being cached or to force the cache to revalidate before returning cached content, rather than to indicate how long it can be cached.
- Cache-Control - This header was introduced in HTTP/1.1 and has several directives that affect caching. They are:
  - max-age - like the Expires header, informs the cache how long the object is valid for. It is specified as a number of seconds, whereas the Expires header is expressed as a date. max-age overrides Expires if there is a conflict.
  - public - the response can be cached by a proxy server and served to any browser requesting the resource, i.e. public pages.
  - private - allows pages to be cached for a specific user. Think of your Facebook profile page: only you should see it. This directive allows it to be cached, but only where it is accessible to that specific user, i.e. in your browser cache.
  - no-cache - oddly, means the resource can be cached but must be revalidated with the origin server before being served up. The effect is like the ETag check described above.
  - no-store - do not cache this resource at all.
  - must-revalidate - the content can be cached but Expires/max-age must be respected. In some cases caches are allowed to serve stale content, and this tells the cache never to do that.
  - others - the complete list can be found at the W3.org web site.
- Last-Modified - This header indicates the date the resource was last modified, or at least when the server thinks it was last modified. The cache can use this date to ask the origin server (with an If-Modified-Since request header) whether the resource has changed since then. If it has not, the origin server responds with a 304 status, as with ETag, and the resource can be served from the cache. ETags are newer, and often both are specified in the headers. ETags are usually generated from a combination of the file's modification time stamp, inode number and so on, which means the same resource will have a different ETag on each machine when it is served from a load-balanced server farm. That makes such ETags pretty useless for caching in that scenario, and Last-Modified should be used instead. ETags are safe for single-server and shared hosting setups.
Apache Directives Affecting HTTP Headers
It is clear from the above that there are multiple ways of obtaining the caching you desire. Either you want the cache to serve up content until it expires, or you want it to check with you first before serving it up in case it has changed. Some of the duplication among the headers is due to Cache-Control being introduced in HTTP/1.1 to replace some of the older headers; both co-exist today.
As with most Apache directives the following can be "filtered" with regular expressions or by placing them in restricted containers to affect only selected resources. The example we give here can be applied at a server, host or directory level.
The modules that affect cache settings in Apache are:
- Expires (mod_expires) - This module sets the Expires header on the content selected for the setting. It is kind enough to also set the Cache-Control max-age directive, so there should never be any conflict between the two. The module can be applied only to selected MIME types or to all content with exceptions: basically a conservative or an optimistic approach to the Expires header. Being too optimistic can be bad if you end up caching content that shouldn't be. The two most used directives from this module are (see the mod_expires documentation for more info):
  - ExpiresDefault "access plus 1 month"
  - ExpiresByType image/png "access plus 1 month 15 days 2 hours", or ExpiresByType image/jpeg M604800 (modification time plus 604800 seconds, i.e. one week)
- Headers (mod_headers) - This module can be used to set the remaining Cache-Control directives, optionally using environment variables to control which content it applies to, e.g. (see the mod_headers documentation for more info):
  - Header merge Cache-Control no-cache
- Core (FileETag) - This directive controls the ETag header for static files. Its options select which file attributes go into generating the ETag identifier. Usually these involve the inode and modification date, but you can choose a combination (for example modification time and size only) that identifies the item consistently, even across server farms, e.g. (see the core documentation for more info):
  - FileETag All
Apache Cache Settings Example
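The following is a minimal sketch of how the directives described above might be combined, not a definitive configuration. It assumes mod_expires, mod_headers and mod_setenvif are loaded; the lifetimes, the MIME types, the /reports/ path and the volatile_content variable name are all illustrative assumptions.

```apache
# Sketch only: lifetimes, MIME types and the /reports/ path are assumptions.

<IfModule mod_expires.c>
    # mod_expires does nothing until it is switched on.
    ExpiresActive On
    # Conservative approach: set lifetimes only for types known to change rarely.
    ExpiresByType image/png  "access plus 1 month"
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType text/css   "access plus 1 week"
</IfModule>

<IfModule mod_headers.c>
    # Allow shared proxy caches, not just browsers, to store static assets.
    <FilesMatch "\.(png|jpe?g|css)$">
        Header merge Cache-Control "public"
    </FilesMatch>

    # Hypothetical: force revalidation for everything under /reports/.
    # SetEnvIf comes from mod_setenvif and flags the matching requests.
    SetEnvIf Request_URI "^/reports/" volatile_content
    Header merge Cache-Control "no-cache" env=volatile_content
</IfModule>

# Derive ETags from modification time and size only (no inode), so that
# servers in a farm agree on the ETag, provided the file has the same
# modification time on each of them.
FileETag MTime Size
```

Leaving out ExpiresDefault keeps this on the conservative side: anything not listed, including dynamically generated HTML, keeps whatever caching headers the application itself sets.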
Faster Web Apps, Happier Users and Fewer Resources Used!
Well, those are the basics of setting your HTTP headers for cache control. You can test the result of your efforts using Firebug in Firefox or by pressing Ctrl-Shift-J in Chrome. Have a look at the headers sent with images before and after the addition of caching directives. Check the response times before and after too!