How to optimize the Dispatcher cache?
This article offers detailed instructions on the different ways to optimize the Dispatcher cache. It also describes the steps to enable TTL ("Time to Live", or expiration) style invalidations, disable Dispatcher flush agents, and enable re-fetching Dispatcher flush, among others.
Description
Environment
Adobe Experience Manager
Issues/Symptoms
This article focuses on the latest optimizations in the AEM Dispatcher and how best to leverage them. The AEM Dispatcher is a caching server designed for use with Adobe Experience Manager. It can be installed and run as a module within existing web server software. At the time of writing, the Dispatcher module is supported on Apache HTTP Server, Microsoft IIS, and iPlanet.
Resolution
How does Dispatcher caching work?
At the most basic level, the AEM Dispatcher is a reverse proxy that performs caching, cache flushing, and cache invalidation.
See the related links for more details on the Dispatcher:
- How the Dispatcher works and how to install it.
- Configuration options available in the Dispatcher.
- Note that some information in the presentation is based on old versions of the Dispatcher.
- Gems webinar session on Dispatcher features, CDN usage and security.
- Gems session on newer features in Dispatcher (after v4.1.9).
Optimizing the Dispatcher cache
Here are some ways to optimize the Dispatcher cache:
-
Cache almost everything - Cache any content that users would request more than once.
-
Cache personalized content for different periods of time - If your site has personalized content, consider using Apache Sling Dynamic Includes in your AEM application. It can leverage Ajax (Asynchronous JavaScript and XML calls at the browser level), SSI (Server Side Includes at the web server level), and ESI (Edge Side Includes at the CDN level) to cache different parts of the page for different periods of time.
-
Never delete the Dispatcher cache on a live Dispatcher - If a Dispatcher is serving live content and you delete the cache, a massive flood of requests goes back to AEM. For this reason, never delete the cache on a live Dispatcher.
-
Prime the cache - When the cache must be deleted, first pull the Dispatcher off your load balancer, delete the cache, and then run a web crawler tool to cache files on the Dispatcher before putting it back on the load balancer.
-
Cache error pages - Use the relevant directive (Apache Web Server specific) to serve error pages such as 404s from the Dispatcher cache.
-
GZip compress all file types except for those that are pre-compressed - In Apache Web Server, an output filter such as the one in the example below could be used, but make sure that the Vary: User-Agent header isn't set. In Microsoft IIS, use the server's built-in compression settings.
Apache configuration example (specifying only certain content types to avoid precompressed file types):
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css text/javascript application/javascript
-
Enable serving of stale content on errors in the /cache configuration - Serve the old cache file when AEM instances are returning errors.
-
Add a grace period to the /cache configuration - Define the number of seconds a stale, auto-invalidated resource may still be served from the cache after the last content publish event ("activation"). This reduces the number of requests that go back to the publish instances during a large content publishing activity such as a "Tree Activation".
-
Add rules to ignore URL parameters - Ignore query string parameters that are not required or used by the application. This allows caching of URLs even when a query string is present.
-
Cache the Cache-Control and Last-Modified response headers - Use the Dispatcher's header caching configuration to cache the HTTP response headers Cache-Control and Last-Modified (and/or the ETag header if you are sending it from AEM). This simplifies and optimizes caching at the CDN and browser levels. Caching these headers ensures that only AEM sets them, not the web server itself. Note that once you do this, you need to start sending the headers from your AEM application.
-
Cache content for as long as possible and reduce requests that go back to AEM - Optimize flush requests by enabling re-fetching flush on all flush agents. See the section below titled Re-fetching Dispatcher Flush. Alternatively, use /enableTTL and set the Cache-Control: max-age=… header to cache files for as long as possible. See below for details on this topic.
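The "Prime the cache" item above mentions running a web crawler against a Dispatcher that has been taken off the load balancer. What follows is only a minimal sketch of that idea in Python, assuming the list of paths to warm is known in advance. The function names, the host name in the usage note, and the paths are hypothetical and are not part of any AEM or Dispatcher tooling.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url: str) -> int:
    """Request one URL and return its HTTP status code (0 if unreachable)."""
    try:
        with urlopen(url, timeout=30) as response:
            response.read()  # read the body so the whole response is produced
            return response.status
    except HTTPError as err:
        return err.code  # 4xx/5xx answers still tell us what happened
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def prime_cache(
    base_url: str,
    paths: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, int]:
    """Request every path once and return a mapping of path -> status code.

    base_url should point directly at the Dispatcher being primed,
    not at the load balancer (assumption of this sketch).
    """
    results: Dict[str, int] = {}
    for path in paths:
        url = base_url.rstrip("/") + "/" + path.lstrip("/")
        results[path] = fetch(url)
    return results
```

A call such as prime_cache("http://dispatcher1.example.com", ["/content/site/en.html"]) would request each path once, and any path whose status is not 200 is worth checking before the Dispatcher goes back on the load balancer.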
Using TTLs
As of Dispatcher version 4.1.11, /enableTTL 1 can be set in any farm configuration. This setting makes the Dispatcher respect cache expirations set in the HTTP Cache-Control response header. In other words, the Dispatcher functions similarly to a CDN, where the primary form of cache invalidation is file expiration. Once you implement this and start sending Cache-Control: max-age=… for all responses from AEM, you can safely disable your Dispatcher flush agents on the publish instances.
After disabling flush agents on the publish instances, you may still want to be able to flush the Dispatcher cache. In that case, you can use a cache flush tool installed on the author instance. It gives users a UI where they can perform manual cache flush requests.
I. Steps to enable TTL ("Time to Live" or expiration) style invalidations:
- Modify source code in the AEM application to send the Cache-Control and Last-Modified headers for all responses where they are not already set.
- Install Dispatcher 4.1.11 or later.
- Set /enableTTL 1 in any farm configuration of the site.
- Configure header caching so that the Cache-Control and Last-Modified headers are cached.
- Restart the web server.
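Because TTL-style invalidation depends on AEM actually sending a usable Cache-Control: max-age value (step 1 above), it can help to check sample response headers before disabling any flush agents. The following Python sketch parses a Cache-Control header value and models expiry. It is a simplified illustration only, not how the Dispatcher implements TTLs internally, and the function names are hypothetical.

```python
from typing import Optional


def max_age_seconds(cache_control: Optional[str]) -> Optional[int]:
    """Return the max-age value in seconds, or None if absent or malformed."""
    if not cache_control:
        return None
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            try:
                seconds = int(value.strip().strip('"'))
            except ValueError:
                return None
            return seconds if seconds >= 0 else None
    return None


def is_fresh(stored_at: float, max_age: int, now: float) -> bool:
    """Simplified model: a file is fresh until max_age seconds have passed."""
    return now - stored_at < max_age
```

For example, max_age_seconds("public, max-age=300") yields 300, while a header without max-age yields None, which would indicate a response that TTL-based expiry cannot use as-is.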
II. Disable Dispatcher flush agents on the publish instances:
The Dispatcher now uses the Cache-Control header to control invalidation of the cache files, so Dispatcher flushing from the publish instances is no longer required.
- Go to /etc/replication/agents.publish.html on each publish instance.
- Go to each flush agent's configuration and disable the agent.
III. Allow manual Dispatcher flush requests from the author instance:
Now that flush agents are disabled, you rely entirely on the Cache-Control header to control when content is refreshed on the Dispatcher. You can still allow users to issue manual flushes of the Dispatcher cache:
- Install the cache flush tool on the author instance.
- Configure flush agents on the author instance.
- In each of the agent configurations, enable the Triggers > Ignore Default option. This option makes the flush agents ignore when users click (Un)Publish or (De)Activate in the AEM UI.
Re-fetching Dispatcher Flush
To optimize the Dispatcher flush requests, all Dispatcher flush agents should have a feature called refetching flush enabled.
To enable re-fetching the dispatcher flush, do the following:
-
Go to http://aemhost:port/crx/packmgr/index.jsp and log in as admin.
-
Download the re-fetching flush package.
-
Upload and install the package to package manager.
-
Go to your Dispatcher flush agent configuration, for example /etc/replication/agents.author/flush.html
-
Click Edit
-
Set the following:
- Serialization Type = Re-fetch Dispatcher Flush
- Extended > HTTP Method = POST
-
Click Save
Note - The package installed above is just a basic example. To customize and optimize re-fetching flush, you can modify the list of URIs that it sends. The code is open source. It adds a list of URIs to the request body as parameters, telling the Dispatcher which paths to re-fetch. You can add more paths per your application requirements to optimize your site's caching capabilities.
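To make the note above more concrete, here is a hedged Python sketch of what a re-fetching flush request could look like when built by hand. It assumes the flush endpoint /dispatcher/invalidate.cache and the CQ-Action and CQ-Handle headers as described in the Dispatcher's documented flush request format, with the URIs to re-fetch listed one per line in a text/plain body. Verify these against the documentation for your Dispatcher version. The host name and content paths are hypothetical, and the function only builds the request; it does not send it.

```python
from typing import Sequence
from urllib.request import Request


def build_refetch_flush(
    dispatcher_base: str, handle: str, refetch_uris: Sequence[str]
) -> Request:
    """Build (but do not send) a flush request that lists URIs to re-fetch."""
    # One URI per line in the request body (assumed format, see lead-in).
    body = "\n".join(refetch_uris).encode("utf-8")
    return Request(
        url=dispatcher_base.rstrip("/") + "/dispatcher/invalidate.cache",
        data=body,
        method="POST",
        headers={
            "CQ-Action": "Activate",   # flush triggered by an activation
            "CQ-Handle": handle,       # path of the content that changed
            "Content-Type": "text/plain",
            "Content-Length": str(len(body)),
        },
    )


request = build_refetch_flush(
    "http://dispatcher1.example.com",          # hypothetical host
    "/content/foo",                            # hypothetical content path
    ["/content/foo.html", "/content/foo.json"],
)
```

Passing the resulting object to urllib.request.urlopen would send it. Whether the Dispatcher accepts it also depends on its own configuration of which clients may issue flush requests.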
Detailed explanation of re-fetching flush
Normally a Dispatcher flush works by deleting files:
- Touch .stat file(s)
- Delete /content/foo.*
- Delete /content/foo/_jcr_content
Because files are deleted in the second step (Delete /content/foo.*), the next user request for a file such as /content/foo.html or /content/foo.json goes to the publish instances. While that file is being fetched again, subsequent requests for the same file are also sent to the publish instances until the file is cached. For slow responses or high-traffic pages such as home pages, this can flood the publish instance tier.
To solve this issue, enable a feature of the Dispatcher called re-fetching. This feature allows you to send a list of URIs that the Dispatcher should proactively "re-fetch" and replace instead of deleting.
See 22:41-27:05 in the recording for a demo of how it works and how to configure it.