CDN cache hit ratio analysis
Content cached at the CDN reduces the latency experienced by website users, who do not need to wait for the request to make its way back to the Apache/dispatcher or AEM publish. With that in mind, it is worthwhile to optimize the CDN cache hit ratio to maximize the amount of content cacheable at the CDN.
Learn how to analyze the CDN logs provided by AEM as a Cloud Service and gain insights such as the cache hit ratio and the top URLs of the MISS and PASS cache types, for optimization purposes.
The CDN logs are available in JSON format, which contains various fields including url and cache. For more information, see the CDN Log Format. The cache field provides information about the state of the cache, and its possible values are HIT, MISS, or PASS. Let's review the details of the possible values.
Possible Value
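As a minimal sketch of how the url and cache fields can be read, the following Python snippet parses a single log entry. It assumes each log entry is one JSON object per line; the function name parse_cdn_log_line and the sample line are hypothetical and only for illustration.

```python
import json

# The three cache states named in this tutorial.
CACHE_STATES = {"HIT", "MISS", "PASS"}


def parse_cdn_log_line(line):
    """Return (url, cache) from one JSON log line, or None if the line is unusable."""
    line = line.strip()
    if not line:
        return None
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # not valid JSON
    if not isinstance(record, dict):
        return None
    cache = record.get("cache")
    if cache not in CACHE_STATES:
        return None  # unknown or missing cache state
    return record.get("url"), cache


# Hypothetical log line for illustration only; real entries carry more fields.
sample = '{"url": "/content/example.html", "cache": "MISS"}'
print(parse_cdn_log_line(sample))
```

Lines that are not valid JSON, or that carry a cache value other than HIT, MISS, or PASS, are skipped rather than raising an error, which keeps a later count from being skewed by malformed entries.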
For the purpose of this tutorial, an AEM project is deployed to the AEM as a Cloud Service environment and a small performance test is triggered against it.
This tutorial is structured to take you through the following process:
- Downloading CDN logs via Cloud Manager
- Analyzing those CDN logs, using either a locally installed dashboard or a remotely accessed Splunk or Jupyter Notebook (for those who license Adobe Experience Platform)
- Optimizing CDN cache configuration
Download CDN logs
To download the CDN logs, follow these steps:
- Log in to Cloud Manager and select your organization and program.
- For the desired AEMCS environment, select Download Logs from the ellipsis menu.
- In the Download Logs dialog, select the Publish Service from the drop-down menu, then click the download icon next to the CDN row.
If the downloaded log file is from today, its file extension is .log; for past log files, the extension is .log.gz.
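Because the extension differs between today's log and past logs, a script that reads the download may need to handle both. The following is a minimal sketch, assuming the .log.gz file is gzip-compressed and that each line holds one JSON object; the function name read_cdn_records is hypothetical.

```python
import gzip
import json
from pathlib import Path


def read_cdn_records(path):
    """Yield one dict per JSON line from a .log or .log.gz CDN log file."""
    path = Path(path)
    # Assumption: files ending in .gz are gzip-compressed, others are plain text.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than abort
            if isinstance(record, dict):
                yield record
```

The generator reads the file line by line, so a large log does not have to fit in memory at once.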
Analyze downloaded CDN logs
To gain insights such as cache hit ratio, and top URLs of MISS and PASS cache types, analyze the downloaded CDN log file. These insights help to optimize the CDN cache configuration and enhance the site performance.
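To make these insights concrete, here is a minimal Python sketch that derives them from already-parsed log records. It assumes the cache hit ratio is the number of HIT requests divided by all HIT, MISS, and PASS requests; the function name summarize_cache and the sample records are hypothetical.

```python
from collections import Counter


def summarize_cache(records, top_n=5):
    """Count cache states and collect the most frequent MISS and PASS URLs."""
    counts = Counter()
    top_urls = {"MISS": Counter(), "PASS": Counter()}
    for record in records:
        cache = record.get("cache")
        if cache not in ("HIT", "MISS", "PASS"):
            continue  # ignore records without a recognized cache state
        counts[cache] += 1
        if cache in top_urls:
            top_urls[cache][record.get("url")] += 1
    total = sum(counts.values())
    # Assumption: hit ratio = HIT requests / all counted requests.
    hit_ratio = counts["HIT"] / total if total else 0.0
    return {
        "counts": dict(counts),
        "hit_ratio": hit_ratio,
        "top_miss": top_urls["MISS"].most_common(top_n),
        "top_pass": top_urls["PASS"].most_common(top_n),
    }


# Hypothetical records for illustration only.
records = [
    {"url": "/a.html", "cache": "HIT"},
    {"url": "/a.html", "cache": "HIT"},
    {"url": "/b.html", "cache": "MISS"},
    {"url": "/b.html", "cache": "MISS"},
    {"url": "/api/c", "cache": "PASS"},
]
summary = summarize_cache(records)
print(f"hit ratio: {summary['hit_ratio']:.0%}")
print("top MISS:", summary["top_miss"])
print("top PASS:", summary["top_pass"])
```

The URLs that appear at the top of the MISS and PASS lists are the natural first candidates when reviewing the cache configuration.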
To analyze the CDN logs, this tutorial presents three options:
- Elasticsearch, Logstash, and Kibana (ELK): The ELK stack can be installed locally.
- Splunk: It requires access to Splunk and AEMCS log forwarding enabled to ingest the CDN logs.
- Jupyter Notebook: It can be accessed remotely as part of Adobe Experience Platform without installing additional software, for customers who have licensed Adobe Experience Platform.
Option 1: Using ELK dashboard tooling
The ELK stack is a set of tools that provide a scalable solution to search, analyze, and visualize data. It consists of Elasticsearch, Logstash, and Kibana.
To identify the key details, let's use a project that provides a Docker container of the ELK stack and a pre-configured Kibana dashboard to analyze the CDN logs.
- Follow the project's setup steps and make sure to import the CDN Cache Hit Ratio Kibana dashboard.
- To identify the CDN cache hit ratio and top URLs, follow these steps:
  - Copy the downloaded CDN log file(s) into the environment-specific logs folder, for example, ELK/logs/stage.
  - Open the CDN Cache Hit Ratio dashboard by clicking the top-left corner Navigation Menu > Analytics > Dashboard > CDN Cache Hit Ratio.
  - Select the desired time range from the top-right corner.
  - The CDN Cache Hit Ratio dashboard is self-explanatory.
  - The Total Request Analysis section displays the following details:
    - Cache ratios by cache type
    - Cache counts by cache type
  - The Analysis by Request or Mime Types section displays the following details:
    - Cache ratios by cache type
    - Cache counts by cache type
    - Top MISS and PASS URLs
Filtering by environment name or program ID
To filter the ingested logs by environment name, follow these steps:
- In the CDN Cache Hit Ratio dashboard, click the Add Filter icon.
- In the Add filter modal, select the aem_env_name.keyword field from the drop-down menu, select the is operator, enter the desired environment name in the next field, and finally click Add filter.
Filtering by hostname
To filter the ingested logs by hostname, follow these steps:
- In the CDN Cache Hit Ratio dashboard, click the Add Filter icon.
- In the Add filter modal, select the host.keyword field from the drop-down menu, select the is operator, enter the desired hostname in the next field, and finally click Add filter.
Likewise, add more filters to the dashboard based on the analysis requirements.
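For readers who want to apply the same kind of filter to a downloaded log file outside Kibana, the following Python sketch shows an exact-match filter on one field. It assumes the raw log records expose a host field corresponding to the host.keyword field used in the dashboard; the function name filter_by_field, the sample records, and the hostnames are hypothetical.

```python
def filter_by_field(records, field, value):
    """Yield the records whose `field` equals `value` exactly."""
    for record in records:
        if record.get(field) == value:
            yield record


# Hypothetical records and hostnames, for illustration only.
records = [
    {"host": "www.example.com", "url": "/a.html", "cache": "HIT"},
    {"host": "shop.example.com", "url": "/b.html", "cache": "MISS"},
    {"host": "www.example.com", "url": "/c.html", "cache": "PASS"},
]
www_only = list(filter_by_field(records, "host", "www.example.com"))
print([record["url"] for record in www_only])
```

The exact-match comparison mirrors the is operator in the Add filter modal; records that lack the field are simply excluded.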
Option 2: Using Splunk dashboard tooling
Splunk is a popular log analysis tool that helps aggregate and analyze logs and create visualizations for monitoring and troubleshooting purposes.
To identify the key details, let's use a project that provides a Splunk dashboard to analyze the CDN logs.
- Follow the project's setup steps and make sure to import the CDN Cache Hit Ratio Splunk dashboard.
- If needed, update the Index, Source Type, and other filter values in the Splunk dashboard.
Option 3: Using Jupyter Notebook
For those who would rather not install software locally (that is, the ELK dashboard tooling from Option 1), there is another option, but it requires a license to Adobe Experience Platform.
Jupyter Notebook is an open-source web application that lets you create documents that contain code, text, and visualizations. It is used for data transformation, visualization, and statistical modeling. It can be accessed remotely as part of Adobe Experience Platform.
Downloading the Interactive Python Notebook file
First, download the AEM-as-a-CloudService - CDN Logs Analysis - Jupyter Notebook file, which helps with the CDN logs analysis. This "Interactive Python Notebook" file is self-explanatory; however, the key highlights of each section are:
- Install additional libraries: installs the termcolor and tabulate Python libraries.
- Load CDN logs: loads the CDN log file using the log_file variable value; make sure to update its value. It also transforms the CDN log for analysis.
- Perform analysis: the first code block is Display Analysis Result for Total, HTML, JS/CSS, and Image Requests; it provides the cache hit ratio percentage, bar, and pie charts. The second code block is Top 5 MISS and PASS Request URLs for HTML, JS/CSS, and Image; it displays URLs and their counts in table format.
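The per-type breakdown described above can be approximated with a short Python sketch. This is not the notebook's own code: it groups requests by URL file extension as a stand-in for request or mime type, and the extension mapping, function names, and sample records are all hypothetical.

```python
import posixpath
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Hypothetical mapping from URL extension to a request group.
TYPE_BY_EXTENSION = {
    ".html": "HTML",
    ".js": "JS/CSS",
    ".css": "JS/CSS",
    ".png": "Image",
    ".jpg": "Image",
    ".jpeg": "Image",
    ".gif": "Image",
    ".webp": "Image",
    ".svg": "Image",
}


def request_type(url):
    """Classify a URL by the extension of its path, ignoring any query string."""
    path = urlsplit(url or "").path
    extension = posixpath.splitext(path)[1].lower()
    return TYPE_BY_EXTENSION.get(extension, "Other")


def hit_ratio_by_type(records):
    """Return {request type: HIT count / all counted requests of that type}."""
    counts = defaultdict(Counter)
    for record in records:
        cache = record.get("cache")
        if cache in ("HIT", "MISS", "PASS"):
            counts[request_type(record.get("url"))][cache] += 1
    return {kind: c["HIT"] / sum(c.values()) for kind, c in counts.items()}


# Hypothetical records for illustration only.
records = [
    {"url": "/a.html", "cache": "HIT"},
    {"url": "/a.html?x=1", "cache": "MISS"},
    {"url": "/site.css", "cache": "HIT"},
    {"url": "/img/logo.PNG", "cache": "PASS"},
]
print(hit_ratio_by_type(records))
```

Classifying by extension is a simplification; if the log records carry a content type field, grouping on that field would be more faithful to a mime-type analysis.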
Running the Jupyter Notebook
Next, run the Jupyter Notebook in Adobe Experience Platform by following these steps:
- Log in, then in the Home page > Quick access section, click Experience Platform.
- In the Adobe Experience Platform Home page > Data Science section, click the Notebooks menu item. To start the Jupyter Notebooks environment, click the JupyterLab tab.
- In the JupyterLab menu, using the Upload Files icon, upload the downloaded CDN log file and the aemcs_cdn_logs_analysis.ipynb file.
- Open the aemcs_cdn_logs_analysis.ipynb file by double-clicking it.
- In the Load CDN Log File section of the notebook, update the log_file value.
- To run the selected cell and advance, click the Play icon.
- After running the Display Analysis Result for Total, HTML, JS/CSS, and Image Requests code cell, the output displays the cache hit ratio percentage, bar, and pie charts.
- After running the Top 5 MISS and PASS Request URLs for HTML, JS/CSS, and Image code cell, the output displays the Top 5 MISS and PASS Request URLs.
You can enhance the Jupyter Notebook to analyze the CDN logs based on your requirements.
Optimizing CDN cache configuration
After analyzing the CDN logs, you can optimize the CDN cache configuration to improve the site performance. The AEM best practice is to have a cache hit ratio of 90% or higher.
For more information, see Optimize CDN Cache Configuration.
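As a small illustration of the 90% target, the following Python sketch checks whether a measured hit count meets it. The function name meets_target is hypothetical, and the sketch assumes the ratio is HIT requests divided by all requests; integer arithmetic is used so the comparison is not affected by floating-point rounding.

```python
TARGET_PERCENT = 90  # the 90% best-practice target mentioned above


def meets_target(hit_count, total_count, target_percent=TARGET_PERCENT):
    """Return True when hit_count / total_count is at least target_percent."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    # Compare as integers: hit_count / total_count >= target_percent / 100.
    return hit_count * 100 >= target_percent * total_count


# Hypothetical counts for illustration only.
print(meets_target(900, 1000))
print(meets_target(899, 1000))
```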
The AEM WKND project has a reference CDN configuration; for more information, see the wknd.vhost file.