
HTTP Caching

Free · 2017-09-10 · #HTTP #HTTP cache header #HTTP cache-control #browser default cache #HTTP caching policy

Why does the browser still send a request when the strong cache is hit?

I. Classification

By cache strength, caching is classified into:

  • Strong Cache: within the validity period, the resource is read directly from the local cache (disk cache or memory cache); after it expires, or on a forced refresh, a new copy is requested from the server

  • Negotiated Cache: within the validity period, same as above; after it expires, or on a forced refresh, the client sends its local version number to ask the server whether the resource has been updated, and receives either 304 (update cache metadata such as the expiration time and keep using the local copy) or 200 (cache the new version and discard the local one)

Negotiated cache can be further classified into:

  • Time-based: uses the resource's last modification time (Last-Modified) as the version number

  • Content-based: uses a hash of the resource's content (ETag) as the version number

Negotiation only happens after the cached copy becomes invalid (expired or discarded).

II. HTTP Headers

HTTP header fields are divided into 4 categories:

  • general-header: applies to both request and response messages

  • request-header: lets the client pass additional information about the request, and about itself, to the server; these fields act as request modifiers, much like parameters

  • response-header: lets the server pass additional information about the response to the client, including information about the server and information needed for further access to the resource

  • entity-header: provides metainformation about the message entity or, if there is no entity, about the resource identified by the request

P.S. For more information about HTTP headers, see RFC 2616, section 4.2 Message Headers

Pragma

An HTTP/1.0 general header field that specifies caching policy

Pragma           = "Pragma" ":" 1#pragma-directive
pragma-directive = "no-cache" | extension-pragma
extension-pragma = token [ "=" ( token | quoted-string ) ]

Pragma is an ambiguous field. The RFC only specifies that when Pragma: no-cache appears in a request, the request should go back to the origin server for fresh content even if a valid cached copy exists, which is equivalent to Cache-Control: no-cache. Its meaning in a response is not specified.

P.S. For more information about Pragma, see RFC 2616, section 14.32 Pragma

Expires

An HTTP/1.0 entity header field that gives the resource's expiration time and specifies the expiration policy

Expires = "Expires" ":" HTTP-date

An absolute point in time, given by the server, before which the cache is valid. If the client and server clocks are not synchronized, this expiration policy becomes unreliable.

There is no guarantee that the time given by Expires refers to the same moment on the client and the server. For this reason, HTTP/1.1 added Cache-Control: max-age=<seconds>, which gives the shelf life as a duration instead. The cache expires <seconds> seconds after the client receives the resource. This depends only on the client's clock, so the two clocks no longer need to agree.
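The following is a minimal sketch of the precedence described above. The hypothetical `freshness_lifetime` function takes a dict of lowercased response header names and returns how many seconds the response stays fresh. It assumes well-formed HTTP-dates. The fallback subtracts Date from Expires, as RFC 7234 does, so both timestamps come from the server's clock. A client that instead compares Expires directly against its own clock is the case that suffers from clock skew.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def freshness_lifetime(headers: Dict[str, str]) -> Optional[int]:
    """Seconds a response stays fresh; `headers` maps lowercased names to values (hypothetical shape)."""
    # HTTP/1.1: Cache-Control: max-age takes precedence over Expires.
    for directive in headers.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    # Fallback: Expires minus the server's own Date header, so both values
    # come from the server clock and client/server skew does not matter.
    if "expires" in headers and "date" in headers:
        expires = parsedate_to_datetime(headers["expires"])
        date = parsedate_to_datetime(headers["date"])
        return max(0, int((expires - date).total_seconds()))
    return None  # no explicit lifetime (heuristic caching may apply)
```

For example, with Date at 08:00 GMT and Expires at 09:00 GMT, the lifetime is 3600 seconds. Adding Cache-Control: max-age=600 shortens it to 600.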

Cache-Control

A general header field that specifies caching and expiration policies

Cache-Control   = "Cache-Control" ":" 1#cache-directive
cache-directive = cache-request-directive
    | cache-response-directive
cache-extension = token [ "=" ( token | quoted-string ) ]

9 directives (plus custom extensions) can appear in a response header:

cache-response-directive =
    ; Resource may be cached by both the client and proxy servers
    "public"
    ; Resource may be cached only by the client, not by proxy servers
    | "private" [ "=" <"> 1#field-name <"> ]
    ; Cached copy must not be reused without first revalidating with the origin server
    | "no-cache" [ "=" <"> 1#field-name <"> ]
    ; Resource must not be written to any cache
    | "no-store"
    ; Proxy servers must not modify the Content-Encoding, Content-Range, or Content-Type fields
    | "no-transform"
    ; Expired copies must not be used; once expired, the copy must be revalidated with the origin server (even if the client is willing to accept stale resources)
    | "must-revalidate"
    ; Same as must-revalidate, but applies only to proxy servers (shared caches)
    | "proxy-revalidate"
    ; Resource may be cached, but the cached copy expires after the given number of seconds
    | "max-age" "=" delta-seconds
    ; Applies only to proxy servers (shared caches), where it overrides max-age and Expires
    | "s-maxage" "=" delta-seconds
    ; Custom extension value
    | cache-extension

7 directives (plus custom extensions) can appear in a request header:

cache-request-directive =
    ; Force a fetch from the origin server; do not serve content from cache
    "no-cache"
    ; Neither the request nor its response may be written to any cache
    | "no-store"
    ; Client will accept a cached resource whose age (time cached on the proxy server) does not exceed delta-seconds
    | "max-age" "=" delta-seconds
    ; Client will accept a stale resource that expired no more than delta-seconds ago
    | "max-stale" [ "=" delta-seconds ]
    ; Client wants a response that will stay fresh for at least another delta-seconds
    | "min-fresh" "=" delta-seconds
    ; Client does not accept transformed content (e.g., a changed Content-Type)
    | "no-transform"
    ; Client only wants a cached copy and does not want the resource fetched again
    | "only-if-cached"
    ; Custom extension value
    | cache-extension

Note the subtle differences between no-store, no-cache, and must-revalidate. Also note that the same directive can mean different things in a request header and a response header.
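The following is a minimal sketch of splitting a Cache-Control value into directives, using a hypothetical `parse_cache_control` helper. It keeps commas inside quoted strings, as in `private="..."`, but does not handle backslash escapes. The same parsing works for request and response headers. Interpreting each directive is where the request/response difference noted above comes in.

```python
from typing import Dict, List, Optional


def _split_outside_quotes(value: str) -> List[str]:
    """Split on commas that are not inside a quoted-string (escapes not handled)."""
    parts, buf, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Map each directive name (lowercased) to its argument, or None if it has none."""
    directives: Dict[str, Optional[str]] = {}
    for part in _split_outside_quotes(value):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Drop surrounding quotes from quoted-string arguments, e.g. private="Set-Cookie".
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives
```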

Last-Modified

An entity header field that gives the resource's last modification time and is used for the negotiation policy

Last-Modified  = "Last-Modified" ":" HTTP-date

The client saves this value. The next time it requests the resource, it sends this time back to the server as the version number, so the server can verify whether the locally cached copy is still usable.

If the resource's modification time changes but its content does not, resending an identical response body is wasted work. Content-based negotiated caching is also provided to avoid this.

P.S. Cache-Control: max-age takes precedence. When it is present, it alone decides how long the cached copy stays fresh, and Last-Modified is only used when revalidating.

If-Modified-Since

A request header field required to implement the time-based negotiation policy. The server uses it to check whether the resource's last modification time (Last-Modified) has changed.

If-Modified-Since = "If-Modified-Since" ":" HTTP-date

The client sends the Last-Modified value it saved back to the server in this field. If the resource has not been updated, the server returns 304 with no response body. If it has been updated, the server returns 200 with the new version as the response body.

If-Unmodified-Since

This is the reverse check: it tests whether the resource's last modification time has changed. If it has, and the method is POST, PUT, or another update operation, the server returns 412 (Precondition Failed), indicating that the update was not performed.
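The following is a server-side sketch of the two time-based checks. It uses a hypothetical `evaluate_time_preconditions` function that receives the resource's last modification time as a timezone-aware datetime. It returns 304 or 412 when a precondition short-circuits the request, and None when the request should proceed normally.

The sketch follows RFC 7232 in a few ways:

  • If-Modified-Since is only considered for GET and HEAD
  • malformed dates are ignored
  • a server should skip If-Modified-Since entirely when If-None-Match is also present (not shown here)

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def _parse_http_date(value: str) -> Optional[datetime]:
    """Parse an HTTP-date; return None if malformed (such headers are ignored)."""
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None


def evaluate_time_preconditions(method: str, last_modified: datetime,
                                if_modified_since: Optional[str] = None,
                                if_unmodified_since: Optional[str] = None) -> Optional[int]:
    """Return 304/412 if a time-based precondition decides the response, else None."""
    # HTTP-dates have one-second resolution, so compare at that granularity.
    lm = last_modified.replace(microsecond=0)
    if if_unmodified_since is not None:
        since = _parse_http_date(if_unmodified_since)
        if since is not None and lm > since:
            return 412  # modified after the client's copy: refuse the operation
    if if_modified_since is not None and method in ("GET", "HEAD"):
        since = _parse_http_date(if_modified_since)
        if since is not None and lm <= since:
            return 304  # unchanged: the client may keep using its cached copy
    return None  # proceed normally, e.g. 200 with the current body
```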

ETag

A response header field that carries the resource's content hash (formally an opaque entity tag) and is used for the negotiation policy

ETag = "ETag" ":" entity-tag

The client records this value and sends it back to the server as the version number the next time it requests the resource.

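P.S. ETag has higher priority than Last-Modified. When a request carries both If-None-Match and If-Modified-Since, the server evaluates If-None-Match and ignores If-Modified-Since.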
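The following is a minimal sketch of deriving a strong ETag from the response body. It assumes a SHA-256 digest truncated to 16 hex characters. Any stable content hash would do, and servers are free to build ETags differently, e.g. from file size and modification time. Because this tag depends only on the bytes, touching a file without changing its content leaves the ETag unchanged. That is exactly the case Last-Modified handles poorly.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Hypothetical helper: strong ETag = quoted, truncated SHA-256 hex digest of the body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
```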

If-Match

A request header field required to implement the content-based negotiation policy. The server checks whether the value of this field (an ETag, i.e. the resource's content hash) matches the current one.

If-Match = "If-Match" ":" ( "*" | 1#entity-tag )

If it does not match, and the method is POST, PUT, or another update operation, the server returns 412, indicating that the update failed.

If-None-Match

This is the reverse check: it tests whether the value of this field differs from the current ETag. If it matches, the server returns 304 to tell the client it can keep using its cache. Otherwise, it returns the new resource.
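The following is a server-side sketch of the two content-based checks, using hypothetical helper names. Per RFC 7232:

  • If-Match uses the strong comparison, so weak `W/` tags never match
  • If-None-Match uses the weak comparison
  • a failed If-None-Match yields 304 for GET/HEAD and 412 for other methods

For brevity, the header is split on commas, so ETags that themselves contain commas are not handled.

```python
from typing import List, Optional


def _tags(header: str) -> List[str]:
    # Simplification: split on commas (ETags containing commas are not handled).
    return [t.strip() for t in header.split(",") if t.strip()]


def _opaque(tag: str) -> str:
    return tag[2:] if tag.startswith("W/") else tag


def strong_match(a: str, b: str) -> bool:
    # Strong comparison: both tags must be strong and identical.
    return not a.startswith("W/") and not b.startswith("W/") and a == b


def weak_match(a: str, b: str) -> bool:
    # Weak comparison: ignore any W/ prefix and compare the opaque tags.
    return _opaque(a) == _opaque(b)


def evaluate_etag_preconditions(method: str, current_etag: str,
                                if_match: Optional[str] = None,
                                if_none_match: Optional[str] = None) -> Optional[int]:
    """Return 304/412 if an ETag precondition decides the response, else None."""
    if if_match is not None:
        tags = _tags(if_match)
        # "*" matches any existing representation (the resource is assumed to exist).
        if "*" not in tags and not any(strong_match(t, current_etag) for t in tags):
            return 412  # client's version is outdated: refuse the update
    if if_none_match is not None:
        tags = _tags(if_none_match)
        if "*" in tags or any(weak_match(t, current_etag) for t in tags):
            return 304 if method in ("GET", "HEAD") else 412
    return None  # proceed normally, e.g. 200 with the current body
```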

Age

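A response header field that gives the time, in seconds, since the response was fetched from (or revalidated with) the origin server, i.e. how long the resource has been cached on proxy servers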

Age = "Age" ":" age-value
age-value = delta-seconds

Calculation method (RFC 2616, section 13.2.3):

/*
 * age_value
 *      is the value of Age: header received by the cache with
 *              this response.
 * date_value
 *      is the value of the origin server's Date: header
 * request_time
 *      is the (local) time when the cache made the request
 *              that resulted in this cached response
 * response_time
 *      is the (local) time when the cache received the
 *              response
 * now
 *      is the current (local) time
 */

apparent_age = max(0, response_time - date_value);
corrected_received_age = max(apparent_age, age_value);
response_delay = response_time - request_time;
corrected_initial_age = corrected_received_age + response_delay;
resident_time = now - response_time;
current_age   = corrected_initial_age + resident_time;

Age: 0 means the response was just fetched from the origin server. A positive value is the number of seconds since it was last fetched from the origin.
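Here is the same calculation as a runnable Python function. It is a direct transcription of the pseudocode above, with all times in seconds (for example, Unix timestamps). The function name is our own.

```python
def current_age(age_value: float, date_value: float, request_time: float,
                response_time: float, now: float) -> float:
    """RFC 2616 section 13.2.3 age calculation; all arguments are in seconds."""
    # Age implied by the origin's Date header (never negative, even with clock skew).
    apparent_age = max(0, response_time - date_value)
    # Trust whichever is larger: our own estimate or the upstream Age header.
    corrected_received_age = max(apparent_age, age_value)
    # Account for time spent waiting on the network for this response.
    response_delay = response_time - request_time
    corrected_initial_age = corrected_received_age + response_delay
    # Time the response has spent sitting in this cache.
    resident_time = now - response_time
    return corrected_initial_age + resident_time
```

Consider a request sent at t=1000, answered at t=1002 with Date 1001 and Age: 30, and checked at t=1062. Its current age is 92 seconds: the 30 seconds reported upstream, 2 seconds of network delay, and 60 seconds spent in this cache.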

III. Strong Cache and Negotiated Cache

The two apply at different stages of caching. The strong cache is used while the cached copy is still valid, and no request is sent. The negotiated cache is used after the copy becomes invalid, and a request is sent to ask whether the resource has been updated.

Strong Cache

When a request hits the strong cache, i.e. within the cache validity period, the browser does not send a request to the server. Instead, it reads the resource directly from the local cache (disk cache or memory cache).

A strong cache hit occurs as long as a locally cached copy of the resource exists and its Cache-Control: max-age (or, if absent, Expires) has not expired.

Negotiated Cache

After the cache expires, accessing the resource again makes the browser ask the server, carrying the locally cached version number in If-None-Match or If-Modified-Since. The server checks the ETag or Last-Modified value sent by the client and tells the client whether to update its cache.

ETag and Last-Modified in the response header are what switch on the negotiated cache. Its benefit is that if the content has not changed, the server returns 304 directly without transmitting the response body.
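The following is a minimal sketch tying the two stages together from the browser's side. It assumes the freshness lifetime and current age have already been computed, and that the cached response headers are a dict with lowercased names. `plan_request` is a hypothetical name. The outcomes are:

  • a fresh entry is a strong cache hit and sends nothing
  • a stale entry with validators becomes a conditional request
  • a stale entry without validators is fetched again in full

```python
from typing import Dict, Optional, Tuple


def plan_request(freshness_lifetime: Optional[int], current_age: int,
                 cached_headers: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: decide how to handle a cached entry."""
    # Strong cache: still fresh, so serve locally and send no request.
    if freshness_lifetime is not None and freshness_lifetime > current_age:
        return "use-cache", {}
    # Negotiated cache: stale, but validators exist, so ask the server.
    conditional: Dict[str, str] = {}
    if "etag" in cached_headers:
        conditional["If-None-Match"] = cached_headers["etag"]
    if "last-modified" in cached_headers:
        conditional["If-Modified-Since"] = cached_headers["last-modified"]
    if conditional:
        return "revalidate", conditional
    # No validators: plain request for a new copy.
    return "fetch", {}
```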

IV. Heuristic Cache

A special case is a response header that provides no cache-related information at all. In that case, the browser uses a heuristic algorithm to decide when the cached resource expires:

max-age = (Date - Last-Modified) / 10

This default caching policy is called heuristic caching. "Heuristic" means it is based on experience (a rule of thumb) rather than on anything the server specified.
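The following is a sketch of that heuristic, assuming both headers are present and well-formed. `heuristic_max_age` is a hypothetical name, and real browsers may apply extra limits not shown here.

```python
from email.utils import parsedate_to_datetime


def heuristic_max_age(date: str, last_modified: str) -> int:
    """Heuristic freshness lifetime in seconds: (Date - Last-Modified) / 10."""
    elapsed = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    # A Last-Modified later than Date (e.g. clock skew) yields no heuristic freshness.
    return max(0, int(elapsed.total_seconds()) // 10)
```

A resource last modified 10 days before the response's Date is therefore treated as fresh for 1 day.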

V. Refresh Behavior

The browser has 4 different refresh behaviors, which are easy to confuse when verifying HTTP caching:

  • Open new page: open a new tab or window and visit the page

  • Normal refresh: click the refresh button, press Enter in the address bar, or press CMD + R

  • Hard refresh: press CMD + Shift + R, or long-press the refresh button in Chrome and select Hard reload

  • Disable cache then refresh: check the Disable cache setting (in the DevTools Network panel), then open a new page or refresh

Open New Page

The request header carries no cache-related fields. If the local cached copy is valid, it is read from the cache, no request is sent, and a provisional (not actually sent) request header is displayed:

Request Headers
    Provisional headers are shown
    Upgrade-Insecure-Requests:1
    User-Agent:...

The response headers shown are the cached ones.

Normal Refresh

The page is not taken from the cache; a request is always sent to the server. The request header carries If-Modified-Since, If-None-Match, and other cache headers (if available), and the browser additionally adds:

Cache-Control:max-age=0

This forces every cache on the way to the origin, such as a proxy server, to check whether its copy has expired and revalidate it

P.S. On a normal refresh, the browser always sends a request, even if the cached copy is still valid and would otherwise be a strong cache hit. This is because the user asked for a refresh and expects to see new content. Associated resources (such as the CSS, JS, and other resources the page contains) are not forced to send requests.

Hard Refresh

It likewise forces a request to be sent with the cache-related information, and additionally adds:

Cache-Control:max-age=0
Pragma:no-cache

This requires fetching a fresh copy from the origin server, even if the cache has not expired.

Disable Cache Then Refresh

After the cache is disabled, every subsequent request carries:

Cache-Control:max-age=0
Pragma:no-cache

This is equivalent to a hard refresh for every request, including associated resources.

P.S. What Cache-Control: max-age=0 and Pragma: no-cache actually do depends on the server implementation. In practice, a proxy server will not necessarily go back to the origin or check whether its copy has expired.
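The following is a sketch of how a cache that follows RFC 2616 might treat the refresh headers above when deciding whether to answer from storage. `may_serve_stored` is hypothetical, header names are assumed lowercased, and, as the P.S. notes, real proxies may behave differently. A request `Pragma: no-cache` is treated like `Cache-Control: no-cache`. A request `max-age=N` rejects stored copies older than N seconds.

```python
from typing import Dict


def may_serve_stored(request_headers: Dict[str, str],
                     current_age: int, freshness_lifetime: int) -> bool:
    """Hypothetical check: may a cache answer from storage without contacting the origin?"""
    directives: Dict[str, str] = {}
    for part in request_headers.get("cache-control", "").split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip()
    # RFC 2616 14.32: treat a request's Pragma: no-cache like Cache-Control: no-cache.
    pragma = [p.strip().lower() for p in request_headers.get("pragma", "").split(",")]
    if "no-cache" in directives or "no-cache" in pragma:
        return False  # must go to (or revalidate with) the origin
    # Request max-age=N: the client refuses stored responses older than N seconds.
    max_age = directives.get("max-age", "")
    if max_age.isdigit() and current_age > int(max_age):
        return False  # too old for this client: revalidate first
    # Otherwise the usual freshness rule applies.
    return freshness_lifetime > current_age
```

With this logic, a plain page open (no extra headers) can be served from a fresh copy. The max-age=0 sent on a normal refresh forces revalidation of any copy that has aged at all.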

