I. Classification
Caching is classified by cache strength into:
- Strong cache: Within the validity period, the resource is read directly from the local cache (disk cache or memory cache); outside the validity period, or on a forced refresh, a new copy is requested from the server
- Negotiated cache: Within the validity period, same as above; outside the validity period, or on a forced refresh, the client sends its local version number and asks the server whether the resource has been updated. The server answers 304 (update cache state such as the expiration time and keep using the local version) or 200 (cache the new version and discard the local one)
Negotiated cache can be further classified into:
- Time-based: Uses the resource modification time (Last-Modified) as the version number
- Content-based: Uses the resource content hash (ETag) as the version number
Negotiation only happens after the cache becomes invalid (expired or deprecated)
II. Related Header Fields
HTTP Header fields are divided into 4 categories:
- general-header: Applies to both request and response messages
- request-header: Lets the client pass additional information and request modifiers to the server, functioning like parameters
- response-header: Lets the server pass additional information about the response to the client, including server-related information and information needed for future access to the resource
- entity-header: Provides meta information about the message entity or, if there is no message entity, about the resource the request refers to
P.S. For more information about HTTP headers, see 4.2 Message Headers in RFC 2616
Pragma
HTTP 1.0 general header field, specifies caching policy
Pragma = "Pragma" ":" 1#pragma-directive
pragma-directive = "no-cache" | extension-pragma
extension-pragma = token [ "=" ( token | quoted-string ) ]
Pragma is a loosely defined field. The RFC only specifies that when Pragma: no-cache appears in a request, the request must go back to the origin for new content even if a cached copy is still valid, which is equivalent to Cache-Control: no-cache. When it appears in a response, it has no defined meaning
P.S. For more information about Pragma, see 14.32 Pragma in RFC 2616
Expires
HTTP 1.0 entity header field, indicates resource expiration time, specifies expiration policy
Expires = "Expires" ":" HTTP-date
A precise point in time, before which the cache is valid. This point in time is given by the server, so if the client and server clocks are not synchronized, the cache expiration policy becomes unreliable
Since there is no guarantee that the time given by Expires denotes the same moment on client and server, HTTP 1.1 added Cache-Control: max-age=<seconds>, which defines the shelf life as a duration: the cache expires that many seconds after the client receives the resource. This depends only on the client's clock and does not require the two clocks to agree
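The relationship between the two fields can be sketched in Python. This is a minimal illustration, not a full cache implementation: the function name freshness_lifetime and the plain-dict header representation are assumptions made for the example. It prefers max-age and falls back to Expires minus Date (both supplied by the server), which is the lifetime calculation RFC 2616 describes.

```python
import re
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the freshness lifetime in seconds, or None if none is stated.

    `headers` is a plain dict of response headers (an assumption of this sketch).
    """
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        # max-age is a duration, so no comparison between clocks is needed.
        return int(match.group(1))
    if "Expires" in headers and "Date" in headers:
        # Both values come from the server, so their difference is a duration too.
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None
```

When both fields are present, the sketch returns the max-age value and never looks at Expires.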
Cache-Control
General header field, specifies caching policy and expiration policy
Cache-Control = "Cache-Control" ":" 1#cache-directive
cache-directive = cache-request-directive
| cache-response-directive
cache-extension = token [ "=" ( token | quoted-string ) ]
9 values can appear in response header:
cache-response-directive =
; Resource may be cached by any cache, including client and proxy server
"public"
; Resource may be cached only by the client, proxy servers (shared caches) are not allowed to cache it
| "private" [ "=" <"> 1#field-name <"> ]
; Without checking back to origin first, reuse of resource is not allowed
| "no-cache" [ "=" <"> 1#field-name <"> ]
; Resource is not allowed to be written to cache
| "no-store"
; Proxy servers are prohibited from modifying Content-Encoding, Content-Range, Content-Type fields
| "no-transform"
; Using expired resources is not allowed, once expired must validate with origin (even if client is willing to accept expired resources)
| "must-revalidate"
; Same as must-revalidate, but only applies to proxy servers (shared caches)
| "proxy-revalidate"
; Cache resource, but cache expires after specified time (in seconds)
| "max-age" "=" delta-seconds
; Only valid on proxy servers (shared caches), where it overrides max-age and Expires
| "s-maxage" "=" delta-seconds
; Custom extension value
| cache-extension
7 values can appear in request header:
cache-request-directive =
; Force origin fetch, no content from cache
"no-cache"
; Do not allow client request-related information to be written to cache
| "no-store"
; Client is willing to accept resource whose age (proxy server cache time) does not exceed delta seconds
| "max-age" "=" delta-seconds
; Client is willing to accept stale content within delta seconds of expiration
| "max-stale" [ "=" delta-seconds ]
; Client wants a response that will remain fresh for at least delta seconds
| "min-fresh" "=" delta-seconds
; Client does not accept transformed content, e.g., Content-Type
| "no-transform"
; Client only wants cached resources, does not request resources anew
| "only-if-cached"
; Custom extension value
| cache-extension
Note the subtle differences between the descriptions of no-store, no-cache, and must-revalidate. Also, the same directive can mean different things depending on whether it appears in a request header or a response header
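To make the directive grammar concrete, here is a small Python sketch that splits a Cache-Control value into its directives. The function name parse_cache_control is hypothetical, and the parsing is deliberately simplified: it splits on commas, so a quoted field-name list that itself contains commas (allowed for private and no-cache) is not handled.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or None}.

    Simplification: commas inside quoted strings are not supported.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; normalize them to lower case.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives
```

For example, parse_cache_control("public, max-age=3600") yields {"public": None, "max-age": "3600"}; whether a given directive is meaningful still depends on whether the header came from a request or a response.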
Last-Modified
Entity header field, indicates resource's last modification time, specifies negotiation policy
Last-Modified = "Last-Modified" ":" HTTP-date
After receiving it, the client saves it. The next time it requests the resource from the server, it sends this point in time as the version number to verify whether the locally cached resource is still usable
If the resource has been modified (its timestamp changed) but its content hasn't, sending an identical response again is redundant, so content-based negotiated cache is also provided to avoid this situation
P.S. For deciding expiration, Last-Modified has lower priority than Cache-Control: max-age. When both appear, max-age decides when the cache expires, and Last-Modified only serves as the version number for negotiation afterwards
If-Modified-Since
Request header field, needed to implement the time-based negotiation policy. The server compares this value with the resource's last modification time (Last-Modified)
If-Modified-Since = "If-Modified-Since" ":" HTTP-date
The client sends the Last-Modified version number back to the server. If the resource hasn't been updated, the server returns 304 without a response body; if it has, the server returns 200 with the new version as the response body
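The server side of this exchange can be sketched as follows. The function name respond_if_modified_since and the (status, headers, body) return shape are assumptions chosen for illustration; a real server would plug the same comparison into its own request handling.

```python
from email.utils import parsedate_to_datetime


def respond_if_modified_since(request_headers, last_modified, body):
    """Return (status, headers, body) for a GET on a resource.

    `last_modified` is the resource's Last-Modified value as an HTTP-date string.
    """
    headers = {"Last-Modified": last_modified}
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            unchanged = parsedate_to_datetime(last_modified) <= parsedate_to_datetime(since)
        except (TypeError, ValueError):
            unchanged = False  # unparsable date: ignore the condition
        if unchanged:
            return 304, headers, b""  # client keeps using its cached copy
    return 200, headers, body
```

Note that HTTP-date has one-second resolution, which is one reason a time-based version number can miss changes that a content-based one would catch.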
If-Unmodified-Since
Same as above with the opposite behavior: the request proceeds only if the resource has not been modified since the given time. If it has been modified, and the method is POST, PUT, or another update operation, the server returns 412 (Precondition Failed) to indicate that the update was not executed
ETag
Response header field, indicates resource's content hash, specifies negotiation policy
ETag = "ETag" ":" entity-tag
Client will record this value, and pass it back to server as version number when requesting the resource next time
P.S. ETag has higher priority than Last-Modified
If-Match
Request header field, needed to implement the content-based negotiation policy. The server compares this value with the resource's current ETag (content hash)
If-Match = "If-Match" ":" ( "*" | 1#entity-tag )
If they don't match, and the method is POST, PUT, or another update operation, the server returns 412 to indicate that the update failed
If-None-Match
Same as above with the opposite behavior: if the value matches the current ETag, the server returns 304 to tell the client it can keep using its cache; otherwise it returns the new resource
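The two ETag-based conditions can be sketched together in Python. Everything here is illustrative: make_etag and check_preconditions are hypothetical names, hashing the body with SHA-256 is just one common way to produce an ETag, and weak validators (the W/ prefix) are ignored for brevity.

```python
import hashlib


def make_etag(body):
    """Derive a quoted ETag from the content (one possible scheme, assumed here)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _matches(header_value, etag):
    """True if the header lists the given ETag or the wildcard '*'."""
    tags = [tag.strip() for tag in header_value.split(",")]
    return "*" in tags or etag in tags


def check_preconditions(method, request_headers, body):
    """Return 412, 304, or 200 based on If-Match / If-None-Match."""
    etag = make_etag(body)
    if_match = request_headers.get("If-Match")
    if if_match is not None and not _matches(if_match, etag):
        return 412  # client's version is outdated: refuse the update
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None and _matches(if_none_match, etag):
        # For GET/HEAD the cached copy is still good; other methods fail.
        return 304 if method in ("GET", "HEAD") else 412
    return 200
```

A result of 200 here only means that no precondition blocked the request, so the server goes on to process it normally.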
Age
Response header field, indicates how long the resource has been cached on proxy server
Age = "Age" ":" age-value
age-value = delta-seconds
Calculation method:
/*
* age_value
* is the value of Age: header received by the cache with
* this response.
* date_value
* is the value of the origin server's Date: header
* request_time
* is the (local) time when the cache made the request
* that resulted in this cached response
* response_time
* is the (local) time when the cache received the
* response
* now
* is the current (local) time
*/
apparent_age = max(0, response_time - date_value);
corrected_received_age = max(apparent_age, age_value);
response_delay = response_time - request_time;
corrected_initial_age = corrected_received_age + response_delay;
resident_time = now - response_time;
current_age = corrected_initial_age + resident_time;
Age: 0 means the response was just fetched from the origin server; a positive value is the number of seconds elapsed since it was last fetched from the origin
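The calculation above translates directly into a runnable Python function. The function name current_age is an assumption; all arguments are plain numbers in seconds (for example Unix timestamps, with age_value being a duration).

```python
def current_age(age_value, date_value, request_time, response_time, now):
    """Estimate how old a cached response is, following the steps above."""
    # Age as seen by comparing the local receive time with the server's Date.
    apparent_age = max(0, response_time - date_value)
    # Trust whichever is larger: our estimate or the Age reported upstream.
    corrected_received_age = max(apparent_age, age_value)
    # Time the request/response round trip took.
    response_delay = response_time - request_time
    corrected_initial_age = corrected_received_age + response_delay
    # Time the response has been sitting in this cache.
    resident_time = now - response_time
    return corrected_initial_age + resident_time
```

For example, with age_value=10, date_value=1000, request_time=1003, response_time=1005, and now=1065, the result is 72: a received age of 10, plus 2 seconds of response delay, plus 60 seconds resident in the cache.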
III. Strong Cache and Negotiated Cache
The two occur at different stages of caching. Strong cache applies while the cache is still valid, and no request is sent. Negotiated cache applies after the cache becomes invalid, when a request is sent to ask whether the resource has been updated
Strong Cache
Once a response is held in strong cache, the browser does not send a request to the server within the cache validity period, but reads directly from the local cache (disk cache or memory cache)
A strong cache hit occurs as long as a local cached version of the resource exists and its Cache-Control: max-age or Expires hasn't expired
Negotiated Cache
After the cache expires, when the resource is accessed again, the browser sends the locally cached version number to the server. The server checks the ETag or Last-Modified value sent by the client and tells the client whether to update its cache
ETag and Last-Modified in the response header are the switches for negotiated cache. The benefit of negotiated cache is that if the content hasn't changed, the server returns 304 directly without transmitting a response body
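The split between the two stages can be summarized as a small decision function. This is a sketch under stated assumptions: the name cache_decision, the entry dict with keys max_age, etag, and last_modified, and the three result labels are all invented for the example, and real browsers consider many more inputs.

```python
def cache_decision(entry, current_age):
    """Decide how to satisfy a request from a cache entry.

    entry: None, or a dict with optional keys 'max_age' (seconds),
    'etag', and 'last_modified'. Returns (action, request_headers).
    """
    if entry is None:
        return "fetch", {}  # nothing cached: plain request
    max_age = entry.get("max_age")
    if max_age is not None and current_age < max_age:
        return "use-cache", {}  # strong cache hit: no request is sent
    validators = {}
    if entry.get("etag"):
        validators["If-None-Match"] = entry["etag"]
    if entry.get("last_modified"):
        validators["If-Modified-Since"] = entry["last_modified"]
    if validators:
        return "revalidate", validators  # negotiated cache: conditional request
    return "fetch", {}  # expired and no version number to negotiate with
```

A "revalidate" result leads to either a 304 (keep the local copy and refresh its expiration) or a 200 (replace it), as described above.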
IV. Heuristic Cache
A rather special case is a response that provides no explicit expiration information (no Cache-Control: max-age and no Expires) but does carry Last-Modified. The browser then uses a heuristic algorithm to decide when the cached resource expires, typically:
max-age = (Date - Last-Modified) / 10
This default caching policy is called heuristic cache. Heuristic means based on experience, without a strict basis
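As a worked example of the heuristic, the sketch below takes one tenth of the interval between the Date and Last-Modified headers. The function name heuristic_freshness and the divisor parameter are assumptions for illustration; the 10% figure is a common convention rather than a fixed rule, and browsers may apply their own caps.

```python
from email.utils import parsedate_to_datetime


def heuristic_freshness(date, last_modified, divisor=10):
    """Return a heuristic freshness lifetime in seconds.

    Both arguments are HTTP-date strings taken from the response headers.
    """
    delta = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    # A resource unchanged for a long time is assumed to stay unchanged a while longer.
    return max(0, int(delta.total_seconds()) // divisor)
```

A resource last modified 10 days before the response Date would be treated as fresh for 1 day (86400 seconds) under this rule.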
V. Refresh Behavior
Browsers have 4 different load and refresh behaviors, which are easy to confuse when verifying HTTP caching:
- Open new page: Open a new tab or window and visit the page
- Normal refresh: Click the refresh button, press Enter in the address bar, or press CMD + R
- Hard refresh: Press CMD + Shift + R, or long-press the refresh button in Chrome and select hard reload
- Disable cache then refresh: Check the Disable cache setting in the developer tools, then open a new page or refresh
Open New Page
The request header doesn't carry cache-related fields. If the local cached version is valid, the page is read from cache and no request is sent; the developer tools display a provisional (not actually sent) request header:
Request Headers
Provisional headers are shown
Upgrade-Insecure-Requests:1
User-Agent:...
The response header shown is the cached one, reused
Normal Refresh
The page is not read directly from cache; a request is always sent to the server. The request header carries If-Modified-Since, If-None-Match, and other cache headers (if available). Additionally, the browser unilaterally adds:
Cache-Control:max-age=0
Requires caches along the way (such as proxy servers) to revalidate the resource instead of serving it as still valid
P.S. On a normal refresh the browser always sends a request for the page, even if its cache is still valid and would otherwise be a strong cache hit, because the user asked for a refresh and expects to see new content. Associated resources (such as the CSS, JS, and other resources contained in the page) are not forced to send requests
Hard Refresh
Also forces a request, but without the conditional headers (If-Modified-Since, If-None-Match), and unilaterally adds:
Cache-Control:no-cache
Pragma:no-cache
Requires fetching a new copy from the origin, even if the cache hasn't expired
Disable Cache Then Refresh
After the cache is disabled, every subsequent request has these added:
Cache-Control:no-cache
Pragma:no-cache
Equivalent to putting every request through a hard refresh, including associated resources
P.S. How request directives such as Cache-Control:max-age=0 and Pragma:no-cache are actually handled depends on the implementation of the servers involved; in practice a proxy server may not go back to the origin or check expiration
References
- Hypertext Transfer Protocol -- HTTP/1.1: RFC 2616
- 浏览器缓存机制剖析 (An Analysis of Browser Caching Mechanisms): The cache mechanism flowchart is good, but the header field descriptions are inaccurate
- HTTP 缓存控制小结 (A Summary of HTTP Cache Control): Content is accurate and fairly comprehensive
- What heuristics do browsers use to cache resources not explicitly set to be cachable?