Caching


Overview


Samples

NOTE: IE5 and NC4.7 behave differently when the reload button is pressed: NC4.7 simply adds a Pragma: no-cache header to the request. For this reason, every sample page contains a link to itself, so that the reload button does not have to be used.

Sample 1:
This sample uses ASP's Ad Rotator component, which automates the rotation of documents (in this case images) on a Web page. Because cache_rotate1.asp is generated dynamically, the Web server sends no Last-Modified entity header for it. The rotating gif inside it (one of four) is static content and therefore does get a Last-Modified entity header.
Accessing this page several times leads to the following observations:
  • SHALL (HTTP/1.0):
    • GET request for cache_rotate1.asp
    • no Last-Modified entity header, therefore no caching of this document
    • GET request for a cache_x.gif, where x = 1..4
    • Last-Modified entity header, therefore caching of this document and a conditional GET on every following request for it
  • IS (IE5) - "Check for newer versions of stored pages: Every visit to the page":
    • Everything works as expected.
      Interestingly, cache_rotate1.asp is also stored in the Temporary Internet Files.
      Under the HTTP/1.0 rules there is no need to put cache_rotate1.asp into the cache, since every request for that document is a normal GET answered with 200 OK.
      It is stored there because Web browsers treat the FORWARD/BACK buttons as cache hits.
  • IS (IE5) - "Check for newer versions of stored pages: Every time you start Internet Explorer":
    • Everything works as expected, except that there are no requests for already cached images.
  • IS (NC4.7) - "Document in cache is compared to document on network: Every time":
    • GET request for cache_rotate1.asp
    • no Last-Modified entity header, therefore no caching of this document
    • GET request for a cache_x.gif, where x = 1..4
    • Last-Modified entity header, therefore caching of this document
    • no further requests for already cached documents (= images),
      so once the main document (cache_rotate1.asp) and the gifs are cached, further accesses to cache_rotate1.asp lead only to requests for cache_rotate1.asp, not for the images.
    • Note that cache_rotate1.asp (here "M0v8uj4b.asp") is in the cache directory although the HTTP/1.0 rules do not require it. Again, this is because Web browsers treat the FORWARD/BACK buttons as cache hits.

  • IS (NC4.7) - "Document in cache is compared to document on network: Once per session":
    • There is no difference compared with the "Every time" setting.
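The SHALL rules used throughout these samples can be sketched in JavaScript. A response is stored only if it carries Last-Modified, and a stored copy is later revalidated with a conditional GET. The Map-based cache, the header objects and the function names are hypothetical illustrations, not the API of any browser. The Last-Modified value is the example date from RFC 1945 section 10.10.

```javascript
// Minimal sketch of the HTTP/1.0 "SHALL" behaviour described above.
// Responses and cache entries are plain objects (hypothetical shapes).

// A response is stored only if it carries a Last-Modified header.
function storeIfCacheable(cache, url, responseHeaders) {
  const lastModified = responseHeaders["Last-Modified"];
  if (typeof lastModified === "string") {
    cache.set(url, { lastModified });
  }
}

// The next request for a URL is a conditional GET if a cached copy exists,
// otherwise a normal GET.
function nextRequest(cache, url) {
  const headers = {};
  const entry = cache.get(url);
  if (entry) {
    headers["If-Modified-Since"] = entry.lastModified;
  }
  return { method: "GET", url, headers };
}

const cache = new Map();
// cache_rotate1.asp: generated by ASP, no Last-Modified header.
storeIfCacheable(cache, "cache_rotate1.asp", { "Content-Type": "text/html" });
// cache_2.gif: static file, Last-Modified sent by the server.
storeIfCacheable(cache, "cache_2.gif", {
  "Content-Type": "image/gif",
  "Last-Modified": "Tue, 15 Nov 1994 12:45:26 GMT",
});

console.log(nextRequest(cache, "cache_rotate1.asp").headers); // {}
console.log(nextRequest(cache, "cache_2.gif").headers);
// { 'If-Modified-Since': 'Tue, 15 Nov 1994 12:45:26 GMT' }
```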
Sample 2:
This sample uses ASP and VBScript to generate random rotating images. The image is displayed twice on this page to check the correct handling of chunked encoding. The URLs of the two images are constructed as follows:
<%
  Randomize                        ' seed VBScript's random-number generator
  nr = Int((4 * Rnd) + 1)          ' random integer in the range 1..4
  reqimg = "cache_" & nr & ".gif"  ' file name of the first image
%>
<img src="<%=reqimg%>">
<img src="cache_<%=nr%>.gif">
Because cache_rotate2.asp is generated dynamically, the Web server sends no Last-Modified entity header for it. The rotating gif is static content and therefore gets a Last-Modified entity header. Since both <img> tags are built from the same nr, they reference the same gif, and THERE IS NO DEFINITION OF HOW TO HANDLE A SECOND REQUEST FOR THE SAME RESOURCE WITHIN ONE DOCUMENT.
Accessing this page several times leads to the following observations:
  • SHALL (HTTP/1.0):
    • GET request for cache_rotate2.asp
    • no Last-Modified entity header, therefore no caching of this document
    • GET request for the first cache_x.gif, where x = 1..4
    • Last-Modified entity header, therefore caching of this document and a conditional GET on every following request for it
    • THERE IS NO DEFINITION OF HOW TO HANDLE THE REQUEST FOR THE SECOND IMAGE!
  • IS (IE5) - "Check for newer versions of stored pages: Every visit to the page":
    • Everything works as in Sample 1 (IE5), except that there is NO REQUEST FOR THE SECOND IMAGE!
  • IS (IE5) - "Check for newer versions of stored pages: Every time you start Internet Explorer":
    • There are no requests for already cached images.
  • IS (NC4.7) - "Document in cache is compared to document on network: Every time":
    • Everything works as in Sample 1 (NC4.7), except that there is NO REQUEST FOR THE SECOND IMAGE!
      So there are no requests for already cached images.
  • IS (NC4.7) - "Document in cache is compared to document on network: Once per session":
    • There is no difference between the "Every time" and "Once per session" settings.
      No requests for already cached images.
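The behaviour observed for both browsers in Sample 2 can be modelled as a small function. HTTP/1.0 itself does not define this behaviour. Within one page load, each distinct URL is requested at most once, and later references reuse the first response. The function name is hypothetical, and nr is fixed instead of drawn at random so that the result is reproducible.

```javascript
// Sketch of the behaviour observed for IE5 and NC4.7 in Sample 2:
// within one page load, each distinct URL is requested at most once.
function requestsForPage(imageUrls) {
  const seen = new Set();
  const requests = [];
  for (const url of imageUrls) {
    if (!seen.has(url)) {
      seen.add(url);
      requests.push(url);
    }
  }
  return requests;
}

// Both <img> tags of cache_rotate2.asp are built from the same nr,
// so they always name the same file. nr is fixed here for reproducibility.
const nr = 3;
const srcs = ["cache_" + nr + ".gif", "cache_" + nr + ".gif"];
console.log(requestsForPage(srcs)); // [ 'cache_3.gif' ]
```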
Sample 3:
This sample uses JavaScript to generate the random rotating images. NOTE that the main document (cache_rotate3.html) is now static HTML; the dynamic behavior (client-side JavaScript) is executed by the client (the browser). The URL of the image is constructed as follows:
<script language="javascript">
   // random integer in the range 1..4
   var nr = Math.floor((4*Math.random())+1);
   document.write('<img src="cache_'+nr+'.gif">');
</script>
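The page script can be checked outside the browser by passing the random source in, which makes the range of the generated URLs testable. The function name randomImageUrl and its rng parameter are hypothetical. rng is assumed to return a number in [0, 1), as Math.random() does.

```javascript
// The expression from the page script, with the random source passed in.
function randomImageUrl(rng = Math.random) {
  const nr = Math.floor((4 * rng()) + 1); // integer in 1..4
  return "cache_" + nr + ".gif";
}

console.log(randomImageUrl(() => 0));        // cache_1.gif
console.log(randomImageUrl(() => 0.999999)); // cache_4.gif

// Count how often each file is chosen over many calls.
const counts = {};
for (let i = 0; i < 10000; i++) {
  const url = randomImageUrl();
  counts[url] = (counts[url] || 0) + 1;
}
console.log(counts); // four keys, each count close to 2500
```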
Accessing this page several times leads to the following observations:
  • SHALL (HTTP/1.0):
    • GET request for cache_rotate3.html
    • Last-Modified entity header, therefore caching of this document
    • GET request for a cache_x.gif, where x = 1..4
    • Last-Modified entity header, therefore caching of this document and a conditional GET on every following request for it
    • Conditional GET for every following request for cache_rotate3.html
  • IS (IE5) - "Check for newer versions of stored pages: Every visit to the page":
    • Everything works as expected.
  • IS (IE5) - "Check for newer versions of stored pages: Every time you start Internet Explorer":
    • No requests for already cached documents. So if the cache is cleared and you access this document for the first time, there is a request for cache_rotate3.html and for the random image. On the second access to this document there is no request for cache_rotate3.html, only a request for the image if it is not already in the cache.
  • IS (NC4.7) - "Document in cache is compared to document on network: Every time":
    • Everything works as expected, except that there are no further requests for already cached images.
  • IS (NC4.7) - "Document in cache is compared to document on network: Once per session":
    • No requests for already cached documents. So if the cache is cleared and you access this document for the first time, there is a request for cache_rotate3.html and for the random image. On the second access to this document there is no request for cache_rotate3.html, only a request for the image if it is not already in the cache.
Sample 4:
This sample also uses JavaScript to generate the random rotating images. NOTE that the main document (cache_rotate4.html) is static HTML; the dynamic behavior (client-side JavaScript) is executed by the client (the browser).
One of the random URLs is now an ASP that sets its expiration date one year in the past. The URL of the image is constructed as follows:
<script language="javascript">
   // random integer in the range 1..5: 1..4 select a static gif,
   // 5 selects the ASP whose expiration date lies one year in the past
   var nr = Math.floor((5*Math.random())+1);
   var imgsrc;
   switch(nr)
   {
     case 1:
       imgsrc = "cache_1.gif";
       break;
     case 2:
       imgsrc = "cache_2.gif";
       break;
     case 3:
       imgsrc = "cache_3.gif";
       break;
     case 4:
       imgsrc = "cache_4.gif";
       break;
     default:
       imgsrc = "cacheresponse.asp?expires_sign=-&expires_number=1&expires_timeunit=yyyy&lastmodified_sign=%2B&lastmodified_number=0&lastmodified_timeunit=n&gmt_sign=-&gmt_number=2&Get-Button=Submit+with+GET";
       break;
   }
   document.write('<img src="'+imgsrc+'">');
</script>
Accessing this page several times leads to the following observations:
  • SHALL (HTTP/1.0):
    • GET request for cache_rotate4.html
    • Last-Modified entity header, therefore caching of this document
    • GET request for the first image
    • Response
      • for the gifs: Last-Modified entity header, therefore caching of the document,
        and a conditional GET on every following request for it
      • for the ASP: no Last-Modified entity header, therefore no caching
    • Conditional GET for every following request for cache_rotate4.html
  • IS (IE5) - "Check for newer versions of stored pages: Every visit to the page":
    • Everything works as expected.
  • IS (IE5) - "Check for newer versions of stored pages: Every time you start Internet Explorer":
    • No requests for already cached documents; the ASP is not cached and is therefore requested every time.
  • IS (NC4.7) - "Document in cache is compared to document on network: Every time":
    • Everything works as expected, except that there are no further requests for already cached images.
  • IS (NC4.7) - "Document in cache is compared to document on network: Once per session":
    • No requests for already cached documents; the ASP is not cached and is therefore requested every time.
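The ASP in Sample 4 is never cached because of its Expires date. The governing rule is from RFC 1945 section 10.7, quoted in the Reference section. An entity whose Expires date is equal to or earlier than its Date header must not be cached, and "0" or an invalid date counts as "expires immediately". The sketch below models only this one rule. The function name and the Date/Expires values are hypothetical, chosen so that Expires lies one year before Date.

```javascript
// Sketch of the Expires rule from RFC 1945 section 10.7. Other reasons not
// to cache (e.g. a missing Last-Modified) are deliberately ignored here.
function expiresAllowsCaching(headers, now = Date.now()) {
  const expires = headers["Expires"];
  if (expires === undefined) return true; // Expires rule does not apply

  // Handle "0" explicitly: some Date.parse implementations read it as a year.
  if (expires.trim() === "0") return false;
  const expiresMs = Date.parse(expires);
  if (Number.isNaN(expiresMs)) return false; // invalid => expires immediately

  // Compare with the response's Date header, or the local clock if the
  // header is missing or unparsable.
  let dateMs = headers["Date"] !== undefined ? Date.parse(headers["Date"]) : NaN;
  if (Number.isNaN(dateMs)) dateMs = now;
  return expiresMs > dateMs;
}

// Hypothetical headers for the ASP in Sample 4: Expires one year before Date.
const aspHeaders = {
  "Date": "Fri, 01 Jan 1999 12:00:00 GMT",
  "Expires": "Thu, 01 Jan 1998 12:00:00 GMT",
};
console.log(expiresAllowsCaching(aspHeaders)); // false
```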

Short summary: documents which are generated by server-side scripting such as ASP don't get a Last-Modified entity header from the Web server (IIS4) by default.

In the next examples the Last-Modified entity header is set by the ASP itself.


Sample 5:
This sample allows you to set the Last-Modified and the Expires entity headers. The first document brings up a form in which you can choose between a GET and a POST request.
Accessing this page several times leads to the following observations:
  • IS (IE5 and NC4.7) - check cache every time:
    • Although both headers are set in the response to the first request, and the second request is a conditional GET (provided the first request was a GET), the server always responds with 200 OK and not with 304 Not Modified as expected.
      So we can assume that the Web server (IIS4) uses its own Last-Modified date for dynamic documents such as ASPs.
    • Curiously, when the expiration date is set to the future and the last-modified date to the past, the first request behaves as expected, but all following requests are cache hits!
      There seems to be a special threshold date for the Expires header: once the expiration date lies beyond it, the request is sent again (e.g. expiration in 2001: no further requests; expiration in 2222: requests are sent).
  • IS (IE5 and NC4.7) - check cache every session:
    • No requests for already cached documents if
      • the Last-Modified date is set to the past
      • the Expiration date is in the future
      • the Last-Modified date is set to the past and the Expiration date is in the near future
    • Requests are sent if
      • neither of the two headers is set
      • the Expiration date is in the past
      • the Expiration date is too far in the future
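The observations give only two data points for the threshold date (2001: no further requests; 2222: requests), so its exact value is unknown. One possible explanation is that the browsers store the Expires date as a signed 32-bit count of seconds since 1970, which overflows in January 2038. This explanation is unverified and not confirmed by anything in these tests. The sketch below only checks on which side of that hypothetical limit a given date falls. The constant and function names are illustrative.

```javascript
// Hypothesis check only: is a date beyond the largest instant a signed
// 32-bit count of seconds since 1970-01-01 UTC can hold?
const MAX_INT32_SECONDS = 2 ** 31 - 1;
const LIMIT_MS = MAX_INT32_SECONDS * 1000;

function beyond32BitTime(httpDate) {
  return Date.parse(httpDate) > LIMIT_MS;
}

console.log(new Date(LIMIT_MS).toUTCString()); // Tue, 19 Jan 2038 03:14:07 GMT
console.log(beyond32BitTime("Mon, 01 Jan 2001 00:00:00 GMT")); // false
console.log(beyond32BitTime("Tue, 01 Jan 2222 00:00:00 GMT")); // true
```

Both recorded years are consistent with this hypothesis, but a test with expiration dates just before and just after 19 Jan 2038 would be needed to support it.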
Sample 6:
Same as Sample 4, except that the dynamic random URL is now the ASP from Sample 5, with the Last-Modified date set to Fri, 01 Jan 1999 12:00:00 GMT and the Expiration date set to Fri, 01 Jan 2010 12:00:00 GMT.
The following observations are quite unexpected:
  • IE5 - check cache every time
    There are conditional requests for every cached document (cache_rotate6.html and images), but there is no second request for the ASP.
  • IE5 - check cache every session
    There are no requests for already cached documents (cache_rotate6.html and images), but there are requests for the ASP.
    After the ASP has been requested directly (not through the <img> tag) and we switch back to Sample 6, there are no more requests for the ASP.
  • NC4.7 - check cache every time
    There are conditional requests for every cached document (cache_rotate6.html and images) and normal requests for the ASP.
  • NC4.7 - check cache every session
    There are no requests for already cached documents (cache_rotate6.html and images), but there are requests for the ASP.
    After the ASP has been requested directly (not through the <img> tag) and we switch back to Sample 6, there are no more requests for the ASP.

Requirements for SilkPerformer 3.5

Record:
  • Headers to remove during recording (Page-based Browser-level API):
    • from the request:
      • If-Modified-Since
      • If-Match
    • from the response:
      • Age
      • ETag
  • Headers to add during recording (Page-based Browser-level API):
    • to the request:
      • HTTP/1.0 - Pragma: no-cache
      • HTTP/1.1 - Cache-Control: no-cache
  • Headers to change during recording (Page-based Browser-level API):
    • in the response
      • Expires: set the expiration date to 0 (expires immediately)
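The recording-time header rules can be sketched as two rewrite functions over header maps. The sketch assumes headers are plain objects keyed by header name and matches names case-insensitively. The function names are hypothetical, and the sketch is not the SilkPerformer API. Setting Expires to "0" relies on the RFC 1945 section 10.7 note that treats 0 as "expires immediately".

```javascript
// Hypothetical sketch of the recording-time header rules; not the
// SilkPerformer API. Header names are compared case-insensitively.
function withoutHeaders(headers, lowerCaseNames) {
  const out = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!lowerCaseNames.includes(name.toLowerCase())) out[name] = value;
  }
  return out;
}

// Request: remove the conditional headers so every recorded request is
// unconditional, then add the no-cache directive for the HTTP version.
function rewriteRecordedRequest(headers, httpVersion) {
  const out = withoutHeaders(headers, ["if-modified-since", "if-match"]);
  if (httpVersion === "HTTP/1.0") {
    out["Pragma"] = "no-cache";
  } else {
    out["Cache-Control"] = "no-cache";
  }
  return out;
}

// Response: remove Age and ETag, and replace any Expires with "0".
function rewriteRecordedResponse(headers) {
  const out = withoutHeaders(headers, ["age", "etag", "expires"]);
  out["Expires"] = "0";
  return out;
}

console.log(rewriteRecordedRequest(
  { "Host": "example.com", "If-Modified-Since": "Sat, 29 Oct 1994 19:43:31 GMT" },
  "HTTP/1.0"));
// { Host: 'example.com', Pragma: 'no-cache' }
```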
Replay:
  • First request during browser session (cache is not empty).
    The document is cached; what should be done now?
    Note: Last-Modified = now is an exception.
    Expiration date | Last-Modified | TODO - compare document in cache with document on network: always / session / never
    ----------------+---------------+------------------------------------------------------------------------------------
    past            | past          | Document is stale: conditional request (GET + If-Modified-Since)
    past            | now           | Document is stale: conditional request (GET + If-Modified-Since)
    past            | future        | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date)
    past            | -             | Normal GET
    now             | past          | Document is stale: conditional request (GET + If-Modified-Since)
    now             | now           | Document is stale: conditional request (GET + If-Modified-Since)
    now             | future        | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date)
    now             | -             | Normal GET
    future          | past          | Cache hit
    future          | now           | Cache hit
    future          | future        | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date)
    future          | -             | Cache hit, no request
    -               | past          | Conditional request (GET + If-Modified-Since) / Cache hit
    -               | now           | Conditional request (GET + If-Modified-Since) / Cache hit
    -               | future        | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date)
    -               | -             | Normal GET
  • Subsequent request during browser session (cache is not empty).
    Expiration date | Last-Modified | TODO - compare document in cache with document on network: always / session / never
    ----------------+---------------+------------------------------------------------------------------------------------
    past            | past          | Document is stale: conditional request (GET + If-Modified-Since)
    past            | now           | Document is stale: conditional request (GET + If-Modified-Since)
    past            | future        | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date)
    past            | -             | Normal GET
    now             | past          | Document is stale: conditional request (GET + If-Modified-Since)
    now             | now           | Document is stale: conditional request (GET + If-Modified-Since)
    now             | future        | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date)
    now             | -             | Normal GET
    future          | past          | Cache hit, no request
    future          | now           | Cache hit, no request
    future          | future        | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date)
    -               | past          | Conditional request / Cache hit, no request
  • Response:
    All responses are answered with a 200 OK
    Expiration date | Last-Modified | TODO
    ----------------+---------------+---------------------------------------------
    past            | past          | Do not cache
    past            | now           | Do not cache
    past            | future        | Illegal Last-Modified date - shall be set to Date
    past            | -             | Do not cache
    now             | past          | Do not cache
    now             | now           | Do not cache
    now             | future        | Illegal Last-Modified date - shall be set to Date
    now             | -             | Do not cache
    future          | past          | Cache
    future          | now           | Cache
    future          | future        | Illegal Last-Modified date - shall be set to Date
    future          | -             | Cache
    -               | past          | Cache (ASPs are not cached)
    -               | now           | Cache
    -               | future        | Illegal Last-Modified date - shall be set to Date
    -               | -             | Do not cache
  • Exceptions
    • Expiration header of an .html document set by IIS4 to the past:
      • IE5: the page is not cached and not stored in the Temporary Internet Files, so it cannot be accessed with the Web browser's BACK and FORWARD buttons without sending a request. Using the BACK/FORWARD button, a new request is sent!
      • NC4.7: the page is not cached but is stored in the cache directory. Even so, using the BACK/FORWARD button sends a new request!
    • Magical Date:
      There seems to be a special threshold date for the Expires header: once the expiration date lies beyond it, the request is sent again (e.g. expiration in 2001: no further requests; expiration in 2222: requests are sent).
    • Sample 6:
      There are no requests for already cached documents (cache_rotate6.html and images), but there are requests for the ASP.
      After the ASP has been requested directly (not through the <img> tag) and we switch back to Sample 6, there are no more requests for the ASP.
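The Response table can be encoded row by row as a decision function. The sketch assumes each header date is first classified relative to the response's Date header as "past", "now", "future", or "-" (header absent), following the table's own categories. The function names and the Date value in the usage example are hypothetical. The Expires and Last-Modified values are the ones from Sample 6. The observed exception for ASPs ("ASPs are not cached") is not modelled.

```javascript
// Classifies a header date relative to the response's Date header.
// An unparsable value (or Date header) compares as neither earlier nor
// later and therefore falls through to "now".
function relativeToDate(headerValue, dateHeader) {
  if (headerValue === undefined) return "-";
  const t = Date.parse(headerValue);
  const d = Date.parse(dateHeader);
  if (t < d) return "past";
  if (t > d) return "future";
  return "now";
}

// The Response table, row by row.
function responseAction(expires, lastModified) {
  if (lastModified === "future") return "illegal Last-Modified: set to Date";
  if (expires === "past" || expires === "now") return "do not cache";
  if (expires === "future") return "cache";
  // No Expires header: cache only if Last-Modified is present.
  return lastModified === "-" ? "do not cache" : "cache";
}

// Sample 6 headers with a hypothetical Date of 01 Mar 1999.
const date = "Mon, 01 Mar 1999 12:00:00 GMT";
const exp = relativeToDate("Fri, 01 Jan 2010 12:00:00 GMT", date);
const lm = relativeToDate("Fri, 01 Jan 1999 12:00:00 GMT", date);
console.log(exp, lm, responseAction(exp, lm)); // future past cache
```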
Observations:
Check the test page.


Reference

RFC 1945: Hypertext Transfer Protocol -- HTTP/1.0

...
8.1 GET

...
The semantics of the GET method changes to a "conditional GET" if the request message includes an If-Modified-Since header field. A conditional GET method requests that the identified resource be transferred only if it has been modified since the date given by the If-Modified-Since header, as described in Section 10.9. The conditional GET method is intended to reduce network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring unnecessary data.

...
9.3 Redirection 3xx

...
304 Not Modified

If the client has performed a conditional GET request and access is allowed, but the document has not been modified since the date and time specified in the If-Modified-Since field, the server must respond with this status code and not send an Entity-Body to the client. Header fields contained in the response should only include information which is relevant to cache managers or which may have changed independently of the entity's Last-Modified date. Examples of relevant header fields include: Date, Server, and Expires. A cache should update its cached entity to reflect any new field values given in the 304 response.

...
10.7 Expires

The Expires entity-header field gives the date/time after which the entity should be considered stale. This allows information providers to suggest the volatility of the resource, or a date after which the information may no longer be valid. Applications must not cache this entity beyond the date given. The presence of an Expires field does not imply that the original resource will change or cease to exist at, before, or after that time. However, information providers that know or even suspect that a resource will change by a certain date should include an Expires header with that date. The format is an absolute date and time as defined by HTTP-date in Section 3.3.

Expires = "Expires" ":" HTTP-date
An example of its use is
Expires: Thu, 01 Dec 1994 16:00:00 GMT

If the date given is equal to or earlier than the value of the Date header, the recipient must not cache the enclosed entity. If a resource is dynamic by nature, as is the case with many data-producing processes, entities from that resource should be given an appropriate Expires value which reflects that dynamism.

The Expires field cannot be used to force a user agent to refresh its display or reload a resource; its semantics apply only to caching mechanisms, and such mechanisms need only check a resource's expiration status when a new request for that resource is initiated.

User agents often have history mechanisms, such as "Back" buttons and history lists, which can be used to redisplay an entity retrieved earlier in a session. By default, the Expires field does not apply to history mechanisms. If the entity is still in storage, a history mechanism should display it even if the entity has expired, unless the user has specifically configured the agent to refresh expired history documents.

Note: Applications are encouraged to be tolerant of bad or misinformed implementations of the Expires header. A value of zero (0) or an invalid date format should be considered equivalent to an "expires immediately." Although these values are not legitimate for HTTP/1.0, a robust implementation is always desirable.

...
10.9 If-Modified-Since

The If-Modified-Since request-header field is used with the GET method to make it conditional: if the requested resource has not been modified since the time specified in this field, a copy of the resource will not be returned from the server; instead, a 304 (not modified) response will be returned without any Entity-Body.

If-Modified-Since = "If-Modified-Since" ":" HTTP-date
An example of the field is:
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT

A conditional GET method requests that the identified resource be transferred only if it has been modified since the date given by the If-Modified-Since header. The algorithm for determining this includes the following cases:

  1. If the request would normally result in anything other than a 200 (ok) status, or if the passed If-Modified-Since date is invalid, the response is exactly the same as for a normal GET. A date which is later than the server's current time is invalid.
  2. If the resource has been modified since the If-Modified-Since date, the response is exactly the same as for a normal GET.
  3. If the resource has not been modified since a valid If-Modified-Since date, the server shall return a 304 (not modified) response.
The purpose of this feature is to allow efficient updates of cached information with a minimum amount of transaction overhead.
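The three cases of section 10.9 translate into a small server-side check. This is the 304 behaviour that Sample 5 expected from IIS4 but did not observe. The JavaScript sketch below is an illustration, not part of the RFC text. The function and parameter names are hypothetical, and the sample timestamps are the example dates from this RFC excerpt.

```javascript
// Sketch of the server-side decision in RFC 1945 section 10.9.
// normalStatus is the status a plain GET would produce; lastModifiedMs and
// serverNowMs are millisecond timestamps.
function conditionalGetStatus(ifModifiedSince, lastModifiedMs, serverNowMs, normalStatus = 200) {
  // Case 1: the request would not yield 200 anyway, or there is no date.
  if (normalStatus !== 200 || ifModifiedSince === undefined) return normalStatus;
  const sinceMs = Date.parse(ifModifiedSince);
  // Case 1 (cont.): an invalid date, or one later than the server's
  // current time, gives the same response as a normal GET.
  if (Number.isNaN(sinceMs) || sinceMs > serverNowMs) return normalStatus;

  // HTTP-dates have one-second resolution, so compare whole seconds.
  const lastModifiedSec = Math.floor(lastModifiedMs / 1000);
  const sinceSec = Math.floor(sinceMs / 1000);

  // Case 2: modified since the given date -> normal response.
  if (lastModifiedSec > sinceSec) return 200;
  // Case 3: not modified -> 304 without an Entity-Body.
  return 304;
}

const lastModified = Date.parse("Sat, 29 Oct 1994 19:43:31 GMT");
const now = Date.parse("Tue, 15 Nov 1994 12:45:26 GMT");
console.log(conditionalGetStatus("Sat, 29 Oct 1994 19:43:31 GMT", lastModified, now)); // 304
console.log(conditionalGetStatus("Fri, 28 Oct 1994 00:00:00 GMT", lastModified, now)); // 200
console.log(conditionalGetStatus("Thu, 01 Dec 1994 16:00:00 GMT", lastModified, now)); // 200 (future date is invalid)
```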

10.10 Last-Modified

The Last-Modified entity-header field indicates the date and time at which the sender believes the resource was last modified. The exact semantics of this field are defined in terms of how the recipient should interpret it: if the recipient has a copy of this resource which is older than the date given by the Last-Modified field, that copy should be considered stale.

Last-Modified = "Last-Modified" ":" HTTP-date
An example of its use is
Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT

The exact meaning of this header field depends on the implementation of the sender and the nature of the original resource. For files, it may be just the file system last-modified time. For entities with dynamically included parts, it may be the most recent of the set of last-modify times for its component parts. For database gateways, it may be the last-update timestamp of the record. For virtual objects, it may be the last time the internal state changed.

An origin server must not send a Last-Modified date which is later than the server's time of message origination. In such cases, where the resource's last modification would indicate some time in the future, the server must replace that date with the message origination date.

10.12 Pragma

The Pragma general-header field is used to include implementation-specific directives that may apply to any recipient along the request/response chain. All pragma directives specify optional behavior from the viewpoint of the protocol; however, some systems may require that behavior be consistent with the directives.

Pragma  =  "Pragma" ":" 1#pragma-directive
  
pragma-directive  =  "no-cache" | extension-pragma
extension-pragma  =  token [ "=" word ]

When the "no-cache" directive is present in a request message, an application should forward the request toward the origin server even if it has a cached copy of what is being requested. This allows a client to insist upon receiving an authoritative response to its request. It also allows a client to refresh a cached copy which is known to be corrupted or stale.

Pragma directives must be passed through by a proxy or gateway application, regardless of their significance to that application, since the directives may be applicable to all recipients along the request/response chain. It is not possible to specify a pragma for a specific recipient; however, any pragma directive not relevant to a recipient should be ignored by that recipient.


Date and Time Specifications