Caching
Overview
NOTE: IE5 and NC4.7 behave differently when the reload button is pressed: NC simply adds a Pragma: no-cache header to the request. For this reason, every sample page has a link to itself so that the reload button can be avoided.
Sample 1: This sample uses ASP's Ad Rotator component which automates the rotation of documents, in this case images, on a Web page. Because of the dynamic type of this document, there is no Last-Modified entity header from the Web server for cache_rotate1.asp. The rotating gif (one of four) inside is static content and therefore gets the Last-Modified entity header.
Accessing this page several times shows the following observations:
- SHALL (HTTP/1.0):
- GET request for cache_rotate1.asp
- no Last-Modified entity therefore no caching of this document
- GET request for a cache_x.gif, where x = 1..4
- Last-Modified entity and therefore caching of this document, Conditional GET on every following request for this document
- IS (IE5)- "Check for newer versions of stored pages : Every visit to the page":
- Everything works as expected.
Interestingly, cache_rotate1.asp also appears in the Temporary Internet Files, although under HTTP/1.0 rules there is no need to put it into the cache: every request for that document is a normal GET followed by a 200 OK.
The reason it is there is that Web browsers handle the FORWARD/BACK buttons as cache hits.
- IS (IE5)- "Check for newer versions of stored pages: Every time you start Internet Explorer":
- Everything works as expected, except there is no request for already cached images.
- IS (NC4.7)- "Document in cache is compared to document on network: Every time":
- GET request for cache_rotate1.asp
- no Last-Modified entity therefore no caching of this document
- GET request for a cache_x.gif, where x = 1..4
- Last-Modified entity and therefore caching of this document
- no further requests for already cached documents (=images),
so after the main document (cache_rotate1.asp) and the gifs are cached, further accesses to cache_rotate1.asp lead only to requests for cache_rotate1.asp but not for the images.
- Note that cache_rotate1.asp (here "M0v8uj4b.asp") is in the cache directory although the HTTP/1.0 rules do not require it. Again, this is because Web browsers handle the FORWARD/BACK buttons as cache hits.
- IS (NC4.7)- "Document in cache is compared to document on network: Once per session":
- There is absolutely no difference compared with the "Every time" setting.
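The expectation labeled SHALL (HTTP/1.0) above can be sketched in a few lines. This is only an illustration of the rule "Last-Modified present: cache and revalidate; Last-Modified absent: do not cache". The cache is assumed to be a plain Map keyed by URL, and the names buildRequest and storeResponse are hypothetical, not part of any browser.

```javascript
// Hypothetical cache entry shape: { body, lastModified }, keyed by URL.

// Builds the request a strict HTTP/1.0 client would send for `url`.
function buildRequest(cache, url) {
  const entry = cache.get(url);
  const headers = {};
  if (entry && entry.lastModified) {
    // Cached with a validator: revalidate with a conditional GET.
    headers["If-Modified-Since"] = entry.lastModified;
  }
  return { method: "GET", url: url, headers: headers };
}

// Stores a 200 response only when it carries a Last-Modified header.
function storeResponse(cache, url, response) {
  const lastModified = response.headers["Last-Modified"];
  if (response.status === 200 && lastModified) {
    cache.set(url, { body: response.body, lastModified: lastModified });
  }
}
```

With this sketch, cache_rotate1.asp (no Last-Modified) is fetched with a normal GET every time, while a cached cache_x.gif is requested with If-Modified-Since.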
Sample 2: This sample uses ASP and VBScript to generate random rotating images. The image is displayed twice on this page to prove the correct handling of chunked encoding. The URL of the two images is constructed in the following way:
<% Randomize %>
<% nr=Int((4 * Rnd) + 1) %>
<% reqimg="cache_" & nr & ".gif" %>
<img src=<%=reqimg%>>
<img src="cache_<%=nr%>.gif">
Because of the dynamic type of this document, there is no Last-Modified entity header from the Web server for cache_rotate2.asp. The first rotating gif inside is static content and therefore gets the Last-Modified entity header. THERE IS NO DEFINITION ON HOW TO HANDLE ANOTHER REQUEST TO THE SAME RESOURCE INSIDE ONE DOCUMENT.
Accessing this page several times shows the following observations:
- SHALL (HTTP/1.0):
- GET request for cache_rotate2.asp
- no Last-Modified entity therefore no caching of this document
- GET request for the first cache_x.gif, where x = 1..4
- Last-Modified entity and therefore caching of this document, Conditional GET on every following request for this document
- THERE IS NO DEFINITION ON HOW TO HANDLE THE REQUEST FOR THE SECOND IMAGE!
- IS (IE5)- "Check for newer versions of stored pages : Every visit to the page":
- Everything works as in Sample 1 (IE5), except there is NO REQUEST FOR THE SECOND IMAGE!
- IS (IE5)- "Check for newer versions of stored pages: Every time you start Internet Explorer":
- There are no requests for already cached images.
- IS (NC4.7)- "Document in cache is compared to document on network: Every time":
- Everything works as in Sample 1 (NC4.7), except there is NO REQUEST FOR THE SECOND IMAGE!
So there are no requests for already cached images.
- IS (NC4.7)- "Document in cache is compared to document on network: Once per session":
- There is absolutely no difference between the "Every time" and "Once per session" setting.
No requests for already cached images.
Sample 3: This sample uses JavaScript to generate the random rotating images. NOTE that the main document (cache_rotate3.html) is now static html; the dynamic behavior (=client-side JavaScript) is done by the client (=browser). The URL of the image is constructed in the following way:
<script language="javascript">
var nr = Math.floor((4*Math.random())+1);
document.write(' <img src="cache_'+nr+'.gif">');
</script>
Accessing this page several times shows the following observations:
- SHALL (HTTP/1.0):
- GET request for cache_rotate3.html
- Last-Modified entity therefore caching of this document
- GET request for the first cache_x.gif, where x = 1..4
- Last-Modified entity and therefore caching of this document, Conditional GET on every following request for this document
- Conditional GET for every following request for cache_rotate3.html
- IS (IE5)- "Check for newer versions of stored pages : Every visit to the page":
- Everything works as expected.
- IS (IE5)- "Check for newer versions of stored pages: Every time you start Internet Explorer":
- No request for already cached documents. So if cache is cleared and you access this document for the first time, there is a request for cache_rotate3.html and the random image. On the second access to this document there is no request for cache_rotate3.html and only a request for the image if it is not in the cache already.
- IS (NC4.7)- "Document in cache is compared to document on network: Every time":
- Everything works as expected except there is no further request for already cached images.
- IS (NC4.7)- "Document in cache is compared to document on network: Once per session":
- No request for already cached documents. So if cache is cleared and you access this document for the first time, there is a request for cache_rotate3.html and the random image. On the second access to this document there is no more request for cache_rotate3.html and only a request for the image if it is not in the cache already.
Sample 4: This sample uses JavaScript to generate the random rotating images. NOTE that the main document (cache_rotate4.html) is static html, the dynamic behavior(=client-side JavaScript) is done by the client(=browser).
One of the random URLs is now an ASP which sets its expiration date one year in the past. The URL of the image is constructed in the following way:
<script language="javascript">
var nr = Math.floor((5*Math.random())+1);
var imgsrc;
switch(nr)
{
case 1:
imgsrc = "cache_1.gif";
break;
case 2:
imgsrc = "cache_2.gif";
break;
case 3:
imgsrc = "cache_3.gif";
break;
case 4:
imgsrc = "cache_4.gif";
break;
default:
imgsrc = "cacheresponse.asp?expires_sign=-&expires_number=1&expires_timeunit=yyyy&lastmodified_sign=%2B&lastmodified_number=0&lastmodified_timeunit=n&gmt_sign=-&gmt_number=2&Get-Button=Submit+with+GET";
break;
}
document.write('<img src="'+imgsrc+'">');
</script>
Accessing this page several times shows the following observations:
- SHALL (HTTP/1.0):
- GET request for cache_rotate4.html
- Last-Modified entity therefore caching of this document
- GET request for the first image
- Response
- for the gifs: Last-Modified entity and therefore caching of this document,
Conditional GET on every following request for this document
- for the ASP: no Last-Modified entry and therefore no caching
- Conditional GET for every following request for cache_rotate4.html
- IS (IE5)- "Check for newer versions of stored pages : Every visit to the page":
- Everything works as expected.
- IS (IE5)- "Check for newer versions of stored pages: Every time you start Internet Explorer":
- No requests for already cached documents; the ASP is not cached and is therefore requested again.
- IS (NC4.7)- "Document in cache is compared to document on network: Every time":
- Everything works as expected except there is no further request for already cached images.
- IS (NC4.7)- "Document in cache is compared to document on network: Once per session":
- No requests for already cached documents; the ASP is not cached and is therefore requested again.
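The query string passed to cacheresponse.asp in Sample 4 suggests how such a page might derive its headers from sign, number and time-unit parameters. The following is a sketch under stated assumptions, not the actual ASP source (which is not shown here): the unit codes are assumed to follow VBScript's DateAdd ("yyyy" for years, "n" for minutes), the gmt_* parameters are ignored because their meaning is not documented, and the function names are hypothetical.

```javascript
// Sketch only: the real cacheresponse.asp source is not shown in this document.
// Shifts `base` by a signed amount; unit codes are assumed to follow
// VBScript DateAdd ("yyyy" = years, "n" = minutes).
function shiftDate(base, sign, number, unit) {
  const shifted = new Date(base.getTime());
  const delta = (sign === "-" ? -1 : 1) * Number(number);
  if (unit === "yyyy") {
    shifted.setUTCFullYear(shifted.getUTCFullYear() + delta);
  } else if (unit === "n") {
    shifted.setUTCMinutes(shifted.getUTCMinutes() + delta);
  } else {
    throw new Error("unsupported time unit: " + unit);
  }
  return shifted;
}

// Computes Expires and Last-Modified values relative to `now`.
function headersFromQuery(query, now) {
  const p = new URLSearchParams(query);
  const expires = shiftDate(now, p.get("expires_sign"),
    p.get("expires_number"), p.get("expires_timeunit"));
  const lastModified = shiftDate(now, p.get("lastmodified_sign"),
    p.get("lastmodified_number"), p.get("lastmodified_timeunit"));
  return {
    "Expires": expires.toUTCString(),
    "Last-Modified": lastModified.toUTCString()
  };
}
```

For the Sample 4 URL this yields an Expires value one year before `now` and a Last-Modified value equal to `now`, which matches the description "sets its expiration date one year in the past".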
Short summary: documents which are generated by server-side scripting such as ASP don't get a Last-Modified entity header from the Web server (IIS4) by default.
In the next examples the Last-Modified entity header is set by the ASP itself.
Sample 5: This sample allows you to set the Last-Modified and the Expires entity header. The first document brings up a form where you can decide between a GET or a POST request.
Accessing this page several times shows the following observations:
- IS (IE5 and NC4.7) - check cache every time:
- Although both headers are set on the first request and the second request is a Conditional GET (if the first request was a GET), the server always responds with a 200 OK and not with a 304 Not Modified as expected.
So we can assume that the Web server (IIS4) has its own Last-Modified date when using dynamic documents such as ASPs.
- The odd thing with this example is that when you set the expiration date to the future and the Last-Modified date to the past, the first request is fine but the following requests are cache hits!
There seems to be a special date for the expiration header: once it is passed, the request is sent again (e.g. 2001 - no further requests, 2222 - requests).
- IS (IE5 and NC4.7) - check cache every session:
- No requests on already cached documents if
- Last-Modified date is set to the past
- Expiration date is in the future
- Last-Modified date is set to the past and Expiration date is in the near future
- Requests are sent if
- none of the two headers are set
- Expiration date is in the past
- Expiration date is too far in the future
Sample 6: Same as Sample 4 except that the dynamic random URL is now the ASP from Sample 5, the Last-Modified date set to Fri, 01 Jan 1999 12:00:00 GMT and the Expiration date to Fri, 01 Jan 2010 12:00:00 GMT.
What now follows are really unexpected observations:
- IE5 - check cache every time
There are conditional requests for every cached document (cache_rotate6.html and images) but there is no second request for the ASP.
- IE5 - check cache every session
There are no requests for already cached documents (cache_rotate6.html and images) but there are requests for the ASP.
When we access the ASP by sending the request directly and not inside the <img> tag and switch back to sample6 there are no more requests for the ASP.
- NC4.7 - check cache every time
There are conditional requests for every cached document (cache_rotate6.html and images) and normal requests for the ASP.
- NC4.7 - check cache every session
There are no requests for already cached documents (cache_rotate6.html and images) but there are requests for the ASP.
When we access the ASP by sending the request directly and not inside the <img> tag and switch back to sample6 there are no more requests for the ASP.
Requirements for SilkPerformer 3.5
Record:Replay:
- Headers to remove during recording (Page-based Browser-level API):
- from the request:
- If-Modified-Since
- If-Match
- from the response:
- Age
- ETag
- Headers to add during recording (Page-based Browser-level API):
- to the request:
- HTTP/1.0 - pragma: no-cache
- HTTP/1.1 - cache-control: no-cache
- Headers to change during recording (Page-based Browser-level API):
- at the response
- expires: set expiration date to 0 (expires immediately)
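The header rules listed above can be sketched as three small functions. This is an illustration only and does not use any SilkPerformer API: headers are assumed to be plain objects mapping names to values, and the function names are hypothetical.

```javascript
// Returns a copy of `headers` without the named fields.
// Header names are compared case-insensitively, as HTTP requires.
function withoutHeaders(headers, names) {
  const drop = new Set(names.map((n) => n.toLowerCase()));
  const out = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!drop.has(name.toLowerCase())) out[name] = value;
  }
  return out;
}

// Request side: remove the validators, then force a no-cache directive.
function rewriteRequest(headers, httpVersion) {
  const added = httpVersion === "1.0" ? "Pragma" : "Cache-Control";
  const out = withoutHeaders(headers, ["If-Modified-Since", "If-Match", added]);
  out[added] = "no-cache";
  return out;
}

// Response side: remove Age and ETag, and make the entity expire immediately.
function rewriteResponse(headers) {
  const out = withoutHeaders(headers, ["Age", "ETag", "Expires"]);
  out["Expires"] = "0";
  return out;
}
```

Working on copies keeps the recorded traffic unchanged, so the original and the rewritten headers can be compared side by side.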
Observations:
- First request during browser session (cache is not empty).
The document is cached; what to do now?
Last-Modified "now" is an exception.

| Expiration date | Last-Modified | TODO - compare document in cache with document on network: always / session / never |
|---|---|---|
| past | past | Document is stale: conditional request (GET + If-Modified-Since) |
| past | now | Document is stale: conditional request (GET + If-Modified-Since) |
| past | future | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date) |
| past | - | Normal GET |
| now | past | Document is stale: conditional request (GET + If-Modified-Since) |
| now | now | Document is stale: conditional request (GET + If-Modified-Since) |
| now | future | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date) |
| now | - | Normal GET |
| future | past | Cache hit |
| future | now | Cache hit |
| future | future | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date) |
| future | - | Cache hit, no request |
| - | past | Conditional request (GET + If-Modified-Since) / Cache hit |
| - | now | Conditional request (GET + If-Modified-Since) / Cache hit |
| - | future | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date) |
| - | - | Normal GET |

- Subsequent request during browser session (cache is not empty).

| Expiration date | Last-Modified | TODO - compare document in cache with document on network: always / session / never |
|---|---|---|
| past | past | Document is stale: conditional request (GET + If-Modified-Since) |
| past | now | Document is stale: conditional request (GET + If-Modified-Since) |
| past | future | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date) |
| past | - | Normal GET |
| now | past | Document is stale: conditional request (GET + If-Modified-Since) |
| now | now | Document is stale: conditional request (GET + If-Modified-Since) |
| now | future | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date) |
| now | - | Normal GET |
| future | past | Cache hit, no request |
| future | now | Cache hit, no request |
| future | future | Illegal Last-Modified date - such a document MUST NOT be sent from a server (shall be set to Date) |
| - | past | Conditional request / Cache hit, no request |

- Response:
All responses are answered with a 200 OK.

| Expiration date | Last-Modified | TODO |
|---|---|---|
| past | past | Do not cache |
| past | now | Do not cache |
| past | future | Illegal Last-Modified date - shall be set to Date |
| past | - | Do not cache |
| now | past | Do not cache |
| now | now | Do not cache |
| now | future | Illegal Last-Modified date - shall be set to Date |
| now | - | Do not cache |
| future | past | Cache |
| future | now | Cache |
| future | future | Illegal Last-Modified date - shall be set to Date |
| future | - | Cache |
| - | past | Cache (ASPs are not cached) |
| - | now | Cache |
| - | future | Illegal Last-Modified date - shall be set to Date |
| - | - | Do not cache |

- Exceptions
- Expiration header of an .html document set by IIS4 to the past:
- IE5: the page is not cached and not stored in the Temporary Internet Files, so it can't be accessed using the Web browser's BACK and FORWARD buttons without sending a request. Using the BACK/FORWARD button, a new request is sent!
- NC4.7: the page is not cached but is stored in the Cache directory. Even though it is stored, a new request is sent when the BACK/FORWARD button is used!
- Magical Date:
There seems to be a special date for the expiration header: once it is passed, the request is sent again (e.g. 2001 - no further requests, 2222 - requests).
- Sample 6:
There are no requests for already cached documents (cache_rotate6.html and images) but there are requests for the ASP.
When we access the ASP by sending the request directly and not inside the <img> tag and switch back to sample6 there are no more requests for the ASP.
Check the test page.
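The Response table in the observations above reduces to a small decision function. The sketch below assumes that each header has already been classified relative to the response Date as "past", "now" or "future", with null meaning the header is absent. The function name and the returned strings are hypothetical, and the exceptions listed above (ASPs, the "magical date") are not modeled.

```javascript
// expires, lastModified: "past" | "now" | "future" | null (header absent).
// Mirrors the Response table row by row.
function cacheDecision(expires, lastModified) {
  if (lastModified === "future") {
    // A Last-Modified date in the future is illegal in every row.
    return "illegal Last-Modified";
  }
  if (expires === "future") {
    return "cache";
  }
  if (expires === "past" || expires === "now") {
    return "do not cache";
  }
  // No Expires header: cache only when a Last-Modified header is present.
  return lastModified === null ? "do not cache" : "cache";
}
```

Checking Last-Modified first keeps the three "illegal" rows in one place, so the remaining branches depend on the expiration date alone.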
RFC 1945: Hypertext Transfer Protocol -- HTTP/1.0
...
8.1 GET
...
The semantics of the GET method changes to a "conditional GET" if the request message includes an If-Modified-Since header field. A conditional GET method requests that the identified resource be transferred only if it has been modified since the date given by the If-Modified-Since header, as described in Section 10.9. The conditional GET method is intended to reduce network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring unnecessary data.
...
304 Not Modified
If the client has performed a conditional GET request and access is allowed, but the document has not been modified since the date and time specified in the If-Modified-Since field, the server must respond with this status code and not send an Entity-Body to the client. Header fields contained in the response should only include information which is relevant to cache managers or which may have changed independently of the entity's Last-Modified date. Examples of relevant header fields include: Date, Server, and Expires. A cache should update its cached entity to reflect any new field values given in the 304 response.
...
10.7 Expires
The Expires entity-header field gives the date/time after which the entity should be considered stale. This allows information providers to suggest the volatility of the resource, or a date after which the information may no longer be valid. Applications must not cache this entity beyond the date given. The presence of an Expires field does not imply that the original resource will change or cease to exist at, before, or after that time. However, information providers that know or even suspect that a resource will change by a certain date should include an Expires header with that date. The format is an absolute date and time as defined by HTTP-date in Section 3.3.
    Expires = "Expires" ":" HTTP-date
An example of its use is
    Expires: Thu, 01 Dec 1994 16:00:00 GMT
If the date given is equal to or earlier than the value of the Date header, the recipient must not cache the enclosed entity. If a resource is dynamic by nature, as is the case with many data-producing processes, entities from that resource should be given an appropriate Expires value which reflects that dynamism.
The Expires field cannot be used to force a user agent to refresh its display or reload a resource; its semantics apply only to caching mechanisms, and such mechanisms need only check a resource's expiration status when a new request for that resource is initiated.
User agents often have history mechanisms, such as "Back" buttons and history lists, which can be used to redisplay an entity retrieved earlier in a session. By default, the Expires field does not apply to history mechanisms. If the entity is still in storage, a history mechanism should display it even if the entity has expired, unless the user has specifically configured the agent to refresh expired history documents.
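The Expires rule quoted above ("equal to or earlier than the value of the Date header") can be sketched as a single check. This is an illustration under assumptions: only the RFC 1123 date form is accepted, anything else (including "0") is treated as "expires immediately" in line with the tolerance RFC 1945 recommends, and the function names are hypothetical.

```javascript
// Accepts only the fixed-length form, e.g. "Thu, 01 Dec 1994 16:00:00 GMT".
// Returns a millisecond timestamp, or NaN for anything else.
function parseHttpDate(value) {
  const pattern = /^[A-Z][a-z]{2}, \d{2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2} GMT$/;
  if (typeof value !== "string" || !pattern.test(value)) return NaN;
  return Date.parse(value);
}

// Returns true when the Expires rule allows a cache to store the entity.
function mayCache(expiresHeader, dateHeader) {
  if (expiresHeader === undefined) {
    return true; // no Expires header: this rule does not forbid caching
  }
  const expires = parseHttpDate(expiresHeader);
  const date = parseHttpDate(dateHeader);
  if (Number.isNaN(expires)) {
    return false; // "0" or an invalid date: treat as expired immediately
  }
  // Equal to or earlier than Date means "must not cache".
  return expires > date;
}
```

The explicit pattern check matters because a lenient parser may accept a bare "0" as a valid date instead of rejecting it.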
Note: Applications are encouraged to be tolerant of bad or misinformed implementations of the Expires header. A value of zero (0) or an invalid date format should be considered equivalent to an "expires immediately." Although these values are not legitimate for HTTP/1.0, a robust implementation is always desirable.
10.9 If-Modified-Since
The If-Modified-Since request-header field is used with the GET method to make it conditional: if the requested resource has not been modified since the time specified in this field, a copy of the resource will not be returned from the server; instead, a 304 (not modified) response will be returned without any Entity-Body.
    If-Modified-Since = "If-Modified-Since" ":" HTTP-date
An example of the field is:
    If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
A conditional GET method requests that the identified resource be transferred only if it has been modified since the date given by the If-Modified-Since header. The algorithm for determining this includes the following cases:
- If the request would normally result in anything other than a 200 (ok) status, or if the passed If-Modified-Since date is invalid, the response is exactly the same as for a normal GET. A date which is later than the server's current time is invalid.
- If the resource has been modified since the If-Modified-Since date, the response is exactly the same as for a normal GET.
- If the resource has not been modified since a valid If-Modified-Since date, the server shall return a 304 (not modified) response.
The purpose of this feature is to allow efficient updates of cached information with a minimum amount of transaction overhead.
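The three cases of the conditional GET algorithm above map directly onto a status-code decision. The following is a minimal sketch, assuming all dates are already parsed into millisecond timestamps (NaN standing for an unparsable date); the function name and parameter names are hypothetical.

```javascript
// normalStatus: status a plain GET would produce for this request.
// ifModifiedSince, lastModified, serverNow: millisecond timestamps;
// ifModifiedSince is NaN when the header could not be parsed.
function conditionalGetStatus(normalStatus, ifModifiedSince, lastModified, serverNow) {
  // Case 1: anything other than 200 is returned exactly as for a normal GET.
  if (normalStatus !== 200) return normalStatus;
  // Case 1 (continued): an invalid date, or one later than the
  // server's current time, is ignored.
  if (Number.isNaN(ifModifiedSince) || ifModifiedSince > serverNow) return 200;
  // Case 2: the resource was modified after the given date.
  if (lastModified > ifModifiedSince) return 200;
  // Case 3: not modified since a valid date.
  return 304;
}
```

In Sample 5 the server answered 200 OK even though the third case appeared to apply, which is why the text concludes that IIS4 keeps its own Last-Modified date for ASPs.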
10.10 Last-Modified
The Last-Modified entity-header field indicates the date and time at which the sender believes the resource was last modified. The exact semantics of this field are defined in terms of how the recipient should interpret it: if the recipient has a copy of this resource which is older than the date given by the Last-Modified field, that copy should be considered stale.
    Last-Modified = "Last-Modified" ":" HTTP-date
An example of its use is
    Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
The exact meaning of this header field depends on the implementation of the sender and the nature of the original resource. For files, it may be just the file system last-modified time. For entities with dynamically included parts, it may be the most recent of the set of last-modify times for its component parts. For database gateways, it may be the last-update timestamp of the record. For virtual objects, it may be the last time the internal state changed.
An origin server must not send a Last-Modified date which is later than the server's time of message origination. In such cases, where the resource's last modification would indicate some time in the future, the server must replace that date with the message origination date.
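The rule above, which the observation tables call "shall be set to Date", amounts to clamping the timestamp. A minimal sketch, assuming both times are available as JavaScript Date objects; the function name is hypothetical.

```javascript
// Returns the Last-Modified header value an origin server may send:
// the resource's modification time, but never later than the
// message origination time.
function lastModifiedHeader(resourceModified, messageOrigination) {
  const clamped = Math.min(resourceModified.getTime(), messageOrigination.getTime());
  return new Date(clamped).toUTCString();
}
```

A resource stamped in the year 2222 would therefore be announced with the current Date value instead.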
10.12 Pragma
The Pragma general-header field is used to include implementation-specific directives that may apply to any recipient along the request/response chain. All pragma directives specify optional behavior from the viewpoint of the protocol; however, some systems may require that behavior be consistent with the directives.
    Pragma           = "Pragma" ":" 1#pragma-directive
    pragma-directive = "no-cache" | extension-pragma
    extension-pragma = token [ "=" word ]
When the "no-cache" directive is present in a request message, an application should forward the request toward the origin server even if it has a cached copy of what is being requested. This allows a client to insist upon receiving an authoritative response to its request. It also allows a client to refresh a cached copy which is known to be corrupted or stale.
Pragma directives must be passed through by a proxy or gateway application, regardless of their significance to that application, since the directives may be applicable to all recipients along the request/response chain. It is not possible to specify a pragma for a specific recipient; however, any pragma directive not relevant to a recipient should be ignored by that recipient.
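Following the Pragma grammar above, a recipient can split the field value into directives and look for "no-cache". This sketch makes two simplifying assumptions: directives are split on commas without handling quoted strings that contain commas, and directive names are compared case-insensitively. The function names are hypothetical.

```javascript
// Splits a Pragma field value into { name, value } directives.
// Simplification: quoted strings containing commas are not handled.
function parsePragma(fieldValue) {
  return fieldValue
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const eq = part.indexOf("=");
      if (eq === -1) return { name: part.toLowerCase(), value: null };
      return {
        name: part.slice(0, eq).trim().toLowerCase(),
        value: part.slice(eq + 1).trim()
      };
    });
}

// True when the request asks to bypass cached copies.
// Unknown directives are ignored, not rejected.
function hasNoCache(fieldValue) {
  return parsePragma(fieldValue).some((d) => d.name === "no-cache");
}
```

This is the check that the NC4.7 reload button triggers, since that browser adds Pragma: no-cache to the request.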
- RFC 822: Standard for the Format of Arpa Internet Text Messages
    date-time = [ day "," ] date time      ; dd mm yy
                                           ;  hh:mm:ss zzz
    day       = "Mon" / "Tue" / "Wed" / "Thu"
              / "Fri" / "Sat" / "Sun"
    date      = 1*2DIGIT month 2DIGIT      ; day month year
                                           ;  e.g. 20 Jun 82
    month     = "Jan" / "Feb" / "Mar" / "Apr"
              / "May" / "Jun" / "Jul" / "Aug"
              / "Sep" / "Oct" / "Nov" / "Dec"
    time      = hour zone                  ; ANSI and Military
    hour      = 2DIGIT ":" 2DIGIT [":" 2DIGIT]
                                           ; 00:00:00 - 23:59:59
    zone      = "UT" / "GMT"               ; Universal Time
                                           ; North American : UT
              / "EST" / "EDT"              ;  Eastern:  - 5/ - 4
              / "CST" / "CDT"              ;  Central:  - 6/ - 5
              / "MST" / "MDT"              ;  Mountain: - 7/ - 6
              / "PST" / "PDT"              ;  Pacific:  - 8/ - 7
              / 1ALPHA                     ; Military: Z = UT;
                                           ;  A:-1; (J not used)
                                           ;  M:-12; N:+1; Y:+12
              / ( ("+" / "-") 4DIGIT )     ; Local differential
- RFC 850: Standard for Interchange of USENET Messages
2.1.4 Date
The Date line (formerly "Posted") is the date, in a format that must be acceptable both to the ARPANET and to the getdate routine, that the article was originally posted to the network. This date remains unchanged as the article is propagated throughout the network. One format that is acceptable to both is
    Weekday, DD-Mon-YY HH:MM:SS TIMEZONE
Several examples of valid dates appear in the sample article above. Note in particular that ctime format:
    Wdy Mon DD HH:MM:SS YYYY
is not acceptable because it is not a valid ARPANET date. However, since older software still generates this format, news implementations are encouraged to accept this format and translate it into an acceptable format.
- RFC 1036: Standard for Interchange of USENET Messages (obsoletes: RFC 850)
2.1.2. Date
The "Date" line (formerly "Posted") is the date that the message was originally posted to the network. Its format must be acceptable both in RFC-822 and to the getdate(3) routine that is provided with the Usenet software. This date remains unchanged as the message is propagated throughout the network. One format that is acceptable to both is:
    Wdy, DD Mon YY HH:MM:SS TIMEZONE
Several examples of valid dates appear in the sample message above. Note in particular that ctime(3) format:
    Wdy Mon DD HH:MM:SS YYYY
is not acceptable because it is not a valid RFC-822 date. However, since older software still generates this format, news implementations are encouraged to accept this format and translate it into an acceptable format.
There is no hope of having a complete list of timezones. Universal Time (GMT), the North American timezones (PST, PDT, MST, MDT, CST, CDT, EST, EDT) and the +/-hhmm offset specified in RFC-822 should be supported. It is recommended that times in message headers be transmitted in GMT and displayed in the local time zone.
- RFC 1123: Requirements for Internet Hosts -- Application and Support (updates: RFC 822)
5.2.14 RFC-822 Date and Time Specification: RFC-822 Section 5
The syntax for the date is hereby changed to:
    date = 1*2DIGIT month 2*4DIGIT
All mail software SHOULD use 4-digit years in dates, to ease the transition to the next century.
There is a strong trend towards the use of numeric timezone indicators, and implementations SHOULD use numeric timezones instead of timezone names. However, all implementations MUST accept either notation. If timezone names are used, they MUST be exactly as defined in RFC-822.
The military time zones are specified incorrectly in RFC-822: they count the wrong way from UT (the signs are reversed). As a result, military time zones in RFC-822 headers carry no information.
Finally, note that there is a typo in the definition of "zone" in the syntax summary of appendix D; the correct definition occurs in Section 3 of RFC-822.
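The date examples quoted from RFC 1945 (such as Tue, 15 Nov 1994 12:45:26 GMT) use the RFC 822 layout with the 4-digit year that RFC 1123 asks for. The following sketch produces that layout explicitly. It is an illustration only: the function name is hypothetical, and the zone is fixed to GMT because that is what the HTTP header examples in this document use.

```javascript
const DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Left-pads a number with zeros to the given width.
function pad(n, width) {
  return String(n).padStart(width, "0");
}

// Formats a Date as "Wdy, DD Mon YYYY HH:MM:SS GMT".
function formatHttpDate(date) {
  return DAY_NAMES[date.getUTCDay()] + ", " +
    pad(date.getUTCDate(), 2) + " " +
    MONTH_NAMES[date.getUTCMonth()] + " " +
    pad(date.getUTCFullYear(), 4) + " " +
    pad(date.getUTCHours(), 2) + ":" +
    pad(date.getUTCMinutes(), 2) + ":" +
    pad(date.getUTCSeconds(), 2) + " GMT";
}
```

For example, formatHttpDate(new Date(Date.UTC(1994, 10, 15, 12, 45, 26))) gives the Last-Modified value from the RFC 1945 example.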