HTTP Techniques
= HTTP Connections =
 
* With HTTP 1.1, persistent connections are used, which means that the same TCP connection is reused over and over again for successive requests. This is fairly different from pipelining, which is also possible but '''disabled by default on most browsers.''' Pipelining means writing several requests to a persistent connection without waiting for the replies. Note that the RFC states that the server must reply in the same order as the requests (a minimal socket sketch is given after the CSS example below).
 
* Most browsers limit the number of connections / persistent connections that can be opened to a given server. Firefox, for instance, has a default limit of 15 connections. Note that servers usually don't limit this.
 
* To artificially slow down a request served by a web server, the simplest trick is to simulate dynamic content (for instance with a PHP script) and make the programming language used to generate the dynamic content sleep. For instance, to force the web server to wait 3 seconds before serving a CSS file:
<pre>
<?php
header("Content-Type: text/css");
sleep(3);
?>
 
body { background-color: red; }
</pre>
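As an illustration of the persistent connection and pipelining point above, here is a minimal PHP sketch that writes two requests on a single TCP connection before reading any reply. The host is just a placeholder (reusing the www.test.net host from the httperf example below); since browsers keep pipelining disabled by default, a raw socket is the easiest way to observe the behavior by hand.

<pre>
<?php
// Open one TCP connection and write two HTTP requests back to back
// (pipelining) before reading anything. The host is hypothetical.
$host = "www.test.net";
$fp = fsockopen($host, 80, $errno, $errstr, 10);
if (!$fp) {
    die("Connection failed: $errstr ($errno)\n");
}

// The first request keeps the connection alive; the second asks the server
// to close it, so that the read loop below terminates.
fwrite($fp, "GET / HTTP/1.1\r\nHost: $host\r\nConnection: keep-alive\r\n\r\n");
fwrite($fp, "GET / HTTP/1.1\r\nHost: $host\r\nConnection: close\r\n\r\n");

// Per the RFC, the replies must come back in the same order as the requests.
while (!feof($fp)) {
    echo fgets($fp);
}
fclose($fp);
?>
</pre>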
 
= Cache-control =
 
== Totally disabling cache ==
 
* A radical solution to force a browser refetch in all cases is to append a unique random key as a GET parameter to the Ajax call.
 
* Otherwise, the normal solution is to use the Cache-Control directive in the HTTP headers. The following values should be enough to disable caching (a PHP sketch follows at the end of this list):
 
Cache-control "no-cache, no-store, max-age=0, must-revalidate"
 
* I don't know exactly what the "private" value in Cache-Control does; according to the RFC it marks the response as intended for a single user, so shared caches (proxies) must not store it.
 
* Technically, an "Expires" header with any value other than a date is not valid. You can however set the date in the past to force a refetch:
 
Expires "Fri, 01 Jan 1990 00:00:00 GMT"
 
* Note that Pragma: no-cache is a request header defined in HTTP/1.0; sending it in a response is not a valid way to disable caching.
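As a concrete illustration, here is a minimal PHP sketch that sends the headers listed above before producing any output:

<pre>
<?php
// Send the cache-disabling headers discussed above.
// header() must be called before any output is produced.
header("Cache-Control: no-cache, no-store, max-age=0, must-revalidate");
header("Expires: Mon, 01 Jan 1990 00:00:00 GMT");   // a date in the past forces a refetch
?>
<html><body>This page should never be served from the browser cache.</body></html>
</pre>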
 
== Basic HTTP caching concepts ==
 
* There are basically two concepts: strong caching (no revalidation, which means the client does not send an HTTP request at all) and weak caching (the browser sends a request to the server, but if the resource did not change, the server just replies with a 304 and does not resend the content).
 
* Setting up weak caching is easy; you just have to make sure the Cache-Control header is not set to a value that would prevent the browser from storing the content.
 
* Setting up strong caching is a matter of sending an Expires header and a max-age component in Cache-Control. Until the content expires, the browser won't send requests for it at all, bypassing the revalidation mechanism (so use with caution). Note that by default, if you don't send an Expires header, the browser seems to use strong caching for some time. It is thus better to always specify the expiration time to get control over this. A PHP sketch of both flavours follows below.
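The following hedged PHP sketch shows both flavours side by side (the one-week lifetime and the file path are arbitrary examples): the first two headers implement strong caching, the Last-Modified / If-Modified-Since pair implements weak caching with a 304 reply. In practice you would pick one strategy per resource rather than combining them blindly.

<pre>
<?php
// Strong caching: the browser will not contact the server again
// until the resource expires (here, one week from now).
$lifetime = 7 * 24 * 3600;
header("Cache-Control: max-age=" . $lifetime);
header("Expires: " . gmdate("D, d M Y H:i:s", time() + $lifetime) . " GMT");

// Weak caching: send a Last-Modified date and reply with 304 when the
// browser revalidates with If-Modified-Since (the file path is made up).
$file = "/var/www/static/style.css";
$lastModified = filemtime($file);
header("Last-Modified: " . gmdate("D, d M Y H:i:s", $lastModified) . " GMT");

if (isset($_SERVER["HTTP_IF_MODIFIED_SINCE"]) &&
    strtotime($_SERVER["HTTP_IF_MODIFIED_SINCE"]) >= $lastModified) {
    header("HTTP/1.1 304 Not Modified");
    exit;
}

header("Content-Type: text/css");
readfile($file);
?>
</pre>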
 
== Reloading an image with the same URL ==
 
* If an image has changed on the server but has the same URL, and you wish to refresh it on the client (browser), one trick is to add a unique key to the HTTP request. For example, issue a GET /image/sample-image.png?random=156831357. This works out very well.
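A minimal sketch, assuming the page embedding the image is itself generated by PHP (the image path is the one from the example above and the parameter name is arbitrary):

<pre>
<img src="/image/sample-image.png?random=<?php echo mt_rand(); ?>" alt="sample image">
</pre>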
 
== Browser information ==
 
* For all browsers it seems that a reload (pressing F5 or clicking the reload button) disables strong caching. So a reload is not the same as a normal page load (simply pressing Enter again on the URL, which will then use strong caching if possible).
 
=== Firefox ===


* Under Firefox, when you press the back button, Firefox does not refetch the page from the server and does not even retrieve it from its cache: it retrieves it from RAM. This can cause problems when you want the page to be refreshed (typically, if some information has been added to the HTTP session for example).


* If the JavaScript code makes an AJAX call, this call too will be cached.
 
* Firefox 3's cache implementation seems to be broken. Even with the above header values, Firefox seems to randomly either use the version of the document in the RAM cache or refetch it.
 
= Ajax File Upload =
 
* The idea here is to use a hidden iframe that the upload form will target (a sketch of the markup is given after these snippets). To get control back in JavaScript once the upload is finished, several techniques are possible. The following one registers a "load" event listener on the upload frame:
 
$("uploadFrame").observe("load", this.myCallBack.bindAsEventListener(this));
 
* To access content inside an iframe, the following JS code can be used:
 
<pre>
data = frames["uploadFrame"].document.getElementById("uploadFrameResultData").innerHTML.evalJSON();
</pre>
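For reference, here is a sketch of the markup these snippets assume. The upload URL and element names are made up: the form's target attribute points at the hidden iframe, and the server is expected to write its JSON answer into an element with id uploadFrameResultData inside the iframe document.

<pre>
<!-- The form posts into the hidden iframe instead of navigating the page -->
<form action="/upload" method="post" enctype="multipart/form-data" target="uploadFrame">
    <input type="file" name="file">
    <input type="submit" value="Upload">
</form>

<!-- Hidden iframe that receives the server's response to the upload -->
<iframe id="uploadFrame" name="uploadFrame" style="display: none;"></iframe>
</pre>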
 
= Session Mechanism =
 
* All server languages / frameworks tend to use the same technique to create and pass the session. The first time a session must be created, all the URLs in the page get rewritten with the session id as a parameter. At the same time a cookie is sent to the browser. If, on the next request, the cookie is not present (but the session id parameter is passed), then the server side considers that the browser does not support cookies, stops sending cookies, and keeps rewriting URLs. Conversely, if the cookie is sent back, URL rewriting stops and the session is propagated via the cookie alone (a PHP sketch follows below).
 
* Note that URL rewriting happens after all the other work is done.
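As an illustration, here is a hedged PHP sketch of this mechanism: with session.use_trans_sid enabled, PHP appends the session id to URLs in the generated page as long as the session cookie does not come back, and relies on the cookie alone once it does (the page name and session variable are made up). Incidentally, PHP applies this rewriting on the output buffer, which matches the remark above that it happens after the rest of the work is done.

<pre>
<?php
// Allow PHP to fall back to URL-based session ids when cookies do not work.
ini_set("session.use_cookies", "1");
ini_set("session.use_trans_sid", "1");
session_start();   // sends Set-Cookie: PHPSESSID=... on the first request

// Arbitrary session data, just to have something to propagate.
$_SESSION["visits"] = isset($_SESSION["visits"]) ? $_SESSION["visits"] + 1 : 1;

// SID expands to "PHPSESSID=<id>" when the browser did not send the session
// cookie, and to an empty string once the cookie is known to work.
echo '<a href="/next-page.php?' . SID . '">next page (' . $_SESSION["visits"] . ' visits)</a>';
?>
</pre>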
 
= Benchmarking and Troubleshooting =
 
* A good tool is httperf. It allows you to open many HTTP connections and control the rate at which new (concurrent) connections are created.


* Example:


<pre>
httperf --server www.test.net --uri="http://www.test.net/demos/setupGeneric?name=hello" --num-conns=100 --rate=5
</pre>
