From Seo Wiki - Search Engine Optimization and Programming Languages
When a web page is requested via the Hypertext Transfer Protocol, the server locates a file in its file system based on the requested URL. This file may be a regular file or a program. In the latter case, the server may (depending on its configuration) run the program and send its output as the requested page. The query string is the part of the URL that is passed to the program. It allows data to be passed from the HTTP client (often a web browser) to the program that generates the web page.
A typical URL containing a query string has the following form:
http://server/path/program?query_string
When a server receives a request for such a page, it runs a program (if configured to do so), passing the query_string unchanged to the program. The question mark is used as a separator and is not part of the query string.
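As a minimal sketch of this split, the Python standard library function urllib.parse.urlsplit separates the path from the query string. The host example.com and the path used below are placeholders chosen for illustration, not values taken from a real server.

```python
from urllib.parse import urlsplit

# Hypothetical URL; only its shape matters here.
url = "http://example.com/path/program?query_string"

parts = urlsplit(url)
program_path = parts.path    # what the server maps to a file or a program
query_string = parts.query   # the text after the "?", separator not included

print(program_path)
print(query_string)
```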
A link in a web page may have a URL that contains a query string. However, the main use of query strings is to contain the content of an HTML form, also known as a web form. In particular, when a form containing the fields field1, field2, field3 is submitted, the content of the fields is encoded as a query string as follows:
field1=value1&field2=value2&field3=value3
- The query string is composed of a series of field-value pairs.
- Within each pair, the field name and the value are separated by an equals sign, '='.
- The pairs in the series are separated from one another by the ampersand, '&', or the semicolon, ';'.
For each field of the form, the query string contains a pair field=value. Web forms may include fields that are not visible to the user; these fields are included in the query string when the form is submitted.
This 'name then equal sign then value then ampersand or semicolon' convention is a W3C recommendation. W3C recommends that all web servers support semicolon separators in the place of ampersand separators.
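The convention can be sketched as a small parser. This is an illustrative sketch rather than a complete implementation: the helper name parse_pairs is hypothetical, and it accepts both '&' and ';' as pair separators, as the recommendation suggests.

```python
import re
from urllib.parse import unquote_plus

def parse_pairs(query_string):
    """Split a query string into a list of (field, value) tuples."""
    pairs = []
    # Pairs are separated by "&" or ";".
    for chunk in re.split(r"[&;]", query_string):
        if not chunk:
            continue  # skip empty chunks such as those left by "a=1&&b=2"
        # Within a pair, the first "=" separates the name from the value.
        name, _, value = chunk.partition("=")
        # Undo "+" and %HH encoding on both sides.
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs

print(parse_pairs("field1=value1&field2=value2;field3=value3"))
```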
Technically, the form content is only encoded as a query string when the form submission method is GET. The same encoding is used by default when the submission method is POST, but the result is not sent as a query string; that is, it is not appended to the action URL of the form. Instead, the string is sent as the body of the request.
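The difference can be illustrated by building both kinds of request with Python's urllib. The action URL and field values below are assumptions made for the example, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form content and action URL.
fields = {"first": "a", "second": "b"}
action = "http://example.com/cgi-bin/test.cgi"

encoded = urlencode(fields)  # the same encoding serves both methods

# GET: the encoded string is appended to the action URL as the query string.
get_request = Request(action + "?" + encoded)

# POST: the URL stays as it is and the encoded string becomes the request body.
post_request = Request(action, data=encoded.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```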
Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document, and the character = is used to separate a name from a value. A query string may need to be converted to satisfy these constraints. This can be done using a scheme known as URL encoding.
In particular, encoding the query string uses the following rules:
- Letters (A-Z and a-z), numbers (0-9) and the characters '.', '-', '~' and '_' are left as-is
- SPACE is encoded as '+'
- All other characters are encoded as %HH, where HH is the hexadecimal value of the byte, with any non-ASCII characters first encoded as UTF-8 (or another specified encoding)
The encoding of SPACE as '+' and the selection of "as-is" characters distinguish this encoding from RFC 1738.
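The three rules can be written out directly. The function name encode_component is hypothetical; the sketch assumes UTF-8 unless another encoding is passed in, and for the samples shown it should agree with the standard library function urllib.parse.quote_plus.

```python
import string
from urllib.parse import quote_plus

# Rule 1: characters that are left as-is.
AS_IS = set(string.ascii_letters + string.digits + ".-~_")

def encode_component(text, encoding="utf-8"):
    """Encode one field name or value according to the rules listed above."""
    out = []
    for ch in text:
        if ch in AS_IS:
            out.append(ch)
        elif ch == " ":
            out.append("+")  # Rule 2: SPACE becomes "+"
        else:
            # Rule 3: every byte of the encoded character becomes %HH.
            out.extend("%{:02X}".format(byte) for byte in ch.encode(encoding))
    return "".join(out)

for sample in ["this is a field", "a=b&c", "caf\u00e9"]:
    print(encode_component(sample), quote_plus(sample))
```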
As defined in RFC 1738, a URL of scheme http can contain a searchpart following the rest of the URL and separated from it by a ? character. RFC 3986 specifies that the query component of a URI is the part between the ? and the end of the URI or the character #. The term query string is commonly used to refer to this part in the case of HTTP URLs.
For example, if a form is embedded in an HTML page as follows:
<form action="cgi-bin/test.cgi" method="get">
  <input type="text" name="first">
  <input type="text" name="second">
  <input type="submit">
</form>
and the user inserts the strings “this is a field” and “was it clear (already)?” in the two text fields and presses the submit button, the program test.cgi will receive the following query string:
first=this+is+a+field&second=was+it+clear+%28already%29%3F
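The encoding of this form submission can be reproduced with urllib.parse.urlencode from the Python standard library. This is only a sketch of what a browser does on submission; the dictionary stands in for the two text fields.

```python
from urllib.parse import urlencode

# The two text fields of the form, in document order.
form_fields = {
    "first": "this is a field",
    "second": "was it clear (already)?",
}

query_string = urlencode(form_fields)
print(query_string)
# → first=this+is+a+field&second=was+it+clear+%28already%29%3F
```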
A program receiving a query string can ignore part or all of it. If the requested URL corresponds to a file and not to a program, the whole query string is ignored. However, regardless of whether the query string is used or not, the whole URL including it is stored in the server log files.
These facts allow query strings to be used to track users in a manner similar to that provided by HTTP cookies. For this to work, every time the user downloads a page, a unique identifier is chosen and added as a query string to the URLs of all links the page contains. As soon as the user follows one of these links, the corresponding URL is requested from the server. This way, the download of this page is linked with the previous one.
For example, when a web page containing the following is requested:
<a href="frank.html">see my page!</a>
<a href="ciccio.html">mine is better</a>
a unique string, such as sdfsd23423, is chosen, and the page is modified as follows:
<a href="frank.html?sdfsd23423">see my page!</a>
<a href="ciccio.html?sdfsd23423">mine is better</a>
The addition of the query string does not change the way the page is shown to the user. When the user follows, for example, the first link, the browser requests the page frank.html?sdfsd23423 from the server, which ignores what follows the ? and sends the page frank.html as expected, adding the query string to its links as well.
This way, any subsequent page request from this user will carry the same query string sdfsd23423, making it possible to establish that all these pages have been viewed by the same user. Query strings are often used in association with web beacons.
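The link rewriting described above can be sketched in a few lines. The function add_tracking is a hypothetical helper. It uses a simple regular expression that only handles double-quoted href attributes and ignores fragments, so it is an illustration rather than a robust HTML rewriter.

```python
import re

def add_tracking(html, token):
    """Append the tracking token to every double-quoted href in the page."""
    def rewrite(match):
        url = match.group(1)
        # Use "?" to start a query string, or "&" if the link already has one.
        separator = "&" if "?" in url else "?"
        return 'href="{}{}{}"'.format(url, separator, token)
    return re.sub(r'href="([^"]*)"', rewrite, html)

page = (
    '<a href="frank.html">see my page!</a>\n'
    '<a href="ciccio.html">mine is better</a>'
)
print(add_tracking(page, "sdfsd23423"))
```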
The main differences between query strings used for tracking and HTTP cookies are that:
- Query strings form part of the URL, and are therefore included if the user saves or sends the URL to another user; cookies can be maintained across browsing sessions, but are not saved or sent with the URL.
- If the user arrives at the same web server by two (or more) independent paths, the user will be assigned two different query strings, while the stored cookies are the same.
Flexibility vs. security
A URL query string allows for flexibility in retrieving data from a web server and possibly from the database used to populate pages for that web server. A read only data store, such as a weather mapping service, is one example where URL query strings can be used with great flexibility.
In some circumstances, a URL query string may expose security issues because it can be edited by a user to retrieve data that they should not have access to. In particular, a URL query string containing a username and password could be targeted with a dictionary attack to guess valid login credentials for a particular web site. Servers that need to trust values passed in a query string typically validate them, for example by checking them against a hash such as MD5 or a stronger method.
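One way to validate a query string with a hash can be sketched as follows. The text above mentions MD5; this sketch swaps in a keyed hash (HMAC with SHA-256) from the Python standard library instead. The secret key, the sig parameter name, and the helpers sign and verify are all assumptions made for the example.

```python
import hashlib
import hmac

SECRET_KEY = b"server-side secret"  # hypothetical; known only to the server

def _digest(query_string):
    return hmac.new(SECRET_KEY, query_string.encode("utf-8"), hashlib.sha256).hexdigest()

def sign(query_string):
    """Return the query string with a trailing sig=<hex digest> field."""
    return query_string + "&sig=" + _digest(query_string)

def verify(signed_query):
    """Return True only if the trailing signature matches the rest of the string."""
    query_string, separator, digest = signed_query.rpartition("&sig=")
    if not separator:
        return False  # no signature present
    # Constant-time comparison on bytes.
    return hmac.compare_digest(digest.encode("utf-8"), _digest(query_string).encode("ascii"))

signed = sign("city=Oslo&units=metric")
print(verify(signed))                          # unmodified string
print(verify(signed.replace("Oslo", "Rome")))  # edited by the user
```

A check of this kind only detects edits to the string. It does not hide the values, which remain visible in the URL and in server logs.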
According to the HTTP specification:
Servers should be cautious about depending on URI (which includes URLs) lengths above 255 bytes, because some older client or proxy implementations may not properly support these lengths.
The specification does not dictate a minimum or maximum URL length, but implementations vary by browser and version. For example, Internet Explorer does not support URLs that have more than 2083 characters. There is no limit on the number of parameters in a URL; only the raw (as opposed to URL encoded) character length of the URL matters. Web servers may also impose limits on the length of the query string, depending on how the URL and query string are stored.
The common workaround for these problems is to use POST instead of GET and store the parameters in the request body. The length limits on request bodies are typically much higher than those on URL length. For example, the limit on POST size, by default, is 2 MB on IIS 4.0 and 128 KB on IIS 5.0.
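The workaround can be sketched as a helper that falls back to POST when the resulting URL would be too long. The name plan_request and the default limit are assumptions; the limit simply reuses the Internet Explorer figure quoted above and would need adjusting for other clients or servers.

```python
from urllib.parse import urlencode

MAX_URL_LENGTH = 2083  # assumption: the Internet Explorer figure quoted above

def plan_request(action_url, fields, max_url_length=MAX_URL_LENGTH):
    """Return (method, url, body), using POST only when the GET URL is too long."""
    encoded = urlencode(fields)
    get_url = action_url + "?" + encoded
    if len(get_url) <= max_url_length:
        return ("GET", get_url, None)
    # Too long for a URL: keep the URL short and move the parameters to the body.
    return ("POST", action_url, encoded.encode("ascii"))

# Hypothetical action URL and parameters.
print(plan_request("http://example.com/search", {"q": "weather"}))
print(plan_request("http://example.com/search", {"q": "x" * 3000})[0])
```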
- http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html
- http://www.w3.org/MarkUp/html3/HTMLandSGML.html
- http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2
- http://support.microsoft.com/kb/208427
- http://blogs.msdn.com/ie/archive/2008/03/11/address-bar-improvements-in-internet-explorer-8-beta-1.aspx#8161782
- http://classicasp.aspfaq.com/forms/what-is-the-limit-on-querystring/get/url-parameters.html