Describing HTTP Cookie Storage On a HTTP Client#
Cookie data storage (a.k.a. “cookie jar” or “persistent cookie data store”) resides on a HTTP client (e.g. a browser). A single cookie is represented by an entry in the data store. A cookie’s components are spread out into properties of an entry (“attributes”). The data store can be easily visualised as a row-column matrix with entries being the rows and attributes being the columns (see Table A). As with any storage and data management mechanism, it imposes rules and restrictions on the data being stored (e.g. data type, length, charset, etc.). By analysing the data store and its rules, and finding out more about the mechanisms used, we should be able to tell more about how cookies actually work.
As with any dynamic data store, one would be interested in performing actions on the store, notably reading and writing, as well as other common actions such as replacing or deleting entries. This should reveal more about what is involved in creating new cookies, deleting old ones, etc.
Communication and data transfer is another important issue, since cookie data is stored on the HTTP client side, whereas responses are received from an external HTTP server.
Making Requests to The Data Store on a HTTP Client and Vice Versa#
The HTTP client request and the HTTP server response form the communication mechanism through which cookie data is transferred back and forth between the client and the server. A HTTP client sends cookie data from its data store to the server, and the server then follows by sending back a response message (see Diagram A). To affect data in the cookie data store, one has to use the HTTP server response (see Diagram B).
The HTTP protocol has header fields designed for both HTTP request and response messages to handle cookie data. A HTTP client will use the “Cookie” header field to provide cookies from its data store, whereas a HTTP server will use the “Set-Cookie” header field to construct its request to the cookie data store which resides on the HTTP client (see Diagram C).
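The two header fields can be illustrated with a minimal Python sketch. It uses the standard library's `http.cookies.SimpleCookie` to build the server side, while the client side is assembled by hand. The cookie names, values, and the `stored_pairs` list are made-up assumptions for illustration only.

```python
from http.cookies import SimpleCookie

# Server side: build a "Set-Cookie" response header field.
response_cookie = SimpleCookie()
response_cookie["foo"] = "bar"
response_cookie["foo"]["path"] = "/dir"
set_cookie_header = response_cookie.output(header="Set-Cookie:")

# Client side: on a later request, the stored name-value pairs are returned
# in a single "Cookie" request header field.
stored_pairs = [("foo", "bar"), ("theme", "dark")]  # hypothetical data store content
cookie_header = "Cookie: " + "; ".join(f"{name}={value}" for name, value in stored_pairs)

print(set_cookie_header)
print(cookie_header)
```

Note that only the name-value pairs travel back to the server. The attributes stay in the data store on the client.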
Due to the nature and limitations of these HTTP header fields, the communication itself is somewhat quirky. Notably, large chunks of data are prohibited by the HTTP header communication mechanism, meaning that the data sent by cookies should be fairly limited, to say the least. Similarly, bandwidth limitations should also be considered a constraining factor.
Sending Requests to the User Agent and Action Delegation#
Structure of The “Set-Cookie” Header Field#
The “Set-Cookie” header field value consists of a group of directives: name-value pairs joined with an equal sign (“=”) control symbol, e.g. name=value.
Set-Cookie: [directives; ...]
Directives are separated from each other with another control symbol, a semicolon (“;”). The first directive is special in that its name part can contain any string, instead of one of the predefined attributes used in all directives after the first one. It is also special in that leading and trailing quotation marks in its value are not treated as delimiters: they are considered characters belonging to the value, in contrast to other directives, where leading and trailing quotes are stripped off. This special directive combines the cookie name and the cookie value and must always be put first.
Set-Cookie: `cookie name`=`cookie value`; [other directives; ...]
All other directives following the special first directive can be listed in no particular order. Each must have one of the predefined attributes in the name part and a chosen value in the value part. The predefined attributes are the following.
Domain
Path
Expires
Max-Age
HttpOnly
Secure
SameSite
Mind that new attributes can be added in the future by an authoritative body.
The entire group of directives (including the first one) can be seen as a query string in the form of semicolon-separated key=value pairs, where the first pair is a free-form one and all other pairs use one of the predefined keys. In fact, the “Set-Cookie” header field is a query mechanism which enables us to perform actions on the data store that resides on the HTTP client.
The equal sign (“=”) control symbol is not compulsory. Implications will be explained down below.
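The directive structure described above can be sketched as a small parser. This is a simplified illustration rather than how any particular browser does it. The function name `parse_set_cookie` is hypothetical, and treating a first directive without an equal sign as a name with an empty value is an assumption made only to keep the sketch short.

```python
def parse_set_cookie(header_value: str) -> tuple[str, str, dict[str, str]]:
    """Split a Set-Cookie value into (cookie name, cookie value, attributes)."""
    first, *rest = [part.strip() for part in header_value.split(";")]
    # The first directive carries the cookie name and value; quotes are kept as-is.
    # Assumption: without "=", the whole directive is taken as the name.
    name, _, value = first.partition("=")
    attributes: dict[str, str] = {}
    for directive in rest:
        if not directive:
            continue  # tolerate a trailing semicolon
        key, _, attr_value = directive.partition("=")
        # Attribute names are matched case-insensitively here;
        # valueless attributes (Secure, HttpOnly) map to an empty string.
        attributes[key.strip().lower()] = attr_value.strip()
    return name.strip(), value.strip(), attributes


print(parse_set_cookie("foo=bar; Domain=domain.com; Path=/dir; Secure"))
```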
There is no direct way to identify entries in the data store, which makes targeting a specific entry somewhat peculiar. There is no cookie store ID involved here. We are essentially left with methods that involve identifying a unique entry and replacing it with something else, or using directives in a way that enforces a certain action.
Structure and Rules of The Cookie Storage – Knowns and Unknowns#
Receiving and storing directive data (name-value pairs) into a row-column matrix is a rather straightforward task. Columns can be designated for the name part. Values will be united into an entry record.
We will cover 10 data store attributes. 7 of them are exact matches of the “Set-Cookie” attribute names from the HTTP response header field (“Direct Properties”). The other 3 are calculated by the data store owner from the data received (“Derived Properties”) and were not explicitly stated in the request query. They can also be seen as helper properties that assist in managing and validating entries.
Direct Properties#
- Name (string)
  - The chosen name for the cookie.
- Value (string)
  - The chosen value for the cookie.
- Domain (string)
  - Host name where the cookie request originated, optionally preceded by a dot to allow access on host names above the declared domain name (domain scoping).
  - Default value: the current site’s full host name (fixed host, no leading dot).
- Path (string)
  - Directory path name from the request URL. Strictly refers to a directory path. For example, with a request containing the path /dir/index.html, the index.html segment will be seen as a directory name.
  - Default value: directory path name from the origin’s URI. If the path does not end with a forward slash, the file name segment is excluded. Then, a single trailing forward slash is excluded, except when the path is root.
- Secure (boolean)
  - Defines whether this cookie should be sent back only when a secure connection is used.
  - Default value: false
- HttpOnly (boolean)
  - Defines whether this cookie should be sent to the server only, i.e. hidden from client-side scripts.
  - Default value: false
- SameSite (string)
  - Defines whether this cookie should be sent with cross-site requests. Choices are limited to “None”, “Strict”, or “Lax”.
  - Default value: “None” or “Lax”, depending on the client.
Derived Properties#
- Size (integer)
  - The size of the cookie.
- Date Created (date-time)
  - Time when the entry was created.
- Last Accessed (date-time)
  - Time when the cookie name-value pair was last sent to the server in request headers.
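The direct and derived properties listed above could be modelled as a single record type. The following is a minimal sketch under stated assumptions: the class name `CookieEntry` and the field names are hypothetical, the SameSite default is picked arbitrarily from the two options named above, and the size follows the name-plus-value definition used later in this article.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CookieEntry:
    # Direct properties, taken from the Set-Cookie directives.
    name: str
    value: str
    domain: str
    path: str = "/"
    secure: bool = False
    http_only: bool = False
    same_site: str = "Lax"  # assumption: some clients default to "None" instead
    # Derived properties, computed by the data store owner.
    date_created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    last_accessed: Optional[datetime] = None  # not sent to the server yet

    @property
    def size(self) -> int:
        # Length of the cookie name plus length of the cookie value.
        return len(self.name) + len(self.value)


entry = CookieEntry(name="foo", value="bar", domain="domain.com")
print(entry.size, entry.secure, entry.same_site)
```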
Gateway and Validation#
Once the HTTP client receives the cookie request query, it will process, validate, and run the query. However, the validation mechanism is not very straightforward and the rules can be obscure.
Generally speaking, there are two final outcomes: the request is either accepted or declined. However, there can be multiple routes leading to these outcomes. Let’s consider the following.
Validation problems do not necessarily lead to rejection. Data can be processed before it is stored. To know the final outcome means to know the validation and preprocessing procedures.
What Is a Valid Domain Name?#
A valid domain name is a domain (or a host) ending with an effective top-level domain (“eTLD”), e.g. domain.co.uk where the effective TLD is .co.uk. Such a domain name must, of course, contain at least one label above the eTLD to form a valid registrable part. It cannot be a root zone domain name, but rather a domain name that can actually be resolved on the Internet (e.g. working-domain.com).
- .co.uk – effective top-level domain.
- domain.co.uk – extra label above the eTLD to form a registrable/base domain.
- host.domain.co.uk – host name.
If the request’s domain name ends with a dot symbol (“.”), the dot will not cause the validation to fail.
Also, a domain will be considered valid if it is available on your local network (eg. an alias host), or if it is a valid IP address.
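The validation rules above can be sketched roughly as follows. This is an illustration under loud assumptions: `EFFECTIVE_TLDS` is a tiny hand-written stand-in for a real effective TLD list, `local_hosts` stands in for whatever the local network resolves, and the function name is hypothetical. Leading dots are not handled here.

```python
import ipaddress

# Assumption: a tiny stand-in for a real effective TLD list, for illustration only.
EFFECTIVE_TLDS = {"com", "uk", "co.uk"}


def is_valid_cookie_domain(host: str, local_hosts: frozenset = frozenset({"localhost"})) -> bool:
    host = host.lower()
    try:
        ipaddress.ip_address(host)
        return True  # a valid IP address is accepted
    except ValueError:
        pass
    if host in local_hosts:
        return True  # a host available on the local network is accepted
    if host.endswith("."):
        host = host[:-1]  # a single trailing dot does not block validation
    labels = host.split(".")
    if any(label == "" for label in labels):
        return False
    # Find the longest suffix that is an effective TLD.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in EFFECTIVE_TLDS:
            # i == 0 means the host itself is an eTLD: no registrable label above it.
            return i > 0
    return False


for candidate in ("domain.co.uk", "host.domain.co.uk", "co.uk", "domain.com.", "127.0.0.1"):
    print(candidate, is_valid_cookie_domain(candidate))
```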
Data Processing and Normalisation Before Entry#
When a domain name is provided, and it is an Internet authority domain name, and it passes the domain validation rules, a dot will be prepended, unless it is already there. This can be described as domain scoping. Scoped domains mean that cookie data will be returned not just for the declared domain, but also on hosts above it.
- domain.com changed to .domain.com
This makes it impossible to set a fixed domain, unless “Domain” attribute is absent, which would then default to the domain name from the request made.
When a host name other than an Internet authority domain name is used, it will not be scoped, meaning that no leading dot will be added. If the provided host name contains a leading dot, it will be trimmed off.
- 127.0.0.1 left unchanged
- localhost left unchanged
- .127.0.0.1 changed to 127.0.0.1
- .localhost changed to localhost
If multiple leading dots are used, such request would be declined no matter what host type was provided.
Additionally, if domain name contains a trailing dot symbol (“.”), this symbol will be preserved.
- .domain.com. left unchanged
- domain.com. changed to .domain.com.
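The domain processing rules described in this section can be condensed into a short sketch. The function names are hypothetical, and the check that treats any dotless name as a local host is a simplifying assumption. The input is assumed to have passed domain validation already.

```python
import ipaddress
from typing import Optional


def _is_local_or_ip(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        # Assumption: a dotless name such as "localhost" is a local host.
        return "." not in host.rstrip(".")


def normalise_domain(domain: str) -> Optional[str]:
    """Return the stored form of a Domain attribute value, or None if declined."""
    if domain.startswith(".."):
        return None  # multiple leading dots: declined for any host type
    bare = domain[1:] if domain.startswith(".") else domain
    if _is_local_or_ip(bare):
        return bare  # not scoped; a leading dot is trimmed off
    return "." + bare  # scoped; a trailing dot, if any, is preserved


for raw in ("domain.com", ".localhost", ".127.0.0.1", "domain.com.", "..domain.com"):
    print(repr(raw), "->", repr(normalise_domain(raw)))
```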
The other property that might potentially be prone to processing is the path. One might wonder whether the path will be normalised before it is stored. It will not be normalised. Paths such as //dir, /dir/foo/.., or /dir/. will not be amended.
- //dir left unchanged
- /dir/. left unchanged
What is more, even though the path strictly refers to a directory path, a trailing forward slash will not be added.
- /dir left unchanged
In the end, no processing will be done to the path property whatsoever. However, the path can still be invalidated. For instance, paths that do not start with a forward slash will be considered invalid, though the cookie request would not be declined.
As a side note, when the path is not provided, the default value will be the current site’s directory path excluding one trailing forward slash, except when the directory path is root.
- https://www.domain.com/dir/ extracts path /dir
- https://www.domain.com/ extracts path /
- https://www.domain.com/dir// extracts path /dir/
- https://www.domain.com/dir/file.txt extracts path /dir
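The default path extraction shown in the examples above can be written as a few lines of Python. The function name `default_cookie_path` is hypothetical, and falling back to the root path when the URL has no path at all is an assumption.

```python
from urllib.parse import urlsplit


def default_cookie_path(url: str) -> str:
    path = urlsplit(url).path
    if not path.startswith("/"):
        return "/"  # assumption: an empty path falls back to root
    if not path.endswith("/"):
        path = path[: path.rfind("/") + 1]  # exclude the file name segment
    if path != "/":
        path = path[:-1]  # exclude a single trailing slash, except for root
    return path


for url in (
    "https://www.domain.com/dir/",
    "https://www.domain.com/",
    "https://www.domain.com/dir//",
    "https://www.domain.com/dir/file.txt",
):
    print(url, "->", default_cookie_path(url))
```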
It is also worth mentioning that the values given by the “Max-Age” and “Expires” attributes are processed in a particular way. When both attributes are provided, “Max-Age” takes precedence, while “Expires” is ignored. The “Max-Age” value is added to the current timestamp and stored. When only the “Expires” attribute is given, its date-time string is converted to a Unix timestamp and stored.
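The precedence between the two attributes can be sketched as below. The function name `expiry_timestamp` and its parameters are hypothetical. Parsing the date with `email.utils.parsedate_to_datetime` is a convenience chosen for this sketch, not necessarily what a browser does.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def expiry_timestamp(now: int, max_age: Optional[int] = None, expires: Optional[str] = None) -> Optional[int]:
    """Return the stored expiry as a Unix timestamp, or None for a session cookie."""
    if max_age is not None:
        return now + max_age  # Max-Age takes precedence; Expires is ignored
    if expires is not None:
        return int(parsedate_to_datetime(expires).timestamp())
    return None  # neither attribute given: expires on session end


print(expiry_timestamp(1000, max_age=60, expires="Sat, 01 Jan 2022 08:00:00 GMT"))
print(expiry_timestamp(1000, expires="Sat, 01 Jan 2022 08:00:00 GMT"))
```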
Matching Against Values of Data Elements#
While interacting with the data store and its data elements, a common operation is to find a matching value. Matching a value in this data store is no different from any other similar data store. However, it becomes a follow-up question to the domain scoping and normalisation problem. Let’s consider the following comparison cases.
- Does domain.com match .domain.com?
- Does path /dir match /dir/?
The short answer to both questions is “no”, as far as writing to the data store is concerned. This should not come as a surprise, because the values literally do not match. Intuitively, one might think that .domain.com would cancel domain.com and /tmp/ would cancel /tmp, but they will not. In the first case, domain scoping prepends a dot, creating a new distinct value. In the second case, no normalisation is performed on the path. In the end, the values do not match and do not cancel each other out, which results in distinct entries in the data store.
What Determines a Unique HTTP Cookie Entry In The Data Store?#
3 properties (cookie name, domain, and path) participate in nominating a unique entry. This is especially important when one wants to replace or delete a cookie entry. It might not be obvious, but all 3 parameters must strictly match their counterparts in a comparison procedure to declare a matching entry. For instance, as we have just learnt, the domain name domain.com does not match .domain.com, meaning that if the other 2 parameters (cookie name and path) are identical to their counterparts, this would yield two entries instead of just one, resulting in the duplicate cookie name issue which will be described below. Similarly, path /dir does not literally match path /dir/, though they certainly point to the same directory, and will also result in 2 separate entries.
Unique entry = exact match of cookie name, domain, and path.
- foo|domain.com|/dir = foo|domain.com|/dir (exact match)
- foo|domain.com|/dir/ != foo|domain.com|/dir (path does not match)
- foo|.domain.com|/dir != foo|domain.com|/dir (domain does not match)
- "foo"|domain.com|/dir != foo|domain.com|/dir (cookie name does not match)
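The exact-match rule maps naturally onto a dictionary keyed by a (name, domain, path) tuple. The sketch below replays the four comparison cases. The `store` dictionary and the `write` helper are hypothetical names used for illustration only.

```python
# The data store, keyed by the 3 properties that identify a unique entry.
store: dict[tuple[str, str, str], str] = {}


def write(name: str, domain: str, path: str, value: str) -> None:
    store[(name, domain, path)] = value


write("foo", "domain.com", "/dir", "1")
write("foo", "domain.com", "/dir", "2")    # exact match: replaces the entry above
write("foo", "domain.com", "/dir/", "3")   # path does not match: new entry
write("foo", ".domain.com", "/dir", "4")   # domain does not match: new entry
write('"foo"', "domain.com", "/dir", "5")  # cookie name does not match: new entry

print(len(store))
```

Only the second write lands on an existing entry. The remaining writes each create a separate entry, three of them sharing the cookie name foo.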
How To Instantiate Actions Upon the Cookie Data Store on a HTTP Client?#
There can be 3 actions called upon the data store via the “Set-Cookie” response header field, and those actions rely heavily on the 3 parameters that establish a unique entry described in the section above. These actions also require us to utilise other properties (such as “Max-Age” or “Expires”).
No direct success or failure result is possible. If an action is executed as anticipated by the end-user, it should be reflected by the “Cookie” request header field result in the next HTTP request headers sent by the user agent, where your cookie will be available, or it will no longer be available, or it will be altered.
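As a rough sketch, assume the 3 actions are creating, replacing, and deleting an entry, and that deletion is triggered by a “Max-Age” value of zero or less. The only observable result is then the “Cookie” header value of the next request. The function names are hypothetical, and the sketch ignores domain and path matching when building the header.

```python
from typing import Optional

Key = tuple[str, str, str]  # (cookie name, domain, path)


def apply_set_cookie(store: dict[Key, str], name: str, domain: str, path: str,
                     value: str, max_age: Optional[int] = None) -> None:
    key = (name, domain, path)
    if max_age is not None and max_age <= 0:
        store.pop(key, None)  # delete: the entry expires immediately
    else:
        store[key] = value  # create a new entry or replace the matching one


def cookie_header_value(store: dict[Key, str]) -> str:
    # Simplification: every stored entry is sent, regardless of domain and path.
    return "; ".join(f"{name}={value}" for (name, _, _), value in store.items())


jar: dict[Key, str] = {}
apply_set_cookie(jar, "foo", ".domain.com", "/", "1")
print(cookie_header_value(jar))  # the cookie is available
apply_set_cookie(jar, "foo", ".domain.com", "/", "2")
print(cookie_header_value(jar))  # the cookie is altered
apply_set_cookie(jar, "foo", ".domain.com", "/", "", max_age=0)
print(cookie_header_value(jar))  # the cookie is no longer available
```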
How Is HTTP Cookie Size Calculated?#
Cookie size is a derived property, which is generated by the data store owner based on other data that was provided. Cookie size is calculated by taking the string lengths of the cookie name and the cookie value and adding them together. A pseudo-code formula would look something like the following.
sum(string_length(cookie_name), string_length(cookie_value))
The implication of this formula is that 2 properties are involved in measuring the size. Similarly, the 2 properties share the limitations and restrictions imposed on the size property, the most important being the maximum size limitation. The maximum length of the cookie name and of the cookie value is therefore dynamic: together they cannot exceed the max size cap, so each must trade size with the other.
Let’s assume that the overall size is capped at 100. Theoretically, if we consider cookie name first, it means that its length can be in the range of 0-100. The max size of the cookie value is now dependent upon the resulting length of the cookie name. If the latter was set to 20, cookie’s value would then be capped at 80. If, say, it is set to 70, then value would be capped at 30 and so on. All of this can be done the other way around by setting the size of cookie value first and then trading it with cookie name.
If cookie size exceeds the maximum length set by the data store owner, such request would be declined. No trimming operations would normally be done.
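The size formula and the trade-off between name and value can be checked with a short sketch. It reuses the illustrative cap of 100 from the example above. The function names are hypothetical, and measuring length in characters rather than bytes is an assumption.

```python
def cookie_size(name: str, value: str) -> int:
    # Size is the length of the name plus the length of the value.
    return len(name) + len(value)


def is_accepted(name: str, value: str, max_size: int = 100) -> bool:
    # An oversized cookie is declined outright; nothing is trimmed.
    return cookie_size(name, value) <= max_size


print(is_accepted("n" * 20, "v" * 80))  # name of 20 leaves room for a value of 80
print(is_accepted("n" * 20, "v" * 81))  # one character over the cap
print(is_accepted("n" * 70, "v" * 30))  # name of 70 leaves room for a value of 30
```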
The Cookie Size Cap Problem#
The general agreement is that cookies should be limited to 4KB (4096 bytes). That would imply that cookie size (the way we defined it above) should not exceed 4096 bytes. This is only partially true, because while some implementations might set the size cap exclusively to 4096 bytes, others might use this threshold to limit the entire “Set-Cookie” header field’s value.
Set-Cookie: `header field value`
where `header field value` cannot exceed 4096 bytes.
If this is the case, the pseudo-formula would be the following.
max_cap = 4096 - (string_length(`header field value`) - `cookie size`)
In such implementations the cookie size cap is variable and not fixed to 4KB. When it is necessary to use many attributes, or when some of the attributes are long, the max size cap can decrease considerably, e.g.
Set-Cookie: foo=bar; Domain=my-domain-name.com; Path=/my/path; Secure; HttpOnly; Expires=Sat, 01 Jan 2022 08:00:00 GMT; SameSite=Strict
– max size cap would be .
If the domain name and path are set to something even longer, the cap will decrease further, though it will probably still be relatively large. However, the rule of thumb is that large data should not be stored in cookies, and this problem is one more argument for it.
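The cap formula above translates directly into code. The sketch below applies it to the example header from this section. The function name `max_size_cap` is hypothetical, and counting characters instead of bytes is an assumption that only holds for ASCII input.

```python
def max_size_cap(header_value: str, name: str, value: str, limit: int = 4096) -> int:
    cookie_size = len(name) + len(value)
    # Everything in the header value that is not the cookie name or value.
    overhead = len(header_value) - cookie_size
    return limit - overhead


header_value = (
    "foo=bar; Domain=my-domain-name.com; Path=/my/path; Secure; HttpOnly; "
    "Expires=Sat, 01 Jan 2022 08:00:00 GMT; SameSite=Strict"
)
cap = max_size_cap(header_value, "foo", "bar")
print(cap)
```

With a bare foo=bar header value the only overhead is the equal sign, so the cap is one less than the limit. Every additional attribute lowers it further.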
What Is The HTTP Cookie Size Cap In Common Web Browsers?#
The safest way would be to limit the entire “Set-Cookie” header field's value to 4096 bytes.
The Problem of Ambiguous Cookie Name-Value Directive#
Directives consist of name and value parts joined with an equal sign (“=”) control symbol. However, the equal sign is not compulsory in any of the directives, including the cookie name-value pair. For instance, the “Domain” and “Path” attributes have default values based on the request site’s URL components and can therefore be filled in automatically. Similarly, the sister attributes “Expires” and “Max-Age” default to “on session end” when their value is not provided. Boolean attributes like “HttpOnly” and “Secure” do not require a value part at all: the existence of such an attribute implies true, and its absence means false. “SameSite” is a string attribute, but it also has a default value to fall back on.
However, the cookie name and value pair suffers from an ambiguity problem when the equal sign control symbol is absent. Since the symbol is not compulsory, technically the cookie request should be accepted. For example, consider special cookie directive “foo” (mind that the equal control symbol is absent and the value is empty). Is it a cookie name or a cookie value? Logically, such cookie request should be declined, but let’s take a look at how current web browsers are dealing with this problem.
The below 2 examples do not suffer from the ambiguity problem, but rather are here to test out name-less and value-less situations.
Using Special Characters in The Cookie Name-Value Directive#
Do common web browsers support the US-ASCII charset only in the cookie name-value directive? Let’s see how they cope with special characters inside the cookie name-value directive.
Thank you for reading and coming this far! We really appreciate your interest. Please continue to Part II.