curl/docs/HTTP-COOKIES.md

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

167 lines
6.4 KiB
Markdown
Raw Normal View History

# HTTP Cookies
## Cookie overview
Cookies are `name=contents` pairs that an HTTP server tells the client to
hold and then the client sends back those to the server on subsequent
requests to the same domains and paths for which the cookies were set.
Cookies are either "session cookies" which typically are forgotten when the
session is over which is often translated to equal when browser quits, or
the cookies are not session cookies they have expiration dates after which
the client will throw them away.
Cookies are set to the client with the Set-Cookie: header and are sent to
servers with the Cookie: header.
For a long time, the only spec explaining how to use cookies was the
2020-11-04 21:02:01 +08:00
original [Netscape spec from 1994](https://curl.se/rfc/cookie_spec.html).
In 2011, [RFC 6265](https://www.ietf.org/rfc/rfc6265.txt) was finally
published and details how cookies work within HTTP. In 2016, an update which
added support for prefixes was
[proposed](https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-cookie-prefixes-00),
and in 2017, another update was
[drafted](https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-cookie-alone-01)
to deprecate modification of 'secure' cookies from non-secure origins. Both
2019-12-06 19:56:14 +08:00
of these drafts have been incorporated into a proposal to
[replace](https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-rfc6265bis-11)
RFC 6265. Cookie prefixes and secure cookie modification protection has been
implemented by curl.
curl considers `http://localhost` to be a *secure context*, meaning that it
will allow and use cookies marked with the `secure` keyword even when done
over plain HTTP for this host. curl does this to match how popular browsers
work with secure cookies.
## Super cookies
A single cookie can be set for a domain that matches multiple hosts. Like if
set for `example.com` it gets sent to both `aa.example.com` as well as
`bb.example.com`.
A challenge with this concept is that there are certain domains for which
cookies should not be allowed at all, because they are *Public
Suffixes*. Similarly, a client never accepts cookies set directly for the
top-level domain like for example `.com`. Cookies set for *too broad*
domains are generally referred to as *super cookies*.
If curl is built with PSL (**Public Suffix List**) support, it detects and
discards cookies that are specified for such suffix domains that should not
be allowed to have cookies.
if curl is *not* built with PSL support, it has no ability to stop super
cookies.
## Cookies saved to disk
Netscape once created a file format for storing cookies on disk so that they
would survive browser restarts. curl adopted that file format to allow
sharing the cookies with browsers, only to see browsers move away from that
format. Modern browsers no longer use it, while curl still does.
The Netscape cookie file format stores one cookie per physical line in the
file with a bunch of associated meta data, each field separated with
TAB. That file is called the cookie jar in curl terminology.
When libcurl saves a cookie jar, it creates a file header of its own in
which there is a URL mention that will link to the web version of this
document.
## Cookie file format
The cookie file format is text based and stores one cookie per line. Lines
that start with `#` are treated as comments. An exception is lines that
start with `#HttpOnly_`, which is a prefix for cookies that have the
`HttpOnly` attribute set.
Each line that specifies a single cookie consists of seven text fields
separated with TAB characters. A valid line must end with a newline
character.
### Fields in the file
Field number, what type and example data and the meaning of it:
0. string `example.com` - the domain name
1. boolean `FALSE` - include subdomains
2. string `/foobar/` - path
3. boolean `TRUE` - send/receive over HTTPS only
4. number `1462299217` - expires at - seconds since Jan 1st 1970, or 0
5. string `person` - name of the cookie
6. string `daniel` - value of the cookie
## Cookies with curl the command line tool
curl has a full cookie "engine" built in. If you just activate it, you can
have curl receive and send cookies exactly as mandated in the specs.
Command line options:
`-b, --cookie`
tell curl a file to read cookies from and start the cookie engine, or if it
is not a file it will pass on the given string. `-b name=var` works and so
does `-b cookiefile`.
`-j, --junk-session-cookies`
when used in combination with -b, it will skip all "session cookies" on load
so as to appear to start a new cookie session.
`-c, --cookie-jar`
tell curl to start the cookie engine and write cookies to the given file
after the request(s)
## Cookies with libcurl
libcurl offers several ways to enable and interface the cookie engine. These
options are the ones provided by the native API. libcurl bindings may offer
access to them using other means.
`CURLOPT_COOKIE`
Is used when you want to specify the exact contents of a cookie header to
send to the server.
`CURLOPT_COOKIEFILE`
Tell libcurl to activate the cookie engine, and to read the initial set of
cookies from the given file. Read-only.
`CURLOPT_COOKIEJAR`
Tell libcurl to activate the cookie engine, and when the easy handle is
closed save all known cookies to the given cookie jar file. Write-only.
`CURLOPT_COOKIELIST`
Provide detailed information about a single cookie to add to the internal
storage of cookies. Pass in the cookie as an HTTP header with all the
details set, or pass in a line from a Netscape cookie file. This option can
also be used to flush the cookies etc.
`CURLOPT_COOKIESESSION`
Tell libcurl to ignore all cookies it is about to load that are session
cookies.
`CURLINFO_COOKIELIST`
Extract cookie information from the internal cookie storage as a linked
list.
## Cookies with JavaScript
These days a lot of the web is built up by JavaScript. The web browser loads
complete programs that render the page you see. These JavaScript programs
can also set and access cookies.
Since curl and libcurl are plain HTTP clients without any knowledge of or
capability to handle JavaScript, such cookies will not be detected or used.
Often, if you want to mimic what a browser does on such websites, you can
record web browser HTTP traffic when using such a site and then repeat the
cookie operations using curl or libcurl.