This clarifies the handling of server responses by folding the code for
the complicated protocols into their protocol handlers. This concerns
mainly HTTP and its bastard sibling RTSP.
The terms "read" and "write" are often used without clear context if
they refer to the connect or the client/application side of a
transfer. This PR uses "read/write" for operations on the client side
and "send/receive" for the connection, e.g. server side. If this is
considered useful, we can revisit renaming of further methods in another
PR.
Curl's protocol handler `readwrite()` method has been changed:
```diff
- CURLcode (*readwrite)(struct Curl_easy *data, struct connectdata *conn,
- const char *buf, size_t blen,
- size_t *pconsumed, bool *readmore);
+ CURLcode (*write_resp)(struct Curl_easy *data, const char *buf, size_t blen,
+ bool is_eos, bool *done);
```
The name was changed to clarify that this writes response data to the
client side. The parameter changes are:
* `conn` removed as it always operates on `data->conn`
* `pconsumed` removed as the method needs to handle all data on success
* `readmore` removed as no longer necessary
* `is_eos` as indicator that this is the last call for the transfer
response (end-of-stream).
* `done` TRUE on return iff the transfer response is to be treated as
finished
This change affects many files only because of updated comments in
handlers that provide no implementation. The real change is that the
HTTP protocol handlers now provide an implementation.
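As an illustration, a handler implementation of the new method could look
roughly like this; the helper name is made up and this is not the actual
HTTP handler code:
```c
/* Sketch only: `myproto_parse_and_forward` is a made-up helper */
static CURLcode myproto_write_resp(struct Curl_easy *data,
                                   const char *buf, size_t blen,
                                   bool is_eos, bool *done)
{
  CURLcode result;
  *done = FALSE;

  /* the handler must deal with all `blen` bytes on success, there is
     no partial-consumption reporting anymore */
  result = myproto_parse_and_forward(data, buf, blen);
  if(result)
    return result;

  if(is_eos) {
    /* last call for this response: check the parser ended cleanly and
       mark the response as finished */
    *done = TRUE;
  }
  return CURLE_OK;
}
```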
The HTTP protocol handlers' `write_resp()` implementation gets passed
**all** raw data of a server response for the transfer: the HTTP/1.x
formatted status and headers as well as the undecoded response
body. `Curl_http_write_resp_hds()` is used internally to parse the
response headers and pass them on. This method is public as the RTSP
protocol handler also uses it.
HTTP/1.1 "chunked" transport encoding is now part of the general
*content encoding* writer stack, just like other encodings. A new flag
`CLIENTWRITE_EOS` was added for the last client write. This allows
writers to verify that they are in a valid end state. The chunked
decoder will check if it indeed has seen the last chunk.
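As a sketch of what that check can look like in a decoding writer (struct,
field and helper names here are illustrative, not the actual curl ones):
```c
/* Illustrative sketch: a decoding writer validating its end state when
   the last client write arrives */
static CURLcode my_decoder_write(struct Curl_easy *data,
                                 struct my_writer *w, int type,
                                 const char *buf, size_t blen)
{
  CURLcode result = my_decode_and_forward(data, w, buf, blen);
  if(result)
    return result;
  if(type & CLIENTWRITE_EOS) {
    /* no more data will arrive; a chunked decoder fails here unless it
       has already seen the final zero-length chunk */
    if(!w->saw_last_chunk)
      return CURLE_PARTIAL_FILE;
  }
  return CURLE_OK;
}
```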
The general response handling in `transfer.c:466` happens in function
`readwrite_data()`. It now mainly operates like this:
```
static CURLcode readwrite_data(data, ...)
{
  do {
    Curl_xfer_recv_resp(data, buf)
    ...
    Curl_xfer_write_resp(data, buf)
    ...
  } while(interested);
  ...
}
```
All the response data handling is implemented in
`Curl_xfer_write_resp()`. It calls the protocol handler's `write_resp()`
implementation if available, or does the default behaviour.
All raw response data needs to pass through this function, which also
means that anyone in possession of such data may call
`Curl_xfer_write_resp()`.
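Conceptually, the dispatch is (a simplified sketch, not the exact
implementation):
```c
/* Simplified sketch of the dispatch, not the exact implementation */
CURLcode Curl_xfer_write_resp(struct Curl_easy *data,
                              const char *buf, size_t blen, bool is_eos)
{
  if(data->conn->handler->write_resp) {
    /* the protocol handler deals with all raw response data itself;
       it sets `done` when the response is to be treated as finished */
    bool done = FALSE;
    return data->conn->handler->write_resp(data, buf, blen, is_eos, &done);
  }
  /* default: hand the data to the client writers as body */
  return Curl_client_write(data, CLIENTWRITE_BODY |
                           (is_eos ? CLIENTWRITE_EOS : 0), buf, blen);
}
```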
Closes #12480
A new error code to be used when an internal field grows too large, like
when a dynbuf reaches its maximum. Previously it would return
CURLE_OUT_OF_MEMORY for this, which is highly misleading.
Ref: #12268
Closes #12269
- add `SingleRequest->download_done` as indicator that
all download bytes have been received
- remove `stop_reading` bool from readwrite functions
- move excess body handling into client download writer
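A rough sketch of what the excess handling in a download writer can look
like; struct, field and helper names are illustrative:
```c
/* Illustrative sketch: clip body data to the announced download size
   and mark the download as done when everything has been received */
static CURLcode cw_download_write(struct Curl_easy *data,
                                  struct my_writer *w, int type,
                                  const char *buf, size_t blen)
{
  curl_off_t max = data->req.maxdownload;

  if(!(type & CLIENTWRITE_BODY))
    return forward_to_next_writer(data, w, type, buf, blen);

  if(max >= 0 && data->req.bytecount + (curl_off_t)blen >= max) {
    /* we have all the bytes the server announced; anything beyond that
       is excess and gets discarded here instead of in readwrite code */
    blen = (size_t)(max - data->req.bytecount);
    data->req.download_done = TRUE;
  }
  return forward_to_next_writer(data, w, type, buf, blen);
}
```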
Closes #12371
The current design of the Hyper integration requires rebuilding the
Hyper clientconn for each request. However, building the clientconn
requires resending the HTTP/2 connection preface, which is incorrect
from a protocol perspective. That in turn causes servers to send GOAWAY
frames, effectively degrading performance to "no connection reuse" in
the best case. It may also be triggering some bugs where requests get
dropped entirely and reconnects take too long.
This doesn't rule out HTTP/2 support with Hyper, but it may take a
redesign of the Hyper integration in order to make things work.
Closes #12191
This PR has these changes:
Renaming of unencode_* to cwriter, e.g. client writers
- documentation of sendf.h functions
- move max decode stack checks back to content_encoding.c
- define writer phases, which were previously expressed as an order value
- introduce phases for monitoring in between decode phases
- offering default implementations for init/write/close
Add type parameter to client writer's do_write()
- always pass all writes through the writer stack
- writers who only care about BODY data will pass other writes unchanged
add RAW and PROTOCOL client writers
- RAW used for Curl_debug() logging of CURLINFO_DATA_IN
- PROTOCOL used for updates to data->req.bytecount, max_filesize checks and
Curl_pgrsSetDownloadCounter()
- remove all updates of data->req.bytecount and calls to
Curl_pgrsSetDownloadCounter() and Curl_debug() from other code
- adjust test457 expected output to no longer see the excess write
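The resulting shape of a client writer is roughly this (an illustrative
sketch; the actual struct and callback names in curl may differ):
```c
/* Illustrative sketch of a client writer type; not the exact curl names */
struct cw_instance;   /* an installed writer instance (illustrative) */

typedef enum {
  CW_PHASE_RAW,       /* earliest: raw bytes, CURLINFO_DATA_IN logging */
  CW_PHASE_DECODE,    /* transfer/content decoding, e.g. chunked, gzip */
  CW_PHASE_PROTOCOL,  /* bytecount, max_filesize and progress updates */
  CW_PHASE_CLIENT     /* hand the data to the application */
} cw_phase;

struct cw_type {
  const char *name;
  cw_phase phase;     /* determines the position in the writer stack */
  CURLcode (*do_init)(struct Curl_easy *data, struct cw_instance *w);
  CURLcode (*do_write)(struct Curl_easy *data, struct cw_instance *w,
                       int type,  /* CLIENTWRITE_* flags of this write */
                       const char *buf, size_t blen);
  void (*do_close)(struct Curl_easy *data, struct cw_instance *w);
};
```
Writers that only care about BODY data forward everything else unchanged
to the next writer, and the default init/write/close implementations cover
writers that need no behaviour of their own.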
Closes #12184
- move definitions from content_encoding.h to sendf.h
- move create/cleanup/add code into sendf.c
- installed content_encoding writers will always be called
on Curl_client_write(CLIENTWRITE_BODY)
- Curl_client_cleanup() frees writers and tempbuffers from
paused transfers, regardless of protocol
Closes #11908
Previously it would only stop them from getting started if the size was
known up front to be too big.
Update the libcurl and curl docs accordingly.
Fixes #11810
Reported-by: Elliot Killick
Assisted-by: Jay Satiro
Closes #11820
- use CLIENTWRITE_BODY *only* when data is actually body data
- add CLIENTWRITE_INFO for meta data that is *not* a HEADER
- debug assertions that BODY/INFO/HEADER is not used mixed
- move `data->set.include_header` check into Curl_client_write
so protocol handlers no longer have to care
- add special handling in FTP for `data->set.include_header` for
historic, backward compatible reasons
- move unpausing of client writes from easy.c to sendf.c, so that
code is in one place and can forward flags correctly
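From a protocol handler's point of view this looks roughly like the
following; the buffers are made up for illustration:
```c
/* Sketch: pick the CLIENTWRITE_* flag that matches what the data is */
CURLcode result;
result = Curl_client_write(data, CLIENTWRITE_HEADER, hdr, hdrlen);
if(!result)
  result = Curl_client_write(data, CLIENTWRITE_BODY, body, bodylen);
if(!result)
  /* meta data that is neither a header nor body data */
  result = Curl_client_write(data, CLIENTWRITE_INFO, info, infolen);
```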
Closes #11885
`Curl_hyper_stream` needs to distinguish between two kinds of
`HYPER_TASK_EMPTY` tasks: (a) the `foreach` tasks it creates itself, and
(b) background tasks that hyper produces. It does this by recording the
address of any `foreach` task in `hyptransfer->endtask` before pushing
it into the executor, and then comparing that against the address of
tasks later polled out of the executor.
This works right now, but there is no guarantee from hyper that the
addresses are stable. `hyper_executor_push` says "The executor takes
ownership of the task, which should not be accessed again unless
returned back to the user with `hyper_executor_poll`". That wording is a
bit ambiguous but with my Rust programmer's hat on I read it as meaning
the task returned with `hyper_executor_poll` may be conceptually the
same as a task that was pushed, but that there are no other guarantees
and comparing addresses is a bad idea.
This commit instead uses `hyper_task_set_userdata` to mark the `foreach`
task with a `USERDATA_RESP_BODY` value which can then be checked for,
removing the need for `hyptransfer->endtask`. This makes the code look
more like the hyper C API examples, which use userdata for every task
and never look at task addresses.
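In terms of the hyper C API the marking looks roughly like this (a
simplified sketch; the callback name and the userdata value are
illustrative):
```c
/* Simplified sketch: tag the foreach task via userdata instead of
   remembering its address */
#define USERDATA_RESP_BODY ((void *)1)   /* illustrative marker value */

hyper_task *task = hyper_body_foreach(resp_body, my_chunk_cb, data);
hyper_task_set_userdata(task, USERDATA_RESP_BODY);
hyper_executor_push(exec, task);

/* later, when polling the executor */
hyper_task *t = hyper_executor_poll(exec);
if(t && hyper_task_userdata(t) == USERDATA_RESP_BODY) {
  /* this is the response body foreach task, not a background task */
}
```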
Closes #11779
`Curl_pgrsSetUploadCounter` should be passed a total count, not an
increment.
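In other words, a caller keeps a running total and passes that (a sketch;
the variable and field names are illustrative):
```c
/* pass the accumulated number of bytes uploaded so far, not the delta */
data->req.writebytecount += (curl_off_t)sent;
Curl_pgrsSetUploadCounter(data, data->req.writebytecount);
```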
This changes the failing diff for test 579 with hyper from this:
```
Progress callback called with UL 0 out of 0[LF]
-Progress callback called with UL 8 out of 0[LF]
-Progress callback called with UL 16 out of 0[LF]
-Progress callback called with UL 26 out of 0[LF]
-Progress callback called with UL 61 out of 0[LF]
-Progress callback called with UL 66 out of 0[LF]
+Progress callback called with UL 29 out of 0[LF]
```
to this:
```
Progress callback called with UL 0 out of 0[LF]
-Progress callback called with UL 8 out of 0[LF]
-Progress callback called with UL 16 out of 0[LF]
-Progress callback called with UL 26 out of 0[LF]
-Progress callback called with UL 61 out of 0[LF]
-Progress callback called with UL 66 out of 0[LF]
+Progress callback called with UL 40 out of 0[LF]
```
Presumably a step in the right direction.
Closes #11780
Some of these changes come from comparing `Curl_http` and
`start_CONNECT`, which are similar, and adding things to them that are
present in one and missing in another.
The most important changes:
- In `start_CONNECT`, add a missing `hyper_clientconn_free` call on the
happy path.
- In `start_CONNECT`, add a missing `hyper_request_free` on the error
path.
- In `bodysend`, add a missing `hyper_body_free` on an early-exit path.
- In `bodysend`, remove an unnecessary `hyper_body_free` on a different
error path that would cause a double-free.
https://docs.rs/hyper/latest/hyper/ffi/fn.hyper_request_set_body.html
says of `hyper_request_set_body`: "This takes ownership of the
hyper_body *, you must not use it or free it after setting it on the
request." This is true even if `hyper_request_set_body` returns an
error; I confirmed this by looking at the hyper source code.
Other changes are minor but make things slightly nicer.
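Taken together, the ownership rules boil down to roughly this pattern (a
simplified sketch, not the actual Curl_http code; error handling and
variable setup are trimmed):
```c
/* Simplified sketch of the hyper ownership rules */
hyper_request *req = hyper_request_new();
hyper_body *body = hyper_body_new();

if(hyper_request_set_body(req, body)) {
  /* the request owns `body` now, even on failure: free only the request */
  hyper_request_free(req);
  return CURLE_OUT_OF_MEMORY;
}

hyper_task *sendtask = hyper_clientconn_send(client, req);
/* `req` is consumed by hyper_clientconn_send(); on error paths before
   this point it must be released with hyper_request_free() instead.
   `sendtask` would be pushed onto the executor next. */

hyper_clientconn_free(client);  /* done with the clientconn on the happy path */
```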
Closes #11745
There is a `hyper_clientconn_free` call on the happy path, but not one
on the error path. This commit adds one.
Fixes the second memory leak reported by Valgrind in #10803.
Fixes #10803
Closes #11729
A request created with `hyper_request_new` must be consumed by either
`hyper_clientconn_send` or `hyper_request_free`.
This is not terrifically clear from the hyper docs --
`hyper_request_free` is documented only with "Free an HTTP request if
not going to send it on a client" -- but a perusal of the hyper code
confirms it.
This commit adds a `hyper_request_free` to the `error:` path in
`Curl_http` so that the request is consumed when an error occurs after
the request is created but before it is sent.
Fixes the first memory leak reported by Valgrind in #10803.
Closes #11729
To avoid abuse. The limit is set to 300 KB for the accumulated size of
all received HTTP headers for a single response. Incomplete research
suggests that Chrome uses a 256-300 KB limit, while Firefox allows up to
1 MB.
Closes #11582
- refs #11203 where hyper was reported as being slow
- fixes hyper_executor_poll to loop until it is out of
tasks as advised by @seanmonstar in https://github.com/hyperium/hyper/issues/3237
- added a fix in hyper io handling for detecting EAGAIN
- added some debug logs to see IO results
- pytest http/1.1 test cases pass
- pytest h2 test cases fail on connection reuse. HTTP/2
connection reuse does not seem to work. Hyper submits
a request on a reused connection, curl's IO works and
thereafter hyper declares `Hyper: [1] operation was canceled: connection closed`
on stderr without any error being logged before.
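The poll fix effectively drains the executor like this (simplified sketch):
```c
/* Simplified sketch: keep polling until the executor has no ready task */
while(1) {
  hyper_task *task = hyper_executor_poll(exec);
  if(!task)
    break;              /* nothing more is ready right now */
  /* ...handle the completed task... */
  hyper_task_free(task);
}
```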
Fixes #11203
Reported-by: Gisle Vanem
Advised-by: Sean McArthur
Closes #11344
Out of 415 labels throughout the code base, 86 of those labels were
not at the start of the line. Which means labels always at the start of
the line is the favoured style overall with 329 instances.
Out of the 86 labels not at the start of the line:
* 75 were indented with the same indentation level as the following line
* 8 were indented with exactly one space
* 2 were indented with one fewer indentation level than the following
line
* 1 was indented with the indentation level of the following line minus
three spaces (probably unintentional)
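For reference, the favoured style places the label in the first column,
along these lines (the helpers are made up for illustration):
```c
/* the favoured style: the label starts at column zero */
static CURLcode example(void)
{
  CURLcode result = do_work();     /* illustrative helper */
  if(result)
    goto out;
  result = do_more_work();         /* illustrative helper */
out:
  return result;
}
```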
Co-Authored-By: Viktor Szakats
Closes #11134
- they are mostly pointless in all major jurisdictions
- many big corporations and projects already don't use them
- saves us from pointless churn
- git keeps history for us
- the year range is kept in COPYING
checksrc is updated to allow copyright statements without years
Closes #10205
- Replace `Github` with `GitHub`.
- Replace `windows` with `Windows`
- Replace `advice` with `advise` where a verb is used.
- A few fixes on removing repeated words.
- Replace `a HTTP` with `an HTTP`
Closes #9802
Next Protocol Negotiation is a TLS extension that was created and used
for agreeing to use the SPDY protocol (the precursor to HTTP/2) for
HTTPS. In the early days of HTTP/2, before the spec was finalized and
shipped, the protocol could be enabled using this extension with some
servers.
curl has supported the NPN extension with some TLS backends since then,
with a command line option `--npn` and in libcurl with
`CURLOPT_SSL_ENABLE_NPN`.
HTTP/2 proper is made to use the ALPN (Application-Layer Protocol
Negotiation) extension and the NPN extension serves no purpose
anymore. The HTTP/2 spec was published in May 2015.
Today, use of NPN in the wild should be extremely rare and most likely
totally extinct. Chrome removed NPN support in Chrome 51, shipped in
June 2016. It was removed in Firefox 53, April 2017.
Closes #9307
As virtually no caller checked the return code, and those that did
wrongly treated it as a CURLcode. Detected by the icc compiler warning:
enumerated type mixed with another type
Closes #9179
Add licensing and copyright information for all files in this repository. This
either happens in the file itself as a comment header or in the file
`.reuse/dep5`.
This commit also adds a GitHub workflow to check pull requests and adapts
copyright.pl to the changes.
Closes #8869
Hyper now has the ability to preserve header order. This commit adds a
few lines setting the connection options for this feature.
Related to issue #8617
Closes #8707
- Make content length (ie download size) accessible to the user in the
header callback, but only after all headers have been processed (ie
only in the final call to the header callback).
Background:
For a long time the content length could be retrieved in the header
callback via CURLINFO_CONTENT_LENGTH_DOWNLOAD_T as soon as it was parsed
by curl.
Changes were made in 8a16e54 (precedes 7.79.0) to ignore content length
if any transfer encoding is used. A side effect of that was that
content length was not set by libcurl until after the header callback
was called the final time, because until all headers are processed it
cannot be determined if content length is valid.
This change keeps the same intention --all headers must be processed--
but now the content length is available before the final call to the
header function that indicates all headers have been processed (ie
a blank header).
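For an application this means the value can be read already in the header
callback when the blank end-of-headers line arrives, roughly like this (a
sketch; passing the easy handle via CURLOPT_HEADERDATA is an assumption of
the example):
```c
#include <stdio.h>
#include <curl/curl.h>

/* Sketch of a header callback reading the download size once all
   headers have been processed (signalled by the blank "\r\n" header) */
static size_t header_cb(char *buffer, size_t size, size_t nitems,
                        void *userdata)
{
  CURL *handle = userdata;          /* set with CURLOPT_HEADERDATA */
  size_t len = size * nitems;

  if(len == 2 && buffer[0] == '\r' && buffer[1] == '\n') {
    curl_off_t cl;
    if(!curl_easy_getinfo(handle, CURLINFO_CONTENT_LENGTH_DOWNLOAD_T, &cl)
       && cl != -1)
      printf("download size: %" CURL_FORMAT_CURL_OFF_T "\n", cl);
  }
  return len;
}
```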
Bug: https://github.com/curl/curl/commit/8a16e54#r57374914
Reported-by: sergio-nsk@users.noreply.github.com
Co-authored-by: Daniel Stenberg
Fixes https://github.com/curl/curl/issues/7804
Closes https://github.com/curl/curl/pull/7803
Pass on better return codes when errors occur within Curl_http instead
of insisting that CURLE_OUT_OF_MEMORY is the only possible one.
Pointed-out-by: Jay Satiro
Closes #7851