---
c: Copyright (C) Daniel Stenberg, <daniel.se>, et al.
SPDX-License-Identifier: curl
Title: curl_url_get
Section: 3
Source: libcurl
See-also:
  - CURLOPT_CURLU (3)
  - curl_url (3)
  - curl_url_cleanup (3)
  - curl_url_dup (3)
  - curl_url_set (3)
  - curl_url_strerror (3)
---

# NAME

curl_url_get - extract a part from a URL

# SYNOPSIS

~~~c
#include <curl/curl.h>

CURLUcode curl_url_get(const CURLU *url,
                       CURLUPart part,
                       char **content,
                       unsigned int flags);
~~~

# DESCRIPTION

Given a *url* handle of a URL object, this function extracts an individual
piece or the full URL from it.

The *part* argument specifies which part to extract (see list below) and
*content* points to a 'char *' that gets updated to point to a newly
allocated string with the contents.

The *flags* argument is a bitmask with individual features.

The returned content pointer must be freed with curl_free(3) after use.

# FLAGS

The flags argument is zero, one or more bits set in a bitmask.

## CURLU_DEFAULT_PORT

If the handle has no port stored, this option makes curl_url_get(3)
return the default port for the used scheme.

## CURLU_DEFAULT_SCHEME

If the handle has no scheme stored, this option makes curl_url_get(3)
return the default scheme instead of an error.

## CURLU_NO_DEFAULT_PORT

Instructs curl_url_get(3) to not return a port number if it matches the
default port for the scheme.

## CURLU_URLDECODE

Asks curl_url_get(3) to URL decode the contents before returning them. It
does not decode the scheme, the port number or the full URL.

The query component also gets plus-to-space conversion as a bonus when this
bit is set.

Note that this URL decoding is charset unaware and you get a zero terminated
string back with data that could be intended for a particular encoding.

If there are byte values lower than 32 in the decoded string, the get
operation returns an error instead.

## CURLU_URLENCODE

If set, curl_url_get(3) URL encodes the host name part when a full URL
is retrieved. If not set (default), libcurl returns the URL with the host name
"raw" to support IDN names to appear as-is. IDN host names typically use
non-ASCII bytes that otherwise get percent-encoded.

Note that even when not asking for URL encoding, the '%' (byte 37) is URL
encoded to make sure the host name remains valid.

## CURLU_PUNYCODE

If set and *CURLU_URLENCODE* is not set, and asked to retrieve the
**CURLUPART_HOST** or **CURLUPART_URL** parts, libcurl returns the host
name in its punycode version if it contains any non-ASCII octets (and is an
IDN name).

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the host name contains
anything outside the ASCII range.

(Added in curl 7.88.0)

## CURLU_PUNY2IDN

If set and asked to retrieve the **CURLUPART_HOST** or **CURLUPART_URL**
parts, libcurl returns the host name in its IDN (Internationalized Domain
Name) UTF-8 version if it otherwise is a punycode version. If the punycode
name cannot be converted to IDN correctly, libcurl returns
*CURLUE_BAD_HOSTNAME*.

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the host name is using
punycode.

(Added in curl 8.3.0)

# PARTS

## CURLUPART_URL

When asked to return the full URL, curl_url_get(3) returns a normalized
and possibly cleaned up version using all available URL parts.

We advise using the *CURLU_PUNYCODE* option to get the URL as "normalized"
as possible since IDN allows host names to be written in many different ways
that still end up as the same punycode version.

## CURLUPART_SCHEME

Scheme cannot be URL decoded on get.

## CURLUPART_USER

## CURLUPART_PASSWORD

## CURLUPART_OPTIONS

The options field is an optional field that might follow the password in the
userinfo part. It is only recognized/used when parsing URLs for the following
schemes: pop3, smtp and imap. The URL API still allows users to set and get
this field independently of scheme when not parsing full URLs.

## CURLUPART_HOST

The host name. If it is an IPv6 numeric address, the zone id is not part of it
but is provided separately in *CURLUPART_ZONEID*. IPv6 numerical addresses
are returned within brackets ([]).

IPv6 names are normalized when set, which should make them as short as
possible while maintaining correct syntax.

## CURLUPART_ZONEID

If the host name is a numeric IPv6 address, this field might also be set.

## CURLUPART_PORT

A port cannot be URL decoded on get. This number is returned in a string just
like all other parts. That string is guaranteed to hold a valid port number in
ASCII using base 10.

## CURLUPART_PATH

The returned path is always at least a slash ('/') even if no path was
supplied in the URL. A URL path always starts with a slash.

## CURLUPART_QUERY

The initial question mark that denotes the beginning of the query part is a
delimiter only. It is not part of the query contents.

If the URL has no query, *content* is set to NULL.
A zero-length query is returned as a zero-length string.

The query part gets pluses converted to space when asked to URL decode on get
with the CURLU_URLDECODE bit.

## CURLUPART_FRAGMENT

The initial hash sign that denotes the beginning of the fragment is a
delimiter only. It is not part of the fragment contents.

# EXAMPLE

~~~c
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  CURLUcode rc;
  CURLU *url = curl_url();
  if(!url)
    return 1; /* out of memory */
  rc = curl_url_set(url, CURLUPART_URL, "https://example.com", 0);
  if(!rc) {
    char *scheme;
    rc = curl_url_get(url, CURLUPART_SCHEME, &scheme, 0);
    if(!rc) {
      printf("the scheme is %s\n", scheme);
      curl_free(scheme);
    }
  }
  /* release the handle even if parsing failed */
  curl_url_cleanup(url);
  return (int)rc;
}
~~~

# AVAILABILITY

Added in 7.62.0. CURLUPART_ZONEID was added in 7.65.0.

# RETURN VALUE

Returns a CURLUcode error value, which is CURLUE_OK (0) if everything went
fine. See the libcurl-errors(3) man page for the full list with
descriptions.

If this function returns an error, no URL part is returned.
