This library deals with the analysis and construction of a URL (Uniform Resource Locator). URLs are the basis for communicating the locations of resources (data) on the web. A URL consists of a protocol identifier (e.g. HTTP, FTP) and a protocol-specific syntax further defining the location. URLs are standardized in RFC-1738.
The implementation in this library covers only a small portion of the defined protocols. Although the initial implementation followed RFC-1738 strictly, the current one is more relaxed in order to cope with frequent violations of the standard encountered in practice.
In the HTTP protocol, the first line of a request has the form

    <Action> <Location> HTTP/<version>
Location: an atom or a list of character codes.
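To make the request-line layout concrete, here is a minimal sketch that splits such a line into its three fields. The predicate request_line_parts/4 is hypothetical (it is not part of the library), and it assumes the fields are separated by single spaces.

```prolog
% Hypothetical request_line_parts/4 (not part of library(url)):
% splits an HTTP request line into Action, Location and Version atoms.
% Assumes the three fields are separated by single spaces.
request_line_parts(Line, Action, Location, Version) :-
    split_string(Line, " ", "", [ActionS, LocationS, ProtocolS]),
    string_concat("HTTP/", VersionS, ProtocolS),   % strip "HTTP/"
    atom_string(Action, ActionS),
    atom_string(Location, LocationS),
    atom_string(Version, VersionS).
```

For example, request_line_parts("GET /message.cgi HTTP/1.1", A, L, V) yields A = 'GET', L = '/message.cgi' and V = '1.1'.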
protocol(Protocol): the protocol used. After the optional url: prefix, this is an identifier separated from the remainder of the URL by :. parse_url/2 assumes the http protocol if no protocol is specified and the URL can be parsed as a valid HTTP URL. In addition to the protocols specified in RFC-1738, the file protocol is supported as well.
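The protocol rule above can be sketched as follows. The predicate url_protocol/2 is hypothetical and simplified: it treats the text before the first : as the protocol only if it consists of letters and digits, and otherwise falls back to http. The library's own check for "a valid HTTP url" is more thorough than this.

```prolog
% Hypothetical url_protocol/2: a simplified sketch of the protocol rule,
% not the library's implementation.
url_protocol(URL, Protocol) :-
    (   atom_concat('url:', Rest0, URL)      % skip optional "url:" prefix
    ->  Rest = Rest0
    ;   Rest = URL
    ),
    (   sub_atom(Rest, Before, _, _, ':'),   % identifier before ':'
        sub_atom(Rest, 0, Before, _, Id),
        Id \== '',
        atom_codes(Id, Codes),
        forall(member(C, Codes), code_type(C, alnum))
    ->  Protocol = Id
    ;   Protocol = http                      % no identifier: assume http
    ).
```

With this sketch, 'url:ftp://host/file' gives ftp, while 'swi.psy.uva.nl/message.cgi' has no protocol identifier and gives http.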
port(Port): the integer port number to access on Host. This only appears if the port is explicitly specified in the URL. Implicit default ports (e.g. 80 for HTTP) do not appear in the parts list.
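The "explicit ports only" rule can be illustrated with a small sketch. The predicate host_port_parts/2 is hypothetical; it maps the host[:port] part of a URL to part terms and deliberately never fills in a default port. IPv6 literals are not handled.

```prolog
% Hypothetical host_port_parts/2: turns "host[:port]" into part terms.
% port(Port) is only added when a port is written explicitly; no
% default (such as 80 for http) is filled in. IPv6 literals are not handled.
host_port_parts(HostPort, Parts) :-
    (   sub_atom(HostPort, Before, 1, After, ':')
    ->  sub_atom(HostPort, 0, Before, _, Host),
        sub_atom(HostPort, _, After, 0, PortAtom),
        atom_number(PortAtom, Port),
        integer(Port),
        Parts = [host(Host), port(Port)]
    ;   Parts = [host(HostPort)]             % implicit port: nothing added
    ).
```

So 'localhost:8080' gives [host(localhost), port(8080)], while 'swi.psy.uva.nl' gives only [host('swi.psy.uva.nl')].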
path(Path): the (file) path addressed by the URL. This is supported for the ftp, http and file protocols. If no path appears, the library generates the path /.
search(ListOfNameValue): the search specification of an HTTP URL, i.e. the part after the ?, normally used to transfer data from HTML forms that use the GET method. In the URL it consists of a www-form-encoded list of Name=Value pairs. This is mapped to a list of Prolog Name=Value terms with decoded names and values.
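The mapping from a www-form-encoded search string to Name=Value terms can be sketched as below. The predicate search_pairs/2 and its helpers are hypothetical, not the library's code. The sketch assumes each field contains exactly one =, decodes + as a space, and treats each %XX escape as a single-byte character.

```prolog
% Hypothetical search_pairs/2: decodes a www-form-encoded search string
% such as 'msg=Hello+World%21' into a list of Name=Value terms.
search_pairs(Query, Pairs) :-
    split_string(Query, "&", "", Fields),
    maplist(field_pair, Fields, Pairs).

% Assumes exactly one "=" per field.
field_pair(Field, Name=Value) :-
    split_string(Field, "=", "", [EncName, EncValue]),
    form_decode(EncName, Name),
    form_decode(EncValue, Value).

form_decode(Encoded, Atom) :-
    string_codes(Encoded, Codes),
    phrase(decoded(Decoded), Codes),
    atom_codes(Atom, Decoded).

% '+' stands for a space (code 32).
decoded([32|T]) --> "+", !, decoded(T).
% %XX stands for the character with hexadecimal code XX.
decoded([C|T]) -->
    "%", [H1, H2],
    { code_type(H1, xdigit(W1)),
      code_type(H2, xdigit(W2))
    }, !,
    { C is W1*16 + W2 },
    decoded(T).
% Any other code is copied unchanged.
decoded([C|T]) --> [C], !, decoded(T).
decoded([]) --> [].
```

Applied to the search part of the example URL, search_pairs('msg=Hello+World%21', P) gives P = [msg='Hello World!'].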
fragment(Fragment): the fragment specification of an HTTP URL, i.e. the part after the # character.
The example below illustrates all this for an HTTP URL.
?- parse_url('http://swi.psy.uva.nl/message.cgi?msg=Hello+World%21#x', P).

P = [ protocol(http),
      host('swi.psy.uva.nl'),
      fragment(x),
      search([ msg = 'Hello World!' ]),
      path('/message.cgi')
    ]
By instantiating the parts list, this predicate can also be used to construct a URL.
When a value is www-form-encoded, characters that cannot appear literally are mapped to %XX and newlines to %0D%0A. When decoding, newlines appear as a single newline (10) character.
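The encoding direction can be sketched as follows. The predicate form_encode/2 is hypothetical, and the set of characters kept literally (ASCII letters and digits plus -._~, the RFC 3986 unreserved set) is an assumption of this sketch, since the text above does not list them. Codes of 256 and above are not handled. Decoding %0D%0A back to a single newline is the inverse step and is not shown.

```prolog
% Hypothetical form_encode/2: a sketch of the encoding rule, not the
% library's implementation. Assumption: ASCII letters, digits and
% "-._~" are kept; every other code below 256 becomes %XX.
form_encode(Value, Encoded) :-
    atom_codes(Value, Codes),
    phrase(encoded(Codes), EncodedCodes),
    atom_codes(Encoded, EncodedCodes).

encoded([]) --> [].
encoded([C|Cs]) --> encode_code(C), encoded(Cs).

% A newline (code 10) becomes the CR LF pair %0D%0A.
encode_code(10) --> !, "%0D%0A".
% Unreserved characters are copied unchanged.
encode_code(C) -->
    { (   C < 128, code_type(C, alnum)
      ;   memberchk(C, `-._~`)
      )
    }, !,
    [C].
% Any other single-byte code is written as % plus two uppercase hex digits.
encode_code(C) -->
    { C < 256,
      Hi is C >> 4,
      Lo is C /\ 0xF,
      hex_digit(Hi, H),
      hex_digit(Lo, L)
    },
    "%", [H, L].

hex_digit(D, C) :- D < 10, !, C is 0'0 + D.
hex_digit(D, C) :- C is 0'A + D - 10.
```

Under these assumptions, form_encode('Hello World!', E) gives E = 'Hello%20World%21', and a value containing a newline has it replaced by %0D%0A.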