Uniform Resource Locator (URL)
The globally unique address of a file can be given as the combination of a
host name and the file's position within that host's web space.
This address is called a URL (Uniform Resource Locator) and has the form protocol://user:password@host(FQDN):port/path?query,
where the FQDN has the form hostname.subdomain.domain.topleveldomain (see also http://www-rnks.informatik.tu-cottbus.de/).
Every computer that goes online and wants to communicate with others needs
an Internet Protocol (IP) address, which consists of 4 bytes in IPv4 (16 bytes in IPv6).
These IP addresses are hard to remember, so simpler address forms
were introduced for human use:
fully qualified domain name (FQDN)
subdomains (starting with the host name and moving up the hierarchy)
top-level domain (TLD)
The Domain Name System (DNS) translates the fully qualified domain name (FQDN)
into an IP address (see http://www.htmlgoodies.com/ and
RFC 1034, Domain Names - Concepts and Facilities, November 1987, page 8):
A domain is identified by a domain name, and consists of that part of
the domain name space that is at or below the domain name which
specifies the domain. A domain is a subdomain of another domain if it
is contained within that domain. This relationship can be tested by
seeing if the subdomain's name ends with the containing domain's name.
For example, A.B.C.D is a subdomain of B.C.D, C.D, D, and " ".
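The suffix test described in the excerpt can be sketched in a few lines. The function name `is_subdomain` is an assumption made for this illustration and does not come from the RFC. The sketch compares whole labels rather than raw string suffixes, so that a name such as XB.C.D is not mistaken for a subdomain of B.C.D, and it treats a domain as contained in itself.

```python
def is_subdomain(name: str, domain: str) -> bool:
    """Return True if `name` lies at or below `domain` in the name space.

    Hypothetical helper: an empty `domain` stands for the root.
    """
    # Domain names are case-insensitive; a trailing dot only marks the root.
    name_labels = [l for l in name.lower().rstrip(".").split(".") if l]
    domain_labels = [l for l in domain.lower().rstrip(".").split(".") if l]
    if len(domain_labels) > len(name_labels):
        return False
    # Compare complete labels from the right-hand end of the name.
    tail = name_labels[len(name_labels) - len(domain_labels):]
    return tail == domain_labels
```

Calling `is_subdomain("A.B.C.D", "C.D")` returns `True`, while `is_subdomain("XB.C.D", "B.C.D")` returns `False` even though the second string is a textual suffix of the first.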
Original text of RFC 1738, Uniform Resource Locators (URL), December 1994 (Berners-Lee, Masinter & McCahill):
3.1. Common Internet Scheme Syntax
While the syntax for the rest of the URL may vary depending on the
particular scheme selected, URL schemes that involve the direct use
of an IP-based protocol to a specified host on the Internet use a
common syntax for the scheme-specific data:
    //<user>:<password>@<host>:<port>/<url-path>
Some or all of the parts "<user>:<password>@", ":<password>",
":<port>", and "/<url-path>" may be excluded. The scheme specific
data start with a double slash "//" to indicate that it complies with
the common Internet scheme syntax. The different components obey the
following rules:
user
An optional user name. Some schemes (e.g., ftp) allow the
specification of a user name.
password
An optional password. If present, it follows the user
name separated from it by a colon.
The user name (and password), if present, are followed by a
commercial at-sign "@". Within the user and password field, any ":",
"@", or "/" must be encoded.
Note that an empty user name or password is different than no user
name or password; there is no way to specify a password without
specifying a user name. E.g., <URL:ftp://@host.com/> has an empty
user name and no password, <URL:ftp://host.com/> has no user name,
while <URL:ftp://foo:@host.com/> has a user name of "foo" and an
empty password.
host
The fully qualified domain name of a network host, or its IP
address as a set of four decimal digit groups separated by
".". Fully qualified domain names take the form as described
in Section 3.5 of RFC 1034 and Section 2.1 of RFC 1123:
a sequence of domain labels separated by ".", each domain
label starting and ending with an alphanumerical character and
possibly also containing "-" characters. The rightmost domain
label will never start with a digit, though, which
syntactically distinguishes all domain names from the IP
addresses.
port
The port number to connect to. Most schemes designate
protocols that have a default port number. Another port number
may optionally be supplied, in decimal, separated from the
host by a colon. If the port is omitted, the colon is as well.
url-path
The rest of the locator consists of data specific to the
scheme, and is known as the "url-path". It supplies the
details of how the specified resource can be accessed. Note
that the "/" between the host (or port) and the url-path is
NOT part of the url-path.
The url-path syntax depends on the scheme being used, as does the
manner in which it is interpreted.
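The common Internet scheme syntax can be explored with Python's standard `urllib.parse.urlsplit`. The following is a minimal sketch, not a full RFC 1738 parser: the helper name `split_common_syntax` and the example host names are assumptions made for illustration, and `urlsplit` additionally separates a query string, which RFC 1738 treats as scheme-specific.

```python
from urllib.parse import urlsplit


def split_common_syntax(url: str) -> dict:
    """Split a URL into the components named in the common scheme syntax."""
    parts = urlsplit(url)
    path = parts.path
    return {
        "scheme": parts.scheme,
        "user": parts.username,      # None when no user name is given
        "password": parts.password,  # None when no password is given
        "host": parts.hostname,
        "port": parts.port,          # None when the port is omitted
        # The "/" after the host (or port) is not part of the url-path.
        "url_path": path[1:] if path.startswith("/") else path,
    }


# Hypothetical example URL.
info = split_common_syntax("ftp://user:secret@ftp.example.org:2121/pub/file.txt")
print(info)
```

The distinction between an empty and a missing user name or password is visible here as well: for `ftp://@host.com/` the user is the empty string and the password is `None`, for `ftp://host.com/` both are `None`, and for `ftp://foo:@host.com/` the user is `"foo"` and the password is the empty string.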
3.3. HTTP
The HTTP URL scheme is used to designate Internet resources
accessible using HTTP (HyperText Transfer Protocol).
The HTTP protocol is specified elsewhere. This specification only
describes the syntax of HTTP URLs.
An HTTP URL takes the form:
    http://<host>:<port>/<path>?<searchpart>
where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 80. No user name or password is
allowed. <path> is an HTTP selector, and <searchpart> is a query
string. The <path> is optional, as is the <searchpart> and its
preceding "?". If neither <path> nor <searchpart> is present, the "/"
may also be omitted.
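The defaulting rules for HTTP URLs can be sketched as follows. The function name `http_url_parts` and the example host are assumptions for illustration; the sketch relies on `urllib.parse.urlsplit` and only applies the default port of 80 described above.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # used when ":<port>" is omitted


def http_url_parts(url: str) -> tuple:
    """Return (host, port, path, searchpart) for an HTTP URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("not an HTTP URL: " + url)
    port = DEFAULT_HTTP_PORT if parts.port is None else parts.port
    # Path and search part are optional and come back as empty strings
    # when they are absent.
    return (parts.hostname, port, parts.path, parts.query)
```

For example, `http_url_parts("http://www.example.org")` yields the host with port 80 and empty path and search part, while `http_url_parts("http://www.example.org:8080/a/b?x=1")` yields port 8080, path `/a/b`, and search part `x=1`.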
Within the <path> and <searchpart> components, "/", ";", "?" are
reserved. The "/" character may be used within HTTP to designate a
hierarchical structure.
Original text of RFC 1034, Sections 3.4 and 3.5, Domain Names - Concepts and Facilities, November 1987:
3.4. Example name space
The following figure shows a part of the current domain name space, and
is used in many examples in this RFC. Note that the tree is a very
small subset of the actual name space.
In this example, the root domain has three immediate subdomains: MIL,
EDU, and ARPA. The LCS.MIT.EDU domain has one immediate subdomain named
XX.LCS.MIT.EDU. All of the leaves are also domains.
3.5. Preferred name syntax
The DNS specifications attempt to be as general as possible in the rules
for constructing domain names. The idea is that the name of any
existing object can be expressed as a domain name with minimal changes.
However, when assigning a domain name for an object, the prudent user
will select a name which satisfies both the rules of the domain system
and any existing rules for the object, whether these rules are published
or implied by existing programs.
For example, when naming a mail domain, the user should satisfy both the
rules of this memo and those in RFC-822. When creating a new host name,
the old rules for HOSTS.TXT should be followed. This avoids problems
when old software is converted to use domain names.
The following syntax will result in fewer problems with many
applications that use domain names (e.g., mail, TELNET).
<domain> ::= <subdomain> | " "
<subdomain> ::= <label> | <subdomain> "." <label>
<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"
<let-dig> ::= <letter> | <digit>
<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case
<digit> ::= any one of the ten digits 0 through 9
Note that while upper and lower case letters are allowed in domain
names, no significance is attached to the case. That is, two names with
the same spelling but different case are to be treated as if identical.
The labels must follow the rules for ARPANET host names. They must
start with a letter, end with a letter or digit, and have as interior
characters only letters, digits, and hyphen. There are also some
restrictions on the length. Labels must be 63 characters or less.
For example, the following strings identify hosts in the Internet:
A.ISI.EDU XX.LCS.MIT.EDU SRI-NIC.ARPA
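The preferred name syntax translates fairly directly into a regular expression. The sketch below is one possible reading of the grammar and the 63-character label limit quoted above; the names `LABEL_RE` and `is_preferred_name` are assumptions made for this example.

```python
import re

# <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_preferred_name(name: str) -> bool:
    """Check a domain name against the RFC 1034 preferred name syntax."""
    if name == " ":
        # <domain> ::= <subdomain> | " "  (the root)
        return True
    labels = name.split(".")
    return all(
        len(label) <= 63 and LABEL_RE.fullmatch(label) is not None
        for label in labels
    )
```

The three example host names A.ISI.EDU, XX.LCS.MIT.EDU, and SRI-NIC.ARPA pass this check, whereas a name with a label that starts with a digit or ends with a hyphen does not.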
Original text of RFC 1123, Section 2.1 (APPLICATIONS LAYER -- GENERAL), October 1989:
2.1 Host Names and Numbers
The syntax of a legal Internet host name was specified in RFC-952
[DNS:4]. One aspect of host name syntax is hereby changed: the
restriction on the first character is relaxed to allow either a
letter or a digit. Host software MUST support this more liberal
syntax.
Host software MUST handle host names of up to 63 characters and
SHOULD handle host names of up to 255 characters.
Whenever a user inputs the identity of an Internet host, it SHOULD
be possible to enter either (1) a host domain name or (2) an IP
address in dotted-decimal ("#.#.#.#") form. The host SHOULD check
the string syntactically for a dotted-decimal number before
looking it up in the Domain Name System.
DISCUSSION:
This last requirement is not intended to specify the complete
syntactic form for entering a dotted-decimal host number;
that is considered to be a user-interface issue. For
example, a dotted-decimal number must be enclosed within
"[ ]" brackets for SMTP mail (see Section 5.2.17). This
notation could be made universal within a host system,
simplifying the syntactic checking for a dotted-decimal
number.
If a dotted-decimal number can be entered without such
identifying delimiters, then a full syntactic check must be
made, because a segment of a host domain name is now allowed
to begin with a digit and could legally be entirely numeric
(see Section 6.1.2.4). However, a valid host name can never
have the dotted-decimal form #.#.#.#, since at least the
highest-level component label will be alphabetic.
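The syntactic check that should precede a DNS lookup might look like the sketch below. It is an illustration under stated assumptions: the function name `classify_host` and its three return values are hypothetical, and a string classified as a host name would still need to be validated against the host name rules before use.

```python
import re

# Four groups of one to three decimal digits separated by dots.
DOTTED_DECIMAL_RE = re.compile(r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})")


def classify_host(text: str) -> str:
    """Classify user input before any DNS lookup is attempted.

    Returns "ip-address", "invalid", or "host-name" (hypothetical labels).
    """
    match = DOTTED_DECIMAL_RE.fullmatch(text)
    if match is None:
        # Not of the form #.#.#.#, so treat it as a host name candidate.
        return "host-name"
    if all(int(group) <= 255 for group in match.groups()):
        return "ip-address"
    # Looks dotted-decimal but a group exceeds one byte; a valid host
    # name can never have this form either.
    return "invalid"
```

With this sketch, `classify_host("192.0.2.1")` gives `"ip-address"`, and `classify_host("3com.example")` gives `"host-name"` because a label may begin with a digit under the relaxed rule.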
RFC 3696 - Application Techniques for Checking and Transformation of Names
4.2. The HTTP URL
Absolute HTTP URLs consist of the scheme name, a host name (expressed
as a domain name or IP address), and optional port number, and then,
optionally, a path, a search part, and a fragment identifier. These
are separated, respectively, by a colon and the two slashes that
precede the host name, a colon, a slash, a question mark, and a hash
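mark ("#"). So we have
    http://host:port/path?search#fragment
    http://host/path/
    http://host/path#fragment
    http://host/path?search
    http://host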
and other variations on that form. There is also a "relative" form,
but it almost never appears in text that a user might, e.g., enter
into a form. See [RFC2616] for details.
The characters
/ ; ?
are reserved within the path and search parts and must be encoded;
the first of these may be used unencoded, and is often used within
the path, to designate hierarchy.
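Encoding of the reserved characters can be illustrated with `urllib.parse.quote` from the Python standard library. The two helper names below are assumptions for this sketch; the first keeps "/" unencoded so that it continues to designate hierarchy, the second encodes it as well for use inside a single path segment.

```python
from urllib.parse import quote


def encode_path(path: str) -> str:
    """Percent-encode a path, leaving "/" as the hierarchy separator."""
    return quote(path, safe="/")


def encode_path_segment(segment: str) -> str:
    """Percent-encode one path segment, including any "/" inside it."""
    return quote(segment, safe="")
```

For instance, `encode_path("docs/a;b?c")` produces `docs/a%3Bb%3Fc`, and `encode_path_segment("a/b")` produces `a%2Fb`.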