Uniform Resource Locator

Uniform Resource Locator (URL)

The globally unique address of a file can be given as the combination of a host name and the file's position within that host's web space. This address is called a URL (Uniform Resource Locator) and has the general form protocol://user:password@host(FQDN):port/path?parameters,
where the FQDN has the form hostname.subdomain.domain.topleveldomain (see also http://www-rnks.informatik.tu-cottbus.de/ ).
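
As a minimal sketch of how these components can be separated in practice, the following uses `urlsplit` from Python's standard `urllib.parse` module. The URL itself (host, credentials, port, path, and parameters) is a made-up example, not taken from the text.

```python
from urllib.parse import urlsplit

# Hypothetical URL; every value in it is invented for illustration.
url = "http://user:secret@www.example.com:8080/docs/index.html?lang=en"
parts = urlsplit(url)

print(parts.scheme)    # protocol
print(parts.username)  # user name
print(parts.password)  # password
print(parts.hostname)  # host (FQDN)
print(parts.port)      # port number
print(parts.path)      # access path
print(parts.query)     # parameters passed to the resource
```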

Every computer that goes online and wants to communicate with others needs an Internet Protocol (IP) address. An IPv4 address consists of 4 bytes; its successor, IPv6, uses 16 bytes.
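
The address sizes can be checked with Python's standard `ipaddress` module: an IPv4 address packs into 4 bytes and an IPv6 address into 16 bytes. The two addresses below are illustrative values from the ranges reserved for documentation, not addresses mentioned in the text.

```python
import ipaddress

# Illustrative addresses from the documentation ranges (assumed examples).
v4 = ipaddress.ip_address("192.0.2.1")
v6 = ipaddress.ip_address("2001:db8::1")

# .packed is the raw binary form of the address.
print(v4.version, len(v4.packed))
print(v6.version, len(v6.packed))
```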

These IP addresses are hard to remember, however. For that reason, address forms that are easier for people to use were introduced:

Fully qualified domain name (FQDN):

    Subdomains (including the host name)    Top-level domain (TLD)
    hochstaufen. chiemgau. alpen.           gebirge.

The Domain Name System (DNS) translates the fully qualified domain name (FQDN) into an IP address (http://www.htmlgoodies.com/, http://www.webopedia.com/).

RFC 1034 Domain Concepts and Facilities November 1987, Page 8

A domain is identified by a domain name, and consists of that part of
the domain name space that is at or below the domain name which
specifies the domain.  A domain is a subdomain of another domain if it
is contained within that domain.  This relationship can be tested by
seeing if the subdomain's name ends with the containing domain's name.
For example, A.B.C.D is a subdomain of B.C.D, C.D, D, and " ".
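
The suffix test described in this passage could be sketched as follows. The function name `is_subdomain` is hypothetical, and two points are assumptions of this sketch rather than statements of the RFC: names are compared label by label (so that AB.C.D is not mistaken for a subdomain of B.C.D), and a domain is not counted as a subdomain of itself.

```python
def is_subdomain(name: str, containing: str) -> bool:
    """Return True if `name` lies strictly below `containing`.

    An empty or blank `containing` stands for the root domain.
    """
    def labels(n: str) -> list:
        n = n.strip().rstrip(".").lower()  # case carries no significance
        return n.split(".") if n else []

    sub, parent = labels(name), labels(containing)
    if len(parent) >= len(sub):
        return False
    # Compare the trailing labels of the candidate with the parent's labels.
    return sub[len(sub) - len(parent):] == parent


for parent in ("B.C.D", "C.D", "D", " "):
    print(repr(parent), is_subdomain("A.B.C.D", parent))
```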

Original text of RFC 1738, Uniform Resource Locators (URL), December 1994 (Berners-Lee, Masinter & McCahill):

3.1. Common Internet Scheme Syntax

   While the syntax for the rest of the URL may vary depending on the
   particular scheme selected, URL schemes that involve the direct use
   of an IP-based protocol to a specified host on the Internet use a
   common syntax for the scheme-specific data:

        //<user>:<password>@<host>:<port>/<url-path>

   Some or all of the parts "<user>:<password>@", ":<password>",
   ":<port>", and "/<url-path>" may be excluded.  The scheme specific
   data start with a double slash "//" to indicate that it complies with
   the common Internet scheme syntax. The different components obey the
   following rules:

    user
        An optional user name. Some schemes (e.g., ftp) allow the
        specification of a user name.

    password
        An optional password. If present, it follows the user
        name separated from it by a colon.

   The user name (and password), if present, are followed by a
   commercial at-sign "@". Within the user and password field, any ":",
   "@", or "/" must be encoded.

   Note that an empty user name or password is different than no user
   name or password; there is no way to specify a password without
   specifying a user name. E.g., <URL:ftp://@host.com/> has an empty
   user name and no password, <URL:ftp://host.com/> has no user name,
   while <URL:ftp://foo:@host.com/> has a user name of "foo" and an
   empty password.
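
As an illustration outside the RFC text, the three example URLs can be fed to `urlsplit` from Python's standard `urllib.parse`. In this sketch, `None` is taken to mean "not present" and an empty string to mean "present but empty"; the helper name `userinfo` is hypothetical.

```python
from urllib.parse import urlsplit


def userinfo(url: str):
    """Return (user name, password) as reported by urlsplit."""
    parts = urlsplit(url)
    return parts.username, parts.password


for url in ("ftp://@host.com/", "ftp://host.com/", "ftp://foo:@host.com/"):
    print(url, userinfo(url))
```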

    host
        The fully qualified domain name of a network host, or its IP
        address as a set of four decimal digit groups separated by
        ".". Fully qualified domain names take the form as described
        in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
        [5]: a sequence of domain labels separated by ".", each domain
        label starting and ending with an alphanumerical character and
        possibly also containing "-" characters. The rightmost domain
        label will never start with a digit, though, which
        syntactically distinguishes all domain names from the IP
        numbers.

    port
        The port number to connect to. Most schemes designate
        protocols that have a default port number. Another port number
        may optionally be supplied, in decimal, separated from the
        host by a colon. If the port is omitted, the colon is as well.

    url-path
        The rest of the locator consists of data specific to the
        scheme, and is known as the "url-path". It supplies the
        details of how the specified resource can be accessed. Note
        that the "/" between the host (or port) and the url-path is
        NOT part of the url-path.

   The url-path syntax depends on the scheme being used, as does the
   manner in which it is interpreted.


3.3. HTTP

   The HTTP URL scheme is used to designate Internet resources
   accessible using HTTP (HyperText Transfer Protocol).

   The HTTP protocol is specified elsewhere. This specification only
   describes the syntax of HTTP URLs.

   An HTTP URL takes the form:

      http://<host>:<port>/<path>?<searchpart>

   where <host> and <port> are as described in Section 3.1. If :<port>
   is omitted, the port defaults to 80.  No user name or password is
   allowed.  <path> is an HTTP selector, and <searchpart> is a query
   string. The <path> is optional, as is the <searchpart> and its
   preceding "?". If neither <path> nor <searchpart> is present, the "/"
   may also be omitted.
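
The defaulting rules for HTTP URLs could be applied as in the following sketch, which is an illustration and not part of the RFC. The helper name `http_endpoint` and the example host are hypothetical, and treating an omitted path as "/" is an assumption made here for convenience.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # used when ":<port>" is omitted


def http_endpoint(url: str):
    """Return (host, port, path, searchpart) with defaults filled in."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("not an HTTP URL: " + url)
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    path = parts.path or "/"  # assumption: an omitted path means "/"
    return parts.hostname, port, path, parts.query


print(http_endpoint("http://www.example.com"))
print(http_endpoint("http://www.example.com:8080/a/b?x=1"))
```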

   Within the <path> and <searchpart> components, "/", ";", "?" are
   reserved.  The "/" character may be used within HTTP to designate a
   hierarchical structure.


Original text of RFC 1034, Sections 3.4 and 3.5, Domain Concepts and Facilities, November 1987:

3.4. Example name space

The following figure shows a part of the current domain name space, and
is used in many examples in this RFC.  Note that the tree is a very
small subset of the actual name space.

In this example, the root domain has three immediate subdomains: MIL,
EDU, and ARPA.  The LCS.MIT.EDU domain has one immediate subdomain named
XX.LCS.MIT.EDU.  All of the leaves are also domains.

3.5. Preferred name syntax

The DNS specifications attempt to be as general as possible in the rules
for constructing domain names.  The idea is that the name of any
existing object can be expressed as a domain name with minimal changes.
However, when assigning a domain name for an object, the prudent user
will select a name which satisfies both the rules of the domain system
and any existing rules for the object, whether these rules are published
or implied by existing programs.

For example, when naming a mail domain, the user should satisfy both the
rules of this memo and those in RFC-822.  When creating a new host name,
the old rules for HOSTS.TXT should be followed.  This avoids problems
when old software is converted to use domain names.

The following syntax will result in fewer problems with many
applications that use domain names (e.g., mail, TELNET).

<domain> ::= <subdomain> | " "

<subdomain> ::= <label> | <subdomain> "." <label>

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9

Note that while upper and lower case letters are allowed in domain
names, no significance is attached to the case.  That is, two names with
the same spelling but different case are to be treated as if identical.

The labels must follow the rules for ARPANET host names.  They must
start with a letter, end with a letter or digit, and have as interior
characters only letters, digits, and hyphen.  There are also some
restrictions on the length.  Labels must be 63 characters or less.
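
The preferred name syntax can be checked mechanically. The following is a minimal sketch under stated assumptions: the names `LABEL_RE`, `is_preferred_name`, and `same_name` are hypothetical, only the per-label rules and the 63-character limit are tested, and no overall length limit is enforced.

```python
import re

# <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_preferred_name(name: str) -> bool:
    """Check a domain name against the preferred syntax, label by label."""
    if name == " ":  # <domain> ::= <subdomain> | " "  (the root)
        return True
    return all(
        len(label) <= 63 and LABEL_RE.fullmatch(label) is not None
        for label in name.split(".")
    )


def same_name(a: str, b: str) -> bool:
    """Names differing only in letter case are treated as identical."""
    return a.lower() == b.lower()


print(is_preferred_name("XX.LCS.MIT.EDU"))
print(is_preferred_name("3com.example"))  # label starts with a digit
print(same_name("LCS.MIT.EDU", "lcs.mit.edu"))
```

Under this grammar a label beginning with a digit is rejected; RFC 1123 later relaxed that restriction for host names.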

For example, the following strings identify hosts in the Internet:

A.ISI.EDU  XX.LCS.MIT.EDU  SRI-NIC.ARPA

Original text of RFC 1123, Section 2.1, APPLICATIONS LAYER -- GENERAL, October 1989:

2.1  Host Names and Numbers

      The syntax of a legal Internet host name was specified in RFC-952
      [DNS:4].  One aspect of host name syntax is hereby changed: the
      restriction on the first character is relaxed to allow either a
      letter or a digit.  Host software MUST support this more liberal
      syntax.

      Host software MUST handle host names of up to 63 characters and
      SHOULD handle host names of up to 255 characters.

      Whenever a user inputs the identity of an Internet host, it SHOULD
      be possible to enter either (1) a host domain name or (2) an IP
      address in dotted-decimal ("#.#.#.#") form.  The host SHOULD check
      the string syntactically for a dotted-decimal number before
      looking it up in the Domain Name System.

           This last requirement is not intended to specify the complete
           syntactic form for entering a dotted-decimal host number;
           that is considered to be a user-interface issue.  For
           example, a dotted-decimal number must be enclosed within
           "[ ]" brackets for SMTP mail (see Section 5.2.17).  This
           notation could be made universal within a host system,
           simplifying the syntactic checking for a dotted-decimal
           number.

           If a dotted-decimal number can be entered without such
           identifying delimiters, then a full syntactic check must be
           made, because a segment of a host domain name is now allowed
           to begin with a digit and could legally be entirely numeric
           (see Section 6.1.2.4).  However, a valid host name can never
           have the dotted-decimal form #.#.#.#, since at least the
           highest-level component label will be alphabetic.
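
The syntactic check before a DNS lookup might look like the following sketch, which is an illustration and not part of the RFC. The function name `classify_host` and the three result strings are hypothetical; the sketch tests only for the four-group dotted-decimal form and the 0 to 255 range of each group.

```python
import re

# Exactly four groups of decimal digits separated by dots: #.#.#.#
DOTTED_DECIMAL_RE = re.compile(r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})")


def classify_host(text: str) -> str:
    """Decide whether input is an IP address or should go to the DNS."""
    match = DOTTED_DECIMAL_RE.fullmatch(text)
    if match is None:
        return "host-name-candidate"  # look it up in the Domain Name System
    if all(int(group) <= 255 for group in match.groups()):
        return "dotted-decimal"
    return "invalid"  # dotted-decimal shape, but a group exceeds 255


print(classify_host("192.0.2.1"))
print(classify_host("3com.example"))
print(classify_host("999.1.1.1"))
```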

RFC 3696 - Application Techniques for Checking and Transformation of Names

4.2.  The HTTP URL

   Absolute HTTP URLs consist of the scheme name, a host name (expressed
   as a domain name or IP address), and optional port number, and then,
   optionally, a path, a search part, and a fragment identifier.  These
   are separated, respectively, by a colon and the two slashes that
   precede the host name, a colon, a slash, a question mark, and a hash
   mark ("#").  So we have






   and other variations on that form.  There is also a "relative" form,
   but it almost never appears in text that a user might, e.g., enter
   into a form.  See [RFC2616] for details.

   The characters

      / ; ?

   are reserved within the path and search parts and must be encoded;
   the first of these may be used unencoded, and is often used within
   the path, to designate hierarchy.
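
To illustrate the encoding rule, the sketch below percent-encodes a path segment that contains all three reserved characters and then splits the resulting URL again, using `quote` and `urlsplit` from Python's standard `urllib.parse`. The host name, path, search part, and fragment are invented for the example.

```python
from urllib.parse import quote, urlsplit

# A single path segment that happens to contain "/", ";" and "?".
segment = "a/b;c?d"

# safe="" makes quote() encode "/" as well, which it would otherwise keep.
encoded = quote(segment, safe="")
print(encoded)

# The unencoded "/" characters below still designate hierarchy.
url = "http://host.example/dir/" + encoded + "?q=1#top"
parts = urlsplit(url)
print(parts.path, parts.query, parts.fragment)
```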