chap-understanding-ldap.xml revision b1dce270ec218b8ad86ce6d745d295da038a5c88
<?xml version="1.0" encoding="UTF-8"?>
<!--
! CCPL HEADER START
!
! This work is licensed under the Creative Commons
! Attribution-NonCommercial-NoDerivs 3.0 Unported License.
! To view a copy of this license, visit
! http://creativecommons.org/licenses/by-nc-nd/3.0/
! or send a letter to Creative Commons, 444 Castro Street,
! Suite 900, Mountain View, California, 94041, USA.
!
! You can also obtain a copy of the license at
! trunk/opendj3/legal-notices/CC-BY-NC-ND.txt.
! See the License for the specific language governing permissions
! and limitations under the License.
!
! If applicable, add the following below this CCPL HEADER, with the fields
! enclosed by brackets "[]" replaced with your own identifying information:
! Portions Copyright [yyyy] [name of copyright owner]
!
! CCPL HEADER END
!
! Copyright 2011-2015 ForgeRock AS.
!
-->
<chapter xml:id='chap-understanding-ldap'
xmlns='http://docbook.org/ns/docbook' version='5.0' xml:lang='en'
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:schemaLocation='http://docbook.org/ns/docbook
http://docbook.org/xml/5.0/xsd/docbook.xsd'
xmlns:xlink='http://www.w3.org/1999/xlink'>
<title>Understanding Directory Services</title>
<indexterm>
<primary>Directory services</primary>
<secondary>About</secondary>
</indexterm>
<indexterm>
<primary>LDAP</primary>
<secondary>About</secondary>
</indexterm>
<para>A directory resembles a dictionary or a phone book. If you know a
word, you can look it up its entry in the dictionary to learn its definition
or its pronunciation. If you know a name, you can look it up its entry in the
phone book to find the telephone number and street address associated with the
name. If you are bored, curious, or have lots of time, you can also read
through the dictionary, phone book, or directory, entry after entry.</para>
<para>Where a directory differs from a paper dictionary or phone book is
in how entries are indexed. Dictionaries typically have one index: words
in alphabetical order. Phone books, too: names in alphabetical order.
Directories entries on the other hand are often indexed for multiple
attributes, names, user identifiers, email addresses, telephone numbers.
This means you can look up a directory entry by the name of the user the
entry belongs to, but also by her user identifier, her email address, or
her telephone number, for example.</para>
<para>OpenDJ directory services are based on the Lightweight Directory Access
Protocol (LDAP). Much of this chapter serves therefore as an introduction to
LDAP. OpenDJ directory services also provide RESTful access to directory data,
yet as directory administrator you will find it useful to understand the
underlying model even if most users are accessing the directory over HTTP
rather than LDAP.</para>
<section xml:id="ldap-directory-history">
<title>How Directories &amp; LDAP Evolved</title>
<para>Phone companies have been managing directories for many decades. The
Internet itself has relied on distributed directory services like DNS since
the mid 1980s.</para>
<para>It was not until the late 1980s, however, that experts from what is now
the International Telecommunications Union brought forth the X.500 set of
international standards, including Directory Access Protocol. The X.500
standards specify Open Systems Interconnect (OSI) protocols and
data definitions for general-purpose directory services. The X.500 standards
were designed to meet the needs of systems built according to the X.400
standards, covering electronic mail services.</para>
<para>Lightweight Directory Access Protocol has been around since the early
1990s. LDAP was originally developed as an alternative protocol that would
allow directory access over Internet protocols rather than OSI protocols,
and be lightweight enough for desktop implementations. By the mid 1990s, LDAP
directory servers became generally available and widely used.</para>
<para>Until the late 1990s, LDAP directory servers were designed primarily
with quick lookups and high availability for lookups in mind. LDAP directory
servers replicate data, so when an update is made, that update gets pushed
out to other peer directory servers. Thus if one directory server goes down,
lookups can continue on other servers. Furthermore, if a directory service
needs to support more lookups, the administrator can simply add another
directory server to replicate with its peers.</para>
<para>As organizations rolled out larger and larger directories serving more
and more applications, they discovered that they needed high availability
not only for lookups, but also for updates. Around the year 2000 directories
began to support multi-master replication, that is replication with multiple
read-write servers. Soon thereafter the organizations with the very largest
directories started to need higher update performance as well as
availability.</para>
<para>The OpenDJ code base began in the mid 2000s, when engineers solving the
update performance issue decided the cost of adapting the existing C-based
directory technology for high performance updates would be higher than the
cost of building a next generation, high performance directory using Java
technology.</para>
</section>
<section xml:id="directory-data">
<title>About Data In LDAP Directories</title>
<indexterm>
<primary>LDAP</primary>
<secondary>Data</secondary>
</indexterm>
<para>LDAP directory data is organized into entries, similar to the entries
for words in the dictionary, or for subscriber names in the phone book.
A sample entry follows.</para>
<programlisting language="ldif">
dn: uid=bjensen,ou=People,dc=example,dc=com
uid: bjensen
cn: Babs Jensen
cn: Barbara Jensen
facsimileTelephoneNumber: +1 408 555 1992
gidNumber: 1000
givenName: Barbara
homeDirectory: /home/bjensen
l: Cupertino
mail: bjensen@example.com
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
objectClass: posixAccount
objectClass: top
ou: People
ou: Product Development
roomNumber: 0209
sn: Jensen
telephoneNumber: +1 408 555 1862
uidNumber: 1076
</programlisting>
<para>Barbara Jensen's entry has a number of attributes, such as
<literal>uid: bjensen</literal>,
<literal>telephoneNumber: +1 408 555 1862</literal>, and
<literal>objectClass: posixAccount</literal><footnote><para>The
<literal>objectClass</literal> attribute type indicates which types of
attributes are allowed and optional for the entry. As the entries object
classes can be updated online, and even the definitions of object classes
and attributes are expressed as entries that can be updated online, directory
data is extensible on the fly.</para></footnote>. When you look up her entry
in the directory, you specify one or more attributes and values to match.
The directory server then returns entries with attribute values that match
what you specified.</para>
<para>The attributes you search for are indexed in the directory, so the
directory server can retrieve them more quickly.<footnote><para>Attribute
values do not have to be strings. Some attribute values are pure binary like
certificates and photos.</para></footnote></para>
<para>The entry also has a unique identifier, shown at the top of the entry,
<literal>dn: uid=bjensen,ou=People,dc=example,dc=com</literal>. DN stands
for distinguished name. No two entries in the directory have the same
distinguished name. Yet, DNs are typically composed of case insensitive
attributes.<footnote><para>Sometimes your distinguished names include
characters that you must escape. The following example shows an entry that
includes escaped characters in the DN.</para>
<screen>
$ <userinput>ldapsearch --port 1389 --baseDN dc=example,dc=com "(uid=escape)"</userinput>
<computeroutput>dn: cn=\" # \+ \, \; \&lt; = \&gt; \\ DN Escape Characters,dc=example,dc=com
objectClass: person
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: top
givenName: " # + , ; &lt; = &gt; \
uid: escape
cn: " # + , ; &lt; = &gt; \ DN Escape Characters
sn: DN Escape Characters
mail: escape@example.com</computeroutput>
</screen></footnote></para>
<para>LDAP entries are arranged hierarchically in the directory. The
hierarchical organization resembles a file system on a PC or a web server,
often imagined as an upside-down tree structure, looking similar to a
pyramid. <footnote><para>Hence pyramid icons are associated with directory
servers.</para></footnote> The distinguished name consists of components
separated by commas,
<literal>uid=bjensen,ou=People,dc=example,dc=com</literal>. The names are
little-endian. The components reflect the hierarchy of directory entries.</para>
<mediaobject xml:id="figure-data-organization">
<alt>Directory data hierarchy as seen in OpenDJ Control Panel.</alt>
<imageobject>
<imagedata fileref="images/data-organization.png" format="PNG" />
</imageobject>
<textobject>
<para>You can see the hierarchy of directory data in the left pane of
the Manage Entries browser.</para>
</textobject>
</mediaobject>
<para>Barbara Jensen's entry is located under an entry with DN
<literal>ou=People,dc=example,dc=com</literal>, an organization unit and
parent entry for the people at Example.com. The
<literal>ou=People</literal> entry is located under the entry with DN
<literal>dc=example,dc=com</literal>, the base entry for Example.com.
DC stands for domain component. The directory has other base entries, such
as <literal>cn=config</literal>, under which the configuration is accessible
through LDAP. A directory can serve multiple organizations, too. You might
find <literal>dc=example,dc=com</literal>,
<literal>dc=mycompany,dc=com</literal>, and
<literal>o=myOrganization</literal> in the same LDAP directory.
Therefore when you look up entries, you specify the base DN to look under
in the same way you need to know whether to look in the New York, Paris,
or Tokyo phone book to find a telephone number.<footnote>
<para>The root entry for the directory, technically the entry with DN
<literal>""</literal> (the empty string), is called the root DSE, and
contains information about what the server supports, including the other
base DNs it serves.</para></footnote></para>
<para>
A directory server stores two kinds of attributes in a directory entry:
<firstterm>user attributes</firstterm>
and <firstterm>operational attributes</firstterm>.
User attributes hold the information for users of the directory.
All of the attributes shown in the entry at the outset of this section
are user attributes.
Operational attributes hold information used by the directory itself.
Examples of operational attributes include
<literal>entryUUID</literal>, <literal>modifyTimestamp</literal>,
and <literal>subschemaSubentry</literal>.
When an LDAP search operation finds an entry in the directory,
the directory server returns all the visible user attributes
unless the search request restricts the list of attributes
by specifying those attributes explicitly.
The directory server does not however return any operational attributes
unless the search request specifically asks for them.
Generally speaking, applications should change only user attributes,
and leave updates of operational attributes to the server,
relying on public directory server interfaces to change server behavior.
An exception is access control instruction (<literal>aci</literal>) attributes,
which are operational attributes used to control access to directory data.
</para>
</section>
<section xml:id="ldap-client-server-communication">
<title>About LDAP Client &amp; Server Communication</title>
<para>In some client server communication, like web browsing, a connection is
set up and then torn down for each client request to the server. LDAP has a
different model. In LDAP the client application connects to the server and
authenticates, then requests any number of operations, perhaps processing
results in between requests, and finally disconnects when done.</para>
<itemizedlist xml:id="standard-ldap-operations">
<para>The standard operations are as follows.</para>
<listitem>
<para>Bind (authenticate). The first operation in an LDAP session usually
involves the client binding to the LDAP server, with the server
authenticating the client.<footnote><para>If the client does not bind
explicitly, the server treats the client as an anonymous client. The client
can also bind again on the same connection.</para></footnote> Authentication
identifies the client's identity in LDAP terms, the identity which is later
used by the server to authorize (or not) access to directory data that the
client wants to lookup or change.</para>
</listitem>
<listitem>
<para>Search (lookup). After binding, the client can request that the server
return entries based on an LDAP filter, which is an expression that the
server uses to find entries that match the request, and a base DN under
which to search. For example, to lookup all entries for people with email
address <literal>bjensen@example.com</literal> in data for Example.com,
you would specify a base DN such as
<literal>ou=People,dc=example,dc=com</literal> and the filter
<literal>(mail=bjensen@example.com)</literal>.</para>
</listitem>
<listitem>
<para>Compare. After binding, the client can request that the server
compare an attribute value the client specifies with the value stored
on an entry in the directory.</para>
<para>This operation is not used as commonly as others.</para>
</listitem>
<listitem>
<para>Modify. After binding, the client can request that the server
change one or more attribute values on an entry. Often administrators
do not allow clients to change directory data, so allow appropriate access
for client application if they have the right to update data.</para>
</listitem>
<listitem>
<para>Add. After binding, the client can request to add one or more
new LDAP entries to the server. </para>
</listitem>
<listitem>
<para>Delete. After binding, the client can request that the server
delete one or more entries. To delete and entry with other entries
underneath, first delete the children, then the parent.</para>
</listitem>
<listitem>
<para>Modify DN. After binding, the client can request that the server
change the distinguished name of the entry. In other words, this renames
the entry or moves it to another location. For example, if Barbara
changes her unique identifier from <literal>bjensen</literal> to something
else, her DN would have to change. For another example, if you decide
to consolidate <literal>ou=Customers</literal> and
<literal>ou=Employees</literal> under <literal>ou=People</literal>
instead, all the entries underneath must change distinguished names.
<footnote><para>Renaming entire branches of entries can be a major
operation for the directory, so avoid moving entire branches if you
can.</para></footnote></para>
</listitem>
<listitem>
<para>Unbind. When done making requests, the client can request an
unbind operation to end the LDAP session.</para>
</listitem>
<listitem>
<para>Abandon. When a request seems to be taking too long to complete,
or when a search request returns many more matches than desired, the client
can send an abandon request to the server to drop the operation in
progress.</para>
</listitem>
</itemizedlist>
<para>For practical examples showing how to perform the key operations using
the command line tools delivered with OpenDJ directory server, read
<link xlink:show="new" xlink:href="admin-guide#chap-ldap-operations"
xlink:role="http://docbook.org/xlink/role/olink"><citetitle>Performing LDAP
Operations</citetitle></link>.</para>
</section>
<section xml:id="standard-ldap-controls-extensions">
<title>About LDAP Controls &amp; Extensions</title>
<para>LDAP has standardized two mechanisms for extending what directory
servers can do beyond the basic operations listed above. One mechanism
involves using LDAP controls. The other mechanism involves using LDAP extended
operations.</para>
<indexterm>
<primary>LDAP controls</primary>
<secondary>About</secondary>
</indexterm>
<para>LDAP controls are information added to an LDAP message to further
specify how an LDAP operation should be processed. For example, the
Server Side Sort Request Control modifies a search to request that the
directory server return entries to the client in sorted order. The Subtree
Delete Request Control modifies a delete to request that the server
also remove child entries of the entry targeted for deletion.</para>
<para>One special search operation that OpenDJ supports is Persistent
Search. The client application sets up a Persistent Search to continue
receiving new results whenever changes are made to data that is in the
scope of the search, thus using the search as a form of change notification.
Persistent Searches are intended to remain connected permanently, though
they can be idle for long periods of time.</para>
<para>The directory server can also send response controls in some cases to
indicate that the response contains special information. Examples include
responses for entry change notification, password policy, and paged results.</para>
<para>For the list of supported LDAP controls, see
<link xlink:show="new" xlink:href="reference#appendix-controls"
xlink:role="http://docbook.org/xlink/role/olink"><citetitle>LDAP
Controls</citetitle></link>.</para>
<indexterm>
<primary>LDAP extended operations</primary>
<secondary>About</secondary>
</indexterm>
<para>LDAP extended operations are additional LDAP operations not included
in the original standard list. For example, the Cancel Extended Operation
works like an abandon operation, but finishes with a response from the
server after the cancel is complete. The StartTLS Extended Operation allows
a client to connect to a server on an unsecure port, but then start
Transport Layer Security negotiations to protect communications.</para>
<para>For the list of supported LDAP extended operations, see
<link xlink:show="new" xlink:href="reference#appendix-extended-ops"
xlink:role="http://docbook.org/xlink/role/olink"><citetitle>LDAP Extended
Operations</citetitle></link>.</para>
</section>
<section xml:id="about-indexes">
<title>About Indexes</title>
<para>As mentioned early in this chapter, directories have indexes for
multiple attributes. In fact by default OpenDJ does not let normal users
perform searches that are not indexed, because such searches mean OpenDJ
has to scan the entire directory looking for matches.</para>
<para>As directory administrator, part of your responsibility is making sure
directory data is properly indexed. OpenDJ provides tools for building
and rebuilding indexes, for verifying indexes, and also for evaluating
how well they are working.</para>
<para>For help better understanding and managing indexes, read the chapter
<link xlink:show="new" xlink:href="admin-guide#chap-indexing"
xlink:role="http://docbook.org/xlink/role/olink"><citetitle>Indexing
Attribute Values</citetitle></link>.</para>
</section>
<section xml:id="schema-overview">
<title>About LDAP Schema</title>
<indexterm>
<primary>Schema</primary>
</indexterm>
<para>Some databases are designed to hold huge amounts of data for a
particular application. Although such databases might support multiple
applications, how their data is organized depends a lot on the particular
applications served.</para>
<para>In contrast, directories are designed for shared, centralized services.
Although the first guides to deploying directory services suggested taking
inventory of all the applications that would access the directory, many
directory administrators today do not even know how many applications use
their services. The shared, centralized nature of directory services fosters
interoperability in practice, and has helped directory services be successful
in the long term.</para>
<para>Part of what makes this possible is the shared model of directory user
information, and in particular the LDAP schema. LDAP schema defines what the
directory can contain. This means that directory entries are not arbitrary
data, but instead tightly codified objects whose attributes are completely
predictable from publicly readable definitions. Many schema definitions are
in fact standard, and so are the same not just across a directory service but
across different directory services.</para>
<para>At the same time, unlike some databases, LDAP schema and the data it
defines can be extended on the fly while the service is running. LDAP schema
is also accessible over LDAP. One attribute of every entry is its set of
<literal>objectClass</literal> values. This gives you as administrator great
flexibility in adapting your directory service to store new data without
losing or changing the structure of existing data, and also without ever
stopping your directory service.</para>
<para>For a closer look, see <link xlink:show="new"
xlink:href="admin-guide#chap-schema"
xlink:role="http://docbook.org/xlink/role/olink"><citetitle>Managing
Schema</citetitle></link>.</para>
</section>
<section xml:id="about-access-control">
<title>About Access Control</title>
<indexterm>
<primary>Access control</primary>
</indexterm>
<para>In addition to directory schema, another feature of directory services
that enables sharing is fine-grained access control.</para>
<para>As directory administrator, you can control who has access to what
data when, how, where and under what conditions by using access control
instructions (ACI). You can allow some directory operations and not others.
You can scope access control from the whole directory service down to
individual attributes on directory entries. You can specify when, from what
host or IP address, and what strength of encryption is needed in order to
perform a particular operation.</para>
<para>As ACIs are stored on entries in the directory, you can furthermore
update access controls while the service is running, and even delegate that
control to client applications. OpenDJ combines the strengths of ACIs with
separate administrative privileges to help you secure access to directory
data.</para>
<para>For more, read <link xlink:show="new"
xlink:href="admin-guide#chap-privileges-acis"
xlink:role="http://docbook.org/xlink/role/olink"><citetitle>Configuring
Privileges &amp; Access Control</citetitle></link>.</para>
</section>
<section xml:id="about-replication">
<title>About Replication</title>
<para>Replication in OpenDJ consists of copying each update to the directory
service to multiple directory servers. This brings both redundancy in the
case of network partitions or of crashes, and also scalability for read
operations. Most directory deployments involve multiple servers replicating
together.</para>
<para>When you have replicated servers, all of which are writable, you can
have replication conflicts. What if, for example, there is a network outage
between two replicas, and meanwhile two different values are written to the
same attribute on the same entry on the two replicas? In nearly all cases,
OpenDJ replication can resolve these situations automatically without
involving you, the directory administrator. This makes your directory service
resilient and safe even in the unpredictable real world.</para>
<para>One perhaps counterintuitive aspect of replication is that although you
do add directory <emphasis>read</emphasis> capacity by adding replicas to
your deployment, you do not add directory <emphasis>write</emphasis> capacity
by adding replicas. As each write operation must be replayed everywhere, the
result is that if you have N servers, you have N write operations to
replay.</para>
<para>Another aspect of replication to keep in mind is that it is "loosely
consistent." Loosely consistent means that directory data will eventually
converge to be the same everywhere, but it will not necessarily be the same
everywhere right away. Client applications sometimes get this wrong when they
write to a pool of load-balanced directory servers, immediately read back
what they wrote, and are surprised that it is not the same. If your users
are complaining about this, either make sure their application always gets
sent to the same server, or else ask that they adapt their application to
work in a more realistic manner.</para>
<para>To get started with replication, see <link xlink:show="new"
xlink:href="admin-guide#chap-replication"
xlink:role="http://docbook.org/xlink/role/olink"><citetitle>Managing Data
Replication</citetitle></link>.</para>
</section>
<section xml:id="directory-services-markup-language">
<title>About DSMLv2</title>
<indexterm>
<primary>DSML</primary>
</indexterm>
<para>Directory Services Markup Language (DSML) was developed starting in 1999
and v2.0 became a standard in 2001. DSMLv2 describes directory data and basic
directory operations in XML format, allowing them to be carried in SOAP
messages. DSMLv2 further allows clients to batch multiple operations together
in a single request, to be processed either in sequential order or in
parallel.</para>
<para>OpenDJ provides support for DSMLv2 as a DSML gateway, which is a Servlet
that connects to any standard LDAPv3 directory. DSMLv2 opens basic directory
services to SOAP based web services and service oriented architectures.</para>
<para>To set up DSMLv2 access, see <link xlink:show="new"
xlink:href="admin-guide#setup-dsml"
xlink:role="http://docbook.org/xlink/role/olink"><citetitle>DSML Client
Access</citetitle></link>.</para>
</section>
<section xml:id="rest-and-ldap">
<title>About RESTful Access to Directory Services</title>
<indexterm>
<primary>REST</primary>
</indexterm>
<para>OpenDJ can expose directory data as JSON resources over HTTP to REST
clients, providing easy access to directory data for developers who are not
familiar with LDAP. RESTful access depends on configuration that describes
how the JSON representation maps to LDAP entries.</para>
<para>Although client applications have no need to understand LDAP, OpenDJ's
underlying implementation still uses the LDAP model for its operations. The
mapping adds some overhead. Furthermore, depending on the configuration,
individual JSON resources can require multiple LDAP operations. For example,
an LDAP user entry represents <literal>manager</literal> as a DN (of the
manager's entry). The same manager might be represented in JSON as an object
holding the manager's user ID and full name, in which case OpenDJ must look
up the manager's entry to resolve the mapping for the manager portion of the
JSON resource, in addition to looking up the user's entry. As another example,
suppose a large group is represented in LDAP as a set of 100,000 DNs. If the
JSON resource is configured so that a member is represented by its name, then
listing that resource would involve 100,000 LDAP searches to translate DNs to
names.</para>
<para>A primary distinction between LDAP entries and JSON resources is that
LDAP entries hold sets of attributes and their values, whereas JSON resources
are documents containing arbitrarily nested objects. As LDAP data is governed
by schema, almost no LDAP objects are arbitrary collections of data.
<footnote><para>LDAP has the object class <literal>extensibleObject</literal>,
but its use should be the exception rather than the rule.</para></footnote>
Furthermore, JSON resources can hold arrays, ordered collections that can
contain duplicates, whereas LDAP attributes are sets, unordered collections
without duplicates. For most directory and identity data, these distinctions
do not matter. You are likely to run into them however if you try to turn
your directory into a document store for arbitrary JSON resources.</para>
<para>Despite some extra cost in terms of system resources, exposing directory
data over HTTP can unlock your directory services for a new generation of
applications. The configuration provides flexible mapping, so that you can
configure views that correspond to how client applications need to see
directory data. OpenDJ also gives you a deployment choice for HTTP access.
You can deploy the REST LDAP gateway, which is a Servlet that connects to
any standard LDAPv3 directory, or you can activate the HTTP Connection Handler
on OpenDJ itself to allow direct and more efficient HTTP and HTTPS
access.</para>
<para>For examples showing how to use RESTful access, see the chapter on
<link xlink:show="new" xlink:href="admin-guide#chap-rest-operations"
xlink:role="http://docbook.org/xlink/role/olink"><citetitle>Performing
RESTful Operations</citetitle></link>.</para>
</section>
<section xml:id="about-building-directory-services">
<title>About Building Directory Services</title>
<para>This chapter is meant to serve as an introduction, and so does not
even cover everything in this guide, let alone everything you might want
to know about directory services.</para>
<para>When you have understood enough of the concepts to build the directory
services you want to deploy, you must still build a prototype and test it
before you roll out shared, centralized services for your organization.
Read the chapter on <link xlink:show="new"
xlink:href="admin-guide#chap-tuning"
xlink:role="http://docbook.org/xlink/role/olink"><citetitle>Tuning Servers
For Performance</citetitle></link> for a look at how to meet the service
levels your clients expect.</para>
</section>
</chapter>