<?xml version="1.0" encoding="UTF-8"?>
<!--
! CCPL HEADER START
!
! This work is licensed under the Creative Commons
! Attribution-NonCommercial-NoDerivs 3.0 Unported License.
! To view a copy of this license, visit
! http://creativecommons.org/licenses/by-nc-nd/3.0/
! or send a letter to Creative Commons, 444 Castro Street,
! Suite 900, Mountain View, California, 94041, USA.
!
! You can also obtain a copy of the license at
! legal/CC-BY-NC-ND.txt.
! See the License for the specific language governing permissions
! and limitations under the License.
!
! If applicable, add the following below this CCPL HEADER, with the fields
! enclosed by brackets "[]" replaced with your own identifying information:
! Portions Copyright [yyyy] [name of copyright owner]
!
! CCPL HEADER END
!
! Copyright 2011-2014 ForgeRock AS
!
-->
<chapter xml:id='chap-cluster'
xmlns='http://docbook.org/ns/docbook'
version='5.0' xml:lang='en'
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:schemaLocation='http://docbook.org/ns/docbook
http://docbook.org/xml/5.0/xsd/docbook.xsd'
xmlns:xlink='http://www.w3.org/1999/xlink'>
<title>Configuring OpenIDM to Work in a Cluster</title>
<indexterm>
<primary>cluster management</primary>
</indexterm>
<indexterm>
<primary>high availability</primary>
</indexterm>
<indexterm>
<primary>failover</primary>
</indexterm>
<para>
To ensure availability of the identity management service, you can deploy
multiple OpenIDM instances in a cluster. In a clustered environment, all
instances point to the same external database. The database itself might or
might not be clustered, depending on your particular availability strategy.
</para>
<para>
In a clustered environment, if an instance becomes unavailable or fails to
check in with the cluster management service, another instance in the cluster
detects this situation. If the unavailable instance was in the process of
executing any pending jobs, the available instance attempts to recover these
jobs.
</para>
<para>
For example, if instance-1 goes down while executing a scheduled task, the
cluster manager will notify the scheduler service that instance-1 is
unavailable. The scheduler service then attempts to clean up any jobs that
instance-1 was executing when it went down.
</para>
<para>
Specific configuration changes are required to configure multiple instances
that point to a single database. This chapter describes those changes.
</para>
<section xml:id="cluster-config">
<title>Configuring an OpenIDM Instance as Part of a Cluster</title>
<para>
Before you configure an instance to work in a cluster, make sure that the
instance is not running. If the instance was previously started, delete the
<filename>felix-cache</filename> folder.
</para>
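<para>
For example, on a UNIX or Linux system, you might stop the instance and
remove the Felix cache as follows. The installation path shown here is
illustrative.
</para>
<screen>$ cd /path/to/openidm
$ ./shutdown.sh
$ rm -rf felix-cache</screen>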
<para>
To configure an individual OpenIDM instance to be part of a clustered
deployment, follow these steps.
</para>
<orderedlist>
<listitem>
<para>
Configure OpenIDM for a MySQL repository, as described in <link
xlink:href="install-guide#chap-repository"
xlink:role="http://docbook.org/xlink/role/olink"><citetitle>Installing a
Repository For Production</citetitle></link> in the
<citetitle>Installation Guide</citetitle>.
</para>
<para>
All OpenIDM instances that form part of a single cluster must be homogeneous
in terms of their repository, that is, all instances must be configured to
use the same type of repository (MySQL, MS SQL, or OracleDB). Note that
OrientDB is currently not supported in production environments.
</para>
<para>You need only import the data definition language script for OpenIDM
into MySQL once, not repeatedly for each OpenIDM instance.
</para>
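<para>
For example, you might import the script into MySQL with a command similar
to the following. The script location can differ between OpenIDM versions,
so check the <citetitle>Installation Guide</citetitle> for the exact path in
your version.
</para>
<screen>$ mysql -u root -p &lt; /path/to/openidm/db/scripts/mysql/openidm.sql</screen>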
</listitem>
<listitem>
<para><xref linkend="cluster-boot-config" /></para>
</listitem>
<listitem>
<para><xref linkend="cluster-config-file" /></para>
</listitem>
<listitem>
<para>
If you are using scheduled tasks, make sure that schedules are persisted to
ensure that they fire only once across the cluster. For more information,
see the section on <link xlink:show="new"
xlink:href="integrators-guide#persistent-schedules"
xlink:role="http://docbook.org/xlink/role/olink"><citetitle>Persisted
Schedules</citetitle></link>.
</para>
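<para>
The following schedule configuration is a minimal sketch of a persisted,
non-concurrent schedule, in a file such as
<filename>conf/schedule-reconcile.json</filename>. The cron expression and
the mapping name are illustrative.
</para>
<programlisting language="javascript">{
    "enabled" : true,
    "persisted" : true,
    "concurrentExecution" : false,
    "type" : "cron",
    "schedule" : "0 0/30 * * * ?",
    "invokeService" : "sync",
    "invokeContext" : {
        "action" : "reconcile",
        "mapping" : "systemLdapAccounts_managedUser"
    }
}</programlisting>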
</listitem>
</orderedlist>
<section xml:id="cluster-boot-config">
<title>Edit the Boot Configuration</title>
<para>
Each participating instance in a cluster must have a unique node or
instance ID, and must be assigned an instance type in the cluster. Specify
these parameters in the <filename>conf/boot/boot.properties</filename> file
of each instance. A combined example follows this list.
</para>
<itemizedlist>
<listitem>
<para>
Specify a unique identifier for the instance. For example:
</para>
<screen>$ grep openidm.node.id /path/to/openidm/conf/boot/boot.properties
openidm.node.id=instance1</screen>
<para>
On subsequent instances, the <literal>openidm.node.id</literal> can be set
to <literal>instance2</literal>, <literal>instance3</literal>, and so
forth. You can choose any value, as long as it is unique within the
cluster.
</para>
<para>
The cluster manager (configured in <filename>cluster.json</filename>) and
the scheduler (configured in <filename>scheduler.json</filename>)
reference the instance ID from the <filename>boot.properties</filename>
file as follows:
</para>
<programlisting>
"instanceId" : "&amp;{openidm.node.id}",
</programlisting>
<para>
The scheduler uses the instance ID to claim and execute pending jobs. If
multiple nodes share the same instance ID, those nodes attempt to execute
the same scheduled jobs, which leads to conflicts.
</para>
<para>
The cluster manager requires unique node IDs so that it can detect when a
node becomes unavailable.
</para>
</listitem>
<listitem>
<para>
Specify the instance type in the cluster.
</para>
<para>
On the primary instance, add the following line to the
<filename>boot.properties</filename> file:
</para>
<programlisting>
openidm.instance.type=clustered-first
</programlisting>
<para>
On subsequent instances, add the following line to the
<filename>boot.properties</filename> file:
</para>
<programlisting>
openidm.instance.type=clustered-additional
</programlisting>
<para>
The instance type is used during the setup process. When the primary node
has been configured, additional nodes are bootstrapped with the security
settings (key store and trust store) of the primary node. After all nodes
have been configured, they are all considered equal in the cluster, that
is, there is no concept of a "master" node.
</para>
<para>
If no instance type is specified, the default value for this property is
<literal>openidm.instance.type=standalone</literal>, which indicates that
the instance will not be part of a cluster.
</para>
</listitem>
</itemizedlist>
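<para>
Putting these settings together, the relevant
<filename>boot.properties</filename> lines for the first node and for an
additional node might look as follows. The node IDs are illustrative.
</para>
<programlisting># First node
openidm.node.id=instance1
openidm.instance.type=clustered-first

# Additional node
openidm.node.id=instance2
openidm.instance.type=clustered-additional</programlisting>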
</section>
<section xml:id="cluster-config-file">
<title>Edit the Cluster Configuration</title>
<para>
The cluster configuration for each instance is defined in the file
<filename>/path/to/openidm/conf/cluster.json</filename>. In most cases, you
should be able to retain the default cluster configuration, which is as
follows:
</para>
<programlisting language="javascript">{
"instanceId" : "&amp;{openidm.node.id}",
"instanceTimeout" : "30000",
"instanceRecoveryTimeout" : "30000",
"instanceCheckInInterval" : "5000",
"instanceCheckInOffset" : "0"
} </programlisting>
<itemizedlist>
<listitem>
<para>
The ID of the instance (<literal>instanceId</literal>) is set in the
<filename>conf/boot/boot.properties</filename> file, as described in the
previous section.
</para>
</listitem>
<listitem>
<para>
<literal>instanceTimeout</literal> specifies the length of time (in
milliseconds) that an instance can be "down" before the instance is
considered to be in recovery mode.
</para>
</listitem>
<listitem>
<para>
<literal>instanceRecoveryTimeout</literal> specifies the length of time (in
milliseconds) that an instance can be in recovery mode before that instance
is considered to be offline.
</para>
</listitem>
<listitem>
<para>
<literal>instanceCheckInInterval</literal> specifies the interval (in
milliseconds) at which this instance checks in with the cluster manager to
indicate that it is still online.
</para>
</listitem>
<listitem>
<para>
<literal>instanceCheckInOffset</literal> specifies an offset (in
milliseconds) for the check-in timing, per instance, when multiple instances
in a cluster are started simultaneously.
</para>
<para>
Specifying a check-in offset prevents all the instances in a cluster from
checking in at the same time, which would place strain on the cluster
manager.
</para>
</listitem>
</itemizedlist>
<para>
If the default cluster configuration is not suitable for your deployment,
edit the <filename>cluster.json</filename> file for each instance.
</para>
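<para>
For example, the following sketch sets a 1000 millisecond check-in offset for
one instance, so that its check-ins are staggered relative to instances that
use the default offset. The values shown are illustrative.
</para>
<programlisting language="javascript">{
    "instanceId" : "&amp;{openidm.node.id}",
    "instanceTimeout" : "30000",
    "instanceRecoveryTimeout" : "30000",
    "instanceCheckInInterval" : "5000",
    "instanceCheckInOffset" : "1000"
}</programlisting>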
</section>
</section>
<section xml:id="clustering-scheduled-tasks">
<title>Managing Scheduled Tasks Across a Cluster</title>
<itemizedlist>
<para>
In a clustered environment, the scheduler service looks for pending jobs and
handles them as follows:
</para>
<listitem>
<para>
Non-persistent (in-memory) jobs will fire on each node in the cluster.
</para>
</listitem>
<listitem>
<para>
Persistent scheduled jobs are picked up and executed by a single node in
the cluster.
</para>
</listitem>
<listitem>
<para>
Jobs that are configured as persistent but <emphasis>not concurrent</emphasis>
fire only once across the cluster. The job does not fire again at the
scheduled time, on the same node or on a different node, until the current
run has completed.
</para>
<para>
For example, a reconciliation operation that runs for longer than the time
between scheduled intervals will not trigger a duplicate job while it is
still running.
</para>
</listitem>
</itemizedlist>
<para>
The order in which nodes in a cluster claim jobs is random. If a node goes
down, the cluster manager automatically fails over any jobs that the node
had claimed. Note that this behavior differs from the cluster behavior in
OpenIDM 2.1.0, in which an unavailable node had to come back up to free a
job that it had already claimed.
</para>
<para>
If a number of changes are made as a result of a LiveSync action, a single
instance will claim the action, and will process all the changes related to
that action.
</para>
<para>
To prevent a specific instance from claiming pending jobs, set
<literal>"executePersistentSchedules"</literal> to
<literal>false</literal> in the scheduler configuration for that instance.
Because all nodes in a cluster read their configuration from a single
repository, you must use token substitution, via the
<filename>boot.properties</filename> file, to define a node-specific
scheduler configuration.
</para>
<para>
If you want certain nodes to participate in processing clustered schedules
(such as LiveSync) and other nodes not to participate, specify this in the
<filename>conf/boot/boot.properties</filename> file of each node. For
example, to prevent a node from participating, add the following line to the
<filename>boot.properties</filename> file of that node:
</para>
<programlisting>
execute.clustered.schedules=false
</programlisting>
<para>
The initial scheduler configuration that is loaded into the repository must
point to the relevant property in <filename>boot.properties</filename>. The
initial <filename>scheduler.json</filename> file would therefore include a
token such as the following:
</para>
<programlisting language="javascript">{
    "executePersistentSchedules" : "&amp;{execute.clustered.schedules}"
}</programlisting>
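<para>
With that token in place, each node defines the property in its own
<filename>boot.properties</filename> file. For example, assuming one node
that processes persistent schedules and one node that does not:
</para>
<programlisting># Node that claims and executes persistent schedules
execute.clustered.schedules=true

# Node that does not claim persistent schedules
execute.clustered.schedules=false</programlisting>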
<para>
To prevent changes to configuration files from overwriting the global
configuration in the repository, you should disable the file-based
configuration view in a clustered deployment. For more information, see
<link xlink:show="new"
xlink:href="integrators-guide#disabling-file-based-config"
xlink:role="http://docbook.org/xlink/role/olink"><citetitle>Disabling the
File-Based Configuration View</citetitle></link>.
</para>
</section>
<section xml:id="cluster-over-REST">
<title>Managing Nodes Over REST</title>
<para>
You can manage clusters and individual nodes over the REST interface, at the
URL <literal>https://localhost:8443/openidm/cluster/</literal>. The following
sample REST commands demonstrate the cluster information that is available
over REST.
</para>
<example>
<title>Displaying the Nodes in the Cluster</title>
<para>
The following REST request displays the nodes configured in the cluster, and
their status.
</para>
<screen>$ <userinput>curl \
--cacert self-signed.crt \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--request GET \
"https://localhost:8443/openidm/cluster" </userinput>
<computeroutput>{
  "results": [
    {
      "shutdown": "",
      "startup": "2013-10-28T11:48:29.026+02:00",
      "instanceId": "openidm-1",
      "state": "running"
    },
    {
      "shutdown": "",
      "startup": "2013-10-28T11:51:31.639+02:00",
      "instanceId": "openidm-2",
      "state": "running"
    }
  ]
}</computeroutput></screen>
</example>
<example>
<title>Checking the State of an Individual Node</title>
<para>
To check the status of a specific node, include its instance ID in the URL,
for example:
</para>
<screen>$ <userinput>curl \
--cacert self-signed.crt \
--header "X-OpenIDM-Username: openidm-admin" \
--header "X-OpenIDM-Password: openidm-admin" \
--request GET \
"https://localhost:8443/openidm/cluster/openidm-1"</userinput>
<computeroutput>{
  "results": {
    "shutdown": "",
    "startup": "2013-10-28T11:48:29.026+02:00",
    "instanceId": "openidm-1",
    "state": "running"
  }
}</computeroutput></screen>
</example>
</section>
</chapter>