A ZooKeeper client establishes a
session with the ZooKeeper service by creating a handle to the service using a language binding.
Once created, the handle starts off in the
CONNECTING state and the client library tries to connect to one of the servers that make up the ZooKeeper service, at which point it switches to the CONNECTED state.
During normal operation the handle will be in one of these two states. If an unrecoverable error occurs, such as session expiration or authentication failure, or if the application explicitly closes the handle, the handle will move to the CLOSED state.
The following figure shows the possible state transitions of a ZooKeeper client:
To create a client session, the application code must provide a string
(connection string) containing a comma-separated list of
host:port pairs, each corresponding to a ZooKeeper server (e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002").
Added in 3.2.0:
A chroot suffix may also be appended to the connection string. This will run the client commands while interpreting all paths relative to this root (similar to the unix chroot command). If used, the example would look like
127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002/app/a, where the client would be rooted at "/app/a" and all paths would be relative to this root. For example, getting or setting
/foo/bar would result in operations being run on
/app/a/foo/bar (from the server's perspective).
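The chroot mapping is purely a client-side path prefix. A minimal sketch of that mapping, using the "/app/a" chroot from the example above (toServerPath is a hypothetical helper, not part of the ZooKeeper client API):

```java
// Sketch of the client-side chroot mapping; toServerPath is a hypothetical
// helper, not part of the ZooKeeper client API.
public class ChrootExample {
    // Map a client-visible path to the server-side path under the chroot.
    static String toServerPath(String chroot, String clientPath) {
        // The client's root "/" maps to the chroot itself; every other
        // path is appended below it.
        return clientPath.equals("/") ? chroot : chroot + clientPath;
    }

    public static void main(String[] args) {
        // With connect string ...:3002/app/a, a client operation on
        // /foo/bar reaches /app/a/foo/bar on the server.
        System.out.println(toServerPath("/app/a", "/foo/bar"));
        System.out.println(toServerPath("/app/a", "/"));
    }
}
```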
This feature is particularly useful
in multi-tenant environments where each user of a particular ZooKeeper service could be rooted differently. This makes re-use much simpler, as each user can code their application as if it were rooted at
/, while the actual location (say
/app/a) could be determined at deployment time.
The ZooKeeper client library will pick an arbitrary server
(note: not necessarily the local server, even if one happens to be running alongside the client) and try to connect to it. If this connection fails, or if the client becomes disconnected from the server for any reason, the client will automatically try the next server in the list, until a connection is (re-)established.
When a client gets a handle to the ZooKeeper service, ZooKeeper creates a ZooKeeper session,
represented as a 64-bit number, that it assigns to the client.
If the client connects to a different ZooKeeper server, it will send the
session id as a part of the connection handshake. As a security measure, the server creates a
password for the
session id that any ZooKeeper server can validate. The password is sent to the client with the session id when the client establishes the session. The client sends this password with the session id whenever it reestablishes the session with a new server.
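The key property described above is that any server can validate the (session id, password) pair without coordinating with the server that issued it. One way to get that property is to derive the password deterministically from the session id using shared server-side material. The sketch below is illustrative only, NOT ZooKeeper's actual scheme; the cluster secret and HMAC construction are assumptions:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: NOT ZooKeeper's actual password scheme. It sketches
// the property the text describes: a password derived from the session id
// with shared server-side material, so any server can validate the pair
// on its own. The cluster secret here is an assumption.
public class SessionPasswordSketch {
    static byte[] passwordFor(long sessionId, byte[] clusterSecret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(clusterSecret, "HmacSHA256"));
        return mac.doFinal(Long.toString(sessionId).getBytes(StandardCharsets.UTF_8));
    }

    static boolean validate(long sessionId, byte[] password, byte[] clusterSecret) throws Exception {
        return Arrays.equals(password, passwordFor(sessionId, clusterSecret));
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = "shared-cluster-secret".getBytes(StandardCharsets.UTF_8);
        long sessionId = 0x1234ABCDL;
        byte[] pw = passwordFor(sessionId, secret);
        // Any server holding the secret accepts the (id, password) pair...
        System.out.println(validate(sessionId, pw, secret));
        // ...and rejects the password when presented with a different id.
        System.out.println(validate(sessionId + 1, pw, secret));
    }
}
```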
One of the parameters to the ZooKeeper client library call to create a ZooKeeper session is
the session timeout in milliseconds. The client sends a requested timeout, the server responds with the timeout that it can give the client. The current implementation requires that the timeout be
a minimum of 2 times the tickTime (as set in the server configuration) and
a maximum of 20 times the tickTime.
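The negotiation can be sketched as clamping the requested timeout to the [2 × tickTime, 20 × tickTime] window. This helper is illustrative (the server applies the bounds); the 2000 ms tickTime in the demo is a common configuration value, not a guarantee:

```java
public class SessionTimeoutSketch {
    // Clamp the requested timeout to the server's allowed window:
    // minimum 2 * tickTime, maximum 20 * tickTime (illustrative helper).
    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        int tick = 2000; // a common tickTime setting
        System.out.println(negotiate(60000, tick)); // capped at 40000
        System.out.println(negotiate(1000, tick));  // raised to 4000
        System.out.println(negotiate(10000, tick)); // within bounds: unchanged
    }
}
```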
When a client (session) becomes partitioned from the ZK serving cluster it will begin searching the list of servers that were specified during session creation. Eventually, when connectivity between the client and at least one of the servers is re-established, the session will either again transition to the
connected state (if reconnected within the session timeout value) or it will transition to the
expired state (if reconnected after the session timeout).
It is not advisable to create a new session object (a new
ZooKeeper.class instance, or zookeeper handle in the C binding) when the client becomes disconnected. The ZK client library will handle reconnection for you. In particular, the client library has heuristics built in to handle things like the
herd effect. Only create a new session when you are notified of session expiration (mandatory).
Session expiration is managed by the ZooKeeper cluster itself, not by the client.
When the ZK client establishes a session with the cluster it provides a
timeout value detailed above. This value is used by the cluster to determine when the client’s session expires.
Expiration happens when the cluster does not hear from the client within the specified session timeout period (i.e. no heartbeat).
At session expiration the cluster will delete any/all ephemeral nodes owned by that session and immediately notify any/all connected clients of the change (anyone watching those znodes).
At this point the client of the expired session is still disconnected from the cluster, and it will not be notified of the session expiration until/unless it is able to re-establish a connection to the cluster. The client will stay in the disconnected state until the TCP connection is re-established with the cluster, at which point the watcher of the expired session will receive the 'session expired' notification.
Example state transitions for an expired session as seen by the expired session’s watcher:
- connected: session is established and client is communicating with cluster (client/server communication is operating properly)
- .... client is partitioned from the cluster
- disconnected: client has lost connectivity with the cluster
- .... time elapses, after 'timeout' period the cluster expires the session, nothing is seen by client as it is disconnected from cluster
- .... time elapses, the client regains network level connectivity with the cluster
- expired: eventually the client reconnects to the cluster, it is then notified of the expiration
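The transitions above can be sketched as a tiny state machine. The enum and method names here are illustrative, not the Java binding's actual Watcher.Event.KeeperState values; the point is that cluster-side expiry is invisible to a partitioned client until it reconnects:

```java
public class ExpiredSessionSketch {
    enum State { CONNECTED, DISCONNECTED, EXPIRED }

    // Illustrative state holder for the expired-session scenario; the real
    // client delivers these transitions to the default watcher.
    static State state = State.CONNECTED;
    static boolean expiredByCluster = false;

    // Network partition: client loses connectivity.
    static void partition() { state = State.DISCONNECTED; }

    // Cluster-side expiry: the disconnected client observes nothing yet.
    static void clusterExpires() { expiredByCluster = true; }

    // Only on reconnect does the client learn the session is gone.
    static void reconnect() {
        state = expiredByCluster ? State.EXPIRED : State.CONNECTED;
    }

    public static void main(String[] args) {
        partition();
        clusterExpires();
        System.out.println(state); // still DISCONNECTED: client unaware
        reconnect();
        System.out.println(state); // EXPIRED: notified after reconnecting
    }
}
```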
Another parameter to the ZooKeeper session establishment call is
the default watcher. Watchers are notified when any state change occurs in the client. For example, if the client loses connectivity to the server the client will be notified, or if the client's session expires, etc.
This watcher should consider the initial state to be
disconnected (i.e. before any state change events are sent to the watcher by the client lib). In the case of a new connection, the first event sent to the watcher is typically the session connection event.
The session is kept alive by requests sent by the client. If the session is idle for a period of time that would timeout the session, the client will send
a PING request to keep the session alive.
This PING request not only allows the ZooKeeper server to know that the client is still active, but it also allows the client to verify that its connection to the ZooKeeper server is still active. The timing of the PING is conservative enough to ensure reasonable time to detect a dead connection and reconnect to a new server.
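The idle-based PING decision can be sketched as a simple threshold check. The one-third fraction below is illustrative only; the real client library picks its own conservative fraction of the negotiated session timeout:

```java
public class PingSketch {
    // Decide whether the client should send a PING to keep the session
    // alive. The one-third threshold is an illustrative assumption, not
    // the client library's actual constant.
    static boolean shouldPing(long idleMs, long sessionTimeoutMs) {
        return idleMs >= sessionTimeoutMs / 3;
    }

    public static void main(String[] args) {
        System.out.println(shouldPing(5000, 30000));  // recently active: no ping
        System.out.println(shouldPing(12000, 30000)); // idle long enough: ping
    }
}
```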
Once a connection to the server is successfully established
(connected), there are basically two cases where the client lib generates
connectionloss (the result code in the C binding, an exception in Java – see the API documentation for binding specific details), when either a synchronous or asynchronous operation is performed and one of the following holds:
- The application calls an operation on a session that is no longer alive/valid
- The ZooKeeper client disconnects from a server when there are pending operations to that server, i.e., there is a pending asynchronous call.
Added in 3.2.0 -- SessionMovedException.
There is an internal exception that is generally not seen by clients called the
SessionMovedException. This exception occurs because a request was received on a connection for a session which has been reestablished on a different server. The normal cause of this error is a client that sends a request to a server, but the network packet gets delayed, so the client times out and connects to a new server. When the delayed packet arrives at the first server, the old server detects that the session has moved, and closes the client connection.
Clients normally do not see this error since they do not read from those old connections. (Old connections are usually closed.) One situation in which this condition can be seen is when two clients try to reestablish the same connection using a saved session id and password. One of the clients will reestablish the connection and the second client will be disconnected (causing the pair to attempt to re-establish its connection/session indefinitely).
Updating the list of servers. We allow a client to update the connection string by providing a new comma-separated list of
host:port pairs, each corresponding to a ZooKeeper server.
The function invokes a probabilistic load-balancing algorithm which may cause the client to disconnect from its current host, with the goal of achieving an expected uniform number of connections per server in the new list. In case the current host to which the client is connected is not in the new list, this call will always cause the connection to be dropped. Otherwise, the decision is based on whether the number of servers has increased or decreased and by how much.
For example, if the previous connection string contained 3 hosts and now the list contains these 3 hosts and 2 more hosts, 40% of clients connected to each of the 3 hosts will move to one of the new hosts in order to balance the load. The algorithm will cause the client to drop its connection to the current host to which it is connected with probability 0.4 and in this case cause the client to connect to one of the 2 new hosts, chosen at random.
Another example: suppose we have 5 hosts and now update the list to remove 2 of the hosts; the clients connected to the 3 remaining hosts will stay connected, whereas all clients connected to the 2 removed hosts will need to move to one of the 3 hosts, chosen at random. If the connection is dropped, the client moves to a special mode where it chooses a new server to connect to using the probabilistic algorithm, and not just round robin.
In the first example, each client decides to disconnect with probability 0.4 but once the decision is made, it will try to connect to a random new server and only if it cannot connect to any of the new servers will it try to connect to the old ones. After finding a server, or trying all servers in the new list and failing to connect, the client moves back to the normal mode of operation where it picks an arbitrary server from the
connectString and attempts to connect to it. If that fails, it will continue trying different random servers in round robin (see above for the algorithm used to initially choose a server).
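The disconnect decision in the two examples above can be sketched as a probability computation. This is an illustrative simplification; the actual client-library algorithm also covers mixed add/remove updates:

```java
public class RebalanceSketch {
    // Probability that a client connected to its current host should drop
    // the connection after the server list is updated (illustrative sketch
    // of the probabilistic load-balancing decision).
    static double disconnectProbability(boolean currentHostInNewList,
                                        int oldCount, int newCount) {
        if (!currentHostInNewList) return 1.0; // host removed: must move
        if (newCount <= oldCount) return 0.0;  // shrink or same: stay put
        // Growth: move just enough clients to even out the expected load.
        return (newCount - oldCount) / (double) newCount;
    }

    public static void main(String[] args) {
        // 3 hosts grow to 5: each connected client moves with p = 2/5 = 0.4,
        // matching the 40% figure in the first example.
        System.out.println(disconnectProbability(true, 3, 5));
        // Current host removed entirely: always reconnect elsewhere.
        System.out.println(disconnectProbability(false, 5, 3));
    }
}
```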