Warm failover addresses the issue of local availability of data in the face of a system hardware or software failure. It consists of two sets of features: Server failover features, and ObjectStore APIs that are useful to writers of ObjectStore client applications.
Asynchronous replication addresses the issue of wide-area availability by providing a method for replicating data to another system, potentially at another geographic location.
Failover provides nonstop service of client applications so that a single Server failure does not affect a running application. It does this by shifting processing to a secondary Server system in the event of a failure of the primary system.
A failover system is made up of two ObjectStore Servers, running on different machines, that share a disk. If the primary ObjectStore Server fails, the client process continues operation by connecting to the secondary Server. Service is uninterrupted and you can continue to access and modify the database.
The primary Server process accepts connections from an ObjectStore client. The secondary Server does not accept connections from clients, but waits offline, ready to take over should the primary Server fail.
Warm failover alone cannot guarantee 100% fault tolerance as the shared disk can crash, the network can fail, and so on.
Each of the ObjectStore Servers that implements a failover Server must have a rawfs and log file on a disk that is shared by the two machines upon which the Servers are running. They must also be running on the same software architecture.
The ObjectStore client run time, through the use of the new APIs, locates the on-line Server of the failover Server using a locator file that is local to the client. The locator file declares configuration information for a failover Server.
You must set the Failover Heartbeat Time parameter in the Server parameter file and restart the Server with the -upgradeRAWFS switch set. To reconfigure without failover, remove the parameter from the Server parameter file and restart the Server with the -upgradeRAWFS switch. The next paragraph describes this parameter.
Partition0: partition /dev/rdsk/c0t0d0s3
Failover Heartbeat Time: 2
The ObjectStore client application should always use the logical Server host name when manipulating a database on a failover Server. This is because, when an application creates a database on the failover Server and the secondary Server is on line, the pathname of the database includes the logical Server host name, not the name of the secondary Server that created the database. If the application uses the nonlogical Server name in a pathname, the exception err_not_supported is raised.
When using locator files to implement failover Servers, make sure that the locator file is stored on a file system that is local to the client application, so that NFS is not involved in a failover situation.
FAILOVER_SERVER server_host_1
ALTERNATIVE_SERVER server_host_2
RECONNECT_TIMEOUT integer # in seconds
RECONNECT_RETRY_INTERVAL integer # in seconds
END

server_host_1 is also the logical Server host name that is recorded in all database pathnames to databases of this failover Server. server_host_2 specifies the backup Server of a failover Server pair. Both values are required.
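As an illustration of the entry syntax above, the following is a minimal C++ sketch of reading one FAILOVER_SERVER ... END entry into a structure. It is not ObjectStore code: FailoverEntry, parse_failover_entry, and the host names hostA and hostB are hypothetical, and the sketch assumes one keyword per line, # comments that run to the end of the line, and a placeholder default of 0 for the two optional integers.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical in-memory form of one FAILOVER_SERVER ... END entry.
struct FailoverEntry {
    std::string logical_host;       // server_host_1
    std::string alternative_host;   // server_host_2
    std::uint32_t reconnect_timeout_secs = 0;         // placeholder default (assumption)
    std::uint32_t reconnect_retry_interval_secs = 0;  // placeholder default (assumption)
};

// Hypothetical parser. Returns false if a keyword is unknown, a value is
// missing, END is absent, or either host name is missing.
bool parse_failover_entry(const std::string& text, FailoverEntry& out) {
    FailoverEntry entry;
    bool saw_end = false;
    std::istringstream lines(text);
    std::string line;
    while (!saw_end && std::getline(lines, line)) {
        // Drop everything from '#' to the end of the line.
        std::istringstream words(line.substr(0, line.find('#')));
        std::string keyword;
        if (!(words >> keyword)) continue;  // blank or comment-only line
        bool ok = true;
        if (keyword == "FAILOVER_SERVER")
            ok = static_cast<bool>(words >> entry.logical_host);
        else if (keyword == "ALTERNATIVE_SERVER")
            ok = static_cast<bool>(words >> entry.alternative_host);
        else if (keyword == "RECONNECT_TIMEOUT")
            ok = static_cast<bool>(words >> entry.reconnect_timeout_secs);
        else if (keyword == "RECONNECT_RETRY_INTERVAL")
            ok = static_cast<bool>(words >> entry.reconnect_retry_interval_secs);
        else if (keyword == "END")
            saw_end = true;
        else
            ok = false;
        if (!ok) return false;
    }
    if (!saw_end || entry.logical_host.empty() || entry.alternative_host.empty())
        return false;
    out = entry;
    return true;
}

// Usage example with made-up host names.
void demo_parse_failover_entry() {
    FailoverEntry e;
    bool ok = parse_failover_entry(
        "FAILOVER_SERVER hostA\n"
        "ALTERNATIVE_SERVER hostB\n"
        "RECONNECT_TIMEOUT 60 # in seconds\n"
        "RECONNECT_RETRY_INTERVAL 5 # in seconds\n"
        "END\n", e);
    assert(ok && e.logical_host == "hostA" && e.alternative_host == "hostB");
}
```

The sketch rejects an entry that lacks either host name, which mirrors the statement that both values are required.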
RECONNECT_TIMEOUT is the maximum amount of time that a client will attempt to reconnect to a failover Server before raising the exception err_broken_failover_server_connection.
RECONNECT_RETRY_INTERVAL specifies how often the two failover Servers are pinged during the RECONNECT_TIMEOUT period. RECONNECT_RETRY_INTERVAL should be less than or equal to RECONNECT_TIMEOUT.
Also, RECONNECT_RETRY_INTERVAL cannot be zero if RECONNECT_TIMEOUT is nonzero. The exception err_locator_syntax is raised if these constraints are violated.
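The two constraints on the reconnect values can be expressed as a small predicate. The following is a sketch only; reconnect_settings_valid is a hypothetical helper, not an ObjectStore function, and it simply restates the rules given above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of ObjectStore): true when the values satisfy
// the locator-file constraints described in the text.
bool reconnect_settings_valid(std::uint32_t timeout_secs,
                              std::uint32_t interval_secs) {
    // The retry interval must not exceed the total timeout.
    if (interval_secs > timeout_secs) return false;
    // A nonzero timeout requires a nonzero retry interval.
    if (timeout_secs != 0 && interval_secs == 0) return false;
    return true;
}

// Usage example.
void demo_reconnect_settings_valid() {
    assert(reconnect_settings_valid(60, 5));    // interval within timeout
    assert(!reconnect_settings_valid(5, 60));   // interval larger than timeout
    assert(!reconnect_settings_valid(60, 0));   // zero interval, nonzero timeout
}
```

A client-side tool could run a check like this before installing a locator file, so that err_locator_syntax is not first discovered at run time.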
The locator file is read the first time it is needed by the ObjectStore run time. The locator file is not read again by the application unless the method objectstore::set_locator_file() is called. Then, the locator file is reread the next time the ObjectStore run time uses its contents.
This means that client programs written using lexical transactions do not need to be modified to take advantage of the failover feature.
Use of these interfaces is not required. They include the os_failover_server class and functions in other classes, all described here.
static char* get_locator_file();
Returns a string representing the locator file. If the first character of the string is a white-space character or #, the string is the contents of the file rather than a file name.
The caller should delete the returned value.
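The rule for interpreting the returned string can be sketched as follows. locator_is_inline_contents is a hypothetical helper, not an ObjectStore function, and the file name in the usage example is made up. Treating a null or empty string as "not inline contents" is an assumption of this sketch; the text above does not say how those cases are reported.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helper: true when the locator string holds the file's
// contents, false when it should be read as a file name.
bool locator_is_inline_contents(const char* locator) {
    // Assumption: a null or empty string is not treated as inline contents.
    if (locator == nullptr || locator[0] == '\0') return false;
    const unsigned char first = static_cast<unsigned char>(locator[0]);
    // Leading white space or '#' marks inline contents.
    return first == '#' || std::isspace(first) != 0;
}

// Usage example.
void demo_locator_is_inline_contents() {
    assert(locator_is_inline_contents("# inline locator text\n"));
    assert(!locator_is_inline_contents("/usr/local/locator"));  // made-up file name
}
```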
char* get_host_name();
For failover Servers, this function returns the logical failover Server host name. Note that the logical Server name is not always identical to the name of the Server on the machine that is providing access to the database. The caller should delete the returned value. See os_failover_server::get_online_server_hostname().
os_boolean is_failover() const;
Returns true if and only if the Server is also a failover Server.
This method is used to identify the os_failover_server in the list of Servers returned by objectstore::get_all_servers().
class os_failover_server : public os_server
This class is derived from os_server.
The types os_int32 and os_boolean, used throughout this manual, are each defined as a signed 32-bit integer type. The type os_unsigned_int32 is defined as an unsigned 32-bit integer type.
Programs using this class must include <ostore/ostore.hh>, followed by <ostore/coll.hh> (if ObjectStore collections are used).
char* get_logical_server_hostname() const;
Returns the logical name of a failover Server. A failover Server should always be referred to by its logical Server name.
The caller should delete the returned value.
char* get_online_server_hostname() const;
Returns the host name of the Server that the client is currently connected to: the logical Server, the alternative Server, or the empty string if there is no connection.
The caller should delete the returned value.
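The three possible results of get_online_server_hostname() can be told apart by comparing against the logical host name. The following sketch uses plain C strings in place of the values returned by the ObjectStore functions. FailoverConnection and classify_connection are hypothetical names, and the exact string comparison (no case folding or domain-name normalization) is an assumption.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical classification of a failover Server connection.
enum class FailoverConnection { Disconnected, LogicalServer, AlternativeServer };

// logical_host: as returned by get_logical_server_hostname().
// online_host:  as returned by get_online_server_hostname().
FailoverConnection classify_connection(const char* logical_host,
                                       const char* online_host) {
    assert(logical_host != nullptr && online_host != nullptr);
    // The empty string means there is no connection.
    if (online_host[0] == '\0') return FailoverConnection::Disconnected;
    // Assumption: host names are compared exactly as given.
    if (std::strcmp(logical_host, online_host) == 0)
        return FailoverConnection::LogicalServer;
    return FailoverConnection::AlternativeServer;
}
```

In a real client, both strings would still have to be deleted by the caller after the comparison, as noted above.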
os_unsigned_int32 get_reconnect_retry_interval() const;
Returns the frequency with which to ping both Servers composing a failover Server pair while attempting to reconnect to them.
os_unsigned_int32 get_reconnect_timeout() const;
Returns the maximum amount of time that a client application will attempt to reconnect to a broken failover Server connection.
After this amount of time passes, the following exception is raised:
err_broken_failover_server_connection
os_boolean set_reconnect_timeout_and_interval(
    os_unsigned_int32 total_timeout_secs,
    os_unsigned_int32 interval_secs);
Sets the total amount of time to try to reconnect a broken connection to a failover Server. The interval_secs argument is used to control how frequently the Servers of a failover Server pair are pinged to see if they are available.
Returns true if the reconnect timeout has been reset with the specified parameters.
If the parameters are invalid, the function returns the value false and does not change the reconnect_timeout or reconnect_retry_interval.
Invalid parameters are those for which
err_failover_server_refused_connection
Raised when the initial connection to a failover Server pair cannot be made.

err_broken_failover_server_connection
Raised when neither the logical nor the alternative Server comes back up within the maximum amount of time that a client process should wait for either Server to come up on its own.

err_server_restarted
Raised when a failover Server connection is discovered to be lost, and then either the logical or the alternative Server comes back up before RECONNECT_TIMEOUT expires. Lexical transactions are restarted when the exception err_server_restarted is raised within them.

err_not_supported
Raised when the alternative Server name is used directly to reference a database. An ObjectStore application should only reference the logical Server name of a failover Server pair.
Also raised when os_dbutil::ping_failover_server() is called and the locator file does not declare hostname as a failover Server.

err_conflicting_failover_configuration
Raised by os_dbutil::ping_failover_server() if both Servers composing the failover Server pair are alive or if a server stat of the on-line Server indicates that it is not a failover Server.
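The exceptions above suggest a client-side pattern for work that is not already restarted automatically: run the work again after a restart, and stop when the connection is reported broken. The following is a sketch of that control flow only. It uses ordinary C++ exceptions as stand-ins; server_restarted, broken_failover_connection, Outcome, and run_with_restart are hypothetical names, and the sketch does not show how ObjectStore itself signals or handles err_server_restarted or err_broken_failover_server_connection.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Stand-ins (assumptions) for err_server_restarted and
// err_broken_failover_server_connection.
struct server_restarted : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct broken_failover_connection : std::runtime_error {
    using std::runtime_error::runtime_error;
};

enum class Outcome { Completed, GaveUp };

// Runs `work`, rerunning it after a Server restart, at most max_restarts times.
Outcome run_with_restart(const std::function<void()>& work, int max_restarts) {
    assert(max_restarts >= 0);
    for (int attempt = 0; attempt <= max_restarts; ++attempt) {
        try {
            work();
            return Outcome::Completed;
        } catch (const server_restarted&) {
            // One Server of the pair came back in time: run the work again.
            continue;
        } catch (const broken_failover_connection&) {
            // Neither Server came back within the timeout: stop retrying.
            return Outcome::GaveUp;
        }
    }
    return Outcome::GaveUp;  // restart budget exhausted
}
```

The restart budget is a design choice of the sketch; it keeps a client from looping indefinitely if the Servers restart repeatedly.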
ObjectStore Release 4 and later databases and rawfs directories can be replicated, as well as all ObjectStore Release 4 file databases. Native file system directories cannot be replicated.
See osreplic: Replicating Databases for further information.
Updated: 03/26/98 20:44:34