SoloManager

From Eigenvector Documentation Wiki

This page describes the SoloManager program and its usage.

Introduction

The purpose of SoloManager is to start a target program locally and then continuously monitor that program's availability. A target program that is operating normally responds to TCP/IP queries on a specified port. If the target program becomes unresponsive for a specified period of time, SoloManager can terminate and restart it, and/or reboot the host computer entirely. Many aspects of SoloManager can be configured by specifying values in the SoloManager.ini text file.


Description of components

SoloManager .jar file

This contains the SoloManager program, and a sample SoloManager.ini file. It also contains all necessary Java library files.

SoloManager configuration file

Contains configuration details specifying how the SoloManager operates. See the example configuration file listed below.

Target program

This is the program we wish to monitor and keep available at all times. It must expose a TCP port and respond to socket queries on that port.

Wrapper service (optional)

This is an optional component which will start SoloManager whenever the host computer is booted up. It is described in the sections below about starting SoloManager as a Service or Daemon.

Relationships and processing sequence

These components are related as shown in the SoloManager flowchart.

Flowchart.png

Typical process flow

The SoloManager is typically started automatically when the host computer is booted up, usually via the Service and Daemon Wrapper.

Once started, SoloManager reads the values of all configurable parameters from the SoloManager.ini file. The user can edit this file to specify their preferred settings, but it must be located in the same directory as the SoloManager jar file. For example, this is where the user specifies the name of the target executable which SoloManager will start and monitor.

SoloManager then begins its unending loop where it checks the status of the target program. SoloManager creates a socket connection to the target program and sends a query. If the target program is alive it sends a response which must match what SoloManager is expecting.

SoloManager checks that the target program is alive by:

  1. Opening a socket on the target program's port.
  2. Sending the parameter "msgToSocket" to the socket and verifying that the first line returned from the socket equals the parameter "expectedResponse".

If the response is not valid, SoloManager repeats this check up to "fastFailCountLimit" times with a pause of "fastQueryIntervalSeconds" seconds. If the response is valid, the check is complete and its result is success.
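The check described above can be sketched in Python as follows. This is only an illustrative sketch, not SoloManager's actual (Java) implementation: the function names, the 5-second timeout, the newline appended to the query, and the choice to pause only between attempts are all assumptions made for the example.

```python
import socket
import time


def check_once(host, port, msg_to_socket, expected_response, timeout=5.0):
    """Send one query and compare the first line of the reply (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Assumption: the query is sent as one newline-terminated line.
            sock.sendall((msg_to_socket + "\n").encode("utf-8"))
            with sock.makefile("r", encoding="utf-8", errors="replace",
                               newline="\n") as reader:
                first_line = reader.readline().rstrip("\r\n")
        return first_line == expected_response
    except OSError:
        # Refused connections, timeouts and resets all count as a failed check.
        return False


def target_is_alive(host, port, msg_to_socket, expected_response,
                    fast_fail_count_limit, fast_query_interval_seconds):
    """Repeat the check up to fast_fail_count_limit times before reporting failure."""
    for attempt in range(fast_fail_count_limit):
        if check_once(host, port, msg_to_socket, expected_response):
            return True
        if attempt + 1 < fast_fail_count_limit:
            # Assumption: pause only between attempts, not after the last one.
            time.sleep(fast_query_interval_seconds)
    return False
```

A call such as target_is_alive("127.0.0.1", 2211, query, reply, 2, 2) would correspond to the serverIP, serverPort, fastFailCountLimit and fastQueryIntervalSeconds values of the example configuration file, with query and reply standing in for the "msgToSocket" and "expectedResponse" parameters.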

If the target check was successful, the failure counter is reset to zero and the loop repeats after a pause of "slowQueryIntervalSeconds" seconds. If the target check was not successful, the failure counter is incremented. The loop continues until this counter reaches the specified "nResponseRestart" value, whereupon SoloManager issues a command to restart the target program and continues with the loop. If the target program restarts, the next check will be successful and the loop continues normally.

If the restart command does not succeed in restarting the target program then the target checks will continue failing and the failure counter incrementing until it eventually attains the specified "nResponseReboot" counter value. At this point SoloManager issues a command to reboot the host computer and the entire process begins again.
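The escalation logic of the last two paragraphs can be illustrated with a small simulation. The sketch below replays a list of check outcomes instead of running an endless loop. The function name and the "restart"/"reboot" labels are invented for the example, and it assumes that the reboot takes precedence when both thresholds are reached on the same failure. A non-positive threshold suppresses the corresponding action, as noted in the example configuration file.

```python
def replay_checks(check_results, n_response_restart, n_response_reboot):
    """Return the actions taken for a sequence of check outcomes (True = success)."""
    actions = []
    failures = 0
    for ok in check_results:
        if ok:
            failures = 0          # a successful check resets the failure counter
            continue
        failures += 1
        if n_response_reboot > 0 and failures >= n_response_reboot:
            actions.append("reboot")
            break                 # after a reboot the whole process starts over
        if n_response_restart > 0 and failures == n_response_restart:
            actions.append("restart")
    return actions


# With nResponseRestart = 1 and nResponseReboot = 3, one isolated failure
# triggers a restart, and three consecutive failures end in a reboot.
history = [True, False, True, False, False, False]
actions = replay_checks(history, 1, 3)   # ["restart", "restart", "reboot"]
```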

During these operations SoloManager writes status information to a log file and optionally can send e-mail to report events. The log file will be located in the directory specified by "outdir". Its size is limited to the last "logFileMsgCapacity" log messages. E-mailed alerts are optional and are enabled by setting "enableEmailing" = true. In this case e-mail messages will be sent to the specified user whenever:

  1. The SoloManager program starts.
  2. SoloManager is about to issue a restart command for the target program.
  3. SoloManager is about to issue a reboot command to the host computer's operating system.
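The log size limit mentioned above ("logFileMsgCapacity") amounts to keeping only the most recent messages. A minimal sketch of that idea uses a bounded deque. The class name is hypothetical, and how SoloManager actually trims its log file is not described on this page.

```python
from collections import deque


class BoundedLog:
    """Keep only the last `capacity` log messages (illustrative only)."""

    def __init__(self, capacity):
        self._messages = deque(maxlen=capacity)

    def append(self, message):
        # When the deque is full, the oldest message is dropped automatically.
        self._messages.append(message)

    def messages(self):
        return list(self._messages)


log = BoundedLog(capacity=3)
for i in range(5):
    log.append(f"event {i}")
```

After the loop only "event 2", "event 3" and "event 4" remain in the log.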

Dependencies

SoloManager requires the following:

  1. Java version 1.5 or later must be available on the host computer.
  2. SoloManager must be able to write to a log file on the filesystem.
  3. SoloManager must be able to issue a system reboot command (the command can be defined within the configuration file).
  4. The operating system may be any of: Linux, Windows (2000, XP, 2003, 2008, Vista, 7), or Mac.


Configuration File

The configuration file will almost always need to be modified for the individual application and installation settings. An example file is included below, but a few key settings to modify include:

  • executableName Name of the program to run (usually either Solo.exe or Solo_Predictor.exe.)
  • startExecutableCommandPre Full path to the program listed as executableName (unless the program's folder has been added to the system path by the installer.)
  • outdir Specifies the folder which should contain the log files. By default these will be written to the same folder as the configuration file, but another folder may be preferable if the user does not have read/write permissions to that folder.
  • maxTargetRunDurationHours The target will be stopped and restarted every maxTargetRunDurationHours if this is a positive number. It has no effect if it is not a positive number.
  • nResponseRestart and nResponseReboot indicate how many target check failures must occur before the application is restarted and/or the system is rebooted (respectively). If the Target Application fails after starting successfully, the failure will be detected by the next normal check; these checks occur every slowQueryIntervalSeconds seconds. When a target check fails, a restart is invoked after (fastQueryIntervalSeconds+1)*fastFailCountLimit*nResponseRestart seconds. If the restart attempts fail, a system reboot is invoked after (fastQueryIntervalSeconds+1)*fastFailCountLimit*nResponseReboot seconds. Thus the worst-case total elapsed time, in seconds, from the target failing until an action occurs can be roughly calculated by:
ResponseTime = slowQueryIntervalSeconds + (fastQueryIntervalSeconds+1)*fastFailCountLimit*nResponse____
where nResponse____ stands for either nResponseRestart or nResponseReboot.


The settings in the example configuration file represent likely minimum settings. If longer delays before a response are acceptable, increase the fastQueryIntervalSeconds and/or the nResponse____ settings.
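As a worked example of the ResponseTime formula, the following sketch plugs in the values from the example configuration file on this page (slowQueryIntervalSeconds = 6, fastQueryIntervalSeconds = 2, fastFailCountLimit = 2, nResponseRestart = 1, nResponseReboot = 3). The function name is made up for illustration, and the result is only the rough estimate given by the formula.

```python
def worst_case_response_seconds(slow_query_interval_seconds,
                                fast_query_interval_seconds,
                                fast_fail_count_limit,
                                n_response):
    """Rough worst-case delay between a target failure and the response action."""
    return (slow_query_interval_seconds
            + (fast_query_interval_seconds + 1) * fast_fail_count_limit * n_response)


# Restart: 6 + (2 + 1) * 2 * 1 = 12 seconds
restart_seconds = worst_case_response_seconds(6, 2, 2, 1)

# Reboot: 6 + (2 + 1) * 2 * 3 = 24 seconds
reboot_seconds = worst_case_response_seconds(6, 2, 2, 3)
```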

--------------------------------------------------------------------------
------------ start: Example SoloManager.ini configuration file -----------
# default values for the SoloManager
#
# Period to pause when fast and slow polling the executable
fastQueryIntervalSeconds = 2
slowQueryIntervalSeconds = 6
#
# How many times to poll when getting fail result before escalating the response level
fastFailCountLimit = 2
# The initial fastFailCountLimit is usually larger, to allow time for target system startup
startFastFailCountLimit = 15
#
# How many fast cycles should occur with fails before applying response for level 1, 2, etc.
# Note: set to zero or a negative integer to suppress the response action from occurring
#nResponse1
nResponseRestart = 1
# nResponse2
nResponseReboot = 3
#
# maxTargetRunDurationHours. Non-positive value disables this feature.
# Positive value must be greater than 0.05 (hours)
maxTargetRunDurationHours = 0
#
# executable details
executableName = solo_predictor.exe
startExecutableCommandPre = c:\\Progra~1\\EVRI\\Solo_Predictor\\application\\runtime\\win32\\
startExecutableCommandPost =
stopExecutableCommandPre = taskkill /F /IM \"
stopExecutableCommandPost = \"
#
# reboot
rebootCommandPre =
rebootCommandPost =
rebootCommand = shutdown /?
#
# executable socket details
serverIP = 127.0.0.1
serverPort = 2211
#
# log file capacity
logFileMsgCapacity = 6000
#
# Output directory. DO NOT add surrounding quotes
outdir = .
#
# must be true or false, case insensitive:
enableEmailing        = false
#
# mailserver

mailServer            = mail.eigenvector.com

mailServerPort        = 587
mailUsername          = USERNAME@eigenvector.com
mailPassword          = PASSWORD
# Note: mail Addresses cannot include spaces and must be well-formed addresses
mailRecepientAddress  = SOMEONE@gmail.com
# Use something which will be a valid e-mail address:
mailSenderAddress     = monitor@solopredictor.com
#
------------- end: Example SoloManager.ini configuration file ------------
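To show how a file of this shape can be read, here is a minimal Python sketch that collects the "key = value" lines and skips comments and blank lines. It is not SoloManager's own parser. The function name is hypothetical, and the sketch leaves escaped backslashes (as in startExecutableCommandPre) untouched, whereas the real program may interpret them.

```python
def parse_solomanager_ini(text):
    """Collect 'key = value' pairs, ignoring blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue              # not a 'key = value' line, e.g. a ruler line
        settings[key.strip()] = value.strip()
    return settings


sample = """
# executable socket details
serverIP = 127.0.0.1
serverPort = 2211
startExecutableCommandPost =
enableEmailing        = false
"""

settings = parse_solomanager_ini(sample)
server_port = int(settings["serverPort"])
# The file comments say this flag must be true or false, case insensitive.
emailing_enabled = settings["enableEmailing"].lower() == "true"
```

Values come back as strings, so numeric and boolean settings need an explicit conversion, as shown for serverPort and enableEmailing.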

Starting SoloManager Automatically

SoloManager is most useful when run automatically by an operating system. This will start the Target Application in the background. The following describes how to install SoloManager as a service (Windows) or daemon (Linux).

Running SoloManager as a Windows Service

The service folder in the SoloManager main folder contains the tools necessary to run SoloManager as a Windows service. This will automatically start the application without a user logging in. Follow these instructions to install SoloManager as a Windows service:

  1. Copy the application files onto the computer on which the application is to be run.
  2. Configure solomanager.ini as needed for the intended behavior.
  3. Copy solomanager.ini into the "service" folder. This copy of solomanager.ini will be used by the service.
  4. Run the Install_Service.bat file in the service folder to install the service (this batch file must be run by a user with administrative privileges).

Note: different versions of Windows have differing levels of user access control. For example, under Windows XP it is typically sufficient to be logged in as a user with administrative privileges to install the service as described in step 4 above. With Windows Vista and higher, even if you are logged in as an administrator, you do not have administrative privileges by default when launching an application. To run an application in administrative mode, right-click the application icon and select "Run as administrator".

To work around this issue, open a command window as an Administrator: click on Start and search for "cmd", then right-click on "cmd.exe" and select the option "Run as Administrator". From this command window, you will be able to run all of the necessary batch files at administrative level.

Troubleshooting Windows Service Problems

  • Errors and status messages will be reported to the log files stored in the C:/temp folder (if this doesn't exist, the log will be created in the same folder as the wrapper.exe). To move logs to a different location, edit the service/conf/service.conf file. You can also modify the logging behavior in this file (maximum length, number of log backups, etc.)
  • Several of the configuration files expect the folder C:/temp to exist. If it does not, you may receive "Null Pointer Exception" errors from the SoloManager application. Either modify the service.conf and solomanager.ini files or create the folder as needed.
  • An error in the log saying that Java could not be found usually means that the service was unable to locate Java in the standard Solo_Predictor folder. If this occurs, edit the service/conf/service.conf file and locate the "wrapper.java.command" property. The usual value for this property is:
  C:/Progra~1/EVRI/Solo_Predictor/application/sys/java/jre/win32/jre/bin/java
    which is the default Solo_Predictor sub-folder in which the 32-bit version of Java is located. If Solo_Predictor is installed in a location other than the default folder, or you are using the 64-bit version of Solo_Predictor, change this value to reflect the correct location. For 64-bit Solo_Predictor, replace "win32" with "win64".
    An alternative solution is to run the service with the credentials of a specific user that has a full copy of Java installed. To do so, go to the Windows "Services" control panel, locate the EVRI SoloManager service, double-click the service and change the "Log On" properties to a specified user.
  • If you have problems, try running the test script:
  Test_Service
    to see if the server will start when run manually. Errors from this script can be used to adjust the service.conf file.
  • To uninstall the service, run the Uninstall_Service.bat file (as an administrator.)

Running SoloManager as a Unix/Linux Daemon

The daemon_linux folder in the SoloManager main folder contains the tools necessary to run SoloManager as a Linux daemon. This will automatically start the application without a user logging in. Follow these instructions to install SoloManager as a Linux Daemon:

  1. Copy the application files onto the computer on which the application is to be run.
  2. Configure solomanager.ini as needed for the intended behavior.
  3. Copy solomanager.ini into the daemon_linux folder. This copy of solomanager.ini will be used by the daemon.
  4. Run the Install_Daemon script to install the daemon (this script must be run by a user with root privileges).
./Install_Daemon

NOTE: In order to execute this script and have the daemon operate correctly, you may have to manually set the "execute" bit on all files in the top-level daemon_linux folder to "on" using the chmod command inside the daemon_linux folder:

chmod 755 *

Errors and status messages will be reported to the log files stored in the daemon_linux/logs folder. To move logs to a different location, edit the daemon_linux/conf/wrapper.conf file. You can also modify the logging behavior in this file (maximum length, number of log backups, etc.)

To uninstall the daemon, run the Uninstall_Daemon script (as root.)

./Uninstall_Daemon

If you have problems, try running the test script:

./Test_Daemon

to see if the server will start when run manually.