elasticsearch-watchdog

0.1.6 • Public • Published

A watchdog for ElasticSearch: it monitors the status of cluster nodes, restarts them automatically, and keeps the PRIMARY node unique.

In my situation, millions of documents are indexed into ElasticSearch every day, and our cluster has a large number of nodes. We spent a lot of time making it stable and reliable, but unfortunately the nodes still crash every few months due to:

  • Status changes to red or grey.
  • Several nodes each acting as primary instead of a single unique one (like autocephaly).
  • Unresponsive nodes (HTTP timeouts, failed handshakes, and so on).
  • Other issues.

What Can Watchdog Do

  • Monitor the status, health, and state of an ElasticSearch cluster and its nodes.
  • Automatically restart ElasticSearch over OpenSSH.
  • Check the Watchdog statuses at a glance from anywhere, especially on a mobile device.
  • Make every day feel like Sunday.

Installation

$ npm install elasticsearch-watchdog -g

Usage

watchdog

 
  Usage: watchdog [cmd] [file|name]
 
  Commands:
 
    pwd <password>            encrypt the password
    encrypt [options] <file>  encrypt the configuration file and save it to disk
    tmpl <name>               render a configuration template
    start [options] <file>    start watching on an ElasticSearch cluster
    stop <uid>                stop watching by `uid`, all the watchdogs will be killed if `uid` is `all`
    restart <uid>             restart watching by `uid`, call the watchdogs back and then send them out for watching again if `uid` is `all`
    ls [options]              list all the watchdogs we have
    web [port]                launch a web GUI, port default by 8088
 
  Options:
 
    -h, --help     output usage information
    -v, --version  output the version number
    -r, --root     the root location, you can find all logs here.
 
  Basic Examples:
 
    Start a watchdog, by file:
    $ watchdog start watchdog.yml
 
    Restart the alive watchdog, by uid:
    $ watchdog restart 1001
 
    Restart all watchdogs:
    $ watchdog restart all
 
    Stop the watchdog, by uid:
    $ watchdog stop 1001
 
    Stop all the watchdogs:
    $ watchdog stop all

encrypt

Usage: encrypt [options] <file>
 
  Options:
 
    -h, --help  output usage information
    --no-blank  remove the blank line if this option is provided

tmpl

$ watchdog tmpl <file>

<file> is the name of the configuration file. The .yml extension is optional, e.g. $ watchdog tmpl es-server and $ watchdog tmpl es-server.yml both work.

start

 Usage: start [options] <file>
 
  Options:
 
    -h, --help          output usage information
    --no-daemon         running watchdog as a service, otherwise in the terminal
    -m, --max <number>  maximize retry count when dog has died

stop

$ watchdog stop <uid>

All the watchdogs are killed if uid is all. See the Printf section for more information about uid.

restart

$ watchdog restart <uid>

All the watchdogs are called back and then sent out to watch again if uid is all. See the Printf section for more information about uid.

ls

  Usage: ls [options]
 
  Options:
 
    -h, --help   output usage information
    --no-format  print list as JSON without formatting
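
Because --no-format prints the list as JSON, the output can be consumed from a script. The following is only a minimal sketch: it assumes `watchdog` is on the PATH and that the command writes a single JSON document to stdout. `parseWatchdogList` and `listWatchdogs` are hypothetical helper names, and since the shape of the data is not documented here, the code only parses it.

```javascript
const { execFile } = require('child_process');

// Parse the raw stdout of `watchdog ls --no-format`
// (assumed to be a single JSON document).
function parseWatchdogList(stdout) {
  return JSON.parse(stdout.trim());
}

// Run the CLI and pass the parsed list to a Node-style callback.
function listWatchdogs(callback) {
  execFile('watchdog', ['ls', '--no-format'], function (err, stdout) {
    if (err) return callback(err);
    try {
      callback(null, parseWatchdogList(stdout));
    } catch (parseErr) {
      callback(parseErr);
    }
  });
}

// Usage:
// listWatchdogs(function (err, list) {
//   if (err) throw err;
//   console.log(list);
// });
```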

web

# simple 
$ watchdog web [port]
 
# daemonic 
# start 
$ nohup watchdog web > /dev/null 2>&1 & echo $! > /path/to/watchdog.pid
# stop 
$ kill -9 `cat /path/to/watchdog.pid`

The port of the web interface is optional (8088 by default). For the best viewport on a mobile device, use landscape mode rather than portrait.

GUI:

image

A RESTful interface is also provided, e.g. http://[domain|ip]:[port]/json.
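
The JSON endpoint can be polled from Node with the built-in http module. This is a sketch under assumptions: `statusUrl` and `fetchStatus` are hypothetical helper names, and the structure of the response is not described in this document, so the code only parses it and hands it back.

```javascript
const http = require('http');

// Build the URL of the JSON endpoint; 8088 is the documented default port.
function statusUrl(host, port) {
  return 'http://' + host + ':' + (port || 8088) + '/json';
}

// Fetch and parse the endpoint. The response shape is not documented here,
// so the parsed value is passed to the callback unchanged.
function fetchStatus(host, port, callback) {
  http.get(statusUrl(host, port), function (res) {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      try {
        callback(null, JSON.parse(body));
      } catch (err) {
        callback(err);
      }
    });
  }).on('error', callback);
}

// Usage:
// fetchStatus('localhost', 8088, function (err, status) {
//   if (err) throw err;
//   console.log(status);
// });
```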

Printf

Taking $ watchdog ls as an example, the output is formatted as follows.

image

  • name

    CLUSTER-SERVER and PERCOLATOR-SERVER are the names of the Watchdogs.

  • uid

    7707 and 6384 are the uids of the Watchdogs. Run $ watchdog stop 7707 or $ watchdog restart 7707 to stop or restart one.

  • colors

    red, yellow, grey and green are the statuses of ElasticSearch.

  • symbols

    One symbol marks the primary node; the other marks leaves (non-master nodes).

  • dim style

    • UNKNOWN [missing status] / 192.168.100.112 [unknown]

      The primary node is unknown, and the status cannot be retrieved through the _cluster/health or _cluster/state API.

    • 192.168.100.166 [error]

      The server cannot be reached over OpenSSH; check the logs (~/.watchdog/logs/).
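
To illustrate how a status can end up as unknown, here is a minimal sketch that reduces a raw _cluster/health response body to a color. ElasticSearch reports green, yellow, or red in the `status` field of that response. How Watchdog itself derives grey or dimmed entries is not stated in this document, so the sketch simply falls back to 'unknown'; `healthColor` is a hypothetical name and not part of Watchdog.

```javascript
// Map a raw _cluster/health response body to a status color.
// Anything unparseable, or without a recognized `status` field,
// is reported as 'unknown'.
function healthColor(body) {
  const known = ['green', 'yellow', 'red'];
  try {
    const status = JSON.parse(body).status;
    return known.indexOf(status) !== -1 ? status : 'unknown';
  } catch (err) {
    return 'unknown';
  }
}

// Usage:
// healthColor('{"cluster_name":"es","status":"yellow"}');
// healthColor('<html>502 Bad Gateway</html>');
```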

Programmatic

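// Assumes the package is installed locally; it is published on npm as
// `elasticsearch-watchdog`, so that is the name to require.
var Watchdog = require('elasticsearch-watchdog');

// Load the configuration.
var monit = Watchdog({
  conf: '/path/to/conf.yml',
  uid: false
});

// Listen for events.
monit.on('info', function(msg){
  console.log('[INFO]', msg.type, msg.message);
});

// Start watching.
monit.watching();

// Stop watching.
// monit.end();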

Configuration

Run $ watchdog tmpl my-es to render a copy of the configuration template, then edit it to fit your own requirements. Almost all YAML syntax is supported.

To let Watchdog restart ElasticSearch smoothly, stop any running ElasticSearch process and start it as a daemon with a pid file:

$ elasticsearch -d -p /path/to/es.pid [options]
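
The pid file written by -p makes it possible to tell whether the ElasticSearch process is still alive. The sketch below shows one common way to do that in Node; it is an illustration only, not Watchdog's own implementation, and `isRunning` is a hypothetical name.

```javascript
const fs = require('fs');

// Read a pid file and report whether that process currently exists.
function isRunning(pidFile) {
  let pid;
  try {
    pid = parseInt(fs.readFileSync(pidFile, 'utf8').trim(), 10);
  } catch (err) {
    return false; // no readable pid file at this path
  }
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0); // signal 0 only tests for existence, nothing is sent
    return true;
  } catch (err) {
    return err.code === 'EPERM'; // process exists but belongs to another user
  }
}

// Usage:
// console.log(isRunning('/path/to/es.pid'));
```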

Local environment

If Watchdog and ElasticSearch run on the same server, find the IP address by visiting:

http://localhost:9200/_cluster/state

The transport_address of the current server is the address ElasticSearch is bound to. There is no need to provide nodes.ssh.password in the configuration for that node.
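
As a rough illustration, the transport_address of each node can be pulled out of a parsed _cluster/state response like this. The sample object is made up, and the exact string format of transport_address varies between ElasticSearch versions; `transportAddresses` is a hypothetical helper name.

```javascript
// Collect id, name, and transport_address for every node
// in a parsed _cluster/state response.
function transportAddresses(state) {
  const nodes = (state && state.nodes) || {};
  return Object.keys(nodes).map(function (id) {
    return {
      id: id,
      name: nodes[id].name,
      address: nodes[id].transport_address
    };
  });
}

// Made-up sample in the general shape of a _cluster/state response.
const sample = {
  nodes: {
    'node-id-1': { name: 'es-1', transport_address: '192.168.100.112:9300' }
  }
};

console.log(transportAddresses(sample));
```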

Examples

See the example and test directories.

Test

$ npm test

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Collaborators

  • tjatse