Monitoring With Hubot

Time to start turning Hubot into a whistleblower who is always first to tell you when something goes wrong. We will look at several scenarios that could prove useful in your environment.

Hubot PubSub

After using a simple HTTP notification script for many months, I realized that it’s not convenient to route Hubot messages by providing a room along with each request. You may want to split the stream into more rooms, and eventually routing gets harder to handle. That’s where Hubot PubSub comes in handy.

Hubot PubSub uses the publish-subscribe concept: you ask Hubot to subscribe rooms to various event types, and you publish events which then get soft-routed to interested parties. It decouples message publishing from subscriptions, so systems that publish events don’t have to know which rooms are subscribed to them.

Take a look at https://github.com/spajus/hubot-pubsub - it has an animated GIF showing event routing in action.

Alternative To Hubot PubSub

Hubot PubSub will be used often in the scripts throughout this book. If you don’t need the flexibility it offers and prefer to have fewer dependencies, simply use robot.messageRoom room, message instead of robot.emit 'pubsub:publish', event, message.

Installing Hubot PubSub

To install Hubot PubSub, go to your Hubot directory and install the package:

hubot@focus:~/campfire$ npm install --save hubot-pubsub

Then add "hubot-pubsub" to external-scripts.json, so it looks like this:

["...", "hubot-pubsub"]

You need to restart Hubot afterwards.

Subscribing To Event Notifications

Let’s see if our room has any subscriptions, and subscribe it to get error events:

Tomas V.  hubot subscriptions
Hubot     Total subscriptions for 585164: 0
Tomas V.  hubot subscribe errors
Hubot     Subscribed 585164 to errors events
Tomas V.  hubot subscriptions
Hubot     errors -> 585164
          Total subscriptions for 585164: 1

You can subscribe as many rooms as you want to receive any set of events you like.

Publishing Events

There are two ways to publish an event. The simple way, mostly used for testing purposes or announcements, is asking Hubot to do it:

Tomas V.  hubot publish news network will be down for 5 minutes - upgrading hardware
Hubot     news: network will be down for 5 minutes - upgrading hardware
          Notified 2 rooms about news

When you want to publish events from shell scripts or remote systems, use HTTP requests. This is how it’s done with curl:

hubot@focus:~/campfire$ curl "http://localhost:8080/publish?event=errors&data=boom"

Then in chatrooms subscribed to errors, it will appear like this:

Hubot     errors: boom

You may want to use POST requests rather than GET:

hubot@focus:~/campfire$ curl -X POST \
  -d 'event=errors' \
  -d 'data=Stack trace in your face' \
  "http://localhost:8080/publish"

Using Event Namespaces For Advanced Message Routing

Hubot PubSub treats . as a namespace separator and automatically notifies subscribers about sub-events. If you have subscribed to errors, you will also receive errors.db and errors.app. When the volume of the errors stream gets too big to handle, you can divide and conquer it: subscribe one room to errors.db, and another one to errors.app. Plan your event names ahead and you will never need to change anything in your applications and monitoring scripts.

A quick demo of what splitting up an event stream looks like:

Room: Errors

Tomas V.  hubot subscribe errors
Hubot     Subscribed 585163 to errors events
Hubot     errors.app: lost connection to redis
Hubot     errors.app.500: error 500 in /users/23/update: Transaction timeout
Hubot     errors.db: deadlock detected in users table
Hubot     errors.app.401: error 401 in /admin: bad login attempt from ip 10.10.0.48
Hubot     errors.db: db2.infra.net: slave 10 seconds behind master

When you decide to create “Errors: App” and “Errors: DB” rooms:

Room: Errors

Tomas V.  hubot unsubscribe errors
Hubot     Unsubscribed 585163 from errors events

Room: Errors: App

Tomas V.  hubot subscribe errors.app
Hubot     Subscribed 585164 to errors.app events
Hubot     errors.app.404: error 404 in /users/foobar
Hubot     errors.app.401: error 401 in /users/login: bad login attempt from ip 45.47.184.12
Hubot     errors.app.response: average response time > 0.5 sec

Room: Errors: DB

Tomas V.  hubot subscribe errors.db
Hubot     Subscribed 585165 to errors.db events
Hubot     errors.db: db2.infra.net: slave 15 seconds behind master
Hubot     errors.db: query running over 60 seconds: "select * from users where status in (1,3,55)"

Later you can split the errors.app stream into several parts, each covering a different error code, and perhaps skip subscribing to errors.app.404 since it’s very noisy.
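The namespace matching described in this section can be sketched in a few lines of Ruby. This is only an illustration of the idea, not the code hubot-pubsub actually runs: the EventRouter class and its method names are hypothetical, and the room IDs are the ones from the demo above.

```ruby
# Hypothetical sketch of namespace-aware routing; not hubot-pubsub's actual code.
class EventRouter
  def initialize
    # event name => list of subscribed room ids
    @subscriptions = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(room, event)
    @subscriptions[event] << room unless @subscriptions[event].include?(room)
  end

  def unsubscribe(room, event)
    @subscriptions[event].delete(room)
  end

  # Rooms to notify about `event`: subscribers of the event itself and of
  # every parent namespace ("errors.app.500" -> "errors.app" -> "errors").
  def rooms_for(event)
    parts = event.split('.')
    namespaces = (1..parts.size).map { |n| parts.first(n).join('.') }
    namespaces.flat_map { |name| @subscriptions.fetch(name, []) }.uniq
  end
end

router = EventRouter.new
router.subscribe('585164', 'errors.app')
router.subscribe('585165', 'errors.db')

p router.rooms_for('errors.app.500') # only the "Errors: App" room
p router.rooms_for('errors.db')      # only the "Errors: DB" room
p router.rooms_for('errors.cache')   # nobody is subscribed
```

Matching on whole dot-separated segments, rather than on raw string prefixes, keeps an event such as errorsX from leaking into rooms subscribed to errors.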

Handling Unsubscribed Events

Sometimes you may get lost: some events are not received when you expect them, possibly due to a typo in the event name. You may also want to know which events have no subscribers at all. There is one special event type that catches all unsubscribed events. Subscribe a room to unsubscribed.event and you will get this:

Room: Hubot Debugging

Tomas V.  hubot subscribe unsubscribed.event
Hubot     Subscribed 585168 to unsubscribed.event events
Hubot     unsubscribed.event: erors.app.fatal: lost connection to db1.infra.net

Now you know that your app is sending erors.app.fatal instead of errors.app.fatal, so that’s why you’re not getting anything when you know it happens.
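The catch-all behaviour can be pictured as a fallback step in the routing logic. The sketch below is an assumption about how such a fallback could work, not a copy of hubot-pubsub internals; the route method and the FALLBACK_EVENT constant are hypothetical names.

```ruby
# Hypothetical sketch of the catch-all behaviour; not hubot-pubsub's actual code.
FALLBACK_EVENT = 'unsubscribed.event'

# subscriptions: Hash of event name => array of room ids.
# Returns [rooms, text]: who gets notified and what they see.
def route(subscriptions, event, message)
  parts = event.split('.')
  namespaces = (1..parts.size).map { |n| parts.first(n).join('.') }
  rooms = namespaces.flat_map { |name| subscriptions.fetch(name, []) }.uniq

  if rooms.empty?
    # Nobody listens to this event: hand it to the catch-all subscribers and
    # keep the original event name in the text so the typo stays visible.
    [subscriptions.fetch(FALLBACK_EVENT, []), "#{FALLBACK_EVENT}: #{event}: #{message}"]
  else
    [rooms, "#{event}: #{message}"]
  end
end

subscriptions = {
  'errors.app' => ['585164'],
  'unsubscribed.event' => ['585168']
}

rooms, text = route(subscriptions, 'erors.app.fatal', 'lost connection to db1.infra.net')
puts "#{rooms.inspect} <- #{text}"
```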

Securing Hubot PubSub

Events that contain sensitive data should not be sent over plain HTTP. Put Hubot behind HTTPS, or use it only within a local network or VPN.

To protect against unwanted message publishing, you may want to set a password that will be required when using the HTTP endpoints for publishing messages. To do that, set the HUBOT_SUBSCRIPTIONS_PASSWORD environment variable and restart Hubot. Then provide the password parameter along with your HTTP requests:

hubot@botserv:~$ curl "http://botserv:8080/publish?event=test&data=hello&password=secret"

Publishing Events From Ruby

You’ve seen how to publish events with curl, that means you can use it in bash scripts, but what about your application code?

All you have to do is make an HTTP request. Every programming language has a library for that, and it’s usually built-in. A basic Hubot client in Ruby can look like this:

require 'net/http'

module Hubot
  def self.publish(message, event, host, password=nil)
    uri = URI("#{host}/publish")
    params = { event: event, data: message }
    params[:password] = password if password
    uri.query = URI.encode_www_form(params)
    Net::HTTP.get_response(uri)
  rescue => e
    puts "Failed to publish an event via Hubot: #{e}"
  end
end

Hubot.publish('I like rubies!', 'news', 'http://botserv:8080')

It would probably be a bad idea to use this script in production under heavy load: there are no timeout settings, and you don’t want HTTP requests to Hubot to lock up your app when Hubot is restarting or down. Here is a better implementation that uses the rest-client gem:

require 'rest_client'

class Hubot
  TIMEOUT = 0.1 # seconds; keeps a slow or dead Hubot from blocking the app
  ENDPOINT = 'http://botserv:8080/publish'
  PASS = 'secret'

  def self.publish(event, message)
    payload = { event: event, data: message, password: PASS }
    RestClient::Request.execute(method: :post,
                                url: ENDPOINT,
                                open_timeout: TIMEOUT,
                                timeout: TIMEOUT,
                                payload: payload)
  rescue => e
    # Never let a failed notification break the calling code
    Rails.logger.warn("Hubot publish failed: #{event}: #{message}: #{e}")
  end
end

Hubot.publish('news', 'rest-client implementation rocks!')

Now you can safely use it in production knowing it will have minimal impact on your users no matter what happens.
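If you would rather not add a gem, the same timeouts can be set with Ruby’s standard library alone. The following is a minimal sketch under assumptions: the HubotClient module name is hypothetical, and the endpoint, password, and timeout are placeholder values in the style of the examples above.

```ruby
require 'net/http'
require 'uri'

# Stdlib-only sketch of a Hubot PubSub client with timeouts.
# ENDPOINT, PASSWORD and TIMEOUT are placeholders, not real settings.
module HubotClient
  ENDPOINT = URI('http://botserv:8080/publish')
  PASSWORD = 'secret'
  TIMEOUT = 0.1 # seconds

  # Builds the form-encoded POST request without sending it.
  def self.build_request(event, message)
    request = Net::HTTP::Post.new(ENDPOINT.path)
    request.set_form_data('event' => event, 'data' => message, 'password' => PASSWORD)
    request
  end

  # Sends the event; returns true on a 2xx response, false on any failure.
  def self.publish(event, message)
    response = Net::HTTP.start(ENDPOINT.host, ENDPOINT.port,
                               open_timeout: TIMEOUT, read_timeout: TIMEOUT) do |http|
      http.request(build_request(event, message))
    end
    response.is_a?(Net::HTTPSuccess)
  rescue StandardError => e
    warn "Hubot publish failed: #{event}: #{message}: #{e}"
    false
  end
end
```

Calling HubotClient.publish('news', 'stdlib works too') returns false instead of raising when Hubot is unreachable. Keeping request building separate from sending makes the payload easy to check without a running Hubot.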

What And When To Monitor

Monitoring is a tricky game, and those who are on call usually hate it. There is even a movement called #monitoringsucks. Monitoring with Hubot can suck too: just start publishing everything to Hubot and your chatrooms will soon be firehosed with messages that nobody will read.

A good way to monitor is to have a separate room for errors where you publish critical messages instantly, and warn about the error rate when it reaches a certain threshold. Critical messages must appear really rarely. If your application can run without major problems after an error, it’s not critical.
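One way to warn about an error rate rather than about every single error is a sliding-window counter. The sketch below is an illustration only: the ErrorRateMonitor class, the errors.rate event name, and the threshold and window values are all assumptions, and the block passed to the constructor stands in for whatever Hubot publishing call you use.

```ruby
# Hypothetical sliding-window error counter. The window, threshold and the
# injected publisher are illustrative assumptions.
class ErrorRateMonitor
  def initialize(threshold:, window:, &publisher)
    @threshold = threshold # errors tolerated within the window
    @window = window       # window length in seconds
    @publisher = publisher # called with (event, message) when the rate is too high
    @timestamps = []
    @alerted = false
  end

  # Record one non-critical error; `now` is injectable to keep the logic testable.
  def record(now = Time.now.to_f)
    @timestamps << now
    @timestamps.reject! { |t| t <= now - @window }

    if @timestamps.size > @threshold
      # Warn once per burst instead of once per error.
      unless @alerted
        @publisher.call('errors.rate', "#{@timestamps.size} errors in the last #{@window} seconds")
        @alerted = true
      end
    else
      @alerted = false
    end
  end
end

monitor = ErrorRateMonitor.new(threshold: 3, window: 60) do |event, message|
  puts "#{event}: #{message}" # swap in a real Hubot PubSub publish call here
end

# Five errors within one minute: only the fourth one triggers a warning.
[0, 10, 20, 30, 40].each { |second| monitor.record(second) }
```

Warning once per burst keeps the errors room readable: the room learns that the threshold was crossed, while individual non-critical errors stay in the logs.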

Monitoring Error Rates With Graylog2 Alarms And Hubot PubSub

Graylog2 is an open source data analytics system for aggregating, searching and charting your logs. It can be configured to publish its alarms to Hubot. In older versions of the Graylog2 web interface (below 0.20.0), this is done using the “Exec alarm callback” plugin. A short recipe to get you started:

1. Install graylog-server plugin
2. Configure some alarms in your graylog streams (streams -> <stream> -> alarms)
3. In Graylog2 web interface go to “System settings” (settings -> system)
4. Find “Exec alarm callback” configuration block
5. Check “Forced for all streams”
6. Click “Configure” and set the executable command to the path of the alarm callback notification script, which should be executable and have the following contents (make sure to change the configuration):

/opt/graylog2/hubot-alarm-callback

#!/usr/local/bin/ruby

##### Configuration #######################################
hubot_pubsub_uri = 'http://botserv:8080/publish'
hubot_pubsub_pass = 'secret'
hubot_pubsub_event = 'graylog.alert'

graylog_messages_uri = 'https://graylog.infra.net/messages'
###########################################################

require 'uri'
require 'net/http'

# Example:
# Stream message count alert: [errors]
topic = ENV['GL2_TOPIC']

# Example:
# Stream [errors] received 549 messages in the last 5 minutes. Limit: 500
desc  = ENV['GL2_DESCRIPTION']

begin
  uri = URI(hubot_pubsub_uri)
  params = { password: hubot_pubsub_pass,
             event: hubot_pubsub_event,
             data: "#{graylog_messages_uri} -> #{desc}" }
  uri.query = URI.encode_www_form(params)
  Net::HTTP.get_response(uri)
rescue => e
  puts "Hubot message failed: #{e}"
end

You can test the script like this:

$ GL2_TOPIC="test topic" \
  GL2_DESCRIPTION="test description" \
  /opt/graylog2/hubot-alarm-callback

Now, subscribe some room to graylog.alert using hubot-pubsub, and you will start getting messages like this:

Hubot   graylog.alert: https://graylog.infra.net/messages -> Stream [background-jobs] received 272 messages in the last 5 minutes. Limit: 150

Receiving Nagios Alerts

It is possible to configure Hubot to receive Nagios service and host alerts. We will do it by sending an HTTP POST from Nagios to Hubot. To start, we will create a simple Hubot script that consumes a POST on /nagios/alert and publishes the message as a nagios.alert event:

scripts/nagios-alert.coffee

# Description:
#   Receives Nagios alerts and posts them to chatroom
#
# Dependencies:
#   "hubot-pubsub": "1.0.0"
#
# URLS:
#   POST /nagios/alert (message=<message>)

module.exports = (robot) ->
  robot.router.post "/nagios/alert", (req, res) ->
    res.end()
    robot.emit 'pubsub:publish', 'nagios.alert', req.body.message

Now, let’s implement the rest in Nagios. We will assume that your Nagios configuration is at /etc/nagios.

To begin with, create a simple shell script that takes the message as its first argument and posts it to Hubot’s HTTP endpoint. We will put it in /etc/nagios/plugins/notify_by_hubot.sh:

/etc/nagios/plugins/notify_by_hubot.sh

#!/bin/bash
HUBOT_URL="http://botserv:8080/nagios/alert"
MESSAGE=$1
# --data-urlencode makes curl send a POST with a URL-encoded "message" field
curl "$HUBOT_URL" --data-urlencode message="$MESSAGE"

To use it from Nagios, we will have to add notify-service-by-hubot and notify-host-by-hubot command definitions to /etc/nagios/objects/commands.cfg:

/etc/nagios/objects/commands.cfg

define command{
  command_name notify-service-by-hubot
  command_line /etc/nagios/plugins/notify_by_hubot.sh \
    "Service $NOTIFICATIONTYPE$: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$"
}

define command{
  command_name notify-host-by-hubot
  command_line /etc/nagios/plugins/notify_by_hubot.sh \
    "Host $NOTIFICATIONTYPE$: $HOSTALIAS$ is $HOSTSTATE$"
}

Then define Hubot contact in /etc/nagios/objects/contacts.cfg:

/etc/nagios/objects/contacts.cfg

# Hubot chat notifications
define contact{
  contact_name                    hubot
  use                             generic-contact
  alias                           Hubot
  service_notification_commands   notify-service-by-hubot
  host_notification_commands      notify-host-by-hubot
}

Then add the hubot contact to a group that is configured to receive notifications, e.g. admins:

/etc/nagios/objects/contacts.cfg

define contactgroup{
  contactgroup_name       admins
  alias                   Nagios Administrators
  members                 jimmy,spajus,hubot
}

Restart Nagios, subscribe your Hubot chatroom to nagios.alert and wait for it. Having a separate chatroom for Nagios alerts would be wise, because it may get noisy.

Tomas V.   hubot subscribe nagios.alert
Hubot      Subscribed 585164 to nagios.alert events
Hubot      nagios.alert: Service PROBLEM: jobs/resque is WARNING
Hubot      nagios.alert: Service PROBLEM: webapp/rake-stuck is WARNING
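If nothing shows up, it can help to exercise the /nagios/alert endpoint without waiting for Nagios to fire a notification. The sketch below is one possible way to do that from Ruby; the helper names are hypothetical, and the URL is the placeholder botserv address used throughout this chapter.

```ruby
require 'net/http'
require 'uri'

# Placeholder endpoint taken from the examples in this chapter.
NAGIOS_ALERT_URI = URI('http://botserv:8080/nagios/alert')

# Formats a message the way the notify-service-by-hubot command does.
def service_alert(type, host_alias, service, state)
  "Service #{type}: #{host_alias}/#{service} is #{state}"
end

# Posts a form-encoded `message` parameter, like notify_by_hubot.sh does with curl.
def send_test_alert(message)
  Net::HTTP.post_form(NAGIOS_ALERT_URI, 'message' => message)
rescue StandardError => e
  warn "Test alert failed: #{e}"
  nil
end

message = service_alert('PROBLEM', 'jobs', 'resque', 'WARNING')
puts URI.encode_www_form('message' => message) # the request body that would be sent
# send_test_alert(message) # uncomment to hit a running Hubot
```

If the test message reaches the chatroom but real alerts do not, the problem is more likely in the Nagios command or contact definitions than in the Hubot script.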