The automated graceful shutdown I set up on Sunday for the hosts protected by my APC UPS is working well, but I decided I want to be notified when there’s a power event. Building on Amazon Simple Notification Service (SNS), I’ve created some simple scripts that notify me by both email and SMS when the power fails, when it is restored, and when the 10-minute remaining-charge threshold has been reached and the hosts are being shut down automatically.
Here’s how I did it:
Create AWS Resources
For this feature I needed the following:
- SNS topic to receive the messages
- Subscription to publish to email
- Subscription to publish to SMS
- IAM user with permission to call the publish API on the topic
For any real project you should always define your AWS resources in CloudFormation following the concept of Infrastructure-as-Code, so I defined all the above resources in a CloudFormation template which you can grab from my GitHub account if you want to build something similar.
https://github.com/brendonmatheson/apcupssns
I deployed the stack defined by this template using the following command-line:
aws cloudformation create-stack \
    --profile bren@prod \
    --stack-name "control-notification" \
    --template-body file://01-sns-topic.yaml \
    --capabilities CAPABILITY_NAMED_IAM
Note that bren@prod is a named profile that I had previously defined using aws configure --profile bren@prod. I like to use only named profiles and never the default profile so it’s an explicit choice about which account a command is run in.
I actually wrap this command up in an “up” script and also have a corresponding “down” script - this is useful when you’re developing the CloudFormation stack:
# Deploy the stack
./up.sh
# Tear down the stack
./down.sh
These scripts are also in the GitHub repo in Bash and PowerShell variants.
Once the stack had deployed I then went into the IAM console and generated a new access key credential for the service user, and copied the access key and secret key into my password safe.
The CloudFormation stack creates an email subscription which has to be confirmed before it can be used, so I went to my inbox and clicked through the confirmation email that AWS had just sent me.
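If you'd rather check from code whether a confirmation is still outstanding, here is a minimal sketch. It assumes a boto3 SNS client built from a profile that is allowed to list subscriptions (the publish-only service user is not). The helper name pending_endpoints is made up for illustration. SNS reports an unconfirmed subscription with the literal SubscriptionArn "PendingConfirmation".

```python
def pending_endpoints(client, topic_arn):
    """Return the endpoints on the topic that are still awaiting confirmation.

    `client` is expected to be a boto3 SNS client. Only the first page of
    results is read, which is plenty for a topic with a couple of subscribers.
    """
    response = client.list_subscriptions_by_topic(TopicArn=topic_arn)
    return [
        sub["Endpoint"]
        for sub in response["Subscriptions"]
        if sub["SubscriptionArn"] == "PendingConfirmation"
    ]


# Example use with real credentials (not run here):
#   import boto3
#   sns = boto3.Session(profile_name="bren@prod").client("sns")
#   print(pending_endpoints(
#       sns, "arn:aws:sns:<myregion>:<myaccountnumber>:control-notification"))
```

An empty list means every subscription on the topic has been confirmed.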
A good reference in case you want to extend or adapt this template is the KB article How do I create a subscription between my Amazon SQS queue and an Amazon SNS topic in AWS CloudFormation?
Test Subscriptions
You can optionally test that the subscriptions work by opening the Topic from the SNS console and hitting the Publish Message button to send a test message. This should come through to both the email and SMS subscribers.
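The same test can be run from Python instead of the console. This is only a sketch: publish_test_message and its default text are hypothetical, and it assumes a boto3 SNS client created from a profile that is permitted to publish to the topic.

```python
def publish_test_message(client, topic_arn,
                         text="Test message from the control node"):
    """Publish one test message and return the MessageId that SNS reports."""
    response = client.publish(TopicArn=topic_arn, Subject=text, Message=text)
    return response["MessageId"]


# Example use with real credentials (not run here):
#   import boto3
#   sns = boto3.Session(profile_name="bren@prod").client("sns")
#   print(publish_test_message(
#       sns, "arn:aws:sns:<myregion>:<myaccountnumber>:control-notification"))
```

Getting a MessageId back only shows that SNS accepted the message. You still need to check your inbox and phone to see that it was delivered.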
Define apcupsd Event Handlers
On my control node, logged in as root, I installed the AWS CLI v1:
pip3 install awscli --upgrade --user
echo 'export PATH=$PATH:/root/.local/bin' >> /root/.profile
. /root/.profile
I then defined a named profile for the service user that was created by my CloudFormation template:
aws configure --profile=control-notification-service@prod
And here entered the access key and secret key from my password safe.
I also installed the boto3 library since I’ll be writing Python scripts to talk to AWS:
pip3 install boto3
As we saw last time, new event handlers are defined by creating executable scripts in /etc/apcupsd named after the event. In this case we want to handle the onbattery and offbattery events. The two scripts are identical apart from the message that is sent.
onbattery:
#!/usr/bin/env python3
import boto3
print("onbattery")
boto3.setup_default_session(profile_name="control-notification-service@prod")
client = boto3.client("sns")
client.publish(
    TopicArn="arn:aws:sns:<myregion>:<myaccountnumber>:control-notification",
    Message="Power has failed at home. Running on battery",
    Subject="Power has failed at home. Running on battery")
offbattery:
#!/usr/bin/env python3
import boto3
print("offbattery")
boto3.setup_default_session(profile_name="control-notification-service@prod")
client = boto3.client("sns")
client.publish(
    TopicArn="arn:aws:sns:<myregion>:<myaccountnumber>:control-notification",
    Message="Power has been restored at home. Running on mains",
    Subject="Power has been restored at home. Running on mains")
Note: replace <myregion> and <myaccountnumber> with your real values if you want to use these scripts.
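Since the two handlers differ only in their message, one possible variation is a single script that picks the message from the name it was invoked as. This is a sketch rather than what I actually deployed: it assumes the same file is installed (or symlinked) as both /etc/apcupsd/onbattery and /etc/apcupsd/offbattery, and the names MESSAGES, event_from_path and notify are made up for illustration.

```python
import os
import sys

# Messages copied from the two scripts above, keyed by apcupsd event name.
MESSAGES = {
    "onbattery": "Power has failed at home. Running on battery",
    "offbattery": "Power has been restored at home. Running on mains",
}


def event_from_path(script_path):
    """Return the event name, i.e. the file name the script was invoked as."""
    return os.path.basename(script_path)


def notify(client, topic_arn, event):
    """Publish the message for `event`, used as both subject and body."""
    text = MESSAGES[event]
    client.publish(TopicArn=topic_arn, Message=text, Subject=text)
    return text


# In a real handler (not run here) the wiring would look like:
#   import boto3
#   boto3.setup_default_session(
#       profile_name="control-notification-service@prod")
#   notify(boto3.client("sns"),
#          "arn:aws:sns:<myregion>:<myaccountnumber>:control-notification",
#          event_from_path(sys.argv[0]))
```

Keeping the messages in one table means a new event only needs a new entry and a new symlink.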
I also replaced my original doshutdown Bash script with a Python equivalent so that I could also have it send a notification message:
#!/usr/bin/env python3
import boto3
import os
print("doshutdown")
boto3.setup_default_session(profile_name="control-notification-service@prod")
client = boto3.client("sns")
client.publish(
    TopicArn="arn:aws:sns:<myregion>:<myaccountnumber>:control-notification",
    Message="Performing host shutdown",
    Subject="Performing host shutdown")
os.system("ssh shutdownbot@nas \"sudo /sbin/shutdown -h now\" &")
os.system("ssh shutdownbot@backup \"sudo /sbin/shutdown -h now\" &")
os.system("ssh shutdownbot@compute \"sudo /sbin/shutdown -h now\" &")
Testing
I tested all three scripts by invoking them directly from the command-line, then did the big test: I pulled mains power from the UPS and then restored it to get the failed / restored notifications. It all worked smoothly!
Security
Two notes on security:
- The control-notification-service IAM user account has limited permissions - all it can do is call publish on that one topic so the blast radius for compromise is quite small.
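To make the first point concrete, a publish-only IAM policy generally has the shape below. This is an illustrative sketch, not a copy of the policy in my CloudFormation template, and the function name publish_only_policy is made up.

```python
import json


def publish_only_policy(topic_arn):
    """Build an IAM policy document that allows sns:Publish on one topic only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": topic_arn,
            }
        ],
    }


policy = publish_only_policy(
    "arn:aws:sns:<myregion>:<myaccountnumber>:control-notification")
print(json.dumps(policy, indent=2))
```

With a policy like this, leaked credentials could at worst be used to send messages to that one topic.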
- Storing the credentials for the user in the named profile under the root user on the control node is not ideal, but currently I don’t have a better option on my home network, such as HashiCorp Vault, so for now this is how it has to be.
Future Improvement
Sending messages from on-prem to SNS requires an Internet connection. In a power outage my Internet connection should stay up because the ADSL router is also powered from the UPS, so the “power failed” message should normally make it out. Even under normal circumstances, though, sending can still fail because of a transient network error.
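For the transient case, a simple in-process retry is one option. The sketch below is not part of my current scripts: publish_with_retry is a hypothetical helper, and send stands for any zero-argument callable that performs the publish and raises on failure. boto3 already retries some failures internally, so this only adds a longer, coarser retry on top.

```python
import time


def publish_with_retry(send, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call send() until it succeeds or the attempts run out.

    Returns True on success and False if every attempt failed. The wait
    grows linearly: delay_seconds, then 2 * delay_seconds, and so on.
    """
    for attempt in range(1, attempts + 1):
        try:
            send()
            return True
        except Exception as error:  # a real handler might catch botocore errors only
            print(f"publish attempt {attempt} failed: {error}")
            if attempt < attempts:
                sleep(delay_seconds * attempt)
    return False


# Example wiring (not run here), with `client` being a boto3 SNS client:
#   publish_with_retry(lambda: client.publish(
#       TopicArn="arn:aws:sns:<myregion>:<myaccountnumber>:control-notification",
#       Message="Power has failed at home. Running on battery",
#       Subject="Power has failed at home. Running on battery"))
```

The retries here only span a few seconds, so this helps with brief glitches but not with a link that stays down for minutes.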
If the UPS battery is exhausted, the ADSL router will lose power and the Internet link will be down. In this case, when mains is restored, the control node will immediately attempt to dispatch the “power restored” message, but this will happen before the ADSL modem and the rest of the network fabric are back up, so that message will fail.
A future improvement to make this more robust will be some local buffering of messages so that the control node can retry when sending fails. This will change the messaging semantics from at-most-once to at-least-once so some deduplicating logic in a Lambda function on the AWS side might also be useful.
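One way the local buffering could look is a small spool file that handlers append to and that a periodic job drains. This is a sketch under assumptions, not something I have built yet: the JSON-lines format and the names enqueue and flush are made up, and send stands for any callable taking (subject, message) that raises on failure.

```python
import json
import os


def enqueue(spool_path, subject, message):
    """Append one pending notification to the spool file as a JSON line."""
    with open(spool_path, "a", encoding="utf-8") as spool:
        spool.write(json.dumps({"subject": subject, "message": message}) + "\n")


def flush(spool_path, send):
    """Try to send every spooled notification and keep the ones that fail.

    Returns the number of notifications sent on this pass.
    """
    if not os.path.exists(spool_path):
        return 0
    with open(spool_path, encoding="utf-8") as spool:
        pending = [json.loads(line) for line in spool if line.strip()]

    failed = []
    sent = 0
    for item in pending:
        try:
            send(item["subject"], item["message"])
            sent += 1
        except Exception:
            failed.append(item)

    # Rewrite the spool with only the unsent items. Writing to a temp file
    # and renaming it means a crash mid-write cannot truncate the spool.
    tmp_path = spool_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as spool:
        for item in failed:
            spool.write(json.dumps(item) + "\n")
    os.replace(tmp_path, spool_path)
    return sent
```

If the process dies after a successful send but before the spool is rewritten, that message is sent again on the next pass. That is the at-least-once behaviour mentioned above, and the reason deduplication on the AWS side might be worth having.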
Conclusion
Handling power events in my home lab is automated and hands-off, but being aware of power events in near real-time is still nice. Normally I don’t find out about power failures until I get home and realize stuff is off, or try to remote into something. Knowing that there was a power event will be useful, particularly when I’m trying to connect to my home lab from another location.