My control node, which was already running apcupsd for monitoring purposes, is now configured to automatically power off my NAS, backup server and KVM compute node when the remaining runtime on the UPS drops to 10 minutes or less. Here’s how I did it.
This post assumes you already have apcupsd running. For a guide on setting it up, see my earlier post on apcupsd and apcupsd_exporter.
doshutdown Script
The main script in /etc/apcupsd is apccontrol, which the apcupsd daemon invokes when it detects power state changes. You should not edit this file directly. However, in this script you will see dispatchers for each of the power events that apcupsd handles:
case "$1" in
    killpower)
        echo "Apccontrol doing: ${APCUPSD} --killpower on UPS ${2}" | (${WALL} 2>/dev/null || cat)
        sleep 10
        ${APCUPSD} --killpower
        echo "Apccontrol has done: ${APCUPSD} --killpower on UPS ${2}" | (${WALL} 2>/dev/null || cat)
    ;;
    commfailure)
        echo "Warning communications lost with UPS ${2}" | ${WALL}
    ;;
    commok)
        echo "Communications restored with UPS ${2}" | ${WALL}
    ;;
    ...
To handle one of these events, you just need to create an executable script with the name of that event in the /etc/apcupsd directory. The event we want to handle is doshutdown, which is triggered when the time threshold or one of the other thresholds configured in apcupsd.conf is reached.
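As a minimal sketch of that naming convention, the helper below drops an executable handler named after an event into a directory. The install_handler name and the scratch directory are illustrative assumptions; on a real system the target would be /etc/apcupsd and the command would need root. onbattery is used here only as an example of another event name.

```shell
#!/bin/bash
# Hypothetical helper: create an executable handler for one apcupsd event.
# Usage: install_handler <directory> <event-name> <message>
install_handler() {
    local dir="$1" event="$2" message="$3"
    # The file name must match the event name exactly (e.g. onbattery).
    cat > "${dir}/${event}" <<EOF
#!/bin/bash
echo "${message}"
EOF
    # The handler is only picked up if it is executable.
    chmod 755 "${dir}/${event}"
}

# Try it in a scratch directory rather than /etc/apcupsd.
scratch="$(mktemp -d)"
install_handler "${scratch}" onbattery "UPS is running on battery"
"${scratch}/onbattery"
```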
In my case I just wanted to issue a shutdown to the three hosts mentioned above. Here is my doshutdown script:
$ cat /etc/apcupsd/doshutdown
#!/bin/bash
WALL=wall
echo "Shutdown initiated by apcupsd" | ${WALL}
echo "Issuing shutdown command to nas" | ${WALL}
ssh shutdownbot@nas "sudo /sbin/shutdown -h now" &
echo "Issuing shutdown command to backup" | ${WALL}
ssh shutdownbot@backup "sudo /sbin/shutdown -h now" &
echo "Issuing shutdown command to kvmcompute" | ${WALL}
ssh shutdownbot@kvmcompute "sudo /sbin/shutdown -h now" &
See the documentation for a full explanation of customizing event handlers. Note that the script needs to be executable, i.e. chmod 755 doshutdown.
As you can see from this script, we need a user called shutdownbot on each of the hosts, with sufficient rights to run shutdown via sudo with no password.
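The three near-identical ssh lines could also be collapsed into a loop. The following is a sketch, not the script I run: SSH_CMD, HOSTS and shutdown_hosts are names introduced here for illustration, and the wall notifications are replaced with plain echo. BatchMode and ConnectTimeout are standard OpenSSH client options that stop the script from hanging on a password prompt or an unreachable host.

```shell
#!/bin/bash
# Sketch of a loop-based doshutdown. SSH_CMD and HOSTS are illustrative
# names, not part of apcupsd; override SSH_CMD for a dry run.
SSH_CMD="${SSH_CMD:-ssh}"
HOSTS="${HOSTS:-nas backup kvmcompute}"

shutdown_hosts() {
    local host
    for host in ${HOSTS}; do
        echo "Issuing shutdown command to ${host}"
        # BatchMode=yes: fail instead of prompting for a password.
        # ConnectTimeout=10: give up on an unreachable host after 10 seconds.
        ${SSH_CMD} -o BatchMode=yes -o ConnectTimeout=10 \
            "shutdownbot@${host}" "sudo /sbin/shutdown -h now" &
    done
    # Wait for all background ssh sessions to finish or time out.
    wait
}

# Dry run: print the ssh arguments instead of connecting.
SSH_CMD=echo shutdown_hosts
```

Running the hosts in parallel and then calling wait means one slow host does not delay the others, while the script still does not return until every attempt has completed.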
Remaining Minutes Threshold
I set the threshold in /etc/apcupsd/apcupsd.conf to 10 minutes, meaning that when apcupsd estimates that the UPS is down to 10 minutes of runtime it will trigger the doshutdown action:
MINUTES 10
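MINUTES is one of three shutdown triggers in apcupsd.conf; BATTERYLEVEL (percent charge) and TIMEOUT (seconds on battery, 0 to disable) are the others, and whichever is reached first fires doshutdown. To see what is currently in effect, a small helper like the one below can list them. The show_thresholds name is hypothetical and the sample file contains illustrative values only.

```shell
#!/bin/bash
# Print the shutdown thresholds set in an apcupsd.conf-style file.
# show_thresholds is a hypothetical helper name.
show_thresholds() {
    local conf="${1:-/etc/apcupsd/apcupsd.conf}"
    # Match only uncommented directives at the start of a line.
    grep -E '^(BATTERYLEVEL|MINUTES|TIMEOUT)[[:space:]]' "${conf}"
}

# Demonstrate against a sample file with illustrative values.
sample="$(mktemp)"
printf '%s\n' '# sample values only' 'BATTERYLEVEL 5' 'MINUTES 10' 'TIMEOUT 0' > "${sample}"
show_thresholds "${sample}"
```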
Key Pair
apcupsd is running as root on my control node (I’m not totally happy with that, but it is a backlog item for another day). Since the root user will be SSH’ing to each box, I created an SSH keypair as root:
sudo ssh-keygen -t rsa -b 2048
Then I grabbed the public key, as it will need to be authorized on each of the target nodes:
sudo cat /root/.ssh/id_rsa.pub
Linux Hosts
On my KVM compute node and backup node, which both run Linux, provisioning the shutdownbot user followed the same process:
sudo useradd -u 2004 -m -s /bin/bash shutdownbot
I have a little table of user IDs for these service accounts so that I can keep them consistent across hosts. In this case shutdownbot got the ID 2004.
After creating the user I edited the sudoers file and granted shutdownbot the ability to run the shutdown command:
sudo visudo
The config rule to allow this is:
shutdownbot ALL=(ALL) NOPASSWD:/sbin/shutdown -h now
Importantly, we’re only allowing shutdownbot to run one specific command via sudo, meaning the blast radius of the account being compromised is limited. The NOPASSWD directive is required so that the command can be invoked by the apcupsd daemon without human interaction.
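An alternative to editing the main sudoers file is a drop-in under /etc/sudoers.d, assuming the host's sudoers includes that directory (most mainstream distributions do, but check yours). The sketch below writes the same rule to a candidate file; write_rule is a hypothetical helper name, and the validation and install steps are left as comments because they need root on the target host.

```shell
#!/bin/bash
# Write the sudoers rule to a candidate file.
# write_rule is a hypothetical helper name.
write_rule() {
    local target="$1"
    echo 'shutdownbot ALL=(ALL) NOPASSWD:/sbin/shutdown -h now' > "${target}"
    # sudoers files are conventionally mode 0440.
    chmod 0440 "${target}"
}

candidate="$(mktemp)"
write_rule "${candidate}"
cat "${candidate}"

# On the target host, as root (commented out so the sketch has no side effects):
#   visudo -cf "${candidate}" && \
#       install -m 0440 -o root -g root "${candidate}" /etc/sudoers.d/shutdownbot
```

visudo -cf only checks the syntax of the named file, so a typo is caught before the rule goes live.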
Finally, we can register the previously created public key as an authorized key for shutdownbot:
mkdir /home/shutdownbot/.ssh
chmod 700 /home/shutdownbot/.ssh
echo "<paste id_rsa.pub contents here>" > /home/shutdownbot/.ssh/authorized_keys
chmod 600 /home/shutdownbot/.ssh/authorized_keys
chown -R shutdownbot:shutdownbot /home/shutdownbot/.ssh
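To shrink the blast radius further, the key itself can be pinned to the shutdown command using OpenSSH authorized_keys options. This is an optional hardening sketch rather than what I deployed: it assumes an OpenSSH server recent enough to support the restrict option (7.2 or later), and restricted_key_line is a hypothetical helper name. The key shown is placeholder text.

```shell
#!/bin/bash
# Prefix a public key with options that pin it to a single command.
# restricted_key_line is a hypothetical helper; it reads the key from stdin.
restricted_key_line() {
    local pubkey
    read -r pubkey
    # restrict: disables port/agent/X11 forwarding and PTY allocation.
    # command=: forces this command regardless of what the client requests.
    printf 'restrict,command="sudo /sbin/shutdown -h now" %s\n' "${pubkey}"
}

# Placeholder key material for illustration only.
echo 'ssh-rsa AAAAexample root@control' | restricted_key_line
```

Be aware that with a forced command, any login with that key runs the shutdown immediately, including a plain interactive ssh used as a connectivity test.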
Synology NAS
On my Synology NAS the process was a bit different. To create the user I logged into the DSM console and used the Users applet from the Control Panel to create the new shutdownbot user.
Unfortunately, due to a change made in DSM around version 6.2.2, this user must be a member of the administrators group to be able to access SSH.
However, you can lock down access further by overriding the permissions and application access to deny everything except access to the homes share.
Next, SSH into the NAS as admin and register the public key:
mkdir /var/services/homes/shutdownbot/.ssh
chmod 700 /var/services/homes/shutdownbot/.ssh
echo "<paste id_rsa.pub contents here>" > /var/services/homes/shutdownbot/.ssh/authorized_keys
chmod 600 /var/services/homes/shutdownbot/.ssh/authorized_keys
chown -R shutdownbot:users /var/services/homes/shutdownbot/.ssh
Then I also changed the permissions on the user’s home directory. As noted in my earlier post, this step is not well documented, but it’s necessary for SSH to work with a keypair:
chmod 700 /var/services/homes/shutdownbot
The Synology OS doesn’t include visudo, so you need to edit the sudoers file directly.
vi /etc/sudoers
The advantage of visudo is that it validates your changes before applying them. When editing directly you don’t have that validation, so be very careful with your edits so that you don’t accidentally lock yourself out of sudo by committing a bad config. The line to be added is the same as on the Linux nodes:
# Allow the shutdownbot user to run shutdown commands
shutdownbot ALL=(ALL) NOPASSWD:/sbin/shutdown -h now
Testing
Back on the control node, check that SSH works:
sudo ssh shutdownbot@nas
sudo ssh shutdownbot@kvmcompute
sudo ssh shutdownbot@backup
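It is also possible to check the sudo rule without shutting anything down, because sudo -l followed by a command only reports whether that command is permitted. The sketch below loops over the same host list as the doshutdown script; SSH_CMD, HOSTS and check_hosts are illustrative names, and the exact behaviour of sudo -l may differ on the Synology build of sudo. On the control node it would need to run as root, since the key belongs to root.

```shell
#!/bin/bash
# SSH_CMD and HOSTS are illustrative names; override SSH_CMD for a dry run.
SSH_CMD="${SSH_CMD:-ssh}"
HOSTS="${HOSTS:-nas backup kvmcompute}"

check_hosts() {
    local host failed=0
    for host in ${HOSTS}; do
        # sudo -n never prompts; -l <command> only reports whether the
        # command is permitted, it does not run it.
        if ${SSH_CMD} -o BatchMode=yes -o ConnectTimeout=10 \
            "shutdownbot@${host}" "sudo -n -l /sbin/shutdown -h now" >/dev/null; then
            echo "${host}: ok"
        else
            echo "${host}: FAILED"
            failed=1
        fi
    done
    return "${failed}"
}

# Dry run with a stand-in for ssh that always succeeds.
SSH_CMD=true check_hosts
```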
Next, test that the doshutdown script works by invoking it directly as root. This should shut down the remote hosts:
sudo /etc/apcupsd/doshutdown
Finally, test the end-to-end integration by pulling the mains power from the UPS and observing it shut down the remote hosts when the 10 minute remaining runtime threshold is reached.
It’s nice to have a Grafana dashboard to monitor the state of your UPS while testing, in particular so you can see the remaining charge and remaining runtime. For example, the Grafana dashboard running on my control node showed a remaining runtime of 15.6 minutes just after I pulled mains power.
The remaining runtime got down to 8 minutes before the script ran to shut down the nodes. The rate of discharge was a bit faster than apcupsd’s estimate, meaning that it overshot the 10 minute threshold before kicking in.
With the load reduced after auto-shutting down the remote hosts, the UPS had sufficient charge to run the remaining devices (control node, UniFi network switch, pfSense firewall, and UniFi AP) for an estimated 34 minutes (not sure how accurate that is).
If you want to set up a dashboard like this, check out my earlier post on deploying apcupsd_exporter with Prometheus and Grafana.
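Without a dashboard, the same runtime figure can be read from the command line, since apcaccess status prints a TIMELEFT line. A small sketch for extracting the number follows; timeleft_minutes is a hypothetical helper name and the input here is sample text, not captured output.

```shell
#!/bin/bash
# Extract the TIMELEFT value (in minutes) from apcaccess-style output on stdin.
# timeleft_minutes is a hypothetical helper name.
timeleft_minutes() {
    # Lines look like "TIMELEFT : 15.6 Minutes"; split on the colon,
    # then take the first word of the value.
    awk -F': *' '/^TIMELEFT/ { split($2, parts, " "); print parts[1] }'
}

# Sample input for illustration. On the control node you would run:
#   apcaccess status | timeleft_minutes
printf 'STATUS   : ONBATT\nTIMELEFT : 15.6 Minutes\n' | timeleft_minutes
```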
Disable Shutdown of Control Node
One final thing: I don’t want the control node itself, which is running apcupsd, to be shut down when the 10 minute threshold is reached, because it will eventually have its own local battery backup in the form of two 18650 cells, which will keep it running for around 12 hours. The control node needs to stay online until the bitter end so that it can be used to issue wake-on-LAN packets to the other nodes when mains power is restored.
As noted at the beginning of this post, you should not directly edit the /etc/apcupsd/apccontrol script because it will be replaced during upgrades (although the apcupsd package hasn’t been updated for years). However, the behaviour is to invoke your doshutdown script AND then execute the default lines in apccontrol, which are:
doshutdown)
echo "UPS ${2} initiated Shutdown Sequence" | ${WALL}
#${SHUTDOWN} -h now "apcupsd UPS ${2} initiated shutdown"
;;
So I had to go in and comment out that shutdown command, as shown above.
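There is an alternative that avoids touching apccontrol: the comments in the stock apccontrol say that if a custom handler exits with status 99, the default action for that event is skipped. Ending doshutdown with exit 99 should therefore have the same effect and survive package upgrades. Treat the following as an untested sketch of that behaviour; the dispatch function is a toy imitation of the hook logic written for this example, not the real apccontrol.

```shell
#!/bin/bash
# Toy imitation of apccontrol's hook logic, NOT the real script:
# run the handler if present, and skip the default action on exit code 99.
dispatch() {
    local scriptdir="$1" event="$2" rc=0
    if [ -x "${scriptdir}/${event}" ]; then
        "${scriptdir}/${event}" || rc=$?
        if [ "${rc}" -eq 99 ]; then
            echo "default action skipped"
            return 0
        fi
    fi
    echo "default action would run"
}

# A stand-in doshutdown whose last line is exit 99.
scratch="$(mktemp -d)"
printf '%s\n' '#!/bin/bash' 'echo "custom doshutdown ran"' 'exit 99' > "${scratch}/doshutdown"
chmod 755 "${scratch}/doshutdown"
dispatch "${scratch}" doshutdown
```

In the real doshutdown script this would mean adding exit 99 as the final line, after the ssh commands have been issued.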
Conclusion
It feels really good to have this mechanism in place to provide some level of assurance that devices in my home lab running sensitive file systems will be shut down cleanly in case of a power outage.