Power Monitoring of the Home Lab
Categories: tech
Tags: kubernetes homelab longhorn
So the time has come: monitoring the power usage of my homelab. I will be introducing two S31s flashed with Tasmota to better understand the draw of my cluster. These are set up to integrate with my Home Assistant instance.
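As a quick sanity check that a plug is reporting, Tasmota exposes sensor readings over its HTTP web API via the `Status 8` command. A minimal sketch, where the plug's IP address is an assumption:

```shell
#!/bin/sh
# Sketch: poll a Tasmota-flashed S31 over HTTP for its current power draw.
# The IP address is an assumption; `Status 8` asks Tasmota for sensor status.
read_power() {
  curl -s "http://$1/cm?cmnd=Status%208" |
    # crude extraction of the ENERGY "Power" field (watts) from the JSON reply
    sed -n 's/.*"Power":\([0-9.]*\).*/\1/p'
}

# Example (hypothetical address):
# read_power 192.168.1.50
```

Home Assistant's Tasmota integration handles this automatically over MQTT; the HTTP poll is just handy for spot checks from a shell.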
My home lab is on two plugs. The first plug powers my two k8s nodes. To safely handle the transition I marked both for draining and quarantine with [`kubectl drain --ignore-daemonsets --delete-emptydir-data <node>`](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/). Some complaints were issued around Longhorn. I am hoping the system comes back online without much issue.
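The drain step for both nodes can be sketched as a small script; the node names here are hypothetical:

```shell
#!/bin/sh
# Sketch: cordon and drain both nodes before cutting power.
# --ignore-daemonsets: DaemonSet pods are recreated on uncordon anyway.
# --delete-emptydir-data: allow evicting pods that use emptyDir scratch space.
drain_node() {
  kubectl cordon "$1" &&
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

for node in node-a node-b; do  # hypothetical node names
  drain_node "$node" || echo "drain failed for $node" >&2
done
```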
So the machines came back up with seemingly no problem. The hard part is that the machines report telemetry and centralize logs through the Kubernetes cluster itself, so I will not know for sure until I bring both nodes fully online. At least the operating systems came up.
`kubectl uncordon <node>` is the next step. Watching `dmesg` shows the XFS configuration of Longhorn is subject to the 2038 time issue. Based on [an Ask Ubuntu answer](https://askubuntu.com/questions/1302943/xfs-filesystem-being-mounted-at-disk-supports-timestamps-until-2038-0x7fffffff) this appears to be a trivial problem.
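The thing to check is the `bigtime` feature flag in `xfs_info` output, which indicates 64-bit timestamp support. A sketch, where the mount point is an assumption:

```shell
#!/bin/sh
# Sketch: check an XFS filesystem for the bigtime feature (64-bit timestamps,
# i.e. dates past 2038). The mount point below is an assumption.
has_bigtime() {
  xfs_info "$1" | grep -q 'bigtime=1'
}

if has_bigtime /var/lib/longhorn; then
  echo "timestamps supported past 2038"
else
  # With xfsprogs >= 5.15 the feature can be enabled on an unmounted filesystem:
  #   xfs_admin -O bigtime=1 <device>
  echo "bigtime not enabled"
fi
```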
Storage Eventual Consistency?
So most of the cluster came back online as expected. Longhorn did get stuck at one point: it marked `molly` as `unschedulable` for reasons that were not very clear. Originally I thought this was one of the many race conditions identified in the change log. After deleting the stuck pods failed, I tried upgrading from 1.5.1 to 1.6.1, which went smoothly and got a majority of pods online.
Upon further investigation I discovered only some of the device nodes for Longhorn were mounting. The first suggestion from Longhorn is to check that `multipathd` did not grab the device node. Apparently it has been pestering Red Hat users for a while. Overall it sounds like `multipathd` might be a great utility in the future of Longhorn, as it supposedly helps manage pathing for iSCSI.
Resolving the issue involved editing the file `/etc/multipath.conf` and adding the following stanza at the bottom of the file:

```
blacklist {
    devnode "^sd[a-z0-9]+"
}
```
This required a restart of the service with `systemctl restart multipathd`. `multipath -t` provides a simple method to verify the changes have been picked up. `lsblk` will also show which devices are managed by `multipath`, confirming the device nodes were released.
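The restart-and-verify steps above can be rolled into one sketch; the grep pattern just looks for the blacklist regex in the effective configuration:

```shell
#!/bin/sh
# Sketch: reload multipathd and confirm the new blacklist is in effect.
reload_and_verify() {
  systemctl restart multipathd &&
  # `multipath -t` dumps the merged runtime configuration
  multipath -t | grep -q 'sd\[a-z0-9\]'
}

if reload_and_verify; then
  echo "blacklist active"
else
  echo "blacklist missing; re-check /etc/multipath.conf" >&2
fi
```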
etcd upgrade woes!
An instance of etcd I use within the cluster was failing to come up. Images associated with the Helm chart will not run on `arm64` for version `6.10.0` of Bitnami's chart. An attempt to upgrade to `10.0.2` resulted in a demand for `auth.rbac.rootPassword` to be set despite `auth.rbac.enabled` being set to `false`. Sadly the change also results in modifying immutable `StatefulSet` fields.
I really need to revisit this anyway, as I am not sure having three zones for split-horizon DNS is as appropriate as I thought, especially since the new Gateway API might resolve the issue anyway. So I added the following stanza to ensure the pod is only assigned to an `amd64` node:

```yaml
nodeSelector:
  kubernetes.io/arch: amd64
```
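Applied through a values file, that override might look like the following; the file name and release name are assumptions:

```shell
#!/bin/sh
# Sketch: write the nodeSelector override to a values file for helm.
cat > etcd-values.yaml <<'EOF'
nodeSelector:
  kubernetes.io/arch: amd64
EOF

# Would then be applied with something like (release name hypothetical):
#   helm upgrade my-etcd bitnami/etcd -f etcd-values.yaml
```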
Unfortunately chart version `6.10.0` no longer exists; the oldest version I could find was `8.11.4`. This applied without a problem. The referenced version of `etcd` does include an `arm64` image, so I removed the stanza anyway.
Reflection
The power crossover was not particularly painful. Next time I think I should do an upgrade audit prior to restarting everything; I am hoping that would shake out many of the problems ahead of time. In general, I would like to find a way to automate notifications of upgrade availability. Internal Ingress definitely needs to be revisited.