In this tutorial, I will talk about Prometheus and Grafana. You will learn how to install both on CentOS/RHEL and how to use them.
Running a Lemmy instance has taught me a lot about devops already!
I love playing around with this, and the Lemmy instance was a great reason to. lemmyfly.org doesn't have a lot of traffic yet, so I think it can handle some more. It's currently running on 2 vCPU / 4 GB RAM.
After chatting to my devops colleague at work I wanted to set up a Grafana dashboard with metrics on the server.
Seems it's pretty doable!
I created a second server at my VPS provider. It's very cheap, and I don't think I'll be running Prometheus/Grafana 24/7, so I'll just create a snapshot of it, destroy the server and re-create it when wanted.
I created an internal private network so the instances can communicate without exposing ports to the public.
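For reference, the Prometheus side of this boils down to one scrape job pointing at node_exporter over the private network. Here's a minimal sketch in Python that renders such a prometheus.yml. The address 10.0.0.2 is a made-up placeholder for the Lemmy server's private IP, the function name is just for illustration, and 9100 is node_exporter's default port.

```python
# Hypothetical private-network address of the Lemmy server; replace with your own.
LEMMY_PRIVATE_IP = "10.0.0.2"
NODE_EXPORTER_PORT = 9100  # node_exporter listens on 9100 by default


def render_prometheus_config(target_ip: str,
                             port: int = NODE_EXPORTER_PORT,
                             scrape_interval: str = "15s") -> str:
    """Return a minimal prometheus.yml with a single node_exporter scrape job."""
    return (
        "global:\n"
        f"  scrape_interval: {scrape_interval}\n"
        "\n"
        "scrape_configs:\n"
        "  - job_name: node\n"
        "    static_configs:\n"
        f"      - targets: ['{target_ip}:{port}']\n"
    )


if __name__ == "__main__":
    # Print the config so it can be redirected into prometheus.yml.
    print(render_prometheus_config(LEMMY_PRIVATE_IP))
```

Since the target is a private address, port 9100 never has to be opened on the public interface.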
Log in to Grafana with admin:admin, then change the password.
Go to the Grafana home dashboard, click on the + sign and then on Import. Under "Import via grafana.com", enter the dashboard ID 1860 (a preset for all Prometheus node_exporter metrics) and click on Load.
Done!
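If the dashboard stays empty, it helps to check whether Prometheus can actually reach the scrape target. A rough sketch of that check, using the Prometheus HTTP API (/api/v1/query) and the built-in "up" metric. The URL below assumes Prometheus is listening on its default port 9090 on the same machine, and the helper names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus runs locally on its default port.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_up_targets(payload: dict) -> dict:
    """Map each instance label to True (scrape OK) or False (target down)."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    return {
        series["metric"].get("instance", "unknown"): series["value"][1] == "1"
        for series in payload["data"]["result"]
    }


def fetch_up_targets(base_url: str = PROMETHEUS_URL) -> dict:
    """Ask Prometheus which targets it could scrape most recently."""
    with urllib.request.urlopen(build_query_url(base_url, "up"), timeout=5) as resp:
        return parse_up_targets(json.load(resp))


if __name__ == "__main__":
    for instance, is_up in fetch_up_targets().items():
        print(f"{instance}: {'up' if is_up else 'DOWN'}")
```

A target showing as DOWN usually points at the network or firewall between the two servers rather than at Grafana.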
Last 2 hours in the garden with a beer on the side, WFH FTW
I fell into the deep end with Grafana + Prometheus and went crazy with all the metrics and dashboards. But I got burnt out before I even began making alerts, so I just went with Netdata in the end.
This would be nice to run on my home server to monitor my Lemmy instance in the cloud. It would also be nice to be able to check some Lemmy stats as well as general server stats. I'm not a Grafana or Prometheus expert though.
Actually, I did delete the server (after creating a snapshot of it) a week or so ago. But this morning I wanted to check lemmyfly.org and couldn't load the page. Checking my Hetzner dashboard, I noticed the CPU had spiked to 200%?! It did drop again, but apparently the spike lasted for 2-3 minutes.
But Prometheus was down, so there are no graphs apart from the Hetzner ones. It doesn't relate to network traffic spikes, so I don't know what caused it.
I've started the Prometheus server again (that snapshot was really useful :) ) and will leave it on for a couple of months now.
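With Prometheus collecting again, a spike like this could be looked up afterwards through the range-query API (/api/v1/query_range). Below is a rough sketch, not a definitive recipe. It assumes Prometheus on localhost:9090 and node_exporter's node_cpu_seconds_total metric, and the helper names are made up. The query sums the non-idle CPU time over all cores, so a fully loaded 2 vCPU box would read about 200%, like on the Hetzner graph.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: Prometheus runs locally on its default port.
PROMETHEUS_URL = "http://localhost:9090"

# Busy CPU in percent, summed over all cores. The 5m window smooths short
# spikes; a shorter window needs a correspondingly short scrape interval.
CPU_BUSY_PERCENT = (
    'sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100'
)


def build_range_url(base_url: str, promql: str,
                    start: float, end: float, step: str = "60s") -> str:
    """Build a range-query URL; start and end are Unix timestamps."""
    params = urllib.parse.urlencode(
        {"query": promql, "start": start, "end": end, "step": step}
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def peak_per_instance(payload: dict) -> dict:
    """Return {instance: (timestamp, value)} for the highest sample per series."""
    peaks = {}
    for series in payload["data"]["result"]:
        instance = series["metric"].get("instance", "unknown")
        ts, value = max(series["values"], key=lambda point: float(point[1]))
        peaks[instance] = (float(ts), float(value))
    return peaks


if __name__ == "__main__":
    now = time.time()
    # Look back over the last 24 hours.
    url = build_range_url(PROMETHEUS_URL, CPU_BUSY_PERCENT, now - 24 * 3600, now)
    with urllib.request.urlopen(url, timeout=10) as resp:
        for instance, (ts, value) in peak_per_instance(json.load(resp)).items():
            print(f"{instance}: peak {value:.0f}% at {time.ctime(ts)}")
```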
Spikes as seen on the Hetzner graphs:
Current system consumption:
I might need to get an extra volume for storage, since Lemmy is starting to eat up the root filesystem.
Does anyone know how I can reconfigure Lemmy to use a different volume for storage?