Monitoring OCM using Prometheus-Operator

In this page, we provide a way to monitor your OCM environment using Prometheus-Operator.

Before you get started

You must have an OCM environment set up. You can also follow our recommended quick start guide to set up a playground OCM environment.

And then please install the Prometheus-Operator in your hub cluster. You can also run the following commands copied from the official doc:

git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus

# Create the namespace and CRDs, and then wait for them to be availble before creating the remaining resources
kubectl create -f manifests/setup

# Wait until the "servicemonitors" CRD is created. The message "No resources found" means success in this context.
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done

kubectl create -f manifests/

Monitoring the control-plane resource usage.

You can use kubectl proxy to open prometheus UI in your browser on localhost:9090:

kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090

The following queries are to monitor the control-plane pods’ cpu usage, memory usage and apirequestcount for critical CRs:

rate(container_cpu_usage_seconds_total{namespace=~"open-cluster-management.*"}[3m])
container_memory_working_set_bytes{namespace=~"open-cluster-management.*"}
rate(apiserver_request_total{resource=~"managedclusters|managedclusteraddons|managedclustersetbindings|managedclustersets|addonplacementscores|placementdecisions|placements|manifestworks|manifestworkreplicasets"}[1m])

Visualized with Grafana

We provide an initial grafana dashboard for you to visualize the metrics. But you can also customize your own dashboard.

First, use the following command to proxy grafana service:

kubectl --namespace monitoring port-forward svc/grafana 3000

Next, open the grafana UI in your browser on localhost:3000.

Click the “Import Dashboard” and run the following command to copy a sample dashboard and paste it to the grafana:

curl https://raw.githubusercontent.com/open-cluster-management-io/open-cluster-management-io.github.io/main/content/en/getting-started/administration/assets/grafana-sample.json | pbcopy

Then, you will get a sample grafana dashboard that you can fine-tune further:

grafana