Alerts with Grafana Loki on Kubernetes

Environment information

Configuration

Below is the relevant part of the values.yaml file for the Grafana Loki Helm chart.

loki:
  auth_enabled: false

  storage:
    type: gcs
    .....
    ...
    ..

  schemaConfig:
    configs:
      - from: 2023-01-01
        store: boltdb-shipper
        object_store: gcs
        schema: v11
        index:
          period: 24h
          prefix: ........
        chunks:
          period: 24h


  # Alerts configuration - THE MOST IMPORTANT PART (1)
  rulerConfig:
    wal:
      # /var/loki is mounted as PVC
      dir: /var/loki/ruler-wal
    storage:
      type: local
      local:
        directory: /rules
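    # Scratch directory where the ruler writes temporary copies of rule files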
    rule_path: /tmp/scratch
    # Internal address of Alertmanager
    alertmanager_url: http://kube-prometheus-stack-alertmanager:9093
    ring:
      kvstore:
        store: inmemory
    enable_api: true
    enable_alertmanager_v2: true
    # Enable sending metrics to Prometheus
    remote_write:
      enabled: true
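      # Note: Prometheus only accepts remote writes when started with
      # --web.enable-remote-write-receiver (enableRemoteWriteReceiver in
      # kube-prometheus-stack)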
      client:
        # Internal address of Prometheus
        url: http://kube-prometheus-stack-prometheus:9090/api/v1/write
  # ---

# Service Account used for Workload Identity to get access
# to the bucket used as storage.
serviceAccount:
  create: true
  name: loki-sa
  annotations:
    iam.gke.io/gcp-service-account: ....

# THE MOST IMPORTANT PART (2)
backend:
  # Alerts configuration
  extraVolumes:
    - name: loki-rules
      configMap:
        name: loki-rules
    - name: loki-rules-scratch
      emptyDir: {}
  extraVolumeMounts:
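    # With auth_enabled: false Loki runs single-tenant under the tenant ID
    # "fake", so the ruler looks for rules in /rules/fake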
    - name: loki-rules
      mountPath: /rules/fake
    - name: loki-rules-scratch
      mountPath: /tmp/scratch

With the above configuration, Loki's ruler can send alerts to Alertmanager and write the results of recording rules to Prometheus via remote write.
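
To roll this out, upgrade the Helm release with the values file. A minimal sketch, assuming the chart is installed as a release named loki in the loki namespace from the official grafana chart repository:

# Add the Grafana chart repository (skip if already added)
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Apply the values.yaml shown above
helm upgrade --install loki grafana/loki --namespace loki -f values.yaml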

The following ConfigMap configures alerts and recording rules for Loki:

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-rules
  namespace: <the same namespace as Loki>
data:
  # Recording rule just as an example
  recording-rules.yaml: |-
    groups:
      - name: my_app
        interval: 5m
        rules:
          - record: loki:my_app:logs:count:1h
            expr: |
              count_over_time({app="my-app", container="my-container"} [1h])
  alert-rules.yaml: |-
    groups:
      - name: MyFirstGroup
        rules:
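          # absent_over_time fires when the selector matches no log lines
          # at all within the 40m window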
          - alert: ExampleAlert1
            expr: |
              absent_over_time({app="my-app"} [40m])
            for: 10m
            labels:
                severity: error
            annotations:
                summary: My app has stopped streaming logs.
                description: My app has not sent any logs for the last 40 minutes.
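          # Parse each log line as JSON, keep only lines that parsed cleanly
          # (__error__=""), and count ERROR-severity lines over a 5m window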
          - alert: ExampleAlert2
            expr: |
              sum by(app) (count_over_time({app="my-app"} | json | severity = `ERROR` | __error__="" [5m])) > 2
            for: 1s
            labels:
                severity: warning
            annotations:
                summary: More than 2 errors have occurred in the my-app logs within the last 5 minutes.
      - name: MySecondGroup
        rules:
          - alert: ExampleAlert3
            expr: ......
            for: ....
            labels:
                severity: warning
            annotations:
                summary: ...
                description: ....

The above alerting rules will appear in the alert list in Grafana.
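
Since enable_api: true is set, you can also ask the ruler directly which rule groups it has loaded. A minimal sketch, assuming the backend pods are exposed through a service named loki-backend on the default HTTP port 3100:

# Forward the Loki backend port to your machine
kubectl -n loki port-forward svc/loki-backend 3100:3100

# List the rule groups known to the ruler
curl -s http://localhost:3100/loki/api/v1/rules

If the ConfigMap is mounted correctly, the response lists both groups defined above. Once remote write is flowing, the recorded metric loki:my_app:logs:count:1h should also be queryable on the Prometheus side.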