Gitea Observability: Prometheus Metrics + Loki Logs

Background

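Once Gitea is deployed, it needs monitoring to keep the service stable. Monitoring rests on three pillars (metrics, logs, and alerts), and none of them can be left out.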

The existing monitoring stack is already deployed on the monitoring node:

Service	Port	Description
Prometheus	9090	Metrics scraping + alerting rules
Loki	3100	Log storage
Alertmanager	9093	Alert routing
Alloy	12345	Docker log collection

Gitea runs on the application node under Docker Compose, listening on port 3000 (Web) and 2222 (SSH).

Enabling Gitea Metrics

Gitea supports Prometheus natively; all it takes is appending the following to the app.ini configuration file:

[metrics]
ENABLED = true

Restart Gitea, then verify:

curl http://git-server:3000/metrics | head
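To check the output from a script instead of by eye, here is a minimal Python sketch (standard library only) that fetches `/metrics` and keeps the samples whose name starts with a given prefix. It is a simplified reader for the Prometheus text exposition format written for illustration: `fetch_metrics` and `parse_samples` are hypothetical helper names, and label values containing `}` or escape sequences are not handled.

```python
import re
import urllib.request

# "name{labels} value [timestamp]"; the optional timestamp is ignored.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_samples(text: str, prefix: str = "gitea_") -> dict[str, float]:
    """Return {name+labels: value} for samples whose metric name starts with prefix.

    Simplified parser (hypothetical helper): assumes label values contain no '}'.
    """
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        m = _SAMPLE.match(line)
        if m and m.group(1).startswith(prefix):
            key = m.group(1) + (m.group(2) or "")
            samples[key] = float(m.group(3))  # float() also accepts "+Inf" and "NaN"
    return samples


def fetch_metrics(base_url: str = "http://git-server:3000") -> str:
    """Download the raw /metrics page (hypothetical helper, host from the article)."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return resp.read().decode("utf-8")
```

For example, `parse_samples(fetch_metrics())` lists the `gitea_*` gauges, and `parse_samples(fetch_metrics(), prefix="go_")` the Go runtime ones.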

The returned metrics fall into three groups:

Business metrics:

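Metric	Description
`gitea_users`	Number of users
`gitea_repositories`	Number of repositories
`gitea_issues`	Number of issues
`gitea_pull_requests`	Number of pull requests
`gitea_accesses`	Number of repository access records (user permission rows), not an HTTP request count
`gitea_mirrors`	Number of mirror repositories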

Runtime metrics:

Metric	Description
`process_resident_memory_bytes`	Resident memory of the process (bytes)
`go_goroutines`	Number of goroutines
`go_memstats_*`	Go memory statistics

HTTP metrics (v1.26+):

Metric	Description
`http_server_request_duration_seconds`	Request latency (by route / method / status code)
`http_server_active_requests`	Number of in-flight requests

Prometheus Scrape Configuration

In the Prometheus configuration file prometheus.yml, add a new job under scrape_configs:

  - job_name: gitea
    static_configs:
      - targets:
          - git-server:3000
        labels:
          hostname: git-server

Restart Prometheus and confirm the target status:

curl http://monitor:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job=="gitea") | .health'
# "up"
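The same check can be scripted, for example as a post-deployment smoke test. A minimal Python sketch against Prometheus' `/api/v1/targets` HTTP API, assuming Prometheus is reachable at `http://monitor:9090` as in the curl command; `fetch_targets` and `job_health` are hypothetical helper names.

```python
import json
import urllib.request


def fetch_targets(prom_url: str = "http://monitor:9090") -> dict:
    """Fetch the decoded JSON of GET /api/v1/targets (hypothetical helper)."""
    with urllib.request.urlopen(f"{prom_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def job_health(targets_json: dict, job: str = "gitea") -> list[tuple[str, str, str]]:
    """Return (scrapeUrl, health, lastError) for each active target of the given job."""
    return [
        (t.get("scrapeUrl", ""), t["health"], t.get("lastError", ""))
        for t in targets_json["data"]["activeTargets"]
        if t["labels"].get("job") == job
    ]
```

`job_health(fetch_targets())` should yield one tuple per Gitea target with health `"up"`; a non-empty `lastError` usually explains a `"down"` target.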

Prometheus Alerting Rules

In alerts.yml, add a new rule group covering four key scenarios:

  - name: gitea_alerts
    interval: 30s
    rules:
      - alert: GiteaDown
        expr: up{job="gitea"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Gitea is down"

      - alert: GiteaHighMemory
        expr: process_resident_memory_bytes{job="gitea"} > 1e9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Gitea memory usage is high"
          description: "Currently {{ $value | humanize }}B, above 1 GB"

      - alert: GiteaGoroutineLeak
        expr: go_goroutines{job="gitea"} > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Gitea goroutine count is abnormally high"
          description: "{{ $value }} goroutines, possible leak"

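      - alert: GiteaHighLatency
        # Sum the per-route/method/status buckets by le first, so the P95 covers all requests
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(http_server_request_duration_seconds_bucket{job="gitea"}[5m]))) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Gitea request latency is high"
          description: "P95 latency {{ $value }}s, above 2s"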
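`GiteaHighLatency` compares an *estimated* P95 against 2 s. `histogram_quantile` never sees individual requests, only cumulative `le` buckets, and it interpolates linearly inside the bucket that contains the target rank; per the Prometheus documentation, when the rank lands in the `+Inf` bucket it returns the highest finite bound instead. The Python sketch below reproduces that interpolation in simplified form for illustration; it is not Prometheus' code and skips its more exotic edge cases.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound,
    the last one being +Inf, like the *_bucket{le="..."} series.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only supports 0 < q <= 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    # First finite bucket whose cumulative count reaches the rank; +Inf bucket otherwise.
    b = next((i for i, (_, count) in enumerate(buckets[:-1]) if count >= rank),
             len(buckets) - 1)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank falls in +Inf: return the highest finite bound
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    start, end, in_bucket = 0.0, buckets[b][0], buckets[b][1]
    if b > 0:
        start = buckets[b - 1][0]
        in_bucket -= buckets[b - 1][1]  # observations inside (start, end]
        rank -= buckets[b - 1][1]       # rank relative to the bucket start
    return start + (end - start) * (rank / in_bucket)  # linear interpolation
```

For buckets `[(0.5, 80), (1, 90), (2, 96), (5, 100), (inf, 100)]`, rank 95 lies in the (1, 2] bucket, so the estimate is 1 + (2 - 1) * (95 - 90) / (96 - 90) ≈ 1.83 s, just below the threshold. Because the buckets must describe a single distribution, per-series rates are normally summed by `le` before the quantile is taken.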

Alertmanager routes these alerts to a Feishu bot, notifying the on-call engineers in real time.
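The article does not show how Alertmanager reaches Feishu. Alertmanager's built-in receivers do not include Feishu, so one common pattern is a `webhook_configs` receiver pointing at a small bridge that reformats the payload. A minimal Python sketch of that reformatting step, assuming Alertmanager's standard webhook JSON (an `alerts` list whose entries carry `status`, `labels` and `annotations`) and Feishu's custom-bot text message body (`msg_type: "text"`); `to_feishu_text`, `post_to_feishu` and the webhook URL you pass in are placeholders, not part of the original setup.

```python
import json
import urllib.request


def to_feishu_text(payload: dict) -> dict:
    """Convert an Alertmanager webhook payload into a Feishu custom-bot text message."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", payload.get("status", "unknown")).upper()
        lines.append(
            f"[{status}] {labels.get('alertname', '?')} "
            f"({labels.get('severity', 'none')}): {annotations.get('summary', '')}"
        )
    return {"msg_type": "text", "content": {"text": "\n".join(lines)}}


def post_to_feishu(webhook_url: str, message: dict) -> None:
    """POST the message as JSON to a Feishu bot webhook URL (placeholder, supplied by you)."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(message, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        resp.read()
```

Keeping the formatting in a pure function makes it testable without a network; the HTTP handler that receives Alertmanager's POST would just call both functions in turn.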

Loki Log Collection

Alloy already runs on the application node and collects Docker container logs. Add a labeling rule for the Gitea container to the Alloy configuration:

discovery.relabel "windmill" {
  targets = discovery.docker.windmill.targets

  // existing rules...
  rule {
    source_labels = ["__meta_docker_container_name"]
    regex         = "/gitea$"
    target_label  = "container"
    replacement   = "gitea"
  }
}
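Alloy's `discovery.relabel` rules follow Prometheus relabeling semantics: the default action is `replace`, the source label values are joined with `;`, and the regex must match the whole value because it is implicitly anchored. The Python sketch below imitates only that `replace` step to show which container names the rule above catches; it is a simplification for illustration (no other actions, no Go-specific regex syntax), and `relabel_replace` is a hypothetical name.

```python
import re


def relabel_replace(labels: dict[str, str], source_labels: list[str], regex: str,
                    target_label: str, replacement: str = "$1",
                    separator: str = ";") -> dict[str, str]:
    """Apply a simplified Prometheus-style `replace` relabel rule and return new labels."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # relabel regexes are anchored at both ends
    out = dict(labels)
    if match is None:
        return out  # no match: the rule leaves the labels untouched
    # Convert Go-style $1 references into Python's \g<1> before expanding.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    out[target_label] = match.expand(template)
    return out
```

Because of the anchoring, `/gitea$` matches only a container literally named `gitea`, such as one with `container_name: gitea` in the Compose file. With Compose's default names (for example `<project>-gitea-1`), a pattern like `/.*gitea.*` would be needed instead.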

After restarting Alloy, Gitea logs can be queried in Grafana with LogQL:

{container="gitea"}
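Outside Grafana, the same query can go straight to Loki's HTTP API: `GET /loki/api/v1/query_range` takes `query`, `start`, `end` (Unix nanoseconds) and `limit`, and returns `streams` whose `values` are `[timestamp_ns, line]` pairs. A minimal Python sketch, assuming Loki listens on port 3100 of the same `monitor` host used in the Prometheus example; the helper names are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request

NANOS_PER_MINUTE = 60 * 1_000_000_000


def build_query_url(loki_url: str, logql: str, minutes: int = 15, limit: int = 100) -> str:
    """Build a query_range URL covering the last `minutes` minutes."""
    end_ns = time.time_ns()
    start_ns = end_ns - minutes * NANOS_PER_MINUTE
    params = urllib.parse.urlencode(
        {"query": logql, "start": start_ns, "end": end_ns, "limit": limit})
    return f"{loki_url}/loki/api/v1/query_range?{params}"


def log_lines(response: dict) -> list[tuple[int, str]]:
    """Flatten a 'streams' result into (timestamp_ns, line) pairs, oldest first."""
    pairs = []
    for stream in response["data"]["result"]:
        for ts, line in stream["values"]:
            pairs.append((int(ts), line))
    return sorted(pairs)


def query_gitea_logs(loki_url: str = "http://monitor:3100", minutes: int = 15):
    """Fetch recent Gitea log lines (hypothetical helper, host name assumed)."""
    url = build_query_url(loki_url, '{container="gitea"}', minutes)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return log_lines(json.load(resp))
```

Sorting locally keeps the output chronological regardless of the `direction` Loki used for the query.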

Also configure a Loki ruler rule that fires when error lines appear in the logs:

groups:
  - name: gitea_errors
    interval: 30s
    rules:
      - alert: GiteaError
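        # |~ with (?i) matches "error", "Error" and "ERROR"; |= "error" is case-sensitive
        expr: count_over_time({container="gitea"} |~ "(?i)error" [5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Gitea runtime errors"
          description: "Error lines appeared in the last 5 minutes"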
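To see what the ruler evaluates, here is a rough Python imitation of `count_over_time` over a 5-minute window with a line filter: count the matching lines inside the window, and fire when the count is above zero. It also shows why case matters: LogQL's `|=` filter is case-sensitive, so `|= "error"` skips a line that only contains `ERROR`, while a regex filter such as `|~ "(?i)error"` matches both. The function and sample lines are hypothetical, and Loki's exact window-boundary rules are not modeled.

```python
import re

NANOS_PER_SECOND = 1_000_000_000

CASE_SENSITIVE = re.compile(re.escape("error"))  # behaves like |= "error"
CASE_INSENSITIVE = re.compile(r"(?i)error")      # behaves like |~ "(?i)error"


def count_over_time(entries: list[tuple[int, str]], now_ns: int,
                    pattern: re.Pattern, window_s: int = 300) -> int:
    """Count log lines from roughly the last `window_s` seconds that match `pattern`."""
    start_ns = now_ns - window_s * NANOS_PER_SECOND
    return sum(1 for ts, line in entries
               if start_ns < ts <= now_ns and pattern.search(line))
```

The alert condition `> 0` then simply means `count_over_time(...) > 0` at evaluation time.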

Next Steps

Once alerts fire, they can be handed to OpenClaw for automated handling, for example:

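Gitea down alert → OpenClaw restarts the container automatically
Memory anomaly alert → run a diagnostic script and notify via Feishu
Log ERROR alert → collect context and open an issue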
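The mapping above amounts to a playbook keyed by `alertname`. The Python sketch below only decides *which* action to take for each firing alert in an Alertmanager webhook payload; how an action is then executed through OpenClaw is not described here, so the action names (`restart_container`, `run_diagnostics`, `open_issue`), `PLAYBOOK` and `plan_actions` are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str         # hypothetical action identifier for the automation layer
    description: str


# Hypothetical alertname -> action playbook mirroring the list above.
PLAYBOOK = {
    "GiteaDown": Action("restart_container", "restart the gitea container"),
    "GiteaHighMemory": Action("run_diagnostics", "run a diagnostic script, notify Feishu"),
    "GiteaError": Action("open_issue", "collect log context and open an issue"),
}


def plan_actions(payload: dict) -> list[tuple[str, str]]:
    """Return (alertname, action name) for each firing alert that has a playbook entry."""
    planned = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts need no remediation
        name = alert.get("labels", {}).get("alertname")
        action = PLAYBOOK.get(name)
        if action is not None:
            planned.append((name, action.name))
    return planned
```

Skipping resolved alerts keeps a recovery notification from triggering a second restart, and alerts without an entry (such as `GiteaHighLatency`) stay with the humans.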

Alerts from metrics and logs then no longer depend on manual response; OpenClaw orchestrates them into automated operations workflows.

Summary

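Gitea's native Prometheus support makes metrics collection essentially free, and with Alloy + Loki, log collection needs only one extra relabel rule. Metrics, logs, and alerts together cover Gitea's observability needs, all built on the existing monitoring infrastructure with very little new configuration.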
