61 Semaphore Monitoring and Operations

61.1 Logging and Auditing

61.1.1 Semaphore Logs

Semaphore generates several types of logs for different purposes. Understanding the log structure is essential for effective monitoring and troubleshooting.

61.1.1.1 Standard Logs (System Logs)

The system log covers several categories, matching the files in the directory layout below: the main application log, the HTTP access log, the error log, and a log of database operations.

61.1.1.2 Job Logs (Task Logs)

Each job (task) execution is logged individually. The per-job log files are grouped by date under the jobs/ directory (see the layout below) and are also available through the web UI.

61.1.2 Accessing Logs

61.1.2.1 Filesystem-Based Logs

Standard log directories:

/var/log/semaphore/
├── semaphore.log           # Main application log
├── access.log              # HTTP access log
├── error.log               # Error log
├── database.log            # Database operations
└── jobs/
    ├── 2024-08-18/
    │   ├── job-1234.log    # Per-job logs
    │   └── job-1235.log
    └── 2024-08-17/
        └── job-1230.log

Log configuration in config.json:

{
  "log_level": "info",
  "log_format": "json",
  "log_file": "/var/log/semaphore/semaphore.log",
  "access_log": "/var/log/semaphore/access.log",
  "error_log": "/var/log/semaphore/error.log",
  "log_rotation": {
    "max_size": "100MB",
    "max_files": 10,
    "max_age": "30d"
  }
}

61.1.2.2 Web UI Log Access

Job log view: every task's output is shown on its detail page in the web UI; for running jobs the output is streamed live.

61.1.3 Log Level Configuration

61.1.3.1 Available Log Levels

Log level hierarchy:

TRACE < DEBUG < INFO < WARN < ERROR < FATAL

Level descriptions: TRACE captures the most fine-grained diagnostic detail, DEBUG adds development-oriented information, INFO records normal operational events, WARN flags unexpected but recoverable conditions, ERROR records failed operations, and FATAL records errors that terminate the process.

61.1.3.2 Adjusting the Log Level

Dynamic log level adjustment:

{
  "log_level": "debug",
  "component_log_levels": {
    "database": "info",
    "worker": "debug",
    "api": "warn",
    "authentication": "info"
  }
}

Environment variables for log levels:

export SEMAPHORE_LOG_LEVEL=debug
export SEMAPHORE_DB_LOG_LEVEL=info
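
If Semaphore runs under systemd, the variables can be persisted with a drop-in unit override. A minimal sketch; the unit name semaphore.service is an assumption:

# Persist log level environment variables via a systemd drop-in
# (unit name "semaphore.service" is an assumption)
sudo mkdir -p /etc/systemd/system/semaphore.service.d
sudo tee /etc/systemd/system/semaphore.service.d/loglevel.conf > /dev/null <<'EOF'
[Service]
Environment=SEMAPHORE_LOG_LEVEL=debug
Environment=SEMAPHORE_DB_LOG_LEVEL=info
EOF
sudo systemctl daemon-reload
sudo systemctl restart semaphore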

61.1.4 Auditing and Traceability

61.1.4.1 Audit Trail Components

Monitored activities:

- User logins and logouts
- Configuration changes
- Job starts and status transitions
- Credential access
- System configuration changes

61.1.4.2 Audit Log Format

Structured audit log:

{
  "timestamp": "2024-08-18T14:30:00Z",
  "event_type": "job_started",
  "user_id": 1,
  "user_name": "deploy-user",
  "resource_type": "template",
  "resource_id": 5,
  "resource_name": "Production Deployment",
  "project_id": 1,
  "changes": {
    "template_id": 5,
    "inventory_id": 2,
    "environment": {
      "app_version": "v2.1.0"
    }
  },
  "source_ip": "192.168.1.100",
  "user_agent": "Mozilla/5.0...",
  "session_id": "sess_abc123def456"
}
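
Because audit events are structured, they can be queried with standard tools. A minimal sketch, assuming the events are written as JSON lines to /var/log/semaphore/audit.log (the path is an assumption, not a documented default):

AUDIT_LOG=/var/log/semaphore/audit.log

# All job starts triggered by a specific user
jq -c 'select(.event_type == "job_started" and .user_name == "deploy-user")' "$AUDIT_LOG"

# Event counts per type, most frequent first
jq -r '.event_type' "$AUDIT_LOG" | sort | uniq -c | sort -rn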

61.1.4.3 Audit Retention Policy

Retention policies:

{
  "audit_retention": {
    "login_events": "90d",
    "config_changes": "365d",
    "job_executions": "180d",
    "credential_access": "365d",
    "system_events": "30d"
  },
  "audit_archive": {
    "enabled": true,
    "location": "/backup/audit-archive",
    "compression": "gzip"
  }
}
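
On the filesystem level, archiving can be implemented with standard tools. A minimal sketch targeting the archive location named above; the rotated file name pattern audit-*.log is an assumption:

#!/bin/bash
# Compress rotated audit logs older than 30 days and move them
# into the configured archive location (file pattern is an assumption)
ARCHIVE_DIR=/backup/audit-archive
mkdir -p "$ARCHIVE_DIR"
find /var/log/semaphore -name "audit-*.log" -mtime +30 -print0 |
    while IFS= read -r -d '' f; do
        gzip -9 "$f" && mv "$f.gz" "$ARCHIVE_DIR/"
    done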

61.1.5 Structured Logging

61.1.5.1 JSON Log Format

Structured log output:

{
  "level": "info",
  "timestamp": "2024-08-18T14:30:00.123Z",
  "logger": "semaphore.worker",
  "message": "Job started successfully",
  "job_id": 1234,
  "template_id": 5,
  "user_id": 1,
  "duration_ms": 1250,
  "status": "running"
}
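
Structured output pays off when filtering; for example, slow worker events can be pulled out of a JSON-lines log with jq (field names follow the example above):

# Extract worker events that took longer than one second
jq -c 'select(.logger == "semaphore.worker" and (.duration_ms // 0) > 1000)' \
    /var/log/semaphore/semaphore.log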

61.1.5.2 Log Aggregation Integration

Syslog configuration:

{
  "syslog": {
    "enabled": true,
    "protocol": "tcp",
    "host": "syslog.company.com",
    "port": 514,
    "facility": "local0",
    "tag": "semaphore"
  }
}
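
Whether the syslog target is reachable can be checked independently of Semaphore, e.g. with logger from util-linux:

# Send a test message to the syslog target configured above
logger --server syslog.company.com --port 514 --tcp \
    --tag semaphore "syslog connectivity test from $(hostname)"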

ELK stack integration:

{
  "elasticsearch": {
    "enabled": true,
    "hosts": ["elasticsearch.company.com:9200"],
    "index": "semaphore-logs",
    "username": "semaphore",
    "password": "[credential_reference]"
  }
}
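
Connectivity and index ingestion can be verified from the Semaphore host; a minimal check against the configured index (replace the placeholder password with the referenced credential):

# Confirm the index exists and count the documents it holds
curl -s -u semaphore:PASSWORD \
    "http://elasticsearch.company.com:9200/semaphore-logs/_count?pretty"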

61.2 Database Maintenance

61.2.1 The Role of the Database

The database is Semaphore's central persistence layer and stores all information critical to operation.

61.2.1.1 Database Schema Overview

Key tables:

Semaphore Database Schema:
├── users                   # User data and authentication
├── projects                # Project configurations
├── project_users           # Project permissions
├── repositories            # Git repository information
├── inventories             # Inventory definitions
├── templates               # Template configurations
├── tasks                   # Job executions and status
├── task_outputs            # Job logs and output
├── access_keys             # SSH keys and credentials
└── events                  # Audit logs and events
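
A quick way to confirm the schema and see where the volume sits on a live instance (connection parameters as used elsewhere in this chapter):

# List Semaphore's tables with their total sizes, largest first
psql -h localhost -U semaphore -d semaphore -c "
SELECT tablename,
       pg_size_pretty(pg_total_relation_size('public.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size('public.'||tablename) DESC;"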

61.2.1.2 Data Volume Characteristics

Growth rates of typical tables: tasks, task_outputs, and events grow continuously with job activity; task_outputs usually dominates the overall database size, which is why the cleanup procedures below concentrate on it.

61.2.2 Maintenance Tasks

61.2.2.1 PostgreSQL Maintenance

Vacuum operations:

-- Review automatic vacuum activity
SELECT schemaname, tablename, last_vacuum, last_autovacuum 
FROM pg_stat_user_tables 
WHERE schemaname = 'public';

-- Manual vacuum for large tables
VACUUM ANALYZE task_outputs;
VACUUM ANALYZE tasks;
VACUUM ANALYZE events;

-- Full vacuum for heavy fragmentation (requires downtime)
VACUUM FULL task_outputs;

Index maintenance:

-- Analyze index usage
SELECT schemaname, tablename, indexname, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_tup_read DESC;

-- Identify unused indexes
SELECT schemaname, tablename, indexname, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_tup_read = 0 AND schemaname = 'public';

-- Rebuild fragmented indexes
REINDEX INDEX idx_tasks_created;
REINDEX INDEX idx_task_outputs_task_id;

61.2.2.2 MySQL Maintenance

Table optimization:

-- Analyze table status
SELECT table_name, data_length, index_length, data_free
FROM information_schema.tables
WHERE table_schema = 'semaphore';

-- Optimize tables
OPTIMIZE TABLE tasks;
OPTIMIZE TABLE task_outputs;
OPTIMIZE TABLE events;

-- Analyze indexes
SHOW INDEX FROM tasks;
ANALYZE TABLE tasks;

61.2.3 Data Cleanup

61.2.3.1 Automated Cleanup Jobs

Cleanup script for old job data:

-- PostgreSQL: cleanup procedure
CREATE OR REPLACE FUNCTION cleanup_old_tasks(retention_days INTEGER DEFAULT 90)
RETURNS INTEGER AS $$
DECLARE
    deleted_count INTEGER;
BEGIN
    -- Delete task_outputs older than retention_days
    DELETE FROM task_outputs
    WHERE task_id IN (
        SELECT id FROM tasks
        WHERE created < NOW() - INTERVAL '1 day' * retention_days
    );

    GET DIAGNOSTICS deleted_count = ROW_COUNT;

    -- Delete old tasks
    DELETE FROM tasks
    WHERE created < NOW() - INTERVAL '1 day' * retention_days;

    RETURN deleted_count;
END;
$$ LANGUAGE plpgsql;

-- Run the cleanup (delete data older than 90 days)
SELECT cleanup_old_tasks(90);

-- VACUUM cannot run inside a function, so vacuum separately afterwards
VACUUM ANALYZE task_outputs;
VACUUM ANALYZE tasks;
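
To run the cleanup regularly, the function can be called from cron; a minimal sketch (paths and schedule are examples):

# /etc/cron.d/semaphore-db-cleanup (sketch)
# Run the cleanup function every Sunday at 03:30
30 3 * * 0 postgres psql -d semaphore -c "SELECT cleanup_old_tasks(90);" >> /var/log/semaphore-db-cleanup.log 2>&1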

MySQL cleanup procedure:

DELIMITER //

CREATE PROCEDURE CleanupOldTasks(IN retention_days INT)
BEGIN
    DECLARE deleted_count INT DEFAULT 0;
    
    -- Cleanup task_outputs
    DELETE FROM task_outputs 
    WHERE task_id IN (
        SELECT id FROM tasks 
        WHERE created < DATE_SUB(NOW(), INTERVAL retention_days DAY)
    );
    
    SET deleted_count = ROW_COUNT();
    
    -- Cleanup tasks
    DELETE FROM tasks 
    WHERE created < DATE_SUB(NOW(), INTERVAL retention_days DAY);
    
    -- Optimize tables
    OPTIMIZE TABLE task_outputs;
    OPTIMIZE TABLE tasks;
    
    SELECT deleted_count as 'Deleted Records';
END //

DELIMITER ;

-- Execute
CALL CleanupOldTasks(90);

61.2.3.2 Selective Data Cleanup

Cleanup by job status:

-- Delete only failed jobs older than 30 days
DELETE FROM task_outputs 
WHERE task_id IN (
    SELECT id FROM tasks 
    WHERE status = 'error' 
    AND created < NOW() - INTERVAL '30 days'
);

-- Successful jobs older than 180 days
DELETE FROM task_outputs 
WHERE task_id IN (
    SELECT id FROM tasks 
    WHERE status = 'success' 
    AND created < NOW() - INTERVAL '180 days'
);
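
Before deleting selectively, it is worth counting what would be affected; a small dry-run sketch using the same criteria:

# Dry run: count failed tasks older than 30 days before deleting them
psql -h localhost -U semaphore -d semaphore -c "
SELECT COUNT(*) FROM tasks
WHERE status = 'error' AND created < NOW() - INTERVAL '30 days';"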

61.2.4 Database Performance Monitoring

61.2.4.1 Performance Metrics

PostgreSQL performance queries:

-- Identify the slowest queries (requires the pg_stat_statements extension;
-- on PostgreSQL < 13 the columns are named mean_time/total_time)
SELECT query, mean_exec_time, calls, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 10;

-- Table I/O statistics
SELECT schemaname, tablename, seq_tup_read, idx_tup_fetch, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY seq_tup_read DESC;

-- Monitor lock contention
SELECT locktype, database, relation::regclass, mode, granted
FROM pg_locks
WHERE NOT granted;

61.2.4.2 Index Optimization

Identifying missing indexes:

-- PostgreSQL: sequential scans on large tables
SELECT schemaname, tablename, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_tup_read > 10000
ORDER BY seq_tup_read DESC;

-- Recommended indexes for Semaphore
CREATE INDEX CONCURRENTLY idx_tasks_status_created ON tasks(status, created);
CREATE INDEX CONCURRENTLY idx_tasks_template_id_created ON tasks(template_id, created);
CREATE INDEX CONCURRENTLY idx_task_outputs_task_id ON task_outputs(task_id);
CREATE INDEX CONCURRENTLY idx_events_created ON events(created);
CREATE INDEX CONCURRENTLY idx_events_user_id_created ON events(user_id, created);

61.3 Backup and Restore Concepts

61.3.1 Database Backup Strategies

61.3.1.1 PostgreSQL Backup

pg_dump for full backups:

#!/bin/bash
# semaphore-backup.sh

# Configuration
DB_NAME="semaphore"
DB_USER="semaphore"
DB_HOST="localhost"
BACKUP_DIR="/backup/semaphore"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30

# Create the backup directory
mkdir -p $BACKUP_DIR

# Full backup with pg_dump
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME \
    --verbose \
    --format=custom \
    --compress=9 \
    --file=$BACKUP_DIR/semaphore_full_$DATE.dump

# Schema-only Backup
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME \
    --schema-only \
    --file=$BACKUP_DIR/semaphore_schema_$DATE.sql

# Data-only backup excluding the large tables
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME \
    --data-only \
    --exclude-table=task_outputs \
    --exclude-table=events \
    --file=$BACKUP_DIR/semaphore_data_$DATE.sql

# Verify backup integrity
pg_restore --list $BACKUP_DIR/semaphore_full_$DATE.dump > /dev/null
if [ $? -eq 0 ]; then
    echo "Backup integrity check: PASSED"
else
    echo "Backup integrity check: FAILED"
    exit 1
fi

# Prune old backups
find $BACKUP_DIR -name "semaphore_*.dump" -mtime +$RETENTION_DAYS -delete
find $BACKUP_DIR -name "semaphore_*.sql" -mtime +$RETENTION_DAYS -delete

echo "Backup completed: $BACKUP_DIR/semaphore_full_$DATE.dump"

61.3.1.2 MySQL Backup

mysqldump for full backups:

#!/bin/bash
# semaphore-mysql-backup.sh

# Configuration
DB_NAME="semaphore"
DB_USER="semaphore"
DB_PASS="password"
DB_HOST="localhost"
BACKUP_DIR="/backup/semaphore"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p $BACKUP_DIR

# Full backup (--add-drop-database only takes effect together with --databases)
mysqldump -h $DB_HOST -u $DB_USER -p$DB_PASS \
    --single-transaction \
    --routines \
    --triggers \
    --add-drop-database \
    --compress \
    --databases $DB_NAME > $BACKUP_DIR/semaphore_full_$DATE.sql

# Compression
gzip $BACKUP_DIR/semaphore_full_$DATE.sql

# Schema-only Backup
mysqldump -h $DB_HOST -u $DB_USER -p$DB_PASS \
    --no-data \
    --routines \
    --triggers \
    $DB_NAME > $BACKUP_DIR/semaphore_schema_$DATE.sql

echo "Backup completed: $BACKUP_DIR/semaphore_full_$DATE.sql.gz"

61.3.2 Incremental Backup Strategies

61.3.2.1 PostgreSQL WAL Archiving

Configuring continuous archiving:

# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /backup/wal_archive/%f && cp %p /backup/wal_archive/%f'
max_wal_senders = 3
wal_keep_segments = 32    # PostgreSQL <= 12; from 13 on use wal_keep_size (e.g. 512MB)

# Base backup with WAL archiving
pg_basebackup -h localhost -U semaphore \
    --pgdata=/backup/semaphore/base_backup_$(date +%Y%m%d) \
    --format=tar \
    --gzip \
    --progress \
    --verbose
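
Whether archiving actually succeeds can be read from the statistics collector:

# Check that WAL segments are archived without failures
psql -U postgres -c "
SELECT archived_count, failed_count, last_archived_wal, last_archived_time
FROM pg_stat_archiver;"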

61.3.3 Restore Procedures

61.3.3.1 PostgreSQL Restore

Full restore:

#!/bin/bash
# semaphore-restore.sh

BACKUP_FILE="/backup/semaphore/semaphore_full_20240818_143000.dump"
DB_NAME="semaphore_restored"
DB_USER="semaphore"

# Create a new database
createdb -U postgres $DB_NAME

# Load the backup
pg_restore -h localhost -U $DB_USER \
    --dbname=$DB_NAME \
    --verbose \
    --clean \
    --if-exists \
    $BACKUP_FILE

# Connectivity test
psql -h localhost -U $DB_USER -d $DB_NAME -c "SELECT count(*) FROM users;"

echo "Restore completed for database: $DB_NAME"

61.3.3.2 Point-in-Time Recovery

PITR for PostgreSQL:

# Recovery configuration
# recovery.conf (PostgreSQL < 12) or postgresql.conf (PostgreSQL >= 12;
# on >= 12 additionally create an empty recovery.signal file in the data directory)
restore_command = 'cp /backup/wal_archive/%f %p'
recovery_target_time = '2024-08-18 14:30:00'
recovery_target_action = 'promote'

# Restore process
pg_ctl stop -D /var/lib/postgresql/data
rm -rf /var/lib/postgresql/data/*
tar -xzf /backup/semaphore/base_backup_20240818.tar.gz -C /var/lib/postgresql/data/
# Add the recovery configuration (see above)
pg_ctl start -D /var/lib/postgresql/data
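
On PostgreSQL 12 and later, the recovery parameters go into postgresql.conf and targeted recovery is requested through a signal file; a minimal sketch:

# PostgreSQL >= 12: request PITR via recovery.signal instead of recovery.conf
cat >> /var/lib/postgresql/data/postgresql.conf <<'EOF'
restore_command = 'cp /backup/wal_archive/%f %p'
recovery_target_time = '2024-08-18 14:30:00'
recovery_target_action = 'promote'
EOF
touch /var/lib/postgresql/data/recovery.signal
pg_ctl start -D /var/lib/postgresql/data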

61.3.4 Backup Automation

61.3.4.1 Cron-Based Backup Schedules

Backup crontab:

# /etc/cron.d/semaphore-backup
# Full backup daily at 02:00
0 2 * * * root /opt/scripts/semaphore-backup.sh >> /var/log/semaphore-backup.log 2>&1

# Hourly: prune old backup label files from the WAL archive
0 * * * * postgres find /backup/wal_archive -name "*.backup" -mtime +7 -delete

# Verify backup integrity weekly
0 3 * * 0 root /opt/scripts/backup-integrity-check.sh

61.3.4.2 Backup Monitoring

Backup status monitoring:

#!/bin/bash
# backup-monitor.sh

BACKUP_DIR="/backup/semaphore"
MAX_AGE_HOURS=26  # 24h + 2h Buffer

# Find the newest backup
LATEST_BACKUP=$(find $BACKUP_DIR -name "semaphore_full_*.dump" -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -d' ' -f2-)

if [ -z "$LATEST_BACKUP" ]; then
    echo "CRITICAL: No backups found in $BACKUP_DIR"
    exit 2
fi

# Check the backup age
BACKUP_AGE=$(( ($(date +%s) - $(stat -c %Y "$LATEST_BACKUP")) / 3600 ))

if [ $BACKUP_AGE -gt $MAX_AGE_HOURS ]; then
    echo "WARNING: Latest backup is $BACKUP_AGE hours old"
    exit 1
else
    echo "OK: Latest backup is $BACKUP_AGE hours old"
    exit 0
fi

61.4 High Availability and Scaling

61.4.1 Architecture for High Availability

61.4.1.1 Multi-Worker Setup

Worker scaling concept:

Semaphore HA Architecture:
├── Load Balancer (nginx/HAProxy)
│   ├── SSL-Termination
│   ├── Health Checks
│   └── Session Affinity
├── Semaphore Frontend (2+ instances)
│   ├── Web UI
│   ├── API
│   └── Job Coordination
├── Semaphore Workers (N instances)
│   ├── Job Execution
│   ├── Ansible Runs
│   └── Independent Scaling
└── Database (PostgreSQL/MySQL)
    ├── Primary-Replica Setup
    ├── Connection Pooling
    └── Backup/Recovery

61.4.2 Load Balancer Configuration

61.4.2.1 nginx as a Reverse Proxy

nginx configuration for Semaphore:

# /etc/nginx/sites-available/semaphore
upstream semaphore_backend {
    least_conn;
    server 192.168.1.10:3000 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:3000 max_fails=3 fail_timeout=30s;
    server 192.168.1.12:3000 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name semaphore.company.com;
    
    # SSL configuration
    ssl_certificate /etc/ssl/certs/semaphore.crt;
    ssl_certificate_key /etc/ssl/private/semaphore.key;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384;
    
    # Security Headers
    add_header Strict-Transport-Security "max-age=31536000" always;
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    
    # Gzip Compression
    gzip on;
    gzip_types text/plain application/json application/javascript text/css;
    
    location / {
        proxy_pass http://semaphore_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        
        # WebSocket support for live logs
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        
        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
        
        # Health Check
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503;
        proxy_next_upstream_tries 3;
    }
    
    # Health Check Endpoint
    location /health {
        access_log off;
        proxy_pass http://semaphore_backend/ping;
        proxy_connect_timeout 1s;
        proxy_send_timeout 1s;
        proxy_read_timeout 1s;
    }
}
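
After deploying the configuration, validate it, reload nginx, and exercise the health endpoint:

# Validate and activate the configuration, then probe the health endpoint
sudo nginx -t && sudo systemctl reload nginx
curl -fsS https://semaphore.company.com/health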

61.4.2.2 HAProxy as an Alternative

HAProxy configuration:

# /etc/haproxy/haproxy.cfg
global
    maxconn 4096
    log stdout local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    
defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    option httplog
    option dontlognull
    retries 3
    
frontend semaphore_frontend
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/semaphore.pem
    # the plain-HTTP bind above gives the HTTPS redirect something to act on
    redirect scheme https if !{ ssl_fc }
    default_backend semaphore_servers
    
backend semaphore_servers
    balance roundrobin
    option httpchk GET /ping
    http-check expect status 200
    server semaphore1 192.168.1.10:3000 check inter 5s
    server semaphore2 192.168.1.11:3000 check inter 5s
    server semaphore3 192.168.1.12:3000 check inter 5s
    
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 30s

61.4.3 Worker Scaling

61.4.3.1 Docker Compose Multi-Worker Setup

docker-compose.yml for worker scaling:

version: '3.8'

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_DB: semaphore
      POSTGRES_USER: semaphore
      POSTGRES_PASSWORD: semaphore_password
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./postgres-init:/docker-entrypoint-initdb.d
    networks:
      - semaphore_network
    
  semaphore-web:
    image: semaphoreui/semaphore:latest
    environment:
      SEMAPHORE_DB_DIALECT: postgres
      SEMAPHORE_DB_HOST: postgres
      SEMAPHORE_DB_PORT: 5432
      SEMAPHORE_DB_USER: semaphore
      SEMAPHORE_DB_PASS: semaphore_password
      SEMAPHORE_DB_NAME: semaphore
      SEMAPHORE_ADMIN_PASSWORD: admin_password
      SEMAPHORE_ADMIN_NAME: admin
      SEMAPHORE_ADMIN_EMAIL: admin@company.com
      SEMAPHORE_WORKER_MODE: "web"
    ports:
      - "3000:3000"
    depends_on:
      - postgres
    networks:
      - semaphore_network
    deploy:
      replicas: 2
      
  semaphore-worker:
    image: semaphoreui/semaphore:latest
    environment:
      SEMAPHORE_DB_DIALECT: postgres
      SEMAPHORE_DB_HOST: postgres
      SEMAPHORE_DB_PORT: 5432
      SEMAPHORE_DB_USER: semaphore
      SEMAPHORE_DB_PASS: semaphore_password
      SEMAPHORE_DB_NAME: semaphore
      SEMAPHORE_WORKER_MODE: "worker"
      SEMAPHORE_MAX_PARALLEL_TASKS: 3
    depends_on:
      - postgres
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - worker_data:/tmp/semaphore
    networks:
      - semaphore_network
    deploy:
      replicas: 4

volumes:
  postgres_data:
  worker_data:

networks:
  semaphore_network:
    driver: bridge
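
Outside Swarm mode, plain docker compose can apply the replica counts at start time via --scale; note that scaling semaphore-web beyond one instance would additionally require removing or ranging its fixed published port:

# Start the stack and scale the worker service explicitly
docker compose up -d --scale semaphore-worker=4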

61.4.3.2 Kubernetes Deployment

Kubernetes manifests for the HA setup:

# semaphore-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: semaphore-web
  namespace: automation
spec:
  replicas: 3
  selector:
    matchLabels:
      app: semaphore
      component: web
  template:
    metadata:
      labels:
        app: semaphore
        component: web
    spec:
      containers:
      - name: semaphore
        image: semaphoreui/semaphore:latest
        env:
        - name: SEMAPHORE_DB_DIALECT
          value: "postgres"
        - name: SEMAPHORE_DB_HOST
          value: "postgresql.automation.svc.cluster.local"
        - name: SEMAPHORE_WORKER_MODE
          value: "web"
        ports:
        - containerPort: 3000
        livenessProbe:
          httpGet:
            path: /ping
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ping
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 200m
            memory: 256Mi

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: semaphore-worker
  namespace: automation
spec:
  replicas: 5
  selector:
    matchLabels:
      app: semaphore
      component: worker
  template:
    metadata:
      labels:
        app: semaphore
        component: worker
    spec:
      containers:
      - name: semaphore-worker
        image: semaphoreui/semaphore:latest
        env:
        - name: SEMAPHORE_DB_DIALECT
          value: "postgres"
        - name: SEMAPHORE_DB_HOST
          value: "postgresql.automation.svc.cluster.local"
        - name: SEMAPHORE_WORKER_MODE
          value: "worker"
        - name: SEMAPHORE_MAX_PARALLEL_TASKS
          value: "2"
        resources:
          limits:
            cpu: 1000m
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 512Mi
        volumeMounts:
        - name: tmp-storage
          mountPath: /tmp/semaphore
      volumes:
      - name: tmp-storage
        emptyDir: {}

---
apiVersion: v1
kind: Service
metadata:
  name: semaphore-service
  namespace: automation
spec:
  selector:
    app: semaphore
    component: web
  ports:
  - port: 80
    targetPort: 3000
  type: ClusterIP

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: semaphore-worker-hpa
  namespace: automation
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: semaphore-worker
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

61.4.4 Database Clustering

61.4.4.1 PostgreSQL Primary-Replica Setup

Primary server configuration:

# postgresql.conf (Primary)
listen_addresses = '*'
wal_level = replica
max_wal_senders = 3
max_replication_slots = 3
synchronous_commit = on
synchronous_standby_names = 'replica1'

# pg_hba.conf
host replication semaphore_repl 192.168.1.0/24 md5

Replica server setup:

# Initialize the replica
pg_basebackup -h primary-server -D /var/lib/postgresql/data -U semaphore_repl -v -P

# recovery.conf (PostgreSQL < 12); on >= 12 these settings move to
# postgresql.conf, standby mode is requested via an empty standby.signal
# file in the data directory, and trigger_file is named promote_trigger_file
standby_mode = 'on'
primary_conninfo = 'host=primary-server port=5432 user=semaphore_repl'
trigger_file = '/var/lib/postgresql/trigger_failover'
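
Replication health can be verified on both sides:

# On the primary: connected replicas and their replay lag
psql -U postgres -c "SELECT client_addr, state, replay_lag FROM pg_stat_replication;"

# On the replica: confirm it is running in recovery (streaming) mode
psql -U postgres -c "SELECT pg_is_in_recovery();"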

61.4.5 Monitoring and Health Checks

61.4.5.1 Application Health Checks

Health check endpoints:

# Semaphore Health Check
curl -f http://localhost:3000/ping || exit 1

# Extended Health Check
curl -f http://localhost:3000/api/ping || exit 1

# Database Connection Check
curl -f http://localhost:3000/api/ping/db || exit 1

61.4.5.2 Monitoring Integration

Prometheus metrics scraping:

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'semaphore'
    static_configs:
      - targets: ['semaphore.company.com:3000']
    metrics_path: /metrics
    scrape_interval: 30s
    scrape_timeout: 10s
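
Before wiring up the scrape job, verify that the target actually serves metrics; depending on the Semaphore version, a /metrics endpoint may require a separate exporter (this check is a sketch, not a guarantee the endpoint exists):

# Probe the metrics endpoint assumed by the scrape config above
curl -fsS http://semaphore.company.com:3000/metrics | head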

61.5 Exercises

61.5.1 Exercise 1: Changing the Log Level and Observing the Logs

61.5.1.1 Task

  1. Set Semaphore's log level to DEBUG
  2. Run a job and observe the detailed logs
  3. Analyze the different log outputs

61.5.1.2 Procedure

Step 1: Check the current log configuration

# Show the Semaphore configuration
sudo cat /etc/semaphore/config.json | jq '.log_level'

# Inspect the current log files
sudo ls -la /var/log/semaphore/

Step 2: Set the log level to DEBUG

# Back up the configuration
sudo cp /etc/semaphore/config.json /etc/semaphore/config.json.backup

# Change the log level
sudo jq '.log_level = "debug"' /etc/semaphore/config.json > /tmp/config.json
sudo mv /tmp/config.json /etc/semaphore/config.json

# Restart the Semaphore service
sudo systemctl restart semaphore

Step 3: Observe the debug logs

# Follow the logs in real time
sudo tail -f /var/log/semaphore/semaphore.log

# In a new terminal: start a job via the web UI
# Watch the detailed debug output

Step 4: Log analysis

# Filter the debug logs by component
sudo grep "database" /var/log/semaphore/semaphore.log | tail -10
sudo grep "worker" /var/log/semaphore/semaphore.log | tail -10
sudo grep "api" /var/log/semaphore/semaphore.log | tail -10

# Reset the log level
sudo cp /etc/semaphore/config.json.backup /etc/semaphore/config.json
sudo systemctl restart semaphore

61.5.1.3 Expected Result

With the level set to DEBUG, semaphore.log carries markedly more detail for every API request, database operation, and job step; after restoring the backed-up configuration, the log volume drops back to the INFO level.

61.5.2 Exercise 2: Cleaning Up Old Job Data in the Database

61.5.2.1 Task

  1. Analyze the current database size
  2. Create a cleanup script for old job data
  3. Run the cleanup and measure the size reduction

61.5.2.2 Procedure

Step 1: Database analysis

-- Connect to the database
psql -h localhost -U semaphore -d semaphore

-- Analyze table sizes
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size,
    pg_total_relation_size(schemaname||'.'||tablename) as raw_size
FROM pg_tables 
WHERE schemaname = 'public'
ORDER BY raw_size DESC;

-- Number of jobs per status
SELECT status, COUNT(*) as count, 
       MIN(created) as oldest,
       MAX(created) as newest
FROM tasks 
GROUP BY status;

-- Jobs older than 30 days
SELECT COUNT(*) as old_jobs
FROM tasks 
WHERE created < NOW() - INTERVAL '30 days';

Step 2: Create the cleanup script

#!/bin/bash
# cleanup-script.sh

DB_NAME="semaphore"
DB_USER="semaphore"
DB_HOST="localhost"
RETENTION_DAYS=30

echo "Starting database cleanup..."
echo "Retention period: $RETENTION_DAYS days"

# Pre-cleanup statistics
echo "=== Before Cleanup ==="
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "
SELECT 
    'tasks' as table_name,
    COUNT(*) as record_count,
    pg_size_pretty(pg_total_relation_size('tasks')) as table_size
FROM tasks
UNION ALL
SELECT 
    'task_outputs' as table_name,
    COUNT(*) as record_count,
    pg_size_pretty(pg_total_relation_size('task_outputs')) as table_size
FROM task_outputs;"

# Perform the cleanup
echo "=== Performing Cleanup ==="
psql -h $DB_HOST -U $DB_USER -d $DB_NAME << EOF
-- Delete old task_outputs
DELETE FROM task_outputs 
WHERE task_id IN (
    SELECT id FROM tasks 
    WHERE created < NOW() - INTERVAL '$RETENTION_DAYS days'
);

-- Delete old tasks
DELETE FROM tasks 
WHERE created < NOW() - INTERVAL '$RETENTION_DAYS days';

-- Vacuum afterwards
VACUUM ANALYZE task_outputs;
VACUUM ANALYZE tasks;
EOF

# Post-cleanup statistics
echo "=== After Cleanup ==="
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "
SELECT 
    'tasks' as table_name,
    COUNT(*) as record_count,
    pg_size_pretty(pg_total_relation_size('tasks')) as table_size
FROM tasks
UNION ALL
SELECT 
    'task_outputs' as table_name,
    COUNT(*) as record_count,
    pg_size_pretty(pg_total_relation_size('task_outputs')) as table_size
FROM task_outputs;"

echo "Cleanup completed!"

Step 3: Run the script

chmod +x cleanup-script.sh
./cleanup-script.sh

61.5.2.3 Expected Result

The post-cleanup statistics show noticeably fewer records in tasks and task_outputs; the table sizes shrink accordingly once VACUUM has made the freed space reusable.

61.5.3 Exercise 3: Creating a Backup and Restoring to a Test Environment

61.5.3.1 Task

  1. Create a full database backup
  2. Set up a test database
  3. Load the backup into the test environment
  4. Verify data integrity

61.5.3.2 Procedure

Step 1: Create the production backup

#!/bin/bash
# production-backup.sh

DB_NAME="semaphore"
DB_USER="semaphore"
DB_HOST="localhost"
BACKUP_DIR="/backup/semaphore"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p $BACKUP_DIR

echo "Creating production backup..."

# Full Backup
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME \
    --verbose \
    --format=custom \
    --compress=9 \
    --file=$BACKUP_DIR/semaphore_production_$DATE.dump

# Schema-only Backup
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME \
    --schema-only \
    --file=$BACKUP_DIR/semaphore_schema_$DATE.sql

# Verify backup integrity
pg_restore --list $BACKUP_DIR/semaphore_production_$DATE.dump > /dev/null

if [ $? -eq 0 ]; then
    echo "✅ Backup created successfully: $BACKUP_DIR/semaphore_production_$DATE.dump"
    echo "Backup size: $(du -h $BACKUP_DIR/semaphore_production_$DATE.dump | cut -f1)"
else
    echo "❌ Backup integrity check failed"
    exit 1
fi

Step 2: Prepare the test database

# Create the test database
createdb -U postgres semaphore_test

# Create the test user (if not present)
psql -U postgres -c "CREATE USER semaphore_test WITH PASSWORD 'test_password';"
psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE semaphore_test TO semaphore_test;"

Step 3: Perform the restore

#!/bin/bash
# restore-to-test.sh

BACKUP_FILE="/backup/semaphore/semaphore_production_20240818_143000.dump"
TEST_DB="semaphore_test"
TEST_USER="semaphore_test"

echo "Restoring backup to test database..."

# Load the backup into the test DB
pg_restore -h localhost -U $TEST_USER \
    --dbname=$TEST_DB \
    --verbose \
    --clean \
    --if-exists \
    $BACKUP_FILE

if [ $? -eq 0 ]; then
    echo "✅ Restore completed successfully"
else
    echo "❌ Restore failed"
    exit 1
fi

Step 4: Verify data integrity

#!/bin/bash
# verify-restore.sh

TEST_DB="semaphore_test"
TEST_USER="semaphore_test"

echo "Verifying data integrity..."

# Check the number of tables
TABLE_COUNT=$(psql -h localhost -U $TEST_USER -d $TEST_DB -t -c "
SELECT COUNT(*) FROM information_schema.tables 
WHERE table_schema = 'public';" | tr -d ' ')

echo "Tables found: $TABLE_COUNT"

# Check user data
USER_COUNT=$(psql -h localhost -U $TEST_USER -d $TEST_DB -t -c "
SELECT COUNT(*) FROM users;" | tr -d ' ')

echo "Users found: $USER_COUNT"

# Check project data
PROJECT_COUNT=$(psql -h localhost -U $TEST_USER -d $TEST_DB -t -c "
SELECT COUNT(*) FROM projects;" | tr -d ' ')

echo "Projects found: $PROJECT_COUNT"

# Check job data
TASK_COUNT=$(psql -h localhost -U $TEST_USER -d $TEST_DB -t -c "
SELECT COUNT(*) FROM tasks;" | tr -d ' ')

echo "Tasks found: $TASK_COUNT"

# Test the connection to the test DB
echo "Testing database connection..."
psql -h localhost -U $TEST_USER -d $TEST_DB -c "SELECT version();" > /dev/null

if [ $? -eq 0 ]; then
    echo "✅ Database connectivity test passed"
else
    echo "❌ Database connectivity test failed"
    exit 1
fi

echo "Data integrity verification completed"

61.5.3.3 Expected Result

The counts for tables, users, projects, and tasks in semaphore_test match the production database at the time of the backup, and the connectivity test passes.

61.5.4 Exercise 4: Semaphore Setup with Two Workers and a Load Test

61.5.4.1 Task

  1. Configure a multi-worker setup with Docker Compose
  2. Implement a load balancer
  3. Run a load test with several concurrent jobs
  4. Analyze the load distribution

61.5.4.2 Procedure

Step 1: Create the multi-worker Docker Compose file

# docker-compose-ha.yml
version: '3.8'

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_DB: semaphore
      POSTGRES_USER: semaphore
      POSTGRES_PASSWORD: semaphore_password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - semaphore_network
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U semaphore"]
      interval: 10s
      timeout: 5s
      retries: 5

  semaphore-web:
    image: semaphoreui/semaphore:latest
    environment:
      SEMAPHORE_DB_DIALECT: postgres
      SEMAPHORE_DB_HOST: postgres
      SEMAPHORE_DB_PORT: 5432
      SEMAPHORE_DB_USER: semaphore
      SEMAPHORE_DB_PASS: semaphore_password
      SEMAPHORE_DB_NAME: semaphore
      SEMAPHORE_ADMIN_PASSWORD: admin_password
      SEMAPHORE_ADMIN_NAME: admin
      SEMAPHORE_ADMIN_EMAIL: admin@company.com
      SEMAPHORE_WEB_ROOT: "http://localhost:8080"
    ports:
      - "3001:3000"
    depends_on:
      postgres:
        condition: service_healthy
    networks:
      - semaphore_network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/ping"]
      interval: 30s
      timeout: 10s
      retries: 3

  semaphore-worker-1:
    image: semaphoreui/semaphore:latest
    environment:
      SEMAPHORE_DB_DIALECT: postgres
      SEMAPHORE_DB_HOST: postgres
      SEMAPHORE_DB_PORT: 5432
      SEMAPHORE_DB_USER: semaphore
      SEMAPHORE_DB_PASS: semaphore_password
      SEMAPHORE_DB_NAME: semaphore
      SEMAPHORE_WORKER_MODE: "worker"
      SEMAPHORE_MAX_PARALLEL_TASKS: 3
      WORKER_ID: "worker-1"
    depends_on:
      postgres:
        condition: service_healthy
    volumes:
      - worker1_data:/tmp/semaphore
    networks:
      - semaphore_network

  semaphore-worker-2:
    image: semaphoreui/semaphore:latest
    environment:
      SEMAPHORE_DB_DIALECT: postgres
      SEMAPHORE_DB_HOST: postgres
      SEMAPHORE_DB_PORT: 5432
      SEMAPHORE_DB_USER: semaphore
      SEMAPHORE_DB_PASS: semaphore_password
      SEMAPHORE_DB_NAME: semaphore
      SEMAPHORE_WORKER_MODE: "worker"
      SEMAPHORE_MAX_PARALLEL_TASKS: 3
      WORKER_ID: "worker-2"
    depends_on:
      postgres:
        condition: service_healthy
    volumes:
      - worker2_data:/tmp/semaphore
    networks:
      - semaphore_network

  nginx:
    image: nginx:alpine
    ports:
      - "8080:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - semaphore-web
    networks:
      - semaphore_network

volumes:
  postgres_data:
  worker1_data:
  worker2_data:

networks:
  semaphore_network:
    driver: bridge

Step 2: nginx configuration

# nginx.conf
events {
    worker_connections 1024;
}

http {
    upstream semaphore_backend {
        server semaphore-web:3000 max_fails=3 fail_timeout=30s;
    }

    server {
        listen 80;
        server_name localhost;

        location / {
            proxy_pass http://semaphore_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            
            # Health check
            proxy_connect_timeout 5s;
            proxy_send_timeout 5s;
            proxy_read_timeout 60s;
        }
        
        location /health {
            access_log off;
            proxy_pass http://semaphore_backend/ping;
        }
    }
}

Step 3: Start the setup

# Start the Docker Compose setup
docker-compose -f docker-compose-ha.yml up -d

# Check the services
docker-compose -f docker-compose-ha.yml ps

# Watch the worker logs
docker-compose -f docker-compose-ha.yml logs -f semaphore-worker-1 semaphore-worker-2

Step 4: Create the load test script

#!/bin/bash
# load-test.sh

SEMAPHORE_URL="http://localhost:8080"
API_TOKEN="YOUR_API_TOKEN"
PROJECT_ID="1"
TEMPLATE_ID="1"
CONCURRENT_JOBS=10

echo "Starting load test with $CONCURRENT_JOBS concurrent jobs..."

# Array for job IDs
declare -a JOB_IDS

# Start jobs concurrently
for i in $(seq 1 $CONCURRENT_JOBS); do
    echo "Starting job $i..."
    
    JOB_ID=$(curl -s -X POST \
        "$SEMAPHORE_URL/api/project/$PROJECT_ID/tasks" \
        -H "Authorization: Bearer $API_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{
            \"template_id\": $TEMPLATE_ID,
            \"environment\": {
                \"load_test_job\": \"$i\",
                \"start_time\": \"$(date)\"
            }
        }" | jq -r '.id')
    
    if [ "$JOB_ID" != "null" ]; then
        JOB_IDS+=($JOB_ID)
        echo "✅ Job $i started with ID: $JOB_ID"
    else
        echo "❌ Failed to start job $i"
    fi
    
    sleep 1
done

echo "Started ${#JOB_IDS[@]} jobs. Monitoring progress..."

# Monitor job status
while true; do
    RUNNING_JOBS=0
    COMPLETED_JOBS=0
    FAILED_JOBS=0
    
    for JOB_ID in "${JOB_IDS[@]}"; do
        STATUS=$(curl -s -H "Authorization: Bearer $API_TOKEN" \
            "$SEMAPHORE_URL/api/project/$PROJECT_ID/tasks/$JOB_ID" \
            | jq -r '.status')
        
        case $STATUS in
            "waiting"|"running")
                ((RUNNING_JOBS++))
                ;;
            "success")
                ((COMPLETED_JOBS++))
                ;;
            "error")
                ((FAILED_JOBS++))
                ;;
        esac
    done
    
    echo "Status: Running: $RUNNING_JOBS, Completed: $COMPLETED_JOBS, Failed: $FAILED_JOBS"
    
    if [ $RUNNING_JOBS -eq 0 ]; then
        echo "All jobs completed!"
        echo "Final results: Completed: $COMPLETED_JOBS, Failed: $FAILED_JOBS"
        break
    fi
    
    sleep 10
done

Step 5: Analyze the load distribution

#!/bin/bash
# analyze-load-distribution.sh

echo "=== Worker Performance Analysis ==="

# Analyze worker-1 logs
echo "Worker-1 Activity:"
docker-compose -f docker-compose-ha.yml logs semaphore-worker-1 | grep -c "Job started"

# Analyze worker-2 logs
echo "Worker-2 Activity:"
docker-compose -f docker-compose-ha.yml logs semaphore-worker-2 | grep -c "Job started"

# Analyze the database load
docker-compose -f docker-compose-ha.yml exec postgres psql -U semaphore -d semaphore -c "
SELECT 
    COUNT(*) as total_jobs,
    COUNT(CASE WHEN status = 'success' THEN 1 END) as successful_jobs,
    COUNT(CASE WHEN status = 'error' THEN 1 END) as failed_jobs,
    COUNT(CASE WHEN status = 'running' THEN 1 END) as running_jobs
FROM tasks 
WHERE created > NOW() - INTERVAL '1 hour';"

# Analyze resource usage
echo "=== Resource Usage ==="
docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"

61.5.4.3 Expected Result

All load test jobs finish (ideally with status success), and the grep counts show the started jobs spread across both workers instead of piling up on a single one.

These exercises progress from simple log configuration through database maintenance to complex HA setups and provide hands-on experience with every major aspect of operating Semaphore.