[pulsar-operator] Disable/Debug startup probe failure #1146

Open
tglanz opened this issue Feb 22, 2024 · 0 comments

Hi,

I'm trying to use pulsar-operator:0.17.9 to deploy a relatively small Pulsar cluster on GKE. My configuration consists only of the manifests listed below (which are practically unchanged from the examples).

Everything sets up nicely and all of the resources are created successfully, except for the bookkeepers: they fail their startup probe checks with a "connection refused" error. The describe output and logs of one of those pods are included below as well.

The controller manager logs don't contain any relevant information.

I was wondering:

  1. Is there any way I can easily disable the liveness checks so I can enter the containers to debug? Is there any other way to debug this?
  2. Do you have any idea what the reason might be?

Any help will be appreciated.
Thanks!

apiVersion: pulsar.streamnative.io/v1alpha1
kind: PulsarBroker
metadata:
  name: brokers
spec:
  image: streamnative/sn-platform-slim:2.10.5.3
  pod:
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Gi
    securityContext:
      runAsNonRoot: true
  replicas: 3
  zkServers: zookeepers-zk:2181
---
apiVersion: bookkeeper.streamnative.io/v1alpha1
kind: BookKeeperCluster
metadata:
  name: bookkeepers
spec:
  image: streamnative/sn-platform-slim:2.10.5.3
  replicas: 3
  pod:
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
      limits:
        cpu: 500m 
        memory: 512Mi
    securityContext:
      runAsNonRoot: true
  storage:
    journal:
      numDirsPerVolume: 1
      numVolumes: 1
      volumeClaimTemplate:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 4Gi
    ledger:
      numDirsPerVolume: 1
      numVolumes: 1
      volumeClaimTemplate:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
    reclaimPolicy: Delete
  zkServers: zookeepers-zk:2181
---
apiVersion: zookeeper.streamnative.io/v1alpha1
kind: ZooKeeperCluster
metadata:
  name: zookeepers
spec:
  image: streamnative/sn-platform-slim:2.10.5.3
  pod:
    resources:
      requests:
        cpu: 50m
        memory: 256Mi
      limits:
        cpu: 100m
        memory: 512Mi
    securityContext:
      runAsNonRoot: true
  persistence:
    data:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 4Gi
    dataLog:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
    reclaimPolicy: Delete
  replicas: 3
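
One detail in the manifests above stands out: the PulsarBroker memory limit is written as `512Gi`, while the BookKeeperCluster and ZooKeeperCluster limits are `512Mi`. This may be a unit slip rather than an intended value, and it is not necessarily related to the bookie failure. The following is a minimal sketch for spotting such slips. It assumes only the `Ki`/`Mi`/`Gi`/`Ti` and `k`/`M`/`G`/`T` suffixes matter; the `parse_memory` helper and the `MAX_RATIO` threshold are hypothetical names chosen for illustration, not part of any operator or Kubernetes API.

```python
# Subset of the suffixes Kubernetes accepts for memory quantities.
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}

# Hypothetical threshold: flag limits more than 16x the request.
MAX_RATIO = 16


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' to bytes."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


# (request, limit) pairs copied from the three manifests above.
memory = {
    "brokers": ("256Mi", "512Gi"),
    "bookkeepers": ("256Mi", "512Mi"),
    "zookeepers": ("256Mi", "512Mi"),
}

suspicious = [
    name
    for name, (request, limit) in memory.items()
    if parse_memory(limit) / parse_memory(request) > MAX_RATIO
]
print(suspicious)
```

With the values above, only the broker entry exceeds the threshold, since its limit is 2048 times its request.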

describe

Name:                 pulsar-bookkeepers-bk-0
Namespace:            data
Priority:             1000
Priority Class Name:  global-default
Service Account:      default
Node:                 REDACTED/10.0.0.41
Start Time:           Thu, 22 Feb 2024 15:44:59 +0200
Labels:               cloud.streamnative.io/app=pulsar
                      cloud.streamnative.io/cluster=pulsar-bookkeepers
                      cloud.streamnative.io/component=bookie
                      controller-revision-hash=pulsar-bookkeepers-bk-864ffd9d
                      statefulset.kubernetes.io/pod-name=pulsar-bookkeepers-bk-0
Annotations:          cloud.streamnative.io/checksum-config: 1A8D975DA88DF16C
                      operator-sdk/primary-resource: data/pulsar-bookkeepers
                      operator-sdk/primary-resource-type: pod
                      prometheus.io/port: 8000
                      prometheus.io/scrape: true
Status:               Running
IP:                   10.4.3.65
IPs:
  IP:           10.4.3.65
Controlled By:  StatefulSet/pulsar-bookkeepers-bk
Containers:
  bookie:
    Container ID:  containerd://916ce6fb761b343ca26999aa10339f7674daab834702e6547e877562c08bc707
    Image:         streamnative/sn-platform-slim:2.10.5.3
    Image ID:      docker.io/streamnative/sn-platform-slim@sha256:a85536ac3684e0a00026b207ac00833be1e69cfeef29f66855a7b61efbd4c25b
    Ports:         8000/TCP, 3181/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      bash
      -c
    Args:
      bin/apply-config-from-env.py conf/bookkeeper.conf &&
                        if [ -x scripts/run-bookie.sh ];then exec scripts/run-bookie.sh;else exec bin/pulsar bookie; fi
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 22 Feb 2024 15:51:27 +0200
      Finished:     Thu, 22 Feb 2024 15:51:30 +0200
    Ready:          False
    Restart Count:  6
    Limits:
      cpu:     500m
      memory:  512Mi
    Requests:
      cpu:      200m
      memory:   256Mi
    Liveness:   http-get http://:http/api/v1/bookie/state delay=30s timeout=5s period=3s #success=1 #failure=10
    Readiness:  http-get http://:http/api/v1/bookie/is_ready delay=30s timeout=5s period=3s #success=1 #failure=10
    Startup:    http-get http://:http/api/v1/bookie/is_ready delay=0s timeout=5s period=3s #success=1 #failure=200
    Environment Variables from:
      pulsar-bookkeepers-bk-config  ConfigMap  Optional: false
    Environment:                    <none>
    Mounts:
      /pulsar/data/bookkeeper/journal-0 from journal-0 (rw)
      /pulsar/data/bookkeeper/ledgers-0 from ledgers-0 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-grtsv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  journal-0:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  journal-0-pulsar-bookkeepers-bk-0
    ReadOnly:   false
  ledgers-0:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  ledgers-0-pulsar-bookkeepers-bk-0
    ReadOnly:   false
  kube-api-access-grtsv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                     From                     Message
  ----     ------                  ----                    ----                     -------
  Normal   Scheduled               9m46s                   default-scheduler        Successfully assigned data/pulsar-bookkeepers-bk-0 to REDACTED
  Normal   SuccessfulAttachVolume  9m38s                   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-93f8a07d-1257-4b0c-ba56-5d6d5f8a7bb8"
  Normal   SuccessfulAttachVolume  9m33s                   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-d0091848-c9d9-46da-80bf-eab0b0b2c9a9"
  Normal   Pulled                  8m31s (x4 over 9m32s)   kubelet                  Container image "streamnative/sn-platform-slim:2.10.5.3" already present on machine
  Normal   Created                 8m31s (x4 over 9m32s)   kubelet                  Created container bookie
  Normal   Started                 8m31s (x4 over 9m32s)   kubelet                  Started container bookie
  Warning  Unhealthy               8m29s (x4 over 9m29s)   kubelet                  Startup probe failed: Get "http://10.4.3.65:8000/api/v1/bookie/is_ready": dial tcp 10.4.3.65:8000: connect: connection refused
  Warning  BackOff                 4m31s (x28 over 9m22s)  kubelet                  Back-off restarting failed container bookie in pod pulsar-bookkeepers-bk-0_data(b3f3b793-2503-4d4a-9b42-e103c3b004b4)
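
The numbers in the describe output above can be checked against each other. The startup probe allows 200 failures at a 3s period, while the last container instance ran from 15:51:27 to 15:51:30 and ended with exit code 1. The sketch below does this arithmetic. It assumes the kubelet only kills a container for startup probe failure after `failureThreshold` consecutive failures; the variable names are illustrative.

```python
from email.utils import parsedate_to_datetime

# Timestamps copied from "Last State: Terminated" in the describe output.
started = parsedate_to_datetime("Thu, 22 Feb 2024 15:51:27 +0200")
finished = parsedate_to_datetime("Thu, 22 Feb 2024 15:51:30 +0200")
run_seconds = (finished - started).total_seconds()

# Startup probe settings copied from the describe output:
# delay=0s period=3s #failure=200
period_seconds = 3
failure_threshold = 200
startup_budget = period_seconds * failure_threshold

# If the container ended long before the probe budget ran out, the probe
# did not kill it; the process itself exited.
exited_on_its_own = run_seconds < startup_budget

print(f"container ran for {run_seconds:.0f}s")
print(f"startup probe budget is about {startup_budget}s")
print(f"exited on its own: {exited_on_its_own}")
```

Under these assumptions, the "connection refused" events look like a symptom of the bookie process exiting within a few seconds, rather than the cause of the restarts, so relaxing the probes alone would probably not keep the container running.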

logs

[conf/bookkeeper.conf] Applying config autoRecoveryDaemonEnabled = false
[conf/bookkeeper.conf] Applying config compactionRateByBytes = 52428800
[conf/bookkeeper.conf] Applying config fileInfoFormatVersionToWrite = 1
[conf/bookkeeper.conf] Applying config gcWaitTime = 300000
[conf/bookkeeper.conf] Applying config isThrottleByBytes = true
[conf/bookkeeper.conf] Applying config journalDirectories = /pulsar/data/bookkeeper/journal-0
[conf/bookkeeper.conf] Applying config journalFormatVersionToWrite = 6
[conf/bookkeeper.conf] Applying config journalMaxBackups = 0
[conf/bookkeeper.conf] Applying config ledgerDirectories = /pulsar/data/bookkeeper/ledgers-0
[conf/bookkeeper.conf] Applying config numHighPriorityWorkerThreads = 1
[conf/bookkeeper.conf] Applying config numReadWorkerThreads = 1
[conf/bookkeeper.conf] Applying config persistBookieStatusEnabled = false
[conf/bookkeeper.conf] Applying config httpServerEnabled = true
[conf/bookkeeper.conf] Applying config httpServerPort = 8000
[conf/bookkeeper.conf] Applying config prometheusStatsHttpPort = 8000
[conf/bookkeeper.conf] Applying config useHostNameAsBookieID = true
[conf/bookkeeper.conf] Applying config zkServers = pulsar-zookeepers-zk:2181
[conf/bookkeeper.conf] Updating config autoRecoveryDaemonEnabled = false
[conf/bookkeeper.conf] Updating config compactionRateByBytes = 52428800
[conf/bookkeeper.conf] Updating config fileInfoFormatVersionToWrite = 1
[conf/bookkeeper.conf] Updating config gcWaitTime = 300000
[conf/bookkeeper.conf] Updating config isThrottleByBytes = true
[conf/bookkeeper.conf] Updating config journalDirectories = /pulsar/data/bookkeeper/journal-0
[conf/bookkeeper.conf] Updating config journalFormatVersionToWrite = 6
[conf/bookkeeper.conf] Updating config journalMaxBackups = 0
[conf/bookkeeper.conf] Updating config ledgerDirectories = /pulsar/data/bookkeeper/ledgers-0
[conf/bookkeeper.conf] Updating config numHighPriorityWorkerThreads = 1
[conf/bookkeeper.conf] Updating config numReadWorkerThreads = 1
[conf/bookkeeper.conf] Updating config persistBookieStatusEnabled = false
[conf/bookkeeper.conf] Adding config useTransactionalCompaction = true
++ PULSAR_HOME=/pulsar
++ BOOKIE_PORT=3181
++ cd /pulsar
++ '[' '' == true ']'
++ echo 'Skipping rack-awareness setup'
Skipping rack-awareness setup
++ exec /pulsar/bin/pulsar bookie
Stream closed EOF for data/pulsar-bookkeepers-bk-0 (bookie)
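
The log above stops right after `exec /pulsar/bin/pulsar bookie`, with no output from the bookie process itself. Because the container is in CrashLoopBackOff, the output of the previous (terminated) instance may contain more detail. The following is a minimal sketch that builds the corresponding `kubectl logs --previous` command for the pod shown above. The `previous_logs_cmd` and `run` helpers are hypothetical, and running the command assumes `kubectl` is installed and has access to the cluster.

```python
import shutil
import subprocess


def previous_logs_cmd(namespace: str, pod: str, container: str) -> list[str]:
    """Build the kubectl command that prints logs of the previous container instance."""
    return ["kubectl", "-n", namespace, "logs", pod, "-c", container, "--previous"]


def run(cmd: list[str]) -> str:
    """Run the command if kubectl is available and return its combined output."""
    if shutil.which("kubectl") is None:
        return "kubectl not found on PATH"
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


# Namespace, pod, and container names are taken from the describe output.
cmd = previous_logs_cmd("data", "pulsar-bookkeepers-bk-0", "bookie")
print(" ".join(cmd))
```

Calling `run(cmd)` on a machine with cluster access would return whatever the terminated container wrote before it exited.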