weird issues with PV, PVC, EFS CSI driver, and stuck pods on EKS

I have been facing several different issues with one EKS cluster:

• the pods are stuck at FailedMount, even though the PV and PVC are in Bound state and the PVC shows as bound to the specific pods; the EFS file system on AWS is also healthy
## relevant error messages
The kubelet events show a list of unattached/unmounted volumes:

Warning  FailedMount            1m (x3 over 6m)  kubelet, .....  Unable to mount volumes for pod "mypod1_test(c038c571-00ca-11e8-b696-0ee5f3530ee0)": timeout expired waiting for volumes to attach/mount for pod "test"/"mypod1". list of unattached/unmounted volumes=[aws]


Normal   Scheduled    5m45s                default-scheduler  Successfully assigned /test3 to ....
Warning  FailedMount  83s (x2 over 3m42s)  kubelet, ...  Unable to mount volumes for pod "test3_....(fe03d56e-aa26-11ea-9d9c-067c9e734f0a)": timeout expired waiting for volumes to attach or mount for pod ""/"test3". list of unmounted volumes=[mypd]. list of unattached volumes=[mypd default-token-mznhh]
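
When a pod is stuck at FailedMount like this while the PV and PVC both report Bound, the binding itself is usually fine and the problem is on the node. A minimal checklist I'd run first (the pod name test3 and namespace test are taken from the events above; adjust to your own objects):

```sh
# confirm the PV/PVC really are Bound and reference each other
kubectl get pv
kubectl get pvc -n test

# the pod's events name the exact volume that never mounted
kubectl describe pod test3 -n test

# for CSI volumes, check that the driver's node plugin is actually
# running on the worker the pod landed on
kubectl get pods -n kube-system -o wide | grep -i csi
```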
 
• related, it shows the AWS EFS CSI driver is not available

Warning  FailedMount  9m1s (x4 over 15m)     kubelet, ....  Unable to mount volumes for pod "test3_...(f0d65986-aa27-11ea-a7b2-022dd8ed078a)": timeout expired waiting for volumes to attach or mount for pod ""/"test3". list of unmounted volumes=[mypd]. list of unattached volumes=[mypd default-token-mznhh]
Warning  FailedMount  2m35s (x6 over 2m51s)  kubelet, ...  MountVolume.MountDevice failed for volume "scratch-pv" : driver name efs.csi.aws.com not found in the list of registered CSI drivers
Warning  FailedMount  2m19s                  kubelet, ....  MountVolume.SetUp failed for volume ".." : rpc error: code = Internal desc = Could not mount "fs-43b99802:/" at "/var/lib/kubelet/pods/f0d65986-aa27-11ea-a7b2-022dd8ed078a/volumes/kubernetes.io~csi/scratch-pv/mount": mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t efs fs-43b99802:/ /var/lib/kubelet/pods/f0d65986-aa27-11ea-a7b2-022dd8ed078a/volumes/kubernetes.io~csi/scratch-pv/mount
Output: Failed to resolve "fs-43b99802....amazonaws.com" - check that your file system ID is correct.
See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail.
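
The two warnings above are really two separate failures: first the kubelet could not find efs.csi.aws.com among its registered CSI drivers (the node plugin never registered), and then, once a mount was attempted, the node could not resolve the file system's DNS name. A rough way to check both, assuming the driver was installed into kube-system with the labels from the official aws-efs-csi-driver manifests (adjust the selector if your install differs):

```sh
# is the EFS CSI driver registered with the cluster at all?
kubectl get csidriver efs.csi.aws.com

# is the node DaemonSet pod running on the affected worker?
kubectl get pods -n kube-system -l app=efs-csi-node -o wide

# from the worker node itself: can it resolve the mount-target DNS name?
# <region> is a placeholder; resolution also needs a mount target in the
# node's AZ and security groups that allow NFS (port 2049)
nslookup fs-43b99802.efs.<region>.amazonaws.com
```

If the CSIDriver object or the node pods are missing, (re)installing the driver is the first step; the "Failed to resolve" output above is also what you'd see if the mount target or the VPC DNS settings are wrong.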
• pods stuck in Terminating state: even though the svc, deployment, and rs had already been deleted, the pod stayed stuck in Terminating (a force-delete sketch follows below)
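
Before rebooting anything, a force delete is worth trying for a pod wedged in Terminating; a sketch (pod and namespace names are placeholders):

```sh
# delete the pod object without waiting for the kubelet to confirm;
# reasonably safe for stateless pods, risky for stateful workloads
kubectl delete pod test3 -n test --grace-period=0 --force

# if it still won't go away, a finalizer is usually holding it
kubectl patch pod test3 -n test -p '{"metadata":{"finalizers":null}}'
```

When even a force delete has no effect, the kubelet on that node is usually the problem, which lines up with the reboot below fixing everything at once.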

The final solution that sorted all of this out was to reboot the EKS worker node.
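
For completeness, the reboot can be done slightly more gracefully so workloads reschedule before the node goes down (the node name is a placeholder):

```sh
# move workloads off the node first; DaemonSet pods cannot be evicted
kubectl drain ip-10-0-1-23.ec2.internal --ignore-daemonsets

# reboot the EC2 instance (via SSM, SSH, or the console), wait for Ready, then
kubectl uncordon ip-10-0-1-23.ec2.internal
```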
