HPE Volume Driver for Kubernetes FlexVolume Plugin 3.1.0 Release Notes

HPE Volume Driver for Kubernetes FlexVolume Plugin 3.1.0

Version: 3.1.0
Revision: Friday Nov 1, 2019 22:05:17

The Release Notes describe the major changes, fixes, and known issues for this release of the HPE Volume Driver for Kubernetes FlexVolume Plugin 3.1.0. They do not include all individual fixes and internal changes.

For technical support, contact HPE Nimble Storage Support at support@nimblestorage.com or 877-3-NIMBLE (877-364-6253), option 2.

Special Notes

Critical: HPE Nimble Storage continues to qualify configurations between releases. The Validated Configuration Matrix provides information about validated configurations and is updated frequently. It is a good practice to check your system configuration against this online tool, which is available on HPE InfoSight: https://infosight.hpe.com/resources/nimble/validated-configuration-matrix
Important: HPE Volume Driver for Kubernetes FlexVolume Plugin 3.1.0 supports multiple versions of NimbleOS, starting with NimbleOS 5.0.8.x and 5.1.3.x. Details are available on the HPE Nimble Storage GitHub at https://github.com/hpe-storage/flexvolume-driver/blob/master/README.md.

New Features

  • HPE Volume Driver for Kubernetes FlexVolume Plugin 3.1.0 adds support for HPE Cloud Volumes.

Documentation

The documentation for the HPE Volume Driver for Kubernetes FlexVolume Plugin, covering both HPE Nimble Storage and HPE Cloud Volumes, is available on the HPE Nimble Storage GitHub at https://github.com/hpe-storage/flexvolume-driver/blob/master/README.md.

The documentation on GitHub provides details for setting up and using the product. It contains the most current information about the HPE Volume Driver for Kubernetes FlexVolume Plugin.

Additional HPE Nimble Storage user documentation is available on HPE InfoSight:

https://infosight.hpe.com/resources/nimble/docs

You can manually reach the documentation page by logging onto HPE InfoSight and selecting Resources > Nimble Documentation.

Third-Party Software Notices

All third-party software notices can be found in the Documentation Portal on HPE InfoSight.

Here are the steps to manually access the third-party software notices:

  1. Log in to HPE InfoSight (https://infosight.hpe.com/resources/nimble/docs).
  2. From the menu, select Resources > Nimble Documentation.
  3. In the navigation pane on the left of the Documentation Portal, scroll through the Document Type sections and select Support Policy.
  4. In the page that appears, select General Terms and Conditions. This document opens in a browser.

Known Critical Issues

The following are the known critical issues in HPE Volume Driver for Kubernetes FlexVolume Plugin 3.1.0.

NLT-1353 (nlt.docker.vol): destroyOnRm option is not valid for volume import
Description: If destroyOnRm is set in the overrides section of the configuration file (volume-driver.json), volume import requests always fail with an error. The destroyOnRm option is not valid for volume import, even when it is set for that specific request.
Workaround: Remove the destroyOnRm flag from the overrides section of volume-driver.json. A quick check is sketched below.
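A minimal sketch for confirming whether the flag is present before an import; the location of volume-driver.json depends on how the plugin was deployed:

```sh
# Look for destroyOnRm in the plugin configuration (path is deployment-specific)
grep -n destroyOnRm volume-driver.json
# If the flag appears under the "overrides" section, remove that line before importing volumes.
```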

Resolved Critical Issues

There are no critical issues that were resolved in HPE Volume Driver for Kubernetes FlexVolume Plugin 3.1.0.

Resolved Issues

There are no non-critical issues that were resolved in HPE Volume Driver for Kubernetes FlexVolume Plugin 3.1.0.

Known Issues

The following are the known issues in HPE Volume Driver for Kubernetes FlexVolume Plugin 3.1.0.

CON-538 (flexvolume): Duplicate bind mount paths on host
Description: After the FlexVolume driver completes the host path mount, kubelet bind mounts that path into the pod namespace. Duplicate bind mount entries (/var/lib/kubelet/pods...) are often seen in /proc/mounts or in mount output.
Workaround: No action is necessary, because no functionality is affected. All bind mounts are cleaned up automatically during the unmount operation. A quick way to view the duplicate entries is shown below.
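A minimal check, if you only want to confirm that the duplicate entries are the harmless kubelet bind mounts:

```sh
# List bind-mount entries created by kubelet for pod volumes
grep /var/lib/kubelet/pods /proc/mounts
```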
CON-527 (flexvolume): Pod is stuck in the creation state for a long time
Description: The FlexVolume driver checks /dev/disk/by-path symlinks to verify which iSCSI targets are logged in. If pods have been moved back and forth between nodes, stale entries for an iSCSI target can point to a non-existent device. The FlexVolume driver then incorrectly assumes that the iSCSI target is already logged in for a new volume and skips the login, so the mount does not succeed and the pod is stuck in the creation state. This is seen very rarely, and only when pods are moved between nodes.
Workaround: Either reboot the node, or manually clean up the stale symlinks under /dev/disk/by-path for the iSCSI target reporting the error. A sketch for locating broken symlinks follows.
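A hedged sketch for locating dangling symlinks under /dev/disk/by-path; review the output and remove only the entries that belong to the affected iSCSI target:

```sh
# With -L, find follows symlinks; anything still reported as type "l" is a broken link
find -L /dev/disk/by-path -maxdepth 1 -type l
```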
CON-529 (flexvolume): Node stuck after reboot
Description: After you reboot a node, it might take a long time to come up. This appears to be a kubelet issue, so there is no workaround at this time. If this issue occurs, the pods fail over to another node without issues.
CON-456 (flexvolume): iSCSI connectivity failures with multiple interfaces
Description: When multiple interfaces are configured on each node for iSCSI traffic and the discovery IP is not reachable from all of the interfaces, the FlexVolume driver takes a long time to complete mount requests.
Workaround: No action is necessary. The mount succeeds using at least one good connection, but path redundancy is lost across the unreachable interfaces. The sketch below can help confirm which interfaces can reach the discovery IP.
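A minimal sketch, assuming hypothetical interface names and a hypothetical discovery IP; substitute the values from your iSCSI configuration:

```sh
DISCOVERY_IP=192.0.2.10            # hypothetical array discovery IP
for ifc in eth1 eth2; do           # hypothetical iSCSI interfaces on the node
    ping -c 2 -I "$ifc" "$DISCOVERY_IP"
done
```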
NLT-1630 (nlt.docker.vol): XFS mount error
Description: In the event of a failover or failback, there may be scenarios in which the mount fails with errors such as the following:
1. rc=32 err=mount: /dev/mapper/mpathfhw: can't read superblock
2. rc=888 mount timed out
Workaround: Repair the filesystem on the volume (a consolidated sketch follows this list):
1. Stop the nimbled service on the host (systemctl stop nimbled).
2. On the array, add the ACL to the volume.
3. Scan the device from the host and find its mpath name: echo "- - 99" > /sys/class/scsi_host/host7/scan and echo "- - 99" > /sys/class/scsi_host/host8/scan
4. Create a temporary directory to mount the volume: mkdir -p /tmp/kk
5. Mount the volume: mount /dev/mapper/mpathfhw /tmp/kk
6. Repair the filesystem on the volume and then verify the data is recovered: xfs_repair /dev/mapper/mpathfhw
7. Unmount the volume: umount /tmp/kk
8. On the array, remove the ACL from the volume.
9. Start the nimbled service (systemctl start nimbled).
10. Once nimbled is up and running, the volume should be mounted from the host and the pod should be able to come up.
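A hedged consolidation of the host-side commands above, using the example device name mpathfhw from the error message; note that xfs_repair requires the filesystem to be unmounted, so in this sketch the repair runs before the verification mount:

```sh
systemctl stop nimbled                            # stop the nimbled service
# (on the array, add the ACL to the volume)
echo "- - 99" > /sys/class/scsi_host/host7/scan   # rescan and find the mpath name
echo "- - 99" > /sys/class/scsi_host/host8/scan
xfs_repair /dev/mapper/mpathfhw                   # repair while unmounted
mkdir -p /tmp/kk
mount /dev/mapper/mpathfhw /tmp/kk                # mount and verify the data
umount /tmp/kk
# (on the array, remove the ACL from the volume)
systemctl start nimbled
```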
NLT-1623 (nlt.docker.vol): Ubuntu FC hosts mount failures
Description: With the latest multipath packages in Ubuntu 16.04 and 18.04, lsscsi and multipath are not in sync and multipath does not contain updated paths, which may lead to mount failures. For example:
# lsscsi | grep :15
[7:0:4:15] disk Nimble Server 1.0 /dev/sdc
# cat /sys/block/sdc/device/state
running
# sg_vpd -p di /dev/sdc | grep vendor
vendor specific: fc62cac62d184df46c9ce900a5007a7f
# multipath -ll | grep fc62cac62d184df46c9ce900a5007a7f
# multipathd show paths | grep fc62cac62d184df46c9ce900a5007a7f
HPE Nimble Storage will be working with Ubuntu to raise this issue with multipath.
Workaround (see the sketch below):
1. Run multipathd reconfigure and multipath.
2. Verify that all of the paths reported by lsscsi above show up in multipathd show paths. If not, contact HPE Nimble Support.
After running multipathd reconfigure:
# multipathd show paths | grep :15
7:0:4:15 sdc 8:32 1 active ghost running XXX....... 6/20
# multipathd show paths format "%w %d %t %i %o %T %c %r %R %m" | grep fc62cac62d184df46c9ce900a5007a7f
2fc62cac62d184df46c9ce900a5007a7f sdc active 7:0:4:15 running ghost tur 0x56c9ce90c75b4704 0x10000090fa8eeff8 mpathog
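A minimal sketch of the re-sync and verification, assuming the example serial shown above; substitute the serial of the affected volume:

```sh
multipathd reconfigure                      # re-read config and rebuild maps
multipath                                   # refresh multipath devices
multipathd show paths | grep fc62cac62d184df46c9ce900a5007a7f
```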
NLT-1611 (nlt.docker.vol): Unable to re-create a pod when its associated PVC is in the Terminating state
Description: Known issue; see https://bugzilla.redhat.com/show_bug.cgi?id=1570606.
Workaround: None.
NLT-1548 (nlt.docker.vol): PVC creations stuck in the Pending state on Ubuntu 18.04 nodes
Description: When creating a large number of PVCs (hundreds), HPE Nimble Storage has observed that the multipathd daemon on the host times out on certain commands such as "show paths" and "show maps". Because of this, the volume plugin cannot fetch mounted device/path information, which causes PVC creations to be stuck in the Pending state. The multipathd and systemd-udevd services also appear to hang, causing failures for any further volume attached to the node. There is currently no fix; HPE Nimble Storage is working with Ubuntu support to get this issue fixed in multipath-tools updates.
Workaround: This was seen with a large number of PVC creates (hundreds). To mitigate the issue, the storage administrator should create PVCs in an incremental fashion (20-50 at a time) on Ubuntu 18.04, for example as sketched below. Once the fix is available in multipath-tools, this limitation can be ignored.
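A hedged sketch of incremental PVC creation; the pvc-*.yaml manifests, batch size, and pause are hypothetical values chosen for illustration:

```sh
# Apply PVC manifests 25 at a time, pausing so multipathd can settle between batches
ls pvc-*.yaml | xargs -n 25 | while read -r batch; do
    kubectl apply $(printf ' -f %s' $batch)
    sleep 60
done
```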
NLT-1592 (nlt.docker.vol): Stale SCSI paths appearing on the host after a PVC create completes
Description: In an FC environment, SCSI paths for the volume appear on the host even after a PVC create is complete and the multipath device has been cleaned up. These paths can be seen using utilities such as lsscsi, or in /proc/scsi/scsi. You can usually ignore these paths.
Workaround: To clean up these stale paths, first make sure that a volume with the same serial number is not currently attached to the host (a consolidated sketch follows this list):
1. Get the serial number of the path, for example:
# cat /sys/class/scsi_device/8\:0\:3\:0/device/vpd_pg80
?e362fc505e0b9ed56c9ce90029989377
2. Verify that multipath is not using any map with this serial, using the multipath -ll output.
3. Fetch the LUN ID for the path from lsscsi or from the /proc/scsi/scsi file.
4. From the Nimble group, check whether the volume matching the LUN ID and serial number is attached to the host in the initiator group.
5. If that volume does not have an ACL on the array corresponding to the host, clean up the device as follows (for example, for device 8:0:3:0):
echo offline > /sys/class/scsi_device/8\:0\:3\:0/device/state
echo 1 > /sys/class/scsi_device/8\:0\:3\:0/device/delete
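A hedged consolidation of the cleanup for the example path 8:0:3:0; run the last two commands only after confirming that the serial is not in use by multipath and that the volume has no ACL for this host on the array:

```sh
cat /sys/class/scsi_device/8:0:3:0/device/vpd_pg80      # read the serial number
multipath -ll | grep e362fc505e0b9ed56c9ce90029989377    # should return nothing
echo offline > /sys/class/scsi_device/8:0:3:0/device/state
echo 1 > /sys/class/scsi_device/8:0:3:0/device/delete
```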
NLT-1593 (nlt.docker.vol): After pod eviction, rescheduling the pod to the same node within 150 seconds causes filesystem metadata inconsistencies
Description: If a pod is evicted or rescheduled to another node, sometimes the volume is not gracefully removed from the current node, because Kubernetes cannot guarantee the unmount of the volume before another node takes it over. This can leave I/Os pending on the first node. If the volume is rescheduled back to the original node within 150 seconds (the all-paths-down timeout in multipathd), those pending I/Os can be flushed again, causing overwrites. As a result, the volume mounts can fail.
Workaround: Avoid rescheduling or failing back pods between nodes in quick succession. Filesystem repair tools, such as xfs_repair, must be used to replay the log in order to mount the volume. HPE Nimble Storage also recommends setting the node to panic after 150 seconds of hung_task_timeout (sketched below); this ensures that failed paths and mounts are properly cleaned up before new pods can be scheduled. This is documented in the Linux Integration Guide, which is available from HPE InfoSight.
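A hedged sketch of the hung-task panic recommendation using the standard Linux sysctls; confirm the exact values against the Linux Integration Guide:

```sh
cat <<'EOF' > /etc/sysctl.d/90-hung-task.conf
kernel.hung_task_timeout_secs = 150
kernel.hung_task_panic = 1
EOF
sysctl --system     # reload sysctl settings
```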
NLT-1594 (nlt.docker.vol): Unable to mount volumes due to multipathd errors such as "failed in domap"
Description: At times, multipathd is unable to add SCSI paths for the volume into the multipath tables. After the udev event is generated, output such as the following can appear:
# multipathd show paths
hcil      dev  dev_t  pri dm_st chk_st dev_st  next_check
635:0:0:0 sdad 65:208 50  undef undef  unknown orphan
636:0:0:0 sdae 65:224 50  undef undef  unknown orphan
Snapshot from /var/log/syslog or /var/log/messages:
multipathd[1998]: mpathvw: failed in domap for addition of new path sdad
multipathd[1998]: uevent trigger error
Workaround: Re-trigger the udev add events and reconfigure multipathd by running the following commands (see the sketch below):
udevadm trigger --subsystem-match=block --action=add
multipathd reconfigure
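The same two commands, followed by a quick check against the orphaned-path symptom shown above:

```sh
udevadm trigger --subsystem-match=block --action=add   # replay block-device add events
multipathd reconfigure                                  # rebuild multipath tables
multipathd show paths | grep orphan                     # should now return nothing
```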
NLT-1511 (nlt.docker.vol): Failed paths may exist after disruptive failover of pods
Description: When a node fails or is shut down, all pods on that node begin failing over to a new node. If the shut-down node is brought back up too soon (before all of the pods have failed over to the new node), devices on the old node can be rediscovered and will eventually fail once the pods have moved to the new node. Even though failed devices may exist, failback or a new device attach on the same LUN will succeed, because the system handles any remapped LUNs on the host.
NLT-1459 (nlt.docker.vol): allowOverrides not working as desired with destroyOnDetach
Description: The allowOverrides parameter dictates the behavior of the HPE Nimble Kube Storage Controller. If destroyOnDetach is overridden, only the first detach destroys the volume. The next creation is a "create on attach" by the HPE Nimble FlexVolume driver, as dictated by the PV created by Kubernetes. The allowOverrides parameter is populated on the PV, but the intent from the PVC (true/false) cannot be read by the FlexVolume driver, because it has no access to the Kubernetes API to inspect the bound PVC. This is a limitation of the Kubernetes FlexVolume plugin framework and will be addressed with the Container Storage Interface (CSI) driver in a future release.
NLT-1354 (nlt.docker.vol): Container on upstream becomes inaccessible after takeover and reverseRepl
Description: After the downstream replica volume is imported with takeover and reverseRepl enabled, the replication direction is reversed, which makes the original upstream volume a replica and takes it offline. This can cause any active container on the original upstream volume to become inaccessible.
Workaround: Make sure there are no active containers or I/O running on the upstream volume when a takeover with reverseRepl is initiated on the downstream.
NLT-1350 (nlt.docker.vol): Importing a Docker volume with takeover breaks original replication synchronization
Description: When a volume is imported on the downstream with the takeover option, both upstream and downstream become joint owners of the volume collection. If replication is enabled on this volume, this causes a synchronization error.
Workaround: To avoid a synchronization error, only one group should be the owner. The original volume on the upstream must be demoted so that only the downstream is the owner.
CON-272 (con.docker.vol): HPE Dynamic Provisioner is unable to communicate with the FlexVolume driver
Description: The HPE dynamic provisioner (doryd) needs to communicate with the FlexVolume driver to provision volumes. In OpenShift Container Platform, doryd might be running on a master node where the FlexVolume DaemonSet pod might not be scheduled, due to the default nodeSelector noted in https://docs.openshift.com/container-platform/3.11/dev_guide/daemonsets.html. This causes failures during volume provisioning.
Workaround: Add a nodeSelector for compute nodes to the Dynamic Provisioner (doryd) spec in the YAML file, so that it runs on the same nodes as the FlexVolume plugin (see the sketch below).
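A hedged example of constraining doryd to compute nodes; the deployment name, namespace, and node label are assumptions, so use the names from your own doryd YAML and your OCP node labels:

```sh
# Patch the (hypothetically named) doryd deployment to schedule only on compute nodes
kubectl -n kube-system patch deployment hpe-dynamic-provisioner \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/compute":"true"}}}}}'
```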
CON-346 (con.docker.vol): Mount operation fails with multipath error "error receiving packet" when a large number of volumes are connected to the host
Description: This is due to a known issue: https://access.redhat.com/solutions/3350821.
Workaround: On Ubuntu 16.04/18.04, add the parameter uxsock_timeout 10000 to the defaults section of the multipath.conf file (see the sketch below).
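A minimal sketch of applying the setting; the target defaults stanza is shown as a comment, and multipathd is reloaded afterwards:

```sh
# /etc/multipath.conf should contain, inside its defaults section:
#   defaults {
#       uxsock_timeout 10000
#   }
grep -n uxsock_timeout /etc/multipath.conf   # confirm the setting is present
multipathd reconfigure                       # re-read the configuration
```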
CON-377 (con.docker.vol): HPE FlexVolume driver may not be dynamically initialized by kubelet
Description: At times the node service (kubelet) does not dynamically initialize the HPE FlexVolume driver after the bits are deployed.
Workaround: Any access to the FlexVolume driver triggers kubelet's dynamic scanning of the volume-plugin-dir. For example, simply touching /usr/libexec/kubernetes/kubelet-plugins/volume/exec/hpe.com~nimble/nimble should initialize it, as shown below.
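The documented path, wrapped in a one-line command:

```sh
# Accessing the plugin binary triggers kubelet's dynamic plugin scan
touch /usr/libexec/kubernetes/kubelet-plugins/volume/exec/hpe.com~nimble/nimble
```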