Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set READONLY flag in CSI PV based on PVC accessmode #469

Merged
merged 2 commits into from
Aug 10, 2021

Conversation

humblec
Copy link
Contributor

@humblec humblec commented Aug 21, 2020

Refer: kubernetes/kubernetes#70505

The provisioner sidecar now has an argument called `controller-publish-readonly` which sets the value of CSI PV spec `readonly` field value based on the PVC access mode. If this flag is set to `true` and the PVC access mode only contains the `ROX` access mode, the controller automatically sets `PersistentVolume.spec.CSIPersistentVolumeSource.readOnly` field to `true`.

Signed-off-by: Humble Chirammal hchiramm@redhat.com

@k8s-ci-robot k8s-ci-robot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Aug 21, 2020
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 21, 2020
@humblec
Copy link
Contributor Author

humblec commented Aug 21, 2020

@msau42 this PR try to address the volume readonly issue or AI item mentioned in sig call. As mentioned in one of the sig call I still have doubt on :kubernetes/kubernetes#70505 (comment). Appreciated if you can share your thoughts on this PR and the mentioned comment.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 21, 2020
@humblec humblec force-pushed the some-fox branch 2 times, most recently from 232c23e to b653dc2 Compare August 21, 2020 14:15
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 21, 2020
@humblec
Copy link
Contributor Author

humblec commented Aug 21, 2020

/assign @msau42
/assign @jsafrane

@humblec
Copy link
Contributor Author

humblec commented Aug 21, 2020

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Aug 21, 2020
@humblec
Copy link
Contributor Author

humblec commented Aug 24, 2020

/assign @xing-yang

pvReadOnly = true
}
}

Copy link
Contributor

@gnufied gnufied Aug 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if presence of ROX accessMode means volume should be controller published as readonly. To me, these can be orthogonal concerns. cc @bswartz

Copy link
Contributor Author

@humblec humblec Aug 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true.. but regardless of how controller publish, the PV source could still set it RO .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but interpreting ROX as readonly in PV source should be done with caution. It could make the PV unmountable in case journal needs to be replayed or something else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was going to argue the same. Volumes should not be controller-published as read-only unless the PV is marked read-only. There are too many good reasons for publishing them writable to the node and then having the node re-mount them as read-only to the pod.

Journal-replays are one use case. Pending expand operations are another.

I interpret PVs with the readOnly: true flag to be PERMANENTLY read-only, which you almost never want in practice. We might want to make it easier to cause that PV flag to get set to true in some cases, but I think the decision for ControllerPublishVolume should look at that flag only and not look at access modes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the FS operations may need it RW while controller publish. But SP driver is free to do the operation or default to RW while controller publishing or node staging. Isnt it ?

I interpret PVs with the readOnly: true flag to be PERMANENTLY read-only, which you almost never want in practice.

I could think of a scenario where user provision a new volume as "RO" with dataSource set to PVC or Snapshot. Isnt it a valid use case where user want a PV to be READONLY?

@saad-ali @msau42 @jsafrane @xing-yang inviting your thoughts .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. the pod tries to come up, the mount fails because the kernel is unable to replay the journal on a readonly device

I'd like to question this part. I expect filesystems to be prepared for this case. Consider a normal device that degrades such that writes start to fail. The normal reaction of the kernel is to remount the filesystem as read-only. Being able to mount read-only after a system reboot to extract more data seems like a sensible feature to support - much better than refusing to mount completely.

Whether data and files are consistent is another question, but it's not different from protecting against a sudden power loss.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, I want to be clear that I'm not trying to block this issue. I think it's a good change, and I'd like to see it merge.

I was just trying to answer the question of what potential downsides this change could have, because it does introduce this new corner case with journaled filesystems. We shouldn't be surprised to see support cases from people with problems mounting readonly volumes that were cloned from in-use filesystems or snapshots taken of in-use filesystems, if journaling is in use.

There are some workarounds we can either recommend to users, or possibly even build into our CSI drivers to automatically overcome this class of problems. The most obvious workaround is to create a volume that's writable and use that instead. Or, create a writable volume, attach that to a pod that does nothing (to let the journal replay), then clone that volume after the pod has terminated cleanly (so the journal is empty).

The specific kernel error I see when I reproduce this problem with ext4 is:
[587843.214656] EXT4-fs (nbd2): INFO: recovery required on readonly filesystem
[587843.214657] EXT4-fs (nbd2): write access unavailable, cannot proceed (try mounting with noload)

Possibly CSI drivers could use the -o noload option with ext4 filesystems when mounting readonly to avoid the problem.

When using XFS, the kernel errors look like this:
[588571.823845] XFS (nbd1): Mounting V5 Filesystem
[588571.881098] XFS (nbd1): recovery required on read-only device.
[588571.881100] XFS (nbd1): write access unavailable, cannot proceed.
[588571.881101] XFS (nbd1): log mount/recovery failed: error -30
[588571.881123] XFS (nbd1): log mount failed

Some searching turns up the -o norecovery options for XFS which might also work.

In summary, I like this change but I'd urge that CSI driver authors consider special handling for mounting of volumes published read-only at the controller level. And if users do get stuck with unmountable volumes, be ready to suggest the workaround of using a writable volume.

Copy link
Contributor Author

@humblec humblec Apr 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bswartz @pohly thanks for the inputs.. First of all good discussion about the concerned area.

Create an writable volume, with an ext4 (or other journaling) filesystem
Pod attaches to the volume, and begins writing data
Node containing that pod crashes, loses power, or otherwise dies suddenly
Take a snapshot of the volume before it is attached to another pod
....

Let me share some thoughts here: In general, a storage vendor does seizing of in flight i/os and also put write barriers in place to make sure the data is persisted correctly while snapshot is taken. Thats one of the main requirement of consistent snapshots. Thus, in above scenario, it would be ideally achieved by the SP. From CSI layer, I assume it should not be of much concern. iow, a snapshot taken on an in used volume or snapshot taken after a crash or node/pod loss comes pretty much to the same situation and SP or any layer responsible at storage low level should take care of it. While it comes to controller publish/node stage, again, decision can be made by SP to node stage or controller publish either by respecting the pv spec or not.
Regardless, the mention about the possibilities of a 'unclean journal' situation and a workaround/solution is indeed a good point 👍 . I am not sure about the different mount options of ext4 or xfs mentioned, have to look into the details of these.
About the direction of this PR and its completion, I feel its good wait some more input from storage sig.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the best thing to do before merging this fix would be to add documentation to the CSI spec around the PUBLISH_READONLY controller capability. Drivers that assert that capability will need to be prepared to mount unmodifiable volumes on the nodes, which means understanding how to deal with unclean journals of each of the fstypes that your driver supports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we wait for others to share their thoughts, let me rebase the PR and keep it updated with latest code.

@humblec
Copy link
Contributor Author

humblec commented Sep 4, 2020

@msau42 @xing-yang @saad-ali @jsafrane ptal 👍

@xing-yang
Copy link
Contributor

@humblec In the proposal described in kubernetes/kubernetes#70505, there is also the following. I don't see that is addressed by this PR?

Add new csiControllerPublishVolumeReadOnly StorageClass parameter to CSI external-provisioner

@humblec
Copy link
Contributor Author

humblec commented Sep 4, 2020

@humblec In the proposal described in kubernetes/kubernetes#70505, there is also the following. I don't see that is addressed by this PR?

Add new csiControllerPublishVolumeReadOnly StorageClass parameter to CSI external-provisioner

@xing-yang I was in an impression that, the SC param is an optional one. Or is it mandatory that, we will expose the SC param too ?

@xing-yang
Copy link
Contributor

@humblec In the proposal described in kubernetes/kubernetes#70505, there is also the following. I don't see that is addressed by this PR?

Add new csiControllerPublishVolumeReadOnly StorageClass parameter to CSI external-provisioner

@xing-yang I was in an impression that, the SC param is an optional one. Or is it mandatory that, we will expose the SC param too ?

@humblec based on kubernetes/kubernetes#70505, "Add new csiControllerPublishVolumeReadOnly StorageClass parameter to CSI external-provisioner" should be addressed as well.

@saad-ali can you chime in as you wrote the original proposal.

@humblec
Copy link
Contributor Author

humblec commented Oct 23, 2020

@humblec In the proposal described in kubernetes/kubernetes#70505, there is also the following. I don't see that is addressed by this PR?

Add new csiControllerPublishVolumeReadOnly StorageClass parameter to CSI external-provisioner

@xing-yang I was in an impression that, the SC param is an optional one. Or is it mandatory that, we will expose the SC param too ?

@humblec based on kubernetes/kubernetes#70505, "Add new csiControllerPublishVolumeReadOnly StorageClass parameter to CSI external-provisioner" should be addressed as well.

Somehow I was in an impression that its either 1 or 2 in that proposal :), so I was trying clarify. I will wait for @saad-ali thought to double confirm.

@saad-ali can you chime in as you wrote the original proposal.

@humblec
Copy link
Contributor Author

humblec commented Jul 13, 2021

[Changes in the last refresh compared to previous version]

  • new flag has been introduced in the provisioner sidecar
  • Removed SC param setting which was the initial workflow based on the referenced issue
  • Test cases have been added for provisioner sidecar flag to be true and false against MULTI_NODE_MULTI_READER capability

PTAL.. Thanks 👍

@@ -103,17 +103,13 @@ var (
nodeDeploymentImmediateBinding = flag.Bool("node-deployment-immediate-binding", true, "Determines whether immediate binding is supported when deployed on each node.")
nodeDeploymentBaseDelay = flag.Duration("node-deployment-base-delay", 20*time.Second, "Determines how long the external-provisioner sleeps initially before trying to own a PVC with immediate binding.")
nodeDeploymentMaxDelay = flag.Duration("node-deployment-max-delay", 60*time.Second, "Determines how long the external-provisioner sleeps at most before trying to own a PVC with immediate binding.")
publishROXVol = flag.Bool("publish-rox-vol", false, "This option enables PV to be marked as readonly at nodepublish call if PVC accessmode has been set to ROX.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the analysis at kubernetes/kubernetes#70505, this impacts ControllerPublish, not NodePublish

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.. updated , thanks @msau42 . 👍

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 5, 2021
leaderelection interface has been defined which is not in use at
the moment. This commit remove the same.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 6, 2021
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Aug 6, 2021

@humblec: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
pull-kubernetes-csi-external-provisioner-distributed-on-kubernetes-1-20 0f89907 link /test pull-kubernetes-csi-external-provisioner-distributed-on-kubernetes-1-20

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@@ -103,17 +103,13 @@ var (
nodeDeploymentImmediateBinding = flag.Bool("node-deployment-immediate-binding", true, "Determines whether immediate binding is supported when deployed on each node.")
nodeDeploymentBaseDelay = flag.Duration("node-deployment-base-delay", 20*time.Second, "Determines how long the external-provisioner sleeps initially before trying to own a PVC with immediate binding.")
nodeDeploymentMaxDelay = flag.Duration("node-deployment-max-delay", 60*time.Second, "Determines how long the external-provisioner sleeps at most before trying to own a PVC with immediate binding.")
publishROXVol = flag.Bool("publish-rox-vol", false, "This option enables PV to be marked as readonly at controllerpublish call if PVC accessmode has been set to ROX.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of calling the flag something like "controller-publish-readonly".

Also in the help text, use the full csi call name "ControllerPublishVolume"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also update the README?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msau42 yeah, thats a good suggestion, considering we are on controller publish, controller-publish-readonly suite better. Let me update the flag and README.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msau42 done.. thanks

@@ -417,7 +417,7 @@ func TestCreateDriverReturnsInvalidCapacityDuringProvision(t *testing.T) {

pluginCaps, controllerCaps := provisionCapabilities()
csiProvisioner := NewCSIProvisioner(clientSet, 5*time.Second, "test-provisioner", "test",
5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "", false, true, csitrans.New(), nil, nil, nil, nil, nil, false, defaultfsType, nil)
5, csiConn.conn, nil, driverName, pluginCaps, controllerCaps, "", false, true, csitrans.New(), nil, nil, nil, nil, nil, false, defaultfsType, nil, true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should investigate this separately, but I think the number of function arguments is getting to the point that we should probably have an options struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

completely agree here, the arg has grown a lot and even the test case organization is difficult in general with current state. I will revisit this part in a followup PR. thanks.

based on newly added "controller-publish-read-only" and PVC accessmode setting
`ReadOnly` field in CSI persistent volume source has been set to `true`.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
@jsafrane
Copy link
Contributor

@humblec, please update release-note section of the first comment of this PR to reflect the latest changes.

@humblec
Copy link
Contributor Author

humblec commented Aug 10, 2021

@humblec, please update release-note section of the first comment of this PR to reflect the latest changes.

Sure @jsafrane

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Aug 10, 2021
@humblec
Copy link
Contributor Author

humblec commented Aug 10, 2021

@humblec, please update release-note section of the first comment of this PR to reflect the latest changes.

Sure @jsafrane .

Now the release note has been updated, ptal.. Thanks @jsafrane

@msau42
Copy link
Collaborator

msau42 commented Aug 10, 2021

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 10, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: humblec, msau42, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@chrishenzie
Copy link
Contributor

Adding a hold to give others an opportunity to review.

Removing now that PR is LGTM'd and approved.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 10, 2021
@k8s-ci-robot k8s-ci-robot merged commit c0d7087 into kubernetes-csi:master Aug 10, 2021
@chrishenzie
Copy link
Contributor

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 10, 2021
@msau42 msau42 mentioned this pull request Aug 11, 2021
@msau42 msau42 moved this from To do to Done in K8s 1.22 Sep 28, 2021
kbsonlong pushed a commit to kbsonlong/external-provisioner that referenced this pull request Dec 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

None yet