No LustreFS support in newer variants (k8s-1.28, ecs-2) #3459
This affects all the new variants using the 6.1 kernel. Other variants are not affected.
What options does one have if one needs continued FSx for Lustre support? Not upgrading to EKS 1.28? Because you can't downgrade, and you won't know this is an issue until you upgrade. We just ran into this in one of our clusters.
That's a good question @autarchprinceps. Assuming you are not relying on any Kubernetes 1.28 functionality, you could deploy new worker nodes using a Bottlerocket 1.27 AMI. That is a supported Kubernetes configuration that would allow you to keep using Lustre even after the EKS cluster has been upgraded to 1.28.
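To illustrate the suggestion above, here is a rough eksctl sketch of a self-managed nodegroup pinned to a Bottlerocket Kubernetes 1.27 AMI while the control plane runs 1.28. This is not from the thread; the cluster name, region, instance type, and AMI ID are all placeholders you would substitute yourself:

```yaml
# Hypothetical sketch: keep Lustre-capable workers on Bottlerocket k8s-1.27.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster        # assumed cluster name
  region: us-west-2       # assumed region
nodeGroups:
  - name: lustre-1-27
    amiFamily: Bottlerocket
    # A specific Bottlerocket aws-k8s-1.27 AMI ID for your region; the latest
    # ID is published in the SSM parameter
    # /aws/service/bottlerocket/aws-k8s-1.27/x86_64/latest/image_id
    ami: ami-0123456789abcdef0
    instanceType: m5.large
    desiredCapacity: 2
```

Note the usual version-skew caveat: this only works while 1.27 nodes remain a supported skew against your control-plane version.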
Another workaround, which we have chosen to avoid version skew, is to use Karpenter to force pods that require FSx onto AL2 (kernel 5.10) until this is resolved and we can move those services back to Bottlerocket.
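A minimal sketch of that Karpenter approach, assuming the v1beta1 API: a dedicated AL2 node class plus a tainted node pool, so only pods that explicitly tolerate the taint (your FSx-dependent services) land on the 5.10-kernel nodes. All names, tags, and labels here are assumptions, not from the thread:

```yaml
# Hypothetical sketch: an AL2 (5.10 kernel) node class for Lustre workloads.
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: al2-lustre
spec:
  amiFamily: AL2                      # AL2 currently ships a Lustre-capable kernel
  role: KarpenterNodeRole-my-cluster  # assumed IAM role name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # assumed discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: lustre
spec:
  template:
    metadata:
      labels:
        lustre-capable: "true"
    spec:
      nodeClassRef:
        name: al2-lustre
      taints:
        - key: lustre-only
          value: "true"
          effect: NoSchedule
```

Pods that need FSx would then carry a matching `nodeSelector` (`lustre-capable: "true"`) and a toleration for the `lustre-only` taint, while everything else stays on Bottlerocket.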
A (very) quick test pre-release showed that 6.1 had the LustreFS configuration and patch from upstream, and that the pieces appeared to be able to speak to a Lustre file system. If you feel confident in upstream testing, this may be sufficient. If not, let me know and I can add Lustre testing to our loop.
I'd like to see positive proof that the Lustre CSI driver works on Bottlerocket with the 6.1 kernel before we resolve this and add a release note about it. |
We are running Kubernetes 1.29 now and had to switch back to … Searching around, … Any ideas or further debugging I can help with?
I tested the 1.20 CSI drivers on version 1.19.4 of our k8s-1.28 variant. I can (probably should) go back and try this with 1.29, but the bottlerocket AMIs for 1.28 and 1.29 use the same kernel. I know "it works for me" is not helpful; it obviously does not work for @natehudson.
For my testing, I followed the getting-started example in the EKS FSx documentation. I can share the two yaml files I used, in the hope that this will help with your debugging. The `StorageClass`:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: subnet-0123456780abcdef
  securityGroupIds: sg-0123456780abcdef
  deploymentType: PERSISTENT_1
  automaticBackupRetentionDays: "1"
  dailyAutomaticBackupStartTime: "00:00"
  copyTagsToBackups: "true"
  perUnitStorageThroughput: "200"
  dataCompressionType: "NONE"
  weeklyMaintenanceStartTime: "7:09:00"
  fileSystemTypeVersion: "2.12"
mountOptions:
  - flock
```

Here the `subnetId` is any one of the private subnets (172.something) for your EKS cluster, and the `securityGroupIds` value is the cluster security group. The `PersistentVolumeClaim`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: fsx-sc
  resources:
    requests:
      storage: 1200Gi
```

You should be able to edit parameters to suit your application, of course. I would certainly suggest selecting `deploymentType` based on your own requirements; the deployment type imposes constraints on the values of other parameters.

Using this configuration (and following the step-by-step process in the documentation), I can verify that the sample container was able to mount and write to the Lustre file system. I also tested a non-CSI application running on a cluster without the CSI driver, connecting to a separately-provisioned Lustre filesystem. It was able to mount and write to the file system. For this non-CSI test, I did provision the FSx filesystem separately, and I did add the required firewall rules to both the FSx security group and the cluster security group (the first time I tried, I did not do this correctly, and I got the error message you report, much to my dismay and confusion).
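For completeness, a sketch of a pod consuming that claim, in the spirit of the sample app in the EKS FSx documentation (the image, pod name, and mount path here are my placeholders, not taken from the docs):

```yaml
# Hypothetical sketch: a pod that mounts the fsx-claim PVC and writes to it.
apiVersion: v1
kind: Pod
metadata:
  name: fsx-app
spec:
  containers:
    - name: app
      image: amazonlinux:2
      command: ["/bin/sh", "-c", "echo hello > /data/out.txt && sleep 3600"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: fsx-claim   # matches the PersistentVolumeClaim above
```

If the CSI driver and kernel support are working, `kubectl exec` into the pod should show `/data` mounted as a Lustre filesystem.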
This is available in all variants in releases starting with v1.19.3. As I mentioned earlier, it works for either native Lustre mounts or the CSI driver for FSx Lustre.
Thanks for all your help @larvacea! I tested again yesterday with the latest … In my research, I stumbled across this compatibility matrix page, https://docs.aws.amazon.com/fsx/latest/LustreGuide/lustre-client-matrix.html, when I realized that our older FSx mounts are … I tested a new storage class with …
The 6.1 kernel currently doesn't support LustreFS, but the 5.10 and 5.15 kernels do. There is ongoing work upstream to bring this support back into the 6.1 kernel; we should add it once it is ready.