Releases: aws/aws-parallelcluster-cookbook

AWS ParallelCluster v3.9.2

28 May 19:20

gmarciani

We're excited to announce the release of AWS ParallelCluster Cookbook 3.9.2

This is associated with AWS ParallelCluster v3.9.2

CHANGES

Upgrade Slurm to 23.11.7 (from 23.11.4).

AWS ParallelCluster v3.9.1

11 Apr 10:42

gmarciani

We're excited to announce the release of AWS ParallelCluster Cookbook 3.9.1

This is associated with AWS ParallelCluster v3.9.1

BUG FIXES

Remove recursive deletion of shared storage mountdir when unmounting filesystems as part of update-cluster operation.

AWS ParallelCluster v3.9.0

12 Mar 01:28

himani2411

We're excited to announce the release of AWS ParallelCluster Cookbook 3.9.0

This is associated with AWS ParallelCluster v3.9.0

ENHANCEMENTS

Allow updating external shared storage of type Efs, FsxLustre, FsxOntap, FsxOpenZfs and FileCache without replacing the compute and login fleets.
Add support for RHEL9.
Add support for Rocky Linux 9 as a CustomAmi created through the build-image process. No public official ParallelCluster Rocky Linux 9 AMI is available at this time.
Add the configuration parameter DeploymentSettings/DefaultUserHome to allow users to move the default user's home directory to /local/home instead of /home (default).
- SSH connections will be closed and rejected while the user's home directory is being moved during the bootstrapping process.
Add the ability to choose between open-source and closed-source NVIDIA drivers when building an AMI, through the ['cluster']['nvidia']['kernel_open'] cookbook node attribute.
Add the configuration parameter DeploymentSettings/DisableSudoAccessForDefaultUser to disable sudo access for the default user on supported OSes.
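
As a rough illustration of the ['cluster']['nvidia']['kernel_open'] node attribute mentioned above, the Python sketch below builds the equivalent nested JSON document. The helper name is hypothetical, and the value type (the strings "true" and "false") is an assumption that should be checked against the ParallelCluster documentation before use.

```python
import json


def build_kernel_open_attributes(use_open_driver: bool) -> str:
    """Return a JSON string that sets cluster.nvidia.kernel_open (hypothetical helper)."""
    # Assumption: the attribute accepts the strings "true" / "false".
    value = "true" if use_open_driver else "false"
    attributes = {"cluster": {"nvidia": {"kernel_open": value}}}
    return json.dumps(attributes, sort_keys=True)


# Example: request the closed-source driver when building an AMI.
print(build_kernel_open_attributes(False))
```

How the resulting JSON is supplied to the image build is outside the scope of this sketch.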

CHANGES

Upgrade Slurm to 23.11.4 (from 23.02.7).
- Upgrade Pmix to 4.2.9 (from 4.2.6).
Upgrade NVIDIA driver to version 535.154.05.
Upgrade EFA installer to 1.30.0.
- Efa-driver: efa-2.6.0-1
- Efa-config: efa-config-1.15-1
- Efa-profile: efa-profile-1.6-1
- Libfabric-aws: libfabric-aws-1.19.0
- Rdma-core: rdma-core-46.0-1
- Open MPI: openmpi40-aws-4.1.6-2 and openmpi50-aws-5.0.0-11
Upgrade NICE DCV to version 2023.1-16388.
- server: 2023.1.16388-1
- xdcv: 2023.1.565-1
- gl: 2023.1.1047-1
- web_viewer: 2023.1.16388-1
Upgrade ARM PL to version 23.10.
Upgrade third-party cookbook dependencies:
- nfs-5.1.2 (from nfs-5.0.0)

BUG FIXES

Fix an issue that caused jobs to fail when submitted as an Active Directory user from login nodes.
The issue was caused by an incomplete configuration of the integration with the external Active Directory on the head node.
Fix an issue that caused login nodes to fail to bootstrap when the head node took longer than expected to write keys.

AWS ParallelCluster v3.8.0

19 Dec 17:40

enrico-usai

We're excited to announce the release of AWS ParallelCluster Cookbook 3.8.0

This is associated with AWS ParallelCluster v3.8.0

ENHANCEMENTS

Add support for EC2 Capacity Blocks for ML.
Add support for Rocky Linux 8.
Add support for Scheduling/SlurmSettings/Database/DatabaseName parameter to render StorageLoc in the slurmdbd configuration generated by ParallelCluster.
Add the option to use EFS storage instead of NFS exports from the head node root volume for intra-cluster shared file system resources: ParallelCluster, Intel, Slurm, and /home data.
Allow for mounting home as an EFS or FSx external shared storage via the SharedStorage section of the config file.

CHANGES

Upgrade Slurm to 23.02.7 (from 23.02.6).
Upgrade NVIDIA driver to version 535.129.03.
Upgrade CUDA Toolkit to version 12.2.2.
Use Open Source NVIDIA GPU drivers (OpenRM) as NVIDIA kernel module for Linux instead of NVIDIA closed source module.
Do not wait for static nodes in maintenance to signal CFN that the head node initialization is complete.
Upgrade EFA installer to 1.29.1.
- Efa-driver: efa-2.6.0-1
- Efa-config: efa-config-1.15-1
- Efa-profile: efa-profile-1.5-1
- Libfabric-aws: libfabric-aws-1.19.0-1
- Rdma-core: rdma-core-46.0-1
- Open MPI: openmpi40-aws-4.1.6-1
Upgrade GDRCopy to version 2.4 in all supported OSes, except for CentOS 7 where version 2.3.1 is used.
Upgrade aws-cfn-bootstrap to version 2.0-28.
Upgrade Python to 3.9.17.

BUG FIXES

Fix inconsistent scaling configuration after cluster update rollback when modifying the list of instance types declared in the Compute Resources.
Fix generation of users' SSH keys when switching users without root privileges in clusters integrated with an external LDAP server through cluster configuration files.
Fix disabling Slurm power save mode when setting ScaledownIdletime = -1.
Fix hard-coded path to Slurm installation dir in update_slurm_database_password.sh script for Slurm Accounting.

AWS ParallelCluster v3.7.2

13 Oct 19:37

jdeamicis

We're excited to announce the release of AWS ParallelCluster Cookbook 3.7.2

This is associated with AWS ParallelCluster v3.7.2

CHANGES

Upgrade Slurm to 23.02.6.

AWS ParallelCluster v3.7.1

22 Sep 20:15

We're excited to announce the release of AWS ParallelCluster Cookbook 3.7.1

This is associated with AWS ParallelCluster v3.7.1

CHANGES

Upgrade Slurm to 23.02.5 (from 23.02.4).
- Upgrade Pmix to 4.2.6 (from 3.2.3).
- Upgrade libjwt to 1.15.3 (from 1.12.0).
Upgrade EFA installer to 1.26.1, fixing RDMA writedata issue in P5.
- Efa-driver: efa-2.5.0-1
- Efa-config: efa-config-1.15-1
- Efa-profile: efa-profile-1.5-1
- Libfabric-aws: libfabric-aws-1.18.2-1
- Rdma-core: rdma-core-46.0-1
- Open MPI: openmpi40-aws-4.1.5-4

AWS ParallelCluster v3.7.0

30 Aug 12:11

dreambeyondorange

We're excited to announce the release of AWS ParallelCluster Cookbook 3.7.0

This is associated with AWS ParallelCluster v3.7.0

ENHANCEMENTS

Add support for Ubuntu 22. RSA keys are not supported by default. See this page.
Add support for login nodes.
Add support to mount existing Amazon File Cache as shared storage.
Allow configuration of static and dynamic node priorities in Slurm compute resources via the ParallelCluster configuration YAML file.
Add a queue-level parameter (JobExclusiveAllocation) to ensure nodes in the partition are exclusively allocated to a single job at any given time.
Allow overriding the aws-parallelcluster-node package at cluster creation and update time (only on the head node during update). Useful for development purposes only.
Allow memory-based scheduling when multiple instance types are specified for a Slurm Compute Resource.
Avoid starting the NFS server on compute nodes.
Forward SLURM_RESUME_FILE to ParallelCluster resume program.

CHANGES

Deprecate Ubuntu 18.
Upgrade Slurm to version 23.02.4.
Update the default root volume size to 40 GB to account for limits on CentOS 7.
Upgrade NVIDIA driver to version 535.54.03.
Upgrade CUDA library to version 12.2.0.
Upgrade NVIDIA Fabric manager to nvidia-fabricmanager-535.
Upgrade NICE DCV to version 2023.0-15487.
- server: 2023.0.15487-1
- xdcv: 2023.0.551-1
- gl: 2023.0.1039-1
- web_viewer: 2023.0.15487-1
Upgrade EFA installer to 1.25.1.
- Efa-driver: efa-2.5.0-1
- Efa-config: efa-config-1.15-1
- Efa-profile: efa-profile-1.5-1
- Libfabric-aws: libfabric-aws-1.18.1-1
- Rdma-core: rdma-core-46.0-1
- Open MPI: openmpi40-aws-4.1.5-4
Upgrade ARM PL to version 23.04.1 for Ubuntu 22.04 only.
Upgrade third-party cookbook dependencies:
- apt-7.5.14 (from apt-7.4.0)
- line-4.5.13 (from line-4.5.2)
- openssh-2.11.3 (from openssh-2.10.3)
- pyenv-4.2.3 (from pyenv-3.5.1)
- selinux-6.1.12 (from selinux-6.0.5)
- yum-7.4.13 (from yum-7.4.0)
- yum-epel-5.0.2 (from yum-epel-4.5.0)
Assign Slurm dynamic nodes a priority (weight) of 1000 by default. This allows Slurm to prioritize idle static nodes over idle dynamic ones.
Change the default value of Imds/ImdsSupport from v1.0 to v2.0.
Make aws-parallelcluster-node daemons handle only ParallelCluster-managed Slurm partitions.
Restrict permission on file /tmp/wait_condition_handle.txt within the head node so that only root can read it.
Create a Slurm partition-nodelist mapping JSON file to be used by the node package daemons to recognize PC-managed Slurm partitions and nodelists.
Increase EFS-utils watchdog poll interval to 10 seconds. Note: This change is meaningful only if EncryptionInTransit is set to true, because watchdog does not run otherwise.
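
The effect of the default dynamic-node weight of 1000 noted in the list above can be sketched in Python. Slurm, all else being equal, allocates the nodes with the lowest weight first, so idle static nodes with a smaller weight are picked before idle dynamic ones. The static weight of 1 and the node names are assumptions for illustration, not values taken from these release notes.

```python
from dataclasses import dataclass

DYNAMIC_NODE_WEIGHT = 1000  # default stated in the release notes
STATIC_NODE_WEIGHT = 1      # assumption for illustration only


@dataclass(frozen=True)
class Node:
    name: str
    weight: int


def allocation_order(idle_nodes):
    """Order idle nodes as a lowest-weight-first policy would pick them."""
    return sorted(idle_nodes, key=lambda node: (node.weight, node.name))


# Hypothetical node names: one dynamic ("dy") and one static ("st") node.
nodes = [
    Node("queue1-dy-cr1-1", DYNAMIC_NODE_WEIGHT),
    Node("queue1-st-cr1-1", STATIC_NODE_WEIGHT),
]
print([node.name for node in allocation_order(nodes)])
```

This models only the ordering by weight; the real scheduler also considers whether a node satisfies the job's requirements.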

BUG FIXES

Add validation to ScaledownIdletime value, to prevent setting a value lower than -1.
Fix issue causing dangling IAM policies to be created when creating ParallelCluster CloudFormation custom resource provider with CustomLambdaRole.
Fix an issue that was causing misalignment of compute node DNS names on instances with multiple network interfaces when SlurmSettings/Dns/UseEc2Hostnames is set to true.
Fix cluster creation failure with Ubuntu Deep Learning AMI on GPU instances and DCV enabled.
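
The ScaledownIdletime validation listed among the fixes above amounts to a lower-bound check. The function below is a minimal Python sketch of such a check under that reading; it is a hypothetical validator written for illustration, not the ParallelCluster implementation.

```python
def validate_scaledown_idletime(value: int) -> int:
    """Reject ScaledownIdletime values lower than -1 (hypothetical validator)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("ScaledownIdletime must be an integer")
    if value < -1:
        raise ValueError(f"ScaledownIdletime must be >= -1, got {value}")
    return value


# -1 and any larger integer pass; anything lower raises ValueError.
print(validate_scaledown_idletime(-1))
```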

AWS ParallelCluster v3.6.1

05 Jul 14:22

gmarciani

We're excited to announce the release of AWS ParallelCluster Cookbook 3.6.1

This is associated with AWS ParallelCluster v3.6.1

CHANGES

Remove the security updates step executed during cluster node bootstrap in US isolated regions, in order to reduce bootstrap time and avoid a potential point of failure.
Replace nvidia-persistenced service with parallelcluster_nvidia service to avoid conflicts with DLAMI.

BUG FIXES

Fix an issue that was preventing ptrace protection from being disabled on Ubuntu, which in turn prevented Cross Memory Attach (CMA) from being used in libfabric.

AWS ParallelCluster v3.6.0

22 May 15:51

enrico-usai

We're excited to announce the release of AWS ParallelCluster Cookbook 3.6.0

This is associated with AWS ParallelCluster v3.6.0

ENHANCEMENTS

Add support for RHEL8.
Add support for customizing the cluster Slurm configuration via the ParallelCluster configuration YAML file.
Build Slurm with support for LUA.
Add a health check manager and a GPU health check, which can be activated through cluster configuration.
Health check manager execution is triggered by a Slurm prolog script. The GPU check verifies the health of a node by running the NVIDIA DCGM L2 diagnostic.
Add log rotation support for ParallelCluster managed logs.
Track head node memory and root volume disk utilization using the mem_used_percent and disk_used_percent metrics collected through the CloudWatch Agent.
Enforce the DCV Authenticator Server to use at least TLS-1.2 protocol when creating the SSL Socket.
Load kernel module nvidia-uvm by default to provide Unified Virtual Memory (UVM) functionality to the CUDA driver.
Install NVIDIA Persistence Daemon as a system service.
Install NVIDIA Data Center GPU Manager (DCGM) package on all supported OSes except for aarch64 centos7 and alinux2.

CHANGES

Upgrade Slurm to version 23.02.2.
Upgrade munge to version 0.5.15.
Set Slurm default TreeWidth to 30.
Set Slurm prolog and epilog configurations to target a directory, /opt/slurm/etc/scripts/prolog.d/ and /opt/slurm/etc/scripts/epilog.d/ respectively.
Set Slurm BatchStartTimeout to 3 minutes to allow up to 3 minutes of Prolog execution during compute node registration.
Upgrade EFA installer to 1.22.1.
- Dkms: 2.8.3-2
- Efa-driver: efa-2.1.1g
- Efa-config: efa-config-1.13-1
- Efa-profile: efa-profile-1.5-1
- Libfabric-aws: libfabric-aws-1.17.1-1
- Rdma-core: rdma-core-43.0-1
- Open MPI: openmpi40-aws-4.1.5-1
Upgrade Lustre client version to 2.12 on Amazon Linux 2 (same version available on Ubuntu 20.04, 18.04 and CentOS >= 7.7).
Upgrade Lustre client version to 2.10.8 on CentOS 7.6.
Upgrade aws-cfn-bootstrap to version 2.0-24.
Upgrade NVIDIA driver to version 470.182.03.
Upgrade NVIDIA Fabric Manager to version 470.182.03.
Upgrade NVIDIA CUDA Toolkit to version 11.8.0.
Upgrade NVIDIA CUDA sample to version 11.8.0.
Upgrade Intel MPI Library to 2021.9.0.43482.
Upgrade NICE DCV to version 2023.0-15022.
- server: 2023.0.15022-1
- xdcv: 2023.0.547-1
- gl: 2023.0.1027-1
- web_viewer: 2023.0.15022-1

BUG FIXES

Fix an issue that was causing misalignment of compute node IP addresses on instances with multiple network interfaces.
Fix replacement of StoragePass in slurm_parallelcluster_slurmdbd.conf when a queue parameter update is performed and the Slurm accounting configurations are not updated.
Fix issue causing cfn-hup daemon to fail when it gets restarted.
Fix issue causing NVIDIA GPU compute nodes not to resume correctly after executing an scontrol reboot command.

AWS ParallelCluster v3.5.1

28 Mar 20:12

gmarciani

We're excited to announce the release of AWS ParallelCluster Cookbook 3.5.1

This is associated with AWS ParallelCluster v3.5.1

ENHANCEMENTS

Add support for US isolated region us-isob-east-1.

CHANGES

Upgrade EFA installer to 1.22.0.
- Efa-driver: efa-2.1.1g
- Efa-config: efa-config-1.13-1
- Efa-profile: efa-profile-1.5-1
- Libfabric-aws: libfabric-aws-1.17.0-1
- Rdma-core: rdma-core-43.0-1
- Open MPI: openmpi40-aws-4.1.5-1
Upgrade NICE DCV to version 2022.2-14521.
- server: 2022.2.14521-1
- xdcv: 2022.2.519-1
- gl: 2022.2.1012-1
- web_viewer: 2022.2.14521-1

BUG FIXES

Fix an issue where updating a cluster to remove shared EBS volumes could cause node launch failures if the MountDir matched the same pattern in /etc/exports.
