eWaterCycle/infra

Instructions for system administrators to deploy the eWaterCycle platform

This repo contains (codified) instructions for deploying the eWaterCycle platform. The target audience of these instructions is system administrators. For more information on the eWaterCycle platform (and how to deploy it) see the eWaterCycle documentation.

For instructions on how to use the machine as deployed by this repo see the User guide.

These instructions assume you have some basic knowledge of Vagrant and Ansible.

Setup of eWaterCycle platform on the SURF Research cloud

The hardware environment used by the eWaterCycle platform development team is the SURF Research Cloud. Starting a machine on the SURF Research Cloud requires a research budget with SURF; for more information see the SURF website. Once the machine is running, access to it can be shared with anyone.

The setup instructions in this repo will create an eWaterCycle application (a sort-of VM template) that, when started, will create a machine with:

  • Explorer: web visualization of available models / parameter sets combinations and a way to generate Jupyter notebooks
  • Jupyter Hub: to interactively generate forcings and perform experiments on hydrological models using the eWaterCycle Python package
  • ERA5 and ERA-Interim global climate data, which can be used to generate forcings
  • Installed models and their example parameter sets

An application on the SURF Research cloud is provisioned by running an Ansible playbook (research-cloud-plugin.yml).

In addition to the standard VM storage, read-only datasets are mounted at /mnt/data from dCache using rclone.

Previously the eWaterCycle platform consisted of multiple VMs on the SURF HPC Cloud; see the v0.1.2 release for that code.

Setup of eWaterCycle platform on a local test VM

Deploying a local test VM is mostly useful for developing the SURF Research Cloud applications. This Vagrant setup creates a virtual machine with 8 GB memory, 4 virtual cores, and 70 GB storage. This should work on any Linux or Windows machine.

To set up an Explorer/Jupyter server on your local machine with vagrant and Ansible

Create config file research-cloud-plugin.vagrant.vars with

---
dcache_ro_token: <dcache macaroon with read permission>
rclone_cache_dir: /data/volume_2
# Directory where /home should point to
alt_home_location: /data/volume_3

The token can be found in the eWaterCycle password manager.
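Before starting the VM it can help to confirm that the config file defines all three variables shown above. The following is a minimal sketch, not part of the repo: the function name check_vars_file is hypothetical, and it only checks that each key is present at the start of a line, not that the values are valid.

```shell
# Hypothetical helper: list the expected keys that are missing from a
# vars file. Only key presence is checked, not the values themselves.
check_vars_file() {
  vars_file="$1"
  missing=0
  for key in dcache_ro_token rclone_cache_dir alt_home_location; do
    if ! grep -q "^${key}:" "$vars_file"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  # Exit status 0 when all keys are present, 1 otherwise.
  return "$missing"
}
```

For example, check_vars_file research-cloud-plugin.vagrant.vars prints nothing and succeeds when the file is complete.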

vagrant --version
# Vagrant 2.2.18
vagrant plugin install vagrant-vbguest
# Installed the plugin 'vagrant-vbguest (0.30.0)'
export VAGRANT_EXPERIMENTAL="disks"
vagrant up

Visit site

# Get ip of server with
vagrant ssh -c 'ifconfig eth1'

Go to http://<ip of eth1> and log in with vagrant:vagrant.

You will get some warnings about insecure serving. This is OK for local testing and will not happen on the Research Cloud.
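The IP lookup can also be scripted so the address does not have to be read from the ifconfig output by hand. This is a rough sketch under assumptions: the function name extract_ipv4 is hypothetical, and it expects ifconfig-style output on standard input in either the "inet 1.2.3.4" or the older "inet addr:1.2.3.4" layout.

```shell
# Hypothetical helper: print the first IPv4 address found in
# ifconfig-style output read from standard input.
extract_ipv4() {
  # "inet6" lines do not match because a space must follow "inet".
  sed -n 's/^[[:space:]]*inet \(addr:\)\{0,1\}\([0-9.]*\).*/\2/p' | head -n 1
}
```

A possible use is echo "http://$(vagrant ssh -c 'ifconfig eth1' | extract_ipv4)", which prints the address to open in the browser.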

Test on Windows Subsystem for Linux 2

WSL2 users should follow steps on https://www.vagrantup.com/docs/other/wsl.

Importantly:

  • Work in a folder on the Windows file system.
  • Export VAGRANT_WSL_WINDOWS_ACCESS_USER_HOME_PATH="/mnt/c/.../infra"
  • Install the virtualbox_WSL2 Vagrant plugin
  • Approve the firewall popup

Catalog item registration

This chapter is intended for catalog item developers.

On the Research Cloud a developer can add a catalog item for other people to use. The generic steps to do this are documented here.

For the eWaterCycle component the following specialization was done:

  • Use Ansible playbook as component script type
    • Use https://github.com/eWaterCycle/infra.git as repository URL
    • Use research-cloud-plugin.yml as script path
    • Use main as tag
  • Component parameters, all with fixed source type and non-overwritable unless otherwise stated
    • Add dcache_ro_token parameter for the dCache read-only token, also known as a macaroon. The token can be found in the eWaterCycle password manager. This token has an expiration date, so it needs to be renewed periodically.
    • Add alt_home_location parameter with value /data/volume_2. This is the mount point of the storage item that holds the home directories.
    • Add rclone_cache_dir parameter with value /data/volume_3. This is the directory where rclone can store its cache.
    • Add rclone_max_gsize parameter with value 45. This is the maximum size, in GB, of the cache on the rclone_cache_dir volume.
  • Set documentation URL to https://github.com/eWaterCycle/infra
  • Do not allow every org to use this component. Data on the dcache should not be made public.
  • Select the organizations (CO) that are allowed to use the component.

For the eWaterCycle catalog item the following specialization was done:

  • Select the following components:
    1. SRC-OS
    2. SRC-CO
    3. SRC-Nginx
    4. SRC-External plugin
    5. eWatercycle
  • Set documentation URL to https://github.com/eWaterCycle/infra
  • Add SURF HPC Cloud as cloud provider
    • Set Operating Systems to Ubuntu 22.04
    • Set Sizes to all non-gpu and non-disabled sizes
  • In parameter settings step keep all values as is except
    • Set co_irods to false as we do not use irods
    • Set co_research_drive to false as we do not use research drive
  • Set boot disk size to 150 GB, because the conda environment takes up most of the default size, which triggers out-of-space warnings.
  • Set workspace access button behavior to Webinterface (https:), so clicking the ACCESS button opens the eWaterCycle experiment explorer web interface
  • Select the organizations (CO) that are allowed to use the catalog item.

To become root on a VM the user needs to be a member of the src_co_admin group on SRAM. See docs.

SURF Research cloud VM deployment

This chapter is intended for application deployers.

  1. Log into Research Cloud
  2. Create new storage item for home directories
    • To store user files
    • Use a size of 50 GB for simple experiments, or bigger when the experiment requires it.
    • As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
  3. Create new storage item for cache
    • To store cached files from dCache by rclone
    • Use a size of 50 GB
    • As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
  4. Create a new workspace
  5. Select eWaterCycle application
  6. Select collaborative organisation (CO) for example ewatercycle-nlesc
  7. Select size of VM (cpus/memory) based on use case
  8. Select home storage item.
    • The order in which the storage items are selected is important; make sure to select the home storage item before the cache storage item.
  9. Select cache storage item
  10. Wait for machine to be running
  11. Visit URL/IP
  12. When done delete machine

For a new CO make sure

  • application is allowed to be used by CO. See Sharing catalog items
  • data storage item and home dir are created for the CO

End users should be invited to the CO so they can log in.

See the User guide for what users have to do to log in or to use a GitHub repository.

Example notebooks

To get the example notebooks, end users should use the following URL (replace <workspace id> with the ID of your currently running workspace):

https://<workspace id>.workspaces.live.surfresearchcloud.nl/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FeWaterCycle%2Fewatercycle&urlpath=lab%2Ftree%2Fewatercycle%2Fdocs%2Fexamples%2FMarrmotM01.ipynb&branch=main

TODO add this link to home page of server at

This link uses nbgitpuller to sync a git repo and open a notebook in it.
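To hand out the link for a specific workspace, the URL can be assembled in the shell. This is a minimal sketch: the function name example_notebook_url is hypothetical, and the repo and urlpath values are copied, already percent-encoded, from the link above.

```shell
# Hypothetical helper: print the nbgitpuller link for a workspace.
# $1 = workspace id as it appears in the workspace host name.
example_notebook_url() {
  workspace_id="$1"
  # Query values are kept percent-encoded exactly as in the README link.
  repo='https%3A%2F%2Fgithub.com%2FeWaterCycle%2Fewatercycle'
  urlpath='lab%2Ftree%2Fewatercycle%2Fdocs%2Fexamples%2FMarrmotM01.ipynb'
  printf 'https://%s.workspaces.live.surfresearchcloud.nl/jupyter/hub/user-redirect/git-pull?repo=%s&urlpath=%s&branch=main\n' \
    "$workspace_id" "$repo" "$urlpath"
}
```

Calling example_notebook_url with the workspace id prints one line that can be pasted into an invitation mail or a browser.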

Fill shared data disk

This chapter is intended for application data preparers.

The eWatercycle system setup requires a lot of data files. For the Research cloud virtual machines we will mount a dcache bucket.

To fill the dcache bucket you can run

ansible-playbook \
  -e cds_uid=1234 -e cds_api_key=<cds api key> \
  -e dcache_rw_token=<dcache macaroon with read/write permissions> \
  shared-data-disk.yml

Running this playbook will download all data files to /mnt/data and upload them to dCache.

Sync dcache with existing folder elsewhere

The steps above fetch the data from original sources. If you want to sync some files from another location, say, Snellius, you can use rclone directly. In our experience, it works better to sync entire directories than to try and copy single files.

Create the file ~/.config/rclone/rclone.conf and add the following content:

[dcache]
type = webdav
url = https://webdav.grid.surfsara.nl:2880
vendor = other
user =
pass =
bearer_token = <dcache macaroon with read/write permissions>

You can verify your access by running a harmless rclone ls dcache:parameter-sets. The command to sync directories is rclone copy somedir dcache:parameter-sets/somedir. Beware that this will overwrite any existing files on dCache that differ from the local copies!

Note: the password manager can be used for exchanging macaroons.

Mount dcache on local machine

Create the file ~/.config/rclone/rclone.conf and add the following content:

[dcache]
type = webdav
url = https://webdav.grid.surfsara.nl:2880
vendor = other
user =
pass =
bearer_token = <dcache macaroon with read permissions>
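The rclone.conf shown above can also be written from a script, so the macaroon does not have to be pasted into an editor. This is a sketch under assumptions: the function name write_rclone_conf is hypothetical, and it overwrites any existing file at the given path, so do not point it at a config that holds other remotes.

```shell
# Hypothetical helper: write an rclone config with a single dcache remote.
# $1 = path of the config file, $2 = dCache macaroon.
write_rclone_conf() {
  conf_path="$1"
  token="$2"
  mkdir -p "$(dirname "$conf_path")"
  # Overwrites the file; the fields mirror the config shown above.
  cat > "$conf_path" <<EOF
[dcache]
type = webdav
url = https://webdav.grid.surfsara.nl:2880
vendor = other
user =
pass =
bearer_token = ${token}
EOF
  # The file holds a secret, so restrict it to the owner.
  chmod 600 "$conf_path"
}
```

For example, write_rclone_conf ~/.config/rclone/rclone.conf "$DCACHE_TOKEN", where DCACHE_TOKEN is an assumed environment variable holding the macaroon.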

Install rclone and run the following commands to mount dCache at the ~/dcache directory.

mkdir ~/dcache
rclone mount --read-only --cache-dir /tmp/rclone-cache --vfs-cache-max-size 30G --vfs-cache-mode full dcache:/ ~/dcache

In ESMValTool config files you can use ~/dcache/climate-data/obs6 for rootpath:OBS6.

Docker images

In the eWaterCycle project we make Docker images. The images are hosted on Docker Hub. A project member can create an issue here to request permission to push images to Docker Hub.

Logs

All services are running with systemd. Their logs can be viewed with journalctl. The log of the Jupyter server for each user can be followed with

journalctl -f -u jupyter-vagrant-singleuser.service

(replace vagrant with your own username)
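When checking logs for several users, the unit name can be derived from the username. This is a small sketch: the function name jupyter_unit is hypothetical, the naming pattern is taken from the example above, and it assumes the username contains no characters that systemd would escape in a unit name.

```shell
# Hypothetical helper: print the systemd unit name of a user's
# Jupyter server, following the pattern of the example above.
jupyter_unit() {
  printf 'jupyter-%s-singleuser.service\n' "$1"
}
```

It can be combined with journalctl as journalctl -f -u "$(jupyter_unit vagrant)".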