
Community calls

Vasileios Karakasis edited this page Jul 13, 2022 · 45 revisions

This page holds (temporarily) the agenda and minutes of the bi-weekly community conference calls.

July 12, 2022

Call skipped due to low participation.

Participants

Agenda

May 3, 2022

Participants

  • Vasileios Karakasis (CSCS)
  • Kenneth Hoste (HPC-UGent)
  • Victor Holanda (CSCS)
  • Carlos Rosales-Fernandez (AWS)
  • Theofilos Manitaras (CSCS)

Agenda

Development updates

  • ReFrame 3.11.0 released on April 13.

  • Key new features:

    • New --distribute option that allows distributing single-node jobs over a set of nodes. It can also be combined with the -J option, for example to submit jobs to fill a reservation: --distribute=all -J reservation=cool. The current valid partition is always taken into account.
    • Extended syntax for valid_systems and valid_prog_environs that allows selecting systems and environments based on features and properties.
    • New CustomBuild build backend that delegates the building of test code entirely to users. If you use it, be aware of the side effects of your build scripts!
    • Explicitly mark variables and parameters as loggable.
    • New library tests merged in.
  • Future directions:

Action Items

  • Set up a separate meeting with EESSI community on defining common systems/environment properties and features (Victor will make a Doodle and post it in the #confcalls channel).
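
To illustrate the idea behind selecting systems and environments by features and properties, as noted in the development updates above, here is a minimal Python sketch. It does not use ReFrame's API or its actual `valid_systems` syntax. The `Partition` class, the `matches` helper, and the sample feature and property names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical stand-in for a system partition description."""
    name: str
    features: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


def matches(part, required_features=(), required_properties=None):
    """Return True if `part` has every required feature and property value."""
    required_properties = required_properties or {}
    has_features = set(required_features) <= part.features
    has_properties = all(
        part.properties.get(key) == value
        for key, value in required_properties.items()
    )
    return has_features and has_properties


# Made-up partitions, features and properties for illustration only
partitions = [
    Partition('login', features={'remote'}),
    Partition('gpu', features={'gpu', 'remote'},
              properties={'gpu_arch': 'a100'}),
    Partition('mc', features={'remote'}, properties={'cores': 128}),
]

# Select every partition that offers a GPU of a given architecture
selected = [p.name for p in partitions
            if matches(p, ['gpu'], {'gpu_arch': 'a100'})]
print(selected)
```

With this kind of selection, a test would state what it needs (a feature or a property value) rather than listing concrete system names.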

April 5, 2022

Participants

  • Vasileios Karakasis (CSCS)
  • Ake Sandgren (UMEA)
  • Rafael Sarmiento (CSCS)
  • Eirini Koutsaniti (CSCS)
  • Theofilos Manitaras (CSCS)
  • Carlos Rosales Fernandez (Amazon)

Agenda

Development updates

  • OSU microbenchmarks are merged as library tests.
  • Almost done with the extended syntax of valid_systems and valid_prog_environs (https://github.com/eth-cscs/reframe/pull/2479)
    • We had to reimplement how valid systems/environments are selected in order to make it work with fixtures
    • The implementation also fixes the bug with the --skip-{system|prgenv}-check options when using fixtures.
  • Still WIP: Distributing a set of tests over multiple nodes (https://github.com/eth-cscs/reframe/pull/2458)
  • v3.11.0 is planned for Wed. 13/4, since we need to have the two major features above merged.
  • April 19 call will be skipped.
  • Ake: When do you plan to split the repo and the site-specific tests?
    • We do plan to focus on it as soon as 3.11.0 is out.

March 22, 2022

Participants

  • Vasileios Karakasis (CSCS)
  • Victor Holanda (CSCS)
  • Theofilos Manitaras (CSCS)
  • Simon Branford (Univ. of Birmingham)

Agenda

February 22, 2022

Attendees

  • Vasileios Karakasis (CSCS)
  • Theofilos Manitaras (CSCS)
  • Eirini Koutsaniti (CSCS)
  • Jg Piccinali (CSCS)
  • Kenneth Hoste (HPC-UGent)
  • Åke Sandgren (Umeå Univ)
  • Rafael Sarmiento (CSCS)
  • Carlos Rosales (Amazon)
  • Richard Henwood (Arm)
  • Simon Branford (Univ. of Birmingham)

Agenda

  • We will skip 3.10.2 and target 3.11.0 for March 22; two dev releases in-between.
  • Community feedback
    • Extension of the valid_systems and valid_prog_environs syntax is still work in progress. What if we supported basic compiler abstractions as in Spack here?
      • Vasileios: There are no plans for compiler auto-detection and auto-generation of the environments configuration section.
      • Kenneth: this could quickly become a time-consuming task, since also compiler versions, etc. are relevant
      • Kenneth: this seems like an opportunity for a common Python library that could be leveraged by ReFrame, Spack, EasyBuild, ...
        • kind of similar to archspec (cf. -mtune & co. options that archspec knows about, but compiler flags for OpenMP are out of scope there)
      • Richard: Delegate the compilation task fully onto Spack and use the compiler info to generate the ReFrame config on-the-fly. Then ReFrame tests are monkey-patched to parametrise them over the various specs.
    • Use cases of running a test session continuously until a time limit is reached: https://github.com/eth-cscs/reframe/issues/619
      • could be used for burn-in testing, simulate user workload, ...
      • also related to exploring range of combinations for multi-node tests, since often not enough tests are generated to actually fill a system
  • Meeting frequency
  • AOB
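
The use case above of running a test session continuously until a time limit is reached could be sketched roughly as follows. This is only an illustration of the loop, not ReFrame's implementation. `run_until`, `run_session` and the `FakeClock` helper are hypothetical names, and the sketch assumes a session that is already running when the limit is hit is allowed to finish.

```python
import time


def run_until(run_session, time_limit_s, clock=time.monotonic):
    """Call `run_session()` repeatedly until `time_limit_s` seconds elapse.

    Returns the number of completed sessions. A session that is already
    running when the limit is reached is allowed to finish.
    """
    start = clock()
    completed = 0
    while clock() - start < time_limit_s:
        run_session()
        completed += 1
    return completed


class FakeClock:
    """Simulated clock so that the example finishes instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


fake = FakeClock()


def fake_session():
    # Pretend that every session takes 10 seconds of simulated time
    fake.now += 10.0


num_runs = run_until(fake_session, time_limit_s=35, clock=fake)
print(num_runs)
```

For burn-in testing, `run_session` would launch the whole test session, and the time limit would typically match the length of the reservation.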

February 8, 2022

Attendees

  • Vasileios Karakasis (CSCS)
  • Victor Holanda (CSCS)
  • Theofilos Manitaras (CSCS)
  • Jg Piccinali (CSCS)
  • Stefan Wolfsheimer (SURF)
  • Kenneth Hoste (HPC-UGent)
  • Åke Sandgren (Umeå Univ.)
  • Ben Fulton (Indiana Univ.)
  • Caspar van Leeuwen (SURF)
  • Rafael Sarmiento (CSCS)
  • Carlos Rosales (Amazon)

Agenda

  • Development updates
  • Community feedback on use cases
    • Do you use or plan to use ReFrame to test and deploy a software stack, e.g., using Spack/EasyBuild?
      • Feedback: This is an interesting feature for both Spack and EasyBuild for exploring different build configurations, but it's not likely to be used for deploying the software stack.
    • Towards relaxing valid_systems and valid_prog_environs: https://github.com/eth-cscs/reframe/issues/1987
      • A key challenge here is also integrating the resources that can be defined in the configuration, which are currently accessed through extra_resources inside the test.
      • There are three types of system-related attributes: features, key/value properties and scheduler resources.
    • Submit single node job automatically on every node of a reframe partition: https://github.com/eth-cscs/reframe/issues/2334
      • would be very useful to find "bad nodes" in a given reservation
      • automatically submit a separate copy of a test to each node
      • for now, nothing combinatorial (explodes quickly after 2 nodes...)
      • combinatorial selections could pick N out of M possibilities at random, or be strided through a set of 100 nodes (1-10, 11-20, etc.)
        • selection mechanism is really needed when running 16-node tests out of 100 available nodes
      • Caspar: could tests somehow indicate that they want to use flexible allocation?
        • example: gpuburn to check thermal throttling of GPUs ("hardware test")
        • tests that aim to validate working software are probably less interesting to run with flexible allocation
        • idea: --flex-alloc-singlenode=idle:testXYZ,testABC => only run these 2 specific single node tests across all nodes
      • Theo: Should the tests in such a scenario share a single stage directory so as to avoid redundant builds?
      • Åke: This case should be addressed by fixtures, where the build part of the test is a fixture and you only dynamically parametrise the run test.
  • Maintenance of scheduler backends
  • AOB
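
The node-selection ideas discussed above (one copy of a test per node, strided groups such as 1-10 and 11-20, or N nodes picked at random out of M) can be sketched in plain Python. This is only an illustration of the selection strategies, not how ReFrame implements or plans to implement them. The function names and the `nidXXX` node names are hypothetical.

```python
import random


def one_job_per_node(nodes):
    """One single-node job per node, e.g. to look for 'bad nodes'."""
    return [[node] for node in nodes]


def strided_groups(nodes, group_size):
    """Consecutive, non-overlapping groups: 1-10, 11-20, ...

    A trailing group smaller than `group_size` is dropped.
    """
    return [nodes[i:i + group_size]
            for i in range(0, len(nodes) - group_size + 1, group_size)]


def random_groups(nodes, group_size, count, seed=None):
    """Pick `count` groups of `group_size` distinct nodes at random."""
    rng = random.Random(seed)
    return [sorted(rng.sample(nodes, group_size)) for _ in range(count)]


# 100 hypothetical node names: nid001 .. nid100
nodes = [f'nid{i:03d}' for i in range(1, 101)]

print(len(one_job_per_node(nodes)))      # 100 single-node jobs
print(strided_groups(nodes, 10)[1])      # second group: nid011 .. nid020
print(len(strided_groups(nodes, 16)))    # 6 full 16-node groups
print(random_groups(nodes, 16, count=3, seed=0)[0])
```

Enumerating every possible 16-node combination out of 100 nodes is infeasible, which is why the sketch only samples a fixed number of groups.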

January 11, 2022

Agenda

  • Welcome and introductions
    • Briefly introduce yourself and say where you are using (or planning to use) ReFrame.
  • Development status
    • Team & contributions
      • Core team (@ekouts, @rsarm, @teojgo, @vkarak, @victorusu)
      • Contributions are more than welcome!
    • Development model
      • Release train model: A new release every two weeks; releases are not delayed; whatever is ready and merged gets released
      • Semantic versioning: <major>.<minor>.<patch>
        • Patch-level bumps (every two weeks): bug fixes and new features (no deprecations)
        • Minor version bumps (every 6–8 weeks): introduction of major features (deprecations are allowed, but backward compatibility is ensured)
        • Major version bumps: backward compatibility may be broken.
    • Upcoming major features scheduled for 3.10.
  • Outlook for HPC Test library
  • Discuss issues that need resolution (feature requests, bugs)
  • Discuss interesting use cases
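
As a small illustration of the `<major>.<minor>.<patch>` scheme described in the development model above, the following Python sketch computes the next version number for each kind of bump. The `bump` helper and the starting version `3.9.4` are hypothetical and only serve as an example.

```python
def bump(version, level):
    """Return `version` ('<major>.<minor>.<patch>') bumped at `level`."""
    major, minor, patch = (int(part) for part in version.split('.'))
    if level == 'major':
        # Backward compatibility may be broken
        return f'{major + 1}.0.0'
    if level == 'minor':
        # Major features; deprecations allowed
        return f'{major}.{minor + 1}.0'
    if level == 'patch':
        # Bug fixes and new features, no deprecations
        return f'{major}.{minor}.{patch + 1}'
    raise ValueError(f'unknown level: {level!r}')


print(bump('3.9.4', 'patch'))   # 3.9.5
print(bump('3.9.4', 'minor'))   # 3.10.0
print(bump('3.9.4', 'major'))   # 4.0.0
```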