Allow for more ranks than reservoir models during fv3 runtime #2329

Open: wants to merge 35 commits into base: master

@frodre (Contributor) commented Sep 11, 2023

Running the SST reservoir with a single model per tile drastically slows down runs that need to span 6+ months before results become visible. This PR adds the ability to run with fewer reservoir models than fv3gfs ranks by implementing gather/scatter operations for the inputs and outputs of those steppers.

Refactored public API:

  • get_reservoir_steppers: now takes a CubedSphereCommunicator object, uses it to determine whether there are more ranks than models, and initializes the steppers accordingly.
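
For illustration only, the rank-versus-model check described above could look roughly like the sketch below. It is not the code in this PR: the helper names (`ranks_per_tile`, `is_tile_root`, `needs_gather_scatter`) are hypothetical, and it assumes ranks are numbered contiguously within each tile so that the first rank of each tile acts as that tile's root.

```python
N_TILES = 6  # a cubed sphere has six tiles


def ranks_per_tile(total_ranks: int) -> int:
    """Number of ranks assigned to each tile (assumes an even split)."""
    if total_ranks % N_TILES != 0:
        raise ValueError("total_ranks must be a multiple of the tile count")
    return total_ranks // N_TILES


def is_tile_root(rank: int, total_ranks: int) -> bool:
    """True if this rank is the first rank of its tile.

    Assumption: ranks are ordered tile by tile, so the tile root is the
    rank whose index within the tile is zero.
    """
    return rank % ranks_per_tile(total_ranks) == 0


def needs_gather_scatter(total_ranks: int, n_models: int) -> bool:
    """True when there are more ranks than reservoir models."""
    return total_ranks > n_models
```

With 24 ranks and 6 models, ranks 0, 4, 8, ... would hold the models and every other rank would only take part in the gather/scatter.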

Significant internal changes:

  • On the non-root ranks of each tile, we now use _GatherScatterStepper for the increment and prediction steppers; the tile roots use the original steppers with gather/scatter embedded.
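
As a rough picture of the gather/scatter pattern, here is a minimal single-process sketch that uses NumPy arrays in place of real MPI communication. The function names and the row-major block ordering are assumptions made for this example and do not come from the PR; in the actual runtime the gather and scatter would go through the communicator.

```python
import numpy as np


def scatter_tile(tile, layout):
    """Split a (ny, nx) tile into layout[0] x layout[1] blocks.

    Blocks are returned in row-major order, standing in for the
    per-rank subdomains of one tile (an assumed ordering).
    """
    rows = np.split(tile, layout[0], axis=0)
    return [block for row in rows for block in np.split(row, layout[1], axis=1)]


def gather_tile(blocks, layout):
    """Reassemble row-major blocks into the full tile (what the root sees)."""
    n_rows, n_cols = layout
    rows = [
        np.concatenate(blocks[j * n_cols:(j + 1) * n_cols], axis=1)
        for j in range(n_rows)
    ]
    return np.concatenate(rows, axis=0)


def step_with_gather_scatter(blocks, layout, model):
    """Gather inputs to the tile root, run the model once, scatter outputs."""
    full_tile = gather_tile(blocks, layout)   # root collects all subdomains
    prediction = model(full_tile)             # single model call per tile
    return scatter_tile(prediction, layout)   # each rank gets its block back
```

Here the non-root ranks only contribute and receive blocks, while the model itself runs once per tile on the root.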


Adding the latest fv3fit models that include the updates to pure
reservoirs for SST + atm modeling

commit ecf59657ee0fb7497c6a2b7dc7c6c065cc776945
Merge: 3a6b319 e016bdb
Author: Andre Perkins <frodre@gmail.com>
Date:   Sun Nov 26 23:28:16 2023 +0000

    Merge branch 'non-hybrid-sst-reservoir' into fv3runtime-latest-pure-reservoir

commit e016bdb
Merge: a0340ed 92c808d
Author: Andre Perkins <frodre@gmail.com>
Date:   Sun Nov 26 23:26:56 2023 +0000

    Merge branch 'master' into test-merge-master

commit a0340ed
Author: Andre Perkins <frodre@gmail.com>
Date:   Sun Nov 26 21:51:59 2023 +0000

    Pure reservoir sweep and initial training

commit c84d6ce
Author: Andre Perkins <frodre@gmail.com>
Date:   Tue Nov 21 06:23:10 2023 +0000

    Working training and validation

commit 634e178
Author: Andre Perkins <frodre@gmail.com>
Date:   Tue Nov 21 01:20:04 2023 +0000

    Add mask

commit 94fd068
Author: Andre Perkins <frodre@gmail.com>
Date:   Tue Nov 21 00:02:30 2023 +0000

    Add spatial scaling transformer

commit a5127ad
Author: Andre Perkins <frodre@gmail.com>
Date:   Mon Nov 20 23:51:54 2023 +0000

    Remove decode/encode_columns use

commit b75d1ba
Author: Andre Perkins <frodre@gmail.com>
Date:   Fri Nov 17 22:47:10 2023 +0000

    Training non-masked readout model

commit 16e91ab
Author: Andre Perkins <frodre@gmail.com>
Date:   Thu Nov 16 23:09:05 2023 +0000

    Add persistence fix and output metrics for notebook usage

commit 92b205f
Author: Andre Perkins <frodre@gmail.com>
Date:   Wed Nov 15 01:25:56 2023 +0000

    Add flag for application of readout mask when mask field is specified

commit 901f02c
Author: Andre Perkins <frodre@gmail.com>
Date:   Mon Nov 13 21:11:24 2023 +0000

    1x1 subdomain train

commit efe20c1
Author: Andre Perkins <frodre@gmail.com>
Date:   Mon Nov 13 21:06:04 2023 +0000

    2x2 subdomain with increased state

commit 6e0a20d
Author: Andre Perkins <frodre@gmail.com>
Date:   Mon Nov 13 20:32:10 2023 +0000

    Train 4x4 amip matched ERA5

commit 091c038
Author: Andre Perkins <frodre@gmail.com>
Date:   Sun Nov 5 15:51:59 2023 +0000

    Updated SST coastal fuzziness training

commit d1a9110
Author: Andre Perkins <frodre@gmail.com>
Date:   Thu Nov 2 21:22:21 2023 +0000

    Local training script

commit 9eef0ef
Author: Andre Perkins <frodre@gmail.com>
Date:   Thu Nov 2 18:16:55 2023 +0000

    Add ability for remote dump of reservoir model

commit 1e420aa
Author: Andre Perkins <frodre@gmail.com>
Date:   Fri Oct 27 23:32:44 2023 +0000

    Fix slow encoding

commit dd7e6f6
Author: Andre Perkins <frodre@gmail.com>
Date:   Mon Oct 23 20:53:13 2023 +0000

    Fix subtile slice arguments

commit 4e86888
Author: Andre Perkins <frodre@gmail.com>
Date:   Sun Oct 22 17:12:28 2023 +0000

    Sparsity sweep w/ halo

commit 4a634f0
Author: Andre Perkins <frodre@gmail.com>
Date:   Sun Oct 22 04:27:38 2023 +0000

    Update local test and fv3net image

commit b901b0b
Author: Andre Perkins <frodre@gmail.com>
Date:   Thu Oct 19 22:34:35 2023 +0000

    Fix validation for overlap data

commit 164c850
Author: Andre Perkins <frodre@gmail.com>
Date:   Thu Oct 19 00:24:43 2023 +0000

    Halo size 4 sweep

commit 9bb5983
Author: Andre Perkins <frodre@gmail.com>
Date:   Thu Oct 19 00:21:29 2023 +0000

    Fixed metric mask and rerun mask readout sweep

commit 8f70805
Author: Andre Perkins <frodre@gmail.com>
Date:   Thu Oct 19 00:13:41 2023 +0000

    Update for overlapped input validation

commit e00d106
Author: Andre Perkins <frodre@gmail.com>
Date:   Wed Oct 18 22:18:56 2023 +0000

    Masked readout sweep

commit 8988586
Author: Andre Perkins <frodre@gmail.com>
Date:   Wed Oct 18 04:46:16 2023 +0000

    Fix mask for pure reservoir

commit 59bed3a
Author: Andre Perkins <frodre@gmail.com>
Date:   Wed Oct 18 04:41:46 2023 +0000

    Harmonize validation

commit c5c3b24
Author: Andre Perkins <frodre@gmail.com>
Date:   Tue Oct 17 18:25:30 2023 +0000

    Updated validation plots for skill, fix tags

commit f2719d6
Author: Andre Perkins <frodre@gmail.com>
Date:   Tue Oct 17 04:45:09 2023 +0000

    Validation fixes for incrementing and plotting

commit 789f6fb
Author: Andre Perkins <frodre@gmail.com>
Date:   Sat Oct 14 05:04:23 2023 +0000

    Sweep configuration and submission

commit 4cd88dc
Author: Andre Perkins <frodre@gmail.com>
Date:   Sat Oct 14 05:03:22 2023 +0000

    Relax subdomain config typing

commit 7f13153
Author: Andre Perkins <frodre@gmail.com>
Date:   Fri Oct 13 20:02:41 2023 +0000

    Fix tests

commit c1dc4d3
Author: Andre Perkins <frodre@gmail.com>
Date:   Fri Oct 13 13:38:23 2023 +0000

    Add working validation integration

commit 9879524
Author: Andre Perkins <frodre@gmail.com>
Date:   Wed Oct 11 20:37:04 2023 +0000

    Error calculation up front

commit b366138
Author: Andre Perkins <frodre@gmail.com>
Date:   Wed Oct 11 20:35:12 2023 +0000

    Set state used to eliminate double synchronization, typing

commit d1712a1
Author: Andre Perkins <frodre@gmail.com>
Date:   Wed Oct 11 19:12:02 2023 +0000

    Add state setting method

commit 6ec65f9
Author: Andre Perkins <frodre@gmail.com>
Date:   Tue Oct 10 21:42:27 2023 +0000

    Fix abstract method for base nc loader

commit 84ee10f
Author: Andre Perkins <frodre@gmail.com>
Date:   Tue Oct 10 16:21:45 2023 +0000

    Add metric function

commit acc0640
Author: Andre Perkins <frodre@gmail.com>
Date:   Tue Oct 10 00:40:27 2023 +0000

    Add single file loader to dataset handler

commit e866855
Author: Andre Perkins <frodre@gmail.com>
Date:   Fri Sep 29 23:49:52 2023 +0000

    First attempt validation tools