Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider changes to new COMPS scheduling support #2261

Open
jps1 opened this issue May 14, 2024 · 2 comments
Open

consider changes to new COMPS scheduling support #2261

jps1 opened this issue May 14, 2024 · 2 comments
Assignees
Labels
COMPS Documentation Research Tickets that are mostly planning or researching items before implementation Support

Comments

@jps1
Copy link
Collaborator

jps1 commented May 14, 2024

I ran into several issues when attempting to get a researcher's model running w/ idmtools in COMPS, mostly around trying to use the newer COMPS scheduling functionality.

  • Calling the add_work_order() utility is insufficient to add a WorkOrder.json asset correctly; you also have to add a scheduling=True argument to experiment.run() (otherwise, it simply adds the asset as file_type "input", which makes it show up and seem to the user like it has done the right thing, but it doesn't actually get used for scheduling. If possible, we shouldn't require this, but if it must be required, at least there should be a note in the method documentation warning the user that they have to also add that argument. (It's also unclear to me what setattr(_simulation, 'scheduling', True) does and how it's related to the other scheduling parameter).
  • add_work_order() and add_schedule_config() should be combined, or at least, reuse code.
  • add_schedule_config() and default_add_schedule_config_sweep_callback() should node use 'idm_cd' as default node-group (it's not a required parameter to COMPS, and if a user wants to just let COMPS do its default, there's no way to do that. Plus, not all platforms have the same node-group naming conventions, so 'idm_cd' node-group might not even exist).
  • It's unclear what the _update argument is being used for, why it's True for regular simulations and False for TemplatedSimulations. The current behavior also prevents me from updating the command-line dynamically from the CommandTask. In my case, I had a WorkOrder.json with some of static scheduling information that I wanted to use, but needed some runtime information that changed the command-line, so can't hard-code it in a WorkOrder.json. I could create a WorkOrder string and write that to disk just so that I can read it back out again, but that seems silly, so instead, I had to change my workflow to put all the scheduling in code. Why not just say that in all cases, we use what's in WorkOrder, unless that's empty, in which case, overwrite with the CommandTask? That way the user has flexibility to use whichever method they want and the _update argument can be removed completely.
@issuelabeler issuelabeler bot added COMPS Documentation Research Tickets that are mostly planning or researching items before implementation Support labels May 14, 2024
@shchen-idmod shchen-idmod self-assigned this May 18, 2024
shchen-idmod added a commit to shchen-idmod/idmtools-1 that referenced this issue May 20, 2024
@jps1
Copy link
Collaborator Author

jps1 commented May 31, 2024

@shchen-idmod sorry I haven't gotten around to looking at your PR yet. Will do so soon.

In the meantime, I ran across another thing that might be worth considering putting in here. If the user is using new COMPS scheduling, there are now redundant ways to set the node-group and num-cores:
image
They don't have to match, and it's unclear which is going to be used (and I'm not entirely sure if both will actually get communicated to COMPS).

Perhaps worth looking into whether there's a better way to do this.

@shchen-idmod
Copy link
Collaborator

shchen-idmod commented Jun 3, 2024

@shchen-idmod sorry I haven't gotten around to looking at your PR yet. Will do so soon.

In the meantime, I ran across another thing that might be worth considering putting in here. If the user is using new COMPS scheduling, there are now redundant ways to set the node-group and num-cores: image They don't have to match, and it's unclear which is going to be used (and I'm not entirely sure if both will actually get communicated to COMPS).

Perhaps worth looking into whether there's a better way to do this.

Thanks, Jeff, for identifying the inconsistency in the naming of node_group_name versus node_group between the platform and scheduling. I've fixed the use of node_group throughout [here].(5488944) This change only affects the default_add_schedule_config_sweep_callback and add_schedule_config methods. Fortunately, these methods are only called in our own tests, so hopefully the impact is minimal.
In terms of which one is called question, the design is whenever use scheduling, platform parameters get ignored and use whatever in workorder.json:

comps_config.update(executable_path=None, node_group_name=None, min_cores=None, max_cores=None,
                               exclusive=None, simulation_input_args=None)

When idmtools send config to comps vs pycomps call, we change to node_group_name=self.platform.node_group. It should be better to use consistent name with pycomps, but this is existing api call for platform and used everywhere already

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
COMPS Documentation Research Tickets that are mostly planning or researching items before implementation Support
Projects
None yet
Development

No branches or pull requests

2 participants