New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP for launching batch runs on AWS directly from COVIDScenarioPipeline #223
Conversation
batch/job_launcher.py
Outdated
import yaml | ||
|
||
@click.command() | ||
@click.option("-p", "--job-prefix", type=str, required=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: intelligent defaults to reduce the number of required params would be nice. e.g., job-prefix
is name
in config file plus a uniqifer, num-jobs
is calculated based on # of scenarios/# of sims with some cap, etc.
batch/job_launcher.py
Outdated
# Update and save the config file with the number of sims to run | ||
print("Updating config file %s to run %d simulations..." % (config_file, sims_per_job)) | ||
config = open(config_file).read() | ||
config = re.sub("nsimulations: \d+", "nsimulations: %d" % sims_per_job, config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would use pyyaml or the utils.config
module to read-modify-write versus regex, this seems error-prone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I did at first, but the output was a total re-ordering of the entries in the config file, which seemed not great; I wonder if there's a way to keep the ordering of the entries consistent on the read-edit-write cycle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to this thread, you need to set sort_keys=False
on the dump. Trying it locally, this preserves key order in the file, but strips comments and whitespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we would be okay with that, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i think it's fine
batch/job_launcher.py
Outdated
{"name": "DVC_OUTPUTS", "value": " ".join(dvc_outputs)}, | ||
{"name": "S3_RESULTS_PATH", "value": results_path} | ||
] | ||
s3_cp_run_script = "aws s3 cp s3://%s/%s $PWD/run-covid-pipeline" % (s3_input_bucket, runner_script_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(aside, don't know if you know about fstrings in python 3.6+, they make subbing variables much easier/neater)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not really; will take a look!
batch/job_launcher.py
Outdated
@click.option("-i", "--s3-input-bucket", "s3_input_bucket", type=str, default="idd-input-data-sets") | ||
@click.option("-o", "--s3-output-bucket", "s3_output_bucket", type=str, default="idd-pipeline-results") | ||
@click.option("-d", "--job-definition", "batch_job_definition", type=str, default="Batch-CovidPipeline-Job") | ||
@click.option("-q", "--job-queue", "batch_job_queue", type=str, default="Batch-CovidPipeline") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for these params, i think you want show_default=True
set so when you do a --help
you'll see what the default value is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also missing helpstrings :)
nice! this looks really good -- thanks for doing it! i only have minor/trivial comments |
for output in "${DVC_OUTPUTS_ARRAY[@]}" | ||
do | ||
"Saving output $output" | ||
tar cv --use-compress-program=pbzip2 -f $output.tar.bz2 $output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is compression required? We took that route initially but found that just a sync was sufficient and faster overall.
s3_cp_run_script = f"aws s3 cp s3://{s3_input_bucket}/{runner_script_name} $PWD/run-covid-pipeline" | ||
command = ["sh", "-c", f"{s3_cp_run_script}; /bin/bash $PWD/run-covid-pipeline"] | ||
container_overrides = { | ||
'vcpus': 72, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should one allow an override on that? If the advanced user wants a different configuration?
can we change this to use |
Going to be iterating on this for a bit, but posting it for feedback from folks now.