use lando-util for VM job #200
Conversation
removes unused classes as well
this provides support for organize_output_project messaging
Previous version 1.29.0 was incompatible with openstacksdk 0.28.0, causing the following error when running tests: `ImportError: cannot import name 'task_manager'`
This is to support lando changes that add a new message type sent to lando worker VMs. See Duke-GCB/lando#200.
this caused lando-util to fail with MissingInitialSetupError
This is to fix issues when running on linux where PATH/LANG caused issues finding and running software.
Thanks for undertaking such a big refactoring/cleanup. Big improvement and will streamline quite a lot. I did have some feedback, mostly on things consolidated into lando.common.
self.assertEqual(names.workflow_input_files_metadata_path, '/job-data/workflow-input-files-metadata.json')
self.assertEqual(names.usage_report_path, '/output-data/job-49-joe-resource-usage.json')
self.assertEqual(names.activity_name, 'myjob - Bespin Job 49')
self.assertEqual(names.activity_description, 'Bespin Job 49 - Workflow myworkflow v2')
I see that `test_zipped_workflow()` tests some attributes that `test_packed_workflow()` here doesn't test (e.g. `workflow_download_dest`, `workflow_to_run` or `workflow_to_read`).
I updated test_packed_workflow to test these fields as well.
self.paths = paths

def command_file_dict(self, input_files):
    items = [
I'd like to see some explanation here about these two initial items - why are they here and what do they do?
I added comments above them.
self.create_stage_data_config_item(StageDataTypes.URL,
                                   self.workflow.workflow_url,
                                   self.names.workflow_download_dest,
                                   self.names.unzip_workflow_url_to_path),
This contains an `unzip_to` argument. Does this only work for zipped workflows?
`unzip_workflow_url_to_path` value is None when staging packed workflows.
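To illustrate that convention, here is a hypothetical sketch of a staging step that branches on the unzip target (the helper name `stage_workflow` and the local file copy standing in for a URL download are illustrative, not the real lando-util API):

```python
import shutil
import zipfile


def stage_workflow(src, download_dest, unzip_to=None):
    """Fetch a workflow file, then unzip it only when a target is given.

    unzip_to is None for packed workflows (the downloaded file is used
    directly) and a directory path for zipped workflows.
    """
    shutil.copy(src, download_dest)  # stand-in for downloading workflow_url
    if unzip_to is not None:
        with zipfile.ZipFile(download_dest) as zf:
            zf.extractall(unzip_to)
```

With this shape, a single config item can describe both workflow types and the consumer only unzips when the fourth value is not None.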
:return: str: contents of file
"""
try:
    with codecs.open(file_path, 'r', encoding='utf-8', errors='xmlcharrefreplace') as infile:
I'm curious (mostly for other projects where I've implemented file reading) what the benefit of `codecs.open` is here over the builtin `open`? Sounds like `read()` then returns a unicode `str` rather than `bytes`, is that right?
This is pre-existing code that was moved here from lando/cwlworkflow.py.
lando/lando/worker/cwlworkflow.py
Lines 68 to 80 in c644add
def read_file(file_path):
    """
    Read the contents of a file using utf-8 encoding, or return an empty string
    if it does not exist
    :param file_path: str: path to the file to read
    :return: str: contents of file
    """
    try:
        with codecs.open(file_path, 'r', encoding='utf-8', errors='xmlcharrefreplace') as infile:
            return infile.read()
    except OSError as e:
        logging.exception('Error opening {}'.format(file_path))
        return ''
See #124
Guess that explains why I found it interesting.
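As a side note on the question above, a standalone check (not project code) confirms that in Python 3 both `codecs.open` and the builtin `open` return a unicode `str` from `read()`; `codecs.open` mainly mattered on Python 2, where the builtin `open` had no `encoding` parameter:

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('h\u00e9llo')

# codecs.open decodes for you, just like the builtin open with encoding=
with codecs.open(path, 'r', encoding='utf-8') as f:
    via_codecs = f.read()
with open(path, 'r', encoding='utf-8') as f:
    via_builtin = f.read()
```

Both reads yield the same decoded string, so on Python 3 the two spellings behave the same for this use.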
lando/common/commands.py
finally:
    if stdout_file:
        stdout_file.close()
    if stderr_file:
I don't follow the logic for checking `stdout_file` and `stderr_file` before closing. I can't seem to come up with a case where either would be falsy. So if that's possible, it would be worth a comment to explain why.
This is pre-existing code that was moved here from lando/cwlworkflow.py.
See #124 again
No, the logic introduced in cwlworkflow.py in #124 always calls `.close()`. This branch changes that behavior to conditionally `close()` the files and it's not obvious why.
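For comparison, a minimal sketch of the always-close pattern being referred to (the helper name `run_to_files` and the use of `subprocess.call` are illustrative, not the project's code): because both files are opened before the `try`, both handles are always valid in `finally`, so no truthiness check is needed.

```python
import subprocess


def run_to_files(command, stdout_path, stderr_path):
    """Run a command, capturing stdout/stderr to the given file paths."""
    stdout_file = open(stdout_path, 'w')
    stderr_file = open(stderr_path, 'w')
    try:
        return subprocess.call(command, stdout=stdout_file, stderr=stderr_file)
    finally:
        # Opened unconditionally above, so always safe to close here.
        stdout_file.close()
        stderr_file.close()
```

If either `open()` call could fail or be skipped, only then would a guard (or separate try blocks) be warranted.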
outfile.write(json.dumps(data))

def run_command(self, command, env=None, stdout_path=None, stderr_path=None):
    stdout_path, cleanup_stdout_path = self._create_temp_filename_if_none(stdout_path)
I've read through this method and `_create_temp_filename_if_none()` several times, and while I think I understand their job and what they're replacing, it's worth a comment describing that job overall and how it applies to different cases (e.g. when stdout/stderr paths should be None or when they should be paths).
Beyond the interface, the file creation and cleanup logic adds complexity to these otherwise simple methods, and it's not clear why that complexity is needed. In reading the code as it exists, I'd offer that `NamedTemporaryFile(delete=True)` could simplify these methods (removing the need to track the cleanup variable or do the cleanup yourself). I imagine there's more to the story, so the reasons are worth noting.
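As a sketch of that suggestion (the function name and return shape are hypothetical, not the lando code): used as context managers, the `NamedTemporaryFile` objects delete themselves when the block exits, so no cleanup flag needs to be tracked.

```python
import subprocess
import tempfile


def run_command(command):
    """Run a command, returning (return_code, stdout_text, stderr_text).

    The temporary files are removed automatically when the with-block
    exits, so there is no manual cleanup step to get wrong.
    """
    with tempfile.NamedTemporaryFile(mode='w+', delete=True) as stdout_f, \
         tempfile.NamedTemporaryFile(mode='w+', delete=True) as stderr_f:
        return_code = subprocess.call(command, stdout=stdout_f, stderr=stderr_f)
        stdout_f.seek(0)
        stderr_f.seek(0)
        return return_code, stdout_f.read(), stderr_f.read()
```

The trade-off is that the captured output only exists inside the with-block, so callers that need to keep stdout/stderr (e.g. in the output project) would still have to copy it out or pass real paths.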
The changes here re-used the logic that writes stdout and stderr to files when running a process. For running a workflow (e.g. cwltool) we save stdout and stderr into particular locations that are included in the output project. For operations like staging data and uploading the results we do not currently keep the output.
We could have all commands specify filenames for stdout/stderr, or just copy the NamedTemporaryFile to the output destination when the command is the run-workflow command.
My point is that there are places where the code is deliberate for obvious reasons (e.g. running a workflow and redirecting the output to a named file), and places where it's deliberate but not obvious. It makes sure the output is always redirected and does its own cleanup on temporary files there. My feedback is to provide the reasons for these decisions.
As a prompt, you might answer, for the case where the stdout/stderr paths are provided as None, why
- the streams need to be redirected at all
- the files are manually deleted rather than letting NamedTemporaryFile do its own cleanup

I recognize the answer lies in `_handle_failed_process()` but that's a couple layers removed from `run_command()`.
I reworked the logic to make it clearer how these files are being used and added some comments.
command.append(command_filename)
command.append(self.names.output_project_details_filename)
command.append("--outfile-format")
command.append("json")
I found this confusing - I recall that the k8s implementation works differently, but this method being in `lando.common` suggests otherwise.
Upon further digging, I see that lando.k8s doesn't call `SaveOutputCommand.run()`. Perhaps this method should be in a subclass in `lando.worker` since it's only used there?
`lando.k8s` only uses the `command_file_dict` method for all commands (StageDataCommand, RunWorkflowCommand, OrganizeOutputCommand, SaveOutputCommand). `lando.worker` only uses the `run` method for all commands. It seemed more natural to write the two functions together. I could move the `run` methods into the `lando.worker` directory with subclasses if that is easier to follow.
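A rough sketch of that alternative split (class names and method bodies are hypothetical, only illustrating the layering, not the real lando classes):

```python
class SaveOutputCommandBase:
    """Would live in lando.common: builds the command-file dict that
    lando.k8s consumes directly."""

    def __init__(self, job_name):
        self.job_name = job_name

    def command_file_dict(self, input_files):
        # Hypothetical payload shape for illustration only
        return {'job': self.job_name, 'paths': input_files}


class WorkerSaveOutputCommand(SaveOutputCommandBase):
    """Would live in lando.worker: adds the run() used only on VMs."""

    def run(self, input_files):
        command_data = self.command_file_dict(input_files)
        # ... here the worker would write command_data to a file and
        # invoke lando-util against it ...
        return command_data
```

Either layout works; the subclass version just makes the "k8s never calls run()" fact visible in the module structure.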
lando/server/lando.py
@@ -192,6 +192,7 @@ def launch_vm(self, vm_instance_name, vm_volume_name):
job = self.job_api.get_job()
worker_config_yml = self.config.make_worker_config_yml(vm_instance_name, job.vm_settings.cwl_commands)
cloud_config_script = CloudConfigScript()
print(worker_config_yml)
Leftover debugging?
lando/common/commands.py
def _handle_failed_process(self, process):
    stderr_output = read_file(process.stderr_path)
    tail_error_output = self._tail_stderr_output(stderr_output)
    error_message = "CWL workflow failed with exit code: {}\n{}".format(process.return_code, tail_error_output)
Should this `CWL workflow failed` message be more generic?
LGTM
- Removes logic in `lando.worker` that is now in lando-util.
- Duplicate logic shared by `lando.worker` and `lando.k8s` has been moved to a new `lando.common` module.
- Changes the directory structure used by `lando.worker` when running a workflow to follow the structure used in `lando.k8s`, prefixing the `lando.worker` working directory.
- Changes `lando.server` to send the `organize output project` message to `lando.worker` before the `upload output project` message.
- Upgrades `shade` to version 1.31.0 due to errors using 1.29.0 caused by `openstacksdk` changes.
changes.Fixes #154 - lando worker should use lando-util for staging and organizing project
Fixes #197 - centralize handling workflow types
Fixes #136 - pyyaml vulnerabilities