use lando-util for VM job #200

Merged: 24 commits merged May 21, 2019
Conversation

johnbradley (Collaborator) commented May 15, 2019:

- Removes logic in lando.worker that is now in lando-util.
- Moves duplicate logic shared by lando.worker and lando.k8s into a new lando.common module.
- Changes the directory structure used by lando.worker when running a workflow to follow the structure used in lando.k8s, prefixed with the lando.worker working directory.
- Changes lando.server to send the organize output project message to lando.worker before the upload output project message.
- Upgrades shade to version 1.31.0, since 1.29.0 fails with recent openstacksdk changes.

Fixes #154 - lando worker should use lando-util for staging and organizing project
Fixes #197 - centralize handling workflow types
Fixes #136 - pyyaml vulnerabilities

@johnbradley changed the title from "rough pass use lando-util for data staging and organizing output" to "use lando-util for VM job" on May 16, 2019
removes unused classes as well
this provides support for organize_output_project messaging
The previous version 1.29.0 was incompatible with openstacksdk 0.28.0 and caused the following error when running tests:
```
ImportError: cannot import name 'task_manager'
```
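The fix amounts to a version pin; in a pip requirements file it would look roughly like this (the exact file layout here is an assumption, not copied from the branch):

```
# shade 1.31.0 tracks the openstacksdk changes that broke 1.29.0
# with "ImportError: cannot import name 'task_manager'"
shade==1.31.0
```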
johnbradley added a commit to Duke-GCB/bespin-api that referenced this pull request May 17, 2019
This is to support lando changes that add a new message type
sent to lando worker VMs.
See Duke-GCB/lando#200.
johnbradley added a commit to Duke-GCB/bespin-mailer that referenced this pull request May 17, 2019
This is to support lando changes that add a new message type
sent to lando worker VMs.
See Duke-GCB/lando#200.
@johnbradley requested a review from dleehr May 17, 2019 20:41
dleehr (Member) left a comment:

Thanks for undertaking such a big refactoring/cleanup. It's a big improvement and will streamline quite a lot. I did have some feedback, mostly on things consolidated into lando.common.

self.assertEqual(names.workflow_input_files_metadata_path, '/job-data/workflow-input-files-metadata.json')
self.assertEqual(names.usage_report_path, '/output-data/job-49-joe-resource-usage.json')
self.assertEqual(names.activity_name, 'myjob - Bespin Job 49')
self.assertEqual(names.activity_description, 'Bespin Job 49 - Workflow myworkflow v2')
dleehr (Member):

I see that test_zipped_workflow() tests some attributes that test_packed_workflow() here doesn't test (e.g. workflow_download_dest, workflow_to_run or workflow_to_read).

johnbradley (Collaborator, Author):

I updated test_packed_workflow to test these fields as well.

self.paths = paths

def command_file_dict(self, input_files):
items = [
dleehr (Member):

I'd like to see some explanation here about these two initial items - why are they here and what do they do?

johnbradley (Collaborator, Author):

I added comments above them.

self.create_stage_data_config_item(StageDataTypes.URL,
self.workflow.workflow_url,
self.names.workflow_download_dest,
self.names.unzip_workflow_url_to_path),
dleehr (Member):

This contains an unzip_to argument. Does this only work for zipped workflows?

johnbradley (Collaborator, Author):

The unzip_workflow_url_to_path value is None when staging packed workflows.
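A minimal sketch of how a staging item might carry an optional unzip destination (field names here are assumptions for illustration, not the project's actual schema): a packed workflow is staged as a plain download, while a zipped workflow also carries its extraction path.

```python
def stage_data_item(item_type, source, dest, unzip_to=None):
    """Build one staging entry; unzip_to is only set for zipped workflows."""
    item = {'type': item_type, 'source': source, 'dest': dest}
    if unzip_to is not None:
        item['unzip_to'] = unzip_to
    return item

# A packed workflow is just downloaded...
packed = stage_data_item('url', 'https://example.com/wf.cwl', '/job/wf.cwl')
# ...while a zipped workflow is downloaded and then extracted.
zipped = stage_data_item('url', 'https://example.com/wf.zip', '/job/wf.zip',
                         unzip_to='/job/workflow')
```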

:return: str: contents of file
"""
try:
with codecs.open(file_path, 'r', encoding='utf-8', errors='xmlcharrefreplace') as infile:
dleehr (Member):

I'm curious (mostly for other projects where I've implemented file reading) what the benefit of codecs.open is here over the builtin open? Sounds like read() then returns a unicode str rather than bytes, is that right?

johnbradley (Collaborator, Author):

This is pre-existing code that was moved here from lando/cwlworkflow.py.

```
def read_file(file_path):
    """
    Read the contents of a file using utf-8 encoding, or return an empty string
    if it does not exist
    :param file_path: str: path to the file to read
    :return: str: contents of file
    """
    try:
        with codecs.open(file_path, 'r', encoding='utf-8', errors='xmlcharrefreplace') as infile:
            return infile.read()
    except OSError as e:
        logging.exception('Error opening {}'.format(file_path))
        return ''
```

See #124
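For context on the question above: in Python 3 the builtin open() with an encoding argument also returns a unicode str, so codecs.open mainly buys Python 2 compatibility. A small sketch (the file path here is hypothetical):

```python
import codecs
import os
import tempfile

# Write a small UTF-8 file so we can compare the two APIs.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('h\u00e9llo')

with codecs.open(path, 'r', encoding='utf-8') as f:
    via_codecs = f.read()
with open(path, 'r', encoding='utf-8') as f:
    via_builtin = f.read()

# Both reads return the same unicode str in Python 3.
print(type(via_codecs).__name__, via_codecs == via_builtin)
```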

dleehr (Member):

Guess that explains why I found it interesting.

finally:
if stdout_file:
stdout_file.close()
if stderr_file:
dleehr (Member):

I don't follow the logic for checking stdout_file and stderr_file before closing. I can't seem to come up with a case where either would be falsy. So if that's possible, it would be worth a comment to explain why.

johnbradley (Collaborator, Author):

This is pre-existing code that was moved here from lando/cwlworkflow.py.
See #124 again

dleehr (Member):

> This is pre-existing code that was moved here from lando/cwlworkflow.py.
> See #124 again

No, the logic introduced in cwlworkflow.py in #124 always calls .close(). This branch changes that behavior to conditionally close() the files, and it's not obvious why.
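One way to sidestep the conditional close() altogether (a sketch under assumed names, not the branch's actual code) is contextlib.ExitStack, which closes exactly the files that were actually opened:

```python
import contextlib
import subprocess
import sys
import tempfile

def run_with_logs(command, stdout_path=None, stderr_path=None):
    """Run command, writing streams to the given paths or to throwaway temp files."""
    with contextlib.ExitStack() as stack:
        stdout_file = stack.enter_context(
            open(stdout_path, 'w') if stdout_path else tempfile.TemporaryFile('w+'))
        stderr_file = stack.enter_context(
            open(stderr_path, 'w') if stderr_path else tempfile.TemporaryFile('w+'))
        # ExitStack closes both files on exit, success or failure,
        # so no `if stdout_file:` guards are needed.
        return subprocess.call(command, stdout=stdout_file, stderr=stderr_file)
```

For example, `run_with_logs([sys.executable, '-c', 'print(1)'])` returns the process exit code.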

outfile.write(json.dumps(data))

def run_command(self, command, env=None, stdout_path=None, stderr_path=None):
stdout_path, cleanup_stdout_path = self._create_temp_filename_if_none(stdout_path)
dleehr (Member):

I've read through this method and _create_temp_filename_if_none() several times, and while I think I understand their job and what they're replacing, it's worth a comment describing that job overall and how it applies to the different cases (e.g. when the stdout/stderr paths should be None and when they should be paths).

Beyond the interface, the file creation and cleanup logic adds complexity to these otherwise simple methods, and it's not clear why that complexity is needed. Reading the code as it exists, I'd offer that NamedTemporaryFile(delete=True) could simplify these methods (removing the need to track the cleanup variable or do the cleanup yourself). I imagine there's more to the story, so the reasons are worth noting.

johnbradley (Collaborator, Author):

The changes here re-used the logic that writes stdout and stderr to files when running a process. For running a workflow (e.g. cwltool) we save stdout and stderr into particular locations that are included in the output project. For operations like staging data and uploading the results we do not currently keep the output.

We could have all commands specify filenames for stdout/stderr, or just copy the NamedTemporaryFile to the output destination when the command being run is the workflow.

dleehr (Member):

My point is that there are places where the code is deliberate for obvious reasons (e.g. running a workflow and redirecting the output to a named file), and places where it's deliberate but not obvious: it makes sure the output is always redirected and does its own cleanup on temporary files. My feedback is to provide the reasons for these decisions.

As a prompt, you might answer, for the case where the stdout/stderr paths are provided as None:

  1. why the streams need to be redirected at all
  2. why the files are manually deleted rather than letting NamedTemporaryFile do its own cleanup

I recognize the answer lies in _handle_failed_process(), but that's a couple layers removed from run_command().
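The NamedTemporaryFile(delete=True) idea could look roughly like this (a hedged sketch; this run_command and its failure handling are illustrative, not the branch's actual code):

```python
import subprocess
import sys
import tempfile

def run_command(command):
    """Run a command, capturing stderr so a failure can report it."""
    # delete=True (the default) means the temp file removes itself when
    # the context exits; no manual cleanup bookkeeping is required.
    with tempfile.NamedTemporaryFile('w+', delete=True) as stderr_file:
        return_code = subprocess.call(command, stderr=stderr_file)
        if return_code != 0:
            stderr_file.seek(0)
            raise RuntimeError('command failed with exit code {}:\n{}'.format(
                return_code, stderr_file.read()))
        return return_code
```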

johnbradley (Collaborator, Author):

I reworked the logic to make it clearer how these files are being used and added some comments.

command.append(command_filename)
command.append(self.names.output_project_details_filename)
command.append("--outfile-format")
command.append("json")
dleehr (Member):

I found this confusing - I recall that the k8s implementation works differently, but this method being in lando.common suggests otherwise.

Upon further digging, I see that lando.k8s doesn't call SaveOutputCommand.run(). Perhaps this method should be in a subclass in lando.worker since it's only used there?

johnbradley (Collaborator, Author):

lando.k8s only uses the command_file_dict method for all commands (StageDataCommand, RunWorkflowCommand, OrganizeOutputCommand, SaveOutputCommand), while lando.worker only uses the run method for all commands. It seemed more natural to write the two functions together. I could move the run methods into the lando.worker directory with subclasses if that is easier to follow.

@@ -192,6 +192,7 @@ def launch_vm(self, vm_instance_name, vm_volume_name):
job = self.job_api.get_job()
worker_config_yml = self.config.make_worker_config_yml(vm_instance_name, job.vm_settings.cwl_commands)
cloud_config_script = CloudConfigScript()
print(worker_config_yml)
dleehr (Member):

Leftover debugging?

def _handle_failed_process(self, process):
stderr_output = read_file(process.stderr_path)
tail_error_output = self._tail_stderr_output(stderr_output)
error_message = "CWL workflow failed with exit code: {}\n{}".format(process.return_code, tail_error_output)
dleehr (Member):

Should this "CWL workflow failed" message be more generic?
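A generic variant along those lines might parameterize the label (the function and parameter names here are hypothetical):

```python
def format_failure_message(label, return_code, tail_error_output):
    """Build a failure message without hard-coding 'CWL workflow'."""
    return "{} failed with exit code: {}\n{}".format(
        label, return_code, tail_error_output)

# The workflow path keeps its specific wording...
msg = format_failure_message('CWL workflow', 1, 'stack trace tail')
# ...while staging/upload steps could pass their own label.
```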

dleehr (Member) left a comment:

LGTM

@johnbradley merged commit 6128273 into master May 21, 2019
@johnbradley deleted the 154-vm-job-use-lando-util branch May 21, 2019 13:13