use lando-util for VM job #200
Conversation
removes unused classes as well
this provides support for organize_output_project messaging
Previous version 1.29.0 was incompatible with openstacksdk 0.28.0, causing the following error when running tests: `ImportError: cannot import name 'task_manager'`
This is to support lando changes that add a new message type sent to lando worker VMs. See Duke-GCB/lando#200.
this caused lando-util to fail with MissingInitialSetupError
This is to fix issues when running on linux where PATH/LANG caused issues finding and running software.
Thanks for undertaking such a big refactoring/cleanup. Big improvement and will streamline quite a lot. I did have some feedback, mostly on things consolidated into lando.common.
self.assertEqual(names.workflow_input_files_metadata_path, '/job-data/workflow-input-files-metadata.json')
self.assertEqual(names.usage_report_path, '/output-data/job-49-joe-resource-usage.json')
self.assertEqual(names.activity_name, 'myjob - Bespin Job 49')
self.assertEqual(names.activity_description, 'Bespin Job 49 - Workflow myworkflow v2')
I see that `test_zipped_workflow()` tests some attributes that `test_packed_workflow()` here doesn't test (e.g. `workflow_download_dest`, `workflow_to_run` or `workflow_to_read`).
I updated test_packed_workflow to test these fields as well.
self.paths = paths

def command_file_dict(self, input_files):
    items = [
I'd like to see some explanation here about these two initial items - why are they here and what do they do?
I added comments above them.
self.create_stage_data_config_item(StageDataTypes.URL,
                                   self.workflow.workflow_url,
                                   self.names.workflow_download_dest,
                                   self.names.unzip_workflow_url_to_path),
This contains an `unzip_to` argument. Does this only work for zipped workflows?
`unzip_workflow_url_to_path` value is None when staging packed workflows.
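To illustrate that convention, here is a hypothetical sketch of a staging step that branches on the unzip target (the helper name `stage_workflow` and the local file copy standing in for a URL download are illustrative, not the real lando-util API):

```python
import shutil
import zipfile


def stage_workflow(src, download_dest, unzip_to=None):
    """Fetch a workflow file, then unzip it only when a target is given.

    unzip_to is None for packed workflows (the downloaded file is used
    directly) and a directory path for zipped workflows.
    """
    shutil.copy(src, download_dest)  # stand-in for downloading workflow_url
    if unzip_to is not None:
        with zipfile.ZipFile(download_dest) as zf:
            zf.extractall(unzip_to)
```

With this shape, a single config item can describe both workflow types and the consumer only unzips when the fourth value is not None.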
:return: str: contents of file
"""
try:
    with codecs.open(file_path, 'r', encoding='utf-8', errors='xmlcharrefreplace') as infile:
I'm curious (mostly for other projects where I've implemented file reading) what the benefit of `codecs.open` is here over the builtin `open`? Sounds like `read()` then returns a unicode `str` rather than `bytes`, is that right?
This is pre-existing code that was moved here from lando/cwlworkflow.py.
lando/lando/worker/cwlworkflow.py
Lines 68 to 80 in c644add
def read_file(file_path):
    """
    Read the contents of a file using utf-8 encoding, or return an empty string
    if it does not exist
    :param file_path: str: path to the file to read
    :return: str: contents of file
    """
    try:
        with codecs.open(file_path, 'r', encoding='utf-8', errors='xmlcharrefreplace') as infile:
            return infile.read()
    except OSError as e:
        logging.exception('Error opening {}'.format(file_path))
        return ''
See #124
Guess that explains why I found it interesting.
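As a side note on the question above, a standalone check (not project code) confirms that in Python 3 both `codecs.open` and the builtin `open` return a unicode `str` from `read()`; `codecs.open` mainly mattered on Python 2, where the builtin `open` had no `encoding` parameter:

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('h\u00e9llo')

# codecs.open decodes for you, just like the builtin open with encoding=
with codecs.open(path, 'r', encoding='utf-8') as f:
    via_codecs = f.read()
with open(path, 'r', encoding='utf-8') as f:
    via_builtin = f.read()
```

Both reads yield the same decoded string, so on Python 3 the two spellings behave the same for this use.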
lando/common/commands.py
finally:
    if stdout_file:
        stdout_file.close()
    if stderr_file:
I don't follow the logic for checking `stdout_file` and `stderr_file` before closing. I can't seem to come up with a case where either would be falsy. So if that's possible, it would be worth a comment to explain why.
This is pre-existing code that was moved here from lando/cwlworkflow.py.
See #124 again
No, the logic introduced in cwlworkflow.py in #124 always calls `.close()`. This branch changes that behavior to conditionally `close()` the files and it's not obvious why.
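For comparison, a minimal sketch of the always-close pattern being referred to (the helper name `run_to_files` and the use of `subprocess.call` are illustrative, not the project's code): because both files are opened before the `try`, both handles are always valid in `finally`, so no truthiness check is needed.

```python
import subprocess


def run_to_files(command, stdout_path, stderr_path):
    """Run a command, capturing stdout/stderr to the given file paths."""
    stdout_file = open(stdout_path, 'w')
    stderr_file = open(stderr_path, 'w')
    try:
        return subprocess.call(command, stdout=stdout_file, stderr=stderr_file)
    finally:
        # Opened unconditionally above, so always safe to close here.
        stdout_file.close()
        stderr_file.close()
```

If either `open()` call could fail or be skipped, only then would a guard (or separate try blocks) be warranted.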
outfile.write(json.dumps(data))

def run_command(self, command, env=None, stdout_path=None, stderr_path=None):
    stdout_path, cleanup_stdout_path = self._create_temp_filename_if_none(stdout_path)
I've read through this method and `_create_temp_filename_if_none()` several times, and while I think I understand their job and what they're replacing, it's worth a comment describing that job overall and how it applies to different cases (e.g. when stdout/stderr paths should be None or when they should be paths).
Beyond the interface, the file creation and cleanup logic adds complexity to these otherwise simple methods, and it's not clear why that complexity is needed. In reading the code as it exists, I'd offer that `NamedTemporaryFile(delete=True)` could simplify these methods (removing the need to track the cleanup variable or do the cleanup yourself). I imagine there's more to the story, so the reasons are worth noting.
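As a sketch of that suggestion (the function name and return shape are hypothetical, not the lando code): used as context managers, the `NamedTemporaryFile` objects delete themselves when the block exits, so no cleanup flag needs to be tracked.

```python
import subprocess
import tempfile


def run_command(command):
    """Run a command, returning (return_code, stdout_text, stderr_text).

    The temporary files are removed automatically when the with-block
    exits, so there is no manual cleanup step to get wrong.
    """
    with tempfile.NamedTemporaryFile(mode='w+', delete=True) as stdout_f, \
         tempfile.NamedTemporaryFile(mode='w+', delete=True) as stderr_f:
        return_code = subprocess.call(command, stdout=stdout_f, stderr=stderr_f)
        stdout_f.seek(0)
        stderr_f.seek(0)
        return return_code, stdout_f.read(), stderr_f.read()
```

The trade-off is that the captured output only exists inside the with-block, so callers that need to keep stdout/stderr (e.g. in the output project) would still have to copy it out or pass real paths.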
The changes here re-used the logic that writes stdout and stderr to files when running a process. For running a workflow (e.g. cwltool) we save stdout and stderr into particular locations that are included in the output project. For operations like staging data and uploading the results we do not currently keep the output.
We could have all commands specify filenames for stdout/stderr, or just copy the NamedTemporaryFile to the output destination when the command is the run-workflow command.
My point is that there are places where the code is deliberate for obvious reasons (e.g. running a workflow and redirecting the output to a named file), and places where it's deliberate but not obvious. It makes sure the output is always redirected and does its own cleanup on temporary files there. My feedback is to provide the reasons for these decisions.
As a prompt, you might answer, for the case where the stdout/stderr paths are provided as None, why
- the streams need to be redirected at all
- the files are manually deleted rather than letting NamedTemporaryFile do its own cleanup

I recognize the answer lies in `_handle_failed_process()` but that's a couple layers removed from `run_command()`.
I reworked the logic to make it clearer how these files are being used and added some comments.
command.append(command_filename)
command.append(self.names.output_project_details_filename)
command.append("--outfile-format")
command.append("json")
I found this confusing - I recall that the k8s implementation works differently, but this method being in `lando.common` suggests otherwise.
Upon further digging, I see that lando.k8s doesn't call `SaveOutputCommand.run()`. Perhaps this method should be in a subclass in `lando.worker` since it's only used there?
`lando.k8s` only uses the `command_file_dict` method for all commands (StageDataCommand, RunWorkflowCommand, OrganizeOutputCommand, SaveOutputCommand). `lando.worker` only uses the `run` method for all commands. It seemed more natural to write the two functions together. I could move the `run` methods into the `lando.worker` directory with subclasses if that is easier to follow.
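A rough sketch of that alternative split (class names and method bodies are hypothetical, only illustrating the layering, not the real lando classes):

```python
class SaveOutputCommandBase:
    """Would live in lando.common: builds the command-file dict that
    lando.k8s consumes directly."""

    def __init__(self, job_name):
        self.job_name = job_name

    def command_file_dict(self, input_files):
        # Hypothetical payload shape for illustration only
        return {'job': self.job_name, 'paths': input_files}


class WorkerSaveOutputCommand(SaveOutputCommandBase):
    """Would live in lando.worker: adds the run() used only on VMs."""

    def run(self, input_files):
        command_data = self.command_file_dict(input_files)
        # ... here the worker would write command_data to a file and
        # invoke lando-util against it ...
        return command_data
```

Either layout works; the subclass version just makes the "k8s never calls run()" fact visible in the module structure.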
lando/server/lando.py
@@ -192,6 +192,7 @@ def launch_vm(self, vm_instance_name, vm_volume_name):
job = self.job_api.get_job()
worker_config_yml = self.config.make_worker_config_yml(vm_instance_name, job.vm_settings.cwl_commands)
cloud_config_script = CloudConfigScript()
print(worker_config_yml)
Leftover debugging?
lando/common/commands.py
def _handle_failed_process(self, process):
    stderr_output = read_file(process.stderr_path)
    tail_error_output = self._tail_stderr_output(stderr_output)
    error_message = "CWL workflow failed with exit code: {}\n{}".format(process.return_code, tail_error_output)
Should this `CWL workflow failed` message be more generic?
LGTM
- Removes logic in `lando.worker` that is now in lando-util.
- Duplicate logic shared by `lando.worker` and `lando.k8s` has been moved to a new `lando.common` module.
- Changes the directory structure used by `lando.worker` when running a workflow to follow the structure used in `lando.k8s`, prefixing the `lando.worker` working directory.
- Changes `lando.server` to send the `organize output project` message to `lando.worker` before the `upload output project` message.
- Upgrades `shade` to version 1.31.0 due to errors using 1.29.0 caused by `openstacksdk` changes.
changes.Fixes #154 - lando worker should use lando-util for staging and organizing project
Fixes #197 - centralize handling workflow types
Fixes #136 - pyyaml vulnerabilities