feat: python notebooks template repo #1195

Merged
merged 30 commits on Oct 19, 2021
Changes from 4 commits
Commits
30 commits
b2e0505
adding python_notebooks template to synthtool
Aug 24, 2021
f22fe84
adding python notebook template repo
Aug 25, 2021
d7c0713
adding python notebooks template
Aug 25, 2021
f2e4ea0
Merge branch 'master' into python-notebooks-templates
loferris Aug 25, 2021
790c298
updating .cloud-build to recent versions
Aug 26, 2021
db73c44
Merge branch 'python-notebooks-templates' of github.com:loferris/synt…
Aug 26, 2021
7aa53a7
addressing kokoro fail
Aug 26, 2021
373043e
streamlining purpose of PR to cover only our shared testing pipeline
Aug 27, 2021
7e9b33e
updating check for Python version and adding a licence to cloud-build…
Aug 27, 2021
567d727
Merge branch 'master' into python-notebooks-templates
loferris Aug 27, 2021
a2d7581
removing PR templates
Sep 1, 2021
bf74f9d
Merge branch 'master' into python-notebooks-templates
loferris Sep 1, 2021
6b7deab
updating nbQA version
Sep 9, 2021
e62ec81
Merge branch 'master' into python-notebooks-templates
loferris Sep 9, 2021
672ee73
upgrading nbqa and black, running black directly
Sep 15, 2021
58614b2
Merge branch 'master' into python-notebooks-templates
loferris Sep 15, 2021
1812656
revising cloud-build files to fit with kokoro
Sep 15, 2021
b28ee9e
Merge branch 'master' into python-notebooks-templates
loferris Sep 15, 2021
e187dea
Merge branch 'master' into python-notebooks-templates
loferris Sep 16, 2021
ff9c271
updating versions of cloud-build suite
Sep 17, 2021
bb613bd
Merge branch 'master' into python-notebooks-templates
loferris Sep 22, 2021
9906270
addressed comments on cleanup setup
Sep 22, 2021
f03fb89
make requirements more compatible with renovatebot
Oct 1, 2021
00cdb68
pinning versions
Oct 1, 2021
c3ee69a
used default artifacts pattern instead of manual copying of artifacts
Oct 1, 2021
721511c
Merge branch 'master' into python-notebooks-templates
loferris Oct 15, 2021
f86f6ac
reformatting commands
Oct 15, 2021
3c2b225
adding defaults
Oct 15, 2021
e8068cb
Merge branch 'master' into python-notebooks-templates
loferris Oct 18, 2021
cbc4220
reverting to previous cloud-build.yaml file
Oct 19, 2021
6 changes: 6 additions & 0 deletions synthtool/gcp/common.py
@@ -205,6 +205,12 @@ def py_samples_override(
overridden_samples_kwargs["subdir"] = override_path
return self._generic_library("python_samples", **overridden_samples_kwargs)

def python_notebooks(self, **kwargs) -> Path:
# kwargs["metadata"] is required to load values from .repo-metadata.json
if "metadata" not in kwargs:
kwargs["metadata"] = {}
return self._generic_library("python_notebooks", **kwargs)

def py_library(self, **kwargs) -> Path:
# kwargs["metadata"] is required to load values from .repo-metadata.json
if "metadata" not in kwargs:
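
For orientation, here is a minimal sketch of how a consuming notebooks repository's owlbot.py might call the new python_notebooks() template bundle added above; the import pattern follows synthtool's usual usage, while the excludes value is purely illustrative and not taken from this PR.

# Sketch only: invoking the python_notebooks template from a downstream repo.
import synthtool as s
from synthtool import gcp

common = gcp.CommonTemplates()

# Render the python_notebooks templated files (.cloud-build scripts, configs, ...).
templated_files = common.python_notebooks()

# Copy the rendered files into the repository; the excludes list is illustrative.
s.move(templated_files, excludes=["README.md"])
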
@@ -0,0 +1,28 @@
#!/usr/bin/env python
# # Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys

MINIMUM_MAJOR_VERSION = 3
MINIMUM_MINOR_VERSION = 3

Review comment: Should be 5?

Review comment (Contributor Author): This is current with AI platform (updated 12 days ago), but will do!


if (
sys.version_info.major < MINIMUM_MAJOR_VERSION
and sys.version_info.minor < MINIMUM_MINOR_VERSION

Review comment (Contributor): This might need to be or? Wouldn't 2.7 result in the else statement?

Alternatively, you could flip the if/else and check that major and minor are >= the expected values. (Probably worth checking for == 3 for the major, since who knows what Python 4 will be, if that ever happens.)

Review comment (Contributor Author): Done in latest commit!

):
print("Error: Python version less than 3.5")

Review comment: Maybe dynamically generate the string to reflect the constants?

Review comment: Ah I understand that this might be old code that was inherited from several people.

Review comment (Contributor Author): I'm fixing it in the next commit!

exit(1)
else:
print(f"Python version acceptable: {sys.version}")
exit(0)
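
To make the review thread above concrete, here is a sketch of the check as the reviewers suggest it: require major == 3 and minor >= the minimum, and derive the error message from the constants rather than hard-coding "3.5". The constants and wording are illustrative; the PR's later commits address this in their own way.

# Sketch of the reviewer-suggested version check. Illustrative only.
import sys

MINIMUM_MAJOR_VERSION = 3
MINIMUM_MINOR_VERSION = 5

if (
    sys.version_info.major == MINIMUM_MAJOR_VERSION
    and sys.version_info.minor >= MINIMUM_MINOR_VERSION
):
    print(f"Python version acceptable: {sys.version}")
    exit(0)
else:
    print(f"Error: Python version less than {MINIMUM_MAJOR_VERSION}.{MINIMUM_MINOR_VERSION}")
    exit(1)
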
@@ -0,0 +1,8 @@
steps:

Review comment (Contributor): Probably not 100% necessary, but YAML supports comments, so we really should have a license header here.

Review comment (Contributor Author): Added!

# Install Python dependencies and run cleanup script
- name: ${_PYTHON_IMAGE}
entrypoint: /bin/sh
args:
- -c
- 'python3 -m pip install -U -r .cloud-build/cleanup/cleanup-requirements.txt && python3 .cloud-build/cleanup/cleanup.py'
timeout: 86400s
@@ -0,0 +1 @@
google-cloud-aiplatform
@@ -0,0 +1,56 @@
#!/usr/bin/env python
# # Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List
from resource_cleanup_manager import (
ResourceCleanupManager,
DatasetResourceCleanupManager,
EndpointResourceCleanupManager,
ModelResourceCleanupManager,
)


def run_cleanup_managers(managers: List[ResourceCleanupManager], is_dry_run: bool):
for manager in managers:
type_name = manager.type_name

print(f"Fetching {type_name}'s...")
resources = manager.list()
print(f"Found {len(resources)} {type_name}'s")
for resource in resources:
if not manager.is_deletable(resource):
continue

if is_dry_run:
resource_name = manager.resource_name(resource)
print(f"Will delete '{type_name}': {resource_name}")
else:
manager.delete(resource)

print("")


is_dry_run = False

if is_dry_run:
print("Starting cleanup in dry run mode...")

# List of all cleanup managers
managers = [
DatasetResourceCleanupManager(),
EndpointResourceCleanupManager(),
ModelResourceCleanupManager(),
]

run_cleanup_managers(managers=managers, is_dry_run=is_dry_run)
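
The entry point above hard-codes is_dry_run = False. A hypothetical variant (not part of this PR) that exposes it as a command-line flag could replace the bottom of the script as follows, reusing the managers and run_cleanup_managers defined above; the flag name and help text are illustrative.

# Hypothetical dry-run flag wiring; not part of this PR.
import argparse

parser = argparse.ArgumentParser(description="Clean up stale Vertex AI test resources.")
parser.add_argument(
    "--dry-run",
    action="store_true",
    help="Only list the resources that would be deleted.",
)
args = parser.parse_args()

if args.dry_run:
    print("Starting cleanup in dry run mode...")

managers = [
    DatasetResourceCleanupManager(),
    EndpointResourceCleanupManager(),
    ModelResourceCleanupManager(),
]

run_cleanup_managers(managers=managers, is_dry_run=args.dry_run)
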
@@ -0,0 +1,101 @@
#!/usr/bin/env python
# # Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import abc
from google.cloud import aiplatform
from typing import Any
from proto.datetime_helpers import DatetimeWithNanoseconds
from google.cloud.aiplatform import base

# If a resource was updated within this number of seconds, do not delete.
RESOURCE_UPDATE_BUFFER_IN_SECONDS = 60 * 60 * 8


class ResourceCleanupManager(abc.ABC):
@property
@abc.abstractmethod
def type_name(self) -> str:
pass

@abc.abstractmethod
def list(self) -> Any:
pass

@abc.abstractmethod
def resource_name(self, resource: Any) -> str:
pass

@abc.abstractmethod
def delete(self, resource: Any):
pass

@abc.abstractmethod
def get_seconds_since_modification(self, resource: Any) -> float:
pass

def is_deletable(self, resource: Any) -> bool:
time_difference = self.get_seconds_since_modification(resource)

if self.resource_name(resource).startswith("perm"):
print(f"Skipping '{resource}' due to name starting with 'perm'.")
return False

# Check that it wasn't created too recently, to prevent race conditions
if time_difference <= RESOURCE_UPDATE_BUFFER_IN_SECONDS:
print(
f"Skipping '{resource}' due update_time being '{time_difference}', which is less than '{RESOURCE_UPDATE_BUFFER_IN_SECONDS}'."
)
return False

return True


class VertexAIResourceCleanupManager(ResourceCleanupManager):
@property
@abc.abstractmethod
def vertex_ai_resource(self) -> base.VertexAiResourceNounWithFutureManager:
pass

@property
def type_name(self) -> str:
return self.vertex_ai_resource._resource_noun

def list(self) -> Any:
return self.vertex_ai_resource.list()

def resource_name(self, resource: Any) -> str:
return resource.display_name

def delete(self, resource):
resource.delete()

def get_seconds_since_modification(self, resource: Any) -> float:
update_time = resource.update_time
current_time = DatetimeWithNanoseconds.now(tz=update_time.tzinfo)
return (current_time - update_time).total_seconds()


class DatasetResourceCleanupManager(VertexAIResourceCleanupManager):
vertex_ai_resource = aiplatform.datasets._Dataset


class EndpointResourceCleanupManager(VertexAIResourceCleanupManager):
vertex_ai_resource = aiplatform.Endpoint

def delete(self, resource):
resource.delete(force=True)


class ModelResourceCleanupManager(VertexAIResourceCleanupManager):
vertex_ai_resource = aiplatform.Model
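
Because the managers only differ in the aiplatform class they point at, covering another resource type is a one-liner. For example, a hypothetical manager for batch prediction jobs (not part of this PR) could reuse the VertexAIResourceCleanupManager defaults above and then be appended to the managers list in the cleanup script:

# Hypothetical extension, not part of this PR: reuses the list/delete/timestamp
# defaults from VertexAIResourceCleanupManager above.
class BatchPredictionJobResourceCleanupManager(VertexAIResourceCleanupManager):
    vertex_ai_resource = aiplatform.BatchPredictionJob
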
@@ -0,0 +1,160 @@
#!/usr/bin/env python
# # Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import pathlib
import subprocess
from pathlib import Path
from typing import Dict, List, Optional

import execute_notebook


def run_changed_notebooks(
test_paths_file: str,
output_folder: str,
variable_project_id: str,
variable_region: str,
base_branch: Optional[str],
):
"""
Run the notebooks that exist under the folders defined in the test_paths_file.
It only runs notebooks that have differences from the Git base_branch.
The executed notebooks are saved in the output_folder.
Variables such as variable_project_id and variable_region are also injected into the notebooks.
Args:
test_paths_file (str):
Required. The newline-delimited file listing folders and files that need checking.
Folders are checked recursively.
base_branch (str):
Optional. If provided, only the files that have changed from the base_branch will be checked.
If not provided, all files will be checked.
output_folder (str):
Required. The folder to write executed notebooks to.
variable_project_id (str):
Required. The value for PROJECT_ID to inject into notebooks.
variable_region (str):
Required. The value for REGION to inject into notebooks.
"""

test_paths = []
with open(test_paths_file) as file:
lines = [line.strip() for line in file.readlines()]
lines = [line for line in lines if len(line) > 0]
test_paths = [line for line in lines]

if len(test_paths) == 0:
raise RuntimeError("No test folders found.")

print(f"Checking folders: {test_paths}")

# Find notebooks
notebooks = []
if base_branch:
print(f"Looking for notebooks that changed from branch: {base_branch}")
notebooks = subprocess.check_output(
["git", "diff", "--name-only", f"origin/{base_branch}...", "--"]
+ test_paths
)
else:
print(f"Looking for all notebooks.")
notebooks = subprocess.check_output(["git", "ls-files"] + test_paths)

notebooks = notebooks.decode("utf-8").split("\n")
notebooks = [notebook for notebook in notebooks if notebook.endswith(".ipynb")]
notebooks = [notebook for notebook in notebooks if len(notebook) > 0]
notebooks = [notebook for notebook in notebooks if Path(notebook).exists()]

# Create paths
artifacts_path = Path(output_folder)
artifacts_path.mkdir(parents=True, exist_ok=True)
artifacts_path.joinpath("success").mkdir(parents=True, exist_ok=True)
artifacts_path.joinpath("failure").mkdir(parents=True, exist_ok=True)

passed_notebooks: List[str] = []
failed_notebooks: List[str] = []

if len(notebooks) > 0:
print(f"Found {len(notebooks)} modified notebooks: {notebooks}")

for notebook in notebooks:
print(f"Running notebook: {notebook}")

# TODO: Handle cases where multiple notebooks have the same name
try:
execute_notebook.execute_notebook(
notebook_file_path=notebook,
output_file_folder=artifacts_path,
replacement_map={
"PROJECT_ID": variable_project_id,
"REGION": variable_region,
},
)
print(f"Notebook finished successfully.")
passed_notebooks.append(notebook)
except Exception as error:
print(f"Notebook finished with failure: {error}")
failed_notebooks.append(notebook)
else:
print("No notebooks modified in this pull request.")

if len(failed_notebooks) > 0:
print(f"{len(failed_notebooks)} notebooks failed:")
print(failed_notebooks)
print(f"{len(passed_notebooks)} notebooks passed:")
print(passed_notebooks)
elif len(passed_notebooks) > 0:
print("All notebooks executed successfully:")
print(passed_notebooks)


parser = argparse.ArgumentParser(description="Run changed notebooks.")
parser.add_argument(
"--test_paths_file",
type=pathlib.Path,
help="The path to the file that has newline-limited folders of notebooks that should be tested.",
required=True,
)
parser.add_argument(
"--base_branch",
help="The base git branch to diff against to find changed files.",
required=False,
)
parser.add_argument(
"--output_folder",
type=pathlib.Path,
help="The path to the folder to store executed notebooks.",
required=True,
)
parser.add_argument(
"--variable_project_id",
type=str,
help="The GCP project id. This is used to inject a variable value into the notebook before running.",
required=True,
)
parser.add_argument(
"--variable_region",
type=str,
help="The GCP region. This is used to inject a variable value into the notebook before running.",
required=True,
)

args = parser.parse_args()
run_changed_notebooks(
test_paths_file=args.test_paths_file,
base_branch=args.base_branch,
output_folder=args.output_folder,
variable_project_id=args.variable_project_id,
variable_region=args.variable_region,
)
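
For reference, a sketch of how a CI step might invoke the script above; the script name, file paths, project id, and region below are placeholders rather than values taken from this PR (in practice the Cloud Build configuration supplies them via substitutions).

# Illustrative invocation only; every path and value below is a placeholder.
import subprocess

subprocess.run(
    [
        "python3",
        ".cloud-build/execute_changed_notebooks_cli.py",  # placeholder script name
        "--test_paths_file=.cloud-build/notebook_test_folders.txt",  # placeholder
        "--base_branch=master",
        "--output_folder=build/notebook_artifacts",  # placeholder
        "--variable_project_id=my-test-project",  # placeholder
        "--variable_region=us-central1",
    ],
    check=True,
)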