New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: python notebooks template repo #1195
Changes from 19 commits
b2e0505
f22fe84
d7c0713
f2e4ea0
790c298
db73c44
7aa53a7
373043e
7e9b33e
567d727
a2d7581
bf74f9d
6b7deab
e62ec81
672ee73
58614b2
1812656
b28ee9e
e187dea
ff9c271
bb613bd
9906270
f03fb89
00cdb68
c3ee69a
721511c
f86f6ac
3c2b225
e8068cb
cbc4220
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
#!/usr/bin/env python | ||
# # Copyright 2021 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
import sys | ||
|
||
MINIMUM_MAJOR_VERSION = 3 | ||
MINIMUM_MINOR_VERSION = 5 | ||
|
||
if ( | ||
sys.version_info.major >= MINIMUM_MAJOR_VERSION | ||
or sys.version_info.minor >= MINIMUM_MINOR_VERSION | ||
): | ||
print(f"Python version acceptable: {sys.version}") | ||
exit(0) | ||
else: | ||
print( | ||
f"Error: Python version less than {MINIMUM_MAJOR_VERSION}.{MINIMUM_MINOR_VERSION}" | ||
) | ||
exit(1) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
steps: | ||
# Install Python dependencies and run cleanup script | ||
- name: ${_PYTHON_IMAGE} | ||
entrypoint: /bin/sh | ||
args: | ||
- -c | ||
- 'python3 -m pip install -U -r .cloud-build/cleanup/cleanup-requirements.txt && python3 .cloud-build/cleanup/cleanup.py' | ||
timeout: 86400s | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
google-cloud-aiplatform | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For most other requirements files, we prefer to pin to specific versions and have them updated by renovate bot. (You can add this non-standard filename to the renovate.json). Is there a reason we're not doing that here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ooh, I think we just weren't aware of the other standards for synthtool. I'll look into changing this in the PR. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
#!/usr/bin/env python | ||
# # Copyright 2021 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
from typing import List | ||
from resource_cleanup_manager import ( | ||
ResourceCleanupManager, | ||
DatasetResourceCleanupManager, | ||
EndpointResourceCleanupManager, | ||
ModelResourceCleanupManager, | ||
) | ||
|
||
|
||
def run_cleanup_managers(managers: List[ResourceCleanupManager], is_dry_run: bool): | ||
for manager in managers: | ||
type_name = manager.type_name | ||
|
||
print(f"Fetching {type_name}'s...") | ||
resources = manager.list() | ||
print(f"Found {len(resources)} {type_name}'s") | ||
for resource in resources: | ||
if not manager.is_deletable(resource): | ||
continue | ||
|
||
if is_dry_run: | ||
resource_name = manager.resource_name(resource) | ||
print(f"Will delete '{type_name}': {resource_name}") | ||
else: | ||
manager.delete(resource) | ||
|
||
print("") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this print statement need to be here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It was probably to add a newline for formatting purposes. Feel free to remove. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. removing, but depending we can add back formatting if it makes things confusing. |
||
|
||
|
||
is_dry_run = False | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will this setting of the dry run override any time it's set to true? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is hard-coded, but could be changed to accept an arg. There was no need at the time for an arg, so it was left hard-coded. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ivanmkc I've rewritten the cleanup file and updated the source files for the repo in the last two commits if you want to take a look! |
||
|
||
if is_dry_run: | ||
print("Starting cleanup in dry run mode...") | ||
|
||
# List of all cleanup managers | ||
managers = [ | ||
DatasetResourceCleanupManager(), | ||
EndpointResourceCleanupManager(), | ||
ModelResourceCleanupManager(), | ||
] | ||
|
||
run_cleanup_managers(managers=managers, is_dry_run=is_dry_run) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
#!/usr/bin/env python | ||
# # Copyright 2021 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
import abc | ||
from google.cloud import aiplatform | ||
from typing import Any | ||
from proto.datetime_helpers import DatetimeWithNanoseconds | ||
from google.cloud.aiplatform import base | ||
|
||
# If a resource was updated within this number of seconds, do not delete. | ||
RESOURCE_UPDATE_BUFFER_IN_SECONDS = 60 * 60 * 8 | ||
|
||
|
||
class ResourceCleanupManager(abc.ABC): | ||
@property | ||
@abc.abstractmethod | ||
def type_name(str) -> str: | ||
pass | ||
|
||
@abc.abstractmethod | ||
def list(self) -> Any: | ||
pass | ||
|
||
@abc.abstractmethod | ||
def resource_name(self, resource: Any) -> str: | ||
pass | ||
|
||
@abc.abstractmethod | ||
def delete(self, resource: Any): | ||
pass | ||
|
||
@abc.abstractmethod | ||
def get_seconds_since_modification(self, resource: Any) -> float: | ||
pass | ||
|
||
def is_deletable(self, resource: Any) -> bool: | ||
time_difference = self.get_seconds_since_modification(resource) | ||
|
||
if self.resource_name(resource).startswith("perm"): | ||
print(f"Skipping '{resource}' due to name starting with 'perm'.") | ||
return False | ||
|
||
# Check that it wasn't created too recently, to prevent race conditions | ||
if time_difference <= RESOURCE_UPDATE_BUFFER_IN_SECONDS: | ||
print( | ||
f"Skipping '{resource}' due update_time being '{time_difference}', which is less than '{RESOURCE_UPDATE_BUFFER_IN_SECONDS}'." | ||
) | ||
return False | ||
|
||
return True | ||
|
||
|
||
class VertexAIResourceCleanupManager(ResourceCleanupManager): | ||
@property | ||
@abc.abstractmethod | ||
def vertex_ai_resource(self) -> base.VertexAiResourceNounWithFutureManager: | ||
pass | ||
|
||
@property | ||
def type_name(self) -> str: | ||
return self.vertex_ai_resource._resource_noun | ||
|
||
def list(self) -> Any: | ||
return self.vertex_ai_resource.list() | ||
|
||
def resource_name(self, resource: Any) -> str: | ||
return resource.display_name | ||
|
||
def delete(self, resource): | ||
resource.delete() | ||
|
||
def get_seconds_since_modification(self, resource: Any) -> bool: | ||
update_time = resource.update_time | ||
current_time = DatetimeWithNanoseconds.now(tz=update_time.tzinfo) | ||
return (current_time - update_time).total_seconds() | ||
|
||
|
||
class DatasetResourceCleanupManager(VertexAIResourceCleanupManager): | ||
vertex_ai_resource = aiplatform.datasets._Dataset | ||
|
||
|
||
class EndpointResourceCleanupManager(VertexAIResourceCleanupManager): | ||
vertex_ai_resource = aiplatform.Endpoint | ||
|
||
def delete(self, resource): | ||
resource.delete(force=True) | ||
|
||
|
||
class ModelResourceCleanupManager(VertexAIResourceCleanupManager): | ||
vertex_ai_resource = aiplatform.Model |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,160 @@ | ||
#!/usr/bin/env python | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This file also looks out-of-date, it should match https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/.cloud-build/ExecuteChangedNotebooks.py There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will do - I think it's just out of date because the PR has been in flight for a minute. |
||
# # Copyright 2021 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
import argparse | ||
import pathlib | ||
import subprocess | ||
from pathlib import Path | ||
from typing import List, Optional | ||
|
||
import execute_notebook | ||
|
||
|
||
def run_changed_notebooks( | ||
test_paths_file: str, | ||
output_folder: str, | ||
variable_project_id: str, | ||
variable_region: str, | ||
base_branch: Optional[str], | ||
): | ||
""" | ||
Run the notebooks that exist under the folders defined in the test_paths_file. | ||
It only runs notebooks that have differences from the Git base_branch. | ||
The executed notebooks are saved in the output_folder. | ||
Variables are also injected into the notebooks such as the variable_project_id and variable_region. | ||
Args: | ||
test_paths_file (str): | ||
Required. The new-line delimited file to folders and files that need checking. | ||
Folders are checked recursively. | ||
base_branch (str): | ||
Optional. If provided, only the files that have changed from the base_branch will be checked. | ||
If not provided, all files will be checked. | ||
output_folder (str): | ||
Required. The folder to write executed notebooks to. | ||
variable_project_id (str): | ||
Required. The value for PROJECT_ID to inject into notebooks. | ||
variable_region (str): | ||
Required. The value for REGION to inject into notebooks. | ||
""" | ||
|
||
test_paths = [] | ||
with open(test_paths_file) as file: | ||
lines = [line.strip() for line in file.readlines()] | ||
lines = [line for line in lines if len(line) > 0] | ||
test_paths = [line for line in lines] | ||
|
||
if len(test_paths) == 0: | ||
raise RuntimeError("No test folders found.") | ||
|
||
print(f"Checking folders: {test_paths}") | ||
|
||
# Find notebooks | ||
notebooks = [] | ||
if base_branch: | ||
print(f"Looking for notebooks that changed from branch: {base_branch}") | ||
notebooks = subprocess.check_output( | ||
["git", "diff", "--name-only", f"origin/{base_branch}...", "--"] | ||
+ test_paths | ||
) | ||
else: | ||
print("Looking for all notebooks.") | ||
notebooks = subprocess.check_output(["git", "ls-files"] + test_paths) | ||
|
||
notebooks = notebooks.decode("utf-8").split("\n") | ||
notebooks = [notebook for notebook in notebooks if notebook.endswith(".ipynb")] | ||
notebooks = [notebook for notebook in notebooks if len(notebook) > 0] | ||
notebooks = [notebook for notebook in notebooks if Path(notebook).exists()] | ||
|
||
# Create paths | ||
artifacts_path = Path(output_folder) | ||
artifacts_path.mkdir(parents=True, exist_ok=True) | ||
artifacts_path.joinpath("success").mkdir(parents=True, exist_ok=True) | ||
artifacts_path.joinpath("failure").mkdir(parents=True, exist_ok=True) | ||
|
||
passed_notebooks: List[str] = [] | ||
failed_notebooks: List[str] = [] | ||
|
||
if len(notebooks) > 0: | ||
print(f"Found {len(notebooks)} modified notebooks: {notebooks}") | ||
|
||
for notebook in notebooks: | ||
print(f"Running notebook: {notebook}") | ||
|
||
# TODO: Handle cases where multiple notebooks have the same name | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there an issue open tracking this todo somewhere? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not at the moment, could track in the synthtool issue tracker. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. agreed, removing and let's track via issues. |
||
try: | ||
execute_notebook.execute_notebook( | ||
notebook_file_path=notebook, | ||
output_file_folder=artifacts_path, | ||
replacement_map={ | ||
"PROJECT_ID": variable_project_id, | ||
"REGION": variable_region, | ||
}, | ||
) | ||
print("Notebook finished successfully.") | ||
passed_notebooks.append(notebook) | ||
except Exception as error: | ||
print(f"Notebook finished with failure: {error}") | ||
failed_notebooks.append(notebook) | ||
else: | ||
print("No notebooks modified in this pull request.") | ||
|
||
if len(failed_notebooks) > 0: | ||
print(f"{len(failed_notebooks)} notebooks failed:") | ||
print(failed_notebooks) | ||
print(f"{len(passed_notebooks)} notebooks passed:") | ||
print(passed_notebooks) | ||
elif len(passed_notebooks) > 0: | ||
print("All notebooks executed successfully:") | ||
print(passed_notebooks) | ||
|
||
|
||
parser = argparse.ArgumentParser(description="Run changed notebooks.") | ||
parser.add_argument( | ||
"--test_paths_file", | ||
type=pathlib.Path, | ||
help="The path to the file that has newline-limited folders of notebooks that should be tested.", | ||
required=True, | ||
) | ||
parser.add_argument( | ||
"--base_branch", | ||
help="The base git branch to diff against to find changed files.", | ||
required=False, | ||
) | ||
parser.add_argument( | ||
"--output_folder", | ||
type=pathlib.Path, | ||
help="The path to the folder to store executed notebooks.", | ||
required=True, | ||
) | ||
parser.add_argument( | ||
"--variable_project_id", | ||
type=str, | ||
help="The GCP project id. This is used to inject a variable value into the notebook before running.", | ||
required=True, | ||
) | ||
parser.add_argument( | ||
"--variable_region", | ||
type=str, | ||
help="The GCP region. This is used to inject a variable value into the notebook before running.", | ||
required=True, | ||
) | ||
|
||
args = parser.parse_args() | ||
run_changed_notebooks( | ||
test_paths_file=args.test_paths_file, | ||
base_branch=args.base_branch, | ||
output_folder=args.output_folder, | ||
variable_project_id=args.variable_project_id, | ||
variable_region=args.variable_region, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we have such a high timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaving as is!