Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sdk): harden internal thread management in SystemMetrics #4439

Merged
merged 12 commits into from Dec 5, 2022
8 changes: 6 additions & 2 deletions wandb/sdk/internal/system/assets/interfaces.py
Expand Up @@ -168,7 +168,11 @@ def start(self) -> None:
logger.info(f"Started {self._process.name}")

def finish(self) -> None:
if self._process is not None:
if self._process is None:
return None
try:
self._process.join()
logger.info(f"Joined {self._process.name}")
self._process = None
except Exception as e:
logger.warning(f"Failed to join {self._process.name}: {e}")
self._process = None
5 changes: 4 additions & 1 deletion wandb/sdk/internal/system/system_monitor.py
Expand Up @@ -161,7 +161,10 @@ def finish(self) -> None:
self._shutdown_event.set()
for asset in self.assets:
asset.finish()
self._process.join()
try:
self._process.join()
except Exception as e:
logger.error(f"Error joining system monitor process: {e}")
self._process = None

def probe(self, publish: bool = True) -> None:
Expand Down
7 changes: 6 additions & 1 deletion wandb/sdk/lib/git.py
Expand Up @@ -62,7 +62,12 @@ def enabled(self) -> bool:
def root(self) -> Optional[str]:
if not self.repo:
return None
return self.repo.git.rev_parse("--show-toplevel")
try:
return self.repo.git.rev_parse("--show-toplevel")
except exc.GitCommandError as e:
# todo: collect telemetry on this
logger.error(f"git root error: {e}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a blocker
we should probably capture some telemetry for this if it is easy..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... what would we do with that information? it usually happens when the home dir of the user running the script is owned by someone else.
it's easy to do, just kind of clumsy: I'd set a flag on the GitRepo object if that happened, then would check for that in Settings as it's one of the places where it's used, and then pick it up in init. Should I still do it?

return None

@property
def dirty(self) -> bool:
Expand Down
2 changes: 1 addition & 1 deletion wandb/sdk/wandb_settings.py
Expand Up @@ -1008,7 +1008,7 @@ def __init__(self, **kwargs: Any) -> None:
self.__unexpected_args: Set[str] = set()

# Set default settings values
# We start off with the class attributes and `default_props`' dicts
# We start off with the class attributes and `default_props` dicts
# and then create Property objects.
# Once initialized, attributes are to only be updated using the `update` method
default_props = self._default_props()
Expand Down