Fix indices in initial task-to-task registration #3410

Merged
@tgaddair merged 1 commit into master from branch-spark-elastic-fix-task2task-registration on Feb 28, 2022

@EnricoMi (Collaborator) commented Feb 16, 2022

In Elastic Spark mode, the initial task-to-task registration must use the actual next task index rather than `index + 1`, because in elastic mode not all task indices necessarily exist at initialisation.
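
Here is a minimal sketch of the idea. It assumes, purely for illustration and not based on the Horovod source, that each task registers with its successor in a ring ordered by task index. The helper `next_registered_index` is hypothetical. It chooses the next index that has actually registered instead of blindly using `index + 1`, and it wraps around to the smallest index at the end of the ring.

```python
from typing import Iterable


def next_registered_index(registered: Iterable[int], index: int) -> int:
    """Hypothetical helper (not Horovod's API): return the next registered
    task index after `index`, wrapping around to the smallest one.

    Assumes tasks form a ring ordered by index; illustration only.
    """
    ordered = sorted(set(registered))
    if index not in ordered:
        raise ValueError(f"task index {index} is not registered")
    pos = ordered.index(index)
    # Step to the successor in the ring of indices that actually exist.
    return ordered[(pos + 1) % len(ordered)]


# Indices 0, 1 and 3 have registered; index 2 does not exist (yet).
registered = [0, 1, 3]
print(next_registered_index(registered, 1))  # 3, whereas 1 + 1 == 2 is missing
print(next_registered_index(registered, 3))  # 0, wraps around the ring
```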

Fixes errors such as this one:

```
Exception in thread Thread-18:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/dist-packages/horovod/spark/runner.py", line 179, in notify_and_register
    next_task_addresses = driver.all_task_addresses(next_task_index)
  File "/usr/local/lib/python3.7/dist-packages/horovod/runner/common/service/driver_service.py", line 112, in all_task_addresses
    return self._all_task_addresses[index].copy()
KeyError: 2
```
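
The failure can be reproduced outside Spark with a hypothetical stand-in for the driver service. `FakeDriver`, its `registered_indices` method, and the address format are invented for illustration. Only the dict lookup `self._all_task_addresses[index].copy()` mirrors the line in the traceback.

```python
class FakeDriver:
    """Hypothetical stand-in for Horovod's driver service (illustration only).

    Task addresses are kept in a dict keyed by task index, mirroring the
    `self._all_task_addresses[index].copy()` lookup shown in the traceback.
    """

    def __init__(self, addresses):
        self._all_task_addresses = addresses

    def all_task_addresses(self, index):
        # Plain dict lookup: raises KeyError for an index that has not registered.
        return self._all_task_addresses[index].copy()

    def registered_indices(self):
        # Hypothetical helper, not part of Horovod's API.
        return sorted(self._all_task_addresses)


# Made-up address format: interface name -> list of (host, port) pairs.
driver = FakeDriver({
    0: {"lo": [("127.0.0.1", 1000)]},
    1: {"lo": [("127.0.0.1", 1001)]},
    3: {"lo": [("127.0.0.1", 1003)]},
})

index = 1
try:
    driver.all_task_addresses(index + 1)  # index 2 never registered
except KeyError as e:
    print(f"KeyError: {e}")  # prints "KeyError: 2"

# Looking up the next index that actually exists avoids the KeyError.
indices = driver.registered_indices()
next_index = next((i for i in indices if i > index), indices[0])
print(next_index, driver.all_task_addresses(next_index))
```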

Signed-off-by: Enrico Minack <github@enrico.minack.dev>
@github-actions

Unit Test Results

773 files ±0, 773 suites ±0, total duration 8h 49m 38s (-17m 21s)
722 tests ±0: 661 passed (-14), 47 skipped (±0), 14 failed (+14)
16 693 runs ±0: 11 761 passed (-15), 4 917 skipped (±0), 15 failed (+15)

For more details on these failures, see this check.

Results for commit 5f72273. ± Comparison against base commit 046c071.

@github-actions

Unit Test Results (with flaky tests)

963 files +88, 963 suites +88, total duration 10h 33m 37s (+29m 52s)
722 tests ±0: 657 passed (-16), 47 skipped (±0), 18 failed (+16)
21 036 runs +1 972: 14 660 passed (+1 336), 6 325 skipped (+588), 51 failed (+48)

For more details on these failures, see this check.

Results for commit 5f72273. ± Comparison against base commit 046c071.

@EnricoMi EnricoMi marked this pull request as ready for review February 17, 2022 08:23
@tgaddair tgaddair merged commit 7b5346e into master Feb 28, 2022
@tgaddair tgaddair deleted the branch-spark-elastic-fix-task2task-registration branch February 28, 2022 23:05