Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested groups hang #56

Open
koenbok opened this issue Apr 24, 2024 · 1 comment
Open

Nested groups hang #56

koenbok opened this issue Apr 24, 2024 · 1 comment

Comments

@koenbok
Copy link
Sponsor

koenbok commented Apr 24, 2024

I ran into an issue where a nested group just started to hang, but it seems to really depend on the amount of workers and tasks. My expectation was that this should work, especially because the initially smaller amounts of tasks worked fine.

(If you lower the n=100 to n=10 it does work on my MacBook Pro M3)

Is this a bug or can I help document this better?

	pool := pond.New(16, 1_000_000)

	groupA := pool.Group()

	for i := 0; i < 100; i++ {
		groupA.Submit(func() {
			fmt.Println("A", i)
			groupB := pool.Group()
			for i := 0; i < 10; i++ {
				groupB.Submit(func() {
					fmt.Println("B", i)
				})
			}
			groupB.Wait()
		})
	}

	groupA.Wait()
@alitto
Copy link
Owner

alitto commented Apr 24, 2024

Hey @koenbok!

I'm afraid nested groups will not work with that setup, since it's reusing the same worker pool for both group levels (A and B). Each worker pool receives submitted tasks via a single channel and these are grabbed by worker goroutines in a FIFO manner.
To visualize it more clearly, this is one of the many possible scenarios of how tasks end up queued in the worker pool's channel. Notice that each task of the group A will be running in a different goroutine (outside the main one), so there's no guarantee about it running before the next iteration of the for loop that creates A tasks.

// 1. Initial state, empty task queue
[]

// 2. Group A task submitted
[ A1 ]

// 3. Group A task submitted
[ A1, A2 ]

// 4. Group A task submitted
[ A1, A2, A3 ]

// 5. Group B1 task submitted
[ A1, A2, A3, B11 ]

// 6. Group B1 task submitted
[ A1, A2, A3, B11, B12, .... ]

To simplify this exercise, lets assume the max number of workers in the pool is 2, although the same thing can happen with larger pool sizes.
Given that each task of group A waits until all its nested B tasks complete (groupB.Wait()), every time a worker goroutine grabs a task of group A, it will block until all its nested B tasks complete. Since we only have 2 workers and given that each worker grabs tasks from the queue in order (FIFO), each one will grab an A task (A1 and A2). Therefore, once these 2 workers submit all their nested B tasks, they will both block. When all workers are blocked (2 in this case), the next tasks in the queue will not be processed because there is no idle worker to grab them.

Possible workaround

One possible workaround would be to have a dedicated worker pool for each task level (A vs B). It would look something like this:

poolA := pond.New(16, 1_000_000)
poolB := pond.New(16, 1_000_000) // You can tweak these parameters as needed

groupA := poolA.Group()

for i := 0; i < 100; i++ {
	groupA.Submit(func() {
		fmt.Println("A", i)
		groupB := poolB.Group()
		for i := 0; i < 10; i++ {
			groupB.Submit(func() {
				fmt.Println("B", i)
			})
		}
		groupB.Wait()
	})
}

groupA.Wait()

Given that each pool will have its own FIFO queue, there shouldn't be any deadlock here 🙂

I hope you find this helpful and please feel free to submit pull requests with improvements to the docs if you want, that would be really appreciated 🙏 and it could help others that came across this use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants