internal/driver: fix rendering freeze in mobile #2473

changkun · 2021-09-15T15:55:52Z

Description:

This is a non-trivial fix. I have to describe my debugging process to formulate why this is a fix of issue 950.
More importantly, this is a similar fix as we have done in #2406.

In the mobile drivers, GL draw calls are executed in the drawloop, which is invoked by the iOS periodically:

fyne/internal/driver/mobile/app/darwin_ios.go

Lines 222 to 238 in eff859f

    
           //export drawloop 
        
           func drawloop() { 
        
           	runtime.LockOSThread() 
        
           	defer runtime.UnlockOSThread() 
        
           	for workAvailable := theApp.worker.WorkAvailable(); ; { 
        
           		select { 
        
           		case <-workAvailable: 
        
           			theApp.worker.DoWork() 
        
           		case <-theApp.publish: 
        
           			theApp.publishResult <- PublishResult{} 
        
           			return 
        
           		case <-time.After(100 * time.Millisecond): // incase the method blocked!! 
        
           			return 
        
           		} 
        
           	} 
        
           }

On Android, the draw loop is executed in the mainUI whenever workAvailable receives an event.

fyne/internal/driver/mobile/app/android.go

Lines 472 to 473 in eff859f

    
           case <-workAvailable: 
        
           	theApp.worker.DoWork()

The workAvailable receives events from a paint.Event, which calls many GLES functions. On mobile systems, GL calls are done in a DoWork call which batches cgo calls for performance reasons. However, if a GL call allocates a resource, then that function cannot be called asynchronously. Hence, DoWork will sync the pace with the paintWindow() in a paint.Event.

From debugging trace, the rendering pause is caused by the paintWindow() pauses forever and never returns. Therefore, a newer paint.Event will never be processed anymore.

In the minimized reproducer (see below), after days of tracing (sometimes very difficult to reproduce if there are more logs), logs indicates that the blocking seems to always happen in func (p *glPainter) drawRectangle, especially the GL calls happens inside drawTextureWithDetails (where calls func (p *glPainter) getTexture -> func (p *glPainter) newTexture -> ...). More importantly, a blocking GL call may be stuck on a channel at

fyne/internal/driver/mobile/gl/work.go

Lines 105 to 106 in eff859f

    
           func (ctx *context) enqueue(c call) uintptr { 
        
           	ctx.work <- c

or

fyne/internal/driver/mobile/gl/work.go

Line 114 in eff859f

return uintptr(<-ctx.retvalue)

forever and never continue.

The following log traces shows how this happened:

G1 workAvaliable success: glfnUseProgram, non-block       (a)
G1 workAvaliable success: glfnBlendFunc, non-block        (b)
G1 workAvaliable failed: glfnActiveTexture, non-block     (c)
G2 DoWork returned                                        (d)
G2 DoWork returned                                        (e)
G1 workAvaliable success: glfnBindTexture, non-block      (f)
G1 workAvaliable success: glfnDrawArrays, non-block       (g)
G2 drawloop called                                        (h)
G2 DoWork returned                                        (i)
G2 DoWork returned                                        (j)
G1 workAvaliable failed: glfnGenTexture, block            (k)
G2 drawloop timeout, return                               (l)
G2 drawloop timeout, return
G2 drawloop timeout, return
G2 drawloop timeout, return
G2 drawloop timeout, return
G2 drawloop timeout, return
...
DoWork is never called again.

G1 and G2 are two goroutines in the above log, and G1 sends GL calls to G2, and G2 processes it. We can have:

=> If workAvailable success twice, then DoWork must return twice. So (a,b) and (d,e) are associated; (f-g) and (i-j) are associated;
=> Hence, (h) and (k) are the relevant process to cause the rendering freeze, and (h) is before (l).
=> The timeout of (l) is because workAvaliable is not receiving. Thus (k) is the cause of the timeout.
=> (k)'s send to workAvaliable failed, and last DoWork return (j) leads to DoWork is never called again.

For a fix, we may quickly think about making the workAvailable's buffer a little bit bigger. But it won't fix the issue forever, this is because the workAvailable may still be ignored if the channel's buffer is full. For this reason, we should never skip a workAvailable event, which makes the channel's internal buffer to become an arbitrary size.

Thankfully, in #2406, we have provided the unbounded channel facility, and therefore the fix would be simple:

Use an unbounded channel for the workAvailable, so that no more workAvailable event can be missed.

Fixes #950

@andydotxyz @Jacalz

Checklist:

Tests included.
Lint and formatter run with no errors.
Tests all pass.

Reproducer

I simplified the button reproducer proposed in #950 as follows (automatic refreshing):

package main

import (
	"image/color"
	"log"
	"time"

	"fyne.io/fyne/v2"
	"fyne.io/fyne/v2/app"
	"fyne.io/fyne/v2/canvas"
	"fyne.io/fyne/v2/container"
	"fyne.io/fyne/v2/layout"
)

func main() {
	win := app.New().NewWindow("Some Window")
	w, h := 90, 25
	cells := make([]fyne.CanvasObject, 0, w*h)
	for i := 0; i < w*h; i++ {
		cells = append(cells, canvas.NewRectangle(color.RGBA{R: 255, G: 255, B: 255, A: 128}))
	}
	container := container.New(layout.NewGridLayout(h), cells...)
	colors := [3]color.RGBA{
		{R: 20, G: 30, B: 100, A: 128},
		{R: 100, G: 20, B: 30, A: 128},
		{R: 30, G: 100, B: 20, A: 128},
	}

	go func() {
		c := 0
		tk := time.NewTicker(100 * time.Millisecond)
		for range tk.C {
			for i := 0; i < w; i++ {
				for j := 0; j < h; j++ {
					obj := container.Objects[i*h+j].(*canvas.Rectangle)
					obj.FillColor = colors[c]
					obj.Refresh()
				}
			}
			log.Println("still refreshing")
			c = (c + 1) % 3
		}
	}()
	win.SetContent(container)
	win.ShowAndRun()
}

changkun · 2021-09-21T18:37:06Z

@andydotxyz ping

andydotxyz · 2021-09-22T10:49:04Z

The code looks good, and it does stop the hang.
However I do get new error messages, are they related and/or a problem?

default	11:47:26.659791+0100	main	Hang detected: 0.35s (always-on hang reporting)

After about 20 of these (which appear a few seconds apart), the demo exits hard.

changkun · 2021-09-22T10:54:37Z

After about 20 of these (which appear a few seconds apart), the demo exits hard.

I haven't encountered such a problem yet, could you describe it in more detail? such as what does it mean by "exits hard"? Is it run out of memory? or something else? "exists hard" triggers no panics?

andydotxyz · 2021-09-22T10:59:10Z

Is it run out of memory? or something else? "exists hard" triggers no panics?

Apologies, since Go 1.16 I don't get panic on iOS logs in the Console - I will downgrade and see what I can gather.

andydotxyz · 2021-09-22T11:18:12Z

It seems to be not crashing but being closed. due to:

Assertion did invalidate due to timeout: 48552-48552-46637 (target:[application<io.fyne.demo>:52123])

It looks like the app was just not responding due to high CPU.
Maybe I cranked up the demo hitting it's a ability to process the refreshes...

andydotxyz · 2021-09-22T11:24:39Z

Yeah, concluding that the issue is separate to the one this solves. If I dial it back to fewer refreshes the messages may appear but the app doesn't crash - I must have pushed the CPU too hard so the heartbeat fails to respond.

andydotxyz · 2021-09-22T11:25:12Z

Thanks, very excited indeed to have this fixed!

internal/driver: fix dead lock in render loop

3e3ee86

changkun changed the title ~~internal/driver: fix dead lock in mobile render loop~~ internal/driver: fix rendering freeze in mobile Sep 17, 2021

andydotxyz approved these changes Sep 22, 2021

View reviewed changes

andydotxyz merged commit d6848f2 into fyne-io:develop Sep 22, 2021

changkun deleted the issue950 branch September 22, 2021 11:34

guevara mentioned this pull request Apr 28, 2023

The Ultimate Channel Abstraction guevara/read-it-later#9279

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

internal/driver: fix rendering freeze in mobile #2473

internal/driver: fix rendering freeze in mobile #2473

changkun commented Sep 15, 2021 •

edited

changkun commented Sep 21, 2021

andydotxyz commented Sep 22, 2021

changkun commented Sep 22, 2021

andydotxyz commented Sep 22, 2021

andydotxyz commented Sep 22, 2021

andydotxyz commented Sep 22, 2021

andydotxyz commented Sep 22, 2021

	//export drawloop
	func drawloop() {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	for workAvailable := theApp.worker.WorkAvailable(); ; {
	select {
	case <-workAvailable:
	theApp.worker.DoWork()
	case <-theApp.publish:
	theApp.publishResult <- PublishResult{}
	return
	case <-time.After(100 * time.Millisecond): // incase the method blocked!!
	return
	}
	}
	}

internal/driver: fix rendering freeze in mobile #2473

internal/driver: fix rendering freeze in mobile #2473

Conversation

changkun commented Sep 15, 2021 • edited

Description:

Checklist:

Reproducer

changkun commented Sep 21, 2021

andydotxyz commented Sep 22, 2021

changkun commented Sep 22, 2021

andydotxyz commented Sep 22, 2021

andydotxyz commented Sep 22, 2021

andydotxyz commented Sep 22, 2021

andydotxyz commented Sep 22, 2021

changkun commented Sep 15, 2021 •

edited