Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: too many Refresh() calls may now cause visual artifacts in the List widget #2548

Closed
nullst opened this issue Oct 13, 2021 · 32 comments
Labels
duplicate This issue or pull request already exists

Comments

@nullst
Copy link
Contributor

nullst commented Oct 13, 2021

Describe the bug:

Consider a List widget with 1000 canvas.Texts, with updateFunc callback set up to update the text and call Refresh() on each text element. Scrolling this list quickly (for example, using a touchpad with large scroll acceleration) sometimes results in visual artifacts that are not resolved on their own any time soon. By that I mean, waiting several seconds does not fix them, and sometimes even moving the mouse over the list (which highlights items on hover) does not refresh the text to the correct state even though the background is being updated.

This is a regression: using the same example with fyne v2.0.4 (specifically, 2d4915c ) or with a recent commit c91d36c (UPD: happens on a recent commit as well if I enlarge the window vertically) does not lead to persistent visual artifacts. In those versions I can glimpse very similar distorted texts while scrolling very fast, but only momentarily, they are fixed the moment I stop scrolling.

Surprisingly, sometimes the visual glitch is fixed (as in: one moment you see a glitch, the other moment you see the correct text) without performing any invocations of updateFunc between those two moments. You can check this by uncommenting a logging line in the example code and watching the output in the terminal window. So perhaps this has something to do with visual cache.

I've attached the example code and a screenshot showing visual glitches on that example.

To Reproduce:

Steps to reproduce the behaviour:

  1. Compile the example code on develop 0b8f09e (same for any commit starting from, including, a113caf )
  2. Run it and scroll very fast.
  3. You may see the visual glitches when you stop scrolling. Scrolling a bit further usually fixes the problem, but just waiting or hovering the cursor does not always work.

Screenshots:

Screen Shot 2021-10-13 at 15 44 34

Example code:

package main

import (
	"fmt"
	"log"

	"fyne.io/fyne/v2"
	"fyne.io/fyne/v2/app"
	"fyne.io/fyne/v2/canvas"
	"fyne.io/fyne/v2/theme"
	"fyne.io/fyne/v2/widget"
)

func main() {
	app := app.New()
	w := app.NewWindow("List with many lines")

	recurringNames := []string{"aaaaaaaaa", "bb", "NNNNNNNNNNNNNNN"}

	list := widget.NewList(
		func() int {
			return 1000
		},
		func() fyne.CanvasObject {
			return canvas.NewText("", theme.ForegroundColor())
		},
		func(id widget.ListItemID, o fyne.CanvasObject) {
			o.(*canvas.Text).Text = fmt.Sprintf("%s %d", recurringNames[id%3], id)
			o.Refresh()
			// log.Printf("Called Refresh() on item ID %d\n", id)
		},
	)

	w.SetContent(list)
	w.Resize(fyne.NewSize(float32(500), float32(300)))
	w.ShowAndRun()
}

Device (please complete the following information):

  • OS: MacOS
  • Version: 10.12 Sierra
  • Go version: go1.15.3 darwin/amd64
  • Fyne version: 0b8f09e
@nullst nullst added the unverified A bug that has been reported but not verified label Oct 13, 2021
@nullst
Copy link
Contributor Author

nullst commented Oct 13, 2021

Further experiments show that the culprit is the commit a113caf . Commits before it have no visual artifacts, while starting from this commit the bug arises. Not sure what exactly is the problem. Paging @changkun who probably understands this better than me.

@changkun
Copy link
Member

I don't think a113caf was the cause. Instead, it is likely that the problem might exist longer than that commit since it only improves performance and misuse cases.

However, it may be a problem that relates to #2509 that widgets have several internal data race issues.
If I run your provided example, using -race tag, I received the following data race errors:

PS E:\dev\changkun.de\tests> ./x
2021/10/13 21:56:37 Fyne error:  Failed to set dark mode
2021/10/13 21:56:37   Cause: The operation completed successfully.
2021/10/13 21:56:37   At: C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window_windows.go:26
==================
WARNING: DATA RACE
Write at 0x00c00049a090 by main goroutine:
  fyne.io/fyne/v2/widget.(*listLayout).offsetUpdated()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/widget/list.go:394 +0xaa
  fyne.io/fyne/v2/widget.(*listLayout).offsetUpdated-fm()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/widget/list.go:390 +0x2f
  fyne.io/fyne/v2/internal/widget.(*Scroll).updateOffset()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/widget/scroller.go:482 +0x2fe
  fyne.io/fyne/v2/internal/widget.(*Scroll).Scrolled()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/widget/scroller.go:465 +0x124
  fyne.io/fyne/v2/internal/driver/glfw.(*window).mouseScrolled()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:946 +0x3c8
  fyne.io/fyne/v2/internal/driver/glfw.(*window).mouseScrolled-fm()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:920 +0x67
  github.com/go-gl/glfw/v3.3/glfw.goScrollCB()
      C:/Users/Changkun Ou/go/pkg/mod/github.com/go-gl/glfw/v3.3/glfw@v0.0.0-20210410170116-ea3d685f79fb/input.go:353 +0x71
  _cgoexp_c09f62143e0e_goScrollCB()
      _cgo_gotypes.go:2585 +0x7e
  runtime.cgocallbackg1()
      C:/Program Files/Go/src/runtime/cgocall.go:306 +0x2ab
  github.com/go-gl/glfw/v3.3/glfw.PollEvents()
      C:/Users/Changkun Ou/go/pkg/mod/github.com/go-gl/glfw/v3.3/glfw@v0.0.0-20210410170116-ea3d685f79fb/window.go:949 +0x29
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).tryPollEvents()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:243 +0x48
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).runGL()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:109 +0x256
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).Run()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/driver.go:83 +0x39
  fyne.io/fyne/v2/internal/driver/glfw.(*window).ShowAndRun()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:461 +0x55
  main.main()
      E:/dev/changkun.de/tests/main.go:35 +0x30a

Previous read at 0x00c00049a090 by goroutine 9:
  fyne.io/fyne/v2/widget.(*listLayout).updateList()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/widget/list.go:428 +0x244
  fyne.io/fyne/v2/widget.(*listLayout).Layout()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/widget/list.go:369 +0x33
  fyne.io/fyne/v2/internal/driver/common.updateLayout()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/common/canvas.go:512 +0x12b
  fyne.io/fyne/v2/internal/driver/common.(*Canvas).EnsureMinSize.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/common/canvas.go:97 +0x358
  fyne.io/fyne/v2/internal/driver/common.(*Canvas).walkTree.func2()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/common/canvas.go:433 +0x1c8
  fyne.io/fyne/v2/internal/driver.walkObjectTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:197 +0x3a1
  fyne.io/fyne/v2/internal/driver.walkObjectTree.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:176 +0x454
  fyne.io/fyne/v2/internal/driver.walkObjectTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:190 +0x460
  fyne.io/fyne/v2/internal/driver.walkObjectTree.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:176 +0x454
  fyne.io/fyne/v2/internal/driver.walkObjectTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:190 +0x460
  fyne.io/fyne/v2/internal/driver.walkObjectTree.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:176 +0x454
  fyne.io/fyne/v2/internal/driver.walkObjectTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:190 +0x460
  fyne.io/fyne/v2/internal/driver.walkObjectTree.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:176 +0x454
  fyne.io/fyne/v2/internal/driver.walkObjectTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:190 +0x460
  fyne.io/fyne/v2/internal/driver.WalkVisibleObjectTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:134 +0x7d
  fyne.io/fyne/v2/internal/driver/common.(*Canvas).walkTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/common/canvas.go:439 +0x211
  fyne.io/fyne/v2/internal/driver/common.(*Canvas).WalkTrees()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/common/canvas.go:351 +0x67
  fyne.io/fyne/v2/internal/driver/common.(*Canvas).EnsureMinSize()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/common/canvas.go:102 +0x152
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).repaintWindow.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:166 +0x54
  fyne.io/fyne/v2/internal/driver/glfw.(*window).RunWithContext()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:1329 +0x6e
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).repaintWindow()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:165 +0x6c
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).startDrawThread.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:228 +0x3e4

Goroutine 9 (running) created at:
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).startDrawThread()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:192 +0x158
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).initGLFW.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:83 +0x65
  sync.(*Once).doSlow()
      C:/Program Files/Go/src/sync/once.go:68 +0x127
  sync.(*Once).Do()
      C:/Program Files/Go/src/sync/once.go:59 +0x46
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).initGLFW()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:75 +0x5d
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).createWindow.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:1383 +0x6e
  fyne.io/fyne/v2/internal/driver/glfw.runOnMain()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:55 +0x17a
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).createWindow()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:1382 +0x1ae
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).CreateWindow()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:1374 +0x45
  fyne.io/fyne/v2/app.(*fyneApp).NewWindow()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/app/app.go:57 +0x57
  main.main()
      E:/dev/changkun.de/tests/main.go:15 +0x46
==================
==================
WARNING: DATA RACE
Write at 0x00c00060e0a8 by goroutine 9:
  fyne.io/fyne/v2/widget.(*listLayout).updateList()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/widget/list.go:475 +0xb64
  fyne.io/fyne/v2/widget.(*listLayout).Layout()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/widget/list.go:369 +0x33
  fyne.io/fyne/v2/internal/driver/common.updateLayout()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/common/canvas.go:512 +0x12b
  fyne.io/fyne/v2/internal/driver/common.(*Canvas).EnsureMinSize.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/common/canvas.go:97 +0x358
  fyne.io/fyne/v2/internal/driver/common.(*Canvas).walkTree.func2()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/common/canvas.go:433 +0x1c8
  fyne.io/fyne/v2/internal/driver.walkObjectTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:197 +0x3a1
  fyne.io/fyne/v2/internal/driver.walkObjectTree.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:176 +0x454
  fyne.io/fyne/v2/internal/driver.walkObjectTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:190 +0x460
  fyne.io/fyne/v2/internal/driver.walkObjectTree.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:176 +0x454
  fyne.io/fyne/v2/internal/driver.walkObjectTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:190 +0x460
  fyne.io/fyne/v2/internal/driver.walkObjectTree.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:176 +0x454
  fyne.io/fyne/v2/internal/driver.walkObjectTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:190 +0x460
  fyne.io/fyne/v2/internal/driver.walkObjectTree.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:176 +0x454
  fyne.io/fyne/v2/internal/driver.walkObjectTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:190 +0x460
  fyne.io/fyne/v2/internal/driver.WalkVisibleObjectTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/util.go:134 +0x7d
  fyne.io/fyne/v2/internal/driver/common.(*Canvas).walkTree()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/common/canvas.go:439 +0x211
  fyne.io/fyne/v2/internal/driver/common.(*Canvas).WalkTrees()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/common/canvas.go:351 +0x67
  fyne.io/fyne/v2/internal/driver/common.(*Canvas).EnsureMinSize()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/common/canvas.go:102 +0x152
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).repaintWindow.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:166 +0x54
  fyne.io/fyne/v2/internal/driver/glfw.(*window).RunWithContext()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:1329 +0x6e
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).repaintWindow()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:165 +0x6c
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).startDrawThread.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:228 +0x3e4

Previous read at 0x00c00060e0a8 by main goroutine:
  fyne.io/fyne/v2.(*Container).MinSize()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/container.go:71 +0xa4
  fyne.io/fyne/v2/internal/widget.(*Scroll).updateOffset()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/widget/scroller.go:480 +0x237
  fyne.io/fyne/v2/internal/widget.(*Scroll).Scrolled()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/widget/scroller.go:465 +0x124
  fyne.io/fyne/v2/internal/driver/glfw.(*window).mouseScrolled()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:946 +0x3c8
  fyne.io/fyne/v2/internal/driver/glfw.(*window).mouseScrolled-fm()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:920 +0x67
  github.com/go-gl/glfw/v3.3/glfw.goScrollCB()
      C:/Users/Changkun Ou/go/pkg/mod/github.com/go-gl/glfw/v3.3/glfw@v0.0.0-20210410170116-ea3d685f79fb/input.go:353 +0x71
  _cgoexp_c09f62143e0e_goScrollCB()
      _cgo_gotypes.go:2585 +0x7e
  runtime.cgocallbackg1()
      C:/Program Files/Go/src/runtime/cgocall.go:306 +0x2ab
  github.com/go-gl/glfw/v3.3/glfw.PollEvents()
      C:/Users/Changkun Ou/go/pkg/mod/github.com/go-gl/glfw/v3.3/glfw@v0.0.0-20210410170116-ea3d685f79fb/window.go:949 +0x29
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).tryPollEvents()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:243 +0x48
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).runGL()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:109 +0x256
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).Run()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/driver.go:83 +0x39
  fyne.io/fyne/v2/internal/driver/glfw.(*window).ShowAndRun()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:461 +0x55
  main.main()
      E:/dev/changkun.de/tests/main.go:35 +0x30a

Goroutine 9 (running) created at:
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).startDrawThread()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:192 +0x158
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).initGLFW.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:83 +0x65
  sync.(*Once).doSlow()
      C:/Program Files/Go/src/sync/once.go:68 +0x127
  sync.(*Once).Do()
      C:/Program Files/Go/src/sync/once.go:59 +0x46
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).initGLFW()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:75 +0x5d
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).createWindow.func1()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:1383 +0x6e
  fyne.io/fyne/v2/internal/driver/glfw.runOnMain()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/loop.go:55 +0x17a
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).createWindow()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:1382 +0x1ae
  fyne.io/fyne/v2/internal/driver/glfw.(*gLDriver).CreateWindow()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/internal/driver/glfw/window.go:1374 +0x45
  fyne.io/fyne/v2/app.(*fyneApp).NewWindow()
      C:/Users/Changkun Ou/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/app/app.go:57 +0x57
  main.main()
      E:/dev/changkun.de/tests/main.go:15 +0x46
==================

More importantly, I also observed the artifact on 2.1.0 without the changes from the mentioned commit.

@andydotxyz
Copy link
Member

Can you please test the tip of release/v2.1.x (version e3f513e7)? This should already be resolved.

@changkun
Copy link
Member

I just tested on release/v2.1.x tip, the problem still exists

image

@nullst
Copy link
Contributor Author

nullst commented Oct 14, 2021

More importantly, I also observed the artifact on 2.1.0 without the changes from the mentioned commit.

Interesting! Originally, I wasn't able to replicate the issue with 2.1.0 on my machine, but your screenshot helped: if I enlarge the window vertically to fit more lines into the visible part of the list, then I can get the same issues on 2.1.0. Though on v2.0.4 I seem to have no issues regardless of the list size, so this is still a regression.

A random observation: it seems that replacing canvas.Text in the provided example with a widget.Label completely fixes the visual glitches on my machine, and this seems to work universally, on the tip of develop, tip of release/v2.1.x, or any earlier commits as well. Note that the label widget introduces more padding, so I need to enlarge the window to have a valid comparison, but it still works. Though if the issue arises from a data race, this may not mean anything, just the change of some timings that accidentally avoids the race. Here's an example code:

package main

import (
	"fmt"
	"log"

	"fyne.io/fyne/v2"
	"fyne.io/fyne/v2/app"
	"fyne.io/fyne/v2/canvas"
	"fyne.io/fyne/v2/theme"
	"fyne.io/fyne/v2/widget"
)

func main() {
	app := app.New()
	w := app.NewWindow("List with many lines")

	recurringNames := []string{"aaaaaaaaa", "bb", "NNNNNNNNNNNNNNN"}

	useLabel := true

	list := widget.NewList(
		func() int {
			return 1000
		},
		func() fyne.CanvasObject {
			if useLabel {
				return widget.NewLabel("")
			}
			return canvas.NewText("", theme.ForegroundColor())
		},
		func(id widget.ListItemID, o fyne.CanvasObject) {
			if useLabel {
				o.(*widget.Label).Text = fmt.Sprintf("%s %d", recurringNames[id%3], id)
			} else {
				o.(*canvas.Text).Text = fmt.Sprintf("%s %d", recurringNames[id%3], id)
			}
			o.Refresh()
			log.Printf("Called Refresh() on item ID %d\n", id)
		},
	)

	w.SetContent(list)
	w.Resize(fyne.NewSize(float32(500), float32(300)))
	w.ShowAndRun()
}

@nullst
Copy link
Contributor Author

nullst commented Oct 14, 2021

An observation, maybe of value, maybe obvious: the visual glitches disappear for me after disabling the texture cache in fyne. By that I mean, if I edit the function getTexture in internal/painter/gl/gl_common.go as following:

func (p *glPainter) getTexture(object fyne.CanvasObject, creator func(canvasObject fyne.CanvasObject) Texture) Texture {
	texture, ok := cache.GetTexture(object)

	if !ok || ok {    // MODIFIED CONDITION TO BE ALWAYS TRUE
		texture = cache.TextureType(creator(object))
		cache.SetTexture(object, texture, p.canvas)
	}
	return Texture(texture)
}

Then the glitches completely disappear, both the temporary ones during scrolling and the "persistent" ones that remain at the end of the scroll. I guess texture cache is a natural origin for problems as described, so maybe it was obvious, but perhaps this could be helpful in figuring out the source of the problem.

@andydotxyz
Copy link
Member

Yes, this does look like a texture re-use issue - they are not invalidating at the right time perhaps.

@andydotxyz
Copy link
Member

Looks like it could be a duplicate of #2529 - setting Text then calling Refresh().
The thing I thought we had fixed is just Label.SetText.

@andydotxyz andydotxyz added duplicate This issue or pull request already exists and removed unverified A bug that has been reported but not verified labels Oct 15, 2021
@nullst
Copy link
Contributor Author

nullst commented Oct 16, 2021

I've done a bunch of experiments, and it seems that I found the source of this particular issue:

  • Scrolling the list sends messages to the unbounded channel refreshQueue (in internal/driver/common/canvas.go) MUCH faster than they are consumed, so when user calls Text.Refresh() this message may easily wait for >30 seconds before it is processed.
  • The problem is due to a subtle mistake in FreeDirtyTextures() function, which is the only consumer of that unbounded channel.
  • The wrong logic causes FreeDirtyTextures to finish prematurely: it consumes some messages from the refreshQueue, but often it leaves a vast majority of messages unprocessed.

Diagnostic

This can be confirmed by adding some logging in FreeDirtyTextures, though you'll also need to modify internal/async/chan_canvasobject.go to allow access to the number of items in the internal buffer. Scrolling the list will produce scary messages like that:

...
FreeDirtyTextures freed 17 out of 9230
FreeDirtyTextures freed 34 out of 10080
FreeDirtyTextures freed 17 out of 10200
FreeDirtyTextures freed 17 out of 11070
FreeDirtyTextures freed 17 out of 11330
FreeDirtyTextures freed 59 out of 12600
FreeDirtyTextures freed 199 out of 12890
FreeDirtyTextures freed 158 out of 13020
FreeDirtyTextures freed 42 out of 13290
FreeDirtyTextures freed 23 out of 14010
...

(If you casually move the cursor over the list, hovering over different elements, the amount of unprocessed refresh events will eventually decrease, but very slowly.)

Logic mistake

Here's a sketch of FreeDirtyTextures:

func (c *Canvas) FreeDirtyTextures() bool {
	for {
		select {
		case object := <-c.refreshQueue.Out():
			...
		default:
			...
			return
		}
	}
}

It seems like the intent behind this code was to consume all messages from the refreshQueue before finishing in the default case. But this is not a valid implementation since unbounded channels are trickier.

Namely, the refreshQueue is not constantly trying to send messages to the Out channel, it only tries to send a message if it's not busy storing messages from the in-channel (probably to avoid heavy synchronization for in and out workflows). Thus the select loop in FreeDirtyTextures reaches the default case not when the refresh queue is empty, but at the moment when the select loop step is started at a moment when refreshQueue is "busy".

Since scrolling the list produces a storm of Refresh() events, this causes FreeDirtyTextures to finish prematurely all the time. Though note that even without a huge list in any nontrivial fyne application new messages arrive to the refreshQueue often enough to stop FreeDirtyTextures constantly.

Possible fix

During testing I introduced a new exported method, Estimate() int on *UnboundedCanvasObjectChan, that returns a number guaranteed to be <= number of items in the internal buffer. With this method already implemented, a natural replacement for the wrong logic in FreeDirtyTextures is to replace it by the following:

// FreeDirtyTextures frees dirty textures.
func (c *Canvas) FreeDirtyTextures() bool {
	estimate := c.refreshQueue.Estimate()
	// Since FreeDirtyTextures is the only consumer of refreshQueue,
	// consuming this many elements from the refreshQueue will not block.
	for i := 0; i < estimate; i++ {
		object := <-c.refreshQueue.Out()
		freeWalked := func(obj fyne.CanvasObject, _ fyne.Position, _ fyne.Position, _ fyne.Size) bool {
			c.painter.Free(obj)
			return false
		}
		driver.WalkCompleteObjectTree(object, freeWalked, nil)
	}

	cache.RangeExpiredTexturesFor(c.impl, func(obj fyne.CanvasObject) {
		c.painter.Free(obj)
	})
	return (estimate > 0)
}

On my machine this modified version of FreeDirtyTextures solves the issue.

Implementation of refreshQueue.Estimate()

Disclaimer: I know that chan_canvasobject.go is an autogenerated file, so for a proper fix this needs to be implemented in internal/async/gen.go. But I don't know how the code generation works in go, sorry. Just running go generate ./... in the root of the fyne repository gives me a bunch of errors, and I don't know anything more advanced.

With that notice in mind, here's a diff that introduces Estimate() method for the unbounded channel:

diff --git a/internal/async/chan_canvasobject.go b/internal/async/chan_canvasobject.go
index d695a0d4..3d3a8fe4 100755
--- a/internal/async/chan_canvasobject.go
+++ b/internal/async/chan_canvasobject.go
@@ -3,6 +3,7 @@
 package async

 import "fyne.io/fyne/v2"
+import "sync"

 // UnboundedCanvasObjectChan is a channel with an unbounded buffer for caching
 // CanvasObject objects.
@@ -10,8 +11,13 @@ type UnboundedCanvasObjectChan struct {
        in, out chan fyne.CanvasObject
        close   chan struct{}
        q       []fyne.CanvasObject
+
+       estimateLock sync.Mutex
+       estimate     int // a number guaranteed to be <= len(ch.q) at all times
 }

+const estimatePrecision = 10
+
 // NewUnboundedCanvasObjectChan returns a unbounded channel with unlimited capacity.
 func NewUnboundedCanvasObjectChan() *UnboundedCanvasObjectChan {
        ch := &UnboundedCanvasObjectChan{
@@ -61,6 +67,11 @@ func (ch *UnboundedCanvasObjectChan) processing() {
                        select {
                        case ch.out <- ch.q[0]:
                                ch.q[0] = nil // de-reference earlier to help GC
+                               if len(ch.q)%estimatePrecision == 1 {
+                                       ch.estimateLock.Lock()
+                                       ch.estimate = len(ch.q) - 1
+                                       ch.estimateLock.Unlock()
+                               }
                                ch.q = ch.q[1:]
                        case e, ok := <-ch.in:
                                if !ok {
@@ -70,6 +81,11 @@ func (ch *UnboundedCanvasObjectChan) processing() {
                                        panic("async: misuse of unbounded channel, In() was closed")
                                }
                                ch.q = append(ch.q, e)
+                               if len(ch.q)%estimatePrecision == 0 {
+                                       ch.estimateLock.Lock()
+                                       ch.estimate = len(ch.q)
+                                       ch.estimateLock.Unlock()
+                               }
                        case <-ch.close:
                                ch.closed()
                                return
@@ -83,6 +99,15 @@ func (ch *UnboundedCanvasObjectChan) processing() {
        }
 }

+// Estimate returns the number guaranteed to be less than or equal to
+// the number of items in the internal buffer ready to be consumed.
+func (ch *UnboundedCanvasObjectChan) Estimate() int {
+       ch.estimateLock.Lock()
+       result := ch.estimate
+       ch.estimateLock.Unlock()
+       return result
+}
+
 func (ch *UnboundedCanvasObjectChan) closed() {
        close(ch.in)
        for e := range ch.in {

@andydotxyz
Copy link
Member

This is excellent work thanks @nullst.
The approach sounds good - though "Estimate" is not a particularly descriptive name - perhaps "EstimatedLength" or "LengthMin" or similar?

@changkun
Copy link
Member

It is an interesting analysis, but I am not sure about all details here:

was to consume all messages from the refreshQueue before finishing in the default case

If we have one side of the refresh queue is producing objects to refresh, and the other side is consuming it. What does it mean by "all messages"? We don't know the future, thus "all messages" seem to be less accurate in this case.

it only tries to send a message if it's not busy storing messages from the in-channel

This is not entirely true because the in and out channels are selected in a select statement, which is selected under the fairness guarantee of the select statement:

select {
case ch.out <- ch.q[0]:
ch.q[0] = nil // de-reference earlier to help GC
ch.q = ch.q[1:]
case e, ok := <-ch.in:

this causes FreeDirtyTextures to finish prematurely all the time

Indeed the FreeDirtyTextures may return early, but if the queue is going to be consumed eventually, why calling FreeDirtyTextures in the next frame can still leads to artifacts?

--

Regarding the proposed fix, estimating the current length of the queue seems a little bit overkill using a default estimation number (10) and a Mutex lock. Using atomics is good engouh to get an estimated length of the queue buffer, see package chann as an example implementation.

Nevertheless, after rethinking the unbounded channel implementation, I suspect the actual reason may be current implementation slightly changes the channel's memory model behavior, because the in and out channels are buffered channel:

in: make(chan fyne.CanvasObject, 16),
out: make(chan fyne.CanvasObject, 16),

Hence a real fix (although I have don't know and not yet tested if it may leads to other issues), might as simple as turning them into unbuffered channel, or just turning the out channel to an unbounded channel.

@nullst
Copy link
Contributor Author

nullst commented Oct 17, 2021

This is not entirely true because the in and out channels are selected in a select statement, which is selected under the fairness guarantee of the select statement

I'll try to defend my interpretation. There are three participants: processing loop of an unbounded channel, the function Refresh() that sends messages, and the function FreeDirtyTextures that consumes messages. There is a fundamental difference between Refresh and FreeDirtyTextures that cannot be compensated by the fairness of the processing loop:

  • Refresh blocks on sending a message to the in-channel of refreshQueue. If the refreshQueue processing loop is performing other work at the moment (and the internal buffer of the in-channel is full, which during list scrolling is, I think, almost always true), Refresh will persist and wait. Furthermore, there could be multiple goroutines performing Refresh() and waiting on a result.
  • FreeDirtyTextures tries to consume a message from the out-channel of refreshQueue, but it has a default case as well. If the refreshQueue processing loop is performing other work at the moment (and the internal buffer of the out-channel is empty) FreeDirtyTextures will stop trying. Furthermore, this function is only called in the draw thread, so in a single goroutine.

So a random moment during the scrolling of the list looks kind of like a bunch of parallel Refresh()es all trying to send the messages to the in-channel of refreshQueue, each waiting for their turn since the in-buffer is full, and from time to time FreeDirtyTextures comes and tries to sneak directly to the refreshQueue, but give up if it's busy. What I'm saying is, at the start of the select loop you linked to there is very often a Refresh() instance waiting for its message to refreshQueue to be processed, but FreeDirtyTextures may only attempt to read messages for example 1ms later. That's not quite correct since the channels are buffered, but the picture is roughly like that.

Having a buffered out-channel actually helps: in the log fragment in "Diagnostic" session of the post above we see FreeDirtyTextures reporting 17 textures freed during the run four times out of ten. (UPDATE: I didn't make it clear, but the log entries mean "refreshQueue has 11330 tasks, but FreeDirtyTextures only freed 17 of them". It's not a total number of textures, it's a number of elements stuck in refreshQueue.) I didn't notice that at the time, but it seems to indicate FreeDirtyTextures is consuming buffered messages in the out-buffer, then maybe one more message put to the buffer during the execution (or maybe my logging code had an off-by-one mistake?), and then immediately stops executing. Though you are right that my variant of FreeDirtyTextures need fixing ("estimate" does not count elements buffered in the out-channel, so we should try to consume more than estimated to avoid messages stuck in the buffer).

Indeed the FreeDirtyTextures may return early, but if the queue is going to be consumed eventually, why calling FreeDirtyTextures in the next frame can still leads to artifacts?

Not only FreeDirtyTextures may return early sometimes, but in situations like "scrolling a list quickly" it will return early basically every frame for a long time (can easily happen for >30 seconds), as logged in the "Diagnostic" part of the comment above. And that often happens even if there is no list of widgets: FreeDirtyTextures is executed only on a redraw, which is only done when canvas is marked as dirty, which usually happens because a Refresh message is sent, but in practice as soon as a fyne application sends one refresh event, it also sends lots of other refresh messages immediately after it, and each of the follow-up messages has a chance to stop FreeDirtyTextures.

estimating the current length of the queue seems a little bit overkill using a default estimation number (10) and a Mutex lock.

Makes sense, atomics are definitely better here, I just used the most straightforward implementation with mutexes.

@smcio
Copy link
Contributor

smcio commented Oct 17, 2021

@nullst Nice work :). I can confirm that your patch does indeed fix the problem.

I was also experiencing this problem with widget.Table in my application. The problem would occur when scrolling up and down quickly, and similarly to in your report, data intended for cells would appear in the wrong cell, sometimes larger than intended. As you also noted, the problem only occurred when using the 'develop' branch, and not when using the 'master' branch. Applying your fix has gotten rid of this effect.

@changkun
Copy link
Member

Although for me 2.1 (master) is problematic for me, if develop is problematic but ok with master branch. Considering the latest change of unbounded channel mentioned by @nullst , does the following change offer a workaround as a fix?

-in:    make(chan fyne.CanvasObject, 16), 
-out:   make(chan fyne.CanvasObject, 16), 
+in:    make(chan fyne.CanvasObject, 128), 
+out:   make(chan fyne.CanvasObject, 128), 

@nullst
Copy link
Contributor Author

nullst commented Oct 17, 2021

Although for me 2.1 (master) is problematic for me, if develop is problematic but ok with master branch. Considering the latest change of unbounded channel mentioned by @nullst , does the following change offer a workaround as a fix?

-in:    make(chan fyne.CanvasObject, 16), 
-out:   make(chan fyne.CanvasObject, 16), 
+in:    make(chan fyne.CanvasObject, 128), 
+out:   make(chan fyne.CanvasObject, 128), 

For me this does not fix the problem: it takes slightly longer to reach it, and the visual glitches are fixed faster, but if I enlarge the list vertically and scroll back and forth quickly, it does not take a lot of time to reach messages like

2021/10/17 13:29:42 FreeDirtyTextures finished prematurely: only 179 out of at least 9180
2021/10/17 13:29:42 FreeDirtyTextures finished prematurely: only 129 out of at least 9520
2021/10/17 13:29:42 FreeDirtyTextures finished prematurely: only 311 out of at least 10160
2021/10/17 13:29:42 FreeDirtyTextures finished prematurely: only 916 out of at least 10900
2021/10/17 13:29:42 FreeDirtyTextures finished prematurely: only 129 out of at least 10450
2021/10/17 13:29:42 FreeDirtyTextures finished prematurely: only 176 out of at least 11370
2021/10/17 13:29:42 FreeDirtyTextures finished prematurely: only 301 out of at least 11740
2021/10/17 13:29:42 FreeDirtyTextures finished prematurely: only 128 out of at least 12940
2021/10/17 13:29:42 FreeDirtyTextures finished prematurely: only 130 out of at least 13240
2021/10/17 13:29:42 FreeDirtyTextures finished prematurely: only 153 out of at least 15200
2021/10/17 13:29:42 FreeDirtyTextures finished prematurely: only 389 out of at least 15440

Fyne seems to call Refresh() a lot.

@changkun
Copy link
Member

2021/10/17 13:29:42 FreeDirtyTextures finished prematurely: only 153 out of at least 15200
2021/10/17 13:29:42 FreeDirtyTextures finished prematurely: only 389 out of at least 15440

The data provided here is much higher than what was introduced before the unbounded channel, the original size is 4098, see

c.Initialize(c, 4096, c.overlayChanged)
. Why there was no problem before the unbounded channel was introduced?

As being suspected before, if the previous FreeDirtyTexture is relying on a behavior of the memory model, and the unbounded channel slightly changes it. Instead of changing two parties, is guarantee the unbounded channel matches how a buffered channel behaved a better fix in this case?

@nullst
Copy link
Contributor Author

nullst commented Oct 17, 2021

UPDATE: I think I know how to answer the question more concisely. For unbounded channels as implemented:

  1. A blocking receive behaves exactly like a receive from a standard buffered channel, modulo being a tiny bit slower.
  2. A nonblocking receive from the Out() channel, on the other hand, has very different semantics than nonblocking receives from a standard buffered channel! Namely, a nonblocking receive may fail even if the internal buffer is not empty.

When an unbounded channel is not congested, nonblocking receives behave confusingly similar to nonblocking receives from a buffered channel. But if there are, say, 500 goroutines desperately trying to send messages via the unbounded channel, a concurrent nonblocking receive may, and in practice will, fail. The select statement in the processing loop is fair, but if there is a long queue of goroutines trying to send, then at the moment a nonblocking receive is attempted, the loop will probably not be chilling in the select statement, it would be spending its time in lines 66-72.

As for modifying the unbounded channels to mimic the behavior of buffered channels: if that's easy to do, sure, that would be fine. But if this costs a lot of performance, maybe documenting the unusual behavior of nonblocking receives in the Out() function is good enough? A nonblocking receive is a rare operation, even FreeDirtyTextures does not really need it, it just wants to execute all tasks from the refreshQueue and then stop, so we can also solve this by providing FreeDirtyTextures with the number of tasks to do.


As being suspected before, if the previous FreeDirtyTexture is relying on a behavior of the memory model, and the unbounded channel slightly changes it. Instead of changing two parties, is guarantee the unbounded channel matches how a buffered channel behaved a better fix in this case?

Sorry, what do you mean by a memory model?

I think I can explain why there was no problem before the unbounded channel was introduced. My impression of the situation is as follows:

  1. FreeDirtyTextures has an infinite select loop. The select statement either (a) consumes a message from refreshQueue.Out(); or (b) stop processing if at the moment select is called, nothing can be consumed from .Out() channel.
  2. For standard buffered channels in golang: if nothing can be consumed from the channel, it means the internal buffer is empty (or was empty some microseconds ago if multiple goroutines work with that channel).
  3. For unbounded channels as implemented in
    select {
    case ch.out <- ch.q[0]:
    ch.q[0] = nil // de-reference earlier to help GC
    ch.q = ch.q[1:]
    case e, ok := <-ch.in:
    if !ok {
    : if nothing can be consumed from the channel, it may mean (A) that the internal buffer is empty, but could also mean (B) that processing() loop is currently processing a message to the in channel (lines 66-72 of that file).

The log fragments I posted seem to indicate that the infinite select loop in FreeDirtyTextures stops prematurely, while the internal buffer of researchQueue has thousands of items. The explanation I propose: very often nothing is being sent to Out() channel of researchQueue for the reason (B). Is this what you mean by "memory model"?

@changkun
Copy link
Member

changkun commented Oct 17, 2021

For memory models, see https://golang.org/ref/mem.

nothing is being sent to Out() channel of refreshQueue for the reason that processing() loop is currently processing a message to the in channel

That's an interesting observation. Let's do a thought experiment. If we have 10000 Refresh tasks, before the unbounded queue, we can only fill 4096 tasks, then the next Refresh will be blocked. On the receiver side, Out() consumes all existing tasks, which makes the channel becomes empty. Then terminate the loop in the default case.

Since the channel has been cleared previously, the blocked Refresh call can proceed, then SetDirty will guarantee a future frame to keep work on the remaining Refresh tasks.

Now, let's back to the unbounded channel, but the previously described case remains applicable to me.

FreeDirtyTextures is executed only on a redraw, which is only done when the canvas is marked as dirty, which usually happens because a Refresh message is sent, but in practice as soon as a fyne application sends one refresh event, it also sends lots of other refresh messages immediately after it [...]

According to the previous comments, if I understand the description correctly, that is describing: It can happen, if and only if, a task is being sent to the unbounded channel, but SetDirty failed to notify the next frame to further consume remaining tasks.

If true, then this sounds like a subtle bug in the SetDirty notification mechanism, where SetDirty is an atomic variable which cannot always collaborate with the refreshQueue itself, rather than a problem of the unbounded channel. Does the suggested change fixes this problem?

@nullst
Copy link
Contributor Author

nullst commented Oct 17, 2021

if I understand the description correctly, that is describing: It can happen, if and only if, a task is being sent to the unbounded channel, but SetDirty failed to notify the next frame to further consume remaining tasks.

No, that's not what I'm saying. First, let me argue that SetDirty is irrelevant:

  1. It's easy to modify the draw loop to always consider canvas as dirty. In fact, that's one of the things I've tried before I found a problem with FreeDirtyTextures(). This does not solve the issue. This just calls FreeDirtyTextures() marginally more often, but my point is that FreeDirtyTextures usually returns prematurely, without doing the work it is expected to do. Thus the visual glitches are fixed slightly faster, but not significantly.
  2. We have hard data: if you add logging to FreeDirtyTextures() that reports how many objects were freed during the function execution and how many objects are in refreshQueue, as I've done in the "Diagnostic" section of the comment above, you will see that FreeDirtyTextures() often reaches default case with thousands of items still stuck in the refreshQueue. How would you explain those premature returns? Note that neither FreeDirtyTextures nor researchQueue depend on the dirty flags.

Each frame is correctly marked as dirty, each time FreeDirtyTextures() is called to remove old objects. My explanation of the bug is not the first mechanism being wrong, but FreeDirtyTextures() not performing the work it is expected to do. Again, consider what are possible reasons for FreeDirtyTextures() to return while there are thousands of items in the queue? I believe this is the key part.

UPDATE:
Regarding the thought experiment, here's the part that I claim does not describe reality:

On the receiver side, Out() consumes all existing tasks

What happens with the current implementation of FreeDirtyTextures() is that on the receiver side Out() consumes 17 out of 11070 existing tasks and happily return. And it is called at most 60 times per second by the draw thread.

@nullst
Copy link
Contributor Author

nullst commented Oct 17, 2021

If the main point you want to know is why FreeDirtyTextures() was working correctly for standard buffered channels, see the update in the beginning of the #2548 (comment) . I shouldn't have put the key information in the update at the first place, sorry.

@changkun
Copy link
Member

changkun commented Oct 17, 2021

Sorry for keeping query for more details. I just managed to find some time and manage to do more tests regarding the issue, and I found two possible workarounds:

  1. Increase the in and out buffer from 16 to 4096
- in:  make(chan {{.Type}}, 128),
- out: make(chan {{.Type}}, 128),
+ in:  make(chan {{.Type}}, 4096),
+ out: make(chan {{.Type}}, 4096),
  1. Or, remove IsDirty check from draw thread
    if closing || !canvas.IsDirty() || !visible {
- if closing || !canvas.IsDirty() || !visible {
+ if closing || !visible {

The above two changes confirm this inspection: a task is being sent to the unbounded channel, but SetDirty failed to notify the next frame to further consume the remaining tasks.

glitches are fixed slightly faster, but not significantly.

This actually confirms the issue relates to the SetDirty. Otherwise, the glitches won't disappear eventually.

With all the discussion in above, thanks for the great analysis and explaination. I'd like to conclude the issues are:

  1. SetDirty may failed to notify the next frame to keep consume all desired refresh task, hence the rendering failed to converge to a fixed point.
  2. The unbounded channel slightly changes the guarantee of a buffered channel where the consumer may not recieve all expected number of message within one frame (although they are expected to be received eventually, but since 1 exists, it may causes problematic rendering).

In general, providing a estimation on the unbounded channel size may be helpful, but the package is for general channel send and recieving purpose, which makes the proposed method not really fit the purpose of the package. Rather than touching that package, with the above concluded issue, a simple and sound fix can be done as following:

diff --git a/internal/driver/common/canvas.go b/internal/driver/common/canvas.go
index 5ff0dd47..0bf8527a 100644
--- a/internal/driver/common/canvas.go
+++ b/internal/driver/common/canvas.go
@@ -46,6 +46,7 @@ type Canvas struct {
        // the refreshQueue is an unbounded channel which is bale to cache
        // arbitrary number of fyne.CanvasObject for the rendering.
        refreshQueue *async.UnboundedCanvasObjectChan
+       refreshCount uint64
        dirty        uint32 // atomic
 
        mWindowHeadTree, contentTree, menuTree *renderCacheTree
@@ -198,6 +199,7 @@ func (c *Canvas) FreeDirtyTextures() bool {
        for {
                select {
                case object := <-c.refreshQueue.Out():
+                       atomic.AddUint64(&c.refreshCount, ^uint64(0))
                        freed = true
                        freeWalked := func(obj fyne.CanvasObject, _ fyne.Position, _ fyne.Pos
ition, _ fyne.Size) bool {
                                c.painter.Free(obj)
@@ -205,6 +207,10 @@ func (c *Canvas) FreeDirtyTextures() bool {
                        }
                        driver.WalkCompleteObjectTree(object, freeWalked, nil)
                default:
+                       if atomic.LoadUint64(&c.refreshCount) > 0 {
+                               continue
+                       }
+
                        cache.RangeExpiredTexturesFor(c.impl, func(obj fyne.CanvasObject) {
                                c.painter.Free(obj)
                        })
@@ -260,6 +266,7 @@ func (c *Canvas) Painter() gl.Painter {
 
 // Refresh refreshes a canvas object.
 func (c *Canvas) Refresh(obj fyne.CanvasObject) {
+       atomic.AddUint64(&c.refreshCount, 1)
        c.refreshQueue.In() <- obj // never block
        c.SetDirty(true)
 }

Do we share a consensus of this conclusion?

@nullst
Copy link
Contributor Author

nullst commented Oct 17, 2021

Do we share a consensus of this conclusion?

I don't think so. I think we are speaking past each other, or something like that. The last particular thing you suggest would solve the issue. But the first two suggestions you give in this comment are not relevant (and (2) would be very harmful for background CPU load and performance in general). To be blunt, I think the fact that you suggested them shows that you misdiagnose the issue.

My claims are:

  1. FreeDirtyTextures(), when called, may finish prematurely, after performing only a tiny amount of tasks waiting for it in the refreshQueue. I think you agree with that.
  2. When lots of Refresh events are generated, like during fast scrolling of a list, this not only may happen, but would happen literally every time. So every time FreeDirtyTextures() is called (at most 60 times per second) only a small portion of the refreshQueue would be handled. I again refer you to the "Diagnostic" section for the logging that confirms that. This is not a thought experiment, this is supported by the actual logs. Do you agree? I stress that this is the absolute key point of all my comments starting from Regression: too many Refresh() calls may now cause visual artifacts in the List widget #2548 (comment) . This happens really, really often, almost every time! If you do not agree with this, we should discuss this specific thing, nothing else matters.
  3. I claim that this is a good old logic bug in the implementation of FreeDirtyTextures, and this has nothing to do with memory models, data races, or the way isDirty flag is handled. Absolutely nothing to do with them.
  4. One stupid way to handle the problem in the point (2) is just to call FreeDirtyTextures more than 60 times per seconds. Just replace the call to FreeDirtyTextures in repaintWindow by a loop that calls FreeDirtyTextures 1000 times in a row every time. That would definitely help! Sure, just do 17 tasks of the queue each time you are called, we'll call you 1000 times then. Of course, this does not address the key problem, but that would help. Now, I claim that the changes you suggest above ("Increase the in and out buffer from 16 to 4096" and "remove IsDirty check from draw thread") are basically the same solution: do not fix the logic issue in FreeDirtyTextures(), just force it to work more and more. If you wish, I can explain this in details. But you can also confirm this yourself: just add the logging to FreeDirtyTextures that compares the number of tasks done and the number of tasks waiting in the refresh queue. If you scroll the list up and down quickly, you will see that none of those two modifications solves the core issue, they just slightly increase the numbers of tasks done. (I mean, if you make the buffer of size 4096, you'll need to scroll back and forth quickly and for a long time, but that's completely feasible to do by hand!) If you increase the numbers enough, the visual glitches would disappear quickly, but you may as well just call FreeDirtyTextures() 1000 times in a row each frame, that's the same approach.
  5. I claim that the phrase "The above two fixes confirm this inspection: a task is being sent to the unbounded channel, but SetDirty failed to notify the next frame to further consume the remaining tasks." absolutely does not follow from the data above. No, SetDirty is completely irrelevant. When a canvas was marked dirty, FreeDirtyTextures is called. It should have performed the work requested in the refreshQueue, but it does 17 out of 11000 tasks and returns as if everything is fine. Whether the next frame should be marked dirty is irrelevant, this is not the problem. If the next frame is marked dirty, then next time FreeDirtyTextures() could also do 17 tasks from 10983 remaining. If you do this 60 times per second, then sure, after 11 seconds the refreshQueue will finally be consumed, if no new refresh messages are being sent. Is this what you really want? I'm pretty sure it would be better to modify FreeDirtyTextures() to just do the scheduled work rather than calling it more often!

I'd like to conclude the issues are:

  1. SetDirty may failed to notify the next frame to keep consume all desired refresh task, hence the rendering failed to converge to a fixed point.
  2. The unbounded channel slightly changes the guarantee of a buffered channel where the consumer may not recieve all expected number of message within one frame (although they are expected to be received eventually, but since 1 exists, it may causes problematic rendering).

My response is as follows:

  1. Point 1 is not an issue. Or at least it's not the issue Regression: too many Refresh() calls may now cause visual artifacts in the List widget #2548 that we are discussing. This issue is about the fact that I could move the mouse over different list items, the hovering switching nicely, but the text was still distorted despite modified background. Logging from the "Diagnostic" section of my comment shows that if I just continue to move the mouse over the list items, the rendering will converge, but maybe after 30 seconds or so. I wouldn't open the issue if the problem was just with SetDirty flag: if moving the mouse to invalidate the canvas would be enough, I'll be happy, it's the fact that this takes >10 seconds that I'm not happy with.
  2. For point 2, "the consumer may not receive all expected number of message within one frame" is a misleading way to describe the problem. The technical issue is that a nonblocking receive from an unbounded channel will very often fail if many messages are being put into the channel concurrently. Not just "occasionally fail", not "fail randomly", it will fail almost every time as confirmed by the logs in the "Diagnostic" section of my comment above. Sorry for being so annoying about this, it just seems to me that you do not realize how often this problem happens. Really, all SetDirty issues are absolutely minor when compared to this. FreeDirtyTextures returns prematurely all the time, and that's the core reason for this issue. Otherwise marking all frames as dirty would be a solution, but it's definitely not, I checked this explicitly.

Rather than touching that package, with the above concluded issue, a simple and sound fix can be done as following:

Yes, this would help! Exactly! This is the core problem: FreeDirtyTextures() returns before it finished processing all the tasks waiting for it in the refreshQueue, and that modification would change that, thus fixing the issue. Calling continue in the default case of the select loop would turn FreeDirtyTextures() into a busy loop that constantly retries reading messages from the output channel. I'll state this explicitly, again: reaching the default case is not an anomaly, that's what typically happens in FreeDirtyTextures while the list is being scrolled! See the Diagnostic section of my comment above. So this is indeed a busy loop, but that's easy to fix, of course.

I'm not sure why you think the proposed method EstimateLength is not a reasonable thing for an unbounded channel to have, but if you insist, the diff you gave is also a solution for this issue.

@andydotxyz
Copy link
Member

Can I just add, if it helps, we should make sure that all refreshing is called as soon as possible (ideally within one frame) otherwise we get glitching frames or delayed animations/changes.
I think this is in line with what you’re describing @nullst.

@changkun
Copy link
Member

first two suggestions you give in this comment are not relevant

As being said, they are workarounds to track the issue, not actual suggested fix.

I'm not sure why you think the proposed method EstimateLength is not a reasonable thing for an unbounded channel to have,

Many reasons

  1. Adding stuff in unbounded channel may cause performance issue, the channel is a fundamental mechanism used by multiple parties. Especially, using a lock (even one more atomic operation) can cause delay of send-recieving. See a relevant benchmark in https://github.com/golang-design/chann/blob/cd193f60e671f3958ed551bc85b8667f28f99aa3/chann_test.go#L468-L509

  2. EstimateLength is an estimation on the unbounded channel side. But the suggested changes in Regression: too many Refresh() calls may now cause visual artifacts in the List widget #2548 (comment) offers more precise estimation, becase the refreshCount does not bother the in channel.

  3. Is there a reason to make the larger change (as your original proposed changes) instead of the simpler change to use a counter?

@nullst
Copy link
Contributor Author

nullst commented Oct 17, 2021

As being said, they are workarounds to track the issue, not actual suggested fix.

OK, sorry, I misunderstood you. I thought you offer them in addition to the atomic counter fix at the end of your comment.

Is there a reason to make the larger change (as your original proposed changes) instead of the simpler change to use a counter?

I guess there is no real reason now. That's essentially a manual implementation of EstimateLength which is easy to do here since there is exactly one producer and exactly one consumer. That may be better.

@changkun
Copy link
Member

If my previous comments made the discussion a little heated, then I must applogize. Actually, I am not trying to say that your proposed changes is not valid or not solving the problem. Instead, they are great for finding the actual issue.

The comments are mostly about understanding what is the real issue and how those changes might influence other parties. If the unbounded channel is only used in the canvas, it would be entirely acceptable for the fix. However, the unbounded channel has grow to be depended on many other components, such as mobile. Comparing the two possible fixes here, I'd probably say #2548 (comment) is more preferred from my view, (and of course, the actual credit is yours because they won't exist without your analysis)

@nullst
Copy link
Contributor Author

nullst commented Oct 17, 2021

Sorry for being obtuse. I agree that the solution #2548 (comment) looks more straightforward than what I did in #2555 , and works just as well. If you think EstimateLength() method is not going to be useful anywhere else, then we should stick with the simpler solution.

@andydotxyz
Copy link
Member

andydotxyz commented Oct 17, 2021

I'm OK with this, though I am nervous about the proposed removal of || !canvas.IsDirty() - but was that a proposal for an alternative fix?
The diff of a refreshCount on it's own looks like it would be a pretty safe change for a bug fix release :)

@changkun
Copy link
Member

but was that a proposal for an alternative fix?

Just clarification: Sorry for the confusion, that was not a suggested fix, it was for confirming the cause of the issue

@andydotxyz
Copy link
Member

andydotxyz commented Oct 17, 2021

Tested the patch (thanks @changkun ) and it does also fix the issue.

However I can't shake the feeling that the #2555 approach seems to give a more responsive result.
(I tested with fyneterm - but on re-testing I don't see any difference, so some random artefact maybe?)

@andydotxyz
Copy link
Member

I guess either fix works well. What do you think @changkun? Probably simpler is better - can you open PR tonight, or should I just land the patch directly?

@andydotxyz
Copy link
Member

On develop now, will merge to bugfix branch soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate This issue or pull request already exists
Projects
None yet
Development

No branches or pull requests

4 participants