Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scrolling list bound to data programmatically causes nil pointer dereference #2549

Closed
thenick775 opened this issue Oct 14, 2021 · 13 comments
Closed
Labels
bug Something isn't working

Comments

@thenick775
Copy link

thenick775 commented Oct 14, 2021

Describe the bug:

When scrolling a list bound to data using the keyboard programmatically, this results in a nil pointer dereference. This is after the list has been bound to an existing slice of string.

Sometimes this works, but more often than not I run into this issue, since keyboard scrolling using the arrow keys did not seem to be supported.

I have seen this behavior in the middle of list navigation in this manner, as well as at the end and near the beginning. I also see this when using the mouse wheel to scroll.

To Reproduce:

Steps to reproduce the behaviour:

declare a list and populate it with data (50-100 elements)

use a function like the one included below to map the keydowns to a scroll, where the keyup function will just disable the bool used to keep the scroll going, this function is called from a goroutine.

Example code:

package main

import (
	"fyne.io/fyne/v2"
	"fyne.io/fyne/v2/app"
	"fyne.io/fyne/v2/container"
	"fyne.io/fyne/v2/data/binding"
	"fyne.io/fyne/v2/driver/desktop"
	"fyne.io/fyne/v2/widget"
	"time"
)

var list *widget.List
var ScrollStop bool
var slice []string
var count int

func Scroll(key fyne.KeyEvent, loc int) {
	time.Sleep(200 * time.Millisecond)
	for ScrollStop {
		time.Sleep(50 * time.Millisecond)
		switch key.Name {
		case fyne.KeyDown:
			loc += 1
			if loc > len(slice)-1 {
				ScrollStop = false
				break
			}
			list.Select(loc)
		case fyne.KeyUp:
			loc -= 1
			if loc < 0 {
				ScrollStop = false
				break
			}
			list.Select(loc)
		}
	}
}

func up(key *fyne.KeyEvent) {
	switch key.Name {
	case fyne.KeyDown:
		fallthrough
	case fyne.KeyUp:
		ScrollStop = false
	}
}

func down(key *fyne.KeyEvent) {
	switch key.Name {
	case fyne.KeyDown:
		list.Select(count + 1)
		ScrollStop = true
		go Scroll(*key, count+1)
	case fyne.KeyUp:
		list.Select(count - 1)
		ScrollStop = true
		go Scroll(*key, count-1)
	}
}

func selected(index int) {
	if index < 0 {
		index = 0
	} else if index > list.Length()-1 {
		index = list.Length() - 1
	}
	count = index
}

func main() {
	ScrollStop, count = false, 0

	slice = []string{}
	var data binding.ExternalStringList

	for i := 0; i < 100; i++ {
		slice = append(slice, "word")
	}

	a := app.NewWithID("com.vancise.listtest")
	w := a.NewWindow("List Test")

	data = binding.BindStringList(
		&slice,
	)

	list = widget.NewListWithData(data,
		func() fyne.CanvasObject {
			return widget.NewLabel("template")
		},
		func(i binding.DataItem, o fyne.CanvasObject) {
			o.(*widget.Label).Wrapping = fyne.TextTruncate
			o.(*widget.Label).Bind(i.(binding.String))
		})
	list.OnSelected = selected

	view := container.NewBorder(
		container.NewVBox(
			widget.NewLabel("List Test"),
			widget.NewSeparator()),
		nil,
		nil,
		nil,
		list)

	list.Select(0)

	if deskCanvas, ok := w.Canvas().(desktop.Canvas); ok {
		deskCanvas.SetOnKeyDown(down) //for monitoring navigation of the list in inquire mode
		deskCanvas.SetOnKeyUp(up)
	} else {
		panic("not desktop canvas")
	}

	w.Resize(fyne.NewSize(940, 660))
	w.SetContent(view)
	w.ShowAndRun()
}

Error Report

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x415fce2]

goroutine 15 [running]:
fyne.io/fyne/v2/data/binding.(*boundExternalStringListItem).Get(0x0)
	<autogenerated>:1 +0x22
fyne.io/fyne/v2/widget.(*Label).createListener.func1()
	/Users/nickvancise/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/widget/label.go:148 +0x37
fyne.io/fyne/v2/data/binding.(*listener).DataChanged(0x1)
	/Users/nickvancise/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/data/binding/binding.go:55 +0x1a
fyne.io/fyne/v2/data/binding.queueItem.func1.1()
	/Users/nickvancise/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/data/binding/queue.go:19 +0x42
created by fyne.io/fyne/v2/data/binding.queueItem.func1
	/Users/nickvancise/go/pkg/mod/fyne.io/fyne/v2@v2.1.0/data/binding/queue.go:17 +0x45
Saving session...
...copying shared history...
...saving history...truncating history files...
...completed.

[Process completed]

Device (please complete the following information):

  • OS: MacOS BigSur 11.6
  • go version go1.17 darwin/amd64
  • required fyne.io/fyne/v2 v2.1.0 in go mod file

Any help on debugging this problem is appreciated!

@thenick775 thenick775 added the unverified A bug that has been reported but not verified label Oct 14, 2021
@Jacalz
Copy link
Member

Jacalz commented Oct 14, 2021

Please provide a fully working code example instead of a short code snippet. We prefer to have minimal code examples that provide nothing more than a base of reproducing the issues but can be built and tested without having to guess how pieces should be put together.

@thenick775
Copy link
Author

Updated code section with full example

@Jacalz
Copy link
Member

Jacalz commented Oct 15, 2021

Thanks for updating the code example. It would however be good if you could remove all the unnecessary details and just trim it down the most minor code that reproduces the issue (i.e. just fyne code and the parts that make the issue happen, no extra features or external dependencies).

I know that it can be a bit tedious but it saves us immense amount of time for every issue because there is less unnecessary code to try and debug.

@thenick775
Copy link
Author

thenick775 commented Oct 15, 2021

What is unnecessary? I built this as the minimal example that reproduces the problem correctly

I have removed the word generator if that was causing you problems.

@Jacalz
Copy link
Member

Jacalz commented Oct 15, 2021

Thanks. I’m just saying that there is a lot of mutexes, goroutines and other stuff that makes it hard to debug (and a potential race condition could very well be the source of the issue in the first place).

@thenick775
Copy link
Author

Ahh I see, I had this setup w/o the mutexes but thought it better to include them since they dealt with the subset of shared variables.

This happens with/without the mutexes. There should only be 1 goroutine used for the scroll (how I found the error above) active at any time, otherwise there can't be programmatic keyboard scrolling as it wouldn't stop.

This error also happens when scrolling slowly through the list up/down, but originally I saw it using the keyboard to scroll!

@andydotxyz
Copy link
Member

What happens if you move the wrapping code line into the template function?
As it never changes it would be better to set that once.

@thenick775
Copy link
Author

@andydotxyz I was able to replicate the issue with the wrapping line moved to the label creation as well:

list = widget.NewListWithData(data,
		func() fyne.CanvasObject {
			tmp :=  widget.NewLabel("template")
			tmp.Wrapping = fyne.TextTruncate
			return tmp
		},
		func(i binding.DataItem, o fyne.CanvasObject) {
			//o.(*widget.Label).Wrapping = fyne.TextTruncate
			o.(*widget.Label).Bind(i.(binding.String))
		})

@nullst
Copy link
Contributor

nullst commented Oct 19, 2021

I think I have an idea about the bug leading to this issue. It's a concurrency bug. There is kind of a data race between Label.Bind() and Label.Unbind(). Since Bind() calls Unbind() to clear the previous binding, this is also a race between two concurrent Bind() calls.

Some crash investigation

The crash happens in the data listener of a widget.Label. It executes src := label.textSource and it turns out that src is a non-nil binding.String interface with underlying nil pointer (as in here), and the listener only checks if the interface is nil before using it.

This is not necessarily a bug since label.textSource is not an exported field, and all assignments to it in widget.Label code are carefully controlled: the value of textSource is either nil (see Label.Unbind()) or a user-provided data parameter in Label.Bind(). If a developer wants to pass an invalid data source to Bind() method, Label cannot reasonably defend against this.

However, in this situation the user is blameless. We can check this by modifying the "update list item" function. Though you'll also need to edit fyne source code to export the struct boundExternalStringListItem from a binding package. Using the code

list = widget.NewListWithData(data,
		func() fyne.CanvasObject {
			tmp :=  widget.NewLabel("template")
			tmp.Wrapping = fyne.TextTruncate
			return tmp
		},
		func(i binding.DataItem, o fyne.CanvasObject) {
			stringBinding := i.(binding.String)
			// CHECK THE VALIDITY OF THE BIND ARGUMENT
			if s, ok := stringBinding.(*binding.BoundExternalStringListItem); ok {
				if s == nil {
					log.Printf("Nil pointer dressed as a non-nil pointer!\n")
				}
			}
			o.(*widget.Label).Bind(i.(binding.String))
		})

With this modification I can still replicate the issue, though it takes much longer for crash to happen (I increased the list to 1000 elements and scrolled with up/down arrows for maybe 15-20 seconds). I think the difference just arises from the updateFunction becoming slower due to type assertions. Notably, the situation we test for never happens, we never pass an invalid pointer.

So if we trust the code above, we know that (1) we never provide an invalid value to Label.Bind(); (2) when the application crashes, the value of Label.textSource is invalid; (3) Label only sets textSource to nil or the user-provided value. How is this possible?

One possible explanation

This can theoretically happen due to the non-atomicity of interface assignments. The field textSource of interface type binding.String is modified in only two ways:

  1. textSource = nil in Unbind.
  2. textSource = data when user calls Bind(data).

Under the hood an interface value is just a pair of pointers. There are no atomicity guarantees about assignments of interfaces. (You can also disassemble the compiled code to see that the compiler implements textSource = data as a sequence of two assignments without any safeguards.)

Assume we are concurrently performing textSource = nil (in Unbind) and src = textSource (in the data listener in Label.createListener). Each of those assignment is a pair of operations. So they may occur in the following order:

  1. src = textSource copies the type information.
  2. textSource = nil sets type information to nil
  3. textSource = nil sets the data pointer to nil.
  4. src = textSource copies the data pointer, which is now nil.

The result is: src has type information that textSource had before evaluating textSource = nil, but the data pointer is now nil. This is exactly what we see.

I wasn't able to verify directly that this is what happens during the crash because I don't know how. Some intrusive attempts to log the internal structure of interface values cause go to generate more complicated code with various memory barriers and stuff, and then I cannot replicate the issue.

Possible fix

Making the access to textSource atomic could help if that's the only problem. I've tried to guard the access to textSource by a mutex, and this makes it impossible for me to replicate the bug. But mutex makes Bind() sufficiently slow that in the use case given by the minimal example I can feel the difference in responsiveness while scrolling a long list by holding a down arrow key.

Note: there is very clearly a data race happenning with several concurrent Bind()s, so if we want to get rid of data races this should be fixed even if it's not related to the bug described by the original reporter. Any good fix for data races would also protect against the non-atomicity, I think.

Counterarguments

The possible bug I described does not explain why the issue does not happen if I just scroll the list with my touchpad, even though I can do that much quicker than what happens when I hold the down arrow key.

Notably, my description does not depend on List.Select being called at all. I don't know, maybe it's not the selection that is the problem, but somehow scrollTo function invoked by Select causes updateFunc to be called very often, thus performing more Bind()s on labels than the manual scrolling...

Or maybe the only difference is that scrolling manually happens "on the main thread", while in the example code here scrolling happens on a separate goroutine. I don't know.

@thenick775
Copy link
Author

thenick775 commented Oct 19, 2021

I am able to reproduce with scrolling @nullst, I set up everything the same except without the keydown/up code. Scrolling the list slowly up and down is what throws that error for me.

Thank you for all the great input!

@andydotxyz
Copy link
Member

@thenick775 can you please test with the PR #2566 that @nullst opened

@thenick775
Copy link
Author

thenick775 commented Oct 27, 2021

@andydotxyz I was also unable to reproduce testing against develop after that pull request was merged, looks good here thank you!

@andydotxyz
Copy link
Member

Great, thanks for confirming

@andydotxyz andydotxyz added bug Something isn't working and removed unverified A bug that has been reported but not verified labels Oct 27, 2021
@andydotxyz andydotxyz added this to the Fixes (v2.1.x) milestone Oct 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants