New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression: too many Refresh() calls may now cause visual artifacts in the List widget #2548
Comments
I don't think a113caf was the cause. Instead, it is likely that the problem might exist longer than that commit since it only improves performance and misuse cases. However, it may be a problem that relates to #2509 that widgets have several internal data race issues.
More importantly, I also observed the artifact on 2.1.0 without the changes from the mentioned commit. |
Can you please test the tip of |
Interesting! Originally, I wasn't able to replicate the issue with 2.1.0 on my machine, but your screenshot helped: if I enlarge the window vertically to fit more lines into the visible part of the list, then I can get the same issues on 2.1.0. Though on v2.0.4 I seem to have no issues regardless of the list size, so this is still a regression. A random observation: it seems that replacing package main
import (
"fmt"
"log"
"fyne.io/fyne/v2"
"fyne.io/fyne/v2/app"
"fyne.io/fyne/v2/canvas"
"fyne.io/fyne/v2/theme"
"fyne.io/fyne/v2/widget"
)
func main() {
app := app.New()
w := app.NewWindow("List with many lines")
recurringNames := []string{"aaaaaaaaa", "bb", "NNNNNNNNNNNNNNN"}
useLabel := true
list := widget.NewList(
func() int {
return 1000
},
func() fyne.CanvasObject {
if useLabel {
return widget.NewLabel("")
}
return canvas.NewText("", theme.ForegroundColor())
},
func(id widget.ListItemID, o fyne.CanvasObject) {
if useLabel {
o.(*widget.Label).Text = fmt.Sprintf("%s %d", recurringNames[id%3], id)
} else {
o.(*canvas.Text).Text = fmt.Sprintf("%s %d", recurringNames[id%3], id)
}
o.Refresh()
log.Printf("Called Refresh() on item ID %d\n", id)
},
)
w.SetContent(list)
w.Resize(fyne.NewSize(float32(500), float32(300)))
w.ShowAndRun()
} |
An observation, maybe of value, maybe obvious: the visual glitches disappear for me after disabling the texture cache in fyne. By that I mean, if I edit the function func (p *glPainter) getTexture(object fyne.CanvasObject, creator func(canvasObject fyne.CanvasObject) Texture) Texture {
texture, ok := cache.GetTexture(object)
if !ok || ok { // MODIFIED CONDITION TO BE ALWAYS TRUE
texture = cache.TextureType(creator(object))
cache.SetTexture(object, texture, p.canvas)
}
return Texture(texture)
} Then the glitches completely disappear, both the temporary ones during scrolling and the "persistent" ones that remain at the end of the scroll. I guess texture cache is a natural origin for problems as described, so maybe it was obvious, but perhaps this could be helpful in figuring out the source of the problem. |
Yes, this does look like a texture re-use issue - they are not invalidating at the right time perhaps. |
Looks like it could be a duplicate of #2529 - setting |
I've done a bunch of experiments, and it seems that I found the source of this particular issue:
DiagnosticThis can be confirmed by adding some logging in
(If you casually move the cursor over the list, hovering over different elements, the amount of unprocessed refresh events will eventually decrease, but very slowly.) Logic mistakeHere's a sketch of func (c *Canvas) FreeDirtyTextures() bool {
for {
select {
case object := <-c.refreshQueue.Out():
...
default:
...
return
}
}
} It seems like the intent behind this code was to consume all messages from the Namely, the Since scrolling the list produces a storm of Refresh() events, this causes Possible fixDuring testing I introduced a new exported method, // FreeDirtyTextures frees dirty textures.
func (c *Canvas) FreeDirtyTextures() bool {
estimate := c.refreshQueue.Estimate()
// Since FreeDirtyTextures is the only consumer of refreshQueue,
// consuming this many elements from the refreshQueue will not block.
for i := 0; i < estimate; i++ {
object := <-c.refreshQueue.Out()
freeWalked := func(obj fyne.CanvasObject, _ fyne.Position, _ fyne.Position, _ fyne.Size) bool {
c.painter.Free(obj)
return false
}
driver.WalkCompleteObjectTree(object, freeWalked, nil)
}
cache.RangeExpiredTexturesFor(c.impl, func(obj fyne.CanvasObject) {
c.painter.Free(obj)
})
return (estimate > 0)
} On my machine this modified version of Implementation of refreshQueue.Estimate()Disclaimer: I know that With that notice in mind, here's a diff that introduces
|
This is excellent work thanks @nullst. |
It is an interesting analysis, but I am not sure about all details here:
If we have one side of the refresh queue is producing objects to refresh, and the other side is consuming it. What does it mean by "all messages"? We don't know the future, thus "all messages" seem to be less accurate in this case.
This is not entirely true because the in and out channels are selected in a select statement, which is selected under the fairness guarantee of the select statement: fyne/internal/async/chan_canvasobject.go Lines 61 to 65 in 087d79f
Indeed the FreeDirtyTextures may return early, but if the queue is going to be consumed eventually, why calling FreeDirtyTextures in the next frame can still leads to artifacts? -- Regarding the proposed fix, estimating the current length of the queue seems a little bit overkill using a default estimation number (10) and a Mutex lock. Using atomics is good engouh to get an estimated length of the queue buffer, see package chann as an example implementation. Nevertheless, after rethinking the unbounded channel implementation, I suspect the actual reason may be current implementation slightly changes the channel's memory model behavior, because the in and out channels are buffered channel: fyne/internal/async/chan_canvasobject.go Lines 20 to 21 in 087d79f
Hence a real fix (although I have don't know and not yet tested if it may leads to other issues), might as simple as turning them into unbuffered channel, or just turning the out channel to an unbounded channel. |
I'll try to defend my interpretation. There are three participants: processing loop of an unbounded channel, the function
So a random moment during the scrolling of the list looks kind of like a bunch of parallel Refresh()es all trying to send the messages to the in-channel of Having a buffered out-channel actually helps: in the log fragment in "Diagnostic" session of the post above we see
Not only FreeDirtyTextures may return early sometimes, but in situations like "scrolling a list quickly" it will return early basically every frame for a long time (can easily happen for >30 seconds), as logged in the "Diagnostic" part of the comment above. And that often happens even if there is no list of widgets: FreeDirtyTextures is executed only on a redraw, which is only done when canvas is marked as dirty, which usually happens because a Refresh message is sent, but in practice as soon as a fyne application sends one refresh event, it also sends lots of other refresh messages immediately after it, and each of the follow-up messages has a chance to stop FreeDirtyTextures.
Makes sense, atomics are definitely better here, I just used the most straightforward implementation with mutexes. |
@nullst Nice work :). I can confirm that your patch does indeed fix the problem. I was also experiencing this problem with widget.Table in my application. The problem would occur when scrolling up and down quickly, and similarly to in your report, data intended for cells would appear in the wrong cell, sometimes larger than intended. As you also noted, the problem only occurred when using the 'develop' branch, and not when using the 'master' branch. Applying your fix has gotten rid of this effect. |
Although for me 2.1 (master) is problematic for me, if develop is problematic but ok with master branch. Considering the latest change of unbounded channel mentioned by @nullst , does the following change offer a workaround as a fix? -in: make(chan fyne.CanvasObject, 16),
-out: make(chan fyne.CanvasObject, 16),
+in: make(chan fyne.CanvasObject, 128),
+out: make(chan fyne.CanvasObject, 128), |
For me this does not fix the problem: it takes slightly longer to reach it, and the visual glitches are fixed faster, but if I enlarge the list vertically and scroll back and forth quickly, it does not take a lot of time to reach messages like
Fyne seems to call Refresh() a lot. |
The data provided here is much higher than what was introduced before the unbounded channel, the original size is 4098, see fyne/internal/driver/glfw/canvas.go Line 328 in c885793
As being suspected before, if the previous FreeDirtyTexture is relying on a behavior of the memory model, and the unbounded channel slightly changes it. Instead of changing two parties, is guarantee the unbounded channel matches how a buffered channel behaved a better fix in this case? |
UPDATE: I think I know how to answer the question more concisely. For unbounded channels as implemented:
When an unbounded channel is not congested, nonblocking receives behave confusingly similar to nonblocking receives from a buffered channel. But if there are, say, 500 goroutines desperately trying to send messages via the unbounded channel, a concurrent nonblocking receive may, and in practice will, fail. The select statement in the As for modifying the unbounded channels to mimic the behavior of buffered channels: if that's easy to do, sure, that would be fine. But if this costs a lot of performance, maybe documenting the unusual behavior of nonblocking receives in the Out() function is good enough? A nonblocking receive is a rare operation, even FreeDirtyTextures does not really need it, it just wants to execute all tasks from the
Sorry, what do you mean by a memory model? I think I can explain why there was no problem before the unbounded channel was introduced. My impression of the situation is as follows:
The log fragments I posted seem to indicate that the infinite select loop in FreeDirtyTextures stops prematurely, while the internal buffer of |
For memory models, see https://golang.org/ref/mem.
That's an interesting observation. Let's do a thought experiment. If we have 10000 Refresh tasks, before the unbounded queue, we can only fill 4096 tasks, then the next Refresh will be blocked. On the receiver side, Out() consumes all existing tasks, which makes the channel becomes empty. Then terminate the loop in the default case. Since the channel has been cleared previously, the blocked Refresh call can proceed, then SetDirty will guarantee a future frame to keep work on the remaining Refresh tasks. Now, let's back to the unbounded channel, but the previously described case remains applicable to me.
According to the previous comments, if I understand the description correctly, that is describing: It can happen, if and only if, a task is being sent to the unbounded channel, but SetDirty failed to notify the next frame to further consume remaining tasks. If true, then this sounds like a subtle bug in the SetDirty notification mechanism, where SetDirty is an atomic variable which cannot always collaborate with the refreshQueue itself, rather than a problem of the unbounded channel. Does the suggested change fixes this problem? |
No, that's not what I'm saying. First, let me argue that SetDirty is irrelevant:
Each frame is correctly marked as dirty, each time FreeDirtyTextures() is called to remove old objects. My explanation of the bug is not the first mechanism being wrong, but FreeDirtyTextures() not performing the work it is expected to do. Again, consider what are possible reasons for FreeDirtyTextures() to return while there are thousands of items in the queue? I believe this is the key part. UPDATE:
What happens with the current implementation of FreeDirtyTextures() is that on the receiver side Out() consumes 17 out of 11070 existing tasks and happily return. And it is called at most 60 times per second by the draw thread. |
If the main point you want to know is why FreeDirtyTextures() was working correctly for standard buffered channels, see the update in the beginning of the #2548 (comment) . I shouldn't have put the key information in the update at the first place, sorry. |
Sorry for keeping query for more details. I just managed to find some time and manage to do more tests regarding the issue, and I found two possible workarounds:
- in: make(chan {{.Type}}, 128),
- out: make(chan {{.Type}}, 128),
+ in: make(chan {{.Type}}, 4096),
+ out: make(chan {{.Type}}, 4096),
- if closing || !canvas.IsDirty() || !visible {
+ if closing || !visible { The above two changes confirm this inspection: a task is being sent to the unbounded channel, but SetDirty failed to notify the next frame to further consume the remaining tasks.
This actually confirms the issue relates to the SetDirty. Otherwise, the glitches won't disappear eventually. With all the discussion in above, thanks for the great analysis and explaination. I'd like to conclude the issues are:
In general, providing a estimation on the unbounded channel size may be helpful, but the package is for general channel send and recieving purpose, which makes the proposed method not really fit the purpose of the package. Rather than touching that package, with the above concluded issue, a simple and sound fix can be done as following: diff --git a/internal/driver/common/canvas.go b/internal/driver/common/canvas.go
index 5ff0dd47..0bf8527a 100644
--- a/internal/driver/common/canvas.go
+++ b/internal/driver/common/canvas.go
@@ -46,6 +46,7 @@ type Canvas struct {
// the refreshQueue is an unbounded channel which is bale to cache
// arbitrary number of fyne.CanvasObject for the rendering.
refreshQueue *async.UnboundedCanvasObjectChan
+ refreshCount uint64
dirty uint32 // atomic
mWindowHeadTree, contentTree, menuTree *renderCacheTree
@@ -198,6 +199,7 @@ func (c *Canvas) FreeDirtyTextures() bool {
for {
select {
case object := <-c.refreshQueue.Out():
+ atomic.AddUint64(&c.refreshCount, ^uint64(0))
freed = true
freeWalked := func(obj fyne.CanvasObject, _ fyne.Position, _ fyne.Pos
ition, _ fyne.Size) bool {
c.painter.Free(obj)
@@ -205,6 +207,10 @@ func (c *Canvas) FreeDirtyTextures() bool {
}
driver.WalkCompleteObjectTree(object, freeWalked, nil)
default:
+ if atomic.LoadUint64(&c.refreshCount) > 0 {
+ continue
+ }
+
cache.RangeExpiredTexturesFor(c.impl, func(obj fyne.CanvasObject) {
c.painter.Free(obj)
})
@@ -260,6 +266,7 @@ func (c *Canvas) Painter() gl.Painter {
// Refresh refreshes a canvas object.
func (c *Canvas) Refresh(obj fyne.CanvasObject) {
+ atomic.AddUint64(&c.refreshCount, 1)
c.refreshQueue.In() <- obj // never block
c.SetDirty(true)
} Do we share a consensus of this conclusion? |
I don't think so. I think we are speaking past each other, or something like that. The last particular thing you suggest would solve the issue. But the first two suggestions you give in this comment are not relevant (and (2) would be very harmful for background CPU load and performance in general). To be blunt, I think the fact that you suggested them shows that you misdiagnose the issue. My claims are:
My response is as follows:
Yes, this would help! Exactly! This is the core problem: FreeDirtyTextures() returns before it finished processing all the tasks waiting for it in the refreshQueue, and that modification would change that, thus fixing the issue. Calling I'm not sure why you think the proposed method EstimateLength is not a reasonable thing for an unbounded channel to have, but if you insist, the diff you gave is also a solution for this issue. |
Can I just add, if it helps, we should make sure that all refreshing is called as soon as possible (ideally within one frame) otherwise we get glitching frames or delayed animations/changes. |
As being said, they are workarounds to track the issue, not actual suggested fix.
Many reasons
|
OK, sorry, I misunderstood you. I thought you offer them in addition to the atomic counter fix at the end of your comment.
I guess there is no real reason now. That's essentially a manual implementation of EstimateLength which is easy to do here since there is exactly one producer and exactly one consumer. That may be better. |
If my previous comments made the discussion a little heated, then I must applogize. Actually, I am not trying to say that your proposed changes is not valid or not solving the problem. Instead, they are great for finding the actual issue. The comments are mostly about understanding what is the real issue and how those changes might influence other parties. If the unbounded channel is only used in the canvas, it would be entirely acceptable for the fix. However, the unbounded channel has grow to be depended on many other components, such as mobile. Comparing the two possible fixes here, I'd probably say #2548 (comment) is more preferred from my view, (and of course, the actual credit is yours because they won't exist without your analysis) |
Sorry for being obtuse. I agree that the solution #2548 (comment) looks more straightforward than what I did in #2555 , and works just as well. If you think EstimateLength() method is not going to be useful anywhere else, then we should stick with the simpler solution. |
I'm OK with this, though I am nervous about the proposed removal of |
Just clarification: Sorry for the confusion, that was not a suggested fix, it was for confirming the cause of the issue |
I guess either fix works well. What do you think @changkun? Probably simpler is better - can you open PR tonight, or should I just land the patch directly? |
On |
Describe the bug:
Consider a List widget with 1000
canvas.Text
s, withupdateFunc
callback set up to update the text and call Refresh() on each text element. Scrolling this list quickly (for example, using a touchpad with large scroll acceleration) sometimes results in visual artifacts that are not resolved on their own any time soon. By that I mean, waiting several seconds does not fix them, and sometimes even moving the mouse over the list (which highlights items on hover) does not refresh the text to the correct state even though the background is being updated.This is a regression: using the same example with fyne v2.0.4 (specifically, 2d4915c ) or
with a recent commit c91d36c(UPD: happens on a recent commit as well if I enlarge the window vertically) does not lead to persistent visual artifacts. In those versions I can glimpse very similar distorted texts while scrolling very fast, but only momentarily, they are fixed the moment I stop scrolling.Surprisingly, sometimes the visual glitch is fixed (as in: one moment you see a glitch, the other moment you see the correct text) without performing any invocations of
updateFunc
between those two moments. You can check this by uncommenting a logging line in the example code and watching the output in the terminal window. So perhaps this has something to do with visual cache.I've attached the example code and a screenshot showing visual glitches on that example.
To Reproduce:
Steps to reproduce the behaviour:
Screenshots:
Example code:
Device (please complete the following information):
The text was updated successfully, but these errors were encountered: