
Long, sometimes endless loading in message list #4033

Open
maltokyo opened this issue Apr 10, 2020 · 25 comments

Comments

@maltokyo

When I log in, from time to time (I would say one in five times I open the app), I get this "loading" screen. Sometimes it lasts for 5 seconds, sometimes for 30 seconds.

Restarting the app (force quit and reload) fixes it immediately 100% of the time, so I am sure there is nothing wrong on the server side.

The video is attached; I could not upload the MP4 file directly, so it is zipped here.

Screenrecorder-2020-03-26-22-04-19-301.mp4.zip

@chrisbobbe
Contributor

chrisbobbe commented Apr 10, 2020

Hi @maltokyo, thanks for the report!

Here's a screenshot from the video, to make it easier for people looking at this issue to see what you're seeing:

[Screenshot from the video]

I don't have a clear diagnosis for this yet, but we know of one issue that could cause long wait times in hard-to-reproduce ways.

@gnprice or @ray-kraesig, I think #3841 is getting close to mergeable and is awaiting review. I'm not saying this is definitely the solution to this problem, but it's kind of hard to rule out except empirically.

@maltokyo
Author

Great. Thank you. The other thing I can say is that recently, whenever it happens on the mobile app, I've had the web version open nearby, and it was always super responsive, without issue, while the mobile app was behaving like this. Both were on the same WiFi network. So again, I did everything I could to isolate it and be sure it's an issue with the app.

Are there any logs I can attach?

@chrisbobbe
Contributor

#3841 was merged today, so it should be out in the next beta, probably next week; you can watch for that to be released here in the Zulip developer community, but I'll post again here when that happens, if I remember. And, in any case, we'll keep this issue open until we're sure it's been fixed.

Or, if you'd like to test it sooner, you can build and run the app from the latest code on the master branch, following these instructions.

@maltokyo
Author

Thanks a lot, @chrisbobbe! It's so nice to hear responses like this; it motivates people to submit issues when they find them. I'll try the beta next week, or whenever it's available.

@chrisbobbe
Contributor

chrisbobbe commented Apr 14, 2020

So, there was a beta release yesterday, the 13th, but it only had one cherry-picked fix, deemed critical because the problem was a total crash of the app on startup; that release didn't include the attempted fix for this issue. @gnprice, we'll plan to do a regular beta release with everything on master this week, I think, right?

@maltokyo
Author

Hi @chrisbobbe & all
Here are some more videos. Strange things are still happening, seemingly even more frequently now with the latest update, which I installed two days ago.

  1. Here is more of the issue above:

https://www.youtube.com/watch?v=1RqD1X-9j1Y&feature=youtu.be

  2. I also have additional issues with the last couple of versions, including the latest one. Sometimes I get a spinning ball next to the post. When I go out of it and back in, the message has been posted, yet the spinning ball stays, and it posts again! Here it is in action:

https://www.youtube.com/watch?v=rZMjHYoRFhE&feature=youtu.be

  3. And another issue, which I first noticed only with the latest version: some jittering posts:

https://www.youtube.com/watch?v=Qvpjr9kSCmY&feature=youtu.be

These videos are "unlisted", and I will delete them soon; please let me know once you have seen the problem(s) well enough for me to take them down.

@chrisbobbe
Contributor

chrisbobbe commented Apr 15, 2020

Thanks for the additional reports, @maltokyo!

  1. One thing that would be very helpful to know for this issue, and I see that I didn't ask for this specifically above, is whether you're pretty confident that the loading screen never goes away, or whether it eventually goes away after 30 seconds or so. I know the effect on users is the same if it causes enough frustration that you have to quit and relaunch the app to fix it, but I ask because it would be a helpful clue for debugging if it isn't fixed in the (hopefully!) next beta release. Could you leave it on that screen for 5 or 10 minutes and see if it's still there?
  2. This looks like it's under the umbrella of Sending outbox messages is fraught with issues #3881, specifically, Permanently unsendable messages are shown with spinners #3731. An "outbox message" is a message that you've just pressed "send" for, but the server hasn't acknowledged it yet. We show the message anyway, so the experience of sending messages feels quicker — this is a common pattern in apps that communicate with a server, called "optimistic UI" — but apparently we don't correctly handle failures to send.
  3. This looks like PM message keeps refreshing infinitely #3869.

@maltokyo
Author

Hi @chrisbobbe

Regarding #1: I've frequently left it like that for a few minutes (certainly not 5 minutes, but at least 1:30 to 2:00). For me it doesn't resolve itself, and I just restart the app, which fixes it 100% of the time. Next time it happens I'll leave it for 10 minutes, see how it goes, and report back.

Does the app produce any logs at all that would be useful for you? I'm happy to send them if you let me know how to access them.

@chrisbobbe
Contributor

chrisbobbe commented Apr 18, 2020

There's been another report of something that looks quite similar, also on Android, here. That report said it was happening to multiple other people, on multiple realms.

@maltokyo, thanks for the offer to include logs! The app does indeed produce logs, but we should be able to see any relevant ones ourselves, in Sentry. We'll let you know if not, though.

@gnprice, @ray-kraesig, do you know a way to filter our Sentry logs to see errors whose first occurrence is recent, and that only happen on Android?

@chrisbobbe
Contributor

chrisbobbe commented Apr 18, 2020

Hmm, @gnprice, @ray-kraesig, take a look at this filtering/sorting in Sentry.

That's a summary of events that have occurred at least once in the last 14 days, where the first-seen date for that class of events (I know we have some open issues about (dis)aggregation, like #3864) is within the last three weeks, sorted by frequency.

The first four are for the error message Unable to parse color from string: #NaN, which we addressed as a P0 fix for a white screen crash this week in #3991. I guess it's plausible that this load-screen issue is also a symptom, and we weren't aware; if so, it would be fixed by upgrading to 26.27.150 (GitHub, CZO (chat.zulip.org)); I believe that's in production on iOS and Android.

However, there's an interesting result that's popping up, and is showing what may be an upward trend in occurrences; it certainly seems that way more than other event classes in these search results:

[Sentry graph of occurrences over time for this event class]
(The pink dot on the time axis, at left, is first seen; green, on the right, is last seen)

The error message (which, again, we haven't seen in Sentry before) is "RuntimeException: Probable deadlock detected due to WebView API being called on incorrect thread while the UI thread is blocked."

Looks like it was first seen on 26.25.148 (GitHub, CZO).

Just opened this as #4051.

@gnprice gnprice changed the title Android App has long (sometimes endless) load screen (video attached) Long, sometimes endless loading in message list Apr 18, 2020
@gnprice
Member

gnprice commented Apr 30, 2020

@maltokyo , are you continuing to see this symptom?

I have a hypothesis that it might be tied to the Chrome version used for the WebView, in part because the timing (especially in the graph Chris posted just above) lines up with when a new Chrome release came out. If you are still seeing the issue, you might be able to help us pin that down.

Two quick questions first:

  • What Android version are you using?
  • What version of Chrome is installed on your device? (You can determine this by finding Chrome in the Play Store app.)

Then, if you're up for an experiment: I would be very interested in the results if you try opting into the "dev" channel of Chrome (currently Chrome 84) for your WebViews. Instructions here:
https://chromium.googlesource.com/chromium/src/+/HEAD/android_webview/docs/channels.md
and a description of what the channels mean is here:
https://support.google.com/chrome/a/answer/9027636?hl=en
It might be a bit fiddly to set up, depending how much you've poked at the guts of Android before; I'd be quite happy to help you through it over on chat.zulip.org, either in #mobile or in PMs. It's straightforward to change back whenever you want to stop the experiment.

@maltokyo
Author

maltokyo commented May 1, 2020

Hi @gnprice
Amazingly (I'm not sure if there was a recent update?), I don't seem to be experiencing this at the moment. Let me test a few things and see if I can figure out why. But for the past one or two weeks, I do not remember it happening. I'm on the version updated on 13th April.

@pohutukawa

This issue also affects (all of?) our company's staff, on both Android and iOS. We do have some streams that load, but most just keep loading (at least seemingly) forever. The ones that do load have only very few messages in them. The desktop and web clients are working perfectly.

The previous messages in this ticket are already weeks old. Has there been any movement, or is there a possibility to get and test a beta release?

@gnprice
Member

gnprice commented Jun 12, 2020

@pohutukawa Thanks for the report! Please file it as its own issue so we can debug it properly -- it sounds like it's a different issue from this one, and I have some followup questions I'll be interested in asking to learn more about what you're seeing.

@maltokyo
Author

maltokyo commented Jun 13, 2020 via email

@pohutukawa

@gnprice Done in #4156.

@gnprice
Member

gnprice commented Jun 16, 2020

@maltokyo Very interesting. Thanks for the update, and sorry the issue seems to be back.

I asked some questions at #4033 (comment) that may help us debug the issue. Now that you're seeing the issue again, I'd be very curious for the answers.

@maltokyo
Author

Hi @gnprice, please find my answers below; sorry for the delay.

Two quick questions first:

  • What Android version are you using?

MIUI 11, based on Android 10 (Xiaomi, latest updates)

  • What version of Chrome is installed on your device? (You can determine this by finding Chrome in the Play Store app.)

Version 83.0.4103.106 (updated on 15th June)

Then, if you're up for an experiment: I would be very interested in the results if you try opting into the "dev" channel of Chrome (currently Chrome 84) for your WebViews. Instructions here:
https://chromium.googlesource.com/chromium/src/+/HEAD/android_webview/docs/channels.md
and a description of what the channels mean is here:
https://support.google.com/chrome/a/answer/9027636?hl=en
It might be a bit fiddly to set up, depending how much you've poked at the guts of Android before;

I tried this; however, because I have a "work profile" installed, it will not let me select the second line in this screenshot. Is there any way around it?

[Screenshot: WebView settings]

@gnprice
Member

gnprice commented Jul 1, 2020

MIUI 11, based on Android 10 (Xiaomi, latest updates)

Cool -- so that's consistent with our guess that this symptom may correspond to the error #4051 we're seeing reported as an exception, because that error is only possible on Android 10+. (Though something like half our Android users are on Android 10 -- 38% when I last looked a couple of months ago -- so one data point isn't super strong confirmation by itself.)

Version 83.0.4103.106 (updated on 15th June)

That's helpful, thanks.

I tried this, however, due to the fact that I have a "work profile" installed, it will not let me select the second line in this screenshot, any way around it?

Hmm. I don't know very much about how a "work profile" works, but I would guess that this means that this is a device managed by your employer's IT department, and it's saying that changing the Chrome version to be used for webviews is something off-limits for you to do yourself -- that it can only be done by someone with those IT privileges. (Which is pretty reasonable; keeping everyone's configurations uniform helps keep down the complexity of debugging the IT staff may have to do.)

If you have a good relationship with someone in that department who might be up for helping you debug this issue you're seeing in Zulip, you might ask them to make this change for you. I would certainly be grateful to see the results.

If it helps, the version I'm most interested in -- Chrome 84 -- is now in the Beta channel, so you can use that instead of Dev. Slim chance that'll give any different result on this screen; but if you do ask someone in your IT department for help, they might feel more OK about a beta version than a dev version.

@gnprice
Member

gnprice commented Jul 9, 2020

@maltokyo One other question for you (or pair of questions), which I should have asked sooner but I see I neglected to!

  • Are you seeing this on Zulip Cloud (zulip.com / zulipchat.com), or on a server hosted elsewhere (like within your company/organization)?
  • If the latter, what version of the Zulip server is it running? You can find out the exact version with a command like this one (just replace chat.zulip.org with the domain from your server's URL):
$ curl -s https://chat.zulip.org/api/v1/server_settings | jq -r .zulip_version
3.0-rc1-131-g5d91cfbbb7

@maltokyo
Author

maltokyo commented Jul 9, 2020

Hi @gnprice

I have only tried with my own server (Debian 10, 32 GB RAM, VPS) using the Docker version of Zulip, but I can give it a go with the main Zulip chat site as well.

Below is the outcome of running that command on my server (domain replaced).

# curl -s https://z.MYDOMAIN.com/api/v1/server_settings | jq -r .zulip_version
2.1.4

@gnprice
Member

gnprice commented Jul 9, 2020

That's helpful information, thanks.

2.1.4

(Unrelated to this issue, I'd recommend you upgrade the server -- there was a security release last month.)

@gnprice
Member

gnprice commented Apr 14, 2021

@maltokyo Have you been continuing to see this symptom?

We've been seeing in Sentry a much lower rate of the exception #4051 which looked like it might have been related, ever since around the time of the last few comments on this thread. So I'm hoping that you've also stopped seeing this, or at least that it's become rare.

@gnprice
Member

gnprice commented Apr 14, 2021

We'd marked this as P1 last year because it's a bad symptom, and it looked like it might be related to #4051, an exception we were getting reports of from a lot of users, so it seemed likely that a lot of users were seeing this.

Since then #4051 has become infrequent, and we haven't had any reports of this from additional users, so that no longer seems likely.

@maltokyo
Author

maltokyo commented Apr 14, 2021 via email


4 participants