Qt 5.15: Incomplete support for sessions #5359
Reproducing in:
import qutebrowser.misc.sessions, PyQt5.QtCore
items = [qutebrowser.misc.sessions.TabHistoryItem(active=True, original_url=PyQt5.QtCore.QUrl('file:///home/florian/proj/qutebrowser/git/tests/end2end/data/numbers/1.txt'), title='1.txt', url=PyQt5.QtCore.QUrl('file:///home/florian/proj/qutebrowser/git/tests/end2end/data/numbers/1.txt'), user_data={'zoom': 1.0, 'scroll-pos': PyQt5.QtCore.QPoint()})]
tab = objreg.get('tab', tab=0, window=0, scope='tab')
tab.history.private_api.load_items(items)
results in
|
Data we get from Qt 5.14 with
with Qt 5.15:
|
These changes to
diff --git qutebrowser/browser/webengine/tabhistory.py qutebrowser/browser/webengine/tabhistory.py
index f630e8873..d77013f8a 100644
--- qutebrowser/browser/webengine/tabhistory.py
+++ qutebrowser/browser/webengine/tabhistory.py
@@ -33,7 +33,8 @@ from qutebrowser.utils import qtutils
# Qt 5.14 added version 4 which also serializes favicons:
# https://codereview.qt-project.org/c/qt/qtwebengine/+/279407
# However, we don't care about those, so let's keep it at 3.
-HISTORY_STREAM_VERSION = 3
+# FIXME
+HISTORY_STREAM_VERSION = 4
def _serialize_item(item, stream):
@@ -62,16 +63,21 @@ def _serialize_item(item, stream):
## toQt(entry->GetTitle());
# \x00\x00\x00\n\x001\x00.\x00t\x00x\x00t
- stream.writeQString(item.title)
+ # FIXME
+ stream.writeQString("")
## QByteArray(encodedPageState.data(), encodedPageState.size());
# \xff\xff\xff\xff
- qtutils.serialize_stream(stream, QByteArray())
+ # qtutils.serialize_stream(stream, QByteArray())
+ state = "00 00 01 E4 E0 01 00 00 1C 00 00 00 D8 01 00 00 18 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 68 00 00 00 02 00 00 00 60 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 01 00 00 00 00 00 00 00 00 00 00 02 00 00 00 00 01 00 00 00 00 00 00 5A 1D 57 E1 55 A3 05 00 5B 1D 57 E1 55 A3 05 00 40 01 00 00 00 00 00 00 58 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 9C 00 00 00 4A 00 00 00 66 00 69 00 6C 00 65 00 3A 00 2F 00 2F 00 2F 00 68 00 6F 00 6D 00 65 00 2F 00 66 00 6C 00 6F 00 72 00 69 00 61 00 6E 00 2F 00 70 00 72 00 6F 00 6A 00 2F 00 71 00 75 00 74 00 65 00 62 00 72 00 6F 00 77 00 73 00 65 00 72 00 2F 00 67 00 69 00 74 00 2F 00 74 00 65 00 73 00 74 00 73 00 2F 00 65 00 6E 00 64 00 32 00 65 00 6E 00 64 00 2F 00 64 00 61 00 74 00 61 00 2F 00 6E 00 75 00 6D 00 62 00 65 00 72 00 73 00 2F 00 31 00 2E 00 74 00 78 00 74 00 00 00 00 00 10 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 38 00 00 00 01 00 00 00 30 00 00 00 00 00 00 00 38 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00"
+ state_data = bytes.fromhex(state)
+ stream.writeRawData(state_data)
## static_cast<qint32>(entry->GetTransitionType());
# chromium/ui/base/page_transition_types.h
- # \x00\x00\x00\x00
- stream.writeInt32(0) # PAGE_TRANSITION_LINK
+ # \x02\x00\x00\x01
+ # PAGE_TRANSITION_LINK | PAGE_TRANSITION_FROM_ADDRESS_BAR
+ stream.writeInt32(0x02000000 | 1)
## entry->GetHasPostData();
# \x00
@@ -82,9 +88,9 @@ def _serialize_item(item, stream):
qtutils.serialize_stream(stream, QUrl())
## static_cast<qint32>(entry->GetReferrer().policy);
- # chromium/third_party/WebKit/public/platform/WebReferrerPolicy.h
- # \x00\x00\x00\x00
- stream.writeInt32(0) # WebReferrerPolicyAlways
+ # chromium/services/network/public/mojom/referrer_policy.mojom
+ # \x00\x00\x00\x02
+ stream.writeInt32(2) # kNoReferrerWhenDowngrade
## toQt(entry->GetOriginalRequestURL());
# \x00\x00\x00Jfile:///home/florian/proj/qutebrowser/git/tests/end2end/data/numbers/1.txt
@@ -92,7 +98,7 @@ def _serialize_item(item, stream):
## entry->GetIsOverridingUserAgent();
# \x00
- stream.writeBool(False)
+ stream.writeBool(True)
## static_cast<qint64>(entry->GetTimestamp().ToInternalValue());
# \x00\x00\x00\x00^\x97$\xe7
@@ -100,7 +106,15 @@ def _serialize_item(item, stream):
## entry->GetHttpStatusCode();
# \x00\x00\x00\xc8
- stream.writeInt(200)
+ # FIXME
+ stream.writeInt(0)
+
+ ## favicon
+ # \xff\xff\xff\xff
+ qtutils.serialize_stream(stream, QUrl())
+
+ with open('new-fake.bin', 'wb') as f:
+ f.write(bytes(stream.device().data()))
def serialize(items):
The main missing bit is the page state, which apparently can't just be an invalid QByteArray anymore... Argh. Not really keen on reverse-engineering that... |
Analyzing the page state we got:
|
Note that the source says:
// NOTE: If the version is -1, then the pickle contains only a URL string.
// See ReadPageState.
which can be seen here:
void ReadPageState(SerializeObject* obj, ExplodedPageState* state) {
  obj->version = ReadInteger(obj);
  if (obj->version == -1) {
    GURL url = ReadGURL(obj);
    // NOTE: GURL::possibly_invalid_spec() always returns valid UTF-8.
    state->top.url_string = base::UTF8ToUTF16(url.possibly_invalid_spec());
    return;
  }
  // ...
So I hoped to be able to replicate this by doing:
import struct
url = 'file:///home/florian/proj/qutebrowser/git/tests/end2end/data/numbers/1.txt'
encoded = url.encode('utf-16-le')  # UTF-16 code units without a BOM, independent of host byte order
parts = []
parts.append(encoded)
parts.append(struct.pack('<I', len(url)))
parts.append(struct.pack('<i', -1))
parts.append(struct.pack('<I', len(b''.join(parts))))
parts.append(struct.pack('>I', len(b''.join(parts))))
state_data = b''.join(reversed(parts))
with open('new-fake-minus.bin', 'wb') as f:
f.write(state_data)
stream.writeRawData(state_data)
Which results in:
Which doesn't look too bad, and kind of works - however, qutebrowser now gets stuck while loading the page instead... |
I did some digging, and I have a bad feeling about this. In
static void deserializeNavigationHistory(QDataStream &input, int *currentIndex, std::vector<std::unique_ptr<content::NavigationEntry>> *entries, content::BrowserContext *browserContext)
{
    [...]
    for (int i = 0; i < count; ++i) {
        [...]
        input >> virtualUrl;
        input >> title;
        input >> pageState;
        input >> transitionType;
        input >> hasPostData;
        input >> referrerUrl;
        input >> referrerPolicy;
        input >> originalRequestUrl;
        input >> isOverridingUserAgent;
        input >> timestamp;
        input >> httpStatusCode;
        [...]
        entry->SetPageState(content::PageState::CreateFromEncodedData(std::string(pageState.data(), pageState.size())));
        [...]
    }
The serialization looks mostly like what qutebrowser is doing (with the patch from above), the interesting line is the
// static
PageState PageState::CreateFromEncodedData(const std::string& data) {
  return PageState(data);
}
[...]
PageState::PageState(const std::string& data)
    : data_(data) {
  // TODO(darin): Enable this DCHECK once tests have been fixed up to not pass
  // bogus encoded data to CreateFromEncodedData.
  //DCHECK(IsValid());
}
My guess is that the encoded page state that we pass to Qt never actually makes it through |
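For reference, the page state travels through the QDataStream as an ordinary QByteArray field, which is why the patched serializer above shows `\xff\xff\xff\xff` for a null array. A minimal pure-Python sketch of that framing (big-endian 32-bit length followed by the bytes, with `0xFFFFFFFF` marking a null array) is below. The helper names `frame_qbytearray` and `unframe_qbytearray` are made up for illustration and are not part of qutebrowser or PyQt.

```python
import struct

# QDataStream encodes a null QByteArray as a length of 0xFFFFFFFF with no data.
NULL_MARKER = 0xFFFFFFFF


def frame_qbytearray(payload):
    """Frame bytes the way QDataStream writes a QByteArray (big-endian length)."""
    if payload is None:
        return struct.pack('>I', NULL_MARKER)
    return struct.pack('>I', len(payload)) + payload


def unframe_qbytearray(buf, offset=0):
    """Return (payload or None, offset just past the field)."""
    (length,) = struct.unpack_from('>I', buf, offset)
    offset += 4
    if length == NULL_MARKER:
        return None, offset
    return buf[offset:offset + length], offset + length
```

This only covers the outer framing. Whether Chromium accepts the bytes inside the field is a separate question, which is the actual problem discussed in this thread.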
Indeed - but with Qt < 5.15 we got away with just passing no data at all for the page state. That doesn't work anymore. In the comment above, I tried to craft a page state which is as minimal as possible based on the code (using a So either:
Right now I can see three paths forward:
|
Looks like I might have not dug enough, deep down there are some calls to I guess option 2 would be a nice compromise between the effort to imitate Chromium's serialization and the capability to store sessions. Option 3 also means no transferring tab history between backends then (if anyone ever did that, so probably not a huge loss). |
Here's a quick dumper:
import sys
from PyQt5.QtWebEngineWidgets import QWebEngineView
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QByteArray, QDataStream, QIODevice, QUrl

def on_load_finished():
    data = QByteArray()
    stream = QDataStream(data, QIODevice.ReadWrite)
    assert stream.status() == QDataStream.Ok
    stream << view.history()
    assert stream.status() == QDataStream.Ok
    stream.device().seek(0)
    print(f'raw data: {bytes(data).hex()}\n\n')
    version = stream.readInt()
    print(f"version: {version}")
    count = stream.readInt()
    print(f"count: {count}")
    current = stream.readInt()
    print(f"current index: {current}")
    for i in range(count):
        print(f"\n---- entry {i} ----")
        url = QUrl()
        stream >> url
        print(f"GetVirtualURL: {url}")
        title = stream.readString()
        print(f"title: {title}")
        pagestate = QByteArray()
        stream >> pagestate
        print(f"pagestate: {bytes(pagestate).hex()}")
        transition = stream.readInt32()
        print(f"transition: {hex(transition)}")
        has_post_data = stream.readBool()
        print(f"has post data: {has_post_data}")
        referrer = QUrl()
        stream >> referrer
        print(f"referrer: {referrer}")
        referrer_policy = stream.readInt32()
        print(f"referrer policy: {referrer_policy}")
        original_request_url = QUrl()
        stream >> original_request_url
        print(f"original request url: {original_request_url}")
        is_overriding_user_agent = stream.readBool()
        print(f"is overriding user agent: {is_overriding_user_agent}")
        time = stream.readInt64()
        print(f"time: {time}")
        http_status = stream.readInt()
        print(f"http status: {http_status}")
        if version >= 4:
            favicon_url = QUrl()
            stream >> favicon_url
            print(f"favicon url: {favicon_url}")
    assert stream.atEnd()
    app.quit()

app = QApplication([])
view = QWebEngineView()
# view.show()
view.loadFinished.connect(on_load_finished)
view.load(QUrl.fromUserInput(sys.argv[1]))
app.exec_()
With PyQt 5.7, we get:
So that's version 0x17 == 23 of the data. It is a bit simpler, but not as much as I hoped (260 bytes instead of 484 bytes for the |
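The three header fields the dumper reads first can also be decoded without PyQt, straight from the "raw data" hex string it prints. The sketch below assumes they are three big-endian 32-bit integers, which is what `readInt()` consumes with QDataStream's default byte order. The function name `read_history_header` is hypothetical.

```python
import struct


def read_history_header(raw):
    """Decode (version, count, current_index) from the start of a history dump."""
    if len(raw) < 12:
        raise ValueError("need at least 12 bytes for the three header fields")
    # Three consecutive big-endian signed 32-bit integers.
    version, count, current_index = struct.unpack_from('>iii', raw, 0)
    return version, count, current_index
```

Usage would be along the lines of `read_history_header(bytes.fromhex(hex_string))`. Note that this version is the history stream version (3 or 4 in the patch above), not the version number stored inside the page state blob.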
Not necessarily. We can still store the history as-is, the problem is that we won't be able to load it with Qt 5.15 and QtWebEngine (though I haven't checked what actually happens to the loaded pages with Qt 5.15 - maybe the history is intact, just the newest page is broken?) |
I've tried your version 23 dump, and I got a "URL cannot be found" message (which makes sense, given that it's hardcoded to your path). I tried modifying it to my own path here (modifying the dump and adjusting the length fields), and even wrote a version 23 serializer for the data, but the original issue persists - upon loading the session, the page does not get properly re-loaded. Maybe another option is to trigger a re-load of the page when the session is loaded (lazily, once per tab)? For reference, slightly annotated version 23 dumper:
import struct
url = item.url.toDisplayString()
encoded = url.encode('utf-16-le')  # UTF-16 code units without a BOM, independent of host byte order
parts = []
# Version
parts.append(struct.pack('<i', 23))
# Referenced file array
parts.append(struct.pack('<I', 0))
# URL string
parts.append(struct.pack('<I', len(encoded)))
parts.append(encoded)
# Target
parts.append(struct.pack('<i', -1))
# Scroll offsets
parts.append(struct.pack('<I', 0))
parts.append(struct.pack('<I', 0))
# Referrer
parts.append(struct.pack('<i', -1))
# Document state
parts.append(struct.pack('<I', 0))
# Page scale factor (saved as length of double + double data)
parts.append(struct.pack('<I', 8))
parts.append(struct.pack('<d', 0.0))
# Item and document sequence number
parts.append(struct.pack('<Q', 0))
parts.append(struct.pack('<Q', 0))
# Referrer policy
parts.append(struct.pack('<I', 1))
# Visual viewport scroll offset (same as scale factor)
parts.append(struct.pack('<I', 8))
parts.append(struct.pack('<d', 0.0))
parts.append(struct.pack('<I', 8))
parts.append(struct.pack('<d', 0.0))
# Scroll restoration type
parts.append(struct.pack('<I', 0))
# Has state object
parts.append(struct.pack('<I', 0))
# False for HTTP body
parts.append(struct.pack('<I', 0))
# HTTP content type
parts.append(struct.pack('<i', -1))
# Subitems
parts.append(struct.pack('<I', 0))
parts.insert(0, struct.pack('<I', len(b''.join(parts))))
state_data = b''.join(parts)
with open('new-fake-minus.bin', 'wb') as f:
f.write(state_data)
qtutils.serialize_stream(stream, QByteArray(state_data)) |
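To make experimenting with other URLs easier, the snippet above can be wrapped in a function, together with a small reader that checks the length prefix and pulls the URL back out. This is only a sketch of the layout used in that snippet, not a verified description of Chromium's format. The names `build_v23_state` and `read_v23_url` are made up, and it does not handle any padding Chromium might expect for URLs whose byte length is not a multiple of four.

```python
import struct


def build_v23_state(url):
    """Pack the same fields as the annotated snippet, for an arbitrary URL."""
    encoded = url.encode('utf-16-le')
    double_zero = struct.pack('<I', 8) + struct.pack('<d', 0.0)  # length + double
    parts = [
        struct.pack('<i', 23),            # version
        struct.pack('<I', 0),             # referenced file array (empty)
        struct.pack('<I', len(encoded)),  # URL length in bytes
        encoded,                          # URL string
        struct.pack('<i', -1),            # target
        struct.pack('<II', 0, 0),         # scroll offsets
        struct.pack('<i', -1),            # referrer
        struct.pack('<I', 0),             # document state
        double_zero,                      # page scale factor
        struct.pack('<QQ', 0, 0),         # item and document sequence number
        struct.pack('<I', 1),             # referrer policy
        double_zero + double_zero,        # visual viewport scroll offset
        struct.pack('<I', 0),             # scroll restoration type
        struct.pack('<I', 0),             # has state object
        struct.pack('<I', 0),             # False for HTTP body
        struct.pack('<i', -1),            # HTTP content type
        struct.pack('<I', 0),             # subitems
    ]
    payload = b''.join(parts)
    return struct.pack('<I', len(payload)) + payload


def read_v23_url(state):
    """Return (version, url) after checking the leading payload length."""
    payload_len, version, _files, url_len = struct.unpack_from('<IiII', state, 0)
    if payload_len != len(state) - 4:
        raise ValueError("length prefix does not match the payload size")
    start = struct.calcsize('<IiII')
    return version, state[start:start + url_len].decode('utf-16-le')
```

With the 74-character test URL from the start of the thread, this layout comes out at 260 bytes, which lines up with the size quoted earlier for the version 23 page state.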
Meh... But if you dump your own version 28 dump (or rewrite mine above) and load that, that works? If so, there's probably still hope we can get some minimal valid data for version 28 (or find out what part of the data causes it to load infinitely vs. not do so) |
If I dump my own version 28 dump it loads correctly (to the URL that I dumped it with), but I didn't manage yet to make a modified one with a custom URL. After version 26 the serialization switched to some Mojo-based serialization, which shuffles some things around and I can't understand the format yet. |
Argh, that mojo stuff sounds like a pain. Probably need to find a way out which doesn't rely on reverse-engineering the format then... |
I've been thinking about this some more, and I think it's unrealistic to reverse-engineer the format. Even if we managed to do so, there are various problems:
So there aren't many options left, with all of them being a dead end, as far as we know:
Instead, I'd propose we use this as an opportunity to introduce a new session format, which saves/restores the page state 1:1. While we're at it, we can also switch from YAML to JSON for session files, to (partially) move away from it - that's something I wanted to do anyways due to performance/maintenance issues, see #2777. What that would mean:
Benefits:
Downsides:
Open questions:
|
I would prefer it, but if we could load a session on 5.14 and save it in the new format, so that under 5.15 it would all be loaded from the pagestate, that would be good. |
Another curiosity: I've played around with version 23 a bit more and noticed that it does work, but only for In general though I agree that we shouldn't chase Chromium's internal serialization. They've said themselves that they don't want to stabilize the binary format, and there seems to be barely any documentation. |
That would mean keeping (at least part of) the current hacky serialization-faking code around, at least for a while. I'll see how I feel about that when actually implementing the rest, but I guess that's feasible. I'd probably ignore the less critical information (scroll position and zoom) at least. |
Something I forgot when saying this above:
is that a pagestate exists for every history entry, not only once. So I can see three ways to store it:
I'm probably going to go with 1., though it'd be really nice to know how big the data can get for more complex pages/navigations (perhaps with form inputs, file uploads, etc.). @toofar @Kingdread have you looked at the size of the page state data with real website requests/history so far? |
Nope, I haven't looked into this at all apart from following this issue. Is there a way we can dump them from the debug console? Regarding storing them, we could use the zipfile module too? As I understand it we would only want to read/write all the history items for a tab at once, so putting them in separate files sounds less than ideal. I suppose it comes back to their size whether you go with base64 JSON or something binary or zipped, because if they are large the base64 one will be a bit more IO. |
I don't have the exact numbers available at the moment, but they were also a few hundred bytes in size (like the one you dumped). I didn't use very "complex" sites though, so not a lot of inputs, frames, ... and I didn't have a long history - not sure how all of that affects the total size in the end. If binary serialization would be acceptable (i.e. no human-readable JSON/YAML), what about other formats for serializing binary data? Something like BSON, MessagePack, Protobuf, ... It might add less overhead than base64-in-json, but the files would be even less readable than JSON-with-binary-blobs. I guess it'd be worthwhile checking how much data would actually need to be saved with a realistic history, and how much overhead the "simple" solution would produce. |
Something like this maybe:
from qutebrowser.utils import qtutils
tab = objreg.get('tab', scope='tab', tab=0, window=0)
hist = tab.history._history
data = qtutils.serialize(hist)
print(len(data), len(hist))
That's the entire data though, not just the page state. Here it's some 7kb in total for a page with two history entries - so when only looking at the pagestate, that's probably a couple 100 bytes, which I guess is fine to have in base64. I'm mainly wondering if it can grow into kilo-/megabytes for a single entry. Unfortunately, I think it can: Here on GitHub, I now get 115 kB total, increasing as I'm typing this message...
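To put a number on the base64 overhead being discussed: standard padded base64 turns every 3 input bytes into 4 output characters, so the text is roughly a third larger than the binary data. A small sketch, using the 115 kB figure above purely as an example size (the function names are illustrative):

```python
import base64
import math


def base64_size(n_bytes):
    """Length of standard, padded base64 text for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


def overhead_ratio(n_bytes):
    """How much larger the base64 text is than the raw data."""
    return base64_size(n_bytes) / n_bytes


# Cross-check the formula against the real encoder for a 115 kB blob.
blob = bytes(115_000)
encoded_len = len(base64.b64encode(blob))
```

This counts only the encoding itself. Quoting, indentation and line wrapping added by a YAML or JSON writer would come on top of it.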
I like that idea. That'd mean a
Might as well just use the sqlite would have a couple of advantages: No new dependencies (same for QDataStream), and still easily read-/writeable with a tool most people likely already have installed (the Right now I'm leaning towards the zipfile idea. Still means two files for every session, but that seems okay to me. |
Right, I forgot about the built-in Qt serialization. To me, the Edit: Theoretically, we could even pack the JSON into the .zip, that way we have a single, self-contained file with all of the benefits. |
Good point. I'm mainly worried about people who use session files as a "hack" to get a list of all currently open tabs/windows - IIRC, some people use something like that to have a "window switcher" which includes qutebrowser tabs and emacs buffers and such. With this change, they'll need to adjust their code anyways - and I guess some additional I'm pretty convinced that's the way to go forward, so unless there are any objections, I'm going to try implementing that as soon as I find the time. |
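A rough sketch of what the single-file zip idea could look like with only the standard library: the human-readable part goes into a JSON member, and each page state is stored as its own binary member without compression. The member names (`session.json`, `pagestate/<id>.bin`) and the function names are assumptions for illustration, not an agreed-upon format.

```python
import io
import json
import zipfile

PREFIX = 'pagestate/'
SUFFIX = '.bin'


def write_session(fileobj, metadata, page_states):
    """Write JSON metadata plus one uncompressed binary member per page state."""
    with zipfile.ZipFile(fileobj, 'w', compression=zipfile.ZIP_STORED) as zf:
        zf.writestr('session.json', json.dumps(metadata))
        for name, blob in page_states.items():
            zf.writestr(f'{PREFIX}{name}{SUFFIX}', blob)


def read_session(fileobj):
    """Return (metadata, {name: page state bytes}) from a session archive."""
    with zipfile.ZipFile(fileobj) as zf:
        metadata = json.loads(zf.read('session.json'))
        states = {
            member[len(PREFIX):-len(SUFFIX)]: zf.read(member)
            for member in zf.namelist()
            if member.startswith(PREFIX) and member.endswith(SUFFIX)
        }
    return metadata, states
```

Anyone who only wants the list of open tabs could still extract `session.json` with a regular unzip tool, which is the kind of extra step mentioned above.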
Also don't show it for new users - this doesn't really help much if someone just started using qutebrowser. See #5359
This has been dragging along way longer than I'd like, but I'm afraid I'll need to delay this once again. I'd like to get v2.0.0 into the next Debian Stable, but for that to happen it needs to be released ASAP. I've played with the minimal required format in #5359 (comment) and #5359 (comment) again in the hope that something changed with Qt 5.15.2 (and the enclosed Chromium upgrade), but unfortunately that wasn't the case. It looks like there's really no way around having a new session format... 😞 Unfortunately the new session format is a major undertaking and also impacts some old PRs which I'd like to look at first, and I just haven't found the time to do this so far. It's still quite high on my roadmap though, and I hope to finally get back soon after v2.0.0 is released. |
This means sessions need to be initialized after websettings, because initializing websettings also initializes QtWebEngine and thus qutescheme. This needs to happen before sessions.init() calls version.webengine_versions(). I don't think this should be a problem, as they are independent of each other. Fixes #5738 See #5359 Also switches sessions.init() to pathlib, see #176.
I feel bad about having to push this back yet again, but it's time for a v2.1 since various fixes were needed for newer QtWebEngine/PyQt versions. I still think it'd be bad to hold back new releases because this hasn't been fixed yet. I hope with QtWebEngine 5.15.3 being the last 5.15 release with a Chromium update (rather than just backporting security fixes), Qt/PyQt should stop throwing me curve-balls needing time for a while (until Qt 6.2 is ready of course). |
With Qt 5.15 the underlying chromium version switched to using a new page serialization format. Additionally the deserialization is much more fragile and we don't have enough of an API from webengine to pull out enough of the necessary page attributes to construct something that we can deserialize into a new page that works. So now we attempt to dump the whole page state along with the session. This should be backwards compatible so if you save a session with this version of qutebrowser on 5.14 and then load it on 5.15 you should still get your session history. I have no idea how fragile the parsing is.
TODO:
- cleanup
- abstractisize (make work for webkit)
- test on other versions and document weirdnesses
- add version number to session file and save backups on loading older ones?
ref qutebrowser#5359
I've tested the commit 8891ce9 and it works fine for web engine. I haven't tested it for webkit. |
So, I've been using this commit (8891ce9) for a few days and it looks promising. The only downside seems to be that it makes the session file quite large. I have a session with 25 tabs opened and it is already almost 400K. I don't know if this is going to become an issue. It seems that the session file grows with every history item. |
Given that some people already see quite severe performance issues with PyYAML (#2777) and that saving binary data as base64 in it has a bit of overhead too, I think that will turn into a problem indeed... That's why above I proposed to turn sessions into (potentially uncompressed) zip files, so that we can store the binary data as binary, and in the long run also do stuff like storing multiple historical versions of a session. Unfortunately, I never got around to actually implementing it so far... |
As a data point (I agree doing it like this is not a good idea for widespread use, not everyone has my luxury in hardware) my session file with the page state in base64 is 5.3M (68k lines) and it usually takes 0.5 - 0.7 of a second to save to an SSD (and with the C yaml extensions compiled of course!). I've moved the yaml dumping to a thread, I'm not sure if that helps but I don't experience any pauses from it. |
Is there a commit for this? |
@cosminadrianpopescu patch? (click to expand)
|
Wow, I couldn't imagine this issue was such a big thing. Is there some workaround rn to have your tabs history restored after loading the session? As I understand, the binary files for each session (in format |
@Mrestof nope, no workaround sorry. There has been surprisingly little pushback against the breakage so it kinda dropped down the priority list. Go give the top post a 👍 while you are here. |
Would it be possible to introduce an incomplete version of Even if admittedly very incomplete, that sounds very easy to implement, would not need saving anything new, or doing much really besides interpreting the already existing session file a bit differently (switching up the links to the facade page) and making the facade page, and would be enough for some tab hoarders like me. Of course it would need to be under a separate setting now that some people have |
@Architector4 Probably! The current way lazy restore works always has been quite a hack. I suppose it would be possible for it to just open the given URL in JS instead of going back in the history, yeah. I don't think it needs to be a separate setting. Restoring the full session data (including history) is broken by the workaround currently anyways... |
Save/restore is bad enough here (just with quit, not even kill) locally that I've been forced to go back to my old browser alas. I have lost too many tabs / too many sessions and can no longer rely on it for my work. Shame as I really enjoy the qutebrowser UI. |
Maybe there is a way to hack around this. Like tabs are stored in a file, and then that file is referenced when a window is re-opened. |
Seems like it could be done in the config, if you create a custom keybind. |
With Qt 5.15, when sessions are loaded, only about:blank is displayed. This is due to how the reverse-engineered binary format somehow changed inside Chromium... (split off from #5237)
Update from October 2020:
[...] (qute://warning/sessions) and only opens the first page of a tab's history (rather than trying to load the full history and displaying about:blank).
sessions.lazy_restore is disabled with Qt 5.15 as well, so the page gets loaded rather than only loading the "suspended page" page, with no way to go back to the real page.