-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor selections with non-overlapping data structure #2757
Refactor selections with non-overlapping data structure #2757
Conversation
code quality is good, I haven't look at the actual functionality tho, it would be good to actually explain what you did, and how the algorithm changed, you also cannot get rid of priority. |
also if you're already addessing select/deselect behavior it would be good if you included what #2115 tries to accomplish, not exactly its code, but it's functionality, except that also didn't include priorities once these 2 comments are addressed I'll review, test and potentially merge |
Explanation of my changes:To make the _selections array non-overlapping, I abstraced it away into a separate class. When we remove an interval, we loop over the existing items and check if there's any overlap with the removed interval. There can be 4 types of overlapping:
Insertion is very simple compared to this: we call the On the topic of priorities:I understand why we assign priorities to pieces when they are selected, but I don't see why we would need them upon deselection. From recently reading around I got the feeling that it's for making sure that we don't deselect intervals that were selected by file streams -- therefore in the past we would only deselect the given pieces if the start and end points matched exactly and also the priority. Is this right? If so, then wouldn't it make more sense to distinguish stream-selections from non-stream selections with a boolean flag instead of their priority? If this is the case, then I'll adjust my code to make sure that only files can deselect file streams and not end users accidentally. Let me know if I'm wrong on this. |
your assumption is roughly correct, but it doesn't apply to only streams, imagine webtorrent's HTTP server being used as a media server, and having say 10 people watching 1 video as it's being downloaded/streamed from BitTorrent if multiple people are at the same point in time in the video, and one of them stops watching it unloads the video, which sends a cancel request to the range that was requested, thus deselecting the pieces for ALL 9 other people that are watching it, permanently stalling the playback for 9 people until someone reloads it this ofc applies to not only streams, but manually running selection ranges, your code works for single selections, but when you have multiple unrelated processes making selections, it falls apart hope that clears it up |
(This comment is mostly for future reference to make sure I still understand the issue when I come back to it) I think I finally understand. So the file stores one iterator for each stream created. If there are multiple requests as the same time, there might be 2 iterators for the same (or overlapping) piece intervals. When the file iterator's I think I might be able to solve this by keeping a separate array of selections specifically for these cases. This would ensure that existing features don't break. I'll try to write the code now and push it as soon as I can. I'll try to setup an example project as well, to test that my solution actually works. |
yeup, u got it right. best way to solve it, is to make iterators [streams] create ranges which cannot be cancelled, except by themselves, and make other selections work how u described [I think] |
@ThaUnknown Thanks for the guidance so far. I have implemented 2 things:
I copied the tests from Alex's PR and modified them somewhat + just to be on the safe side, I created a test server and connected to it with 2 clients, then disconnected one of them. The other client's selection remained in the I read somewhere that you should never modify the git history, so I basically sent everything in a separate commit, let me know if I should squash my commits though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is high quality, so I'm going to be VERY strict on it :)
hope that doesn't demotivate you
I hope so too 😄 |
Webtorrent's selections also have 1 more issue, I'm not gonna ask you to fix it, but it would be insanely welcome as it's the last thing that sucks in this implementation: webtorrent selects full pieces, rather than specific bytes. This becomes an issue for example when there's a torrent with 2 files, and file 1 ends at 1/2 piece and file 2 starts at where that 1/2 piece ended, selecting and deselecting the files, impacts the other file's selections. The way I could see this solved, is that internally we'd select byte ranges, but users would still request pieces. This is hard to explain but TLDR: internally selections should be byte ranges, however they should be exposed to users and the selection algorithms as piece numbers. If you're interested in this, and didn't understand my shit explanations, ask, and I'll put more effort into properly explaining this. |
Personally, I like the idea of having byte-range selections. I haven't given this too much thought yet, but I imagine it could be used to specifically not download any files, other than the selected one(s). For me this was an issue previously because when a torrent has 2 large files and there's a piece that contains data from both files, it creates the second file too (even when I haven't selected it). Now this shouldn't be an issue, since I didn't actually download the second file's contents, but to the file system it is indicated that the second file is also a full file (let's say 10GB). Now I know it's not actually taking up that space, but it's still annoying. That said, I think that would be a pretty big undertaking and considering that this PR has also grown to be around ~450 LOC in changes, I would prefer to keep those changes in a separate PR. I can't guarantee to work on it after this PR, because I would first like to address some other issues - namely fixing the very out of date TLDR: I like the idea and support it, but I think it should be in a separate PR. I won't start working on it right away, but if I have time later, I might take a stab at it. |
I'm going to check out the failing tests, but I adressed your comments. Edit: I have some issues with running the browser tests. I'll come back to it after work Thank you for the detailed review! I don't think there are too many details left to debate, so we might ship this thing quite quickly 🚀 Who said open-source is slow? 😄 |
very fair
it is, I just sacrifice a lot of free time for FOSS, as you see the previous PR is still open |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, this is gonna be a breaking change, but I think it's needed and I don't know if its the best idea, @SilentBot1 what do u think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue stands with these changes in total, notify
can be swallowed or overwritten, that's kinda bad.... when you merge or overwrite selections, you should also merge their notify functions? imagine multiple functions ask for notify on different priority levels, all but one will be lost....
yeah @detarkende you'll need to "merge" callback/notify functions when selections are merged/overwritten, this will call the notify function when it's not really necessary, but its better to call it more than less, I think it would be roughly like this: const existingStart = { ...existing, to: item.from - 1, notify: ()=>{item.notify();existing.notify()} } probably? you'd need to double check this, and ofc it would be in other places too |
The code looks overall high quality, thanks for taking the time to PR this @detarkende. Through my testing, I only found one real issue, and one thing that may cause issues:
client.add('magnet:?xt=urn:btih:08ada5a7a6183aae1e09d831df6748d566095a10&dn=Sintel&tr=udp%3A%2F%2Fexplodie.org%3A6969&xs=https%3A%2F%2Fwebtorrent.io%2Ftorrents%2Fsintel.torrent&so=0', {deselect: false})
// No selection made - incorrect
client.add('magnet:?xt=urn:btih:08ada5a7a6183aae1e09d831df6748d566095a10&dn=Sintel&tr=udp%3A%2F%2Fexplodie.org%3A6969&xs=https%3A%2F%2Fwebtorrent.io%2Ftorrents%2Fsintel.torrent&so=1')
// No selection made - incorrect
client.add('magnet:?xt=urn:btih:08ada5a7a6183aae1e09d831df6748d566095a10&dn=Sintel&tr=udp%3A%2F%2Fexplodie.org%3A6969&xs=https%3A%2F%2Fwebtorrent.io%2Ftorrents%2Fsintel.torrent&so=5', {deselect: true})
// No selection made - I would say this is correct, explicit deselect should take priority over &so=
client.torrents[0].select(0, 100)
client.torrents[0].select(0, 50) which leads to an unfulfillable selection: Selections {
_items: [
{
from: 0,
to: -1,
offset: 0,
priority: 0,
notify: [Function: noop],
isStreamSelection: false
},
...
]
} Other than that, it looks like a nice addition! |
@ThaUnknown Thanks for the idea. I'm currently trying to make it work with an independent notifications array inside Selections. It's kinda the same idea what #2115 does. I have some issues with it atm, but I'll try to fix them. If I can't I'll probably use your idea. |
@ThaUnknown pushed some changes that address the notifications issue. I believe it solves the problem, but maybe I didn't think of some edge case. lmk what you think. |
@SilentBot1 Thank you for the review. You've got a good eye for bugs, I would've never caught those two by myself. |
682eebd
to
e06b5ef
Compare
@SilentBot1 - Thanks for the review. I tried to reproduce your situation, but I have to say, I don't know what I'm missing, but it only inserts 1 selection for me. This is my script: import WebTorrent from './index.js'
import MemoryChunkStore from 'memory-chunk-store'
const client = new WebTorrent()
client.add('magnet:?xt=urn:btih:08ada5a7a6183aae1e09d831df6748d566095a10&dn=Sintel&tr=udp%3A%2F%2Fexplodie.org%3A6969&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.empire-js.us%3A1337&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=wss%3A%2F%2Ftracker.btorrent.xyz&tr=wss%3A%2F%2Ftracker.fastcast.nz&tr=wss%3A%2F%2Ftracker.openwebtorrent.com&ws=https%3A%2F%2Fwebtorrent.io%2Ftorrents%2F&xs=https%3A%2F%2Fwebtorrent.io%2Ftorrents%2Fsintel.torrent', { deselect: true, store: MemoryChunkStore })
const server = client.createServer()
server.listen(8443)
/** @type {import('./lib/torrent.js').default} */
const torrent = client.torrents[0]
setInterval(() => {
console.log(torrent._selections)
}, 1000) And this is the output for me, when I run your Can you please try it again? I have rebased the repo now, but that shouldn't have an effect on this (theoretically). I tested this with node 16, 18 and 20. Which version are you using? If you have the time, we could check in a video call or something, but I'm just out of ideas on how to reproduce this 😄 |
Hey @detarkende, I'm now also unable to replicate this (on Node v18.16.0) after the rebase and a |
Thank you @ThaUnknown and @SilentBot1! Let's ship this thing 🚀 @ThaUnknown, to me it shows that you have 2 unresolved threads, but I cannot find them. Could you please check and maybe reapprove? |
Thanks everyone! It says that only those with write access to webtorrent can merge this PR, so from my end if either of you have this permission, we can merge this 😄 🚀 |
it will be merged when I have the time. |
yeah add that 1 test, and I'll merge this, this is defo an A tier pull request |
# [2.2.0](v2.1.37...v2.2.0) (2024-03-26) ### Features * Refactor selections with non-overlapping data structure ([#2757](#2757)) ([467f30c](467f30c))
🎉 This PR is included in version 2.2.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
good PR, well documented, clean simple code, includes typings for static analysis |
@detarkende if you're interested in working on other things, in order of priority:
I'm willing to review all these, don't have time to work on them myself tho. My personal interest, is insanely high for #2082 because it would allow webtorrent to "function", during re-validating of pieces, so if you have a 500GB torrent, you could use it while it's still validating pieces, it's probably one of the biggest issue for WebTorrent which prevents it from being used as a full blown torrent client such as qbit. I wouldn't work on types, webtorrent needs to be re-written to use async/await instead of callbacks [this has been on the roadman for years], which will break all the types anyways |
Thank you, @ThaUnknown for the careful and precise review. I don't plan on working on types (as in jsdoc) in this repo, but I am considering fixing the Thanks for the suggestions, I will look into these issues, but no promises yet. Earlier you mentioned the issue about pieces containing data from multiple files and that resulting in files being created, even when they weren't selected. I know that other torrent clients do this as well, but from a user's perspective it's quite unexpected, so I'll think about finding a solution for that. |
you can fix that, by re-writing this to use byte ranges internally, not pieces :) |
What is the purpose of this pull request? (put an "X" next to item)
What changes did you make? (Give an overview)
WIP - please give me feedback
The goal of this PR is to introduce a new data structure that handles piece selections. The new
Selection
class allows you to add intervals to the selection and remove them without creating overlapping selection items.Upon insertion, the class checks if the code overlaps with any existing selections. If it does, then it merges the two selections (the new, inserted selection remains intact and the existing one(s) are truncated or cut into pieces).
This also means that (for example) if the user runs
then all the pieces are actually deselected.
The long-term goal of this change is to also fix the issue with
file.deselect()
but that's a story for a later day.Which issue (if any) does this pull request address?
It relates to #2755.
Is there anything you'd like reviewers to focus on?
As Webtorrent is a pretty large project, I don't understand all of it. I would like reviewers to help me poke holes into this implementation. If there are any aspects of the app that this breaks that I didn't think of, please let me know so I can hopefully extend my solution.