Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor selections with non-overlapping data structure #2757

Merged

Conversation

detarkende
Copy link
Contributor

What is the purpose of this pull request? (put an "X" next to item)

  • Documentation update
  • Bug fix
  • New feature
  • Other, please explain:

What changes did you make? (Give an overview)

WIP - please give me feedback

The goal of this PR is to introduce a new data structure that handles piece selections. The new Selection class allows you to add intervals to the selection and remove them without creating overlapping selection items.

Upon insertion, the class checks if the code overlaps with any existing selections. If it does, then it merges the two selections (the new, inserted selection remains intact and the existing one(s) are truncated or cut into pieces).

This also means that (for example) if the user runs

torrent.deselect(0, torrent.pieces.length - 1);

then all the pieces are actually deselected.

The long-term goal of this change is to also fix the issue with file.deselect() but that's a story for a later day.

Which issue (if any) does this pull request address?

It relates to #2755.

Is there anything you'd like reviewers to focus on?

As Webtorrent is a pretty large project, I don't understand all of it. I would like reviewers to help me poke holes into this implementation. If there are any aspects of the app that this breaks that I didn't think of, please let me know so I can hopefully extend my solution.

Copy link

welcome bot commented Feb 28, 2024

🙌 Thanks for opening this pull request! You're awesome.

@ThaUnknown
Copy link
Member

code quality is good, I haven't look at the actual functionality tho, it would be good to actually explain what you did, and how the algorithm changed, you also cannot get rid of priority.

@ThaUnknown
Copy link
Member

ThaUnknown commented Feb 29, 2024

also if you're already addessing select/deselect behavior it would be good if you included what #2115 tries to accomplish, not exactly its code, but it's functionality, except that also didn't include priorities

once these 2 comments are addressed I'll review, test and potentially merge

@detarkende
Copy link
Contributor Author

Explanation of my changes:

To make the _selections array non-overlapping, I abstraced it away into a separate class.
This class internally has the same array, but its insert and remove methods include some extra logic to keep items separate:

When we remove an interval, we loop over the existing items and check if there's any overlap with the removed interval. There can be 4 types of overlapping:

  • isLowerIntersecting: Only the lower end of the removed interval overlaps

    • Example: we are removing 8-12, but 1-10 is already selected, so we cut the end of the existing selection off.
  • isUpperIntersecting: Only the upper end of the removed interval overlaps

    • Example: we are removing 10-22, but 20-25 is already selected, so we cut the beginning of the existing selection off.
  • isInsideExisting: The removed interval is specifically inside an existing larger selection.

    • Example: we are removing 10-15, but 1-30 is already selected, so we carve out the removed section from the middle of the existing one -> this leaves us with 2 selections with a hole between them.
  • isCoveringExisting: This is the simplest case. If the removed interval covers an entire existing selection, or if they're an exact match size-wise, then we simply remove the existing selection.

Insertion is very simple compared to this: we call the remove method with the newly inserted interval to make sure that there's no overlapping items, then we push the selection to the array.


On the topic of priorities:

I understand why we assign priorities to pieces when they are selected, but I don't see why we would need them upon deselection.

From recently reading around I got the feeling that it's for making sure that we don't deselect intervals that were selected by file streams -- therefore in the past we would only deselect the given pieces if the start and end points matched exactly and also the priority.

Is this right? If so, then wouldn't it make more sense to distinguish stream-selections from non-stream selections with a boolean flag instead of their priority? If this is the case, then I'll adjust my code to make sure that only files can deselect file streams and not end users accidentally. Let me know if I'm wrong on this.

@ThaUnknown
Copy link
Member

I understand why we assign priorities to pieces when they are selected, but I don't see why we would need them upon deselection.

your assumption is roughly correct, but it doesn't apply to only streams, imagine webtorrent's HTTP server being used as a media server, and having say 10 people watching 1 video as it's being downloaded/streamed from BitTorrent

if multiple people are at the same point in time in the video, and one of them stops watching it unloads the video, which sends a cancel request to the range that was requested, thus deselecting the pieces for ALL 9 other people that are watching it, permanently stalling the playback for 9 people until someone reloads it

this ofc applies to not only streams, but manually running selection ranges, your code works for single selections, but when you have multiple unrelated processes making selections, it falls apart

hope that clears it up

@detarkende
Copy link
Contributor Author

detarkende commented Mar 1, 2024

(This comment is mostly for future reference to make sure I still understand the issue when I come back to it)

I think I finally understand. So the file stores one iterator for each stream created. If there are multiple requests as the same time, there might be 2 iterators for the same (or overlapping) piece intervals. When the file iterator's destroy method is called, it deselects the selected pieces. Which means that if selections cannot overlap, then one iterator might clear its own selection, but with that also clear the selection for all other iterators.

I think I might be able to solve this by keeping a separate array of selections specifically for these cases. This would ensure that existing features don't break. I'll try to write the code now and push it as soon as I can. I'll try to setup an example project as well, to test that my solution actually works.

@ThaUnknown
Copy link
Member

ThaUnknown commented Mar 1, 2024

yeup, u got it right.

best way to solve it, is to make iterators [streams] create ranges which cannot be cancelled, except by themselves, and make other selections work how u described [I think]

@detarkende detarkende marked this pull request as ready for review March 3, 2024 18:43
@detarkende
Copy link
Contributor Author

@ThaUnknown Thanks for the guidance so far. I have implemented 2 things:

  1. Streams create specific stream-selections, which are marked with a true isStreamSelection flag. These can only be deselected by streams, and only one at a time. The actual implementation of this is mostly the same as in feat: change deselect and select behaviour #2115, since that has already received a round of code review - so I assume it's not a terrible way to do it 😄
  2. In feat: change deselect and select behaviour #2115, @feross suggested a new option in the torrent options: the deselect option. I added that as well.

I copied the tests from Alex's PR and modified them somewhat + just to be on the safe side, I created a test server and connected to it with 2 clients, then disconnected one of them. The other client's selection remained in the _selections and its video playback continued without any issues.

I read somewhere that you should never modify the git history, so I basically sent everything in a separate commit, let me know if I should squash my commits though.

Copy link
Member

@ThaUnknown ThaUnknown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is high quality, so I'm going to be VERY strict on it :)

hope that doesn't demotivate you

lib/torrent.js Outdated Show resolved Hide resolved
lib/torrent.js Outdated Show resolved Hide resolved
lib/torrent.js Outdated Show resolved Hide resolved
lib/torrent.js Outdated Show resolved Hide resolved
lib/selection.js Outdated Show resolved Hide resolved
lib/selection.js Outdated Show resolved Hide resolved
lib/selection.js Outdated Show resolved Hide resolved
lib/selection.js Outdated Show resolved Hide resolved
lib/selection.js Outdated Show resolved Hide resolved
lib/torrent.js Show resolved Hide resolved
@detarkende
Copy link
Contributor Author

detarkende commented Mar 3, 2024

I hope so too 😄
I'll look at your requested changes tomorrow.

@ThaUnknown
Copy link
Member

Webtorrent's selections also have 1 more issue, I'm not gonna ask you to fix it, but it would be insanely welcome as it's the last thing that sucks in this implementation: webtorrent selects full pieces, rather than specific bytes.

This becomes an issue for example when there's a torrent with 2 files, and file 1 ends at 1/2 piece and file 2 starts at where that 1/2 piece ended, selecting and deselecting the files, impacts the other file's selections.
this is especially an issue when creating file streams/iterators, as in theory removing an iterator from the start of the 2nd file, can remove the selection for the end of the 1st file, which ignores metadata download, and effectively stalls all playback etc

The way I could see this solved, is that internally we'd select byte ranges, but users would still request pieces.
This means that file selections aka torrent.file.select would select ranges, and deselecting it would also deselect ranges, then the indexes specific selections target would be exposed as parts of those ranges.

This is hard to explain but TLDR: internally selections should be byte ranges, however they should be exposed to users and the selection algorithms as piece numbers.

If you're interested in this, and didn't understand my shit explanations, ask, and I'll put more effort into properly explaining this.

lib/selection.js Outdated Show resolved Hide resolved
@detarkende
Copy link
Contributor Author

Webtorrent's selections also have 1 more issue, I'm not gonna ask you to fix it, but it would be insanely welcome as it's the last thing that sucks in this implementation: webtorrent selects full pieces, rather than specific bytes.

Personally, I like the idea of having byte-range selections. I haven't given this too much thought yet, but I imagine it could be used to specifically not download any files, other than the selected one(s). For me this was an issue previously because when a torrent has 2 large files and there's a piece that contains data from both files, it creates the second file too (even when I haven't selected it). Now this shouldn't be an issue, since I didn't actually download the second file's contents, but to the file system it is indicated that the second file is also a full file (let's say 10GB). Now I know it's not actually taking up that space, but it's still annoying.

That said, I think that would be a pretty big undertaking and considering that this PR has also grown to be around ~450 LOC in changes, I would prefer to keep those changes in a separate PR.

I can't guarantee to work on it after this PR, because I would first like to address some other issues - namely fixing the very out of date @types/webtorrent declaration files, or maybe adding more jsdoc annotations to the code, but I might give it a go later.

TLDR: I like the idea and support it, but I think it should be in a separate PR. I won't start working on it right away, but if I have time later, I might take a stab at it.

@detarkende
Copy link
Contributor Author

detarkende commented Mar 4, 2024

I'm going to check out the failing tests, but I adressed your comments. Edit: I have some issues with running the browser tests. I'll come back to it after work

Thank you for the detailed review! I don't think there are too many details left to debate, so we might ship this thing quite quickly 🚀 Who said open-source is slow? 😄

@ThaUnknown
Copy link
Member

but I think it should be in a separate PR

very fair

Who said open-source is slow?

it is, I just sacrifice a lot of free time for FOSS, as you see the previous PR is still open

Copy link
Member

@ThaUnknown ThaUnknown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, this is gonna be a breaking change, but I think it's needed and I don't know if its the best idea, @SilentBot1 what do u think

lib/torrent.js Outdated Show resolved Hide resolved
lib/file.js Show resolved Hide resolved
lib/torrent.js Show resolved Hide resolved
test/client-deselect.js Outdated Show resolved Hide resolved
lib/selections.js Outdated Show resolved Hide resolved
Copy link
Member

@ThaUnknown ThaUnknown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue stands with these changes in total, notify can be swallowed or overwritten, that's kinda bad.... when you merge or overwrite selections, you should also merge their notify functions? imagine multiple functions ask for notify on different priority levels, all but one will be lost....

lib/torrent.js Outdated Show resolved Hide resolved
@ThaUnknown
Copy link
Member

yeah @detarkende you'll need to "merge" callback/notify functions when selections are merged/overwritten, this will call the notify function when it's not really necessary, but its better to call it more than less, I think it would be roughly like this:

const existingStart = { ...existing, to: item.from - 1, notify: ()=>{item.notify();existing.notify()} }

probably? you'd need to double check this, and ofc it would be in other places too

@SilentBot1
Copy link
Member

The code looks overall high quality, thanks for taking the time to PR this @detarkende.

Through my testing, I only found one real issue, and one thing that may cause issues:

  1. The first being that the &so= magnet query parameter is preventing any selections from being made:
client.add('magnet:?xt=urn:btih:08ada5a7a6183aae1e09d831df6748d566095a10&dn=Sintel&tr=udp%3A%2F%2Fexplodie.org%3A6969&xs=https%3A%2F%2Fwebtorrent.io%2Ftorrents%2Fsintel.torrent&so=0', {deselect: false})
// No selection made - incorrect

client.add('magnet:?xt=urn:btih:08ada5a7a6183aae1e09d831df6748d566095a10&dn=Sintel&tr=udp%3A%2F%2Fexplodie.org%3A6969&xs=https%3A%2F%2Fwebtorrent.io%2Ftorrents%2Fsintel.torrent&so=1')
// No selection made - incorrect

client.add('magnet:?xt=urn:btih:08ada5a7a6183aae1e09d831df6748d566095a10&dn=Sintel&tr=udp%3A%2F%2Fexplodie.org%3A6969&xs=https%3A%2F%2Fwebtorrent.io%2Ftorrents%2Fsintel.torrent&so=5', {deselect: true})
// No selection made - I would say this is correct, explicit deselect should take priority over &so=
  1. The second being that two overlapping sections at the start of the piece range leaves a negative selection:
client.torrents[0].select(0, 100)
client.torrents[0].select(0, 50)

which leads to an unfulfillable selection:

Selections {
  _items: [
    {
      from: 0,
      to: -1,
      offset: 0,
      priority: 0,
      notify: [Function: noop],
      isStreamSelection: false
    },
    ...
  ]
}

Other than that, it looks like a nice addition!

@detarkende
Copy link
Contributor Author

yeah @detarkende you'll need to "merge" callback/notify functions when selections are merged/overwritten, this will call the notify function when it's not really necessary, but its better to call it more than less, I think it would be roughly like this:

const existingStart = { ...existing, to: item.from - 1, notify: ()=>{item.notify();existing.notify()} }

probably? you'd need to double check this, and ofc it would be in other places too

@ThaUnknown Thanks for the idea. I'm currently trying to make it work with an independent notifications array inside Selections. It's kinda the same idea what #2115 does. I have some issues with it atm, but I'll try to fix them. If I can't I'll probably use your idea.

@detarkende
Copy link
Contributor Author

@ThaUnknown pushed some changes that address the notifications issue. I believe it solves the problem, but maybe I didn't think of some edge case. lmk what you think.

@detarkende
Copy link
Contributor Author

@SilentBot1 Thank you for the review. You've got a good eye for bugs, I would've never caught those two by myself.
I addressed both of them in my last 2 commits. Let me know what you think

@detarkende detarkende force-pushed the 2755-non-overlapping-selections branch from 682eebd to e06b5ef Compare March 20, 2024 19:46
@detarkende
Copy link
Contributor Author

@SilentBot1 - Thanks for the review. I tried to reproduce your situation, but I have to say, I don't know what I'm missing, but it only inserts 1 selection for me.

This is my script:

import WebTorrent from './index.js'
import MemoryChunkStore from 'memory-chunk-store'

const client = new WebTorrent()
client.add('magnet:?xt=urn:btih:08ada5a7a6183aae1e09d831df6748d566095a10&dn=Sintel&tr=udp%3A%2F%2Fexplodie.org%3A6969&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.empire-js.us%3A1337&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=wss%3A%2F%2Ftracker.btorrent.xyz&tr=wss%3A%2F%2Ftracker.fastcast.nz&tr=wss%3A%2F%2Ftracker.openwebtorrent.com&ws=https%3A%2F%2Fwebtorrent.io%2Ftorrents%2F&xs=https%3A%2F%2Fwebtorrent.io%2Ftorrents%2Fsintel.torrent', { deselect: true, store: MemoryChunkStore })
const server = client.createServer()
server.listen(8443)

/** @type {import('./lib/torrent.js').default} */
const torrent = client.torrents[0]

setInterval(() => {
  console.log(torrent._selections)
}, 1000)

And this is the output for me, when I run your curl command in a separate terminal instance:

image

Can you please try it again? I have rebased the repo now, but that shouldn't have an effect on this (theoretically).

I tested this with node 16, 18 and 20. Which version are you using? If you have the time, we could check in a video call or something, but I'm just out of ideas on how to reproduce this 😄

@SilentBot1
Copy link
Member

Hey @detarkende, I'm now also unable to replicate this (on Node v18.16.0) after the rebase and a git reset --hard detarkende/2755-non-overlapping-selections, so I'm assuming this was a skill issue on my half, but this now looks all good to go.

@detarkende
Copy link
Contributor Author

Thank you @ThaUnknown and @SilentBot1! Let's ship this thing 🚀

@ThaUnknown, to me it shows that you have 2 unresolved threads, but I cannot find them. Could you please check and maybe reapprove?

@detarkende
Copy link
Contributor Author

Thanks everyone! It says that only those with write access to webtorrent can merge this PR, so from my end if either of you have this permission, we can merge this 😄 🚀

@ThaUnknown
Copy link
Member

it will be merged when I have the time.

@ThaUnknown
Copy link
Member

yeah add that 1 test, and I'll merge this, this is defo an A tier pull request

@ThaUnknown ThaUnknown enabled auto-merge (squash) March 26, 2024 14:34
@ThaUnknown ThaUnknown merged commit 467f30c into webtorrent:master Mar 26, 2024
6 checks passed
Copy link

welcome bot commented Mar 26, 2024

🎉 Congrats on getting your first pull request landed!

webtorrent-bot pushed a commit that referenced this pull request Mar 26, 2024
# [2.2.0](v2.1.37...v2.2.0) (2024-03-26)

### Features

* Refactor selections with non-overlapping data structure ([#2757](#2757)) ([467f30c](467f30c))
@webtorrent-bot
Copy link
Collaborator

🎉 This PR is included in version 2.2.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@ThaUnknown
Copy link
Member

good PR, well documented, clean simple code, includes typings for static analysis

@ThaUnknown
Copy link
Member

ThaUnknown commented Mar 26, 2024

@detarkende if you're interested in working on other things, in order of priority:

I'm willing to review all these, don't have time to work on them myself tho.

My personal interest, is insanely high for #2082 because it would allow webtorrent to "function", during re-validating of pieces, so if you have a 500GB torrent, you could use it while it's still validating pieces, it's probably one of the biggest issue for WebTorrent which prevents it from being used as a full blown torrent client such as qbit.

I wouldn't work on types, webtorrent needs to be re-written to use async/await instead of callbacks [this has been on the roadman for years], which will break all the types anyways

@detarkende
Copy link
Contributor Author

Thank you, @ThaUnknown for the careful and precise review.

I don't plan on working on types (as in jsdoc) in this repo, but I am considering fixing the @types/webtorrent declaration, since it is very outdated and at this point it's more of a burden than help. I work on a project that uses TS and it's a pain to work with that old declaration.

Thanks for the suggestions, I will look into these issues, but no promises yet. Earlier you mentioned the issue about pieces containing data from multiple files and that resulting in files being created, even when they weren't selected. I know that other torrent clients do this as well, but from a user's perspective it's quite unexpected, so I'll think about finding a solution for that.

@ThaUnknown
Copy link
Member

you can fix that, by re-writing this to use byte ranges internally, not pieces :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants