Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProgressBarIter and finding stream length via seeking breaks rate estimation #480

Closed
rlee287 opened this issue Sep 8, 2022 · 3 comments · Fixed by #483
Closed

ProgressBarIter and finding stream length via seeking breaks rate estimation #480

rlee287 opened this issue Sep 8, 2022 · 3 comments · Fixed by #483

Comments

@rlee287
Copy link
Contributor

rlee287 commented Sep 8, 2022

MCVE:

use std::io::{Read, Seek, SeekFrom};
use std::io::Cursor;

use std::thread::sleep;
use std::time::Duration;

use indicatif::{ProgressBar, ProgressStyle};

pub fn seek_len(seekable: &mut dyn Seek) -> u64 {
    let old_pos = seekable.stream_position().unwrap();
    let len = seekable.seek(SeekFrom::End(0)).unwrap();
    if old_pos != len {
        seekable.seek(SeekFrom::Start(old_pos)).unwrap();
    }
    // return
    len
}

fn main() {
    let mut dummy_buf: [u8; 4] = [0x00; 4];
    let reader = Cursor::new([0xff; 4*100]);

    let pb_style = ProgressStyle::default_bar()
            .template("{wide_bar} | {pos}/{len} | {per_sec}")
            .unwrap();

    let pb = ProgressBar::new(100*4);
    pb.set_style(pb_style);
    let mut pb_reader = pb.wrap_read(reader);
    seek_len(&mut pb_reader);
    for _i in 0..100 {
        pb_reader.read(&mut dummy_buf).unwrap();
        sleep(Duration::from_millis(40));
    }
    pb.finish();
}

Expected behavior: rate hovers slightly below 100/s

Actual behavior: rate is stuck at an absurdly high constant while progress bar counts upwards and drops to slightly below 100/s when the progress bar completes

This behavior could occur when a ProgressBarIter is passed to a generic function taking a Read + Seek object.

This is a regression from v0.16.

@djc
Copy link
Collaborator

djc commented Sep 12, 2022

I think this is a duplicate of the discussion in #394. It would be great if someone can investigate this in more detail and propose a fix (potentially to revert some Estimator changes to the 0.16 state).

@rlee287
Copy link
Contributor Author

rlee287 commented Sep 22, 2022

After some investigation, I've managed to track down the root cause of my issue. While #394 is a discussion of computing ETA, I do not believe it is closely related as that issue does not discuss problems in rate estimation that arise with seeking.

I've linked to the relevant lines of code in the Estimator struct below for reference:

indicatif/src/state.rs

Lines 385 to 405 in 791068c

fn record(&mut self, new: u64, now: Instant) {
let delta = new.saturating_sub(self.prev.0);
if delta == 0 || now < self.prev.1 {
return;
}
let elapsed = now - self.prev.1;
let divisor = delta as f64;
let mut batch = 0.0;
if divisor != 0.0 {
batch = duration_to_secs(elapsed) / divisor;
};
self.steps[self.pos as usize] = batch;
self.pos = (self.pos + 1) % 16;
if !self.full && self.pos == 0 {
self.full = true;
}
self.prev = (new, now);
}

A typical approach of finding the length of a Seek object is through a function like the one below (copied from my MCVE):

pub fn seek_len(seekable: &mut dyn Seek) -> u64 {
    let old_pos = seekable.stream_position().unwrap();
    let len = seekable.seek(SeekFrom::End(0)).unwrap();
    if old_pos != len {
        seekable.seek(SeekFrom::Start(old_pos)).unwrap();
    }
    // return
    len
}

This involves 2-3 seek operations:

  1. A seek to find the current position (which may not be necessary depending on how stream_position() is actually implemented)
  2. A seek to the end to find the length
  3. A seek back to the initial position to reset the seek position

Unfortunately this seek pattern has unfortunate interactions with the rate estimator:

  • The seek to the end creates an extremely large delta, and this large seek-to-the-end step is recorded in the estimator, generating an absurdly large rate. prev now contains (seek_len, time_of_seek_to_end).
  • The seek backwards creates a delta of 0, as the negative value gets clipped to 0 by the saturating_sub operation.
  • All future operations lead to a computed delta of 0 and an actual delta <= 0, because the stored position in prev is at the very end of the object whose progress is being tracked.

I can think of two possible solutions:

  1. Do not record a step in the Estimator if the value received is equal to the length of the underlying object. This does not resolve a similar bug that would arise if users did a massive seek forwards and then back that did not line up with the end of the underlying object. This is also not possible for iterators that do not have a predetermined length. Having different behavior depending on whether the underlying object is Seek or not might also require some form of specialization.
  2. Reset the Estimator whenever a backwards seek/step is detected.

I am leaning towards the second solution, but I would like to hear other people's opinions before submitting a PR for this.

@djc
Copy link
Collaborator

djc commented Sep 22, 2022

Yeah, (2) definitely seems cleaner than (1). Of course it's still kind of a crappy heuristic, but I guess it's unlikely to do much harm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants