Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimization: Reduce overhead of SimpleTransformIterator #44

Open
jeswr opened this issue Mar 13, 2022 · 42 comments
Open

optimization: Reduce overhead of SimpleTransformIterator #44

jeswr opened this issue Mar 13, 2022 · 42 comments

Comments

@jeswr
Copy link
Collaborator

jeswr commented Mar 13, 2022

For basic methods like .map and .filter, on AsyncIterator, using the SimpleTransformIterator introduces uneccessary overhead via the readAndTransformSimple operator. To improve performance in applications like Comunica it may be worth creating custom iterators with simpler read methods; for instance a synchronous MapIterator could skip buffering entirely and just be implemented as something like

export class MapIterator<T, S> extends AsyncIterator<S> {
    ...
    read() {
       if ((item = source.read()) !== null)
          return this.map(item);
    }
}

@jacoscaz I suspect this is the cause of the potential slowdown you were encountering when doing a large amount of chaining.

@jacoscaz
Copy link
Collaborator

jacoscaz commented Mar 13, 2022

Indeed, chained buffering steps might explain the slowdowns I’ve seen. Normally I would expect performance to slow down linearly with the number of chained transforms but often I see slowdowns that seem somewhat exponential. I had been under the impression that methods such as map() and filter() behaved as you described but, given this issue, I was either plain wrong or possibly remembering a previous version. This would be a great addition to asynciterator itself but we could do it separately if better for backward compat.

However, I do not have hard numbers at hand. @jeswr have you done some profiling on this?

@rubensworks
Copy link
Collaborator

@jeswr Do you think think the slowdown is caused by the complexity of the code in readAndTransformSimple, or the fact that it fills an internal buffer? If the latter, just disabling the buffer on intermediary iterators might also improve overall performance.

@jeswr
Copy link
Collaborator Author

jeswr commented Mar 13, 2022

@rubensworks I'd suspect you're correct in that the intermediary buffers are the main overhead.

I was thinking that one way around this is to have a 'chained transform' which applies multiple map/filter/transform operations in one go, so long as each part of that chain is synchronous - is this equivalent to what you mean by disabling internal buffers?

@jeswr
Copy link
Collaborator Author

jeswr commented Mar 13, 2022

@jacoscaz - no I have not done profiling, it was just an observation I made when digging through the code when working on a PR I made earlier

@jacoscaz
Copy link
Collaborator

I was thinking that one way around this is to have a 'chained transform' which applies multiple map/filter/transform operations in one go

Would this not require knowing what those transformations look like ahead of time? Unless, perhaps, one were to have all intermediate actors return a transform function and have the final actor concatenate these.

However, if all intermediate steps were to be implemented as iterators without internal buffering as you suggested, provided an initial buffered iterator for I/O queries (i.e. reading from a file), the resulting performance should not be too far from that of a pure concatenation of functions while allowing to keep the current internal structure.

We should come up with a couple of examples that clearly demonstrate whether there’s a real issue here or not. One possible way of doing this would be to manually implement a no-buffering map iterator and profile it against the current map(). I’d be up for that if nobody has time to do so, there’s potential for a great return of investment.

@jeswr
Copy link
Collaborator Author

jeswr commented Mar 13, 2022

Would this not require knowing what those transformations look like ahead of time?

I was thinking that methods like map, filter and transform would add to the internal set of operations that need to be done and then return the same iterator. The major caveat of this being that this would affect upstream reads of the iterator if there are any

I’d be up for that if nobody has time to do so

That sounds great but I'd wait for #43 to be merged as there may introduce some small performance changes

@jacoscaz
Copy link
Collaborator

I was thinking that methods like map, filter and transform would add to the internal set of operations that need to be done and then return the same iterator.

I suspect that the concatenation mechanism might be a non-trivial one, and it would have to be instantiated even for single-op iterators unless we were to make it a lazy one, which would make it more complicated. Nonetheless, I think there’s definitely something to be said for combining operations. For example, could Comunica benefit from a mapFilter() iterator?

In any case, we should probably look at these as orthogonal issues:

  1. on one side we have buffering, and the possibility that disabling buffering in intermediate iterators might result in gains when dealing with long-ish chains;

  2. on a different side we have combining N transformations into M < N iterators, and the possibility that doing so might reduce avg chain length.

Both definitely sound worth investigating IMHO.

I'd wait for #43 to be merged as there may introduce some small performance changes

Seems reasonable! I may start setting up a simple benchmark and then pause and wait for that PR to be merged.

@RubenVerborgh
Copy link
Owner

Is the SimpleTransformIterator being used in performance-sensitive code?

My goal for it was to be used for simple cases—and definitely not for chaining.
I guess I should have been more explicit about that.

For chaining, you want to define your own sync or async transform iterator.

@jeswr
Copy link
Collaborator Author

jeswr commented Mar 13, 2022

Because of this the use of .map, .filter, .transform etc. all create a new SimpleTransformIterator and hence there are many places in Comunica that create new SimpleTransformIterators for instance

Mapping:

(and thats only going up to page 3/10 in searching for .map in https://github.com/comunica/comunica/search?p=3&q=.map)

Filtering:

Given how commonly this is used I think it is worth making the change upstream here in this package

@RubenVerborgh
Copy link
Owner

Absolutely, thanks for finding this. We probably want:

  • Dedicated MapIterator, FilterIterator, …
  • A check within the SimpleTransformIterator that, if only map or filter are requested, one of the dedicated iterators is instantiated instead.

@jacoscaz
Copy link
Collaborator

@jeswr could you have a look at this gist of mine? https://gist.github.com/jacoscaz/e20812991714092dbb20cfd432d1bf00

Either I got something wrong or there's virtually no performance difference between .map() and a dedicated mapping iterator.

@jeswr
Copy link
Collaborator Author

jeswr commented Mar 23, 2022

To summarize the results I have in the following comments

  1. ArrayIterator causes about a 5x speed degradation compared to range iterator on 20000 elements (because as I raised here the V8 implementation of Array.shift is still very suboptimal).
  2. Using the range iterator on 20000 elements and using custom MapIterator / FilterIterators we see approx. an 8x improvement.
  3. Using my PR which avoids scheduling a few unnecessary tasks (ie just calling the end event straight away where possible creates a further 10% improvement.)
  4. Using a CompositeTransform that reduces over all of the maps and filters causes a further 10-15% performance improvement when compared to simply creating 50 new Map and Filter iterators
  5. Pre-compile the transformation pipeline on the CompositeTransform and observe a 50%-100% performance gain

Combining all of these factors can result in a speed-up of 100x or greater when doing 50 map/filters on 20000 elements; and similar results when we do smaller numbers of map/filters (e.g. 5).

This gist includes some experiments and results that are discussed in the following comments.


@jacoscaz - The vast majority of your time is spent on the array iterator because of #38; I believe node does not implement similar optimizations to firefox. (@RubenVerborgh for reference; my machine took about 5x times longer to run the implementation @jacoscaz had using the array iterator).

I get the following results for the code I have attached below

ArrayIterator#map(): 145
MappingIterator: 19

if you do 50 mappings the results become

ArrayIterator#map(): 1005
MappingIterator: 143

now here is where it gets more interesting - if we use the array iterator for 50 mappings we get

ArrayIterator#map(): 5321
MappingIterator: 4698

So this means that with a combination of optimizing the ArrayIterator and using the MappingIterator there is a 35-40x performance increase to be had for 50 chaining operations (and for 5 chaining operations the problems of the ArrayIterator become even more pronounced; observing a 200x difference in results).

import {AsyncIterator, range} from 'asynciterator';

class MappingIterator<I, O> extends AsyncIterator<O> {
  constructor(source: AsyncIterator<I>, map: (item: I) => O) {
    super();
    source.on('end', () => {
      this.close();
    });
    source.on('readable', () => {
      this.emit('readable');
    });
    let item: I | null;
    this.read = (): O | null => {
      return ((item = source.read()) !== null)
        ? map(item)
        : null;
    };
  }
}

const time = (createStream: (max: number) => AsyncIterator<number>): Promise<number> => {
  const max = 200000
  const str = createStream(max);
  let count = 0;
  return new Promise((resolve, reject) => {
    const now = Date.now();
    str.on('data', () => {
      count += 1;
    })
      .on('end', () => {
        const then = Date.now();
        if (count != max + 1) {
          reject(new Error('Bad count'));
          return;
        }
        resolve(then - now);
      });
  })
}

const main = async () => {
  const mapMethodTime = await time((max) => {
    let iterator: AsyncIterator<number> = range(0, max);
    for (let i = 0; i < 5; i++) {
      iterator = iterator.map(item => item);
    }
    return iterator;
  });
  const mappingIteratorTime = await time((max) => {
    let iterator: AsyncIterator<number> = range(0, max);
    for (let i = 0; i < 5; i++) {
      iterator = new MappingIterator(iterator, item => item);
    }
    return iterator;
  });
  console.log(`ArrayIterator#map(): ${mapMethodTime}`);
  console.log(`MappingIterator: ${mappingIteratorTime}`);
};

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

Environment:

npm v8.4.1
node v17.4.0

Run on Dell XPS15 with 32G RAM

@jeswr
Copy link
Collaborator Author

jeswr commented Mar 23, 2022

With the PR (#45) I have made I get; for the ArrayIterator method

ArrayIterator#map(): 5532
MappingIterator: 4770

and for the range method I get

ArrayIterator#map(): 944
MappingIterator: 126

As a summary, using the range iterator and doing 50 transformations we get

Current Version New PR
ArrayIterator#map() 1005 944
MappingIterator 143 126

To do this I used the following MapIterator

class MappingIterator<I, O> extends AsyncIterator<O> {
  constructor(source: AsyncIterator<I>, map: (item: I) => O) {
    super();
    source.on('end', () => {
      this.close();
    });
    source.on('readable', () => {
      this.emit('readable');
    });
    let item: I | null;
    this.read = (): O | null => {
      const item = source.read();
      if (item === null) {
        // @ts-ignore
        if (this._state === CLOSED)
          this._end();
        return null
      }
      return map(item);
    };
  }
}

@jeswr
Copy link
Collaborator Author

jeswr commented Mar 23, 2022

I've also implemented a FilterIterator against my PR (below), for 50 iterations on range the results are

ArrayIterator#filter(): 72
FilterIterator: 13
class FilterIterator<I> extends AsyncIterator<I> {
  constructor(source: AsyncIterator<I>, filter: (item: I) => boolean) {
    super();
    source.on('readable', () => {
      this.emit('readable');
    });
    let item: I | null;
    this.read = (): I | null => {
      let item;
      while ((item = source.read()) !== null) {
        if (filter(item))
          return item;
      }
      // @ts-ignore
      if (source._state === ENDED || source._state === CLOSED)
        this._end();
      return null;
    };
  }
}

@jeswr
Copy link
Collaborator Author

jeswr commented Mar 24, 2022

Sorry to keep spamming this thread - I've also implemented a CompositeIterator and it also shows non-trivial performance gains relative to just repeatedly applying the MapIterator and FliterIterator (~10-15% improvement)

ArrayIterator#map(): 894
MappingIterator: 105
CompositeIterator: 90
import { CLOSED, ENDED } from 'asynciterator';
import { AsyncIterator, ArrayIterator, range } from './asynciterator';

type Transform = {
  type: 'filter';
  function: (elem: any) => boolean
} | {
  type: 'map';
  function: (elem: any) => any
}

class CompositeMapFilter<T> extends AsyncIterator<T> {
  private tranforms: Transform[] = [];
  constructor(private source: AsyncIterator<T>) {
    super();
    source.on('readable', () => {
      this.emit('readable');
    });
  }
  read(): T | null {
    let item = this.source.read();

    for (let i = 0; i < this.tranforms.length; i += 1) {
      const transform = this.tranforms[i];
      switch (transform.type) {
        case 'map':
          item = transform.function(item);
        case 'filter': {
          if (!transform.function(item)) {
            item = this.source.read();
            if (item === null)
              break;
            else
              i = 0
          }
        }
      }
    }
  
    // @ts-ignore
    if (item === null && this.source._state === ENDED) {
      this._end();
    }

    return item;
  }
  // @ts-ignore
  filter(filter: (item: T) => boolean, self?: any): CompositeMapFilter<T> {
    this.tranforms.push({ type: 'filter', function: filter });
    return this;
  }
  // @ts-ignore
  map(map: (item: T) => T): CompositeMapFilter<T> {
    this.tranforms.push({ type: 'map', function: map });
    return this;
  }
}

class MappingIterator<I, O> extends AsyncIterator<O> {
  constructor(source: AsyncIterator<I>, map: (item: I) => O) {
  super();
  source.on('readable', () => {
    this.emit('readable');
  });
  let item: I | null;
  this.read = (): O | null => {
    const item = source.read();
    if (item === null) {
      // @ts-ignore
      if (source._state === ENDED || source._state === CLOSED)
        this._end();
      return null
    }
    return map(item);
  };
}
}

class FilterIterator<I> extends AsyncIterator<I> {
  constructor(source: AsyncIterator<I>, filter: (item: I) => boolean) {
    super();
    source.on('readable', () => {
      this.emit('readable');
    });
    let item: I | null;
    this.read = (): I | null => {
      let item;
      while ((item = source.read()) !== null) {
        if (filter(item))
          return item;
      }
      // @ts-ignore
      if (source._state === ENDED || source._state === CLOSED)
        this._end();
      return null;
    };
  }
}

const generateArr = (): number[] => {
  let i = 0;
  return new Array(200000)
    .fill(true)
    .map(() => i++);
};

const time = (createStream: (arr: number[]) => AsyncIterator<number> | CompositeMapFilter<number>): Promise<number> => {
  const arr = generateArr();
  const str = createStream(arr);
  let count = 0;
  return new Promise((resolve, reject) => {
    const now = Date.now();
    str.on('data', () => {
      count += 1;
    })
      .on('end', () => {
        const then = Date.now();
        // console.log(count, arr.length / 2, arr)
        if (count != arr.length / 2) {
          // reject(new Error('Bad count'));
          // return;
        }
        resolve(then - now);
      });
  })
}

const main = async () => {
  const mapMethodTime = await time((arr) => {
    let iterator: AsyncIterator<number> = range(0, arr.length - 1);
    for (let i = 0; i < 50; i++) {
      iterator = iterator.filter(item => item % 2 === 0);
      iterator = iterator.map(item => item);
    }
    return iterator;
  });
  const mappingIteratorTime = await time((arr) => {
    let iterator: AsyncIterator<number> = range(0, arr.length - 1);
    for (let i = 0; i < 50; i++) {
      iterator = new FilterIterator(iterator, item => item % 2 === 0);
      iterator = new MappingIterator(iterator, item => item);
    }
    return iterator;
  });
  const compIteratorTime = await time((arr) => {
    let iterator: CompositeMapFilter<number> = new CompositeMapFilter(range(0, arr.length - 1));
    for (let i = 0; i < 50; i++) {
      iterator = iterator.filter(item => item % 2 === 0);
      iterator = iterator.map(item => item);
    }
    return iterator;
  });
  console.log(`ArrayIterator#map(): ${mapMethodTime}`);
  console.log(`MappingIterator: ${mappingIteratorTime}`);
  console.log(`CompositeIterator: ${compIteratorTime}`);
};

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

@jeswr
Copy link
Collaborator Author

jeswr commented Mar 24, 2022

You can see a further 50-100% performance improvement by 'pre-compiling' the CompositeTransform - see https://gist.github.com/jeswr/f49a6975f1bfc088a9c32565e69761bf

@jacoscaz
Copy link
Collaborator

Sorry to keep spamming this thread

@jeswr if all spam were like this we'd all be much happier about spam. Seriously, awesome work.

To summarize the results I have in the following comments

  1. ArrayIterator causes about a 5x speed degradation compared to range iterator on 20000 elements (because as I raised here the V8 implementation of Array.shift is still very suboptimal).

Although I was not expecting shift() to be optimized to the extent implied by @RubenVerborgh , I was also not expecting it to be slow to the point of eclipsing other differences in my tests. Unexpected and very good to know.

The latest revision of my gist at https://gist.github.com/jacoscaz/e20812991714092dbb20cfd432d1bf00/e6cbf3caab930e88edb62f34562619603572c42b has a RangeIterator that appears to run 500x faster than ArrayIterator, which a difference so enormous that I keep looking for mistakes on my end.

  1. Using the range iterator on 20000 elements and using custom MapIterator / FilterIterators we see approx. an 8x improvement.

Can confirm. When going from calling .map() on the RangeIterator to passing the RangeIterator as the source of MappingIterator I, too, see an improvement of 7x - 9x.

  1. Using my PR which avoids scheduling a few unnecessary tasks (ie just calling the end event straight away where possible creates a further 10% improvement.)

Haven't tested this yet but I have no reason to object.

  1. Using a CompositeTransform that reduces over all of the maps and filters causes a further 10-15% performance improvement when compared to simply creating 50 new Map and Filter iterators
  2. Pre-compile the transformation pipeline on the CompositeTransform and observe a 50%-100% performance gain

I can see the benefits of doing this but I'm wary of the fact that it breaks iterator immutability, which can be a source of bugs. I would still try to reap as much as possible by offering a MapFilteringIterator<I, O> whose transform function is allowed to output null, indicating that the I item should be discarded.

Combining all of these factors can result in a speed-up of 100x or greater when doing 50 map/filters on 20000 elements; and similar results when we do smaller numbers of map/filters (e.g. 5).

Indeed, performance-wise this should turn out to be an extremely productive and effective investment of time.

@jeswr
Copy link
Collaborator Author

jeswr commented Mar 24, 2022

RangeIterator that appears to run 500x faster than ArrayIterator

Impressive - and I can't see any problems with it; though I wonder why it also seems to be a much greater improvement than using range (i.e IntegerIterator).

I'm wary of the fact that it breaks iterator immutability, which can be a source of bugs

Makes sense - I think there should be ways to give the feel of Immutability to the API (or at the very least throw meaningful errors for devs when they misuse a Mutable AsyncIterator) while using mutations like this under the hood for performance.

This is probably more appropriate in a separate package; but one option I was thinking of is as follows

PlaceholderIterator extends AsyncIterator<never> {
   error() {
       throw new Error('You are trying to use methods on an iterator - which has been destroyed by another transform operation')
   }
   map = this.error;
   read = this.error;
   ...
}
const placeHolderIterator = new PlaceholderIterator();

class ComposeMapFilterIterator extends AsyncIterator<T> {
    map(fn) {
       this.transformers.push(fn);
       const that = this;
       this = placeHolderIterator
       return that;
    }
}

So this means if any potential usage that would cause bugs such as using .map or .read on upstream iterators; the problem immediately gets identified.

Furthermore the build method that I had in the original implementation of this iterator could just be called on the first read.

In fact I'm inclined to argue that such an implementation may lead to fewer bugs in some cases where devs may be trying to read data from an iterator that they shouldn't be.

@jacoscaz
Copy link
Collaborator

jacoscaz commented Mar 24, 2022

@jeswr as I was investigating this, I came up with the same idea as #38 (comment) and had a go at it in https://github.com/jacoscaz/AsyncIterator/blob/42311b68988a971fc072c896100d7082b4162470/asynciterator.ts#L744-L783 . I'll make a PR later but I just realized we've put quite a lot of things on the table and we should probably make a plan of action, including what should go into asynciterator and what should go in Comunica. That I can think of:

Tagging @rubensworks and @RubenVerborgh to keep everyone in the loop.

@jeswr
Copy link
Collaborator Author

jeswr commented Mar 24, 2022

ArrayIterator can be made incredibly faster

Indeed - but its also probably worth tracking https://bugs.chromium.org/p/v8/issues/detail?id=12730 in case the V8 team solves this upstream for us.
You would also want to delete the source buffer on the end event in your implementation.

I'm seeing ~10% increase in performance by switching BufferedIterator

Nice!

50x - 100x increase in performance by compositing transformations

Standalone it is only approx a ~2x improvement - IIRC the 100x figure came from compounding most of the factors above

but can see the point in having it upstream too. Where should this go?

I'd say this should be a new library of its own e.g. 'asynciterator-compose.js'

@jacoscaz
Copy link
Collaborator

I'd say this should be a new library of its own e.g. 'asynciterator-compose.js'

Makes sense!

comunica/comunica#965

In theory, this should be mitigated by #46 although my testing has shown a significantly smaller performance gain than I had expected. @jeswr do you have time to do some testing of that branch on your use cases?

Still, doing away with buffering entirely would be a good idea, given the gains seen in #48 . I think we could port the source wrapping part of TransformIterator into a dedicated WrappingIterator to keep this within asynciterator. How does that sound @RubenVerborgh @jeswr ?

@jeswr
Copy link
Collaborator Author

jeswr commented Mar 25, 2022

@jeswr do you have time to do some testing of that branch on your use cases?

I've not done any benchmarking for wrap - I imagine a degenerate case would be wrapping something like the range iterator with a large number of iterations. Such a case, however, is a prime example of where doing away with buffering entirely would be better.

I think we could port the source wrapping part of TransformIterator into a dedicated WrappingIterator to keep this within asynciterator

Sounds good to me

@RubenVerborgh
Copy link
Owner

I'd say this should be a new library of its own e.g. 'asynciterator-compose.js'

Happy to have it here, really. Could be a separate file though (and maybe we could also split the code into more separate files in general).

@jeswr
Copy link
Collaborator Author

jeswr commented Apr 3, 2022

As discussed with @jacoscaz the proposed timeline for resolving this issue is

(1) Get full coverage on current state of #57 . Perhaps minus the SyncUnionIterator for now, and then get the contents of that PR merged into main (likely broken across a few PRs).

(2) Merge #45, because this resolves a lot of issues I had for the SyncUnionIterator as it allows the iterator to enter a done state in a timely manner (i.e. during the first read that produces a null value rather than asynchronously afterwards). The reason I would do this after getting in the majority of performance improvements is that it does include breaking changes.

(3) Address SyncUnionIterator

(4) Address Cloning

Pinging @RubenVerborgh

@jacoscaz
Copy link
Collaborator

jacoscaz commented Apr 3, 2022

In addition to the above plan shared by @jeswr , if nobody else needs these changes to live within a single branch for testing purposes I would suggest breaking the work done across the following PRs, to be refined and merged in order:

  1. Faster synchronous transforms, which would include SynchronousTransformIterator and its subclasses MappingIterator, FilteringIterator, SkippingIterator, LimitingIterator, MultiMapFilterTransformIterator , MultiMapFilterIterator and MaybeMappingIterator from Faster filtering, mapping, limiting, skipping and wrapping #57 . IMHO, these should be evaluated together as they share a common ancestor, itself a new addition to the codebase, and they point to one another via the various map(), filter() and other commodity methods. I would caution against breaking this further into separate PRs.

  2. Faster wrapping, which would include work done on wrap() and the new WrappingIterator from Faster filtering, mapping, limiting, skipping and wrapping #57 .

  3. End on read, already working in breaking: end on read (part 1) #45

  4. Faster (synchronous) union, already working in chore: SyncUnionIterator #54

  5. Faster cloning, already started in https://github.com/jacoscaz/AsyncIterator/pull/1 (though we might change approach)

When taken all together, these changes will make for dramatic performance gains. I have one use case, which I unfortunately cannot share ATM, where things become 1200x faster simply by switching to 1 + 2 + 4.

@RubenVerborgh
Copy link
Owner

Thanks for all the work and please proceed, with the following caveats:

  • breaking: end on read (part 1) #45 is a semver.major so anything before that will be the last 3.x, and anything after that will become part of 4.x
  • I'm not sold yet on SynchronousTransformIterator; I'd need to see a clean version of the code, but things like maybeMap sound like over-optimizations
  • Given the impact of some changes, I'll need measurable evidence / executable benchmark code for each non-trivial performance PR

Thanks again, @jeswr and @jacoscaz!

@jacoscaz
Copy link
Collaborator

jacoscaz commented Apr 3, 2022

I'm not sold yet on SynchronousTransformIterator; I'd need to see a clean version of the code, but things like maybeMap sound like over-optimizations

@RubenVerborgh is it the overall approach that does not convince you or is it the fact that we might have pushed it too far with things like maybeMap? Asking just so I can better direct my efforts. The SynchronousTransformIterator class per-se is just a way to centralize a bunch of code shared by SkippingIterator, MappingIterator and so on. Totally understand if you need a cleaner version of the code to answer this question.

@RubenVerborgh
Copy link
Owner

RubenVerborgh commented Apr 3, 2022

is it the overall approach that does not convince you or is it the fact that we might have pushed it too far with things like maybeMap?

At the moment, I can't see the forest for the trees; the code needs a bit of love still. And some bits might take it too far indeed.

@jacoscaz
Copy link
Collaborator

jacoscaz commented Apr 20, 2022

A few comments to keep everyone in the loop and for posterity...

While working on 5. Faster cloning, I've realized that my assumptions about the relative performance of Array#push() and LinkedList#push() were wrong. Even when dealing with millions of elements, Array#push() consistently outperforms LinkedList#push(). Furthermore, ClonedIterator#read() already bypasses BufferedIterator's internal buffering, making cloning already far more optimized than I had thought.

@jeswr has also been working on a non-breaking alternative to #54 in #65 for faster unions which also optimizes prepend() and append() (#67).

All this said, I think in order to close this issue and reap the benefits in Comunica we're looking at the following:

  1. Faster synchronous transforms, PR open at Add MappingIterator for synchronous transformations #59 and currently being reviewed by @RubenVerborgh
  2. Faster wrapping, PR open at feat: faster wrapping #63 by @jacoscaz
  3. Faster union, PR in WIP at WIP: V3 Sync Unions #65 by @jeswr
  4. Release new 3.x version
  5. End on read, PR open at breaking: end on read (part 1) #45 by @jeswr (required given Chore: remove unecessary transform calls comunica/comunica#977 (comment))
  6. Release new 4.x version

Also tagging @rubensworks .

@RubenVerborgh
Copy link
Owner

. Even when dealing with millions of elements, Array#push() consistently outperforms LinkedList#push()

That is expected; but what we need to know is the behavior for repeated pushes and pulls, cfr. #60 (comment)

@jacoscaz
Copy link
Collaborator

jacoscaz commented Apr 20, 2022

That is expected; but what we need to know is the behavior for repeated pushes and pulls, cfr. #60 (comment)

As far as my understanding goes and limited to cloning, nothing is ever pulled from the backing buffer as the entire history starting from when the first clone is created (EDIT) is maintained for future cloned iterators to go through; we only care about pushes done at

this._history[pos] = item;

Nonetheless, I've done some work on a LinkedList-backed HistoryReader here but saw no performance gains whatsoever, which eventually led me to question my assumptions on #push().

@RubenVerborgh
Copy link
Owner

LinkedList-backed HistoryReader here but saw no performance gains whatsoeve

But that's because no data is ever pulled from it.

@jacoscaz
Copy link
Collaborator

But that's because no data is ever pulled from it.

Yes, precisely! What I meant in #44 (comment) is that, given that HistoryReader only pushes to its internal buffer while never pulling from it (by design) and given the fact that Array#push() outperforms LinkedList#push() even with millions of items, there's no more reason for me to work on a LinkedList-backed refactoring of HistoryReader, hence my dropping it from the updated list of PRs. Apologies for the lack of clarity.

@jacoscaz
Copy link
Collaborator

jacoscaz commented Jun 27, 2022

Update for posterity and to keep track of the general progress on this. Tagging @rubensworks @jeswr @RubenVerborgh .

Open PRs that we're working on, to be reviewed / merged / rebased in order:

  1. Faster synchronous transforms
  2. Faster wrapping
  3. Faster union
  4. End on read (required given Chore: remove unecessary transform calls comunica/comunica#977 (comment))
  5. Abstract SynchronousTransformIterator class

@jacoscaz
Copy link
Collaborator

jacoscaz commented Jul 31, 2022

We're halfway through! Faster unions are next. @RubenVerborgh @jeswr @rubensworks do you think we can get these merged by the end of august?

@jeswr
Copy link
Collaborator Author

jeswr commented Jul 31, 2022

@jacoscaz 4 might be a significant amount of work to rebase and check for correctness; so I wouldn't bet on that by end of August. The other two should be much less work but will depend on everyones schedules.

@RubenVerborgh
Copy link
Owner

My plan is to wrap up AsyncIterator v3.x soonish; the moment we start tackling #45, we're in the v4.x territory.
But I guess a v4.x alpha could be possible.

@jacoscaz
Copy link
Collaborator

jacoscaz commented Aug 1, 2022

@RubenVerborgh @jeswr perhaps it would make sense to do faster unions and the synchronous transform iterator first, wrapping up 3.x, and leave end on read last. Would that work for you guys?

@jeswr
Copy link
Collaborator Author

jeswr commented Aug 1, 2022

I agree with the strategy - will try and wrap up work on the faster unions today

@jacoscaz
Copy link
Collaborator

jacoscaz commented Aug 1, 2022

Update for posterity and to keep track of the general progress. Tagging @rubensworks @jeswr @RubenVerborgh .

Open PRs that we're working on, to be reviewed / merged / rebased in order:

non-breaking, to be released in minor 3.x versions:

  1. Faster synchronous transforms
  2. Faster wrapping
  3. Abstract SynchronousTransformIterator class

breaking, to be released in 4.0.0:

  1. End on read (required given Chore: remove unecessary transform calls comunica/comunica#977 (comment))
  2. Faster union

EDITED 2022-08-26: faster unions are now breaking changes, moved to 4.x

@jacoscaz
Copy link
Collaborator

I kinda lost track of where things are. Is the v4 PR still being worked on?

@jeswr
Copy link
Collaborator Author

jeswr commented Apr 12, 2023

I kinda lost track of where things are. Is the v4 PR still being worked on?

Somewhat fell off my radar for a bit; I do have most of the code for a new version locally but there is still work do be done for the tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants