Better support for suggestion batches in NN ensemble #687

Open
osma opened this issue Apr 14, 2023 · 0 comments

osma commented Apr 14, 2023

As noted in PR #681 ("Potential future work"), the way NN ensemble handles batches could be improved:

I'm not quite happy with how the NN ensemble handles suggestion results from other projects, both during training and suggest operations. For example, the training samples are stored in LMDB one document at a time, but now it would be easier to store them as whole batches instead, which could be more efficient. But I decided that this PR is already much too big and it would make sense to try to improve batching in the NN ensemble in a separate follow-up PR. There is already an attempt to do part of this in PR #676; that could be a possible starting point.

In particular:

  • training documents could be processed by using batch operations on source projects; there was an attempt to do this in PR #676 ("Batch processing in training of NN ensemble - base project suggest calls")
  • training data is currently stored in LMDB one document at a time; it would make sense to store it as whole batches instead (and perhaps use another data storage mechanism, e.g. the TensorFlow tf.data.Dataset API)
  • _merge_source_batches could perform calculations using sparse arrays and only convert to NumPy arrays at the end (and transpose if necessary)
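
To illustrate the last point, here is a minimal sketch of merging per-source results while they are still sparse and only converting to a dense NumPy array at the end. It is not the current `_merge_source_batches` code: the function name `merge_source_batches_sparse`, the input layout (one SciPy CSR matrix of shape `(n_docs, n_subjects)` per source), the square-root scaling and the weight normalisation are all assumptions made for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix


def merge_source_batches_sparse(batch_by_source, weights):
    """Hypothetical sketch: merge per-source score matrices sparsely.

    batch_by_source: dict mapping source id -> csr_matrix (n_docs, n_subjects)
    weights: dict mapping source id -> float weight
    Returns a dense float32 array of shape (n_docs, n_subjects, n_sources).
    """
    total_weight = sum(weights.values())
    n_sources = len(batch_by_source)
    scaled = []
    for source_id, scores in batch_by_source.items():
        # Assumed normalisation: weights are rescaled so they average to 1.
        factor = weights[source_id] * n_sources / total_weight
        # sqrt(0) == 0, so the element-wise sqrt keeps the matrix sparse;
        # multiplying by a scalar also keeps it sparse.
        scaled.append(scores.sqrt() * factor)
    # Densify only once, at the very end; stacking on the last axis puts
    # the sources in the final dimension without a separate transpose.
    return np.stack([m.toarray() for m in scaled], axis=-1).astype(np.float32)


# Tiny usage example with two sources, two documents and two subjects.
source_a = csr_matrix(np.array([[0.0, 4.0], [1.0, 0.0]]))
source_b = csr_matrix(np.array([[9.0, 0.0], [0.0, 0.0]]))
merged = merge_source_batches_sparse(
    {"a": source_a, "b": source_b}, {"a": 1.0, "b": 1.0}
)
print(merged.shape)
```

Whether this is actually faster than working on dense arrays throughout depends on how sparse the suggestion results are, so it would need to be measured.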

Of course the changes need to be properly benchmarked.
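
As a starting point for such a benchmark, the sketch below compares writing one record per document with writing one record per batch. A plain dict stands in for the LMDB database so that the example is self-contained; the helper names, key format, array shapes and batch size are all hypothetical, and real measurements would have to be done against LMDB itself.

```python
import io
import timeit

import numpy as np


def serialize(array):
    """Serialize a NumPy array to bytes in .npy format."""
    buf = io.BytesIO()
    np.save(buf, array, allow_pickle=False)
    return buf.getvalue()


def deserialize(data):
    """Restore a NumPy array from bytes produced by serialize()."""
    return np.load(io.BytesIO(data), allow_pickle=False)


def store_per_document(store, inputs):
    # One key-value record per document (the current approach, in spirit).
    for i, doc in enumerate(inputs):
        store[b"%08d" % i] = serialize(doc)


def store_per_batch(store, inputs, batch_size):
    # One key-value record per batch of documents.
    for n, start in enumerate(range(0, len(inputs), batch_size)):
        store[b"%08d" % n] = serialize(inputs[start:start + batch_size])


# Assumed toy data: 256 documents, 100 subjects, 3 source projects.
rng = np.random.default_rng(0)
inputs = rng.random((256, 100, 3), dtype=np.float32)

per_doc, per_batch = {}, {}
store_per_document(per_doc, inputs)
store_per_batch(per_batch, inputs, batch_size=32)

# Reading the batches back in key order restores the original data.
restored = np.concatenate([deserialize(per_batch[k]) for k in sorted(per_batch)])

t_doc = timeit.timeit(lambda: store_per_document({}, inputs), number=5)
t_batch = timeit.timeit(lambda: store_per_batch({}, inputs, 32), number=5)
print(f"records: {len(per_doc)} per-document vs {len(per_batch)} per-batch")
print(f"write time: {t_doc:.4f}s per-document vs {t_batch:.4f}s per-batch")
```

The timings from the dict stand-in only reflect serialization overhead; transaction and page costs in LMDB could shift the picture in either direction.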

@osma osma added this to the 1.0 milestone Apr 14, 2023
@juhoinkinen juhoinkinen modified the milestones: 1.0, 1.1 Aug 16, 2023
@juhoinkinen juhoinkinen modified the milestones: 1.1, Short term Apr 25, 2024