Data Set Management in Annif #635

Open
mo-fu opened this issue Oct 26, 2022 · 3 comments

Comments

@mo-fu
Contributor

mo-fu commented Oct 26, 2022

When extending Annif with more hyperparameter optimization functionality or with training via the API, data set management may be useful.

Possible functionalities:

  • adding a data set: annif data add $DATA_SET_NAME $PATH
  • splitting a data set into folds: annif data split $DATA_SET_NAME 0.7:train 0.2:test 0.1:validate; a fold could then be addressed using annif train ${DATA_SET_NAME}#train
  • removing a data set: annif data remove $DATA_SET_NAME
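
To make the splitting idea more concrete, here is a minimal Python sketch of how ratio specs such as 0.7:train could be turned into named folds. This is not existing Annif code: the function names parse_fold_specs and split_data_set, the seed parameter, and the representation of a data set as a plain list of documents are all assumptions made for illustration.

```python
import random


def parse_fold_specs(specs):
    """Parse strings like '0.7:train' into (ratio, name) pairs (hypothetical helper)."""
    folds = []
    for spec in specs:
        ratio_text, name = spec.split(":", 1)
        folds.append((float(ratio_text), name))
    total = sum(ratio for ratio, _ in folds)
    # Allow for floating point error when checking that the ratios cover the whole set.
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"fold ratios must sum to 1, got {total}")
    return folds


def split_data_set(documents, specs, seed=0):
    """Shuffle documents reproducibly and cut them into named folds (hypothetical helper)."""
    folds = parse_fold_specs(specs)
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)
    result = {}
    start = 0
    cumulative = 0.0
    for index, (ratio, name) in enumerate(folds):
        cumulative += ratio
        if index == len(folds) - 1:
            # The last fold takes every remaining document so none is lost to rounding.
            end = len(shuffled)
        else:
            end = round(cumulative * len(shuffled))
        result[name] = shuffled[start:end]
        start = end
    return result


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(10)]
    folds = split_data_set(docs, ["0.7:train", "0.2:test", "0.1:validate"])
    for name, items in folds.items():
        print(name, len(items))
```

A fixed seed keeps the split reproducible, so a fold addressed later as ${DATA_SET_NAME}#train would contain the same documents on every run.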
@osma
Member

osma commented Oct 26, 2022

Can you provide an example of how this could look from the user's perspective? For example, CLI commands or REST API calls?

@mo-fu
Contributor Author

mo-fu commented Oct 26, 2022

I added some examples of CLI usage.

@osma
Member

osma commented Oct 28, 2022

Ah, now I understand what you mean by this, thanks!

How would this be implemented? Where would the managed data sets be stored? Somewhere under the data directory? Would these be copies of the originals or something else?

This would expand the scope of Annif quite a lot. I'm not sure it would be worth the additional complexity. But it's an interesting idea.
