
Train a Project using API #634

Open
mo-fu opened this issue Oct 26, 2022 · 3 comments

Comments

@mo-fu
Contributor

mo-fu commented Oct 26, 2022

It would be great to have a REST method where you could upload a compressed archive and then have a model trained on it. You would probably have to include the project configuration in the call. An alternative could be to create new projects via the API and then train them afterwards.
To keep the disk from filling up, the intermediate files should be removed after training, or there has to be an API call for cleaning up intermediate data. The latter variant would also allow parameter optimization using the uploaded data.
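A minimal sketch of the upload-then-train flow with automatic cleanup. It assumes the upload has already been saved to disk as a tar archive and that training is done by a caller-supplied function; `train_from_archive` and `train_fn` are hypothetical names, not part of Annif's API. This covers the "remove intermediate files after training" variant, not the separate cleanup call.

```python
import os
import tarfile
import tempfile
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def _safe_members(archive: tarfile.TarFile, dest: str) -> Iterator[tarfile.TarInfo]:
    """Yield archive members, refusing any that would land outside dest."""
    dest_root = os.path.realpath(dest)
    for member in archive.getmembers():
        target = os.path.realpath(os.path.join(dest_root, member.name))
        if os.path.commonpath([dest_root, target]) != dest_root:
            raise ValueError(f"unsafe path in archive: {member.name}")
        if member.issym() or member.islnk():
            raise ValueError(f"links are not allowed: {member.name}")
        yield member


def train_from_archive(archive_path: str, train_fn: Callable[[str], T]) -> T:
    """Unpack an uploaded corpus archive, train on it, then delete the files.

    Hypothetical helper: `train_fn` receives the directory holding the
    extracted training documents and returns whatever the trainer returns.
    """
    with tempfile.TemporaryDirectory(prefix="corpus-") as workdir:
        with tarfile.open(archive_path, mode="r:*") as archive:
            # Use the stdlib "data" extraction filter where the Python
            # version provides it, in addition to the checks above.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            archive.extractall(
                workdir, members=_safe_members(archive, workdir), **kwargs
            )
        # Leaving this block removes the temporary directory, and with it the
        # intermediate files, whether training succeeded or raised.
        return train_fn(workdir)
```

Tying cleanup to a context manager means a failed training run cannot leave extracted data behind, which is the disk-space concern raised above.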

You would probably need some kind of locking for the projects file to keep it in sync with the projects of the running instance.
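One possible shape for that locking, sketched under the assumption of a POSIX host and a single projects.cfg file: take an advisory lock on a sidecar `.lock` file, edit the parsed configuration, and atomically replace the file. `locked_projects_config` is a hypothetical helper. Getting the running instance to reload the changed configuration is a separate problem not covered here, and rewriting the file through configparser drops any comments it contained.

```python
import configparser
import fcntl
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def locked_projects_config(path: str) -> Iterator[configparser.ConfigParser]:
    """Lock a projects config file, yield it parsed, then write it back atomically.

    POSIX-only sketch: the advisory lock on "<path>.lock" serialises writers
    that use this same helper; it does not stop other programs from editing
    the file directly.
    """
    with open(path + ".lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until we own the lock
        try:
            config = configparser.ConfigParser(interpolation=None)
            config.read(path, encoding="utf-8")
            yield config  # if the caller raises here, nothing is written
            # Write to a temporary file in the same directory, then rename it
            # over the original so readers never see a half-written file.
            fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
            try:
                with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                    config.write(tmp)
                os.replace(tmp_path, path)
            except BaseException:
                os.unlink(tmp_path)
                raise
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Usage would be `with locked_projects_config("projects.cfg") as cfg: cfg["yso-fi"] = {...}`; the file is only replaced if the block finishes without an exception.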

@osma
Member

osma commented Jan 30, 2023

This sounds like a good feature and something I've also thought about in the past. However, it's quite a challenge to do all of this and also ensure data consistency. It may be worthwhile to split this into several features that are implemented separately:

  • create a project (with configuration) via the REST API
  • alter the configuration of an existing project via the REST API
  • train the project via the REST API
  • add support for these operations in the Web UI

There is already a tiny bit of functionality in this direction: the learn method in the REST API. Perhaps some of that code could be used for inspiration.

For the API design, I think it might be worth looking at the Maui Server HTTP API which has similar goals (they have taggers, Annif has projects).

@juhoinkinen
Member

There could also be a CLI command to create a new project configuration, named e.g. create-project or new-project.

Most backend-specific parameters could be set to default values taken from projects.cfg.dist and written to the configuration, if they cannot be omitted altogether (i.e. by having default values hardcoded in the backend). If a backend has required parameters that cannot be defaulted, they could be given with the existing --backend-param option (after some tweaking of it). Usage could look like this:

annif create-project yso-fi --backend nn_ensemble --language fi --vocab yso \
     --backend-param sources=yso-mllm-fi,yso-fasttext-fi

This could speed up creating new projects, which currently often involves copy-pasting an example project configuration from the Wiki or projects.cfg.dist.
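A rough sketch of what such a command might do internally, assuming backend parameters arrive as plain `key=value` strings as in the example above. `create_project_config`, `parse_backend_params` and the `BACKEND_DEFAULTS` table are hypothetical, and the default value shown is a placeholder rather than something copied from projects.cfg.dist.

```python
import configparser
import io
from typing import Dict, Iterable

# Hypothetical per-backend defaults. A real command might read these from
# projects.cfg.dist or rely on defaults hardcoded in each backend; the value
# below is a placeholder, not copied from either source.
BACKEND_DEFAULTS: Dict[str, Dict[str, str]] = {
    "nn_ensemble": {"nodes": "100"},
}


def parse_backend_params(params: Iterable[str]) -> Dict[str, str]:
    """Turn ["key=value", ...] into a dict, splitting each item on its first "="."""
    parsed: Dict[str, str] = {}
    for param in params:
        key, sep, value = param.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {param!r}")
        parsed[key.strip()] = value.strip()
    return parsed


def create_project_config(
    project_id: str,
    backend: str,
    language: str,
    vocab: str,
    backend_params: Iterable[str] = (),
) -> str:
    """Return the text of a new projects.cfg section (hypothetical helper)."""
    section = {"backend": backend, "language": language, "vocab": vocab}
    section.update(BACKEND_DEFAULTS.get(backend, {}))  # fill in defaults
    section.update(parse_backend_params(backend_params))  # explicit values win
    config = configparser.ConfigParser(interpolation=None)
    config[project_id] = section
    out = io.StringIO()
    config.write(out)
    return out.getvalue()
```

Splitting on the first `=` keeps comma-separated values such as `sources=yso-mllm-fi,yso-fasttext-fi` intact, and applying explicit parameters after the defaults lets the user override any default.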

@juhoinkinen
Member

For creating the project configuration via the CLI, instead of giving backend parameters as options, a better approach is to ask for them with interactive prompts. There could even be autocompletion showing the selectable values, e.g. for the backend.
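The prompt-based variant could look roughly like this. The backend list is an illustrative subset hardcoded for the sketch (a real command would ask Annif's backend registry), and `ask_choice`, `make_completer` and `matches` are hypothetical helpers. The completer follows the `(text, state)` protocol used by Python's `readline` module.

```python
from typing import Callable, List, Optional, Sequence

# Illustrative subset of backend names, hardcoded only for this sketch.
BACKENDS = ("ensemble", "fasttext", "mllm", "nn_ensemble", "omikuji", "svc", "tfidf")


def matches(prefix: str, choices: Sequence[str]) -> List[str]:
    """Return the choices that start with the given prefix."""
    return [c for c in choices if c.startswith(prefix)]


def make_completer(choices: Sequence[str]) -> Callable[[str, int], Optional[str]]:
    """Build a completer with the (text, state) signature readline expects."""
    def completer(text: str, state: int) -> Optional[str]:
        options = matches(text, choices)
        return options[state] if state < len(options) else None
    return completer


def ask_choice(
    question: str,
    choices: Sequence[str],
    input_fn: Callable[[str], str] = input,
) -> str:
    """Prompt until the answer is a known choice or an unambiguous prefix of one."""
    while True:
        answer = input_fn(f"{question} [{', '.join(choices)}]: ").strip()
        if answer in choices:
            return answer
        options = matches(answer, choices)
        if answer and len(options) == 1:
            return options[0]
        print(f"Please choose one of: {', '.join(options or choices)}")
```

With GNU readline available, Tab completion could be switched on with `readline.set_completer(make_completer(BACKENDS))` and `readline.parse_and_bind("tab: complete")` before calling `ask_choice("Backend", BACKENDS)`. Injecting `input_fn` keeps the prompt loop testable without a terminal.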
