V2: Add CLI and tests for hyperparameter optimization #734
Conversation
I'm just leaving a comment here to document that the current release of Ray has an installation issue on Windows with Python 3.11 at the moment, as described here. From the thread it looks like they have already merged a fix, so hopefully this will be released/resolved soon, but it's broken for the time being.
chemprop/cli/hyperopt.py (outdated):

```python
logger = logging.getLogger(__name__)

AVAILABLE_SPACES = {
```
I think this is a really good default. However, it would be good to give the user the option to pass in their own search space as well.
Good point! What type of input format do you think we should use?
Just leaving a comment as per our discussion today that we are leaning towards supporting a JSON input that contains the dictionary which we translate to the hyperparameter search space.
Do we plan to make this available via the CLI?
@JacksonBurns We plan to make this available via the CLI, but we haven't settled on an appropriate input format. In a previous dev meeting, people thought JSON was a good format to use. I tried, but functions are not JSON serializable. I tried many ways to make the search space dictionary JSON serializable, but they all feel hacky. I will implement a version of the CLI that works through the pickle format. Feel free to chime in on this.
Edit: I figured out a way to handle the JSON input format. Trying it right now.
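To illustrate the serialization problem, here is a minimal standalone demo (not chemprop code): a Ray Tune search space holds sampler objects and functions, and the standard `json` module refuses to encode anything that is not a plain JSON type.

```python
import json

# A search space maps names to callables/sampler objects;
# json.dumps rejects anything that is not a plain JSON type.
space = {"activation": lambda: "relu"}

try:
    json.dumps(space)
except TypeError as err:
    print(f"not serializable: {err}")
```

This is why the thread turns to either pickle (which can serialize most Python objects) or a custom JSON schema that names the distributions as strings.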
Expressed in JSON, the input would look something like this:
```json
{
  "activation": {"type": "choice", "args": ["relu", "gelu", "tanh"]},
  "aggregation": {"type": "choice", "args": ["mean", "sum", "max"]},
  "aggregation_norm": {"type": "quniform", "kwargs": {"low": 1, "high": 200, "q": 1}},
  "batch_size": {"type": "quniform", "kwargs": {"low": 5, "high": 200, "q": 5}},
  "depth": {"type": "randint", "kwargs": {"low": 2, "high": 6}},
  "dropout": {
    "type": "choice",
    "args": [
      {"type": "choice", "args": [0.0]},
      {"type": "quniform", "kwargs": {"low": 0.05, "high": 0.4, "q": 0.05}}
    ]
  },
  "ffn_hidden_size": {"type": "quniform", "kwargs": {"low": 300, "high": 2400, "q": 100}},
  "ffn_num_layers": {"type": "randint", "kwargs": {"low": 2, "high": 6}},
  "final_lr_ratio": {"type": "loguniform", "kwargs": {"low": 1e-4, "high": 1}},
  "hidden_size": {"type": "quniform", "kwargs": {"low": 300, "high": 2400, "q": 100}},
  "init_lr_ratio": {"type": "loguniform", "kwargs": {"low": 1e-4, "high": 1}},
  "linked_hidden_size": {"type": "quniform", "kwargs": {"low": 300, "high": 2400, "q": 100}},
  "max_lr": {"type": "loguniform", "kwargs": {"low": 1e-6, "high": 1e-2}}
}
```
This JSON corresponds to the following Python search space dictionary:
```python
{
    "activation": tune.choice(["relu", "gelu", "tanh"]),
    "aggregation": tune.choice(["mean", "sum", "max"]),
    "aggregation_norm": tune.quniform(lower=1, upper=200, q=1),
    "batch_size": tune.quniform(lower=5, upper=200, q=5),
    "depth": tune.randint(lower=2, upper=6),
    "dropout": tune.choice([tune.choice([0.0]), tune.quniform(lower=0.05, upper=0.4, q=0.05)]),
    "ffn_hidden_size": tune.quniform(lower=300, upper=2400, q=100),
    "ffn_num_layers": tune.randint(lower=2, upper=6),
    "final_lr_ratio": tune.loguniform(lower=1e-4, upper=1),
    "hidden_size": tune.quniform(lower=300, upper=2400, q=100),
    "init_lr_ratio": tune.loguniform(lower=1e-4, upper=1),
    "linked_hidden_size": tune.quniform(lower=300, upper=2400, q=100),
    "max_lr": tune.loguniform(lower=1e-6, upper=1e-2),
}
```
I prefer the pickle format because:
- it is more flexible in handling the user's input types
- the JSON format requires the same amount of user knowledge of Ray Tune functions as the pickle format does
- we don't have to maintain a custom translator from our JSON format to the search space
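For what it's worth, the JSON-to-search-space translation could be sketched as a small recursive parser. This is only a sketch of the proposed "type"/"args"/"kwargs" schema; the stand-in `DISTS` mapping below is illustrative so the snippet runs without Ray installed — with Ray, it would hold `tune.choice`, `tune.quniform`, etc.

```python
def parse_space(node, distributions):
    """Recursively translate a JSON-style node into a sampler object.

    `distributions` maps a type name such as "choice" to a constructor;
    with Ray installed this would be {"choice": tune.choice, ...}.
    """
    if not isinstance(node, dict) or "type" not in node:
        return node  # leaf value such as "relu" or 0.0
    build = distributions[node["type"]]
    if "args" in node:
        # "choice" takes a single list of (possibly nested) options
        return build([parse_space(a, distributions) for a in node["args"]])
    return build(**node["kwargs"])


# Stand-in constructors so this sketch runs without Ray installed
DISTS = {
    "choice": lambda options: ("choice", options),
    "quniform": lambda **kw: ("quniform", kw),
}

nested = {
    "type": "choice",
    "args": [
        {"type": "choice", "args": [0.0]},
        {"type": "quniform", "kwargs": {"low": 0.05, "high": 0.4, "q": 0.05}},
    ],
}
print(parse_space(nested, DISTS))
```

Note the kwargs key names would still need remapping to Ray Tune's own parameter names (e.g. `low` → `lower`) before calling the real constructors.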
I see two questions here:
1. "How can we accept plaintext input to parameterize the search space for a given hyperparameter?"
2. "More broadly, can we programmatically define a search space using plaintext input?"
For (1), we'd have to design some sort of factory pattern around the ray.tune search space API. This actually isn't that hard and requires (i) a mapping from a string key to the corresponding function and (ii) a way to pass parameters to that function. Luckily, Ray has already provided (i) for us:
```python
DISTRIBUTIONS = {
    "uniform": tune.uniform,
    "quniform": tune.quniform,
    "loguniform": tune.loguniform,
    "qloguniform": tune.qloguniform,
    "randn": tune.randn,
    "qrandn": tune.qrandn,
    "randint": tune.randint,
    "qrandint": tune.qrandint,
    "lograndint": tune.lograndint,
    "qlograndint": tune.qlograndint,
    "choice": tune.choice,
    "func": tune.sample_from,  # NOTE: not a good idea
    "grid": tune.grid_search,
}
```
You could also just do getattr(tune, key), but that gives less control over which search spaces we allow. From there, you just pass through the received parameters like so: func(**params). So you would need some sort of plaintext format that enables providing a mapping like this:
```json
{
    "distribution": "uniform",
    "params": {
        "lower": 0.0,
        "upper": 1.0
    }
}
```
Note: I'm using JSON here, but you can use YAML, TOML, INI, etc.
Then, inside hyperopt.py, you can just create the search space like so:
```python
d = json.load(...)
search_space = DISTRIBUTIONS[d["distribution"]](**d["params"])
# or equivalently
search_space = getattr(tune, d["distribution"])(**d["params"])
```
There would need to be some error checking here to validate the input parameters. This can be somewhat cumbersome because we will need to establish, for each hyperparameter, the allowable search space parameterizations. Ex: "aggregation" can only be "choice" with params["choices"] a subset of {"max", "norm", "mean", "sum"}, and "batch_size" can only be an integer distribution with a lower bound, or a tune.choice where params["choices"] contains only integers.
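That per-hyperparameter error checking could look something like this minimal sketch (the constraint table, its values, and the function names are hypothetical, not chemprop code):

```python
# Hypothetical per-hyperparameter constraints on allowable parameterizations
ALLOWED = {
    "aggregation": {
        "distributions": {"choice"},
        "choices": {"max", "norm", "mean", "sum"},
    },
    "batch_size": {"distributions": {"randint", "qrandint", "lograndint"}},
}


def validate(name, spec):
    """Raise ValueError if `spec` is not an allowed parameterization."""
    rules = ALLOWED.get(name)
    if rules is None:
        return  # unconstrained hyperparameter
    if spec["distribution"] not in rules["distributions"]:
        raise ValueError(f"{name}: disallowed distribution {spec['distribution']!r}")
    choices = set(spec.get("params", {}).get("choices", []))
    if "choices" in rules and not choices <= rules["choices"]:
        raise ValueError(f"{name}: invalid choices {choices - rules['choices']}")


validate("aggregation", {"distribution": "choice", "params": {"choices": ["mean", "sum"]}})
```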
For (2), we would need to establish a mapping from a given hyperparameter key to its actual hyperparameter. We already have this in the form of our train.py args and the update_args_with_config() method defined below.
@davidegraff Thanks for the detailed write-up! Note that this would not allow a nested search space such as the one below, which is used in our default, unless we make the parsing function recursive.

```python
"dropout": tune.choice([tune.choice([0.0]), tune.quniform(lower=0.05, upper=0.4, q=0.05)]),
```
Yeah that's one of the drawbacks to allowing "fully user-defined search spaces" unless we also define our own text parser. I think that's outside the scope of the chemprop CLI and can thus be limited to (1) our hardcoded search space or (2) a user-implemented hyperoptimization.
I decided to keep this part of the changes as a separate PR and focus on getting the base-case hyperparameter optimization in for now.
Thanks for making this PR. I've left some comments below. As we switch from Hyperopt to Ray Tune, some v1 arguments might not work. Also, we probably need to support ensemble modeling and cross-validation during hyperparameter tuning when the model is trained on a tiny dataset; in such cases, model performance depends heavily on the splits.
As discussed in the previous meeting, we are considering providing some flexibility to allow users to define their own search space. However, the method for handling parameter expressions has not been fully determined; it largely depends on the level of flexibility we wish to offer. Below are some of my thoughts:
(1) Provide a range list for each search parameter
(2) Establish a stochastic expression in advance and only let the user provide the range
(3) Allow users to define both the stochastic expression and the search range simultaneously
The above cases are based on the JSON format, but I personally prefer YAML because we can add comments and it is more succinct.
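As an illustration of option (2) in YAML, a hypothetical format sketch (not a settled design): the distribution for each hyperparameter stays fixed inside chemprop, and the user supplies only the range.

```yaml
# ranges only; distributions (randint, quniform, loguniform, ...) are fixed internally
depth: {low: 2, high: 6}
ffn_num_layers: {low: 2, high: 6}
max_lr: {low: 1.0e-6, high: 1.0e-2}  # comments are possible, unlike JSON
```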
Some small notes, especially about some possibly blocking PRs
…e use for dropout. Commenting out relevant code for now.
I was testing the hyperparameter optimization function. Ray Tune + HyperOpt works fine. Ray Tune + Optuna, however, seems unable to handle a nested distribution search space such as the one below.
The best config is returned as
while something like this is expected
I'm inclined to comment out the relevant code for now for the v2.0 release deadline and only support the random search and HyperOpt search algorithms. We can keep track of this in the future.
@KnathanM @JacksonBurns @am2145 This is ready for another review and merge! I would like to get this in today for the v2.0 release. All additional feature requests should be opened as v2.1 issues for feature tracking!
@hwpang CI is failing; please take a look: https://github.com/chemprop/chemprop/actions/runs/8662788070/job/23755482209?pr=734#step:5:10
.github/workflows/ci.yml (outdated):

```shell
python -m pip install ".[dev,docs,test,hpopt]"
python -m pip install optuna
```

Suggested change:

```shell
python -m pip install ".[dev,docs,test]"
python -m pip install ".[hpopt]" optuna
```

This could fix the CI issue.
…i.yml (Line: 85, Col: 13): Unexpected symbol: '"3'. Located at position 26 within expression: matrix.python-version == "3.11""
chemprop/cli/hpopt.py (outdated):

```python
from ray import tune
from ray.train import CheckpointConfig, RunConfig, ScalingConfig
from ray.train.lightning import (
    RayDDPStrategy,
    RayLightningEnvironment,
    RayTrainReportCallback,
    prepare_trainer,
)
from ray.train.torch import TorchTrainer
from ray.tune.schedulers import ASHAScheduler

NO_HYPEROPT = False
try:
    from ray.tune.search.hyperopt import HyperOptSearch
except ImportError:
    NO_HYPEROPT = True
```

Suggested change (move all Ray imports inside the try/except):

```python
NO_HYPEROPT = False
try:
    from ray.tune.search.hyperopt import HyperOptSearch

    from ray import tune
    from ray.train import CheckpointConfig, RunConfig, ScalingConfig
    from ray.train.lightning import (
        RayDDPStrategy,
        RayLightningEnvironment,
        RayTrainReportCallback,
        prepare_trainer,
    )
    from ray.train.torch import TorchTrainer
    from ray.tune.schedulers import ASHAScheduler
except ImportError:
    NO_HYPEROPT = True
```
resolves latest test failure here: https://github.com/chemprop/chemprop/actions/runs/8664591034/job/23761373381?pr=734#step:6:136
chemprop/cli/hpopt.py (outdated):

```python
logger = logging.getLogger(__name__)

DEFAULT_SEARCH_SPACE = {
```
If you have this dict as a global variable, it will be created at import time of this module, and it requires that the hpopt packages are installed. You can fix this by:
- moving this definition into the try/except block where you do the imports, or
- inside the CLI main class, moving the import of the hpopt subcommand into the body of the CLI function to avoid importing it until needed
The former is easier.
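A sketch of the first option (abbreviated; the real search space dict is larger, and the flag name here is illustrative): the hpopt-dependent global is defined inside the try block, so importing the module succeeds even when Ray is missing.

```python
NO_RAY = False
try:
    from ray import tune

    # created only when the optional hpopt dependencies are importable
    DEFAULT_SEARCH_SPACE = {
        "depth": tune.randint(lower=2, upper=6),
        # ... remaining hyperparameters elided ...
    }
except ImportError:
    NO_RAY = True
    DEFAULT_SEARCH_SPACE = None  # downstream code checks NO_RAY before use
```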
LGTM assuming tests pass. My only ask is that you open an issue to track adding Optuna back in.
I only have a few minor comments here, but I will approve it for now as most parts look good to me. Perhaps you could open an issue related to enabling fully customizable distribution types and search ranges for tracking purposes.
Co-authored-by: Shih-Cheng Li <scli@mit.edu>
Description
I will work on implementing a hyperparameter optimization CLI in this PR for v2. This is currently a work in progress, but comments/reviews are welcome to help shape this PR.
Example / Current workflow
There is not much CLI support for hyperparameter search in v2 currently.
Bugfix / Desired workflow
Based on discussion from last week's meeting, we chose Ray Tune as the hyperparameter optimization backend.
I have added arguments related to hyperparameter search in v1 to the Google sheet. We should go over it and discuss ones to keep/discard.
Questions
- hyperopt.py overlaps with functions in train.py. I could try to write a generic version used by both to reduce redundant code. I can also see the flexibility of keeping them separate.
- We should test that the config.json file is saved. Another thing we should test is that the update_args_with_config function actually updates args correctly.

Relevant issues
Updated hyperparameter search space: #571
Checklist