TypeError: '<' not supported between instances of 'NoneType' and 'int' #5226

Open
yancychy opened this issue Feb 5, 2024 · 10 comments
Labels
bug Issue/PR about behavior that is broken. Not for typos/examples/CI/test but for Optuna itself.

Comments


yancychy commented Feb 5, 2024

Hi, I am trying to run Optuna in parallel on a node with 4 CPUs. When I set n_jobs=-1, it fails with the following error. It only works when I set n_jobs=1.
Python: 3.10
Optuna: 3.5.0

[W 2024-02-05 13:12:09,752] Trial 68 failed with parameters: {} because of the following error: TypeError("'<' not supported between instances of 'NoneType' and 'int'").
Traceback (most recent call last):
  File "/home/cheny48/.local/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/gpfs/gsfs12/users/cheny48/TFproject/fly/foo_tmp.py", line 11, in objective
    x = trial.suggest_int("tx", -10, 10)
  File "/home/cheny48/.local/lib/python3.10/site-packages/optuna/_convert_positional_args.py", line 83, in converter_wrapper
    return func(**kwargs)
  File "/home/cheny48/.local/lib/python3.10/site-packages/optuna/trial/_trial.py", line 326, in suggest_int
    suggested_value = int(self._suggest(name, distribution))
  File "/home/cheny48/.local/lib/python3.10/site-packages/optuna/trial/_trial.py", line 635, in _suggest
    param_value = self.study.sampler.sample_independent(
  File "/home/cheny48/.local/lib/python3.10/site-packages/optuna/samplers/_tpe/sampler.py", line 427, in sample_independent
    trials = study._get_trials(deepcopy=False, states=states, use_cache=True)
  File "/home/cheny48/.local/lib/python3.10/site-packages/optuna/study/study.py", line 274, in _get_trials
    self._thread_local.cached_all_trials = self._storage.get_all_trials(
  File "/home/cheny48/.local/lib/python3.10/site-packages/optuna/storages/_cached_storage.py", line 234, in get_all_trials
    trials = list(sorted(trials.values(), key=lambda t: t.number))
TypeError: '<' not supported between instances of 'NoneType' and 'int'

My example code:

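import sys

import optuna
from optuna.trial import TrialState  # not used in this snippet

# Job identifier passed on the command line (not used in this snippet).
jobID = sys.argv[1]

# MySQL storage shared by all workers.
urlName = "mysql://root@localhost:55555/example0"


def objective(trial):
    x = trial.suggest_int("tx", -10, 10)
    print(x)
    return x**2


if __name__ == "__main__":
    study = optuna.create_study(
        direction="maximize",
        study_name="studyID",
        storage=urlName,
        load_if_exists=True,
    )

    # n_jobs=-1 sets the number of parallel jobs to the CPU count.
    study.optimize(objective, n_trials=5, n_jobs=-1)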
Author

yancychy commented Feb 5, 2024

I just followed the code in this link

contramundum53 added the bug label Feb 6, 2024
Member

contramundum53 commented Feb 6, 2024

@yancychy Thank you for your bug report!
Hmm... I couldn't reproduce this error. (The code above ran normally after I set the URL to my database.)

Author

yancychy commented Feb 6, 2024

Thanks. It's 3.5.0. It works with n_jobs=1, but that takes a very long time to finish. I hope to run in parallel to reduce the running time.

Author

yancychy commented Feb 6, 2024

When I set n_trials=1, n_jobs=2, it shows only one trial's results.

[I 2024-02-06 13:33:34,868] Using an existing study with name 'studyID' instead of creating a new one.
10
[I 2024-02-06 13:33:34,924] Trial 8 finished with value: 100.0 and parameters: {'tx': 10}. Best is trial 8 with value: 100.0.
Study statistics: 
  Number of finished trials:  9
  Number of pruned trials:  0
  Number of complete trials:  2
Best trial:
  Value:  100.0
  Params: 
    tx: 10

Member

nzw0301 commented Feb 7, 2024

> When I set n_trials=1, n_jobs=2, it shows only one trial's results.

This is expected behaviour.
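
A short illustration of why this is expected: in `study.optimize`, `n_trials` is the total number of trials for the call, and `n_jobs` only sets how many worker threads share that budget. The sketch below is a simplified standard-library model of that behaviour, not Optuna's actual implementation; `run_trials` and `toy_objective` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_objective(trial_number: int) -> int:
    """Stand-in for an objective function; returns a deterministic value."""
    return trial_number ** 2


def run_trials(n_trials: int, n_jobs: int) -> list[int]:
    """Run exactly n_trials tasks on a pool of n_jobs worker threads."""
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        # One task per trial: the pool size changes concurrency, not the count.
        return list(executor.map(toy_objective, range(n_trials)))


if __name__ == "__main__":
    print(len(run_trials(n_trials=1, n_jobs=2)))  # 1
    print(len(run_trials(n_trials=5, n_jobs=2)))  # 5
```

With `n_trials=1`, the second worker simply has nothing to do, so only one trial result appears.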

Member

contramundum53 commented Feb 8, 2024

@yancychy Could you tell us which MySQL version and which DB driver you are using? (We test our code with PyMySQL.) Could you also tell us your OS?
One possible cause of this behavior is that the isolation level of your DB implementation allows dirty reads. Could you change your config to set the isolation level to REPEATABLE READ or above?
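
For reference, one way to request an isolation level from the client side is SQLAlchemy's `isolation_level` engine option, which `optuna.storages.RDBStorage` forwards through its `engine_kwargs` argument. This is a sketch under assumptions: `open_study` is a hypothetical helper, and whether the setting takes effect depends on the server and its storage engine.

```python
# Engine options forwarded to sqlalchemy.create_engine() by RDBStorage.
ENGINE_KWARGS = {"isolation_level": "REPEATABLE READ"}


def open_study(url: str, study_name: str):
    """Open (or create) a study whose DB connections use REPEATABLE READ."""
    # Imported lazily: Optuna is only needed when a study is actually opened.
    import optuna

    storage = optuna.storages.RDBStorage(url=url, engine_kwargs=ENGINE_KWARGS)
    return optuna.create_study(
        direction="maximize",
        study_name=study_name,
        storage=storage,
        load_if_exists=True,
    )
```

Calling `open_study("mysql://root@localhost:55555/example0", "studyID")` would then replace the `create_study` call in the reported script.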


jgoodson commented Feb 8, 2024

I have not troubleshot this in great detail, but I reproduced the issue with a fresh MySQL database. Some basic poking indicates that rows end up missing from all of the trial_* tables. For instance, if trial 5573 fails with this error, you might see:

mysql> select * from trials where trial_id=5573;
+----------+--------+----------+-------+---------------------+---------------------+
| trial_id | number | study_id | state | datetime_start      | datetime_complete   |
+----------+--------+----------+-------+---------------------+---------------------+
|     5573 |    161 |        8 | FAIL  | 2024-02-08 10:02:38 | 2024-02-08 10:02:38 |
+----------+--------+----------+-------+---------------------+---------------------+
1 row in set (0.00 sec)

mysql> select * from trial_params where trial_id=5573;
Empty set (0.00 sec)

whereas a functional trial looks like:

mysql> select * from trials where trial_id=5574;
+----------+--------+----------+----------+---------------------+---------------------+
| trial_id | number | study_id | state    | datetime_start      | datetime_complete   |
+----------+--------+----------+----------+---------------------+---------------------+
|     5574 |    162 |        8 | COMPLETE | 2024-02-08 10:02:38 | 2024-02-08 10:02:38 |
+----------+--------+----------+----------+---------------------+---------------------+
1 row in set (0.00 sec)

mysql> select * from trial_params where trial_id=5574;
+----------+----------+------------+-------------+----------------------------------------------------------------------------------------------+
| param_id | trial_id | param_name | param_value | distribution_json                                                                            |
+----------+----------+------------+-------------+----------------------------------------------------------------------------------------------+
|     5529 |     5574 | tx         |          -9 | {"name": "IntDistribution", "attributes": {"log": false, "step": 1, "low": -10, "high": 10}} |
+----------+----------+------------+-------------+----------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

I have observed this both when setting n_jobs>1 and when running multiple processes in parallel against the same database/experiment. This may just be a side effect of the trials failing in some other way, however.

What is certain is that the dictionary of study trials in _StudyInfo.trials ends up with a None key holding a FrozenTrial where trials[None].number == None. The sort then fails because it cannot compare NoneType with int, for instance:

In [8]: [t for t in trials.values() if t.number is None]
Out[8]: [FrozenTrial(number=None, state=0, values=None, datetime_start=datetime.datetime(2024, 2, 8, 10, 2, 38), datetime_complete=None, params={}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={}, trial_id=5574, value=None)]
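
This failure mode can be reproduced without Optuna or a database. The sketch below uses a hypothetical `FakeTrial` stand-in (not Optuna's `FrozenTrial`) together with the same sort expression that appears in the traceback from `_cached_storage.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeTrial:
    """Minimal stand-in for a trial record; only `number` matters here."""

    number: Optional[int]


def sort_by_number(trials: dict) -> list:
    # Same expression as the failing line in the reported traceback.
    return list(sorted(trials.values(), key=lambda t: t.number))


def try_sort(trials: dict) -> str:
    """Return 'ok' if sorting succeeds, otherwise the TypeError message."""
    try:
        sort_by_number(trials)
        return "ok"
    except TypeError as exc:
        return str(exc)


healthy = {0: FakeTrial(0), 1: FakeTrial(1)}
broken = {0: FakeTrial(0), 1: FakeTrial(1), None: FakeTrial(None)}

print(try_sort(healthy))
print(try_sort(broken))
```

A single entry with `number=None` is enough to make every later call that sorts the cached trials fail, which matches the error surfacing in otherwise unrelated trials.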

I have not found the root cause of this, but I added a check in _add_trials_to_cache to detect this case, and it does catch the addition of a trial with trial.number == None, with the following traceback:

[W 2024-02-08 10:13:11,592] Trial 206 failed with parameters: {} because of the following error: Exception('Nonetype added to trials').
Traceback (most recent call last):
  File "/vf/users/goodsonjr/mambaforge/envs/optuna/lib/python3.12/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
                      ^^^^^^^^^^^
  File "/home/goodsonjr/single-hyp.py", line 14, in objective
    x = trial.suggest_int("tx", -10, 10)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vf/users/goodsonjr/mambaforge/envs/optuna/lib/python3.12/site-packages/optuna/_convert_positional_args.py", line 83, in converter_wrapper
    return func(**kwargs)
           ^^^^^^^^^^^^^^
  File "/vf/users/goodsonjr/mambaforge/envs/optuna/lib/python3.12/site-packages/optuna/trial/_trial.py", line 326, in suggest_int
    suggested_value = int(self._suggest(name, distribution))
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vf/users/goodsonjr/mambaforge/envs/optuna/lib/python3.12/site-packages/optuna/trial/_trial.py", line 635, in _suggest
    param_value = self.study.sampler.sample_independent(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vf/users/goodsonjr/mambaforge/envs/optuna/lib/python3.12/site-packages/optuna/samplers/_tpe/sampler.py", line 427, in sample_independent
    trials = study._get_trials(deepcopy=False, states=states, use_cache=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vf/users/goodsonjr/mambaforge/envs/optuna/lib/python3.12/site-packages/optuna/study/study.py", line 274, in _get_trials
    self._thread_local.cached_all_trials = self._storage.get_all_trials(
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vf/users/goodsonjr/mambaforge/envs/optuna/lib/python3.12/site-packages/optuna/storages/_cached_storage.py", line 221, in get_all_trials
    self._read_trials_from_remote_storage(study_id)
  File "/vf/users/goodsonjr/mambaforge/envs/optuna/lib/python3.12/site-packages/optuna/storages/_cached_storage.py", line 252, in _read_trials_from_remote_storage
    self._add_trials_to_cache(study_id, trials)
  File "/vf/users/goodsonjr/mambaforge/envs/optuna/lib/python3.12/site-packages/optuna/storages/_cached_storage.py", line 268, in _add_trials_to_cache
    raise Exception("Nonetype added to trials")
Exception: Nonetype added to trials
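
A check of the kind described above might look like the following. This is only a sketch of the idea, not the patch that was actually used and not Optuna's code: `add_trials_to_cache`, its arguments, and the plain-dict cache are hypothetical.

```python
from types import SimpleNamespace


def add_trials_to_cache(cache: dict, trials) -> None:
    """Add trials to a number-keyed cache, rejecting any without a number."""
    for trial in trials:
        if trial.number is None:
            # Fail at insertion time, where the traceback shows the caller,
            # instead of later inside an unrelated sort.
            raise ValueError(f"trial_id={trial.trial_id} has number=None")
        cache[trial.number] = trial


cache = {}
add_trials_to_cache(cache, [SimpleNamespace(number=161, trial_id=5573)])
```

Raising at insertion time is what moves the traceback from the sort in `get_all_trials` to the point where the bad record enters the cache.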

Author

yancychy commented Feb 8, 2024

The version is MySQL 5.7.36 on Linux. I am using mysqlclient==2.1.0.
default_storage_engine = MyISAM
default_tmp_storage_engine = MyISAM
query_cache_size = 256M

Member

MyISAM doesn't seem to support transactions. Could you try using InnoDB?

Author

yancychy commented Feb 9, 2024

Thanks. I modified my.conf to use InnoDB. The errors are still the same as before.
default_storage_engine = InnoDB
default_tmp_storage_engine = InnoDB
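
One thing that may be worth ruling out (an assumption, not something confirmed in this thread): `default_storage_engine` only applies to tables created after the change, so tables created earlier would still be MyISAM until they are converted or recreated. The sketch below builds the `ALTER TABLE` statements from rows returned by a query on `information_schema.TABLES`; the helper name and the sample rows are hypothetical.

```python
# Query to run against MySQL to list each table's engine in one schema.
ENGINE_QUERY = (
    "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s"
)


def alter_statements(rows, target="InnoDB"):
    """Return ALTER TABLE statements for tables not already using `target`."""
    return [
        f"ALTER TABLE `{name}` ENGINE={target};"
        for name, engine in rows
        # Views report ENGINE as NULL, so they are skipped.
        if engine is not None and engine.lower() != target.lower()
    ]


# Hypothetical rows, shaped like the result of ENGINE_QUERY.
sample_rows = [("trials", "MyISAM"), ("trial_params", "InnoDB")]
for statement in alter_statements(sample_rows):
    print(statement)
```

If the query shows that every Optuna table already uses InnoDB, this explanation can be discarded.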
