Training performance drop after augmentation refactoring #3092
Comments
I ran another test with the code from directly before the refactoring (commit 188a6f2c1ee53dc79acf8abceaf729b5f9a05e7a). This time one epoch took 4 min on average and the whole training took 1:45 h.
I now used a batch size of 24 and updated the params again to better match the params above:
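The actual flag listing is not reproduced above. For illustration only: the post-refactor training script accepts augmentation specs through a repeatable `--augment` flag. The spec spellings below follow the DeepSpeech training docs of that period, and every value (and file path) is invented:

```bash
# Hypothetical post-refactor invocation; values and paths are placeholders.
python DeepSpeech.py \
  --train_files train.csv \
  --dev_files dev.csv \
  --train_batch_size 24 \
  --augment "pitch[p=0.1,pitch=1~0.2]" \
  --augment "tempo[p=0.1,factor=1~0.5]"
```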
@DanBmh Augmentations
My observations so far:
I had some time to run some more tests today (with master from about two days ago). This time an epoch took about 4:30 min on average. I also tried different dropout values:
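The values and results of that test are not reproduced above. As a rough sketch of how such a sweep could be scripted: `--dropout_rate` and `--checkpoint_dir` are actual DeepSpeech training flags, while the file paths and the three dropout values below are placeholders:

```bash
# Hypothetical dropout sweep; only --dropout_rate and --checkpoint_dir
# are known DeepSpeech flags here, the values and paths are placeholders.
for rate in 0.25 0.30 0.40; do
  python DeepSpeech.py \
    --train_files train.csv \
    --dev_files dev.csv \
    --train_batch_size 24 \
    --dropout_rate "$rate" \
    --checkpoint_dir "ckpt_dropout_$rate"
done
```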
@tilmankamp Any updates on the accuracy problem?
@DanBmh: did you ever reach a conclusion on this? Have you been running augmentation with newer releases?
Were there important changes to the augmentations in between? I didn't check. I didn't run further tests, just the ones above. For my own trainings I still use the old version.
I might have found a reason for the accuracy problem. First, I misunderstood the augmentation flag description, and the pitch and tempo flags are not converted correctly. Second, the new … I will try to run a test soon, but I don't believe this will also solve the slower training. For the second problem, maybe a new flag like … could help.
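For context on the first point: the old flags expressed pitch and tempo as absolute min/max bounds, while the new specs use a center~radius notation, so a faithful conversion needs center = (min + max) / 2 and radius = (max - min) / 2. Below is a hedged sketch; the old flag names are recalled from the pre-refactor flags.py and all values are invented, so treat both as assumptions:

```bash
# Old-style (pre-refactor) flags used absolute min/max bounds
# (flag names recalled from the old flags.py; treat as assumptions):
#   --augmentation_pitch_and_tempo_scaling
#   --augmentation_pitch_and_tempo_scaling_min_pitch 0.95
#   --augmentation_pitch_and_tempo_scaling_max_pitch 1.20
#   --augmentation_pitch_and_tempo_scaling_max_tempo 1.20

# Equivalent new-style specs: center=(min+max)/2, radius=(max-min)/2,
# so the pitch range [0.95, 1.20] becomes 1.075~0.125. The tempo range
# is assumed to have been [1.0, 1.2], giving 1.1~0.1.
python DeepSpeech.py \
  --train_files train.csv \
  --augment "pitch[p=1.0,pitch=1.075~0.125]" \
  --augment "tempo[p=1.0,factor=1.1~0.1]"
```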
Yeah, that could be useful. Usually for hyperparameter schedules there is a separate start/ramp-up/ramp-down/stop range, rather than tying the schedule to the number of steps/epochs of the whole training run.
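For reference, the value notation in the new augmentation specs already supports a ramp, but that ramp always spans the entire training run, which is exactly what a separate range flag would address. A minimal sketch of the notation, assuming the spec grammar from the DeepSpeech training docs of that period (the source path and all values are placeholders):

```bash
# Value notation inside --augment specs (as documented at the time):
#   <value>          constant for the whole run
#   <start>:<end>    linear ramp from the first to the last training step
#   <...>~<radius>   adds a per-sample random variation of +/- radius
# Example: overlay noise whose SNR ramps from 50 dB down to 20 dB,
# randomized by +/- 10 dB per sample (noise.csv is a placeholder):
python DeepSpeech.py \
  --train_files train.csv \
  --augment "overlay[p=0.5,source=noise.csv,layers=1,snr=50:20~10]"
```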
I have the feeling that training performance, both duration and accuracy, got worse after the augmentation refactoring commits.
Before, training on my dataset took about 2:10 h, and today it took 3:20 h, with about the same number of epochs. At the beginning one epoch takes about 5 min, but after epoch 18 they suddenly need 8 min on average. I didn't see this behaviour in the trainings from two days ago.
The accuracy also got a bit worse:
These were the options I set before:
And these are the ones I used for today's run:
I also had to reduce the batch size from 30 to 24 because I got error #3088. About two months ago I could use 36 without any problems.
I know there is a bit of randomness in the accuracy, and I did change some of the augmentation params slightly, but the change in results is bigger than expected.
@tilmankamp do you have an idea about this?