Trying to run 3-state (2 spot states) HMM data - getting CUDA memory error and "Iteration started with a new seed" warnings #420

Open
zhoudan-brandeis opened this issue Feb 14, 2023 · 4 comments

@zhoudan-brandeis

zhoudan-brandeis commented Feb 14, 2023

[image: screenshot showing the last few "Iteration started with a new seed" warnings]

Other details: I am running v1.1.17 and have previously run the same data successfully with a 2-state (1 spot state) HMM model.

I reduced the spot/frame batch sizes from 10 -> 5 and 512 -> 256 and still get the warning on many iterations (hundreds; the example image shows only the last few) before ultimately running out of CUDA memory.

@ordabayevy
Collaborator

The program restarts the run when NaN values are detected in the parameters. It is usually OK if this happens a small number of times during the entire run.
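To make the restart behavior concrete, here is a minimal sketch of a restart-on-NaN loop. It is not Tapqir's actual code: `fit`, `has_nan`, `init_fn`, `step_fn`, and `max_restarts` are hypothetical names, and the parameters are modeled as a plain dict of floats.

```python
import math


def has_nan(params):
    """Return True if any parameter value is NaN."""
    return any(math.isnan(v) for v in params.values())


def fit(step_fn, init_fn, num_iters, seed=0, max_restarts=10):
    """Run num_iters steps; on NaN, start over with the next seed.

    init_fn(seed) -> params dict and step_fn(params) -> params dict are
    hypothetical callbacks standing in for model setup and one
    optimization step.
    """
    restarts = 0
    while True:
        params = init_fn(seed)
        for _ in range(num_iters):
            params = step_fn(params)
            if has_nan(params):
                break  # abandon this attempt
        else:
            # Loop finished without hitting a NaN
            return params, restarts
        restarts += 1
        if restarts > max_restarts:
            raise RuntimeError("too many NaN restarts; inspect data and model")
        seed += 1  # "iteration started with a new seed"
```

The cap on restarts is an assumption of this sketch. It turns hundreds of silent restarts into an early error, which matches the point above that a few restarts are fine but repeated ones indicate a deeper problem.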

If it happens repeatedly, as in your case, then something pathological is going on. It is hard to tell whether it is related to the data or the model without inspecting it closely. Can we set up a Zoom meeting to take a closer look at this together?

@ordabayevy
Collaborator

I also see that it has run 50800 iterations. How close is it to convergence when you look at TensorBoard?

@zhoudan-brandeis
Author

zhoudan-brandeis commented Feb 15, 2023 via email

@ordabayevy
Collaborator

ordabayevy commented Feb 15, 2023

Oh, I guess that is the reason. The name of the model file is the same for the 2-state and 3-state HMM models. Since you have already run the 2-state model, its model file is saved in the .tapqir folder. Now when you try to run the 3-state HMM, it loads the model file for the 2-state HMM and tries to continue from there. That's why it says iteration 50800. Running it in a different analysis folder should fix the problem.
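The collision described above can be illustrated with a small sketch. The file name, JSON format, and helper functions here are hypothetical and are not Tapqir's actual checkpoint layout; only the `.tapqir` folder name comes from this thread. The point is that a checkpoint path that ignores the number of states causes a stale resume, while a fresh analysis folder does not.

```python
import json
import tempfile
from pathlib import Path


def checkpoint_path(analysis_dir, model_name):
    # Hypothetical naming: depends only on the model name,
    # not on the number of states
    return Path(analysis_dir) / ".tapqir" / f"{model_name}-model.json"


def save(analysis_dir, model_name, state):
    path = checkpoint_path(analysis_dir, model_name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state))


def load_or_init(analysis_dir, model_name, num_states):
    path = checkpoint_path(analysis_dir, model_name)
    if path.exists():
        # Resumes from whatever is on disk, ignoring the requested num_states
        return json.loads(path.read_text())
    return {"iteration": 0, "num_states": num_states}


with tempfile.TemporaryDirectory() as root:
    shared = Path(root) / "analysis"
    # Earlier 2-state run left a checkpoint behind
    save(shared, "hmm", {"iteration": 50800, "num_states": 2})

    # Same folder: the 3-state run picks up the 2-state checkpoint
    stale = load_or_init(shared, "hmm", num_states=3)

    # Different folder: the 3-state run starts from scratch
    fresh = load_or_init(Path(root) / "analysis_3state", "hmm", num_states=3)

print(stale)  # {'iteration': 50800, 'num_states': 2}
print(fresh)  # {'iteration': 0, 'num_states': 3}
```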
