_disable_stats doesn't work: wandb.init(settings=wandb.Settings(_disable_stats=True))
It still sends stats to WANDB, which in turn leads to BSOD due to incompatibility with the old PYNVML dependency in the vendor folder.
#3597
Comments
Leslie commented: We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.
Hi @anatolii-kotov, thanks for reporting this! This issue was solved in #3510 and shipped with the latest release 2 days ago (https://github.com/wandb/client/releases/tag/v0.12.16). Could you please give it a try and let us know whether it worked for you?
Hi @dmitryduev, I tried the latest version but am still getting a BSOD. In this case it appeared after everything was sent via the wandb API. System: Win11 22000.613 & Nvidia 512.59
Yes, I tried it with the latest version and am still getting a BSOD.
Thanks for the update @anatolii-kotov, we'll look into this.
@dmitryduev has there been any progress on this issue? If it helps any, it seems to be an issue related to pynvml and NVIDIA drivers later than version 472.12. That is, any package that calls I haven't tested drivers prior to 472.12, however - so it might require an exact match, or it might just be a regression and that's the last good version.
@dmitryduev on further investigation, I think the issue is that the
Thank you for the extra information @benjamincburns! @dmitryduev is going to do more investigation on this early next week.
Anything new here? I can't use wandb, because I'm getting a BSOD every single time. I went crazy at first, because I didn't know wandb was the issue...
I'm so sorry for the wait! I talked to the engineer in charge of this and they mentioned that they would work on it this week.
Hey all, many thanks for bringing this to our attention, and please accept my apologies for it taking us so long to look into it properly. We have updated the vendored version of nvidia-ml-py here and that PR has been merged into master.
Thank you so much! I just tried it remotely, but it looks like it crashed again. I will try it again tomorrow, when I'm at home. I have an RTX 2080 Ti and an Intel CPU. My wandb version says: 0.13.2.dev1
I participate in a reinforcement learning community via Discord, and the common (horribly ugly) workaround in that group is to edit the wandb code in Also I see from #4109 that you weren't able to repro the original problem @dmitryduev? Do you happen to have access to a computer with a 2080 Ti? It seems to repro reliably with that card on either Windows 10 or Windows 11. Also for the purpose of reproducing the problem, I would avoid driver versions 472.12, and the current 516.94 driver (I've heard from one person that the crash went away on that driver version).
Many thanks for the updates, @benjamincburns and @PeterKeffer! @PeterKeffer: would you mind updating the driver to 516.94 and seeing if it still crashes? @benjamincburns: I tried repro'ing on a bunch of different Tesla cards on Win 10, 11, and Server 2019, with a number of driver versions within (and outside!) the range you mentioned. I also tried a plain 2080 and it works there too. Closing in on a machine with a 2080 Ti, might have an update soon. In the meantime, to turn off sys metrics logging completely (instead of commenting out wandb.init(settings=wandb.Settings(_disable_stats=True, _disable_meta=True))
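For readers who want to try the workaround from the comment above, here is a minimal sketch. The two setting names are the ones quoted in the comment and apply to the wandb versions discussed in this thread; they are underscore-prefixed, so they may change or disappear in other releases. The helper names `stats_free_settings_kwargs` and `init_without_system_metrics` are hypothetical and exist only for illustration.

```python
def stats_free_settings_kwargs():
    """Keyword arguments for wandb.Settings, as quoted in the comment above."""
    return {"_disable_stats": True, "_disable_meta": True}


def init_without_system_metrics(**init_kwargs):
    """Hypothetical helper: start a run with stats and metadata collection disabled."""
    # Imported lazily so that the helper above can be used without wandb installed.
    import wandb

    settings = wandb.Settings(**stats_free_settings_kwargs())
    return wandb.init(settings=settings, **init_kwargs)
```

A call such as `init_without_system_metrics(project="my-project")` (the project name is a placeholder) would then replace a plain `wandb.init(...)`. Note that later comments in this thread report BSODs even with these settings, so treat this as something to try rather than a confirmed fix.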
Ah interesting. I'm really curious to know why it doesn't repro for you on all of those boxes. I know Tesla GPUs are using a different driver series, but I wouldn't expect much of any difference between the 2080 and the 2080 Ti. Thanks for going on such a scavenger hunt!
Unfortunately, unless there has been a change, per the title of this issue, running with Edit: oh, I see - we need the extra
@benjamincburns, yea,
I have great news: @dmitryduev Thank you so much for your efforts! @benjamincburns Also thank you for your valuable inputs! :)
@lesliewandb @dmitryduev why was this issue closed? Running the If the issue is going to be closed as completed, you should at least capture notes about the workaround on the troubleshooting FAQ page. Given that I don't see that here, I strongly suspect that many users will continue encountering this problem for quite some time. https://docs.wandb.ai/guides/technical-faq/troubleshooting
Sorry for the confusion about closing the issue. Since I saw that @PeterKeffer no longer got a BSOD, I assumed that this was solved. I'll make an internal ticket to get this fixed in the docs.
@lesliewandb there are still users, including myself, who are reporting BSODs. I don't think this issue should be closed.
The arg |
I understand, however our engineers have done what could be done on our end. Beyond this, it is an NVIDIA drivers + Windows issue, which is why this issue is closed now.
I'd like to point out that this problem still exists now...
@lesliewandb there is a way for wandb to prevent the BSOD from occurring on Windows - make sure that no Otherwise my only way of working around this issue today is by hand editing the wandb client in
_disable_stats doesn't work: wandb.init(settings=wandb.Settings(_disable_stats=True))
It still sends stats to WANDB, which in turn leads to BSOD due to incompatibility with the old PYNVML dependency in the vendor folder.
Originally posted by @CosmicHazel in #473 (comment)
Can confirm that this is causing BSOD on the Windows platform with an Nvidia GPU and the latest drivers. And since there's no way to disable it, there's practically no way to use wandb on Windows.