Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nc_close should not require the GIL #1180

Closed
sternj opened this issue Aug 2, 2022 · 12 comments
Closed

nc_close should not require the GIL #1180

sternj opened this issue Aug 2, 2022 · 12 comments

Comments

@sternj
Copy link

sternj commented Aug 2, 2022

When running NetCDF4 using Scalene I ran into an issue where one thread held HDF5's lock without the GIL and was trying to acquire the GIL while another thread was holding the GIL while trying to acquire HDF5's lock. I tracked down what I suspect to be the root of the problem to the Python declaration of netcdf4.h, where the nc_close function is declared without a nogil.

I'm trying out a patched version with the nc_close function being declared with nogil and the offending call to the nc_close function being wrapped in a nogil block to see if this resolves the issue that I've been seeing, and I can of course PR this, but I wanted to know if there's any specific reason why it was written without the nogil in the first place.

@jswhit
Copy link
Collaborator

jswhit commented Aug 3, 2022

Don't know why nc_close is not wrapped 'with nogil' in _netCDF4.pyx. I don't see why adding that would be a problem - but my understanding of how the GIL works is admittedly limited.

@sternj
Copy link
Author

sternj commented Aug 3, 2022

I've been digging into this further and found several similar functions which take an ncid (and thus HDF5's lock). I can't quite remember where I read this but some of Python's documentation indicates that it is best practice to release the GIL when waiting on a blocking operation.

I actually am seeing a few other functions like this, there are several that are namespaced in that don't have nogil, and I'm not sure why any of the functions would need the GIL-- I don't want to put the work into a PR if there's a compelling reason why these do have the GIL. Does anyone know why it was designed this way?

@dopplershift
Copy link
Member

So long as these functions aren't doing any Python C-API work, releasing the GIL should be fine. Also, releasing the GIL is probably the most (only?) reliable way to deal with potential deadlocks due to competing locks.

It looks like all of the functions in that header are just direct wraps of the netCDF C API, so the GIL should be completely safe to release.

@sternj
Copy link
Author

sternj commented Aug 3, 2022

I'll have a PR up in the next few days, it seems like it's not quite possible to declare cdef variables inside of a nogil block (well, the compiler complains very loudly at me), is there a convention that works better than forward-declaring the variable?

@dopplershift
Copy link
Member

I don't think you can wrap a variable. "nogil" means "release the GIL when calling this function", so it only makes sense to apply to functions/methods.

@sternj
Copy link
Author

sternj commented Aug 3, 2022

I mean that something like

with nogil:
   cdef int foo = some_function()

whereas

cdef int foo = -1
with nogil:
   foo = some_function()

does work

@dopplershift
Copy link
Member

Oh, weird. Are there places where that's necessary? I would have thought just adding nogil to as many function definitions as possible would be enough.

@sternj
Copy link
Author

sternj commented Aug 3, 2022

This is not quite the case, at least according to this, you need to manually release the GIL I think, unless I'm entirely misreading what nogil in a declaration means

@sternj
Copy link
Author

sternj commented Aug 3, 2022

(as an answer to my question, though, someone else seems to have established this practice for the repo, which I will follow

@jswhit
Copy link
Collaborator

jswhit commented Aug 9, 2022

This turns out to be a lot more work than I thought - I've been inconsistent in releasing the GIL when calling the C lib. PR #1180.

@jswhit
Copy link
Collaborator

jswhit commented Aug 10, 2022

The MPI tests are failing for PR #1180

Update: The issue with MPI tests is now fixed.

@jswhit
Copy link
Collaborator

jswhit commented Aug 12, 2022

closed by PR #1180

@jswhit jswhit closed this as completed Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants