[BUG-REPORT] Refcount leak to underlying array when deleting dataframe #2323
Comments
Just realized the problem from my initial post is much simpler to explain:
prints:
So dataframe deletion is not cleaning up its reference to the array.
OK, I think this is not a bug in vaex but is related to delayed garbage collection in Python. I still don't understand, though, why garbage collection is delayed after going through vaex.
After digging deeper into python garbage collection internals I think I closed this one too early.. Using tricks from https://rushter.com/blog/python-garbage-collector/ I can see that the
Outputs:
The fact that it gets removed when calling Effectively this behaves as a memory leak until the python interpreter decides to run garbage collection or user code triggers it explicitly via
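The behaviour described here (a reference that survives `del` and only goes away after a collection pass) is what a reference cycle produces in CPython. The following is a minimal sketch using only the standard library; `Payload` and `Holder` are hypothetical stand-ins for the array and for whatever object ends up in a cycle, not vaex classes, and the sketch makes no claim about where such a cycle lives in vaex.

```python
import gc
import sys


class Payload:
    """Hypothetical stand-in for the array."""


class Holder:
    """Hypothetical stand-in for an object caught in a reference cycle."""

    def __init__(self, payload):
        self.payload = payload
        self.me = self  # deliberate self-cycle: plain refcounting cannot free this


gc.disable()  # keep automatic collection out of the way so the demo is deterministic
try:
    payload = Payload()
    baseline = sys.getrefcount(payload)

    holder = Holder(payload)
    with_holder = sys.getrefcount(payload)  # one more than baseline

    del holder
    after_del = sys.getrefcount(payload)  # unchanged: the cycle keeps Holder alive

    gc.collect()  # the cycle collector finds and frees the unreachable Holder
    after_collect = sys.getrefcount(payload)  # back to baseline
finally:
    gc.enable()

print(baseline, with_holder, after_del, after_collect)
```

The absolute numbers printed depend on the interpreter version; the point is only that the count stays elevated after `del` and returns to the baseline after `gc.collect()`.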
I tried to debug and locate the cyclic reference using
Maybe someone more skilled or with knowledge of vaex internals could help here?
It's a difficult topic for sure!
I'm running into a similar error as described in a previous issue, #2062. @schwingkopf I'm curious if downgrading numpy lets your code run successfully?
numpy 1.23 had lots of changes, so if you're using 1.23+ there might be something in there that could be related: https://github.com/numpy/numpy/releases/tag/v1.23.0
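One generic way to hunt for the object that still holds a reference is `gc.get_referrers`. The sketch below is an illustration with the standard library only; `Payload`, `Holder` and `find_holders` are hypothetical names introduced for this example and are not part of vaex or of the tooling mentioned above.

```python
import gc


class Payload:
    """Hypothetical stand-in for the array."""


class Holder:
    """Hypothetical stand-in for an object caught in a reference cycle."""

    def __init__(self, payload):
        self.payload = payload
        self.me = self  # deliberate self-cycle


def find_holders(target, cls):
    """Return instances of cls that refer to target, directly or via their __dict__."""
    found = {}
    for ref in gc.get_referrers(target):
        if isinstance(ref, cls):
            found[id(ref)] = ref
        elif isinstance(ref, dict):
            # Some CPython versions report the instance __dict__ rather than
            # the instance itself, so look one level further up.
            for owner in gc.get_referrers(ref):
                if isinstance(owner, cls):
                    found[id(owner)] = owner
    return list(found.values())


gc.disable()  # keep automatic collection out of the way for the demo
try:
    payload = Payload()
    holder = Holder(payload)
    del holder  # the name is gone, but the cycle keeps the object alive

    leaked = find_holders(payload, Holder)
    leaked_count = len(leaked)
    cycle_found = any(obj.me is obj for obj in leaked)
    print("objects still holding the payload:", leaked)

    del leaked  # drop our own reference before collecting
    gc.collect()
    remaining = find_holders(payload, Holder)
finally:
    gc.enable()
```

In a real session the class filter would be replaced by whatever types show up in the referrer list, which is usually the slow part of this kind of search.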
@anthonycorletti thanks for your hint. Just tried the example from my first post:
Interesting.. any ideas what that means? For the problem to appear it still requires interaction with a vaex
Happy to hear this at least got something working for you. Unfortunately, I'm not exactly sure what this means. I know that 1.22.4 has problems with mmap, which might be due to this change in numpy: numpy/numpy#21446
I'm trying to use vaex with numpy arrays that reference shared memory, and I run into problems when trying to unlink the shared memory. Here is a minimal reproducing example:
Execution throws the following exception:
It works fine when not creating the dataframe object.
It seems like vaex is still keeping a reference to the array/shm block after deleting the dataframe object. Is that a bug or is there a recommended way to delete all references?
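For context, here is a sketch of how a lingering reference to a shared memory buffer can block cleanup, using only the standard library. It does not involve vaex or numpy: a plain `memoryview` slice stands in for any object that still holds the buffer, and whether this is the same exception as in the report above is an assumption.

```python
from multiprocessing import shared_memory

# Create a small block and keep a view onto its buffer alive.
shm = shared_memory.SharedMemory(create=True, size=16)
view = shm.buf[:4]  # stand-in for any object that still exports the buffer

try:
    shm.close()  # fails while the view is alive
    close_failed = False
except BufferError as exc:
    close_failed = True
    print("close failed:", exc)

# Once the last view is released, cleanup goes through.
view.release()
shm.close()
shm.unlink()
```

If an object that wraps the buffer is kept alive by a reference cycle, the release step only happens once the cycle collector has run, which would be consistent with the delayed cleanup discussed in the comments.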
Software information
Vaex version (import vaex; vaex.__version__): {'vaex': '4.16.0', 'vaex-core': '4.16.1', 'vaex-viz': '0.5.4', 'vaex-hdf5': '0.14.1', 'vaex-server': '0.8.1', 'vaex-astro': '0.9.3', 'vaex-jupyter': '0.8.1', 'vaex-ml': '0.18.1'}