
mongodump and mongorestore library - Blob (not pure dataframe) #967

Open
fengster123 opened this issue Sep 15, 2022 · 0 comments

fengster123 commented Sep 15, 2022

Arctic Version

1.80.0

Arctic Store

VersionStore

Platform and version

Spyder (Python 3.8)

Description of problem and/or code sample that reproduces the issue

Hi, I use mongodump and mongorestore to move libraries between PCs (let me know if there are easier ways). Each library (in this case, mine is called "attribution_europe_data") has 5 collections from MongoDB's point of view: attribution_europe_data, ....ARCTIC, ....snapshots, ....version_nums and ....versions. During the mongodump process it dumps 2 files for each collection, so a total of 10 files per library.
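For reference, here is roughly how those collections can be listed directly with pymongo (a minimal sketch, assuming Arctic's default "arctic" database name):

```python
from pymongo import MongoClient

client = MongoClient('localhost')
# Arctic keeps library data in the "arctic" database by default;
# each VersionStore library maps to five collections sharing its prefix.
names = [name for name in client['arctic'].list_collection_names()
         if name.startswith('attribution_europe_data')]
print(sorted(names))
```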

I successfully managed to mongorestore those 10 files onto a separate PC, i.e. I can do things like print(Arctic('localhost')['attribution_europe_data'].list_symbols()):
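Expanded, the sanity check on the restored PC is just (a minimal sketch, nothing beyond what the one-liner above already does):

```python
from arctic import Arctic

store = Arctic('localhost')
library = store['attribution_europe_data']
# at this level the restore looks fine: all symbols are listed
print(library.list_symbols())
```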

[screenshot: list_symbols() output on the restored PC]

Now, each symbol in my library represents a pandas dataframe (actually they are saved as a Blob, since they contain Objects) of around 5000 rows x 2000 columns. The issue is that if I read it on the new PC, e.g. Arctic('localhost')['attribution_europe_data'].read('20220913').data in Spyder, it freezes and eventually shows "restarting kernel....".

[screenshot: Spyder console showing "Restarting kernel..."]

It shouldn't be a memory issue reading that dataframe, as I generated a randomly filled dataframe of similar size on the same PC and it reads fine.
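Something along these lines, as a rough memory sanity check (a sketch of what I ran; the test symbol name is made up and the dtypes differ from the real data):

```python
import numpy as np
import pandas as pd
from arctic import Arctic

# a frame of roughly the same shape as the problem symbol
df = pd.DataFrame(np.random.rand(5000, 2000))

library = Arctic('localhost')['attribution_europe_data']
library.write('memory_test', df)                # hypothetical test symbol
print(library.read('memory_test').data.shape)   # reads back without freezing
```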

As a test, I used the same mongodump and mongorestore method on a smaller / simpler library, which consists of a single very simple symbol: a dictionary {'hi': 1}. The new PC (where I restored it) is able to read this library and this symbol without any issue. Similarly, when I used the same method on a pure dataframe, as opposed to a Blob, it works as well!

So do you think the mongodump and mongorestore process corrupts Blob objects?

Also, what do you normally use to transfer Arctic libraries from one PC to another? Surely there is a simpler way than mongodump and mongorestore?
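One alternative I can imagine (a sketch only; the hostnames are placeholders, and it would be much slower than a raw dump for big libraries) is to copy symbol by symbol through Arctic itself:

```python
from arctic import Arctic

src = Arctic('source_host')['attribution_europe_data']   # placeholder hostnames
dst = Arctic('target_host')['attribution_europe_data']

for symbol in src.list_symbols():
    item = src.read(symbol)   # VersionedItem with .data and .metadata
    dst.write(symbol, item.data, metadata=item.metadata)
```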

==============
Just to update on more investigations (a repro sketch follows the list):

  1. if the symbol is a dataframe (that is NOT saved as a blob), it works
  2. if the symbol is a dict, say {'hi': 1}, it works
  3. if the symbol is a blob, it DOES NOT work (i.e. it has trouble reading that symbol from the restored library on the new PC)
  4. if the symbol is a dict wrapped around a pure dataframe, e.g. {'hi': pd.DataFrame(np.random.rand(2, 2))}, it works
  5. if the symbol is a dict wrapped around a blob, e.g. {'hi': some_blob}, it DOES NOT work
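A minimal sketch of the five cases (the test library name and the object column are hypothetical; any object-dtype column forces Arctic to fall back to pickling the value as a blob):

```python
import numpy as np
import pandas as pd
from arctic import Arctic

store = Arctic('localhost')
store.initialize_library('blob_restore_test')   # hypothetical test library
lib = store['blob_restore_test']

pure_df = pd.DataFrame(np.random.rand(2, 2))
blob_df = pure_df.copy()
blob_df['obj'] = [object(), object()]           # object column -> stored as a pickled blob

lib.write('case1_pure_df', pure_df)             # works after restore
lib.write('case2_dict', {'hi': 1})              # works
lib.write('case3_blob', blob_df)                # DOES NOT work after restore
lib.write('case4_dict_df', {'hi': pure_df})     # works
lib.write('case5_dict_blob', {'hi': blob_df})   # DOES NOT work
```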

I have included what it looks like on the old PC, and the error it throws on the new PC when the symbol is a dict wrapped around a blob:

(old PC)
[screenshot: the symbol as read on the old PC]

(new PC)
[screenshot: the error raised when reading the restored symbol]

fengster123 changed the title from "mongodump and mongorestore library" to "mongodump and mongorestore library - Blob (not pure dataframe)" on Sep 15, 2022