
How can I store the data of my use case? #882

Open
echatzikyriakidis opened this issue Jan 13, 2021 · 4 comments

Comments

@echatzikyriakidis

Hi all!

I am facing a problem and need help from someone with experience in arctic.

I first tried antarctic to store Pandas dataframes, but it stores each dataframe in a single document, so I ran into MongoDB's 16 MB document size limit.

I think arctic will solve my problem, but I don't know which Store to use for my use case.

So, here is my use case:

Every 3 days I run a program that creates 3 Pandas dataframes for multiple projects.

So, each project has 3 dataframes.

In a second program a user selects one project and I want to have access to its 3 dataframes.

Which Store should I use? And is this correct usage?

from arctic import Arctic

db = Arctic('localhost')
db.initialize_library('projects')
projects_library = db['projects']

projects_library.write('project-1-dataframe-1', df1, metadata={'run_date': date1})
projects_library.write('project-1-dataframe-2', df2, metadata={'run_date': date1})
projects_library.write('project-1-dataframe-3', df3, metadata={'run_date': date1})

projects_library.write('project-2-dataframe-1', df4, metadata={'run_date': date2})
projects_library.write('project-2-dataframe-2', df5, metadata={'run_date': date2})
projects_library.write('project-2-dataframe-3', df6, metadata={'run_date': date2})

project1_df1 = projects_library.read('project-1-dataframe-1').data

Also, I don't think I will need access to data from previous runs; I only need the latest data for each project.

How can I do it optimally?

Thank you!

@bmoscon
Collaborator

bmoscon commented Jan 13, 2021

arctic is for time series data; this doesn't seem to be time series data

@echatzikyriakidis
Author

@bmoscon I see.

How can I store large Pandas dataframes in MongoDB?

@bmoscon
Collaborator

bmoscon commented Jan 13, 2021

I don't know; export the columns to dictionaries and store them that way?
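A minimal sketch of what "export the columns to dictionaries" could look like, using pandas' `to_dict`. The document layout (one document per column, with `column`/`values` keys) is just an illustration, not an established schema:

```python
import pandas as pd

df = pd.DataFrame({'price': [1.0, 2.5], 'volume': [100, 200]})

# Export columns to plain dictionaries, one candidate document per column.
docs = [{'column': name, 'values': values}
        for name, values in df.to_dict('list').items()]

# docs is now a list of dicts that could be inserted into a MongoDB
# collection, e.g. collection.insert_many(docs) with pymongo.
```

Note that a single very wide or very long column can still exceed the 16 MB document limit, so column-per-document only helps up to a point.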

@echatzikyriakidis
Author

@bmoscon I can convert a dataframe to JSON documents and store them in a collection, but I don't know whether reads will be fast enough when fetching thousands of documents. I might need to test it.
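One way to keep the row-oriented approach under the 16 MB cap is to batch rows into a few larger documents instead of one document per row. A sketch, where the batch size is a made-up number (in practice you would size batches so each document stays well under the limit):

```python
import pandas as pd

df = pd.DataFrame({'a': range(10), 'b': range(10)})

records = df.to_dict('records')  # one plain dict per row

# Split the rows into fixed-size batches; 4 is arbitrary here.
BATCH = 4
batches = [records[i:i + BATCH] for i in range(0, len(records), BATCH)]

# Each batch could become one MongoDB document, e.g.
# collection.insert_one({'project': 'project-1', 'part': n, 'rows': batch})
```

Fewer, larger documents also tend to read back faster than thousands of tiny ones, since each fetch carries less per-document overhead.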
