pandas.read_hdf issue with latest anaconda version #10

Open
fcfcfcfcf opened this issue Sep 11, 2020 · 2 comments

Comments

@fcfcfcfcf

Hello!

In module 1.1.3, the pandas read_hdf function is used to read AAPL stock price data from an h5 file. Unfortunately, pandas v1.0.0 or higher does not support reading older h5 files such as the one included (according to https://github.com/pandas-dev/pandas/issues/33186). Anaconda seems to be the Python distribution recommended in module 1.1.1, but the latest version of Anaconda ships a version of pandas that is incompatible with the given h5 file. The easiest workaround for me was to install an older version of Anaconda (2019.3), which comes with pandas 0.24.2. It might be worth mentioning, as this could be tricky for users to figure out.
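For anyone who wants to check whether their install is likely affected before running the module, here is a minimal sketch. It assumes the cutoff is the 1.0.0 release mentioned above; the helper name `is_pandas_1_or_newer` is made up for illustration and is not part of pandas.

```python
def is_pandas_1_or_newer(version):
    """Return True if a pandas version string has major version 1 or higher.

    Hypothetical helper: it only inspects the leading major number,
    on the assumption that the 1.0.0 release is the relevant cutoff.
    """
    major = int(version.split(".")[0])
    return major >= 1


# Version strings taken from this thread and from a newer release line
print(is_pandas_1_or_newer("0.24.2"))
print(is_pandas_1_or_newer("1.1.1"))
```

In a notebook, passing `pd.__version__` (after `import pandas as pd`) to this function tells you which side of the cutoff your environment is on.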

@c-cunningham
Contributor

I had this issue too, as well as other compatibility issues with different versions of Python. For instance, the "alt.renderers.enable('notebook')" line worked in Python 2 but caused the graphs to fail in Python 3.

@JoBe10

JoBe10 commented Oct 2, 2023

You can solve the read_hdf issue by using the h5py Python library (install that library using pip or conda as usual). For the Apple data the code is the following:

import h5py
import pandas as pd

with h5py.File("data/AAPL.h5", 'r') as f:
    aapl_group = f['AAPL']

    # Initialize an empty dictionary to collect data
    data_dict = {}

    # Iterate over all items in the group
    for name, item in aapl_group.items():
        if isinstance(item, h5py.Dataset):
            # For each dataset, add its data to the dictionary with the dataset's name as the key
            data_dict[name] = item[:]

# Extract the columns and their data
cols_data = {
    **dict(zip(data_dict['block0_items'].astype(str), data_dict['block0_values'].T)),
    **dict(zip(data_dict['block1_items'].astype(str), data_dict['block1_values'].T))
}

# Create the DataFrame
aapl = pd.DataFrame(cols_data, index=pd.to_datetime(data_dict['axis1']))
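If a file has more (or fewer) than two blocks, the column-merging step can be generalized. The following is only a sketch: it assumes the same `blockN_items` / `blockN_values` naming seen in the AAPL file, with one row of values per index entry, and `columns_from_blocks` is a hypothetical helper name, not an h5py or pandas function.

```python
def columns_from_blocks(data_dict):
    """Merge every blockN_items / blockN_values pair into one {column: values} dict.

    Assumes blocks are numbered consecutively from 0 and that each
    values array holds one row per index entry.
    """
    cols = {}
    n = 0
    while f"block{n}_items" in data_dict:
        items = data_dict[f"block{n}_items"]
        values = data_dict[f"block{n}_values"]
        # zip(*values) walks the row-major values column by column
        for name, column in zip(items, zip(*values)):
            if isinstance(name, bytes):
                # Column names are stored as bytes in the file
                name = name.decode("utf-8")
            cols[str(name)] = list(column)
        n += 1
    return cols


# Small stand-in for the dictionary read from the h5 file (made-up values)
sample = {
    "block0_items": [b"open", b"close"],
    "block0_values": [[1.0, 2.0], [3.0, 4.0]],
    "block1_items": [b"volume"],
    "block1_values": [[10], [20]],
}
print(columns_from_blocks(sample))
```

The result can be passed to `pd.DataFrame` in place of `cols_data`, with the index built from `data_dict['axis1']` as before.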
