Add examples of different ways of creating string datasets #2424

Open: takluyver wants to merge 2 commits into master

takluyver (Member):

Building on #2423, trying to illustrate the different possible ways to create string datasets.

string_data = ["varying", "sizes", "of", "strings"]

# Variable length strings (implicit)
f['vlen_strings1'] = string_data

Is this f supposed to be the database? And what is supposed to happen when the dataset is created implicitly?

Member Author

f for file - this is a convention across the h5py docs, e.g. look further down this page, or at the dataset page. We don't really use the word database at all; HDF5 also talks about files rather than databases.

'Implicit' here just means you're not telling h5py a dtype, so it's guessing based on the object you give it.
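To make the guessing concrete, here is a minimal sketch of the implicit case, assuming h5py 3.x. The file name is hypothetical, and the in-memory `core` driver is only used so that nothing is written to disk; neither is part of the PR.

```python
import h5py

string_data = ["varying", "sizes", "of", "strings"]

# Hypothetical file name; driver="core" with backing_store=False keeps the
# file in memory, which is a choice made for this sketch only.
with h5py.File("implicit_demo.h5", "w", driver="core", backing_store=False) as f:
    # No dtype is given, so h5py picks one based on the Python objects.
    f["vlen_strings1"] = string_data

    ds = f["vlen_strings1"]
    shape = ds.shape
    # check_string_dtype() returns None for non-string dtypes, otherwise
    # an object with .encoding and .length (length is None for variable length).
    info = h5py.check_string_dtype(ds.dtype)

print(shape, info)
```

With a list of Python `str` objects, the guessed dtype should be a variable-length UTF-8 string type, which is the same type that `h5py.string_dtype()` describes when called with its defaults.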

# Variable length strings (explicit)
ds = f.create_dataset('vlen_strings2', shape=4, dtype=h5py.string_dtype())

It'd be nice if there were some examples with data= keyword argument - those are the ones that got me confused.

Member Author

Passing data= should be equivalent to the implicit case (f[name] =) if you don't pass dtype=, or to the explicit case if you do. I'm showing the explicit dtype case as two lines to illustrate that it lets you create the dataset before you have all the data.

I'd rather not make this example longer by showing more possible ways to do the same thing. The explanation at creating datasets could probably be improved as well.
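For readers who arrived here with the same `data=` question, the equivalence described above can be sketched as follows. This is an illustration only, not proposed text for the docs; it assumes h5py 3.x, and the dataset and file names are made up for the example.

```python
import h5py

string_data = ["varying", "sizes", "of", "strings"]

# Hypothetical in-memory file, used only for this sketch.
with h5py.File("data_kwarg_demo.h5", "w", driver="core", backing_store=False) as f:
    # data= without dtype=: h5py guesses the dtype, as with f[name] = ...
    f.create_dataset("from_data", data=string_data)

    # data= with dtype=: the dtype is stated up front (the explicit case).
    f.create_dataset("from_data_explicit", data=string_data,
                     dtype=h5py.string_dtype())

    # Two-step explicit form: create the dataset first, fill it in later.
    later = f.create_dataset("filled_later", shape=(4,),
                             dtype=h5py.string_dtype())
    later[:] = string_data

    # asstr() wraps the dataset so that reads return str rather than bytes.
    results = {name: list(f[name].asstr()[:]) for name in f}

print(results)
```

All three datasets should read back as the same four strings; the only difference is when the dtype is decided and whether the data has to exist at creation time.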

Comment on lines 29 to +31
You can use :func:`.string_dtype` to explicitly specify any HDF5 string datatype.

::

Suggested change
- You can use :func:`.string_dtype` to explicitly specify any HDF5 string datatype.
- ::
+ You can use :func:`.string_dtype` to explicitly specify any HDF5 string datatype,
+ as shown in the examples below::
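As a side note on what `string_dtype` can express, the sketch below builds a few string dtypes and inspects them with `h5py.check_string_dtype`. It assumes h5py 3.x; the labels in the dictionary are illustrative names, not anything from the PR.

```python
import h5py

# string_dtype(encoding='utf-8', length=None): length=None means variable length.
candidates = {
    "vlen_utf8": h5py.string_dtype(),
    "vlen_ascii": h5py.string_dtype(encoding="ascii"),
    # For fixed-length strings, length is counted in bytes.
    "fixed_utf8": h5py.string_dtype(encoding="utf-8", length=10),
}

described = {}
for label, dt in candidates.items():
    info = h5py.check_string_dtype(dt)
    # dt.kind is 'O' for variable-length strings (stored as Python objects)
    # and 'S' for fixed-length byte strings.
    described[label] = (dt.kind, info.encoding, info.length)

print(described)
```

No file is needed here, because `string_dtype` only returns a NumPy dtype carrying extra metadata; that dtype can then be passed as `dtype=` when creating a dataset.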


codecov bot commented May 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.58%. Comparing base (25c55e5) to head (bce8793).

❗ Current head bce8793 differs from pull request most recent head 7081e4f. Consider uploading reports for the commit 7081e4f to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2424   +/-   ##
=======================================
  Coverage   89.58%   89.58%           
=======================================
  Files          17       17           
  Lines        2391     2391           
=======================================
  Hits         2142     2142           
  Misses        249      249           
