Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arraycore #634

Open
wants to merge 76 commits into
base: pt4
Choose a base branch
from
Open

Arraycore #634

wants to merge 76 commits into from

Conversation

albertosm27
Copy link
Contributor

First implementation of the new array using h5py backend. The objective of this branch is to pass the basic tests in tables/tests/test_array.py.

@FrancescAlted
Copy link
Member

Looks good to me so far. I specially like the first attempt at getting rid of the node manager as h5py should be the place to handle this.

@tomkooij
Copy link
Contributor

Nice work!

if atom is None or shape is None:
raise TypeError('if the obj parameter is not specified '
'(or None) then both the atom and shape '
'parametes should be provided.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameters

if not out.flags['C_CONTIGUOUS']:
raise ValueError('output array not C contiguous')

np.copyto(out, arr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid the arr temporary by using read_direct(array, source_sel=None, dest_sel=None).

else:
flavor = flavor_of(obj)
# use a temporary object because converting obj at this stage
# breaks some test. This is soultion performs a double,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This is soultion performs " -> "This fix performs".

strides=(0,) * len(shape))
else:
flavor = flavor_of(obj)
# use a temporary object because converting obj at this stage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[STYLE] Start comments in its own line with an uppercase letter.

if self.shape == ():
return 1
else:
return len(self)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For compactness you can make use of the ternary operator there:

return 1 if self.shape == () else len(self)

@tacaswell
Copy link

Where did the work from the Perth workshop end up?

@tacaswell
Copy link

🐑 Never mind, I sorted it out (that code is on the pt4 branch which this is a PR into).

np_byteorders = {
'big': '>',
'little': '<',
'irrelevant': '|'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'irrelevant' should be 'native'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'irrelevant' is meant for ASCII strings and the like, where the order is, well, irrelevant.


# Finally, check whether the desired node is an instance
# of the expected class.
if classname:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth checking classname is not None so other 'falsy' values (which should probably raise) do not safely pass.

return where.create_table(name, desc, *args, **kwargs)

def get_node(self, where):
return self.root[where]
def get_node(self, where, name=None, classname=None):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to match the API on tables.file.File.get_node ?

 - this will conflict with the develop branch
 - this will become irrelevant on this branch eventually (as
   all of the cython should be dropped)
 - createparents is positional-only now, adjust test
 - match positional order of remaining args
@tacaswell
Copy link

albertosm27#1 <- getting copy to work.

@@ -2209,7 +2209,7 @@ def test00b_zeros(self):
print("First row-->", ca[0])
print("Defaults-->", ca.atom.dflt)
self.assertTrue(allequal(ca[0], numpy.zeros(N, 'S3')))
self.assertTrue(allequal(ca.atom.dflt, numpy.zeros(N, 'S3')))
self.assertTrue(allequal(ca.atom.dflt, b""))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the tests like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multidimensional dtypes are giving me a hard time.
I thought h5py as numpy was putting the dtype.shape inside the array.shape so I did that for the Atom.dflt (equivalent to fillvalue in h5py) and thats why I needed to change the test like that.

I still do not know if multidimensional dtypes are fully supported in h5py, for example in tables/tests/array_mdatom.h5 you can get '/arr' and it will be a multidimensional array but trying it to copy (with create_dataset) will raise an error like 'Can't broadcast shape (5, 5, 5) to (5, 5, 5, 3)' the 3 comes from the dtype shape.

Using the h5py copy method from Group will copy the array correctly with its multidensional dtype (still reading the array returns it as numpy with the dtype shape inside the array shape)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by 'multidimensional dtypes'?

h5py should support all dtypes that hdf5 supports (and if any are missing it should be fixed in h5py).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean dtypes with shape like this numpy.dtype('(3,)i4').

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think this is a numpy bug

In [58]: np.zeros(5, dtype=np.dtype('(3,)i4')).shape
Out[58]: (5, 3)

In [59]: np.zeros(5, dtype=np.dtype('(3,)i4,f4')).shape
Out[59]: (5,)

In [60]: np.zeros(5, dtype=np.dtype('(3,)i4')).dtype
Out[60]: dtype('int32')

In [61]: np.zeros(5, dtype=np.dtype('(3,)i4,f4')).dtype
Out[61]: dtype([('f0', '<i4', (3,)), ('f1', '<f4')])

Copy link

@tacaswell tacaswell Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

In [63]: np.zeros(5, dtype=np.dtype('i4,i4,i4')).shape
Out[63]: (5,)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so '(3,)i4' and 3*'i4' are definitely different datatypes, the question is if we expect the promotion (demotion?) to a int32 array with shape + (3,) or not.

https://docs.scipy.org/doc/numpy/user/basics.rec.html#structured-arrays

Sorry for the stream of conscious comments as I sort this out 🐑

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tacaswell I remember discussing with Travis the 'demotion' of multidimensional dtypes in NumPy for atomic dtypes (i.e. not compound or structured) quite long time ago and IIRC while he agreed that the demotion was not consistent, he argued easiness of implementation to keep the current behaviour. Perhaps the NumPy guys can change that, but IMO, this ship sailed long time ago so it is not a good idea to ask for that change now.

@@ -136,8 +140,10 @@ def copy(self, newparent=None, newname=None,
newparent = self.root._get_or_create_path(newparent, createparents)
if self.__class__.__name__ == 'Array':
create_function = newparent.create_array
else:
elif self.__class__.__name__ == 'CArray':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it would be saner to use isinstance to test the condition, but you need to reverse the order of tests because CArraiy is an Array and EArray is a CArray.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried, but with isinstance I need to import CArray in array.py and CArray also imports Array which makes it crash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albertosm27 yes, you are right. Now I remember the issue :/

@FrancescAlted FrancescAlted mentioned this pull request Aug 27, 2017
self.assertRaises(ValueError, vlarray.__setitem__, 1, "shrt")
self.assertRaises(ValueError, vlarray.__setitem__, 1, "toolong")
#self.assertRaises(ValueError, vlarray.__setitem__, 1, "shrt")
#self.assertRaises(ValueError, vlarray.__setitem__, 1, "toolong")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure that you add new tests for the assignation of values that are of different length than the old values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do not forget to add an entry abut this new feature in the CHANGELOG for PyTables 4.0.

avalentino pushed a commit to avalentino/PyTables that referenced this pull request Oct 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants