Equality comparison is incorrect for `CategoricalMatrix` #254

david-cortes · 2023-05-15T15:28:21Z

Comparing two distinct CategoricalMatrix objects always yields False, even if their contents are identical:

import tabmat
mat1 = tabmat.CategoricalMatrix([1,2,3])
mat2 = tabmat.CategoricalMatrix([1,2,3])
mat1 == mat2

False

MarcAntoineSchmidtQC · 2023-05-16T01:31:57Z

I would not say that it is 'incorrect'. The equality magic method (`__eq__`) has not been implemented in tabmat. Therefore, CategoricalMatrix, which does not inherit an equality method from a parent class, uses Python's default, which compares object identity. In your example above, mat1 == mat1 is True because both operands are the same object.
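To illustrate the default behavior described above, here is a minimal sketch in plain Python. `Box` is a hypothetical class invented for this example (it is not part of tabmat); it only shows what happens when a class defines no `__eq__` of its own.

```python
class Box:
    """Hypothetical container that does not define __eq__."""

    def __init__(self, values):
        self.values = list(values)


a = Box([1, 2, 3])
b = Box([1, 2, 3])

# With no __eq__ defined, Python falls back to comparing identity.
print(a == a)  # True: both operands are the same object
print(a == b)  # False: distinct objects, contents are never inspected

# Comparing the underlying data directly does inspect the contents.
print(a.values == b.values)  # True
```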

I am not sure what the use case for checking equality is. Could you please tell us why you want to check this?

If you need a fast solution, I would suggest you test for equality using the underlying attribute. In the case of CategoricalMatrix, the data is stored in the .cat attribute. Thus, this will work:

import tabmat
mat1 = tabmat.CategoricalMatrix([1,2,3])
mat2 = tabmat.CategoricalMatrix([1,2,3])
mat1.cat == mat2.cat

Similar to DenseMatrix and SparseMatrix, it will return a boolean array that has the same length as the input array of the CategoricalMatrix.

david-cortes · 2023-05-16T14:55:02Z

Thanks for the answer. I think it'd be nice to have the equality comparison return a 1-d boolean array (as opposed to a single value or to a densified 2-d array), as if one had accessed .cat.

This is useful, for example, when writing unit tests for code that uses this package, or when creating new features from these matrix objects.

MartinStancsicsQC · 2023-06-20T08:17:53Z

I'm not totally convinced that equality comparison should return a 1-d array, even if it gets implemented. For example, the shape of your example matrices is (3, 3), and the 1-d array is just an internal representation of a numeric matrix containing 0s and 1s. It would be more intuitive to me if element-wise operations returned arrays with the same shape as the inputs. Another way to look at it is that IMO mat1 == mat2 should have the same behavior as mat1.A == mat2.A.

For downstream purposes, would subclassing CategoricalMatrix and overriding its __eq__ method to return self.cat == other.cat be a solution to your use case?
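The subclassing idea could look roughly like the sketch below. To keep it runnable without tabmat installed, it uses a hypothetical stand-in class, `StandInCategoricalMatrix`, that only stores its values in a `.cat` list; the real `CategoricalMatrix` constructor and the type of its `.cat` attribute are not reproduced here, so treat the names and details as assumptions.

```python
class StandInCategoricalMatrix:
    """Hypothetical stand-in for tabmat.CategoricalMatrix (assumption:
    the category values are reachable through a `.cat` attribute)."""

    def __init__(self, values):
        self.cat = list(values)


class ComparableCategoricalMatrix(StandInCategoricalMatrix):
    """Subclass whose == compares the underlying `.cat` element-wise."""

    def __eq__(self, other):
        if not isinstance(other, StandInCategoricalMatrix):
            return NotImplemented
        if len(self.cat) != len(other.cat):
            raise ValueError("cannot compare matrices of different lengths")
        # One boolean per row, mirroring `self.cat == other.cat`.
        return [a == b for a, b in zip(self.cat, other.cat)]


m1 = ComparableCategoricalMatrix([1, 2, 3])
m2 = ComparableCategoricalMatrix([1, 2, 4])
print(m1 == m2)  # [True, True, False]
```

One side effect worth keeping in mind: a class that defines `__eq__` without also defining `__hash__` becomes unhashable, so instances of such a subclass cannot be used as dictionary keys or set members.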

david-cortes · 2023-06-20T18:55:46Z

> I'm not totally convinced that equality comparison should return a 1-d array, even if it gets implemented. For example, the shape of your example matrices is (3, 3), and the 1-d array is just an internal representation of a numeric matrix containing 0s and 1s. It would be more intuitive to me if element-wise operations returned arrays with the same shape as the inputs. Another way to look at it is that IMO mat1 == mat2 should have the same behavior as mat1.A == mat2.A.
>
> For downstream purposes, would subclassing CategoricalMatrix and overriding its __eq__ method to return self.cat == other.cat be a solution to your use case?

One could also argue that the raw shape is (3,1) whereas an OHE representation would be (3,3).

And yes, thanks - for my purposes even just calling .cat would be enough, but it would be nicer to have such functionality implemented at the class level, all the more so since other classes like DenseMatrix do have equality comparisons implemented (I guess they might be inherited from NumPy, but they work as intended).
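The two shapes debated in this thread can be made concrete with a small pure-Python sketch. It does not use tabmat; the one-hot expansion below is written by hand as an illustration, and the variable names are made up for this example.

```python
codes = [1, 2, 3]                    # raw categorical column, one value per row
categories = sorted(set(codes))      # distinct levels: [1, 2, 3]

# One-hot encoding: one column per category, 1 where the row has that level.
one_hot = [[1 if c == level else 0 for level in categories] for c in codes]

raw_shape = (len(codes), 1)
ohe_shape = (len(one_hot), len(one_hot[0]))
print(raw_shape)  # (3, 1)
print(ohe_shape)  # (3, 3)

# Element-wise equality on the raw column gives one boolean per row ...
eq_1d = [a == b for a, b in zip(codes, codes)]
# ... while equality on the one-hot form gives one boolean per cell.
eq_2d = [[x == y for x, y in zip(r1, r2)] for r1, r2 in zip(one_hot, one_hot)]
print(len(eq_1d), len(eq_2d), len(eq_2d[0]))  # 3 3 3
```

Which of the two results `==` should return is exactly the open design question above; the sketch only shows that both are well defined.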
