Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Equality comparison is incorrect for CategoricalMatrix #254

Open
david-cortes opened this issue May 15, 2023 · 4 comments
Open

Equality comparison is incorrect for CategoricalMatrix #254

david-cortes opened this issue May 15, 2023 · 4 comments

Comments

@david-cortes
Copy link

Comparing one CategoricalMatrix against another always yields False even if their contents are identical:

import tabmat
mat1 = tabmat.CategoricalMatrix([1,2,3])
mat2 = tabmat.CategoricalMatrix([1,2,3])
mat1 == mat2
False
@MarcAntoineSchmidtQC
Copy link
Member

I would not say that it is 'incorrect'. The equality magic method has not been implemented in tabmat. Therefore, CategoricalMatrix, which does not inherit from a class, uses the default, which verifies the identity of the object. In your above example, mat1 == mat1 is true because it has the same identity.

I am not sure what is the use case for checking equality. Could you please tell us why you want to check this?

If you need a fast solution, I would suggest you test for equality using the underlying attribute. In the case of CategoricalMatrix, the data is stored in the .cat attribute. Thus, this will work:

import tabmat
mat1 = tabmat.CategoricalMatrix([1,2,3])
mat2 = tabmat.CategoricalMatrix([1,2,3])
mat1.cat == mat2.cat

Similar to DenseMatrix and SparseMatrix, it will return a boolean array that has the same length as the input array of the CategoricalMatrix.

@david-cortes
Copy link
Author

Thanks for the answer. I think it'd be nice to have the equality comparison return a 1-d boolean array (as opposed to a single value or to a densified 2-d array), as if one had accessed .cat.

This is useful for example when writing unit tests where this package is used, or when creating new features from these matrix objects.

@MartinStancsicsQC
Copy link
Contributor

MartinStancsicsQC commented Jun 20, 2023

I'm not totally convinced that equality comparison should return a 1-d array, even if it gets implemented. For example, the shape of your example matrices is (3, 3), and the 1-d array is just an internal representation of a numeric matrix containing 0s and 1s. It would be more intuitive to me if element-wise operations returned arrays with the same shape as the inputs. Another way to look at it is that IMO mat1 == mat2 should have the same behavior as mat_1.A == mat_2.A.

For downstream purposes, would subclassing CategoricalMatrix and overriding its __eq__ method to return self.cat == other.cat be a solution to your use case?

@david-cortes
Copy link
Author

I'm not totally convinced that equality comparison should return a 1-d array, even if it gets implemented. For example, the shape of your example matrices is (3, 3), and the 1-d array is just an internal representation of a numeric matrix containing 0s and 1s. It would be more intuitive to me if element-wise operations returned arrays with the same shape as the inputs. Another way to look at it is that IMO mat1 == mat2 should have the same behavior as mat_1.A == mat_2.A.

For downstream purposes, would subclassing CategoricalMatrix and overriding its __eq__ method to return self.cat == other.cat be a solution to your use case?

One could also argue that the raw shape is (3,1) whereas an OHE representation would be (3,3).

And yes, thanks - for my purposes even just calling .cat would be enough, but would be nicer to have such functionality implemented at the class level, the moreso since other classes like DenseMatrix do have equality comparisons implemented (guess they might be inherited from NumPy but they work as intended).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants