Add TableSchema.diff method to understand the difference between two TableSchema objects #1670

Open
tamargrey opened this issue Apr 10, 2023 · 1 comment
Labels
new feature suggestions for new functionality

Comments

@tamargrey
Contributor

  • As a user, I wish I had an easy way to tell the difference between two Woodwork TableSchemas.

When passing around Woodwork dataframes, it is easy to lose track of some of the Woodwork typing information, such as feature origins or metadata. Because the table schema repr only shows column names, logical types, and semantic tags, it is hard to tell whether any other typing information has changed without going through every relevant field and comparing the values directly. It would be great if there were a Woodwork method to make this easier.

Code Example

schema_1.diff(schema_2)

We would need to come up with a design for the output. The simplest option would be to display every field that is not equal and output its entire value, leaving it up to the user to determine what exactly is different. A more involved option would be to isolate the difference and display only that.
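To make the two output options concrete, here is a minimal sketch of both. It does not use Woodwork: `SketchSchema`, `simple_diff`, `isolated_diff`, and the `MISSING` marker are hypothetical names invented for illustration, and the three fields are a stand-in for whatever a real TableSchema would compare.

```python
from dataclasses import dataclass, field, fields

# Hypothetical marker for a key that exists on only one side.
MISSING = "<missing>"


@dataclass
class SketchSchema:
    """Hypothetical stand-in for a table schema; NOT Woodwork's TableSchema."""
    name: str = ""
    columns: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


def simple_diff(left, right):
    """Simple option: report every unequal field with both full values."""
    return {
        f.name: (getattr(left, f.name), getattr(right, f.name))
        for f in fields(left)
        if getattr(left, f.name) != getattr(right, f.name)
    }


def isolated_diff(left, right):
    """More involved option: for dict fields, report only the keys that changed."""
    result = {}
    for name, (lval, rval) in simple_diff(left, right).items():
        if isinstance(lval, dict) and isinstance(rval, dict):
            result[name] = {
                key: (lval.get(key, MISSING), rval.get(key, MISSING))
                for key in lval.keys() | rval.keys()
                if lval.get(key, MISSING) != rval.get(key, MISSING)
            }
        else:
            # Non-dict fields cannot be narrowed further; keep both values.
            result[name] = (lval, rval)
    return result


schema_1 = SketchSchema(
    name="t",
    columns={"id": "Integer", "age": "Double"},
    metadata={"source": "a.csv"},
)
schema_2 = SketchSchema(
    name="t",
    columns={"id": "Integer", "age": "Integer"},
    metadata={},
)

print(simple_diff(schema_1, schema_2))
print(isolated_diff(schema_1, schema_2))
```

The simple version hands back whole `columns` and `metadata` values and leaves the user to spot the change, while the isolated version narrows the report to the `age` column and the `source` metadata key.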

For consistency's sake, we should use this function to implement the TableSchema.__eq__ method, which will make sure that these two always stay in sync.
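One way the equality check could be layered on top of the diff is sketched below. This is an assumption about the shape of the design, not Woodwork's implementation: `DiffableSchema` and its `_compared_fields` tuple are hypothetical, and the field list is illustrative only.

```python
class DiffableSchema:
    """Hypothetical stand-in; NOT Woodwork's TableSchema."""

    # Single list of fields shared by diff and __eq__ (illustrative only).
    _compared_fields = ("name", "columns", "metadata")

    def __init__(self, name, columns=None, metadata=None):
        self.name = name
        self.columns = dict(columns or {})
        self.metadata = dict(metadata or {})

    def diff(self, other):
        """Return {field: (self_value, other_value)} for every unequal field."""
        return {
            f: (getattr(self, f), getattr(other, f))
            for f in self._compared_fields
            if getattr(self, f) != getattr(other, f)
        }

    def __eq__(self, other):
        if not isinstance(other, DiffableSchema):
            return NotImplemented
        # Two schemas are equal exactly when diff finds nothing.
        return not self.diff(other)


a = DiffableSchema("t", columns={"id": "Integer"}, metadata={"source": "a.csv"})
b = DiffableSchema("t", columns={"id": "Integer"}, metadata={"source": "a.csv"})
c = DiffableSchema("t", columns={"id": "Integer"}, metadata={})

print(a == b)
print(a == c)
print(a.diff(c))
```

Because `__eq__` only asks whether `diff` returned anything, a field added to `_compared_fields` is picked up by both methods at once, which is the "stay in sync" property described above.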

@gsheni
Contributor

gsheni commented Apr 10, 2023

@tamargrey What is the urgency of this issue? What is the benefit for EvalML?
