Comparing two data frame #658

Open
brindasanth opened this issue Apr 15, 2024 · 2 comments
Labels: enhancement, research
Milestone: Backlog

Comments

@brindasanth

I have two data frames with the same schema. Is there a way to compare them so that the result lists the added, deleted, and modified rows? The comparison could take a single key column or a group of key columns, plus a set of columns to ignore.

@Jolanrensen
Collaborator

Hi!
We don't have such functionality at the moment, but it might be a handy addition.

Tracking additions, deletions, and modifications, similar to how git would do it, requires a special algorithm. I suppose Myers' diff algorithm could help.

I just tried this algorithm via https://github.com/andrewbailey/Difference on two DataFrames (as List<DataRow<*>>), and it correctly reports the remove/move/add operations that likely occurred between them.

We could wrap a library like that in the future to introduce this behavior to DataFrame natively, but in the meantime, you could try that library as well :)
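
For the key-column and ignore-column variant described in the question, a simpler key-based comparison may be enough. The sketch below is not Myers' algorithm and not part of the DataFrame API: it uses only the Kotlin standard library and assumes each row has already been turned into a Map from column name to value. The names Row, RowDiff, and diffByKey are hypothetical, and keys are assumed to be unique within each input.

```kotlin
// Hypothetical helper types for illustration; not part of Kotlin DataFrame.
typealias Row = Map<String, Any?>

data class RowDiff(
    val added: List<Row>,               // rows whose key exists only in `new`
    val deleted: List<Row>,             // rows whose key exists only in `old`
    val modified: List<Pair<Row, Row>>, // (old row, new row) with the same key but different content
)

// Key-based comparison. Assumes keys are unique within each list;
// if a key repeats, associateBy keeps only the last row for that key.
fun diffByKey(
    old: List<Row>,
    new: List<Row>,
    keyColumns: List<String>,
    ignoreColumns: Set<String> = emptySet(),
): RowDiff {
    // The key of a row is the list of its key-column values, in order.
    fun keyOf(row: Row): List<Any?> = keyColumns.map { row[it] }

    // Only columns outside ignoreColumns take part in the "modified" check.
    fun contentOf(row: Row): Row = row.filterKeys { it !in ignoreColumns }

    val oldByKey = old.associateBy { keyOf(it) }
    val newByKey = new.associateBy { keyOf(it) }

    val added = new.filter { keyOf(it) !in oldByKey }
    val deleted = old.filter { keyOf(it) !in newByKey }
    val modified = old.mapNotNull { oldRow ->
        val newRow = newByKey[keyOf(oldRow)]
        if (newRow != null && contentOf(oldRow) != contentOf(newRow)) oldRow to newRow else null
    }
    return RowDiff(added, deleted, modified)
}
```

A call such as diffByKey(oldRows, newRows, keyColumns = listOf("id"), ignoreColumns = setOf("updatedAt")) would then treat rows with the same id as the same record and ignore changes that only touch updatedAt. Unlike a sequence diff, this approach does not detect moved rows, since row order is not considered.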

@Jolanrensen added this to the Backlog milestone Apr 15, 2024
@Jolanrensen added the enhancement and research labels Apr 15, 2024
@brindasanth
Author

Thanks for your comments and for adding this to the Backlog.
