Add Table Transformer #18920

Closed
wants to merge 10 commits into from

NielsRogge
Contributor

@NielsRogge NielsRogge commented Sep 7, 2022

What does this PR do?

This PR adds Table Transformer by Microsoft, a set of DETR-compatible models for table detection and table structure recognition in unstructured documents.

Note: I'm making some updates to our existing DETR implementation. These are justified by the fact that the original DETR implementation by Facebook AI also includes these things, which I didn't add when first porting DETR. Hence, our DETR implementation is now more closely aligned with the original one.

To do:

  • transfer checkpoints to the Microsoft organization
  • add link to notebook

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 7, 2022

The documentation is not available anymore as the PR was closed or merged.

@sgugger
Collaborator

sgugger commented Sep 7, 2022

I'm very much not in favor of adding a new config parameter that controls where the layernorm is applied. I'm not surprised the original code has it, as Facebook AI usually codes models in a modular way, but Transformers does not. We had the same situation with BART and friends, and they are coded as distinct models in the library.
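
For readers following the thread, here is a minimal sketch of the two layernorm placements such a config parameter would switch between. It is pure Python and illustrative only: the flag name `normalize_before`, the helper names, and the toy sublayer are assumptions made for this sketch (the thread does not name the parameter), and the layer norm omits the learned scale and shift.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_block(
    x: Vector,
    sublayer: Callable[[Vector], Vector],
    normalize_before: bool,  # hypothetical flag name, assumed for this sketch
) -> Vector:
    """Apply a sublayer with a residual connection, using pre- or post-norm."""
    if normalize_before:
        # Pre-norm: normalize the input, apply the sublayer, then add the residual.
        return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]
    # Post-norm: apply the sublayer, add the residual, then normalize the sum.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def double(x: Vector) -> Vector:
    """Toy stand-in for an attention or feed-forward sublayer."""
    return [2.0 * v for v in x]


x = [1.0, 2.0, 3.0, 4.0]
post = residual_block(x, double, normalize_before=False)
pre = residual_block(x, double, normalize_before=True)
```

With post-norm the block output is itself normalized, while with pre-norm the unnormalized residual passes through unchanged. The two orderings give different outputs for the same weights, so the flag amounts to a different architecture, which is the concern raised in the comment above.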

Contributor

@alaradirik alaradirik left a comment

Looks good to me! My only concern is that this model might be difficult to publicize, as the paper's main contributions are the PubTables-1M dataset and a demonstration of how DETR can be used to solve table extraction and related tasks.

Could we add the notebook you are working on to the notebooks repo?


The Table Transformer model was proposed in [PubTables-1M: Towards comprehensive table extraction from unstructured documents](https://arxiv.org/abs/2110.00061) by
Brandon Smock, Rohith Pesala, Robin Abraham. The authors introduce a new dataset, PubTables-1M, to benchmark progress in table extraction from unstructured documents,
as well as table structure recognition and functional analysis. The authors train 2 [DETR](detr) models, one for table detection and one for table structure recognition, dubbed Table Transformers.
Contributor

Suggested change
- as well as table structure recognition and functional analysis. The authors train 2 [DETR](detr) models, one for table detection and one for table structure recognition, dubbed Table Transformers.
+ as well as table structure recognition and functional analysis tasks. The authors train two [DETR](detr) models, one for table detection and one for table structure recognition, dubbed Table Transformers.


Tips:

- The authors released 2 models, one for table detection in documents, one for table structure recognition (the task of recognizing the individual rows, columns etc. in a table).
Contributor

Suggested change
- - The authors released 2 models, one for table detection in documents, one for table structure recognition (the task of recognizing the individual rows, columns etc. in a table).
+ - The authors released two models, one for table detection in documents, one for table structure recognition (the task of recognizing the individual rows, columns etc. in a table).

"""
Copy/paste/tweak model's weights to our DETR structure.
"""

Contributor

Suggested change

@github-actions

github-actions bot commented Oct 8, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@NielsRogge
Contributor Author

Closing this PR in favor of #19614

@NielsRogge NielsRogge closed this Oct 14, 2022
4 participants