LayoutLMv3 has to depend on Detectron2? #715

Closed
7fantasysz opened this issue May 14, 2022 · 5 comments

@7fantasysz

I am using LayoutLMv3.

If I am not interested in the layout/object detection task, but only in form recognition and document classification, can I skip the Detectron2 installation? It is hard to install on a VM without a direct public internet connection, and it would add an unnecessary burden to our pipeline, which runs every day.

@HYPJUDY
Contributor

HYPJUDY commented May 14, 2022

Hi, thanks for your question!

The current version of the unilm/layoutlmv3 implementation uses Detectron2 for two purposes:

  1. To load images in datasets (e.g., FUNSD, CORD).
    You can avoid installing Detectron2 (reference) by changing the following code:
    import torch
    from detectron2.data.detection_utils import read_image
    from detectron2.data.transforms import ResizeTransform, TransformList

    def load_image(image_path):
        image = read_image(image_path, format="BGR")
        h = image.shape[0]
        w = image.shape[1]
        img_trans = TransformList([ResizeTransform(h=h, w=w, new_h=224, new_w=224)])
        image = torch.tensor(img_trans.apply_image(image).copy()).permute(2, 0, 1)  # copy to make it writeable
        return image, (w, h)

    to:
    from PIL import Image

    def load_image(image_path):
        image = Image.open(image_path).convert("RGB")
        w, h = image.size
        return image, (w, h)
  2. To support detection tasks.
    The current version of the unilm/layoutlmv3 implementation sets detection=False, so the detection components are not used. Removing all detection-related code in modeling_layoutlmv3.py will also work. For example, @NielsRogge has removed the is_detection logic in this PR.
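
If you would rather keep the output of the original loader (a 224x224, channel-first BGR array) than return a PIL image, a rough sketch using only Pillow and NumPy might look like the following. The function name and the `size` parameter are hypothetical, and the sketch assumes that Detectron2's `read_image` applies the EXIF orientation and that `ResizeTransform` resizes with bilinear interpolation; check both against the Detectron2 version you are replacing.

```python
import numpy as np
from PIL import Image, ImageOps


def load_image_no_detectron2(image_path, size=224):
    """Hypothetical helper: returns (CHW uint8 BGR array, (w, h)) without Detectron2."""
    with Image.open(image_path) as img:
        # Assumption: mirrors the EXIF orientation handling of read_image.
        img = ImageOps.exif_transpose(img)
        img = img.convert("RGB")
        w, h = img.size  # original size, returned as in the snippets above
        # Assumption: bilinear resize, matching ResizeTransform's default.
        img = img.resize((size, size), Image.BILINEAR)
        rgb = np.asarray(img)  # HWC, uint8
    bgr = rgb[:, :, ::-1]  # RGB -> BGR channel order
    chw = np.ascontiguousarray(bgr.transpose(2, 0, 1))  # HWC -> CHW
    return chw, (w, h)
```

If a tensor is needed, as in the original snippet, `torch.from_numpy(chw)` converts the array without copying.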

@7fantasysz
Author

@HYPJUDY

Your answer is really helpful! That's what I was looking for. Two follow-up questions:

  1. After removing this dependency, would the accuracy on different tasks (form/receipt understanding, image classification, DocVQA) be affected? If so, do you have metrics on how much the accuracy differs from the numbers reported in the paper?

  2. From reading the paper, my understanding is that Detectron2 is only needed to fine-tune on the PubLayNet dataset and compare with other models there, and is not a fundamental part of LayoutLMv3 for other tasks, right?

Thanks for your help!

@HYPJUDY
Contributor

HYPJUDY commented May 16, 2022

I'm glad it helped.

  1. The two code snippets should be equivalent, so switching from one to the other should not affect accuracy. I haven't verified this experimentally myself, but @NielsRogge's experimental results (e.g., on FUNSD) support it.
  2. You are right.
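
Since only detection tasks need Detectron2, one way to keep a single code base for both kinds of environment is to check for the package only when detection is requested. This is a minimal sketch and not part of the unilm code: both function names are hypothetical, and the `detection` argument simply stands in for the detection flag mentioned above.

```python
import importlib.util


def detectron2_available() -> bool:
    """Return True when the detectron2 package can be found in this environment."""
    return importlib.util.find_spec("detectron2") is not None


def check_task_dependencies(detection: bool) -> None:
    """Hypothetical guard: raise only if a detection task is requested without Detectron2."""
    if detection and not detectron2_available():
        raise ImportError(
            "Detection fine-tuning (e.g., on PubLayNet) needs Detectron2; "
            "install it or run with detection=False."
        )
```

Calling `check_task_dependencies(detection=False)` at start-up never raises, so pipelines for form understanding or classification can run on machines without Detectron2.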

@7fantasysz
Author

That's great to know. Also appreciate your insightful research work, which is the key enabler of our project. Thank you!

@HYPJUDY
Contributor

HYPJUDY commented May 17, 2022

My pleasure : ) Good luck with your project!
