LayoutLMv3 has to depend on Detectron2? #715

7fantasysz · 2022-05-14T01:10:49Z

I am using LayoutLMv3.

If I am not interested in the layout/object detection task, but only in the form recognition and document classification tasks, can I skip the Detectron2 installation? It is hard to install on a VM without direct public internet access, and it would add an unnecessary burden to our pipeline, which runs every day.

HYPJUDY · 2022-05-14T04:03:14Z

Hi, thanks for your question!

The current version of the unilm/layoutlmv3 implementation uses Detectron2 in two places:

To load images in datasets (e.g., FUNSD, CORD).
You can avoid installing Detectron2 (reference) by changing the following code:

unilm/layoutlmv3/layoutlmft/data/image_utils.py

Lines 9 to 10 in ca82fd4

    
    from detectron2.data.detection_utils import read_image
    from detectron2.data.transforms import ResizeTransform, TransformList

unilm/layoutlmv3/layoutlmft/data/image_utils.py

Lines 21 to 27 in ca82fd4

    
    def load_image(image_path):
        image = read_image(image_path, format="BGR")
        h = image.shape[0]
        w = image.shape[1]
        img_trans = TransformList([ResizeTransform(h=h, w=w, new_h=224, new_w=224)])
        image = torch.tensor(img_trans.apply_image(image).copy()).permute(2, 0, 1)  # copy to make it writeable
        return image, (w, h)

to

from PIL import Image
def load_image(image_path):
    image = Image.open(image_path).convert("RGB")
    w, h = image.size
    return image, (w, h)

To support detection tasks.
The current version of the unilm/layoutlmv3 implementation sets detection=False, so the detection components are not used. Removing all detection-related code in modeling_layoutlmv3.py will also work. For example, @NielsRogge has removed the is_detection logic in this PR.
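
For anyone who wants to drop Detectron2 for image loading but keep the 224x224 resize and BGR channel order of the original load_image, here is a minimal sketch using Pillow and NumPy. The function name load_image_resized and its size parameter are hypothetical and not part of unilm. Bilinear resampling is an assumption about what the original transform does, and EXIF orientation handling is left out, so pixel values may not match the Detectron2 path exactly.

```python
import numpy as np
from PIL import Image


def load_image_resized(image_path, size=(224, 224)):
    """Load an image, resize it, and return a CHW uint8 array in BGR order.

    Hypothetical helper: `size` is (width, height) of the resized image.
    Also returns the original (width, height), like the original load_image.
    """
    with Image.open(image_path) as img:
        img = img.convert("RGB")
        w, h = img.size  # original size, before resizing
        # Assumption: bilinear resampling, chosen to approximate the original transform.
        resized = img.resize(size, resample=Image.BILINEAR)
    rgb = np.asarray(resized)  # HWC layout, RGB channel order
    bgr = rgb[:, :, ::-1]  # reverse the channel axis to get BGR
    # Transpose to CHW and copy so the result is contiguous and writable.
    chw = np.ascontiguousarray(bgr.transpose(2, 0, 1))
    return chw, (w, h)
```

If the rest of the pipeline expects a tensor, the returned array can be wrapped with torch.from_numpy.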

7fantasysz · 2022-05-16T01:42:06Z

@HYPJUDY

Your answer is really helpful! That's what I was looking for. Two follow-up questions:

1. After removing this dependency, would the accuracy on the different tasks (form/receipt understanding, image classification, DocVQA) be affected? If so, do you have any metrics on how much the accuracy would differ from the numbers in the paper?
2. From reading the paper, my understanding is that Detectron2 is only added to fine-tune and compare with other models on the PubLayNet dataset, and is not a fundamental part of LayoutLMv3 for other tasks, right?

Thanks for your help!

HYPJUDY · 2022-05-16T03:28:08Z

I'm glad it helped.

1. The two code snippets should be equivalent, so switching from one to the other will not affect accuracy. I haven't verified this experimentally myself, but @NielsRogge's experimental results (e.g., on FUNSD) support it.
2. You are right.
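
Since Detectron2 is only needed for the detection path, one option besides deleting that code is to treat the package as optional and fail with a clear message only when a detection task is actually requested. The sketch below uses only the standard library. The helper names package_available and require_package are hypothetical and do not exist in unilm.

```python
import importlib.util


def package_available(name):
    """Return True if a top-level package called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


def require_package(name, task):
    """Raise ImportError with a readable message if `name` is missing.

    `task` is a short description of the code path that needs the package.
    """
    if not package_available(name):
        raise ImportError(f"{task} requires the '{name}' package, which is not installed.")
```

A detection entry point could call require_package("detectron2", "layout detection") before doing any work, while the form understanding and classification paths never touch it.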

7fantasysz · 2022-05-16T23:18:29Z

That's great to know. Also appreciate your insightful research work, which is the key enabler of our project. Thank you!

HYPJUDY · 2022-05-17T06:09:25Z

My pleasure : ) Good luck with your project!

HYPJUDY closed this as completed May 17, 2022

HYPJUDY mentioned this issue Jul 31, 2022

LayoutLMv3 | Object Detection & Huggingface Transformers #800

Closed
