
Add Deformable DETR #17281

Merged · 82 commits merged into huggingface:main on Sep 14, 2022

Conversation

@NielsRogge (Contributor) commented May 16, 2022

What does this PR do?

This PR implements Deformable DETR, which improves the original DETR using a new "deformable attention" module.

This model requires a custom CUDA kernel (hence it can only be run on GPU). Other than that, the API is entirely the same as DETR.

Models are on the hub.

@sgugger (Collaborator) left a comment

It looks like the PR wasn't necessarily in a state ready for review. Please make sure all docstrings are finished and code is generally cleaned up before asking reviewers to look.

Review comments (outdated, resolved):
  • docs/source/en/index.mdx
  • setup.py
  • src/transformers/__init__.py
  • src/transformers/models/auto/configuration_auto.py
  • src/transformers/models/deformable_detr/test.py
@NielsRogge (Contributor, Author) commented May 18, 2022

Addressed most comments. I would like to have:

  • @Narsil review the initialization of the model using the custom CUDA kernel
  • @LysandreJik (and possibly @Narsil) help me out with making the CI green for a model that only runs on GPU. Should we define a custom CI job for this particular model?
  • @NouamaneTazi take care of the remaining comments regarding clearer variable names/docstrings, as he has a detailed understanding of this model.

@LysandreJik (Member)

> @LysandreJik (and possibly @Narsil) help me out with making the CI green for a model that only runs on GPU. Should we define a custom CI job for this particular model?

We have a require_torch_gpu decorator. Would it help in that case? We could add it to the model tester as a whole, if the model needs GPU to run.
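For illustration, a minimal sketch of decorating a whole test class (the class name and test body are hypothetical; require_torch_gpu is the existing decorator from transformers.testing_utils):

import unittest

from transformers.testing_utils import require_torch_gpu

@require_torch_gpu  # skips every test in this class when no CUDA device is available
class DeformableDetrModelTest(unittest.TestCase):
    def test_forward_pass(self):
        ...  # hypothetical test body; only executed on machines with a GPU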

@NielsRogge (Contributor, Author) commented Jun 2, 2022

@Narsil there's an issue with the pipeline tests: I added DeformableDetrForObjectDetection to the object detection mapping, but this model requires the custom CUDA kernel to run.

Also, CircleCI reports the following:

Traceback (most recent call last):
  File "utils/check_repo.py", line 764, in <module>
    check_repo_quality()
  File "utils/check_repo.py", line 753, in check_repo_quality
    check_models_are_in_init()
  File "utils/check_repo.py", line 305, in check_models_are_in_init
    for module in get_model_modules():
  File "utils/check_repo.py", line 267, in get_model_modules
    modeling_module = getattr(model_module, submodule)
  File "/home/circleci/.local/lib/python3.7/site-packages/transformers/utils/import_utils.py", line 866, in __getattr__
    value = self._get_module(name)
  File "/home/circleci/.local/lib/python3.7/site-packages/transformers/utils/import_utils.py", line 883, in _get_module
    ) from e
RuntimeError: Failed to import transformers.models.deformable_detr.modeling_deformable_detr because of the following error (look up to see its traceback):
[Errno 2] No such file or directory: '/home/circleci/.local/lib/python3.7/site-packages/transformers/models/deformable_detr/custom_kernel/vision.cpp'

I might need some help with this.

@Narsil (Contributor) commented Jun 3, 2022

> @Narsil there's an issue with the pipeline tests: I added DeformableDetrForObjectDetection to the object detection mapping, but this model requires the custom CUDA kernel to run.

The generic tests will always run the model on CPU, so the best way is to skip this model in the tests.

Doing if isinstance(pipeline.model, Deformable...): self.skipTest("This model requires a custom CUDA kernel and is NOT implemented for CPU") should be enough IMO (we know how to update later when needed).

I would also add a slow GPU test that tries to use the pipeline directly if that's OK for the CI.

@require_torch_gpu
@slow
def test_slow(self):
    pipe = pipeline(model="hf-internal-testing/....", device=0)
    out = pipe(....)
    self.assertEqual(out, {....})

Does that make sense? If it's hard to have a GPU test (not sure we ever call those for pipelines anyway, @LysandreJik?), then we can do without. But even if it's not auto-tested, there's value in creating the test IMO (it will run on local machines that try to run the tests).

@Narsil (Contributor) commented Jun 3, 2022

As for the missing file, it's probably because setup.py doesn't properly include the file when installing transformers.

I don't really have good pointers for that, since you seem to have added the correct line. The main advice would be to run python -m build and look at the output to check that the proper .cpp, .h, and .cuh files are included in the build folder. (Installing from source with pip install -e . won't work, as I think it always copies all the files, so you won't see how the built version fails; maybe it does, I'm unsure.)
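For reference, a minimal sketch of one way to ship non-Python sources with a package via setuptools package_data (the paths and patterns here are illustrative, not the actual setup.py of this PR):

from setuptools import find_packages, setup

setup(
    name="transformers",
    packages=find_packages("src"),
    package_dir={"": "src"},
    # ship the custom-kernel sources inside built wheels/sdists;
    # the patterns must match the real on-disk layout
    package_data={
        "transformers.models.deformable_detr": [
            "custom_kernel/*.cpp",
            "custom_kernel/*.cu",
            "custom_kernel/*.cuh",
            "custom_kernel/*.h",
        ],
    },
)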

@stas00 (Contributor) commented Jun 16, 2022

OK, so looking at why the custom kernel fails to build:

_ ERROR collecting tests/models/deformable_detr/test_modeling_deformable_detr.py _
src/transformers/utils/import_utils.py:893: in _get_module
    return importlib.import_module("." + module_name, self.__name__)
/usr/local/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1006: in _gcd_import
    ???
<frozen importlib._bootstrap>:983: in _find_and_load
    ???
<frozen importlib._bootstrap>:967: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:677: in _load_unlocked
    ???
<frozen importlib._bootstrap_external>:728: in exec_module
    ???
<frozen importlib._bootstrap>:219: in _call_with_frames_removed
    ???
src/transformers/models/deformable_detr/modeling_deformable_detr.py:49: in <module>
    MSDA = load_cuda_kernels()
src/transformers/models/deformable_detr/load_custom.py:45: in load_cuda_kernels
    "-D__CUDA_NO_HALF2_OPERATORS__",
../.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py:1156: in load
    keep_intermediates=keep_intermediates)
../.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py:1367: in _jit_compile
    is_standalone=is_standalone)
../.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py:1438: in _write_ninja_file_and_build_library
    verify_ninja_availability()
../.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py:1494: in verify_ninja_availability
    raise RuntimeError("Ninja is required to load C++ extensions")
E   RuntimeError: Ninja is required to load C++ extensions

This occurs quite often. The build is missing ninja.

Try adding pip install ninja to the CircleCI job workflow and see if it solves the problem. Please ping me if it doesn't.
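As a quick local check, this sketch calls the same torch helper that raised in the traceback above:

from torch.utils.cpp_extension import verify_ninja_availability

# raises RuntimeError("Ninja is required to load C++ extensions") if ninja is missing
verify_ninja_availability()
print("ninja is available")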

@stas00 (Contributor) commented Jun 16, 2022

Additionally, if we start having custom CUDA kernels that are enabled by default, we must include ninja in our main Python dependencies in setup.py.

@stas00 (Contributor) commented Jun 18, 2022

So installing ninja did the trick of overcoming the initial hurdle. As commented above, if we make this work, ninja should go into setup.py's dependencies and not the job file, but for now this is good enough while we figure out how to make it work.

Now it's failing:

E   OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

because CircleCI is CPU-only and doesn't have CUDA installed by default.

Basically, your custom CUDA kernel requires CUDA to be installed in order to build. You don't need a GPU to build it, but the CUDA toolkit has to be installed.

@ydshieh, do you by chance know if we are planning to get CUDA installed on CircleCI? It's easy to do via apt directly from NVIDIA with .deb packages, except it's not fast if it's reinstalled on every job run.

@NielsRogge, does this model work on CPU at all? I.e., is there a fallback to a non-custom kernel in the absence of GPUs? If there is, the code should be modified to check whether a CUDA environment is available and, if it isn't, skip loading the custom kernel; then everything will just work.

@NielsRogge (Contributor, Author)

The model only runs on GPU and requires the custom kernel. The authors do provide a CPU version here, but it's for "debugging and testing purposes only".

@ydshieh (Collaborator) commented Jun 18, 2022

The current CircleCI jobs use the Docker image circleci/python:3.7. If we decide to install CUDA, I think we can build a custom Docker image based on it.

@ydshieh (Collaborator) commented Jun 18, 2022

If it's not too much work to make the model run on both CPU and GPU (considering the authors provide some implementation), I would advocate doing it, also mainly for "debugging and testing purposes only".

@NielsRogge (Contributor, Author)

> If it's not too much work to make the model run on both CPU and GPU (considering the authors provide some implementation), I would advocate doing it, also mainly for "debugging and testing purposes only".

Hmm, I looked into the code; the problem is that their CPU version doesn't accept two arguments (level_start_index and im2col_step) that the CUDA version has and that are required for correct computation. Hence, I don't think it's possible to have a CPU version of it in the library. The authors also explicitly indicate that the layer isn't implemented on CPU.
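For illustration, the mismatch looks roughly like this (signatures paraphrased and simplified from the original Deformable-DETR repo; treat the exact names as an assumption):

# Hypothetical sketches of the two entry points, for comparison only.

def ms_deform_attn_cuda(value, spatial_shapes, level_start_index,
                        sampling_locations, attention_weights, im2col_step):
    """CUDA kernel path: takes level_start_index and im2col_step."""
    ...

def ms_deform_attn_cpu(value, spatial_shapes,
                       sampling_locations, attention_weights):
    """CPU reference path: the two extra arguments are missing, so it is
    not a drop-in replacement for the CUDA path above."""
    ...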

@stas00 (Contributor) commented Jun 18, 2022

  1. OK, so if the CPU version is not the same, then we won't be testing the actual modeling code, which is not a good idea. Let's stick to testing the actual GPU modeling code.

  2. You're setting a new precedent with this model, @NielsRogge, so we need to decide how to deal with such models. Let's bring @LysandreJik and @sgugger into this discussion; I wonder if we should perhaps discuss this in a separate RFC issue, since it will probably impact other similar models in the future.

But we need:

a. the modeling files must not fail on import in an environment that lacks a CUDA install, so probably either use the earlier suggestion of moving the kernel loading into __init__ (less ideal) or use try/except and recover gracefully if the CUDA environment is not available.

b. the tests for such a model should all be decorated with @require_torch_gpu. It might be tricky with the common tests; I wonder if perhaps decorating the test class with @require_torch_gpu would do the trick.

c. the testing will have to happen on our CI that has GPUs, which means no "real-time" testing.

@NielsRogge (Contributor, Author)

> b. the tests for such a model should all be decorated with @require_torch_gpu. It might be tricky with the common tests; I wonder if perhaps decorating the test class with @require_torch_gpu would do the trick.

I've done this as seen here: NielsRogge@ec61d72.

@sgugger (Collaborator) left a comment

I'm also not 100% sure investing time in a model that is only accessible on GPU is the best idea, as it greatly restricts the number of users who can play with it, and there won't be any regular tests or inference widget.

However, this one is done, so let's finish it (just saying the above for the selection of future models we implement). The main problem for the tests is just the line

MSDA = load_cuda_kernels()

flagged above. It should be inside an if is_torch_cuda_available(), and the else branch should set the same object to None. Then the models should error at init if there is no GPU, and the tests of those models should all be decorated with the right require decorator.
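A minimal sketch of that guard (load_cuda_kernels comes from this PR's load_custom module; the logging call is illustrative):

from transformers.models.deformable_detr.load_custom import load_cuda_kernels
from transformers.utils import is_torch_cuda_available, logging

logger = logging.get_logger(__name__)

MSDA = None
if is_torch_cuda_available():
    # compile and load the custom multi-scale deformable attention kernel
    MSDA = load_cuda_kernels()
else:
    logger.info("CUDA is not available; the custom kernel is disabled and the model cannot run.")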

Review comment (outdated, resolved): .circleci/config.yml
@NielsRogge (Contributor, Author)

Pinging @Narsil regarding excluding this model from the pipeline tests.

@Narsil (Contributor) commented Jun 27, 2022

Hi @NielsRogge ,

The best location to do this is in tests/pipelines/test_pipelines_xxxx.py; simply add some logic in the get_test_pipeline function.

But the tests currently seem to be passing, so is this really necessary?

@HuggingFaceDocBuilderDev commented Jul 1, 2022

The documentation is not available anymore as the PR was closed or merged.

@NielsRogge (Contributor, Author)

PR is ready for review. By adding the model to the mappings, this happens:

ERROR tests/pipelines/test_pipelines_feature_extraction.py - RecursionError: ...
ERROR tests/pipelines/test_pipelines_object_detection.py - RecursionError: ma...
!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!

@sgugger (Collaborator) left a comment

This should resolve the errors in the pipeline tests.

Review comments (outdated, resolved):
  • tests/pipelines/test_pipelines_feature_extraction.py
  • tests/pipelines/test_pipelines_object_detection.py
@NielsRogge (Contributor, Author)

@sgugger that didn't seem to fix the recursion error.

@sgugger (Collaborator) left a comment

LGTM, thanks for polishing this PR!

@NielsRogge NielsRogge merged commit 59407bb into huggingface:main Sep 14, 2022
oneraghavan pushed a commit to oneraghavan/transformers that referenced this pull request Sep 26, 2022
* First draft
* More improvements
* Improve model, add custom CUDA code
* Import torch before
* Add script that imports custom layer
* Add everything in new ops directory
* Import custom layer in modeling file
* Fix ARCHIVE_MAP typo
* Creating the custom kernel on the fly.
* Import custom layer in modeling file
* More improvements
* Fix CUDA loading
* More improvements
* Improve conversion script
* Improve conversion script
* Make it work until encoder_outputs
* Make forward pass work
* More improvements
* Make logits match original implementation
* Make implementation also support single_scale model
* Add support for single_scale and dilation checkpoint
* Add support for with_box_refine model
* Support also two stage model
* Improve tests
* Fix more tests
* Make more tests pass
* Upload all models to the hub
* Clean up some code
* Improve decoder outputs
* Rename intermediate hidden states and reference points
* Improve model outputs
* Move tests to dedicated folder
* Improve model outputs
* Fix retain_grad test
* Improve docs
* Clean up and make test_initialization pass
* Improve variable names
* Add copied from statements
* Improve docs
* Fix style
* Improve docs
* Improve docs, move tests to model folder
* Fix rebase
* Remove DetrForSegmentation from auto mapping
* Apply suggestions from code review
* Improve variable names and docstrings
* Apply some more suggestions from code review
* Apply suggestion from code review
* better docs and variables names
* hint to num_queries and two_stage confusion
* remove asserts and code refactor
* add exception if two_stage is True and with_box_refine is False
* use f-strings
* Improve docs and variable names
* Fix code quality
* Fix rebase
* Add require_torch_gpu decorator
* Add pip install ninja to CI jobs
* Apply suggestion of @sgugger
* Remove DeformableDetrForObjectDetection from auto mapping
* Remove DeformableDetrModel from auto mapping
* Add model to toctree
* Add model back to mappings, skip model in pipeline tests
* Apply @sgugger's suggestion
* Fix imports in the init
* Fix copies
* Add CPU implementation
* Comment out GPU function
* Undo previous change
* Apply more suggestions
* Remove require_torch_gpu annotator
* Fix quality
* Add logger.info
* Fix logger
* Fix variable names
* Fix initializaztion
* Add missing initialization
* Update checkpoint name
* Add model to doc tests
* Add CPU/GPU equivalence test
* Add Deformable DETR to pipeline tests
* Skip model for object detection pipeline

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
@robinnarsinghranabhat

Hi @NielsRogge. I am following the fine-tuning notebook for DETR object detection.

You have mentioned that Deformable DETR follows mostly the same API, but I noticed that a model based on DeformableDetrForObjectDetection doesn't automatically add +1 to the number of classes.

Also, for the feature extractor, I am confused whether we should use AutoImageProcessor, as per the documentation, or DeformableDetrFeatureExtractor instead.

Furthermore, I was wondering if we could add the augmentations that the original paper uses, from the official repo. I managed to add augmentation based on functions available in the official Deformable-DETR repo, but I'm not sure of their correctness.
