GSoC_2024

Gary Bradski edited this page Apr 17, 2024 · 42 revisions

OpenCV Google Summer of Code 2024

Example use of computer vision: Vision Transformer (https://learnopencv.com/)


OpenCV Accepted Projects:

Mentor only list

  • 🚧 TBD: spreadsheet of projects link

Contributor | Title | Mentors | Passed

Important dates:

Date (2024) | Description | Comment
Feb 6 | Organization Applications Closed | 👍
Feb 21 | Org Acceptance | 👍
Mar 18 | Proposals Open | 👍
Apr 24 | Proposal Ranking | 🏃‍♂️
April | Slots Allocated | 🏃

      Timeline (times in UTC; see the UTC time converter)

Resources:

OpenCV Project Ideas List:

Project Discussion list

Index to Ideas Below

  1. Neural 3D Capture and Rendering
  2. 3D Object Capture with SplaTAM
  3. 3D Radiance Toolbox
  4. Multi-camera calibration part 3
  5. Multi-camera calibration test
  6. Multi-camera calibration toolbox
  7. Quantized models for OpenCV Model Zoo
  8. RISC-V Optimizations
  9. Dynamic CUDA support in DNN
  10. OpenGL support with GTK 3 and GTK 4
  11. Animated images and other functionality for imgcodecs
  12. Synchronized multi camera video recorder
  13. libcamera back-end for VideoCapture

Idea Template

All work is in C++ unless otherwise noted.


Ideas:

  1. IDEA: Neural 3D Capture and Rendering

    • Description: Refine SplaTAM or Droid-SLAM 3D capture plus Gaussian Splatting (GS) code (we cannot use the Inria GS code due to licensing issues; NerfSlam may be a better fit) so that users can easily capture neural 3D models of scenes. Possibly allow accuracy boosts using Charuco boards placed in the scene. The system should be able to localize arbitrary camera poses in the captured scenes.
    • Expected Outcomes:
      • Droid-SLAM + Gaussian Splatting code and application, well documented so that users can easily capture 3D models
        • Captures an indoor (e.g. a room) or outdoor scene (e.g. side of yard)
        • Working code pull request on Github
      • Documentation and YouTube videos showing how to use it.
    • Resources: OpenCV, Example NeRF Code, Droid-SLAM paper, Droid-SLAM Code, Gaussian Splatting Description, Gaussian Splatting Code or alternatively SplaTAM. Another possibility is to use Gaussian Splatting SLAM
    • Skills Required: Good software development skills, good knowledge of deep methods of 3D visual capture and use of Gaussian Splatting models.
    • Possible Mentors: Doug Lee, Gary Bradski
    • Difficulty: Medium to Hard
    • Duration: 350 hours
  2. IDEA: 3D Object Capture with SplaTAM

    • Description: Use the above or SplaTAM (possibly allowing accuracy boosts using Charuco boards) to capture and log 3D objects to collect a database. We might also use GaussianObject if their code is published and has a compatible license. We then want to use the resulting Neural Render to find/segment the object in a scene.
    • Expected Outcomes:
      • Easy capture of 3D objects
        • Working code pull request on Github
      • An app that allows camera capture from a phone, with data processing on a server or in the cloud to process/render and display the object on a computer, including an open license to use captured objects
      • Use object splats to segment objects in a scene.
      • Documentation and YouTube videos showing how to use it.
    • Resources: OpenCV, NerfStudio, [Gaussian Splatting Description](https://towardsdatascience.com/a-comprehensive-overview-of-gaussian-splatting-e7d570081362), SplaTAM and code. A new paper that doesn't yet have code (and whose license we'll need to check) is [GaussianObject](https://gaussianobject.github.io/).
    • Skills Required: Good software development skills, good knowledge of deep methods of 3D visual capture, and use of Gaussian Splatting models.
    • Possible Mentors: Gary Bradski
    • Difficulty: Medium to Hard
    • Duration: 350 hours
  3. IDEA: 3D Radiance Toolbox

    • Description: Create a toolbox for easy capture of radiance scenes. NerfStudio or Droid-SLAM (Simultaneous Localization and Mapping) plus Gaussian Splatting can serve as the default technique; the main goal is ease of capture from an iPhone or Android phone (with fixed focal length), sending the data to a cloud or server. To initialize and calibrate, images of a Charuco board can be used to begin/end a capture. Data goes to the cloud or a server, where subsequent views in the scene can be accurately localized in 6DoF (six degrees of freedom: X, Y, Z position plus 3D orientation).
    • Expected Outcomes:
      • Easy capture of 3D scenes
        • Working code pull request on Github
      • An app that allows camera capture from a phone, with data processing on a server or in the cloud to process/render and display the object on a computer, including an open license to use captured objects. New views from the same camera or from security-type/web cameras can be localized in the scene.
        • The app may use an initial Charuco board to set a coordinate system and to calibrate the capture camera.
        • This app is intended to be modular and later be able to wrap the above two projects.
      • Documentation and YouTube videos showing how to use it.
    • Resources: OpenCV, Example NeRF Code, Droid-SLAM paper, Droid-SLAM Code, Gaussian Splatting Description, Gaussian Splatting Code
    • Skills Required: Good software development skills, computer vision/deep net experience. iPhone and/or Android experience
    • Possible Mentors: Gary Bradski
    • Difficulty: Medium to Hard
    • Duration: 175 hours
  4. IDEA: Multi-camera calibration part 3

    • Description: During GSoC 2023, a cool new multi-camera calibration algorithm was developed and improved: https://github.com/opencv/opencv/pull/24052. This year we would like to finish this work with more test cases, tune the accuracy, and build a higher-level, user-friendly tool (based on the script from the tutorial) to perform multi-camera calibration. If this is completed before the internship is up, we'll move on to leveraging the IMU or marker-free calibration.
    • Expected Outcomes:
      • A series of patches with more unit tests and bug fixes for the multi-camera calibration algorithm
      • New/improved documentation on how to calibrate cameras
      • A short YouTube video showing off how to use the calibration routines
    • Skills Required: Mastery of C++ and Python, mathematical knowledge of camera calibration, ability to code up mathematical models
    • Difficulty: Medium-Difficult
    • Possible Mentors: Maksym Ivashechkin, Alexander Smorkalov
    • Duration: 175 hours
  5. IDEA: Multi-camera calibration test

    • Description: We are looking for a student to curate best-of-class calibration data, collect calibration data with various OpenCV fiducials, and graphically produce calibration board and camera model data (script). Simultaneously, begin to write comprehensive test scripts for all the existing calibration functions. While doing this, improve the calibration documentation as necessary. From this, derive the expected accuracy of each fiducial type for various camera types.
    • Expected Outcomes:
      • Curate camera calibration data from public datasets.
      • Collect calibration data for various fiducials and camera types.
      • Graphically create camera calibration data with ready-to-go scripts
      • Write test functions for the OpenCV Calibration pipeline
      • New/improved documentation on how to calibrate cameras as needed.
      • Statistical analysis of the performance (accuracy and variance) of OpenCV fiducials, algorithms and camera types.
      • A YouTube video describing and demonstrating the OpenCV calibration tests.
    • Resources: OpenCV Fiducial Markers, OpenCV Calibration Functions, OpenCV Camera Calibration Tutorial 1, OpenCV Camera Calibration Tutorial 2
    • Skills Required: Mastery of C++ and Python, mathematical knowledge of camera calibration, ability to code up mathematical models
    • Difficulty: Medium
    • Possible Mentors: Jean-Yves Bouguet, Alexander Smorkalov
    • Duration: 175 hours
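The statistical analysis called for above boils down to computing per-view reprojection errors and summarizing them. A minimal pure-numpy sketch of that metric (the pinhole model is standard; the pose and sample points are illustrative assumptions):

```python
import numpy as np

def reprojection_errors(points_3d, points_2d, K, R, t):
    """Per-point pixel error of a pinhole projection against detections."""
    cam = points_3d @ R.T + t                    # world -> camera frame
    proj = cam @ K.T                             # apply intrinsics
    proj = proj[:, :2] / proj[:, 2:3]            # perspective divide
    return np.linalg.norm(proj - points_2d, axis=1)

K = np.array([[800.0, 0, 320.0], [0, 800.0, 240.0], [0, 0, 1]])
R = np.eye(3)                                    # identity pose for the sketch
t = np.array([0.0, 0.0, 10.0])
pts3d = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])

# "Detections": exact projections plus 0.5 px of synthetic noise.
exact = (pts3d @ R.T + t) @ K.T
exact = exact[:, :2] / exact[:, 2:3]
rng = np.random.default_rng(1)
detected = exact + rng.normal(0, 0.5, exact.shape)

errs = reprojection_errors(pts3d, detected, K, R, t)
print(f"mean error {errs.mean():.2f} px, max {errs.max():.2f} px")
```

Aggregating this statistic per fiducial type and camera model yields exactly the accuracy-vs-fiducial tables the project asks for.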
  6. IDEA: Multi-camera calibration toolbox

    • Description: Build a higher-level, user-friendly tool (based on the script from the calibration tutorial) to perform multi-camera calibration. This should allow easy multi-camera calibration with multiple Charuco patterns and possibly other calibration fiducial patterns. The tool will use Monte-Carlo sampling to determine parameter stability, allow easy switching of camera models, and output the camera calibration parameters, the pose of each fiducial pattern in space, and the extrinsic location of each camera relative to the others.
    • Expected Outcomes:
      • Tool with convenient API that will be more or less comparable and compatible with Kalibr tool (https://github.com/ethz-asl/kalibr)
      • New/improved documentation on how to calibrate cameras
      • A YouTube video demonstrating how to use the toolbox
    • Skills Required: Python, mathematical knowledge of camera calibration, ability to code up mathematical models
    • Difficulty: Medium-Difficult
    • Possible Mentors: Jean-Yves Bouguet, Gary Bradski
    • Duration: 175 hours
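The Monte-Carlo stability check mentioned above can be sketched generically: refit the model on resampled observations and report the spread of the recovered parameter. A hedged toy version, with a 1-D line fit standing in for the calibration solver:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "observations": points on the line y = 2.5 x + 1 with measurement noise.
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(0, 0.3, x.size)

# Monte-Carlo: refit on bootstrap resamples and collect the slope estimates,
# just as a calibration tool would collect, say, focal-length estimates.
slopes = []
for _ in range(200):
    idx = rng.integers(0, x.size, x.size)        # resample with replacement
    slope, _ = np.polyfit(x[idx], y[idx], 1)
    slopes.append(slope)
slopes = np.array(slopes)

print(f"slope = {slopes.mean():.3f} +/- {slopes.std():.3f}")
```

A large spread relative to the parameter value flags an unstable calibration (too few views, degenerate poses, or a poorly constrained camera model).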
  7. IDEA: Quantized models for OpenCV Model Zoo

    • Description: Many modern CPUs, GPUs and specialized NPUs include special instructions and hardware blocks for accelerated inference, especially INT8 inference. Not only do the models become ~4x smaller than the original FP32 models, but inference speed also increases significantly (by 2x-4x or more). The number of quantized models steadily increases; however, beyond image classification there are not many 8-bit computer vision models with proven high-quality results. We would like to add to our model zoo (https://github.com/opencv/opencv_zoo) 8-bit models for object detection, optical flow, pose estimation, text detection and recognition, etc.
    • Expected Outcomes:
      • Series of patches to OpenCV Zoo and maybe to OpenCV DNN (when OpenCV DNN misses 8-bit flavors of certain operations) to add the corresponding models.
      • If quantization is performed by the student during the project, we will request the corresponding scripts to perform the quantization
      • Benchmark results to prove the quality of the quantized models along with the corresponding scripts so that we can reproduce it.
    • Skills Required: very good ML engineering skills, good Python programming skills, familiarity with model quantization algorithms and model quality assessment approaches
    • Possible Mentors: Feng Yuantao, Zhong Wanli, Vadim Pisarevsky
    • Difficulty: Medium
    • Duration: 90 to 175 hours, depending on the particular model.
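For intuition on the ~4x size claim above, here is a minimal, framework-free sketch of affine INT8 quantization of one weight tensor (the scale/zero-point scheme is the standard one; real model-zoo quantization would go through a toolchain such as ONNX or the training framework):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0, 0.05, (256, 256)).astype(np.float32)   # toy FP32 weights

# Affine quantization: map [w.min(), w.max()] onto the int8 range [-128, 127].
lo, hi = float(w.min()), float(w.max())
scale = (hi - lo) / 255.0
zero_point = np.round(-lo / scale) - 128
q = np.clip(np.round(w / scale + zero_point), -128, 127).astype(np.int8)

# Dequantize and measure the error the 8-bit representation introduced.
w_hat = (q.astype(np.float32) - zero_point) * scale
err = np.abs(w - w_hat).max()

print(f"size ratio: {w.nbytes / q.nbytes:.1f}x, max abs error: {err:.5f}")
```

The benchmark deliverable is then about showing that this per-tensor error does not translate into a meaningful drop in end-task accuracy.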
  8. IDEA: RISC-V Optimizations

    • Description: RISC-V is one of the main target platforms for OpenCV. Over the past several years we brought in some RISC-V optimizations based on the RISC-V Vector extension by adding another backend to OpenCV scalable universal intrinsics. We refactored a lot of code in OpenCV to make the vectorized loops compatible with the RISC-V backend and more or less efficient. Still, we see a lot of gaps, and the performance of certain functions can be further improved. For some critical functions, like convolution in deep learning, it perhaps makes sense to implement custom loops using native RVV intrinsics instead of OpenCV scalable universal intrinsics. This is what we invite you to do.
    • Expected Outcomes:
      • A series of patches for core, imgproc, video and dnn modules to bring improved loops that use OpenCV scalable universal intrinsics or native RVV intrinsics to improve the performance. In the first case the optimizations should not degrade performance on other major platforms like x86-64 or ARMv8 with NEON.
    • Resources:
    • Skills Required: mastery plus experience coding in C++; good skills in optimizing code using SIMD.
    • Possible Mentors: Mingjie Xing, Maxim Shabunin
    • Difficulty: Hard
    • Duration: 350 hours
  9. IDEA: Dynamic CUDA support in DNN

    • Description: The OpenCV DNN module includes several backends for efficient inference on various platforms. Some of the backends are heavy and bring in a lot of dependencies, so it makes sense to make the backends dynamic. Recently, we did this with the OpenVINO backend: https://github.com/opencv/opencv/pull/21745. The goal of this project is to make the CUDA backend of OpenCV DNN dynamic as well. Once it's implemented, we can ship a single set of OpenCV binaries and then add the necessary plugin (also in binary form) to accelerate inference on NVIDIA GPUs without recompiling OpenCV.
    • Expected Outcomes:
      • A series of patches for the dnn and maybe core modules to build the OpenCV DNN CUDA plugin as a separate binary that can be used by OpenCV DNN. In this case OpenCV itself should not have any dependency on the CUDA SDK or runtime; the plugin should encapsulate it. It is fine if the user-supplied tensors (cv::Mat) are automatically uploaded to GPU memory by the engine (cv::dnn::Net) before inference and the output tensors are downloaded from GPU memory after inference.
    • Resources:
    • Skills Required: mastery plus experience coding in C++; good practical experience in CUDA. Acquaintance with deep learning is desirable but not necessary, since the project is mostly about software engineering, not about ML algorithms or their optimization.
    • Possible Mentors: Alexander Smorkalov
    • Difficulty: Hard
    • Duration: 350 hours
  10. IDEA: OpenGL support with GTK 3 and GTK 4

    • Description: OpenCV 4.x supports integration with GTK 2, but it is out of date for most modern Linux distributions. GTK 3+ provides a new API for OpenGL integration that needs to be supported in OpenCV 4.x+.
    • Expected Outcomes:
      • Highgui GTK 3+ backend for OpenGL
    • Skills Required: C++, practice with Linux.
    • Possible Mentors: TBD
    • Difficulty: Medium
    • Duration: 175 hours
  11. IDEA: Animated images and other functionality for imgcodecs

    • Description: It's not a single big project but rather a series of tasks under the "image codecs" umbrella. The goal is to make a series of improvements in the existing opencv_imgcodecs module, including, but not limited to, the following items (please, feel free to propose your ideas):
      • support animation encoding/decoding.
      • zlib-ng integration to accelerate png codec
    • Expected Outcomes:
      • imgcodecs new API
    • Skills Required: C++, practice with Linux.
    • Possible Mentors: Vincent Rabaud
    • Difficulty: Medium
    • Duration: 90 or 175 hours, depending on the exact scope
  12. IDEA: Synchronized multi camera video recorder

    • Description: Multi-camera calibration and multi-view scenarios require synchronous recording with multiple cameras. The task is to tune cv::VideoCapture and/or cv::VideoWriter and implement a sample for video recording with several cameras, with timestamps.
    • Expected Outcomes:
      • Sync video recording sample for several cameras: V4L2, RTSP(?)
    • Resources: Overview
    • Skills Required: C++
    • Possible Mentors: Alexander S.
    • Difficulty: Easy-Medium
    • Duration: 175 hours
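The core of the synchronization problem is matching frames from independent streams by timestamp. A hedged pure-Python sketch of nearest-timestamp matching (the real capture loop would use cv2.VideoCapture grab()/retrieve() per camera plus a monotonic clock; the timestamps here are synthetic):

```python
import bisect

def match_frames(ts_a, ts_b, max_skew):
    """For each timestamp in stream A, find the nearest one in sorted
    stream B; keep the pair only if the skew is within tolerance."""
    pairs = []
    for i, t in enumerate(ts_a):
        j = bisect.bisect_left(ts_b, t)
        # candidates: the neighbor on each side of the insertion point
        best = min((k for k in (j - 1, j) if 0 <= k < len(ts_b)),
                   key=lambda k: abs(ts_b[k] - t))
        if abs(ts_b[best] - t) <= max_skew:
            pairs.append((i, best))
    return pairs

# Two 30 fps cameras, the second offset by 5 ms and dropping frame 3.
cam_a = [i * 33.3 for i in range(6)]
cam_b = [i * 33.3 + 5.0 for i in range(6) if i != 3]

pairs = match_frames(cam_a, cam_b, max_skew=10.0)
print(pairs)  # → [(0, 0), (1, 1), (2, 2), (4, 3), (5, 4)]
```

Note that frame 3 of stream A is correctly left unmatched rather than paired with a frame 28 ms away, which is the behavior a calibration pipeline needs.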
  13. IDEA: libcamera back-end for VideoCapture

    • Description: Discussion: #21653
    • Expected Outcomes:
      • MIPI camera support on Raspberry Pi
    • Resources:
    • Skills Required: C++, Linux
    • Possible Mentors: TBD
    • Difficulty: Medium
    • Duration: 175 hours

Idea Template:

1. #### _IDEA:_ <Descriptive Title>
   * ***Description:*** 3-7 sentences describing the task
   * ***Expected Outcomes:***
      * < Short bullet list describing what is to be accomplished >
      * <i.e. create a new module called "bla bla">
      * < Has method to accomplish X >
      * <...>
   * ***Resources:***
         * [For example a paper citation](https://arxiv.org/pdf/1802.08091.pdf)
         * [For example an existing feature request](https://github.com/opencv/opencv/issues/11013)
         * [Possibly an existing related module](https://github.com/opencv/opencv_contrib/tree/master/modules/optflow) that includes some new optical flow algorithms.
   * ***Skills Required:*** < for example mastery plus experience coding in C++, college course work in vision that covers optical flow, python. Best if you have also worked with deep neural networks. >
   * ***Possible Mentors:*** < your name goes here >
   * ***Difficulty:*** <Easy, Medium, Hard>
   * ***Duration:*** <8 weeks is 320 hard, Medium 240, Easy 200>


Contributors

How to Apply

The process is described at GSoC home page

How contributors will be evaluated once working:

  • Contributors will be paid only if:
    • Phase 1:
      • You must generate a pull request
        • That builds
        • Has at least stubbed-out (placeholder functions, such as just displaying an image) functionality
        • With OpenCV appropriate Doxygen documentation (example tutorial)
          • Includes what the function or net is and what it is used for
        • Has at least stubbed out unit test
        • Has a stubbed out example/tutorial of use that builds
    • Phase 2:
      • You must generate a pull request
        • That builds
        • Has all or most of the planned functionality (but still usable without those missing parts)
        • With OpenCV appropriate Doxygen documentation
          • Includes what the function or net is and what it is used for
        • Has some unit tests
        • Has a tutorial/sample of how to use the function or net and why you'd want to use it.
      • Optionally, but highly desirable: create a short (30 sec-1 min) video (preferably on YouTube, but any hosting works) that demonstrates your project. We will use it to create the final video.
    • Extended period:
      • TBD

Mentors:

  1. Contact us, preferably in February or early March, on the opencv-gsoc googlegroups mailing list above and ask to be a mentor (or we will ask you in some known cases)
  2. If we accept you, we will post a request from the Google Summer of Code OpenCV project site asking you to join.
  3. You must accept the request, and then you are a mentor!
  4. You then:
    • Look through the ideas above, choose one you'd like to mentor, or create your own and post it for discussion on the mentor list.
    • Go to the opencv-gsoc googlegroups mailing list above and look through the project proposals and discussions. Discuss the ideas you've chosen.
      • Find likely contributors and ask them to apply to your project(s)
    • You will get a list of contributors who have applied to your project. Go through them and select a contributor; if none suits, rejecting them all and joining to co-mentor, or sitting out this year, are acceptable outcomes.
  5. Then, when we get a slot allocation from Google, the administrators "spend" the slots in order of priority, influenced by whether there is a capable mentor for each topic.
  6. Contributors must finally actually accept to do that project (some sign up for multiple organizations and then choose)
  7. Get to work!

If you are accepted as a mentor and you find a suitable contributor and we give you a slot and the contributor signs up for it, then you are an actual mentor! Otherwise you are not a mentor and have no other obligations.

  • Thank you for trying.
  • You may contact other mentors and co-mentor a project.

You get paid a modest stipend over the summer to mentor, typically $500 minus an org fee of 6%.

Several mentors donate their salary, earning ever better positions in heaven when that comes.

Potential Mentors List:

Ankit Sachan
Anatoliy Talamanov
Clément Pinard
Davis King
Dmitry Kurtaev
Dmitry Matveev
Edgar Riba
Gholamreza Amayeh
Grace Vesom
Jiri Hörner
João Cartucho
Justin Shenk
Michael Tetelman
Ningxin Hu
Rostislav Vasilikhin
Satya Mallick
Stefano Fabri
Steven Puttemans
Sunita Nayak
Vikas Gupta
Vincent Rabaud
Vitaly Tuzov
Vladimir Tyan
Yida Wang
Jia Wu
Yuantao Feng
Zihao Mu

Admins

Gary Bradski
Vadim Pisarevsky
Shiqi Yu

GSoC Org Application Answers

Answers from our OpenCV GSoC application
