
[VitTrack Evaluation] OTB100 Dataset Implementation #245

Open · wants to merge 17 commits into base: main

Conversation

@ryan1288 (Contributor) commented Mar 13, 2024

This PR implements the evaluation script for VitTrack using the OTB100 dataset, as part of an issue raised in #119. Reference was made to #205 during the development process.

  • demo.cpp: Calculates "AUC", "precision", and "normalized precision" using the dataset ground truth (a sketch of the conventional metric definitions follows this list).
  • eval.py: References VitTrack's ONNX model and uses OTB as the tracking evaluation dataset.
  • __init__.py: Added the OTB dataset.
  • README.md: Updated with instructions for dataset preparation and evaluation. Simplifications relative to Eval tracker #205 fully automate dataset preparation after download.
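
For clarity, here is a minimal sketch of how precision and normalized precision are conventionally computed on OTB; this is illustrative only, and the PR's actual code may differ in details:

```python
import numpy as np

def center_error(gt_bb, result_bb, normalize=False):
    # Boxes are [x, y, w, h]; compare box centers frame by frame.
    gt_centers = gt_bb[:, :2] + gt_bb[:, 2:] / 2
    result_centers = result_bb[:, :2] + result_bb[:, 2:] / 2
    if normalize:
        # Normalized precision rescales centers by the ground-truth box size,
        # making the metric resolution-independent.
        gt_centers = gt_centers / gt_bb[:, 2:]
        result_centers = result_centers / gt_bb[:, 2:]
    return np.linalg.norm(gt_centers - result_centers, axis=1)

def precision(gt_bb, result_bb, threshold=20):
    # Fraction of frames whose center error is within `threshold` pixels.
    return np.mean(center_error(gt_bb, result_bb) <= threshold)
```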

Testing:

  • Executed data preparation and evaluation tests.
  1. python eval.py -m vittrack -d otb -dr /path/to/otb
  2. Within the tools/eval directory, python eval.py -m vittrack -d otb -dr .

Both procedures yield identical results regardless of the presence of the OTB100.zip file, the correct placement of OTB.json, or the preprocessing status of the folders.

@fengyuentau fengyuentau self-requested a review March 18, 2024 10:02
@fengyuentau fengyuentau self-assigned this Mar 18, 2024
@fengyuentau fengyuentau added the evaluation adding tools for evaluation or bugs of eval scripts label Mar 18, 2024
@fengyuentau fengyuentau added this to the 4.10.0 milestone Mar 18, 2024
@fengyuentau (Member)

@lpylpy0514 Could you verify whether these accuracy test results are close to yours?

tools/eval/README.md (review thread, outdated, resolved)
@lpylpy0514 (Contributor)

I tested VitTrack with my PyTorch code and got 56.98 AUC on OTB-100.

@fengyuentau (Member)

> I tested VitTrack with my PyTorch code and got 56.98 AUC on OTB-100.

Thank you! @ryan1288 Could you also provide AUC in the script?

@ryan1288 (Contributor, Author)

@fengyuentau
Of course. The AUC was actually named "success" (now renamed to AUC), which is why the output metric (0.586) is similar to @lpylpy0514's AUC metric (0.5698) on OTB-100. I've confirmed that the general logic of the AUC calculation is the same in both evaluation methods: the IoU calculation and the AUC thresholds (0 to 1 in increments of 0.05) are identical.
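
For reference, a minimal sketch of that success/AUC logic (the helper names are illustrative and not necessarily the script's exact code):

```python
import numpy as np

def overlap_ratio(rect1, rect2):
    # IoU between two arrays of [x, y, w, h] boxes, row by row.
    left = np.maximum(rect1[:, 0], rect2[:, 0])
    top = np.maximum(rect1[:, 1], rect2[:, 1])
    right = np.minimum(rect1[:, 0] + rect1[:, 2], rect2[:, 0] + rect2[:, 2])
    bottom = np.minimum(rect1[:, 1] + rect1[:, 3], rect2[:, 1] + rect2[:, 3])
    intersect = np.maximum(right - left, 0) * np.maximum(bottom - top, 0)
    union = rect1[:, 2] * rect1[:, 3] + rect2[:, 2] * rect2[:, 3] - intersect
    return intersect / union

def success_auc(gt_bb, result_bb):
    # Success rate at each IoU threshold 0, 0.05, ..., 1.0; AUC is the mean.
    thresholds = np.arange(0, 1.05, 0.05)
    iou = overlap_ratio(gt_bb, result_bb)
    return np.mean([np.mean(iou > t) for t in thresholds])
```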

To figure out what differs between the two methods, I need to be able to run the evaluation script from @lpylpy0514's repository.
I've cloned https://github.com/lpylpy0514/VitTracker, placed the OTB dataset in it, set up the environment, and tried to run the evaluation on OTB using VitTrack with:
python tracking/test.py vittrack config --dataset otb --threads 16 --num_gpus 4
However, the repository doesn't contain the following modules, so I wasn't able to run it:
import lib.models.vt.levit_utils as utils
from lib.models.vittrack import build_vittrack

Would you be able to help with this? @lpylpy0514
Thank you!

@lpylpy0514 (Contributor)

Try this.
python tracking/test.py vit_dist vit_48_h32_noKD --dataset otb --threads 16 --num_gpus 4

@ryan1288 (Contributor, Author)

@fengyuentau
I've got python tracking/test.py vit_dist vit_48_h32_noKD --dataset otb --threads 16 --num_gpus 4 working now. Looking into the discrepancy, I noticed that while the image sequence and initialization box are identical, the model tracking outputs begin to diverge: Comparison for Basketball Sequence

@lpylpy0514
Is it possible to get confirmation that the two models are the same? I could confirm it by converting to/from ONNX, but if you already know, that would be much quicker and I could dig into the differences in the process.

  • opencv_zoo uses object_tracking_vittrack_2023sep.onnx
  • @lpylpy0514's tracker with the above command uses OstrackDist_ep0300.pth.tar

@fengyuentau (Member)

> @fengyuentau I've got python tracking/test.py vit_dist vit_48_h32_noKD --dataset otb --threads 16 --num_gpus 4 working now. Looking into the discrepancy, I noticed that while the image sequence and initialization box are identical, the model tracking outputs begin to diverge: Comparison for Basketball Sequence
>
> @lpylpy0514 Is it possible to get confirmation that the two models are the same? I could confirm it by converting to/from ONNX, but if you already know, that would be much quicker and I could dig into the differences in the process.
>
> • opencv_zoo uses object_tracking_vittrack_2023sep.onnx
> • @lpylpy0514's tracker with the above command uses OstrackDist_ep0300.pth.tar

Good progress 👍 Let's see whether we are comparing two different models or two identical models.

@lpylpy0514 (Contributor)

The two models are almost the same, but I found a bug in my post-processing. The Hann window implementation in OpenCV is not the same as in my PyTorch code, and this may cause some difference. Based on @ryan1288's result, the OpenCV implementation seems to be more accurate, but the result may also differ on other datasets. I got 57.55 AUC with the same post-processing as OpenCV and OstrackDist_ep0300.pth.tar.
Some results of OstrackDist_ep0300.pth.tar:
[1.7655e-03, 3.0098e-03, 6.3548e-03, 6.4020e-03, 7.7697e-03, 7.4521e-03, 7.8579e-03, 8.2895e-03, 8.2445e-03, 8.3831e-03, 8.2833e-03, 8.2697e-03, 8.5975e-03, 7.6809e-03, 7.2185e-03, 3.7524e-03]
Some results of object_tracking_vittrack_2023sep.onnx with the same input:
[1.76614523e-03, 3.01009417e-03, 6.35188818e-03, 6.40192628e-03, 7.77038932e-03, 7.45379925e-03, 7.86361098e-03, 8.29905272e-03, 8.25238228e-03, 8.39337707e-03, 8.29535723e-03, 8.28057528e-03, 8.60300660e-03, 7.68348575e-03, 7.22128153e-03, 3.75285745e-03]
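
One common way two Hann-window implementations disagree is the symmetric vs. periodic convention. The quick check below is illustrative, and not necessarily the exact bug described above:

```python
import numpy as np
import torch

n = 16
symmetric = np.hanning(n)                  # NumPy: symmetric window, endpoints are 0
periodic = torch.hann_window(n).numpy()    # PyTorch default: periodic window
print(np.abs(symmetric - periodic).max())  # nonzero: the two conventions differ
```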

@fengyuentau (Member)

> The two models are almost the same, but I found a bug in my post-processing. The Hann window implementation in OpenCV is not the same as in my PyTorch code, and this may cause some difference. Based on @ryan1288's result, the OpenCV implementation seems to be more accurate, but the result may also differ on other datasets. I got 57.55 AUC with the same post-processing as OpenCV and OstrackDist_ep0300.pth.tar. […]

Thank you for verifying all these details 👍 I think we should make sure that the implementation of the algorithm in OpenCV is correct, so feel free to create a pull request for the bug fix.

@fengyuentau (Member)

With patch opencv/opencv#25264, I get AUC 0.579 using this evaluation script, which still differs from @lpylpy0514's result of 57.55 (0.5755).

-----------------------------------------------------
|Tracker name|   AUC   | Precision | Norm Precision |
-----------------------------------------------------
|  tracker   |  0.579  |   0.764   |     0.717      |
-----------------------------------------------------

@ryan1288 Could you compare the difference between your script and the one used by @lpylpy0514 ?

@ryan1288 (Contributor, Author)

@fengyuentau Yup, will do. In the next day or two I'll update the OTB dataset process so it no longer renames/removes files, and look for any differences between the two processes.

@lpylpy0514 (Contributor)

The data I mentioned above was generated without post-processing. It shows the difference between the PyTorch and ONNX models. I think a small difference between them is reasonable.

@ryan1288 (Contributor, Author)

@lpylpy0514 Yes, I think we're on the same page. The model difference is minimal, so I'll look at the post-processing in the next few days to see where the difference is.

@fengyuentau (Member)

> The data I mentioned above was generated without post-processing. It shows the difference between the PyTorch and ONNX models. I think a small difference between them is reasonable.

There should be only a very minimal difference between PyTorch and ONNX. I don't quite understand "without post-process". Do you mean the straight output from the model with no post-processing, like dropping line 187 in the following link?

https://github.com/opencv/opencv/blob/ff9aeaceb0a8abede5fc5189f2551712037db9ef/modules/video/src/tracking/tracker_vit.cpp#L170-L203

@lpylpy0514 (Contributor)

Yes, the data above is part of conf_map.
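
For context, a rough Python sketch of that post-processing step, hand-simplified from the referenced C++ (names here are illustrative; see tracker_vit.cpp for the real logic):

```python
import numpy as np

def postprocess(conf_map, hann_window):
    # "With post-process": weight the raw confidence map by the Hann window
    # to penalize large displacements, then take the peak location.
    # "Without post-process" would peak-pick on conf_map directly.
    weighted = conf_map * hann_window
    peak = np.unravel_index(np.argmax(weighted), weighted.shape)
    return peak, weighted[peak]
```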

… Removed all moving or renaming of files and directories
@ryan1288 (Contributor, Author) commented Apr 1, 2024

@fengyuentau I've made some changes to:

  1. No longer renames/moves files.
  2. Renamed the dataset variables back to OTB100 to match the dataset.
  3. No longer uses OTB.json, which I've found to have a few discrepancies with @lpylpy0514's method of importing the dataset (see figures): some numbers in the JSON for Jogging don't match, and BlurCar1 has an unusually small array. Relatedly, groundtruth_rect.txt from BlurCar1 has a bad line: every line is comma-separated except line 496, so the zip file needs to be updated with the corrected file and rezipped (a parsing sketch that tolerates this follows the list).
  4. Updated README.md with instructions on dataset setup (now it just needs the zip).
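
As referenced in item 3, a small sketch of a parser that would tolerate the mixed delimiters (illustrative only; the actual fix here is to correct the file and rezip):

```python
import re
import numpy as np

def load_groundtruth(path):
    # OTB groundtruth_rect.txt lines are usually comma-separated, but a few
    # (e.g. line 496 of BlurCar1) use whitespace instead; split on either.
    boxes = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                boxes.append([float(v) for v in re.split(r'[,\s]+', line)])
    return np.array(boxes)
```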

Although the outputs still do not match for me, I suspect it's because I don't have the patch from opencv/opencv#25264. I'm going to try it with that patch now; fixing the testing dataset to match @lpylpy0514's method may also close the gap.

[screenshots: OTB.json discrepancies for Jogging and BlurCar1]

Side note: I've been finishing up some paper rebuttals, so I couldn't update this earlier. Regarding GSoC, I posted in the GSoC Google Group, but I don't think anyone has received a response yet. Please let me know if mentors generally want a longer, more detailed, and more specific proposal. I'd love to work on similar (or different!) things over the summer as my research wraps up, so please keep me updated 😄 Thanks!

@ryan1288 (Contributor, Author) commented Apr 2, 2024

I rebuilt the most recent version of OpenCV (with the patch) and am getting 0.586. Could you please try it? @fengyuentau Perhaps it's a setup issue in how I rebuilt OpenCV for the patch. Thanks!
Otherwise I'll continue looking into the differences in the post-processing.

@fengyuentau (Member)

> With patch opencv/opencv#25264, I get AUC 0.579 using this evaluation script, which still differs from @lpylpy0514's result of 57.55 (0.5755).
>
> -----------------------------------------------------
> |Tracker name|   AUC   | Precision | Norm Precision |
> -----------------------------------------------------
> |  tracker   |  0.579  |   0.764   |     0.717      |
> -----------------------------------------------------
>
> @ryan1288 Could you compare the difference between your script and the one used by @lpylpy0514?

@ryan1288 I am still getting the quoted results (without OTB100.json) with your new commits and the latest OpenCV, which has the patch merged. The only additional change I made was fixing the comma-separated line.


> Specifically, some numbers in the JSON for Jogging don't match, and BlurCar1 has an unusually small array.

Do we need to do anything regarding this? If not, I will rezip the files.


> Regarding GSoC, I posted in the GSoC Google Group, but I don't think anyone has received a response yet.

We are quite busy with the OpenCV 5 release, so few of us can respond to the GSoC mailing group. You can take a look at the OpenCV 5 items in https://github.com/opencv/opencv/issues?q=is%3Aopen+is%3Aissue+label%3AOpenCV5 and see whether there are suitable items for you (most of them require experienced C++ skills, though).

@fengyuentau (Member) left a comment

Since @lpylpy0514 says a minor AUC difference is acceptable, we can merge this PR after the following comments are resolved.

tools/eval/README.md (two review threads, outdated, resolved)
tools/eval/datasets/otb100.py (two review threads, outdated, resolved)
@ryan1288 (Contributor, Author) commented Apr 8, 2024

> @ryan1288 I am still getting the quoted results (without OTB100.json) with your new commits and the latest OpenCV, which has the patch merged. The only additional change I made was fixing the comma-separated line.
>
> Do we need to do anything regarding this? If not, I will rezip the files.
>
> We are quite busy with the OpenCV 5 release, so few of us can respond to the GSoC mailing group. […]

Other than the one unusual line in BlurCar1, no other modifications are needed.

Regarding GSoC and OpenCV 5, I have more experience in C++ than Python, despite my previous contributions to OpenCV Zoo being primarily in Python. While I acknowledge that my C++ proficiency might not match the standards of a complex open-source project like OpenCV, I am enthusiastic about taking on a well-defined project and am committed to learning whatever is necessary to make meaningful contributions.

For instance, as a focused effort within the imgproc enhancements outlined in issue #25001, I could tackle the implementation of a refined version of HoughLines() utilizing techniques akin to those presented in the paper https://arxiv.org/pdf/2003.04676.pdf, incorporating optimizations for improved performance. That was just an example, as I'd be excited to work on any C++ or Python projects to learn & contribute, given a reasonably-scoped project.

Comment on lines 304 to 306:

    with open(result_path, 'w') as f:
        for bbox in pred_bboxes:
            f.write(','.join(map(str, bbox)) + '\n')
Member:

We can drop result saving for now, as the results are not needed after evaluation is finished and we do not load them for a second evaluation.

Contributor (Author):

Replaced with a global dict. If this isn't preferred, I can find another way tomorrow. 👍

Member:

Seems like it is not set correctly:

python eval.py -m vittrack -d otb100 -dr ../../benchmark/data/
Evaluating: 100%|█████████████████████████████████████████████████| 100/100 [09:16<00:00,  5.57s/it]
No prediction found for video Basketball
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Workspace/miniconda3/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Workspace/fytao/opencv_zoo/tools/eval/datasets/otb100.py", line 64, in evaluate
    evaluation_ret[video.name] = success_overlap(gt_traj, tracker_traj, n_frame)
  File "/Workspace/fytao/opencv_zoo/tools/eval/datasets/otb100.py", line 26, in success_overlap
    iou = overlap_ratio(gt_bb[mask], result_bb[mask])
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Workspace/fytao/opencv_zoo/tools/eval/eval.py", line 156, in <module>
    main(args)
  File "/Workspace/fytao/opencv_zoo/tools/eval/eval.py", line 153, in main
    dataset.print_result()
  File "/Workspace/fytao/opencv_zoo/tools/eval/datasets/otb100.py", line 289, in print_result
    for ret in pool.imap_unordered(benchmark.evaluate, [metric], 1):
  File "/Workspace/miniconda3/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
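
For reference, the failing index operation can be reproduced in isolation, assuming the missing prediction leaves the trajectory as a 0-dimensional array:

```python
import numpy as np

# When no prediction is stored for a video, the trajectory can end up as a
# bare None, and np.array(None) is 0-dimensional, so boolean masking fails:
tracker_traj = np.array(None)
mask = np.array([True])
tracker_traj[mask]  # IndexError: too many indices for array: array is 0-dimensional
```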

Contributor (Author):

Have you pulled recently? On the current branch, I am able to run the full evaluation with python3 eval.py -m vittrack -d otb100 -dr OTB100, with no extra files. Worth noting that I do get different results than you, presumably because I don't have the patch properly applied locally.

Member:

Which OpenCV version are you using? I can confirm that neither my latest built-from-source version nor the latest OpenCV release works. Checking out the previous commit works. Did you make any modifications to the dataset or the OpenCV code?

Contributor (Author):

I'm using the dataset zip that you uploaded, and I was able to run it using version 4.9.0.

I am able to:

  1. Create a new folder
  2. git clone git@github.com:ryan1288/opencv_zoo.git
  3. git checkout ryan-vittracker_eval
  4. Move the newly downloadable dataset into the tools/eval folder and unzip it
  5. cd tools/eval
  6. python3 eval.py -m vittrack -d otb100 -dr OTB100

And it runs without problems.

I'll check my OpenCV installation tomorrow to make sure there aren't discrepancies. You're significantly more likely to have set it up correctly than I am 👍

@fengyuentau (Member)

> That was just an example, as I'd be excited to work on any C++ or Python projects to learn & contribute, given a reasonably-scoped project.

If you are interested in any project for OpenCV 5, please leave a comment in the issue first and have a discussion before contributing, so that your time will not be wasted :)

@fengyuentau (Member)

I re-zipped the dataset with a correction for the missing comma: https://drive.google.com/file/d/1TaGOu7FfTKRM57pgRsdFCa-BYHmaxJI1/view?usp=sharing.

@ryan1288 (Contributor, Author)

I'm finishing up a paper rebuttal, but I'll take a look at this next week. I'm sure we can figure out why there's a discrepancy between our builds!

@ryan1288 (Contributor, Author) commented Apr 29, 2024

Hey @fengyuentau, finally got around to it.

I completely uninstalled all OpenCV libraries from apt and pip, and deleted all OpenCV-related files from /usr/; at that point no OpenCV installation remained. From this clean slate, I reinstalled OpenCV using:

  1. git clone opencv, opencv_contrib, and opencv_zoo
  2. mkdir -p build && cd build
  3. cmake -DOPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules ../opencv
  4. cmake --build .
  5. sudo make install
  6. Then I went to opencv_zoo and unzipped the downloaded zip file from your upload into tools/eval

> I re-zipped the dataset with a correction for the missing comma: https://drive.google.com/file/d/1TaGOu7FfTKRM57pgRsdFCa-BYHmaxJI1/view?usp=sharing.

  7. cd to the tools/eval directory
  8. python3 eval.py -m vittrack -d otb100 -dr OTB100

I have strong reason to believe that I set up my OpenCV correctly, because I was able to replicate your 0.579 AUC (and the other two metrics) as shown, indicating that I have @lpylpy0514's patch fix from the OpenCV master branch.

> With patch opencv/opencv#25264, I get AUC 0.579 using this evaluation script, which still differs from @lpylpy0514's result of 57.55 (0.5755).

Could you please check if this process works correctly? I haven't changed any code since my last commit but it seems to work as expected on my local machine. Thank you @fengyuentau.

@fengyuentau (Member)

Still does not work on my side. Could you make sure that you are on the latest commit?

@fengyuentau (Member)

Looks like it is caused by there being no prediction for the Basketball video.

from tqdm import tqdm
from multiprocessing import Pool, cpu_count

PRED_BBOXES_DICT = {}
Member:

I suspect that the problem is this global variable. Please keep detection results in each corresponding class instead.
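
A minimal illustration of the suspected pitfall: a module-level dict filled in the parent process is not visible to multiprocessing.Pool workers that were started beforehand (or at all, under the spawn start method):

```python
from multiprocessing import Pool

PRED_BBOXES_DICT = {}  # module-level global, as in the current patch

def evaluate(video_name):
    # Each worker process has its own copy of module state, so an entry
    # added in the parent after the pool starts is missing here.
    return video_name, PRED_BBOXES_DICT.get(video_name)

if __name__ == '__main__':
    with Pool(2) as pool:
        PRED_BBOXES_DICT['Basketball'] = [[10, 20, 30, 40]]
        print(pool.map(evaluate, ['Basketball']))  # [('Basketball', None)]
```

Keeping the predictions on the corresponding dataset/video objects that are passed to the workers, as suggested above, avoids this.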
