VideoClips.subset() does not reflect output_format of the original VideoClips #6696

Simcs · 2022-10-04T13:04:56Z

🐛 Describe the bug

When a new VideoClips is created with the VideoClips.subset() function, the output_format of the original VideoClips is not taken into account.

Because of this, passing the output_format="TCHW" argument to torchvision.datasets.hmdb51.HMDB51 has no effect.
It seems that HMDB51 builds full_video_clips and then derives the video_clips for each fold of the dataset from it using the subset() function.

Code snippet

Below is the code snippet for reproducing the bug:

import os
from torchvision.datasets import HMDB51

dataset_root = '~/workspace/dataset/hmdb51'
dataset_root = os.path.expanduser(dataset_root)
annotation_path = '~/workspace/dataset/testTrainMulti_7030_splits'
annotation_path = os.path.expanduser(annotation_path)

hmdb51_thwc = HMDB51(
    root=dataset_root,
    annotation_path=annotation_path,
    frames_per_clip=16,
    step_between_clips=1,
    frame_rate=2,
    fold=1,
    train=False,
    num_workers=8,
    output_format="THWC",
)
video, audio, class_index = hmdb51_thwc[0]
print(video.shape)

hmdb51_tchw = HMDB51(
    root=dataset_root,
    annotation_path=annotation_path,
    frames_per_clip=16,
    step_between_clips=1,
    frame_rate=2,
    fold=1,
    train=False,
    num_workers=8,
    output_format="TCHW",
)
video, audio, class_index = hmdb51_tchw[0]
print(video.shape)

Output (both datasets return THWC, even though the second one requested TCHW):

torch.Size([16, 240, 320, 3])
torch.Size([16, 240, 320, 3])

Possible solution:

Pass the output_format of the original VideoClips when instantiating the new VideoClips in the VideoClips.subset() function.
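To illustrate the idea, here is a minimal sketch of the pattern. SimpleClips, subset_buggy and subset_fixed are hypothetical names invented for this example; this is a simplified stand-in and not torchvision's actual VideoClips code. It only shows how a subset method that builds a new object can silently fall back to the default format unless the setting is forwarded.

```python
class SimpleClips:
    """Hypothetical, simplified stand-in for a clip index (NOT torchvision's VideoClips)."""

    def __init__(self, video_paths, output_format="THWC"):
        if output_format not in ("THWC", "TCHW"):
            raise ValueError(f"unsupported output_format: {output_format}")
        self.video_paths = list(video_paths)
        self.output_format = output_format

    def subset_buggy(self, indices):
        # Mirrors the reported behaviour: output_format is not forwarded,
        # so the new object falls back to the default "THWC".
        return SimpleClips([self.video_paths[i] for i in indices])

    def subset_fixed(self, indices):
        # Proposed behaviour: forward the parent's output_format.
        return SimpleClips(
            [self.video_paths[i] for i in indices],
            output_format=self.output_format,
        )


clips = SimpleClips(["a.avi", "b.avi", "c.avi"], output_format="TCHW")
print(clips.subset_buggy([0, 2]).output_format)  # THWC
print(clips.subset_fixed([0, 2]).output_format)  # TCHW
```

The same reasoning would apply to any other constructor argument that subset() needs to preserve.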

Versions

Collecting environment information...
PyTorch version: 1.12.1+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.10.6 (main, Aug 30 2022, 03:24:51) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-126-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: NVIDIA TITAN RTX
GPU 1: NVIDIA TITAN RTX

Nvidia driver version: 515.65.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.2
[pip3] pytorch-lightning==1.3.8
[pip3] pytorchvideo==0.1.5
[pip3] torch==1.12.1+cu113
[pip3] torchaudio==0.12.1+cu113
[pip3] torchmetrics==0.5.1
[pip3] torchvision==0.13.1+cu113
[conda] Could not collect


Simcs · 2022-10-04T13:11:44Z

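Currently, I am working around this issue by setting video_clips.output_format = "TCHW" manually after constructing the dataset.

For example (hmdb51_test_metadata is defined elsewhere and not shown here):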

hmdb51_tchw = HMDB51(
    root=dataset_root,
    annotation_path=annotation_path,
    frames_per_clip=16,
    step_between_clips=1,
    frame_rate=2,
    fold=1,
    train=False,
    num_workers=8,
    output_format="TCHW",
    _precomputed_metadata=hmdb51_test_metadata,
)
hmdb51_tchw.video_clips.output_format = "TCHW"
video, audio, class_index = hmdb51_tchw[0]
print(video.shape)

Output:

torch.Size([16, 3, 240, 320])

YosuaMichael · 2022-10-04T16:07:08Z

Thanks for the report, @Simcs. I confirm that this is indeed a bug. I will create a PR to fix this.

YosuaMichael · 2022-10-06T08:14:38Z

I believe I have fixed this issue with #6700 (the fix will be in torchvision v0.14), so I will close the issue for now.

YosuaMichael self-assigned this Oct 4, 2022

YosuaMichael added the bug label Oct 4, 2022

YosuaMichael mentioned this issue Oct 4, 2022

[bugfix] Fix the output format for VideoClips.subset #6700

Merged

YosuaMichael closed this as completed Oct 6, 2022
