Facing an issue to re-train the model in Jetson-inference #1846

gee1902 · 2024-05-16T10:14:16Z

wget https://nvidia.box.com/shared/static/djf5w54rjvpqocsiztzaandq1m3avr7c.pth -O models/mobilenet-v1-ssd-mp-0_675.pth

Hello @dusty-nv
I tried this but it's not downloading properly, and even if it is downloaded an empty file is being created ...further upon running the training of the model I am getting the following error : /home/atmecs2/.local/lib/python3.6/site-packages/torchvision/io/image.py:11: UserWarning: Failed to load image Python extension:
warn(f"Failed to load image Python extension: {e}")
2024-05-16 12:21:14.203825: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.2
2024-05-16 12:21:23 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=2, checkpoint_folder='models/fruit', dataset_type='open_images', datasets=['data/fruit'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, log_level='info', lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=1, num_workers=1, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resolution=300, resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, validation_mean_ap=False, weight_decay=0.0005)
2024-05-16 12:21:24 - model resolution 300x300
2024-05-16 12:21:24 - SSDSpec(feature_map_size=19, shrinkage=16, box_sizes=SSDBoxSizes(min=60, max=105), aspect_ratios=[2, 3])
2024-05-16 12:21:24 - SSDSpec(feature_map_size=10, shrinkage=32, box_sizes=SSDBoxSizes(min=105, max=150), aspect_ratios=[2, 3])
2024-05-16 12:21:24 - SSDSpec(feature_map_size=5, shrinkage=64, box_sizes=SSDBoxSizes(min=150, max=195), aspect_ratios=[2, 3])
2024-05-16 12:21:24 - SSDSpec(feature_map_size=3, shrinkage=100, box_sizes=SSDBoxSizes(min=195, max=240), aspect_ratios=[2, 3])
2024-05-16 12:21:24 - SSDSpec(feature_map_size=2, shrinkage=150, box_sizes=SSDBoxSizes(min=240, max=285), aspect_ratios=[2, 3])
2024-05-16 12:21:24 - SSDSpec(feature_map_size=1, shrinkage=300, box_sizes=SSDBoxSizes(min=285, max=330), aspect_ratios=[2, 3])
2024-05-16 12:21:24 - Prepare training datasets.
2024-05-16 12:21:24 - loading annotations from: data/fruit/sub-train-annotations-bbox.csv
2024-05-16 12:21:24 - annotations loaded from: data/fruit/sub-train-annotations-bbox.csv
num images: 5145
2024-05-16 12:21:45 - Dataset Summary:Number of Images: 5145
Minimum Number of Images for a Class: -1
Label Distribution:
Apple: 3622
Banana: 1574
Grape: 2560
Orange: 6186
Pear: 757
Pineapple: 534
Strawberry: 7553
Watermelon: 753
2024-05-16 12:21:45 - Stored labels into file models/fruit/labels.txt.
2024-05-16 12:21:45 - Train dataset size: 5145
2024-05-16 12:21:45 - Prepare Validation datasets.
2024-05-16 12:21:45 - loading annotations from: data/fruit/sub-test-annotations-bbox.csv
2024-05-16 12:21:45 - annotations loaded from: data/fruit/sub-test-annotations-bbox.csv
num images: 930
2024-05-16 12:21:49 - Dataset Summary:Number of Images: 930
Minimum Number of Images for a Class: -1
Label Distribution:
Apple: 329
Banana: 132
Grape: 446
Orange: 826
Pear: 107
Pineapple: 105
Strawberry: 754
Watermelon: 125
2024-05-16 12:21:49 - Validation dataset size: 930
2024-05-16 12:21:49 - Build network.
2024-05-16 12:21:49 - Init from pretrained SSD models/mobilenet-v1-ssd-mp-0_675.pth
Traceback (most recent call last):
File "train_ssd.py", line 371, in
net.init_from_pretrained_ssd(args.pretrained_ssd)
File "/home/atmecs2/jetson-inference/python/training/detection/ssd/vision/ssd/ssd.py", line 133, in init_from_pretrained_ssd
state_dict = torch.load(model, map_location=lambda storage, loc: storage)
File "/home/atmecs2/.local/lib/python3.6/site-packages/torch/serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/atmecs2/.local/lib/python3.6/site-packages/torch/serialization.py", line 777, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

Kindly help me out

Originally posted by @dusty-nv in #831 (comment)

The text was updated successfully, but these errors were encountered:

dusty-nv · 2024-05-16T18:23:05Z

Hi @gee1902, are you able to access/download the base model from the original upstream pytorch-ssd repo here?

gee1902 · 2024-05-17T17:48:29Z

Hello dusty, Thanks for responding fast and forwarding pytorch-ssd-repo links, now I am able to train the model. But it's taking a lot of time, even for training using one epoch it's taking about 10 hours, I terminated it for time being. I already created the swap file in the starting though , performed some experiments, is there any further requirement to create another swap file? With regards Geethika

…

On Thu, 16 May 2024, 11:53 pm Dustin Franklin, ***@***.***> wrote: Hi @gee1902 <https://github.com/gee1902>, are you able to access/download the base model from the original upstream pytorch-ssd repo here? - https://github.com/qfgaohao/pytorch-ssd?tab=readme-ov-file#download-models - https://drive.google.com/drive/folders/1pKn-RifvJGWiOx0ZCRLtCXM5GT5lAluu — Reply to this email directly, view it on GitHub <#1846 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/A4GFSOCES5QFVQP5TNPSLB3ZCT2R7AVCNFSM6AAAAABHZ4VRYGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJVHEZDKNRZGA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

dusty-nv · 2024-05-17T19:38:47Z

Hi @gee1902, which Jetson are you on? Unfortunately yes on Nano it can take a while. You can scale back the size of the dataset so it trains on less images. However what I would really recommend is to just proceed to the next step of collecting or providing your own dataset: https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-collect-detection.md

Those you can train on just a few hundred images, and it takes a lot shorter. You can either collect your dataset through the tool, or use an online tool like CVAT to annotate existing images with bounding boxes and export them is Pascal VOC format.

gee1902 · 2024-05-18T00:37:19Z

Hi dusty,
The jetson I am using is jetson nano developer kit 4GB,
Yes, I am planning for training on custom dataset but I just wanted to experiment on this....is there anyway way I can train in lesser time efficiently.
I will try training using custom dataset, thanks for the response .

With regards
Geethika

dusty-nv · 2024-05-18T01:56:04Z

If you wanted to complete the openImages example to confirm the training to be working, when you download it with the downloader script you can specify less images.

gee1902 · 2024-05-20T17:22:50Z

Thanks, dusty will surely try and get back. Once again thank you for helping out. With regards

…

On Sat, 18 May 2024, 7:26 am Dustin Franklin, ***@***.***> wrote: If you wanted to complete the openImages example to confirm the training to be working, when you download it with the downloader script you can specify less images. — Reply to this email directly, view it on GitHub <#1846 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/A4GFSOFW7TZKF6AMQNVJ3LTZC2YMVAVCNFSM6AAAAABHZ4VRYGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJYGU3TONZUGM> . You are receiving this because you were mentioned.Message ID: ***@***.***>

gee1902 changed the title ~~Facing an issue to train the model~~ Facing an issue to train the model in Jetson-inference May 16, 2024

gee1902 changed the title ~~Facing an issue to train the model in Jetson-inference~~ Facing an issue to re-train the model in Jetson-inference May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Facing an issue to re-train the model in Jetson-inference #1846

Facing an issue to re-train the model in Jetson-inference #1846

gee1902 commented May 16, 2024 •

edited

dusty-nv commented May 16, 2024

gee1902 commented May 17, 2024 via email

dusty-nv commented May 17, 2024

gee1902 commented May 18, 2024

dusty-nv commented May 18, 2024

gee1902 commented May 20, 2024 via email

Facing an issue to re-train the model in Jetson-inference #1846

Facing an issue to re-train the model in Jetson-inference #1846

Comments

gee1902 commented May 16, 2024 • edited

dusty-nv commented May 16, 2024

gee1902 commented May 17, 2024 via email

dusty-nv commented May 17, 2024

gee1902 commented May 18, 2024

dusty-nv commented May 18, 2024

gee1902 commented May 20, 2024 via email

gee1902 commented May 16, 2024 •

edited