
custom data pretrain #33

Open
LiXinghui-666 opened this issue Jun 7, 2023 · 4 comments
@LiXinghui-666

Hi, I would like to train this model on some of my own 3D models for the multi-modal 3D shape retrieval task. What do I need to do with the original 3D data to produce the data types required for training the model, such as images and text? Could you give me some suggestions, or would it be convenient to provide some data-preprocessing scripts?

@Tycho-Xue
Collaborator

Hi @LiXinghui-666, if your own 3D models don't come with well-paired 3D–text data, you might want to check out "ULIP-2", which we released recently (the arXiv link is in this repo). In ULIP-2, only the 3D models are required.
You basically need to:

  1. extract a point cloud (if you are retrieving by point cloud)
  2. render the 3D models (.obj files or whatever format you have) into a holistic set of view images
  3. use an image captioning model (we used BLIP-2 in ULIP-2) to caption each rendered image from step 2
  4. at this point you have a tri-modal dataset; use ULIP's framework to train the 3D encoder
  5. do the retrieval
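Step 1 (point-cloud extraction) can be sketched in NumPy as area-weighted sampling on a triangle mesh. This is just an illustrative helper, not code from the ULIP repo; it assumes the mesh is already loaded as vertex/face arrays (e.g. via a library like trimesh):

```python
import numpy as np

def sample_point_cloud(vertices, faces, n_points=1024, seed=0):
    """Uniformly sample points on a triangle mesh surface.

    vertices: (V, 3) float array of vertex positions.
    faces:    (F, 3) int array of vertex indices per triangle.
    Returns an (n_points, 3) point cloud.
    """
    rng = np.random.default_rng(seed)
    tri = vertices[faces]  # (F, 3, 3) triangle corner coordinates
    # Triangle areas (half the cross-product norm) weight the sampling
    # so the cloud is uniform over the surface, not per-face.
    areas = 0.5 * np.linalg.norm(
        np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)
    idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Uniform barycentric coordinates inside each chosen triangle.
    u, v = rng.random(n_points), rng.random(n_points)
    flip = u + v > 1.0
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    t = tri[idx]
    return (t[:, 0]
            + u[:, None] * (t[:, 1] - t[:, 0])
            + v[:, None] * (t[:, 2] - t[:, 0]))
```

In practice you would normalize the cloud (center and scale to a unit sphere) before feeding it to the 3D encoder.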

Let me know if this makes sense or if you need more help.
ULIP-2 is the framework with which we enable multimodal pre-training where only the 3D models themselves are needed.
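Once the tri-modal encoders are trained, step 5 (retrieval) reduces to a nearest-neighbor lookup in the shared embedding space. A minimal sketch, assuming pre-computed embeddings; the helper name is hypothetical, not from the ULIP codebase:

```python
import numpy as np

def retrieve(query_emb, shape_embs, top_k=5):
    """Rank 3D shapes by cosine similarity to a query embedding.

    query_emb:  (D,) embedding of the text or image query.
    shape_embs: (N, D) pre-computed embeddings from the 3D encoder.
    Returns indices of the top_k most similar shapes, best first.
    """
    q = query_emb / np.linalg.norm(query_emb)
    s = shape_embs / np.linalg.norm(shape_embs, axis=1, keepdims=True)
    sims = s @ q                      # cosine similarity per shape
    return np.argsort(-sims)[:top_k]  # descending similarity
```

Because all three modalities are aligned into one space during training, the same function serves text-to-3D and image-to-3D retrieval.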

@LiXinghui-666
Author

Thank you for your patient answer! I would like to know whether the ULIP-2 model will be open-sourced?

@Tycho-Xue
Collaborator

Tycho-Xue commented Jun 7, 2023 via email

@LiXinghui-666
Author

Oh! Looking forward to your released code; I hope it can also include the code for the multimodal retrieval task, and that the ULIP-2 pre-trained models will be available for download as well. I am amazed by the model's results in the paper and can't wait to try it!
