
Script file request consistent with the paper #81

Closed
zhangtianhong-1998 opened this issue May 15, 2024 · 1 comment

Comments

@zhangtianhong-1998

I would very much like the author to provide a script file consistent with the paper. I tried setting the parameters myself as described in the paper, but I could not replicate the test results.

@kwuking
Collaborator

kwuking commented May 17, 2024


Thank you very much for your interest in our work and for the effort you have put into replicating our experimental results. We understand the challenges that can arise when attempting to replicate the results of deep learning models, especially when training involves multiple GPUs and advanced optimization techniques such as DeepSpeed's ZeRO.

Our training process uses the ZeRO-2 optimizer under the DeepSpeed framework, a technique that optimizes and accelerates large-scale training. During training we use gradient accumulation to simulate larger batch sizes; this helps us manage limited hardware resources, but it also changes how weight updates are applied, which can introduce variability in the results.

Second, we adopt mixed-precision training, specifically bfloat16 (bf16), which significantly reduces memory usage and speeds up training. However, the lower-precision floating-point representation introduces additional numerical error, another source of run-to-run variation.

These techniques are common and, in practice, necessary for training large models in modern deep learning frameworks, even though they may cause slight fluctuations in results. In addition, the runtime environment, framework versions, CUDA versions, and similar factors can introduce a further degree of randomness.
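For reference, below is a minimal sketch of a DeepSpeed setup combining the three techniques mentioned above (ZeRO-2, gradient accumulation, bf16). The batch sizes, learning rate, and model are placeholder assumptions for illustration, not the paper's actual settings:

```python
# Minimal sketch of a DeepSpeed config with ZeRO-2, gradient accumulation,
# and bf16 mixed precision. All numeric values are illustrative assumptions.
# Run under the `deepspeed` launcher, which sets up the distributed backend.
import deepspeed
import torch.nn as nn

ds_config = {
    # Effective batch = micro_batch * grad_accum_steps * num_gpus.
    "train_micro_batch_size_per_gpu": 4,   # assumed value
    "gradient_accumulation_steps": 8,      # assumed value
    "bf16": {"enabled": True},             # mixed precision in bfloat16
    "zero_optimization": {"stage": 2},     # ZeRO-2: shard optimizer states
                                           # and gradients across GPUs
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 1e-4},            # assumed learning rate
    },
}

model = nn.Linear(16, 1)  # stand-in for the actual model

# deepspeed.initialize wraps the model in an engine; the engine applies
# gradient accumulation and bf16 casting automatically during
# engine.backward() / engine.step().
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Note that the same effective batch size can correspond to different micro-batch / accumulation-step combinations, which is one reason two runs with nominally identical hyperparameters may not match exactly.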

If you have more specific questions or need assistance, please feel free to contact us by email for further discussion.

kwuking closed this as completed May 22, 2024