
The problem of loss reduction in training process #84

Open
JackKoLing opened this issue May 15, 2024 · 8 comments
Comments

@JackKoLing

Very good work. Due to limited hardware, I ran LLaMA on a single NVIDIA 3090 with the batch size set to 4. I only used 4000 rows of data from ETTh1, with the input sequence length set to 256 and the output length to 48. The learning_rate is 0.001, and the other parameters follow the script.

During training, I found that the first epoch gave the best validation loss; after that the loss kept going up until training stopped at the early-stopping patience of 10. I ran BERT and GPT-2 in a similar situation (set to 100 epochs, but the best checkpoint is usually reached within about 10 epochs). The final MAE is 0.6039629. After visualization, I found the predictions are good for fairly regular sequences but poor for less regular ones.

I'm not sure whether the training process is correct, or whether it simply converges in very few epochs because of the power of the LLM in Time-LLM. Do you have any suggestions? Thanks.
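
For anyone checking the same behaviour, here is a minimal sketch (not from the Time-LLM codebase; the loss values below are placeholders to be replaced with the numbers printed in the training log) of plotting the per-epoch train/validation losses to confirm that the best checkpoint really is the first epoch:

```python
# Hypothetical sketch: fill these lists with the train/vali losses printed per epoch.
import matplotlib.pyplot as plt

train_loss = [0.52, 0.41, 0.39, 0.38, 0.37]  # placeholder values
vali_loss = [0.60, 0.63, 0.66, 0.68, 0.70]   # placeholder values

epochs = list(range(1, len(train_loss) + 1))
best_epoch = vali_loss.index(min(vali_loss)) + 1

plt.plot(epochs, train_loss, label="train loss")
plt.plot(epochs, vali_loss, label="vali loss")
plt.axvline(best_epoch, linestyle="--", color="gray", label=f"best epoch ({best_epoch})")
plt.xlabel("epoch")
plt.ylabel("MSE loss")
plt.legend()
plt.show()
```

If the validation curve rises from epoch 2 onward while the training curve keeps falling, the model is overfitting the small 4000-row subset rather than failing to converge.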

@well0203

Hi, I am also interested in this question and would like to read the answers (the authors have probably experienced this as well).

@dreamerforwuyang

Hi, could I take a look at your code? Mine keeps failing with errors, and I haven't found a solution online.

@JackKoLing
Author

> Hi, could I take a look at your code? Mine keeps failing with errors, and I haven't found a solution online.

I just used the scripts provided by the authors. My GPU is quite old, so my environment is the older Python 3.8.5 rather than 3.11. For data I used the first 2000 rows of ETTh1; since the data changed, the dataset split ranges in data_loader need to be adjusted accordingly.
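
For illustration, a small sketch of what "adjusting the split ranges in data_loader" could look like, assuming the ETT hourly loader hard-codes train/val/test border indices the way the Time-Series-Library loaders do; the helper name and the 70/15/15 split below are made up for this example:

```python
def ett_borders(seq_len: int, n_rows: int = 2000,
                train_frac: float = 0.7, val_frac: float = 0.15):
    """Compute train/val/test border indices for a truncated ETTh1 file.

    The stock loader hard-codes 12/4/4 months of hourly data
    (border2s = [12*30*24, 16*30*24, 20*30*24]); with only the first
    n_rows rows, those constants have to be shrunk, e.g. like this.
    """
    train_end = int(n_rows * train_frac)
    val_end = int(n_rows * (train_frac + val_frac))
    border1s = [0, train_end - seq_len, val_end - seq_len]
    border2s = [train_end, val_end, n_rows]
    return border1s, border2s

print(ett_borders(seq_len=256))  # ([0, 1144, 1444], [1400, 1700, 2000])
```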

@dreamerforwuyang

dreamerforwuyang commented May 22, 2024 via email

@JackKoLing
Author

> Thanks for the reply. I'm a beginner, using PyCharm on Windows with Python 3.11. I've been struggling for a week and still can't get it to run. If possible, could I learn from the code you ran? My graduation is at risk of being delayed, and I would be very grateful.

You should post the error messages; if someone has run into the same problem, we can discuss it together. I ran the authors' code as-is: once the pretrained weights are downloaded, it runs. I didn't change anything else; the authors' README and scripts are already very clear.
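
As a quick sanity check before launching training, here is one possible sketch (assuming the backbone is loaded through Hugging Face transformers, as Time-LLM does for its LLaMA/GPT-2/BERT backbones; "gpt2" is only a stand-in checkpoint name) to confirm the pretrained weights are downloaded and loadable:

```python
# Hypothetical pre-flight check: replace "gpt2" with the checkpoint your config actually uses.
from transformers import AutoConfig, AutoModel, AutoTokenizer

model_name = "gpt2"
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, config=config)

num_params = sum(p.numel() for p in model.parameters())
print(f"{type(model).__name__} loaded with {num_params:,} parameters")
```

If this fails, the training script will fail for the same reason, so the error message here is usually easier to debug than the full training stack trace.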

@dreamerforwuyang

dreamerforwuyang commented May 22, 2024 via email

@dreamerforwuyang

dreamerforwuyang commented May 22, 2024 via email

@kwuking
Collaborator

kwuking commented May 27, 2024

> Very good work. Due to limited hardware, I ran LLaMA on a single NVIDIA 3090 with the batch size set to 4. I only used 4000 rows of data from ETTh1, with the input sequence length set to 256 and the output length to 48. The learning_rate is 0.001, and the other parameters follow the script.
>
> During training, I found that the first epoch gave the best validation loss; after that the loss kept going up until training stopped at the early-stopping patience of 10. I ran BERT and GPT-2 in a similar situation (set to 100 epochs, but the best checkpoint is usually reached within about 10 epochs). The final MAE is 0.6039629. After visualization, I found the predictions are good for fairly regular sequences but poor for less regular ones.
>
> I'm not sure whether the training process is correct, or whether it simply converges in very few epochs because of the power of the LLM in Time-LLM. Do you have any suggestions? Thanks.

Your observation is correct. Due to the large learning rate set in our demo script, and considering the type of devices and the running environment, it is indeed possible to achieve good convergence with fewer epochs. Given the large parameter space and generalization capabilities of LLMs, this phenomenon is to be expected. We are currently conducting more explorations to better leverage the potential of LLMs for time series data. We greatly appreciate your attention to our work.

I believe that the exploration of LLMs is still in its early stages, particularly due to the lack of theoretical foundations. Many of our findings are often based on empirical results. We are very keen to collaborate with you in exploring LLMs for time series data. We believe that this exploration can effectively advance the time series community.
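
To isolate the learning-rate effect mentioned above, one possible experiment sketch (assuming the training entry point is run_main.py and that it accepts a --learning_rate argument matching the learning_rate setting quoted in the issue; adjust the command to your actual script) is to rerun the same setting with smaller learning rates and compare the best validation loss:

```python
# Hypothetical learning-rate sweep: keep every other argument identical to the
# provided ETTh1 script and only vary the learning rate, then compare the epoch
# at which early stopping triggers and the best validation loss for each run.
import subprocess

for lr in [1e-3, 1e-4, 1e-5]:
    cmd = [
        "python", "run_main.py",
        "--learning_rate", str(lr),
        # ...remaining arguments copied unchanged from the ETTh1 script
    ]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```

If the smaller learning rates still give their best checkpoint within the first few epochs, the fast convergence is more likely due to the pretrained backbone than to the step size.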
