[Roadmap] 1.4.0 Roadmap #6500

Closed
19 of 23 tasks
trivialfis opened this issue Dec 14, 2020 · 19 comments

@trivialfis
Member

trivialfis commented Dec 14, 2020

@dmlc/xgboost-committer Please add your items here by editing this post. Let's ensure that

  • Each item has to be associated with a ticket
  • Major design changes and refactoring are associated with an RFC before the code is committed
  • Blocking issues must be marked as blocking
  • Breaking changes must be marked as breaking

For other contributors who have no permission to edit the post, please comment here about what you think should be in 1.4.0.

Main

Dask

In brief: by 1.4, the dask interface should be feature complete, categorical data support for GPU should be ready for public testing, and inplace prediction should be more mature.

@Roffild
Contributor

Roffild commented Dec 29, 2020

#6507

@SmirnovEgorRu
Contributor

I want to propose replacing 'approx' with 'hist' when tree_method is set to 'auto' on CPU. I saw that #5178 was about this, but it's closed.

hist is faster than approx, and in my observations its accuracy is on par or even better. GPUs also have only the 'hist' method, not 'approx'.

Are there any concerns about doing this?

CC: @trivialfis, @hcho3, @ShvetsKS

@trivialfis
Member Author

I will put up some documents on the theoretical aspects of the various tree methods, then we can decide together.

@trivialfis
Member Author

in #6564

@SmirnovEgorRu
Contributor

@trivialfis, should we perhaps run experiments to decide?
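
A minimal sketch of what such an experiment could look like, assuming the comparison is wall-clock training time for 'approx' versus 'hist' on the same data. The helper `time_tree_methods` and the callable `train_fn` are hypothetical names introduced here for illustration; only the `tree_method` parameter and its two values come from the discussion above.

```python
import time


def time_tree_methods(train_fn, base_params, methods=("approx", "hist"), repeats=3):
    """Return the best wall-clock time (seconds) per tree method.

    `train_fn` is a hypothetical callable supplied by the caller. It receives
    a parameter dict and runs one full training session.
    """
    results = {}
    for method in methods:
        # Copy the shared parameters and override only the tree method.
        params = dict(base_params, tree_method=method)
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            train_fn(params)
            timings.append(time.perf_counter() - start)
        # Keep the minimum to reduce noise from other processes.
        results[method] = min(timings)
    return results


# Possible wiring with XGBoost (assumes `xgboost` is installed and `dtrain`
# is an existing xgboost.DMatrix; not executed here):
#   import xgboost as xgb
#   times = time_tree_methods(
#       lambda p: xgb.train(p, dtrain, num_boost_round=100),
#       {"objective": "binary:logistic"},
#   )
```

Timing alone would not settle the question; the same loop would also need to record an evaluation metric per method to compare accuracy.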

@trivialfis
Member Author

@SmirnovEgorRu I have no objection to changing the default in this or the next release. I mentioned to @ShvetsKS that there's a huge refactor of the CPU implementations. I would like to see some parts of it merged before making the change so we can make a fair comparison. I will come back to it after sorting out issues in the dask interface (which should be quite fast, as most of the features are now supported).

@trivialfis
Member Author

I closed the PR you referenced because I couldn't get all tests passing. I think even if we decided to change the default now, we would still have some blockers to track down. So refactoring first might help make the change clearer and easier.

@trivialfis
Member Author

trivialfis commented Jan 14, 2021

Hi @ShvetsKS @SmirnovEgorRu I have been trying to refactor the CPU code for categorical data support based on the efficient CPU hist code. I found that on the URL dataset, CPU hist is slower than approx. It's not a conventional dataset, as it's unusually wide and sparse. Just curious whether you have plans to optimize it.

@trivialfis
Member Author

The approx implementation parallelizes over features with dynamic scheduling, so it has an advantage on this kind of dataset.
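
To illustrate the idea only: below is a rough Python sketch of feature-parallel histogram building with dynamic scheduling. It is not XGBoost's code; the function names and the data layout are assumptions made for this example. Each feature is one task, and a worker picks up the next pending feature as soon as it finishes its current one, so a few dense columns do not leave other workers idle on wide, sparse data. Python threads will not give a real speedup here; the sketch only shows the scheduling pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def feature_histogram(column, gradients, n_bins):
    """Sum gradients per bin for one sparse column.

    `column` holds (row_index, bin_index) pairs for non-missing entries only,
    so the cost of a task is proportional to the density of the column.
    """
    hist = [0.0] * n_bins
    for row, bin_idx in column:
        hist[bin_idx] += gradients[row]
    return hist


def build_histograms(columns, gradients, n_bins, n_workers=4):
    # One task per feature. Idle workers pull the next pending feature from
    # the pool's queue, which approximates dynamic scheduling.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(
            pool.map(lambda col: feature_histogram(col, gradients, n_bins), columns)
        )


gradients = [0.5, -1.0, 0.25, 2.0]
columns = [
    [(0, 0), (1, 1), (2, 1), (3, 0)],  # dense feature
    [(2, 1)],                          # very sparse feature
    [],                                # feature with no entries
]
histograms = build_histograms(columns, gradients, n_bins=2)
```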

@SmirnovEgorRu
Contributor

@trivialfis, yep, we are thinking about how to tune for wide data sets as well. I suppose we can outperform approx with hist on URL.

@Denisevi4

"Support training multiple models in parallel using dask". Does this include cross-validation with early stopping?

@trivialfis
Member Author

@Denisevi4 No, it's for running multiple training sessions on a single cluster simultaneously. But it's a basic requirement for cv.

@Roffild
Contributor

Roffild commented Feb 24, 2021

#6731

@trivialfis
Member Author

@hcho3 I would like to get 1.4 out once we get AUC re-implemented. I can try fixing the gamma metric if the AUC re-implementation goes well.

@trivialfis
Member Author

I will branch out next week.

@Roffild
Contributor

Roffild commented Mar 20, 2021

Will you fix other metrics (gamma-nloglik, logloss)?

@trivialfis
Member Author

Yeah, I will take a deeper look into them this weekend.

@trivialfis
Member Author

@Roffild Will reply on the original thread: #6731

hcho3 mentioned this issue Mar 29, 2021
8 tasks
@trivialfis
Member Author

1.4 is out; submission status will be tracked in #6793.

trivialfis unpinned this issue Apr 12, 2021