Separate the 4 plot phases into phase1 and phase234 #266

EchoAGI · 2021-05-26T23:23:58Z

Create plot_disk_pipeline.hpp from plot_disk.hpp.
Separate the phases into phase1 and phase234 in order to make full use of resources in Kubernetes,
because the resource limits for phase1 and phase234 are not the same.

Add the -h and phase flags to cli.hpp.

Add the corresponding Python bindings for plot_disk_pipeline.hpp, which has two functions: "create_plot_disk_phase1" and "create_plot_disk_phase234".
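A minimal sketch of how a phase flag could select which half of the pipeline a process runs, so that phase1 and phase234 can be launched as separate Kubernetes jobs, each with its own resource limits. The flag values ("1", "234", "all") and the names `Phase`, `ParsePhase` and `RunPlot` are hypothetical; the two callbacks stand in for create_plot_disk_phase1 and create_plot_disk_phase234, whose real signatures are not shown in this thread.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical values for the phase flag; the exact syntax added to
// cli.hpp is not shown in this thread.
enum class Phase { kPhase1, kPhase234, kAll };

Phase ParsePhase(const std::string& arg) {
    if (arg == "1") return Phase::kPhase1;
    if (arg == "234") return Phase::kPhase234;
    if (arg == "all") return Phase::kAll;
    throw std::invalid_argument("unknown phase: " + arg);
}

// Runs only the requested part of the pipeline. The callbacks are
// placeholders for create_plot_disk_phase1 / create_plot_disk_phase234.
void RunPlot(Phase phase,
             const std::function<void()>& phase1,
             const std::function<void()>& phase234) {
    assert(phase1 && phase234);  // both entry points must be provided
    if (phase == Phase::kPhase1 || phase == Phase::kAll) phase1();
    if (phase == Phase::kPhase234 || phase == Phase::kAll) phase234();
}
```

With this split, a memory-heavy phase1 job and a phase234 job can request different CPU and memory limits, as long as both can reach the temporary files phase1 produces.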

arvidn · 2021-05-27T07:20:59Z

a few high-level comments:

It would be much easier to review if you would not make formatting changes in this PR, but propose those separately.
I think we would prefer comments in English.
I think the justification for this change is a bit light. Could you elaborate on "in order to fully resource usage in kubernetes.
because resource limit to phase1 and phase234 are not the same."?

EchoAGI · 2021-05-27T09:56:19Z

> a few high-level comments:
>
> It would be much easier to review if you would not make formatting changes in this PR, but propose those separately
>
> I think we would prefer comments in English
>
> I think the justification for this change is a bit light. Could you elaborate on "in order to fully resource usage in kubernetes.
> because resource limit to phase1 and phase234 are not the same."?

Thanks! We'll improve the changes later. BTW, why use bucket sort on disk instead of a multi-way merge sort?

mgraczyk · 2021-06-11T21:37:39Z

@newtalentxp The data being sorted is usually uniformly distributed, so bucket sort performs better, at the cost of higher memory use: it is O(n) instead of O(n log n).
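A generic sketch of the bucket sort idea described above, assuming 64-bit keys that are roughly uniformly distributed. It is not chiapos's actual sort code; `BucketSort` and `bucket_bits` are hypothetical names. Keys are routed to buckets by their top bits; with a bucket count on the order of n, each bucket holds O(1) keys on average, so the expected cost is O(n + number of buckets). The bucket storage is the extra memory mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bucket sort for roughly uniform 64-bit keys. Expected cost is
// O(n + num_buckets) when num_buckets is on the order of n, because each
// bucket then holds O(1) keys on average. Memory cost: one extra copy of
// the data spread across the buckets.
std::vector<uint64_t> BucketSort(const std::vector<uint64_t>& keys,
                                 unsigned bucket_bits) {
    assert(bucket_bits >= 1 && bucket_bits <= 20);
    const std::size_t num_buckets = std::size_t{1} << bucket_bits;
    std::vector<std::vector<uint64_t>> buckets(num_buckets);
    for (uint64_t k : keys) {
        // The top bucket_bits bits pick the bucket, so buckets are already
        // in key order relative to each other.
        buckets[k >> (64 - bucket_bits)].push_back(k);
    }
    std::vector<uint64_t> out;
    out.reserve(keys.size());
    for (auto& b : buckets) {
        // Small when the input is uniform, so sorting each one is cheap.
        std::sort(b.begin(), b.end());
        out.insert(out.end(), b.begin(), b.end());
    }
    return out;
}
```

If the keys are not uniform, a few buckets grow large and the per-bucket sort dominates, which is where the choice of fallback sort matters.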

The quicksort_last sort strategy is used to sort the buckets that are not uniformly distributed. A merge sort would probably perform better there. I use std::sort for my own plotting, which in my libstdc++ does an introsort.
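If quicksort_last always picks the last element as its pivot, as the name suggests (an assumption; its code is not shown in this thread), it degrades to O(n²) on already-sorted or heavily skewed buckets, whereas a merge sort stays O(n log n) in the worst case. Below is a generic bottom-up merge sort for a single bucket; `MergeSortBucket` is a hypothetical name. In practice `std::stable_sort`, which standard libraries commonly implement as a merge sort, is the ready-made alternative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bottom-up merge sort: O(n log n) in the worst case regardless of how the
// keys are distributed, at the cost of an n-element scratch buffer.
void MergeSortBucket(std::vector<uint64_t>& a) {
    const std::size_t n = a.size();
    std::vector<uint64_t> buf(n);
    for (std::size_t width = 1; width < n; width *= 2) {
        // Merge adjacent sorted runs [lo, mid) and [mid, hi) into buf.
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            const std::size_t mid = std::min(lo + width, n);
            const std::size_t hi = std::min(lo + 2 * width, n);
            std::merge(a.begin() + lo, a.begin() + mid,
                       a.begin() + mid, a.begin() + hi,
                       buf.begin() + lo);
        }
        a.swap(buf);  // merged runs become the input of the next pass
    }
    assert(std::is_sorted(a.begin(), a.end()));  // postcondition (debug builds)
}
```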

mgraczyk · 2021-06-11T21:40:57Z

IMO it would be better to first add checkpoints that allow each phase to be resumed from its start. Then you can run the processes on separate machines by transferring the checkpoint data from machine to machine (or just storing it in a shared location in the first place).

This is pretty easy to do at the beginning and end of each phase.
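A minimal sketch of a phase-boundary checkpoint, assuming the state needed to resume is just the last completed phase plus the list of temporary files the next phase reads. The `Checkpoint` struct, its text format, and `SaveCheckpoint`/`LoadCheckpoint` are all hypothetical; the real plotter state may need more than this. The write goes to a temporary file that is then renamed over the old checkpoint, so an interrupted write leaves the previous checkpoint intact.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical checkpoint written at the end of a phase: which phase has
// finished and which temp files the following phase needs. Another process
// that can read the same (shared) storage can resume from the next phase.
struct Checkpoint {
    int completed_phase = 0;  // 0 = nothing finished yet
    std::vector<std::string> temp_files;
};

// Writes "<path>.tmp", then renames it over `path`. On POSIX, rename replaces
// the target atomically, so readers see either the old or the new checkpoint.
// (This sketch does not fsync, so it does not guard against power loss.)
bool SaveCheckpoint(const std::string& path, const Checkpoint& cp) {
    assert(!path.empty());
    const std::string tmp = path + ".tmp";
    {
        std::ofstream out(tmp, std::ios::trunc);
        if (!out) return false;
        out << cp.completed_phase << '\n';
        for (const auto& f : cp.temp_files) out << f << '\n';
        if (!out.flush()) return false;
    }  // file is closed here, before the rename
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}

// Returns an empty checkpoint (completed_phase == 0) if none exists yet.
Checkpoint LoadCheckpoint(const std::string& path) {
    Checkpoint cp;
    std::ifstream in(path);
    if (!(in >> cp.completed_phase)) return Checkpoint{};
    std::string line;
    std::getline(in, line);  // consume the remainder of the phase line
    while (std::getline(in, line)) {
        if (!line.empty()) cp.temp_files.push_back(line);
    }
    return cp;
}
```

With the phase1 / phase234 split proposed in this PR, a phase234 worker would load the checkpoint, check that `completed_phase >= 1`, and open the listed files.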

github-actions · 2021-08-12T11:05:45Z

'This PR has been flagged as stale due to no activity for over 60
days. It will not be automatically closed, but it has been given
a stale-pr label and should be manually reviewed.'

EchoAGI added 2 commits May 27, 2021 06:01

seprate phase1 and phase234

a85455f

separate phase1 and phase234

0f79e4b

EchoAGI changed the title ~~seperate plot phase4 to phase1 and phase234~~ seperate plot 4 phases to phase1 and phase234 May 26, 2021

add k8s.md

7b73d7d

EchoAGI added 6 commits May 31, 2021 05:57

fix

1f2b2f1

fix

fed29db

fix

ab87675

fix

c45a193

fix

8727b00

fix

561231a

github-actions bot added the stale-pr label Aug 12, 2021
