
Support executing multiple dependent long-running tasks in parallel #1497

Closed
robaca opened this issue Jul 8, 2022 · 32 comments

Comments

@robaca

robaca commented Jul 8, 2022

Describe the feature you'd like to request

In our monorepo, we have multiple services that to some degree depend on each other. For local development, we have to start a mock service first, then a second one, and then all others in any order.

For local development and testing, it would be great to be able to start all these services, or only one service and its dependencies, via turborepo in one step.

Describe the solution you'd like

It would be cool if turbo could be configured to not wait for a process to exit, but instead for some string or regexp to appear on stdout/stderr, and to keep all processes in the background until turbo itself is terminated via Ctrl+C.

We could then just configure our server tasks like this:

    "mock-service#server": {
      "dependsOn": ["build"],
      "longRunning": {
        "waitFor": {
          "stdout": "Service started successfully"
          "timeout": "60s"
        },
      },
      "cache": false
    },
    "my-service#server": {
      "dependsOn": ["build", "mock-service#server"],
      "longRunning": true, 
      "cache": false
    },

For dependencies, it would be necessary to be able to specify an explicit package, as runtime dependencies do not necessarily match build-time dependencies. I'm not sure if this is already possible, but at least it's not documented.

A leaf task should not need to configure a waiting condition, but having one for all background tasks would make it possible to output an informative message once all background tasks have started successfully and everything is up and running.

If a dependency is detected on a task that only has longRunning: true (i.e. no waiting condition), turbo could refuse to run and report a misconfiguration.

Nice-to-have on top:

  • Being able to configure what should happen with the console output of these background services (show it, hide it, show only stderr)
  • Having a REPL to give runtime hints to turbo after all processes are in the background, e.g. /hide mock-service#server or /grep ERROR

Describe alternatives you've considered

Create a command-line tool that starts a process, puts it in the background, and exits by itself when some message appears on stdout or stderr. I'm not sure if it would then be possible to stop those background processes if the turbo command is interrupted via Ctrl+C.

@jlarmstrongiv

Reminds me of https://www.npmjs.com/package/wait-on
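
For anyone who hasn't used it: wait-on simply blocks until a file, TCP port, or URL becomes available, so it can gate the start of a dependent script. A minimal sketch (the script names, port, and version here are made up for illustration):

# package.json
{
  "scripts": {
    "dev:api": "node server.js",
    "dev:web": "wait-on tcp:4000 && node client.js"
  },
  "devDependencies": {
    "wait-on": "^7.0.0"
  }
}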

@edwardwilson

I have a similar requirement. In my monorepo, I have an app for which I need multiple long-running processes started at the beginning. The order doesn't really matter to me; I just have three long-running processes that all need to start.

I have a Vite build with the watch flag, another process that needs to be run, and a custom web server process. Each can start independently and in parallel.

This can be achieved with NPM scripts and packages that enable parallel processes, but turborepo does not seem to support this via pipelines. Turborepo will wait for the first process to complete, which never happens due to its long-running nature.
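
For reference, the NPM-scripts approach usually looks something like the sketch below, using the concurrently package (script names and commands are illustrative, not taken from my actual setup):

# package.json
{
  "scripts": {
    "dev": "concurrently \"npm:dev:vite\" \"npm:dev:worker\" \"npm:dev:server\"",
    "dev:vite": "vite build --watch",
    "dev:worker": "node worker.js",
    "dev:server": "node server.js"
  },
  "devDependencies": {
    "concurrently": "^8.0.0",
    "vite": "^4.0.0"
  }
}

All three processes start in parallel and keep running; the trade-off is that this lives outside turbo's task graph, so caching and dependsOn don't apply.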

@grabbou

grabbou commented Sep 7, 2022

That would be so cool. Imagine having multiple packages that all need to spin up (run start-dev) in order for you to start working.

@weyert
Contributor

weyert commented Sep 7, 2022

Yeah, this would be useful. I sometimes want to run integration tests and make sure the backend services are running before they start.

@grabbou

grabbou commented Sep 7, 2022

Not to spam the thread any more, but for anyone looking for a nice alternative on top of turborepo: configure Visual Studio Code with a special "task" that spins up all processes in a shared terminal session. Here's my gist that spins up two processes, useful for a development workflow: https://gist.github.com/grabbou/1e19049ebc6127b269f4230bfaed5170
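
For readers who don't want to open the gist, the idea is roughly the following .vscode/tasks.json shape. This is a simplified sketch rather than the gist itself; the labels and commands are placeholders:

# .vscode/tasks.json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "dev:api",
      "type": "shell",
      "command": "yarn workspace api dev",
      "isBackground": true,
      "presentation": { "group": "dev" }
    },
    {
      "label": "dev:web",
      "type": "shell",
      "command": "yarn workspace web dev",
      "isBackground": true,
      "presentation": { "group": "dev" }
    },
    {
      "label": "dev:all",
      "dependsOn": ["dev:api", "dev:web"],
      "dependsOrder": "parallel"
    }
  ]
}

Running the dev:all task starts both processes side by side in the same terminal group.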

@spigelli

I've been looking for something like this for a while now. For me this addresses my docker compose problem.

I don't like to keep my docker compose services running in the background:

  1. Ports conflict with my other projects when I forget to docker-compose down
  2. Some things are process intensive

Additionally, things start to get messy when you're integrating a few OSS tools that each depend on multiple services. Imagine, for example, that you're building some sort of WebRTC app and you're integrating the following tools and their Docker services:

  • Open Telemetry services
    • prometheus
    • alertmanager
    • nodeexporter
    • cadvisor
    • grafana
    • pushgateway
    • caddy
  • Supabase Services
    • studio
    • kong
    • auth
    • rest
    • realtime
    • storage
    • meta
    • db
  • Livekit:
    • traefik
    • livekit-server
    • redis

So what I've considered in the past is making small npm wrappers as "apps" for different associated services, since that's where they would be if they were run from source.
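
In case it helps anyone, the wrapper idea boils down to something like this (a sketch; the package name, file paths, and scripts are hypothetical):

# apps/livekit-stack/package.json
{
  "name": "livekit-stack",
  "private": true,
  "scripts": {
    "dev": "docker compose -f docker-compose.yml up",
    "stop": "docker compose -f docker-compose.yml down"
  }
}

Turbo then sees the service group as just another workspace with a dev script, and tearing it down is a matter of running the stop script instead of remembering to docker-compose down.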

@hymair

hymair commented Dec 17, 2022

In my case I want to run these in sequence, and I can't find a proper way to do it currently, as dev is a long-running task so dependsOn never resolves:

  1. run backend#dev
  2. run app#gen-types
  3. run app#dev
  4. run web#dev

@VanCoding

Hi guys

I want to throw my protocol into the mix as a possible solution to this as well.

It's a very simple way for task runners (Turborepo, Nx, you name it) to talk to non-terminating watch processes, tell them when to rebuild, and get the results of those builds.

The idea is that we settle on one protocol that can then be implemented by many task runners and build tools. I'd really like to get your feedback on this! I hope to get the discussion started here.

@sschneider-ihre-pvs

sschneider-ihre-pvs commented Jan 19, 2023

Hi guys

I want to throw my protocol into the mix as a possible solution to this as well.

It's a very simple way for task runners (Turborepo, Nx, you name it) to talk to non-terminating watch processes, tell them when to rebuild, and get the results of those builds.

The idea is that we settle on one protocol that can then be implemented by many task runners and build tools. I'd really like to get your feedback on this! I hope to get the discussion started here.

The protocol looks like an xstate state machine.

@eboody

eboody commented Jan 27, 2023

Hi guys

I want to throw my protocol into the mix as a possible solution to this as well.

It's a very simple way for task runners (Turborepo, Nx, you name it) to talk to non-terminating watch processes, tell them when to rebuild, and get the results of those builds.

The idea is that we settle on one protocol that can then be implemented by many task runners and build tools. I'd really like to get your feedback on this! I hope to get the discussion started here.

Could you provide an example of how I would get started using your protocol to manage tsup --watch or tsc -w, in order to restart some file-watching server when some other package has a file change?

@kirill-konshin

Usually this kind of management is required for (at a bare minimum) a TS lib + website, where the TS lib in watch mode has to produce some output at least once before the website can pick up that output and keep watching it.

See my old article, section Starting/watching. The easiest approach so far would be to run a regular build (not watch) task, which can end successfully, and then run the watch task without an initial output.

Unfortunately, this brings overhead until webpack/webpack#4991 is fixed (previously TS was prone to this too: bug 12996).
UPDATE: In TypeScript 3.4 a new incremental option was introduced: it produces a build plus a cache, so no matter how often you restart watchers they will be ready much faster. Unfortunately it's not yet supported by TS Loader for Webpack, and it still does not eliminate the need to pre-build libraries.
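
For completeness, the incremental option is just a compiler flag; a minimal sketch (the tsBuildInfoFile path is arbitrary):

# tsconfig.json
{
  "compilerOptions": {
    "incremental": true,
    "tsBuildInfoFile": "./node_modules/.cache/lib.tsbuildinfo"
  }
}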

In any case, the build+watch approach for a library may cause the website to be built twice (first after the build, then after the first output of the watch) if the website rebuild looks at timestamps rather than contents.

@dobesv
Contributor

dobesv commented Feb 9, 2023

If you have interdependent processes you want to run, I think just putting a "wait" task in front of them makes sense. You can use wait-on to wait for the port to open.

This doesn't seem to require special support from turborepo to work.
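
A rough sketch of that wait-task approach, waiting on a TCP port with wait-on (the port, package layout, and script names are placeholders, not from a real repo):

# package-b/package.json (the package that needs the backend)
{
  "scripts": {
    "wait": "wait-on tcp:3001",
    "dev": "next dev"
  }
}

# turbo.json
{
  "$schema": "https://turbo.build/schema.json",
  "pipeline": {
    "dev": {
      "cache": false,
      "persistent": true,
      "dependsOn": ["wait"]
    },
    "wait": {
      "cache": false
    }
  }
}

Packages that don't define a wait script (such as the backend itself) are simply skipped for that task, so only the dependent package blocks on the port.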

@kirill-konshin

kirill-konshin commented Feb 9, 2023

True, but it's an extra tool, and it looks awkward to have all such scripts written like "wait-on blabla && real-thing". It would be much more convenient if the turbo config could have a "waitOn" property for such tasks. The property could either call an NPM script that exits when the condition is met, or take a list of files/globs/URLs.

It would be a nice advantage over Nx, which has the same problem right now.

I've implemented the wait approach in my Next Redux Wrapper repo. It works, but it does not look as sleek as turbo configs can:

https://github.com/kirill-konshin/next-redux-wrapper/blob/3a14963a0c8a3ebf39baa331b6a467bbc6cb8ee5/packages/wrapper/package.json#L21
https://github.com/kirill-konshin/next-redux-wrapper/blob/3a14963a0c8a3ebf39baa331b6a467bbc6cb8ee5/packages/demo-redux-toolkit/package.json#L9
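
To make the waitOn idea above concrete, here is a purely hypothetical shape such a property could take. Nothing like this exists in turbo today; the property name comes from the proposal, and the example targets are made up:

# turbo.json (hypothetical syntax, not supported)
{
  "pipeline": {
    "dev": {
      "cache": false,
      "persistent": true,
      "waitOn": ["packages/wrapper/lib/index.js", "http://localhost:3001/health"] // <--- proposed, does not exist
    }
  }
}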

@dobesv
Contributor

dobesv commented Feb 9, 2023

Hmm, true. Not sure how far down the road of being a dev job runner turbo wants to go, but it could be handy to have all that stuff integrated, especially if it could watch files, rebuild, and restart the dev processes.

@kirill-konshin

I'd say that it could directly call the wait-on package, passing args to it, or just call a certain NPM wait script, as in my case. It should be just a gate to delay the dev process start; that's all that is needed. The actual watching and restarting is the dev processes' own responsibility. This approach would clearly separate concerns and provide a good-looking, user-friendly tool.

@dobesv
Contributor

dobesv commented Feb 9, 2023 via email

@kirill-konshin

For example, Next.js dev restarts by itself when it detects changes in next.config.js; for other cases I use nodemon, and so on. The idea is that the dev script should be self-aware and know how to restart; that's not turbo's responsibility.
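
As an illustration of the nodemon case, a dev script can watch another package's build output and restart itself (the paths and names below are hypothetical):

# packages/server/package.json
{
  "scripts": {
    "dev": "nodemon --watch ../shared/dist --watch src --exec \"node src/index.js\""
  },
  "devDependencies": {
    "nodemon": "^2.0.0"
  }
}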

But knowing that a dev script depends on something in order to run is a turbo responsibility, since turbo manages all that kind of stuff.

@dobesv
Contributor

dobesv commented Feb 9, 2023 via email

@kirill-konshin

Turbo manages dependencies between scripts. That's the essence. Dev scripts can do whatever they want (restart, die); that's a lifecycle thing. Waiting for files in order to launch a dev script without errors is a dependency problem.

@kirill-konshin

In my case the wait scripts watch other packages' files, which is awkward :) because for all non-dev scripts turbo takes care of this.

E.g. if package A depends on package B, the build script of B will be scheduled to run before the build script of A; turbo takes care of it. But this somehow does not apply to dev scripts. That's what I mean by managing dependencies.

@dobesv
Contributor

dobesv commented Feb 9, 2023

Waiting for files in order to launch a dev script without errors is a dependency problem.

I think if you are waiting for files to be built, you can do that with turbo as it is, provided the files are generated by a process that exits. Are the files you are waiting for being generated by a process that does not exit?

@kirill-konshin

kirill-konshin commented Feb 9, 2023

Consider the following: package A is a library built with tsc; it has build and dev scripts (build and watch, respectively). Package B depends on package A; for simplicity, B has only a dev script.

So, in order to run dev in all packages on a fresh repo, package A must emit its JS files first; otherwise B#dev will just fail.

There are two tasks that can emit these files: A#dev and A#build. Persistent tasks can't depend on other persistent tasks, so we can only make dev depend on build, but this leads to double processing:

  1. A#build emits files
  2. *#dev starts
  3. A#dev emits the same files again, which is double effort since the files were already there
  4. B#dev rebuilds because it is triggered by step 3, so there is double processing here as well

Solution

We can introduce a wait script (using wait-on or whatever) in package A and make the dev scripts depend on it:

# package-a/package.json
{
  "scripts": {
    "build": "tsc ...",
    "dev": "yarn build --watch",
    "wait": "wait-on lib/index.js"
  }
}

# package-b/package.json
{
  "scripts": {
    "dev": "next dev"
  }
}

# turbo.json
{
  "$schema": "https://turbo.build/schema.json",
  "pipeline": {
    "dev": {
      "cache": false,
      "persistent": true,
      "dependsOn": ["^wait"]
    },
    "wait": {
       "cache": false,
    }
  }
}

If you want to be even more explicit, you can override A#dev to not have any dependencies:

{
  "$schema": "https://turbo.build/schema.json",
  "pipeline": {
    "dev": {
      "cache": false,
      "persistent": true,
      "dependsOn": ["^wait"] // <--- by default all dev tasks depend on wait
    },
    "A#dev": {
      "cache": false,
      "persistent": true,
      "dependsOn": [] // <--- except for package A, dev has no dependencies
    },
    "wait": {
       "cache": false
    }
  }
}

In both cases, turbo run dev results in:

  1. A#wait and A#dev start in parallel
  2. A#dev produces output files
  3. A#wait exits
  4. B#dev starts

As expected.

I believe this solution is worthy of being documented to clearly explain the proper way to run development tasks.

P.S. Since we can define outputs for tasks that do exit, maybe we could also define them for tasks that don't exit and treat their appearance as permission to move on, which would allow non-exiting tasks to be used as dependencies? Similar to waiting, but more in line with turbo naming.

@justinwaite

@kirill-konshin Thanks for this! I was able to simplify it for my use case by setting dependsOn in dev to "^wait":

{
  "$schema": "https://turbo.build/schema.json",
  "pipeline": {
    "dev": {
      "cache": false,
      "persistent": true,
      "dependsOn": ["^wait"]
    },
    "wait": {
      "cache": false
    }
  }
}

Then with that, I didn't have to do anything special for "Package A".

@kirill-konshin

@justinwaite I'm glad my findings were useful for you. The only thing I'd like to highlight is that if wait depends on the initial output of dev, then your pipeline can get stuck: the wait task will wait for the files, and dev won't run until wait releases.

@justinwaite

@justinwaite I'm glad my findings were useful for you. The only thing I'd like to highlight is that if wait depends on the initial output of dev, then your pipeline can get stuck: the wait task will wait for the files, and dev won't run until wait releases.

I think I see what you're saying. If Package B depends on A, and you try to run dev on B without running dev on A, then this would get stuck. Right? But if you're always running them together, then I don't see how this could get stuck waiting.

@kirill-konshin

@justinwaite if you configure it like in your example, all dev tasks will depend on wait, including A#dev depending on A#wait, which means A#dev can only run after A#wait, but A#wait waits for files that will be emitted by A#dev.

@justinwaite

justinwaite commented Feb 15, 2023

@justinwaite if you configure it like in your example, all dev tasks will depend on wait, including A#dev depending on A#wait, which means A#dev can only run after A#wait, but A#wait waits for files that will be emitted by A#dev.

I think you might be mistaken here. I have it set to ^wait which says "only wait on workspace dependencies' wait script, not my own".

From the docs:

The ^ symbol explicitly declares that the task has a dependency on a task in a workspace it depends on

{
  "$schema": "https://turbo.build/schema.json",
  "pipeline": {
    "build": {
      // "A workspace's `build` command depends on its dependencies'
      // and devDependencies' `build` commands being completed first"
      "dependsOn": ["^build"]
    }
  }
}

And I can confirm from my own repo that Package A does not run or wait on the wait script, since it has no workspace dependencies.

Edit:
For more clarification, in order to get into the state that you are describing, you would have to do:

    "dev": {
      "cache": false,
      "persistent": true,
      "dependsOn": ["wait", "^wait"]
    },

@kirill-konshin

kirill-konshin commented Feb 16, 2023

@justinwaite you're right, I overlooked the ^. Also, you've defined wait: {cache: false} with no dependencies. Works perfectly, great catch.

I have edited my original solution to be more concise.

@remy90

remy90 commented Mar 16, 2023

Does anyone have a working template they could refer to? I have the following: https://github.com/remy90/turborepo-payload-starter but haven't managed to make packageB (build) wait on packageA (payload-cms#dev). Does anyone have any suggestions to get this working as the comments above have suggested? It just hangs on the template I've provided.

@tannerbaum

tannerbaum commented Jul 27, 2023

I also wanted to request such a feature. Without going into specifics, we start up a little Apollo Server to modify/generate types from an external API.

When I have to do that before my test script, for example, my turbo run looks something like turbo run start:server test stop:server. Without wait-on this would be impossible (test waits on server, stop waits on test), but with it, as you can see, it leads to cluttered turbo runs or package.json scripts.
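
For illustration, the kind of clutter I mean looks roughly like this (the commands and the port are placeholders, not our real scripts):

# package.json
{
  "scripts": {
    "start:server": "node ./scripts/codegen-server.js",
    "test": "wait-on tcp:4000 && jest",
    "stop:server": "node ./scripts/stop-codegen-server.js"
  }
}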

A solution like the one described in the original issue would be a huge improvement.

@imsanchez

imsanchez commented Oct 18, 2023

In case anyone comes here looking for a simple answer for running persistent tasks in parallel, I'd like to share that the --filter argument worked for me.

This is what my dev script looks like:

dotenv -- turbo run dev --filter=backend --filter=ui --filter=frontend

And the three persistent tasks run in parallel. Although they start in that order, they are not dependent on each other.
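
For completeness, this relies on nothing more than each of those workspaces having a dev script and a pipeline entry roughly like the following (a sketch rather than an exact copy of my config):

# turbo.json
{
  "$schema": "https://turbo.build/schema.json",
  "pipeline": {
    "dev": {
      "cache": false,
      "persistent": true
    }
  }
}

With no dependsOn between them, turbo is free to run all three persistent tasks concurrently.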

@mehulkar
Contributor

Thanks for all the discussion here. In the interest of cleanup, I'm marking this as a duplicate of #986.
