Add the option to not use a cache #53

Closed


Contributor

@dguo dguo commented Mar 14, 2017

Here is a rendered version.

This RFC is related to yarnpkg/yarn#986.

@felixfbecker

Use case 3:
Running yarn as part of a long-running server. The cache grows indefinitely over time, eventually reaching max disk size or inode limit.

@dguo
Contributor Author

dguo commented Mar 14, 2017

@felixfbecker, I added it to the doc.

@felixfbecker

felixfbecker commented Mar 14, 2017

Thanks! I would like to point out that yarn cache clean is not possible if there are yarn installs running concurrently, because they all share the same global cache. Besides that, it is of course unneeded overhead to first save in the cache, then copy over, then clean the cache again.

@bestander
Member

Indeed, the cache introduces a lot of issues and it would be nice to have an option to avoid it.
Although I find it hard to implement.

Here is a use case: babel-runtime can be used transitively 5 times in one project.
With cache it will be downloaded once from the registry, unzipped and then during linking phase it would be copied 5 times into node_modules.

If you try to avoid cache then would it download and unzip the package 5 times?
Even if we are ok with this it would complicate the phases in Yarn: resolver, fetcher and linker would have to interface with each other.

@felixfbecker

felixfbecker commented Mar 23, 2017

@bestander If babel-runtime is depended on 5 times on the same version, the package should not have to be duplicated, it would be flat in node_modules (what you are describing sounds like old NPM v2 style). If the versions are different, you need to download a different archive anyway.

@ljharb

ljharb commented Mar 23, 2017

node_modules is never guaranteed to be flat, it's just (in npm 3 and later) as flat as possible. The dependency graph might require the duplication.

@felixfbecker

@ljharb in what cases does it require duplication if the versions are the same?

@ljharb

ljharb commented Mar 23, 2017

@felixfbecker if A and B require X v1, and C and D require X v2, you are virtually guaranteed to get 2 copies of one version of X, and the other version would be deduped at the top level - totaling 3 copies.
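The counting argument above can be sketched with a deliberately simplified model: the first version seen is hoisted to the top level, and every dependent that needs a different version gets its own nested copy. This is an illustration only, not the algorithm npm or Yarn actually uses, and count_copies is a hypothetical helper name.

```shell
#!/bin/sh
# Simplified hoisting model (an assumption, not npm's or Yarn's real resolver).
# stdin: one line per dependent, formatted as "<dependent> <version of X>".
# The first version seen is hoisted to the top level (one shared copy);
# each dependent that needs any other version gets its own nested copy.
count_copies() {
    awk '
        NR == 1        { hoisted = $2 }
        $2 != hoisted  { nested++ }
        END            { print (NR ? 1 + nested : 0) }
    '
}

# Usage: A and B need X v1, C and D need X v2.
# printf 'A 1\nB 1\nC 2\nD 2\n' | count_copies
```

With the A/B/C/D input from the comment above, the model yields one hoisted copy of v1 plus two nested copies of v2.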

@felixfbecker

True. yarn could be made smart enough in that case to copy Xv1 from A to B and Xv2 from C to D (or if the option is set, hardlink them).

@bestander
Member

Yeah, from an implementation point of view it might require too many compromises in the code structure.
I think we can achieve the goals some other ways.
I propose closing this one.

@felixfbecker

@bestander what would you propose?

@bestander
Member

Each of those cases can be addressed with a workaround.

  1. Building a Docker image => just use a local cache folder that you clean after install
  2. Running Yarn in a CI context. => clear caches before run
  3. Running Yarn on a long-running server. => clear caches

@felixfbecker

Unless I'm missing something, a long running server can't clean caches between runs, because many installations for different folders could run at any time in parallel. It doesn't seem like a good idea to clean the cache while a different installation is in the middle of the fetch or linking phase.

@bestander
Member

Every run can create a cache folder in a random folder in /tmp
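A minimal sketch of that workaround, assuming a POSIX shell and Yarn v1's YARN_CACHE_FOLDER environment variable. The function name install_with_temp_cache and the YARN_BIN override are hypothetical, introduced only so the wrapper can be tried without a real Yarn binary.

```shell
#!/bin/sh
# Run one install against a throwaway cache directory, then delete it.
# YARN_BIN is a hypothetical override (defaults to "yarn").
install_with_temp_cache() {
    # mktemp -d creates a uniquely named directory, so parallel runs
    # never share (or clean) each other's cache.
    tmp_cache=$(mktemp -d "${TMPDIR:-/tmp}/yarn-cache.XXXXXX") || return 1

    # Point this one install at the private cache and remember its exit status.
    install_status=0
    YARN_CACHE_FOLDER="$tmp_cache" "${YARN_BIN:-yarn}" install "$@" || install_status=$?

    # Remove the cache whether or not the install succeeded.
    rm -rf "$tmp_cache"
    return "$install_status"
}

# Usage: install_with_temp_cache --production
```

Every install still pays for the copy from the cache into node_modules; the wrapper only removes the shared state between runs.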

@felixfbecker

That is correct, and it is the workaround we are currently using. But this is a performance-critical application (which is also why we use yarn, not npm), and the linking phase takes up a considerable amount of time. If I understand it correctly, that is the time spent copying/linking packages from the cache to node_modules?

@bestander
Member

Yeah, linking phase is copying files.
Maybe using move instead of copy + --link-duplicates could be a less disruptive change.

@felixfbecker

Wouldn't move be subject to the same concurrency issues as cleaning the cache?

@bestander
Member

If you use a unique cache folder for each install then no

@felixfbecker

Oh, right. I wonder if move is significantly faster than hardlinking though?

@bestander
Member

bestander commented Mar 23, 2017 via email

@felixfbecker

Then adding an option to move would be great. It's unfortunate --link-duplicates can't be used for the time being because of yarnpkg/yarn#2734

@Diokuz

Diokuz commented Jun 13, 2017

Is there any chance of an option that disables the cache? I build with Docker, and I have to run && yarn cache clean during the build process.

@bestander
Member

@Diokuz, it may be hard to implement at this moment.
I think it would be easier to introduce a robust hack around your use case instead of baking it into Yarn.

E.g. use a custom cache folder and remove it after install.

@brandonsturgeon

brandonsturgeon commented Jul 20, 2017

My use case here is to make local development easier when working with a project which uses a local folder as a package.

eg:

// package.json
"dependencies": {
    "my-dependency": "file:../my-dependency"
}

If I make a change in ../my-dependency, I have to rebuild, remove the yarn.lock file, and delete node_modules/my-dependency, and then re-run yarn install or yarn add file:../my-dependency.

Sometimes I get an older version of my-dependency because it's being stored in the cache, which means I have to run yarn cache clean.

Maybe there's a better way to deal with this that I'm not aware of, but it would be great to be able to run yarn install --no-cache or yarn add --no-cache file:../my-dependency to ensure that I get the latest version of my changes.

Just my two cents!

@bestander
Member

@brandonsturgeon, your case is better handled by knit RFC #73 or link: specifier that is already available.

@nischayv

+1 for this feature. Same reason as @brandonsturgeon

@izonder

izonder commented Oct 9, 2017

+1 for the feature

One more case: I have a package.json

{
...
  "dependencies": {
    ... //only public npm packages
  },
  "devDependencies": {
    "pack": "git+ssh://git@gitlab.local/maintain/pack.git",
   ...
  }
  ...
}

yarn.lock contains only one record with git+ssh:

"pack@git+ssh://git@gitlab.local/maintain/pack.git":
  version "1.0.0"
  resolved "git+ssh://git@gitlab.local/maintain/pack.git#8932fcdd220f4ae1d38a071ad8ffbcc552b45171"

Dockerfile contains:

...
RUN set -x \
    && yarn install --production
...

But because yarn caches everything, it requires git and openssh to be installed as well.

@SmilingNavern

+1 for this feature. I would prefer to disable cache completely, it can improve speed of our CI builds in docker and this cache is irrelevant because we don't need it at all.

@herebebeasties

Being able to do this via config (and therefore an env var) would be very beneficial for CI, not only for performance but also because of yarnpkg/yarn#683 (it's hard to mandate/educate that all the users of our CI server should run yarn with --mutex network, and that reduces concurrency anyway so is really sub-optimal). Being able to disable the cache entirely would effectively fix that issue.

@csvan

csvan commented Jan 5, 2018

We also use Yarn in a Docker CI environment where cache is only overhead, since it is discarded as soon as the given build step is done. Being able to somehow disable cache would be greatly beneficial.

@bestander
Member

bestander commented Jan 5, 2018 via email

@gaui

gaui commented Jun 19, 2018

This is such a basic crucial feature. This makes me want to switch to npm again.

@kachkaev

Here is how I approach this in a Dockerfile in the meantime (see example):

RUN YARN_CACHE_FOLDER=/dev/shm/yarn_cache yarn --production

/dev/shm is a special drive with a temporary filesystem, which means that cached modules will not be a part of the resulting image and it will become lighter.

The size of the Docker's shm drive is 64M by default, which is often not enough to fit all cached npm modules. So you'll likely need to use this flag:

docker build -t my-image --shm-size 128M .
docker build -t my-image --shm-size 256M .
docker build -t my-image --shm-size 512M .
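To pick a value for --shm-size, it can help to measure how large the cache folder actually gets. A rough sketch, assuming a POSIX shell with du and awk; dir_size_mib is a hypothetical helper name, not a Yarn command.

```shell
#!/bin/sh
# Print the disk usage of a directory in MiB, rounded up.
# dir_size_mib is a hypothetical helper, not part of Yarn or Docker.
dir_size_mib() {
    # du -sk reports the total size in 1024-byte units.
    size_kib=$(du -sk "$1" | awk '{ print $1 }')
    [ -n "$size_kib" ] || return 1
    echo $(( (size_kib + 1023) / 1024 ))
}

# Usage (needs Yarn v1, whose "yarn cache dir" prints the cache path):
# dir_size_mib "$(yarn cache dir)"
```

The result is only a lower bound for --shm-size, since the measurement reflects whatever happens to be in the cache at that moment.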

That said, turning off yarn's cache completely would be a much better option inside CI/CD.

@dguo
Contributor Author

dguo commented Jun 28, 2018

@bestander, do you have any high level suggestions for how someone might get started on implementing this?

@bestander
Member

@dguo, it might be tricky, yarn's cache paths are deeply ingrained in fetching and linking phases.
At least yarn does not need to fetch .tar.gz files to read the dependency tree.

  1. I would experiment with a specialized fetcher that saves to node_modules in the correct path before trying to add this feature to the existing fetcher.

  2. I would consider some compromise.
    Maybe still use cache folder but make it local inside, e.g. node_modules/.cache, and then hard-link all packages from node_modules/.cache to appropriate node_modules/module-a folders.
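The hard-linking half of the second idea can be sketched in plain shell, assuming the local cache and the target folder sit on the same filesystem (hard links cannot cross filesystems). The link_tree helper is hypothetical and is not how Yarn's linker is implemented; it also does not handle file names containing newlines.

```shell
#!/bin/sh
# Mirror the directory layout of $1 under $2 and hard-link every regular file,
# so no file data is copied. link_tree is a hypothetical helper.
link_tree() {
    src=$1
    dest=$2
    mkdir -p "$dest"

    # Recreate the directory structure first.
    (cd "$src" && find . -type d) | while IFS= read -r dir; do
        mkdir -p "$dest/$dir"
    done

    # Then link each file into the same relative location.
    (cd "$src" && find . -type f) | while IFS= read -r file; do
        ln "$src/$file" "$dest/$file"
    done
}

# Usage: link_tree node_modules/.cache/module-a node_modules/module-a
```

Because both paths point at the same inode, editing a file in one location changes it in the other, which is one of the trade-offs of hard-linking from a cache.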

@evasyuk

evasyuk commented Apr 25, 2019

Why is this not merged? It does not have any conflicts with master.

@arcanis
Member

arcanis commented Apr 25, 2019

Because it's an rfc and it likely won't be implemented due to the internal design of the v1.

From the v2 onward there are no node_modules, the cache is the only artifact needed to run the project, so this RFC doesn't make sense there either.

@arcanis arcanis closed this Apr 25, 2019
@jplwood

jplwood commented Jul 12, 2019

@arcanis would you mind linking to some docs/discussion describing the v2 differences for those of us casual users popping in here interested in having the no-cache feature? Just curious to read up a bit more so I understand where yarn is going -- thanks!

@arcanis
Member

arcanis commented Jul 12, 2019

The v2 leverages PnP and zip loading to directly load the vendor files from within their archives (similar to phar archives in PHP, or asar in Electron). In this context the cache, albeit necessary, is the only artifact we generate. If you configure it to be stored within your project and don't version it, it will essentially be a "no cache Yarn".

@adrian-skybaker

@oliviermartin

I have a simple case why I would also need this 'no cache' option. Here is my react native project

my_new_react_component
├── ios
├── android
├── js
└── example   # My example application
    ├── ios
    ├── android
    └── js

React Native does not support symlinks in node_modules: facebook/metro#1. So to install my component in my example application (which is a subdirectory of my component), I use yarn add every time I need to sync my_new_react_component into the example app.
And because example is a sub-directory it copies even the example itself, its generated files/binary.

Note: I have not checked why, but calling yarn add ../ multiple times seems to consume more cache every time, so the cache does not really work in this case.

An alternative would be a mechanism like .npmignore for Yarn for me to exclude example from my node module itself, but it does not seem to be supported: https://stackoverflow.com/questions/60088936/does-yarn-ignore-npmignore
