Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add dns cache #2440 #2552

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

antoinegomez
Copy link

@antoinegomez antoinegomez commented Dec 28, 2023

This relates to #2440

Rationale

Implement a dns cache lookup feature as requested.

Copy code from https://github.com/szmarczak/cacheable-lookup/blob/master/source/index.mjs

Changes

  • add lookup in core/connect.js

Features

  • dns cache lookup

Bug Fixes

N/A

Breaking Changes and Deprecations

N/A

Status

@antoinegomez
Copy link
Author

Note that this is just an initial draft, anyone can jump in and use this as a base or if in wrong direction simply discard it.

  • if ipv6 is available on host and returned by lookup use this
  • this is a quick copy of the mentioned cacheable-lookup module and it's clearly not finished
  • the difficulty is to know how to set the options for this module and pass them down to the core/connect method

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would create a new class DNSResolver and add a dnsResolver option to pool and Agent. An Agent or pool would allocate one if it's not passed in as an option. The Agent would pass down its cache to all created pools.

@antoinegomez
Copy link
Author

Benchmarks:

Linux KDE Neon
AMD Ryzen 5 5600X 6-Core
4 x 8GB RAM

main branch 9b9ca60

│ Tests               │ Samples │          Result │ Tolerance │ Difference with slowest │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ http - no keepalive │       1 │ 1962.85 req/sec │  ± 0.00 % │                       - │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ http - keepalive    │       1 │ 2387.90 req/sec │  ± 0.00 % │               + 21.65 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - pipeline   │       1 │ 3035.14 req/sec │  ± 0.00 % │               + 54.63 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - request    │       1 │ 3160.77 req/sec │  ± 0.00 % │               + 61.03 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - stream     │       1 │ 3807.23 req/sec │  ± 0.00 % │               + 93.96 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - dispatch   │       1 │ 4833.26 req/sec │  ± 0.00 % │              + 146.24 % │

Feature branch:

│ Tests               │ Samples │          Result │ Tolerance │ Difference with slowest │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ http - no keepalive │       1 │ 1824.48 req/sec │  ± 0.00 % │                       - │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ http - keepalive    │       1 │ 2483.32 req/sec │  ± 0.00 % │               + 36.11 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - request    │       1 │ 3087.63 req/sec │  ± 0.00 % │               + 69.23 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - pipeline   │       1 │ 3289.09 req/sec │  ± 0.00 % │               + 80.28 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - stream     │       1 │ 3935.02 req/sec │  ± 0.00 % │              + 115.68 % │
|─────────────────────|─────────|─────────────────|───────────|─────────────────────────|
│ undici - dispatch   │       1 │ 5294.26 req/sec │  ± 0.00 % │              + 190.18 % │

Comment on lines 269 to 271
if (entries[0].status === 'rejected' && entries[1].status === 'rejected') {
throw entries[0].reason
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only throw the first error? Wouldn't it be better to throw an AggregateError that contains both?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doing that, using AggregateError and modifying tests

@@ -0,0 +1,443 @@
// source: https://raw.githubusercontent.com/szmarczak/cacheable-lookup/9e60c9f6e74a003692aec68f3ddad93afe613b8f/source/index.mjs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please include the license at the top of the file

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes no probs

test/dns-resolver.js Show resolved Hide resolved
)
cacheError.cause = error

throw cacheError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this kind of error handling is correct. Error situations should auto heal when possible

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is because it's possible to specify a custom cache object, if the set method throws it is considered as a fatal and unrecoverable error.

Using the default cache which is a simple Map() this would not happen.

Also happy to remove any form of custom caching, that's was part of the copied module.

const { _iface } = this
cached = cached.filter((entry) =>
entry.family === 6 ? _iface.has6 : _iface.has4
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid the use of filter, map etc, they are an order of magnitude (or more) slower than using normal iteration.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I translated the code with loops, no optimizations yet. To have the code passing all tests. Made the function less readable. Also note that here it will be common to have one or two items returns and I am yet to see anyone using more than 4-5 items per record.
Agreeing to keep an eye on the perf side.

@antoinegomez
Copy link
Author

Update:

  • As mentioned above, happy to refactor a bit more and to remove some functionalities like the custom caching if we want to slim down the code and responsibilities. in the end people could easily use cacheable-lookup module in Undici if they want
  • Added a file global-dns-resolver to get/set the global instance of the resolver, as my guess is that the default behavior we want is to cache across all requests
  • There is no option/method to disable the DNS cache globally easily
    • for now you have to create and set a new global Agent with the option dnsResolver.disable: true
  • By default and for the first release should we not enable it by default to test? This kind of feature could break people production if not tested more carefully. DNS sometimes can behaves in unexpected ways.

@mcollina
Copy link
Member

I don't think there should be a global dns resolver. We already have a global agent, which is enough to handle this use case.

@codecov-commenter
Copy link

codecov-commenter commented Dec 30, 2023

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (e39a632) 85.54% compared to head (8ee39e8) 85.56%.
Report is 311 commits behind head on main.

Files Patch % Lines
lib/dns-resolver.js 96.71% 7 Missing ⚠️
lib/agent.js 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2552      +/-   ##
==========================================
+ Coverage   85.54%   85.56%   +0.02%     
==========================================
  Files          76       85       +9     
  Lines        6858     7810     +952     
==========================================
+ Hits         5867     6683     +816     
- Misses        991     1127     +136     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

lib/agent.js Outdated Show resolved Hide resolved
lib/agent.js Outdated Show resolved Hide resolved
lib/dns-resolver.js Outdated Show resolved Hide resolved
lib/dns-resolver.js Show resolved Hide resolved
lib/dns-resolver.js Outdated Show resolved Hide resolved
Copy link
Member

@metcoder95 metcoder95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add tests that covers the usage within the Agent/Pool?

lib/agent.js Outdated Show resolved Hide resolved
Copy link
Member

@KhafraDev KhafraDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot with my other review but this does need types as well.

Comment on lines 182 to 184
get _cache () {
return this.#cache
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the tests, it looks like this is only needed to check how many entries are in the cache. Would it make sense to only expose the number of cached entries, rather than the cache itself? If not, changing these to a symbol property to deter users from using this would be better.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes agree

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 186 to 188
get _hostnamesToFallback () {
return this.#hostnamesToFallback
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

lib/agent.js Outdated Show resolved Hide resolved
lib/pool.js Outdated Show resolved Hide resolved
lib/dns-resolver.js Outdated Show resolved Hide resolved
antoinegomez and others added 3 commits January 10, 2024 09:54
Co-authored-by: Carlos Fuentes <me@metcoder.dev>
Co-authored-by: Carlos Fuentes <me@metcoder.dev>
Co-authored-by: Carlos Fuentes <me@metcoder.dev>
@antoinegomez
Copy link
Author

Can we add tests that covers the usage within the Agent/Pool?

I did add two tests, one that will check that the cache is incrementing when using the DNS resolver and one that check that no cache is being used if disabled.

Since Agent is not exposing the DNSResolver it is not possible to get it and check the cache, but I will have a look for a solution to see if I can intercept/mock it.

@mcollina
Copy link
Member

linting is failing

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to implement some kind of solution of "Happy Eyeballs" similar to what Node Core did.

The more I look at this, the more I'm thinking this should live in Node.js core and not here, but let's chat about it.

README.md Outdated
* **hints** [`getaddrinfo flags`](https://nodejs.org/api/dns.html#supported-getaddrinfo-flags)
* **all** `Boolean` - Default: `false`

* **roundRobinStrategy** `'first' | 'random'` - Default: `'first'`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be called scheduling. What does first vs random do?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

if (this.roundRobinStrategy === 'first') {
return cached[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we do not rotate the cache, then it's not round robin.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes true, I will think of a way to rotate the cache

@antoinegomez
Copy link
Author

I forgot with my other review but this does need types as well.

first time I add types in non typescript project, is that ok?

https://github.com/nodejs/undici/pull/2552/files#diff-6704654b8ba0b45cd2cc752d380e7c9f70c6c8aac678524044da44446db6c976

index.js Outdated
@@ -165,3 +165,6 @@ module.exports.MockClient = MockClient
module.exports.MockPool = MockPool
module.exports.MockAgent = MockAgent
module.exports.mockErrors = mockErrors

const DNSResolver = require('./lib/dns-resolver')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to the top

@mcollina
Copy link
Member

cc @ShogunPanda

Comment on lines +182 to +184
get [kDnsCacheSize] () {
return this.#cache.size ?? 0
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KhafraDev can you verify its what you were seeking for?

return cached
}

if (this.#scheduling === 'first') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering, what are the intentions with the scheduling option?
Its not added within cacheable-lookup as it already caches the records

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with first it is the same behavior as cacheable-lookup and returns the first cached entry from cache.

With random it will return a random item from the cache.

This is a way to achieve some sort of round robin without to much complexity.

Copy link
Contributor

@ShogunPanda ShogunPanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestions, but not blockers.
LGTM!

lib/agent.js Outdated
@@ -22,6 +23,18 @@ function defaultFactory (origin, opts) {
: new Pool(origin, opts)
}

function getDNSResolver (options) {
if (options?.dnsResolverOptions?.disable) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be a better DX to have { dnsResolverOptions: false } instead to completely disable it.
Not a blocker, just a thought.

types/agent.d.ts Outdated
@@ -22,6 +23,8 @@ declare namespace Agent {
maxRedirections?: number;

interceptors?: { Agent?: readonly Dispatcher.DispatchInterceptor[] } & Pool.Options["interceptors"]

dnsResolverOptions: DNSResolver.Options;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, what about having just dnsResolver here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was asked before to rename the options to dnsResolverOptions and have a separate one called dnsResolver to specify a custom DNSResolver class.

I changed the disable check on dnsResolver instead of putting it in the options.

So to disable it will be:

new Agent({
  dnsResolver: false,
})

@ShogunPanda
Copy link
Contributor

In regard of Happy Eyeballs, let me clarify our options here.

From what I see now, with this new feature undici will attempt the records in the order node (and thus the DNS server) resolves them). This is a good thing, at least as default.

The next step would be to implement Happy Eyeballs. I would add a specific implementation in undici since we can be explicit and control the sockets we open and thus we can implement the full Happy Eyeballs specification. In other words, in Node I was forced to implement a serialized version of the algorithm (not to break too many things), here we don't have to.

If we do this, we have two options:

  1. Implement Happy Eyeballs on the Dispatcher class.
  2. Implement Happy Eyeballs on the DNSResolver class. Maybe this is too much for it, but it would open the possibility to return to the caller an IP which we already verified via Happy Eyeballs. And/or, potentially, to return the socket we were already able to open.

@mcollina
Copy link
Member

I think we would need happy eyeballs to enable this by default.

@metcoder95
Copy link
Member

The next step would be to implement Happy Eyeballs. I would add a specific implementation in undici since we can be explicit and control the sockets we open and thus we can implement the full Happy Eyeballs specification. In other words, in Node I was forced to implement a serialized version of the algorithm (not to break too many things), here we don't have to.

We should also not forget to add an option to disable it if required

  1. Implement Happy Eyeballs on the Dispatcher class.

I'd like to see if we can keep the Dispatcher class lean, and leverage all this logic to the DNSResolver class, so we can also enable users to customize the behaviour as they need.

  1. Implement Happy Eyeballs on the DNSResolver class. Maybe this is too much for it, but it would open the possibility to return to the caller an IP which we already verified via Happy Eyeballs. And/or, potentially, to return the socket we were already able to open.

Like this pretty much, do you think this can enable features such as #2515?

@ShogunPanda
Copy link
Contributor

The next step would be to implement Happy Eyeballs. I would add a specific implementation in undici since we can be explicit and control the sockets we open and thus we can implement the full Happy Eyeballs specification. In other words, in Node I was forced to implement a serialized version of the algorithm (not to break too many things), here we don't have to.

We should also not forget to add an option to disable it if required

Absolutely yes!

  1. Implement Happy Eyeballs on the Dispatcher class.

I'd like to see if we can keep the Dispatcher class lean, and leverage all this logic to the DNSResolver class, so we can also enable users to customize the behaviour as they need.

I'm fine with it. Maybe a separate class used by the the DNSResolver.

  1. Implement Happy Eyeballs on the DNSResolver class. Maybe this is too much for it, but it would open the possibility to return to the caller an IP which we already verified via Happy Eyeballs. And/or, potentially, to return the socket we were already able to open.

Like this pretty much, do you think this can enable features such as #2515?

Probably yes, and I would love it!

types/pool.d.ts Outdated
interceptors?: { Pool?: readonly Dispatcher.DispatchInterceptor[] } & Client.Options["interceptors"]
interceptors?: { Pool?: readonly Dispatcher.DispatchInterceptor[] } & Client.Options["interceptors"];

dnsResolver?: typeof DNSResolver;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add tests for this types

@antoinegomez
Copy link
Author

antoinegomez commented Jan 22, 2024

The next step would be to implement Happy Eyeballs. I would add a specific implementation in undici since we can be explicit and control the sockets we open and thus we can implement the full Happy Eyeballs specification. In other words, in Node I was forced to implement a serialized version of the algorithm (not to break too many things), here we don't have to.

We should also not forget to add an option to disable it if required

Absolutely yes!

  1. Implement Happy Eyeballs on the Dispatcher class.

I'd like to see if we can keep the Dispatcher class lean, and leverage all this logic to the DNSResolver class, so we can also enable users to customize the behaviour as they need.

I'm fine with it. Maybe a separate class used by the the DNSResolver.

  1. Implement Happy Eyeballs on the DNSResolver class. Maybe this is too much for it, but it would open the possibility to return to the caller an IP which we already verified via Happy Eyeballs. And/or, potentially, to return the socket we were already able to open.

Like this pretty much, do you think this can enable features such as #2515?

Probably yes, and I would love it!

I added some tests for types, tried to resolve your comments.

For the Happy Eyeballs I would be glad to do it too if that's ok. Any source or link to existing material?

Since I changed the behavior for this to be disabled by default it can be done in another PR.

@metcoder95
Copy link
Member

Yeah, another PR will be good. So far I believe this is the implementation: https://github.com/ShogunPanda/node/blob/ffb326c583c6a23bc94ff4989fe8a06edb92b011/lib/net.js#L1401

and this is the standard that is based on: https://datatracker.ietf.org/doc/html/rfc8305

Are those resources all right @ShogunPanda?

As pointed out, we should seek to stick to the spec as much as we can as we have better ergonomics here to do so.

@ShogunPanda
Copy link
Contributor

@metcoder95 Yes, all good links sir. :)

@antoinegomez
Copy link
Author

antoinegomez commented Feb 7, 2024

Hello @ShogunPanda, I have read the RFC and also the code linked.

Some thoughts.

For now this PR is creating a custom lookup function to cache dns queries.
So in my understanding if happy eye balls is already implemented in node/net module then it seems redundant to implement it again here.

By default the options for lookup are { family: undefined, hints: 32 }
And in node/net the happy eye balls happens when: if (dnsopts.family !== 4 && dnsopts.family !== 6 && !localAddress && autoSelectFamily).
I can see that autoSelectFamily is undefined by default in undici.

In the RFC it suggest that we should not wait for all DNS queries to resolve to start connecting. Starting to query ipv6 then ipv4. Would be a bit more complex to make it work.

I also understand that we could implement the happy eye balls in undici using a similar approach than node/net module but since we are relying on it I am wondering if it is worth it.

@ShogunPanda
Copy link
Contributor

In Node.js it is going to use Happy Eyeballs only if using a hostname when doing net.connect, which is not the case in undici since we're using a DNS cache.
We don't have to implement Happy Eyeballs exactly as the RFC mandates, for instance not even Node is doing it.

@osuka
Copy link

osuka commented Mar 25, 2024

Great to see work on this, we are very interested in the possibility of DNS caching as in kubernetes in particular the DNS lookups become a big issue at scale (even with local caching)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants