Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Dual http https #156

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

elboulangero
Copy link
Contributor

No description provided.

+ bail out early if the mirror is not handled
+ use 'v' instead of 'm.mirrors[id]', since it's the same
Overview of the changes (best seen in mirrors/mirrors.go):
- Renamed fields:
  - Up -> HttpUp
  - ExcludeReason -> HttpDownReason
- New fields:
  - HttpsURL
  - HttpsUp
  - HttpsDownReason
- ExcludeReason still exists, but it will be set only during mirror selection

Before, HttpURL could be either a HTTP or HTTPS URL. And there was only
one state (up/down), for that URL.

Now, HttpURL is forced to be a HTTP URL, and we introduce a new field
HttpsURL for HTTPS URL. Meaning: it's now possible to declare that a
mirror supports both HTTP and HTTPS, hence the name "Dual" (for Dual
HTTP/HTTPS) for this patch series.

Consequently, we need to support:
- two states: the mirror might be up/down over HTTP, and it might be
  up/down over HTTPS
- two "down reason": for example, the mirror might be down due to 404
  over HTTP, and down due to an invalid TLS certificate over HTTPS

Hence:
- we inteoduce the new field HttpsURL
- we rename 'Up' to 'HttpUp', and introduce a new field 'HttpsUp'
- rename 'ExcludeReason' to 'HttpDownReason', and introduce a new
  variable 'HttpsDownReason'

This commit is a start:
- changes in data structures
- related code changes that are trivial
Add a flag -https to the list command, in order to print HTTPs URLs.
Now that a mirror can have two URLs (HTTP and HTTPS), it's a bit less
straightforward to decide if the mirror is up or down, and to display
the state.

A mirror is said to be "up" if:
- it has two URLs, and both are up
- it has only one URL, and it's up

In order to display the state of the mirror: we still use up and down if
ever the mirror is fully up or fully down. But in case it's up for one
URL, and down on the other, we introduce the new state:

    <http-state>/<https-state>

Hence a mirror might be reported as:

    up/down    <- up over HTTP, down over HTTPS
    down/up    <- down over HTTP, up over HTTPS
Add a new flag -https to specify the HTTPS URL.

When adding a mirror, either -http or -https (or both) must be passed,
otherwise mirrorbits fails.

Mirrorbits now checks for protocol mismatch, and fails if you pass a
HTTPS URL to -http, or a HTTP URL ot -https.

Finally, mirrorbits makes sure that the two URLs point to the same host.
This is because we rely on the mirror hostname to geolocate it, so if we
had two URLs pointing to two different hosts, it wouldn't work.
This commit adds validation of HttpURL and HttpsURL after a mirror has
been edited. It's the same validation as what was introduced in the
previous command, for the 'add' command.

In an ideal world, this code wouldn't be duplicated...
We pick a non-empty URL, favoring the HTTP URL if both are set, in order
to get the host name and then geolocate the mirror.

It makes the assumption that both HTTP and HTTPS URLs point to the same
host, which was addressed in previous commits (we check that in the add
and edit commands).
If both HTTP and HTTPS URLs are set, we favor HTTPS. Given that we'll
fetch a file and read its content, HTTPS is probably a better fit.
Add a IsUp() method to mirrors. This is the same as the IsUp() function
that was added a few commits ago in cli/commands.go, with the difference
that in cli/commands.go, IsUp() acts on a rpc.Mirror type, while here,
IsUp() is a method of mirrors.Mirror.

We use this function in daemon/monitor.go, to enqueue a health-check if
a mirror is not up.
Now that a mirror can have two URLs (HTTP and HTTPS), the health-check
needs to be reworked, and basically duplicated.

In monitor, we move most of the healthCheck() function in another
function healthCheckDo(). Very little remains in healthCheck(): it only
picks a random file, and then it calls healthCheckDo() for the HTTP URL
(if set), and then again for the HTTPS URL (if set).

It's important to use the same file for both health checks, so that if
ever one check fails and the other succeeds, we don't ask ourselves if
it's because of the file that was picked.

Then, doing both checks sequentially is not super elegant, but it's also
super easy to implement it this way, without having to rewrite too much
code, so let's live with that.

Now for an overview of the changes in healthCheckDo():
- we now include the protocol in the log output
- we need to pass the protocol to the functions MarkMirror{Up,Down}

Which leads us to the changes in mirrors/mirrors.go. We introduce a
Protocol enum, and we use it for MarkMirror{Up,Down} and SetMirrorState.

The function SetMirrorState becomes a little bit more tricky, and we
take this chance to check that the URL is set when user tries to set the
state to 'up'. This is because, obviously, if a mirror has no URL for a
protocol (HTTP or HTTPS), then the only legit state for this protocol is
'down'.

Finally, we also update mirrors/logs.go to include the protocol in the
LogStateChanged logs. Going forward, the lines:

    Mirror is up/down ...

will be replaced with

    HTTP mirror is up/down ...

and

    HTTPS mirror is up/down ...

For old logs already in the database, the protocol is not stored, and
that will be translated to a value of 0 aka. UNDEFINED, which will be
logged as 'Mirror is up/down ...' (that is: as before).
The method IsHTTPS is trivial and it's only used in http/selection.go,
in the function Selection(). This function will be refactored in the
next commits, and the method IsHTTPS() won't be needed anymore.

To make the changes easier to follow, let's start by removing the
IsHTTPS() function now.
This is a subtle change of no consequence in the current version of the
code, but that is needed for the next commit.

The loop here use copies, as `mslist` is a slice of mirrors (and not of
mirror pointers), hence `for i, m := range mlist` creates copies. In
other words, `m` is a copy of `mlist[i]`.

That brings a question: when we save the mirror in one of the result
slices (`mlist` for a mirror that is selected, and `excluded` for a
mirror that is excluded), what should we store in the slice? The
original `mlist[i]`, or the (potentially modified) copy `m`?

As we can see, in the `excluded` slice, we clearly use `m`, in the line:

    excluded = append(excluded, m)

This is needed, as `m` is modified to include the ExcludeReason, for
example in `m.ExcludeReason = "Invalid URL"`.

However, when assigning to `mlist`, we use the original (unmodified)
mirror:

    mlist[safeIndex] = mlist[i]

Is there a good reason to use the original, rather than the copy? After
a careful read of the code, we can see that it doesn't matter: when a
mirror is selected (hence added to `mlist`), it has not been modified.
So, at this point, `mlist[safeIndex] = mlist[i]` is equivalent to
`mlist[safeIndex] = m`.

So why change it? This in preparation for the next commit, where we'll
introduce the field `m.SelectedProtocol`. This new field will be set for
mirrors that are selected. So, starting next commit, we'll modify the
copy of the mirror in both cases (selected or excluded), so we must save
the copy (rather than the orig) in both cases as well.

We do the change now, as it's easier to review.
@jbkempf
Copy link
Collaborator

jbkempf commented Nov 17, 2023

Thanks a lot for this massive change.

We'll try and review that soon.

In this commit, we introduce a SelectedProtocol field in the mirror
struct.  The best way to look at it is to look at mirrors/mirrors.go and
at http/pagerenderer.go.

It is needed for the case where neither HTTP nor HTTPS is forced (ie.
the incoming request was over HTTP, so mirrorbits will happily redirect
to either HTTP or HTTPS, depending on what's available). If ever a
mirror provides a HttpURL AND a HttpsURL, we need to remember which URL
was selected, so that the code in http/pagerenderer.go is able to
correctly build the redirections.

Then the second part of this commit is the refactoring in
http/selection.go, to take into account the fact that a mirror can now
have a HTTP and/or a HTTPS URL. This is rather dumb code, easy to read.
As a reminder, mirrorbits behaviour is as such:
- HTTP request : mirrorbits redirects to either HTTP or HTTPS, depending
  on what's available
- HTTPS request: mirrorbits redirects to HTTPS

Before this commit, the fallback URLs were used "as is". This was
actually not perfect, for example if your fallback mirror was defined as
http://foobar.org, then mirrorbits would happily redirect HTTPS requests
to a HTTP URL.

Of course, the right thing to do is to specify only HTTPS fallback
mirrors, but it might not be obvious for everyone, especially since the
sample mirrorbits.conf doesn't mention that, and gives a HTTP fallback
mirrors (http://fallback1.mirror/repo/) as an example.

With this commit, we change that a bit.

First, in the sample mirrorbits.conf, we mention that fallback mirrors
should support both HTTP and HTTPS (which, I think, is a reasonable
requirement).

Then, in the code, we now make the fallback selection a bit smarter,
possibly ignoring the protocol that was set by user in the conf:
- if WITHTLS: force HTTPS
- if WITHOUTLS: force HTTP
- otherwise: use the URL "as is"

This logic for the protocol selection is what's used already when
selecting mirrors, so basically this commit brings fallback mirrors in
lines with mirror selection.
We make use of the variable mirrors.Mirror.SelectedProtocol that was
introduced in the previous commit. This variable indicates which
protocol was selected for redirection, for a given mirror.

The logic to get the right URL in the template is a bit tricky, it's a
best effort to pick a non-empty URL, and preferably the right one.
Replace the `operational` boolean by a `state` string that can be up,
degraded or down. degraded means that the mirror is up over a protocol,
and down over the other.
This upgrade only changes the mirrors, ie. the MIRRORS key in the redis
database. It does the following:

- always: add new field `https`
- optionally: add new field `httpsUp` (if HTTPS URL and `up` exists)
- optionally: add new field `httpsDownReason` (if HTTPS URL and
  `excludeReason` exists)
- optionally: rename `up` -> `httpUp` (if HTTP URL and `up` exists)
- optionally: rename `excludeReason` -> `httpDownReason` (if HTTP URL
  and `excludeReason` exists)

In case it's not clear: the keys `up` and `excludeReason` are optional,
they might not exists in the database (in case a mirror was never
scanned or enabled). Hence we create their counterparts `https?Up` and
`https?DownReason` only if needed.
@elboulangero
Copy link
Contributor Author

There was an issue with redirections to fallback mirrors, this PR basically broke it. I just force-pushed the changes needed to fix it. The two commits

have been reworked. Thanks

@jbkempf
Copy link
Collaborator

jbkempf commented Apr 8, 2024

This MR needs manual rebase.

@elboulangero elboulangero changed the title GitHub/dual http https Draft: GitHub/dual http https Apr 8, 2024
@elboulangero elboulangero changed the title Draft: GitHub/dual http https Draft: Dual http https Apr 8, 2024
@elboulangero
Copy link
Contributor Author

With the time I though about another approach to the problem, which I think would result in 1) less code changes 2) easier to review 3) maybe more useful to others in general.

So I mark this one as draft, and I'll do my best to propose a new MR in the coming weeks. Hopefully I can get it done within 2 months, to give an estimation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants