Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(proposal): Build init Log instead of magic "init" Step #789

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

cognifloyd
Copy link
Member

@cognifloyd cognifloyd commented Apr 21, 2023

Key Value
Author(s) @cognifloyd
Reviewers @jbrockopp, @plyr4
Date April 21st, 2023
Status Reviewed

Status:

  • Reviewed (Reviewers added to the proposal)
  • Approved (Status updated in proposal and proposal merged)
  • Implemented
    • Types
    • Server
    • sdk-go
    • CLI
    • Worker
      • Build log to new endpoints and deprecate magic "init" string (in one release)
      • Remove special casing around magic "init" string (in another release)
    • UI

This is a successor to:

@cognifloyd cognifloyd self-assigned this Apr 21, 2023
@cognifloyd cognifloyd requested a review from a team as a code owner April 21, 2023 20:32
@cognifloyd cognifloyd changed the title Add proposal for build init Log instead of magic "init" Step feat(proposal): Build init Log instead of magic "init" Step Apr 21, 2023
@cognifloyd cognifloyd added area/api Indicates a change to the API area/cli Indicates a change to the CLI area/server Indicates a change to the server area/ui Indicates a change to the UI area/worker Indicates a change to the worker area/compiler Indicates a change to the compiler enhancement Indicates an improvement to a feature proposal Indicates a suggested feature or improvement labels Apr 21, 2023
@chrispdriscoll
Copy link
Contributor

Target team revisited this proposal during our biweekly review. The team needs more time to dig into the details. Look for more conversations and feedback in the upcoming week.


Sqlite sorts with `NULLS FIRST` by default, and Postgres sorts with `NULLS LAST` (see: [How Are NULLs Sorted by Default?](https://learnsql.com/blog/how-to-order-rows-with-nulls/)).
So this query is inconsistent between databases. And, the service logs returned by this query are not sorted. It would be nice to have this method return the logs sorted in a consistent manner.
I would like to see the Build Log, then Step Logs sorted by step_id, then Service Logs sorted by service_id. Any suggestions on how to do that with gorm?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if GORM supports multiple ORDER BY statements? i.e.

	err = e.client.
		Table(constants.TableLog).
		Where("build_id = ?", b.GetID()).
		Order("step_id ASC").
		Order("service_id ASC").

I could see a scenario where instead we have to use code like:

	err = e.client.
		Table(constants.TableLog).
		Where("build_id = ?", b.GetID()).
		Order("step_id ASC, service_id ASC").

Based off my brief reviews, I'm seeing docs that suggest both Postgres and Sqlite support this:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://gorm.io/docs/query.html#Order

Yep it supports multiple order statements.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this? I believe this will sort the logs as [init, steps..., services...] and the sort order will be consistent for both sqlite and postgres.

	err = e.client.
		Table(constants.TableLog).
		Where("build_id = ?", b.GetID()).
		Order("(step_id IS NULL AND service_id IS NULL) DESC"). // the build init log
		Order("step_id ASC NULLS LAST").
		Order("service_id ASC").

If that proves problematic, then we could add a computed column:

type Log struct {
	ID        sql.NullInt64 `sql:"id"`
	BuildID   sql.NullInt64 `sql:"build_id"`
	RepoID    sql.NullInt64 `sql:"repo_id"`
	ServiceID sql.NullInt64 `sql:"service_id"`
	StepID    sql.NullInt64 `sql:"step_id"`
	Data      []byte        `sql:"data"`
	IsInit    sql.Bool      `sql:"is_init"    gorm:"->"`
}
CREATE TABLE
IF NOT EXISTS
logs (
	id            SERIAL PRIMARY KEY,
	build_id      INTEGER,
	repo_id       INTEGER,
	service_id    INTEGER,
	step_id       INTEGER,
	data          BYTEA,
	is_init       BOOLEAN GENERATED ALWAYS AS (step_id IS NULL AND service_id IS NULL) STORED,
	UNIQUE(step_id),
	UNIQUE(service_id)
);

which works in both postgres and sqlite. Postgres only supports STORED, but in sqlite a STORED column can't be added to an existing table, so the sqlite migration would have to add the column as VIRTUAL instead.

That would lead to this query:

	err = e.client.
		Table(constants.TableLog).
		Where("build_id = ?", b.GetID()).
		Order("is_init DESC").
		Order("step_id ASC NULLS LAST").
		Order("service_id ASC").

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further thought, I think we do need an is_init boolean column. Instead of delegating that calculation to the database, we'll just calculate it when creating the objects, that way it will be accessible in log.Validate.

So:

CREATE TABLE
IF NOT EXISTS
logs (
	id            SERIAL PRIMARY KEY,
	build_id      INTEGER,
	repo_id       INTEGER,
	service_id    INTEGER,
	step_id       INTEGER,
	is_init       BOOLEAN,
	data          BYTEA,
	UNIQUE(step_id),
	UNIQUE(service_id),
	CHECK(is_init IS NOT FALSE),
	UNIQUE(build_id, is_init) -- only one init log per build
);

Then log.Validate will ensure that one of is_init, step_id, and service_id are not NULL.
And we'll only ever set is_init to TRUE if both step_id and service_id are NULL. And, we never set is_init to FALSE to allow the composite unique constraint to ensure that each build has only one init log.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. I just asked someone on the DB team at work, and he said this should work:

CREATE TABLE
IF NOT EXISTS
logs (
	id            SERIAL PRIMARY KEY,
	build_id      INTEGER,
	repo_id       INTEGER,
	service_id    INTEGER,
	step_id       INTEGER,
	data          BYTEA,
	UNIQUE(build_id, step_id, service_id)
);

Though, that might require a newer version of Postgres (as it might need NULLS NOT DISTINCT.

I think we might be able to use a partial index as well (which I didn't know existed):

CREATE TABLE
IF NOT EXISTS
logs (
	id            SERIAL PRIMARY KEY,
	build_id      INTEGER, -- maybe add NOT NULL constraint
	repo_id       INTEGER,
	service_id    INTEGER,
	step_id       INTEGER,
	data          BYTEA,
	UNIQUE(step_id),
	UNIQUE(service_id)
);

CREATE UNIQUE INDEX
	IF NOT EXISTS
	logs_build_init_log
ON
	logs (build_id)
WHERE
	step_id IS NULL AND service_id IS NULL;

Comment on lines 197 to 202
Also, when upgrading, we need to ensure that:
- the queue is empty, or
- all queued builds get re-compiled (or more particularly `item.Pipeline` which is a `types.pipeline.Build` and includes the injected init step) before execution in the worker starts.

To make the worker backwards compatible, could we trigger the rebuild in `server.queue.redis.Pop()` after the item is created?
Or perhaps we'll just need to include queued build migrations in the migration script.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does your company have a requirement that prevents you from temporarily bringing the server offline?

Asking because if you bring the API down, any running builds on the workers should fail rather quickly.

Then, you could upgrade both the server and workers at the same time and bring them back online.

Assuming you're running Docker containers, hopefully the amount of time to upgrade is relatively short.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the proposal, I tried to think through any backwards compatibility gotchas.

No policy prevents me from bringing them down.

I generally strive for 0-downtime upgrades as we have developers across many time zones (India, EU, US) which means, for most DevOps services, downtime always impacts someone. I am running Vela in Kubernetes: so I do any migrations to add new tables/columns, and then do rolling restarts that gradually start pods with the new version and then stop the old pods once enough of the pods are stable. However, the Vela install is still fairly "PoC" stage, so just shutting it down will have very little impact.

So, I guess the upgrade procedure would be something like:

  • scale server deployment down to 0
  • purge the queue or wait till the workers have pulled all builds from the queue and failed
  • scale worker deployment down to 0
  • run database migrations
  • run helm upgrade to update the deployments to the new version of Vela

I'm just not completely sure on how to purge the queue or make sure it is empty. So, the upgrade guide should have a note about that, even if it is not part of the migration script.

jbrockopp
jbrockopp previously approved these changes May 11, 2023
Copy link
Contributor

@jbrockopp jbrockopp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@plyr4
Copy link
Contributor

plyr4 commented May 16, 2023

I need someone familiar eith elm to handle the UI.

I can help out with this.
I think it would be great to organize build init separately from step logs so that its more obvious to the user.
spitballing ideas, but perhaps we build an Init tab (akin to Services), or a collapsible <details> element that is attached to the BuildPreview block?

ideally whatever solution we go with should be backwards compatible, and i'd prefer to have old and new builds look similar to avoid confusion. i can mock up some ideas

plyr4
plyr4 previously approved these changes May 16, 2023
@cognifloyd cognifloyd dismissed stale reviews from plyr4 and jbrockopp via 6e482ac May 16, 2023 23:51
this pre-supposes that it will be merged once approved.
Copy link
Contributor

@jbrockopp jbrockopp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@jbrockopp jbrockopp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - approving a second time because my Trailer says I haven't reviewed yet? 🤷

@chrispdriscoll
Copy link
Contributor

@cognifloyd We're laying out the upcoming roadmap and wanted to take this into consideration. It's been a little time since we last discussed. Is this still something you want? If so, maybe recap where we're at.

@chrispdriscoll
Copy link
Contributor

@cognifloyd We haven't heard anything on this and are rapidly approaching the v0.21 release. We'll move you to v0.22 for October. Please let us know if you think October is doable and/or if you want this pushed out further.

@chrispdriscoll
Copy link
Contributor

@cognifloyd Just posting an update during our monthly review of public PRs. During our last Committers Sync you mentioned other top priorities so this didn't have your attention. We're tentatively looking at an early November release for 0.22 so let us know if you think this might fit in that release. If not, we always have 0.23.

@chrispdriscoll
Copy link
Contributor

@cognifloyd With v0.22 rc1 available, the next opportunity for this enhancement will be v0.23 (January 2024). Please let us know if we should expect these updates in that release.

@cognifloyd
Copy link
Member Author

Probably not. Still digging through a backlog of things I need to do.

@chrispdriscoll
Copy link
Contributor

@cognifloyd It's been a couple months. Is this still a feature you're working on? If so, maybe share an updated timeline for when we should expect to see an update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Indicates a change to the API area/cli Indicates a change to the CLI area/compiler Indicates a change to the compiler area/server Indicates a change to the server area/ui Indicates a change to the UI area/worker Indicates a change to the worker enhancement Indicates an improvement to a feature proposal Indicates a suggested feature or improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants