New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osbuild-worker: improve error "reason" in case of stage failures (HMS-1442) #4113
base: main
Are you sure you want to change the base?
Conversation
eaa6ac2
to
d846b4d
Compare
without the ".Error()" the json serialization fails
osbuildJobResult.JobError = clienterrors.WorkerClientError(clienterrors.ErrorBuildJob, "osbuild build failed", osbErrors) | ||
|
||
reason := "osbuild build failed" | ||
if len(failedStages) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit awkward to be pluralizing things manually :) You could just use in stage(s):
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this ends up in the frontend at the customers - I thought it nicer there...
although I really expected that there is a simple library for this (I'm not the golang guy :-/ )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fwiw, this is usually handled via something like https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html as plural forms are complex in different languages. Not sure if we support i18n in the frontend (yet) but even if we don't preparing for it and getting "free" plural handling might be worth it (or not, idk). The topic of what gettext (if we go this route) to use in go is a bit more complex as there is no clear winner (at least last I looked).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also prefer a library (not sure which one, though) but I got the feeling we only support English for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can there be actually more than one failed stage? I don't think so... 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure if there will be support for running parallel stages soonish - so I'd rather support it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, in that case up to you. I was just looking at the current implementation, which can't return failures in more than one stage.
@@ -401,7 +401,7 @@ func (impl *OSBuildJobImpl) Run(job worker.Job) error { | |||
|
|||
osbuildVersion, err := osbuild.OSBuildVersion() | |||
if err != nil { | |||
osbuildJobResult.JobError = clienterrors.WorkerClientError(clienterrors.ErrorBuildJob, "Error getting osbuild binary version", err) | |||
osbuildJobResult.JobError = clienterrors.WorkerClientError(clienterrors.ErrorBuildJob, "Error getting osbuild binary version", err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw. these lines (this one and 519) are maybe a fix for some cases where the details have not been propagated...
@@ -534,6 +535,7 @@ func (impl *OSBuildJobImpl) Run(job worker.Job) error { | |||
logWithId.Infof(" %s success", stageResult.Type) | |||
} else { | |||
logWithId.Infof(" %s failure:", stageResult.Type) | |||
failedStages = append(failedStages, stageResult.Type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the information about the failed stage name should be instead taken from the osbuildJobResult.OSBuildOutput.Error
, which will contain it if the failure is caused by a stage failure.
BTW, this also means that all of this is already part of the osbErrors
added as a detail to JobError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably it still would make sense to add in the the reason (additionally) as it's shown more prominently in the UI - but I'll check to use osbuildJobResult.OSBuildOutput.Error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thozza at that level I get osbuildJobResult.OSBuildOutput.Error
only as json.RawMessage
not sure if I should try to guess the structure of it - or where could I get the type of the structure for sure? (couldn't find something like the ClientErrorCode
or similar in that context)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that a custom unmarshaler would be needed, which would check the value of the "type"
property and unmarshal to the appropriate struct. We never implemented proper osbuild errors unmarshaling, because it was not needed. Until now.
I agree with putting more information into the "reason" of the error 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thozza ... and I thought that the type is not very strict so it would be more some kind of guessing and trying to cover all types of structs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We control what osbuild returns and it currently returns exactly one type of a struct in case of the stage failure. IMHO, it makes sense to standardize this in osbuild if it is not nice and easy to interpret in osbuild-composer (or any other place that consumes osbuild's JSON output).
We do this type of unmarshaling already in multiple places, e.g.:
osbuild-composer/internal/weldr/upload.go
Lines 133 to 175 in 8dece19
func (u *uploadRequest) UnmarshalJSON(data []byte) error { | |
var rawUploadRequest rawUploadRequest | |
err := json.Unmarshal(data, &rawUploadRequest) | |
if err != nil { | |
return err | |
} | |
var settings uploadSettings | |
switch rawUploadRequest.Provider { | |
case "azure": | |
settings = new(azureUploadSettings) | |
case "aws": | |
settings = new(awsUploadSettings) | |
case "aws.s3": | |
settings = new(awsS3UploadSettings) | |
case "gcp": | |
settings = new(gcpUploadSettings) | |
case "vmware": | |
settings = new(vmwareUploadSettings) | |
case "oci": | |
settings = new(ociUploadSettings) | |
case "generic.s3": | |
// While the API still accepts provider type "generic.s3", the request is handled | |
// in the same way as for a request with provider type "aws.s3" | |
settings = new(awsS3UploadSettings) | |
case "container": | |
settings = new(containerUploadSettings) | |
case "pulp.ostree": | |
settings = new(pulpOSTreeUploadSettings) | |
default: | |
return errors.New("unexpected provider name") | |
} | |
err = json.Unmarshal(rawUploadRequest.Settings, settings) | |
if err != nil { | |
return err | |
} | |
u.Provider = rawUploadRequest.Provider | |
u.ImageName = rawUploadRequest.ImageName | |
u.Settings = settings | |
return err | |
} |
osbuild-composer/internal/target/target.go
Lines 62 to 134 in 8dece19
func (target *Target) UnmarshalJSON(data []byte) error { | |
var rawTarget rawTarget | |
err := json.Unmarshal(data, &rawTarget) | |
if err != nil { | |
return err | |
} | |
var options TargetOptions | |
switch rawTarget.Name { | |
case TargetNameAzure: | |
options = new(AzureTargetOptions) | |
case TargetNameAWS: | |
options = new(AWSTargetOptions) | |
case TargetNameAWSS3: | |
options = new(AWSS3TargetOptions) | |
case TargetNameGCP: | |
options = new(GCPTargetOptions) | |
case TargetNameAzureImage: | |
options = new(AzureImageTargetOptions) | |
// Kept for backward compatibility | |
case TargetNameLocal: | |
options = new(LocalTargetOptions) | |
case TargetNameKoji: | |
options = new(KojiTargetOptions) | |
case TargetNameVMWare: | |
options = new(VMWareTargetOptions) | |
case TargetNameOCI: | |
options = new(OCITargetOptions) | |
case TargetNameOCIObjectStorage: | |
options = new(OCIObjectStorageTargetOptions) | |
case TargetNameContainer: | |
options = new(ContainerTargetOptions) | |
case TargetNameWorkerServer: | |
options = new(WorkerServerTargetOptions) | |
case TargetNamePulpOSTree: | |
options = new(PulpOSTreeTargetOptions) | |
default: | |
return fmt.Errorf("unexpected target name: %s", rawTarget.Name) | |
} | |
err = json.Unmarshal(rawTarget.Options, options) | |
if err != nil { | |
return err | |
} | |
target.Uuid = rawTarget.Uuid | |
target.ImageName = rawTarget.ImageName | |
target.OsbuildArtifact = rawTarget.OsbuildArtifact | |
target.Name = rawTarget.Name | |
target.Created = rawTarget.Created | |
target.Status = rawTarget.Status | |
target.Options = options | |
type compatOptionsType struct { | |
// Deprecated: `Filename` is now set in the target itself as `ExportFilename`, not in its options. | |
Filename string `json:"filename"` | |
} | |
var compat compatOptionsType | |
err = json.Unmarshal(rawTarget.Options, &compat) | |
if err != nil { | |
return err | |
} | |
// Kept for backward compatibility | |
// If the `ExportTarget` is not set in the `Target`, the request is most probably | |
// coming from an old composer. Copy the value from the target options. | |
if target.OsbuildArtifact.ExportFilename == "" { | |
target.OsbuildArtifact.ExportFilename = compat.Filename | |
} | |
return nil | |
} |
This should improve the error in the frontend from
to
There is no test framework to test changes like this, right?