Add events to indicate start and end of skaffold dev iterations#4037
Conversation
Codecov Report
|
proto/skaffold.proto
Outdated
| STATUS_CHECK_NODE_NOT_READY= 406; | ||
|
|
||
| // Unknown Error Codes | ||
| ErrorCode_UNKNOWN = 501; |
There was a problem hiding this comment.
How is this different from COULD_NOT_DETERMINE?
There was a problem hiding this comment.
COULD_NOT_DETERMINE or 0 value is default enum. In cases where we don't send any error Code the value is 0. To differentiate between when it is empty Vs when we know its an unknown error, i added the field.
There was a problem hiding this comment.
Can we not enforce that the error-code must be explicitly provided?
pkg/skaffold/errors/errors.go
Outdated
| func ErrorCodeFromError(_ error, _ phase) proto.ErrorCode { | ||
| return proto.ErrorCode_COULD_NOT_DETERMINE | ||
| func ErrorCodeFromError(_ error, _ Phase) proto.ErrorCode { | ||
| return proto.ErrorCode_ErrorCode_UNKNOWN |
There was a problem hiding this comment.
COULD_NOT_DETERMINE seems more apt here?
It feels like we should have an error type that carries the ErrorCode inside, rather than returning an error-code and an error. Then we could get rid of STATUS_CHECK_NO_ERROR.
There was a problem hiding this comment.
I think it would make sense to add a new proto ErrorDef with those 2 fields. However, we won't be able to change that for other events BuildEvent, DeployEvent , StatusCheckEvent etc since IDEs are already consuming it.
Re: STATUS_CHECK_NO_ERR, it was something we discussed for regarding following HTTP error codes for success. It would also be easy for us to understand metrics if we define a success code vs "0" since 0 is the default value for int type enum
There was a problem hiding this comment.
Looking back through this, I'm confused by the intent of this function. It's a placeholder? Can we hook in the pattern-matching aspects of #4045 here?
Or we we could our own error that carries an ErrorCode? Like the os.Err* variables and os.IsExist().
example
type ErrorWithCode interface {
error
ErrorCode() ErrorCode
}
type errorWithCode struct {
errorCode ErrorCode
err error
}
func (e errorWithCode) ErrorCode() ErrorCode {
return e.errorCode
}
func (e errorWithCode) Error() string {
return fmt.Sprintf("%d: %s", e.errorCode, e.err.Error())
}
func NewErrorCode(errorCode ErrorCode, err error) ErrorWithCode {
return errorWithCode{errorCode: errorCode, err: err}
}
pkg/skaffold/errors/errors.go
Outdated
| Deploy = Phase("Deploy") | ||
| StatusCheck = Phase("StatusCheck") | ||
| FileSync = Phase("FileSync") | ||
| Dev = Phase("Dev") |
There was a problem hiding this comment.
Is Dev really a phase? (We don't actually use Dev from what I can tell?)
There was a problem hiding this comment.
it is not a phase, but dev-init is a stage or step in dev loop where errors can happen.
I wanted to capture this "init" step errors.
Shd "DevInit" be better?
There was a problem hiding this comment.
also, i can rename "Phase" to "Stage"
There was a problem hiding this comment.
Oh! So these are phases of the dev loop? That makes sense, and could you add a comment to that effect please? (It seems mildly odd for this to be defined in skaffold/errors.) I think Dev → DevInit or just Init makes sense, and perhaps we should have a TearDown too.
(I had wondered if we need to differentiate between dev and debug, but the dev loop is the same so I think it's good.)
There was a problem hiding this comment.
Change Dev -> DevInit
Add TearDown
Add comment saying these are phases in dev loop.
581e583 to
6e99b4c
Compare
|
Thanks a lot for all your feedback @briandealwis . I have addressed most of the comments and refactored method signatures. Defining a new struct I am wiring up Actionable Error messages here #4045 and will propagate suggestions in the events. |
5dff70e to
6889a05
Compare
briandealwis
left a comment
There was a problem hiding this comment.
So metaEvent is the session start event, and endEvent is the session end? Are these supposed to be sent on every command or just the dev loop?
cmd/skaffold/app/cmd/commands.go
Outdated
| // The actual error code will be passed in | ||
| // https://github.com/GoogleContainerTools/skaffold/pull/4045 |
There was a problem hiding this comment.
This doesn't seem like the right place given that the MetaEvent is raised in runner.NewForConfig(). I believe this will cause things like fix to throw session events too. (Not to mention there's ExactArgs() too.) Can we not have it be thrown at the end of the runner?
(Or if MetaEvent and EndEvent are ok for things like build, deploy, fix, etc. then could we move the start/stop here.)
pkg/skaffold/errors/errors.go
Outdated
| Deploy = Phase("Deploy") | ||
| StatusCheck = Phase("StatusCheck") | ||
| FileSync = Phase("FileSync") | ||
| Dev = Phase("Dev") |
There was a problem hiding this comment.
Oh! So these are phases of the dev loop? That makes sense, and could you add a comment to that effect please? (It seems mildly odd for this to be defined in skaffold/errors.) I think Dev → DevInit or just Init makes sense, and perhaps we should have a TearDown too.
(I had wondered if we need to differentiate between dev and debug, but the dev loop is the same so I think it's good.)
pkg/skaffold/errors/errors.go
Outdated
| func ErrorCodeFromError(_ error, _ phase) proto.ErrorCode { | ||
| return proto.ErrorCode_COULD_NOT_DETERMINE | ||
| func ErrorCodeFromError(_ error, _ Phase) proto.ErrorCode { | ||
| return proto.ErrorCode_ErrorCode_UNKNOWN |
There was a problem hiding this comment.
Looking back through this, I'm confused by the intent of this function. It's a placeholder? Can we hook in the pattern-matching aspects of #4045 here?
Or we we could our own error that carries an ErrorCode? Like the os.Err* variables and os.IsExist().
example
type ErrorWithCode interface {
error
ErrorCode() ErrorCode
}
type errorWithCode struct {
errorCode ErrorCode
err error
}
func (e errorWithCode) ErrorCode() ErrorCode {
return e.errorCode
}
func (e errorWithCode) Error() string {
return fmt.Sprintf("%d: %s", e.errorCode, e.err.Error())
}
func NewErrorCode(errorCode ErrorCode, err error) ErrorWithCode {
return errorWithCode{errorCode: errorCode, err: err}
}
proto/skaffold.proto
Outdated
| message ActionableErr { | ||
| ErrorCode errCode = 1; // error code representing the error | ||
| string message = 2; // message describing the error. | ||
| repeated string suggestions = 3; // a list of suggestions for the error. | ||
|
|
There was a problem hiding this comment.
I don't think reporting messages here is the right thing to do. I'm wondering if it would be better to have a secondary set of possible action codes that could be returned instead. That would also allow i18n or more.
(I think it's also an opportunity to somehow segregate PII fields, like artifact names or image tags, so that we might be able to report the actual errors seen?)
There was a problem hiding this comment.
Thanks. I should have made this clear. The suggestion text is just for IDEs to show to User.
Having suggestions plumbed through skaffold will reduce the duplication of efforts in both IDEs
I will add that as a comment.
For metrics, we should probably report codes like you mentioned.
pkg/skaffold/event/event.go
Outdated
| logEntry.Entry = fmt.Sprintf("Dev Iteration %d in progress", de.Iteration) | ||
| case Succeeded: | ||
| logEntry.Entry = fmt.Sprintf("Dev Iteration %d successful", de.Iteration) | ||
| default: |
There was a problem hiding this comment.
change this case Failed:
This reverts commit ca80757.
5a540c2 to
b467110
Compare
b467110 to
87a56c4
Compare
87a56c4 to
9a2f0ff
Compare
| }, | ||
| { | ||
| "name": "event.devLoopEvent.err.errCode", | ||
| "description": " - UNKNOWN_ERROR: Could not determine error and phase\n - STATUSCHECK_SUCCESS: Status Check Success\n - STATUSCHECK_IMAGE_PULL_ERR: Container image pull error\n - STATUSCHECK_CONTAINER_CREATING: Container creating error\n - STATUSCHECK_RUN_CONTAINER_ERR: Container run error\n - STATUSCHECK_CONTAINER_TERMINATED: Container is already terminated\n - STATUSCHECK_CONTAINER_RESTARTING: Container restarting error\n - STATUSCHECK_NODE_MEMORY_PRESSURE: Node memory pressure error\n - STATUSCHECK_NODE_DISK_PRESSURE: Node disk pressure error\n - STATUSCHECK_NODE_NETWORK_UNAVAILABLE: Node network unavailable error\n - STATUSCHECK_NODE_PID_PRESSURE: Node PID pressure error\n - STATUSCHECK_NODE_UNSCHEDULABLE: Node unschedulable error\n - STATUSCHECK_NODE_UNREACHABLE: Node unreachable error\n - STATUSCHECK_NODE_NOT_READY: Node not ready error\n - STATUSCHECK_UNKNOWN: Status Check error unknown\n - STATUSCHECK_UNKNOWN_UNSCHEDULABLE: Container is unschedulable due to unknown reasons\n - STATUSCHECK_CONTAINER_WAITING_UNKNOWN: Container is waiting due to unknown reason\n - DEPLOY_UNKNOWN: Deploy failed due to unknown reason\n - SYNC_UNKNOWN: SYNC failed due to known reason\n - BUILD_UNKNOWN: Build failed due to unknown reason\n - DEVINIT_UNKNOWN: Dev Init failed due to unknown reason\n - CLEANUP_UNKNOWN: Cleanup failed due to unknown reason\n - SYNC_INIT_ERROR: File Sync Initialize failure\n - DEVINIT_REGISTER_BUILD_DEPS: Failed to configure watcher for build dependencies in dev loop\n - DEVINIT_REGISTER_TEST_DEPS: Failed to configure watcher for test dependencies in dev loop\n - DEVINIT_REGISTER_DEPLOY_DEPS: Failed to configure watcher for deploy dependencies in dev loop\n - DEVINIT_REGISTER_CONFIG_DEP: Failed to configure watcher for Skaffold configuration file.", |
There was a problem hiding this comment.
Can we do something to make this a more manageable string?
There was a problem hiding this comment.
this was recently changed in #4083 . I will follow up later.
/cc @quoctruong
Fixes: #4036
Related: Relevant tracking issues, for context
Merge before/after: Dependent or prerequisite PRs
Description
In this PR,
User facing changes (remove if N/A)
No user facing changes.
Tools consuming Event API will see following new events in the sequence.
Follow-up Work (remove if N/A)