Skip to content

Implement destination inference for sync of dockerfile artifacts [2/3]#2084

Merged
dgageot merged 12 commits intoGoogleContainerTools:masterfrom
corneliusweig:sync/step-2
Jun 7, 2019
Merged

Implement destination inference for sync of dockerfile artifacts [2/3]#2084
dgageot merged 12 commits intoGoogleContainerTools:masterfrom
corneliusweig:sync/step-2

Conversation

@corneliusweig
Copy link
Copy Markdown
Contributor

@corneliusweig corneliusweig commented May 7, 2019

The sync feature is getting refurbished, see design proposal #1844 .

This PR implements step 2 of the implementation plan for the sync improvements. Notable changes include:

  • adding a new method SyncMap to the builder interface. This new method should provide sync destinations for all relevant source files.
  • implementing SyncMap for docker artifacts.

Note that SyncMap is intentionally not wired up yet. The only way to test it is with skaffold diagnose. The speed of GetDependencies vs SyncMap (which do similar things) is almost the same. In a sample project with 4.7k files GetDependencies and SyncMap took ~30ms and ~32ms, respectively.

I tried to (reasonably) avoid code duplication. Yet, the directory walking code (pkg/skaffold/docker/dependencies.go and pkg/skaffold/docker/syncmap.go) is almost twice the same. Nevertheless, I think this is the right approach, because CustomArtifact also relies on docker.WalkWorkspace and this one is a little faster than docker.walkWorkspaceWithDestinations.

Possible issues

  1. Overlapping copies to the same destination are now resolved correctly. This was not the case in the original PR WIP Sync: infer destination from Dockerfile #1812. For example:

    COPY foo /foo
    COPY bar/ /   # bar contains a file called "foo"

    In that case /foo will have the content from bar/foo, as it is expected.

  2. Transitive copies across stages of multi-stage dockerfiles are not resolved. For example:

    FROM scratch
    ADD foo .
    FROM scratch
    COPY --from=0 foo bar

    The current implementation will resolve foo -> foo and not foo -> bar. As discussed in the design
    proposal Design proposal for sync improvements #1844, this edge case will be ignored for now.

based-on #1847
Supersedes #1812
CC design shepherd @tejal29

@corneliusweig corneliusweig changed the title Implement destination inference for sync of dockerfile artifacts Implement destination inference for sync of dockerfile artifacts [2/3] May 7, 2019
@corneliusweig
Copy link
Copy Markdown
Contributor Author

@balopat rebase is done

@codecov-io
Copy link
Copy Markdown

codecov-io commented May 10, 2019

Codecov Report

Merging #2084 into master will decrease coverage by <.01%.
The diff coverage is 68.98%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2084      +/-   ##
==========================================
- Coverage    58.7%   58.69%   -0.01%     
==========================================
  Files         188      191       +3     
  Lines        7849     8012     +163     
==========================================
+ Hits         4608     4703      +95     
- Misses       2873     2900      +27     
- Partials      368      409      +41
Impacted Files Coverage Δ
pkg/skaffold/runner/timings.go 14% <ø> (ø) ⬆️
pkg/skaffold/build/build.go 0% <0%> (ø)
pkg/skaffold/build/local/local.go 36.76% <0%> (-0.55%) ⬇️
pkg/skaffold/runner/diagnose.go 0% <0%> (ø) ⬆️
pkg/skaffold/build/gcb/types.go 0% <0%> (ø) ⬆️
pkg/skaffold/build/cluster/types.go 0% <0%> (ø) ⬆️
cmd/skaffold/app/cmd/deploy.go 68.75% <0%> (+4.46%) ⬆️
cmd/skaffold/app/cmd/run.go 60% <0%> (+6.15%) ⬆️
cmd/skaffold/app/cmd/delete.go 66.66% <0%> (+11.11%) ⬆️
cmd/skaffold/app/cmd/dev.go 23.91% <0%> (+1.18%) ⬆️
... and 53 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9935f9e...5d95a01. Read the comment docs.

@balopat balopat added kokoro:run runs the kokoro jobs on a PR and removed needs-rebase labels May 13, 2019
@kokoro-team kokoro-team removed the kokoro:run runs the kokoro jobs on a PR label May 13, 2019
Copy link
Copy Markdown
Contributor

@dgageot dgageot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with the design.
Because some functions are both moved and modified, I find it hard to review the code changes.

Is there a way to break this PR in two pieces. One that moves the code and one that actually adds the new feature?

if err != nil {
// if the context was cancelled act as if all is well
// TODO(dgageot): this should be even higher in the call chain.
if ctx.Err() == context.Canceled {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we fix that first to simplify the code here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the checks to the outermost call site. Please check carefully :)

func (t *TestBench) Cleanup(ctx context.Context, out io.Writer) error { return nil }
func (t *TestBench) Prune(ctx context.Context, out io.Writer) error { return nil }

func (t *TestBench) SyncMap(ctx context.Context, artifact *latest.Artifact) (map[string][]string, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a oneliner

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goimports won't let me. I can only remove the blank lines


// ADDED
SyncMap() (map[string][]string, error)
SyncMap(ctx context.Context, artifact *latest.Artifact) (map[string][]string, error)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this return a syncMap? (Not sure it's better)

Copy link
Copy Markdown
Contributor Author

@corneliusweig corneliusweig May 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@balopat had the same idea in pkg/skaffold/sync/sync.go. I'm also very unsure what is better. My main issue is that a type alias binds the using sites tighter to the place where type SyncMap is defined (probably builder.go?). Besides, I don't particularly like that there will be two SyncMap symbols in builder.go: 1 the function, 2 its return type.

Overall me feeling is slightly towards keeping it as is. On the other hand, the local type alias in sync.go makes somewhat sense, even though it cannot substitute all usages of map[string][]string due to visibility.

return false
}

func changeDir(cur, to string) string {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to understand what this function is doing. It doesn't change the directory.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to resolveDir and added some godoc.


func processCopy(value *parser.Node, envs map[string]string) ([]string, error) {
var copied []string
func retrieveWorkingDir(tagged string, insecureRegistries map[string]bool) (string, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a copy of this code in sync.go

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😓 I shamelessly copied from there. I think the best place for this will be in the docker package.

@corneliusweig
Copy link
Copy Markdown
Contributor Author

@dgageot Thanks for taking this up so quickly after bringing it to your attention!

Because some functions are both moved and modified, I find it hard to review the code changes.

This is an artifact of a humongous rebase/squash of the original PR. Sorry for that 🙇‍♂️ . As suggested, I squeezed in a pure move commit. This should make reviewing possible.

return nil, build.ErrSyncMapNotSupported{}
}

syncMap, err := docker.SyncMap(ctx, a.Workspace, a.DockerArtifact.DockerfilePath, a.DockerArtifact.BuildArgs, b.insecureRegistries)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we not just return docker.SyncMap(....) here? I think it's more idiomatic to check if err != nil in the caller and then conditionally use the returned syncMap.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, sure! David made me pull other code out from there, but I missed that this can be further simplified..

// No local Docker is available
cf, err = RetrieveRemoteConfig(tagged, insecureRegistries)
} else {
cf, err = localDocker.ConfigFile(context.Background(), tagged)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized after making this comment you didn't write this code, but going to leave it here in case anyone has opinions on it:

do we not want to try and retrieve the image config from remote if we were able to get a local docker client, but couldn't find the image in the local daemon?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. Please have another look.

)

// NormalizeDockerfilePath returns the absolute path to the dockerfile.
func NormalizeDockerfilePath(context, dockerfile string) (string, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was copied from pkg/skaffold/docker/parse.go right? can we not move it to a common location (pkg/skaffold/docker/util/util.go?) and reuse it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nkubala So the function was not copied but moved. It is therefore already reused.

I'm not so happy with moving it to util. Such packages/files quickly attract completely unrelated code. But if you feel strongly about it, I won't start to argue.

return nil
}

func alwaysSucceedWhenCancelled(ctx context.Context, err error) error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this related to sync destination inference?

Copy link
Copy Markdown
Contributor Author

@corneliusweig corneliusweig May 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only indirectly. @dgageot recommended to resolve an earlier todo item in order to simplify the code (#2084 (comment)).

Would you prefer if this change was submitted as a separate PR?

fmt.Fprintln(out, " - Dependencies:", len(deps), "files")
fmt.Fprintf(out, " - Time to list dependencies: %v (2nd time: %v)\n", timeDeps1, timeDeps2)

timeSyncMap1, err := timeToConstructSyncMap(ctx, r.Builder, artifact)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we do this 2 times?
Also, is there a way to add tests for the L63 to L65?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's run twice so that the OS can warm the inode cache. The second run should normally be more representative.

Since there was no test for diagnose at all, I didn't bother adding one. Since this is creating (random) timings, the best we can achieve here is a smoke test. Do you feel strongly about it?

Copy link
Copy Markdown
Contributor

@tejal29 tejal29 May 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the runtime object inspection could refactored and unit tested to make sure we dont' break this in future. But i dont feel strongly regarding it.

if _, isNotSupported := err.(build.ErrSyncMapNotSupported); !isNotSupported {
				return errors.Wrap(err, "construct artifact dependencies")
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tejal29 Can you explain a little more how you would like that code to be refactored?

I opened #2157 for some smoke tests of the diagnose subcommand. I think this has advantages over unit testing here, because we would have to set up a lot of mock objects for unit tests.

@balopat balopat added the kokoro:run runs the kokoro jobs on a PR label May 22, 2019
@kokoro-team kokoro-team removed the kokoro:run runs the kokoro jobs on a PR label May 22, 2019
@corneliusweig
Copy link
Copy Markdown
Contributor Author

@dgageot At you convenience could you take another look?

@dgageot
Copy link
Copy Markdown
Contributor

dgageot commented Jun 6, 2019

@corneliusweig Could you rebase this PR? I'll then take a look at it to make sure it's merged quickly

@corneliusweig
Copy link
Copy Markdown
Contributor Author

@dgageot Thanks! Rebase is done.

@dgageot dgageot added the kokoro:run runs the kokoro jobs on a PR label Jun 6, 2019
@kokoro-team kokoro-team removed the kokoro:run runs the kokoro jobs on a PR label Jun 6, 2019
@dgageot
Copy link
Copy Markdown
Contributor

dgageot commented Jun 6, 2019

@corneliusweig sorry, I LGTMed but it needs rebase again

Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
Before, ONBUILD instructions were collected over the whole Dockerfile and prepended to all instructions. This has drawbacks when the sequence of commands matters, for example onbuild WORKDIR instructions.
This prepares for extracting the workdir during parsing.

Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
Ignore when the context was cancelled

Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
Preserve order of copy commands

Refactor to preserve order of copy commands

Add godoc, improve naming

Do not forcibly include the Dockerfile

Do not forcibly exclude `.dockerignore`

Disambiguate overwrites of the same destination location

Only COPY/ADD into the latest image are syncable

Reduce code duplication and reorganize code

- rely on a single implementation to extract the COPY/ADD commands from the Dockerfile
- introduce more helper functions to reduce code duplication

Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
Note: not all occurrences of map[string][]string can replaced, because syncMap's visibility is restricted
Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
Before, this was done on a much smaller scope in GetDependencies. Now this check is performed in the outermost call site, so that more places are covered.

Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
Signed-off-by: Cornelius Weig <22861411+corneliusweig@users.noreply.github.com>
@corneliusweig
Copy link
Copy Markdown
Contributor Author

@dgageot no problem, the rebase is done already. Thanks for seeing this through!

@tejal29 tejal29 added the kokoro:run runs the kokoro jobs on a PR label Jun 6, 2019
@kokoro-team kokoro-team removed the kokoro:run runs the kokoro jobs on a PR label Jun 6, 2019
@dgageot dgageot dismissed nkubala’s stale review June 7, 2019 08:36

your review was taken into account

@dgageot dgageot merged commit 6cb1d51 into GoogleContainerTools:master Jun 7, 2019
@corneliusweig corneliusweig deleted the sync/step-2 branch June 7, 2019 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants