Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server data race #4790

Merged
merged 6 commits into from
Dec 9, 2020
Merged

Server data race #4790

merged 6 commits into from
Dec 9, 2020

Conversation

a-palchikov
Copy link
Contributor

@a-palchikov a-palchikov commented Nov 11, 2020

This PR fixes the data race I spotted the RoundRobin test.

What's interesting with this changeset is that it prompted a change in gogo/protobuf which I implemented in a temporary fork until the issue is resolved upstream.
I've submitted a PR upstream.

google.protobuf.Timestamp Expires = 5 [
    (gogoproto.stdtime) = true,   // specifically this is causing issues with reflect-based cloning
    (gogoproto.nullable) = false,
    (gogoproto.jsontag) = "started,omitempty"
];

Basically, protobuf package provides a reflect-based implementation of Clone. Attempting it on the Server type unfortunately in the version v1.3.1 results in:

Logs
go test -race ./lib/services/... -check.f=ServerDeepCopy

----------------------------------------------------------------------
PANIC: services_test.go:105: ServicesSuite.TestServerDeepCopy

... Panic: reflect: reflect.flag.mustBeAssignable using value obtained using unexported field (PC=0x4674F2)

/home/deemok/dev/go/go/src/runtime/panic.go:975
  in gopanic
/home/deemok/dev/go/go/src/reflect/value.go:244
  in flag.mustBeAssignableSlow
/home/deemok/dev/go/go/src/reflect/value.go:234
  in flag.mustBeAssignable
/home/deemok/dev/go/go/src/reflect/value.go:1526
  in Value.Set
/home/deemok/go/src/github.com/gravitational/teleport/vendor/github.com/gogo/protobuf/proto/clone.go:161
  in mergeAny
/home/deemok/go/src/github.com/gravitational/teleport/vendor/github.com/gogo/protobuf/proto/clone.go:108
  in mergeStruct
/home/deemok/go/src/github.com/gravitational/teleport/vendor/github.com/gogo/protobuf/proto/clone.go:240
  in mergeAny
/home/deemok/go/src/github.com/gravitational/teleport/vendor/github.com/gogo/protobuf/proto/clone.go:203
  in mergeAny
/home/deemok/go/src/github.com/gravitational/teleport/vendor/github.com/gogo/protobuf/proto/clone.go:108
  in mergeStruct
/home/deemok/go/src/github.com/gravitational/teleport/vendor/github.com/gogo/protobuf/proto/clone.go:98
  in Merge
/home/deemok/go/src/github.com/gravitational/teleport/vendor/github.com/gogo/protobuf/proto/clone.go:151
  in mergeAny
/home/deemok/go/src/github.com/gravitational/teleport/vendor/github.com/gogo/protobuf/proto/clone.go:108
  in mergeStruct
/home/deemok/go/src/github.com/gravitational/teleport/vendor/github.com/gogo/protobuf/proto/clone.go:98
  in Merge
/home/deemok/go/src/github.com/gravitational/teleport/vendor/github.com/gogo/protobuf/proto/clone.go:52
  in Clone
server.go:363
  in ServerV2.DeepCopy
services_test.go:160
  in ServicesSuite.TestServerDeepCopy
/home/deemok/dev/go/go/src/reflect/value.go:321
  in Value.Call
/home/deemok/dev/go/go/src/runtime/asm_amd64.s:1373
  in goexit
OOPS: 0 passed, 1 PANICKED
--- FAIL: TestServices (0.00s)
FAIL
FAIL	github.com/gravitational/teleport/lib/services	0.133s
ok  	github.com/gravitational/teleport/lib/services/local	0.053s
ok  	github.com/gravitational/teleport/lib/services/suite	0.071s
FAIL

Before the change:

Output
$ go test -race -count=5 ./lib/srv/regular/... -check.f=RoundRobin
WARNING: Error cleaning up temporaries: unlinkat /tmp/check-4324745483838182873/1/log/upload/streaming/default: directory not emptyOK: 1 passed
==================
WARNING: DATA RACE
Read at 0x00c000406360 by goroutine 39:
  github.com/gravitational/teleport/lib/utils.UTC()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/utils/time.go:40 +0x4e
  github.com/gravitational/teleport/lib/services.(*Metadata).CheckAndSetDefaults()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/services/resource.go:728 +0x28c
  github.com/gravitational/teleport/lib/services.(*ServerV2).CheckAndSetDefaults()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/services/server.go:344 +0x56
  github.com/gravitational/teleport/lib/services.(*TeleportServerMarshaler).MarshalServer()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/services/server.go:897 +0x59
  github.com/gravitational/teleport/lib/reversetunnel.marshalDiscoveryRequest()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/reversetunnel/discovery.go:65 +0x112
  github.com/gravitational/teleport/lib/reversetunnel.(*remoteConn).sendDiscoveryRequest()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/reversetunnel/conn.go:238 +0xb1
  github.com/gravitational/teleport/lib/reversetunnel.(*localSite).handleHeartbeat()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/reversetunnel/localsite.go:372 +0xf00

Previous write at 0x00c000406360 by goroutine 164:
  github.com/gravitational/teleport/lib/utils.UTC()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/utils/time.go:45 +0x158
  github.com/gravitational/teleport/lib/services.(*Metadata).CheckAndSetDefaults()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/services/resource.go:728 +0x28c
  github.com/gravitational/teleport/lib/services.(*ServerV2).CheckAndSetDefaults()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/services/server.go:344 +0x56
  github.com/gravitational/teleport/lib/services.(*TeleportServerMarshaler).MarshalServer()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/services/server.go:897 +0x59
  github.com/gravitational/teleport/lib/reversetunnel.marshalDiscoveryRequest()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/reversetunnel/discovery.go:65 +0x112
  github.com/gravitational/teleport/lib/reversetunnel.(*remoteConn).sendDiscoveryRequest()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/reversetunnel/conn.go:238 +0xb1
  github.com/gravitational/teleport/lib/reversetunnel.(*localSite).handleHeartbeat()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/reversetunnel/localsite.go:372 +0xf00

Goroutine 39 (running) created at:
  github.com/gravitational/teleport/lib/reversetunnel.(*server).handleNewService()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/reversetunnel/srv.go:659 +0x343
  github.com/gravitational/teleport/lib/reversetunnel.(*server).handleHeartbeat()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/reversetunnel/srv.go:627 +0x664
  github.com/gravitational/teleport/lib/reversetunnel.(*server).HandleNewChan()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/reversetunnel/srv.go:564 +0x47a

Goroutine 164 (running) created at:
  github.com/gravitational/teleport/lib/reversetunnel.(*server).handleNewService()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/reversetunnel/srv.go:659 +0x343
  github.com/gravitational/teleport/lib/reversetunnel.(*server).handleHeartbeat()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/reversetunnel/srv.go:627 +0x664
  github.com/gravitational/teleport/lib/reversetunnel.(*server).HandleNewChan()
      /home/deemok/go/src/github.com/gravitational/teleport/lib/reversetunnel/srv.go:564 +0x47a
==================
WARNING: Error cleaning up temporaries: unlinkat /tmp/check-2740103009342231109/1/log/upload/streaming/default: directory not emptyOK: 1 passed
--- FAIL: TestRegular (4.06s)
    testing.go:906: race detected during execution of test
WARNING: Error cleaning up temporaries: unlinkat /tmp/check-8995016276575641803/1/log/upload/streaming/default: directory not emptyOK: 1 passed
OK: 1 passed
WARNING: Error cleaning up temporaries: unlinkat /tmp/check-898860202204764712/1/log/upload/streaming/default: directory not emptyOK: 1 passed
FAIL
FAIL	github.com/gravitational/teleport/lib/srv/regular	23.328s
FAIL

After the change:

Output
$ go test -race -count=10 ./lib/srv/regular/... -check.f=RoundRobin
ok  	github.com/gravitational/teleport/lib/srv/regular	45.447s

Fixes #4781.

Copy link
Contributor

@awly awly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you link the upstream patch please?
Let's wait a week or two, maybe it'll get accepted and we won't end up with yet another dependency fork.

lib/reversetunnel/agent.go Outdated Show resolved Hide resolved
lib/reversetunnel/discovery.go Show resolved Hide resolved
lib/reversetunnel/agent.go Outdated Show resolved Hide resolved
lib/reversetunnel/srv.go Outdated Show resolved Hide resolved
lib/services/server.go Outdated Show resolved Hide resolved
lib/services/services_test.go Outdated Show resolved Hide resolved
lib/sshutils/server.go Outdated Show resolved Hide resolved
@a-palchikov a-palchikov force-pushed the dmitri/4653-server-data-race branch 3 times, most recently from b35a4f7 to 997bbc9 Compare November 25, 2020 11:31
@a-palchikov a-palchikov force-pushed the dmitri/4653-server-data-race branch 2 times, most recently from 94c1b5b to 1ed53d1 Compare December 7, 2020 15:19
return
}
defer conn.Close()

// Successfully connected to remote cluster.
a.Infof("Connected to %s", conn.RemoteAddr())
a.log.WithField("addr", conn.LocalAddr().String()).WithField("remote-addr", conn.RemoteAddr().String()).Info("Connected.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use WithFields to specify multiple fields at once

if !ok {
return
}
*r = *s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proto.Merger docs say:

	// Merge merges the contents of src into the receiver message.
	// It clones all data structures in src such that it aliases no mutable
	// memory referenced by src.

There are 2 possible issues:

  • if r.Foo is set and s.Foo is empty, I assume that r.Foo value should be preserved
  • if Rotation ever gets a pointer field, we break the promise of not keeping mutable memory references

I don't fully understand why we need to implement proto.Merger. Does proto.Clone not work on Rotation otherwise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you link to the docs you reference, btw?
I know this is not the complete Merger implementation - it implements enough to support the Clone use-case so it assumes that the receiver is zero value. Unfortunately I was not able to find a concise way to implement it to fulfill the Merger requirements you described and yet be able to avoid exhaustive manual tests of each attribute which is exactly why I opted for Clone.
Also, the Merger interface used here has a somewhat less strict requirement:

// Merge merges src into this message.
// Required and optional fields that are set in src will be set to that value in dst.
// Elements of repeated fields will be appended.
// ...

As to why I implemented Merger also for Rotation - exactly for the same reason as for Metadata - b/c it was not working with reflect due to use of time.Time which sports unexported fields.

Copy link
Contributor

@awly awly Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I was looking at the regular protobuf docs, not gogoproto: https://github.com/golang/protobuf/blob/master/proto/proto.go#L99-L101

I think this merger implementation will cause problems later on, when it starts getting hit from non-Clone use-cases for whatever reason.
For now, it's OK to merge to fix the race, but leave a comment explaining that it's a partial implementation. Next person reading this code will appreciate the background :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data race in lib/srv/regular tests
2 participants