Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Use of Kthread-blocking Epoll with Poller Read, Remove Per-Event LStats on Linux #433 #434

Merged

Conversation

horahoradev
Copy link
Member

@horahoradev horahoradev commented Feb 28, 2022

NOTE: Please do not open a pull request that adds or changes the public API without giving consideration to all supported operating systems.

What does this pull request do?

This PR:

  1. replaces the use of a raw kthread-blocking epoll call for monitoring inotify events with a blocking Read() call using the Go runtime's poller system. This will allow the individual goroutine to block while waiting for I/O, rather than needing to block an entire kernel thread (see: Linux inotify poller wastes an OS thread #240)
  2. Removes Linux per-event uses of LStat to check that the file still exists before passing the event to the user (see: emits CREATE when it should have been REMOVE #404)

Change #1

Where should the reviewer start?

The changes themselves are fairly trivial, but the concepts are not, so I'd recommend familiarizing yourself with the conceptual background. I've provided a few sources with notes on takeaways, and my explanation and rundown of the Go source code as it pertains to this change.

References
  1. os.File.Read() does not return after os.File.Close() and os.File.Fd() golang/go#26439
  • calling Fd() will prevent Read() from returning after close
  • one supported workflow for performing I/O on file descirptors is: setting the FD to non-blocking mode, then passing the FD to os.NewFile (this is what I ended up doing)
  1. os: add a function to construct *File with non-blocking file descriptor golang/go#22939
  • if a file passed to NewFile is in non-blocking mode, then it will be added to the runtime I/O poller
  1. https://morsmachine.dk/netpoller
  • Socket i/O in Golang uses the netpoller mechanism, which handles I/O events in a separate thread, and enables I/O to block the goroutine rather than the entire kernel thread
  • uses platform-specific async I/O mechanisms (epoll, kqueue, etc)
  • this is socket-specific, but normal FS I/O is probably handled the same way
Golang I/O Rundown

In Golang, I/O is blocking, but we can't directly use blocking syscalls, as this would block whichever kernel thread the Goroutine was running in. As a solution, Go I/O is performed via the internal poller package (https://pkg.go.dev/internal/poll), which parks goroutines while waiting for I/O, and uses the platform-specific asynchronous I/O mechanism (e.g. epoll) to detect FDs for which I/O can be performed.

I went through the code to make sure that this is the case, here's a small rundown:
Here's the implementation of read in *os.FIle:

// Read reads up to len(b) bytes from the File and stores them in b.
// It returns the number of bytes read and any error encountered.
// At end of file, Read returns 0, io.EOF.
func (f *File) Read(b []byte) (n int, err error) {
	if err := f.checkValid("read"); err != nil {
		return 0, err
	}
	n, e := f.read(b)
	return n, f.wrapErr("read", e)
}

we can see this method calls .read(), which is defined as follows for posix:

// read reads up to len(b) bytes from the File.
// It returns the number of bytes read and an error, if any.
func (f *File) read(b []byte) (n int, err error) {
	n, err = f.pfd.Read(b)
	runtime.KeepAlive(f)
	return n, err
}

It calls pfd.Read(), which we can see refers to poll.FD:

// File represents an open file descriptor.
type File struct {
	*file // os specific
}
// file is the real representation of *File.
// The extra level of indirection ensures that no clients of os
// can overwrite this data, which could cause the finalizer
// to close the wrong file descriptor.
type file struct {
	pfd         poll.FD
	name        string
	dirinfo     *dirInfo // nil unless directory being read
	nonblock    bool     // whether we set nonblocking mode
	stdoutOrErr bool     // whether this is stdout or stderr
	appendMode  bool     // whether file is opened for appending
}

The implementation of poll.FD's read for unix is as follows:

// Read implements io.Reader.
func (fd *FD) Read(p []byte) (int, error) {
	if err := fd.readLock(); err != nil {
		return 0, err
	}
	defer fd.readUnlock()
	if len(p) == 0 {
		// If the caller wanted a zero byte read, return immediately
		// without trying (but after acquiring the readLock).
		// Otherwise syscall.Read returns 0, nil which looks like
		// io.EOF.
		// TODO(bradfitz): make it wait for readability? (Issue 15735)
		return 0, nil
	}
	if err := fd.pd.prepareRead(fd.isFile); err != nil {
		return 0, err
	}
	if fd.IsStream && len(p) > maxRW {
		p = p[:maxRW]
	}
	for {
		n, err := ignoringEINTRIO(syscall.Read, fd.Sysfd, p)
		if err != nil {
			n = 0
			if err == syscall.EAGAIN && fd.pd.pollable() {
				if err = fd.pd.waitRead(fd.isFile); err == nil {
					continue
				}
			}
		}
		err = fd.eofError(n, err)
		return n, err
	}
}

We first execute the read syscall. If the file descriptor is non-blocking, then it will return quickly with EAGAIN and wait for waitRead to return. If it's blocking, then it'll cause the current kernel thread to sleep, and will not call waitRead(), as the fd isn't pollable and doesn't return EAGAIN.

waitRead is as follows:

func (pd *pollDesc) waitRead(isFile bool) error {
	return pd.wait('r', isFile)
}

which calls pd.Wait:

func (pd *pollDesc) wait(mode int, isFile bool) error {
	if pd.runtimeCtx == 0 {
		return errors.New("waiting for unsupported file type")
	}
	res := runtime_pollWait(pd.runtimeCtx, mode)
	return convertErr(res, isFile)
}

which calls runtime_pollWait() (note that this is the internal/poll package)

// poll_runtime_pollWait, which is internal/poll.runtime_pollWait,
// waits for a descriptor to be ready for reading or writing,
// according to mode, which is 'r' or 'w'.
// This returns an error code; the codes are defined above.
//go:linkname poll_runtime_pollWait internal/poll.runtime_pollWait
func poll_runtime_pollWait(pd *pollDesc, mode int) int {
	errcode := netpollcheckerr(pd, int32(mode))
	if errcode != pollNoError {
		return errcode
	}
	// As for now only Solaris, illumos, and AIX use level-triggered IO.
	if GOOS == "solaris" || GOOS == "illumos" || GOOS == "aix" {
		netpollarm(pd, mode)
	}
	for !netpollblock(pd, int32(mode), false) {
		errcode = netpollcheckerr(pd, int32(mode))
		if errcode != pollNoError {
			return errcode
		}
		// Can happen if timeout has fired and unblocked us,
		// but before we had a chance to run, timeout has been reset.
		// Pretend it has not happened and retry.
	}
	return pollNoError
}

which calls netpollblock:

// returns true if IO is ready, or false if timedout or closed
// waitio - wait only for completed IO, ignore errors
// Concurrent calls to netpollblock in the same mode are forbidden, as pollDesc
// can hold only a single waiting goroutine for each mode.
func netpollblock(pd *pollDesc, mode int32, waitio bool) bool {
	gpp := &pd.rg
	if mode == 'w' {
		gpp = &pd.wg
	}

	// set the gpp semaphore to pdWait
	for {
		// Consume notification if already ready.
		if gpp.CompareAndSwap(pdReady, 0) {
			return true
		}
		if gpp.CompareAndSwap(0, pdWait) {
			break
		}

		// Double check that this isn't corrupt; otherwise we'd loop
		// forever.
		if v := gpp.Load(); v != pdReady && v != 0 {
			throw("runtime: double wait")
		}
	}

	// need to recheck error states after setting gpp to pdWait
	// this is necessary because runtime_pollUnblock/runtime_pollSetDeadline/deadlineimpl
	// do the opposite: store to closing/rd/wd, publishInfo, load of rg/wg
	if waitio || netpollcheckerr(pd, mode) == pollNoError {
		gopark(netpollblockcommit, unsafe.Pointer(gpp), waitReasonIOWait, traceEvGoBlockNet, 5)
	}
	// be careful to not lose concurrent pdReady notification
	old := gpp.Swap(0)
	if old > pdWait {
		throw("runtime: corrupted polldesc")
	}
	return old == pdReady
}

I'll assume that the comment, " returns true if IO is ready, or false if timedout or closed" is true. The current goroutine is parked until I/O is ready.

gpp is set to ready within the function netpollunblock, which is called by netpollready :

// netpollready is called by the platform-specific netpoll function.
// It declares that the fd associated with pd is ready for I/O.
// The toRun argument is used to build a list of goroutines to return
// from netpoll. The mode argument is 'r', 'w', or 'r'+'w' to indicate
// whether the fd is ready for reading or writing or both.
//
// This may run while the world is stopped, so write barriers are not allowed.
//go:nowritebarrier
func netpollready(toRun *gList, pd *pollDesc, mode int32) {
	var rg, wg *g
	if mode == 'r' || mode == 'r'+'w' {
		rg = netpollunblock(pd, 'r', true)
	}
	if mode == 'w' || mode == 'r'+'w' {
		wg = netpollunblock(pd, 'w', true)
	}
	if rg != nil {
		toRun.push(rg)
	}
	if wg != nil {
		toRun.push(wg)
	}
}

netpollready is called by the platform-specific implementation of netpoll(); here's the implementation for Linux, which uses epoll to determine which FDs are ready for I/O, and notifies the corresponding poll descriptor.

// netpoll checks for ready network connections.
// Returns list of goroutines that become runnable.
// delay < 0: blocks indefinitely
// delay == 0: does not block, just polls
// delay > 0: block for up to that many nanoseconds
func netpoll(delay int64) gList {
	if epfd == -1 {
		return gList{}
	}
	var waitms int32
	if delay < 0 {
		waitms = -1
	} else if delay == 0 {
		waitms = 0
	} else if delay < 1e6 {
		waitms = 1
	} else if delay < 1e15 {
		waitms = int32(delay / 1e6)
	} else {
		// An arbitrary cap on how long to wait for a timer.
		// 1e9 ms == ~11.5 days.
		waitms = 1e9
	}
	var events [128]epollevent
retry:
	n := epollwait(epfd, &events[0], int32(len(events)), waitms)
	if n < 0 {
		if n != -_EINTR {
			println("runtime: epollwait on fd", epfd, "failed with", -n)
			throw("runtime: netpoll failed")
		}
		// If a timed sleep was interrupted, just return to
		// recalculate how long we should sleep now.
		if waitms > 0 {
			return gList{}
		}
		goto retry
	}
	var toRun gList
	for i := int32(0); i < n; i++ {
		ev := &events[i]
		if ev.events == 0 {
			continue
		}

		if *(**uintptr)(unsafe.Pointer(&ev.data)) == &netpollBreakRd {
			if ev.events != _EPOLLIN {
				println("runtime: netpoll: break fd ready for", ev.events)
				throw("runtime: netpoll: break fd ready for something unexpected")
			}
			if delay != 0 {
				// netpollBreak could be picked up by a
				// nonblocking poll. Only read the byte
				// if blocking.
				var tmp [16]byte
				read(int32(netpollBreakRd), noescape(unsafe.Pointer(&tmp[0])), int32(len(tmp)))
				atomic.Store(&netpollWakeSig, 0)
			}
			continue
		}

		var mode int32
		if ev.events&(_EPOLLIN|_EPOLLRDHUP|_EPOLLHUP|_EPOLLERR) != 0 {
			mode += 'r'
		}
		if ev.events&(_EPOLLOUT|_EPOLLHUP|_EPOLLERR) != 0 {
			mode += 'w'
		}
		if mode != 0 {
			pd := *(**pollDesc)(unsafe.Pointer(&ev.data))
			pd.setEventErr(ev.events == _EPOLLERR)
			netpollready(&toRun, pd, mode)
		}
	}
	return toRun
}

How should this be manually tested?

I think that the changes I've made are fairly easy to reason about, and automated tests should validate that the behavior is correct. I tested this change on Linux, which is the affected platform. I did not test on other platforms.

Change #2

Where should the reviewer start?

I would read #404. My argument is that since inotify guarantees that events will be sent in order, and by the time the client receives the event for which we're calling lstat the file may have been removed anyway, there's no need to perform LStat for so many file events.

How should this be manually tested?

I think the automated tests should cover it. I tested this change on Linux, which is the affected platform. I did not test on other platforms.

@nshalman
Copy link
Contributor

nshalman commented Mar 2, 2022

Thank you!
I'm not an expert on inotify and haven't done a close read of your changes yet, but I have a couple of questions that I didn't see addressed by your otherwise very thorough writeup.

Does this change imply a newer minimum version of Linux than the previous implementation? If so, do you know what version that would be?

Is there any way to add a test (ideally an integration test that runs on all platforms) that fails before this change is applied and that passes after applying it? Perhaps rather than the usual test pattern of spinning up a goroutine to consume events, explicitly do all the work to queue up the events in the kernel before then consuming all of them afterwards in a single thread?

Thank you for digging into this issue! I will try to find time to review these changes if some other linux expert doesn't get to it first.

@horahoradev
Copy link
Member Author

I don't see any reason why this change would increase the minimum Linux version. My changes as it relates to the kernel's API would be:

  1. Removed use of epoll (this is not an issue)
  2. I added a call to change a FD to non-blocking... I'm 99.9% this isn't an issue. I'll need to dig up the man page for fcntl on 2.6.37 (the minimum Linux version we support) and see if it includes non-blocking IO

The changes I've made would only be relevant for Linux; other platforms aren't affected, so I wouldn't be able to add a test for them, but I can definitely add one for Linux. My initial idea is to monitor whether the kernel thread blocks too often on epoll... but I'll need to think about it some more.

@nshalman nshalman added the linux label Mar 3, 2022
@horahoradev horahoradev force-pushed the switch_to_poller_read_and_drop_linux_lstats branch 15 times, most recently from 2b35b32 to c016463 Compare March 6, 2022 06:18
@horahoradev
Copy link
Member Author

horahoradev commented Mar 6, 2022

Here's my new test:

func TestINotifyNoBlockingSyscalls(t *testing.T) {
	if os.Getenv("KTHREAD_TEST") == "" {
		t.Skip("Skipping kernel thread count test")
	}

	getThreads := func() int {
		cmd := fmt.Sprintf("ls /proc/%d/task | wc -l", os.Getpid())
		output, err := exec.Command("/bin/bash", "-c", cmd).Output()
		if err != nil {
			t.Fatalf("Failed to execute command to check number of threads, err %s", err)
		}

		n, err := strconv.ParseInt(strings.Trim(string(output), "\n"), 10, 64)
		if err != nil {
			t.Fatalf("Failed to parse output as int, err: %s", err)
		}
		return int(n)
	}

	startingThreads := getThreads()

	for i := 0; i <= 30; i++ {
		testDir := tempMkdir(t)
		defer os.RemoveAll(testDir)

		w, err := NewWatcher()
		if err != nil {
			t.Fatalf("Failed to create watcher: %v", err)
		}

		w.Add(testDir)
	}

	// Bad synchronization mechanism
	time.Sleep(time.Second * 2)

	endingThreads := getThreads()

	// Did we spawn a significant number of new threads?
	if diff := endingThreads - startingThreads; diff >= 10 {
		t.Fatalf("Expected diff<10 but got %v. starting: %v. ending: %v", diff, startingThreads, endingThreads)
	}

	// Is the Go runtime spawning many more threads than the configured maximum number of threads for user code?
	if maxprocs := runtime.GOMAXPROCS(0) + 10; maxprocs <= endingThreads {
		t.Fatalf("Expected %v threads (max procs) but got %v", maxprocs, endingThreads)
	}
}

The gist is that we try to spawn 30 watchers, and see if:

  1. the number of kernel threads increases over the course of the test
  2. the number of kernel threads is significantly higher than GOMAXPROCS (which is the configured limit on the number of kernel threads handling user code)

From the runtime package docs:

The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit. This package's GOMAXPROCS function queries and changes the limit.

GOMAXPROCS defaults to the number of cores, so if we have 16 logical cores and 200 goroutines, we can expect that the Go runtime will spawn at most 16 threads for those user goroutines to run on, unless those goroutines have blocking system calls, which may create additional threads. Previously, we were invoking a raw blocking epoll, which led to the creation of additional threads. As I understand it, there are some additional threads spawned by the Go runtime (e.g. for the netpoller), so the test verifies that the number of threads is no more than 10 above GOMAXPROCS.

Succeeds with my changes: https://github.com/horahoradev/fsnotify/runs/5441488327
Fails without my changes: https://github.com/horahoradev/fsnotify/runs/5441480547 (creates ~37 kernel threads)

I had to run this test in a separate process. For some reason, the existing test suite creates a large number of kernel threads in the Github action test environment (~60-120), even with my changes. I think that warrants investigation, but it's a separate issue. This might be due to something like the networked filesystem present in the test environment, but I wasn't able to reproduce this number of kernel threads by injecting delays into syscall returns using strace.

To resolve questions about whether platform support has changed, I've created #435 , and will attempt to resolve that issue in this PR.

@horahoradev horahoradev force-pushed the switch_to_poller_read_and_drop_linux_lstats branch 11 times, most recently from df9108d to c31e638 Compare March 6, 2022 22:10
@horahoradev horahoradev force-pushed the switch_to_poller_read_and_drop_linux_lstats branch from b423b19 to f189093 Compare March 6, 2022 23:55
@horahoradev
Copy link
Member Author

I've added a new Github action which runs the tests on Debian 6, whose kernel version is 2.6.32. Let me know what you think; this isn't exactly the minimum version we support, but it's reasonably close. I think we can be confident that this change doesn't break platform support.

@horahoradev horahoradev force-pushed the switch_to_poller_read_and_drop_linux_lstats branch from f189093 to 520c763 Compare March 26, 2022 21:10
@horahoradev horahoradev force-pushed the switch_to_poller_read_and_drop_linux_lstats branch 4 times, most recently from b6ad2a3 to f4a1bb0 Compare April 3, 2022 23:36
@horahoradev
Copy link
Member Author

horahoradev commented Apr 3, 2022

I made my test better, so now the improvement is pretty easy to see. In summary, I create one watcher, then spawn 60 watcher.readEvents() goroutines, and see if that yields any (>0) new kthreads. This passes with my changes, but on main:

--- FAIL: TestINotifyNoBlockingSyscalls (2.01s)
    inotify_test.go:493: Got a nonzero diff 41. starting: 27. ending: 68
FAIL
FAIL    github.com/fsnotify/fsnotify    12.084s
FAIL

It spawns 41 new kernel threads, whereas my changes spawn 0. This works because previously readEvents had the blocking read syscall, so spawning 60 readEvents() goroutines will result in ~60ish new kernel threads.

Sorry for the wall of text, I think I'm done now. Ready for review!

@arp242 arp242 force-pushed the switch_to_poller_read_and_drop_linux_lstats branch from f4a1bb0 to 066bc38 Compare July 23, 2022 05:24
arp242 added a commit that referenced this pull request Jul 23, 2022
- Test all GOOS/GOARCH combinations by looping over "go tool dist list";
this just tests if it compiles.

- Add a Vagrant box to test Debian 6 / Linux 2.6.32; this was adapted
from @horahoradev's patch at #434.

- Update the minimum version requirements: we test Linux 2.6.32 now and
  turns out that's also the minimum version [Go supports] in recent
  releases, so just set it to that.

  Need Go 1.16 for retract in go.mod. I don't know, maybe we can just
  remove the retract? Latest Debian ships with Go 1.15.

  [Go supports]: golang/go#45964

- Test both Go 1.16 and 1.18 (the lowest supported version and newest
  version). I guess we could also test 1.17, but the CI already takes
  somewhat long and I can't recall ever having a situation where an
  intermediate version failed.

- Test macOS 11 and 12; macOS 10.15 will (probably) end support in
  November, so probably not worth supporting this. GitHub will [remove]
  support for this at the end of August.

  [remove]: https://github.blog/changelog/2022-07-20-github-actions-the-macos-10-15-actions-runner-image-is-being-deprecated-and-will-be-removed-by-8-30-22/

- Test OpenBSD, NetBSD.

- Move "lint" to its own YAML file.

Future work:

- Actually run tests for all systems Go supports. Bit pointless right
  now as many of these don't do anything. Currently untested are
  Solaris, illumios, plan9, AIX, Android, iOS, DragonflyBSD, WASM.

  Some of these might be difficult (AIX, iOS, Android), but haven't
  looked in to it. I tried setting up Solaris with the
  vmactions/solaris, but it doesn't seem to have an easy way to install
  Go.

- GitHub only [supports] Windows Server 2019 and 2022; probabably also
  want to test Server 2016, but GitHub dropped support for this. Can
  maybe use AppVeyor(?)

  [supports]: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#choosing-github-hosted-runners

- Could maybe test older versions of BSD, too. Not sure of it's worth
  it.
inotify.go Outdated
if e.Op&Create == Create || e.Op&Write == Write {
_, statErr := os.Lstat(e.Name)
return os.IsNotExist(statErr)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the epoll replacement and removal of the lstat are two unrelated things?
Or are they related somehow?

I'm a bit nervous about removing it, since I assume it was added for a reason to solve a specific issue. Looks like it was added in 2013:

cc2c34e#diff-b5e2484a4976be237475e7872e566a59aac5d8cc82223bdba01dbd61c2d69558R243

Issue 36 refers to this issue on the old repo, which mentions it fixes a memory leak: howeyc/fsnotify#36

I think it would be better to remove it here and create a new PR, so we can focus on just one thing, which is already a large change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're unrelated, I just didn't see the purpose of using lstat here, since it provides no real guarantees, and figured I'd remove it too. I can move that change to a separate PR, np there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just didn't see the purpose of using lstat here, since it provides no real guarantees

I agree; I also don't see the purpose (and think the current behaviour is confusing), which is why I'm hesitant to "just remove it". Chesterton's fence and all of that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #470 for this

@arp242
Copy link
Member

arp242 commented Jul 23, 2022

NB: I rebased this branch on the latest main and removed the Debian 6/Vagrant tests for now; I added those in #469.

I think this mostly looks good, except for my comment about the Lstat. What I don't understand is why the epoll approach was done in the first place, since using IN_NONBLOCK is so much simpler. Basically, it looks "too good to be true", so I'm wondering if I'm missing something 🤔

Maybe IN_NONBLOCK was added later? Or maybe the original author simply didn't realize it could be done like this (I've written "useless" heaps of code like this myself that could be replaced with something much simpler)?

@arp242
Copy link
Member

arp242 commented Jul 23, 2022

The epoll method was added in #66, which has next to no information on why this approach was chosen. In #1 and #5 IN_NONBLOCK is mentioned as "If we use IN_NONBLOCK fsnotify would require Linux 2.6.27." and "I noticed that inotify_init1() has a IN_NONBLOCK option in Linux 2.6.27+ (Go 1.3 requires Linux 2.6.23)". The old repo has a reference to it as well.

That's all the context I could find.

@horahoradev
Copy link
Member Author

The epoll code was written in ~2015, but Golang didn't add the ability to register file descriptors (as returned by raw syscalls, like inotify_init) with Go's runtime poller until years later. See the issues referenced in #240 . I think that's the main reason: we're taking advantage of functionality that didn't exist when the code was originally written.

@horahoradev
Copy link
Member Author

PR amended as suggested

inotify_test.go Outdated
if err != nil {
t.Fatalf("Failed to parse output as int, err: %s", err)
}
return int(n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be identical, and should be more portable; even on Linux /bin/bash isn't 100% guaranteed. For example my Alpine system doesn't have it.

getThreads := func() int {
    d := fmt.Sprintf("/proc/%d/task", os.Getpid())
    ls, err := os.ReadDir(d)
    if err != nil {
        t.Fatalf("reading %q: %s", d, err)
    }
    return len(ls)
}

It will also makes things a bit faster too as it doesn't spawn a whole bunch of processes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated it and merged; thanks!

arp242 added a commit that referenced this pull request Jul 24, 2022
* Test some more things in CI

- Test all GOOS/GOARCH combinations by looping over "go tool dist list";
this just tests if it compiles.

- Add a Vagrant box to test Debian 6 / Linux 2.6.32; this was adapted
from @horahoradev's patch at #434.

- Update the minimum version requirements: we test Linux 2.6.32 now and
  turns out that's also the minimum version [Go supports] in recent
  releases, so just set it to that.

  Need Go 1.16 for retract in go.mod. I don't know, maybe we can just
  remove the retract? Latest Debian ships with Go 1.15.

  [Go supports]: golang/go#45964

- Test both Go 1.16 and 1.18 (the lowest supported version and newest
  version). I guess we could also test 1.17, but the CI already takes
  somewhat long and I can't recall ever having a situation where an
  intermediate version failed.

- Test macOS 11 and 12; macOS 10.15 will (probably) end support in
  November, so probably not worth supporting this. GitHub will [remove]
  support for this at the end of August.

  [remove]: https://github.blog/changelog/2022-07-20-github-actions-the-macos-10-15-actions-runner-image-is-being-deprecated-and-will-be-removed-by-8-30-22/

- Test OpenBSD, NetBSD.

- Move "lint" to its own YAML file.

Future work:

- Actually run tests for all systems Go supports. Bit pointless right
  now as many of these don't do anything. Currently untested are
  Solaris, illumios, plan9, AIX, Android, iOS, DragonflyBSD, WASM.

  Some of these might be difficult (AIX, iOS, Android), but haven't
  looked in to it. I tried setting up Solaris with the
  vmactions/solaris, but it doesn't seem to have an easy way to install
  Go.

- GitHub only [supports] Windows Server 2019 and 2022; probabably also
  want to test Server 2016, but GitHub dropped support for this. Can
  maybe use AppVeyor(?)

  [supports]: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#choosing-github-hosted-runners

- Could maybe test older versions of BSD, too. Not sure of it's worth
  it.

* Skip TestFsnotifyMultipleOperations on NetBSD

This test doesn't seem to work; I'm sure it passed at least once in the
CI, but locally I can't seem to get it to work.

Fails with:

	=== RUN   TestFsnotifyMultipleOperations
    	integration_test.go:114: event received: "/tmp/fsnotify2194826986/TestFsnotifySeq.testfile": CREATE
    	integration_test.go:114: event received: "/tmp/fsnotify2194826986/TestFsnotifySeq.testfile": WRITE
    	integration_test.go:114: event received: "/tmp/fsnotify2194826986/TestFsnotifySeq.testfile": REMOVE|RENAME
    	integration_test.go:114: event received: "/tmp/fsnotify2194826986/TestFsnotifySeq.testfile": CREATE
    	integration_test.go:186: incorrect number of rename+delete events received after 500 ms (2 vs 1)
    	integration_test.go:114: event received: "/tmp/fsnotify2194826986/TestFsnotifySeq.testfile": REMOVE
    	integration_test.go:114: event received: "/tmp/fsnotify2194826986": REMOVE|WRITE

For reference, this is the output on Linux and OpenBSD (the output is
identical):

	=== RUN   TestFsnotifyMultipleOperations
    	integration_test.go:114: event received: "/tmp/fsnotify989736723/TestFsnotifySeq.testfile": CREATE
    	integration_test.go:114: event received: "/tmp/fsnotify989736723/TestFsnotifySeq.testfile": WRITE
    	integration_test.go:114: event received: "/tmp/fsnotify989736723/TestFsnotifySeq.testfile": RENAME
    	integration_test.go:114: event received: "/tmp/fsnotify989736723/TestFsnotifySeq.testfile": CREATE
    	integration_test.go:190: calling Close()
    	integration_test.go:192: waiting for the event channel to become closed...
    	integration_test.go:195: event channel closed

* Fix "too many open files" on Debian 6, maybe

I guess this started failing after I rebased on main; let's see if a
small sleep works to clean up the file descriptors.
@arp242 arp242 force-pushed the switch_to_poller_read_and_drop_linux_lstats branch from 34f2c19 to b6ecf6d Compare July 24, 2022 10:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants