Introduce ErrorContext for tracking errors across threads #83

kunalmohan · 2020-12-04T14:35:49Z

fix #61
Tracks and stores function names that send messages across threads. This information will then be passed to PanicInfo as payload for displaying at panic.

kunalmohan · 2020-12-04T14:38:49Z

This is still a WIP. We now need to replace unwrap() with expect() to pass the info to PanicInfo.

Some part of the new code here has introduced more race conditions causing tests to fail intermittently- they pass on increasing time b/w snapshots.

imsnif · 2020-12-05T13:41:55Z

Hey @kunalmohan - good start on this.

I think, as you point out yourself and we start seeing in the implementation, that any solution forcing us to pass the error context very deep into the stack of functions in the thread itself would be very fragile and hard to maintain. This would mean that any developer who wants to add things would need to be aware of the inner workings of this error mechanism. And worst yet - remember to do things differently. :) This is of course true to being unable to use unwrap and the like.

When I think about it, the only thing we really need in addition to the backtrace of the final thread is knowing which threads the error passed through and which instructions (be it ScreenInstruction, PtyInstruction, etc.) it had in those threads. Do you think we might be able to log some sort of OpenCalls struct globally and match against it when we display the error? I'm not sure what the exact implementation would be for it, but I have a feeling we would be able to do more-or-less exact matching through information we have in the error backtrace. What do you think? Does this make sense? Did I manage to explain my general idea well enough?

kunalmohan · 2020-12-05T15:32:21Z

Yep, basically logging the send() calls is the main idea. As I see it we have two options here-

Simply log the send() calls with the message they are sending and the function name where the send() is called. Here we won't have to edit any function to accept any new argument. But the user will have to manually identify and filter out each call in the sequence that caused the error. Not very good option in my opinion.
Log the send() call (and other info) along with a unique sequence number assigned at the time of stdin (similar to how we create a new ErrorContext object currently in this PR. The sequence number will have to be passed along with the message between the threads. User will have to figure out the last call made in that sequence, which should be possible with the available backtrace. The user can then simply filter out the calls made in that particular sequence using the sequence number of that call. This would require us to edit just the functions that send any message to another thread.

I'm not sure if any of this matches to your idea of globally logging an OpenCalls struct. If not, could you elaborate your idea a bit more?

imsnif · 2020-12-06T11:21:35Z

To be honest, I'm more leaning toward option 1. I think most (all?) paths are rather unique and there will mostly be just one "open call" the user will have to choose from. I'd even say to just display the first one and assume there are no others. Or am I missing something and this doesn't make sense? (edit: ping @kunalmohan - forgot :) )

kunalmohan · 2020-12-06T12:08:59Z

This makes sense. We don't have a large number of threads that the user needs to navigate through so it should not be very difficult. And I think you're correct about the paths being unique so I guess we should go ahead with this.

Just to be very clear (before I start work on this), we'll only log the first open call, i.e. only the send() calls in the stdin_handler and the ipc_server threads. These will be logged to a new file (or the existing MOSAIC_TMP_LOG_FILE?). Should this be behind the debug flag or not?

imsnif · 2020-12-06T12:59:59Z

Hmm... maybe I didn't understand you correctly. I was thinking to log the previous calls in memory, and then dump them with the backtrace to screen somehow.

imsnif · 2020-12-06T13:03:21Z

So that in the end we'll have something like
stdin_thread(AcceptInput)->screen_thread(CloseActivePane)->pty_thread(ClosePane)-><backtrace_of_error_in_pty>

EDIT: forgot to add stdin thread :)

kunalmohan · 2020-12-06T14:11:40Z

Hmm... maybe I didn't understand you correctly. I was thinking to log the previous calls in memory, and then dump them with the backtrace to screen somehow.

Right, sorry, it slipped from my mind. I had just one concern with it. We'll use Arc<Mutex<OpenCalls>> to share it between threads. I was concerned with cases where Mutex lock gets poisoned in panic. But I guess that's rarely going to happen (and we always can switch to parking_lot::Mutex, if it does). So I'll go ahead with this.

I'd even say to just display the first one and assume there are no others.

Also, could you briefly explain what you meant by this?

imsnif · 2020-12-06T14:48:16Z

@kunalmohan - I think I might have been too general here in my thoughts and caused some confusion. Let's take a step back.

If we:

Pass an ErrorContext in every call we make (as in: every time we send an instruction to a different thread)
Attach the name of the instruction that we send so that this context contains a Vec of those instructions which we can attach to a backtrace later if there's an error
Keep that context globally in the thread
use that context in the panic hook (maybe sending it using another panic hook inside each thread somehow?)
in the main panic hook, get:

the message if there is one
the ErrorContext
the backtrace

Could we forgo having something global to the whole app shared with an App<Mutex> (because that will hurt performance)?

imsnif · 2020-12-06T15:00:32Z

I don't know if it will work, but maybe we can somehow use catch_unwind inside each thread to forward the ErrorContext to the main thread?

kunalmohan · 2020-12-06T16:33:43Z

@imsnif I guess I found a solution. We'll have a LocalKey for each thread.

thread_local!(static OpenCalls: RefCell<ErrorContext> = RefCell::new(ErrorContext::new());

When a thread sends a message to another thread, we'll clone the current ErrorContext of the thread and pass it to the other thread.
When a thread receives a new message, it will replace/update it's current ErrorContext in LocalKey with the one it has received with the new message. Hence at each point in time, the LocalKey of each thread would contain the context of the instruction that it is currently processing.
We can directly access this key in the panic hook and when a thread panics, the hook will automatically select the key of the thread which panicked.
I have tested it here.

imsnif · 2020-12-06T16:58:55Z

Interesting. I haven't used LocalKey before, so maybe you can help me make sure I understand: if one thread panics, the panic hook will access the LocalKey of the panicking thread, rather than the LocalKey of a different thread that did not panic? Or am I misunderstanding this?

kunalmohan · 2020-12-06T17:16:50Z

When accessing a LocalKey, the LocalKey of the currently executing thread is used.

Since we will be accessing the LocalKey from the panic hook, it will access the LocalKey of the panicking thread (just like thread::current() in panic hook returns the information of panicking thread).

In short, you are correct :)

imsnif · 2020-12-06T18:18:51Z

@kunalmohan - sounds like a great solution. Looking forward to see the implementation :)

kunalmohan · 2020-12-07T07:41:18Z

An example of how the context is displayed-

thread 'screen' panicked at 'removal index (is 0) should be < len (is 0)': library/alloc/src/vec.rs:1066
ErrorContext {
    calls: [
        "stdin_handler_thread(AcceptInput)",
        "pty_thread(SpawnTerminalVertically)",
        "stream_terminal_bytes(AsyncTask)",
        "screen_thread(HandlePtyEvent)",
    ],
}
   0: mosaic::errors::handle_panic
             at src/errors.rs:11:21
   1: mosaic::start::{{closure}}
             at src/main.rs:146:13
   2: std::panicking::rust_panic_with_hook
             at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:581:17
   3: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/panicking.rs:484:9
   4: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/std/src/sys_common/backtrace.rs:153:18

....

kunalmohan · 2020-12-07T07:48:12Z

@imsnif this is ready for the first round of review.

All tests are passing on my local machine. I guess again some race conditions are causing failures in CI.

imsnif

@kunalmohan - I think your solution with thread_local and OPENCALLS is very elegant. I could not have imagined anything simpler. Very cool :)

I left some comments about the implementation. We might have to do some back-and-forth about this, because this is a relatively big and very "deep" feature. I hope you can be a little patient with me. :)

Other than that, about displaying:
What do you think of adding some nice colors? Instead of

ErrorContext {
    calls: [
        "stdin_handler_thread(AcceptInput)",
        "pty_thread(SpawnTerminalVertically)",
        "stream_terminal_bytes(AsyncTask)",
        "screen_thread(HandlePtyEvent)",
    ],
}

something like:

Originating thread(s):
  1. stdin_handler_thread(purple): AcceptInput(green)
  2. pty_thread(purple): SpawnTerminalVertically(green)
  3. stream_terminal_bytes(purple): AsyncTask(green)
  4. screen_thread(purple): HandlePtyEvent(green)

Error: <message>(red)
  <stacktrace>(reset color)

src/main.rs

src/errors.rs

imsnif

Looking good! Added some more comments

src/errors.rs

src/input.rs

Cargo.lock

imsnif

Everything looks great. Thanks for being so patient (and quick!) with all my requests and suggestions. This is top notch work! I would be very happy to keep seeing you contributing and taking part in our project. :)

Implement ErrorContext for tracking errors across threads

cc96a61

kunalmohan marked this pull request as ready for review December 7, 2020 07:45

imsnif requested changes Dec 7, 2020

View reviewed changes

src/main.rs Outdated Show resolved Hide resolved

src/main.rs Outdated Show resolved Hide resolved

src/errors.rs Outdated Show resolved Hide resolved

kunalmohan added 6 commits December 7, 2020 23:26

reorder Instruction and ErrorContext

2fcfbbb

Add ContextType, AppContext, ScreenContext, PtyContext

50e4e68

Use ArrayVec in ErrorContext

e1d6b2f

increase MAX_THREAD_CALL_STACK to 6

56cf487

Custom implement Debug for ErrorContext ad ContextType and color output

fe97191

Use os_input instead of println!()

cb99090

kunalmohan requested a review from imsnif December 8, 2020 07:20

Use array instead of ArrayVec and some cleanup

fc88c48

imsnif requested changes Dec 8, 2020

View reviewed changes

src/errors.rs Outdated Show resolved Hide resolved

src/errors.rs Show resolved Hide resolved

src/input.rs Outdated Show resolved Hide resolved

Introduce SenderWithContext

b6af3bd

kunalmohan requested a review from imsnif December 9, 2020 11:36

imsnif reviewed Dec 9, 2020

View reviewed changes

Cargo.lock Outdated Show resolved Hide resolved

Keep arrayvec at v0.5.1

8a7af50

kunalmohan requested a review from imsnif December 9, 2020 16:54

imsnif approved these changes Dec 9, 2020

View reviewed changes

imsnif merged commit 92d1bcf into zellij-org:main Dec 9, 2020

kunalmohan deleted the error-context branch December 9, 2020 17:01

PhML mentioned this pull request Jun 17, 2021

No mouse support in neovim #579

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce ErrorContext for tracking errors across threads #83

Introduce ErrorContext for tracking errors across threads #83

kunalmohan commented Dec 4, 2020

kunalmohan commented Dec 4, 2020

imsnif commented Dec 5, 2020

kunalmohan commented Dec 5, 2020

imsnif commented Dec 6, 2020 •

edited

kunalmohan commented Dec 6, 2020

imsnif commented Dec 6, 2020

imsnif commented Dec 6, 2020 •

edited

kunalmohan commented Dec 6, 2020

imsnif commented Dec 6, 2020 •

edited

imsnif commented Dec 6, 2020

kunalmohan commented Dec 6, 2020

imsnif commented Dec 6, 2020

kunalmohan commented Dec 6, 2020 •

edited

imsnif commented Dec 6, 2020

kunalmohan commented Dec 7, 2020

kunalmohan commented Dec 7, 2020

imsnif left a comment •

edited

imsnif left a comment

imsnif left a comment

Introduce ErrorContext for tracking errors across threads #83

Introduce ErrorContext for tracking errors across threads #83

Conversation

kunalmohan commented Dec 4, 2020

kunalmohan commented Dec 4, 2020

imsnif commented Dec 5, 2020

kunalmohan commented Dec 5, 2020

imsnif commented Dec 6, 2020 • edited

kunalmohan commented Dec 6, 2020

imsnif commented Dec 6, 2020

imsnif commented Dec 6, 2020 • edited

kunalmohan commented Dec 6, 2020

imsnif commented Dec 6, 2020 • edited

imsnif commented Dec 6, 2020

kunalmohan commented Dec 6, 2020

imsnif commented Dec 6, 2020

kunalmohan commented Dec 6, 2020 • edited

imsnif commented Dec 6, 2020

kunalmohan commented Dec 7, 2020

kunalmohan commented Dec 7, 2020

imsnif left a comment • edited

Choose a reason for hiding this comment

imsnif left a comment

Choose a reason for hiding this comment

imsnif left a comment

Choose a reason for hiding this comment

imsnif commented Dec 6, 2020 •

edited

imsnif commented Dec 6, 2020 •

edited

imsnif commented Dec 6, 2020 •

edited

kunalmohan commented Dec 6, 2020 •

edited

imsnif left a comment •

edited