Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gracefully handle output of bitstring :message in %Credo.Issue{} #1120

Open
chriscrabtree opened this issue Mar 18, 2024 · 8 comments
Open

Comments

@chriscrabtree
Copy link

To make it easier for credo plugin authors, should credo handle the cases where :message or :trigger values end up as bitstrings? Credo could always convert these values to strings before rendering its output. And that way, we would have it happening in one place for every plugin instead of each plugin potentially gradually discovering the need.

The Story

A particular credo plugin was inadvertently bubbling up bitstrings in its %Credo.Issue{} structs in the :message and :trigger fields. It only did this in the presence of UTF-8 non-ASCII characters.

Credo does not expect these bitstrings, and runs into trouble somewhere in its line-wrapping logic. We end up with something like this upon running mix credo:

** (ArgumentError) argument error
    (stdlib 5.2) re.erl:806: :re.run(<<70, 111, 117, 110, 100, 32, 109, 105, 115, 115, 112, 101, 108, 108, 101, 100, 32, 119, 111, 114, 100, 32, 96, 103, 97, 114, 114, 121, 226, 96, 46>>, {:re_pattern, 1, 1, 0, <<69, 82, 67, 80, 32, 1, 0, 0, 0, 8, 0, 32, 1, 136, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...>>}, [{:capture, :all, :binary}, :global, {:offset, 0}])
    (elixir 1.16.0) lib/regex.ex:529: Regex.safe_run/3
    (elixir 1.16.0) lib/regex.ex:516: Regex.scan/3
    (credo 1.7.5) lib/credo/cli/output/ui.ex:60: Credo.CLI.Output.UI.wrap_at/2
    (credo 1.7.5) lib/credo/cli/command/suggest/output/default.ex:209: Credo.CLI.Command.Suggest.Output.Default.do_print_issue/4
    (elixir 1.16.0) lib/enum.ex:987: Enum."-each/2-lists^foreach/1-0-"/2
    (credo 1.7.5) lib/credo/cli/command/suggest/output/default.ex:136: Credo.CLI.Command.Suggest.Output.Default.print_issues_for_category/5
    (elixir 1.16.0) lib/enum.ex:987: Enum."-each/2-lists^foreach/1-0-"/2

I've offered a PR to that plugin to ensure that it always converts :message and :trigger to string values.

But maybe it's better for the whole credo ecosystem to handle this in credo itself?

Looks like Credo.CLI.Output.UI.wrap_at/2 would be a good place to add such a safety conversion step.

I'd be happy to work on a PR if we think this is worth pursuing.

@rrrene
Copy link
Owner

rrrene commented Mar 19, 2024

Hi, yes, we should definitely do something about this 👍

What do you think about putting a simple to_string here:

message: opts[:message],

We did something similar to :trigger a couple of weeks ago, so this would be a great contribution for consistency as well ✌️

@chriscrabtree
Copy link
Author

Indeed, to_string was my initial attempt at fixing the issue in the plugin. But it was not sufficient to solve the problem.

I ended up with this below, which certainly produces strings that credo can successfully work with.

But reading it now with fresh eyes, I don't like how it is transforming the input in the case where String.valid?/2 is false. Let me refine this a bit.

  defp binary_to_string(binary) do
    codepoints = String.codepoints(binary)

    Enum.reduce(codepoints, fn w, result ->
      cond do
        String.valid?(w) ->
          result <> w

        true ->
          <<parsed::8>> = w
          result <> <<parsed::utf8>>
      end
    end)
  end

@rrrene
Copy link
Owner

rrrene commented Mar 19, 2024

Mmh, I get the feeling that we are not yet finished analysing the problem (the StackOverflow post that this snippet seems to be copied from also left me no wiser) ...

Quick thoughts:

  1. we should definitely let plugin authors know with a clear error message that their :message is not valid, because it is not a valid string.
  2. I am not sure that we can convert non-valid Binaries to valid Strings in a generalized manner, without producing even more confusing error message when than abstraction leaks.
  3. Why is this a problem to begin with? Does this have something to do with file encoding?

Let me know your thoughts.

@chriscrabtree
Copy link
Author

  1. Sure, we could do that in format_issue. We could make the message say "Warning: MyCredoPlugin tried to issue an invalid message" or something like that if either String.printable?/2 or String.valid?/2 is false for either :message or :trigger. Or even just raise in that case, maybe? But I don't think this is our best option.
  2. The approach below retains the maximum amount of well-formatted information from :message & :trigger, while replacing the invalid data the way the standard library does. I think this is the sweet spot.
  3. Not file encoding, but data from a reporting system export that I copied into some test cases. That other system data turns out to be consistent, but ill-formed. I figure if it happened to me, it will happen to others at some point.

This replaces every invalid character with the same mark, "�" by default. So the valid surrounding characters will offer the user context to help.

As a user, I'd rather receive 90% of a credo message with an invalid character than 0%.

  defp to_valid_string(binary) do
    binary
    |> to_string()
    |> String.replace_invalid()
  end

Here is the test case I've been working on in test/credo/check_test.exs to demonstrate:

  test "it should cleanse invalid messages" do
    minimum_meta = IssueMeta.for(%{filename: "nofile.ex"}, %{exit_status: 0})

    invalid_message = <<70, 111, 117, 110, 100, 32, 109, 105, 115, 115, 112, 101, 108, 108, 101, 100, 32, 119, 111, 114, 100, 32, 96, 103, 97, 114, 114, 121, 226, 96, 46>>
    invalid_trigger = <<103, 97, 114, 114, 121, 226>>

    issue =
      Check.format_issue(
        minimum_meta,
        [
          message: invalid_message,
          trigger: invalid_trigger
        ],
        :some_category,
        22,
        __MODULE__
      )

    refute String.valid?(invalid_message)
    refute String.valid?(invalid_trigger)
    refute String.printable?(invalid_message)
    refute String.printable?(invalid_trigger)

    assert String.valid?(issue.message)
    assert String.valid?(issue.trigger)
    assert String.printable?(issue.message)
    assert String.printable?(issue.trigger)

    assert issue.message === "Found misspelled word `garry�`."
    assert issue.trigger === "garry�"
  end

@rrrene
Copy link
Owner

rrrene commented Mar 20, 2024

In my mind, we should not assume too much about the data we're given (especially when reading the source code of String.replace_invalid I am reminded of how much I do not know about encodings and binaries).

And: We can not use String.replace_invalid, because it was just introduced in the current minor version of Elixir.

Checking and warning with String.valid? on the :message seems the only reasonable option to me.

Your example shows that even the :trigger can be super idiosyncratic and since the trigger is the stuff in the source that should be highlighted (because it is "triggering" the issue), we cannot transform that inside Credo, because then it literally loses its meaning.

@chriscrabtree
Copy link
Author

Cool. How about something like this? I tried to keep Credo's helpful conversational style, but I admit to a tendency to over-word things.

I do think it's nice when a message, when searched, yields exactly the right results, so I considered that in this proposed wording.

    assert issue.message ===
             "The Credo message you were meant to see here contained at least one invalid byte, so we could not show it to you."

    assert issue.trigger === "The trigger for this Credo issue contained at least one invalid byte, so we could not show it to you."

@rrrene
Copy link
Owner

rrrene commented Mar 21, 2024

Why would we overwrite a message instead of just printing a warning?

I do think it's nice when a message, when searched, yields exactly the right results, so I considered that in this proposed wording.

What does this even mean? What "right results"?

@chriscrabtree
Copy link
Author

Why would we overwrite a message instead of just printing a warning?

Yes, of course. Definitely better. I outline a few approaches below.

What does this even mean? What "right results"?

Accurate search hits with minimum false positives. Users naturally search for the warning phrase verbatim.

Approach 1. Skip Malformed Issues and/or Malformed Messages

We could, for default output, skip malformed issues here, either by skipping the issue entirely or by skipping the invalid message, leaving the other issue elements to be output:

But then that leaves the question of what to do about the other outputs: flycheck, json, oneline, and sarif.

Approach 2. Filter Out Malformed Issues from Execution

Alternatively, we could filter out issues with invalid messages from Execution.get_issues/1 and get_issues/2 and get_issues_grouped_by_filename/1:

def get_issues(exec) do

But this seems strange to filter out issues at this level of abstraction due to an output concern. And lots of other code calls this function.

Approach 3. Smallest change and most simple

Since wrap_at/2 is where the handling of the invalid string raises, we could return an empty string when invalid, and perhaps warn, otherwise continue as normal:

def wrap_at(text, number) do

This is the smallest surface area solution, with only two direct callers.

Regardless

To help plugin authors, I think regardless of which approach we take, we would add checks for invalid message and trigger values here:

defp warn_on_malformed_issues(_source_files, issues) do
no_trigger = Credo.Issue.no_trigger()
Enum.each(issues, fn issue ->
case issue.trigger do
^no_trigger ->
:ok
trigger when is_nil(trigger) ->
IO.warn(":trigger is nil")
trigger when is_binary(trigger) ->
:ok
trigger when is_atom(trigger) ->
:ok
trigger ->
IO.warn(":trigger is not a binary: #{inspect(trigger, pretty: true)}")
end
end)
end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants