Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question]: adding warning for blacklisted variable keyword #103

Open
ParsaNobahari opened this issue Nov 7, 2023 · 8 comments
Open

[Question]: adding warning for blacklisted variable keyword #103

ParsaNobahari opened this issue Nov 7, 2023 · 8 comments

Comments

@ParsaNobahari
Copy link

I want to add warning for every time a variable is being declared with the name "password", but I don't know how should I go about doing it. I looked at similar pull request (here) commits to what I want to do. I kind of got the hang of how the project is structured but don't understand how it works, or basically how linters work.

I would appreciate if someone would explain how I can do it. The pull request is about 200 lines across 14 different files and I can barely wrap my head around lua.

@alerque
Copy link
Member

alerque commented Nov 8, 2023

The PR you linked is probably a bad reference since it was one of the most elaborate change set to land since it was about the syntax of the language itself. I suggest just looking at the code itself and picking one of the simpler lints, particularly about variable names.

What did you have it mind for how this would look? Obviously calling a variable password isn't going to be worthy of a warning for all code bases. Were you thinking about perhaps a new lint with a configurable blacklist of names to to allow in variables?

@ParsaNobahari
Copy link
Author

Yes, an option to be able to blacklist certain names for variables would be quite what I was thinking of.

Although at this point I have to tell you that I took a project from my professor for my bachelor's degree: adding a new feature to a linter.

My professor advised me to start by adding a warning for every time a variable is declared with the name "password".
So you can understand the whole reason why I'm doing this to learn about this linter.

Also, I chose this linter since I wanted to learn lua(the syntax is decipherable to me) and that the original project that this one was forked from seemed abandoned, but I don't understand why people are using that one like flycheck or zerobrane studio.

@alerque
Copy link
Member

alerque commented Nov 8, 2023

The original repository this was forked from is abandoned because the owner passed away a few years back. It was widely in use before that, so any projects still using it probably just haven't updated. You should poke them about that. This is the canonical version.

If you just want to finish your homework you can probably short circuit things and hack it in somewhere variable names are checked for something else. We'll be pretty un-interested in actually merging such a new feature though.

A blacklist on the other hand could serve many practical purposes. Such a configurable blacklist could be configured for your use case but also more widely adopted and we could actually get your new feature merged upstream.

I suggest you look into how the current allowed globals list works and add some sort of blacklist modeled after the inverse logic of the globals which is a whitelist. Then poke around and make sure it runs on all variables, not just ones identified as global.

@ParsaNobahari
Copy link
Author

sorry for keeping this issue and you hanging @alerque

I was waiting to hear my professor's feedback on our conversation before going any further.

His opinion is that first I should learn about the ast parser of this linter and second, how it is doing tree construction(node building). This way, I would have a proper understanding of this project, and I wouldn't have to cobble together a solution.

I have looked at src/luacheck/parser.lua, although I don't have enough understanding of what most of it is doing, so I have to work on that.

I'm gonna leave this issue open for any possible future PR.

@ParsaNobahari
Copy link
Author

ParsaNobahari commented Nov 22, 2023

So one question... how to traverse the AST tree in this linter?
I did some reading about abstract syntax trees and got to the conclusion that in most languages, parsers that give you the AST will also give you the methods to traverse the AST tree. Since linters have parser as a fundamental component for them to operate(and for this project my assumption is that the parser is in ./src/luacheck/parser.lua), I'm curious if there is a cli option to enable something that would print the syntax tree of the given lua code?
Should I just stick to reading parser.lua file to understand the way this linter parses code into a tree or there is a simpler way like astexplorer already available?

Edit: can this project use something like astexplorer or does it even need it at all?

@arichard4
Copy link

Luacheck works by running a series of stages, you can view them and the order in src/luacheck/stages/init.lua. After the parser is run, the bulk of the logic for processing the AST is contained in linearize.lua. So far as I can tell, luacheck doesn't have an option to export the AST, although it's fairly straightforward to pretty print the generated AST. The project can't directly use AST Explorer, since its current output doesn't use its format.

On the original request here:

The straightforward and correct (IMHO) way to handle this would be to set password as being a read-only global, which I would expect to disallow ever creating a variable with that name. HOWEVER, that is not what luacheck actually does. Luacheck doesn't appear to care if a local variable shadows the name of an existing read-only or read-write global. This is unlike local variables, where luacheck does alert that a variable is being shadowed:

local password = "password"
print(password)
local password = "password"
print(password)

test.lua:3:7: variable password was previously defined on line 1

@ParsaNobahari @alerque Do either of you object to this method of allowing a blacklist for local variables? If not, mind if I change this issue's title to describe what I'm talking about here?

@ParsaNobahari If you want to grab this issue, what I've described is actually considerably simpler, and possibly too simple for your purposes, depending on whether you're wanting a use case that will require you to write a new module that itself traverses the AST. (You can see an example of a new module that traverses the AST in #65 and #67; I think that in practice the conclusion there was that it was too complex, and in particular handling each type of AST node and including proper testcases wound up being several thousand lines of code; I would expect that level of additional complexity and size would be very concerning for a straightforward feature like this blacklist request.) Instead, this should be a new error type, possibly (?) emitted by linearize.lua's existing AST traversal for local variable declarations. One question here is how to handle the interaction with the config; currently luacheck caches all possible errors/warnings, and preserves the cache across config changes; configs are used at the end, in filter.lua, to remove some specific errors/warnings. This seems like it would be a really bad interaction, as we'd need to report every local as possibly shadowing a config global, before we can access the config. Unfortunately I don't see another way, without ripping out luacheck's current caching behavior, which maybe we ought to do.

@ParsaNobahari
Copy link
Author

ParsaNobahari commented Dec 3, 2023

first of all, I want to thank you for taking your time and looking into the matter @arichard4
second, I want to apologize for the late reply. I don't want to stall people and my time is not more valuable than others, but this is the last year of my higher education, and I haven't had it busy like this in any other semester.

Luacheck works by running a series of stages, you can view them and the order in src/luacheck/stages/init.lua.

I assume this is what you mean by that:

stages.names = {
"parse",
"unwrap_parens",
"linearize",
"parse_inline_options",
"name_functions",
"resolve_locals",
"detect_bad_whitespace",
"detect_compound_operators",
"detect_cyclomatic_complexity",
"detect_empty_blocks",
"detect_empty_statements",
"detect_globals",
"detect_reversed_fornum_loops",
"detect_unbalanced_assignments",
"detect_uninit_accesses",
"detect_unreachable_code",
"detect_unused_fields",
"detect_unused_locals"
}

Thanks. I didn't notice that.

So far as I can tell, luacheck doesn't have an option to export the AST, although it's fairly straightforward to pretty print the generated AST.

True. This is what I've done for just that(I wrote this in src/luacheck/stages/init.lua):

local function format_any_value(obj, buffer)
    local _type = type(obj)
    if _type == "table" then
        buffer[#buffer + 1] = '{"'
        for key, value in next, obj, nil do
            buffer[#buffer + 1] = tostring(key) .. '":'
            format_any_value(value, buffer)
            buffer[#buffer + 1] = ',"'
        end
        buffer[#buffer] = '}' -- note the overwrite
    elseif _type == "string" then
        buffer[#buffer + 1] = '"' .. obj .. '"'
    elseif _type == "boolean" or _type == "number" then
        buffer[#buffer + 1] = tostring(obj)
    else
        buffer[#buffer + 1] = '"???' .. _type .. '???"'
    end
end

local function format_as_json(obj)
    if obj == nil then return "null" else
        local buffer = {}
        format_any_value(obj, buffer)
        return table.concat(buffer)
    end
end

local function print_as_json(obj)
    print(format_as_json(obj))
end

print_as_json(ast)

The straightforward and correct (IMHO) way to handle this would be to set password as being a read-only global, which I would expect to disallow ever creating a variable with that name. HOWEVER, that is not what luacheck actually does. Luacheck doesn't appear to care if a local variable shadows the name of an existing read-only or read-write global. This is unlike local variables, where luacheck does alert that a variable is being shadowed:
Do either of you object to this method of allowing a blacklist for local variables?

No but I would like to know that where would you be setting password as being a read-only global?

mind if I change this issue's title to describe what I'm talking about here?

Honestly, if it's going to fuel this feature merged into the linter and you're really stoked about it, you can... idk... give this comment a rocket emoji(why not). Just comment the name please. Thank you.
But your implementation though, I'm interested.

I think that in practice the conclusion there was that it was too complex, and in particular handling each type of AST node and including proper testcases wound up being several thousand lines of code; I would expect that level of additional complexity and size would be very concerning for a straightforward feature like this blacklist request.) Instead, this should be a new error type, possibly (?) emitted by linearize.lua's existing AST traversal for local variable declarations. One question here is how to handle the interaction with the config; currently luacheck caches all possible errors/warnings, and preserves the cache across config changes; configs are used at the end, in filter.lua, to remove some specific errors/warnings. This seems like it would be a really bad interaction, as we'd need to report every local as possibly shadowing a config global, before we can access the config. Unfortunately I don't see another way, without ripping out luacheck's current caching behavior, which maybe we ought to do.

Although I myself am in a rush for adding the feature, one way or another, maybe it shouldn't be as complex as thousands of lines. Is it possible to add this to filter.lua so that it would filter in the variable names given by the config? My understanding from your comment is that filter.lua ONLY filters out errors/warnings and does not filter in. Can you explain the issue that would occur? What do you mean by configs?

@ParsaNobahari
Copy link
Author

ParsaNobahari commented Jan 26, 2024

So I finished my project and I hacked something acceptable for my professor.
With that out of the way, I wanted to see if my implementation would be acceptable to merge into this project, and if so I would add a cli argument for file input for variable names.

tl;dr I made a new stage

local vars = {}
local stage = {}
local handle_node
local blacklist_vars = {"pass", "denied"}

stage.warnings = {
    ["33"] = {message_format = "blacklist detected {name}", fields = {"name"}},
}

local function detect_blacklist(chstate, nodes, word)
   local items = {}
   for i in ipairs(vars) do
      for j in ipairs(word) do
         if vars[i] == word[j] then
            table.insert(items, vars[i])
            print(vars[i])
            local num_nodes = #nodes

            for index = 1, num_nodes do
               local node = nodes[index]

               if type(node) == "table" then
                  chstate:warn_range("33", node, { name = word[j] })
                  break
               end
            end
         end
      end
   end
   return items
end

local function check_nodes(chstate, nodes)
   for _, node in ipairs(nodes) do
      if type(node) == "table" then
         handle_node(chstate, node)
      end
   end
end

function handle_node(chstate, node)
   if node.tag == "Id" then
      if #vars == 0 then
         table.insert(vars, node[1])
         print("new entity", node[1])
      else
         local bool = false
         for i in ipairs(vars) do
            if vars[i] == node[1] then
               bool = true
            end
         end
         if bool == false then
            table.insert(vars, node[1])
            print(node[1])
         end
      end
   else
      check_nodes(chstate, node)
   end
end

function stage.run(chstate)
   check_nodes(chstate, chstate.ast)
   detect_blacklist(chstate, chstate.ast, blacklist_vars)
end

return stage

stage.warnings = {
    ["33"] = {message_format = "blacklist detected {name}", fields = {"name"}},
}

local function detect_blacklist(chstate, nodes, word)
   local items = {}
   for i in ipairs(vars) do
      for j in ipairs(word) do
         if vars[i] == word[j] then
            table.insert(items, vars[i])
            print(vars[i])
            local num_nodes = #nodes

            for index = 1, num_nodes do
               local node = nodes[index]

               if type(node) == "table" then
                  chstate:warn_range("33", node, { name = word[j] })
                  break
               end
            end
         end
      end
   end
   return items
end

local function check_nodes(chstate, nodes)
   for _, node in ipairs(nodes) do
      if type(node) == "table" then
         handle_node(chstate, node)
      end
   end
end

function handle_node(chstate, node)
   if node.tag == "Id" then
      if #vars == 0 then
         table.insert(vars, node[1])
         print("new entity", node[1])
      else
         local bool = false
         for i in ipairs(vars) do
            if vars[i] == node[1] then
               bool = true
            end
         end
         if bool == false then
            table.insert(vars, node[1])
            print(node[1])
         end
      end
   else
      check_nodes(chstate, node)
   end
end

function stage.run(chstate)
   check_nodes(chstate, chstate.ast)
   detect_blacklist(chstate, chstate.ast, blacklist_vars)
end

return stage

note:

  • right now, it is reading the blacklisted node names from blacklist_vars and it can be replaced by a cli argument.
  • it also detects library keywords like string and io.
  • it's printing the nodes that it finds for now for debugging purpose.
  • other things that I found out that get detect are: pairs/ipairs, table, print

Although for most of these other false positives that would get detected, I'm assuming there can be whitelisted when the code is looking for the actual blacklisted words inside of the stage if the blacklist is only going to be for variable names.

If you guys think that this is something of worth, I would take the time for working further on it and maintaining after it has been added.

@ParsaNobahari ParsaNobahari changed the title [Question]: how to add a warning for "password"? [Question]: adding warning for blacklisted variable keyword Feb 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants