Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

question: commented unreachable code #343

Open
mqnc opened this issue Feb 10, 2024 · 10 comments
Open

question: commented unreachable code #343

mqnc opened this issue Feb 10, 2024 · 10 comments
Assignees

Comments

@mqnc
Copy link

mqnc commented Feb 10, 2024

The parser code contains lots of these comments:

    // This code is unreachable.
    // default:
    //     throw invalidToken()

It looks like there was this code before but it was taken out by some sort of reachability analysis. Is there some tool for this which proves the unreachability of certain code parts or did you sit down with pen and paper and think really hard? 😉

@rlidwka
Copy link
Contributor

rlidwka commented Feb 10, 2024

my guess: code coverage + think really hard

@jordanbtucker
Copy link
Member

You two are cracking me up. You're close.

It was code coverage + a robust test suite + thinking really hard.

@mqnc
Copy link
Author

mqnc commented Feb 11, 2024

If I find a commented piece of reachable code, will you transfer me some karma points?

@jordanbtucker
Copy link
Member

jordanbtucker commented Feb 11, 2024

Sure thing. Your name and contribution will be immortalized in the CHANGELOG.

@mqnc
Copy link
Author

mqnc commented Feb 14, 2024

image

Edit: Ah, I assume there's just a return missing after the pop()s. Well, they aren't necessary as the reachable unreachable code is commented. But technically I still win and at least I want my karma points.

@mqnc
Copy link
Author

mqnc commented Feb 14, 2024

However, these two replacements in stringify are actually "unreachable" but still in the code so that evens things out 😉

image

(however, I'd actually leave them in for consistency with the replacements map in formatChar or maybe just comment them)

@mqnc
Copy link
Author

mqnc commented Feb 14, 2024

Doing some fuzzing, I was able to reach more unreachable code.

fuzz.html
json5.mjs

I've uncommented the unreachable comments and placed a function call unreached() in them in which I put a breakpoint. I commented the ones I actually reached (search for // A, // B and // C) so that fuzzing could continue. I've also hardcoded the strings that reached those points in fuzz.html.

@jordanbtucker
Copy link
Member

jordanbtucker commented Feb 15, 2024

Thanks for looking into this.

Regarding replacements in stringify, although the quote keys are never accessed in the loop, they are accessed in the code that wraps the key in quotes. So those keys are ultimately reachable, specifically when the quote option is provided to stringify.

Try commenting out those lines and then running the following code:

console.log(JSON5.stringify({a: '"'}, {quote: '"'})
// {a:"undefined"}
// should be {a:"\""}
// undefined is inserted because `replacements` does not have a key for `"`

Regarding the unreachable code in parse, your fuzzing did find some reachable code that was incorrectly marked as unreachable, specifically where invalidToken would have been called. However, after further inspection, the issue lies with the comment rather than the code.

The parseState functions are the only places where invalidToken would have been called. In each of those cases, the call to invalidToken was removed, not necessarily because it was unreachable, but because it was obsolete and would be handled in an appropriate lexState function instead. So the comment should read "This code is obsolete since it's handled by the lexState." in some cases.

In every case where the invalidToken function would have been called, an appropriate invalidEOF or invalidCharacter function is called in a lexState function either before it or on the next loop instead. This removed the need for an invalidToken function altogether. However, the use of the word "unreachable" in some places was a mistake on my part.

If you find any reachable code marked as unreachable in any other case besides invalidToken, please let me know.

@mqnc
Copy link
Author

mqnc commented Feb 20, 2024

It seems like every time I feel like I finally got you cornered, you manage to weasel your way out of the situation! The day will come where I find a bug in your code!
I haven't found the time to verify your argumentation but I'm happy to report that I have finally ported the main parts to C++, you can check it out here:

https://github.com/mqnc/json5cpp Happy about remarks and feedback!

I stuck to the reference implementation as closely as possible. Stringify does not have replacing/filtering yet and a reviver makes no sense in C++, it needs a different mechanism to convert parsed json to custom objects. I haven't done that yet. Currently I am trying to convince github to do a coverage report for me, I have never used the github CI before.

As I have now dug through all the js reference code, my verdict is that it's very well written! I appreciate its brevity, especially in comparison to other projects.
Except for this line, it's unnecessarily confusing and initializing the quote count with 0.1 and 0.2 also rubs me the wrong way :P
But other than that, great work!

@jordanbtucker
Copy link
Member

jordanbtucker commented Feb 21, 2024

Congrats on implementing a C++ parser. When you're ready, feel free to add it to In the Wild. I'll take a closer look at your code when I get a chance.

I agree that line could have been written in a less confusing way. When I first wrote the codebase, I wasn't really thinking about it being a reference parser. I was just trying to make something that works. I really should go back and add comments to make things clearer.

That line chooses which quote character to wrap a string with based on the quote character that is used the least in the string. For example, the string Hi, I'm Jordan contains one ' character and no " characters, so the code will choose to wrap the string with " characters. (The quotes object equals {"'": 1.1, '"': 0.2}, and 0.2 < 1.1.) However, if a string contains the same number of each quote character, like Jordan said, "I'm happy you're here.", the code will choose ', which is the default. (The quotes object equals {"'": 2.1, '"': 2.2}, and 2.1 < 2.2.) That's why I used 0.1 and 0.2 as starting values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants