Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve client message parsing #699

Closed
wants to merge 5 commits into from
Closed

improve client message parsing #699

wants to merge 5 commits into from

Conversation

Bonuspunkt
Copy link
Contributor

replaced regexps with parser and processing

fixes #654

@xPaw
Copy link
Member

xPaw commented Oct 17, 2016

Nice! This needs heavy testing. Did you take a look at the previous PR trying to fix this to see why it wasn't merged?

@astorije
Copy link
Member

This needs heavy testing.

@xPaw, what do you say we come up with a way to write tests for these? Every time we improve the message parsing, we fix 5 things and break 1. I remember @maxpoulin64 had a list of strings to test against once, we should be able to make this part of our mild test suite.
I can look into that when I have a minute but I wouldn't be against some inputs or opinions if you have any.

@xPaw
Copy link
Member

xPaw commented Oct 17, 2016

We do need a test suite for it. Since it's exposed as a separate module, trivial to test.

@Bonuspunkt
Copy link
Contributor Author

just realized, Object.assign is a ES2015 Feature - are you supporting IE11-?
Polyfill or rewrite?

@Bonuspunkt
Copy link
Contributor Author

fixed, should also fix #15 and #199

});
}
} else if ((text[position] === "#" || text[position === "&"]) &&
// ^-- this is basically wrong, because we need the 005 CHANTYPES response from the connect
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The front-end is actually aware of this, as I passed it into client when we moved to irc-fw. The network dom object has it in data.

@Bonuspunkt
Copy link
Contributor Author

i would call it done.
imho there is no sensible way to pass RPL_ISUPPORT.CHANTYPES to the parse function

Handlebars.registerHelper(
"parse", function(text) {
function createFragment(fragment) {
var className = "";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make this an array and then call .join on it, this would fix the leading space.

@xPaw
Copy link
Member

xPaw commented Oct 21, 2016

Testing this, I see a bug in how it parses channels.

[color code]#123[reset color]: is being parsed as a channel #123:, but : should not be part of it.

EDIT: Another bug with the parser. #&">bug turns into #&">bug&">bug

@Bonuspunkt
Copy link
Contributor Author

#123: is a valid channel name, you can join me on freenode :)

@xPaw
Copy link
Member

xPaw commented Oct 21, 2016

@Bonuspunkt I understand it's valid, but it's ignoring the [reset color] char code to stop it from continuing channel parser (\x0f is not a valid channel character)

@Bonuspunkt
Copy link
Contributor Author

the behavior is the same as mirc

i think i see the problem here, should

#[color]the[color]lounge

be #the or #thelounge ?

@xPaw
Copy link
Member

xPaw commented Oct 21, 2016

Pretty sure that should not be parsed as a channel at all. I wouldn't really say mIRC is the ideal client to base our parser on.

@Bonuspunkt
Copy link
Contributor Author

Bonuspunkt commented Oct 21, 2016

same behavior as mIRC in HexChat
AdiIRC seems to strip : but ignores styling chars (#[color]test[color]:! joins #test! )

what is your reference?
also good time to start a discussion if you want to support strip color/styling and how it should behave

@xPaw
Copy link
Member

xPaw commented Oct 30, 2016

Okay for now it might be fine to parse channels with colours, what I'm worried about are the parser bugs producing invalid markup like I showed above.

@Bonuspunkt
Copy link
Contributor Author

oh sorry missed that.
the position was not correctly updated, so #&x would find #&x and next round start at & and find &x
which resulted than in #&x&x

also fixed

@xPaw
Copy link
Member

xPaw commented Nov 6, 2016

Mind looking at (http://example.com) getting ) in the link?

Perhaps it's also possible to optimize the generated code if there are same colours? E.g:
test \x0312#\x0312\x0312\"te\x0312st\x0312\x0312\x0312\x0312\x0312\x0312\x0312\x0312\x0312\x0312\x0312a" generates: <span class="text">test <span class="irc-fg12"></span><span class="inline-channel" role="button" tabindex="0" data-chan="#&quot;testa"><span class="irc-fg12">#</span><span class="irc-fg12">"te</span><span class="irc-fg12">st</span><span class="irc-fg12">a</span></span></span>

If not possible or easy to optimize, it should not at least generate empty tags like <span class="irc-fg12"></span>

@@ -74,6 +74,14 @@ describe("parse Handlebars helper", () => {
"https://theos.kyriasis.com/~kyrias/stats/archlinux.html" +
"</a>" +
"&gt;"
}, {
input: "(http://example.com)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then add a test for links that do have balanced () in them. Like http://example.com/Test_(Page).

@@ -102,6 +102,13 @@ function analyseText(text) {
// NOTE: channel prefixes should be RPL_ISUPPORT.CHANTYPES
var channelPrefixes = ["#", "&"];
var punctuations = ["'", "\"", ".", ",", "!", "?", "¿", "<", ">", "(", ")", "{", "}"];
var commonProtocols = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more: mailto, steam, mumble, ts3server, ssh

Copy link
Contributor Author

@Bonuspunkt Bonuspunkt Nov 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mailto won't match - url detection is based on ://
never have seen mumble, ts3server urls, are they using ://?

@Bonuspunkt
Copy link
Contributor Author

i'm thinking about replacing the url detection with https://www.npmjs.com/package/autolinker

@MaxLeiter
Copy link
Member

bump? would love to see this merged

);

function uri(text) {
return window.URI.withinString(text, function(url) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you are getting rid of URI.withinString which is, I think, our only use of URI.js (in our repo, official repo). Any specific reason for this or just killing a dependency?

This seems to be a very active project, and they've just released a version that fixes one of the bugs I noticed a long time ago.
The latest release packs quite a lot of changes since the version we use, see medialize/URI.js@v1.14.1...v1.18.4.

Do you want to try running your tests against our version of URI.js and the latest one released?

I know we like to kill dependencies when they are tiny, but I think at that point I'd rather not roll our own URL parser.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i need start & end position of the url to correctly merge it with the styling information.
i'll look at it.

@Bonuspunkt
Copy link
Contributor Author

tests are against latest version of URI.js
tried to use the file, but crashed with could not resolve 'punycode'

@xPaw
Copy link
Member

xPaw commented Dec 10, 2016

Really solid job so far! However I am getting an error at the current version when sending a link:

Uncaught TypeError: Cannot read property 'length' of undefined
    at Function.URI.withinString (uri.js:819)
    at findLinks (parse.js:112)
    at fullProcess (parse.js:228)
    at Object.<anonymous> (parse.js:282)
    at Object.main (lounge.templates.js:506)
    at main (handlebars.js:1092)
    at Object.ret [as actions/topic] (handlebars.js:1095)
    at render (lounge.js:56)
    at Object.<anonymous> (lounge.js:71)
    at Object.main (lounge.templates.js:170)

@Bonuspunkt
Copy link
Contributor Author

would have preferred replacing client/js/libs/uri.js with a symlink to node_modules/urijs/src/URI.js but windows
"fix" till #640 lands

@xPaw
Copy link
Member

xPaw commented Dec 16, 2016

I think we will focus on this PR for 2.3.0 after 2.2.0 is shipped. cc @astorije

@xPaw xPaw modified the milestones: 2.2.0, 2.3.0 Dec 16, 2016
@xPaw xPaw removed this from the 2.2.0 milestone Dec 16, 2016
@xPaw
Copy link
Member

xPaw commented Dec 31, 2016

I'm not a particular fan of moving parsing stuff into a separate repository we have no access to.

@Bonuspunkt
Copy link
Contributor Author

i have no problem with transfer of repo & npm
hit me on irc

@xPaw
Copy link
Member

xPaw commented Jan 28, 2017

@astorije What do you think? Should the code kept in the repo, or have this as a separate project under thelounge?

@MaxLeiter
Copy link
Member

I dont think it makes much sense to move it out of Lounge unless we remove Lounge specific code (like in the merge function)

@kode54
Copy link
Contributor

kode54 commented Jan 29, 2017

This parser mis-parses the following URL:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms644989(v=vs.85).aspx

  1. It drops the period before the aspx from the actual link.
  2. It returns the clickable text range starting from the first character in the line, and ending after the correct number of characters, or possibly correct excluding the above mentioned period.

@Bonuspunkt
Copy link
Contributor Author

medialize/URI.js#325

@astorije
Copy link
Member

@Bonuspunkt, urijs has been bumped in master. Mind rebasing this PR and see what happens? Do you think this PR is in a reviewable/mergeable state?

@Bonuspunkt
Copy link
Contributor Author

rebased to master, it is reviewable.

@astorije
Copy link
Member

@astorije What do you think? Should the code kept in the repo, or have this as a separate project under thelounge?

I see @Bonuspunkt did a good job with setting this up: nice README, semver-releases, good test coverage. For these reasons, I'm happy to keep it in a separate repo at least at the moment.
However, I do agree with @xPaw that we would prefer to keep it "in-house". @Bonuspunkt, would you be OK to transfer it under the organization and still be its maintainer? Just like @williamboman is maintaining docker-lounge, etc.
Also, do you mind using the same license than https://github.com/thelounge/lounge, i.e. MIT license?

Copy link
Member

@astorije astorije left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's already better than what we have on master (but still a few comments inline).
Also, 👏 👏 on the rigorous test suite!

Any idea how to fix CI?

"irc", "ircs",
"svn", "git",
"steam", "mumble", "ts3server", "ssh",
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it quickly become an issue to whitelist schemes here? I feel like this is a well-known problem that should be handled in one of the libraries we use instead of in our code. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, this is the improvement for #730 also list was defined by @xPaw

"svn", "git",
"steam", "mumble", "ts3server", "ssh",
];
const incorrectDetections = ["www"];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we don't want to parse www.google.com as a URL? Why is that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's the fix for matching www. (also covered in the test line126+)

edit: medialize/URI.js#327

@Bonuspunkt
Copy link
Contributor Author

Bonuspunkt commented Feb 13, 2017

would you be OK to transfer it under the organization and still be its maintainer?

yes

license

currently it's ISC, MIT seems pretty much the same, so i'm fine with it.

fix CI

node v4 has not enough ES6 support ( http://node.green/ )
could add "use strict" on top of parse.js but ircmessageparser has parameter defaults. that would need transpiling

@astorije
Copy link
Member

yes
currently it's ISC, MIT seems pretty much the same, so i'm fine with it.

Cool!

could add "use strict" on top of parse.js but ircmessageparser has parameter defaults. that would need transpiling

Could you get rid of the default params instead? It's a small library and transpiling only for this is a bit of a shame IMO :)
I do think it's important we keep Node v4 support in The Lounge as it's sill a LTS for quite a while.

@Bonuspunkt
Copy link
Contributor Author

wait, the code is run by node out of convenience. when the lounge is running, this code is only executed by the webclient.

also it's not worth removing syntactic sugar as i have to readd it.

@xPaw
Copy link
Member

xPaw commented Mar 12, 2017

So yeah, I think keeping all the code in the client would be better for us. As for the polyfill, you've introduced a lot of unnecessary code that we will have to maintain.

Overall, the cleaner solution is just to keep the code here IMO. If you can't be bothered to revert to how it was, we could handle it ourselves (create a new branch in lounge), if you allow it.

I really do want to get this into 2.3 release.

@Bonuspunkt
Copy link
Contributor Author

Bonuspunkt commented Mar 12, 2017

not sure where in time the "how it was" was, but sure, go ahead. if you need help ping me

edit: very old might be in this branch https://github.com/Bonuspunkt/lounge/tree/messageParserOld

@xPaw
Copy link
Member

xPaw commented Mar 18, 2017

By "how it was" I mean when the parser was contained within the lounge code, and not a separate project.

@Bonuspunkt
Copy link
Contributor Author

@xPaw xPaw mentioned this pull request Mar 18, 2017
@xPaw
Copy link
Member

xPaw commented Mar 18, 2017

Closing in favour of #972. I hope to get that PR merged ASAP. @Bonuspunkt, I've used your latest code in the separate package, not messageParserOld branch.

@xPaw xPaw closed this Mar 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple identical formatting commands in the same line ignored?
5 participants