Freezing issue with coffeescript regex under some circumstances #66

Closed
joshgoebel opened this issue Jan 31, 2020 · 20 comments · Fixed by #67
Labels: bug, upstream (an issue caused by highlight.js' logic or source files)

Comments

@joshgoebel

joshgoebel commented Jan 31, 2020

highlightjs/highlight.js#2375

You may want to follow this, not sure how important you consider this type of issue. I think it's pretty bad and plan to release 9.18.1 as soon as we have an approved fix.

allejo added the bug and upstream labels on Jan 31, 2020
@allejo
Collaborator

allejo commented Jan 31, 2020

I appreciate the heads up! If this causes highlight.php to freeze/crash too, then it's important for us to fix it as well 👍

@joshgoebel
Author

If you wanted to test that'd be great... I'd think you'd be affected also unless the PHP regex engine is much smarter or could self-terminate or something.

@allejo
Collaborator

allejo commented Jan 31, 2020

It looks like highlight.php on PHP 7.1 does not crash/hang with the given test file in that issue. However, it also does not match the expected output in your PR.

Still a bug nonetheless, but it doesn't seem to affect us quite as badly (at least not in my very minimal testing).

@joshgoebel
Author

Oh? I'd be curious what your output is... and very interesting it doesn't freeze...

@allejo
Collaborator

allejo commented Jan 31, 2020

Example PHP code:

<?php

use Highlight\Highlighter;

require './vendor/autoload.php';

$s = <<<HLJS
            TEST@TEST(1):online> TEST TEST
            version=3 meow=BLAH+BLAH\\/BLAH+BLAH= data=BLAH+BLAH\\/BLAH\\/BLAH+BLAH\\/BLAH\\/BLAH+BLAH\\/BLAH+BLAH+BLAH+BLAH+BLAH\\/BLAH+BLAH+BLAH+BLAH+BLAH+BLAH+BLAH+BLAH\\/BLAH\\/BLAH\\/BLAH+BLAH+BLAH\\/BLAH\\/QfwwjxZ+i3\\/I0\\/Ku4TtlywzkUCgjcKM8WDHbOlj0dMu8xQTEoucssL+cNi5migI6wdlvEzKlGsBcwT3QXAz4qdZ5n\\/aJOxBSJ5XnIGfgeZr\\/AJKEm7q\\/wfU5hWp3bkXNaba1PMln76N2BWP9vb2OJYNT2wMsxhm\\/RAr22gj\\/crvDlg95T+MxcIsjz0mt2wXUr\\/coqiNWmJ2jpmLwhF0HuLk9oHsc0tLh0JuaqGETFegnBAXdV3nrlOVZTSf3dz7Eotshzn8JbOXSKk12CITzFONf3BxDPyvBiEPIjFaIHGXsPKMGR6XqF4A3SGsbpzVsLRGy5Lb2OkumuBQMNArAZPhPkkAGBH9ZOkmBPJpFcBo4tRhuwgY9saa0VqInvjXE1Hyhffx8U5xwu28hztiebpuA3EyeJV+CPfpfEP3I1sQrtblSo9\\/cZybBcVpaTO+Gbflksf1MW5RindyQwAJcygINFnAXwcOPfh8Y8ea8JzlVGrg2a6fPRTwESdM7E4mC56JaftlY8G5ia3v15RiZKNuQWXifbajeUh3NNCFcudprSnIIF2Edg\\/PAM\\/qUINyVmT4w5I9dbZQTKljpWEcmkNHzhtsDWArYHeigIUp7sU1gcyHfQgPzsJUW\\/hsPTAOuYLo27g1EnobxqgGyVAKLn2LpTWsr9gA7Q0ecw8BNkX39DynSCOOWAWUMwdXd3aI9Gim23f2LcgKhamTDu4F8o8JJQtkpAW6cloK2vxjUUfMchHUO9ggWM6vB4MofHYeM0+2HLN3ybCcD9blRxA\\/GgggRnvX63IXASxF642CSPW7WsOUKhXmTO5LXkUwSS+yxyhHeKoidJiJ0S8qSORmW\\/o1EZDiBg64cZES+dRB2iQrPEnCkELgWq9WADgygw+iVXSoMdehwDGWQmsZqI6EpWuQp7sqQczpSKNOqgmZ0I7ZRDi032XHgXv8FzUh4qLu9KQa8pAd84Egg0sOZxPhyZCpAz2joNl5SMi8NfcH73Fv5dkuDYwR1wy4YixHW2Tzdikx+AUqtI1GfR21vLjzxeq+XSUVlCcR068XTDkZSiFpgDiwiyoXNLGeIUK3P5\\/1aiAUxM\\/wYBvbHAOsDapWM1pQnw8ieElyiUfEPPFn9z1zwnEwQ0FAygFmRAYvVe2LPM9RA0SYtyiV8+CGAUxYJFGEthpjMTHoE3ni6Zt3kfqoUNibvzVKHScNJjRGPM1jizE0\\/nI2bqbsufL9gP1VlCcMf6u0akKKala2+YvO0KUYZyo44VvohfJH9jxRTzg7nrrGdyp0vmcCKyAqVP4sv\\/\\/6DwMERIpKjkDjNJ\\/WegPi9tUo7lHPxXUbitqD03TfLekjvtKBxWQ0Inc6ftbHHmrDpL45oDjk9Wl8NMwLCmNZTnKoEGin8WiS4PnJ3ukRuLQ7jJ6gXn+tGMI6GWSZPQreIJxLSXvi1Yr+SO\\/lUN8eUBx9\\/PIf45xHeZ8\\/ENUpIbCJaRI4mKLQJ\\/hsKwPdjG5KpnhkCfMnRMQ\\/jIOEPQXVQQ8BCpiphLkgUP6T4hXtScADnqRM23VG8YM3gphAhwOpDaqIE7Ait1Hrg2EJSnQtHA2W93n6yD\\/Ovz3+xDgVyc\\/uuZaMNpgWsAZOq2yhur1yyg7c7Fra1XBEhVIAJp4tWnOWt5Or7h96C52iN5MnmoMHxWZttrtyHGCUjAg0qhVrOyBkNZn5x\\/6nPTiqKGsBNspNxHh5ULeXr00gLwHi3kEa9
hUQmPDN1IBcGz6XcBSRrWcK5puw7ugsqEPsKmlNfQczRt88R8366QuuukO7CEGOQt4gWLDhHvJJmVDPKmQNJa5yRCc7wU0XmbK6CDUJo+zdWP0AeF\\/M+EbD+S4Zn4gSu359jVdQ2kGxi7U1s26q9kMsPiRS4HB0w3lKtu4nh7PDtpqoTaaeMBFXjAR+eMMCTTpTZ3W6iPsz7n3e\\/edbDf8HgVGX0\\/lkJGz1pf+zylw4fh9R6B3bCbeNNf+ILO757GPePTsO+SjL2uKWa09pxv\\/VUwWq5X\\/YT8LnaMqvXrGQrRVYC+8r3SGEPH31FC3cI5LT5QJxD+3IW0XNZBumaKAdOSw\\/7HH3gmuXrkZLNKtYhhlM1hlzB4BU7FNTjfFOFkhlJWHJSvegzzXRxCAcwzdrpSLcPwfJDPrRwV520HBzZwyTuDDG01RS9tTsp5yGTtUuzlcZCZJeM8TRY5ag9Fc4oYL2Cxjr0S2knIPCJ3GJD2wvgv4zkJKvVQwUiQGgXTQ9fT6v3Eg3941dl64hnVF0aN2mqUuP1QTnykHKHVQgDkrpq8fSoppp9wXZ9IpVglQAIqY+a6doHrTxJQuaLAmiZGYnpQHHSDVQXglLvixwqbunUHIa0sVeljCf4OQJQX4gwPvvjfiG+wqB8qBy05vr\\/zm4diki+kkHk\\/9tLwjI81HbYLrKQMMYGPuvWUo1iWxl5\\/2Thswjz5Z2EvaGWIF6db5T5\\/oWdnEP2PAAHLQG3jQCScgeYJ1sz1z\\/Oi26DpShz10HB88Y57\\/tRpE8pJ7\\/BCpJn2x1X8ISgqzvxLgVpzAuIa0HYL\\/uGkCCk1vK2dS8Bud3F2HlvpsNRdRt1Kxzgnj5nMBPW1KN1drMTSl4Ob0FKLqUXJQRHiT+24ETUJjNQV4Ez+BjEWVgwpyHQ5+kyrrtyhS282ArzL1ppCuxj+5caSY6jdHbwEEKzCYvJj9t45hApImUrGFoA58o2\\/LBNVsZl5IJiJ6izCsFuf5aZ\\/Slw42Xa01RwchGYQUarj7JBEFTpFlRTzGX+uSeMfn4OmHKhPVAyfu26BdBCIhDHw1xGx5pkCfJIMjKKhOvOQyZndZpDy0B5JFngcIYcuQm+3iBW8fjHyzuo1qrBCd6kZJ0+1afwMQkkdztX2jauJnIYtrQIxKzrlBSoc464DIxe4G8aMSJLB4gWAjqC8yBQ38\\/RYnIVIGC8SvqKrTYiZF78iWli3VxP5bTGCXZLMogdadfb41RC426viAIPRZ1W\\/LNLTX1JjTX0gqyXDsUl7pLu3hA02Toddq\\/lnLC\\/yPC0ghyaGYTJTnZS+RO49LY30p0tN1y5UJ\\/4ORgA59gLtliSAqbZRvNLwbyLcdDDVL0e13NXkm4Qkd5Yd\\/e+Xto9VDdIbhWh7XTMyTHecHkDm7aChHxdsAuT5Sx+o6pUZc+oPWMsm9Aruv6KzNvYjW33H2jWY3iOe9fbX5zCWEPupYhVhsq8ipFxA35DsT6Cc6IsEr\\/nZS2aBV0ltZdEKVU0x+vLnTAjqla2Qb4qG7KkoZ8pOnlFMdcooU7AYRhXE+f+2QC7kwXLEJ6A1AOxSm9rpoCtScmLTokSS1CSDIRgwOteeBAe0wG1oNt9Sbzz7giagCt+7sdhbvQQKp9WYFOQAJSSUoC6bWgwnvA1\\/ewJu3XkmhKtRa2oop12QQuxC4lkYv9G18mo0JQHmRcLeUFLoDZ+6c6yMOhjPaK63LbHCUBAm05jk\\/IMRTIp5am1sAaGHakYqsbQxxMO9tvzAE3gEChcbDWS3cyTQPpEJiJvlEDPexUwueFQMRaL0rsFdEiISR24qGMzr2ej9gKPnCJp6Vp92NJh5UQmka2hJeSK\\/qAl56my567sGWsX4gpd0VshqUnjAXPqBwO\\/pttFk60Spe6HO6QMnPTOERSVf43ahbrvX0fYc3QyE\\/Z27Q16Uyek
V8oAbvPgkKDrQsg5yjjqMwL64szX7lRUcTgObmloiKP3zPQwQFMS9NFN+VrL49VkMBM5baVRH5L4tlBuAp9yqcWD56T8GRkKOmZOjRxeRCYGpIBL4WbUVsUsIM6NvDI1TneqRUJkNJauf0gudxiDjlwuuz\\/S94wxWNHqUT6sJN+6Xw5uyyarMFZuzcRt92Wnrlc4hfIBvww+X+GYE47i3sLDXPlG6uQtYvks2XI+x806hw\\/v+0zK40FvkRxNvwMSBr0tB5gtp6lSvYkv\\/Q2NTiwRK61nsD6XphaBgDBSwnoNA1AZmHkL69inZ7mLz012wPx+X7eSx1L7d20XErQJqdHPTDywk5TkaOeMN2IyqmdozxFB6JHusVgtV2ghX2Akz5ZcCMwO5urDspFXdVFp9BYr44DcT\\/FbjVH0ItbznaUE5cVqgUHkoQYasWltSA5\\/HsKwfgyZGRuYMHz3fLr\\/YPXfh4fLzxAeEYlhGKIE0MW0sBnZ1oT16XSiDyTOVDF31TmrmSZnZyREIGbIMSL+Ji7JAuuLClBaregWMvYfszTjkDfs82HKOd\\/Y2VPC0A6ikrtKMFe1ZMjDXqFPsUthWgpMKdwZnb47FUddoflWbBk7\\/U8qrAp5y4UGpGeiYpGFI7dulFwRR\\/7nxk5v4dfzXSREhcTtJWS9Vkeg4hhIqodPf33QBP+N4Tbl994lHy8lYLJDmNZxoz4aUs9gdg9dulYskJR7Ml3T\\/C98et4O0VQ9Yb7BJaysl+vjC4etfcLpGH1JDUQkq\\/S5r25Dg71sOnjgz9IcDbvbJ90icx8OVLFZ1Lglby0wWfz+8MRfqDhCjD\\/C5h86Z0Jz4AI6cbv5T6cxh0mTFnsbli22UtTn1z8gBmoyAhyzc1HoiF\\/rtwtUJF01zgNVRzU3dFTTR7MFa\\/ios6zihg2Xjt0ev1riyiNGOro1Kl3I1+CiLiLBSJhE2gS6wX8f86VLzAw\\/XTV+1Z3Qan6Mwdhp+ZVBBkcIVrQU\\/U6fNmIWitevGROkHHKY8MhrUG1AqLwsZir7acWb0HbjMSiVpROUUw9754BtB53GH17X739xbKGzyMvh6cJIrWPvvMQKNyL2RVHm5XhwPIBgTSX059NQ9PQD+Ps91NIR0V+Bbs=
HLJS;

$hl = new Highlighter();
$r = $hl->highlight('coffeescript', $s);

print $r->value;

Output:

            TEST@TEST(<span class="hljs-number">1</span>):online&gt; TEST TEST
            version=<span class="hljs-number">3</span> meow=BLAH+BLAH\/BLAH+BLAH= data=BLAH+BLAH\/BLAH\/BLAH+BLAH\/BLAH\/BLAH+BLAH\/BLAH+BLAH+BLAH+BLAH+BLAH\/BLAH+BLAH+BLAH+BLAH+BLAH+BLAH+BLAH+BLAH\/BLAH\/BLAH\/BLAH+BLAH+BLAH\/BLAH\/QfwwjxZ+i3\/I0\/Ku4TtlywzkUCgjcKM8WDHbOlj0dMu8xQTEoucssL+cNi5migI6wdlvEzKlGsBcwT3QXAz4qdZ5n\/aJOxBSJ5XnIGfgeZr\/AJKEm7q\/wfU5hWp3bkXNaba1PMln76N2BWP9vb2OJYNT2wMsxhm\/RAr22gj\/crvDlg95T+MxcIsjz0mt2wXUr\/coqiNWmJ2jpmLwhF0HuLk9oHsc0tLh0JuaqGETFegnBAXdV3nrlOVZTSf3dz7Eotshzn8JbOXSKk12CITzFONf3BxDPyvBiEPIjFaIHGXsPKMGR6XqF4A3SGsbpzVsLRGy5Lb2OkumuBQMNArAZPhPkkAGBH9ZOkmBPJpFcBo4tRhuwgY9saa0VqInvjXE1Hyhffx8U5xwu28hztiebpuA3EyeJV+CPfpfEP3I1sQrtblSo9\/cZybBcVpaTO+Gbflksf1MW5RindyQwAJcygINFnAXwcOPfh8Y8ea8JzlVGrg2a6fPRTwESdM7E4mC56JaftlY8G5ia3v15RiZKNuQWXifbajeUh3NNCFcudprSnIIF2Edg\/PAM\/qUINyVmT4w5I9dbZQTKljpWEcmkNHzhtsDWArYHeigIUp7sU1gcyHfQgPzsJUW\/hsPTAOuYLo27g1EnobxqgGyVAKLn2LpTWsr9gA7Q0ecw8BNkX39DynSCOOWAWUMwdXd3aI9Gim23f2LcgKhamTDu4F8o8JJQtkpAW6cloK2vxjUUfMchHUO9ggWM6vB4MofHYeM0+2HLN3ybCcD9blRxA\/GgggRnvX63IXASxF642CSPW7WsOUKhXmTO5LXkUwSS+yxyhHeKoidJiJ0S8qSORmW\/o1EZDiBg64cZES+dRB2iQrPEnCkELgWq9WADgygw+iVXSoMdehwDGWQmsZqI6EpWuQp7sqQczpSKNOqgmZ0I7ZRDi032XHgXv8FzUh4qLu9KQa8pAd84Egg0sOZxPhyZCpAz2joNl5SMi8NfcH73Fv5dkuDYwR1wy4YixHW2Tzdikx+AUqtI1GfR21vLjzxeq+XSUVlCcR068XTDkZSiFpgDiwiyoXNLGeIUK3P5\/1aiAUxM\/wYBvbHAOsDapWM1pQnw8ieElyiUfEPPFn9z1zwnEwQ0FAygFmRAYvVe2LPM9RA0SYtyiV8+CGAUxYJFGEthpjMTHoE3ni6Zt3kfqoUNibvzVKHScNJjRGPM1jizE0\/nI2bqbsufL9gP1VlCcMf6u0akKKala2+YvO0KUYZyo44VvohfJH9jxRTzg7nrrGdyp0vmcCKyAqVP4sv\/\/6DwMERIpKjkDjNJ\/WegPi9tUo7lHPxXUbitqD03TfLekjvtKBxWQ0Inc6ftbHHmrDpL45oDjk9Wl8NMwLCmNZTnKoEGin8WiS4PnJ3ukRuLQ7jJ6gXn+tGMI6GWSZPQreIJxLSXvi1Yr+SO\/lUN8eUBx9\/PIf45xHeZ8\/ENUpIbCJaRI4mKLQJ\/hsKwPdjG5KpnhkCfMnRMQ\/jIOEPQXVQQ8BCpiphLkgUP6T4hXtScADnqRM23VG8YM3gphAhwOpDaqIE7Ait1Hrg2EJSnQtHA2W93n6yD\/Ovz3+xDgVyc\/uuZaMNpgWsAZOq2yhur1yyg7c7Fra1XBEhVIAJp4tWnOWt5Or7h96C52iN5MnmoMHxWZttrtyHGCUjAg0qhVrOyBkNZn5x\/6nPTiqKGsBNspNxHh5ULeXr00gLwHi3kEa9hUQmPDN
1IBcGz6XcBSRrWcK5puw7ugsqEPsKmlNfQczRt88R8366QuuukO7CEGOQt4gWLDhHvJJmVDPKmQNJa5yRCc7wU0XmbK6CDUJo+zdWP0AeF\/M+EbD+S4Zn4gSu359jVdQ2kGxi7U1s26q9kMsPiRS4HB0w3lKtu4nh7PDtpqoTaaeMBFXjAR+eMMCTTpTZ3W6iPsz7n3e\/edbDf8HgVGX0\/lkJGz1pf+zylw4fh9R6B3bCbeNNf+ILO757GPePTsO+SjL2uKWa09pxv\/VUwWq5X\/YT8LnaMqvXrGQrRVYC+8r3SGEPH31FC3cI5LT5QJxD+3IW0XNZBumaKAdOSw\/7HH3gmuXrkZLNKtYhhlM1hlzB4BU7FNTjfFOFkhlJWHJSvegzzXRxCAcwzdrpSLcPwfJDPrRwV520HBzZwyTuDDG01RS9tTsp5yGTtUuzlcZCZJeM8TRY5ag9Fc4oYL2Cxjr0S2knIPCJ3GJD2wvgv4zkJKvVQwUiQGgXTQ9fT6v3Eg3941dl64hnVF0aN2mqUuP1QTnykHKHVQgDkrpq8fSoppp9wXZ9IpVglQAIqY+a6doHrTxJQuaLAmiZGYnpQHHSDVQXglLvixwqbunUHIa0sVeljCf4OQJQX4gwPvvjfiG+wqB8qBy05vr\/zm4diki+kkHk\/9tLwjI81HbYLrKQMMYGPuvWUo1iWxl5\/2Thswjz5Z2EvaGWIF6db5T5\/oWdnEP2PAAHLQG3jQCScgeYJ1sz1z\/Oi26DpShz10HB88Y57\/tRpE8pJ7\/BCpJn2x1X8ISgqzvxLgVpzAuIa0HYL\/uGkCCk1vK2dS8Bud3F2HlvpsNRdRt1Kxzgnj5nMBPW1KN1drMTSl4Ob0FKLqUXJQRHiT+24ETUJjNQV4Ez+BjEWVgwpyHQ5+kyrrtyhS282ArzL1ppCuxj+5caSY6jdHbwEEKzCYvJj9t45hApImUrGFoA58o2\/LBNVsZl5IJiJ6izCsFuf5aZ\/Slw42Xa01RwchGYQUarj7JBEFTpFlRTzGX+uSeMfn4OmHKhPVAyfu26BdBCIhDHw1xGx5pkCfJIMjKKhOvOQyZndZpDy0B5JFngcIYcuQm+3iBW8fjHyzuo1qrBCd6kZJ0+1afwMQkkdztX2jauJnIYtrQIxKzrlBSoc464DIxe4G8aMSJLB4gWAjqC8yBQ38\/RYnIVIGC8SvqKrTYiZF78iWli3VxP5bTGCXZLMogdadfb41RC426viAIPRZ1W\/LNLTX1JjTX0gqyXDsUl7pLu3hA02Toddq\/lnLC\/yPC0ghyaGYTJTnZS+RO49LY30p0tN1y5UJ\/4ORgA59gLtliSAqbZRvNLwbyLcdDDVL0e13NXkm4Qkd5Yd\/e+Xto9VDdIbhWh7XTMyTHecHkDm7aChHxdsAuT5Sx+o6pUZc+oPWMsm9Aruv6KzNvYjW33H2jWY3iOe9fbX5zCWEPupYhVhsq8ipFxA35DsT6Cc6IsEr\/nZS2aBV0ltZdEKVU0x+vLnTAjqla2Qb4qG7KkoZ8pOnlFMdcooU7AYRhXE+f+2QC7kwXLEJ6A1AOxSm9rpoCtScmLTokSS1CSDIRgwOteeBAe0wG1oNt9Sbzz7giagCt+7sdhbvQQKp9WYFOQAJSSUoC6bWgwnvA1\/ewJu3XkmhKtRa2oop12QQuxC4lkYv9G18mo0JQHmRcLeUFLoDZ+6c6yMOhjPaK63LbHCUBAm05jk\/IMRTIp5am1sAaGHakYqsbQxxMO9tvzAE3gEChcbDWS3cyTQPpEJiJvlEDPexUwueFQMRaL0rsFdEiISR24qGMzr2ej9gKPnCJp6Vp92NJh5UQmka2hJeSK\/qAl56my567sGWsX4gpd0VshqUnjAXPqBwO\/pttFk60Spe6HO6QMnPTOERSVf43ahbrvX0fYc3QyE\/Z27Q16UyekV8oAbvPgkKDrQsg5yjjqMwL64szX7lRUcTg
ObmloiKP3zPQwQFMS9NFN+VrL49VkMBM5baVRH5L4tlBuAp9yqcWD56T8GRkKOmZOjRxeRCYGpIBL4WbUVsUsIM6NvDI1TneqRUJkNJauf0gudxiDjlwuuz\/S94wxWNHqUT6sJN+6Xw5uyyarMFZuzcRt92Wnrlc4hfIBvww+X+GYE47i3sLDXPlG6uQtYvks2XI+x806hw\/v+0zK40FvkRxNvwMSBr0tB5gtp6lSvYkv\/Q2NTiwRK61nsD6XphaBgDBSwnoNA1AZmHkL69inZ7mLz012wPx+X7eSx1L7d20XErQJqdHPTDywk5TkaOeMN2IyqmdozxFB6JHusVgtV2ghX2Akz5ZcCMwO5urDspFXdVFp9BYr44DcT\/FbjVH0ItbznaUE5cVqgUHkoQYasWltSA5\/HsKwfgyZGRuYMHz3fLr\/YPXfh4fLzxAeEYlhGKIE0MW0sBnZ1oT16XSiDyTOVDF31TmrmSZnZyREIGbIMSL+Ji7JAuuLClBaregWMvYfszTjkDfs82HKOd\/Y2VPC0A6ikrtKMFe1ZMjDXqFPsUthWgpMKdwZnb47FUddoflWbBk7\/U8qrAp5y4UGpGeiYpGFI7dulFwRR\/7nxk5v4dfzXSREhcTtJWS9Vkeg4hhIqodPf33QBP+N4Tbl994lHy8lYLJDmNZxoz4aUs9gdg9dulYskJR7Ml3T\/C98et4O0VQ9Yb7BJaysl+vjC4etfcLpGH1JDUQkq\/S5r25Dg71sOnjgz9IcDbvbJ90icx8OVLFZ1Lglby0wWfz+8MRfqDhCjD\/C5h86Z0Jz4AI6cbv5T6cxh0mTFnsbli22UtTn1z8gBmoyAhyzc1HoiF\/rtwtUJF01zgNVRzU3dFTTR7MFa\/ios6zihg2Xjt0ev1riyiNGOro1Kl3I1+CiLiLBSJhE2gS6wX8f86VLzAw\/XTV+1Z3Qan6Mwdhp+ZVBBkcIVrQU\/U6fNmIWitevGROkHHKY8MhrUG1AqLwsZir7acWb0HbjMSiVpROUUw9754BtB53GH17X739xbKGzyMvh6cJIrWPvvMQKNyL2RVHm5XhwPIBgTSX059NQ9PQD+Ps91NIR0V+Bbs=

@joshgoebel
Author

joshgoebel commented Jan 31, 2020

I wonder if that's not actually a problem with your port itself, in how it deals with boundaries in its parsing engine? Boundaries aren't necessarily required and shouldn't be assumed unless a grammar says so.

Do you run all our markup and detection tests?

@allejo
Collaborator

allejo commented Jan 31, 2020

That might be the case? PHP 7.1 uses PCRE 8.x and PHP 7.3 uses PCRE 10.x. I don't have 7.3 installed on my current machine so I can't confirm behavior with PCRE 10.x

@joshgoebel
Author

joshgoebel commented Jan 31, 2020

Does \b do something different in PHP? In that output both +2 and /1 are a \b\d match, hence they should be flagged as number.

(of course this sample is ridiculous) :-)
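
To make the `\b\d` point concrete, here is a minimal JavaScript sketch. The string is a made-up stand-in for the test data, not the actual sample, and it only shows how JavaScript's regex engine treats `\b`; it says nothing about how PCRE or highlight.php behaves. A digit that follows `+` or `/` sits on a word boundary, while a digit that follows a letter does not.

```javascript
// Illustrative string (an assumption), not the real test data from the issue.
const sample = "Z+i3/I0 T+2 d/1";

// \b\d: a digit that is immediately preceded by a word boundary.
// "i3" and "I0" do not qualify, because a letter and a digit are both word characters.
const hits = [...sample.matchAll(/\b\d/g)].map((m) => `${m[0]}@${m.index}`);

console.log(hits); // [ '2@10', '1@14' ]
```

Only the `2` after `+` and the `1` after `/` are reported, which is the behavior described above for `+2` and `/1`.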

@allejo
Collaborator

allejo commented Jan 31, 2020

Not that I'm aware of, \b matches word boundaries however PCRE handles it. We enable UTF-8 in our regular expressions and match any line ending for both CR and LF. But that's the only thing that comes to mind with how we're setting regex behavior.

public static function langRe($value, $global, $case_insensitive)
{
    // PCRE allows us to change the definition of "new line." The
    // `(*ANYCRLF)` matches `\r`, `\n`, and `\r\n` for `$`
    //
    // https://www.pcre.org/original/doc/html/pcrepattern.html
    // PCRE requires us to tell it the string can be UTF-8, so the 'u' modifier
    // is required. The `u` flag for PCRE is different from JS' unicode flag.
    $escaped = preg_replace('#(?<!\\\)/#um', '\\/', $value);
    $regex = "/(*ANYCRLF){$escaped}/um" . ($case_insensitive ? "i" : "");
    return new RegEx($regex);
}

@joshgoebel
Author

Do you run all our markup and detection tests?

Not that I'm aware of, \b matches word boundaries however PCRE handles it.

Well, it must be different, OR else there is a bug in your engine somewhere. :-)

@allejo
Collaborator

allejo commented Jan 31, 2020

Do you run all our markup and detection tests?

Minus a few tests that we skip due to unknown failures (i.e. haven't been able to figure them out), they all pass.

@joshgoebel
Author

You could fix the "tie" issues if you wanted. Alphabetical order breaks ties, LOL. So the first alphabetically always wins.

@joshgoebel
Author

joshgoebel commented Jan 31, 2020

Did you port over all our recent (last year) improvements to look-ahead matching? If not that could probably explain the differences, but also means your parser is broken for many cases revolving around look-ahead and proper boundary matching.

Big ones:

The old parser would break up the text and run regexes against small pieces, but you can't do that if you want all the features of regex to work properly... you have to always run the regexes (indexed properly) against the full string.
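
A rough JavaScript sketch of why chunking changes results, assuming two hypothetical helpers (`matchInChunk` and `matchAtIndex`) that are not taken from highlight.js or highlight.php. The first matches against a slice of the input; the second keeps the whole string and pins the attempt to a position with the sticky flag and `lastIndex`.

```javascript
const text = "a1 foobar";

// Hypothetical "chunked" style: cut a piece out and match against it alone.
function matchInChunk(pattern, source, start, end) {
  return new RegExp(pattern).exec(source.slice(start, end));
}

// "Full string" style: keep the whole input and attempt the match
// exactly at `start` using the sticky flag and lastIndex.
function matchAtIndex(pattern, source, start) {
  const re = new RegExp(pattern, "y");
  re.lastIndex = start;
  return re.exec(source);
}

// \b sees a false boundary at the start of the chunk "1" ...
const chunkBoundary = matchInChunk("\\b\\d", text, 1, 2); // matches "1"
// ... but in the full string there is no boundary between "a" and "1".
const fullBoundary = matchAtIndex("\\b\\d", text, 1); // null

// The look-ahead cannot see "bar" once the chunk ends after "foo" ...
const chunkLookahead = matchInChunk("foo(?=bar)", text, 3, 6); // null
// ... while the full string still has "bar" available to inspect.
const fullLookahead = matchAtIndex("foo(?=bar)", text, 3); // matches "foo"
```

Both differences come purely from what context the regex engine can see, not from the pattern itself.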

@allejo
Collaborator

allejo commented Jan 31, 2020

You could fix the "tie" issues if you wanted. Alphabetical order breaks ties, LOL. So the first alphabetically always wins.

...wait, is that how highlight.js does it? 😂

Did you port over all our recent (last year) improvements to look-ahead matching? If not that would probably explain the differences, but also means your parser is broken for many cases revolving around look-ahead and proper boundary matching.

Off the top of your head, do you know any languages using lookaheads? I did port over all your changes but I can't say I've tested it further than just making sure our unit tests match yours.

e.g. https://github.com/scrivo/highlight.php/blob/master/Highlight/Language.php#L293-L294

@joshgoebel
Author

joshgoebel commented Jan 31, 2020

...wait, is that how highlight.js does it? 😂

Yes. Not my choice. It's just a loop, and in order to "win" you have to have a higher score than a prior match... so in cases of ties the first item always wins since equal is not greater than.
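
A minimal JavaScript sketch of that tie-breaking loop. This is not the actual highlight.js source: `pickBest` is a hypothetical name, the relevance scores are invented, and visiting candidates in sorted order is an assumption made to mirror the "alphabetical order" remark.

```javascript
// Hypothetical relevance scores; the numbers are made up for illustration.
const scores = { zephir: 5, php: 5, coffeescript: 3 };

function pickBest(scoresByLanguage) {
  let best = { language: null, relevance: -1 };
  // Assumption: candidates are visited in alphabetical order.
  for (const language of Object.keys(scoresByLanguage).sort()) {
    const relevance = scoresByLanguage[language];
    // Strictly greater: an equal score never replaces the current best,
    // so the earlier candidate keeps the win on a tie.
    if (relevance > best.relevance) {
      best = { language, relevance };
    }
  }
  return best;
}

console.log(pickBest(scores).language); // php
```

Changing `>` to `>=` would flip the outcome so that the last tied candidate wins instead.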

Related: highlightjs/highlight.js#2275

Off the top of your head, do you know any languages using lookaheads?

Several now use look-aheads. I believe that PR includes some of those grammar changes. Actually that wouldn't be the problem with \b though... (well again unless your behavior is different).

I just checked and \b will match the beginning and end of strings also, so it would have behaved the same under the previous engine, which actually makes sense since we use \b so much.

@joshgoebel
Author

How does your HTTP markup test fail exactly?

@allejo
Collaborator

allejo commented Jan 31, 2020

Yes. Not my choice. It's just a loop, and in order to "win" you have to have a higher score than a prior match... so in cases of ties the first item always wins since equal is not greater than.

That's good to know. I'll implement that functionality on here to match that behavior and get tied unit tests passing.

Several now use look-aheads. I believe that PR includes some of those grammar changes. Actually that wouldn't be the problem with \b though... (well again unless your behavior is different).

Then, based on our unit tests passing and matching yours, we're handling lookaheads correctly, right?

How does your HTTP markup test fail exactly?

Failing HTTP test

1) MarkupTest::testHighlighter with data set #82 ('http', 'default', 'POST /task?id=1 HTTP/1.1\nHost...true}\n', '<span class="hljs-keyword">PO...span>\n')
The "default" markup test failed for the "http" language
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
 '<span class="hljs-keyword">POST</span> <span class="hljs-string">/task?id=1</span> HTTP/1.1
 <span class="hljs-attribute">Host</span>: example.org
 <span class="hljs-attribute">Content-Type</span>: application/json; charset=utf-8
 <span class="hljs-attribute">Content-Length</span>: 19
 
-<span class="json">{<span class="hljs-attr">"status"</span>: <span class="hljs-string">"ok"</span>, <span class="hljs-attr">"extended"</span>: <span class="hljs-literal">true</span>}
-</span>'
+{"status": "ok", "extended": true}'

Failing Haskell Test

1) MarkupTest::testHighlighter with data set #81 ('haskell', 'nested-comments', '{- this is a {- nested -} comment -}\n', '<span class="hljs-comment">{-...span>\n')
The "nested-comments" markup test failed for the "haskell" language
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'<span class="hljs-comment">{- this is a <span class="hljs-comment">{- nested -}</span> comment -}</span>'
+'<span class="hljs-comment">{- this is a {- nested -}</span> comment -}'

@joshgoebel
Author

joshgoebel commented Jan 31, 2020

Then, based on our unit tests passing and matching yours, we're handling lookaheads correctly, right?

Most likely. Our test suite isn't anywhere close to 100% coverage of all rules though.

-<span class="json">{<span class="hljs-attr">"status"</span>: <span class="hljs-string">"ok"</span>, <span class="hljs-attr">"extended"</span>: <span class="hljs-literal">true</span>}
-</span>'
+{"status": "ok", "extended": true}'

Looks like your sublanguage support is maybe broken somehow? Perhaps you don't support auto-detect of sublanguages? HTTP is a pretty simple syntax.

Failing Haskell Test

Likely a bug in your recursive rule support using "self"?

      hljs.COMMENT(
        '{-',
        '-}',
        {
          contains: ['self']
        }
      )
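
To show what the recursion is expected to produce, here is a small standalone JavaScript sketch. It is not how highlight.js or highlight.php implements `contains: ['self']`; `highlightComments` is a hypothetical helper that tracks nesting depth by hand and does no HTML escaping. The `allowNesting` flag is an assumption added to contrast the two outputs from the failing Haskell test.

```javascript
// Hypothetical helper: wraps Haskell-style {- ... -} comments in spans.
function highlightComments(source, allowNesting) {
  let out = "";
  let depth = 0; // how many comment spans are currently open
  let i = 0;
  while (i < source.length) {
    const two = source.slice(i, i + 2);
    if (two === "{-" && (depth === 0 || allowNesting)) {
      // Open a span; when nesting is allowed, this also fires inside a comment.
      out += '<span class="hljs-comment">{-';
      depth += 1;
      i += 2;
    } else if (two === "-}" && depth > 0) {
      // Close the innermost open span.
      out += "-}</span>";
      depth -= 1;
      i += 2;
    } else {
      out += source[i];
      i += 1;
    }
  }
  // Close any spans left open by an unterminated comment.
  return out + "</span>".repeat(depth);
}

const input = "{- this is a {- nested -} comment -}";
const nested = highlightComments(input, true);
const flat = highlightComments(input, false);
```

With `allowNesting` set to `true` the result matches the expected side of the diff above; with `false` it reproduces the actual side, where the first `-}` ends the comment early.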

@joshgoebel
Author

Know anything about Zephir?

@allejo
Collaborator

allejo commented Feb 1, 2020

Thanks for the hints on the failing HTTP/Haskell tests. I'm going to investigate them this weekend and see if I can fix them.

Know anything about Zephir?

In what context? I'm aware that it exists and I've looked at it before, but I've never worked with it. Or is it an allowed failing test somewhere for highlight.php?
