Searching for Arabic words issue - #4784 #5099

Open: wants to merge 4 commits into base: master
60 changes: 54 additions & 6 deletions addon/search/searchcursor.js
@@ -115,13 +115,61 @@
}
}

+  //Normalization for Arabic
+  var normalizeArabicChars = function (s) {
+    function filter(c) {
+      switch (c) {
+        // ALEF Chars
+        case 'إ' :
+        case 'أ' :
+        case 'آ' :
+        case 'ٵ' :
+        case 'ٳ' :
+        case 'ٲ' :
+        case 'ٱ' :
+          return 'ا'
+        // TAAA MARBOTA Chars
+        case 'ۃ' :
+        case 'ہ' :
+          return 'ة'
+        // YAAA Chars
+        case 'ى' :
+        case 'ي' :
+        case 'ٸ' :
+          return 'ي'
+        case 'ئ':
+          return 'ي ء'

Member

Is this space in the returned string intentional?

Member

Also, `.normalize("NFD")` already separates this character into \u064a and \u0654, which seems similar to what you're doing and might already cover this case?
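
For reference, the decomposition mentioned above can be checked directly. This is a minimal sketch; `codePoints` is a hypothetical helper introduced only for this illustration and is not part of the PR or of CodeMirror.

```javascript
// U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE, written as an escape so there is
// no ambiguity about which code point is meant.
const yehWithHamza = "\u0626";

// Hypothetical helper: list a string's code points as 4-digit lowercase hex.
function codePoints(str) {
  return Array.from(str, function (ch) {
    return ch.codePointAt(0).toString(16).padStart(4, "0");
  });
}

// Canonical decomposition splits the letter into a base yeh (U+064A)
// followed by a combining hamza above (U+0654).
const decomposed = yehWithHamza.normalize("NFD");

console.log(codePoints(yehWithHamza)); // one code point: 0626
console.log(codePoints(decomposed));   // two code points: 064a, 0654
```

Note that the second code point is the combining mark U+0654, not the standalone hamza letter U+0621 that the `case` in the diff returns.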

Author

Thanks for the comment. Yes, you are right: when I converted each Unicode code point into its character, I added a space by mistake. Sorry about that.
As for normalize("NFD"), I did test it and it didn't work for me, but if your testing shows that it does, I can remove this case to reduce the number of checks.
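
One possible reason NFD alone appears not to work (an assumption, not something confirmed in this thread) is that the decomposed string still contains the combining mark, so it does not compare equal to the bare base letter. The sketch below uses a hypothetical `foldHamzaMarks` helper, not part of the PR, that decomposes and then drops the combining maddah and hamza marks U+0653 to U+0655.

```javascript
// Hypothetical helper, not part of the PR: decompose with NFD, then remove
// the combining marks U+0653 (maddah above), U+0654 (hamza above) and
// U+0655 (hamza below) that NFD splits off the base letter.
function foldHamzaMarks(str) {
  return str.normalize("NFD").replace(/[\u0653-\u0655]/g, "");
}

// NFD alone leaves the combining mark in place, so this is false.
console.log("\u0626".normalize("NFD") === "\u064a"); // false

// After stripping the marks, the base letters compare equal.
console.log(foldHamzaMarks("\u0626") === "\u064a"); // true: yeh with hamza -> yeh
console.log(foldHamzaMarks("\u0623") === "\u0627"); // true: alef with hamza above -> alef
console.log(foldHamzaMarks("\u0625") === "\u0627"); // true: alef with hamza below -> alef
console.log(foldHamzaMarks("\u0622") === "\u0627"); // true: alef with madda -> alef
```

Stripping marks changes the string length, so any code that maps positions in the folded line back to the original line would need to account for that.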

+        default :
+          return c
+      }
+    }
+    var normalized = "", i, l
+    for (i = 0, l = s.length; i < l; i = i + 1) {
+      normalized = normalized + filter(s.charAt(i))
+    }
+    return normalized
+  }

+  function hasArabic(str) {
+    var pattern = /[\u0600-\u06FF\u0750-\u077F]/;
+    return pattern.test(str);
+  }

   var doFold, noFold
-  if (String.prototype.normalize) {
-    doFold = function(str) { return str.normalize("NFD").toLowerCase() }
-    noFold = function(str) { return str.normalize("NFD") }
-  } else {
-    doFold = function(str) { return str.toLowerCase() }
-    noFold = function(str) { return str }
+  doFold = function(str){
+    str = str.toLowerCase()
+    if (String.prototype.normalize)
+      str = str.normalize("NFD")
+    if (hasArabic(str))
+      return normalizeArabicChars(str)
+    return str
+  }
+  noFold = function(str){
+    if (String.prototype.normalize)
+      str = str.normalize("NFD")
+    if (hasArabic(str))
+      return normalizeArabicChars(str)
+    return str
   }

// Maps a position in a case-folded line back to a position in the original line
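
To illustrate what folding is meant to achieve for search, here is a minimal, self-contained sketch. It is not the PR's implementation: `ARABIC_FOLD`, `foldArabic` and `findFolded` are hypothetical names, the table covers only a subset of the cases in the diff, and NFD is deliberately left out so the mapping stays one character to one character.

```javascript
// Assumed one-to-one folding table (a subset of the cases in the diff).
const ARABIC_FOLD = {
  "\u0625": "\u0627", // alef with hamza below -> alef
  "\u0623": "\u0627", // alef with hamza above -> alef
  "\u0622": "\u0627", // alef with madda above -> alef
  "\u0671": "\u0627", // alef wasla -> alef
  "\u0649": "\u064a"  // alef maksura -> yeh
};

// Replace each listed variant with its base letter; other characters pass through.
function foldArabic(str) {
  return str.replace(/[\u0622\u0623\u0625\u0671\u0649]/g, function (ch) {
    return ARABIC_FOLD[ch];
  });
}

// Fold both the line and the query, then search in the folded line.
function findFolded(line, query) {
  return foldArabic(line).indexOf(foldArabic(query));
}

const line = "\u0623\u062d\u0645\u062f";  // "أحمد", written with alef + hamza above
const query = "\u0627\u062d\u0645\u062f"; // "احمد", written with plain alef

console.log(line.indexOf(query));     // -1: the raw strings differ
console.log(findFolded(line, query)); // 0: the folded strings match
```

Because every replacement here is a single character, an index in the folded line is also a valid index in the original line. A mapping that returns more than one character for one input character does not have that property.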