diffraction vs diff #12

Open
antolinos opened this issue Jun 2, 2022 · 1 comment

@antolinos

Because the implementation applies stemming:

applying stemming to all words and create terms

I expected the query diff to match more items than the query diffraction.

However, my tests show the opposite: the query diffraction returns 239 results, whilst the query diff returns 0.

For the record, this is the list of terms that contain the string diff:

JSON.stringify(terms.filter(t => t.term.search("diff")!= -1))
[
{"term":"ndiffract","numberOfItems":1,"numberOfGroups":1},
{"term":"micodiffract","numberOfItems":1,"numberOfGroups":1},
{"term":"4864differ","numberOfItems":1,"numberOfGroups":1},
{"term":"diffraction10","numberOfItems":7,"numberOfGroups":1},
{"term":"interdiffus","numberOfItems":1,"numberOfGroups":1},
{"term":"difficil","numberOfItems":1,"numberOfGroups":1},
{"term":"diffract","numberOfItems":239,"numberOfGroups":1},
{"term":"206diffract","numberOfItems":1,"numberOfGroups":1},
{"term":"nanodiffraction10","numberOfItems":1,"numberOfGroups":1},
{"term":"difficulti","numberOfItems":5,"numberOfGroups":1},
{"term":"differenti","numberOfItems":4,"numberOfGroups":1},
{"term":"microdiffraction10","numberOfItems":3,"numberOfGroups":1},
{"term":"microdiffract","numberOfItems":3,"numberOfGroups":1},
{"term":"nanodiffract","numberOfItems":2,"numberOfGroups":1},
{"term":"diffus","numberOfItems":15,"numberOfGroups":1},
{"term":"difficult","numberOfItems":38,"numberOfGroups":1},
{"term":"differ","numberOfItems":322,"numberOfGroups":1},
{"term":"diffractomet","numberOfItems":2,"numberOfGroups":1}]

My questions are:

  1. Shouldn't we expect a term called diff, since it is the root?

  2. Is it a problem if a user queries for diff and gets no results?

@nitrosx
Collaborator

nitrosx commented Jun 3, 2022

@antolinos: at the moment, the scoring does not apply partial matching to the terms.
The situation you highlight in this issue can therefore occur.

To answer your questions:

  1. If you search for diff, only the items that contain exactly the term generated by applying lemmatization to the word diff are scored and returned. If we want to return all the items that contain a term which includes diff, we should discuss this topic further.
  2. At the moment, I do not see that as a problem, but I am open to discussing the topic and how to implement the necessary changes if they are approved.
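
To make the difference concrete, here is a minimal sketch comparing exact term matching with one possible form of partial matching (prefix matching). It uses a small subset of the terms listed in the issue. The helper names `exactMatch` and `prefixMatch` are hypothetical and are not the project's actual functions, and the sketch assumes the query diff is turned into the term diff unchanged.

```javascript
// Subset of the term index quoted earlier in this issue (counts copied from that listing).
const terms = [
  { term: "diffract", numberOfItems: 239, numberOfGroups: 1 },
  { term: "differ", numberOfItems: 322, numberOfGroups: 1 },
  { term: "diffus", numberOfItems: 15, numberOfGroups: 1 },
  { term: "microdiffract", numberOfItems: 3, numberOfGroups: 1 },
];

// Exact matching: a query term must be identical to an index term.
// Hypothetical helper illustrating the behaviour described above.
function exactMatch(index, queryTerm) {
  return index.filter((t) => t.term === queryTerm);
}

// Prefix matching: one possible form of partial matching, for comparison only.
// Hypothetical helper; the project does not currently do this.
function prefixMatch(index, queryTerm) {
  return index.filter((t) => t.term.startsWith(queryTerm));
}

console.log(exactMatch(terms, "diff").map((t) => t.term));     // []
console.log(exactMatch(terms, "diffract").map((t) => t.term)); // [ 'diffract' ]
console.log(prefixMatch(terms, "diff").map((t) => t.term));    // [ 'diffract', 'differ', 'diffus' ]
```

Note that prefix matching on diff would also return items for differ and diffus, which are unrelated to diffraction, so any change in this direction trades recall against precision.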
