search could/should be loosened #78

Open
pnkfelix opened this issue Nov 10, 2014 · 6 comments

Comments

@pnkfelix
Contributor

Based on observations/reverse engineering, it looks like the search field on rustaceans.org attempts to find records with a (case-folded) text match for every word in the search.

It also does not seem to attempt to include the Notes record fields in its search.

For example, searching for Felix Klock currently yields zero results. However, searching for either Felix or Klock yields my record. (And if you look at my record, you can see I have put "Felix Klock" into its Notes field to try to work around this.)

The results of the current search algorithm should still be prioritized, but it would be good to also include the results of a looser search, especially when the current algorithm yields zero results. As an example of a looser search, we could look for records containing any of the requested words and take the union of the resulting sets of records.
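To make the proposal concrete, here is a minimal sketch of such a two-tier search, written in Rust purely for illustration. The `Record` struct, its field names, and the `search` function are assumptions made for this example, not the site's actual schema or code. Records matching every word come first, followed by records matching at least one word, and the Notes field is included in the match.

```rust
/// Hypothetical record shape; the real rustaceans.org schema may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub username: String,
    pub name: String,
    pub notes: String,
}

impl Record {
    /// Case-folded concatenation of all searchable fields, notes included.
    fn haystack(&self) -> String {
        format!("{} {} {}", self.username, self.name, self.notes).to_lowercase()
    }
}

/// Returns records matching every query word first (the strict tier),
/// followed by records matching at least one word (the loose tier).
pub fn search<'a>(records: &'a [Record], query: &str) -> Vec<&'a Record> {
    let words: Vec<String> = query
        .split_whitespace()
        .map(|w| w.to_lowercase())
        .collect();
    if words.is_empty() {
        return Vec::new();
    }

    let mut strict = Vec::new();
    let mut loose = Vec::new();
    for record in records {
        let hay = record.haystack();
        // Count how many of the query words appear somewhere in the record.
        let hits = words.iter().filter(|w| hay.contains(w.as_str())).count();
        if hits == words.len() {
            strict.push(record);
        } else if hits > 0 {
            loose.push(record);
        }
    }

    // Strict matches keep priority; loose matches are appended after them.
    strict.extend(loose);
    strict
}
```

Because each record lands in at most one tier, the combined list is the union of the per-word result sets without duplicates.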

@nrc
Owner

nrc commented Nov 15, 2014

One thing is that the notes field is excluded from search. That can easily be fixed (I excluded it because I thought it would bring up a whole bunch of bad results, but people don't seem to be using the notes section too heavily, so it seems that won't be an issue) and I should do that.

The other thing here is having a smarter search algorithm - the current tactic is really dumb. Obviously fixing this will take some effort (the right way to do this is to use a more search-oriented backend, e.g., ElasticSearch rather than SQLite, but that is more effort than I want to get into). I wonder if we could tweak search without too much effort, e.g., by just splitting search strings on spaces.

@Aatch
Contributor

Aatch commented Jan 20, 2015

@nick29581 I don't think something like ElasticSearch is necessary. Postgres has some pretty decent string searching capabilities and will likely be much easier to set up. Also, much easier to move from SQLite to Postgres than to ElasticSearch.

Somehow I think that using the same engine that Wikipedia does is overkill for rustaceans.org 😁

@lambda-fairy
Contributor

Can't we just load all the JSON files into a HashMap and skip the database altogether? Even SQLite seems overkill for a read-only site with < 100 users.

Hash indexes and prefix matching aren't hard to implement manually. Even a dumb linear search would work fine for a few orders of magnitude beyond what we have now.
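As a rough illustration of how little code this takes, the sketch below builds a hash index from case-folded words to document positions and answers prefix queries with a plain linear scan over the distinct words. `WordIndex` and its methods are hypothetical names invented for this example, and the documents are plain strings rather than the site's JSON entries.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical word index; each document is identified by its position
/// in the slice passed to `build`.
pub struct WordIndex {
    by_word: HashMap<String, BTreeSet<usize>>,
}

impl WordIndex {
    /// Splits every document on whitespace and records which documents
    /// contain each case-folded word.
    pub fn build(docs: &[&str]) -> Self {
        let mut by_word: HashMap<String, BTreeSet<usize>> = HashMap::new();
        for (id, doc) in docs.iter().enumerate() {
            for word in doc.split_whitespace() {
                by_word.entry(word.to_lowercase()).or_default().insert(id);
            }
        }
        WordIndex { by_word }
    }

    /// Exact (case-folded) word lookup: a single hash probe.
    pub fn exact(&self, word: &str) -> Vec<usize> {
        self.by_word
            .get(&word.to_lowercase())
            .map(|ids| ids.iter().copied().collect())
            .unwrap_or_default()
    }

    /// Prefix lookup: a "dumb" linear scan over the distinct words.
    /// Returns document positions in ascending order, without duplicates.
    pub fn prefix(&self, prefix: &str) -> Vec<usize> {
        let prefix = prefix.to_lowercase();
        let mut ids = BTreeSet::new();
        for (word, docs) in &self.by_word {
            if word.starts_with(&prefix) {
                ids.extend(docs.iter().copied());
            }
        }
        ids.into_iter().collect()
    }
}
```

If the linear prefix scan ever became a bottleneck, swapping the `HashMap` for a sorted map would allow range queries, but at the data sizes discussed here that is unlikely to matter.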

@nrc
Owner

nrc commented Feb 11, 2015

We could, but that would make the backend a lot more stateful than it is at the moment. Currently, there is no in-memory state, which is nice, but not essential.

@lambda-fairy
Contributor

What kind of statefulness are you thinking of?

I see that the GitHub daemon, after merging an entry, inserts the new data into the database itself. With a database-less system, we can instead have the daemon update the data on disk (git pull, maybe), then ask the HTTP server to reload everything from there (POST /reload).

At this point the changes would probably amount to a rewrite though, so I'm not so sure.
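The reload step described above could look roughly like the following sketch. It is only an illustration under stated assumptions: `Store`, `reload`, and `get` are hypothetical names, entries are assumed to live as one `*.json` file per user in a single directory, file contents are kept as raw strings (no JSON parsing), and the HTTP handler for `POST /reload` is left out; it would simply call `reload`.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;
use std::sync::RwLock;

/// Hypothetical in-memory store keyed by file stem (assumed to be the username).
pub struct Store {
    entries: RwLock<HashMap<String, String>>,
}

impl Store {
    /// Creates an empty store; call `reload` to populate it.
    pub fn new() -> Self {
        Store { entries: RwLock::new(HashMap::new()) }
    }

    /// Re-reads every `*.json` file in `dir` into a fresh map and swaps it in.
    /// If any read fails, the previous contents stay in place.
    /// Returns the number of entries loaded.
    pub fn reload(&self, dir: &Path) -> io::Result<usize> {
        let mut fresh = HashMap::new();
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            // Skip anything that is not a .json file.
            if path.extension().and_then(|e| e.to_str()) != Some("json") {
                continue;
            }
            if let Some(stem) = path.file_stem().and_then(|s| s.to_str()) {
                fresh.insert(stem.to_string(), fs::read_to_string(&path)?);
            }
        }
        let count = fresh.len();
        // Readers see either the old map or the new one, never a partial load.
        *self.entries.write().unwrap() = fresh;
        Ok(count)
    }

    /// Returns the raw JSON text stored for `username`, if any.
    pub fn get(&self, username: &str) -> Option<String> {
        self.entries.read().unwrap().get(username).cloned()
    }
}
```

Building the new map off to the side and swapping it in under the write lock keeps the in-memory state disposable: it can always be reconstructed from the files on disk.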

@nrc
Owner

nrc commented Feb 11, 2015

I guess I was imagining that on startup the daemon would read everything from disk into memory and would keep running forever. Then the in-memory hashmap is preserved between accesses of the backend. I don't suppose that would be too bad, since it is unlikely the program/hashtable would get corrupted, and it would mean we never need to worry about reconstructing the DB. But yeah, it would be pretty much a rewrite.

5 participants
@pnkfelix @Aatch @nrc @lambda-fairy and others