Hi, I'm just learning about the project and it's pretty amazing. I tinkered with NLTK and Gensim before but this is so convenient to explore and embed on a page. Learning with Observable notebooks is also great!
That being said, I end up with a lot of noise in my selection. I tried a bit of normalize() and remove() with encouraging results. Still, I'm quite surprised that when I search this repository I don't seem to find any stop words.
This made me wonder, is this the "wrong" way in this context? Is the philosophy of compromise not to rely on such lists?
PS: I apologize for hijacking issues, but is there a forum/chat/platform for discussions on using compromise that would be a better place? I have other questions, like using .tfidf() on .ngrams(), but I don't want to create noise here.
hey Fabien, you're talking about the results of the wikipedia plugin right?
Yeah, super noisy. it really needs a lot of work. Yeah, i was using a stop-list here but that was just me eyeballing it. It could really use a PR, if you want to take a swing at it.
To do it properly, we should also add (some!) wikipedia redirects. I held-off because the results were still so rowdy.
cheers
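For anyone landing here later, a rough sketch of the stop-list idea discussed above. It is plain JavaScript and does not call compromise at all; it assumes you have already pulled your terms or n-grams out as an array of strings. The `STOP_WORDS` list and the `removeStopWords` helper are hypothetical names made up for illustration, not something that ships with the library or the wikipedia plugin.

```javascript
// Hypothetical, hand-picked stop list (an assumption for this sketch,
// not a list taken from compromise or its plugins).
const STOP_WORDS = new Set(['the', 'a', 'an', 'of', 'and', 'in', 'to', 'is']);

// Drop single stop words, and n-grams made up entirely of stop words.
// N-grams that contain at least one non-stop word are kept as-is.
function removeStopWords(terms, stopWords = STOP_WORDS) {
  return terms.filter((term) => {
    const words = term.toLowerCase().split(/\s+/).filter(Boolean);
    return words.length > 0 && !words.every((w) => stopWords.has(w));
  });
}

// Example input: strings you might get after extracting terms/n-grams.
const terms = ['The', 'history', 'of the', 'Toronto', 'in', 'Toronto Raptors'];
console.log(removeStopWords(terms));
// → [ 'history', 'Toronto', 'Toronto Raptors' ]
```

Filtering only the n-grams that are stop words all the way through is a deliberate choice here: it keeps phrases like "history of Toronto" intact while still dropping the pure noise. A stricter filter is easy to swap in if the results stay rowdy.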