val-town/typescript-tokenizer

TypeScript Tokenizer

This is an experimental module! Expect changes and breakage!

Most search utilities are not built for source code: their stopword lists are English-only, and their parsers and tokenizers don't work well with code. This is an attempt at creating a tool to extract useful tokens from TypeScript source. Right now it works by:

  1. Parsing TypeScript with tree-sitter to get useful things like identifier names while skipping not-useful things like keywords.
  2. Feeding the probably-English parts of that AST into natural to run Porter stemming and stopword removal on them.
  3. Returning all of this in a format that is, we hope, friendly to Postgres.
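
As a rough illustration of steps 2 and 3, here is a dependency-free sketch. It is not this module's implementation: it skips tree-sitter parsing and Porter stemming entirely, assumes the identifier names have already been extracted, and uses a tiny made-up stopword list. The function names (`splitIdentifier`, `identifiersToTerms`, `toSearchText`) are hypothetical.

```typescript
// Hypothetical helpers for illustration only; not the module's actual API.

// A tiny illustrative stopword list; a real list would be much longer.
const STOPWORDS = new Set(["a", "an", "and", "the", "of", "to", "in", "is", "for"]);

// Split an identifier such as "parseHTTPResponse" or "user_id" into lowercase words.
function splitIdentifier(identifier: string): string[] {
  return identifier
    // camelCase boundary: "getUser" -> "get User"
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    // acronym boundary: "HTTPResponse" -> "HTTP Response"
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2")
    // split on anything that is not a letter or digit (underscores, spaces, "$")
    .split(/[^A-Za-z0-9]+/)
    .filter((part) => part.length > 0)
    .map((part) => part.toLowerCase());
}

// Turn a list of identifiers into a deduplicated, stopword-free list of terms,
// preserving the order in which terms are first seen.
function identifiersToTerms(identifiers: string[]): string[] {
  const terms = new Set<string>();
  for (const identifier of identifiers) {
    for (const word of splitIdentifier(identifier)) {
      if (!STOPWORDS.has(word)) {
        terms.add(word);
      }
    }
  }
  return Array.from(terms);
}

// Join the terms into plain space-separated text for a full-text search index.
function toSearchText(identifiers: string[]): string {
  return identifiersToTerms(identifiers).join(" ");
}

console.log(toSearchText(["getUserById", "user_id", "listOfUsers"]));
```

The resulting string could then be handed to something like Postgres's `to_tsvector('simple', ...)`, which does no stemming of its own. Whether the module targets that particular function is an assumption.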
