castorini/onboarding

Castorini: Onboarding Guide

"Castorini" is the GitHub organization of Jimmy Lin's research group at the University of Waterloo. The name is a portmanteau of castor, which is the genus name for a beaver, and anserini, which is the genus name for a goose. It's difficult to come up with two animals that are more quintessentially Canadian than those!

This repository contains onboarding resources for researchers who would like to work with us, including new graduate students and undergraduates at the University of Waterloo.

Undergraduates at the University of Waterloo: If you're interested in working with our group, read this guide first.

🧱 Foundations of Retrieval

This onboarding path is the starting point for working in our group and comprises the following lessons:

  1. Begin your journey here.
  2. BM25 Baselines for MS MARCO Passage Ranking in Anserini
  3. BM25 Baseline for MS MARCO Passage Ranking in Pyserini
  4. A Conceptual Framework for Retrieval
  5. Contriever Baseline for NFCorpus
  6. A Deeper Dive into Dense and Sparse Representations

As you proceed along the onboarding path, please don't send a separate pull request for each file. Instead, consolidate your edits into a single pull request per repo.

Resources

This repository introduces several methods for users without local GPU resources.

Training monoBERT from Scratch

This guide covers fine-tuning monoBERT on the MS MARCO Passage dataset using the Capreolus toolkit. Compute Canada users may need to set up their environment by following this setup guide.
