
Need more instructions to reproduce the extractive oracle of booksum-chapter #26

Open
lzhou1998 opened this issue Mar 5, 2022 · 0 comments


Hi, thank you for your great work!

I plan to reproduce some of your baseline results due to #22, but I ran into problems when reproducing the extractive oracle on booksum-chapter and got slightly different results from your paper: I get ROUGE-1/2/L (F1) of 42.38/9.82/20.62, whereas the paper reports 42.68/9.66/21.33.

Here are my steps:

  1. Split the text in BOOKSUM-paragraph (lines in chapter_summary_aligned_{}_split.jsonl.gathered.stable) into sentences with spaCy, and compute the oracle for each instance as described in Section 4.2 of your paper.
  2. Split the text in BOOKSUM-chapter (lines in chapter_summary_aligned_{}_split.jsonl.gathered) into paragraphs with the function "merge_text_paragraphs()" in align_data_bi_encoder_paraphrase.py, then split each paragraph into sentences as in Step 1.
  3. Map ALL of the oracle sentences obtained in Step 1 to the chapter sentences of BOOKSUM-chapter obtained in Step 2.
  4. Now I have BOOKSUM-chapter with the texts split into sentences and each sentence marked as oracle or not, so I can compute ROUGE for each chapter instance.
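
Steps 3 and 4 could be sketched roughly as below. This is only an illustration under my own assumptions, not the paper's procedure: the function names `normalize`, `mark_oracle_sentences`, and `unigram_f1` are hypothetical, matching is done by exact string equality after whitespace and case normalization, and `unigram_f1` is a simplified stand-in for ROUGE-1 F1 (no stemming, not the official scorer). The `unmatched` list is there because oracle sentences that fail to map onto chapter sentences would be one possible source of a score mismatch.

```python
import re
from collections import Counter


def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so a sentence matches across both splits."""
    return re.sub(r"\s+", " ", sentence).strip().lower()


def mark_oracle_sentences(chapter_sentences, oracle_sentences):
    """Step 3 (assumed form): label each chapter sentence as oracle or not.

    Returns (labels, unmatched), where labels[i] is True if chapter sentence i
    equals some oracle sentence after normalization, and unmatched lists the
    oracle sentences that were not found anywhere in the chapter.
    """
    oracle_keys = {normalize(s) for s in oracle_sentences}
    chapter_keys = {normalize(s) for s in chapter_sentences}
    labels = [normalize(s) in oracle_keys for s in chapter_sentences]
    unmatched = [s for s in oracle_sentences if normalize(s) not in chapter_keys]
    return labels, unmatched


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 on whitespace tokens (stand-in for a real scorer)."""
    cand = Counter(normalize(candidate).split())
    ref = Counter(normalize(reference).split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def score_chapter(chapter_sentences, labels, reference_summary: str) -> float:
    """Step 4 (assumed form): join the oracle-labelled sentences and score them."""
    extracted = " ".join(s for s, keep in zip(chapter_sentences, labels) if keep)
    return unigram_f1(extracted, reference_summary)
```

If `unmatched` is non-empty for many chapters, the two sentence splits disagree (for example because paragraph merging changes where spaCy places sentence boundaries), and the chapter-level oracle would then contain fewer sentences than the paragraph-level one.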

Is anything wrong in my steps? Could you give more instructions on how you did this?

Another question: it seems that the extractive models are not directly available on Hugging Face and need additional effort to reproduce. Did you train and evaluate models such as BertExt and MatchSum using the code from their original repos? Could you also give some instructions on this?

Thank you very much! @jigsaw2212 @muggin
