
A literature parsing tool that compiles and analyzes publication data


ChildMindInstitute/biblio-reader


biblio-reader

Welcome! biblio-reader is a literature parsing tool, based on Christian Kreibich's scholar.py, that compiles and analyzes publications matched by Google Scholar searches. For publications found on Google Scholar result pages 1 through 99, it can do the following:

  • Compile key information from each publication (such as article title, year, authors, journal title, URL, and citations)
  • Write all key information into a CSV file
  • Look for trends in journal fields, publication growth over the years, publication types, journal impact, citations, and more
  • Find and display author information, including relationships between each author and attributed articles
  • Help users find full-text PDFs for each publication
  • Analyze and categorize the full text extracted from each PDF
  • Map author affiliations on Google Maps
  • Facilitate manual publication review, including assigning articles to separate reviewers and analyzing their input
  • Create a sortable table displaying publications and key information about each article
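As an illustration of the first two items, the sketch below writes a list of publication records to CSV text using only the standard library. The column names in `FIELDS`, the `records_to_csv` helper, and the sample record are assumptions made up for this example; they are not taken from biblio-reader's actual code or output.

```python
import csv
import io

# Hypothetical column names, loosely based on the fields listed above.
FIELDS = ["title", "year", "authors", "journal", "url", "citations"]


def records_to_csv(records):
    """Serialize a list of publication dicts to CSV text (header row first)."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not in FIELDS;
    # missing keys are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Made-up sample record, for demonstration only.
sample = [
    {
        "title": "An example article",
        "year": 2015,
        "authors": "A. Author & B. Author",
        "journal": "Example Journal",
        "url": "https://example.org/1",
        "citations": 12,
    },
]
csv_text = records_to_csv(sample)
print(csv_text)
```

Writing to a string buffer keeps the sketch free of side effects; swapping in an open file handle would write the same rows to disk.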

Navigation

manager.py

manager.py is the utilities manager. It provides support for reading and writing files in the inputs, outputs, and working directories, and it updates the main data CSV file through the update_data() method.

This is also where users enter project-specific variables, including which publications are connected to the original work of interest and the categories of search terms, written as regular expressions, that Google Scholar may have used to find them.
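A minimal sketch of how such search-term categories could be expressed is shown here. The `SEARCH_TERM_CATEGORIES` dictionary, its category names and patterns, and the `matched_categories` helper are all hypothetical; the real variables in manager.py are project-specific and may be structured differently.

```python
import re

# Hypothetical categories and patterns; real ones depend on the project.
SEARCH_TERM_CATEGORIES = {
    "dataset_name": re.compile(r"\bexample\s+dataset\b", re.IGNORECASE),
    "consortium": re.compile(r"\bexample\s+consortium\b", re.IGNORECASE),
}


def matched_categories(text):
    """Return the sorted names of all categories whose pattern occurs in text."""
    return sorted(
        name
        for name, pattern in SEARCH_TERM_CATEGORIES.items()
        if pattern.search(text)
    )


print(matched_categories("We analyzed the Example Dataset in this study."))
```

Precompiling the patterns once keeps repeated matching over many publications cheap, and `re.IGNORECASE` avoids missing terms that differ only in capitalization.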

scholar.py is where the original Google Scholar results are compiled. See the README for more.

inputs

Directory containing all user inputs. Journal categories and attributes are already included. For full-text analysis, place all PDFs here in a subdirectory named "pdfs".

Manual reviews and categorizations of each publication should be stored here as well, in a subdirectory named "article_review". It should contain CSV files that record the categorization of each article; these are then analyzed by review_analysis.py in the biblio_reader directory.
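To make the idea of analyzing reviewer CSV files concrete, here is a rough sketch that tallies how often each category was assigned to each article across all files in a review directory. The `article_id` and `category` column names, the `tally_reviews` helper, and the demo files are assumptions for illustration; they do not describe the actual format expected by review_analysis.py.

```python
import csv
import tempfile
from collections import Counter, defaultdict
from pathlib import Path


def tally_reviews(review_dir):
    """Count how often each category was assigned to each article.

    Assumes every CSV file has 'article_id' and 'category' columns
    (hypothetical names chosen for this sketch).
    """
    tallies = defaultdict(Counter)
    for csv_path in sorted(Path(review_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                tallies[row["article_id"]][row["category"]] += 1
    return {article: dict(counts) for article, counts in tallies.items()}


# Demo with two made-up reviewer files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    review_dir = Path(tmp) / "article_review"
    review_dir.mkdir()
    (review_dir / "reviewer_a.csv").write_text(
        "article_id,category\n1,data use\n2,mention only\n", encoding="utf-8"
    )
    (review_dir / "reviewer_b.csv").write_text(
        "article_id,category\n1,data use\n2,data use\n", encoding="utf-8"
    )
    tallies = tally_reviews(review_dir)

print(tallies)
```

A per-article tally like this makes disagreements between reviewers easy to spot, since any article with more than one category key received conflicting labels.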

outputs

All final outputs are stored here. This includes matplotlib graphs generated by scholar_reader.py, the main CSV file, and the reviewer assignments.

working

Provides a location for intermediate files, including PubMed bibliographies, keyword paragraphs, and PDFs converted to TXT.

table

Provides support for creating a sortable, viewable HTML table based on the CSV file. In data_mg.py, the data can be filtered so that only publications meeting user-set criteria are shown.
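The filtering-then-rendering step can be sketched as follows. The `rows_to_html_table` function and its `keep` predicate are hypothetical and are not the interface of data_mg.py. The sketch only builds static HTML; making the table sortable would need client-side scripting, which is not shown.

```python
import html


def rows_to_html_table(rows, columns, keep=lambda row: True):
    """Render the rows accepted by `keep` as an HTML table string."""
    header = "".join(f"<th>{html.escape(col)}</th>" for col in columns)
    body = []
    for row in rows:
        if not keep(row):
            continue  # skip publications that fail the user's criteria
        cells = "".join(
            f"<td>{html.escape(str(row.get(col, '')))}</td>" for col in columns
        )
        body.append(f"<tr>{cells}</tr>")
    return (
        f"<table><thead><tr>{header}</tr></thead>"
        f"<tbody>{''.join(body)}</tbody></table>"
    )


# Made-up rows, for demonstration only.
publications = [
    {"title": "A & B study", "year": 2014},
    {"title": "C study", "year": 2016},
]
recent_table = rows_to_html_table(
    publications, ["title", "year"], keep=lambda row: row["year"] >= 2015
)
print(recent_table)
```

Escaping every cell with `html.escape` keeps titles that contain characters such as `&` or `<` from breaking the generated markup.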