v4-hathi-indexer

The code here contacts HathiTrust, downloads the latest updates to their materials, determines which are publicly accessible and which are not, and adds those updates to the ingest queues for the staging or production indexes.

The top-level script to execute is updateifnewer. It accepts the following command-line arguments:

-v verbose
-t test (downloads files but doesn't send anything to solr input queues)
-a force a check of whether the AWS environment variables are defined
-i specify which indexes are to be updated: staging, production, or staging:production
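
The options above can be illustrated with a small parsing sketch. This is not the code from updateifnewer: the function name parse_args, the variable names, and the assumption that -i takes its value as a separate argument with indexes joined by a colon are all hypothetical, chosen only to show how flags like these are commonly handled with the bash getopts builtin.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: parse_args and its variables are illustrative
# and are NOT taken from the updateifnewer script.

parse_args() {
    verbose=0      # -v
    test_mode=0    # -t: download only, send nothing to the Solr input queues
    check_aws=0    # -a: force a check of the AWS environment variables
    indexes=()     # -i: indexes to update

    local OPTIND=1 opt
    while getopts "vtai:" opt; do
        case "$opt" in
            v) verbose=1 ;;
            t) test_mode=1 ;;
            a) check_aws=1 ;;
            # Assumption: multiple indexes are joined with ':' as in staging:production
            i) IFS=':' read -r -a indexes <<< "$OPTARG" ;;
            *) echo "usage: updateifnewer [-v] [-t] [-a] [-i staging|production|staging:production]" >&2
               return 1 ;;
        esac
    done

    # Reject any index name other than the two named in this README.
    local idx
    for idx in "${indexes[@]}"; do
        case "$idx" in
            staging|production) ;;
            *) echo "unknown index: $idx" >&2; return 1 ;;
        esac
    done
}

parse_args -v -t -i staging:production
echo "verbose=$verbose test=$test_mode indexes=${indexes[*]}"
```

Under the same assumption about argument syntax, a dry run against both indexes might be invoked as `./updateifnewer -v -t -i staging:production`; check the script itself for the exact form it expects.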

If there is a locally maintained cache of all records (in the full_dump directory under the data directory), the update process merges the updates into that cache, keeping it up to date. At present the scripts for initially populating the cache of records are only half-ported and reside in the bak directory.

