Finds scientific names using dictionary and NLP approaches.
- Multiplatform packages (Linux, Windows, Mac OS X).
- Self-contained, with no external dependencies: only the gnfinder (or gnfinder.exe) binary (~15Mb) is needed. However, an internet connection is required for name verification.
- Takes UTF8-encoded text and returns JSON-formatted output that contains detected scientific names.
- Optionally detects the language of the text automatically and adjusts the Bayes algorithm for that language. English and German are currently supported.
- Uses complementary heuristic and natural language processing algorithms.
- Optionally verifies found names against multiple biodiversity databases using the gnindex service.
- Detection of nomenclatural annotations like sp. nov., comb. nov., ssp. nov. and their variants.
- Ability to see words that surround detected name-strings.
- The library can be used concurrently to significantly improve speed. On a server with 40 threads, it is able to detect names on 50 million pages in approximately 3 hours using both heuristic and Bayes algorithms. See the bhlindex project for an example.
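Concurrent use can be sketched with a standard Go worker pool. This is a minimal illustration, not code from gnfinder or bhlindex: `findNames` is a hypothetical stand-in for a real gnfinder call (it only collects capitalized words), and the `page` type and worker count are likewise made up for the example.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"unicode"
	"unicode/utf8"
)

// page is a hypothetical unit of work: an ID plus its UTF-8 text.
type page struct {
	ID   int
	Text string
}

// result pairs a page ID with the name-candidates found on that page.
type result struct {
	ID    int
	Names []string
}

// findNames is a HYPOTHETICAL stand-in for gnfinder: it simply collects
// words that start with an uppercase letter.
func findNames(text string) []string {
	var names []string
	for _, w := range strings.Fields(text) {
		if r, _ := utf8.DecodeRuneInString(w); unicode.IsUpper(r) {
			names = append(names, w)
		}
	}
	return names
}

// processPages fans pages out to a fixed number of worker goroutines and
// collects the results into a map keyed by page ID.
func processPages(pages []page, workers int) map[int][]string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan page)
	results := make(chan result)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				results <- result{ID: p.ID, Names: findNames(p.Text)}
			}
		}()
	}

	// Feed all pages, then close jobs so the workers' loops end.
	go func() {
		for _, p := range pages {
			jobs <- p
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	out := make(map[int][]string, len(pages))
	for r := range results {
		out[r.ID] = r.Names
	}
	return out
}

func main() {
	pages := []page{
		{ID: 1, Text: "Pomatomus saltator and Parus major"},
		{ID: 2, Text: "nothing to see here"},
	}
	found := processPages(pages, 4)
	for _, p := range pages {
		fmt.Println(p.ID, found[p.ID])
	}
}
```

Unbuffered channels keep memory flat regardless of how many pages are queued; the number of workers is the only knob controlling parallelism.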
Download the binary executable for your operating system from the latest release.
On Linux or Mac OS X, move the gnfinder executable somewhere in your PATH (for example, /usr/local/bin):
sudo mv path_to/gnfinder /usr/local/bin
On Windows, one possible way is to create a default folder for executables and place gnfinder there. Press the Windows+R key combination and type "cmd". In the terminal window that appears, type:
mkdir C:\bin
copy path_to\gnfinder.exe C:\bin
Add the C:\bin directory to your PATH environment variable.
Alternatively, install gnfinder with Go:
go get github.com/gnames/gnfinder
cd $GOPATH/src/github.com/gnames/gnfinder
make install
To see flags and usage:
gnfinder --help
# or just
gnfinder
To see the version of the binary:
gnfinder -v
Examples:
Getting data from a pipe, forcing English language and verification:
echo "Pomatomus saltator and Parus major" | gnfinder find -c -l eng
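The same pipe can be driven from a Go program with os/exec. A minimal sketch, assuming a gnfinder binary is on your PATH and that the flags behave as in the example above (-c for verification, -l eng to force English); the helper name runGnfinder is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// runGnfinder is a hypothetical helper: it pipes text into
// `<bin> find -c -l eng` and returns whatever the command writes to stdout.
func runGnfinder(bin, text string) ([]byte, error) {
	cmd := exec.Command(bin, "find", "-c", "-l", "eng")
	cmd.Stdin = strings.NewReader(text)
	cmd.Stderr = os.Stderr // surface gnfinder's own error messages
	return cmd.Output()
}

func main() {
	out, err := runGnfinder("gnfinder", "Pomatomus saltator and Parus major")
	if err != nil {
		fmt.Fprintln(os.Stderr, "gnfinder failed:", err)
		return
	}
	fmt.Println(string(out))
}
```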
Displaying matches from NCBI and Encyclopedia of Life, if they exist. For the list of data source IDs, see gnresolver.
echo "Pomatomus saltator and Parus major" | gnfinder find -c -l eng -s "4,12"
Returning 5 words before and after each found name-candidate:
gnfinder find -t 5 file_with_names.txt
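The idea behind -t 5 is to return up to 5 neighboring words on each side of a name-string. The Go sketch below shows one way to compute that context; it splits words on whitespace only, which is an assumption for illustration rather than gnfinder's actual tokenizer, and surroundingWords is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// surroundingWords returns up to n words immediately before and after the
// first whitespace-delimited occurrence of name in text.
func surroundingWords(text, name string, n int) (before, after []string, ok bool) {
	if n < 0 {
		n = 0
	}
	words := strings.Fields(text)
	target := strings.Fields(name)
	if len(target) == 0 {
		return nil, nil, false
	}
	for i := 0; i+len(target) <= len(words); i++ {
		if !sameWords(words[i:i+len(target)], target) {
			continue
		}
		start := i - n
		if start < 0 {
			start = 0
		}
		end := i + len(target) + n
		if end > len(words) {
			end = len(words)
		}
		return words[start:i], words[i+len(target) : end], true
	}
	return nil, nil, false
}

// sameWords reports whether two word slices are identical.
func sameWords(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	text := "Specimens of Parus major were collected in the northern forests of Finland during spring"
	before, after, ok := surroundingWords(text, "Parus major", 5)
	if ok {
		fmt.Println("before:", before)
		fmt.Println("after:", after)
	}
}
```

With the sample sentence only two words precede the name, so `before` holds just those two while `after` holds five.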
Getting data from a file and redirecting the result to another file:
gnfinder find file1.txt > file2.json
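Because the output is JSON, a downstream Go program can read file2.json with encoding/json. This sketch assumes only that the file holds a single JSON object and makes no assumption about its fields; it lists the top-level keys so you can inspect the structure before writing typed structs. topLevelKeys is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// topLevelKeys decodes a JSON object and returns its top-level keys in
// sorted order. It fails if the data is not a JSON object.
func topLevelKeys(data []byte) ([]string, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(data, &obj); err != nil {
		return nil, err
	}
	keys := make([]string, 0, len(obj))
	for k := range obj {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, nil
}

func main() {
	data, err := os.ReadFile("file2.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read output:", err)
		return
	}
	keys, err := topLevelKeys(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "unexpected JSON:", err)
		return
	}
	fmt.Println("top-level keys:", keys)
}
```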
Detecting nomenclatural annotations:
echo "Parus major sp. n." | gnfinder find
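The kind of matching involved can be illustrated with a regular expression. This is a simplified sketch under the assumption that annotations are written as an abbreviated rank (sp, ssp, subsp or comb) followed by nov. or n.; gnfinder's real rules may cover more variants, and findAnnotations is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// annotationRe matches a few common spellings such as "sp. nov.", "sp. n.",
// "comb. nov.", "ssp. nov." and "subsp. nov.". This pattern is an assumption
// for illustration, not gnfinder's own rule set.
var annotationRe = regexp.MustCompile(`\b(?:sp|ssp|subsp|comb)\.\s*(?:nov|n)\.`)

// findAnnotations returns every annotation-like substring in text, in order.
func findAnnotations(text string) []string {
	return annotationRe.FindAllString(text, -1)
}

func main() {
	fmt.Println(findAnnotations("Parus major sp. n."))
	fmt.Println(findAnnotations("Aus bus comb. nov. and Cus dus ssp. nov."))
}
```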
Start gnfinder as a gRPC server:
# using default 8778 port
gnfinder grpc
# using some other port
gnfinder grpc -p 8901
Then use a gRPC client to talk to gnfinder. To learn how to write one, see the Ruby implementation of a client.
To use gnfinder as a library, first install its dependencies:
cd $GOPATH/src/github.com/gnames/gnfinder
make deps
package main

import (
	"fmt"

	"github.com/gnames/gnfinder"
)

func main() {
	utfText := "Pomatomus saltator and Parus major"
	bytesText := []byte(utfText)
	gnf := gnfinder.NewGNfinder()
	jsonNames := gnf.FindNamesJSON(bytesText)
	fmt.Println(string(jsonNames))
}
To run gnfinder with Docker:
docker pull gnames/gnfinder
# run the gnfinder server and map it to port 8888 on the host machine
docker run -d -p 8888:8778 --name gnfinder gnames/gnfinder
To install the latest gnfinder from source, you need Go and the Protocol Buffers compiler (protoc).
Download the protoc binary compiled for your OS from protobuf releases.
On Mac OS X, using Homebrew:
brew install protobuf
If you see any error messages, run brew doctor, follow any recommended fixes, and try again. If it still fails, try instead:
brew upgrade protobuf
Alternatively, run the following commands:
PROTOC_ZIP=protoc-3.11.4-osx-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v3.11.4/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
Or manually download and install protoc from protobuf releases.
On Linux, run the following commands:
PROTOC_ZIP=protoc-3.11.4-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v3.11.4/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
Or manually download and install protoc from protobuf releases.
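The download URLs in the commands above follow a fixed pattern of version, platform and architecture. As a sketch, assuming that naming holds for the release you want (check the protobuf releases page), the URL can be assembled in Go; protocZipURL and platformFor are hypothetical helpers, and only the two platforms shown above are mapped.

```go
package main

import (
	"fmt"
	"runtime"
)

// protocZipURL builds a release URL following the naming used above,
// e.g. .../download/v3.11.4/protoc-3.11.4-linux-x86_64.zip.
func protocZipURL(version, platform, arch string) string {
	zip := fmt.Sprintf("protoc-%s-%s-%s.zip", version, platform, arch)
	return fmt.Sprintf("https://github.com/protocolbuffers/protobuf/releases/download/v%s/%s", version, zip)
}

// platformFor maps Go's runtime.GOOS to the platform names used in the zip
// files above. Only Mac OS X and Linux are handled; this mapping is an assumption.
func platformFor(goos string) (string, bool) {
	switch goos {
	case "darwin":
		return "osx", true
	case "linux":
		return "linux", true
	}
	return "", false
}

func main() {
	platform, ok := platformFor(runtime.GOOS)
	if !ok {
		fmt.Println("no protoc zip naming known for", runtime.GOOS)
		return
	}
	// Architecture is fixed to x86_64, matching the commands above.
	fmt.Println(protocZipURL("3.11.4", platform, "x86_64"))
}
```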
Then get the source code, build gnfinder, and check that it works:
go get github.com/gnames/gnfinder
cd $GOPATH/src/github.com/gnames/gnfinder
make deps
make
gnfinder -h
Install ginkgo, a BDD testing framework for Go:
make deps
To run the tests, go to the root directory of the project and run:
ginkgo
#or
go test
#or
make test