Skip to content

donatj/unic

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

52 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

unic

Go Report Card GoDoc

Works like UNIX sort | uniq to provide global uniques except you don't have to sort first.

Works by using Cuckoo Filters - See: https://github.com/seiflotfy/cuckoofilter

Advantages over sort | uniq

Quicker output, lower memory footprint

sort by definitions needs to buffer the entire input before it can begin outputing anything. This can use a lot of memory and prevents anything from getting output until the initial process completes.

unic uses probabalistic filters (Cuckoo) to determine if the input has been seen before, and can begin output after the first line of input.

Original item order is kept

Given the list 3 1 2 1 2 3, compare sort | uniq 's output

$ echo '3\n1\n2\n1\n2\n3' | sort | uniq
1
2
3

to unic

echo '3\n1\n2\n1\n2\n3' | unic
3
1
2

Disadvantages

Probabilistic Filtering

As unic works with Cuckoo Filters, there is a very small probability a line will be wrongly marked duplicate. Lines will never be incorrectly marked as unique due to the nature of the filter.

In cases where a false positive cannot ever be tolerated, unic should not be used.

Not compatible with all of uniq's flags

unic by nature does not buffer; thus some of uniq's flags cannot be implemented.

In these cases, you should use uniq.

Installing

Binaries

See: releases

From Source

$ go install github.com/donatj/unic/cmd/unic@latest