
Type-specific Kotlin collections

The purpose of this project is to analyze the possibility of implementing collections that store primitive values without boxing.

Desired properties of collections:

  • Kotlin Multiplatform library
  • implements standard interfaces such as MutableList and MutableMap
  • roughly a twofold memory advantage over the standard implementations
  • better cache behavior than the standard implementations
  • linked implementations of Map and Set

Map

Two possible implementations were considered: open addressing and chained.

In both cases the same common optimizations are used (a sketch of the resulting index computation follows the list):

  • power-of-two capacity, so the modulo operation can be replaced with a bitwise AND
  • Knuth's multiplicative hash
  • linear probing
  • eager deletion
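
A minimal sketch of the resulting index computation under these assumptions (names and constant layout are illustrative, not the project's actual code):

```kotlin
// Hypothetical sketch: Knuth's multiplicative hash combined with a power-of-two capacity.
// -0x61c8864680b583ebL is the signed form of the 64-bit golden-ratio constant 0x9E3779B97F4A7C15.
private const val PHI = -0x61c8864680b583ebL

// capacityBits = log2(capacity); the top bits of the product select the slot,
// so no modulo operation is needed.
fun index(key: Long, capacityBits: Int): Int =
    ((key * PHI) ushr (64 - capacityBits)).toInt()

// Linear probing: step to the next slot, wrapping with a bitwise AND instead of `%`.
fun nextIndex(i: Int, capacity: Int): Int = (i + 1) and (capacity - 1)
```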

Open addressing implementation

Implementation of MutableMap<Long, Long> is here.

Advantages:

  • better cache behavior, since only one array is used for navigation (see the lookup sketch below)
  • smaller memory usage with a high load factor
  • can also be used for an Object map implementation with some memory advantage

Disadvantages:

  • collisions are more problematic, since all elements are stored sequentially in a single array
  • memory usage depends mostly on the capacity rather than on the size
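
A minimal lookup sketch for this layout, assuming parallel key/value arrays and a reserved key value marking free slots (all names are illustrative; the real implementation handles the reserved key separately and supports resizing):

```kotlin
// Hypothetical open-addressing sketch, not the project's actual code.
class OpenAddressingLongMapSketch(capacityBits: Int) {
    private val capacity = 1 shl capacityBits
    private val mask = capacity - 1
    private val shift = 64 - capacityBits
    private val keys = LongArray(capacity)    // navigation touches only this array
    private val values = LongArray(capacity)
    private val freeKey = 0L                  // assumption: key 0 marks an empty slot

    fun get(key: Long, default: Long): Long {
        var i = ((key * -0x61c8864680b583ebL) ushr shift).toInt()  // Knuth multiplicative hash
        while (keys[i] != freeKey) {          // probe until an empty slot is found
            if (keys[i] == key) return values[i]
            i = (i + 1) and mask              // linear probing, power-of-two wrap-around
        }
        return default
    }
}
```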

Chained implementation

Idea: store the fields of the nodes of the standard implementation in several parallel arrays (keys, values, next); a minimal sketch follows the lists below.

Implementation of MutableMap<Long, Long> is here.

Advantages:

  • algorithmically equivalent to the standard implementation (better collision resolution than open addressing)
  • dense memory usage
  • memory usage depends on capacity * load factor, which is approximately the size
  • smaller memory usage with a low load factor

Disadvantages:

  • needs several extra arrays (free indexes, head indexes)
  • uses two arrays for navigation
  • higher memory usage for small primitives (byte, short, int)
  • cannot be used for an Object map implementation
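
A minimal sketch of the parallel-array layout described above (illustrative names; rehashing and the free-index list needed for removals are omitted):

```kotlin
// Hypothetical chained-layout sketch, not the project's actual code.
// The node fields of the standard HashMap are split into parallel arrays, so two
// arrays (heads and next) are touched during navigation.
class ChainedLongMapSketch(capacityBits: Int, loadFactor: Double = 0.75) {
    private val bucketCount = 1 shl capacityBits
    private val shift = 64 - capacityBits
    private val maxSize = (bucketCount * loadFactor).toInt()
    private val heads = IntArray(bucketCount) { -1 }  // bucket -> first node index, -1 = empty
    private val keys = LongArray(maxSize)             // dense node storage: capacity * loadFactor entries
    private val values = LongArray(maxSize)
    private val next = IntArray(maxSize) { -1 }       // index of the next node in the chain, -1 = end
    private var size = 0                              // new nodes are appended at index `size`

    fun get(key: Long, default: Long): Long {
        val bucket = ((key * -0x61c8864680b583ebL) ushr shift).toInt()
        var node = heads[bucket]
        while (node != -1) {                          // walk the chain through the `next` array
            if (keys[node] == key) return values[node]
            node = next[node]
        }
        return default
    }

    fun put(key: Long, value: Long) {
        val bucket = ((key * -0x61c8864680b583ebL) ushr shift).toInt()
        var node = heads[bucket]
        while (node != -1) {
            if (keys[node] == key) { values[node] = value; return }
            node = next[node]
        }
        check(size < maxSize) { "sketch omits rehashing and the free-index list" }
        keys[size] = key; values[size] = value
        next[size] = heads[bucket]                    // prepend the new node to its bucket's chain
        heads[bucket] = size
        size++
    }
}
```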

Open addressing vs Chained

Memory usage

The table shows the average memory advantage of each implementation over the standard one with a load factor of 0.6.

[Image: Open addressing vs Chained average memory usage]

  • the chained implementation is better for Map<Long, Long>
  • for other primitives, open addressing has better memory usage
  • the chained implementation is not useful for Map<Object, Object> (the advantage is less than 1 in the worst case)

Some other calculations of memory usage are here.

Performance (get)

[Image: Open addressing vs Chained performance]

When the hash map is big enough, we can see a significant performance improvement compared with the standard implementation.

Other primitives

Two possible ways of supporting primitives other than Long were considered:

  • store everything as Long and redirect all calls to LongLongMap
  • implement a generic version of the hash map with specialized storage for primitive elements

Long implementation redirecting

  • higher memory usage

[Image: Open addressing vs Chained average memory usage]
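
A minimal sketch of the redirection approach, reusing the hypothetical ChainedLongMapSketch from the chained-implementation sketch above (names are illustrative):

```kotlin
// Hypothetical sketch: every Int is widened to Long on the way in and narrowed on the
// way out, so an Int-Int entry occupies twice the array memory it actually needs.
class IntIntByRedirectionSketch(capacityBits: Int) {
    private val delegate = ChainedLongMapSketch(capacityBits)

    fun put(key: Int, value: Int) = delegate.put(key.toLong(), value.toLong())

    fun get(key: Int, default: Int): Int =
        delegate.get(key.toLong(), default.toLong()).toInt()
}
```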

Generic implementation

Idea: all the memory advantages of the specialized implementations come from the primitive array. Therefore, the primitive array and all work with the primitive type can be wrapped into a Storage<T> class.

  • one map implementation for all types, no code generation needed
  • a special generic reified factory function allows having only a linear number of implementations instead of a quadratic one
  • better memory usage and performance than redirecting to the Long-Long implementation

Open addressing implementation of MutableMap<K, V> is here.
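
A minimal sketch of the Storage<T> idea (illustrative names, not the actual project API): the primitive array stays behind a small interface, and a reified factory picks the right storage per type.

```kotlin
// Hypothetical sketch: all primitive-array work is hidden behind Storage<T>, so a single
// generic map implementation can combine any key storage with any value storage.
interface Storage<T> {
    val size: Int
    operator fun get(index: Int): T
    operator fun set(index: Int, value: T)
}

class LongStorage(size: Int) : Storage<Long> {
    private val array = LongArray(size)          // the primitive array holds the memory advantage
    override val size get() = array.size
    override fun get(index: Int) = array[index]
    override fun set(index: Int, value: Long) { array[index] = value }
}

class IntStorage(size: Int) : Storage<Int> {
    private val array = IntArray(size)
    override val size get() = array.size
    override fun get(index: Int) = array[index]
    override fun set(index: Int, value: Int) { array[index] = value }
}

// Reified factory: one storage class per primitive type (linear), whereas fully
// specialized maps would need one class per key/value combination (quadratic).
@Suppress("UNCHECKED_CAST")
inline fun <reified T> createStorage(size: Int): Storage<T> = when (T::class) {
    Long::class -> LongStorage(size) as Storage<T>
    Int::class -> IntStorage(size) as Storage<T>
    else -> error("No specialized storage for ${T::class}")
}
```

A generic map can then keep a Storage<K> for keys and a Storage<V> for values created via createStorage, so supporting a new primitive type requires only one new storage class.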

Generic vs long redirecting

[Image: Generic vs long redirecting]

ArrayList

The memory advantage is 2.5-3 times.

Implementation of MutableList<Long> is here.

All ArrayList operations are very cheap, so the boxing overhead becomes a real performance problem (see the next section).
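
A minimal sketch of such a list, assuming a plain LongArray backing store (illustrative names, not the project's actual code); note that get still returns a boxed Long whenever it is called through the MutableList<Long> interface:

```kotlin
// Hypothetical sketch of a primitive-backed list, not the project's actual code.
class LongArrayListSketch : AbstractMutableList<Long>() {
    private var array = LongArray(10)
    private var length = 0

    override val size get() = length

    override fun get(index: Int): Long = array[index]   // boxed on every call through List<Long>

    override fun set(index: Int, element: Long): Long {
        val old = array[index]
        array[index] = element
        return old
    }

    override fun add(index: Int, element: Long) {
        if (length == array.size) array = array.copyOf(array.size * 2)  // grow by doubling
        array.copyInto(array, index + 1, index, length)                 // shift the tail right
        array[index] = element
        length++
    }

    override fun removeAt(index: Int): Long {
        val old = array[index]
        array.copyInto(array, index, index + 1, length)                 // shift the tail left
        length--
        return old
    }
}
```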

Problem

Map, List, and other interfaces from the standard library require returning boxed objects from all methods, so extra boxing is needed everywhere. This critically hurts performance when:

  • an ArrayList is used
  • the hash map is small
  • operations like foreach are used

[Image: Open addressing vs Chained performance]

When the hash map is small, a single extra boxing accounts for a large part of a get call, so the standard implementation performs better.

[Benchmark charts (absolute times and slowdown factor relative to the standard implementation) for: List sum, Small map sum, Big map sum]

Foreach (or sum) is a cheap operation, so the extra boxing is fatal for performance.

Profiler results of ArrayList sum:

[Profiler charts for the Std, My list, and FastUtil implementations]

The problem can be solved by using specialized methods such as getOrDefault, as sketched below.
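
A minimal sketch of why a specialized method avoids the boxing (the interface and names are illustrative, not the project's actual API):

```kotlin
// Hypothetical sketch: the nullable get mirrors MutableMap<Long, Long>.get and must box
// its result, while a type-specific getOrDefault keeps everything as primitive longs.
interface LongLongMapSketch {
    fun get(key: Long): Long?                               // result is boxed, as on MutableMap
    fun getOrDefault(key: Long, defaultValue: Long): Long   // stays unboxed end to end
}

fun sumValues(map: LongLongMapSketch, keys: LongArray): Long {
    var sum = 0L
    for (key in keys) {
        // sum += map.get(key) ?: 0L                        // would allocate a Long on every hit
        sum += map.getOrDefault(key, 0L)                    // no boxing on this path
    }
    return sum
}
```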

Kotlin Native

The Kotlin/Native benchmark results differ for get and stay the same for foreach.
