Skip to content

Latest commit

 

History

History
471 lines (361 loc) · 17.7 KB

data-types.md

File metadata and controls

471 lines (361 loc) · 17.7 KB

Data Types

find the previous version of this document at crankshaft/data-types.md

Table of Contents generated with DocToc

Efficiently Representing Values and Tagging

watch | slide

read Numbered properties: fast elements

  • most objects in heap are 4-byte aligned
  • according to spec all numbers in JS are 64-bit floating doubles
  • V8 passes around 32-bit numbers to represent all values for improved efficiency
  • bottom bit reserved as tag to signify if value is a Smi (small integer) or a pointer to an object

watch | slide

| object pointer              | 1 |

or

| 31-bit-signed integer (Smi) | 0 |
  • numbers bigger than 31 bits are boxed
  • stored inside an object referenced via a pointer
  • adds extra overhead (at a minimum an extra lookup)
  • on 64 bit architectures Smis are 32-bit-signed instead of the 31-bit-signed on 32 bit architectures

Considerations

  • prefer Smis for numeric values whenever possible

Objects

Structure

+-------------------+
|  Object           |    +----> +------------------+     +---->  +------------------+
|-------------------|    |      |  FixedArray      |     |       |  FixedArray      |
|  Map              |    |      |------------------|     |       |------------------|
|-------------------|    |      |  Map             |     |       |  Map             |
|  Extra Properties |----+      |------------------|     |       |------------------|
|-------------------|           |  Length          |     |       |  Length          |
|  Elements         |------+    |------------------|     |       |------------------|
|-------------------|      |    |  Property "poo"  |     |       |  Property "0"    |
|  Property "foo"   |      |    |------------------|     |       |------------------|
|-------------------|      |    |  Property "baz"  |     |       |  Property "1"    |
|  Property "bar"   |      |    +__________________+     |       +__________________+
+___________________+      |                             |
                           |                             |
                           |                             |
                           +-----------------------------+
  • above shows most common optimized representation
  • most objects contain all their properties in single block of memory "foo", "bar"
  • all blocks have a Map property describing their structure
  • named properties that don't fit are stored in overflow array "poo", "baz"
  • numbered properties are stored in a separate contiguous array "1", "2"

Object Properties

read Some surprising properties of properties

  • object is a collection of properties aka key-value pairs
  • property names are always strings
  • any name used as property name that is not a string is stringified via .toString(), even numbers, so 1 becomes "1"
  • Arrays in JavaScript are just objects with magic length property

Hash Tables

  • hash table used for difficult objects
  • aka objects in dictionary mode
  • accessing hash table property is much slower than accessing a field at a known offset
  • if non-symbol string is used to access a property it is uniquified first
  • V8 hash tables are large arrays containing keys and values
  • initially all keys and values are undefined

HashTables and Hash Codes

  • on key-vaule pair insertion the key's hash code is computed
  • computing hash code and comparing keys for equality is commonly a fast operation
  • still slow to execute these non-trivial routines on every property read/write
  • data structures such as Map, Set, WeakSet and WeakMap use hash tables under the hood
  • a hash function returns a hash code for given keys which is used to map them to a location in the hash table
  • hash code is a random number (independent of object value) and thus needs to be stored
  • storing the hash code as private symbol on the object, like was done previously, resulted in a variety of performance problems
    • led to slow megamorphic IC lookups of the hash code and
    • triggered hidden class transition in the key on storing the hash code
  • performance issues were fixed (~500% improvement for Maps and Sets) by hiding the hashcode and storing it in unused memory space that is connected to the JSObject
    • if properties backing store is empty: directly stored in the offset of JSObject
    • if properties backing store is array: stored in extranous 21 bits of 31 bits to store array length
    • if properties backing store is dictionary: increase dictionary size by 1 word to store hashcode in dedicated slot at the beginning of the dictionary

Resources

Fast, In-Object Properties

read Fast, in-object properties | read

  • V8 describes the structure of objects using maps used to create hidden classes and match data types
    • resembles a table of descriptors with one entry for each property
    • map contains info about size of the object
    • map contains info about pointers to constructors and prototypes
    • objects with same structure share same map
  • objects created by the same constructor and have the same set of properties assigned in the same order
    • have regular logical structure and therefore regular structure in memory
    • share same map
  • adding new property is handled via transition descriptor
    • use existing map
    • transition descriptor points at other map

Assigning Properties inside Constructor Call

class Point {
  constructor(x, y) {
    // Map M0
    //  "x": Transition to M1 at offset 12

    this.x = x

    // Map M1
    //  "x": Field at offset 12
    //  "y": Transition to M2 at offset 16

    this.y = y

    // Map M2
    //  "x": Field at offset 12
    //  "y": Field at offset 16
  }
}
  • Point starts out without any fields with M0
  • this.x =x -> map pointer set to M1 and value x is stored at offset 12 and "x" Transition descriptor added to M0
  • this.y =y -> map pointer set to M2 and value y is stored at offset 16 and "y" Transition descriptor added to M1

Assigning More Properties Later

var p = new Point(1, 2)

// Map M2
//  "x": Field at offset 12
//  "y": Field at offset 16
//  "z": Transition at offset 20

p.z = z

// Map M3
//  "x": Field at offset 12
//  "y": Field at offset 16
//  "z": Field at offset 20
  • assigning z later
    • create M3, a copy of M2
    • add Transition descriptor to M2
    • add Field descriptor to M3

Assigning Same Properties in Different Order

class Point {
  constructor(x, y, reverse) {
    // Map M0
    //  "x": Transition to M1 at offset 12ak
    //  "y": Transition to M2 at offset 12
    if (reverse) {
      // variation 1

      // Map M1
      //  "x": Field at offset 12
      //  "y": Transition to M4 at offset 16

      this.x = x

      // Map M4
      //  "x": Field at offset 12
      //  "y": Field at offset 16

      this.y = y
    } else {
      // variation 2

      // Map M2
      //  "y": Field at offset 12
      //  "x": Transition to M5 at offset 16

      this.y = x

      // Map M5
      //  "y": Field at offset 12
      //  "x": Field at offset 16
      this.x = y
    }
  }
}
  • both variations share M0 which has two transitions
  • not all Points share same map
  • in worse cases V8 drops object into dictionary mode in order to prevent huge number of maps to be allocated
    • when assigning random properties to objects from same constructor in random order
    • when deleting properties

In-object Slack Tracking

read In-object slack tracking

  • objects allocated by a constructor are given enough memory for 32 fast properties to be stored
  • after certain number of objects (8) were allocated from same constructor
    • V8 traverses transition tree from initial map to determine size of largest of these initial objects
    • new objects of same type are allocated with exact amount of memory to store max number of properties
    • initial objects are resized (down)

Methods And Prototypes

read Methods and prototypes

Assigning Functions to Properties

function pointDistance(p) { /* calculates distance */ }

class Point {
  constructor(x, y) {
    // Map M0
    //  "x": Transition to M1 at offset 12

    this.x = x

    // Map M1
    //  "x": Field at offset 12
    //  "y": Transition to M2 at offset 16

    this.y = y

    // Map M2
    //  "x": Field at offset 12
    //  "y": Field at offset 16
    // "distance": Transition to M3 <pointDistance>

    this.distance = pointDistance

    // Map M3
    //  "x": Field at offset 12
    //  "y": Field at offset 16
    // "distance": Constant_Function <pointDistance>
  }
}
  • properties pointing to Functions are handled via constant functions descriptor
  • constant_function descriptor indicates that value of property is stored with descriptor itself rather than in the object
  • pointers to functions are directly embedded into optimized code
  • if distance is reassigned, a new map has to be created since the Transition breaks

Assigning Functions to Prototypes

class Point {
  constructor(x, y) {
    this.x = x
    this.y = y
  }

  pointDistance() { /* calculates distance */ }
}
  • V8 represents prototype methods (aka class methods) using constant_function descriptors
  • calling prototype methods maybe a tiny bit slower due to overhead of the following:
    • check receiver's map (as with own properties)
    • check maps of prototype chain (extra step)
  • the above lookup overhead won't make measurable performance difference and shouldn't impact how you write code

Numbered Properties

read Numbered properties: fast elements

see Arrays

  • numbered properties are treated and ordered differently than others since any object can behave like an array
  • element === any property whose key is non-negative integer
  • V8 stores elements separate from named properties in an elements kind field (see structure diagram)
  • if object drops into dictionary mode for elements, access to named properties remains fast and vice versa
  • maps don't need transitions to maps that are identical except for element kinds
  • most elements are fast elements which are stored in a contiguous array

Arrays

watch | slide

read Numbered properties: fast elements

read

  • V8 has two methods for storing arrays, fast elements and dictionary elements

Fast Elements

see Numbered Properties

  • compact keysets
  • linear storage buffer
  • contiguous (non-sparse)
  • 0 based
  • smaller than 64K

Dictionary Elements

  • hash table storage
  • slow access
  • sparse
  • large

Packed vs. Holey Elements

  • V8 makes distinction whether the elements backing store is packed or has holes
  • holes in a backing store are created by deleting an indexed element
  • missing properties are marked with special hole value to keep Array functions performant
  • however missing properties cause expensive lookups on prototype chain

Elements Kinds

read

  • fast elements kinds in order of increasing generality:
    • fast Smis (small integers)
    • fast doubles (Doubles stored in unboxed representation)
    • fast values (strings or other objects)
Elements Kind Lattice
+--------------------+
| PACKED_SMI_ELEMENT |---+
+--------------------+   |    +------------------------+
          |              +--->| PACKED_DOUBLE_ELEMENTS |---+
          ↓                   +------------------------+   |    +-------------------+
+--------------------+                   |                 +--->|  PACKED_ELEMENTS  |
| HOLEY_SMI_ELEMENTS |---+               ↓                      +-------------------+
+--------------------+   |    +------------------------+                   |
                         +--->|  HOLEY_DOUBLE_ELEMENTS |---+               ↓
                              +------------------------+   |    +-------------------+
                                                           +--->|  HOLEY_ELEMENTS   |
                                                                +-------------------+

Double Array Unboxing

watch | slide

  • Array's hidden class tracks element types
  • if all doubles, array is unboxed aka upgraded to fast doubles
    • wrapped objects layed out in linear buffer of doubles
    • each element slot is 64-bit to hold a double
    • Smis that are currently in Array are converted to doubles
    • very efficient access
    • storing requires no allocation as is the case for boxed doubles
    • causes hidden class change
    • requires expensive copy-and-convert operation
  • careless array manipulation may cause overhead due to boxing/unboxing watch | slide

Typed Arrays

blog | spec

  • difference is in semantics of indexed properties
  • V8 uses unboxed backing stores for such typed arrays

Float64Array

  • gets 64-bit allocated for each element

Considerations

  • once array is marked as holey it is holey forever
  • don't pre-allocate large arrays (>64K), instead grow as needed, to avoid them being considered sparse
  • do pre-allocate small arrays to correct size to avoid allocations due to resizing
  • avoid creating holes, and thus don't delete elements
  • don't load uninitialized or deleted elements watch | slide
  • use literal initializer for Arrays with mixed values
  • don't store non-numeric values in numeric arrays
    • causes boxing and efficient code that was generated for manipulating values can no longer be used
  • use typed arrays whenever possible especially when performing mathematical operations on an array of numbers
  • when copying an array, you should avoid copying from the back (higher indices to lower indices) because this will almost certainly trigger dictionary mode
  • avoid elements kind transitions, i.e. edge case of adding -0, NaN, Infinity to a Smi array as they are represented as doubles

Strings

  • string representation and how it maps to each bit
map |len |hash|characters
0123|4567|8901|23.........
  • contain only data (no pointers)
  • content not tagged
  • immutable except for hashcode field which is lazily computed (at most once)

Resources