Module Kv_hash.Partition

Top-level bucket partition, implemented using Jane St. Map

Given that we want to store (key,value) pairs, a partition is a way to map a key to a particular bucket, where we store the corresponding (key,value).

We assume that the keys are roughly uniformly distributed. (In our use case, they are hashes generated by some "good" hash function.)

We assume the key space is a linear order (in fact, our usecase has keys as ints... actually hashes). A partition is then (finitely represented) total map from key to bucket.

How do we implement this? Each bucket covers a range l_i <= _ < h_i. The ranges are mutually disjoint and cover the key space. So, h_i = l_{i+1} say. The partition contains a map from each l_i to a bucket identifier. Given a particular key k, we find the l_i which is just <= k (i.e., it is <= k, and is the largest such l) and look up the appropriate bucket identifier. This can be done efficiently using a map based on binary search trees. The standard library doesn't support this operation, but Jane St. map library does.

The lowest key (min_key) is 0 in our use case. Every partition explicitly contains at least a mapping for min_key.

Sometimes a bucket becomes full. In this case, we split the bucket into two new buckets (half the kvs in one, half in another), and split the old range l_i <= _ < h_i into two new ranges, corresponding to the two new buckets.

Terminology:

NOTE In our use case keys are ints from 0 to Int.max_int inclusive. This means we miss out on using negative integers.

FIXME distinguish offsets (and lengths) measured in ints {off_i} from offsets measured in bytes {off_b}

module type S = sig ... end
module Make_1 : functor (S : S) -> Partition_intf.PURE_PARTITION with type k = S.k and type r = S.r

Pure partitions

module Make_2 : functor (S : S) -> Partition_intf.PARTITION with type k = S.k and type r = S.r

Mutable partitions, with split hook

module Make_partition = Make_2
module Partition_ii : sig ... end

Default partition instance, mutable, k=int, r=int