Module Kv_hash.Partition
Top-level bucket partition, implemented using Jane St. Map
Given that we want to store (key,value) pairs, a partition is a way to map a key to a particular bucket, where we store the corresponding (key,value).
We assume that the keys are roughly uniformly distributed. (In our use case, they are hashes generated by some "good" hash function.)
We assume the key space is a linear order (in fact, our usecase has keys as ints... actually hashes). A partition is then (finitely represented) total map from key to bucket.
How do we implement this? Each bucket covers a range l_i <= _ < h_i. The ranges are mutually disjoint and cover the key space. So, h_i = l_{i+1} say. The partition contains a map from each l_i to a bucket identifier. Given a particular key k, we find the l_i which is just <= k (i.e., it is <= k, and is the largest such l) and look up the appropriate bucket identifier. This can be done efficiently using a map based on binary search trees. The standard library doesn't support this operation, but Jane St. map library does.
The lowest key (min_key) is 0 in our use case. Every partition explicitly contains at least a mapping for min_key.
Sometimes a bucket becomes full. In this case, we split the bucket into two new buckets (half the kvs in one, half in another), and split the old range l_i <= _ < h_i into two new ranges, corresponding to the two new buckets.
Terminology:
- partition: a total map from key to bucket identifier; each bucket corresponds to a subrange of the keyspace; these ranges are disjoint and cover the keyspace
- bucket: a collection of kvs for a particular subrange of the space
NOTE In our use case keys are ints from 0 to Int.max_int inclusive. This means we miss out on using negative integers.
FIXME distinguish offsets (and lengths) measured in ints {off_i} from offsets measured in bytes {off_b}
module type S = sig ... endmodule Make_1 : functor (S : S) -> Partition_intf.PURE_PARTITION with type k = S.k and type r = S.rPure partitions
module Make_2 : functor (S : S) -> Partition_intf.PARTITION with type k = S.k and type r = S.rMutable partitions, with split hook
module Make_partition = Make_2module Partition_ii : sig ... endDefault partition instance, mutable, k=int, r=int