Each os_chained_list_block can contain up to 255 pointers. It has two or three words of overhead: one word of state information, a previous pointer, and possibly a next pointer (the first os_chained_list_block allocated does not have a next pointer until the next block is allocated). The default version of os_chained_list contains four pointers in the header and seven or eight pointers in its blocks.
The maximum cardinality for os_chained_lists is 131070.
To create a collection that uses a chained list with a parameterization other than the default, invoke the following static member function:
static os_chained_list_descriptor* os_chained_list_descriptor::find_rep(
    os_unsigned_int32 ptrs_in_hdr,
    os_unsigned_int32 ptrs_in_blocks );

If the requested parameterization has been specified with the above macros, the appropriate representation descriptor is returned. Otherwise, 0 is returned.
Note that an os_chained_list must have at least four and at most 15 pointers in the header.
An os_chained_list with a four-pointer header can change freely into any other collection representation, and vice versa. However, other collection representations cannot change into os_chained_lists with more than four pointers in the header: a normal collection header is 24 bytes, and an os_chained_list with more than four pointers exceeds this limit. It is possible for an os_chained_list with an oversized header to change into another representation (with a header of the same or smaller size).
Setting OS_COLL_POOL_ALLOC_CHLIST_BLOCKS (to 1) turns on pool allocation. There is one pool per segment; each pool consists of an array of subpools. Each subpool is two pages by default.
By allocating larger subpools, you can defer the cost of allocating new subpools at the expense of potentially wasted space. To allocate larger subpools, use this function:
static void os_chlist_pool::configure_pool(
    os_unsigned_int32 config_options,
    os_unsigned_int32 blks_per_subpool = 2 );

config_options can have one of the following values:
static void os_chained_list_descriptor::set_reorg_check_interval(
    os_unsigned_int32 v );

ObjectStore sets the check interval to one less than the power of 2 that is greater than or equal to v. For example, to check on every other insert or remove, pass 1 or 2 as an argument. Passing 3 or 4 results in a check on every third operation. Passing 0 inhibits mutation. However, if the maximum cardinality for an os_chained_list is reached, it will change to another representation.
The representation os_dyn_bag minimizes reorganization overhead at the expense of some extra space overhead, compared with os_ptr_bag. At large cardinalities, os_dyn_bag uses a directory structure pointing to many small hash tables that can reorganize independently.
This representation type does not support maintain_order or maintain_cursors behavior.
For cardinalities below 30, os_chained_list might be a better representation type.
In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
Operation | Complexity
insert() | O(1)
remove() | O(1)
cardinality() | O(1)
contains() | O(1)
comparisons (<=, ==, and so on) | O(n)
merges (|, &, -) | O(n)
Avg. total size in bytes = 24 + (cardinality/.69) * 8

If cardinality > 64k, the large cardinality data structure is used. It contains the following:
n_entries = Avg. number of entries per small hash table = (8192/8) * .7
n_tables = Avg. number of small hash tables = cardinality / n_entries
dir_size = Avg. directory size in bytes = 60 + (n_tables+1) * 12
Avg. total size in bytes = 24 bytes + dir_size + n_tables * 8192
At large cardinalities, os_dyn_hash uses a directory structure pointing to many small hash tables that can reorganize independently.
This representation type does not support allow_duplicates, maintain_order, or maintain_cursors behavior.
For cardinalities below 30, os_chained_list might be a better representation type.
In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
Operation | Complexity
insert() | O(1)
remove() | O(1)
cardinality() | O(1)
contains() | O(1)
comparisons (<=, ==, and so on) | O(n)
merges (|, &, -) | O(n)
Avg. total size in bytes = 24 + (cardinality/.69) * 4

If cardinality > 64k, the large cardinality data structure is used. It contains the following:
n_entries = Avg. number of entries per small hash table = (8192/4) * .7
n_tables = Avg. number of small hash tables = cardinality / n_entries
dir_size = Avg. directory size in bytes = 60 + (n_tables+1) * 12
Avg. total size in bytes = 24 bytes + dir_size + n_tables * 8192
For large collections subject to contention, os_ixonly_bc can provide significantly better performance than os_ixonly. See os_ixonly_bc, below.
The next chapter discusses associating indexes with collections to improve the efficiency of queries. With os_ixonly or os_ixonly_bc, you can save space by telling ObjectStore to record the membership of the collection in one of its indexes, as opposed to recording the membership in both the index and the collection. In other words, you can save space by using an index as a collection's representation.
When these representation types are specified for a collection, you must add an index to it before any operations are performed on it. Additional indexes can also be added.
These representation types are incompatible with the following behaviors: maintain_order, maintain_cursors, allow_nulls, and allow_duplicates.
Note that using these representations can save on space overhead at the expense of reducing the efficiency of some collection operations. If the only time-critical collection operation is index-based element lookup, an index-only representation is likely to be beneficial.
For cardinalities below 30, os_chained_list might be a better representation type.
os_ixonly_bc is just like os_ixonly, except that insert() and remove() do not update cardinality information, avoiding contention in the collection header. The disadvantage of os_ixonly_bc is that cardinality() is an O(n) operation, requiring a scan of the whole collection.
You can determine if a collection updates its cardinality in this way with the following member of os_collection:
os_int32 cardinality_is_maintained() const;

This function returns nonzero if the collection maintains cardinality; it returns 0 otherwise.
The following member of os_collection, which returns an estimate of a collection's cardinality, is an O(1) operation in the size of the collection:
os_unsigned_int32 cardinality_estimate() const;

This function returns the cardinality as of the last call to os_collection::update_cardinality() (see below). For collections that maintain cardinality, the actual cardinality is returned.
Before you add a new index to an os_ixonly_bc collection, call the following member of os_collection:
os_unsigned_int32 update_cardinality();

If you do not, add_index() will work correctly, but less efficiently than if you do. This function updates the value returned by os_collection::cardinality_estimate() by scanning the collection and computing the actual cardinality.
In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
Operation | Complexity
insert() | O(1)
remove() | O(1)
cardinality(), os_ixonly | O(1)
cardinality(), os_ixonly_bc | O(n)
contains() | O(1)
comparisons (<=, ==, and so on) | O(n)
merges (|, &, -) | O(n)
If there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary.
os_ordered_ptr_hash
Instances of this class are used as ObjectStore collection representations. Unlike the other hash tables, this representation supports maintain_order behavior. The ordered pointer hash representation supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) in the number of elements. However, an os_ordered_ptr_hash takes up somewhat more space than an os_packed_list.

Time Complexity
In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
Operation | Complexity
insert() | O(1)
position-based insert | O(n)
remove() | O(1)
position-based remove | O(n)
cardinality() | O(1)
contains() | O(1)
comparisons (<=, ==, and so on) | O(n)
merges (|, &, -) | O(n)
If there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary.
Space Overhead and Clustering
An ordered pointer hash has the following components:
On average, a pointer hash is 58.3% full. You can estimate the average size of a pointer hash as follows:
if cardinality <= 65535
    average total size in bytes = 56 + cardinality * 8 / .583
if cardinality > 65535
    average total size in bytes = 56 + cardinality * 12 / .583

The minimum fill for an ordered pointer hash is 46.7%, so an upper bound on collection space overhead can be calculated as follows:

if cardinality <= 65535
    maximum total size in bytes = 56 + cardinality * 8 / .467
if cardinality > 65535
    maximum total size in bytes = 56 + cardinality * 12 / .467
For cardinalities below 30, os_chained_list might be a better representation type.
In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
There might be "holes" in an os_packed_list if any elements have been removed.
On average, a packed list is 83.3% full. You can estimate the average size of a collection as follows:
average total size in bytes = 40 + cardinality * 4 / .833

The minimum fill for a packed list is 66.7%, so an upper bound on collection space overhead can be calculated as follows:

maximum total size in bytes = 40 + cardinality * 4 / .667
In addition, as an os_ptr_bag grows, there can be overhead during collection updates, for reorganization. The representation os_dyn_bag minimizes reorganization overhead at the expense of some extra space overhead by using, at large cardinalities, a directory structure that points to many small hash tables that can reorganize independently.
This representation type does not support maintain_order behavior.
For cardinalities below 30, os_chained_list might be a better representation type.
Operation | Complexity
insert() | O(1)
remove() | O(1)
cardinality() | O(1)
contains() | O(1)
comparisons (<=, ==, and so on) | O(n)
merges (|, &, -) | O(n)
If there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary.
Space Overhead and Clustering
A pointer bag has the following components:
On average, a pointer bag is 58.3% full. You can estimate the average size of a pointer bag as follows:
average total size in bytes = 48 + cardinality * 8 / .583

The minimum fill for a pointer bag is 46.7%, so an upper bound on collection space overhead can be calculated as follows:

maximum total size in bytes = 48 + cardinality * 8 / .467
The representation os_vdyn_bag minimizes reorganization overhead at the expense of some extra space overhead, compared with os_ptr_bag. At large cardinalities, os_vdyn_bag uses a directory structure pointing to many small hash tables that can reorganize independently.
This representation type does not support maintain_order or maintain_cursors behavior.
For cardinalities below 30, os_chained_list might be a better representation type.
This class is parameterized, with a parameter indicating the type of ObjectStore reference to use for recording membership. The parameter must be os_reference.
In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
Operation | Complexity
insert() | O(1)
remove() | O(1)
cardinality() | O(1)
contains() | O(1)
comparisons (<=, ==, and so on) | O(n)
merges (|, &, -) | O(n)
For an os_vdyn_bag whose reference type parameter is REF_TYPE, if

cardinality <= 64k

the small-medium cardinality data structure is used. You can estimate its size as follows:

average total size = 24 bytes (header) +
    ( ((cardinality / .69) / 16) + ((cardinality / .69) % 16) ) *
    ( ((sizeof(REF_TYPE) + 4) * 16) + 4 )

If

cardinality > 64k

the large cardinality data structure is used. You can estimate its size as follows:

entry_size (os_reference): 20
n_tables = cardinality / ( ((8192 / entry_size) * 2) * .7 )
dir_size = (n_tables + 1) * 12 bytes + 60
average total size = 24 bytes (header) + dir_size + n_tables * 8192 bytes
At large cardinalities, os_vdyn_hash uses a directory structure pointing to many small hash tables that can reorganize independently.
This representation type does not support allow_duplicates, maintain_order, or maintain_cursors behavior.
For cardinalities below 30, os_chained_list might be a better representation type.
This class is parameterized, with a parameter indicating the type of ObjectStore reference to use for recording membership. The parameter must be os_reference.
In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
Operation | Complexity
insert() | O(1)
remove() | O(1)
cardinality() | O(1)
contains() | O(1)
comparisons (<=, ==, and so on) | O(n)
merges (|, &, -) | O(n)
For an os_vdyn_hash whose reference type parameter is REF_TYPE, if

cardinality <= 64k

the small-medium cardinality data structure is used. You can estimate its size as follows:

average total size = 24 bytes (header) +
    ( ((cardinality / .69) / 16) + ((cardinality / .69) % 16) ) *
    (sizeof(REF_TYPE) + 4)

If

cardinality > 64k

the large cardinality data structure is used. You can estimate its size as follows:

entry_size (os_reference): 20
n_tables = cardinality / ( (8192 / entry_size) * .7 )
dir_size = (n_tables + 1) * 12 bytes + 60
average total size = 24 bytes (header) + dir_size + n_tables * 8192 bytes
Updated: 03/31/98 15:57:31