ObjectStore Collections C++ API Reference

Chapter 3

Representation Types

Types

os_chained_list

The class os_chained_list is a representation type that is optimized (in both time and space) for small- to medium-sized collections. Each os_chained_list consists of a header and any number of blocks. The header has a vptr, one word of state and up to 15 pointers. When the number of pointers in the header is exhausted, an os_chained_list_block is allocated and chained to the header.

Each os_chained_list_block can contain up to 255 pointers. It has two or three words of overhead: one word of state information, a previous pointer, and possibly a next pointer (the first os_chained_list_block allocated does not have a next pointer until the next block is allocated). The default version of os_chained_list contains four pointers in the header and seven or eight pointers in its blocks.

The maximum cardinality for os_chained_lists is 131070.

Controlling the number of pointers
When you create an os_chained_list, what is really allocated is an instance of a parameterized class derived from os_chained_list: os_chained_list_pt<NUM_PTRS_IN_HEAD,NUM_PTRS_IN_BLOCKS>. The default parameterization is <4,8>, but you can specify a different parameterization with the following macros:

Macros for specifying parameterization
OS_MARK_CHAINED_LIST_REP
(ptrs_in_header,ptrs_in_blocks)
Use OS_MARK_CHAINED_LIST_REP() in the same dummy function as OS_MARK_SCHEMA_TYPE().
OS_INSTANTIATE_CHAINED_LIST_REP
(ptrs_in_header,ptrs_in_blocks)
Use OS_INSTANTIATE_CHAINED_LIST_REP() at file scope. It declares some static state needed by the representation.
OS_INITIALIZE_CHAINED_LIST_REP
(ptrs_in_header,ptrs_in_blocks)
Execute OS_INITIALIZE_CHAINED_LIST_REP() in a function. It registers the new parameterization with the collections library.
Include the files <coll/chlist.hh>, <coll/chlistpt.hh>, and <coll/chlistpt.c> if you use these macros.

In order to create a collection using a chained list with other than the default parameterization, you invoke the following static member function:

static os_chained_list_descriptor*
os_chained_list_descriptor::find_rep(
      os_unsigned_int32 ptrs_in_hdr,
      os_unsigned_int32 ptrs_in_blocks
);
If the requested parameterization has been specified with the above macros, the appropriate representation descriptor is returned. Otherwise, 0 is returned.

Note that an os_chained_list must have at least four pointers in the header but not more than 15 pointers.

An os_chained_list with a four-pointer header can change freely into any other collection representation and the reverse. However, other collection representations cannot change into os_chained_lists with more than four pointers in the header. A normal collection header is 24 bytes. An os_chained_list with more than four pointers exceeds this limit. It is possible for an os_chained_list with an oversized header to change into another representation (with the same or smaller size header).

Pool allocation of blocks
You can request pool allocation of os_chained_list_blocks with the environment variable OS_COLL_POOL_ALLOC_CHLIST_BLOCKS and the function os_chlist_pool::configure_pool(). In some cases this decreases the time needed for individual allocation of os_chained_list_blocks and increases the chance of getting good locality of reference.

Setting OS_COLL_POOL_ALLOC_CHLIST_BLOCKS (to 1) turns on pool allocation. There is one pool per segment; each pool consists of an array of subpools. Each subpool is two pages by default.

By allocating larger subpools, you can defer the cost of allocating new subpools at the expense of potentially wasted space. To allocate larger subpools, use this function:

static void
os_chlist_pool::configure_pool(
      os_unsigned_int32 config_options,
      os_unsigned_int32 blks_per_subpool=2
);
config_options can have one of the following values:

The second argument, which is optional and defaults to 2, controls the number of pages allocated per subpool.

Mutation checks
In order to improve performance, an os_chained_list does not necessarily check to see if it should change to another representation after every insert or remove operation. By default, it checks when the cardinality is roughly a multiple of 7. However, you can control the frequency with which it checks by invoking the static member function

static void 
os_chained_list_descriptor::set_reorg_check_interval(
os_unsigned_int32 v
);
ObjectStore sets the check interval to one less than the power of 2 that is greater than or equal to v. For example, in order to check on every other insert or remove, pass 1 or 2 as an argument. Passing 3 or 4 results in a check on every third operation. Passing 0 inhibits mutation. However, if the maximum cardinality for an os_chained_list is reached, it will change to another representation.

mutate_when_full behavior
For collections whose representation is os_chained_list, if you specify the behavior enumerator os_collection::chained_list_mutate_when_full, the collection's representation will not change until it reaches the maximum cardinality for chained lists.

os_dyn_bag

Instances of this class are used as ObjectStore collection representations. The os_dyn_bag representation supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) (in the number of elements). But an os_dyn_bag takes up somewhat more space than an os_packed_list.

The representation os_dyn_bag minimizes reorganization overhead at the expense of some extra space overhead, compared with os_ptr_bag. At large cardinalities, os_dyn_bag uses a directory structure pointing to many small hash tables that can reorganize independently.

This representation type does not support maintain_order or maintain_cursors behavior.

For cardinalities below 30, os_chained_list might be a better representation type.

In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
insert()O(1)
remove()O(1)
cardinality()O(1)
contains()O(1)
comparisons (<=, ==, and so on)O(n)
merges (|, &, -)O(n)

If cardinality <= 64k, the small-medium cardinality data structure is used. It contains the following:

On average, an os_dyn_bag at low-medium cardinalities is 69% full. You can estimate the average size as follows:

Avg. total size in bytes = 24 + (cardinality/.69) * 8
If cardinality > 64k the large cardinality data structure is used. It contains the following:

On average, each small hash table in an os_dyn_bag at high cardinalities is 70% full. You can estimate the average size as follows:

                                                                 n_entries = Avg. number of entries per small hash table = (8192/8) * .7
n_tables = Avg. number of small hash tables = cardinality / n_entries
dir_size = Avg. directory size in bytes = 60 + (n_tables+1) * 12
Avg. total size in bytes = 24 bytes + dir_size + n_tables * 8192

os_dyn_hash

Instances of this class are used as ObjectStore collection representations. The dynamic hash representation supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) (in the number of elements). But an os_dyn_hash takes up somewhat more space than an os_packed_list.

At large cardinalities, os_dyn_hash uses a directory structure pointing to many small hash tables that can reorganize independently.

This representation type does not support allow_duplicates, maintain_order, or maintain_cursors behavior.

For cardinalities below 30, os_chained_list might be a better representation type.

In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
insert()O(1)
remove()O(1)
cardinality()O(1)
contains()O(1)
comparisons (<=, ==, and so on)O(n)
merges (|, &, -)O(n)

If cardinality <= 64k, the small-medium cardinality data structure is used. It contains the following:

On average, an os_dyn_hash at low-medium cardinalities is 69% full. You can estimate the average size as follows:

Avg. total size in bytes = 24 + (cardinality/.69) * 4
If cardinality > 64k the large cardinality data structure is used. It contains the following:

On average, each small hash table in an os_dyn_hash at high cardinalities is 70% full. You can estimate the average size as follows:

                                                      n_entries = Avg. number of entries per small hash table = (8192/4) * .7
n_tables = Avg. number of small hash tables = cardinality / n_entries
dir_size = Avg. directory size in bytes = 60 + (n_tables+1) * 12
Avg. total size in bytes = 24 bytes + dir_size + n_tables * 8192

os_ixonly and os_ixonly_bc

Instances of these classes are used as ObjectStore collection representations. They are both index-only representations that support O(1) element lookup. Operations such as contains() and remove() are O(1) (in the number of elements). But they take up somewhat more space than an os_packed_list.

For large collections subject to contention, os_ixonly_bc can provide significantly better performance than os_ixonly. See os_ixonly_bc, below.

The next chapter discusses associating indexes with collections to improve the efficiency of queries. With os_ixonly or os_ixonly_bc, you can save space by telling ObjectStore to record the membership of the collection in one of its indexes, as opposed to recording the membership in both the index and the collection. In other words, you can save space by using an index as a collection's representation.

When these representation types are specified for a collection, you must add an index to it before any operations are performed on it. Additional indexes can also be added.

These representation types are incompatible with the following behaviors: maintain_order, maintain_cursors, allow_nulls, and allow_duplicates.

Note that using these representations can save on space overhead at the expense of reducing the efficiency of some collection operations. If the only time-critical collection operation is index-based element lookup, an index-only representation is likely to be beneficial.

For cardinalities below 30, os_chained_list might be a better representation type.

os_ixonly_bc is just like os_ixonly, except that insert() and remove() do not update cardinality information, avoiding contention in the collection header. The disadvantage of os_ixonly_bc is that cardinality() is an O(n) operation, requiring a scan of the whole collection.

You can determine if a collection updates its cardinality in this way with the following member of os_collection:

      os_int32 cardinality_is_maintained() const;
This function returns nonzero if the collection maintains cardinality; it returns 0 otherwise.

The following member of os_collection, which returns an estimate of a collection's cardinality, is an O(1) operation in the size of the collection:

      os_unsigned_int32 cardinality_estimate() const;
This function returns the cardinality as of the last call to os_collection::update_cardinality() - see below. For collections that maintain cardinality, the actual cardinality is returned.

Before you add a new index to an os_ixonly_bc collection, call the following member of os_collection:

      os_unsigned_int32 update_cardinality();
If you do not, add_index() will work correctly, but less efficiently than if you do. This function updates the value returned by os_collection::cardinality_estimate(), by scanning the collection and computing the actual cardinality.

In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
insert()O(1)
remove()O(1)
cardinality(), os_ixonlyO(1)
cardinality(), os_ixonly_bcO(n)
contains()O(1)
comparisons (<=, ==, and so on)O(n)
merges (|, &, -)O(n)

If there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary.

os_ordered_ptr_hash

Instances of this class are used as ObjectStore collection representations. Unlike the other hash tables, this representation supports maintain_order behavior. The ordered pointer hash representation supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) (in the number of elements). But an os_ordered_ptr_hash takes up somewhat more space than an os_packed_list.

This representation type does not support be_an_array behavior.

For cardinalities below 30, os_chained_list might be a better representation type.

Time Complexity

In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
insert()O(1)
position-based insertO(n)
remove()O(1)
position-based removeO(n)
cardinality()O(1)
contains()O(1)
comparisons (<=, ==, and so on)O(n)
merges (|, &, -)O(n)

If there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary.

Space Overhead and Clustering

An ordered pointer hash has the following components:

The entry for a given element is likely to be on a different page from the collection header.

On average, a pointer hash is 58.3% full. You can estimate the average size of a pointer hash as follows:

if cardinality <= 65535
      average total size in bytes = 56 + cardinality * 8 / 58.3
if cardinality > 65535
      average total size in bytes = 56 + cardinality * 12 / 58.3
The minimum fill for a packed list is 46.7%, so an upper bound on collection space overhead can be calculated as follows:

if cardinality <= 65535
      maximum total size in bytes = 56 + cardinality * 8 / 46.7
if cardinality > 65535
      maximum total size in bytes = 56 + cardinality * 12 / 46.7

os_packed_list

Instances of this class are used as ObjectStore collection representations. The packed list representation is relatively space-efficient, but element lookup is an O(n) operation, which means that operations such as remove() and contains() are O(n) (in the number of elements). If duplicates are allowed, this representation provides the fastest insertion times, but if duplicates are not allowed (requiring element lookup to check for the presence of a duplicate), insert() is O(n).

For cardinalities below 30, os_chained_list might be a better representation type.

In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
insert(), duplicates allowed O(1)
insert(), duplicates not allowed O(n)
position-based insert, no "holes"O(1)
position-based insert, with "holes"O(n)
remove()O(n)
position-based remove, no "holes"O(1)
position-based remove, with "holes"O(n)
cardinality()O(1)
contains()O(n)
comparisons (<=, ==, and so on)O(n2)
merges (|, &, -)O(n2)

There might be "holes" in an os_packed_list if any elements have been removed.

If there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary.

A packed list has the following components:

The entry for a given element is likely to be on a different page from the collection header.

On average, a packed list is 83.3% full. You can estimate the average size of a collection as follows:

      average total size in bytes = 40 + cardinality * 4 / 83.3
The minimum fill for a packed list is 66.7%, so an upper bound on collection space overhead can be calculated as follows:

      maximum total size in bytes = 40 + cardinality * 4 / 66.7

os_ptr_bag

Instances of this class are used as ObjectStore collection representations. The pointer hash representation supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) (in the number of elements). But an os_ptr_bag takes up somewhat more space than an os_packed_list.

In addition, as an os_ptr_bag grows, there can be overhead during collection updates, for reorganization. The representation os_dyn_bag minimizes reorganization overhead at the expense of some extra space overhead by using, at large cardinalities, a directory structure that points to many small hash tables that can reorganize independently.

This representation type does not support maintain_order behavior.

For cardinalities below 30, os_chained_list might be a better representation type.

Time Complexity

In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
insert()O(1)
remove()O(1)
cardinality()O(1)
contains()O(1)
comparisons (<=, ==, and so on)O(n)
merges (|, &, -)O(n)

If there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary.

Space Overhead and Clustering

A pointer hash has the following components:

The entry for a given element is likely to be on a different page from the collection header. In addition, the count slot for a given element is likely to be stored on a different page from both the header and the entry for the element.

On average, a pointer bag is 58.3% full. You can estimate the average size of a pointer bag as follows:

average total size in bytes = 48 + cardinality * 8 / 58.3
The minimum fill for a packed list is 46.7%, so an upper bound on collection space overhead can be calculated as follows:

maximum total size in bytes = 48 + cardinality * 8 / 46.7

os_vdyn_bag

Instances of this class are used as ObjectStore collection representations. The os_vdyn_bag representation saves on relocation overhead by recording its membership using ObjectStore references instead of pointers. It supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) (in the number of elements). But an os_vdyn_bag takes up more space than an os_packed_list.

The representation os_vdyn_bag minimizes reorganization overhead at the expense of some extra space overhead, compared with os_ptr_bag. At large cardinalities, os_vdyn_bag uses a directory structure pointing to many small hash tables that can reorganize independently.

This representation type does not support maintain_order or maintain_cursors behavior.

For cardinalities below 30, os_chained_list might be a better representation type.

This class is parameterized, with a parameter indicating the type of ObjectStore reference to use for recording membership. The parameter must be os_reference.

In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
insert()O(1)
remove()O(1)
cardinality()O(1)
contains()O(1)
comparisons (<=, ==, and so on)O(n)
merges (|, &, -)O(n)

For an os_vdyn_bag whose reference type parameter is REF_TYPE, if

      cardinality <= 64k
the small-medium cardinality data structure is used. You can estimate its size as follows:

average total size = 24 bytes (header) +
( ((cardinality / .69) / 16) + ((cardinality / .69) % 16)) *
( ((sizeof(REF_TYPE) + 4) * 16) + 4)
If

      cardinality > 64k
the large cardinality data structure is used. You can estimate its size as follows:

entry_size:
      os_reference: 20
n_tables = (cardinality / ( ((8192 / <entry-size> )*2) * .7)) 
dir_size= (n_tables +1) * 12 bytes + 60
average total size = 24 bytes (header) + dir_size + n_tables * 8192 
bytes

os_vdyn_hash

Instances of this class are used as ObjectStore collection representations. The os_vdyn_hash representation saves on relocation overhead by recording its membership using ObjectStore references instead of pointers. It supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) (in the number of elements). But an os_vdyn_hash takes up more space than an os_packed_list.

At large cardinalities, os_vdyn_hash uses a directory structure pointing to many small hash tables that can reorganize independently.

This representation type does not support allow_duplicates, maintain_order, or maintain_cursors behavior.

For cardinalities below 30, os_chained_list might be a better representation type.

This class is parameterized, with a parameter indicating the type of ObjectStore reference to use for recording membership. The parameter must be os_reference.

In the following table, complexities are shown in terms of collection cardinality, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
insert()O(1)
remove()O(1)
cardinality()O(1)
contains()O(1)
comparisons (<=, ==, and so on)O(n)
merges (|, &, -)O(n)

For an os_vdyn_hash whose reference type parameter is REF_TYPE, if

      cardinality <= 64k
the small-medium cardinality data structure is used. You can estimate its size as follows:

average total size = 24 bytes (header) +
( ((cardinality / .69) / 16) + ((cardinality / .69) % 16)) *
(sizeof(REF_TYPE) + 4)
If

      cardinality > 64k
the large cardinality data structure is used. You can estimate its size as follows:

entry_size:
      os_reference: 20
n_tables = (cardinality / ( ((8192 / <entry-size> )) * .7)) 
dir_size= (n_tables +1) * 12 bytes + 60
average total size = 24 bytes (header) + dir_size + n_tables * 8192 
bytes



[previous] [next]

Copyright © 1997 Object Design, Inc. All rights reserved.

Updated: 03/31/98 15:57:31