ObjectStore C++ Advanced API User Guide

Chapter 4 Advanced Collections

This chapter is intended to augment the information contained in Chapter 5, Collections, of the ObjectStore C++ API User Guide. The information contained here is organized in the following manner:

Advanced Collections Overview

A collection is an object that serves to group together other objects. It provides a convenient means of storing and manipulating groups of objects, supporting operations for inserting, removing, and retrieving elements. Collections also support set-theory operations such as intersection and set-theory comparisons such as subset.

In addition, collections form the basis of the ObjectStore query facility, which allows you to select those elements of a collection that satisfy a specified condition. Queries with simple conditions are discussed in this chapter. Queries with complex conditions are described in Chapter 5, Queries and Indexes.

Collections are commonly used to model many-valued attributes, and they can also be used as class extents (which hold all instances of a particular class). Collections of one type, dictionaries, associate a key with each element or group of elements, and so can be used to model binary associations or mappings.

Using Paths in Navigation

The ObjectStore collection facility provides a number of classes that help you navigate within a collection. The os_Cursor class, the os_index_path class, and the os_coll_range class all help you insert and remove elements, as well as retrieve particular elements or sequences of elements.

The os_Cursor class is discussed in Using Cursors for Navigation in Chapter 5 of the ObjectStore C++ API User Guide. See also os_Cursor in the ObjectStore Collections C++ API Reference.
The os_coll_range class is described in Using Ranges in Navigation. See also os_coll_range in the ObjectStore Collections C++ API Reference.

The os_index_path class is discussed here and in the immediately following sections of this chapter. See also os_index_path in the ObjectStore Collections C++ API Reference.

Paths

A path, an instance of os_index_path, represents a navigational path starting from the elements of a collection. Paths are used to specify index keys, and to specify a cursor's associated order. See Creating Paths, and also Paths and Member Functions.

Paths are also used in conjunction with ranges to specify a restriction on the elements a cursor visits. See Using Ranges in Navigation for information on ranges.

Creating Paths

Paths are used to specify traversal order (see Controlling Traversal Order) and index keys (see Indexes and Query Optimization).

Simple Paths

Using the simplest kind of path, you can base an index key or iteration order on the value of some data member or simple member function. For example, you can iterate through a set of parts in order of the part's part numbers (parts with lower numbers precede parts with higher numbers). The member is specified with an instance of the class os_index_path, which designates a member access path.

To iterate in order of part numbers, first create a path with os_index_path::create(), which returns an os_index_path&:

      os_index_path::create("part*", "part_number", db1)

The object created designates the path from part pointers to their part numbers. Here, part* is the path's type string, which names the element type of collections whose elements can serve as path starting points. part_number is a path string indicating the data member itself. The argument db1 is a database whose schema contains the definition of the class part.

Both type strings and path strings can contain white space around tokens.

The instance of os_index_path generated by the call to create() is heap-allocated. When you no longer need it, you should deallocate it with the following static member of os_index_path:

      static void destroy(os_index_path&) ;

Multiple Member Paths

Sometimes path expressions specify not just a single member, but a navigational path involving multiple member accesses. For example, to base iteration order on the emp_id of a part's responsible_engineer, you create the following path:

      os_index_path::create(
            "part*", "responsible_engineer->emp_id", 
            db1
      ) ;

Examples of path expressions

A path applied to a pointer to an employee, returning the employee's name, is specified by the path expression

      os_index_path::create("employee*", "name", db1)

A path applied to a pointer to an employee, returning the employee's manager, is specified by

      os_index_path::create(
            "employee*", "department->manager", 
            db1
      )

A path applied to a pointer to an employee, returning the name of each of the employee's supervisees, is specified by

      os_index_path::create(
            "employee*", "supervisees[]->name", 
            db1
      )

Brackets ([]) in a path indicate that the path goes through each element of the indicated collection. The path mapping follows the remainder of the specified path for each element. Thus, application of a path mapping can result in many values.

In this last example, the path maps a single employee* to many names, the names of the employee's supervisees. If you use such a path to specify a key for an index, then the index would handle lookup of an employee by the name of any one of the employee's supervisees.

But paths with brackets cannot be used to specify iteration order. Iteration in order of names of supervisees is not well defined, because there is more than one supervisee. In contrast, iterating by name of supervisor, for example, is well defined, if each employee has exactly one supervisor.

Another important point is that if an index path contains a path through a collection. This collection either has to be parameterized or you must cast it to the appropriate type of parameterized collection in the index path string. This is necessary so thaat the index path parser can determine if, for example, name is a valid data member of the type of element in the supervisees collection. For example,

(os_Collection<employee*>&)supervisees[]->name

Rank and Hash Functions

Paths that end in a class, or that end in a floating numerical type, must have associated rank and/or hash functions. And you must register these functions by calling os_index_key(). See The os_index_key() Macro.

Paths and Member Functions

Member functions called in path strings are subject to certain restrictions and prerequisites, as described in this section.

Restrictions

Member functions called in query or path strings are subject to certain restrictions:

The return type can be any type but if it is a user-defined type, the rank/hash functions for the type must be defined.
The function must take no arguments.

For example, consider the following class:

      class person {
            private:
                  unsigned int age;
                  person* sibling;
            public:
                  unsigned int get_age();
            person* get_sibling();
            unsigned int get_age_n_years_ago(unsigned int n);
      }

The functions get_age() and get_sibling() can be referenced in a query or path string. get_age_n_years_ago() violates the second of the preceding restrictions.

To perform a query, ObjectStore sometimes (depending on what indexes are present) issues calls to member functions used in paths and queries. If such a member function allocates memory it does not free (for example if it returns a pointer to newly allocated memory), memory leaks can result; ObjectStore does not free the space the function allocates. So member functions used in paths or queries should not allocate persistent memory or memory in the transient heap.

Prerequisites

Applications that use a member function in a query or path string must do four things:

Define an os_backptr-valued data member in the class that defines the member function. This data member must appear before any query functions in the class definitions.
Call the macro os_query_function().
Call the macro os_query_function_body().
Call the macro OS_MARK_QUE RY_FUNCTION().

You can name the os_backptr member anything you want. In addition, you can use the same os_backptr member for indexable data members and member functions; a class never needs to define more than one os_backptr-valued member.

The os_query_function() Macro

A call to os_query_function() has the form

Form of the call

      os_query_function( class, func, return_type)

where

class is the name of the class defining the member function.
func is the name of the member function itself.
return_type names the type of value returned by the member function.

The os_query_function() macro should be invoked at module level in a header file (for example, the file containing the definition of the class that declares the member function). No white space should appear in the argument list.

The os_query_function_returning_ref() Macro

The application that uses this member function, returning reference, in a query must call os_query_function_returning_ref().

Form of the call

A call to this macro has the form

      os_query_function_returning_ref( class, func, return_type)

where

class is the name of the class defining the member function.
func is the name of the member function itself.
return_type names the type of value returned by the member function. The way to use this is to pass just return_type, not return_type&, to the macro return_type arguments.

Use of functions returning references in query strings

In query and index path strings, functions returning references should be treated as if they returned pointers. For example, queries over the function

      Name& get_name();

Should be written as if the function signature were

      Name* get_name();

That is,

      *get_name() == *(Name*)  Freevar

An index path to support this query would be

      *get_name()

The os_query_function_body() Macro

A call to os_query_function_body() has the form

Form of the call

      os_query_function_body( class, func,return_type, bpname)

where

class is the name of the class that defines the member function.
func is the name of the member function itself.
return_type names the type of value returned by the member function.
bpname is the name of the os_backptr-valued member of class.

The os_query_function_body() macro should be invoked at module level in a source file (for example, the file containing the definition of the member function). No white space should appear in the argument list.

The OS_MARK_QUERY_FUNCTION() Macro

A call to OS_MARK_QUERY_FUNCTION() has the form

Form of the call

      OS_MARK_QUERY_FUNCTION( class, func)

where

class is the name of the class that defines the member function.
func is the name of the member function itself.

The OS_MARK_QUERY_FUNCTION() macro should be invoked along with the OS_MARK_SCHEMA_TYPE() macros for an application's schema, that is, in the schema source file, inside the dummy function containing the calls to OS_MARK_SCHEMA_TYPE(). No white space should appear in the argument list of OS_MARK_QUERY_FUNCTION().

The os_query_function_body_returning_ref() Macro

This macro enables users to register a query function that returns a reference. The application that uses this member function in a query must call os_query_function_body_returning_ref():

Form of the call

      os_query_function_body_returning_ref(

                   class, func, return_type, bpname

      )

where

class is the name of the class defining the member function.
func is the name of the member function itself.
return_type names the type of value returned by the member function. The way to use this is to pass just return_type, not return_type&, to the return_type arguments of the macro.
bpname is the name of the os_backptr-valued member of class.

Use of functions returning references in query strings

In query and index path strings, functions returning references should be treated as if they returned pointers. For example, queries over the function

      Name& get_name();

Should be written as if the function signature were

      Name* get_name()

That is,

      *get_name() == *(Name*) Freevar

An index path to support this query would be

      *get_name()

Path String Syntax Extension

Given a path string that specifies a path ending in pointers to objects:

       path-string

If the last component of the path string is a member function name, you can construct a path string to specify a path ending in those objects themselves, using * (asterisk) as follows:

      * ( path-string)

The parentheses are not necessary if the original path string specifies a single-step path.

Consider for example the path

      os_index_path::create(
            "rectangle*", 
            "get_location()", 
            db
      ) ;

If the function rectangle::get_location() returns a pointer to a coord object, this path ends in pointers to coord objects. So you can also construct a path that ends in coord objects themselves, rather than pointers:

      os_index_path::create(
            "rectangle*", 
            "*get_location()", 
            db
      ) ;

Index Maintenance

To maintain indexes keyed by paths containing member function calls, use os_backptr::make_link() and os_backptr::break_link(). See User-Controlled Index Maintenance with an os_backptr.

Controlling Traversal Order

To control traversal order, use one of the constructors for os_Cursor. The various overloadings allow you to specify a traversal order based on

A specified path
The rank function registered for the collection's element type
A specified rank function
The order in persistent memory of the objects pointed to by collection elements

Rank-Function-Based Traversal

      os_Cursor(
            const os_collection&, 
            const char* typename, 
      ) ;

If you create a cursor using this constructor, which takes the name of the element type as argument, then iteration using that cursor will follow the order determined by the element type's rank function. See Supplying Rank and Hash Functions.

      os_Cursor(
            const os_collection&, 
            _Rank_fcn
      ) ;

Rank-function-based cursors are update insensitive. See Performing Collection Updates During Traversal.

Address Order Traversal

      os_Cursor(
            const os_Collection&, 
            os_int32 options
      ) ;

If you supply os_collection::order_by_address as the options argument, this cursor iterates in address order. This is the order in which the objects pointed to by collection elements are arranged in persistent memory.

If you will dereference each collection element as you retrieve it, and the objects pointed to by collection elements will not all fit in the client cache at once, this order can dramatically reduce paging overhead.

An order-by-address cursor is update insensitive.

Path-Based Traversal

      os_Cursor(
            const os_Collection&, 
            const os_index_path&
      ) ;

If you create a cursor using this constructor, which takes a reference to an os_index_path as an argument, then iteration using that cursor will follow the order determined by the path. See Creating Paths.

In the case of multiple member paths (see page 65), the traversal will first visit all elements that have a complete path, and then visit the other elements in order of decreasing path length. That is, given the path boss->boss->boss->name, the cursor will first visit those collection elements that actually have a boss->boss->boss in name order, before visiting any that do not.

      os_index_path &a_path = os_index_path::create(
            "part*", 
            "responsible_engineer->emp_id", 
            db1
      ) ;
      os_Cursor<part*> c(a_set, a_path) ;
      part *p = 0 ;
      for (p = c.first(); p; p = c.next())
                  printf("%d", e->emp_id) ;

If an index on the path is present, it is used to make the traversal more efficient.

If an index on the iteration path is not present, the cursor constructor copies the collection elements into a separate structure, applies the path to each element, copies the terminal key into that structure, and then sorts it according to the key rank. Care is taken to sort the structure by address whenever the path interpretation calls for dereferencing a pointer, in order to improve paging behavior. Cursor creation time depends on the length of the path, the size of the collection, and the complexity of the rank function.

When you are performing path-based traversal over some collection, if you update a data member on which the ordering is based, you are effectively removing and then reinserting the element you changed. In other words, when you update such a data member for an element of a collection, you also update the collection itself. Therefore, for an ordered iteration with such updates, the collection's behavior specification must include os_collection::maintain_cursors and the cursor must be created with os_cursor::safe. See Performing Collection Updates During Traversal.

Keep in mind some representations of collections never allow maintain_cursors behavior. These are unordered types of collections.

Using Ranges in Navigation

The os_Cursor class is discussed in Using Cursors for Navigation in Chapter 5 of the ObjectStore C++ API User Guide. See also os_Cursor in the ObjectStore Collections C++ API Reference.
The os_index_path class is described in Using Paths in Navigation. See also os_index_path in the ObjectStore Collections C++ API Reference.

The os_coll_range class is discussed here and in the immediately following sections of this chapter. See also os_coll_range in the ObjectStore Collections C++ API Reference.

Ranges

A range, an instance of os_coll_range, represents a range of values. A range can be used in conjunction with a path to restrict the elements visited by a cursor. See Specifying Collection Ranges and Restricting the Elements Visited in a Traversal.

You can also use an os_coll_range to retrieve from a dictionary the elements whose key falls within a specified range. See Dictionaries in Chapter 5, Collections, of the ObjectStore C++ API User Guide.

Both these uses of ranges provide a way of performing inexpensive, simple queries, without some of the overhead associated with using query(), query_pick(), or exists(). See Chapter 5, Queries and Indexes, for more detailed information about performing queries.

Specifying Collection Ranges

You can create an object that represents a range of values with the class os_coll_range. An instance of this class can be used as argument to the os_Cursor constructor to create a restricted cursor, or as argument to os_Dictionary::pick().

The constructor for os_coll_range has several overloadings. Each overloading falls into one of the following two groups:

Overloadings that specify both a lower and upper bound on a range of values (as in "all values greater than 4 and less than or equal to 7")
Overloadings that specify just a lower bound or just an upper bound (as in "all values less than or equal to 7")

In each of these two groups, there is one overloading for each C++ fundamental type of value, and one for the type void*. To specify a range for any type of pointer value, use a void* overloading.

Ranges with Only One Bound

Here are the overloadings for os_coll_range() that specify only an upper or lower bound.

      os_coll_range(int rel_op, int value) ;
      os_coll_range(int rel_op, unsigned int value) ;
      os_coll_range(int rel_op, short value) ;
      os_coll_range(int rel_op, unsigned short value) ;
      os_coll_range(int rel_op, char value) ;
      os_coll_range(int rel_op, unsigned char value) ;
      os_coll_range(int rel_op, long value) ;
      os_coll_range(int rel_op, unsigned long value) ;
      os_coll_range(int rel_op, float value) ;
      os_coll_range(int rel_op, double value) ;
      os_coll_range(int rel_op, const void* value) ;

Enumerators for the rel_op argument

The argument rel_op should be one of the following enumerators:

os_collection::EQ (equal to)
os_collection::NE (not equal to)
os_collection::LT (less than)
os_collection::LE (less than or equal to)
os_collection::GT (greater than)
os_collection::GE (greater than or equal to)

A collection range created with one of these functions is satisfied by all values that bear the relation rel_op to value. When the value type is char*, these operators are defined in terms of strcmp().

So, for example,

      os_coll_range( os_collection::LE, 7)

is satisfied by all values less than or equal to 7, and

      os_coll_range( os_collection::EQ, 7)

is satisfied only by the value 7.

      os_coll_range( os_collection::LE, "foo")

is satisfied by any char* value, s, such that strcmp(s,"foo") is less than or equal to 0.

      os_coll_range( os_collection::EQ, "foo")

is satisfied by any char* value, s, such that strcmp(s, "foo") is 0.

Ranges with Both an Upper and Lower Bound

Here are the overloadings for os_coll_range() that specify both an upper and lower bound.

      os_coll_range(int rel_op1, int value1, int rel_op2, int value2) ;
      os_coll_range(int rel_op1, unsigned int value1, int rel_op2, 
                  unsigned int value2) ;
      os_coll_range(int rel_op1, short value1, int rel_op2, 
                  short value2) ;
      os_coll_range(int rel_op1, char value1, int rel_op2, char value2) ;
      os_coll_range(int rel_op1, unsigned char value1, int rel_op2, 
                  unsigned char value2) ;
      os_coll_range(int rel_op1, long value1, int rel_op2, long value2) ;
      os_coll_range(int rel_op1, unsigned long value1, int rel_op2, 
                  unsigned long value2) ;
      os_coll_range(int rel_op1, float value1, int rel_op2, float value2) ;
      os_coll_range(int rel_op1, double value1, int rel_op2, 
                  double value2) ;
      os_coll_range(int rel_op1, const void *value1, int rel_op2, 
                  const void *value2) ;

Constructs an os_coll_range satisfied by all values that both bear the relation rel_op1 to value1 and bear the relation rel_op2 to value2. The arguments rel_op and rel_op2 should be one of the following enumerators:

Enumerators for rel_op and rel_op2 arguments

os_collection::EQ (equal to)
os_collection::NE (not equal to)
os_collection::LT (less than)
os_collection::LE (less than or equal to)
os_collection::GT (greater than)
os_collection::GE (greater than or equal to)

When the value type is char*, these relations are defined in terms of strcmp(). So, for example:

      os_coll_range( os_collection::GT, 4, os_collection::LE, 7)

is satisfied by all ints greater than 4 and less than or equal to 7.

Do not specify the null range, for example:

      os_coll_range( os_collection::LT, 4, os_collection::GT, 7 )

Discontinuous ranges

Do not specify a discontinuous range, for example:

      os_coll_range( os_collection::GT, 4, os_collection::NE, 7)

If you do, the exception err_am is signaled, and the following message is issued to stdout:

      No handler for exception:

      <maint-0023-0001>invalid restriction on unordered index (err_am)

Restricting the Elements Visited in a Traversal

A special overloading of the os_Cursor constructor allows you to create a cursor for including in traversals only those collection elements that satisfy a specified restriction. Using such a cursor allows you to perform simple queries that are less expensive than queries performed with os_collection::query().

      os_Cursor<E> (
            const os_Collection<E> & coll, 
            const os_index_path &path,
            const os_coll_range &range,
            os_int32 options = os_cursor::unsafe
      ) ;

An element satisfies the cursor's restriction if the result of applying path to the element satisfies range (see Specifying Collection Ranges). The order of iteration is arbitrary.

See also os_Cursor in Chapter 2, Collection, Query, and Index Classes, of the ObjectStore C++ API User Guide.

Dictionaries

For dictionaries, you can specify a restriction that is satisfied by elements whose key satisfies a specified range.

      os_Cursor<E> (
            const os_dictionary & coll, 
            const os_coll_range &range,
            os_int32 options = os_cursor::unsafe
      );

An element satisfies this cursor's restriction if its key satisfies range. If the dictionary's key type is a class, you must supply rank and hash functions for the class. See Supplying Rank and Hash Functions.

Duplicates

With a restricted cursor, a traversal visits only one occurrence of each qualified element.

Performing Collection Updates During Traversal

If you want to be able to update a collection while traversing it, you must use either an update-insensitive cursor, or a safe cursor.

With an update-insensitive cursor, the traversal is based on a snapshot of the collection elements at the time the cursor was bound to the collection. None of the inserts and removes performed on the collection are reflected in the traversal.

A safe cursor at a given point in a traversal visits any elements inserted later in the traversal order, and does not visit any elements that are later in the traversal order that are removed.

If you update a collection while traversing it without using an update-insensitive or safe cursor, the results of the traversal are undefined.

Update-Insensitive Cursors

You can create an update-insensitive cursor with the following cursor constructor:

            os_Cursor(
            const os_Collection&, 
            os_int32 options
      ) ;

Supply os_collection::update_insensitive as the options argument.

In addition, the following kinds of cursors are always update insensitive:

Safe Cursors

To traverse a collection with a safe cursor, you must specify maintain_cursors when you create the collection (or use change_behavior() - see Changing Collection Behavior with change_behavior() - and you must pass os_collection::safe to the cursor constructor.

Disadvantages of safe cursors

Safe cursors have some drawbacks that update-insensitive cursors do not:

Updates to collections with safe cursors are slower. For each collection in a given segment that has maintain_cursors behavior, there is an entry in a table mapping collections to their safe cursors. This table is stored in the same segment as the collections. An update to one of the collections requires a lookup in the table. Each cursor associated with the collection is checked and adjusted if necessary.
Index maintenance for collections with safe cursors is slower. Whenever index maintenance is performed on an object in an indexed collection that has maintain_cursors behavior, the safe cursor table also has to be visited (because there might be safe ordered cursors that are pointing to the indexes).
Safe cursors are not supported for ObjectStore dictionaries. If you try to create a dictionary with maintain_cursors behavior, you will receive an exception.

Using safe cursors to implement recursion

One advantage of safe cursors is that they can be used to implement recursion without the use of recursive function calls.

Consider, for example, the code below:

      os_database *db1 ; 
      . . . 
      os_Collection<part*> &result_set = 
            os_Collection<part*>::create(db1) ;
      part *a_part, *p ;
      . . . 
      result_set |= a_part ;
      os_Cursor<part*> c(result_set) ;
      for ( p = c.first() ; p ; p = c.next() )
            result_set |= p->children ;  /* UNSAFE!! */

Here, a new (empty) set, result_set, is created, and a_part is added to it. The for loop then iterates over the elements of result_set, adding the children of each element visited to result_set itself. The first element visited is a_part. But after that, the results of the iteration are undefined.

You can specify that you want a collection to support recursive queries by including maintain_cursors in the behavior specification. And you can specify that you want a new cursor to be safe by including the enumerator os_cursor::safe as the second argument to the constructor for os_Cursor:

      os_Collection<part*> &result_set = 
            os_Collection<part*>::create(
                  db1,
                  os_collection::maintain_cursors
            ) ;
      . . . 
      os_Cursor<part*> c(result_set, os_cursor::safe) ;

Below is an example that builds a set of all the descendents of a given part (that is, a set containing the part's children, its children's children, and so on). It is just like the first example in this section, except that the collection is to use maintain_cursors, and a safe cursor is created.

      os_Collection<part*> &result_set = 
            os_Collection<part*>::create(
                  db1,
                  os_collection::maintain_cursors
            ) ;
      part *a_part, *p ;
      . . . 
      result_set |= a_part ;
      os_Cursor<part*> c(result_set, os_cursor::safe) ;
      for (p = c.first(); p; p = c.next())
            result_set |= p->children ;  /* union of two sets */

Every child added to result_set is visited later in the iteration, and its children are added. By the end of the iteration, all the descendents are in the set.

Example: recursive query

Below is another example of a recursive query. This one finds the primitive descendents of a given part, those descendents that themselves have no children:

      os_Collection<part*> &result_set = 
            os_Collection<part*>::create(
                  db1,
                  os_collection::maintain_cursors
            ) ;
      part *a_part, *p ;
      . . . 
      result_set |= a_part ; 
      os_Cursor<part*> c(result_set, os_cursor::safe) ;
      for (p = c.first(); p; p = c.next())
            if (p->children.size()) {
                  result -= p ; /* remove if not primitive */
                  result_set |= p->children ;  /* union of two sets */
            }

Here, each child that itself has children is removed, and the child's children are added to the set. These children just added are visited later in the iteration so that, if they have children, they can be removed and their children added. By the end of the iteration, only the primitive descendents remain in the set.

As mentioned above, recursive queries always work for unordered iteration, but performing recursive queries using ordered iteration requires more care. A value inserted during an ordered iteration is visited only if it is inserted later, in the order of iteration, than the current iteration position.

Ordered, Safe Traversal

If you want to create a cursor that is both ordered and safe, supply the path argument before the enumerator os_cursor::safe:

      os_Cursor<part*> c(a_collection, a_path, os_cursor::safe);

You must perform index maintenance for any data member or member function in the path used to specify the traversal order, if during iteration you update a data member controlling iteration order. See Performing or Enabling Index Maintenance.

Caution about infinite loops

It is important to realize that there are certain dangers associated with performing updates, within an iteration, to a data member controlling iteration order. Consider, for example, the following loop, which iterates through a set of employees, giving some a raise:

      os_Set<part*> &employees = ... ;
      os_index_path &emp_path = 
                  os_index_path::create("employee*", "salary", db1);
      os_Cursor<part*> c(
            employees, 
            emp_path, 
            os_cursor::safe
      );
      employee *e = 0;
      for( e = c.first() ; e ; e = c.next() )
            if ( e->widgets_sold > 100000 )
                  e->salary *= 1.2 ;

Because iteration is by increasing salary, any employee who gets a raise is moved ahead in the iteration order, and so is visited again. If anyone gets a raise, the loop will be infinite. The proper approach is to use an iteration order unaffected by the updates.

Retrieving Uniquely Specified Collection Elements

The retrieve() function

You can retrieve the collection element at which a specified cursor is positioned with this function:

      E retrieve(const os_Cursor<E>&) const ;

If the cursor is null, err_coll_null_cursor is signaled. If the cursor is nonnull but not positioned at an element, err_coll_illegal_cursor is signaled.

The only() function

You can retrieve the only element of a collection with

      E only() const ;

If the collection has more than one element, err_coll_not_singleton is signaled. If the collection is empty, err_coll_empty is signaled, unless the collection's behavior includes os_collection::pick_from_empty_returns_null, in which case 0 is returned.

Ordered Collections

For collections with maintain_order behavior, you can retrieve the element with a specified numerical position with this function:

      E retrieve(os_unsigned_int32 index) const ;

The index is zero-based. If the index is not less than the collection's size, err_coll_out_of_range is signaled. If the collection does not have maintain_order behavior, err_coll_not_supported is signaled.

The retrieve_first() function

You can retrieve a collection's first element with

      E retrieve_first() const ;

This function returns the collection's first element or 0 if the collection is empty. If the collection is not ordered, err_coll_not_supported is signaled.

For collections with allow_nulls behavior, you can use this function instead:

      os_int32 retrieve_first(const E&) const ;

This function modifies the argument to refer to the collection's first element. It returns 0 if the specified collection is empty, and nonzero otherwise. If the collection is not ordered, err_coll_not_supported is signaled.

The retrieve_last() function

To retrieve a collection's last element, use

      E retrieve_last() const ;

This function returns the collection's last element or 0 if the collection is empty. If the collection is not ordered, err_coll_not_supported is signaled.

For collections with allow_nulls behavior, you can use this function instead:

      os_int32 retrieve_last(const E&) const ;

This function modifies the argument to refer to the collection's last element. It returns 0 if the specified collection is empty, and nonzero otherwise. If the collection is not ordered, err_coll_not_supported is signaled.

Selecting Individual Collection Elements with pick()

The function os_collection::pick() can be used to perform simple queries. It provides a relatively inexpensive alternative to os_collection::query_pick() and os_collection::exists(). With the following overloading, you can retrieve a collection element such that the result of applying path (see Creating Paths) to the element is a value that satisfies range (see Specifying Collection Ranges):

      void* pick(
            const os_index_path &path, 
            const os_coll_range &range
      ) const;

Example: pick()

For example:

      const os_index_path &id_path = 
            os_index_path::create("employee*", "id", db1) ;
      os_coll_range range_eq_1138(EQ, 1138) ;
      . . . 
      employee *e = a_coll.pick(id_path, range_eq_1138) ;

assigns to e an employee in a_coll whose id equals 1138.

If there is more than one such element, an arbitrary one is picked and returned. If there is no such element, err_coll_empty is signaled, unless the collection has behavior os_collection::pick_from_empty_returns_null, in which case 0 is returned.

Dictionaries

For dictionaries, you can retrieve an element with the specified key with one of the following two functions:

      E pick(const K const &key_ref) const ;
      E pick(const K *key_ptr) const ;

These two differ only in that with one you supply a reference to the key, and with the other you supply a pointer to the key. Again, if there is more than one element with the key, an arbitrary one is picked and returned. If there is no such element and the dictionary has pick_from_empty_returns_null behavior, 0 is returned. If there is no such element and the dictionary does not have pick_from_empty_returns_null behavior, err_coll_empty is signaled.

For dictionaries, you can also retrieve an element whose key satisfies a specified collection range (see Specifying Collection Ranges) with

      E pick(const os_coll_range&) const ;

Example: dictionary pick()

For example:

      a_dictionary.pick( os_coll_range(GE, 100) )

returns an element of a_dictionary whose key is greater than or equal to 100.

As with the other pick() overloadings, if there is more than one such element, an arbitrary one is picked and returned. If there is no such element and the collection has pick_from_empty_returns_null behavior, 0 is returned. If there is no such element and the dictionary does not have pick_from_empty_returns_null behavior, err_coll_empty is signaled.

If the dictionary's key type is a class, you must supply rank and hash functions for the class (see Supplying Rank and Hash Functions).

The key types char*, char[ ], and os_char_star_nocopy are each treated as a class whose rank and hash functions are defined in terms of strcmp(). For example, for char*:

      a_dictionary.pick("Smith")

returns an element of a_dictionary whose key is the string Smith (that is, whose key, k, is such that strcmp(k, "Smith") is 0).

Picking an Arbitrary Element

You can retrieve an arbitrary collection element with

      E pick() const ;

If the collection is empty and has pick_from_empty_returns_null behavior, 0 is returned. If the collection is empty and does not have pick_from_empty_returns_null behavior, err_coll_empty is signaled.

This is sometimes useful when all the elements of a collection have the same value for a data member, and the easiest way to retrieve this value is through one of the elements.

Example: retrieving an arbitrary collection element

For example, suppose the class bus defines a member for the set of pins connected to it, but no member for the cell in which it resides, while pin defines a member pointing to its attached cell, which in turn has a member pointing to its containing cell. Then the best way to find the cell on which a given bus resides is to find the pins connected to it, and then find the cell on which one of the pins resides.

      a_cell = a_bus->pins.pick()->cell->container ;

Consolidating Duplicates with operator =()

You can use the assignment operator os_Collection::operator =() (see Copying, Combining, and Comparing Collections in Chapter 5, Collections, of the ObjectStore C++ API User Guide) to consolidate duplicates in a bag or other collection. Do this by assigning the collection with duplicates to an empty collection that does not allow (but does not signal) duplicates.

Example: consolidating duplicates

      os_database *db1 ;
      part *a_part, *p ;
      employee *e ;
      . . . 
      os_Collection<employee*> &emp_bag =
            os_Collection<employee*>::create(
                  db1, 
                  os_collection::allow_duplicates
            ) ;
      os_Collection<employee*> &emp_set =
            os_Collection<employee*>::create(db1) ;
      . . . 
      os_Cursor<part*> c(a_part->children) ;
      for ( p = c.first() ; p ; p = c.next() )
            emp_bag.insert(p->responsible_engineer) ;
      emp_set = emp_bag ;  /* consolidate duplicates */
      os_Cursor<employee*> c(emp_set) ;
      for (e = c.first() ; e ; e = c.next() )
            cout << e->name << "\t" << emp_bag.count(e) << "\n" ;

If two of a_part's children have the same responsible_engineer, that engineer appears twice as an element of emp_bag. We consolidate duplicates in emp_set so we can iterate over it, retrieving each engineer only once in the loop, and then we use count() to see how many times the engineer occurs in emp_bag. This is the number of parts for which the engineer is responsible.

Supplying Rank and Hash Functions

In all these examples, iteration order is based on integer-valued data members (part_number, emp_id, or salary); that is, the paths end in integer values. The integers have a system-supplied order, defined by the comparison operators <, >, and so forth. The same is true for pointers. For char* pointers, which are treated differently from other pointers, the order is defined by performing strcmp() on the string pointed to. But what if a path ends in some other type of value; that is, what if it ends in the instances of some class or floating-point numerical type?

If you want to use such a path to control iteration order, you must make known to ObjectStore a utility specific to the class, a rank function that defines an ordering on the type's instances.

You must also supply a rank function if you use such a path to specify a key for an ordered index (see Index Options). For unordered indexes keyed by such paths, you must supply both a rank and a hash function (the rank function is used to resolve hashing collisions). ObjectStore uses these utilities to maintain proper information on index paths that end in the class.

The os_index_key() Macro

You make these utilities known to ObjectStore by calling the macro os_index_key(). Calls to os_index_key() have the following form:

      os_index_key( type, rank-function, hash-function);

For example:

      os_index_key(date,date_rank,date_hash);

The type is the type that is at the end of the path.

Rank Functions

The rank-function is a user-defined global function that, for any pair of instances of class, provides an ordering indicator for the instances, much as strcmp does for strings. The rank function should return one of os_collection::LT, os_collection::GT, or os_collection::EQ.

Rank functions for floating-point numerical types (float, double, and long double) should follow these guidelines:

NaN and inf must be handled specially. For example, the representation of NaN is not unique. In the rank function, test for these values before doing anything else. You might want NaN to rank below any other value.
For the purpose of ranking, comparisons should be precise. For example, the rank function should consider x and y to be equal if x == y but not if abs(x - y) < e (for some small value of e) as long as > and < also check for equality using e. Using imprecise comparisons can lead to corrupt indexes and incorrect query results.

Hash Functions

The hash-function is a user-defined global function that, for each instance of class, returns a value, an os_unsigned_int32, that can be used as a key in a hash table. It takes a const void* argument. If you are not supplying a hash function for the class, this argument should be 0.

Example Use of Rank and Hash Functions

Suppose you have a collection of pointers to messages, instances of a class that you have defined. Further suppose you want to iterate through the messages in order of their dates to display each message. If a message has an indexable data member whose value is its date, you can code such an iteration this way:

      os_collection &messages = . . . ;
      message *a_msg;
      os_index_path &date_path = 
            os_index_path::create("message", "date_received", msg_db);
      os_cursor c(messages, date_path);
      for (a_msg = (message*) c.first(); a_msg; a_msg =
                  (message*)c.next()) a_msg->display();

This assumes that dates have an order that is known to ObjectStore. This is true if dates are instances of an integer or pointer type such as int. But suppose that dates are instances of a user-defined class:

      class date{
            public:
                  int month;
                  int day;
                  int year;
      };

In this case you must define the rank function and make it known to ObjectStore. Here is how you might define it:

      int date_rank(const void* arg1, const void *arg2) {
            const date *date1 = (const date *) arg1;
            const date *date2 = (const date *) arg2;
            if (date1->year < date2->year)
                  return os_collection::LT;
            else if (date1->year > date2->year)
                  return os_collection::GT;
            else if (date1->month < date2->month)
                  return os_collection::LT;
            else if (date1->month > date2->month)
                  return os_collection::GT;
            else if (date1->day < date2->day)
                  return os_collection::LT;
            else if (date1->day > date2->day)
                  return os_collection::GT;
            return os_collection::EQ;
      }

If you also use unordered indexes keyed by date, you must supply a hash function, which might be defined this way:

      os_unsigned_int32 date_hash(const void* x) {
            const date* d = (const date*) x;
            return ((os_unsigned_int32) (d->year) << 16) ^ 

                        (d->month << 8) ^ d->day;
      }

Finally, you must call os_index_key() before your application performs any iteration or query employing an index path ending on a data member of type date.

main() {
      . . . 
      os_index_key(date,date_rank,date_hash);
      . . . 
}

If unordered indexes are never created, the hash function is not needed, and the registration could be done as follows:

       os_index_key(date,date_rank,0);

Specifying Expected Size

Frequently, a collection has a loading phase, in which it is loaded with elements before being the subject of other kinds of manipulation such as queries and traversal. In these cases, it is desirable to create the collection with a representation appropriate to the size it will have once loading is complete. This saves on the overhead of transforming the collection's representation as it grows during the initial loading phase, and can improve locality of reference.

To presize a collection, use the expected_size argument to a collection create() operation or constructor.

Customizing Collection Behavior

You can customize the behavior of new collections with regard to the optional properties for the collection's type. You do this by supplying a behavior argument to create(), an unsigned 32-bit integer, a bit pattern indicating the collection's properties. The bit pattern is obtained by using | (bitwise or) to form the bit-wise disjunction of enumerators taken from the following possibilities.

Behavior Enumerators for Collection Subtypes

The following enumerators can be applied via the behavior argument to create() or change_behavior() for os_Collection and its subtypes. Enumerators that can be applied to the os_Dictionary class are listed in the following section. These enumerators are all described in Chapter 2, Collection, Query, and Index Classes, of the ObjectStore Collections C++ API Reference.

os_collection::pick_from_empty_returns_null: Performing pick() on an empty result of querying the collection returns 0 rather than raising an exception.
os_collection::allow_nulls: The pointer 0 is allowed as an element.
os_collection::maintain_cursors: For traversals with safe cursors (see Performing Collection Updates During Traversal), an element inserted during a traversal will be visited later in that same traversal. An element removed during a traversal of its elements will not be visited later in that same traversal.
os_collection::allow_duplicates: Duplicate elements are allowed.
os_collection::be_an_array: For collections that maintain order only. With this behavior, access to the n^th element is an O(1) operation.

Behavior Enumerators for Dictionaries

The following enumerators can be applied using the behavior argument to create() for os_Dictionary. These enumerators are all described in Chapter 2, Collection, Query, and Index Classes, of the ObjectStore Collections C++ API Reference.

os_collection::pick_from_empty_returns_null: Performing pick() on an empty dictionary returns 0 rather than raising an exception.
os_dictionary::signal_dup_keys: Duplicate keys are not allowed; err_am_dup_key is signaled if an attempt is made to establish two or more elements with the same key.
os_dictionary::maintain_key_order: Range lookups are supported using pick() or restricted cursors. See Selecting Individual Collection Elements with pick() and Restricting the Elements Visited in a Traversal.
os_dictionary::dont_maintain_size: For dictionaries that maintain key order only. With this behavior insert() and remove() do not update size information, avoiding contention in the collection header. This can significantly improve performance for large dictionaries subject to contention. The disadvantage of this behavior is that size() is an O(n) operation, requiring a scan of the whole dictionary. See os_ixonly and os_ixonly_bc.

Required and Forbidden Behaviors

Here is a table summarizing the required and forbidden behaviors of the collection subtypes:


Collections Class	Allow Duplicates	Signal Duplicates	Maintain Order	Allow Nulls	O(1) Positional Access	Maintain Cursors
os_Set	Forbidden	Off by default	Forbidden	Off by default	Forbidden	Forbidden
os_Bag	Required	Forbidden	Forbidden	Off by default	Forbidden	Forbidden
os_List	On by default	Off by default	Required	Off by default	Off by default	Off by default
os_Dictionary	Required	Forbidden	Forbidden	Required	Not applicable	Forbidden
os_Array	On by default	Off by default	Required	Required	Required	Off by default

Changing Collection Behavior with change_behavior()

You can change the behavior of a collection any time after its creation with the function os_Collection::change_behavior(). There are restrictions about what combinations of behaviors are allowed. Any illegal combination causes an exception to be signaled when change_behavior is called.

Fully specifying the new behavior

You supply a bit pattern (an unsigned 32-bit integer) as argument, fully specifying the behavior the collection is to have after the change.

      os_database *db1; . . . 
      os_Collection<part*> &some_parts = 
            os_Collection<part*>::create(
                  db1,
                  os_collection::allow_nulls,
            ) ;
      . . . 
      some_parts.change_behavior(
            os_collection::maintain_order |
            os_collection::allow_duplicates
      ) ;

This changes the collection some_parts from an unordered collection that allows nulls but does not allow duplicates into an ordered collection that does not allow nulls and does allow duplicates. Both before and after the change, maintain_cursors is off.

Changing to default behavior

The following call changes some_parts into a collection with default behavior for its type, whatever its original behavior:

      some_parts.change_behavior(0) ;

Removing behavior

To remove behavior from a collection, you can conjoin (using &) the collection's current behavior with the bit-wise negation (~) of the enumerator representing the behavior to remove. You obtain a bit pattern representing the current behavior with os_collection::get_behavior(), which returns an os_unsigned_int32 (a 32-bit unsigned integer).

Here is a function that changes a collection to disallow duplicates and signal duplicates, and that otherwise does not affect its original behavior:

      f(os_collection &some_parts) {
            os_unsigned_int32 current_behavior = 
                  some_parts.get_behavior() ;
            some_parts.change_behavior(
                  (current_behavior & ~os_collection::allow_duplicates) |
                  os_collection::signal_duplicates
            ) ;
      }

Incompatibility between behavior and representation

Some behavior is incompatible with some of the possible collection representations. If a collection whose behavior you are changing has a user-defined representation policy, and that policy is incompatible with the new behavior, an err_illegal_arg exception is signaled. If the collection has the default representation policy, the representation can change to accommodate the new behavior.

Automatic checking for nulls and duplicates

When you change a collection so that it no longer allows duplicate or null insertions, you might want to check to see if duplicates or nulls are already present. Such a check is performed for you if you supply the enumerator os_collection::verify as the second argument.

      os_database *db1; . . . 
      os_Collection<part*> &some_parts = 
            os_Collection<part*>::create(
                        db1,
                        os_collection::allow_nulls |
                        os_collection::allow_duplicates |
                        os_collection::maintain_order
            ) ;
      . . . 
      some_parts->change_behavior(
            os_collection::maintain_order, 
            os_collection::verify
      ) ;

This changes some_parts from an ordered collection that allows both nulls and duplicates into an ordered collection that allows neither nulls nor duplicates. The argument os_collection::verify indicates that the function should check for duplicates and nulls.

If nulls are found, err_coll_nulls is signaled. If duplicates are found, and signal_duplicates is on, err_coll_duplicates is signaled. If signal_duplicates is not on, the first among each set of duplicates is retained and trailing duplicates are silently removed.

If os_collection::verify is not used, the resulting collection is assumed to be free of duplicates or nulls. This could lead to application errors if this is not the case.

Customizing Collection Representation

Each ObjectStore collection type permits a variety of representations. The collection type (os_Set or os_Bag or os_List, and so on) determines the allowable behavior; the collection representation determines the performance and storage characteristics.

Representation Classes

The performance and storage characteristics of each representation are discussed in the following sections of this chapter:

A summary of these classes is provided in Summary of Representation Types, and all of the classes are described in detail in Chapter 3, Representation Types, of the ObjectStore Collections C++ API Reference.

Creating Collection Representation Objects

Creating os_rep objects

You create an os_rep object with the os_rep constructor:

      os_rep(os_rep_type rep_enum, os_unsigned_int32 size) ;

Enumerators for the first argument of the os_rep constructor

The first argument to the os_rep constructor is one of the enumerators listed below. See os_rep::os_rep() in Chapter 2, Collection, Query, and Index Classes, of the ObjectStore Collections C++ API Reference for more information.

os_chained_list_rep
os_dyn_bag_rep
os_dyn_hash_rep
os_ixonly_rep
os_ixonly_bc_rep
os_packed_list_rep
os_ptr_bag_rep
os_vdyn_bag_rep_os_reference
os_vdyn_hash_rep_os_reference

The second argument to the os_rep constructor is the size at which the transition to the next representation is to be made. Use

      ~(os_unsigned_int32)0

to specify no upper bound.

Changing Collection Representation with change_rep()

You can change a collection's associated representation type at any time, using the member function os_collection::change_rep().

      temp_bag->change_rep (
            100,
            &os_rep
      );

The first argument is the expected size, used just as it is in create to determine the initially active representation. The second argument is the os_rep.

If you specify an os_rep and expected size that determine an initial representation incompatible with the behavior of the collection you want to change, err_coll_illegal_arg is signaled. Moreover, if the type and size subsequently determine a representation incompatible with the collection's behavior, err_coll_illegal_arg is also signaled.

Note that changing the representation of a collection is in effect creating a new collection and copying all elements from the old to the new collection.

os_chained_list

The class os_chained_list is a representation type that is optimized (in both time and space) for small to medium-sized collections. Each os_chained_list consists of a header and any number of blocks. The header has a vptr, one word of state, and up to 20 pointers. When the number of pointers in the header is exhausted, an os_chained_list_block is allocated, and chained to the header.

Each os_chained_list_block can contain up to 255 pointers. It has two or three words of overhead: one word of state information, a previous pointer, and possibly a next pointer (the first os_chained_list_block allocated does not have a next pointer until the next block is allocated). The default version of os_chained_list contains four pointers in the header and seven or eight pointers in its blocks.

The maximum size for os_chained_lists is 131070.

For more information, see os_chained_list in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.

Controlling the Number of Pointers

When you create an os_chained_list, what is really allocated is an instance of a parameterized class derived from os_chained_list: os_chained_list_pt<NUM_PTRS_IN_HEAD,NUM_PTRS_IN_BLOCKS>. The default parameterization is <4,8>, but you can specify a different parameterization with the following macros:

OS_MARK_CHAINED_LIST_REP(ptrs_in_header,ptrs_in_blocks)
OS_INSTANTIATE_CHAINED_LIST_REP(ptrs_in_header,ptrs_in_blocks)
OS_INITIALIZE_CHAINED_LIST_REP(ptrs_in_header,ptrs_in_blocks)

Use OS_MARK_CHAINED_LIST_REP() in the same dummy function as OS_MARK_SCHEMA_TYPE().

Use OS_INSTANTIATE_CHAINED_LIST_REP() at file scope. It declares some static state needed by the representation.

Execute OS_INITIALIZE_CHAINED_LIST_REP() in a function. It registers the new parameterization with the collections library.

Include the files <coll/chlist.hh>, <coll/chlistpt.hh>, and <coll/chlistpt.c> if you use these macros.

In order to create a collection using a chained list with other than the default parameterization, you invoke the following static member function:

      static os_coll_rep_descriptor*
      os_chained_list_descriptor::find_rep(os_int32 ptrs_in_hdr,
            os_int32 ptrs_in_blocks);

If the requested parameterization has been specified with the above macros, the appropriate representation descriptor is returned. Otherwise, 0 is returned.

Note that an os_chained_list must have at least four pointers in the header but not more than 20 pointers.

An os_chained_list with a four-pointer header can change freely into any other collection representation and the reverse. However, other collection representations cannot change into os_chained_lists with more than four pointers in the header. A normal collection header is 24 bytes. An os_chained_list with more than four pointers exceeds this limit. It is possible for an os_chained_list with an oversized header to change into another representation (with the same or smaller size header).

Pool Allocation of Blocks

You can request pool allocation of os_chained_list_blocks with the environment variable OS_COLL_POOL_ALLOC_CHLIST_BLOCKS to the function os_chlist_pool::configure_pool(). In some cases this decreases the time needed for individual allocation of os_chained_list_blocks and increases the chance of getting good locality of reference.

Setting OS_COLL_POOL_ALLOC_CHLIST_BLOCKS turns on pool allocation. There is one pool per segment; each pool consists of an array of subpools. Each subpool is two pages by default.

By allocating larger subpools, you can defer the cost of allocating new subpools at the expense of potentially wasted space. To allocate larger subpools, use this function:

      static void 
      os_chlist_pool::configure_pool(

                  os_unsigned_int32 config_options,

                  os_unsigned_int32 blks_per_subpool=2);

config_options can have one of the following values:

os_chlist_pool_no_pooled_allocation
os_chlist_pool_allocate_blks

The second argument, which is optional and defaults to 2, controls the number of pages allocated per subpool.

Mutation Checks

In order to improve performance, an os_chained_list does not necessarily check to see if it should change to another representation after every insert or remove operation. By default, it checks when the size is roughly a multiple of 7. However, you can control the frequency with which it checks by invoking the static member function.

      static void 
      os_chained_list_descriptor::set_reorg_check_interval(
            os_unsigned_int32 v);

ObjectStore sets the check interval to one less than the power of 2 that is greater than or equal to v. For example, in order to check on every other insert or remove, pass 1 or 2 as an argument. Passing 3 or 4 results in a check on every third operation. Passing 0 inhibits mutation. However, if the maximum size for an os_chained_list is reached, it will change to another representation.

mutate_when_full Behavior

For collections whose representation is os_chained_list, if you specify the behavior enumerator os_collection::chained_list_mutate_when_full, the collection's representation will not change until it reaches the maximum size for chained lists.

os_dyn_bag

Instances of this class are used as ObjectStore collection representations. The os_dyn_bag representation supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) (in the number of elements). But an os_dyn_bag takes up somewhat more space than an os_packed_list.

The representation os_dyn_bag minimizes reorganization overhead at the expense of some extra space overhead, compared with os_ptr_bag. At large sizes, os_dyn_bag uses a structure pointing to many small hash tables that can reorganize independently.

This representation type does not support maintain_order or maintain_cursors behavior.

For sizes below 20, os_chained_list might be a better representation type.

For more information, see os_dyn_bag in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.

Time Complexity

In the following table, complexities are shown in terms of collection size, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)


insert()	O(1)
remove()	O(1)
size()	O(1)
contains()	O(1)
Comparisons (<=, ==, and so on)	O(n)
Merges (\|, &, -)	O(n)

Space Overhead

      size <= 64k

the small-medium size data structure is used. It contains the following:

Header (24 bytes)
Entry for each element (eight bytes each)
Some number of empty entries (eight bytes each)

On average, an os_dyn_bag at low-medium sizes is 69% full. You can estimate the average size as follows:

      Avg. total size in bytes = 24 + (size/.69) * 8

      size > 64k

the large size data structure is used. It contains the following:

Header (24 bytes)
Directory (60 byte header + 12 bytes per directory entry)
Some number of small hash tables (two pages each, eight bytes per entry)

On average, each small hash table in an os_dyn_bag at high sizes is 70% full. You can estimate the average size as follows:

                                          n_entries = Avg. number of entries per small hash table = (8192/8) * .7
                                          n_tables = Avg. number of small hash tables = size / n_entries
                                          dir_size = Avg. directory size in bytes = 60 + (n_tables+1) * 12
                                          Avg. total size in bytes = 24 bytes + dir_size + n_tables * 8192

os_dyn_hash

Instances of this class are used as ObjectStore collection representations. The dynamic hash representation supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) (in the number of elements). But an os_dyn_hash takes up somewhat more space than an os_packed_list.

At large sizes, os_dyn_hash uses a structure pointing to many small hash tables that can reorganize independently.

This representation type does not support allow_duplicates, maintain_order, or maintain_cursors behavior.

For sizes below 20, os_chained_list might be a better representation type.

For more information, see os_dyn_hash in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.

Time Complexity


insert()	O(1)
remove()	O(1)
size()	O(1)
contains()	O(1)
Comparisons (<=, ==, and so on)	O(n)
Merges (\|, &, -)	O(n)

Space Overhead

      size <= 64k

the small-medium size data structure is used. It contains the following:

Header (24 bytes)
Entry for each element (four bytes each)
Some number of empty entries (four bytes each)

On average, the an os_dyn_hash at low-medium sizes is 69% full. You can estimate the average size as follows:

      Avg. total size in bytes = 24 + (size/.69) * 4

      size > 64k

the large size data structure is used. It contains the following:

Header (24 bytes)
Directory (60 byte header + 12 bytes per directory entry)
Some number of small hash tables (two pages each, four bytes per entry)

On average, each small hash table in an os_dyn_hash at high sizes is 70% full. You can estimate the average size as follows:

                                          n_entries = Avg. number of entries per small hash table = (8192/4) * .7
                                          n_tables = Avg. number of small hash tables = size / n_entries
                                          dir_size = Avg. directory size in bytes = 60 + (n_tables+1) * 12
                                          Acg. total size in bytes = 24 bytes + dir_size + n_tables * 8192

os_ixonly and os_ixonly_bc

Instances of these classes are used as ObjectStore collection representations. They are both index-only representations that support O(1) element lookup. Operations such as contains() and remove() are O(1) (in the number of elements). But they take up somewhat more space than an os_packed_list.

For large collections subject to contention, os_ixonly_bc can provide significantly better performance than os_ixonly. See os_ixonly_bc, below.

The next chapter discusses associating indexes with collections to improve the efficiency of queries. With os_ixonly or os_ixonly_bc, you can save space by telling ObjectStore to record the membership of the collection in one of its indexes, as opposed to recording the membership in both the index and the collection. In other words, you can save space by using an index as a collection's representation.

When these representation types are specified for a collection, you must add an index to it before any operations are performed on it. Additional indexes can also be added.

These representation types are incompatible with the following behaviors: maintain_order, maintain_cursors, allow_nulls, and allow_duplicates.

Note that using these representations can save on space overhead at the expense of reducing the efficiency of some collection operations. If the only time-critical collection operation is index-based element lookup, an index-only representation is likely to be beneficial.

For sizes below 20, os_chained_list might be a better representation type.

For more information, see os_ixonly and os_ixonly_bc in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.

os_ixonly_bc

os_ixonly_bc is just like os_ixonly, except that insert() and remove() do not update size information, avoiding contention in the collection header. The disadvantage of os_ixonly_bc is that size() is an O(n) operation, requiring a scan of the whole collection.

You can determine if a collection updates its size in this way with the following member of os_collection:

      os_int32 size_is_maintained() const;

This function returns nonzero if the collection maintains size; it returns 0 otherwise.

The following member of os_collection, which returns an estimate of a collection's size, is an O(1) operation in the size of the collection:

      os_unsigned_int32 size_estimate() const;

This function returns the size as of the last call to os_collection::update_size(). For collections that maintain size, the actual size is returned.

Before you add a new index to an os_ixonly_bc collection, call the following member of os_collection:

      os_unsigned_int32 update_size();

If you do not, add_index() will work correctly, but less efficiently than if you do. This function updates the value returned by os_collection::size_estimate() by scanning the collection and computing the actual size.

Time Complexity


insert()	O(1)
remove()	O(1)
size(), os_ixonly	O(1)
size(), os_ixonly_bc	O(n)
contains()	O(1)
Comparisons (<=, ==, and so on)	O(n)
Merges (\|, &, -)	O(n)

If there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary. This locks the info segment and can cause contention.

os_ordered_ptr_hash

Instances of this class are used as ObjectStore collection representations. Unlike the other hash tables, this representation supports maintain_order behavior. The ordered pointer hash representation supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) (in the number of elements). But an os_ordered_ptr_hash takes up somewhat more space than an os_packed_list.

This representation type does not support be_an_array behavior.

For sizes below 20, os_chained_list might be a better representation type.

For more information, see os_ordered_ptr_hash in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.

Time Complexity


insert()	O(1)
Position-based insert()	O(n)
remove()	O(1)
Position-based remove()	O(n)
size()	O(1)
contains()	O(1)
Comparisons (<=, ==, and so on)	O(n)
Merges (\|, &, -)	O(n)

Space Overhead and Clustering

An ordered pointer hash has the following components:

Header
Entry for each element
Some number of empty entries

The entry for a given element is likely to be on a different page from the collection header.

On average, a pointer hash is 58.3% full. You can estimate the average size of a pointer hash as follows:

      if size <= 65535
                  average total size in bytes = 56 + size * 8 / 58.3
      if size > 65535
                  average total size in bytes = 56 + size * 12 / 58.3

The minimum fill for a packed list is 46.7%, so an upper bound on collection space overhead can be calculated as follows:

      if size <= 65535
                  maximum total size in bytes = 56 + size * 8 / 46.7
      if size > 65535
                  maximum total size in bytes = 56 + size * 12 / 46.7

os_packed_list

Instances of this class are used as ObjectStore collection representations. The packed list representation is relatively space-efficient, but element lookup is an O(n) operation, which means that operations such as remove() and contains() are O(n) (in the number of elements). If duplicates are allowed, this representation provides the fastest insertion times, but if duplicates are not allowed (requiring element lookup to check for the presence of a duplicate) insert() is O(n).

For sizes below 20, os_chained_list might be a better representation type.

For more information, see os_packed_list in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.

Time Complexity


insert(), duplicates allowed	O(1)
insert(), duplicates not allowed	O(n)
Position-based insert(), no "holes"	O(1)
Position-based insert(), with "holes"	O(n)
remove()	O(n)
Position-based remove(), no "holes"	O(1)
Position-based remove(), with "holes"	O(n)
size()	O(1)
contains()	O(n)
Comparisons (<=, ==, and so on)	O(n²)
Merges (\|, &, -)	O(n²)

There might be "holes" in an os_packed_list if any elements have been removed.

If there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary.

Space Overhead and Clustering

A packed list has the following components:

Header
Entry for each element
Some number of empty entries

The entry for a given element is likely to be on a different page from the collection header.

On average, a packed list is 83.3% full. You can estimate the average size of a collection as follows:

      average total size in bytes = 40 + size * 4 / 83.3

The minimum fill for a packed list is 66.7%, so an upper bound on collection space overhead can be calculated as follows:

      maximum total size in bytes = 40 + size * 4 / 66.7

os_ptr_bag

Instances of this class are used as ObjectStore collection representations. The pointer hash representation supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) (in the number of elements). But an os_ptr_bag takes up somewhat more space than an os_packed_list.

In addition, as an os_ptr_bag grows, there can be overhead during collection updates, for reorganization. In contrast, the representation os_dyn_bag minimizes reorganization overhead at the expense of some extra space overhead. At large sizes, os_dyn_bag uses a structure pointing to many small hash tables that can reorganize independently. See os_dyn_bag.

This representation type does not support maintain_order behavior.

For sizes below 20, os_chained_list might be a better representation type.

For more information, see os_ptr_bag in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.

Time Complexity


insert()	O(1)
remove()	O(1)
size()	O(1)
contains()	O(1)
Comparisons (<=, ==, and so on)	O(n)
Merges (\|, &, -)	O(n)

If there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary.

Space Overhead and Clustering

A pointer hash has the following components:

Header
Entry for each element
Some number of empty entries
Count slot for each entry
Some number of empty count slots

The entry for a given element is likely to be on a different page from the collection header. In addition, the count slot for a given element is likely to be stored on a different page from both the header and the entry for the element.

On average, a pointer bag is 58.3% full. You can estimate the average size of a pointer bag as follows:

      average total size in bytes = 48 + size * 8 / 58.3

The minimum fill for a packed list is 46.7%, so an upper bound on collection space overhead can be calculated as follows:

      maximum total size in bytes = 48 + size * 8 / 46.7

os_vdyn_bag

Instances of this class are used as ObjectStore collection representations. The os_vdyn_bag representation saves on relocation overhead and address space by recording its membership using ObjectStore references instead of pointers. It supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) (in the number of elements). But an os_vdyn_bag takes up somewhat more space than an os_packed_list.

The representation os_vdyn_bag minimizes reorganization overhead at the expense of some extra space overhead, compared with os_ptr_bag. At large sizes, os_vdyn_bag uses a structure pointing to many small hash tables that can reorganize independently.

This representation type does not support maintain_order or maintain_cursors behavior.

For sizes below 20, os_chained_list might be a better representation type.

This class is parameterized, with a parameter indicating the type of ObjectStore reference to use for recording membership. Actual parameters can be os_reference, for general use.

For more information, see os_vdyn_bag in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.

Time Complexity


insert()	O(1)
remove()	O(1)
size()	O(1)
contains()	O(1)
Comparisons (<=, ==, and so on)	O(n)
Merges (\|, &, -)	O(n)

Space Overhead

For an os_vdyn_bag whose reference type parameter is REF_TYPE, if

      size <= 64k

the small-medium size data structure is used. You can estimate its size as follows:

      extra_slot = (((size / .69) % 16) ? 1 : 0)
      average total size = 24 bytes (header) + 
                  ( ((size / .69) / 16) + extra_slot) *
                  ((sizeof(REF_TYPE) * 16) + (16 * 4) + 4)

      size > 64k

the large size data structure is used. You can estimate its size as follows:

      entry_size:
      os_reference: 20
      os_reference_version: 28
      if REF_TYPE != os_reference_protected:
                  n_tables = (size / ( ((8192 / <entry-size>) * 2) * .7)
      else
                  n_tables = (size / ( (8192 / (<entry-size> )) * .7))
      dir_size= (n_tables +1) * 12 bytes + 60
      average total size = 24 bytes (header) + 

                  dir_size + n_tables * 8192 bytes

os_vdyn_hash

Instances of this class are used as ObjectStore collection representations. The os_vdyn_hash representation saves on relocation overhead and address space by recording its membership using ObjectStore references instead of pointers. It supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) (in the number of elements). But an os_vdyn_hash takes up somewhat more space than an os_packed_list.

At large sizes, os_vdyn_hash uses a structure pointing to many small hash tables that can reorganize independently.

This representation type does not support allow_duplicates, maintain_order, or maintain_cursors behavior.

For sizes below 20, os_chained_list might be a better representation type.

This class is parameterized, with a parameter indicating the type of ObjectStore reference to use for recording membership. Actual parameters can be os_reference, for general use.

For more information, see os_vdyn_hash in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.

Time Complexity


insert()	O(1)
remove()	O(1)
size()	O(1)
contains()	O(1)
Comparisons (<=, ==, and so on)	O(n)
Merges (\|, &, -)	O(n)

Space Overhead

For an os_vdyn_bag whose reference type parameter is REF_TYPE, if

      size <= 64k

the small-medium size data structure is used. You can estimate its size as follows:

      extra_slot = (((size / .69) % 16) ? 1 : 0)
      average total size = 24 bytes (header) +
            ( ((size / .69) / 16) + extra_slot) *
            ((sizeof(REF_TYPE) * 16) + 4)

      size > 64k

the large size data structure is used. You can estimate its size as follows:

      entry_size:
            os_reference: 12
            os_reference_version: 20
      if REF_TYPE != os_reference_protected:
                  n_tables = (size / ( ((8192 / <entry-size> ) * 2) * .7))
      else
                  n_tables = (size / ( (8192 / (<entry-size> )) * .7))
      dir_size= (n_tables +1) * 12 bytes + 60
      average total size = 24 bytes (header) + dir_size + 

                  n_tables * 8192 bytes

Summary of Representation Types

Time Complexity Summary

In the table below, complexities are shown in terms of collection size, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)

An ordered hash table is an os_ordered_ptr_hash. Unordered hash tables include the following:

os_dyn_bag
os_dyn_hash
os_ptr_bag
os_vdyn_bag

os_vdyn_hash


	Unordered Hash Tables	Ordered Hash Tables	os_packed_ list	os_ixonly	os_chained_list
insert()	O(1)	O(1)	O(1) if duplicates are allowed; O(n) otherwise	O(1)	O(1) if duplicates are allowed; O(n) otherwise
Position-based insert()	Not possible	O(n)	O(1) if there are no holes; O(n) otherwise	Not possible	O(n)
remove()	O(1)	O(1)	O(n)	O(1)	O(n)
Position-based remove()	Not possible	O(n)	O(1) if there are no holes; O(n) otherwise	Not possible	O(n)
size()	O(1)	O(1)	O(1)	O(1)	O(1)
contains()	O(1)	O(1)	O(n)	O(1)	O(n)
Comparisons (<=, ==, and so on)	O(n)	O(n)	O(n²)	O(n)	O(n²)
Merges (\|, &, -)	O(n)	O(n)	O(n²)	O(n)	O(n²)

There might be holes in an os_packed_list if any elements have been removed.

Note that among O(1) operations, the constant varies from one structure to another. So, for example, os_packed_list insertion with duplicates allowed is fastest, and os_ixonly is slowest.

Note also that if there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary.

Space Overhead Summary

This section contains a description of the data structures used by each of the following collection representations:

These descriptions will allow you to estimate the amount of storage occupied by a collection with a given size and a given representation type. Information on clustering (what is likely to be on the same page as what) is also included to help you predict paging behavior.

Representation components

Each type of representation has the following components:

Header
Entry for each element
Some number of empty entries

Representations that support duplicates

Representations that support duplicates also have the following:

Count slot for each entry
Some number of empty count slots

Estimating the average size of a collection

The number of empty entries and count slots is, on average, proportional to the collection's size. You can estimate the average size of a collection with a given representation using the following pieces of information:

Average fraction of total entries that are not empty (the average fill)
Size of each kind of component
Size of the collection

Formula for calculating size

Here is the formula to use:

      average total size = size_of(header) + size *
            ( size_of(entry) + size_of(count slot) ) / (average fill)

Here is the information you need for each type of representation:


Representation Type	Header Size in Bytes	Entry Size in Bytes	Count Slot Size in Bytes	Average Fill %
os_ptr_bag	48	4	4	58.3
os_order_ptr_hash, size <= 65535	56	8	0	58.3
os_order_ptr_hash, size > 65535	56	12	0	58.3
os_packed_list	40	4	0	83.3

Minimum representation fills

The minimum fills for each type of representation are as follows:


Representation Type	Minimum Fill %
os_ptr_bag	46.7
os_order_ptr_hash, size <= 65535	46.7
os_order_ptr_hash, size > 65535	46.7
os_packed_list	66.7

Calculating an upper bound on collection space

You can use this information to get an upper bound on collection space overhead:

      maximum total size = size_of(header) + size * 
            ( size_of(entry) + size_of(count slot) ) / (minimum fill)

[previous] [next]

Updated: 03/31/98 15:28:08