This chapter is intended to augment the information contained in Chapter 5, Collections, of the ObjectStore C++ API User Guide. The information contained here is organized in the following manner:
In addition, collections form the basis of the ObjectStore query facility, which allows you to select those elements of a collection that satisfy a specified condition. Queries with simple conditions are discussed in this chapter. Queries with complex conditions are described in Chapter 5, Queries and Indexes.
Collections are commonly used to model many-valued attributes, and they can also be used as class extents (which hold all instances of a particular class). Collections of one type, dictionaries, associate a key with each element or group of elements, and so can be used to model binary associations or mappings.
Using Paths in Navigation
The ObjectStore collection facility provides a number of classes that help you navigate within a collection. The os_Cursor class, the os_index_path class, and the os_coll_range class all help you insert and remove elements, as well as retrieve particular elements or sequences of elements.
Paths
A path, an instance of os_index_path, represents a navigational path starting from the elements of a collection. Paths are used to specify index keys, and to specify a cursor's associated order. See Creating Paths, and also Paths and Member Functions.
Paths are also used in conjunction with ranges to specify a restriction on the elements a cursor visits. See Using Ranges in Navigation for information on ranges.
Creating Paths
Paths are used to specify traversal order (see Controlling Traversal Order) and index keys (see Indexes and Query Optimization).
Simple Paths
Using the simplest kind of path, you can base an index key or iteration order on the value of some data member or simple member function. For example, you can iterate through a set of parts in order of the part's part numbers (parts with lower numbers precede parts with higher numbers). The member is specified with an instance of the class os_index_path, which designates a member access path.
os_index_path::create("part*", "part_number", db1)The object created designates the path from part pointers to their part numbers. Here, part* is the path's type string, which names the element type of collections whose elements can serve as path starting points. part_number is a path string indicating the data member itself. The argument db1 is a database whose schema contains the definition of the class part.
Both type strings and path strings can contain white space around tokens.
The instance of os_index_path generated by the call to create() is heap-allocated. When you no longer need it, you should deallocate it with the following static member of os_index_path:
static void destroy(os_index_path&) ;
os_index_path::create( "part*", "responsible_engineer->emp_id", db1 ) ;
os_index_path::create("employee*", "name", db1)A path applied to a pointer to an employee, returning the employee's manager, is specified by
os_index_path::create( "employee*", "department->manager", db1 )A path applied to a pointer to an employee, returning the name of each of the employee's supervisees, is specified by
os_index_path::create( "employee*", "supervisees[]->name", db1 )Brackets ([]) in a path indicate that the path goes through each element of the indicated collection. The path mapping follows the remainder of the specified path for each element. Thus, application of a path mapping can result in many values.
In this last example, the path maps a single employee* to many names, the names of the employee's supervisees. If you use such a path to specify a key for an index, then the index would handle lookup of an employee by the name of any one of the employee's supervisees.
But paths with brackets cannot be used to specify iteration order. Iteration in order of names of supervisees is not well defined, because there is more than one supervisee. In contrast, iterating by name of supervisor, for example, is well defined, if each employee has exactly one supervisor.
Another important point is that if an index path contains a path through a collection. This collection either has to be parameterized or you must cast it to the appropriate type of parameterized collection in the index path string. This is necessary so thaat the index path parser can determine if, for example, name is a valid data member of the type of element in the supervisees collection. For example,
(os_Collection<employee*>&)supervisees[]->name
Paths and Member Functions
Member functions called in path strings are subject to certain restrictions and prerequisites, as described in this section. Restrictions
Member functions called in query or path strings are subject to certain restrictions:
class person { private: unsigned int age; person* sibling; public: unsigned int get_age(); person* get_sibling(); unsigned int get_age_n_years_ago(unsigned int n); }The functions get_age() and get_sibling() can be referenced in a query or path string. get_age_n_years_ago() violates the second of the preceding restrictions.
To perform a query, ObjectStore sometimes (depending on what indexes are present) issues calls to member functions used in paths and queries. If such a member function allocates memory it does not free (for example if it returns a pointer to newly allocated memory), memory leaks can result; ObjectStore does not free the space the function allocates. So member functions used in paths or queries should not allocate persistent memory or memory in the transient heap.
os_query_function( class, func, return_type)where
os_query_function_returning_ref( class, func, return_type)where
Name& get_name();Should be written as if the function signature were
Name* get_name();That is,
*get_name() == *(Name*) FreevarAn index path to support this query would be
*get_name()
os_query_function_body( class, func,return_type, bpname)where
OS_MARK_QUERY_FUNCTION( class, func)where
os_query_function_body_returning_ref(where
class, func, return_type, bpname
)
Name& get_name();Should be written as if the function signature were
Name* get_name()That is,
*get_name() == *(Name*) FreevarAn index path to support this query would be
*get_name()
path-stringIf the last component of the path string is a member function name, you can construct a path string to specify a path ending in those objects themselves, using * (asterisk) as follows:
* ( path-string)The parentheses are not necessary if the original path string specifies a single-step path.
Consider for example the path
os_index_path::create( "rectangle*", "get_location()", db ) ;If the function rectangle::get_location() returns a pointer to a coord object, this path ends in pointers to coord objects. So you can also construct a path that ends in coord objects themselves, rather than pointers:
os_index_path::create( "rectangle*", "*get_location()", db ) ;
Controlling Traversal Order
To control traversal order, use one of the constructors for os_Cursor. The various overloadings allow you to specify a traversal order based on
os_Cursor( const os_collection&, const char* typename, ) ;If you create a cursor using this constructor, which takes the name of the element type as argument, then iteration using that cursor will follow the order determined by the element type's rank function. See Supplying Rank and Hash Functions.
os_Cursor( const os_collection&, _Rank_fcn ) ;Rank-function-based cursors are update insensitive. See Performing Collection Updates During Traversal.
os_Cursor( const os_Collection&, os_int32 options ) ;If you supply os_collection::order_by_address as the options argument, this cursor iterates in address order. This is the order in which the objects pointed to by collection elements are arranged in persistent memory.
If you will dereference each collection element as you retrieve it, and the objects pointed to by collection elements will not all fit in the client cache at once, this order can dramatically reduce paging overhead.
An order-by-address cursor is update insensitive.
os_Cursor( const os_Collection&, const os_index_path& ) ;If you create a cursor using this constructor, which takes a reference to an os_index_path as an argument, then iteration using that cursor will follow the order determined by the path. See Creating Paths.
Multiple member paths
In the case of multiple member paths (see page 65), the traversal will first visit all elements that have a complete path, and then visit the other elements in order of decreasing path length. That is, given the path boss->boss->boss->name, the cursor will first visit those collection elements that actually have a boss->boss->boss in name order, before visiting any that do not.
Reuse of an os_index_path
Note that, once you have created an os_index_path, you can reuse it (for example, to specify the same order for another iteration, or to specify the key of an index). There is no need to create a separate index path each time you specify an iteration order or index. char* paths
Paths whose values are char* are treated specially by the iteration facility. An ordered iteration based on a char*-valued member does not iterate in order of addresses (the values, strictly speaking, of the data member), but rather the iteration proceeds in the order of the strings pointed to. The order is determined by the function strcmp(). Example
Here is a code fragment demonstrating iteration by responsible_engineer's emp_id:
os_index_path &a_path = os_index_path::create( "part*", "responsible_engineer->emp_id", db1 ) ; os_Cursor<part*> c(a_set, a_path) ; part *p = 0 ; for (p = c.first(); p; p = c.next()) printf("%d", e->emp_id) ;If an index on the path is present, it is used to make the traversal more efficient.
If an index on the iteration path is not present, the cursor constructor copies the collection elements into a separate structure, applies the path to each element, copies the terminal key into that structure, and then sorts it according to the key rank. Care is taken to sort the structure by address whenever the path interpretation calls for dereferencing a pointer, in order to improve paging behavior. Cursor creation time depends on the length of the path, the size of the collection, and the complexity of the rank function.
When you are performing path-based traversal over some collection, if you update a data member on which the ordering is based, you are effectively removing and then reinserting the element you changed. In other words, when you update such a data member for an element of a collection, you also update the collection itself. Therefore, for an ordered iteration with such updates, the collection's behavior specification must include os_collection::maintain_cursors and the cursor must be created with os_cursor::safe. See Performing Collection Updates During Traversal.
Keep in mind some representations of collections never allow maintain_cursors behavior. These are unordered types of collections.
Using Ranges in Navigation
The ObjectStore collection facility provides a number of classes that help you navigate within a collection. The os_Cursor class, the os_index_path class, and the os_coll_range class all help you insert and remove elements, as well as retrieve particular elements or sequences of elements.
Ranges
A range, an instance of os_coll_range, represents a range of values. A range can be used in conjunction with a path to restrict the elements visited by a cursor. See Specifying Collection Ranges and Restricting the Elements Visited in a Traversal.
You can also use an os_coll_range to retrieve from a dictionary the elements whose key falls within a specified range. See Dictionaries in Chapter 5, Collections, of the ObjectStore C++ API User Guide.
Both these uses of ranges provide a way of performing inexpensive, simple queries, without some of the overhead associated with using query(), query_pick(), or exists(). See Chapter 5, Queries and Indexes, for more detailed information about performing queries.
Specifying Collection Ranges
You can create an object that represents a range of values with the class os_coll_range. An instance of this class can be used as argument to the os_Cursor constructor to create a restricted cursor, or as argument to os_Dictionary::pick().
os_coll_range(int rel_op, int value) ; os_coll_range(int rel_op, unsigned int value) ; os_coll_range(int rel_op, short value) ; os_coll_range(int rel_op, unsigned short value) ; os_coll_range(int rel_op, char value) ; os_coll_range(int rel_op, unsigned char value) ; os_coll_range(int rel_op, long value) ; os_coll_range(int rel_op, unsigned long value) ; os_coll_range(int rel_op, float value) ; os_coll_range(int rel_op, double value) ; os_coll_range(int rel_op, const void* value) ;
So, for example,
os_coll_range( os_collection::LE, 7)is satisfied by all values less than or equal to 7, and
os_coll_range( os_collection::EQ, 7)is satisfied only by the value 7.
os_coll_range( os_collection::LE, "foo")is satisfied by any char* value, s, such that strcmp(s,"foo") is less than or equal to 0.
os_coll_range( os_collection::EQ, "foo")is satisfied by any char* value, s, such that strcmp(s, "foo") is 0.
os_coll_range(int rel_op1, int value1, int rel_op2, int value2) ; os_coll_range(int rel_op1, unsigned int value1, int rel_op2, unsigned int value2) ; os_coll_range(int rel_op1, short value1, int rel_op2, short value2) ; os_coll_range(int rel_op1, char value1, int rel_op2, char value2) ; os_coll_range(int rel_op1, unsigned char value1, int rel_op2, unsigned char value2) ; os_coll_range(int rel_op1, long value1, int rel_op2, long value2) ; os_coll_range(int rel_op1, unsigned long value1, int rel_op2, unsigned long value2) ; os_coll_range(int rel_op1, float value1, int rel_op2, float value2) ; os_coll_range(int rel_op1, double value1, int rel_op2, double value2) ; os_coll_range(int rel_op1, const void *value1, int rel_op2, const void *value2) ;Constructs an os_coll_range satisfied by all values that both bear the relation rel_op1 to value1 and bear the relation rel_op2 to value2. The arguments rel_op and rel_op2 should be one of the following enumerators:
os_coll_range( os_collection::GT, 4, os_collection::LE, 7)is satisfied by all ints greater than 4 and less than or equal to 7.
Do not specify the null range, for example:
os_coll_range( os_collection::LT, 4, os_collection::GT, 7 )
os_coll_range( os_collection::GT, 4, os_collection::NE, 7)If you do, the exception err_am is signaled, and the following message is issued to stdout:
No handler for exception:
<maint-0023-0001>invalid restriction on unordered index (err_am)
os_Cursor<E> ( const os_Collection<E> & coll, const os_index_path &path, const os_coll_range &range, os_int32 options = os_cursor::unsafe ) ;An element satisfies the cursor's restriction if the result of applying path to the element satisfies range (see Specifying Collection Ranges). The order of iteration is arbitrary.
See also os_Cursor in Chapter 2, Collection, Query, and Index Classes, of the ObjectStore C++ API User Guide.
Dictionaries
For dictionaries, you can specify a restriction that is satisfied by elements whose key satisfies a specified range.
os_Cursor<E> ( const os_dictionary & coll, const os_coll_range &range, os_int32 options = os_cursor::unsafe );An element satisfies this cursor's restriction if its key satisfies range. If the dictionary's key type is a class, you must supply rank and hash functions for the class. See Supplying Rank and Hash Functions.
Duplicates
With a restricted cursor, a traversal visits only one occurrence of each qualified element. Performing Collection Updates During Traversal
If you want to be able to update a collection while traversing it, you must use either an update-insensitive cursor, or a safe cursor. Update-Insensitive Cursors
You can create an update-insensitive cursor with the following cursor constructor:
os_Cursor( const os_Collection&, os_int32 options ) ;Supply os_collection::update_insensitive as the options argument.
In addition, the following kinds of cursors are always update insensitive:
Disadvantages of safe cursors
Safe cursors have some drawbacks that update-insensitive cursors do not:
Consider, for example, the code below:
os_database *db1 ; . . . os_Collection<part*> &result_set = os_Collection<part*>::create(db1) ; part *a_part, *p ; . . . result_set |= a_part ; os_Cursor<part*> c(result_set) ; for ( p = c.first() ; p ; p = c.next() ) result_set |= p->children ; /* UNSAFE!! */Here, a new (empty) set, result_set, is created, and a_part is added to it. The for loop then iterates over the elements of result_set, adding the children of each element visited to result_set itself. The first element visited is a_part. But after that, the results of the iteration are undefined.
You can specify that you want a collection to support recursive queries by including maintain_cursors in the behavior specification. And you can specify that you want a new cursor to be safe by including the enumerator os_cursor::safe as the second argument to the constructor for os_Cursor:
os_Collection<part*> &result_set = os_Collection<part*>::create( db1, os_collection::maintain_cursors ) ; . . . os_Cursor<part*> c(result_set, os_cursor::safe) ;Below is an example that builds a set of all the descendents of a given part (that is, a set containing the part's children, its children's children, and so on). It is just like the first example in this section, except that the collection is to use maintain_cursors, and a safe cursor is created.
os_Collection<part*> &result_set = os_Collection<part*>::create( db1, os_collection::maintain_cursors ) ; part *a_part, *p ; . . . result_set |= a_part ; os_Cursor<part*> c(result_set, os_cursor::safe) ; for (p = c.first(); p; p = c.next()) result_set |= p->children ; /* union of two sets */Every child added to result_set is visited later in the iteration, and its children are added. By the end of the iteration, all the descendents are in the set.
os_Collection<part*> &result_set = os_Collection<part*>::create( db1, os_collection::maintain_cursors ) ; part *a_part, *p ; . . . result_set |= a_part ; os_Cursor<part*> c(result_set, os_cursor::safe) ; for (p = c.first(); p; p = c.next()) if (p->children.size()) { result -= p ; /* remove if not primitive */ result_set |= p->children ; /* union of two sets */ }Here, each child that itself has children is removed, and the child's children are added to the set. These children just added are visited later in the iteration so that, if they have children, they can be removed and their children added. By the end of the iteration, only the primitive descendents remain in the set.
As mentioned above, recursive queries always work for unordered iteration, but performing recursive queries using ordered iteration requires more care. A value inserted during an ordered iteration is visited only if it is inserted later, in the order of iteration, than the current iteration position.
os_Cursor<part*> c(a_collection, a_path, os_cursor::safe);You must perform index maintenance for any data member or member function in the path used to specify the traversal order, if during iteration you update a data member controlling iteration order. See Performing or Enabling Index Maintenance.
Caution about infinite loops
It is important to realize that there are certain dangers associated with performing updates, within an iteration, to a data member controlling iteration order. Consider, for example, the following loop, which iterates through a set of employees, giving some a raise:
os_Set<part*> &employees = ... ; os_index_path &emp_path = os_index_path::create("employee*", "salary", db1); os_Cursor<part*> c( employees, emp_path, os_cursor::safe ); employee *e = 0; for( e = c.first() ; e ; e = c.next() ) if ( e->widgets_sold > 100000 ) e->salary *= 1.2 ;Because iteration is by increasing salary, any employee who gets a raise is moved ahead in the iteration order, and so is visited again. If anyone gets a raise, the loop will be infinite. The proper approach is to use an iteration order unaffected by the updates.
E retrieve(const os_Cursor<E>&) const ;If the cursor is null, err_coll_null_cursor is signaled. If the cursor is nonnull but not positioned at an element, err_coll_illegal_cursor is signaled.
E only() const ;If the collection has more than one element, err_coll_not_singleton is signaled. If the collection is empty, err_coll_empty is signaled, unless the collection's behavior includes os_collection::pick_from_empty_returns_null, in which case 0 is returned.
E retrieve(os_unsigned_int32 index) const ;The index is zero-based. If the index is not less than the collection's size, err_coll_out_of_range is signaled. If the collection does not have maintain_order behavior, err_coll_not_supported is signaled.
E retrieve_first() const ;This function returns the collection's first element or 0 if the collection is empty. If the collection is not ordered, err_coll_not_supported is signaled.
For collections with allow_nulls behavior, you can use this function instead:
os_int32 retrieve_first(const E&) const ;This function modifies the argument to refer to the collection's first element. It returns 0 if the specified collection is empty, and nonzero otherwise. If the collection is not ordered, err_coll_not_supported is signaled.
E retrieve_last() const ;This function returns the collection's last element or 0 if the collection is empty. If the collection is not ordered, err_coll_not_supported is signaled.
For collections with allow_nulls behavior, you can use this function instead:
os_int32 retrieve_last(const E&) const ;This function modifies the argument to refer to the collection's last element. It returns 0 if the specified collection is empty, and nonzero otherwise. If the collection is not ordered, err_coll_not_supported is signaled.
void* pick( const os_index_path &path, const os_coll_range &range ) const;
const os_index_path &id_path = os_index_path::create("employee*", "id", db1) ; os_coll_range range_eq_1138(EQ, 1138) ; . . . employee *e = a_coll.pick(id_path, range_eq_1138) ;assigns to e an employee in a_coll whose id equals 1138.
If there is more than one such element, an arbitrary one is picked and returned. If there is no such element, err_coll_empty is signaled, unless the collection has behavior os_collection::pick_from_empty_returns_null, in which case 0 is returned.
E pick(const K const &key_ref) const ; E pick(const K *key_ptr) const ;These two differ only in that with one you supply a reference to the key, and with the other you supply a pointer to the key. Again, if there is more than one element with the key, an arbitrary one is picked and returned. If there is no such element and the dictionary has pick_from_empty_returns_null behavior, 0 is returned. If there is no such element and the dictionary does not have pick_from_empty_returns_null behavior, err_coll_empty is signaled.
For dictionaries, you can also retrieve an element whose key satisfies a specified collection range (see Specifying Collection Ranges) with
E pick(const os_coll_range&) const ;
a_dictionary.pick( os_coll_range(GE, 100) )returns an element of a_dictionary whose key is greater than or equal to 100.
As with the other pick() overloadings, if there is more than one such element, an arbitrary one is picked and returned. If there is no such element and the collection has pick_from_empty_returns_null behavior, 0 is returned. If there is no such element and the dictionary does not have pick_from_empty_returns_null behavior, err_coll_empty is signaled.
If the dictionary's key type is a class, you must supply rank and hash functions for the class (see Supplying Rank and Hash Functions).
a_dictionary.pick("Smith")returns an element of a_dictionary whose key is the string Smith (that is, whose key, k, is such that strcmp(k, "Smith") is 0).
E pick() const ;If the collection is empty and has pick_from_empty_returns_null behavior, 0 is returned. If the collection is empty and does not have pick_from_empty_returns_null behavior, err_coll_empty is signaled.
This is sometimes useful when all the elements of a collection have the same value for a data member, and the easiest way to retrieve this value is through one of the elements.
a_cell = a_bus->pins.pick()->cell->container ;
Example: consolidating duplicates
os_database *db1 ; part *a_part, *p ; employee *e ; . . . os_Collection<employee*> &emp_bag = os_Collection<employee*>::create( db1, os_collection::allow_duplicates ) ; os_Collection<employee*> &emp_set = os_Collection<employee*>::create(db1) ; . . . os_Cursor<part*> c(a_part->children) ; for ( p = c.first() ; p ; p = c.next() ) emp_bag.insert(p->responsible_engineer) ; emp_set = emp_bag ; /* consolidate duplicates */ os_Cursor<employee*> c(emp_set) ; for (e = c.first() ; e ; e = c.next() ) cout << e->name << "\t" << emp_bag.count(e) << "\n" ;If two of a_part's children have the same responsible_engineer, that engineer appears twice as an element of emp_bag. We consolidate duplicates in emp_set so we can iterate over it, retrieving each engineer only once in the loop, and then we use count() to see how many times the engineer occurs in emp_bag. This is the number of parts for which the engineer is responsible.
If you want to use such a path to control iteration order, you must make known to ObjectStore a utility specific to the class, a rank function that defines an ordering on the type's instances.
You must also supply a rank function if you use such a path to specify a key for an ordered index (see Index Options). For unordered indexes keyed by such paths, you must supply both a rank and a hash function (the rank function is used to resolve hashing collisions). ObjectStore uses these utilities to maintain proper information on index paths that end in the class.
The os_index_key() Macro
You make these utilities known to ObjectStore by calling the macro os_index_key(). Calls to os_index_key() have the following form:
os_index_key( type, rank-function, hash-function);For example:
os_index_key(date,date_rank,date_hash);The type is the type that is at the end of the path.
Rank functions for floating-point numerical types (float, double, and long double) should follow these guidelines:
os_collection &messages = . . . ; message *a_msg; os_index_path &date_path = os_index_path::create("message", "date_received", msg_db); os_cursor c(messages, date_path); for (a_msg = (message*) c.first(); a_msg; a_msg = (message*)c.next()) a_msg->display();This assumes that dates have an order that is known to ObjectStore. This is true if dates are instances of an integer or pointer type such as int. But suppose that dates are instances of a user-defined class:
class date{ public: int month; int day; int year; };In this case you must define the rank function and make it known to ObjectStore. Here is how you might define it:
int date_rank(const void* arg1, const void *arg2) { const date *date1 = (const date *) arg1; const date *date2 = (const date *) arg2; if (date1->year < date2->year) return os_collection::LT; else if (date1->year > date2->year) return os_collection::GT; else if (date1->month < date2->month) return os_collection::LT; else if (date1->month > date2->month) return os_collection::GT; else if (date1->day < date2->day) return os_collection::LT; else if (date1->day > date2->day) return os_collection::GT; return os_collection::EQ; }If you also use unordered indexes keyed by date, you must supply a hash function, which might be defined this way:
os_unsigned_int32 date_hash(const void* x) { const date* d = (const date*) x; return ((os_unsigned_int32) (d->year) << 16) ^Finally, you must call os_index_key() before your application performs any iteration or query employing an index path ending on a data member of type date.
(d->month << 8) ^ d->day; }
main() { . . . os_index_key(date,date_rank,date_hash); . . . }If unordered indexes are never created, the hash function is not needed, and the registration could be done as follows:
os_index_key(date,date_rank,0);
To presize a collection, use the expected_size argument to a collection create() operation or constructor.
Changing Collection Behavior with change_behavior()
You can change the behavior of a collection any time after its creation with the function os_Collection::change_behavior(). There are restrictions about what combinations of behaviors are allowed. Any illegal combination causes an exception to be signaled when change_behavior is called. Fully specifying the new behavior
You supply a bit pattern (an unsigned 32-bit integer) as argument, fully specifying the behavior the collection is to have after the change.
os_database *db1; . . . os_Collection<part*> &some_parts = os_Collection<part*>::create( db1, os_collection::allow_nulls, ) ; . . . some_parts.change_behavior( os_collection::maintain_order | os_collection::allow_duplicates ) ;This changes the collection some_parts from an unordered collection that allows nulls but does not allow duplicates into an ordered collection that does not allow nulls and does allow duplicates. Both before and after the change, maintain_cursors is off.
some_parts.change_behavior(0) ;
Here is a function that changes a collection to disallow duplicates and signal duplicates, and that otherwise does not affect its original behavior:
f(os_collection &some_parts) { os_unsigned_int32 current_behavior = some_parts.get_behavior() ; some_parts.change_behavior( (current_behavior & ~os_collection::allow_duplicates) | os_collection::signal_duplicates ) ; }
os_database *db1; . . . os_Collection<part*> &some_parts = os_Collection<part*>::create( db1, os_collection::allow_nulls | os_collection::allow_duplicates | os_collection::maintain_order ) ; . . . some_parts->change_behavior( os_collection::maintain_order, os_collection::verify ) ;This changes some_parts from an ordered collection that allows both nulls and duplicates into an ordered collection that allows neither nulls nor duplicates. The argument os_collection::verify indicates that the function should check for duplicates and nulls.
If nulls are found, err_coll_nulls is signaled. If duplicates are found, and signal_duplicates is on, err_coll_duplicates is signaled. If signal_duplicates is not on, the first among each set of duplicates is retained and trailing duplicates are silently removed.
If os_collection::verify is not used, the resulting collection is assumed to be free of duplicates or nulls. This could lead to application errors if this is not the case.
Creating Collection Representation Objects
Creating os_rep objects
You create an os_rep object with the os_rep constructor:
os_rep(os_rep_type rep_enum, os_unsigned_int32 size) ;
~(os_unsigned_int32)0to specify no upper bound.
temp_bag->change_rep ( 100, &os_rep );The first argument is the expected size, used just as it is in create to determine the initially active representation. The second argument is the os_rep.
If you specify an os_rep and expected size that determine an initial representation incompatible with the behavior of the collection you want to change, err_coll_illegal_arg is signaled. Moreover, if the type and size subsequently determine a representation incompatible with the collection's behavior, err_coll_illegal_arg is also signaled.
Note that changing the representation of a collection is in effect creating a new collection and copying all elements from the old to the new collection.
Each os_chained_list_block can contain up to 255 pointers. It has two or three words of overhead: one word of state information, a previous pointer, and possibly a next pointer (the first os_chained_list_block allocated does not have a next pointer until the next block is allocated). The default version of os_chained_list contains four pointers in the header and seven or eight pointers in its blocks.
The maximum size for os_chained_lists is 131070.
For more information, see os_chained_list in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.
Controlling the Number of Pointers
When you create an os_chained_list, what is really allocated is an instance of a parameterized class derived from os_chained_list: os_chained_list_pt<NUM_PTRS_IN_HEAD,NUM_PTRS_IN_BLOCKS>. The default parameterization is <4,8>, but you can specify a different parameterization with the following macros:
Use OS_INSTANTIATE_CHAINED_LIST_REP() at file scope. It declares some static state needed by the representation.
Execute OS_INITIALIZE_CHAINED_LIST_REP() in a function. It registers the new parameterization with the collections library.
Include the files <coll/chlist.hh>, <coll/chlistpt.hh>, and <coll/chlistpt.c> if you use these macros.
In order to create a collection using a chained list with other than the default parameterization, you invoke the following static member function:
static os_coll_rep_descriptor* os_chained_list_descriptor::find_rep(os_int32 ptrs_in_hdr, os_int32 ptrs_in_blocks);If the requested parameterization has been specified with the above macros, the appropriate representation descriptor is returned. Otherwise, 0 is returned.
Note that an os_chained_list must have at least four pointers in the header but not more than 20 pointers.
An os_chained_list with a four-pointer header can change freely into any other collection representation and the reverse. However, other collection representations cannot change into os_chained_lists with more than four pointers in the header. A normal collection header is 24 bytes. An os_chained_list with more than four pointers exceeds this limit. It is possible for an os_chained_list with an oversized header to change into another representation (with the same or smaller size header).
Setting OS_COLL_POOL_ALLOC_CHLIST_BLOCKS turns on pool allocation. There is one pool per segment; each pool consists of an array of subpools. Each subpool is two pages by default.
By allocating larger subpools, you can defer the cost of allocating new subpools at the expense of potentially wasted space. To allocate larger subpools, use this function:
static void os_chlist_pool::configure_pool(config_options can have one of the following values:
os_unsigned_int32 config_options,
os_unsigned_int32 blks_per_subpool=2);
static void os_chained_list_descriptor::set_reorg_check_interval( os_unsigned_int32 v);ObjectStore sets the check interval to one less than the power of 2 that is greater than or equal to v. For example, in order to check on every other insert or remove, pass 1 or 2 as an argument. Passing 3 or 4 results in a check on every third operation. Passing 0 inhibits mutation. However, if the maximum size for an os_chained_list is reached, it will change to another representation.
The representation os_dyn_bag minimizes reorganization overhead at the expense of some extra space overhead, compared with os_ptr_bag. At large sizes, os_dyn_bag uses a structure pointing to many small hash tables that can reorganize independently.
This representation type does not support maintain_order or maintain_cursors behavior.
For sizes below 20, os_chained_list might be a better representation type.
For more information, see os_dyn_bag in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.
Time Complexity
In the following table, complexities are shown in terms of collection size, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
insert() | O(1) |
remove() | O(1) |
size() | O(1) |
contains() | O(1) |
Comparisons (<=, ==, and so on) | O(n) |
Merges (|, &, -) | O(n) |
size <= 64kthe small-medium size data structure is used. It contains the following:
Avg. total size in bytes = 24 + (size/.69) * 8If
size > 64kthe large size data structure is used. It contains the following:
n_entries = Avg. number of entries per small hash table = (8192/8) * .7 n_tables = Avg. number of small hash tables = size / n_entries dir_size = Avg. directory size in bytes = 60 + (n_tables+1) * 12 Avg. total size in bytes = 24 bytes + dir_size + n_tables * 8192
At large sizes, os_dyn_hash uses a structure pointing to many small hash tables that can reorganize independently.
This representation type does not support allow_duplicates, maintain_order, or maintain_cursors behavior.
For sizes below 20, os_chained_list might be a better representation type.
For more information, see os_dyn_hash in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.
Time Complexity
In the following table, complexities are shown in terms of collection size, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
insert() | O(1) |
remove() | O(1) |
size() | O(1) |
contains() | O(1) |
Comparisons (<=, ==, and so on) | O(n) |
Merges (|, &, -) | O(n) |
size <= 64kthe small-medium size data structure is used. It contains the following:
Avg. total size in bytes = 24 + (size/.69) * 4If
size > 64kthe large size data structure is used. It contains the following:
n_entries = Avg. number of entries per small hash table = (8192/4) * .7 n_tables = Avg. number of small hash tables = size / n_entries dir_size = Avg. directory size in bytes = 60 + (n_tables+1) * 12 Acg. total size in bytes = 24 bytes + dir_size + n_tables * 8192
For large collections subject to contention, os_ixonly_bc can provide significantly better performance than os_ixonly. See os_ixonly_bc, below.
The next chapter discusses associating indexes with collections to improve the efficiency of queries. With os_ixonly or os_ixonly_bc, you can save space by telling ObjectStore to record the membership of the collection in one of its indexes, as opposed to recording the membership in both the index and the collection. In other words, you can save space by using an index as a collection's representation.
os_ixonly_bc
os_ixonly_bc is just like os_ixonly, except that insert() and remove() do not update size information, avoiding contention in the collection header. The disadvantage of os_ixonly_bc is that size() is an O(n) operation, requiring a scan of the whole collection.
os_int32 size_is_maintained() const;This function returns nonzero if the collection maintains size; it returns 0 otherwise.
The following member of os_collection, which returns an estimate of a collection's size, is an O(1) operation in the size of the collection:
os_unsigned_int32 size_estimate() const;This function returns the size as of the last call to os_collection::update_size(). For collections that maintain size, the actual size is returned.
Before you add a new index to an os_ixonly_bc collection, call the following member of os_collection:
os_unsigned_int32 update_size();If you do not, add_index() will work correctly, but less efficiently than if you do. This function updates the value returned by os_collection::size_estimate() by scanning the collection and computing the actual size.
insert() | O(1) |
remove() | O(1) |
size(), os_ixonly | O(1) |
size(), os_ixonly_bc | O(n) |
contains() | O(1) |
Comparisons (<=, ==, and so on) | O(n) |
Merges (|, &, -) | O(n) |
If there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary. This locks the info segment and can cause contention.
os_ordered_ptr_hash
Instances of this class are used as ObjectStore collection representations. Unlike the other hash tables, this representation supports maintain_order behavior. The ordered pointer hash representation supports O(1) element lookup, which means that operations such as contains() and remove() are O(1) (in the number of elements). But an os_ordered_ptr_hash takes up somewhat more space than an os_packed_list.
Time Complexity
In the following table, complexities are shown in terms of collection size, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic).
insert() | O(1) |
Position-based insert() | O(n) |
remove() | O(1) |
Position-based remove() | O(n) |
size() | O(1) |
contains() | O(1) |
Comparisons (<=, ==, and so on) | O(n) |
Merges (|, &, -) | O(n) |
If there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary. This locks the info segment and can cause contention.
Space Overhead and Clustering
An ordered pointer hash has the following components:
On average, a pointer hash is 58.3% full. You can estimate the average size of a pointer hash as follows:
if size <= 65535 average total size in bytes = 56 + size * 8 / 58.3 if size > 65535 average total size in bytes = 56 + size * 12 / 58.3The minimum fill for a packed list is 46.7%, so an upper bound on collection space overhead can be calculated as follows:
if size <= 65535 maximum total size in bytes = 56 + size * 8 / 46.7 if size > 65535 maximum total size in bytes = 56 + size * 12 / 46.7
For sizes below 20, os_chained_list might be a better representation type.
For more information, see os_packed_list in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.
Time Complexity
In the following table, complexities are shown in terms of collection size, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
There might be "holes" in an os_packed_list if any elements have been removed.
Space Overhead and Clustering
A packed list has the following components:
On average, a packed list is 83.3% full. You can estimate the average size of a collection as follows:
average total size in bytes = 40 + size * 4 / 83.3The minimum fill for a packed list is 66.7%, so an upper bound on collection space overhead can be calculated as follows:
maximum total size in bytes = 40 + size * 4 / 66.7
In addition, as an os_ptr_bag grows, there can be overhead during collection updates, for reorganization. In contrast, the representation os_dyn_bag minimizes reorganization overhead at the expense of some extra space overhead. At large sizes, os_dyn_bag uses a structure pointing to many small hash tables that can reorganize independently. See os_dyn_bag.
This representation type does not support maintain_order behavior.
Time Complexity
In the following table, complexities are shown in terms of collection size, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
insert() | O(1) |
remove() | O(1) |
size() | O(1) |
contains() | O(1) |
Comparisons (<=, ==, and so on) | O(n) |
Merges (|, &, -) | O(n) |
If there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary.
Space Overhead and Clustering
A pointer hash has the following components:
On average, a pointer bag is 58.3% full. You can estimate the average size of a pointer bag as follows:
average total size in bytes = 48 + size * 8 / 58.3The minimum fill for a packed list is 46.7%, so an upper bound on collection space overhead can be calculated as follows:
maximum total size in bytes = 48 + size * 8 / 46.7
The representation os_vdyn_bag minimizes reorganization overhead at the expense of some extra space overhead, compared with os_ptr_bag. At large sizes, os_vdyn_bag uses a structure pointing to many small hash tables that can reorganize independently.
This representation type does not support maintain_order or maintain_cursors behavior.
For sizes below 20, os_chained_list might be a better representation type.
This class is parameterized, with a parameter indicating the type of ObjectStore reference to use for recording membership. Actual parameters can be os_reference, for general use.
For more information, see os_vdyn_bag in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.
Time Complexity
In the following table, complexities are shown in terms of collection size, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
insert() | O(1) |
remove() | O(1) |
size() | O(1) |
contains() | O(1) |
Comparisons (<=, ==, and so on) | O(n) |
Merges (|, &, -) | O(n) |
Space Overhead
For an os_vdyn_bag whose reference type parameter is REF_TYPE, if
size <= 64kthe small-medium size data structure is used. You can estimate its size as follows:
extra_slot = (((size / .69) % 16) ? 1 : 0) average total size = 24 bytes (header) + ( ((size / .69) / 16) + extra_slot) * ((sizeof(REF_TYPE) * 16) + (16 * 4) + 4)If
size > 64kthe large size data structure is used. You can estimate its size as follows:
entry_size: os_reference: 20 os_reference_version: 28 if REF_TYPE != os_reference_protected: n_tables = (size / ( ((8192 / <entry-size>) * 2) * .7) else n_tables = (size / ( (8192 / (<entry-size> )) * .7)) dir_size= (n_tables +1) * 12 bytes + 60 average total size = 24 bytes (header) +
dir_size + n_tables * 8192 bytes
At large sizes, os_vdyn_hash uses a structure pointing to many small hash tables that can reorganize independently.
This representation type does not support allow_duplicates, maintain_order, or maintain_cursors behavior.
For sizes below 20, os_chained_list might be a better representation type.
This class is parameterized, with a parameter indicating the type of ObjectStore reference to use for recording membership. Actual parameters can be os_reference, for general use.
For more information, see os_vdyn_hash in Chapter 3, Representation Types, in the ObjectStore Collections C++ API Reference.
Time Complexity
In the following table, complexities are shown in terms of collection size, represented by n. (These complexities reflect the nature of the computational overhead involved, not overhead due to disk I/O and network traffic.)
insert() | O(1) |
remove() | O(1) |
size() | O(1) |
contains() | O(1) |
Comparisons (<=, ==, and so on) | O(n) |
Merges (|, &, -) | O(n) |
Space Overhead
For an os_vdyn_bag whose reference type parameter is REF_TYPE, if
size <= 64kthe small-medium size data structure is used. You can estimate its size as follows:
extra_slot = (((size / .69) % 16) ? 1 : 0) average total size = 24 bytes (header) + ( ((size / .69) / 16) + extra_slot) * ((sizeof(REF_TYPE) * 16) + 4)If
size > 64kthe large size data structure is used. You can estimate its size as follows:
entry_size: os_reference: 12 os_reference_version: 20 if REF_TYPE != os_reference_protected: n_tables = (size / ( ((8192 / <entry-size> ) * 2) * .7)) else n_tables = (size / ( (8192 / (<entry-size> )) * .7)) dir_size= (n_tables +1) * 12 bytes + 60 average total size = 24 bytes (header) + dir_size +
n_tables * 8192 bytes
An ordered hash table is an os_ordered_ptr_hash. Unordered hash tables include the following:
Note that among O(1) operations, the constant varies from one structure to another. So, for example, os_packed_list insertion with duplicates allowed is fastest, and os_ixonly is slowest.
Similarly, the constant for O(n) operations varies. So, for example, while position-based insert and remove are O(n) for os_chained_list, the constant is 1/(the number of pointers per block). So position-based insert and remove are relatively fast operations at small sizes.
Note also that if there are safe cursors open on a particular collection, each insert or remove operation visits each of those cursors and adjusts them if necessary.
These descriptions will allow you to estimate the amount of storage occupied by a collection with a given size and a given representation type. Information on clustering (what is likely to be on the same page as what) is also included to help you predict paging behavior.
average total size = size_of(header) + size * ( size_of(entry) + size_of(count slot) ) / (average fill)Here is the information you need for each type of representation:
Minimum representation fills
The minimum fills for each type of representation are as follows:
Representation Type | Minimum Fill % |
---|---|
os_ptr_bag | 46.7 |
os_order_ptr_hash, size <= 65535 | 46.7 |
os_order_ptr_hash, size > 65535 | 46.7 |
os_packed_list | 66.7 |
Calculating an upper bound on collection space
You can use this information to get an upper bound on collection space overhead:
maximum total size = size_of(header) + size * ( size_of(entry) + size_of(count slot) ) / (minimum fill)
Updated: 03/31/98 15:28:08