ObjectStore Java API User Guide
Chapter 7

Working with Collections

ObjectStore provides a set of persistence-capable utility collections classes in the COM.odi.util package. These classes mirror those provided in the upcoming JDK 1.2 release.

ObjectStore includes another package that contains collections classes. The COM.odi.coll package provides the API for the ObjectStore peer collections. Use these collections when you want to access C++ as well as Java. Information about these collections is in the book Developing ObjectStore Java Applications That Access C++.

This chapter discusses the following topics:

Description of ObjectStore Utility Collections

How to Choose a Collections Alternative

Using ObjectStore Utility Collections

Querying ObjectStore Utility Collections

Enhancing Query Performance with Indexes

Storing Objects as Keys in Persistent Hash Tables

Using Third-Party Collections Libraries

Description of ObjectStore Utility Collections

ObjectStore provides a number of utility collections interfaces and classes in the COM.odi.util package. In addition, ObjectStore provides a query facility in the COM.odi.util.query package.

A collection is an object that groups together other objects. It provides a convenient means of storing and manipulating groups of objects, and supports operations for inserting, removing, and retrieving elements.

Collections form the basis of the ObjectStore query facility, which allows you to select those elements of a collection that satisfy a specified condition. However, some collections can be queried, and others cannot. Consequently, before you create a collection and store it in a database, you should consider how you plan to use the collection. When you know what you need, you can select the best persistent collection representation for your application.

To introduce you to the ObjectStore utility collections facility, this section discusses the following topics:

Introduction to COM.odi.util Interfaces and Classes

The COM.odi.util.Collection and COM.odi.util.Map interfaces provide methods for operating on ObjectStore collections.

The ObjectStore utility collections facility provides the persistence-capable COM.odi.util classes shown in the following table. Most of these classes implement a COM.odi.util interface (many implement other interfaces as well).

Class Implements
OSHashBag Collection
OSHashMap Map
OSHashSet Set
OSHashtable None
OSTreeMapByteArray Map
OSTreeMapDouble Map
OSTreeMapFloat Map
OSTreeMapInteger Map
OSTreeMapLong Map
OSTreeMapString Map
OSTreeSet Set
OSVector Collection
OSVectorList List

Postprocessing

You do not need to postprocess the classes in the utility collections facility. They are already persistence-capable. If you define a subclass that extends any of these classes and you want the subclass to be persistence-capable, you must either run the postprocessor on the subclass or manually annotate the subclass.

Example

The query demo provides an example of using ObjectStore with utility collections. See the README file in the COM/odi/demo/query directory.

JDK 1.2

The JDK 1.2 collections interfaces specify the behavior of the hashCode() method on instances of the Set, Map, and List types. This hashCode() specification is based on the contents of the collection; the hash code of a collection changes when elements are added or removed. This means that it is not advisable to store an instance of a set, map, or list class in a hash table, unless the set, map, or list is immutable and will never change.
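The following sketch illustrates why a content-based hash code makes a mutable collection a poor hash table element. It uses the standard java.util classes as stand-ins for the COM.odi.util classes, and the class name MutableKeyDemo is hypothetical. It is an illustration of the general problem, not ObjectStore code.

```java
import java.util.HashSet;
import java.util.Set;

class MutableKeyDemo {
    /** Returns whether the outer set can still find the inner set after the inner set changes. */
    static boolean foundAfterMutation() {
        Set<String> inner = new HashSet<>();
        inner.add("a");

        Set<Set<String>> outer = new HashSet<>();
        outer.add(inner);   // filed under the hash code computed from {"a"}

        inner.add("b");     // the content-based hash code of inner is now different

        // The lookup uses the new hash code, so the entry filed under the old one is missed.
        return outer.contains(inner);
    }

    public static void main(String[] args) {
        System.out.println("found after mutation: " + foundAfterMutation());
    }
}
```

The inner set is still physically present in the outer set, but it can no longer be located by lookup, which is the situation the text above warns about.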

Future change

After the JDK 1.2 is released, Object Design will modify ObjectStore so that it implements the JDK 1.2 collections interfaces. At that time, ObjectStore will no longer need to provide, and so will not provide, the following interfaces:

Therefore, this discussion of transient views of Map classes pertains to OSHashtable as well.

Description of OSHashBag

An OSHashBag is an unordered collection that allows duplicates. OSHashBags not only keep track of what their elements are, but also of the number of occurrences of each element. As the name implies, a hash table is the internal representation for an OSHashBag. OSHashBag directly implements the COM.odi.util.Collection interface and so you can query instances of OSHashBag.
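As a rough illustration of the bag behavior described above (membership plus a count of occurrences, backed by a hash table), here is a minimal transient sketch. CountingBag and its methods are hypothetical names; the sketch does not reflect how OSHashBag is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

class CountingBag<E> {
    // Maps each distinct element to the number of times it occurs in the bag.
    private final Map<E, Integer> counts = new HashMap<>();
    private int size;

    /** Adds one occurrence of the element; duplicates are allowed. */
    void add(E element) {
        counts.merge(element, 1, Integer::sum);
        size++;
    }

    /** Removes one occurrence of the element, returning false if it was absent. */
    boolean remove(E element) {
        Integer n = counts.get(element);
        if (n == null) {
            return false;
        }
        if (n == 1) {
            counts.remove(element);
        } else {
            counts.put(element, n - 1);
        }
        size--;
        return true;
    }

    /** Number of occurrences of the element. */
    int count(E element) {
        return counts.getOrDefault(element, 0);
    }

    /** Total number of occurrences of all elements. */
    int size() {
        return size;
    }

    public static void main(String[] args) {
        CountingBag<String> bag = new CountingBag<>();
        bag.add("x");
        bag.add("x");
        bag.add("y");
        System.out.println("count of x: " + bag.count("x") + ", size: " + bag.size());
    }
}
```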

Description of OSHashMap

An OSHashMap is also an unordered collection. It allows duplicate values but not duplicate keys. Unlike OSHashBag, OSHashMap associates a key with each value in the map. When you insert a value into an OSHashMap, you specify the key along with the value. You can retrieve a value with a given key. The internal representation of an OSHashMap is a hash table. OSHashMaps do not allow null keys or null values.

Since OSHashMap implements the Map interface rather than the Collection interface, you cannot query OSHashMaps. However, you can query the collection views of a map: Map.keySet(), Map.values(), and Map.entries(). See Querying Collection Views of Map Entries.

The OSHashMap.equals() method performs value (contents) comparisons as described by Map.equals() to determine whether two Maps are equal. This is the only difference between OSHashMap and OSHashtable. The OSHashtable.equals() method compares the identity of the two objects to determine equality. The OSHashtable.hashCode() method generates a hash code based upon object identity; it is not based on the contents of the OSHashtable. For information about content comparisons and identity comparisons, see OSHashtable and OSVector.

A call to OSHashMap.hashCode() throws UnsupportedOperationException. See Unsupported operations.

Description of OSHashSet

An OSHashSet is an unordered collection that does not allow duplicates. If you try to insert a value into an OSHashSet and the set already contains that value, the set remains unchanged. OSHashSet implements the COM.odi.util.Set interface. As its name implies, a hash table is the internal representation of an OSHashSet. Since OSHashSet indirectly implements COM.odi.util.Collection, you can query OSHashSets.

OSTreeSets are capable of storing much larger persistent collections than OSHashSets. However, OSTreeSets must be persistent; it is not possible to create a transient instance of an OSTreeSet. If your collection is small, an OSHashSet is the best choice. If your collection is large, an OSTreeSet performs better.

A call to OSHashSet.hashCode() throws UnsupportedOperationException. See Unsupported operations.

Description of OSHashtable

An OSHashtable is also an unordered collection. It allows duplicate values but not duplicate keys. This class has the same API as java.util.Hashtable.

OSHashtable associates a key with each element. When you insert an element into an OSHashtable, you specify the key along with the element. You can retrieve an element with a given key. While the internal representation of an OSHashtable is a hash table, it is a map-like structure.

Since OSHashtable does not implement the COM.odi.util.Collection interface, you cannot query OSHashtables. However, you can query the collection views of an OSHashtable. See Querying Collection Views of Map Entries.

The OSHashtable.equals() and OSHashtable.hashCode() methods perform reference (identity) comparisons and not value (contents) comparisons. This is the only difference between OSHashtable and OSHashMap. The OSHashMap methods perform content comparisons. For information about content comparisons and identity comparisons, see OSHashtable and OSVector.

By default, an OSHashtable allocates room for 50 elements. You can presize an OSHashtable to better match what your application really needs. In addition, you can delay allocation of the OSHashtable substructure, which ObjectStore uses to represent the OSHashtable, until elements are actually added to the OSHashtable. To do this, specify the lazy argument to the OSHashtable constructor:

OSHashtable(int initialBufferSize, int capacityIncrement, boolean lazy)

Description of OSTreeMapxxx

OSTreeMap is based on a binary tree representation that is tuned for large persistent collections. OSTreeMap is an abstract class with several concrete subclasses. In all OSTreeMapxxx instances, the values are objects. As for the keys, there are separate classes for different types of keys, as shown in the following table:

Class Key Type
OSTreeMapByteArray ByteArray
OSTreeMapDouble Double
OSTreeMapFloat Float
OSTreeMapInteger Integer
OSTreeMapLong Long
OSTreeMapString String

An OSTreeMapxxx is an unordered collection that allows duplicate values but not duplicate keys. Each OSTreeMapxxx associates a key with a value in the map. When you insert a value into an OSTreeMapxxx, you specify the key along with the value. You can retrieve a value with a given key. OSTreeMapxxx collections do not allow null keys or null values.

The OSTreeMapxxx classes extend OSTreeMap, which implements Map. Consequently, you cannot query OSTreeMapxxx collections. However, you can query the collection views of a map: Map.keySet(), Map.values(), and Map.entries(). See Querying Collection Views of Map Entries.

The OSTreeMapxxx classes are designed for very large persistent aggregations. These classes allow you to iterate over the collection or query the collection without fetching any objects from the database except those that are explicitly returned to you. ObjectStore does not even create hollow objects to represent the elements. OSTreeMap collections can only be persistent.

A call to OSTreeMap.hashCode() throws UnsupportedOperationException. See Unsupported operations.

Each OSTreeMapxxx class has a constructor for exported objects.

Description of OSTreeSet

An OSTreeSet is an unordered collection that does not allow duplicates. If you try to insert a value into an OSTreeSet and the set already contains that value, the set remains unchanged. OSTreeSet implements the COM.odi.util.Set interface. As its name implies, a balanced tree is the internal representation of an OSTreeSet. Since OSTreeSet indirectly implements COM.odi.util.Collection, you can query OSTreeSets.

The OSTreeSet class is designed for very large persistent aggregations. This class allows you to iterate over the collection or query the collection without fetching any objects from the database except those that are explicitly returned to you. ObjectStore does not even create hollow objects to represent the elements. OSTreeSet collections can only be persistent.

Object Design recommends that if you are going to query a collection that contains a particularly large number of objects, you define the collection as an OSTreeSet or a subclass of OSTreeSet. OSTreeSet is the only collections class for which ObjectStore provides the ability to add indexes. Indexes can speed up queries on very large collections. You can, of course, define the ability to add indexes to other types of collections that implement COM.odi.util.Collection. See Enhancing Query Performance with Indexes.

The main difference between OSTreeSet and OSHashSet is the internal representation. For very large collections, OSTreeSet is the best choice. However, OSTreeSets can only be persistently allocated. It is not possible to create a transient OSTreeSet.

A call to OSTreeSet.hashCode() throws UnsupportedOperationException. See Unsupported operations.

The OSTreeSet class has a constructor for creating exported objects.

Description of OSVector

An OSVector is a collection that implements a persistent expandable array, as well as COM.odi.util.Collection. You can query OSVectors.

An OSVector associates each element with a numerical position based on insertion order. By default, OSVectors allow duplicates. In addition to simple insert (insert into the beginning or end of the collection) and simple remove (removal of the first occurrence of a specified element), you can insert, remove, and retrieve elements based on a specified numerical position, or based on a specified iterator position. An OSVector does not have quick lookup by object or key. Consequently, the overhead for an OSVector is lower than for utility collections that have quick lookup.

The OSVector.equals() and OSVector.hashCode() methods perform reference (identity) comparisons and not value (contents) comparisons. This is one difference between OSVector and OSVectorList. The OSVectorList methods perform content comparisons. For information about content comparisons and identity comparisons, see OSHashtable and OSVector.

By default, an OSVector allocates room for 32 elements. You can presize an OSVector to better match what your application really needs. In addition, you can delay allocation of the OSVector substructure, which ObjectStore uses to represent the OSVector, until elements are actually added to the OSVector. To do this, specify the lazy argument to the OSVector constructor:

OSVector(int initialBufferSize, int capacityIncrement, boolean lazy)

Description of OSVectorList

An OSVectorList is a collection that implements a persistent expandable array. It implements the List interface and functions exactly like an OSVector, except in the following way.

An OSVectorList does not have quick lookup by object or key. Consequently, the overhead for an OSVectorList is lower than for utility collections that have quick lookup.

The OSVectorList.equals() and OSVectorList.hashCode() methods perform value (contents) comparisons and not reference (identity) comparisons. This makes OSVectorList unsuitable for storage in a persistent hash table or any other hash table based collection representation. The OSVector methods perform identity comparisons. For information about content comparisons and identity comparisons, see OSHashtable and OSVector.

A call to OSVectorList.hashCode() throws UnsupportedOperationException. See Unsupported operations.

Advantages of Using ObjectStore Utility Collections

The advantages of using COM.odi.util interfaces and classes are as follows:

Querying Collection Views of Map Entries

The OSHashMap and OSTreeMapxxx classes implement COM.odi.util.Map and not COM.odi.util.Collection, and therefore you cannot use the ObjectStore query facility on them. However, each of the classes that implements Map defines the following methods:

The OSHashtable class, although it does not implement Map, also defines these methods.

You can use the ObjectStore query facility to query the Collection and Set views returned by the keySet(), values(), and entries() methods.

Transient views

While OSHashtable, OSHashMap, and the OSTreeMapxxx subclasses are persistence-capable, the views returned by the entries(), keySet(), and values() methods are not. These are transient views of persistence-capable classes.
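To make the idea of a collection view concrete, the sketch below uses java.util.HashMap as a stand-in for the ObjectStore map classes. In the released java.util API the entries view is obtained with entrySet() rather than entries(). The class name MapViewsDemo is hypothetical, and whether ObjectStore views track later changes in exactly this way is an assumption based on the java.util behavior.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class MapViewsDemo {
    /** Obtains the three views, then adds an entry, and reports the size each view sees. */
    static int[] viewSizesAfterPut() {
        Map<String, Integer> salaries = new HashMap<>();
        salaries.put("Davis", 48000);

        // Views are windows onto the map, not copies of its contents.
        Set<String> keys = salaries.keySet();
        Collection<Integer> values = salaries.values();
        Set<Map.Entry<String, Integer>> entries = salaries.entrySet();

        salaries.put("Gray", 52000);   // made after the views were obtained

        return new int[] { keys.size(), values.size(), entries.size() };
    }

    public static void main(String[] args) {
        int[] sizes = viewSizesAfterPut();
        System.out.println(sizes[0] + " keys, " + sizes[1] + " values, " + sizes[2] + " entries");
    }
}
```

Because a view is a separate object from the map it describes, it makes sense that the map can be stored persistently while the view object itself exists only for the life of the program.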

Background About Utility Collections and JDK 1.2 Collections

Here is some background information about how the ObjectStore utility collections fit with the JDK 1.2 collections. This discussion assumes that you are familiar with the JDK 1.2 collections API. If you are not, see http://java.sun.com/products/jdk/1.2/docs/guide/collections/reference.html.

ObjectStore provides a collections package that parallels the JDK 1.2 java.util collections. In addition, ObjectStore includes query and indexing facilities. The new collections implementations are in the COM.odi.util package.

The core collections interfaces defined in the JDK 1.2 java.util package are:

In the JDK 1.2, collections classes and behaviors are based on these interfaces. Consequently, you can usually use any representation that is parallel to a particular interface. The java.util implementations and their corresponding ObjectStore implementations are shown in the following table:

Interface java.util Class ObjectStore Class
Collection None COM.odi.util.OSHashBag
Set java.util.HashSet COM.odi.util.OSHashSet
Set java.util.ArraySet COM.odi.util.OSTreeSet
List java.util.Vector COM.odi.util.OSVector
List java.util.ArrayList COM.odi.util.OSVectorList
List java.util.LinkedList None
Map java.util.Hashtable COM.odi.util.OSHashtable
Map java.util.HashMap COM.odi.util.OSHashMap
Map java.util.ArrayMap None
Map java.util.TreeMap COM.odi.util.OSTreeMapxxx

Unsupported operations

In the COM.odi.util package, all persistence-capable collections that implement the Set, List, and Map interfaces throw UnsupportedOperationException when the hashCode() method is invoked on them. This is because the definition of the computation of hashCode() for these interfaces is currently in a state of flux in JDK 1.2 beta 3. When JDK 1.2 collections are finalized, ObjectStore will provide hashCode() methods that conform to their JDK 1.2 specifications. In the interim, you can subclass these representations and define a suitable overriding hashCode() method if your application needs it.
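One possible shape for such a subclass is sketched below, with java.util.ArrayList standing in for an ObjectStore list class. The class name IdentityHashedList is hypothetical. The sketch chooses an identity-based hashCode() and, to keep equals() consistent with it, an identity-based equals() as well; this departs from the List contract and is only one of several reasonable choices. A real persistence-capable subclass would also need to be postprocessed or annotated, which is not shown.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;

class IdentityHashedList<E> extends ArrayList<E> {
    private static final long serialVersionUID = 1L;

    /** The hash code depends only on object identity, so it does not change as elements change. */
    @Override
    public int hashCode() {
        return System.identityHashCode(this);
    }

    /** Equality is also identity-based, so equal objects always have equal hash codes. */
    @Override
    public boolean equals(Object other) {
        return this == other;
    }

    /** Stores a list in a hash-based set, changes the list, and looks it up again. */
    static boolean stillFoundAfterMutation() {
        IdentityHashedList<String> list = new IdentityHashedList<>();
        list.add("a");

        Set<IdentityHashedList<String>> table = new HashSet<>();
        table.add(list);

        list.add("b");   // the contents change, but the hash code does not
        return table.contains(list);
    }

    public static void main(String[] args) {
        System.out.println("still found: " + stillFoundAfterMutation());
    }
}
```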

OSHashtable and OSVector

COM.odi.util.OSHashtable and COM.odi.util.OSVector have been updated to be parallel to most of the JDK 1.2 specifications. They do not quite meet the description of the JDK 1.2 behavior for equals() and hashCode(). The JDK 1.2 changed this behavior in an incompatible way for these two classes.

The JDK 1.2 List, Set, and Map interfaces mandate an equals() method that does value comparison and not reference comparison. That is, two Sets are equal if they have the same elements, two Lists are equal if they have the same elements in the same order, and two Maps are equal if they have the same key/value pairs.

This places corresponding constraints on the hashCode() method, since (a.equals(b)) => (a.hashCode()==b.hashCode()). The ObjectStore OSHashtable and OSVector classes, however, implement persistent (unchanging) hashCodes, and rely on Object.equals(). The JDK definition for hashCode means that classes that meet the JDK 1.2 specification should not be stored in hash tables, because their hashCodes change when elements are added or removed. So for these two classes, ObjectStore retains the old identity-based definitions, rather than moving to the new content-based definitions of equals() and hashCode().

Collection interface

There are no concrete implementations of the Collection interface in the JDK 1.2. Collection is essentially a Bag, that is, a Set that might contain duplicates. ObjectStore includes the COM.odi.util.OSHashBag and COM.odi.util.OSVector classes to implement Collection.

How to Choose a Collections Alternative

Your choice of how to implement collections depends on

For applications that already use the COM.odi.coll collections, the COM.odi.util.OSTreeMapxxx collections are comparable to the COM.odi.coll.Dictionary_xxx classes, while the COM.odi.util.OSTreeSet class is comparable to the COM.odi.coll.Set class. These classes are comparable in that their performance should be about the same.

To help you choose the right persistent collection representation for your application, the following table compares the behavior of the utility collections in COM.odi.util.



Collection Class  Ordered/Unordered  Duplicates/No Duplicates                     Quick Lookup   Comparison Operations  Queries Allowed      Collection Size
OSHashBag         Unordered          Duplicate values allowed                     Object lookup  Identity-based         Can query            Medium
OSHashMap         Unordered          Duplicate values allowed; no duplicate keys  Key lookup     Content-based          No queries           Medium
OSHashSet         Unordered          No duplicates                                Object lookup  Content-based          Can query            Medium
OSHashtable       Unordered          Duplicate values allowed; no duplicate keys  Key lookup     Identity-based         No queries           Medium
OSTreeMapxxx      Ordered            Duplicate values allowed; no duplicate keys  Key lookup     Content-based          No queries           Large
OSTreeSet         Unordered          No duplicates                                Object lookup  Content-based          Can query and index  Large
OSVector          Ordered            Duplicate values allowed                     None           Identity-based         Can query            Small, medium
OSVectorList      Ordered            Duplicate values allowed                     None           Content-based          Can query            Small, medium

The OSHashtable class is not compatible with the JDK 1.2 API. All other collections in COM.odi.util are compatible with the JDK 1.2 API.

Using ObjectStore Utility Collections

To help you use ObjectStore utility collections, this section discusses the following topics:

Creating Collections

Each collection representation has one or more constructors that you can use to create collections. For details about each class's constructors, see the ObjectStore Java API Reference. For example:

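// Create a new database at the path given on the command line
Database db = Database.create(args[1], ALL_READ | ALL_WRITE);
// Start an update transaction so that the database can be modified
Transaction.begin(UPDATE);
// Create an OSTreeSet in the database and make it reachable through the root named "collection"
db.createRoot("collection", new OSTreeSet(db));
Transaction.current().commit();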

Navigating Collections with Iterators

The Iterator and ListIterator interfaces help you navigate within a utility collection. An iterator, an instance of the COM.odi.util.Iterator or COM.odi.util.ListIterator interface, designates a position in a collection. You can use iterators to traverse collections, as well as to remove elements from collections.

With the JDK 1.2, Iterator takes the place of Enumeration. Iterator provides the same capabilities as Enumeration (though method names are different), and it also allows you to remove elements from the underlying collection. When the JDK 1.2 is released, ObjectStore will implement the JDK 1.2 Iterator and ListIterator interfaces and will no longer provide Iterator and ListIterator in COM.odi.util.

The ListIterator interface extends the Iterator interface. A class that implements ListIterator must also implement List. The additional methods that ListIterator provides allow you to

The IndexIterator interface, also in COM.odi.util, allows you to traverse an index or map structure. You can use the IndexIterator interface to obtain the key and value for elements in the underlying collection.

Performing Collection Updates During Iteration

While you are iterating through a collection, you can use the Iterator and ListIterator interface methods to modify that collection. This assumes that the implementation of the Iterator or ListIterator interface supports the methods that modify underlying collections. (The JDK 1.2 defines some of these methods as optional. You should check the API reference information for the particular class you are using to determine exactly which behaviors are supported.)

You cannot use any other methods to update the collection while you are iterating through its elements. If you try to, ObjectStore throws ConcurrentModificationException.

When a thread is iterating over a collection, that thread and cooperating threads can modify the object returned by the iteration. If you are using an Iterator, your application cannot add elements to the collection or change the order of the collection. If you are using a ListIterator, your application can only use ListIterator methods to modify the collection.

Suppose you do add an element in the middle of an iteration, and then try to use the same iterator. ObjectStore recognizes that the collection has been modified and throws ConcurrentModificationException. At this point, if you create a new iterator, it recognizes the updated collection and does not throw an exception.
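The rules above can be seen in miniature with the java.util classes, which behave in the way this section describes. The sketch below uses java.util.ArrayList as a stand-in for an ObjectStore collection; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class IterationUpdateDemo {
    /** Removing through the iterator itself is permitted during iteration. */
    static List<String> removeViaIterator() {
        List<String> names = new ArrayList<>(Arrays.asList("Ann", "Bob", "Cy"));
        for (Iterator<String> it = names.iterator(); it.hasNext();) {
            if (it.next().startsWith("B")) {
                it.remove();
            }
        }
        return names;
    }

    /** Adding through the collection invalidates the iterator; a new iterator works. */
    static boolean addInvalidatesOldIterator() {
        List<String> names = new ArrayList<>(Arrays.asList("Ann", "Bob", "Cy"));
        Iterator<String> it = names.iterator();
        it.next();
        names.add("Dee");   // update made behind the iterator's back
        try {
            it.next();      // the old iterator detects the modification
            return false;
        } catch (ConcurrentModificationException expected) {
            // A newly created iterator sees the updated collection without complaint.
            int visited = 0;
            for (Iterator<String> fresh = names.iterator(); fresh.hasNext(); fresh.next()) {
                visited++;
            }
            return visited == 4;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeViaIterator());
        System.out.println("old iterator invalidated: " + addInvalidatesOldIterator());
    }
}
```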

Querying ObjectStore Utility Collections

The COM.odi.util.query.Query class provides a mechanism for querying collection objects that implement the COM.odi.util.Collection interface. A query applies a predicate expression (an expression that evaluates to a boolean result) to all elements in a collection. The query returns a subset collection of all elements for which the expression is true. You can query the following types of collections:

To accelerate the processing of queries on particularly large collections, you can build indexes on the collection. For information about indexes, see the next section, Enhancing Query Performance with Indexes.

This section provides the following information about queries on ObjectStore utility collections:

Creating Queries

To create a query, run the COM.odi.util.query.Query constructor and pass in a Class object and a query string. Here is the constructor:

public Query(Class elementType, String queryExpression)

There is also a constructor that allows you to specify a FreeVariables map.

elementType

The elementType class or interface provides the context in which the query facility interprets queryExpression. This must be a publicly accessible class or interface. When your application calls the Query.select() or Query.pick() method to execute the query against a particular collection, every element of that collection must be an instance of (in the sense of instanceof) the elementType that was specified when the query was created. Any element of the collection that is not an instance of elementType is not returned in the query result (even if it evaluates to true for the predicate).

queryExpression

The queryExpression is a predicate (that is, an expression with a boolean result) that the query facility evaluates on each element of the collection. The queryExpression operands can be literals and names.

Literals

Literals can be of any of the Java primitive types, including the special values true, false, and null. Since the query expression is a String, you must enclose any embedded strings in escaped quotation marks, like \"this\".

Names

Names can consist of a single identifier, or they can consist of a sequence of identifiers separated by periods. Names can be either free variables or member accesses. You must explicitly specify free variables in the freeVariables argument of the three-argument Query constructor. Any name that is not a free variable is interpreted as a member access.

Member accesses are interpreted as accessing public members (including static members) of an object of class/interface elementType, if possible. This interpretation works as though there were an implicit this argument, of elementType, at the root of the name expression. Any member access that cannot be interpreted as a member access on elementType is interpreted as a static access. Static accesses are resolved as if the package containing elementType were imported.

Queries can contain methods that take arguments.

Example

For example:

Query q = new Query(Employee.class, "salary < 50000");

The query expression can refer to classes without specifying a package name. ObjectStore treats the query expression as if it were defined in a file in another package that has imported the package of the Class object that was passed to the Query constructor. This default package only matters for class names, though, not for member access. Only public classes and members are accessible within the query.

An application can run the example query on a specific collection with a call to the Query.select() method that specifies the collection to be queried as the argument. For example:

Query q = new Query(Employee.class, "salary < 50000");
Collection employees = (Collection) db.getRoot("employees");
Set result = q.select(employees);

When you create a query, you do not bind it to a particular collection. You can create a query, run it once, and throw it away. Alternatively, you can reuse a query multiple times against the same collection (perhaps with different bindings for free variables), or against different collections.
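Conceptually, executing a query amounts to testing a predicate against each element of whichever collection is supplied and collecting the elements that pass. The sketch below expresses that idea in plain Java, with a lambda in place of a query string. SelectSketch, its select() method, and the nested Employee class are hypothetical; the real query facility parses a string and can use indexes, neither of which is modeled here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

class SelectSketch {
    /** Hypothetical element type with a public member that a predicate can read. */
    static class Employee {
        final String name;
        final int salary;

        Employee(String name, int salary) {
            this.name = name;
            this.salary = salary;
        }
    }

    /** Returns the elements that are instances of elementType and satisfy the predicate. */
    static <T> Set<T> select(Class<T> elementType, Collection<?> source,
                             Predicate<? super T> predicate) {
        Set<T> result = new LinkedHashSet<>();
        for (Object candidate : source) {
            // Elements that are not instances of elementType never appear in the result.
            if (elementType.isInstance(candidate)) {
                T element = elementType.cast(candidate);
                if (predicate.test(element)) {
                    result.add(element);
                }
            }
        }
        return result;
    }

    /** Runs one predicate against a small mixed collection and counts the matches. */
    static int lowPaidCount() {
        Collection<Object> employees = Arrays.<Object>asList(
                new Employee("Davis", 48000),
                new Employee("Gray", 52000),
                "not an employee");
        Predicate<Employee> lowPaid = e -> e.salary < 50000;
        return select(Employee.class, employees, lowPaid).size();
    }

    public static void main(String[] args) {
        System.out.println("employees under 50000: " + lowPaidCount());
    }
}
```

Note that the predicate is created independently of any collection, so the same predicate object could be passed to select() again with a different source, which mirrors how a Query can be reused.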

If something in your query is wrong, you find out at the point where you create the query. You do not need to wait for the application to optimize or execute the query. However, the query facility cannot detect incorrect free variable bindings until you specify them when you execute the query on a collection.

Description of Query Syntax

ObjectStore performs syntax analysis of the query expression in the context of the elementType class or interface that is passed to the query constructor. This must be a publicly accessible class or interface. It can also be a derived type.

When the query is executed against a particular collection using the select() or pick() method, every element of that collection must be an instance of (in the sense of instanceof) the elementType that was specified when the query was created.

The queryExpression is a predicate (that is, an expression with a boolean result). The query is executed on a collection by evaluating this query expression on each element of the collection. However, it might not be necessary to explicitly fetch and examine all elements of the collection. This depends upon the available indexes and query optimization strategy.

Supported operations

Queries on utility collections can include most Java operations:

Unsupported operations

The following operations are not supported:

Statements are not permitted. Only expressions are permitted.

For details on operations and the operands, see the Java Language Specification.

The operators have their usual Java meaning except for the relational and equality operators when used with String operands. In a query expression, ObjectStore uses these operators to compare the contents of the two strings. Null Strings are considered to be less than all other values.
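For comparison with ordinary Java, where == on two String references tests identity, the following sketch shows a content-based ordering in which null sorts before every other value. It assumes that the ordering of non-null strings is the one given by String.compareTo(), which the text above does not state. The class name QueryStringOrder is hypothetical.

```java
import java.util.Comparator;

class QueryStringOrder {
    // Content-based ordering; null is treated as less than any non-null String.
    static final Comparator<String> ORDER =
            Comparator.nullsFirst(Comparator.<String>naturalOrder());

    /** Content-based counterpart of the < operator. */
    static boolean lessThan(String a, String b) {
        return ORDER.compare(a, b) < 0;
    }

    /** Content-based counterpart of the == operator. */
    static boolean equalTo(String a, String b) {
        return ORDER.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String first = new String("Davis");
        String second = new String("Davis");
        // Distinct objects with the same characters compare equal by content.
        System.out.println("equal by content: " + equalTo(first, second));
        System.out.println("null < \"Adams\": " + lessThan(null, "Adams"));
    }
}
```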

String literals

In a query expression, you must enclose String literals in escaped quotation marks. For example:

new Query(Foo.class, "name == \"Davis\"")

You can specify wildcards in query strings. You can search for substrings, and perform case insensitive searches. See Matching Patterns in Query Strings.

Wrapper objects

The query facility treats wrapper objects just like other Objects. For example, suppose you have the query expression "A==B", where A and B refer to Integer wrappers. This results in an identity check on the objects. The query facility determines whether A and B both refer to the same wrapper instance. The query facility does not check that the values of A and B are equal. You can specify "A.intValue()==B.intValue()" to compare contents.

This behavior might change in a future release so that the query facility treats wrapper objects in the way that it treats primitives. Consequently, you should not rely on the identity check for wrapper objects.

Miscellaneous

You can use parentheses to group expressions.

The precedence and associativity of the operators is the same as that for the Java language.

The entire query expression must resolve to a Boolean value.

Sample Program That Uses Queries

In the COM/odi/demo/query directory, there is a sample program that uses ObjectStore utility queries. See the README.htm file in that directory.

Matching Patterns in Query Strings

Specifying a pattern matching query

To specify a string pattern to be matched in a query, use the pattern matching operator (~~). This operator, which has greater precedence than the multiplication operator (*), takes two arguments. These arguments must be either Strings or null. The left-hand argument specifies the text to be checked for a match. The right-hand argument specifies the pattern to be matched.

Pattern matching characters

The following characters have special meanings when used in the right-hand argument of the Pattern Matching operator. All other characters match themselves.

Operator Function
? Matches any single character
* Matches 0 or more of any character
& Escape character
[ Reserved
] Reserved
( Reserved
) Reserved
| Reserved

Note

The reserved characters are invalid if they are not preceded by an ampersand (&).

The following table shows special two character sequences, known as escape sequences, that start with an ampersand (&). These escape sequences are used to include characters literally in the pattern without their special meaning and to enable case insensitive matching.

Note that the ampersand (&) must appear in front of every sequence. An ampersand followed by any other character is invalid.

Escape Sequence Function
&? Matches a question mark
&* Matches an asterisk
&[ Matches a left square bracket
&] Matches a right square bracket
&( Matches a left parenthesis
&) Matches a right parenthesis
&| Matches a vertical bar
&& Matches an ampersand
&i Enables case insensitive matching.

Case sensitivity in matching

By default, pattern matches are case sensitive. The &i escape sequence enables case insensitive matching for an entire pattern. This escape sequence can only be specified at the start of a pattern.
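The pattern rules in this section can be summarized by a small matcher. The sketch below translates a pattern into a java.util.regex expression according to the two tables above. It is an illustration only, not the query facility's implementation: the class name PatternSketch is hypothetical, invalid patterns are reported with IllegalArgumentException as an assumption, and a null text is assumed not to match.

```java
import java.util.regex.Pattern;

class PatternSketch {
    /** Translates a query pattern into an equivalent regular expression. */
    static Pattern compile(String pattern) {
        int flags = Pattern.DOTALL;   // let ? and * match any character at all
        int i = 0;
        if (pattern.startsWith("&i")) {   // &i is only recognized at the start
            flags |= Pattern.CASE_INSENSITIVE;
            i = 2;
        }
        StringBuilder regex = new StringBuilder();
        while (i < pattern.length()) {
            char c = pattern.charAt(i++);
            switch (c) {
                case '?':
                    regex.append('.');    // any single character
                    break;
                case '*':
                    regex.append(".*");   // zero or more of any character
                    break;
                case '&':
                    if (i >= pattern.length()) {
                        throw new IllegalArgumentException("dangling & in pattern");
                    }
                    char escaped = pattern.charAt(i++);
                    if ("?*[]()|&".indexOf(escaped) < 0) {
                        throw new IllegalArgumentException("invalid escape: &" + escaped);
                    }
                    regex.append(Pattern.quote(String.valueOf(escaped)));
                    break;
                case '[': case ']': case '(': case ')': case '|':
                    throw new IllegalArgumentException("reserved character must be escaped: " + c);
                default:
                    regex.append(Pattern.quote(String.valueOf(c)));   // matches itself
            }
        }
        return Pattern.compile(regex.toString(), flags);
    }

    /** Returns whether the whole text matches the pattern; null text is assumed not to match. */
    static boolean matches(String text, String pattern) {
        return text != null && compile(pattern).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("Tommy", "Tom*"));
        System.out.println(matches("?FOO", "&i&?foo"));
    }
}
```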

Optimizing pattern matching

The pattern matching operator takes advantage of any ordered indexes available on the text being matched. If the pattern starts with a character other than an asterisk (*) or a question mark (?), then the query only searches the portion of the index that matches the initial, constant prefix. Therefore, patterns that specify a constant prefix produce much more efficient queries.

Pattern matching examples

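The following pattern matching examples use the following class:

public class Person { public String name; }

// Names that start with "Tom"
new Query(Person.class, "name ~~ \"Tom*\"");
// Names that end with "man" or "burn"
new Query(Person.class, "name ~~ \"*man\" || name ~~ \"*burn\"");
// Pattern supplied through a free variable: names that end with "Gr", any single character, then "y"
FreeVariables vars = new FreeVariables();
vars.put("var", String.class);
Query query = new Query(Person.class, "name ~~ var", vars);
FreeVariableBindings bindings = new FreeVariableBindings();
bindings.put("var", "*Gr?y");
query.select(coll, bindings);
// Case-insensitive match of a literal question mark followed by "foo"
new Query(Person.class, "name ~~ \"&i&?foo\"");
// Case-insensitive match of names that contain a literal asterisk followed by "foo"
new Query(Person.class, "name ~~ \"&i*&*foo*\"");
// Names that contain "foo" and, later, a literal ampersand followed by "bar"
new Query(Person.class, "name ~~ \"*foo*&&bar*\"");
// The name "(a)", with both parentheses escaped
new Query(Person.class, "name ~~ \"&(a&)\"");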

Using Free Variables in Queries

Free variables are lexically the same as identifiers in the Java language. If you use free variables in your query, you must specify them in an optional third argument to the Query constructor. Use the COM.odi.util.query.FreeVariables class. This class implements the Map interface. In addition, it provides type-checking to ensure that the keys and values are Strings and Classes, respectively. For example:

FreeVariables vars = new FreeVariables();
vars.put("INPUT_SALARY", int.class);
Query q = new Query(Person.class, 
      "salary>=INPUT_SALARY", vars);
When you execute a query, you must bind any free variables to particular values. Do this by passing an additional argument to the Query.select() or Query.pick() method. This argument must be of type COM.odi.util.query.FreeVariableBindings. This class, like FreeVariables, implements the Map interface, and provides additional type-checking to ensure that the keys are Strings.

The values you bind to the free variables must be of the type specified by the corresponding entry in the FreeVariables map that was specified at query construction. For primitive types, the type of value stored in the FreeVariableBindings must be the associated wrapper type. ObjectStore does not check that the correct types are bound until it executes the query.

For example, the INPUT_SALARY free variable is used in the previous example query. Your application might read in a value from a user in an interactive program, or compute the value in some other way. Regardless of how your application computes the value, the free variable is bound to a specific value only when the query is executed. For example:

int INPUT_SALARY = readSalary();  // readSalary() is a hypothetical helper: user input or some other computation
FreeVariableBindings bindings = new FreeVariableBindings();
bindings.put("INPUT_SALARY", new Integer(INPUT_SALARY));
Set result = q.select(employees, bindings);

Executing Queries

You can execute a query that returns either the set of elements that satisfy the query or a single element that satisfies it.

Obtaining a set

To obtain the set of elements that satisfy a query, call the COM.odi.util.query.Query.select() method. There are two overloadings:

public Set select(Collection coll)
public Set select(Collection coll, 
      FreeVariableBindings freeVariableBindings)
The coll argument specifies the collection to be queried. If this query has been explicitly optimized with the Query.optimize() method, any indexes specified in the optimization must be available on this collection. If this query has not been explicitly optimized, ObjectStore optimizes it for all indexes on the collection being queried. If the query has been explicitly optimized for indexes that are not available on the specified collection, ObjectStore throws QueryIndexMismatchException.

The freeVariableBindings argument specifies a FreeVariableBindings object that defines bindings for each free variable in the query. For each entry, the key is a String that identifies the free variable, and the value is the value that should be associated with the free variable during the evaluation of the query. The value must be of the type specified by the corresponding entry in the FreeVariables argument passed to the Query constructor. For the query to be evaluated, every free variable associated with the query when it was constructed must have a corresponding binding. Also, every free variable binding must correspond to a free variable that was specified when the query was constructed. If the free variable bindings do not match the free variable definitions specified when the query was constructed, ObjectStore throws QueryException.
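The matching rules between free variable definitions and bindings can be sketched with plain java.util maps. This is an illustration under stated assumptions, not ObjectStore code: BindingCheckSketch and its methods are hypothetical names, the sketch throws IllegalArgumentException where ObjectStore throws QueryException, and its treatment of null values (allowed only for non-primitive types) is an assumption.

```java
import java.util.Map;

// Illustrative stand-in only: plain maps play the roles of
// FreeVariables (name -> declared type) and FreeVariableBindings (name -> value).
class BindingCheckSketch {

    // For primitive types, the bound value must be of the associated wrapper type.
    static Class<?> wrapperFor(Class<?> type) {
        if (type == int.class) return Integer.class;
        if (type == long.class) return Long.class;
        if (type == short.class) return Short.class;
        if (type == byte.class) return Byte.class;
        if (type == char.class) return Character.class;
        if (type == boolean.class) return Boolean.class;
        if (type == float.class) return Float.class;
        if (type == double.class) return Double.class;
        return type;
    }

    static void check(Map<String, Class<?>> declared, Map<String, Object> bindings) {
        // Every declared variable needs a binding, and every binding needs a declaration.
        if (!declared.keySet().equals(bindings.keySet())) {
            throw new IllegalArgumentException(
                "bindings " + bindings.keySet() + " do not match free variables " + declared.keySet());
        }
        for (Map.Entry<String, Class<?>> entry : declared.entrySet()) {
            Object value = bindings.get(entry.getKey());
            if (value == null) {
                // Assumption of this sketch: null is acceptable only for reference types.
                if (entry.getValue().isPrimitive()) {
                    throw new IllegalArgumentException(entry.getKey() + " cannot be bound to null");
                }
                continue;
            }
            Class<?> expected = wrapperFor(entry.getValue());
            if (!expected.isInstance(value)) {
                throw new IllegalArgumentException(
                    entry.getKey() + " expects " + expected.getName()
                    + " but was bound to " + value.getClass().getName());
            }
        }
    }
}
```

In this sketch, a variable declared as int.class accepts an Integer binding, rejects a String binding, and is reported as unmatched if no binding is supplied at all.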

The select() method returns a Set that contains the elements that satisfy the query. If ObjectStore does not find any matching elements, it returns an empty collection. The returned Set is transient.

Obtaining a single element

To obtain one element that satisfies a query, call the COM.odi.util.query.Query.pick() method. There are two overloadings:

public Object pick(Collection coll)
public Object pick(Collection coll, 
      FreeVariableBindings freeVariableBindings)
The coll and freeVariableBindings arguments are the same as for the select() method. The pick() methods return the first element found that satisfies the query. The returned element is transient. If no elements in the collection satisfy the query, ObjectStore throws NoSuchElementException.

Type of returned element

The select() and pick() methods return only elements that are instances of the class that was specified as the collection element type when the query was constructed.

Null values

Queries ignore null elements but not null fields. The result set of a query never includes null elements. When a query reaches a null element, execution continues to the next element. Suppose you have a query like this:

name != "fred"
A query that evaluates this on a collection returns elements with null name fields, as well as elements with names that are not "fred".

Now suppose you have a query like this:

spouse.name != "fred"
On a collection that includes elements that do not have spouses, this query does not return those elements without spouses. It only returns the elements that have spouses with names that are not "fred" plus the elements that have spouses with null name fields.
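The two cases above can be mimicked with ordinary Java loops. The following sketch is only an analogy for the documented behavior: the Person class and method names are hypothetical, the sketch compares strings with equals(), and it returns a List rather than a Set so that the results are easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only: hand-written filters that behave like the two queries.
class NullSemanticsSketch {

    static class Person {
        final String name;
        final Person spouse;

        Person(String name, Person spouse) {
            this.name = name;
            this.spouse = spouse;
        }
    }

    // Behaves like: name != "fred"
    static List<Person> nameNotFred(List<Person> people) {
        List<Person> result = new ArrayList<Person>();
        for (Person p : people) {
            if (p == null) {
                continue;                    // null elements are skipped
            }
            if (!"fred".equals(p.name)) {
                result.add(p);               // a null name is not "fred", so it is included
            }
        }
        return result;
    }

    // Behaves like: spouse.name != "fred"
    static List<Person> spouseNameNotFred(List<Person> people) {
        List<Person> result = new ArrayList<Person>();
        for (Person p : people) {
            if (p == null || p.spouse == null) {
                continue;                    // elements without spouses are not returned
            }
            if (!"fred".equals(p.spouse.name)) {
                result.add(p);               // includes spouses whose name field is null
            }
        }
        return result;
    }
}
```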

Limitations on Queries

When a query refers to a class or field, the class or field must be public.

When a query refers to a method, the method must return something. In other words, in a query string, you cannot refer to a method that returns void.

Queries can contain methods that take arguments. The earlier limitation against such methods no longer applies.

Enhancing Query Performance with Indexes

When you want to run a query on a particularly large collection, it is useful to build indexes on the collection to accelerate query processing. An index provides a reverse-mapping from a field value or from the value returned by a method when it is called, to all elements that have the value. A query that refers to an indexed member executes faster. This is because it is not necessary to examine each object in the collection to determine which elements match the predicate. Also, ObjectStore does not need to fetch into memory every element.

How Indexes Work

When you add an index to a collection, ObjectStore examines every element of the collection to determine the value of the indexed field or method. After you build the index, you can run queries against the collection without reexamining the elements to determine the values of any indexed members. The query examines the index instead of the collection.

A query can include both indexed fields/methods and nonindexed fields/methods. ObjectStore evaluates the indexed fields and methods first and establishes a preliminary result set. ObjectStore then applies the nonindexed fields/methods to the elements in the preliminary result set.
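The reverse mapping and the two-stage evaluation can be pictured with a java.util.HashMap. This is a conceptual sketch only; it says nothing about how ObjectStore stores or evaluates indexes, and the Employee class, its fields, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in only: a HashMap from field value to elements acts as the index.
class ReverseIndexSketch {

    static class Employee {
        final String name;
        final int salary;
        final String location;

        Employee(String name, int salary, String location) {
            this.name = name;
            this.salary = salary;
            this.location = location;
        }
    }

    // Building the index examines every element once.
    static Map<Integer, List<Employee>> indexBySalary(List<Employee> employees) {
        Map<Integer, List<Employee>> index = new HashMap<Integer, List<Employee>>();
        for (Employee e : employees) {
            List<Employee> bucket = index.get(e.salary);
            if (bucket == null) {
                bucket = new ArrayList<Employee>();
                index.put(e.salary, bucket);
            }
            bucket.add(e);
        }
        return index;
    }

    // Evaluates "salary == s && location == loc" with salary indexed and location not.
    static List<Employee> query(Map<Integer, List<Employee>> salaryIndex, int salary, String location) {
        // Stage 1: the indexed member produces a preliminary result set.
        List<Employee> preliminary = salaryIndex.get(salary);
        if (preliminary == null) {
            preliminary = Collections.emptyList();
        }
        // Stage 2: the nonindexed member is checked only on the preliminary results.
        List<Employee> result = new ArrayList<Employee>();
        for (Employee e : preliminary) {
            if (e.location.equals(location)) {
                result.add(e);
            }
        }
        return result;
    }
}
```

Only the elements in the matching salary bucket are examined for location; the rest of the collection is never touched.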

Adding Indexes to Collections

To add an index to a collection, the collection must implement the COM.odi.util.IndexedCollection interface, directly or indirectly. The IndexedCollection interface extends the COM.odi.util.Collection interface, so any class that implements IndexedCollection is also a Collection.

The IndexedCollection interface provides methods for adding and removing indexes, and for updating indexes when the indexed data changes. In this release of ObjectStore, COM.odi.util.OSTreeSet is the only collection class that already implements IndexedCollection. You can, of course, define other COM.odi.util.Collection classes that implement IndexedCollection.

Call the COM.odi.util.IndexedCollection.addIndex() method to create an index. There are three overloadings. Their arguments are described below.

The elementType argument indicates the type to which the index applies. Objects of other types can be in the collection that you index, but they are ignored by the index. A query that uses the index does not return such elements.

The path argument indicates the member to be indexed. The path must be either the name of a public field or a call to a public instance method. If it is not, ObjectStore throws IndexException. The public instance method can be in a superclass, and it can have no arguments or one constant argument. Indexes on paths that specify more than one field or method access are not allowed.

The ordered and duplicates arguments allow you to specify whether the index is ordered and whether it allows duplicates. If you do not specify the boolean arguments, the index is unordered and it allows duplicates.

Finally, the placement argument indicates the database or segment in which to store the index. If you do not pass a Placement argument, ObjectStore stores the index in the same database and segment as the collection.

Dropping Indexes from Collections

Call the COM.odi.util.IndexedCollection.dropIndex() method to remove an index from a collection. Here is the method signature:

public boolean dropIndex(Class elementType, String path)
The elementType argument indicates the type to which the index applies.

The path argument indicates the member for which the index is being removed.

Sample Program That Uses Indexes

In the COM/odi/demo/query directory, the QueryCustomers class includes the following example of using an index:

IndexedCollection collection = new OSTreeSet(db);
try {
      // Build an index on the salary field of Employee elements.
      collection.addIndex(Employee.class, "salary");
} catch (IllegalAccessException e) {
      System.err.println("Couldn't access field: " + e);
      System.exit(1);
}
// q and employees are defined elsewhere in the demo.
Set result = q.select(employees);

Modifying Index Values

After you add an index to a collection, ObjectStore automatically maintains it as you add or remove elements from the collection. However, it is your responsibility to manage index maintenance when indexed members are modified for instances that are already members of an indexed collection.

For example, suppose you insert Lee into your collection of employees. You build an index for this collection on the phoneExtension field. A query of "phoneExtension == 1234" returns Lee. If you remove Lee from the collection, ObjectStore updates the index so it no longer includes Lee. However, if you leave Lee in the collection, but change Lee's phone extension, you must manually correct the index so that it associates Lee with the new phone extension instead of the old one.

Methods

There are three methods that you can use to manually maintain an index: removeFromIndex(), addToIndex(), and updateIndex().

After an application calls one of these methods, the next time the application uses that index it uses the updated index. A call to updateIndex() does the same thing as a call to removeFromIndex() followed by a call to addToIndex(). The difference is that removeFromIndex() and addToIndex() inspect the value to determine the index key. That is, they apply the index's path expression to obtain the key from the value. With updateIndex(), you pass in the old key and the new key, so ObjectStore does not have to inspect the value to determine its key. For this reason, and because there is a single call, using updateIndex() is more efficient.

Removing and adding index values

The removeFromIndex() method has two overloadings:

public void removeFromIndex(Object value)
public void removeFromIndex(Class elementType, 
      String path, Object value)
The addToIndex() method has two parallel overloadings:

public void addToIndex(Object value)
public void addToIndex(Class elementType, 
      String path, Object value)
Usually, after you remove a value from an index, you should add a value to replace it.

If you know exactly which index you need to update, you can use the form that specifies elementType, path, and value. If you do not know what indexes exist, or if you modified a lot of different fields and want to update all indexes, use the short form. In this case, ObjectStore iterates over all indexes and updates all of them.

Here is an example of removing and adding values to an index:

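Employee lee = new Employee("Lee", 1234);
collection.insert(lee);
try {
      // Remove the index entry while lee still has the old extension,
      // change the field, and then add the entry back under the new value.
      collection.removeFromIndex(lee);
      lee.setExtension(5678);
      collection.addToIndex(lee);
} catch (IllegalAccessException e) {
      System.err.println("Could not access field: " + e);
      System.exit(1);
}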

Updating indexes

The updateIndex() method has the following signature:

public void updateIndex(Class elementType, 
      String path, Object oldKey, Object newKey, Object value)
Here is an example of updating an index:

Employee lee = new Employee("Lee", 1234);
collection.insert(lee);
lee.setExtension(5678);
collection.updateIndex(
      Employee.class, "extension", 
      new Integer(1234), new Integer(5678), lee);
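The difference between the two maintenance styles can be seen in a small in-memory model. The following sketch is not the ObjectStore implementation. It uses a java.util.HashMap keyed on a hypothetical extension field, and its method names merely echo the IndexedCollection methods with simplified signatures.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in only: a hand-maintained index on Employee.extension.
class IndexMaintenanceSketch {

    static class Employee {
        final String name;
        int extension;

        Employee(String name, int extension) {
            this.name = name;
            this.extension = extension;
        }
    }

    private final Map<Integer, Set<Employee>> byExtension = new HashMap<Integer, Set<Employee>>();

    // Inspects the element to find its key, so the field must hold the NEW value.
    void addToIndex(Employee e) {
        insert(e.extension, e);
    }

    // Inspects the element to find its key, so the field must still hold the OLD value.
    void removeFromIndex(Employee e) {
        delete(e.extension, e);
    }

    // The caller supplies both keys, so the element itself is never inspected.
    void updateIndex(int oldKey, int newKey, Employee e) {
        delete(oldKey, e);
        insert(newKey, e);
    }

    Set<Employee> lookup(int extension) {
        Set<Employee> bucket = byExtension.get(extension);
        return bucket == null ? Collections.<Employee>emptySet() : bucket;
    }

    private void insert(int key, Employee e) {
        Set<Employee> bucket = byExtension.get(key);
        if (bucket == null) {
            bucket = new HashSet<Employee>();
            byExtension.put(key, bucket);
        }
        bucket.add(e);
    }

    private void delete(int key, Employee e) {
        Set<Employee> bucket = byExtension.get(key);
        if (bucket != null) {
            bucket.remove(e);
            if (bucket.isEmpty()) {
                byExtension.remove(key);
            }
        }
    }
}
```

If the extension field changes without any of these calls, a lookup on the old extension still returns the employee and a lookup on the new extension returns nothing, which is the stale state that manual maintenance is meant to prevent.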

Managing Indexes and Index Values

When you add or drop an index, you do it at the class level. That is, you specify the class and member that the index is on. For example, you might add an index on the name field of the Employee class:

employeeCollection.addIndex(Employee.class, "name");
But when you perform maintenance on an index, that is, when you call removeFromIndex(), addToIndex(), or updateIndex(), you do it at the instance level. For example, suppose you have an employee named Jones with an employee ID number of 1234. The employee's name changes to Smith. You must update this index entry at the instance level. One way you can do it is like this:

employeeCollection.removeFromIndex(employee1234);
employee1234.setName("Smith");
employeeCollection.addToIndex(employee1234);
For each index on the Employee class, these methods update the index's value for employee1234. If there are multiple indexes on Employee, the one-argument overloading of removeFromIndex() and addToIndex() updates all of them. You do not have to specify that you want to update the index on the name field. For example, there might be indexes on the Employee.salary and Employee.location fields, as well as the Employee.name field. The previous code fragment would update the indexes on salary and location, as well as the index on name, even though only the index on name needs to be updated. This technique is useful when you make a lot of changes to different fields.

If you use the three-argument overloading of removeFromIndex() or addToIndex(), you can update just the index that needs to be updated. You must know the type of the indexed element, the name of the indexed member, and the value to be removed or added. For example:

employeeCollection.removeFromIndex(
      Employee.class, "name", employee1234);
employee1234.setName("Smith");
employeeCollection.addToIndex(
      Employee.class, "name", employee1234);

Optimizing Queries for Indexes

If you do not explicitly optimize a query for a particular set of indexes, ObjectStore automatically optimizes the query when it applies the query to a collection. This means that ObjectStore optimizes the query to use exactly those indexes that are available on the collection being queried.

Preparation

Before you optimize a query, you must obtain an instance of IndexDescriptorSet. An IndexDescriptorSet implements a set of IndexDescriptor objects. An IndexDescriptor is an object that describes an IndexMap on an instance of IndexedCollection. Typically, you can obtain an IndexDescriptorSet with a call to IndexedCollection.getIndexes() on any collection that has exactly the indexes for which you want to optimize your query.

Explicit optimization

To explicitly optimize a query, call the Query.optimize() method. The method signature is:

public synchronized void optimize(IndexDescriptorSet indexes)
The indexes argument is an instance of IndexDescriptorSet that contains IndexDescriptor objects that describe the indexes against which to optimize.

Reoptimizing

If you apply an optimized query to the same collection again, or to another collection with the same indexes, ObjectStore uses the same optimization. Reoptimization is not required. However, suppose you apply an optimized query to a collection that does not have all the indexes that were present when the query was first run. In this situation, ObjectStore must reoptimize the query. ObjectStore does this automatically; your intervention is not required.

Manual optimization

Automatic index optimization is convenient and effective. However, suppose a query is to be run multiple times against more than one collection, potentially with different indexes available. In this situation, it might be best to manually control the query optimization strategy.

For example, consider that the same query is to be run repeatedly against two different collections, where the collections have different indexes. One alternative is to create two separate query objects, one for each collection. This avoids the overhead of recomputing the indexing optimization strategy each time you apply the query to a different collection. A second alternative is to explicitly optimize a query to use only the intersection of the indexes that are available on both collections. You can do this with a call to Query.optimize(). Pass in an IndexDescriptorSet object that contains descriptions of only the common indexes.
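The second alternative amounts to a set intersection. The following sketch shows only that idea: it represents each index description as a plain string such as "Employee.salary", which is an assumption made for illustration, and it does not use the IndexDescriptorSet or IndexDescriptor classes.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative stand-in only: index descriptions are modeled as strings.
class CommonIndexesSketch {

    // Returns the descriptions that appear in both sets, leaving the inputs untouched.
    static Set<String> commonIndexes(Set<String> first, Set<String> second) {
        Set<String> common = new HashSet<String>(first);
        common.retainAll(second);
        return common;
    }
}
```

If one collection is indexed on Employee.name and Employee.salary and the other on Employee.salary and Employee.location, only Employee.salary is common to both, so a query optimized for that single index could run against either collection.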

Restriction

If you explicitly optimize a query with the Query.optimize() method, it cannot run against a collection that does not have the specified indexes. If you try to do this, ObjectStore throws QueryIndexMismatchException. In this way, an explicitly-optimized query differs from an automatically-optimized query. An automatically optimized query reoptimizes itself as needed when you run it against a collection with different indexes.

This restriction might be useful when it would be undesirable to run a particular query on a collection that does not have the required indexes. For example, it is useful when the collection is very large and the overhead of examining every element of the collection is prohibitive.

-noclassgc option to Java VM

To evaluate query expressions efficiently, ObjectStore compiles query expressions into classes and methods that are loaded when the query is evaluated. Each new query can potentially result in the creation of a new class with a new internal name to represent the compiled state of the query. When the query is no longer referenced, this class is normally garbage collected by the Java VM garbage collector and its storage reclaimed.

With the JDK 1.1.7, you can disable garbage collection of classes with the -noclassgc option to the Java VM. If you use this option, you risk running out of heap storage as the query expression classes are accumulated over time and the -noclassgc option prevents them from being reclaimed.

Manipulating Indexes Outside the Query Facility

You can use the IndexMap interface to directly access and manipulate indexes outside the query facility. This interface is useful when you want a sorted result set and you can represent the query as a single range expression on an indexed member. Instead of running a query, you can iterate over the index directly. See ObjectStore Java API Reference, COM.odi.util.IndexMap.
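The idea of iterating over an ordered index for a single range expression can be illustrated with java.util.TreeMap. This sketch does not use the IndexMap interface, whose methods are documented in the API reference. The class and method names here are hypothetical, and the index maps a salary to the names of the employees who earn it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative stand-in only: a TreeMap keeps its keys sorted, like an ordered index.
class RangeScanSketch {

    // Collects names for low <= salary <= high, in ascending salary order,
    // by walking the relevant slice of the index instead of running a query.
    static List<String> namesInSalaryRange(TreeMap<Integer, List<String>> index, int low, int high) {
        List<String> result = new ArrayList<String>();
        for (List<String> names : index.subMap(low, true, high, true).values()) {
            result.addAll(names);
        }
        return result;
    }
}
```

Because the slice is already sorted by key, the result comes back in salary order without a separate sorting step.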

Storing Objects as Keys in Persistent Hash Tables

The COM.odi.util.OSHashtable class introduces a new requirement for classes of objects that will be stored as keys in persistent collections: these classes must provide a suitable hashCode() method. ObjectStore and the class file postprocessor provide facilities for doing this conveniently.

Requirements for Hash Code Methods

Objects that are stored as keys in persistent hash tables must provide hash codes that remain the same across transactions. ObjectStore can create a new transient Java object in each transaction to represent a particular persistent object, so it is important that the hashCode() method used for persistent objects return the same hash code for these different transient objects.

The default Object.hashCode() method supplies an identity-based hash code. This identity hash code might depend on the virtual memory address or some internal implementation-level metadata associated with the object. Such a hash code is unsuitable for use in a persistent identity-based hash table because it would effectively be different each time an object was fetched in a transaction.

Providing an Appropriate Persistent Hash Code Method

In cases where a persistence-capable class does not override the hashCode() method it inherits from Object, the class file postprocessor arranges for the class to implement a hashCode() method suitable for storing instances in persistent hash tables. It does this by adding an int field to the class. This field is initialized to an appropriate hash code when an instance is created, and the generated hashCode() method returns the value stored in the field. This hash code value is guaranteed to remain unchanged for the lifetime of the object.

Applications need to provide their own hashCode() methods for classes that define equals() methods that depend on the contents of instances rather than on object identity. If the equals() method just uses the == operator to compare the argument with this (or inherits Object.equals()), then it is identity-based and the hashCode() method provided by the class file postprocessor is appropriate. If the equals() method compares the contents of the objects, then it is contents-based and your application must supply a hashCode() method that returns the same hash code value for all objects whose contents make them return true when compared with the equals() method.
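As a minimal sketch of a contents-based pair of methods, consider the following class. PartNumber and its fields are hypothetical names chosen for illustration. The hash code is computed only from the field values, using String.hashCode() and integer arithmetic, so equal contents produce the same hash code in every transaction and every run of the Java VM.

```java
// Illustrative example only: a hypothetical key class with contents-based equality.
class PartNumber {
    private final String vendor;
    private final int serial;

    PartNumber(String vendor, int serial) {
        this.vendor = vendor;
        this.serial = serial;
    }

    // Contents-based: two PartNumbers are equal if their fields are equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof PartNumber)) {
            return false;
        }
        PartNumber that = (PartNumber) other;
        return serial == that.serial && vendor.equals(that.vendor);
    }

    // Must agree with equals(): equal contents always yield the same hash code.
    @Override
    public int hashCode() {
        return 31 * vendor.hashCode() + serial;
    }
}
```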

If an application does not need to store instances of a particular persistence-capable class as keys in a persistent hash table, there is no special requirement for that class's hashCode() method. In this case, to avoid making all your instances one word larger, have the class define or inherit a hashCode() method that calls the superclass's hashCode() method:

public int hashCode() { return super.hashCode(); } 
Doing this ensures that the hashCode() method inherited from Object will be used, which returns a hash code that can be used only in a nonpersistent context.

Storing Built-In Types as Keys in Persistent Hash Tables

You can use the following built-in Java types as OSHashtable keys without overriding the hashCode() method:

There is no way to override the hashCode() method for arrays. Therefore, do not use Java arrays as keys in persistent hash tables. You can, however, define a class that stores the array as a field and provides an appropriate hashCode() method.

Java wrapper classes work nicely as keys because their hashCode() methods are based on the value of the object rather than its address.
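The array-holding class suggested above might look like the following sketch. IntArrayKey is a hypothetical name, and the sketch uses java.util.Arrays to derive both equals() and hashCode() from the array contents rather than from the array's identity.

```java
import java.util.Arrays;

// Illustrative example only: wraps an int array so it can serve as a hash table key.
class IntArrayKey {
    private final int[] values;

    IntArrayKey(int[] values) {
        // Copy defensively so later changes to the caller's array cannot alter the key.
        this.values = values.clone();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof IntArrayKey
            && Arrays.equals(values, ((IntArrayKey) other).values);
    }

    // Computed from the elements, so equal contents give equal hash codes.
    @Override
    public int hashCode() {
        return Arrays.hashCode(values);
    }
}
```

Two IntArrayKey objects built from arrays with the same elements are equal and share a hash code, whereas two bare arrays with the same elements are not equal to each other.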

Using Third-Party Collections Libraries

You can use a third-party Java collections library with ObjectStore. The advantages of doing so are that it might have features that you need or you might be familiar with how to use it. The disadvantage is that it might not scale to the degree that you need.

One third-party library you can use is Doug Lea's collections library. An example of using this is in the collection subdirectory of the ObjectStore demo directory.



Copyright © 1998 Object Design, Inc. All rights reserved.

Updated: 10/07/98 08:45:51