Keyed Fields and Indexes


When you define the field of a relation in THOR, one of the things you can specify about the field is whether or not it is a keyed field. Any number of fields in a relation may be defined as keyed fields as long as the field is not a binary or variable length type field. Keys provide ordered access to data. Fields are keyed if it is expected that an application will search using that field or if that field is used to join its relation to another relation.

There are two types of keys: primary and alternate. A primary keyed field is a field that does not allow duplicate entries in its sorted index. This field uniquely identifies each record of a relation, such as the social security number of an employee. Any number of keys in a relation may be defined as primary keys. If a unique key is not available in a relation, or it requires that an undesirable number of fields be concatenated, Sage.GenerateUniqueKey may be used to create a unique number for a particular relation. This unique value may then be used as a key value.

An alternate keyed field does allow duplicate entries in its sorted index. However, no assumptions should be made concerning the relative order of those keys that are duplicates. While a social security number is frequently used as a primary key, NAME (last name, first name, middle initial) is a common alternate key. More than one person may have the same NAME, and there is no guarantee in what order persons having the same NAME will appear, but all NAMEs will be in alphabetical order. If you wish to always maintain the relative order or records having the same alternate key, you must define a keyed field that will always be sorted in the desired order (see figure 1).

Figure 1: Sorting Sample for Primary and Alternate keys

For each keyed field in a relation (whether primary or alternate), a sorted index is created and maintained. This index is stored in the index file for that relation. The index file contains a sorted index for each key in the relation: all of the indexes required for a relation are contained on one file.

The index file stores its sorted indexes using the B* tree (pronounced B star tree) algorithm. This is an n-ary storage methodology that has the particularly desirable feature of being self-balancing. This means that it does not require a periodic and time consuming reorganization step to optimize empty or unbalanced structures since the B* tree indexes maintain themselves in a balanced form and at a level between 70 and 99% full. It is also much faster for searching when compared to such common indexing methods as binary trees.

When a field is defined as being keyed in THOR, two parameters may be specified by the developer. These are the number of keys per node, and the number of nodes in core (memory -- remember, SAGE's roots date back to the early 1980's). For most cases, the default values (20 keys/node and 5 nodes/core) should be used. Changes to these values from their default cases should only be made with a full understanding of the index tree structure and when further optimization of storage and searching is required for a unique application (see figure 2).

Figure 2: Primary/Alternate Key Definition

Should a relation's files become damaged, it is often sufficient to simply rebuild the index file using the REBUILD program or ReBuildIndex procedure. Since a relation's records are normally only accessed through keyed fields, only valid records will be accessible after a new index file is created. REBUILD also has the option of rebuilding all files, if this becomes necessary. SUBTOPIC

Database Searching Via Keyed Fields Go Back To Sage-ST TABLE OF CONTENTS


warren.merrill@inl.gov , ftp://sage.inel.gov
Copyright © 1989-2006. Battelle Energy Alliance