|
XFOCUS Database and Multidimensional Index (MDI) Target Data Warehousing Applications
The XFOCUS database (NF916) offers parallel functionality to the FOCUS database, with vastly improved performance and capacities. Conceived as an iWay server repository for cross-platform transfers of XML documents, the XFOCUS structure is equally suited to data warehousing applications. Its 16-kilobyte pages hold four times as much as standard FOCUS database pages and, with partitioning, you can accommodate four terabytes in a logical database.
Experienced FOCUS users will find the implementation and use of XFOCUS very intuitive; there is only one new file suffix (XFOCUS) and a SET command for allocating memory (XFOCUSBINS). But before we go further, please note that XFOCUS and MDI are present but not activated in the initial 7.3 release. Information Builders plans to turn these features on in a Service Pack this spring.
To create an XFOCUS data source, either issue a HOLD FORMAT XFOCUS command in your request or write a master file specifying SUFFIX=XFOCUS and then issue CREATE FILE.
Conversion of existing applications from FOCUS to XFOCUS data structures
is almost transparent. Simply apply the REBUILD
utility. Existing applications should then run normally, as commands
used with FOCUS files work the same on XFOCUS. The initial implementation
of XFOCUS is specifically oriented toward data warehouse-type retrieval
applications rather than maintenance transaction processing, so it
will not support Maintain, HLI, SU, or other transaction-oriented
features.
MDI Expediting Retrievals From XFOCUS
The Multidimensional Index (MDI) capabilities (NF987)
were ported from FUSION to expedite XFOCUS (and FOCUS) retrievals and
minimize both execution time and operating overhead incurred in large-scale
applications. MDI is a retrieval-only technique designed to address
business analysts’ requests for specific facts about multiple categories
of data in a data source, such as "regions," "departments," etc. Unlike
B-tree indexing that employs keys in the data as the basis for processing
transactions, MDI expedites retrievals through specified data values
called "measures" in key data fields.
A Multidimensional Index gives complex queries high-speed access to combinations of dimensions across data sources. If you know what information users want to retrieve and why, you can make intelligent choices about index dimensions. With MDI on, field values specified in requests (e.g., REGION EQ ‘EAST’, STATE EQ ‘DC’) create a virtual index cube of field values that intersect at "measures of interest." With data points along each axis of the cube selected, the points of convergence within the cube instantly identify specific records of interest and their pointers point directly to the underlying data. Compare this to a non-MDI situation, in which if we just indexed CAR and COUNTRY, our TABLE would use the index on COUNTRY and then would have to search the rest sequentially through CAR; this is where the difference lies. The more dimensions mentioned in your query the faster the MDI should perform.
An obvious key to using MDI effectively is knowing the data well and what you need from it. One FUSION customer reported seeing a search shrink from several hours to three minutes.
MDI Implementation
To create and use an MDI, issue a SET MDIFEATURE=ON command in your FOCPARM ERRORS file and then specify its attributes in an ACCESS file for that data source.
You specify all MDI attributes in the Access File for the data source and you can partition the MDI into multiple MDI files. FOCUS uses the ACCESS file when performing retrieval analysis for each TABLE request. If the source file on which an MDI is built is partitioned, then each MDI partition spans all partitions of the underlying data source.
Choosing Dimensions for Your Index
In selecting index dimensions, evaluate your data and what you need from it. Start by examining the record selection (IF and WHERE) tests in queries to see how many index dimensions each application uses in selecting records. If different applications need different subsets of dimensions, you may want to build separate indexes for the subsets. Although FOCUS can produce high-speed reporting performance with indexes of up to 30 dimensions, smaller indexes usually generate less retrieval overhead.
The following criteria indicate good candidates for MDI dimensions:
 |
Fields used frequently in record selection tests. Multiple fields used in selection tests within one query belong in the same MDI. |
 |
Fields used as the basis for horizontal partitioning, such as dates or regions. |
 |
Derived dimensions (DEFINE fields) defining
new categories based on existing categories; if your source contains
STATE but you need regions for your
analysis, you could use the STATE fields
to derive the REGION dimensions. Dynamic
DEFINE fields cannot be used. |
 |
Fields with many unique values. Fields with few possible values are normally poor candidates. However, you might index such a field if the source contained very few instances of one value of interest. |
Building and Maintaining a Multidimensional Index
You issue a REBUILD command
to create or maintain a Multidimensional Index. The best MDIs are built
by specifying the dimensions in descending order of best cardinality
(the most distinct values first). When issued in a FOCUS session, the
REBUILD command conducts a dialogue with the
user to construct the MDI. When issued in a FOCEXEC, you must place
the REBUILD command and user-supplied portion
of the REBUILD dialogue in the stored procedure.
Please refer to the detailed feature documentation for information about
the process and structure and capacities.
If your MDI file might exceed two gigabytes or if you are planning to add data partitions, you must partition the MDI file from the outset. Once created, you can use them only in retrieval requests. You cannot use an MDI for modifying the data source.
Joining to a Multidimensional Index
Joining to an MDI uses the power of the MDI to produce a fast, efficient join. Instead of joining one field in the source file to an indexed field in the target file, you can join to multiple dimensions of the MDI. When the join is implemented, an answer set is created from the source file and the source field values retrieved serve as index values for the corresponding dimensions in the target MDI.
You can join to an MDI in two ways:
 |
JOIN to all dimensions of a named MDI (MDI JOIN). In an MDI JOIN, you pick the >MDI to use in the JOIN. |
 |
Join certain dimensions in a dimensional JOIN. In this case, the JOIN engine picks the most efficient MDI based on the dimensions you chose. The dimensional join supports partial JOINs. |
Choosing Source Fields
Source fields must have a one-to-one correspondence with the target dimensions. The MDI engine uses the source field values to get pointers to the target segment of the MDI, expediting data retrieval from the target file. Think of source fields as mirror dimensions. By imposing tighter constraints on the mirror dimensions, you can narrow the answer set retrieved from the source file, and reduce the index I/Os needed to locate the target records. Since the speed of the JOIN dramatically improves retrieval of the source file answer set, you can expedite any TABLE request against the source file by incorporating selection criteria on fields participating in either a B-tree or MDI index.
|