Understanding Dates

By Art Greenhaus

There are many ways to specify a date in the data processing world. Understanding the differences, and how to convert between them, makes date processing much easier. This article will touch the surface of what's out there, and how the FOCUS language handles them. I'm making the assumption that, in whatever format a date may be stored, it was stored as month number, not the literal month value (no JAN or FEB in the raw data).

In the beginning, there was no “native” date format. By that I mean that there was no format that was intrinsically understood to be a date. A date was represented as either a number or character string. The value of either was “understood” by the accessing program to be a date, and was handled accordingly. These are referred to as “legacy” dates.

For example, a VSAM file might store the date January 30, 1997 as 01301997, (eight characters in which the first two were the month number, the next two characters were the day of the month, and the last four were the year). This took 8 bytes of storage per date. It also made it easy to break the date into components. To save storage (back in the days when storage was expensive), it might be stored as 0103097, with the century digits understood. This is what caused the entire Y2K crisis, when the century digits changed, and a two digit year could no longer be assumed to have '19' as the century digits. For sorting purposes, dates were usually stored in a YEAR/MONTH/DAY format, so sorting on the value put records in the correct chronological order. If, when looking at the raw data, the date was recognizable, then the date was probably stored as a character string.

When a date is stored as a character string, it's described with an ACTUAL of A, followed by the number of bytes needed. For a value of 01301997, the ACTUAL would by A8. The USAGE (before there were DATE-type formats) would by A8MDYY, where the M indicates that the two-digit month came first, followed by a two digit day (the D in the ACTUAL), followed by a four digit year (the 'YY' in the ACTUAL). In cases of a two-digit year, there would only be a single Y. Note that the MDYY are only EDIT OPTIONS on the output display, indicating that slashes should separate the components. No data validation is done.

By identifying the position and order of the components, we can also translate the month to the literal value it represents (01=JAN, 02=FEB, …). This is done with the T edit option. On display, the month component of the date is translated to its upper case, three-character representation, with blanks to separate the components. Thus, a USAGE of A8MTDYY would produce the following:

JAN 30 2007

Another alternative was to store the date as a number. Since there were no decimal places, it was typically stored as an integer, which took 4 bytes of storage. While it made breaking up a date into its components more difficult, it took less storage. The ACTUAL would be I4. The USAGE would usually be something like I8MDYY. All the formatting rules for alphas, as specified above, also apply. Again, the date formatting specifications in the USAGE are only EDIT options, controlling the display of output. As a number, you can mathematically change the value with addition or any other arithmetic operation. The result may not be a VALID date, but it will still display with slash separators (for untranslated months), and space separators for translate month. Any invalid month will not translate, but will display the original value.

One problem with alpha or integer dates is sorting. While storing a date in year-month-day format is acceptable when sorting, most people like to see dates as month-day-year or day-month-year. If the dates are month-day-year, sorting them gives APR, AUG, DEC, FEB, … (alphabetic sequence). Not usually what's desired.

With the introduction of date formats in the FOCUS language, dates got smarter. A date format is a USAGE with no data type, just the date components (like MDYY). As a date-formatted field, the components were validated (no month 13 or day 32). Internally, the date was converted to an offset from a base date (Dec. 31, 1900), so Jan. 1, 1901 had an offset of 1. Sorting this field chronologically is trivial, since we're sorting the offsets, not the date. Like before, the date components specify how to translate the values on output, so sorting, where the date components are month-day-year, gives JAN, FEB, MAR, … (chronological). And, since (internally) we're using offsets, to find the prior date, just subtract 1; to find the difference between two dates, just subtract one from the other.

With 'date' formats, we have even more editing options on output. You can:

  1. Specify the separator character (M-D-YY puts in dashes as in 01-30-1997)
  2. Translate the month abbreviation to mixed case (MtDYY gives Jan 30, 1997)
  3. Translate the month to the FULL month name (MTRDYY gives JANUARY 30, 1997)
  4. Translate the month to the FULL mixed case month name (MtrDYY gives January 30, 1997)
  5. Show the quarter (Q) or the date of the Week (W or WTr)

 

Many relational systems have an intrinsic date and/or date-time format. To indicate that the data being provided is a real date format, the ACTUAL is specified as DATE for a date field (or the date component of a date-time field).

A more granular option for handling dates is the date-time format. This is a common format in relational systems, even for storing ONLY dates (the time portion gets the default value of midnight). For a date-time field, the ACTUAL is given as 'Hxx', where 'xx' is 8, 10 or 12, depending on the detail to which the date-time field goes. The USAGE, like for DATE formats, indicates how the date and/or time are displayed. It is made up of a date component and/or a time component, or both.

The date component of a date-time USAGE is made up of a leading H, followed by Y[Y], M and D for 2- or 4-digit year, month and day, optionally followed by the separator for the date components (if none is specified, then slashes [/]are used). Valid separators are period (.), hyphen (-), blank (B) or none (N). Thus, the following are valid:

HYYMD         1997/01/30
HYYMD-        1997-01-30
HMDYB         01 30 07
HYMDN         070103

The time component of a date-time USAGE is made up of the following:

H (hour – 2 digits)
h (hour - leading zero suppressed)
I (minutes – 2 digits)
i (minutes – leading zero suppressed)
S (seconds – 2 digits)
s (milliseconds – 3 digits after a decimal point)
m (microseconds – 3 additional digits after a decimal point)
A (AM/PM indicator)
a (am/pm indicator)
Z (24 hour display)

When you specify a date format, you only need specify the lowest qualifier for the time component.

For example, a valid date-time format might be HMDYYm, which produces the following:

01/30/1997 12:34:56.000000

However, when you specify only a time format, what you specify is what you get. For example:

HSsm      produces  56.000000

The final point to raise about dates and date-times is record selection. Selection on a legacy date should be done on the underlying field value. If the field is numeric, select as a number (no display option); if alpha, select as an alpha string enclosed in single quotes.

For date formats, virtually anything is acceptable. All the following work:

WHERE DATE EQ '970130'
WHERE DATE EQ '97/01/30'
WHERE DATE EQ '19970130'
WHERE DATE EQ 970130
IF DATE EQ '970130'
IF DATE EQ '97/01/30'
IF DATE EQ '19970130'
IF DATE EQ 970130

The value is assumed to be a date, with the components specified if the order given via the setting of DATEFORMAT.

Testing on a date-time field is a bit more involved. For an IF test, the date-time literal must be enclosed in single quotes. For a WHERE test, the literal must be specified as the argument of the DT function, without quotes. For example, all the following are valid:

WHERE DATE EQ DT(970130)
WHERE DATE EQ DT(97/01/30)
WHERE DATE EQ DT(19970130)
IF DATE EQ '970130'
IF DATE EQ '97/01/30'
IF DATE EQ '19970130'

Because of the differences in how dates and date-times are handled, manipulations are also done differently. Many user subroutines have been written to manipulate legacy dates, as they are not ‘true’ dates, but merely representations. Some examples are:

AYMD      adding or subtracting days from a date
CHGDAT         changing how a legacy date displays
DOWK              finding the day of the week

For true ‘date’ format dates, some operations no longer require a routine. For example, to add a number of days to a 'true' date, one need only add, as the increment is added to the internal offset. There are, however, routines for 'true' dates. These routines begin with the characters DATE. For example:

DATEADD       adding or subtracting a date unit to or from a date
DATECVT       converting the format of a date
DATEMOV       moving a date to a significant point

And, for completeness, there are routines to manipulate date-time fields. These begin with the character H. For example:

HADD             incrementing a date-time value
HCNVRT         converting a date-time value to alphanumeric format
HDATE     Converting the date portion of a date-time value to a date format

Anyone using a routine should pay close attention to the following:

  1. What TYPE of date or date-time field is supported
  2. What are the arguments and what is their order
  3. What is the resultant format of the output

 

Using the wrong routine or having the wrong arguments can result in an error or, worse yet, producing the wrong result.

previous next