| Time statistics queries
(time query) |
| Absolute time queries |
| Are those that reflect the evolution of a certain variable over the time (absolute time). |
| Absolute "instantaneous" time queries |
Example:
x=time y=value
1/1/03 00:02am 10
1/1/03 02:01am 20
1/1/03 30:15am 11
1/1/03 11:01pm 12
1/2/03 10:01am 10
1/2/03 11:01am 05
1/2/03 12:05am 04
1/2/03 02:01pm 01
1/2/03 03:01pm 00
1/2/03 11:01pm 10 |
| Absolute "statistical" time queries |
They are an elaboration from a "instantaneous" time query.
If the variable value is not "instantaneous" and it corresponds to a finite period of time identified by the time value, that variable value should respond to some statistical elaboration of the "instantaneous" values content inside the period.
That statistical elaboration could be the count of instantaneous values that fall into the time period (count), the average, the
maximum or minimum, etc.
This kind of queries are then histograms.
The time values for the abscissas is a label for that identified the whole period, in which the time has been divided.
The variable value in the ordinates is an statistical value derived from all the "instantaneous" values for the variable inside the given period.
How to elaborate a statistical from an instantaneous:
1) Select a criteria to define the time periods. The most common one: a whole day.
2) Select how to represent the value that identifies each period. In this case: Just the date (without the hour).
3) Select a statistical function to apply to the variable. Let's say the maximum.
4) For each period take the group of values of the variable that fall in that period and perform the
statistical function over them.
In our example
1) The period will be a day. We have 2 periods.
2) The labels will be 1/1/03 and 1/2/03.
3) The statistical function will be the max().
4) For the first period, the set of corresponding variable values are (10, 20, 11, 12). From there the statistic value is Y=max(10, 20, 11, 12)=20.
So we have our first pair of values for the derived value table: X=1/1/03, Y=20
Proceeding in the same way we have our new value table
X=period Y=max of y in the period
1/1/03 20
1/2/03 10
In abstract SQL:
SELECT OnlyDate(x) as X, Max(y) as Y GROUP BY OnlyDate(x) ORDER BY OnlyDate(x)
Of course this SQL expression doesn't correspond with SQL dialects in the real world:
1) The OnlyDate() function or something like it doesn't necessarily exist.
2) The label of the derived values (X, Y) can be syntactically erroneous if the dialect make no difference between lower and upper case (since the original value labels are (x, y)).
For instance, in MS Access, both considerations apply.
1) The lack of proper functions for time variables can be solved using the useful format():
format(x, "yyyymmdd")
2) The no case sensitivity can be solved just choosing other names for the new labels.
In MS Jet SQL, a way is:
SELECT format(x, "yyyymmdd") as Xp, Max(y) as Yp GROUP BY format(x, "yyyymmdd") ORDER BY format(x, "yyyymmdd")
The problem is that we wnat the query to return a value for the date period already formated to be human readable, the one return by the format() is not the right one.
We have additionally to create another field (or column) that we must obtain from one of the GROUPing expressions or must be a
statistical expression (agregate function) itself.
The straight way to attack this is a brute force string elaboration of the grouping field format():
mid(format(x, "yyyymmdd"), 3, 2) & "/" & right(format(x, "yyyymmdd"),2) & "/" & left(format(x, "yyyymmdd"), 4)
This give us a human readable date in English notation: mm/dd/yyyy
The inconvenient is that it is not sensible to the international settings of the operative system where the query is running, so the format will be always English, even it is
running in a non-English international set machine.
This is because the returned value is a string and not a date. If type is date, in Windows systems the date will be displayed in the right international formatting.
A workaround that is make the human readable value of date type using the MS Jet SQL function date(day, month, year) that returns a date type value:
date(right(format(x, "yyyymmdd"),2), mid(format(x, "yyyymmdd"), 3, 2), left(format(x, "yyyymmdd"), 4))
The last problem is if we have procedures to display queries as tables, that are straigt forward and display all the fields, since it will display the human non-sense field Xp.
1) We can solve it by adding extra intelligence to the procedure, by no displaying fields in the query with a certain pre-established syntax, i.e. an "_" as a prefix
2) Feeding the procedure with a second query, derived from the main one, in which only the human readable fields appear.
The chosen solution will depend in which is easier to whom has to provide it, and the
synergy between the DB and development teams.
Usually is more flexible to manipulate and create new queries than touching programming code. In this case first an agreement is necessary for the fields syntax, since a rule is introduced: fields prefixed by an "_" are hidden.
And this rule must be respected by both teams.
But adding the capability to the query displaying function in the side of the development team, won't affect the procedure behavior over conventional queries, and will add extra power to process new queries with "hidden" fields.
The most useful statistical functions.
COUNT (count the number of occurrences of y inside the period X)
AVERAGE
MAX
MIN
Also could be considered others like standard deviation, etc., these one more related with scientific processing that with the usual business processing, for what the previous are usually enough.
They could be interesting First(), Last().
|
| Relative time queries |
Relative time queries refer consider periods of time independent from its absolute position in the time line.
For example, consider just the months independently from the year they belong to.
Your value table would have X=(1,...,12) covering from January to December.
The statistical function that returns the Y value will run over all the y values that fall inside any given X relative period -in this case months- independently of the position of that relative period in the time line -in this case with independence to the year-.
In abstract SQL:
SELECT MonthOfTheYear(x) as X, Max(y) as Y GROUP BY MonthOfTheYear(x) ORDER BY MonthOfTheYear(x)
In MS Jet SQL, a way is:
SELECT month(x) as Xp, Max(y) as Yp GROUP BY month(x) ORDER BY month(x)
or
SELECT format(x, "mm") as Xp, Max(y) as Yp GROUP BY format(x, "mm") ORDER BY format(x, "mm")
The view returned by these queries is humen readable since there's no problem to identify the month by its number.
Other useful relative periods to process could be:
- the week of the year (1,...,53)
In MS Jet SQL, a way is:
SELECT format(x, "ww") as Xp, Max(y) as Yp GROUP BY format(x, "ww") ORDER BY format(x, "ww")
The problem here is that the returned view is not as human friendly as the majority would like, since a "week" is better described for it starting and ending day.
So you should introduce a field to show that. A trick to get such effect is:
SELECT format(x, "ww") as WeekOfTheYearNumber, Min(x) as InitialDateOfTheWeek, Max(x) as FinalDateOfTheWeek, Max(y) as Yp GROUP BY format(x, "ww") ORDER BY format(x, "ww")
If x includes time, it would be necessary to trim the time to present only the date portion:
SELECT format(x, "ww") as WeekOfTheYearNumber, format(Min(x), "dd/mm/yyyy") as InitialDateOfTheWeek, format(Max(x), "dd/mm/yyyy") as FinalDateOfTheWeek, Max(y) as Yp GROUP BY format(x, "ww") ORDER BY format(x, "ww")
Or to show the week by its extreme dates in an only field, we can compose a string:
SELECT format(x, "ww") as WeekOfTheYearNumber, format(Min(x), "dd/mm/yyyy") & "-" & format(Max(x), "dd/mm/yyyy") as Week, Max(y) as Yp GROUP BY format(x, "ww") ORDER BY format(x, "ww")
If you do not have a function like format(x, "ww") to calculate the week of the year, you should to rely in some calculations from the functions your SQL provides.
- the week of the month (1,...,4)
In MS Jet SQL, you do not have a straight function that returns the week of the month from a date but:
WeekOfTheMonthNumber = Int(Day(x) / 7 + 1)
or
WeekOfTheMonthNumber = Int(Val(format(x, "dd")) / 7 + 1)
Using that:
SELECT Int(Day(x) / 7 + 1) as Xp, Max(y) as Yp GROUP BY Int(Day(x) / 7 + 1) ORDER BY Int(Day(x) / 7 + 1)
The problem here is that the returned view is not as human friendly as the majority would like, since a "week" is better described for it starting and ending day.
So you should introduce a field to show that. A trick to get such effect is:
SELECT Int(Day(x) / 7 + 1) as WeekOfTheMonthNumber, Min(x) as InitialDateOfTheWeek, Max(x) as FinalDateOfTheWeek, Max(y) as Yp GROUP BY Int(Day(x) / 7 + 1) ORDER BY Int(Day(x) / 7 + 1)
If x includes time, it would be necessary to trim the time to present only the date portion:
SELECT Int(Day(x) / 7 + 1) as WeekOfTheMonthNumber, format(Min(x), "dd/mm/yyyy") as InitialDateOfTheWeek, format(Max(x), "dd/mm/yyyy") as FinalDateOfTheWeek, Max(y) as Yp GROUP BY Int(Day(x) / 7 + 1) ORDER BY Int(Day(x) / 7 + 1)
Or to show the week by its extreme dates in an only field, we can compose a string:
SELECT format(x, "ww") as WeekOfTheYearNumber, format(Min(x), "dd/mm/yyyy") & "-" & format(Max(x), "dd/mm/yyyy") as Week, Max(y) as Yp GROUP BY format(x, "ww") ORDER BY format(x, "ww")
SELECT Int(Day(x) / 7 + 1) as WeekOfTheMonthNumber, format(Min(x), "dd/mm/yyyy") & "-" & format(Max(x), "dd/mm/yyyy") as Week, Max(y) as Yp GROUP BY Int(Day(x) / 7 + 1) ORDER BY Int(Day(x) / 7 + 1)
- the day of the year (1,...,365 or 366)
In MS Jet SQL, a way is:
SELECT format(x, "y") as DayOfTheYear, Max(y) as Yp GROUP BY format(x, "y") ORDER BY format(x, "y")
- the day of the month (1,...,31)
In MS Jet SQL, a way is:
SELECT format(x, "dd") as DayOfTheYear, Max(y) as Yp GROUP BY format(x, "dd") ORDER BY format(x, "dd")
or
SELECT Day(x) as DayOfTheYear, Max(y) as Yp GROUP BY Day(x) ORDER BY Day(x)
- the day of the week (sun,...,sat) or (1,...,7), to identify week patterns
In MS Jet SQL, a way is:
SELECT format(x, "w") as DayOfTheYear, Max(y) as Yp GROUP BY format(x, "w") ORDER BY format(x, "w")
- the hour of the day (12am,...,11pm) or (0,...,23); the minute of the hour; the second of the minute; etc.
- any composition of the previous (the day of the week and hour of the day, etc).
You can compose them with the format() function when you can, or just by making a string
|
NOTES:
- In MS Jet SQL we strongly relay on the power of the format() function to perform easily date-time calculations.
Without this functions, some calculations as week of the month or day of the week, will take a relatively complex procedure to obtain the data from just the absolute date.
- Note that in the GROUPing queries all the fields should be a Statistical TOTAL function from fields in the original tables or queries, a
composition of such Statistics, or a GROUPing field or a composition from them. |
|