Search This Blog

Friday 30 October 2009

Intervals intersection in SQL

An interesting problem, elegantly solved with the use of SQL language, is the calculation of intersections between sets of intervals.

An example of this problem is arranging a meeting for a group of people, each person lists his availabilities specifying a set of time intervals

We need to find out the time slots in which all the persons in the group are able to participate in that meeting.

Ex.:

PERSONS
    1   3   5         10    12             20   21
A   ----    ----------------                ----
B     --------        ------
C         ----------------------

the solution is the following set of intervals
{[5,6], [10,12]}


Let's create the table TIMESLOTS (uid, start, end) whose rows contain for each person, identified by uid, timeslots which fit. They are defined by the start and end column.
The table in our example would be populated as follows
person start end
A 1 3
A 5 12
A 20 21
B 2 6
B 10 13
C 4 14

Assuming that in the table TIMESLOTS for each person timeslots are disjoint, that is they don't overlap (this is a non-restrictive assumption, in general) we can solve the problem of the intersection by mean of the following considerations:

the solution contains intervals whose delimiters are delimiters of some intervals in TIMESLOTS: actually, for example, the solution interval [5, 6] is delimited by the point 5, being a delimiter for the interval [5, 12] belonging to the A person, and by the point 6, being a delimiter for the interval [2, 6] belonging to the person B.

Considering the Cartesian product of the table TIMESLOTS, we get couples of the form ([fst.start, fst.end], [lst.start, lst.end]) representing all the possible combinations of delimiters for the intervals included in the table. The solution will then contain intervals of the form [fst.start, lst.end]. Among all these couples, those that concern us are the ones for which it holds fst.start < lst.end, namely the left delimiter is less than the right one, and for which every person has a suitable time availability, that is for every person it exists a "covering" interval.

This idea can be translated in the following SQL query:

SELECT DISTINCT fst.start, lst.end
FROM timeslots fst, timeslots lst
WHERE fst.start < lst.end
and
( -- number of intervals covering [fst.start, lst.end]
SELECT COUNT(*)
FROM timeslots t
WHERE t.start <= fst.start AND t.end >= lst.end
) =
( -- number of persons in the table
SELECT COUNT(DISTINCT uid)
FROM timeslots
)


the hypothesis that the intervals of a single person must be separated, in TIMESLOTS, is justified by the subquery

SELECT COUNT(*)
FROM timeslots t
WHERE t.start <= fst.start AND t.end >= lst.end


that only under such circumstances makes it possible to count the number of people who have time availability compatible with the time interval [fst.start, lst.end].

Bibliography: Vadim Tropashko, "SQL Design Patterns: Expert Guide to SQL Programming", Rampant Techpress

No comments:

Post a Comment