I use PostgreSQL 9.1 on Ubuntu 12.04.
I need to select records within a time range. My table time_limits has two timestamp fields and one integer property. My actual table has additional columns that are not involved in this query.
create table time_limits (
    start_date_time timestamp,
    end_date_time timestamp,
    id_phi integer,
    primary key (start_date_time, end_date_time, id_phi)
);
This table contains roughly 2M records.
Queries like the following took enormous amounts of time:
select * from time_limits as t
where t.id_phi=0
and t.start_date_time <= timestamp'2010-08-08 00:00:00'
and t.end_date_time >= timestamp'2010-08-08 00:05:00';
So I tried adding another index - the inverse of the PK:
create index idx_inversed on time_limits(id_phi, start_date_time, end_date_time);
I got the impression that performance improved: accessing records in the middle of the table now seems more reasonable, somewhere between 40 and 90 seconds.
But that is still several tens of seconds for values in the middle of the time range, and twice as long when targeting the end of the table (chronologically speaking).
I tried explain analyze for the first time and got this query plan:
Bitmap Heap Scan on time_limits (cost=4730.38..22465.32 rows=62682 width=36) (actual time=44.446..44.446 rows=0 loops=1)
Recheck Cond: ((id_phi = 0) AND (start_date_time <= '2011-08-08 00:00:00'::timestamp without time zone) AND (end_date_time >= '2011-08-08 00:05:00'::timestamp without time zone))
-> Bitmap Index Scan on idx_time_limits_phi_start_end (cost=0.00..4714.71 rows=62682 width=0) (actual time=44.437..44.437 rows=0 loops=1)
Index Cond: ((id_phi = 0) AND (start_date_time <= '2011-08-08 00:00:00'::timestamp without time zone) AND (end_date_time >= '2011-08-08 00:05:00'::timestamp without time zone))
Total runtime: 44.507 ms
See the results on depesz.com.
What could I do to optimize the search? You can see that all the time is spent scanning the two timestamp columns once id_phi is set to 0. And I don't understand the big scan (60K rows!) on the timestamps. Aren't they indexed by the primary key and the idx_inversed index I added?
Should I change from timestamp types to something else?
I have read a little about GiST and GIN indexes. I gather they can be more efficient under certain conditions for custom types. Are they a viable option for my use case?
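To make the GiST question concrete, here is a rough sketch of what I have in mind. It rests on several assumptions: it uses the tsrange range type, which as far as I know only exists from PostgreSQL 9.2 on, so it would not run on my 9.1 install as-is; it relies on the btree_gist extension so that the integer column can take part in a GiST index; the index name idx_time_limits_phi_range is made up; and it assumes start_date_time is never later than end_date_time, since the range constructor rejects such rows.

```sql
-- Sketch only: assumes PostgreSQL 9.2+ (range types) and the btree_gist extension.
create extension if not exists btree_gist;

-- Hypothetical index name. Combines the equality column with a closed
-- range built from the two timestamp columns.
create index idx_time_limits_phi_range
    on time_limits
    using gist (id_phi, tsrange(start_date_time, end_date_time, '[]'));

-- Same condition as my original query, written as range containment:
-- the stored interval must contain the whole searched interval.
select *
from time_limits as t
where t.id_phi = 0
  and tsrange(t.start_date_time, t.end_date_time, '[]')
      @> tsrange(timestamp '2010-08-08 00:00:00',
                 timestamp '2010-08-08 00:05:00', '[]');
```

The where clause has to repeat the exact expression used in the index, otherwise I would not expect the planner to consider it. I have not measured whether this beats the B-tree indexes on my data.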