Table of Contents

There are many types of indexes in Postgres, as well as different ways to use them. In this article we give an overview of the types of indexes available, and explain different ways of using and maintaining the most common index type: B-Trees.

An index is a way to efficiently retrieve a relatively small number of rows from a table. It is only useful if the number of rows to be retrieved from a table is relatively small (i.e. the condition for retrieving rows - the WHERE clause - is selective). B-Tree indexes are also useful for avoiding sorting.

Index Types

Postgres supports many different index types:

  • B-Tree is the default that you get when you do CREATE INDEX. Virtually all databases will have some B-tree indexes. The B stands for Balanced, and the idea is that the amount of data on both sides of the tree is roughly the same. Therefore the number of levels that must be traversed to find rows is always in the same ballpark. B-Tree indexes can be used for equality and range queries efficiently. They can operate against all datatypes, and can also be used to retrieve NULL values. Btrees are designed to work very well with caching, even when only partially cached.
  • Hash Indexes are only useful for equality comparisons, but you pretty much never want to use them since they are not transaction safe, need to be manually rebuilt after crashes, and are not replicated to followers, so the advantage over using a B-Tree is rather small.
  • Generalized Inverted Indexes (GIN) are useful when an index must map many values to one row, whereas B-Tree indexes are optimized for when a row has a single key value. GINs are good for indexing array values as well as for implementing full-text search.
  • Generalized Search Tree (GiST) indexes allow you to build general balanced tree structures, and can be used for operations beyond equality and range comparisons. They are used to index the geometric data types, as well as full-text search.

This article is about how to get the most out of default B-Tree indexes. For examples of GIN and GiST index usage, refer to the contrib packages.

Why is my query not using an index?

There are many reasons why the Postgres planner may choose to not use an index. Most of the time, the planner chooses correctly, even if it isn’t obvious why. It’s okay if the same query uses an index scan on some occasions but not others. The number of rows retrieved from the table may vary based on the particular constant values the query retrieves. So, for example, it might be correct for the query planner to use an index for the query select * from foo where bar = 1, and yet not use one for the query select * from foo where bar = 2, if there happened to be far more rows with “bar” values of 2. When this happens, a sequential scan is actually most likely much faster than an index scan, so the query planner has in fact correctly judged that the cost of performing the query that way is lower.

Partial Indexes

A partial index covers just a subset of a table’s data. It is an index with a WHERE clause. The idea is to increase the efficiency of the index by reducing its size. A smaller index takes less storage, is easier to maintain, and is faster to scan.

For example, suppose you allow users to flag comments on your site, which in turn sets the flagged boolean to true. You then process flagged comments in batches. You may want to create an index like so:

CREATE INDEX articles_flagged_created_at_index ON articles(created_at) WHERE flagged IS TRUE;

This index will remain fairly small, and can also be used along other indexes on the more complex queries that may require it.

Expression Indexes

Expression indexes are useful for queries that match on some function or modification of your data. Postgres allows you to index the result of that function so that searches become as efficient as searching by raw data values. For example, you may require users to store their email addresses for signing in, but you want case insensitive authentication. In that case it’s possible to store the email address as is, but do searches on WHERE lower(email) = '<lowercased-email>'. The only way to use an index in such a query is with an expression index like so:

CREATE INDEX users_lower_email ON users(lower(email));

Another common example is for finding rows for a given date, where we’ve stored timestamps in a datetime field but want to find them by a date casted value. An index like CREATE INDEX articles_day ON articles ( date(published_at) ) can be used by a query containingWHERE date(articles.created_at) = date('2011-03-07').

Unique Indexes

A unique index guarantees that the table won’t have more than one row with the same value. It’s advantageous to create unique indexes for two reasons: data integrity and performance. Lookups on a unique index are generally very fast.

In terms of data integrity, using a validates_uniqueness_of validation on an ActiveModel class does not really guarantee uniqueness because there can and will be concurrent users creating invalid records. Therefore you should always create the constraint at the database level - either with an index or a unique constraint.

There is little distinction between unique indexes and unique constraints. Unique indexes can be though of as lower level, since expression indexes and partial indexes cannot be created as unique constraints. Even partial unique indexes on expressions are possible.

Multi-column Indexes

While Postgres has the ability to create multi-column indexes, it’s important to understand when it makes sense to do so. The Postgres query planner has the ability to combine and use multiple single-column indexes in a multi-column query by performing a bitmap index scan. In general, you can create an index on every column that covers query conditions and in most cases Postgres will use them, so make sure to benchmark and justify the creation of a multi-column index before you create them. As always, indexes come with a cost, and multi-column indexes can only optimize the queries that reference the columns in the index in the same order, while multiple single column indexes provide performance improvements to a larger number of queries.

However there are cases where a multi-column index clearly makes sense. An index on columns (a, b) can be used by queries containing WHERE a = x AND b = y, or queries using WHERE a = xonly, but will not be used by a query using WHERE b = y. So if this matches the query patterns of your application, the multi-column index approach is worth considering. Also note that in this case creating an index on a alone would be redundant.

B-Trees and sorting

B-Tree index entries are sorted in ascending order by default. In some cases it makes sense to supply a different sort order for an index. Take the case when you’re showing a paginated list of articles, sorted by most recent published first. We may have a published_at column on our articles table. For unpublished articles, the published_atvalue is NULL.

In this case we can create an index like so:

CREATE INDEX articles_published_at_index ON articles(published_at DESC NULLS LAST);

In Postgres 9.2 and above, it's of note that indexes are not always required to go to the table, provided we can get everything needed from the index (i.e. no unindexed columns are of interest). This feature is called "Index-only scans".

Since we will be querying the table in sorted order by published_atand limiting the result, we may get some benefit out of creating an index in the same order. Postgres will find the rows it needs from the index in the correct order, and then go to the data blocks to retrieve the data. If the index wasn’t sorted, there’s a good chance that Postgres would read the data blocks sequentially and sort the results.

This technique is mostly relevant with single column indexes when you require “nulls to sort last” behavior, because otherwise the order is already available since an index can be scanned in any direction. It becomes even more relevant when used against a multi-column index when a query requests a mixed sort order, like a ASC, b DESC.

Managing and Maintaining indexes

Indexes in Postgres do not hold all row data. Even when an index is used in a query and matching rows where found, Postgres will go to disk to fetch the row data. Additionally, row visibility information (discussed in the MVCC article) is not stored on the index either, therefore Postgres must also go to disk to fetch that information.

Having that in mind, you can see how in some cases using an index doesn’t really make sense. An index must be selective enough to reduce the number of disk lookups for it to be worth it. For example, a primary key lookup with a big enough table makes good use of an index: instead of sequentially scanning the table matching the query conditions, Postgres is able to find the targeted rows in an index, and then fetch them from disk selectively. For very small tables, for example a cities lookup table, an index may be undesirable, even if you search by city name. In that case, Postgres may decide to ignore the index in favor of a sequential scan. Postgres will decide to perform a sequential scan on any query that will hit a significant portion of a table. If you do have an index on that column, it will be a dead index that’s never used - and indexes are not free: they come at a cost in terms of storage and maintenance.

For more on running production, staging, and other environments for your Heroku application, take a look at ourManaging Multiple Environments article.

When tuning a query and understanding what indexes make the most sense, never try to it on your development machine. Whether an index is used or not depends on a number of factors, including the Postgres server configuration, the data in the table, the index and the query. For instance, trying to make a query use an index on your development machine with a small subset of “test data” will be frustrating: Postgres will determine that the dataset is so small that it’s not worth the overhead of reading through the index and then fetching the data from disk. Random I/O is much slower than sequential, so the cost of a sequential scan is lower than that of the random I/O introduced by reading the index and selectively finding the data on disk. Performing index tuning should be done on production, or on a staging environment that is as close to production as possible. On the Heroku Postgres database platform it is possible to copy your production database to a different environment quite easily.

When you are ready to apply an index on your production database, keep in mind that creating an index locks the table against writes. For big tables that can mean your site is down for hours. Fortunately Postgres allows you to CREATE INDEX CONCURRENTLY, which will take much longer to build, but does not require a lock that blocks writes. Ordinary CREATE INDEX commands require a lock that blocks writes but not reads.

Finally, indexes will become fragmented and unoptimized after some time, especially if the rows in the table are often updated or deleted. In those cases it may be required to perform a REINDEX leaving you with a balanced and optimized index. However be cautious about reindexing big indexes as write locks are obtained on the parent table. One strategy to achieve the same result on a live site is to build an index concurrently on the same table and columns but with a different name, and then dropping the original index and renaming the new one. This procedure, while much longer, won’t require any long running locks on the live tables.

Postgres provides a lot of flexibility when it comes to creating B-tree indexes that are optimized to your specific use cases, as well as options for managing the ever-growing database behind your applications. These tips should help you keep your database healthy, and your queries snappy.



https://devcenter.heroku.com/articles/postgresql-indexes

'RDBMS (PostgreSQL)' 카테고리의 다른 글

JDBC 트랜잭션  (0) 2015.07.30
트랜잭션 인사이드  (0) 2015.07.01
PostgreSQL 조인  (0) 2015.05.13
PostgreSQL 날짜&시간 사용하기  (0) 2015.05.13
PostgreSQL 컬럼 타입들  (0) 2015.05.13

PostgreSQL Date & Time Function






해당 날짜의 데이터 

select count(watt_max) from tbl_test_watt3_sm2ch_min
where to_char(regdate, 'YYYY-MM-DD') = '2016-10-17'

소요시간 : 124초  (150만건)  (하루: 20*60*60*24 = 1,728,000)   이렇게 하면 망함!

해당 날짜의 데이터 

select count(watt) from tbl_test_watt_lsh where 
regdate >= date '2016-10-17'
and regdate < date '2016-10-17' + integer '1'   // 여기선 하루 

소요시간 : 634ms  (150만건)  (하루: 20*60*60*24 = 1,728,000)


해당 날짜의 데이터 

select count(watt) from tbl_test_watt_lsh where 
regdate >=  current_date 
and regdate < current_date + 1

소요시간 : 634ms  (150만건)  (하루: 20*60*60*24 = 1,728,000)


해당 시간의 데이터 

select count(watt) from tbl_test_watt_lsh where regdate
between  to_timestamp('2016-10-17 07:40:00' , 'YYYY-MM-DD HH24:MI:SS')  and   to_timestamp('2016-10-17 07:43:00', 'YYYY-MM-DD HH24:MI:SS')

소요시간 : 14ms  (3600건)  (하루: 20*60*60*24 = 1,728,000)


select count(watt) from tbl_test_watt_lsh where regdate
between  to_timestamp('2016-10-17 07:40:00' , 'YYYY-MM-DD HH24:MI:SS')  and   to_timestamp('2016-10-17 07:43:00', 'YYYY-MM-DD HH24:MI:SS') + interval '1'    // 여기선 1초 

소요시간 : 14ms  (3620건)


select count(watt) from tbl_test_watt_lsh where regdate
between  to_timestamp('2016-10-17 07:40:00' , 'YYYY-MM-DD HH24:MI:SS')  and   to_timestamp('2016-10-17 07:43:00', 'YYYY-MM-DD HH24:MI:SS') + interval '1' HOUR   // 여기선 1 시간  

소요시간 : 23ms  (63340건)

select count(watt) from tbl_test_watt_lsh where regdate
between  to_timestamp('2016-10-17 00:00:00' , 'YYYY-MM-DD HH24:MI:SS')  and   to_timestamp('2016-10-17 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + '1-1 00:00:00' // 여기선 하루 , 인터벌 값을 이렇게 나타낼 수 있다 

'10-10' 는 10년 10개월 

소요시간 : 433ms  (1496720건)


select * from tbl_test_watt_lsh where to_char(regdate , 'YYYY-MM-DD HH24;MI:SS') > '2016-10-17 07:40:00' AND  to_char(regdate , 'YYYY-MM-DD HH24;MI:SS') < '2016-10-17 07:43:00'

역시 to_char 사용하면 망함!






http://daemonjin.tistory.com/564  펌 


간단한 날짜 관련 함수 및 날짜 연산:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- 오늘 (date)
select current_date;
 

-- 현재시각 (timestamp)
select now();
select current_timestamp;
 

-- 어제/오늘/내일
select
  current_date - 1 "어제",
  current_date     "오늘",
  current_date + 1 "내일"
;
 

-- day of week
select extract(dow from current_date);    -- 일요일(0) ~ 토요일(6)
select extract(isodow from current_date); -- 월요일(1) ~ 일요일(7)
 

-- day of year
select extract(doy from current_date);
 

-- week of year
select extract(week from current_date);
 

-- 두 날짜 사이의 날수
select '2010-07-05'::date - '2010-06-25'::date;





한 주의 첫날, 마지막 날 구하기:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42

-- 주 첫 날을 월요일로 할 때 주 첫날, 마지막 날
-- date_trunc() 함수의 리턴 타입은 timestamp임
 
-- 이번 주
select
  date_trunc('week', current_date)::date          "이번 주 첫날",
  date_trunc('week', current_date)::date + 6      "이번 주 마지막 날"
;
 
-- 전 주
select
  date_trunc('week', current_date - 7)::date      "전 주 첫날",
  date_trunc('week', current_date - 7)::date + 6  "전주 마지막 날"
;
 
-- 다음 주
select
  date_trunc('week', current_date + 7)::date      "다음 주 첫날",
  date_trunc('week', current_date + 7)::date + 6  "다음주 마지막 날"
;
 
-- (주 첫 날을 일요일로 할 때) 주 첫날/마지막 날
-- week로 date_trunc를 하는 경우 결과가 월요일 날짜가 되기 때문에
-- 한 주를 일요일~토요일까지로 하는 경우는 -1 필요
 
-- 이번 주
select
  date_trunc('week', current_date)::date - 1         "이번 주 첫날",
  date_trunc('week', current_date)::date + 6 - 1     "이번 주 마지막 날"
;
 
-- 전 주
select
  date_trunc('week', current_date - 7)::date - 1     "전 주 첫날",
  date_trunc('week', current_date - 7)::date + 6 - 1 "전주 마지막 날"
;
 
-- 다음 주
select
  date_trunc('week', current_date + 7)::date - 1     "다음 주 첫날",
  date_trunc('week', current_date + 7)::date + 6 - 1 "다음주 마지막 날"
;




한 달의 첫날, 마지막 날 구하기:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23

-- 한 달 전/후 날짜
select
  current_date - interval '1 months' "전 달",
  current_date + interval '1 months' "다음 달"
;
 
-- 이번 달 첫날, 마지막 날
select
  date_trunc('month', current_date)::date "첫날",
  date_trunc('month', current_date + interval '1 months')::date - 1 "마지막 날"
;
 
-- 전달 첫날, 마지막 날
select
  date_trunc('month', current_date - interval '1 months')::date "첫 날",
  date_trunc('month', current_date)::date - 1 "마지막 날"
;
 
-- 다음 달 첫날, 마지막 날
select
  date_trunc('month', current_date + interval '1 months')::date "첫 날",
  date_trunc('month', current_date + interval '2 months')::date - 1 "마지막 날"
;





이번 주 첫날부터 마지막 날까지 날짜들:


1
2
3
4
5

-- 이번 주 날짜
select
  date_trunc('week', current_date)::date -1 + i "일~토",
  date_trunc('week', current_date)::date    + i "월~일"
from generate_series(0,6) as t(i);




이번 달 첫날부터 마지막 날까지 날짜들. generate_series() 함수를 사용한다. 

한 달이 28일, 29일, 30일, 31일 중 어떤 것이 될지 알 수 없기 때문에 월의 마지막날을 구해 

generate_series()의 두번째 인수로 넣어준다.


1
2
3
4
5
6
7
8
9
10
11
12
13
14
-- 이번 달 날짜 (첫날 ~ 마지막 날)
select date_trunc('month', current_date)::date + (i - 1)
from
  generate_series(
    1,
    extract(day from date_trunc('month', current_date + interval '1 months')::date - 1)::integer
  ) as t(i);
 
select date_trunc('month', current_date)::date + (i - 1)
from
  generate_series(
    1,
    extract(day from date_trunc('month', current_date) + interval '1 months' - interval '1 days')::integer
  ) as t(i);





week of month

이번 달의 첫날부터 마지막 날까지의 날짜와 week of month를 구하는 쿼리인데, 1일~7일까지는 첫째 주, 8

일~14일 까지는 둘째 주와 같은 식으로 된다. 역시 generate_series() 함수를 사용했는데, 위와 같이 첫 날과

마지막 날의 차를 구해 수열을 만들지 않고 0~30까지 만들어 무조건 더하면서 이번 달에 속하는 날짜만 

where 절 조건으로 추려내게 했다.


1
2
3
4
5
6
7
select dt, to_char(dt, 'W') "day of week"
from (
    select date_trunc('month', current_date)::date + i dt
    from generate_series(0, 30) as t(i)
    ) t
where extract(month from dt) = extract(month from current_date)
;




'RDBMS (PostgreSQL)' 카테고리의 다른 글

JDBC 트랜잭션  (0) 2015.07.30
트랜잭션 인사이드  (0) 2015.07.01
PostgreSQL 조인  (0) 2015.05.13
Efficient Use of PostgreSQL Indexes  (0) 2015.05.13
PostgreSQL 컬럼 타입들  (0) 2015.05.13

 Data Types

NameAliasesDescription
bigintint8signed eight-byte integer
bigserialserial8autoincrementing eight-byte integer
bit [ (n) ] fixed-length bit string
bit varying [ (n) ]varbitvariable-length bit string
booleanboollogical Boolean (true/false)
box rectangular box on a plane
bytea binary data ("byte array")
character [ (n) ]char [ (n) ]fixed-length character string
character varying [ (n) ]varchar [ (n) ]variable-length character string
cidr IPv4 or IPv6 network address
circle circle on a plane
date calendar date (year, month, day)
double precisionfloat8double precision floating-point number (8 bytes)
inet IPv4 or IPv6 host address
integerintint4signed four-byte integer
interval [ fields ] [ (p) ] time span
json JSON data
line infinite line on a plane
lseg line segment on a plane
macaddr MAC (Media Access Control) address
money currency amount
numeric [ (ps) ]decimal [ (ps) ]exact numeric of selectable precision
path geometric path on a plane
point geometric point on a plane
polygon closed geometric path on a plane
realfloat4single precision floating-point number (4 bytes)
smallintint2signed two-byte integer
smallserialserial2autoincrementing two-byte integer
serialserial4autoincrementing four-byte integer
text variable-length character string
time [ (p) ] [ without time zone ] time of day (no time zone)
time [ (p) ] with time zonetimetztime of day, including time zone
timestamp [ (p) ] [ without time zone ] date and time (no time zone)
timestamp [ (p) ] with time zonetimestamptzdate and time, including time zone
tsquery text search query
tsvector text search document
txid_snapshot user-level transaction ID snapshot
uuid universally unique identifier
xml XML data


'RDBMS (PostgreSQL)' 카테고리의 다른 글

JDBC 트랜잭션  (0) 2015.07.30
트랜잭션 인사이드  (0) 2015.07.01
PostgreSQL 조인  (0) 2015.05.13
Efficient Use of PostgreSQL Indexes  (0) 2015.05.13
PostgreSQL 날짜&시간 사용하기  (0) 2015.05.13

+ Recent posts