Wednesday, September 7, 2016

PostgreSQL: Optimizing Aggregates

Aggregation of vast energy: our sun.
It's time to write about PostgreSQL 9.6 again. With the Release Candidate 1 out in the wild, we slowly approach the end of a very interesting development cycle. Here I would like to talk about the development in the field of aggregates. So, two commits, 9/552 and 9/435, will improve the performance of queries using aggregate functions and GROUP BY clauses:

Title Combine Aggs: Serialize/Deserialize Internal aggregate states
Topic Server Features
Created 2016-02-29 04:08:46
Last modified 2016-04-08 17:49:52 (5 months ago)
Latest email 2016-04-09 10:13:21 (5 months ago)
Status2016-03: Committed
Authors David Rowley (davidrowley)
Reviewers Robert Haas (rhaas)Become reviewer
Committer Robert Haas (rhaas)
(full discussion)
Title Remove Functionally Dependent GROUP BY Columns
Topic Performance
Created 2015-12-02 10:23:32
Last modified 2016-02-11 22:52:25 (6 months, 4 weeks ago)
Latest email 2016-02-14 21:16:35 (6 months, 3 weeks ago)
Status2016-03: Committed
2016-01: Moved to next CF
Authors David Rowley (davidrowley)
Reviewers Julien Rouhaud (rjuju)Become reviewer
Committer Tom Lane (tgl)
(full discussion)

The former commit basically improves the performance of queries that look like that:

SELECT AVG(c), VARIANCE(c), SUM(c) FROM my_table;

Here they basically reduce amount of work to a single calculation for each row which previously would do redundant calculations for each result column. This speedup is due to sharing the internal state a common transition function of those aggregate functions.

Another extra is that sharing independent state variables across different aggregates allows parallelizinig the computation of these variables. In case of the functions above which share [count, sum, sum of squares], all three are independent.

The discussion around this topic started in the end of 2014 and ended in spring 2016. So, Robert Haas committed the patch and noted "Man, that was a lot of work" indicating the difficulties in implementing this infrastructure correctly.


The latter commit basically removed useless GROUP BY arguments, i.e. if they functionally depend on each other. So, the GROUP BY statement can be simplified. The PostgreSQL planner does this in 9.6 now. There are some quite impressive performance gains (up to 50%) with some rather small performance losses due to increased planning overhead.

Its main use-case are those users migrating from database systems which do not allow to omit redundant GROUP BY arguments when these appear in the SELECT clause.


That's it for now.

Sven

No comments:

Post a Comment