A few remarks on JIT, parallel query execution and columnar store...
Recently I had to estimate the performance of a select with
multiple search conditions with poor selectivity.
It is clearly some kind of OLAP query, and it is interesting
for me to understand the role of different PostgreSQL optimization
So the table is the following:
create table t(pk bigint primary key, val1 double, val2 double, val3
insert into t select s."#1" as pk, rnd() as val1, rnd() as val2,
rnd() as val3 from generate_series(1,10000000) s;
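For readers who want to reproduce the setup on stock PostgreSQL, here is a minimal sketch. It assumes that "double" above means double precision and that rnd() stands for the built-in random(); both substitutions are my assumptions, not something stated in the post. The final vacuum analyze is also an addition.

```sql
-- Assumption: "double" is read as double precision, rnd() as random().
create table t(pk bigint primary key,
               val1 double precision,
               val2 double precision,
               val3 double precision);

-- Fill the table with 10 million rows of uniformly distributed values.
insert into t
select s as pk, random() as val1, random() as val2, random() as val3
from generate_series(1, 10000000) s;

-- Refresh planner statistics after the bulk load.
vacuum analyze t;
```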
I ran the following query on a standard desktop with a quad-core CPU and
16 GB of RAM, with shared buffers adjusted to fit the whole database:
select count(*) from t where val1>=0.5 and val2<=0.5 and val3
between 0.2 and 0.6;
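The post does not show how the configurations were switched. One plausible way, sketched here as an assumption, is to change the jit and max_parallel_workers_per_gather settings per session and time the query with explain analyze. This requires a PostgreSQL build with LLVM support for JIT.

```sql
-- Assumption: session-level settings; the original post does not list them.
set jit = off;                             -- switch to on for the JIT runs
set max_parallel_workers_per_gather = 0;   -- 0 disables parallel query; try 2 or 4

-- explain analyze reports the execution time and, when JIT is used,
-- a separate JIT section with code generation timings.
explain (analyze, buffers)
select count(*) from t
where val1 >= 0.5 and val2 <= 0.5 and val3 between 0.2 and 0.6;
```

Note that JIT is applied only when the estimated plan cost exceeds jit_above_cost, so it is worth checking the explain output to confirm that JIT was actually used.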
Results are the following:
So without parallelism JIT provides some speedup, but in the
case of parallel execution the effect of JIT is negative.
This is most likely because JIT generation time (30 msec) is comparable with
Conclusion: for a sequential scan of 10 million records, JIT is not
able to provide a performance improvement.
Let's increase the number of records 10 times.
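One way to do this, assuming the table is grown in place rather than recreated (the post does not say which), is to append the remaining 90 million rows. As before, random() is my stand-in for rnd().

```sql
-- Assumption: extend the existing table from 10 million to 100 million rows.
insert into t
select s as pk, random() as val1, random() as val2, random() as val3
from generate_series(10000001, 100000000) s;

-- Refresh planner statistics so row estimates match the new table size.
vacuum analyze t;
```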
Now results are the following:
So now JIT is faster for both sequential and parallel execution.
But it is not the fastest result of processing this query with
Let's try my extension VOPS (https://github.com/postgrespro/vops):
So VOPS is more than 3 times faster than JIT, and it looks like it can
provide even better results for a larger number of records,
because, as you can see, increasing the number of workers from 2 to 4
increases performance by only about 30% rather than two times.
It looks like the overhead of starting a parallel worker is too large, and for
a query execution time < 1 second it has a noticeable impact on total
In another prototype DBMS of mine, with vertical data representation
and multithreaded execution, the execution time of this query is 195
So there is still scope for improvement :)