parallelisation of queries

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

parallelisation of queries

Francesco De Angelis
Hi all.
I am working on the optimization of a simple query and I would like to get more information about the parallel optimisation performed by the DBMS.
My real query is composed of several subqueries connected by ‘union all’ operators, but for the sake of clarity I will discard everything that is not relevant for the understanding and I will reproduce the same “strange” (at least to me) behavior with the following easy example.

Let’s consider a simple table (Test) and the following query:

explain (costs off) select (
    (select count(value) from Test)
    +(select count(value) from Test)
    +(select count(value) from Test)
    +(select count(value) from Test));

I expect that its subqueries (i.e. ‘select count(value) from Test’) are executed in parallel before finally aggregating all the results, but:

1) Even after explicitly setting the correct value of ‘max_parallel_workers_per_gather’ (with ‘force_parallel_mode=On’), the execution time grows linearly over the number of subqueries.
This happens also with a total number of queries that is less than the number of CPUs on my server (and ‘max_parallel_workers_per_gather’ guarantees that there are many available workers);

2) The output of the query, which justifies the aforementioned phenomenon, is reported below.

------------
InitPlan 1                                                
    ->  Finalize Aggregate                                                  
          ->  Gather                                                        
                Workers Planned: 4                                          
                ->  Partial Aggregate                                        
                      ->  Parallel Seq Scan on Test test  |
  InitPlan 2                                               
    ->  Finalize Aggregate                                                  
          ->  Gather                                                        
                Workers Planned: 4                                          
                ->  Partial Aggregate                                        
                      ->  Parallel Seq Scan on Test test_1|
  InitPlan 3                                                
    ->  Finalize Aggregate                                                  
          ->  Gather                                                        
                Workers Planned: 4                                          
                ->  Partial Aggregate                                        
                      ->  Parallel Seq Scan on Test test_2|
  InitPlan 4                                            
    ->  Finalize Aggregate                                                  
          ->  Gather                                                        
                Workers Planned: 4                                          
                ->  Partial Aggregate                                        
                      ->  Parallel Seq Scan on Test test_3|
------------

The output shows that there is one Gather for each subquery. This means that the DBMS decides to run the root query (i.e. the four subqueries) sequentially and then each subquery in parallel with 4 workers.
The “strange” thing is that the DBMS keeps optimising each internal subquery by executing only 4 workers even when the total number of subqueries increases (I tested up to 64 subqueries); in all such cases the number of CPUs on the server is way bigger, so I expect the parallel execution could lead to a further improvement if each subquery is assigned to a worker.
I also guess that the workers do not scale up because the size of the table is constant: in other terms, I think the size of the table limits the speedup that could be achieved in reality when working with an embarrassingly parallel workload.
If my interpretation is correct then, by selecting a specific size of the table, we may even end up in a scenario where only one thread is used for the whole query (i.e. no parallel optimization at all), even though the global workload would justify the execution in parallel on several CPUs.

At this point I have a couple of questions:
1) Is my interpretation correct?
2) Is there a way to “force” the DBMS to dispatch each subquery to a dedicated worker, which would become responsible for it? In other terms, is there a way to use more workers?

Thanks for your support,
I wish you a very good day.

Best regards,
Francesco De Angelis