Parallel Seq Scan vs kernel read ahead

Parallel Seq Scan vs kernel read ahead

Thomas Munro
Hello hackers,

Parallel sequential scan relies on the kernel detecting sequential
access, but we don't make the job easy.  The resulting striding
pattern works terribly on strict next-block systems like FreeBSD UFS,
and degrades rapidly when you add too many workers on sliding window
systems like Linux.
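
For context, the current allocator hands out one block at a time,
roughly like the sketch below (simplified from
table_block_parallelscan_nextpage(); synchronized-scan start offsets
and other details omitted).  With several workers consuming from the
shared counter, each worker's own request stream is full of gaps,
which is what defeats the kernel's read-ahead heuristics:

#include "postgres.h"
#include "port/atomics.h"
#include "storage/block.h"

/*
 * Simplified sketch of today's one-block-at-a-time allocation.  Each
 * worker calls this in a loop; with three workers, one of them might
 * see blocks 0, 3, 5, 9, ... -- strides the kernel may not treat as
 * sequential access.
 */
static BlockNumber
sketch_nextpage(pg_atomic_uint64 *nallocated, BlockNumber rel_nblocks)
{
    uint64      page = pg_atomic_fetch_add_u64(nallocated, 1);

    if (page >= (uint64) rel_nblocks)
        return InvalidBlockNumber;      /* scan finished */
    return (BlockNumber) page;
}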

Demonstration using FreeBSD on UFS on a virtual machine, taking
ballpark figures from iostat:

  create table t as select generate_series(1, 200000000)::int i;

  set max_parallel_workers_per_gather = 0;
  select count(*) from t;
  -> execution time 13.3s, average read size = ~128kB, ~500MB/s

  set max_parallel_workers_per_gather = 1;
  select count(*) from t;
  -> execution time 24.9s, average read size = ~32kB, ~250MB/s

Note the small read size, which means that there was no read
clustering happening at all: that's the logical block size of this
filesystem.

That explains some complaints I've heard about PostgreSQL performance
on that filesystem: parallel query destroys I/O performance.

As a quick experiment, I tried teaching the block allocator to
allocate ranges of up to 64 blocks at a time, ramping up incrementally,
and ramping down at the end, and I got:

  set max_parallel_workers_per_gather = 1;
  select count(*) from t;
  -> execution time 7.5s, average read size = ~128kB, ~920MB/s

  set max_parallel_workers_per_gather = 3;
  select count(*) from t;
  -> execution time 5.2s, average read size = ~128kB, ~1.2GB/s

I've attached the quick and dirty patch I used for that.
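
In outline, the idea looks like the following sketch (illustrative
only; not the attached patch, which also ramps the range size up and
down): workers claim a range of blocks from the shared counter and
then read it sequentially on their own, so the kernel sees runs of
consecutive requests.

#include "postgres.h"
#include "port/atomics.h"
#include "storage/block.h"

#define SKETCH_MAX_CHUNK 64     /* blocks per serving; cap from the experiment */

/*
 * Claim the next range of blocks.  Returns false when the scan is
 * exhausted; otherwise the caller owns blocks *first .. *first +
 * *nblocks_out - 1 and scans them sequentially.  The clamp at the end
 * of the relation is a crude stand-in for the patch's ramp-down.
 */
static bool
sketch_next_chunk(pg_atomic_uint64 *nallocated, BlockNumber rel_nblocks,
                  BlockNumber *first, int *nblocks_out)
{
    uint64      start = pg_atomic_fetch_add_u64(nallocated, SKETCH_MAX_CHUNK);

    if (start >= (uint64) rel_nblocks)
        return false;
    *first = (BlockNumber) start;
    *nblocks_out = (int) Min((uint64) SKETCH_MAX_CHUNK,
                             (uint64) rel_nblocks - start);
    return true;
}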

Attachment: 0001-Use-larger-step-sizes-for-Parallel-Seq-Scan.patch (10K)

Re: Parallel Seq Scan vs kernel read ahead

Amit Kapila
On Wed, May 20, 2020 at 7:24 AM Thomas Munro <[hidden email]> wrote:

>
> Hello hackers,
>
> Parallel sequential scan relies on the kernel detecting sequential
> access, but we don't make the job easy.  The resulting striding
> pattern works terribly on strict next-block systems like FreeBSD UFS,
> and degrades rapidly when you add too many workers on sliding window
> systems like Linux.
>
> Demonstration using FreeBSD on UFS on a virtual machine, taking
> ballpark figures from iostat:
>
>   create table t as select generate_series(1, 200000000)::int i;
>
>   set max_parallel_workers_per_gather = 0;
>   select count(*) from t;
>   -> execution time 13.3s, average read size = ~128kB, ~500MB/s
>
>   set max_parallel_workers_per_gather = 1;
>   select count(*) from t;
>   -> execution time 24.9s, average read size = ~32kB, ~250MB/s
>
> Note the small read size, which means that there was no read
> clustering happening at all: that's the logical block size of this
> filesystem.
>
> That explains some complaints I've heard about PostgreSQL performance
> on that filesystem: parallel query destroys I/O performance.
>
> As a quick experiment, I tried teaching the block allocator to
> allocate ranges of up to 64 blocks at a time, ramping up incrementally,
> and ramping down at the end, and I got:
>

Good experiment.  IIRC, we have discussed a similar idea during the
development of this feature but we haven't seen any better results by
allocating in ranges on the systems we have tried.  So, we went with
the current approach, which is more granular and seems to allow better
parallelism.  I feel we need to ensure that we don't regress
parallelism in existing cases; otherwise, the idea sounds promising to
me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Seq Scan vs kernel read ahead

Thomas Munro
On Wed, May 20, 2020 at 2:23 PM Amit Kapila <[hidden email]> wrote:
> Good experiment.  IIRC, we have discussed a similar idea during the
> development of this feature but we haven't seen any better results by
> allocating in ranges on the systems we have tried.  So, we went with
> the current approach, which is more granular and seems to allow better
> parallelism.  I feel we need to ensure that we don't regress
> parallelism in existing cases; otherwise, the idea sounds promising to
> me.

Yeah, Linux seems to do pretty well at least with smallish numbers of
workers, and when you use large numbers you can probably tune your way
out of the problem.  ZFS seems to do fine.  I wonder how well the
other OSes cope.



Re: Parallel Seq Scan vs kernel read ahead

Ranier Vilela
On Wed, May 20, 2020 at 00:09, Thomas Munro <[hidden email]> wrote:
On Wed, May 20, 2020 at 2:23 PM Amit Kapila <[hidden email]> wrote:
> Good experiment.  IIRC, we have discussed a similar idea during the
> development of this feature but we haven't seen any better results by
> allocating in ranges on the systems we have tried.  So, we went with
> the current approach, which is more granular and seems to allow better
> parallelism.  I feel we need to ensure that we don't regress
> parallelism in existing cases; otherwise, the idea sounds promising to
> me.

Yeah, Linux seems to do pretty well at least with smallish numbers of
workers, and when you use large numbers you can probably tune your way
out of the problem.  ZFS seems to do fine.  I wonder how well the
other OSes cope.
Windows 10 (64-bit, i5, 8GB, SSD)

postgres=# set max_parallel_workers_per_gather = 0;
SET
Time: 2,537 ms
postgres=#  select count(*) from t;
   count
-----------
 200000000
(1 row)


Time: 47767,916 ms (00:47,768)
postgres=# set max_parallel_workers_per_gather = 1;
SET
Time: 4,889 ms
postgres=#  select count(*) from t;
   count
-----------
 200000000
(1 row)


Time: 32645,448 ms (00:32,645)

How do you display " -> execution time 5.2s, average read size ="?

regards,
Ranier Vilela

Re: Parallel Seq Scan vs kernel read ahead

Thomas Munro
On Wed, May 20, 2020 at 11:03 PM Ranier Vilela <[hidden email]> wrote:
> Time: 47767,916 ms (00:47,768)
> Time: 32645,448 ms (00:32,645)

Just to make sure kernel caching isn't helping here, maybe try making
the table 2x or 4x bigger?  My test was on a virtual machine with only
4GB RAM, so the table couldn't be entirely cached.

> How display " -> execution time 5.2s, average read size ="?

Execution time is what you showed, and average read size should be
inside the Windows performance window somewhere (not sure what it's
called).



Re: Parallel Seq Scan vs kernel read ahead

Ranier Vilela
On Wed, May 20, 2020 at 18:49, Thomas Munro <[hidden email]> wrote:
On Wed, May 20, 2020 at 11:03 PM Ranier Vilela <[hidden email]> wrote:
> Time: 47767,916 ms (00:47,768)
> Time: 32645,448 ms (00:32,645)

Just to make sure kernel caching isn't helping here, maybe try making
the table 2x or 4x bigger?  My test was on a virtual machine with only
4GB RAM, so the table couldn't be entirely cached.
4x bigger.
Postgres default settings.

postgres=# create table t as select generate_series(1, 800000000)::int i;
SELECT 800000000
postgres=# \timing
Timing is on.
postgres=# set max_parallel_workers_per_gather = 0;
SET
Time: 8,622 ms
postgres=# select count(*) from t;
   count
-----------
 800000000
(1 row)


Time: 227238,445 ms (03:47,238)
postgres=# set max_parallel_workers_per_gather = 1;
SET
Time: 20,975 ms
postgres=# select count(*) from t;
   count
-----------
 800000000
(1 row)


Time: 138027,351 ms (02:18,027)

regards,
Ranier Vilela

Re: Parallel Seq Scan vs kernel read ahead

Thomas Munro
On Thu, May 21, 2020 at 11:15 AM Ranier Vilela <[hidden email]> wrote:
> postgres=# set max_parallel_workers_per_gather = 0;
> Time: 227238,445 ms (03:47,238)
> postgres=# set max_parallel_workers_per_gather = 1;
> Time: 138027,351 ms (02:18,027)

Ok, so it looks like NT/NTFS isn't suffering from this problem.
Thanks for testing!



Re: Parallel Seq Scan vs kernel read ahead

Ranier Vilela
On Wed, May 20, 2020 at 20:48, Thomas Munro <[hidden email]> wrote:
On Thu, May 21, 2020 at 11:15 AM Ranier Vilela <[hidden email]> wrote:
> postgres=# set max_parallel_workers_per_gather = 0;
> Time: 227238,445 ms (03:47,238)
> postgres=# set max_parallel_workers_per_gather = 1;
> Time: 138027,351 ms (02:18,027)

Ok, so it looks like NT/NTFS isn't suffering from this problem.
Thanks for testing!
Maybe it wasn't clear: the tests were done with your patch applied.

regards,
Ranier Vilela

Re: Parallel Seq Scan vs kernel read ahead

Thomas Munro
On Thu, May 21, 2020 at 11:51 AM Ranier Vilela <[hidden email]> wrote:

> On Wed, May 20, 2020 at 20:48, Thomas Munro <[hidden email]> wrote:
>> On Thu, May 21, 2020 at 11:15 AM Ranier Vilela <[hidden email]> wrote:
>> > postgres=# set max_parallel_workers_per_gather = 0;
>> > Time: 227238,445 ms (03:47,238)
>> > postgres=# set max_parallel_workers_per_gather = 1;
>> > Time: 138027,351 ms (02:18,027)
>>
>> Ok, so it looks like NT/NTFS isn't suffering from this problem.
>> Thanks for testing!
>
> Maybe it wasn't clear: the tests were done with your patch applied.

Oh!  And how do the times look without it?



Re: Parallel Seq Scan vs kernel read ahead

Ranier Vilela
On Wed, May 20, 2020 at 21:03, Thomas Munro <[hidden email]> wrote:
On Thu, May 21, 2020 at 11:51 AM Ranier Vilela <[hidden email]> wrote:
> On Wed, May 20, 2020 at 20:48, Thomas Munro <[hidden email]> wrote:
>> On Thu, May 21, 2020 at 11:15 AM Ranier Vilela <[hidden email]> wrote:
>> > postgres=# set max_parallel_workers_per_gather = 0;
>> > Time: 227238,445 ms (03:47,238)
>> > postgres=# set max_parallel_workers_per_gather = 1;
>> > Time: 138027,351 ms (02:18,027)
>>
>> Ok, so it looks like NT/NTFS isn't suffering from this problem.
>> Thanks for testing!
>
> Maybe it wasn't clear: the tests were done with your patch applied.

Oh!  And how do the times look without it?
Vanilla Postgres (latest)

create table t as select generate_series(1, 800000000)::int i;
 set max_parallel_workers_per_gather = 0;
Time: 210524,317 ms (03:30,524)
set max_parallel_workers_per_gather = 1;
Time: 146982,737 ms (02:26,983)

regards,
Ranier Vilela

Re: Parallel Seq Scan vs kernel read ahead

Thomas Munro
On Thu, May 21, 2020 at 1:38 PM Ranier Vilela <[hidden email]> wrote:
>> >> On Thu, May 21, 2020 at 11:15 AM Ranier Vilela <[hidden email]> wrote:
>> >> > postgres=# set max_parallel_workers_per_gather = 0;
>> >> > Time: 227238,445 ms (03:47,238)
>> >> > postgres=# set max_parallel_workers_per_gather = 1;
>> >> > Time: 138027,351 ms (02:18,027)

> Vanilla Postgres (latest)
>
> create table t as select generate_series(1, 800000000)::int i;
>  set max_parallel_workers_per_gather = 0;
> Time: 210524,317 ms (03:30,524)
> set max_parallel_workers_per_gather = 1;
> Time: 146982,737 ms (02:26,983)

Thanks.  So it seems like Linux, Windows and anything using ZFS are
OK, which probably explains why we hadn't heard complaints about it.



Re: Parallel Seq Scan vs kernel read ahead

David Rowley
On Thu, 21 May 2020 at 14:32, Thomas Munro <[hidden email]> wrote:
> Thanks.  So it seems like Linux, Windows and anything using ZFS are
> OK, which probably explains why we hadn't heard complaints about it.

I tried out a different test on a Windows 8.1 machine I have here.  I
was concerned that the test that was used here ends up with tuples
that are too narrow and that the executor would spend quite a bit of
time going between nodes and performing the actual aggregation.  I
thought it might be good to add some padding so that there are far
fewer tuples on the page.

I ended up with:

create table t (a int, b text);
-- create a table of 100GB in size.
insert into t select x,md5(x::text) from
generate_series(1,1000000*1572.7381809)x; -- took 1 hr 18 mins
vacuum freeze t;

query = select count(*) from t;
Disk = Samsung SSD 850 EVO mSATA 1TB.

Master:
workers = 0 : Time: 269104.281 ms (04:29.104)  380MB/s
workers = 1 : Time: 741183.646 ms (12:21.184)  138MB/s
workers = 2 : Time: 656963.754 ms (10:56.964)  155MB/s

Patched:

workers = 0 : Should be the same as before as the code for this didn't change.
workers = 1 : Time: 300299.364 ms (05:00.299) 340MB/s
workers = 2 : Time: 270213.726 ms (04:30.214) 379MB/s

(A better query would likely have been just: SELECT * FROM t WHERE a =
1; but I'd already run the test by the time I thought of that.)

So, this shows that Windows, at least 8.1, does suffer from this too.

As for the patch: I know you just put it together quickly, but I don't
think you can do the ramp-up the way you have. It looks like there's
a risk of torn reads and torn writes, and I'm unsure how much that
could affect the test results here. It looks like there's a risk that
a worker gets some garbage number of pages to read rather than what
you think it will. Also, I don't quite understand the need for a
ramp-up in pages per serving. Shouldn't you instantly start at some
size and hold that, then only maybe ramp down at the end so that
workers all finish at close to the same time?  However, I did have
other ideas, which I'll explain below.

From my previous work on that function to add the atomics, I did think
that it would be better to dish out more than 1 page at a time.
However, there is the risk that the workload is not evenly distributed
between the workers.  My thought was that we could divide the total
pages by the number of workers and then again by 100, and dish out
blocks based on that. That way workers will get about a 100th of their
fair share of pages at once, so assuming there's an even amount of
work to do per serving of pages, the last worker should only run for
at most 1% longer.  Perhaps that 100 should be 1000; then the run-on
time for the last worker is just 0.1%.  Perhaps the serving size can
also be capped at some maximum like 64. We'll certainly need to ensure
it's at least 1!  I imagine that will eliminate the need for any
ramp-down of pages per serving near the end of the scan.
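
A quick sketch of that heuristic, purely illustrative (the function
name and constants here are hypothetical, not from any patch):

#include "postgres.h"

/*
 * Illustrative serving-size heuristic: give each worker roughly a
 * 1/1000th of its fair share of the relation per serving, capped at
 * 64 blocks and never less than 1.
 */
static int
sketch_serving_size(uint64 rel_nblocks, int nworkers)
{
    uint64      chunk = rel_nblocks / ((uint64) nworkers * 1000);

    if (chunk > 64)
        chunk = 64;             /* cap the serving size */
    if (chunk < 1)
        chunk = 1;              /* always hand out at least one block */
    return (int) chunk;
}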

David



Re: Parallel Seq Scan vs kernel read ahead

David Rowley
On Thu, 21 May 2020 at 17:06, David Rowley <[hidden email]> wrote:
> For the patch. I know you just put it together quickly, but I don't
> think you can do that ramp up the way you have. It looks like there's
> a risk of torn reads and torn writes and I'm unsure how much that
> could affect the test results here.

Oops. On closer inspection, I see that memory is per worker, not
global to the scan.



Re: Parallel Seq Scan vs kernel read ahead

Thomas Munro
On Fri, May 22, 2020 at 10:00 AM David Rowley <[hidden email]> wrote:
> On Thu, 21 May 2020 at 17:06, David Rowley <[hidden email]> wrote:
> > For the patch. I know you just put it together quickly, but I don't
> > think you can do that ramp up the way you have. It looks like there's
> > a risk of torn reads and torn writes and I'm unsure how much that
> > could affect the test results here.
>
> Oops. On closer inspection, I see that memory is per worker, not
> global to the scan.

Right, I think it's safe.  I think you were probably right that
ramp-up isn't actually useful, though; it's only the end of the scan
that requires special treatment, so we don't get unfair allocation as
the work runs out due to coarse grain.  I suppose that even if you
have a scheme that falls back to fine-grained allocation for the final
N pages, it's still possible that a highly distracted process (most
likely the leader, given its double duties) can finish up sitting on a
large range of pages and eventually have to process them all at the
end, after the other workers have already knocked off and gone for a
pint.



Re: Parallel Seq Scan vs kernel read ahead

Soumyadeep Chakraborty
Hi Thomas,

Some more data points:

create table t_heap as select generate_series(1, 100000000) i;

Query: select count(*) from t_heap;
shared_buffers=32MB (so that I don't have to clear buffers or the OS
page cache)
OS: FreeBSD 12.1 with UFS on GCP
4 vCPUs, 4GB RAM Intel Skylake
22G Google PersistentDisk
Time is measured with \timing on.

Without your patch:

max_parallel_workers_per_gather    Time(seconds)
                              0           33.88s
                              1           57.62s
                              2           62.01s
                              6          222.94s

With your patch:

max_parallel_workers_per_gather    Time(seconds)
                              0           29.04s
                              1           29.17s
                              2           28.78s
                              6          291.27s

I checked with explain analyze to ensure that the number of workers
planned = max_parallel_workers_per_gather.

Apart from the last result (max_parallel_workers_per_gather=6), all
the other results seem favorable.
Could the last result be down to the fact that the number of workers
planned exceeded the number of vCPUs?

I also wanted to evaluate Zedstore with your patch.
I used the same setup as above.
No discernible difference though, maybe I'm missing something:

Without your patch:

max_parallel_workers_per_gather    Time(seconds)
                              0           25.86s
                              1           15.70s
                              2           12.60s
                              6           12.41s


With your patch:

max_parallel_workers_per_gather    Time(seconds)
                              0           26.96s
                              1           15.73s
                              2           12.46s
                              6           12.10s
--
Soumyadeep





Re: Parallel Seq Scan vs kernel read ahead

Thomas Munro
On Fri, May 22, 2020 at 1:14 PM Soumyadeep Chakraborty
<[hidden email]> wrote:
> Some more data points:

Thanks!

> max_parallel_workers_per_gather    Time(seconds)
>                               0           29.04s
>                               1           29.17s
>                               2           28.78s
>                               6          291.27s
>
> I checked with explain analyze to ensure that the number of workers
> planned = max_parallel_workers_per_gather
>
> Apart from the last result (max_parallel_workers_per_gather=6), all
> the other results seem favorable.
> Could the last result be down to the fact that the number of workers
> planned exceeded the number of vCPUs?

Interesting.  I guess it has to do with patterns emerging from various
parameters like that magic number 64 I hard-coded into the test patch,
and other unknowns in your storage stack.  I see a small drop-off that
I can't explain yet, but not that.

> I also wanted to evaluate Zedstore with your patch.
> I used the same setup as above.
> No discernible difference though, maybe I'm missing something:

It doesn't look like it's using table_block_parallelscan_nextpage() as
a block allocator, so it's not affected by the patch.  It has its own
function, zs_parallelscan_nextrange(), which does
pg_atomic_fetch_add_u64(&pzscan->pzs_allocatedtids,
ZS_PARALLEL_CHUNK_SIZE), and that macro is 0x100000.
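
For illustration, that allocator is shaped roughly like the sketch
below (a hedged reconstruction from the call above; the real
zs_parallelscan_nextrange() surely differs in detail).  With ~1M TIDs
per serving it is already very coarse-grained, which is consistent
with the patch having no effect:

#include "postgres.h"
#include "port/atomics.h"

#define ZS_PARALLEL_CHUNK_SIZE  UINT64CONST(0x100000)   /* ~1M TIDs per serving */

/*
 * Hand out the next TID range to a worker.  Reconstructed sketch of a
 * zedstore-style allocator; not the actual zedstore code.
 */
static bool
sketch_zs_next_range(pg_atomic_uint64 *allocatedtids, uint64 max_tid,
                     uint64 *start_out, uint64 *end_out)
{
    uint64      start = pg_atomic_fetch_add_u64(allocatedtids,
                                                ZS_PARALLEL_CHUNK_SIZE);

    if (start >= max_tid)
        return false;           /* scan exhausted */
    *start_out = start;
    *end_out = Min(start + ZS_PARALLEL_CHUNK_SIZE, max_tid);
    return true;
}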



Re: Parallel Seq Scan vs kernel read ahead

Robert Haas
On Tue, May 19, 2020 at 10:23 PM Amit Kapila <[hidden email]> wrote:
> Good experiment.  IIRC, we have discussed a similar idea during the
> development of this feature but we haven't seen any better results by
> allocating in ranges on the systems we have tried.  So, we went with
> the current approach, which is more granular and seems to allow better
> parallelism.  I feel we need to ensure that we don't regress
> parallelism in existing cases; otherwise, the idea sounds promising to
> me.

I think there's a significant difference. The idea I remember being
discussed at the time was to divide the relation into equal parts at
the very start and give one part to each worker. I think that carries
a lot of risk of some workers finishing much sooner than others. This
idea, AIUI, is to divide the relation into chunks that are small
compared to the size of the relation, but larger than 1 block. That
carries some risk of an unequal division of work, as has already been
noted, but it's much less, especially if we use smaller chunk sizes
once we get close to the end, as proposed here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Seq Scan vs kernel read ahead

Robert Haas
On Thu, May 21, 2020 at 6:28 PM Thomas Munro <[hidden email]> wrote:
> Right, I think it's safe.  I think you were probably right that
> ramp-up isn't actually useful, though; it's only the end of the scan
> that requires special treatment, so we don't get unfair allocation as
> the work runs out due to coarse grain.

The ramp-up seems like it might be useful if the query involves a LIMIT.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Seq Scan vs kernel read ahead

Amit Kapila
On Sat, May 23, 2020 at 12:00 AM Robert Haas <[hidden email]> wrote:

>
> On Tue, May 19, 2020 at 10:23 PM Amit Kapila <[hidden email]> wrote:
> > Good experiment.  IIRC, we have discussed a similar idea during the
> > development of this feature but we haven't seen any better results by
> > allocating in ranges on the systems we have tried.  So, we went with
> > the current approach, which is more granular and seems to allow better
> > parallelism.  I feel we need to ensure that we don't regress
> > parallelism in existing cases; otherwise, the idea sounds promising to
> > me.
>
> I think there's a significant difference. The idea I remember being
> discussed at the time was to divide the relation into equal parts at
> the very start and give one part to each worker.
>

I have checked the archives and found that we have done some testing
by allowing each worker to work on a block-by-block basis and by
having a fixed number of chunks for each worker.  See the results [1]
(the program used is attached in another email [2]).  The conclusion
was that we didn't find much difference with any of those approaches.
Now, the reason could be that we tested on a machine (I think it was
hydra (Power-7)) where the chunk-size doesn't matter, but I think it
could show some difference on the machines on which Thomas and David
are testing.  At that time there was also a discussion of chunking on
the basis of "each worker processes one 1GB-sized segment", which Tom
and Stephen seemed to support [3].  I think the idea of dividing the
relation into segments based on workers for a parallel scan has been
used by other databases (e.g., DynamoDB) as well [4], so it is not
completely without merit.  I understand that larger chunks can lead to
unequal work distribution, but they have their own advantages, so we
might want to get the best of both worlds, where in the beginning we
have larger chunks and then slowly reduce the chunk-size towards the
end of the scan.  I am not sure what the best thing to do here is, but
maybe some experiments can shed light on this mystery.
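
To make the shape of that best-of-both-worlds idea concrete, here is a
minimal sketch (entirely hypothetical, not from any patch in this
thread): stay coarse while plenty of work remains, then fall back to
single blocks near the end so workers finish together.

#include "postgres.h"

/*
 * Hypothetical ramp-down: while every worker can still get a full
 * serving, stay coarse-grained; once less than one full serving per
 * worker remains, hand out single blocks so that no worker is left
 * running long after the others.
 */
static int
sketch_ramp_down(uint64 blocks_remaining, int nworkers, int full_chunk)
{
    if (blocks_remaining < (uint64) nworkers * (uint64) full_chunk)
        return 1;               /* fine-grained tail */
    return full_chunk;          /* coarse-grained for most of the scan */
}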


[1] - https://www.postgresql.org/message-id/CAA4eK1JHCmN2X1LjQ4bOmLApt%2BbtOuid5Vqqk5G6dDFV69iyHg%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1JyVNEBE8KuxKd3bJhkG6tSbpBYX_%2BZtP34ZSTCSucA1A%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/30549.1422459647%40sss.pgh.pa.us
[4] - https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.ParallelScan

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Seq Scan vs kernel read ahead

David Rowley
On Sat, 23 May 2020 at 06:31, Robert Haas <[hidden email]> wrote:
>
> On Thu, May 21, 2020 at 6:28 PM Thomas Munro <[hidden email]> wrote:
> > Right, I think it's safe.  I think you were probably right that
> > ramp-up isn't actually useful, though; it's only the end of the scan
> > that requires special treatment, so we don't get unfair allocation as
> > the work runs out due to coarse grain.
>
> The ramp-up seems like it might be useful if the query involves a LIMIT.

That's true, but I think the intelligence there would need to go
beyond "if there's a LIMIT clause, do ramp-up", as we might have
already fully ramped up well before the LIMIT is reached.

David