Linux kernel impact on PostgreSQL performance

Mel Gorman
Hi,

I'm the chair for Linux Storage, Filesystem and Memory Management Summit 2014
(LSF/MM). A CFP was sent out last month (https://lwn.net/Articles/575681/)
that you may have seen already.

In recent years we have had at least one topic that was shared between
all three tracks that was led by a person outside of the usual kernel
development community. I am checking if the PostgreSQL community
would be willing to volunteer someone to lead a topic discussing
PostgreSQL performance with recent kernels or to highlight regressions
or future developments you feel are potentially a problem. With luck
someone suitable is already travelling to the collaboration summit
(http://events.linuxfoundation.org/events/collaboration-summit) and it
would not be too inconvenient to drop in for LSF/MM as well.

There are two reasons why I'm suggesting this. First, PostgreSQL was the
basis of a test used to highlight a scheduler problem around kernel 3.6
but otherwise in my experience it is rare that PostgreSQL is part of a
bug report.  I am skeptical this particular bug report was a typical use
case for PostgreSQL (pgbench, read-only, many threads, very small in-memory
database). I wonder why reports related to PostgreSQL are not more common.
One assumption would be that PostgreSQL is perfectly happy with the current
kernel behaviour in which case our discussion here is done.

This brings me to the second reason -- there is evidence
that the PostgreSQL community is not happy with the current
direction of kernel development. The most obvious example is this thread
http://postgresql.1045698.n5.nabble.com/Why-we-are-going-to-have-to-go-DirectIO-td5781471.html
but I suspect there are others. The thread alleges that the kernel community
are in the business of pushing hackish changes into the IO stack without
much thought or testing although the linked article describes a VM and not
a storage problem. I'm not here to debate the kernel's regression testing
or development methodology but LSF/MM is one place where a large number
of people involved with the IO layers will be attending.  If you have a
concrete complaint then here is a soap box.

Does the PostgreSQL community have a problem with recent kernels,
particularly with respect to the storage, filesystem or memory management
layers? If yes, do you have some data that can highlight this and can you
volunteer someone to represent your interests to the kernel community? Are
current developments in the IO layer counter to the PostgreSQL requirements?
If so, what developments, why are they a problem, do you have a suggested
alternative or some idea of what we should watch out for? The track topic
would be up to you but just as a hint, we'd need something a lot more
concrete than "you should test more".

--
Mel Gorman
SUSE Labs


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Linux kernel impact on PostgreSQL performance

Josh Berkus
Mel,

> I'm the chair for Linux Storage, Filesystem and Memory Management Summit 2014
> (LSF/MM). A CFP was sent out last month (https://lwn.net/Articles/575681/)
> that you may have seen already.
>
> In recent years we have had at least one topic that was shared between
> all three tracks that was led by a person outside of the usual kernel
> development community. I am checking if the PostgreSQL community
> would be willing to volunteer someone to lead a topic discussing
> PostgreSQL performance with recent kernels or to highlight regressions
> or future developments you feel are potentially a problem. With luck
> someone suitable is already travelling to the collaboration summit
> (http://events.linuxfoundation.org/events/collaboration-summit) and it
> would not be too inconvenient to drop in for LSF/MM as well.

We can definitely get someone there.  I'll certainly be there; I'm
hoping to get someone who has closer involvement with our kernel
interaction as well.

> There are two reasons why I'm suggesting this. First, PostgreSQL was the
> basis of a test used to highlight a scheduler problem around kernel 3.6
> but otherwise in my experience it is rare that PostgreSQL is part of a
> bug report.  I am skeptical this particular bug report was a typical use
> case for PostgreSQL (pgbench, read-only, many threads, very small in-memory
> database). I wonder why reports related to PostgreSQL are not more common.
> One assumption would be that PostgreSQL is perfectly happy with the current
> kernel behaviour in which case our discussion here is done.

To be frank, it's because most people are still running on 2.6.19, and
as a result are completely unaware of recent developments.  Second,
because there's no obvious place to complain to ... lkml doesn't welcome
bug reports, and where else do you go?

> Does the PostgreSQL community have a problem with recent kernels,
> particularly with respect to the storage, filesystem or memory management
> layers? If yes, do you have some data that can highlight this and can you
> volunteer someone to represent your interests to the kernel community?

Yes, and yes.

> Are
> current developments in the IO layer counter to the PostgreSQL requirements?
> If so, what developments, why are they a problem, do you have a suggested
> alternative or some idea of what we should watch out for?

Mostly the issue is changes to the IO scheduler which improve one use
case at the expense of others, or set defaults which emphasize desktop
hardware over server hardware.

What also came up with the recent change to LRU is that the Postgres
community apparently has more experience than the Linux community with
buffer-clearing algorithms, and we ought to share that.

> The track topic
> would be up to you but just as a hint, we'd need something a lot more
> concrete than "you should test more".

How about "don't add major IO behavior changes with no
backwards-compatibility switches"?  ;-)

Seriously, one thing I'd like to get out of Collab would be a reasonable
regimen for testing database performance on Linux kernels.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Linux kernel impact on PostgreSQL performance

Kevin Grittner-5
Josh Berkus <[hidden email]> wrote:

>> Does the PostgreSQL community have a problem with recent
>> kernels, particularly with respect to the storage, filesystem or
>> memory management layers?

> How about "don't add major IO behavior changes with no
> backwards-compatibility switches"?  ;-)

I notice, Josh, that you didn't mention the problems many people
have run into with Transparent Huge Page defrag and with NUMA
access.  Is that because there *are* configuration options that
allow people to get decent performance once the issue is diagnosed?
It seems like maybe there could be a better way to give a heads-up
on hazards in a new kernel to the database world, but I don't know
quite what that would be.  For all I know, it is already available
if you know where to look.

> Seriously, one thing I'd like to get out of Collab would be a
> reasonable regimen for testing database performance on Linux
> kernels.

... or perhaps you figure this is what would bring such issues to
the community's attention before people are bitten in production
environments?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Linux kernel impact on PostgreSQL performance

Josh Berkus
On 01/13/2014 10:51 AM, Kevin Grittner wrote:

>> How about "don't add major IO behavior changes with no
>> backwards-compatibility switches"?  ;-)
>
> I notice, Josh, that you didn't mention the problems many people
> have run into with Transparent Huge Page defrag and with NUMA
> access.  Is that because there *are* configuration options that
> allow people to get decent performance once the issue is diagnosed?
> It seems like maybe there could be a better way to give a heads-up
> on hazards in a new kernel to the database world, but I don't know
> quite what that would be.  For all I know, it is already available
> if you know where to look.

Well, it was the lack of sysctl options which takes the 2Q change from
"annoyance" to "potential disaster".  We can't ever get away from the
possibility that the Postgres use-case might be the minority use-case,
and we might have to use non-default options.  It's when those options
aren't present *at all* that we're stuck.

However, I agree that a worthwhile thing to talk about is having some
better channel to notify the Postgres (and other DB) communities about
major changes to IO and Memory management.

Wanna go to Collab?

>> Seriously, one thing I'd like to get out of Collab would be a
>> reasonable regimen for testing database performance on Linux
>> kernels.
>
> ... or perhaps you figure this is what would bring such issues to
> the community's attention before people are bitten in production
> environments?

That, too.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Linux kernel impact on PostgreSQL performance

Robert Haas
In reply to this post by Kevin Grittner-5
On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner <[hidden email]> wrote:
> I notice, Josh, that you didn't mention the problems many people
> have run into with Transparent Huge Page defrag and with NUMA
> access.

Amen to that.  Actually, I think NUMA can be (mostly?) fixed by
setting zone_reclaim_mode; is there some other problem besides that?
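
Before tuning either setting it helps to see what the running kernel currently uses. The following is a minimal sketch, assuming the mainline Linux locations for these knobs (some distribution kernels expose Transparent Huge Page settings under a different directory); the function and dictionary names are made up for illustration.

```python
from pathlib import Path

# Paths as found on mainline Linux (an assumption for this sketch). A missing
# or unreadable file is reported as None rather than raising an error.
SETTINGS = {
    "zone_reclaim_mode": Path("/proc/sys/vm/zone_reclaim_mode"),
    "thp_enabled": Path("/sys/kernel/mm/transparent_hugepage/enabled"),
    "thp_defrag": Path("/sys/kernel/mm/transparent_hugepage/defrag"),
}


def read_kernel_settings(settings=SETTINGS):
    """Return {name: current value as text, or None if unavailable}."""
    result = {}
    for name, path in settings.items():
        try:
            result[name] = path.read_text().strip()
        except OSError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, value in read_kernel_settings().items():
        shown = value if value is not None else "not available"
        print(f"{name}: {shown}")
```

A zone_reclaim_mode of 0 means zone reclaim is switched off; the Transparent Huge Page files show the active choice in square brackets.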

The other thing that comes to mind is the kernel's caching behavior.
We've talked a lot over the years about the difficulties of getting
the kernel to write data out when we want it to and to not write data
out when we don't want it to.  When it writes data back to disk too
aggressively, we get lousy throughput because the same page can get
written more than once when caching it for longer would have allowed
write-combining.  When it doesn't write data to disk aggressively
enough, we get huge latency spikes at checkpoint time when we call
fsync() and the kernel says "uh, what? you wanted that data *on the
disk*? sorry boss!" and then proceeds to destroy the world by starving
the rest of the system for I/O for many seconds or minutes at a time.
We've made some desultory attempts to use sync_file_range() to improve
things here, but I'm not sure that's really the right tool, and if it
is we don't know how to use it well enough to obtain consistent
positive results.
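
As a rough illustration of the idea, the sketch below writes a file block by block and asks the kernel to start writeback for each block right away, so that the final fsync() has less left to flush. It is not how PostgreSQL does it. Python has no wrapper for sync_file_range(), so the call goes through ctypes; the flag value 2 for SYNC_FILE_RANGE_WRITE is taken from the Linux headers, and the helper names are hypothetical. Where the call is unavailable the sketch falls back to a plain fsync(), which is not equivalent.

```python
import ctypes
import os
import tempfile

# From <linux/fs.h>: start writeback of dirty pages in the range without
# waiting for it to complete.
SYNC_FILE_RANGE_WRITE = 2


def start_writeback(fd, offset, nbytes):
    """Ask the kernel to begin writing a file range; return the method used."""
    try:
        func = ctypes.CDLL(None, use_errno=True).sync_file_range
    except (OSError, AttributeError, TypeError):
        func = None  # not Linux, or the C library lacks the symbol
    if func is not None:
        func.argtypes = [ctypes.c_int, ctypes.c_int64, ctypes.c_int64,
                         ctypes.c_uint]
        func.restype = ctypes.c_int
        if func(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) == 0:
            return "sync_file_range"
    # Fallback (not equivalent): flushes the whole file and waits for it.
    os.fsync(fd)
    return "fsync"


def write_with_paced_writeback(path, blocks):
    """Write blocks, nudging each toward disk, then fsync once at the end."""
    methods = set()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        offset = 0
        for block in blocks:
            os.write(fd, block)
            methods.add(start_writeback(fd, offset, len(block)))
            offset += len(block)
        os.fsync(fd)  # the "checkpoint": ideally little is left to write
    finally:
        os.close(fd)
    return methods


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "relation")
        print(write_with_paced_writeback(target, [b"x" * 8192] * 4))
```

Whether this actually smooths out checkpoint latency depends on the kernel version, filesystem and storage, which is exactly the uncertainty described above.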

On a related note, there's also the problem of double-buffering.  When
we read a page into shared_buffers, we leave a copy behind in the OS
buffers, and similarly on write-out.  It's very unclear what to do
about this, since the kernel and PostgreSQL don't have intimate
knowledge of what each other are doing, but it would be nice to solve
somehow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Linux kernel impact on PostgreSQL performance

Claudio Freire
On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas <[hidden email]> wrote:
> On a related note, there's also the problem of double-buffering.  When
> we read a page into shared_buffers, we leave a copy behind in the OS
> buffers, and similarly on write-out.  It's very unclear what to do
> about this, since the kernel and PostgreSQL don't have intimate
> knowledge of what each other are doing, but it would be nice to solve
> somehow.


There you have a much harder algorithmic problem.

You can basically control duplication with fadvise and DONTNEED. The
problem here is not the kernel and whether or not it allows postgres
to be smart about it. The problem is... what kind of smarts
(algorithm) to use.
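
The mechanical part can be sketched in a few lines. This is an illustration only, assuming an 8192-byte block and a hypothetical helper name; it uses posix_fadvise() with POSIX_FADV_DONTNEED, which is purely advisory, so the kernel is free to ignore it and will not drop pages that are still dirty.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # PostgreSQL's default page size


def read_block_and_drop(fd, block_no):
    """Read one block, then hint that the kernel need not keep its copy."""
    offset = block_no * BLOCK_SIZE
    data = os.pread(fd, BLOCK_SIZE, offset)
    # Advisory only: the kernel may ignore it, and dirty pages are kept.
    if hasattr(os, "posix_fadvise"):
        os.posix_fadvise(fd, offset, BLOCK_SIZE, os.POSIX_FADV_DONTNEED)
    return data


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\x01" * BLOCK_SIZE + b"\x02" * BLOCK_SIZE)
        f.flush()
        os.fsync(f.fileno())  # make the pages clean so they can be dropped
        page = read_block_and_drop(f.fileno(), 1)
        print(len(page))
```

The sketch says nothing about when dropping the kernel's copy is a good idea, which is the policy question raised here.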



Re: Linux kernel impact on PostgreSQL performance

Jim Nasby-2
On 1/13/14, 2:19 PM, Claudio Freire wrote:

> On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas <[hidden email]> wrote:
>> On a related note, there's also the problem of double-buffering.  When
>> we read a page into shared_buffers, we leave a copy behind in the OS
>> buffers, and similarly on write-out.  It's very unclear what to do
>> about this, since the kernel and PostgreSQL don't have intimate
>> knowledge of what each other are doing, but it would be nice to solve
>> somehow.
>
>
> There you have a much harder algorithmic problem.
>
> You can basically control duplication with fadvise and DONTNEED. The
> problem here is not the kernel and whether or not it allows postgres
> to be smart about it. The problem is... what kind of smarts
> (algorithm) to use.

Isn't this a fairly simple matter of when we read a page into shared buffers tell the kernel to forget that page? And a corollary to that for when we dump a page out of shared_buffers (here kernel, please put this back into your cache).
--
Jim C. Nasby, Data Architect                       [hidden email]
512.569.9461 (cell)                         http://jim.nasby.net



Re: Linux kernel impact on PostgreSQL performance

Claudio Freire
On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby <[hidden email]> wrote:

> On 1/13/14, 2:19 PM, Claudio Freire wrote:
>>
>> On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas <[hidden email]>
>> wrote:
>>>
>>> On a related note, there's also the problem of double-buffering.  When
>>> we read a page into shared_buffers, we leave a copy behind in the OS
>>> buffers, and similarly on write-out.  It's very unclear what to do
>>> about this, since the kernel and PostgreSQL don't have intimate
>>> knowledge of what each other are doing, but it would be nice to solve
>>> somehow.
>>
>>
>>
>> There you have a much harder algorithmic problem.
>>
>> You can basically control duplication with fadvise and DONTNEED. The
>> problem here is not the kernel and whether or not it allows postgres
>> to be smart about it. The problem is... what kind of smarts
>> (algorithm) to use.
>
>
> Isn't this a fairly simple matter of when we read a page into shared buffers
> tell the kernel to forget that page? And a corollary to that for when we
> dump a page out of shared_buffers (here kernel, please put this back into
> your cache).


That's my point. In terms of kernel-postgres interaction, it's fairly simple.

What's not so simple, is figuring out what policy to use. Remember,
you cannot tell the kernel to put some page in its page cache without
reading it or writing it. So, once you make the kernel forget a page,
evicting it from shared buffers becomes quite expensive.
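
The cost can be seen in a toy model. The sketch below is an assumption-laden simulation, not a measurement: both caches are plain LRU lists, pages are just numbers, and the function name and parameters are invented for illustration. With drop_on_read set, the kernel forgets a page as soon as it is copied into shared buffers, and nothing can hand it back on eviction.

```python
from collections import OrderedDict


def count_disk_reads(trace, sb_size, os_size, drop_on_read):
    """Count disk reads for a page trace under a toy two-level LRU model."""
    shared_buffers = OrderedDict()  # page -> None, most recently used last
    os_cache = OrderedDict()
    disk_reads = 0
    for page in trace:
        if page in shared_buffers:
            shared_buffers.move_to_end(page)
            continue
        if page in os_cache:
            os_cache.move_to_end(page)
        else:
            disk_reads += 1
            os_cache[page] = None
            if len(os_cache) > os_size:
                os_cache.popitem(last=False)
        if drop_on_read:
            # Models the DONTNEED hint: the kernel forgets the page once
            # shared buffers hold it, and eviction cannot give it back.
            os_cache.pop(page, None)
        shared_buffers[page] = None
        if len(shared_buffers) > sb_size:
            shared_buffers.popitem(last=False)
    return disk_reads


if __name__ == "__main__":
    trace = [0, 1, 0, 1]
    print(count_disk_reads(trace, sb_size=1, os_size=2, drop_on_read=False))
    print(count_disk_reads(trace, sb_size=1, os_size=2, drop_on_read=True))
```

In this tiny trace the kernel's copy absorbs every eviction from shared buffers when it is kept, while dropping it turns each eviction into a later disk read. Real workloads sit somewhere in between, since the memory freed by dropping duplicates can cache other pages.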



Re: Linux kernel impact on PostgreSQL performance

Jim Nasby-2
On 1/13/14, 2:27 PM, Claudio Freire wrote:

> On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby <[hidden email]> wrote:
>> On 1/13/14, 2:19 PM, Claudio Freire wrote:
>>>
>>> On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas <[hidden email]>
>>> wrote:
>>>>
>>>> On a related note, there's also the problem of double-buffering.  When
>>>> we read a page into shared_buffers, we leave a copy behind in the OS
>>>> buffers, and similarly on write-out.  It's very unclear what to do
>>>> about this, since the kernel and PostgreSQL don't have intimate
>>>> knowledge of what each other are doing, but it would be nice to solve
>>>> somehow.
>>>
>>>
>>>
>>> There you have a much harder algorithmic problem.
>>>
>>> You can basically control duplication with fadvise and DONTNEED. The
>>> problem here is not the kernel and whether or not it allows postgres
>>> to be smart about it. The problem is... what kind of smarts
>>> (algorithm) to use.
>>
>>
>> Isn't this a fairly simple matter of when we read a page into shared buffers
>> tell the kernel to forget that page? And a corollary to that for when we
>> dump a page out of shared_buffers (here kernel, please put this back into
>> your cache).
>
>
> That's my point. In terms of kernel-postgres interaction, it's fairly simple.
>
> What's not so simple, is figuring out what policy to use. Remember,
> you cannot tell the kernel to put some page in its page cache without
> reading it or writing it. So, once you make the kernel forget a page,
> evicting it from shared buffers becomes quite expensive.

Well, if we were to collaborate with the kernel community on this then presumably we can do better than that for eviction... even to the extent of "here's some data from this range in this file. It's (clean|dirty). Put it in your cache. Just trust me on this."
--
Jim C. Nasby, Data Architect                       [hidden email]
512.569.9461 (cell)                         http://jim.nasby.net



Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

James Bottomley
On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote:

> On 1/13/14, 2:27 PM, Claudio Freire wrote:
> > On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby <[hidden email]> wrote:
> >> On 1/13/14, 2:19 PM, Claudio Freire wrote:
> >>>
> >>> On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas <[hidden email]>
> >>> wrote:
> >>>>
> >>>> On a related note, there's also the problem of double-buffering.  When
> >>>> we read a page into shared_buffers, we leave a copy behind in the OS
> >>>> buffers, and similarly on write-out.  It's very unclear what to do
> >>>> about this, since the kernel and PostgreSQL don't have intimate
> >>>> knowledge of what each other are doing, but it would be nice to solve
> >>>> somehow.
> >>>
> >>>
> >>>
> >>> There you have a much harder algorithmic problem.
> >>>
> >>> You can basically control duplication with fadvise and DONTNEED. The
> >>> problem here is not the kernel and whether or not it allows postgres
> >>> to be smart about it. The problem is... what kind of smarts
> >>> (algorithm) to use.
> >>
> >>
> >> Isn't this a fairly simple matter of when we read a page into shared buffers
> >> tell the kernel to forget that page? And a corollary to that for when we
> >> dump a page out of shared_buffers (here kernel, please put this back into
> >> your cache).
> >
> >
> > That's my point. In terms of kernel-postgres interaction, it's fairly simple.
> >
> > What's not so simple, is figuring out what policy to use. Remember,
> > you cannot tell the kernel to put some page in its page cache without
> > reading it or writing it. So, once you make the kernel forget a page,
> > evicting it from shared buffers becomes quite expensive.
>
> Well, if we were to collaborate with the kernel community on this then
> presumably we can do better than that for eviction... even to the
> extent of "here's some data from this range in this file. It's (clean|
> dirty). Put it in your cache. Just trust me on this."

This should be the madvise() interface (with MADV_WILLNEED and
MADV_DONTNEED). Is there something in that interface that is
insufficient?
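
For reference, madvise() acts on memory mappings, so using it means mapping the file rather than calling read(). The sketch below is a minimal illustration under that assumption, with a hypothetical helper name; both hints are advisory, and mmap.madvise() requires Python 3.8 or later on a platform that defines the MADV_* constants.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def read_with_madvise(path, page_no):
    """Map a file, hint WILLNEED for one page, read it, then hint DONTNEED."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            start = page_no * PAGE
            can_advise = (hasattr(m, "madvise")
                          and hasattr(mmap, "MADV_WILLNEED")
                          and hasattr(mmap, "MADV_DONTNEED"))
            if can_advise:
                m.madvise(mmap.MADV_WILLNEED, start, PAGE)  # please prefetch
            data = m[start:start + PAGE]
            if can_advise:
                m.madvise(mmap.MADV_DONTNEED, start, PAGE)  # done with it
            return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "relation")
        with open(path, "wb") as f:
            f.write(b"a" * PAGE + b"b" * PAGE)
        print(read_with_madvise(path, 1)[:4])
```

Note that neither hint lets the application supply page contents to the kernel, which is the capability being asked about in the quoted text.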

James





Re: Linux kernel impact on PostgreSQL performance

Claudio Freire
In reply to this post by Jim Nasby-2
On Mon, Jan 13, 2014 at 5:32 PM, Jim Nasby <[hidden email]> wrote:

>>
>> That's my point. In terms of kernel-postgres interaction, it's fairly
>> simple.
>>
>> What's not so simple, is figuring out what policy to use. Remember,
>> you cannot tell the kernel to put some page in its page cache without
>> reading it or writing it. So, once you make the kernel forget a page,
>> evicting it from shared buffers becomes quite expensive.
>
>
> Well, if we were to collaborate with the kernel community on this then
> presumably we can do better than that for eviction... even to the extent of
> "here's some data from this range in this file. It's (clean|dirty). Put it
> in your cache. Just trust me on this."


If I had a kernel developer hat, I'd put it on to say: I don't think
allowing that last bit is wise for a kernel.

It would violate oh-so-many separation rules and open an oh-so-big can-o-worms.



Re: Linux kernel impact on PostgreSQL performance

Andres Freund-3
In reply to this post by Robert Haas
On 2014-01-13 15:15:16 -0500, Robert Haas wrote:
> On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner <[hidden email]> wrote:
> > I notice, Josh, that you didn't mention the problems many people
> > have run into with Transparent Huge Page defrag and with NUMA
> > access.
>
> Amen to that.  Actually, I think NUMA can be (mostly?) fixed by
> setting zone_reclaim_mode; is there some other problem besides that?

I think that fixes some of the worst instances, but I've seen machines
spending horrible amounts of CPU (& BUS) time in page reclaim
nonetheless. If I analyzed it correctly it's in RAM << working set
workloads where RAM is pretty large and most of it is used as page
cache. The kernel ends up spending a huge percentage of time finding and
potentially defragmenting pages when looking for victim buffers.

> On a related note, there's also the problem of double-buffering.  When
> we read a page into shared_buffers, we leave a copy behind in the OS
> buffers, and similarly on write-out.  It's very unclear what to do
> about this, since the kernel and PostgreSQL don't have intimate
> knowledge of what each other are doing, but it would be nice to solve
> somehow.

I've wondered before if there wouldn't be a chance for postgres to say
"my dear OS, that the file range 0-8192 of file x contains y, no need to
reread" and do that when we evict a page from s_b but I never dared to
actually propose that to kernel people...

Greetings,

Andres Freund

--
 Andres Freund                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

Trond Myklebust

On Jan 13, 2014, at 15:40, Andres Freund <[hidden email]> wrote:

> On 2014-01-13 15:15:16 -0500, Robert Haas wrote:
>> On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner <[hidden email]> wrote:
>>> I notice, Josh, that you didn't mention the problems many people
>>> have run into with Transparent Huge Page defrag and with NUMA
>>> access.
>>
>> Amen to that.  Actually, I think NUMA can be (mostly?) fixed by
>> setting zone_reclaim_mode; is there some other problem besides that?
>
> I think that fixes some of the worst instances, but I've seen machines
> spending horrible amounts of CPU (& BUS) time in page reclaim
> nonetheless. If I analyzed it correctly it's in RAM << working set
> workloads where RAM is pretty large and most of it is used as page
> cache. The kernel ends up spending a huge percentage of time finding and
> potentially defragmenting pages when looking for victim buffers.
>
>> On a related note, there's also the problem of double-buffering.  When
>> we read a page into shared_buffers, we leave a copy behind in the OS
>> buffers, and similarly on write-out.  It's very unclear what to do
>> about this, since the kernel and PostgreSQL don't have intimate
>> knowledge of what each other are doing, but it would be nice to solve
>> somehow.
>
> I've wondered before if there wouldn't be a chance for postgres to say
> "my dear OS, that the file range 0-8192 of file x contains y, no need to
> reread" and do that when we evict a page from s_b but I never dared to
> actually propose that to kernel people...

O_DIRECT was specifically designed to solve the problem of double buffering between applications and the kernel. Why are you not able to use that in these situations?
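
For readers unfamiliar with the interface, here is a minimal sketch of a single O_DIRECT read, with a hypothetical helper name and an assumed 8192-byte block. O_DIRECT requires the buffer, offset and length to be suitably aligned, so the buffer comes from an anonymous mmap (page-aligned). Some filesystems reject O_DIRECT, in which case the sketch silently falls back to an ordinary buffered read.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 8192


def _read_into_aligned(fd, offset, length):
    """Read into page-aligned memory, as O_DIRECT requires."""
    with mmap.mmap(-1, length) as buf:
        os.lseek(fd, offset, os.SEEK_SET)
        n = os.readv(fd, [buf])
        return buf[:n]


def read_block_direct(path, block_no):
    """Read one block, bypassing the page cache where O_DIRECT works."""
    offset = block_no * BLOCK_SIZE
    direct = getattr(os, "O_DIRECT", 0)  # 0 where the flag is not defined
    for flags in (os.O_RDONLY | direct, os.O_RDONLY):
        try:
            fd = os.open(path, flags)
        except OSError:
            continue  # e.g. the filesystem rejects O_DIRECT
        try:
            return _read_into_aligned(fd, offset, BLOCK_SIZE)
        except OSError:
            continue  # direct read refused: retry buffered
        finally:
            os.close(fd)
    raise OSError("could not read " + path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "relation")
        with open(path, "wb") as f:
            f.write(b"\x01" * BLOCK_SIZE + b"\x02" * BLOCK_SIZE)
            f.flush()
            os.fsync(f.fileno())
        print(len(read_block_direct(path, 1)))
```

With the page cache bypassed, all caching, read-ahead and write scheduling become the application's job.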

Cheers,
   Trond


Re: Linux kernel impact on PostgreSQL performance

Jim Nasby-2
In reply to this post by Claudio Freire
On 1/13/14, 2:37 PM, Claudio Freire wrote:

> On Mon, Jan 13, 2014 at 5:32 PM, Jim Nasby <[hidden email]> wrote:
>>>
>>> That's my point. In terms of kernel-postgres interaction, it's fairly
>>> simple.
>>>
>>> What's not so simple, is figuring out what policy to use. Remember,
>>> you cannot tell the kernel to put some page in its page cache without
>>> reading it or writing it. So, once you make the kernel forget a page,
>>> evicting it from shared buffers becomes quite expensive.
>>
>>
>> Well, if we were to collaborate with the kernel community on this then
>> presumably we can do better than that for eviction... even to the extent of
>> "here's some data from this range in this file. It's (clean|dirty). Put it
>> in your cache. Just trust me on this."
>
>
> If I had a kernel developer hat, I'd put it on to say: I don't think
> allowing that last bit is wise for a kernel.
>
> It would violate oh-so-many separation rules and open an oh-so-big can-o-worms.

Yeah, if it were me I'd probably want to keep a hash of the page and its address and only accept putting a page back into the kernel if it matched my hash. Otherwise you'd just have to treat it as a write.
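
A user-space toy model of that check might look like the following. Nothing here corresponds to an existing kernel interface; the class, its method names and the choice of SHA-256 are assumptions made purely to show the logic.

```python
import hashlib


class VerifyingPageCache:
    """Toy cache that lends pages out and verifies them when they return."""

    def __init__(self):
        self.pages = {}        # (file_id, block_no) -> bytes held in cache
        self.lent_hashes = {}  # (file_id, block_no) -> digest of lent page
        self.dirty = set()     # keys that must be written back to disk

    def lend(self, key, data):
        """Hand a page to the application, keeping only its digest."""
        self.pages.pop(key, None)
        self.lent_hashes[key] = hashlib.sha256(data).digest()
        return data

    def give_back(self, key, data):
        """Accept a returned page as clean only if it matches the digest."""
        expected = self.lent_hashes.pop(key, None)
        self.pages[key] = data
        if expected is not None and hashlib.sha256(data).digest() == expected:
            return "accepted-clean"
        # Unknown or modified contents: handle like an ordinary write.
        self.dirty.add(key)
        return "treat-as-write"


if __name__ == "__main__":
    cache = VerifyingPageCache()
    page = cache.lend(("rel", 0), b"a" * 8192)
    print(cache.give_back(("rel", 0), page))
    cache.lend(("rel", 1), b"b" * 8192)
    print(cache.give_back(("rel", 1), b"c" * 8192))
```

The digest lets the cache trust a "clean" claim without re-reading the file, at the cost of hashing every page that is lent out or returned.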
--
Jim C. Nasby, Data Architect                       [hidden email]
512.569.9461 (cell)                         http://jim.nasby.net



Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

Andres Freund-3
In reply to this post by Trond Myklebust
On 2014-01-13 15:53:36 -0500, Trond Myklebust wrote:
> > I've wondered before if there wouldn't be a chance for postgres to say
> > "my dear OS, that the file range 0-8192 of file x contains y, no need to
> > reread" and do that when we evict a page from s_b but I never dared to
> > actually propose that to kernel people...
>
> O_DIRECT was specifically designed to solve the problem of double buffering between applications and the kernel. Why are you not able to use that in these situations?

Because we like to let the OS handle part of postgres' caching. For
one, it makes servers with several applications/databases much more
realistic without seriously overallocating memory, for another it's a
huge chunk of platform dependent code to get good performance
everywhere.
The above was explicitly not to avoid double buffering but to move a
buffer away from postgres' own buffers to the kernel's buffers once it's
not 100% clear we need it in buffers anymore.

Part of the reason this is being discussed is because previously people
suggested going the direct IO route, and others (most prominently
J. Corbet in http://archives.postgresql.org/message-id/20131204083345.31c60dd1%40lwn.net
) disagreed because that goes the route of reinventing
storage layers everywhere without improving the common codepaths.

Greetings,

Andres Freund

--
 Andres Freund                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

Robert Haas
In reply to this post by Trond Myklebust
On Mon, Jan 13, 2014 at 3:53 PM, Trond Myklebust <[hidden email]> wrote:
> O_DIRECT was specifically designed to solve the problem of double buffering between applications and the kernel. Why are you not able to use that in these situations?

O_DIRECT was apparently designed by a deranged monkey on some serious
mind-controlling substances.  But don't take it from me, I have it on
good authority:

http://yarchive.net/comp/linux/o_direct.html

One might even say the best authority.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Linux kernel impact on PostgreSQL performance

Jeff Janes
In reply to this post by Jim Nasby-2
On Mon, Jan 13, 2014 at 12:32 PM, Jim Nasby <[hidden email]> wrote:
> On 1/13/14, 2:27 PM, Claudio Freire wrote:
>> On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby <[hidden email]> wrote:
>>> On 1/13/14, 2:19 PM, Claudio Freire wrote:
>>>>
>>>> On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas <[hidden email]>
>>>> wrote:
>>>>>
>>>>> On a related note, there's also the problem of double-buffering.  When
>>>>> we read a page into shared_buffers, we leave a copy behind in the OS
>>>>> buffers, and similarly on write-out.  It's very unclear what to do
>>>>> about this, since the kernel and PostgreSQL don't have intimate
>>>>> knowledge of what each other are doing, but it would be nice to solve
>>>>> somehow.
>>>>
>>>> There you have a much harder algorithmic problem.
>>>>
>>>> You can basically control duplication with fadvise and DONTNEED. The
>>>> problem here is not the kernel and whether or not it allows postgres
>>>> to be smart about it. The problem is... what kind of smarts
>>>> (algorithm) to use.
>>>
>>> Isn't this a fairly simple matter of when we read a page into shared buffers
>>> tell the kernel to forget that page? And a corollary to that for when we
>>> dump a page out of shared_buffers (here kernel, please put this back into
>>> your cache).
>>
>> That's my point. In terms of kernel-postgres interaction, it's fairly simple.
>>
>> What's not so simple, is figuring out what policy to use.

I think the above is pretty simple for both interaction (allow us to inject a clean page into the file page cache) and policy (forget it after you hand it to us, then remember it again when we hand it back to you clean).  And I think it would pretty likely be an improvement over what we currently do.  But I think it is probably the wrong way to get the improvement.  I think the real problem is that we don't trust ourselves to manage more of the memory ourselves.

As far as I know, we still don't have a publicly disclosable and readily reproducible test case for the reports of performance degradation when we have more than 8GB in shared_buffers.   If we had one of those, we could likely reduce the double buffering problem by fixing our own scalability issues and therefore taking responsibility for more of the data ourselves.

>> Remember,
>> you cannot tell the kernel to put some page in its page cache without
>> reading it or writing it. So, once you make the kernel forget a page,
>> evicting it from shared buffers becomes quite expensive.
>
> Well, if we were to collaborate with the kernel community on this then presumably we can do better than that for eviction... even to the extent of "here's some data from this range in this file. It's (clean|dirty). Put it in your cache. Just trust me on this."

Which, in the case of it being clean, amounts to "Here is data we don't want in memory any more because we think it is cold.  But we don't trust ourselves, so please hold on to it anyway."  That might be a tough sell to the kernel people.

 Cheers,

Jeff

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

Trond Myklebust
In reply to this post by Robert Haas

On Jan 13, 2014, at 16:03, Robert Haas <[hidden email]> wrote:

> On Mon, Jan 13, 2014 at 3:53 PM, Trond Myklebust <[hidden email]> wrote:
>> O_DIRECT was specifically designed to solve the problem of double buffering between applications and the kernel. Why are you not able to use that in these situations?
>
> O_DIRECT was apparently designed by a deranged monkey on some serious
> mind-controlling substances.  But don't take it from me, I have it on
> good authority:
>
> http://yarchive.net/comp/linux/o_direct.html
>
> One might even say the best authority.

You do realise that is 12-year-old information, right? …and yes, we have added both aio and vectored operations to O_DIRECT in the meantime.

Meanwhile, no progress has been made on the “non-deranged” interface that authority was advocating.
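
For readers who have not used it, a vectored O_DIRECT read looks roughly like the sketch below. It is a minimal illustration under stated assumptions, not a recommendation: the function name read_block_direct and the BLOCK size are made up for the example, the buffer comes from an anonymous mmap because O_DIRECT needs aligned memory, and some filesystems reject O_DIRECT at open time.

```python
import mmap
import os

# Illustrative block size; it must be a multiple of the device's logical
# block size for O_DIRECT to accept the transfer.
BLOCK = 8192


def read_block_direct(path: str, block_no: int) -> bytes:
    """Read one block with O_DIRECT, bypassing the kernel page cache.

    Hypothetical helper for illustration only (Linux-specific).
    """
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        # O_DIRECT requires an aligned buffer; an anonymous mmap is page-aligned.
        buf = mmap.mmap(-1, BLOCK)
        try:
            # Vectored positional read into the aligned buffer.
            n = os.preadv(fd, [buf], block_no * BLOCK)
            return buf[:n]
        finally:
            buf.close()
    finally:
        os.close(fd)
```

With this approach the application owns all caching, which is exactly the responsibility being debated in this thread.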

Cheers,
  Trond

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Linux kernel impact on PostgreSQL performance

Kevin Grittner-5
In reply to this post by Josh berkus
Josh Berkus <[hidden email]> wrote:

> Wanna go to Collab?

I don't think that works out for me, but thanks for suggesting it.

I'd be happy to brainstorm with anyone who does go about issues to
discuss; although the ones I keep running into have already been
mentioned.

Regarding the problems others have mentioned, there are a few
features that might be a very big plus for us.  Additional ways of
hinting pages might be very useful.  If we had a way to specify how
many dirty pages were cached in PostgreSQL, the OS could count
those in its calculations for writing dirty pages, and we could avoid
the "write avalanche" which is currently so tricky to avoid without
causing repeated writes to the same page.  Or perhaps instead a way
to hint a page as dirty so that the OS could not only count those,
but discard the obsolete data from its cache if it is not already
dirty at the OS level, and lower the write priority if it is dirty
(to improve the odds of collapsing multiple writes).  If there was
a way to use DONTNEED or something similar with the ability to
rescind it if the page still happened to be in the OS cache,
that might help for when we discard a still-clean page from our
buffers.  And I seem to have a vague memory of there being cases
where the OS is first reading pages when we ask to write them,
which seems like avoidable I/O.  (I'm not sure about that one,
though.)

Also, something like THP support should really have sysctl support
rather than requiring people to put echo commands into scripts and
tie those into runlevel changes.  That's pretty ugly for something
which has turned out to be necessary so often.

I don't get too excited about changes to the default schedulers --
it's been pretty widely known for a long time that DEADLINE or NOOP
perform better than any alternatives for most database loads.
Anyone with a job setting up Linux machines to be used for database
servers should know to cover that.  As long as those two don't get
broken, I'm good.
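
Both the THP setting and the I/O scheduler mentioned above are exposed through sysfs files that list every choice and put the active one in square brackets, for example "noop [deadline] cfq". A minimal sketch for checking them follows; the helper names are made up for illustration, and the device name in the usage note is an assumption.

```python
from pathlib import Path


def active_choice(text: str) -> str:
    """Return the bracketed (active) entry from a sysfs multi-choice string."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active choice in {text!r}")


def read_active(path: str) -> str:
    """Read a sysfs multi-choice file and return its active entry."""
    return active_choice(Path(path).read_text())
```

Typical paths would be /sys/kernel/mm/transparent_hugepage/enabled for THP and /sys/block/sda/queue/scheduler for the scheduler of a disk assumed to be named sda. A provisioning script could call read_active() on each and fail loudly if the value is not the expected one.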

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

Andres Freund-3
In reply to this post by James Bottomley
On 2014-01-13 12:34:35 -0800, James Bottomley wrote:
> On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote:
> > Well, if we were to collaborate with the kernel community on this then
> > presumably we can do better than that for eviction... even to the
> > extent of "here's some data from this range in this file. It's (clean|
> > dirty). Put it in your cache. Just trust me on this."
>
> This should be the madvise() interface (with MADV_WILLNEED and
> MADV_DONTNEED). Is there something in that interface that is
> insufficient?

For one, postgres doesn't use mmap for files (and can't without major
new interfaces). Frequently mmap()/madvise()/munmap()ing 8kb chunks has
horrible consequences for performance/scalability - very quickly you
contend on locks in the kernel.
Also, that will mark that page dirty, which isn't what we want in this
case. One major use case is transplanting a page coming from postgres'
buffers into the kernel's buffercache because the latter has a much
better chance of properly allocating system resources across independent
applications running.
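
For reference, the mmap()/madvise() pattern under discussion looks roughly like the sketch below. It only illustrates the interface James refers to; it is not something postgres does, and the function name read_block_mmap and the BLOCK size are assumptions made for the example.

```python
import mmap
import os

BLOCK = 8192  # illustrative 8kb block size


def read_block_mmap(path: str, block_no: int) -> bytes:
    """Read one block through a file mapping, with madvise() hints around it.

    Hypothetical helper for illustration only (needs Python 3.8+ on Linux).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
            start = block_no * BLOCK
            # madvise() needs a page-aligned start, so round down.
            aligned = start - (start % mmap.PAGESIZE)
            length = min(start + BLOCK, len(m)) - aligned
            m.madvise(mmap.MADV_WILLNEED, aligned, length)  # ask for read-ahead
            data = m[start:start + BLOCK]                   # copy the block out
            m.madvise(mmap.MADV_DONTNEED, aligned, length)  # we are done with it
            return data
    finally:
        os.close(fd)
```

Doing this map, advise, unmap cycle for every 8kb block is the pattern described above as contending on kernel locks, so the sketch shows the shape of the calls rather than a viable buffer manager.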

Oh, and the kernel's page-cache management, while far from perfect,
actually scales much better than postgres'.

Greetings,

Andres Freund

--
 Andres Freund                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

