Keep elog(ERROR) and ereport(ERROR) calls in the cold path

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
35 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Keep elog(ERROR) and ereport(ERROR) calls in the cold path

David Rowley
Back in [1] I experimented with a patch to coax compilers to build all
elog/ereport calls that were >= ERROR into a cold path away from the
function rasing the error. At the time, I really just wanted to test
how much of a speedup we could get by doing this and ended up just
writing up a patch that basically changed all elog(ERROR) calls from:

if (<error situation check>)
{
   <do stuff>;
    elog(ERROR, "...");
}

to add an unlikely() and become;

if (unlikely(<error situation check>)
{
   <do stuff>;
    elog(ERROR, "...");
}

Per the report in [1] I did see some pretty good gains in performance
from doing this.  The problem was, that at the time I couldn't figure
out a way to do this without an invasive patch that changed the code
in the thousands of elog/ereport calls.

In the attached, I've come up with a way that works. Basically I've
just added a function named errstart_cold() that is attributed with
__attribute__((cold)), which will hint to the compiler to keep
branches which call this function in a cold path.  To make the
compiler properly mark just >= ERROR calls as cold, and only when the
elevel is a constant, I modified the ereport_domain macro to do:

if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
errstart_cold(elevel, domain) : \
errstart(elevel, domain)) \

I see no reason why the compiler shouldn't always fold that constant
expression at compile-time and correctly select the correct version of
the function for the job.  (I also tried using __builtin_choose_expr()
but GCC complained when the elevel was not constant, even with the
__builtin_constant_p() test in the condition)

I sampled a few .s files to inspect what code had changed.  Looking at
mcxt.s, fmgr.s and xlog.s, the first two of these because I found in
[1] that elogs were being done from those files quite often and xlog.s
because it's pretty big.  As far as I could see, GCC correctly moved
all the error reporting stuff where the elevel was a constant and >=
ERROR into the cold path and left the lower-level or non-consts elevel
calls alone.

For clang, I didn't see any changes in the .s files. I suspect that
they might have a few smarts in there and see the
__builtin_unreachable() call and assume the path is cold already based
on that. That was with clang 10. Perhaps older versions are not as
smart.

Benchmarking:

For benchmarking, I've not done a huge amount to test the impacts of
this change.  However, I can say that I am seeing some fairly good
improvements.  There seems to be some small improvements to execution
speed using TPCH-Q1 and also some good results from a pgbench -S test.

For TPCH-Q1:

Master:
$ pgbench -n -f pg-tpch/queries/q01.sql -T 120 tpch
latency average = 5272.630 ms
latency average = 5258.610 ms
latency average = 5250.871 ms

Master + elog_ereport_attribute_cold.patch
$ pgbench -n -f pg-tpch/queries/q01.sql -T 120 tpch
latency average = 5182.761 ms
latency average = 5194.851 ms
latency average = 5183.128 ms

Which is about a 1.42% increase in performance. That's not exactly
groundbreaking, but pretty useful to have if that happens to apply
across the board for execution performance.

For pgbench -S:

My results were a bit noisier than the TPCH test, but the results I
obtained did show about a 10% increase in performance:

Master:
drowley@amd3990x:~$ pgbench -S -T 120 postgres
tps = 25245.903255 (excluding connections establishing)
tps = 26144.454208 (excluding connections establishing)
tps = 25931.850518 (excluding connections establishing)

Master + elog_ereport_attribute_cold.patch
drowley@amd3990x:~$ pgbench -S -T 120 postgres
tps = 28351.480631 (excluding connections establishing)
tps = 27763.656557 (excluding connections establishing)
tps = 28896.427929 (excluding connections establishing)

It would be useful if someone with some server-grade Intel hardware
could run a few tests on this.  The above results are all from AMD
hardware.

I've attached the patch for this. I'll add it to the July 'fest.

David

[1] https://www.postgresql.org/message-id/CAKJS1f8yqRW3qx2CO9r4bqqvA2Vx68=3awbh8CJWTP9zXeoHMw@...

elog_ereport_attribute_cold.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

Robert Haas
On Wed, Jun 24, 2020 at 9:51 PM David Rowley <[hidden email]> wrote:

> $ pgbench -n -f pg-tpch/queries/q01.sql -T 120 tpch
>
> Which is about a 1.42% increase in performance. That's not exactly
> groundbreaking, but pretty useful to have if that happens to apply
> across the board for execution performance.
>
> For pgbench -S:
>
> My results were a bit noisier than the TPCH test, but the results I
> obtained did show about a 10% increase in performance:

This is pretty cool, particularly because it affects single-client
performance. It seems like a lot of ideas people have had about
speeding up pgbench performance - including me - have improved
performance under concurrency at the cost of very slightly degrading
single-client performance. It would be nice to claw some of that back.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

Andres Freund
In reply to this post by David Rowley
Hi,

Thanks for picking this up again!

On 2020-06-25 13:50:37 +1200, David Rowley wrote:
> In the attached, I've come up with a way that works. Basically I've
> just added a function named errstart_cold() that is attributed with
> __attribute__((cold)), which will hint to the compiler to keep
> branches which call this function in a cold path.

I recall you trying this before? Has that gotten easier because we
evolved ereport()/elog(), or has gcc become smarter, or ...?


> To make the compiler properly mark just >= ERROR calls as cold, and
> only when the elevel is a constant, I modified the ereport_domain
> macro to do:
>
> if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
> errstart_cold(elevel, domain) : \
> errstart(elevel, domain)) \

I think it'd be good to not just do this for ERROR, but also for <=
DEBUG1. I recall seing quite a few debug elogs that made the code worse
just by "being there".

I suspect that doing this for DEBUG* could also improve the code for
clang, because we obviously don't have __builtin_unreachable after those.


> I see no reason why the compiler shouldn't always fold that constant
> expression at compile-time and correctly select the correct version of
> the function for the job.  (I also tried using __builtin_choose_expr()
> but GCC complained when the elevel was not constant, even with the
> __builtin_constant_p() test in the condition)

I think it has to be constant in all paths for that...


> Master:
> drowley@amd3990x:~$ pgbench -S -T 120 postgres
> tps = 25245.903255 (excluding connections establishing)
> tps = 26144.454208 (excluding connections establishing)
> tps = 25931.850518 (excluding connections establishing)
>
> Master + elog_ereport_attribute_cold.patch
> drowley@amd3990x:~$ pgbench -S -T 120 postgres
> tps = 28351.480631 (excluding connections establishing)
> tps = 27763.656557 (excluding connections establishing)
> tps = 28896.427929 (excluding connections establishing)

That is pretty damn cool.


> diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
> index e976201030..8076e8af24 100644
> --- a/src/backend/utils/error/elog.c
> +++ b/src/backend/utils/error/elog.c
> @@ -219,6 +219,19 @@ err_gettext(const char *str)
>  #endif
>  }
>  
> +#if defined(HAVE_PG_ATTRIBUTE_HOT_AND_COLD) && defined(HAVE__BUILTIN_CONSTANT_P)
> +/*
> + * errstart_cold
> + * A simple wrapper around errstart, but hinted to be cold so that the
> + * compiler is more likely to put error code in a cold area away from the
> + * main function body.
> + */
> +bool
> +pg_attribute_cold errstart_cold(int elevel, const char *domain)
> +{
> + return errstart(elevel, domain);
> +}
> +#endif

Hm. Would it make sense to have this be a static inline?


>  /*
>   * errstart --- begin an error-reporting cycle
> diff --git a/src/include/c.h b/src/include/c.h
> index d72b23afe4..087b8af6cb 100644
> --- a/src/include/c.h
> +++ b/src/include/c.h
> @@ -178,6 +178,21 @@
>  #define pg_noinline
>  #endif
>  
> +/*
> + * Marking certain functions as "hot" or "cold" can be useful to assist the
> + * compiler in arranging the assembly code in a more efficient way.
> + * These are supported from GCC >= 4.3 and clang >= 3.2
> + */
> +#if (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))) || \
> + (defined(__clang__) && (__clang_major__ > 3 || (__clang_major__ == 3 && __clang_minor__ >= 2)))
> +#define HAVE_PG_ATTRIBUTE_HOT_AND_COLD 1
> +#define pg_attribute_hot __attribute__((hot))
> +#define pg_attribute_cold __attribute__((cold))
> +#else
> +#define pg_attribute_hot
> +#define pg_attribute_cold
> +#endif

Wonder if we should start using __has_attribute() for things like this.

https://gcc.gnu.org/onlinedocs/cpp/_005f_005fhas_005fattribute.html

I.e. we could do something like
#ifndef __has_attribute
#define __has_attribute(attribute) 0
#endif

#if __has_attribute(hot)
#define pg_attribute_hot __attribute__((hot))
#else
#define pg_attribute_hot
#endif

clang added __has_attribute in 2.9 (2010), gcc added it in 5 (2014), so
I don't think we'd loose too much.




>  #ifdef HAVE__BUILTIN_CONSTANT_P
> +#ifdef HAVE_PG_ATTRIBUTE_HOT_AND_COLD
> +#define ereport_domain(elevel, domain, ...) \
> + do { \
> + pg_prevent_errno_in_scope(); \
> + if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
> + errstart_cold(elevel, domain) : \
> + errstart(elevel, domain)) \
> + __VA_ARGS__, errfinish(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \
> + if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
> + pg_unreachable(); \
> + } while(0)
> +#else /* !HAVE_PG_ATTRIBUTE_HOT_AND_COLD */
>  #define ereport_domain(elevel, domain, ...) \
>   do { \
>   pg_prevent_errno_in_scope(); \
> @@ -129,6 +141,7 @@
>   if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
>   pg_unreachable(); \
>   } while(0)
> +#endif /* HAVE_PG_ATTRIBUTE_HOT_AND_COLD */
>  #else /* !HAVE__BUILTIN_CONSTANT_P */
>  #define ereport_domain(elevel, domain, ...) \
>   do { \
> @@ -146,6 +159,9 @@

Could we do this without another copy? Feels like we should be able to
just do that in the existing #ifdef HAVE__BUILTIN_CONSTANT_P if we just
add errstart_cold() independent HAVE_PG_ATTRIBUTE_HOT_AND_COLD.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

David Rowley
On Fri, 26 Jun 2020 at 04:35, Andres Freund <[hidden email]> wrote:
> On 2020-06-25 13:50:37 +1200, David Rowley wrote:
> > In the attached, I've come up with a way that works. Basically I've
> > just added a function named errstart_cold() that is attributed with
> > __attribute__((cold)), which will hint to the compiler to keep
> > branches which call this function in a cold path.
>
> I recall you trying this before? Has that gotten easier because we
> evolved ereport()/elog(), or has gcc become smarter, or ...?

Yeah, I appear to have tried it and found it not to work in [1]. I can
only assume GCC got smarter in regards to how it marks a path as cold.
Previously it seemed not do due to the do/while(0). I'm pretty sure
back when I tested last that ditching the do while made it work, just
we can't really get rid of it.

> > To make the compiler properly mark just >= ERROR calls as cold, and
> > only when the elevel is a constant, I modified the ereport_domain
> > macro to do:
> >
> > if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
> > errstart_cold(elevel, domain) : \
> > errstart(elevel, domain)) \
>
> I think it'd be good to not just do this for ERROR, but also for <=
> DEBUG1. I recall seing quite a few debug elogs that made the code worse
> just by "being there".

I think that case is different. We don't want to move the entire elog
path into the cold path for that. We'd only want to hint that errstart
is unlikely to return true if elevel <= DEBUG1

> > diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
> > index e976201030..8076e8af24 100644
> > --- a/src/backend/utils/error/elog.c
> > +++ b/src/backend/utils/error/elog.c
> > @@ -219,6 +219,19 @@ err_gettext(const char *str)
> >  #endif
> >  }
> >
> > +#if defined(HAVE_PG_ATTRIBUTE_HOT_AND_COLD) && defined(HAVE__BUILTIN_CONSTANT_P)
> > +/*
> > + * errstart_cold
> > + *           A simple wrapper around errstart, but hinted to be cold so that the
> > + *           compiler is more likely to put error code in a cold area away from the
> > + *           main function body.
> > + */
> > +bool
> > +pg_attribute_cold errstart_cold(int elevel, const char *domain)
> > +{
> > +     return errstart(elevel, domain);
> > +}
> > +#endif
>
> Hm. Would it make sense to have this be a static inline?

I thought about that but didn't try it to ensure it still worked ok. I
didn't think it was that important to make sure we don't get the extra
function hop for ERRORs. It seemed like a case we'd not want to really
optimise for.

> >  /*
> >   * errstart --- begin an error-reporting cycle
> > diff --git a/src/include/c.h b/src/include/c.h
> > index d72b23afe4..087b8af6cb 100644
> > --- a/src/include/c.h
> > +++ b/src/include/c.h
> > @@ -178,6 +178,21 @@
> >  #define pg_noinline
> >  #endif
> >
> > +/*
> > + * Marking certain functions as "hot" or "cold" can be useful to assist the
> > + * compiler in arranging the assembly code in a more efficient way.
> > + * These are supported from GCC >= 4.3 and clang >= 3.2
> > + */
> > +#if (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))) || \
> > +     (defined(__clang__) && (__clang_major__ > 3 || (__clang_major__ == 3 && __clang_minor__ >= 2)))
> > +#define HAVE_PG_ATTRIBUTE_HOT_AND_COLD 1
> > +#define pg_attribute_hot __attribute__((hot))
> > +#define pg_attribute_cold __attribute__((cold))
> > +#else
> > +#define pg_attribute_hot
> > +#define pg_attribute_cold
> > +#endif
>
> Wonder if we should start using __has_attribute() for things like this.
>
> https://gcc.gnu.org/onlinedocs/cpp/_005f_005fhas_005fattribute.html
>
> I.e. we could do something like
> #ifndef __has_attribute
> #define __has_attribute(attribute) 0
> #endif
>
> #if __has_attribute(hot)
> #define pg_attribute_hot __attribute__((hot))
> #else
> #define pg_attribute_hot
> #endif
>
> clang added __has_attribute in 2.9 (2010), gcc added it in 5 (2014), so
> I don't think we'd loose too much.

Thanks for pointing that out. Seems like a good idea to me. I don't
think we'll upset too many people running GCC 4.4 to 5.0. I can't
imagine many people serious about performance will be using a
PostgreSQL version that'll be released in 2021 with a pre 2014
compiler.

> >  #ifdef HAVE__BUILTIN_CONSTANT_P
> > +#ifdef HAVE_PG_ATTRIBUTE_HOT_AND_COLD
> > +#define ereport_domain(elevel, domain, ...)  \
> > +     do { \
> > +             pg_prevent_errno_in_scope(); \
> > +             if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
> > +                      errstart_cold(elevel, domain) : \
> > +                      errstart(elevel, domain)) \
> > +                     __VA_ARGS__, errfinish(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \
> > +             if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
> > +                     pg_unreachable(); \
> > +     } while(0)
> > +#else                                                        /* !HAVE_PG_ATTRIBUTE_HOT_AND_COLD */
> >  #define ereport_domain(elevel, domain, ...)  \
> >       do { \
> >               pg_prevent_errno_in_scope(); \
> > @@ -129,6 +141,7 @@
> >               if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
> >                       pg_unreachable(); \
> >       } while(0)
> > +#endif                                                       /* HAVE_PG_ATTRIBUTE_HOT_AND_COLD */
> >  #else                                                        /* !HAVE__BUILTIN_CONSTANT_P */
> >  #define ereport_domain(elevel, domain, ...)  \
> >       do { \
> > @@ -146,6 +159,9 @@
>
> Could we do this without another copy? Feels like we should be able to
> just do that in the existing #ifdef HAVE__BUILTIN_CONSTANT_P if we just
> add errstart_cold() independent HAVE_PG_ATTRIBUTE_HOT_AND_COLD.

Yeah. I just did it that way so we didn't get the extra function hop
in compilers that don't support __attribute((cold)). If I can inline
errstart_cold() and have the compiler still properly determine that
it's a cold function, then it seems wise to do it that way. If not,
then I'll need to keep a separate macro.

David

[1] https://www.postgresql.org/message-id/20171030094449.ffqhvt5n623zvyja%40alap3.anarazel.de


Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

David Rowley
On Fri, 26 Jun 2020 at 13:21, David Rowley <[hidden email]> wrote:

>
> On Fri, 26 Jun 2020 at 04:35, Andres Freund <[hidden email]> wrote:
> > On 2020-06-25 13:50:37 +1200, David Rowley wrote:
> > > In the attached, I've come up with a way that works. Basically I've
> > > just added a function named errstart_cold() that is attributed with
> > > __attribute__((cold)), which will hint to the compiler to keep
> > > branches which call this function in a cold path.
> >
> > I recall you trying this before? Has that gotten easier because we
> > evolved ereport()/elog(), or has gcc become smarter, or ...?
>
> Yeah, I appear to have tried it and found it not to work in [1]. I can
> only assume GCC got smarter in regards to how it marks a path as cold.
> Previously it seemed not do due to the do/while(0). I'm pretty sure
> back when I tested last that ditching the do while made it work, just
> we can't really get rid of it.
>
> > > To make the compiler properly mark just >= ERROR calls as cold, and
> > > only when the elevel is a constant, I modified the ereport_domain
> > > macro to do:
> > >
> > > if (__builtin_constant_p(elevel) && (elevel) >= ERROR ? \
> > > errstart_cold(elevel, domain) : \
> > > errstart(elevel, domain)) \
> >
> > I think it'd be good to not just do this for ERROR, but also for <=
> > DEBUG1. I recall seing quite a few debug elogs that made the code worse
> > just by "being there".
>
> I think that case is different. We don't want to move the entire elog
> path into the cold path for that. We'd only want to hint that errstart
> is unlikely to return true if elevel <= DEBUG1
I played around with this trying to find if there was a way to make this work.

v2 patch includes the change you mentioned about using __has_attribute
(cold) and removes the additional ereport_domain macro
v3 is v2 plus an additional change to mark the branch within
ereport_domain as unlikely when elevel <= DEBUG1
v4 is v2 plus it marks the errstart call as unlikely regardless of elevel.

I tried v4 as I was having trouble as v3 was showing worse performance
than v2.  v4 appears better on the AMD system, but that system is
producing noisy results (very obvious if looking at attached
amd3990x_elog_cold.png)

I ran pgbench -S T 600 -P 10 with each patch and for the AMD machine I got:

master = 27817.32167 tps
v2 =  28991.65667 tps (104.22% of master)
v3 =  28622.775 tps (102.90% of master)
v4 = 29648.91 tps (106.58% of master)

(I attribute the speedup here not being the same as my last report due
to noise. A recent bios update partially fixed the problem, but not
completely)

For the intel laptop I got:

master = 25452.38167 tps
v2 = 25473.695 tps (100.08% of master)
v3 = 25434.89333 tps (99.93% of master)
v4 = 25389.02833 tps (99.75% of master)

Looking at the assembly for the v3 patch, it does appear that the
elevel <= DEBUG1 calls were correctly moved to the cold area and that
the ones > DEBUG1 and < ERROR were left alone. However, I did only
look at xlog.s. The intel results don't look very promising, but
perhaps this is not the ideal test to show improvements with
instruction cache efficiency.

> > > +bool
> > > +pg_attribute_cold errstart_cold(int elevel, const char *domain)
> > > +{
> > > +     return errstart(elevel, domain);
> > > +}
> > > +#endif
> >
> > Hm. Would it make sense to have this be a static inline?

I didn't find a way to make this work (using gcc-10). Inlining, of
course, makes the assembly just call errstart().  errstart_cold() is
nowhere to be seen. The __attribute(cold) does not seem to be applied
to the errstart() call where the errstart_cold() call was inlined.

I've attached a graph showing the results for the AMD and Intel runs
and also csv files of the pgbench tps output.  I've also attached each
version of the patch I tested.

It would be good to see some testing done on other hardware.

David

amd3990x_elog_cold.png (278K) Download Attachment
intel-i78565u_elog_cold.png (107K) Download Attachment
amd3990x_elog_cold.csv (2K) Download Attachment
intel_i7-8565U_elog_cold.csv (2K) Download Attachment
elog_ereport_attribute_cold_v2.patch (3K) Download Attachment
elog_ereport_attribute_cold_v3.patch (3K) Download Attachment
elog_ereport_attribute_cold_v4.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

Michael Paquier-2
On Mon, Jun 29, 2020 at 09:36:56PM +1200, David Rowley wrote:
> I've attached a graph showing the results for the AMD and Intel runs
> and also csv files of the pgbench tps output.  I've also attached each
> version of the patch I tested.
>
> It would be good to see some testing done on other hardware.

Worth noting that the patch set fails to apply.  I have moved this
entry to the next CF, waiting on author.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

David Rowley
On Mon, 3 Aug 2020 at 19:54, Michael Paquier <[hidden email]> wrote:
> Worth noting that the patch set fails to apply.  I have moved this
> entry to the next CF, waiting on author.

Thanks.

NB: It's not a patch set. It's 3 different versions of the patch.
They're not all meant to apply at once. Probably the CF bot wasn't
aware of that though :-(

David


Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

David Rowley
In reply to this post by David Rowley
On Mon, 29 Jun 2020 at 21:36, David Rowley <[hidden email]> wrote:
> (I attribute the speedup here not being the same as my last report due
> to noise. A recent bios update partially fixed the problem, but not
> completely)

I managed to fix the unstable performance on this AMD machine by
tweaking some bios power management settings.

I did some further testing with the v4 patch using both each of:

1. pgbench -S
2. pgbench -S -M prepared
3. pgbench -S -M prepared -c 64 -j 64.
4. TPC-H @ 5GB scale (per recommendation from Andres offlist)

The results of 1-3 above don't really show much of a win, which really
does contradict what I saw about 5 years ago when testing unlikely()
around elog calls in [1]. The experiment I did back then did pre-date
the use of unlikely() in the source code, so I thought perhaps that
since we now have a sprinkling of unlikely() in various of the hottest
code paths that the use of those already gained most of what we were
going to gain from today's patch.  To see if this was the case, I
decided to hack up a test patch which removes all those unlikely()
calls that exist in an if test above an elog/ereport ERROR and I
confirm that I *do* see a small regression in performance from doing
that. This patch only serves to confirm if the existing unlikely()
macros are already giving us most of what we'd get from today's v4
patch, and the results do seem to confirm that.

The 5GB scaled TPC-H test does show some performance gains from the v4
patch and shows an obvious regression from removing the unlikely()
calls too.

Based, mostly on the TPC-H results where performance did improve close
to 2%, I'm starting to think it would be a good idea just to go for
the v4 patch.  It means that future hot elog/ereport calls should make
it into the cold path.

Currently, I'm just unsure how other CPUs will benefit from this.  The
3990x I've been testing with is pretty new and has some pretty large
caches. I suspect older CPUs may see larger gains.

David

[1] https://www.postgresql.org/message-id/CAKJS1f8yqRW3qx2CO9r4bqqvA2Vx68%3D3awbh8CJWTP9zXeoHMw%40mail.gmail.com

pgbench_c1j1_600s.png (109K) Download Attachment
pgbench_c1j1_prepared_600s.png (53K) Download Attachment
tpch_5gb.png (54K) Download Attachment
pgbench_c64j64_prepared_600s.png (80K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

Peter Eisentraut-6
On 2020-08-05 05:00, David Rowley wrote:
> The 5GB scaled TPC-H test does show some performance gains from the v4
> patch and shows an obvious regression from removing the unlikely()
> calls too.
>
> Based, mostly on the TPC-H results where performance did improve close
> to 2%, I'm starting to think it would be a good idea just to go for
> the v4 patch.  It means that future hot elog/ereport calls should make
> it into the cold path.

Something based on the v4 patch makes sense.

I would add DEBUG1 back into the conditional, like

if (__builtin_constant_p(elevel) && ((elevel) >= ERROR || (elevel) <=
DEBUG1) ? \

Also, for the __has_attribute handling, I'd prefer the style that Andres
illustrated earlier, using:

#ifndef __has_attribute
#define __has_attribute(attribute) 0
#endif

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

David Rowley
On Sat, 5 Sep 2020 at 08:36, Peter Eisentraut
<[hidden email]> wrote:
> Something based on the v4 patch makes sense.

Thanks for having a look at this.

> I would add DEBUG1 back into the conditional, like
>
> if (__builtin_constant_p(elevel) && ((elevel) >= ERROR || (elevel) <=
> DEBUG1) ? \

hmm, but surely we don't want to move all code that's in the same
branch as an elog(DEBUG1) call into a cold area.

With elog(ERROR) we generally have the form:

if (<some condition we hope never to see>)
   elog(ERROR, "something bad happened");

In this case, we'd quite like for the compiler to put code relating to
the elog in some cold area of the binary.

With DEBUG we often have the form:

<do normal stuff>

elog(DEBUG1, "some interesting information");

<do normal stuff>

I don't think we'd want to move the <do normal stuff> into a cold area.

The v3 patch just put an unlikely() around the errstart() call if the
level was <= DEBUG1.  That just to move the code that's inside the if
(errstart(...)) in the macro into a cold area.

> Also, for the __has_attribute handling, I'd prefer the style that Andres
> illustrated earlier, using:
>
> #ifndef __has_attribute
> #define __has_attribute(attribute) 0
> #endif

Yip, for sure. That way is much nicer.

David


Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

Peter Eisentraut-6
On 2020-09-06 02:24, David Rowley wrote:
>> I would add DEBUG1 back into the conditional, like
>>
>> if (__builtin_constant_p(elevel) && ((elevel) >= ERROR || (elevel) <=
>> DEBUG1) ? \
>
> hmm, but surely we don't want to move all code that's in the same
> branch as an elog(DEBUG1) call into a cold area.

Yeah, nevermind that.

> The v3 patch just put an unlikely() around the errstart() call if the
> level was <= DEBUG1.  That just to move the code that's inside the if
> (errstart(...)) in the macro into a cold area.

That could be useful.  Depends on how much effect it has.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

David Rowley
On Fri, 11 Sep 2020 at 02:01, Peter Eisentraut
<[hidden email]> wrote:

>
> On 2020-09-06 02:24, David Rowley wrote:
> >> I would add DEBUG1 back into the conditional, like
> >>
> >> if (__builtin_constant_p(elevel) && ((elevel) >= ERROR || (elevel) <=
> >> DEBUG1) ? \
> >
> > hmm, but surely we don't want to move all code that's in the same
> > branch as an elog(DEBUG1) call into a cold area.
>
> Yeah, nevermind that.
I've reattached the v4 patch since it just does the >= ERROR case.

> > The v3 patch just put an unlikely() around the errstart() call if the
> > level was <= DEBUG1.  That just to move the code that's inside the if
> > (errstart(...)) in the macro into a cold area.
>
> That could be useful.  Depends on how much effect it has.

I wonder if it is. I'm having trouble even seeing gains from the ERROR
case and I'm considering dropping this patch due to that.

I ran another scale=5 TPCH benchmark on v4 against f859c2ffa using gcc
9.3. I'm unable to see any gains with this, however, the results were
pretty noisy. I only ran pgbench for 60 seconds per query. I'll likely
need to run that a bit longer. I'll do that tonight.

It would be good if someone else could run some tests on their own
hardware to see if they can see any gains.

David

elog_ereport_attribute_cold_v4.patch (3K) Download Attachment
tpch_scale5_elog_ereport_cold_v4_vs_master.ods (30K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

David Rowley
On Tue, 22 Sep 2020 at 19:08, David Rowley <[hidden email]> wrote:
> I ran another scale=5 TPCH benchmark on v4 against f859c2ffa using gcc
> 9.3. I'm unable to see any gains with this, however, the results were
> pretty noisy. I only ran pgbench for 60 seconds per query. I'll likely
> need to run that a bit longer. I'll do that tonight.

I've attached the results of a TPCH scale=5 run master (f859c2ffa) vs
master + elog_ereport_attribute_cold_v4.patch

It does not look great. The patched version seems to have done about
1.17% less work than master did.

David

tpch_scale5_elog_ereport_cold_v4_vs_master_10min.ods (26K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

Peter Eisentraut-6
On 2020-09-22 22:42, David Rowley wrote:

> On Tue, 22 Sep 2020 at 19:08, David Rowley <[hidden email]> wrote:
>> I ran another scale=5 TPCH benchmark on v4 against f859c2ffa using gcc
>> 9.3. I'm unable to see any gains with this, however, the results were
>> pretty noisy. I only ran pgbench for 60 seconds per query. I'll likely
>> need to run that a bit longer. I'll do that tonight.
>
> I've attached the results of a TPCH scale=5 run master (f859c2ffa) vs
> master + elog_ereport_attribute_cold_v4.patch
>
> It does not look great. The patched version seems to have done about
> 1.17% less work than master did.

I wonder how much benefit you'd get from

a) compiling with -O3 instead of -O2, or
b) compiling with profile-driven optimization

I think that would indicate a target and/or a ceiling of what we should
be expecting from hot/cold/likely/unlikely optimization techniques like
this.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

David Rowley
In reply to this post by David Rowley
On Wed, 23 Sep 2020 at 08:42, David Rowley <[hidden email]> wrote:

>
> On Tue, 22 Sep 2020 at 19:08, David Rowley <[hidden email]> wrote:
> > I ran another scale=5 TPCH benchmark on v4 against f859c2ffa using gcc
> > 9.3. I'm unable to see any gains with this, however, the results were
> > pretty noisy. I only ran pgbench for 60 seconds per query. I'll likely
> > need to run that a bit longer. I'll do that tonight.
>
> I've attached the results of a TPCH scale=5 run master (f859c2ffa) vs
> master + elog_ereport_attribute_cold_v4.patch
>
> It does not look great. The patched version seems to have done about
> 1.17% less work than master did.

I've marked this patch back as waiting for review. It would be good if
someone could run some tests on some intel hardware and see if they
can see any speedup.

David


Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

Peter Eisentraut-6
On 2020-09-29 11:26, David Rowley wrote:

> On Wed, 23 Sep 2020 at 08:42, David Rowley <[hidden email]> wrote:
>>
>> On Tue, 22 Sep 2020 at 19:08, David Rowley <[hidden email]> wrote:
>>> I ran another scale=5 TPCH benchmark on v4 against f859c2ffa using gcc
>>> 9.3. I'm unable to see any gains with this, however, the results were
>>> pretty noisy. I only ran pgbench for 60 seconds per query. I'll likely
>>> need to run that a bit longer. I'll do that tonight.
>>
>> I've attached the results of a TPCH scale=5 run master (f859c2ffa) vs
>> master + elog_ereport_attribute_cold_v4.patch
>>
>> It does not look great. The patched version seems to have done about
>> 1.17% less work than master did.
>
> I've marked this patch back as waiting for review. It would be good if
> someone could run some tests on some intel hardware and see if they
> can see any speedup.

What is the way forward here?  What exactly would you like to have tested?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

David Rowley
On Tue, 3 Nov 2020 at 20:08, Peter Eisentraut
<[hidden email]> wrote:
>
> On 2020-09-29 11:26, David Rowley wrote:
> > I've marked this patch back as waiting for review. It would be good if
> > someone could run some tests on some intel hardware and see if they
> > can see any speedup.
>
> What is the way forward here?  What exactly would you like to have tested?

It would be good to see some small scale bench -S tests with and
without -M prepared.

Also, small scale TPC-H tests would be good.    I really only did
testing on new AMD hardware, so some testing on intel hardware would
be good.

David


Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

Peter Eisentraut-6
On 2020-11-03 21:53, David Rowley wrote:

> On Tue, 3 Nov 2020 at 20:08, Peter Eisentraut
> <[hidden email]> wrote:
>>
>> On 2020-09-29 11:26, David Rowley wrote:
>>> I've marked this patch back as waiting for review. It would be good if
>>> someone could run some tests on some intel hardware and see if they
>>> can see any speedup.
>>
>> What is the way forward here?  What exactly would you like to have tested?
>
> It would be good to see some small scale bench -S tests with and
> without -M prepared.
>
> Also, small scale TPC-H tests would be good.    I really only did
> testing on new AMD hardware, so some testing on intel hardware would
> be good.

I did tests of elog_ereport_attribute_cold_v4.patch on an oldish Mac
Intel laptop with pgbench scale 1 (default), and then:

pgbench -S -T 60

master:  tps = 8251.883229 (excluding connections establishing)
patched: tps = 9556.836232 (excluding connections establishing)

pgbench -S -T 60 -M prepared

master:  tps = 14713.821837 (excluding connections establishing)
patched: tps = 16200.066185 (excluding connections establishing)

So from that this seems like an easy win.

I also tested on a newish Mac ARM laptop, and there the patch did not do
anything, but that was because clang does not support the cold
attribute, so that part works as well. ;-)

--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/


Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

David Rowley
On Sat, 21 Nov 2020 at 03:26, Peter Eisentraut
<[hidden email]> wrote:

>
> I did tests of elog_ereport_attribute_cold_v4.patch on an oldish Mac
> Intel laptop with pgbench scale 1 (default), and then:
>
> pgbench -S -T 60
>
> master:  tps = 8251.883229 (excluding connections establishing)
> patched: tps = 9556.836232 (excluding connections establishing)
>
> pgbench -S -T 60 -M prepared
>
> master:  tps = 14713.821837 (excluding connections establishing)
> patched: tps = 16200.066185 (excluding connections establishing)
>
> So from that this seems like an easy win.

Well, that makes it look pretty good.  If we can get 10-15% on some
machines without making things slower on any other machines, then that
seems like a good win to me.

Thanks for testing that.

David


Reply | Threaded
Open this post in threaded view
|

Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path

David Rowley
On Tue, 24 Nov 2020 at 09:36, David Rowley <[hidden email]> wrote:
> Well, that makes it look pretty good.  If we can get 10-15% on some
> machines without making things slower on any other machines, then that
> seems like a good win to me.

Pushed.

Thank you both for reviewing this.

David


12