log_destination=file

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

log_destination=file

Magnus Hagander-2
Attached is a very much VIP (AKA rough draft) for $subject.

Right now we have two parameters controlling logging destination, and they work in interesting combinations:

log_destination=stderr, logging_collector=off -> log to stderr (makes sense)
log_destination=stderr, logging_collector=on  -> log to file (*highly* illogical for most users, to set stderr when they want file)
log_destination=csvlog, logging_collector=on -> log to csvfile (makes sense)
log_destination=csvlog, logging_collector=off -> fail (ugh)

(beyond that we also have syslog and eventlog, but those are different and not touched by this patch)

My understanding is that the main reason for this is that we cannot change logging_collector without restarting postmaster, whereas we can change log_destination.

My suggestion is we work around this by just always starting the logging collector, even if we're not planning to use it. Then we'd just have:

log_destination = stderr -> log to stderr
log_destination = file -> log to file
log_destination = csvlog -> log to csv file

The main difference here is that log_destination=stderr would also go through the logging collector which would then assemble it and write it out to it's own stderr.

Do people see an actual problem with that? I agree it's an extra round of indirection there, but is that a problem? It would also cause one more backgorund process to always be running, but again, is that really a problem? The overhead is not exactly large.

It would make configuration a lot more logical for logging. It would also make it possible to switch between all logging configurations without restarting.

The attached WIP is mostly working for the simple cases. It fails completely if the syslogger is restarted at this point, simply because I haven't figured out how to pass the original stderr down to the syslogger. I'm sure there are also many other smaller issues around it, but I wanted to get the discussion done before I spend the time to go through those.

Thoughts?

--


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

log_destination_file.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: log_destination=file

Tom Lane-2
Magnus Hagander <[hidden email]> writes:
> My understanding is that the main reason for this is that we cannot change
> logging_collector without restarting postmaster, whereas we can change
> log_destination.

Right, because the decision whether to redirect stdout/stderr can't
be changed on the fly.

> My suggestion is we work around this by just always starting the logging
> collector, even if we're not planning to use it.

Umm....

> Do people see an actual problem with that? I agree it's an extra round of
> indirection there, but is that a problem? It would also cause one more
> backgorund process to always be running, but again, is that really a
> problem? The overhead is not exactly large.

You just made three assertions about "this isn't a problem" without
providing any evidence in support of any of them.  Maybe with some
measurements we could have a real discussion.

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: log_destination=file

Magnus Hagander-2
On Thu, Aug 31, 2017 at 2:34 PM, Tom Lane <[hidden email]> wrote:
Magnus Hagander <[hidden email]> writes:
> My understanding is that the main reason for this is that we cannot change
> logging_collector without restarting postmaster, whereas we can change
> log_destination.

Right, because the decision whether to redirect stdout/stderr can't
be changed on the fly.

Right.

We could of course also say we only care about things generated by our ereport framework, in which case we don't need to redirect stderr and can just use a "regular pipe". But IIRC that's functionality we also specifically wanted (though I don't think I've ever needed it myself, presumably others have).


 
> My suggestion is we work around this by just always starting the logging
> collector, even if we're not planning to use it.

Umm....

> Do people see an actual problem with that? I agree it's an extra round of
> indirection there, but is that a problem? It would also cause one more
> backgorund process to always be running, but again, is that really a
> problem? The overhead is not exactly large.

You just made three assertions about "this isn't a problem" without
providing any evidence in support of any of them.  Maybe with some
measurements we could have a real discussion.

Well, not entirely. The first two are questions "is this really a problem".

The overhead of an extra process certainly isn't much, I'm wiling to say that as an assertion.

The other two, less so, that's more question.

Are you actually asking for a benchmark of if logging gets slower? If so, could you suggest a workload to make an actual benchmark of it (where logging would be high enough that it could be come a bottleneck -- and not writing the log data to disk, but the actual logging). I'm not sure what a good one would be.

--
Reply | Threaded
Open this post in threaded view
|

Re: log_destination=file

Tom Lane-2
Magnus Hagander <[hidden email]> writes:
> On Thu, Aug 31, 2017 at 2:34 PM, Tom Lane <[hidden email]> wrote:
>> Right, because the decision whether to redirect stdout/stderr can't
>> be changed on the fly.

> Right.

> We could of course also say we only care about things generated by our
> ereport framework, in which case we don't need to redirect stderr and can
> just use a "regular pipe". But IIRC that's functionality we also
> specifically wanted (though I don't think I've ever needed it myself,
> presumably others have).

Yes, it's pretty important, because of assorted stuff not-under-our-
control that doesn't know about ereport and will just print to stderr
anyway.  Some examples: dynamic linker can't-resolve-symbol failure
messages, glibc malloc corruption error messages, just about any external
module in plperl or plpython.  I don't find this to be negotiable.

> Are you actually asking for a benchmark of if logging gets slower?

Yes.

> If so,
> could you suggest a workload to make an actual benchmark of it (where
> logging would be high enough that it could be come a bottleneck -- and not
> writing the log data to disk, but the actual logging). I'm not sure what a
> good one would be.

pgbench with log_statement = all would be a pretty easy test case.

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: log_destination=file

Greg Stark
On 31 August 2017 at 13:49, Tom Lane <[hidden email]> wrote:
> Magnus Hagander <[hidden email]> writes:
>> On Thu, Aug 31, 2017 at 2:34 PM, Tom Lane <[hidden email]> wrote:
> Yes, it's pretty important, because of assorted stuff not-under-our-
> control that doesn't know about ereport and will just print to stderr
> anyway.  Some examples: dynamic linker can't-resolve-symbol failure
> messages, glibc malloc corruption error messages, just about any external
> module in plperl or plpython.  I don't find this to be negotiable.

So what happens now with these messages? My understanding is that
they're missing from the CSV logs and are simply inserted into the
text logs without any log_line_prefix? The logging collector doesn't
recognize these messages and reformat them for the CSV logs does it?

I'm actually asking because I'm more concerned with JSON logs or
msgpack logs. Currently these are supported with an emit_log_hook but
they can't capture these non-ereport logs either.

Also the CSV and emit_log_hook based logs don't have any convenient
way to turn them on and off and control the location and filename of
the logs. It would be nice if we could have something like

log_destinations='stderr=text,syslog=json,postgresql-%Y-%m-%d_%H%M%S.csv=csv'

>> Are you actually asking for a benchmark of if logging gets slower?
>
> Yes.

Personally I don't think it's "performance" so much as operational
issues that are more concerning. For all we know there are people out
there who tried to use the logging collector and found it didn't work
well on some system -- perhaps it interacted with systemd or something
else on the system -- and they switched back to just using stderr. I
don't know how to flush these users out though if there are any. Just
making this change early in a release cycle is the best we can do.


--
greg


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: log_destination=file

Magnus Hagander-2


On Fri, Sep 1, 2017 at 6:00 PM, Greg Stark <[hidden email]> wrote:
On 31 August 2017 at 13:49, Tom Lane <[hidden email]> wrote:
> Magnus Hagander <[hidden email]> writes:
>> On Thu, Aug 31, 2017 at 2:34 PM, Tom Lane <[hidden email]> wrote:
> Yes, it's pretty important, because of assorted stuff not-under-our-
> control that doesn't know about ereport and will just print to stderr
> anyway.  Some examples: dynamic linker can't-resolve-symbol failure
> messages, glibc malloc corruption error messages, just about any external
> module in plperl or plpython.  I don't find this to be negotiable.

So what happens now with these messages? My understanding is that
they're missing from the CSV logs and are simply inserted into the
text logs without any log_line_prefix? The logging collector doesn't
recognize these messages and reformat them for the CSV logs does it?

Yeah, that's pretty much it.

Fixing that is also on my plan, but for later :)

 
I'm actually asking because I'm more concerned with JSON logs or
msgpack logs. Currently these are supported with an emit_log_hook but
they can't capture these non-ereport logs either.

Also the CSV and emit_log_hook based logs don't have any convenient
way to turn them on and off and control the location and filename of
the logs. It would be nice if we could have something like

log_destinations='stderr=text,syslog=json,postgresql-%Y-%m-%d_%H%M%S.csv=csv'

That could definitely be an interesting improvement as well. Decoupling the format from the destination would make a lot of sense. (Arguments can certainly be made for example for csv-to-syslog, when you have a custom syslog server)

 
>> Are you actually asking for a benchmark of if logging gets slower?
>
> Yes.

Personally I don't think it's "performance" so much as operational
issues that are more concerning. For all we know there are people out
there who tried to use the logging collector and found it didn't work
well on some system -- perhaps it interacted with systemd or something
else on the system -- and they switched back to just using stderr. I
don't know how to flush these users out though if there are any. Just
making this change early in a release cycle is the best we can do.

The most common use of stderr I think is debian/ubuntu which uses that and then pg_ctl based redirect. Or at least used to (I remember being annoyed by that at least once..)

AFAIK all other major platforms already use the logging collector in the standard packages.
 
--
Reply | Threaded
Open this post in threaded view
|

Re: log_destination=file

Tom Lane-2
Magnus Hagander <[hidden email]> writes:
> On Fri, Sep 1, 2017 at 6:00 PM, Greg Stark <[hidden email]> wrote:
>> So what happens now with these messages? My understanding is that
>> they're missing from the CSV logs and are simply inserted into the
>> text logs without any log_line_prefix? The logging collector doesn't
>> recognize these messages and reformat them for the CSV logs does it?

> Yeah, that's pretty much it.

> Fixing that is also on my plan, but for later :)

Keep in mind that you've got to be really, really conservative about
adding functionality in the logging collector process.  If it ever
crashes, we have problems.  If memory serves, the postmaster is able
to restart it, but we cannot restore the original stdout/stderr
destination, which is problematic if that's where its output had
been going.  So it's pretty close to being as mission-critical as
the postmaster itself.

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: log_destination=file

Magnus Hagander-2


On Fri, Sep 1, 2017 at 6:44 PM, Tom Lane <[hidden email]> wrote:
Magnus Hagander <[hidden email]> writes:
> On Fri, Sep 1, 2017 at 6:00 PM, Greg Stark <[hidden email]> wrote:
>> So what happens now with these messages? My understanding is that
>> they're missing from the CSV logs and are simply inserted into the
>> text logs without any log_line_prefix? The logging collector doesn't
>> recognize these messages and reformat them for the CSV logs does it?

> Yeah, that's pretty much it.

> Fixing that is also on my plan, but for later :)

Keep in mind that you've got to be really, really conservative about
adding functionality in the logging collector process.  If it ever
crashes, we have problems.  If memory serves, the postmaster is able
to restart it, but we cannot restore the original stdout/stderr
destination, which is problematic if that's where its output had
been going.  So it's pretty close to being as mission-critical as
the postmaster itself.

Yeah. That's one of the reasons I'm not trying to do it all in one batch.

But yeah, the postmaster restarts it just fine. And with the WIP patch I posted earlier, the message from the postmaster that it did gets lost if you are logging to stderr. It does end up in the CSV file though. And I'm sure there are plenty of other corner cases around that one to consider. 

--
Reply | Threaded
Open this post in threaded view
|

Re: log_destination=file

Dmitry Dolgov
Hi

> On 31 August 2017 at 14:49, Tom Lane <[hidden email]> wrote:
>> Are you actually asking for a benchmark of if logging gets slower?
>
> Yes.
>
>> If so,
>> could you suggest a workload to make an actual benchmark of it (where
>> logging would be high enough that it could be come a bottleneck -- and not
>> writing the log data to disk, but the actual logging). I'm not sure what a
>> good one would be.
>
> pgbench with log_statement = all would be a pretty easy test case.

This part of the discussion caught my attention, and I tried to perform such
easy test. I was doing:

pgbench -S -j2 -c${clients} -T500 test

with `log_statement=all` and `log_destination=stderr`, what I assume should be
enough to get some approximation for numbers.

scaling factor: 100
average latency:

clients     patch       master

10          1.827       1.456

20          4.027       3.300

30          6.284       4.921

40          8.409       6.767

50          10.985     8.646

It seems that for this particular workload it was about 20-25% slower.
Reply | Threaded
Open this post in threaded view
|

Re: log_destination=file

Tom Lane-2
Dmitry Dolgov <[hidden email]> writes:
>> On 31 August 2017 at 14:49, Tom Lane <[hidden email]> wrote:
>> pgbench with log_statement = all would be a pretty easy test case.

> It seems that for this particular workload it was about 20-25% slower.

Ouch.  That seems like rather a large hit :-(.  I actually expected
it to be close to negligible, but I don't think 20% qualifies.

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: log_destination=file

Magnus Hagander-2
On Thu, Sep 7, 2017 at 7:02 AM, Tom Lane <[hidden email]> wrote:
Dmitry Dolgov <[hidden email]> writes:
>> On 31 August 2017 at 14:49, Tom Lane <[hidden email]> wrote:
>> pgbench with log_statement = all would be a pretty easy test case.

> It seems that for this particular workload it was about 20-25% slower.

Ouch.  That seems like rather a large hit :-(.  I actually expected
it to be close to negligible, but I don't think 20% qualifies.

Agreed, that's a lot more than I expected. A few questions though:

1. where was stderr actually sent? To the console, to /dev/null or to a file?

2. Could you run the same thing (on the same machine) with the logging collector on and logging to file, without the patch? I'd assume that gives performance similar to running with the patch, but it would be good to confirm that.

And thanks for running the benchmark, saved me some time!


--
Reply | Threaded
Open this post in threaded view
|

Re: log_destination=file

Dmitry Dolgov
> On 8 September 2017 at 01:32, Magnus Hagander <[hidden email]> wrote:
>
> 1. where was stderr actually sent? To the console, to /dev/null or to a file?

To the console (but I can also try other options, although I'm not sure if it would have any impact). 

> 2. Could you run the same thing (on the same machine) with the logging collector on and logging to file, without the patch? I'd assume that gives performance similar to running with the patch, but it would be good to confirm that.

Sure, I'll do it this weekend.
Reply | Threaded
Open this post in threaded view
|

Re: log_destination=file

Dmitry Dolgov
> On 7 September 2017 at 15:46, Dmitry Dolgov <[hidden email]> wrote:
>
> It seems that for this particular workload it was about 20-25% slower.​

Hmm...looks like I provided misleading data, sorry. The numbers from previous
email are correct and I'm able to reproduce them, but surprisingly for me only
for one particular situation when `log_statement=all; log_destination=stderr`
and stderr is sent to a console with a tmux session running in it (and apparently
it has some impact on the performance, but I'm not sure how exactly). In all
other cases (when stderr is sent to a plain console, /dev/null, or we send logs
to a file) numbers are different, and the difference between patch and master
is quite less significant.

average latency:

clients     patch       master

10          0.321       0.286

20          0.669       0.602

30          1.016       0.942

40          1.358       1.280

50          1.727       1.637
Reply | Threaded
Open this post in threaded view
|

Re: log_destination=file

Robert Haas
On Sun, Sep 10, 2017 at 5:29 AM, Dmitry Dolgov <[hidden email]> wrote:
> average latency:
>
> clients     patch       master
> 10          0.321       0.286
> 20          0.669       0.602
> 30          1.016       0.942
> 40          1.358       1.280
> 50          1.727       1.637

That's still a noticeable slowdown, though.  And we've had previous
reports of the overhead of logging being significant as well:

http://postgr.es/m/CACLsApsA7U0GCFpojVQem6SGTEkv8vnwdBfhVi+dqO+gu5gdCA@...

I seem to recall a discussion, perhaps in-person, around the time Theo
submitted that patch where it was reported that the logging collector
could not be used on some systems he was working with because it
became a major performance bottleneck.  With each backend writing its
own messages to a file, it was tolerable, but when you tried to funnel
everything through a single process, the back-pressure slowed down the
entire system unacceptably.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Previous Thread Next Thread