Segmentation fault with PG-12

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
25 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Segmentation fault with PG-12

Andreas Joseph Krogh-2
In our production-environment we get sig11 every now and then after upgrading to PG-12:
 
2019-10-08 15:45:29.654 CEST [8829-76] LOG:  server process (PID 20631) was terminated by signal 11: Segmentation fault
2019-10-08 15:45:29.654 CEST [8829-77] DETAIL:  Failed process was running: COMMIT
2019-10-08 15:45:29.654 CEST [8829-78] LOG:  terminating any other active server processes

 
Will running a debug-enabled build slow things noticably down?
Is there a way to make it dump a stack-trace (or back-trace in C-land?) on sig11?
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Reply | Threaded
Open this post in threaded view
|

Re: Segmentation fault with PG-12

Tom Lane-2
Andreas Joseph Krogh <[hidden email]> writes:
> Will running a debug-enabled build slow things noticably down?

gcc promises that the generated code is the same with or without debug.
I think clang does too.  With other compilers you may pay some penalty.

> Is there a way
> to make it dump a stack-trace (or back-trace in C-land?) on sig11?

You should be able to get a core file from which you can extract a
stack trace (and other info) after the fact.

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Segmentation fault with PG-12

Andreas Joseph Krogh-2
På tirsdag 08. oktober 2019 kl. 17:24:21, skrev Tom Lane <[hidden email]>:
Andreas Joseph Krogh <[hidden email]> writes:
> Will running a debug-enabled build slow things noticably down?

gcc promises that the generated code is the same with or without debug.
I think clang does too.  With other compilers you may pay some penalty.
 
 
Nice, I'm using the ubuntu-packages, so I'll go ahead and install postgresql-12-dbgsym
 
> Is there a way
> to make it dump a stack-trace (or back-trace in C-land?) on sig11?

You should be able to get a core file from which you can extract a
stack trace (and other info) after the fact.

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
 
I'll look into that, thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Reply | Threaded
Open this post in threaded view
|

Re: Segmentation fault with PG-12

Andreas Joseph Krogh-2
In reply to this post by Tom Lane-2
På tirsdag 08. oktober 2019 kl. 17:24:21, skrev Tom Lane <[hidden email]>:
Andreas Joseph Krogh <[hidden email]> writes:
> Will running a debug-enabled build slow things noticably down?

gcc promises that the generated code is the same with or without debug.
I think clang does too.  With other compilers you may pay some penalty.

> Is there a way
> to make it dump a stack-trace (or back-trace in C-land?) on sig11?

You should be able to get a core file from which you can extract a
stack trace (and other info) after the fact.

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

regards, tom lane
 
Attached is output from "bt full".
 
Is this helpful?
Anything else I can do to help narrowing down the problem?
 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 

gdb.txt (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Segmentation fault with PG-12

Tom Lane-2
Andreas Joseph Krogh <[hidden email]> writes:
> Attached is output from "bt full". Is this helpful?

Well, it shows that the failure is occurring while trying to evaluate
a variable in a trigger's WHEN clause during
"UPDATE origo_email_delivery SET folder_id=$1, created=$2\nWHERE entity_id IN ($3)\nRETURNING entity_id"
And I'd bet that the root cause is something to do with Andres' tuple slot
work.  But (at least to my eye) it's not apparent exactly what's wrong.

Can you show us the table definition and associated trigger definitions
for origo_email_delivery?

This doesn't seem to correlate with your original report, btw,
as that claimed the crash was during COMMIT.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Segmentation fault with PG-12

Andreas Joseph Krogh-2
På onsdag 09. oktober 2019 kl. 16:16:37, skrev Tom Lane <[hidden email]>:
Andreas Joseph Krogh <[hidden email]> writes:
> Attached is output from "bt full". Is this helpful?

Well, it shows that the failure is occurring while trying to evaluate
a variable in a trigger's WHEN clause during
"UPDATE origo_email_delivery SET folder_id=$1, created=$2\nWHERE entity_id IN ($3)\nRETURNING entity_id"
And I'd bet that the root cause is something to do with Andres' tuple slot
work.  But (at least to my eye) it's not apparent exactly what's wrong.

Can you show us the table definition and associated trigger definitions
for origo_email_delivery?

This doesn't seem to correlate with your original report, btw,
as that claimed the crash was during COMMIT.

regards, tom lane
 
FWIW: It doesn't always happen when that UPDATE-statement is issued, so it's not reproducable. We'll see what the next core-dump gives us.
 
Is it OK if I send you the table/trigger-definitions off-list?
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Reply | Threaded
Open this post in threaded view
|

Re: Segmentation fault with PG-12

Tom Lane-2
Andreas Joseph Krogh <[hidden email]> writes:
> Is it OK if I send you the table/trigger-definitions off-list?

Sure, but please share with Andres [cc'ed] as well.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Is my lecturer wrong about PostgreSQL? I think he is!

Pól Ua Laoínecháin
Hi all,

I recently started a Masters in Computer Science (and not at the
institution in my email address).

One of my courses is "Advanced Databases" - yummy I thought - it's not
even compulsory for me but I just *_had_* to take this module. The
lecturer is a bit of an Oracle fan-boy (ACE director no less...
hmmm...) and I want(ed) - becoming less enthusiasic by the minute - to
do my dissertation with him. So, we're having a chat and I make plain
my love of good 'ol PostgreSQL as my RDBMS of choice and he tells me
that there are problems with random block corruption with PostgreSQL.
I said "really" and before that conversation could go any further,
another student came over and asked a question.

So, I toddled off and did some research - I had heard something about
this before (vague fuzzy memories) of a problem with the Linux kernel
so I searched for a bit and duly dug up a couple of pages

https://lwn.net/Articles/752063/ : PostgreSQL's fsync() surprise - and

https://news.ycombinator.com/item?id=19238121 : Linux Fsync Issue for
Buffered IO and Its Preliminary Fix for PostgreSQL

So, this week I go back to my lecturer and say, yep, there was some
issue but it was a Linux kernel problem and not PostgreSQL's fault and
has been resolved.

He tells me that he knew about that but that there was another issue
(he had "spoken to people" at meetings!). I said "well, why isn't it
fixed?" and he replied "where's the impetus?" to which I responded
(quite shocked at this stage) something like "well, I know that the
core team values correctness very highly" to which he came back with
"yes, but they have no commercial imperative to fix anything - they
have to wait until somebody is capable enough and interested enough to
do the work". He then muttered something about this mysterious flaw
having been fixed in EnterpriseDB.

At this point, I lost interest. Having lurked on lists and going by my
general "gut feeling" - if there was a serious issue causing
irrecoverable block corruption, I'm pretty sure that it would be "all
hands on deck" until this problem had been solved and "nice-to-haves"
(GENERATED AS... for example) would have been parked till then.

Now, I have four questions:

1) Is my lecturer full of it or does he really have a point?

2) The actual concrete acknowledged problem with fsync that affected
PostgreSQL - why didn't it affect Oracle? Or MySQL? Or did it but it
was so rare that it never became apparent - it wasn't that obvious
with PostgreSQL either - one of those rare and intermittent problems?

3) Were there ever any problems with BSD?

4) What is the OS of choice for *_serious_* PostgreSQL installations?

I hope that I have been clear, but should anyone require any
clarification, please don't hesitate to ask me.

Tia and rgs,

Pól...


Reply | Threaded
Open this post in threaded view
|

Re: Is my lecturer wrong about PostgreSQL? I think he is!

Stephen Frost
Greetings,

First off- please try to craft a new email in the future rather than
respond to an existing one.  You may not realize this but there's some
headers that get copied when you do a reply that cause the email to show
up as being a reply, even if you remove all the "obvious" bits from it.

* Pól Ua Laoínecháin ([hidden email]) wrote:
> 1) Is my lecturer full of it or does he really have a point?

He's full of it, as far as I can tell anyway, based on what you've
shared with us.  Just look at the committers and the commit history to
PostgreSQL, and look at who the largest contributors are and who they
work for.  That alone might be enough to surprise your lecturer with.

> 2) The actual concrete acknowledged problem with fsync that affected
> PostgreSQL - why didn't it affect Oracle? Or MySQL? Or did it but it
> was so rare that it never became apparent - it wasn't that obvious
> with PostgreSQL either - one of those rare and intermittent problems?

Databases that do direct I/O don't depend on fsync.  That said, I do
think this could have been an issue for Oracle if you ran it without
direct i/o.

> 3) Were there ever any problems with BSD?

As I understand it, no.

> 4) What is the OS of choice for *_serious_* PostgreSQL installations?

BSD and Linux are both quite popular platforms for running PG, and
people run very serious workloads on both.

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Is my lecturer wrong about PostgreSQL? I think he is!

Alan Hodgson-3
In reply to this post by Pól Ua Laoínecháin
Assuming you're not a troll ...

On Wed, 2019-10-09 at 20:06 +0100, Pól Ua Laoínecháin wrote:
1) Is my lecturer full of it or does he really have a point?


He's more than full of it. PostgreSQL has had a few bugs over the year that could have resulted in data corruption, but they're pretty rare and fixed as soon as they're found. PostgreSQL is the most reliable software I run, and virtually the only major piece I don't hesitate to upgrade without waiting to see what bugs other people find first.


4) What is the OS of choice for *_serious_* PostgreSQL installations?

That's a religious question, not a technical question. I think even Microsoft makes a decent server OS nowadays. But I expect a large majority of PostgreSQL installations are running on Linux, as are the vast majority of all server apps nowadays. Having said that, I don't run a "serious" PostgreSQL installation; some of the people here run databases that do tens of thousands of TPS and hold many TiB of data. You'd have to ask them I guess.
Reply | Threaded
Open this post in threaded view
|

Re: Is my lecturer wrong about PostgreSQL? I think he is!

Tim Clarke-3
On 09/10/2019 20:45, Alan Hodgson wrote:

> Assuming you're not a troll ...
>
> On Wed, 2019-10-09 at 20:06 +0100, Pól Ua Laoínecháin wrote:
>> 1) Is my lecturer full of it or does he really have a point?
> He's more than full of it. PostgreSQL has had a few bugs over the year
> that could have resulted in data corruption, but they're pretty rare
> and fixed as soon as they're found. PostgreSQL is the most reliable
> software I run, and virtually the only major piece I don't hesitate to
> upgrade without waiting to see what bugs other people find first.
>> 4) What is the OS of choice for *_serious_* PostgreSQL installations?
> That's a religious question, not a technical question. I think even
> Microsoft makes a decent server OS nowadays. But I expect a large
> majority of PostgreSQL installations are running on Linux, as are the
> vast majority of all server apps nowadays. Having said that, I don't
> run a "serious" PostgreSQL installation; some of the people here run
> databases that do tens of thousands of TPS and hold many TiB of data.
> You'd have to ask them I guess.


Pretty sure if you run through Oracle's back catalogue you'll find a
similar data corruption for an entirely unrelated reason. Just guessing
but chances are....


Tim Clarke



Telephone: Witham: +44(0)1376 503500 | London: +44 (0)20 3009 0853 | Frankfurt: +49 (0)69 7191 6000 | Hong Kong: +852 5803 1687 | Toronto: +1 647 503 2848
Web: https://www.manifest.co.uk/



Minerva Analytics Ltd - A Solactive Company
9 Freebournes Court | Newland Street | Witham | Essex | CM8 2BL | United Kingdom

________________________________

Copyright: This e-mail may contain confidential or legally privileged information. If you are not the named addressee you must not use or disclose such information, instead please report it to [hidden email]<mailto:[hidden email]>
Legal: Minerva Analytics is the trading name of: Minerva Analytics Ltd: Registered in England Number 11260966 & The Manifest Voting Agency Ltd: Registered in England Number 2920820 Registered Office at above address. Please Click Here https://www.manifest.co.uk/legal/ for further information.
Reply | Threaded
Open this post in threaded view
|

Re: Is my lecturer wrong about PostgreSQL? I think he is!

Pól Ua Laoínecháin
In reply to this post by Stephen Frost
Hi, and thanks for responding,

> First off- please try to craft a new email in the future...

My apologies to you and the group - I'll do that in future.

> > 1) Is my lecturer full of it or does he really have a point?

> He's full of it, as far as I can tell anyway, based on what you've
> shared with us.  Just look at the committers and the commit history to
> PostgreSQL, and look at who the largest contributors are and who they
> work for.  That alone might be enough to surprise your lecturer with.

The only non-PostgreSQL company that I could find was Fujitisu - where
can I find a (list of) the others?

> Databases that do direct I/O don't depend on fsync.  That said, I do
> think this could have been an issue for Oracle if you ran it without
> direct i/o.

I think that Oracle are big into asyncio? I know that you have to sudo
dnf install some_library with a name like asio/asyncio or something
like that?

Anyway, why doesn't PostgreSQL use Direct I/O?

Thanks again and rgs,

Pól...

> Stephen


Reply | Threaded
Open this post in threaded view
|

Re: Segmentation fault with PG-12

Andres Freund
In reply to this post by Tom Lane-2
On 2019-10-09 10:16:37 -0400, Tom Lane wrote:
> Andreas Joseph Krogh <[hidden email]> writes:
> > Attached is output from "bt full". Is this helpful?
>
> Well, it shows that the failure is occurring while trying to evaluate
> a variable in a trigger's WHEN clause during
> "UPDATE origo_email_delivery SET folder_id=$1, created=$2\nWHERE entity_id IN ($3)\nRETURNING entity_id"
> And I'd bet that the root cause is something to do with Andres' tuple slot
> work.  But (at least to my eye) it's not apparent exactly what's wrong.

It looks like this could "just" be another report of #16036, which was
already fixed in:

commit d986d4e87f61c68f52c68ebc274960dc664b7b4e
Author: Andres Freund <[hidden email]>
Date:   2019-10-04 11:59:34 -0700

    Fix crash caused by EPQ happening with a before update trigger present.


> This doesn't seem to correlate with your original report, btw,
> as that claimed the crash was during COMMIT.

That however, would be confusing, unless there's some deferred trigger
that causes another update, which then fires a before update trigger
causing the problem.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Segmentation fault with PG-12

Andreas Joseph Krogh-2
På torsdag 10. oktober 2019 kl. 07:25:26, skrev Andres Freund <[hidden email]>:
On 2019-10-09 10:16:37 -0400, Tom Lane wrote:
> Andreas Joseph Krogh <[hidden email]> writes:
> > Attached is output from "bt full". Is this helpful?
>
> Well, it shows that the failure is occurring while trying to evaluate
> a variable in a trigger's WHEN clause during
> "UPDATE origo_email_delivery SET folder_id=$1, created=$2\nWHERE entity_id IN ($3)\nRETURNING entity_id"
> And I'd bet that the root cause is something to do with Andres' tuple slot
> work.  But (at least to my eye) it's not apparent exactly what's wrong.

It looks like this could "just" be another report of #16036, which was
already fixed in:

commit d986d4e87f61c68f52c68ebc274960dc664b7b4e
Author: Andres Freund <[hidden email]>
Date:   2019-10-04 11:59:34 -0700

    Fix crash caused by EPQ happening with a before update trigger present.

 
 
 
(Tom: This mail is only viewable as text/html, to if you're reading the text/plain version it will seem "hashed")
 
Aha, that whould be 60e97d63e5d19098e11fa32431a20eea820e2ae9 in REL_12_STABLE
We'll build and run HEAD of REL_12_STABLE, and report back.
 
 
> This doesn't seem to correlate with your original report, btw,
> as that claimed the crash was during COMMIT.

That however, would be confusing, unless there's some deferred trigger
that causes another update, which then fires a before update trigger
causing the problem.

Greetings,

Andres Freund
 
We have a deferred trigger which updates origo_email_delivery:
 
CREATE OR REPLACE FUNCTION origo_index_email_props_tf() RETURNS TRIGGER AS
$$
declare
    v_prop origo_email_message_property;
BEGIN
    v_prop := NEW;
    UPDATE origo_email_delivery
    SET is_seen      = v_prop.is_seen,
        followup_id  = v_prop.followup_id,
        is_replied   = v_prop.is_replied,
        is_forwarded = v_prop.is_forwarded,
        is_draft     = v_prop.is_draft,
        is_done      = v_prop.is_done,
        is_flagged   = v_prop.is_flagged,
        modseq       = greatest(modseq, v_prop.modseq)
    WHERE message_id = v_prop.message_id
      AND owner_id = v_prop.owner_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER origo_index_email_props_t
    AFTER INSERT OR UPDATE
    ON origo_email_message_property DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW
EXECUTE PROCEDURE origo_index_email_props_tf();
 
.. and then trigger the following UPDATE-trigger:
 
CREATE TRIGGER origo_email_delivery_update_t
    BEFORE UPDATE
    ON origo_email_delivery
    FOR EACH ROW
    WHEN (OLD.folder_id <> NEW.folder_id OR NEW.is_deleted <> OLD.is_deleted)
EXECUTE PROCEDURE origo_email_delivery_update_tf();
 
Maybe that will trigger the bug.
 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Reply | Threaded
Open this post in threaded view
|

Re: Segmentation fault with PG-12

Andres Freund
On 2019-10-10 09:05:06 +0200, Andreas Joseph Krogh wrote:
> (Tom: This mail is only viewable as text/html, to if you're reading the
> text/plain version it will seem "hashed")

I'm totally not ok with that btw.


Reply | Threaded
Open this post in threaded view
|

Re: Is my lecturer wrong about PostgreSQL? I think he is!

Wim Bertels-3
In reply to this post by Pól Ua Laoínecháin

>
> Now, I have four questions:
>
> 1) Is my lecturer full of it or does he really have a point?

Hallo Pol,

i don't know, a also teaching a databases,
personally i never experienced this

sometimes people are really a fan of certain product,
sometimes in combination with the thought that all other products are
bad; i don't know if this is the case, you could compare it with
soccer, a barcalona fan will never become a real madrid fan and vice
versa; so "rational" decisions (at first, based on some reasoning) tend
to get loaded emotional feelings.

in these kind of discussions both parties should have there say,
not just one (in this case you, we haven't heard the teacher)

there is no such thing as a perfect dbms,
nor postgres, nor oracle, nor ..,

>
> 2) The actual concrete acknowledged problem with fsync that affected
> PostgreSQL - why didn't it affect Oracle? Or MySQL? Or did it but it
> was so rare that it never became apparent - it wasn't that obvious
> with PostgreSQL either - one of those rare and intermittent problems?

you can set fsync to off (not default), for more performance,
but it comes with the cost of D in ACID, you no longer have it

>
> 3) Were there ever any problems with BSD?

as far as i understand BSD and variants are very solid,
so good for server use, not for desktop

>
> 4) What is the OS of choice for *_serious_* PostgreSQL installations?

it depends,

if transparancy is important to you, choose an opensource os
if time is important to you, choose what you already know
if you are willing to spent time on it, i would personally choose a
linux, bsd or solaris based os

if it helps:
i my case, i had to make a decision about the dbms for the classes as
well, the reasons i have choosen postgres are in a nutshell:
* free
* open
* runs good on servers that are comparable with an average desktop pc
or better
* close to ISO sql standard (the reason why i didn't choose mysql/now
mariadb)
* seems to have a future
-* within all these, postgres seems to have implemented most features
* after using it for a while (18 years now), i should now add: a great
community

some links:
https://www.top500.org/
https://en.wikipedia.org/wiki/DB-Engines_ranking
https://db-engines.com/en/ranking

>
> I hope that I have been clear, but should anyone require any
> clarification, please don't hesitate to ask me.
>
> Tia and rgs,
>
> Pól...
>
>
--
mvg,
Wim Bertels
--
Lector
UC Leuven-Limburg
--
The human race has one really effective weapon, and that is laughter.
                -- Mark Twain


Reply | Threaded
Open this post in threaded view
|

Re: Is my lecturer wrong about PostgreSQL? I think he is!

Geoff Winkless
On Thu, 10 Oct 2019 at 09:31, Wim Bertels <[hidden email]> wrote:
> sometimes people are really a fan of certain product,
> sometimes in combination with the thought that all other products are
> bad; i don't know if this is the case, you could compare it with
> soccer, a barcalona fan will never become a real madrid fan and vice
> versa; so "rational" decisions (at first, based on some reasoning) tend
> to get loaded emotional feelings.

Yeah, this. Bear in mind it's possible that having made a sweeping
statement that he cannot back up and that he secretly knows was
unfounded, your lecturer will be defensive and uncomfortable. Chances
are after your conversation he will have gone away and done the same
research you did and may well have modified his opinion but will be
too embarrassed to admit that to you.

Geoff


Reply | Threaded
Open this post in threaded view
|

Re: Segmentation fault with PG-12

Andreas Joseph Krogh-2
In reply to this post by Andres Freund
We had another crash today, and it appears to be the same:
 
#0  slot_deform_heap_tuple (natts=26, offp=0x5598eba0b968,
    tuple=<optimized out>, slot=0x5598eba0b920)
    at ./build/../src/backend/executor/execTuples.c:895
 
--
Andreas Joseph Krogh
Reply | Threaded
Open this post in threaded view
|

Re: Is my lecturer wrong about PostgreSQL? I think he is!

Simon Riggs
In reply to this post by Pól Ua Laoínecháin
On Wed, 9 Oct 2019 at 20:06, Pól Ua Laoínecháin <[hidden email]> wrote:
 
One of my courses is "Advanced Databases" - yummy I thought - it's not
even compulsory for me but I just *_had_* to take this module. The
lecturer is a bit of an Oracle fan-boy (ACE director no less...
hmmm...) and I want(ed) - becoming less enthusiasic by the minute - to
do my dissertation with him.

It's a good thing that you have the opportunity to do that course and to have an lecturer with strong real-world experience.
 
So, we're having a chat and I make plain
my love of good 'ol PostgreSQL as my RDBMS of choice and he tells me
that there are problems with random block corruption with PostgreSQL.
I said "really" and before that conversation could go any further,
another student came over and asked a question.

I think its true that there have been reported problems with block corruption with both Oracle and PostgreSQL. The main difference is that the PostgreSQL project is open enough for people to see much of that on public record.

Given the efforts made on resilience and recovery, such as PITR, block checksums, those problems are pretty much solved, based upon a statistically sufficient sample of the real world: 2ndQuadrant customers. Some block-level problems do still recur - the recent fsync() problems were reported by us and have been handled (not resolved), but they were and are rare. There are still many issues of other kinds.

Many Oracle customers I have worked with years ago experienced block corruptions and it was very common to hear the reply "talk to your disk vendor". Those happened probably 20-30 years earlier, so in many cases have now been forgotten. There isn't an easy way to go back and check for trends on that.

Having said all of that, its easy to point at some of these things and use them as FUD - fear, uncertainty and doubt. No doubt unintentionally.

I'd go and learn more from your lecturer. Your disagreement has made you think, so he helped you. Learning from your own mistakes takes longer.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Solutions for the Enterprise
Reply | Threaded
Open this post in threaded view
|

Re: Is my lecturer wrong about PostgreSQL? I think he is!

Stephen Frost
In reply to this post by Pól Ua Laoínecháin
Greetings,

* Pól Ua Laoínecháin ([hidden email]) wrote:
> > > 1) Is my lecturer full of it or does he really have a point?
>
> > He's full of it, as far as I can tell anyway, based on what you've
> > shared with us.  Just look at the committers and the commit history to
> > PostgreSQL, and look at who the largest contributors are and who they
> > work for.  That alone might be enough to surprise your lecturer with.
>
> The only non-PostgreSQL company that I could find was Fujitisu - where
> can I find a (list of) the others?

Not sure where you were looking...  The contributors list is here:

https://www.postgresql.org/community/contributors/

The committers list is here:

https://wiki.postgresql.org/wiki/Committers

The git tree is here:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=summary

Perhaps not the best stat, but you can view the contributions by
committer pretty easily, for 2018, here:

https://github.com/postgres/postgres/graphs/contributors?from=2018-01-01&to=2018-12-31&type=c

Note that this isn't very representative of the actual authors though-
we don't track those in the way git would prefer, instead we note who
the author of a given patch was in the commit message itself.

> > Databases that do direct I/O don't depend on fsync.  That said, I do
> > think this could have been an issue for Oracle if you ran it without
> > direct i/o.
>
> I think that Oracle are big into asyncio? I know that you have to sudo
> dnf install some_library with a name like asio/asyncio or something
> like that?

Oracle supports both, but running with direct i/o is pretty popular,
yes.

> Anyway, why doesn't PostgreSQL use Direct I/O?

There's an awful lot that the kernel provides when it comes to things
like good read-ahead and dealing with disks and SSDs and such that we
(currently, at least) prefer to leverage instead of writing lots of new
code to deal with that ourselves, which would be required to use Direct
I/O (and not have it be completely terrible performance wise, anyway).

The whole issue behind fsync was because our expectation (and POSIX's,
if you ask me anyway) was different from what the Linux kernel was
providing (specifically, you could end up in a situation where an
fsync() call "worked" and didn't return an error, even though there
remained pages that were dirty and not written out).  Now, this is under
other error conditions typically and you'll get messages in the kernel
log about such failures usually, so if you're properly monitoring and
managing your systems there's a good chance you would have realized
there was a problem even though the Linux kernel was telling PG that
everything was fine (have backups!!).

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
12