Improving connection scalability: GetSnapshotData()

Re: Improving connection scalability: GetSnapshotData()

Andres Freund
Hi,

On 2020-04-07 16:13:07 -0400, Robert Haas wrote:

> On Tue, Apr 7, 2020 at 3:24 PM Andres Freund <[hidden email]> wrote:
> > > + ProcGlobal->xids[pgxactoff] = InvalidTransactionId;
> > >
> > > Apparently this array is not dense in the sense that it excludes
> > > unused slots, but comments elsewhere don't seem to entirely agree.
> >
> > What do you mean with "unused slots"? Backends that committed?
>
> Backends that have no XID. You mean, I guess, that it is "dense" in
> the sense that only live backends are in there, not "dense" in the
> sense that only active write transactions are in there.

Correct.

I tried the "only active write transaction" approach, btw, and had a
hard time making it scale well (due to the much more frequent moving of
entries at commit/abort time).  If we were to go to an 'only active
transactions' array at some point we'd imo still need pretty much all
the other changes made here - so I'm not worried about it for now.


> > > + uint32 TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;
> > >
> > >   /* ProcGlobal */
> > >   size = add_size(size, sizeof(PROC_HDR));
> > > - /* MyProcs, including autovacuum workers and launcher */
> > > - size = add_size(size, mul_size(MaxBackends, sizeof(PGPROC)));
> > > - /* AuxiliaryProcs */
> > > - size = add_size(size, mul_size(NUM_AUXILIARY_PROCS, sizeof(PGPROC)));
> > > - /* Prepared xacts */
> > > - size = add_size(size, mul_size(max_prepared_xacts, sizeof(PGPROC)));
> > > - /* ProcStructLock */
> > > + size = add_size(size, mul_size(TotalProcs, sizeof(PGPROC)));
> > >
> > > This seems like a bad idea. If we establish a precedent that it's OK
> > > to have sizing routines that don't use add_size() and mul_size(),
> > > people are going to cargo cult that into places where there is more
> > > risk of overflow than there is here.
> >
> > Hm. I'm not sure I see the problem. Are you concerned that TotalProcs
> > would overflow due to too big MaxBackends or max_prepared_xacts? The
> > multiplication itself is still protected by add_size(). It didn't seem
> > correct to use add_size for the TotalProcs addition, since that's not
> > really a size. And since the limit for procs is much lower than
> > UINT32_MAX...
>
> I'm concerned that there are 0 uses of add_size in any shared-memory
> sizing function, and I think it's best to keep it that way.

I can't make sense of that sentence?


We already have code like this, and have for a long time:
        /* Size of the ProcArray structure itself */
#define PROCARRAY_MAXPROCS (MaxBackends + max_prepared_xacts)

adding NUM_AUXILIARY_PROCS doesn't really change that, does it?


> If you initialize TotalProcs = add_size(MaxBackends,
> add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts)) then I'm happy.

Will do.


Greetings,

Andres Freund



Re: Improving connection scalability: GetSnapshotData()

Andres Freund
In reply to this post by Andres Freund
Hi,

On 2020-04-07 05:15:03 -0700, Andres Freund wrote:

> SEE BELOW: What, and what not, to do for v13.
>
> [ description of changes ]
>
> I think this is pretty close to being committable.
>
> But: This patch came in very late for v13, and it took me much longer to
> polish it up than I had hoped (partially distraction due to various bugs
> I found (in particular snapshot_too_old), partially covid19, partially
> "hell if I know"). The patchset touches core parts of the system. While
> both Thomas and David have done some review, they haven't for the latest
> version (mea culpa).
>
> In many other instances I would say that the above suggests slipping to
> v14, given the timing.
>
> The main reason I am considering pushing is that I think this patchset
> addresses one of the most common critiques of postgres, as well as very
> common, hard to fix, real-world production issues. GetSnapshotData() has
> been a major bottleneck for about as long as I have been using postgres,
> and this addresses that to a significant degree.
>
> A second reason I am considering it is that, in my opinion, the changes
> are not all that complicated and not even that large. At least not for a
> change to a problem that we've long tried to improve.
>
>
> Obviously we all have a tendency to think our own work is important, and
> that we deserve a bit more leeway than others. So take the above with a
> grain of salt.

I tried hard, but came up short. It's 5 AM, and I am still finding
comments that aren't quite right. For a while I thought I'd be pushing
within a few hours ...  And even if it were ready now: this is too large a
patch to push this tired (but damn, I'd love to).

Unfortunately addressing Robert's comments made me realize I didn't like
some of my own naming. In particular I started to dislike
InvisibleToEveryone, and some of the procarray.c variables around
"visible".  After trying about half a dozen schemes I think I found
something that makes some sense, although I am still not perfectly
happy.

I think the attached set of patches addresses most of Robert's review
comments, minus a few minor quibbles where I thought he was wrong
(fundamentally wrong of course). There are no *Copy fields in PGPROC
anymore, there's a lot more comments above PROC_HDR (not duplicated
elsewhere). I've reduced the interspersed changes to GetSnapshotData()
so those can be done separately.

There's also somewhat meaningful commit messages now. But
    snapshot scalability: Move in-progress xids to ProcGlobal->xids[].
needs to be expanded to mention the changed locking requirements.


Realistically it still needs 2-3 hours of proof-reading.


This makes me sad :(

v9-0001-snapshot-scalability-Don-t-compute-global-horizon.patch (139K) Download Attachment
v9-0002-snapshot-scalability-Move-PGXACT-xmin-back-to-PGP.patch (19K) Download Attachment
v9-0003-snapshot-scalability-Move-in-progress-xids-to-Pro.patch (49K) Download Attachment
v9-0004-snapshot-scalability-Move-PGXACT-vacuumFlags-to-P.patch (19K) Download Attachment
v9-0005-snapshot-scalability-Move-subxact-info-to-ProcGlo.patch (26K) Download Attachment
v9-0006-snapshot-scalability-cache-snapshots-using-a-xact.patch (11K) Download Attachment

Re: Improving connection scalability: GetSnapshotData()

Alexander Korotkov
On Wed, Apr 8, 2020 at 3:43 PM Andres Freund <[hidden email]> wrote:
> Realistically it still needs 2-3 hours of proof-reading.
>
> This makes me sad :(

Can we ask RMT to extend feature freeze for this particular patchset?
I think it's reasonable given the extreme importance of this patchset.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Improving connection scalability: GetSnapshotData()

Robert Haas
In reply to this post by Andres Freund
On Tue, Apr 7, 2020 at 4:27 PM Andres Freund <[hidden email]> wrote:

> The main reason is that we want to be able to cheaply check the current
> state of the variables (mostly when checking a backend's own state). We
> can't access the "dense" ones without holding a lock, but we e.g. don't
> want to make ProcArrayEndTransactionInternal() take a lock just to check
> if vacuumFlags is set.
>
> It turns out to also be good for performance to have the copy for
> another reason: The "dense" arrays share cachelines with other
> backends. That's worth it because it allows to make GetSnapshotData(),
> by far the most frequent operation, touch fewer cache lines. But it also
> means that it's more likely that a backend's "dense" array entry isn't
> in a local cpu cache (it'll be pulled out of there when modified in
> another backend). In many cases we don't need the shared entry at commit
> etc time though, we just need to check if it is set - and most of the
> time it won't be.  The local entry allows to do that cheaply.
>
> Basically it makes sense to access the PGPROC variable when checking a
> single backend's data, especially when we have to look at the PGPROC for
> other reasons already.  It makes sense to look at the "dense" arrays if
> we need to look at many / most entries, because we then benefit from the
> reduced indirection and better cross-process cacheability.

That's a good explanation. I think it should be in the comments or a
README somewhere.

> How about:
>         /*
>          * If the current xactCompletionCount is still the same as it was at the
>          * time the snapshot was built, we can be sure that rebuilding the
>          * contents of the snapshot the hard way would result in the same snapshot
>          * contents:
>          *
>          * As explained in transam/README, the set of xids considered running by
>          * GetSnapshotData() cannot change while ProcArrayLock is held. Snapshot
>          * contents only depend on transactions with xids and xactCompletionCount
>          * is incremented whenever a transaction with an xid finishes (while
>          * holding ProcArrayLock exclusively). Thus the xactCompletionCount check
>          * ensures we would detect if the snapshot would have changed.
>          *
>          * As the snapshot contents are the same as before, it is safe
>          * to re-enter the snapshot's xmin into the PGPROC array. None of the rows
>          * visible under the snapshot could already have been removed (that'd
>          * require the set of running transactions to change) and it fulfills the
>          * requirement that concurrent GetSnapshotData() calls yield the same
>          * xmin.
>          */

That's nice and clear.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improving connection scalability: GetSnapshotData()

Jonathan S. Katz-3
In reply to this post by Alexander Korotkov
On 4/8/20 8:59 AM, Alexander Korotkov wrote:
> On Wed, Apr 8, 2020 at 3:43 PM Andres Freund <[hidden email]> wrote:
>> Realistically it still 2-3 hours of proof-reading.
>>
>> This makes me sad :(
>
> Can we ask RMT to extend feature freeze for this particular patchset?
> I think it's reasonable assuming extreme importance of this patchset.

One of the features of RMT responsibilities[1] is to be "hands off" as
much as possible, so perhaps a reverse ask: how would people feel about
this patch going into PG13, knowing that the commit would come after the
feature freeze date?

My 2¢, with RMT hat off:

As mentioned earlier[2], we know that connection scalability is a major
pain point with PostgreSQL and any effort that can help alleviate that
is a huge win, even in incremental gains. Andres et al's experimentation
shows that this is more than incremental gains, and will certainly make a
huge difference in people's PostgreSQL experience. It is one of those
features where you can "plug in and win" -- you get a performance
benefit just by upgrading. That is not insignificant.

However, I also want to ensure that we are fair: in the past there have
also been other patches that have been "oh-so-close" to commit before
feature freeze but have not made it in (an example escapes me at the
moment). Therefore, we really need to have consensus among ourselves
that the right decision is to allow this to go in after feature freeze.

Did this come in (very) late into the development cycle? Yes, and I
think normally that's enough to give cause for pause. But I could also
argue that Andres is fixing a "bug" (probably several bugs ;) with
PostgreSQL -- and perhaps the fixes can't be backpatched
per se, but they do improve the overall stability and usability of
PostgreSQL and it'd be a shame if we have to wait on them.

Lastly, with the ongoing world events, perhaps time that could have been
dedicated to this and other patches likely affected their completion. I
know most things in my life take way longer than they used to (e.g.
taking out the trash/recycles has gone from a 15s to 240s routine). The
same could be said about other patches as well, but this one has a far
greater impact (a double-edged sword, of course) given it's a feature
that everyone uses in PostgreSQL ;)

So with my RMT hat off, I say +1 to allowing this post feature freeze,
though within a reasonable window.

My 2¢, with RMT hat on:

I believe in[2] I outlined a path for including the patch even at
this stage in the game. If it is indeed committed, I think we
immediately put it on the "Recheck at mid-Beta" list. I know it's not as
trivial to change as something like "Determine if jit="on" by default"
(not picking on Andres, I just remember that example from RMT 11), but
it at least provides a notable reminder that we need to ensure we test
this thoroughly, and point people to really hammer it during beta.

So with my RMT hat on, I say +0 but with a ;)

Thanks,

Jonathan

[1] https://wiki.postgresql.org/wiki/Release_Management_Team#History
[2]
https://www.postgresql.org/message-id/6be8c321-68ea-a865-d8d0-50a3af616463%40postgresql.org



Re: Improving connection scalability: GetSnapshotData()

Robert Haas
On Wed, Apr 8, 2020 at 9:27 AM Jonathan S. Katz <[hidden email]> wrote:
> One of the features of RMT responsibilities[1] is to be "hands off" as
> much as possible, so perhaps a reverse ask: how would people feel about
> this patch going into PG13, knowing that the commit would come after the
> feature freeze date?

Letting something be committed after feature freeze, or at any other
time, is just a risk vs. reward trade-off. Every patch carries some
chance of breaking stuff or making things worse. And every patch has a
chance of making something better that people care about.

On general principle, I would categorize this as a moderate-risk
patch. It doesn't change SQL syntax like, e.g. MERGE, nor does it
touch the on-disk format, like, e.g. INSERT .. ON CONFLICT UPDATE. The
changes are relatively localized, unlike, e.g. parallel query. Those
are all things that reduce risk. On the other hand, it's a brand new
patch which has not been thoroughly reviewed by anyone. Moreover,
shakedown time will be minimal because we're so late in the release
cycle. If it has subtle synchronization problems or if it regresses
performance badly in some cases, we might not find out about any of
that until after release. While in theory we could revert it any time,
since no SQL syntax or on-disk format is affected, in practice it will
be difficult to do that if it's making life better for some people and
worse for others.

I don't know what the right thing to do is. I agree with everyone who
says this is a very important problem, and I have the highest respect
for Andres's technical ability. On the other hand, I have been around
here long enough to know that deciding whether to allow late commits
on the basis of how much we like the feature is a bad plan, because it
takes into account only the upside of a commit, and ignores the
possible downside risk. Typically, the commit is late because the
feature was rushed to completion at the last minute, which can have an
effect on quality. I can say, having read through the patches
yesterday, that they don't suck, but I can't say that they're fully
correct. That's not to say that we shouldn't decide to take them, but
it is a concern to be taken seriously. We have made mistakes before in
what we shipped that had serious implications for many users and for
the project; we should all be wary of making more such mistakes. I am
not trying to say that solving problems and making stuff better is NOT
important, just that every coin has two sides.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improving connection scalability: GetSnapshotData()

Andres Freund
In reply to this post by Robert Haas
Hi,

On 2020-04-08 09:24:13 -0400, Robert Haas wrote:

> On Tue, Apr 7, 2020 at 4:27 PM Andres Freund <[hidden email]> wrote:
> > The main reason is that we want to be able to cheaply check the current
> > state of the variables (mostly when checking a backend's own state). We
> > can't access the "dense" ones without holding a lock, but we e.g. don't
> > want to make ProcArrayEndTransactionInternal() take a lock just to check
> > if vacuumFlags is set.
> >
> > It turns out to also be good for performance to have the copy for
> > another reason: The "dense" arrays share cachelines with other
> > backends. That's worth it because it allows to make GetSnapshotData(),
> > by far the most frequent operation, touch fewer cache lines. But it also
> > means that it's more likely that a backend's "dense" array entry isn't
> > in a local cpu cache (it'll be pulled out of there when modified in
> > another backend). In many cases we don't need the shared entry at commit
> > etc time though, we just need to check if it is set - and most of the
> > time it won't be.  The local entry allows to do that cheaply.
> >
> > Basically it makes sense to access the PGPROC variable when checking a
> > single backend's data, especially when we have to look at the PGPROC for
> > other reasons already.  It makes sense to look at the "dense" arrays if
> > we need to look at many / most entries, because we then benefit from the
> > reduced indirection and better cross-process cacheability.
>
> That's a good explanation. I think it should be in the comments or a
> README somewhere.

I had a briefer version in the PROC_HDR comment. I've just expanded it
to:
 *
 * The denser separate arrays are beneficial for three main reasons: First, to
 * allow for as tight loops accessing the data as possible. Second, to prevent
 * updates of frequently changing data (e.g. xmin) from invalidating
 * cachelines also containing less frequently changing data (e.g. xid,
 * vacuumFlags). Third, to condense frequently accessed data into as few
 * cachelines as possible.
 *
 * There are two main reasons to have the data mirrored between these dense
 * arrays and PGPROC. First, as explained above, a PGPROC's array entries can
 * only be accessed with either ProcArrayLock or XidGenLock held, whereas the
 * PGPROC entries do not require that (obviously there may still be locking
 * requirements around the individual field, separate from the concerns
 * here). That is particularly important for a backend efficiently checking
 * its own values, which it often can safely do without locking.  Second, the
 * PGPROC fields make it possible to avoid unnecessary accesses and modifications to the
 * dense arrays. A backend's own PGPROC is more likely to be in a local cache,
 * whereas the cachelines for the dense array will be modified by other
 * backends (often removing it from the cache for other cores/sockets). At
 * commit/abort time a check of the PGPROC value can avoid accessing/dirtying
 * the corresponding array value.
 *
 * Basically it makes sense to access the PGPROC variable when checking a
 * single backend's data, especially when already looking at the PGPROC for
 * other reasons.  It makes sense to look at the "dense" arrays if we
 * need to look at many / most entries, because we then benefit from the
 * reduced indirection and better cross-process cache-ability.
 *
 * When entering a PGPROC for 2PC transactions with ProcArrayAdd(), the data
 * in the dense arrays is initialized from the PGPROC while it already holds

Greetings,

Andres Freund



Re: Improving connection scalability: GetSnapshotData()

Bruce Momjian
In reply to this post by Robert Haas
On Wed, Apr  8, 2020 at 09:44:16AM -0400, Robert Haas wrote:

> I don't know what the right thing to do is. I agree with everyone who
> says this is a very important problem, and I have the highest respect
> for Andres's technical ability. On the other hand, I have been around
> here long enough to know that deciding whether to allow late commits
> on the basis of how much we like the feature is a bad plan, because it
> takes into account only the upside of a commit, and ignores the
> possible downside risk. Typically, the commit is late because the
> feature was rushed to completion at the last minute, which can have an
> effect on quality. I can say, having read through the patches
> yesterday, that they don't suck, but I can't say that they're fully
> correct. That's not to say that we shouldn't decide to take them, but
> it is a concern to be taken seriously. We have made mistakes before in
> what we shipped that had serious implications for many users and for
> the project; we should all be wary of making more such mistakes. I am
> not trying to say that solving problems and making stuff better is NOT
> important, just that every coin has two sides.

If we don't commit this, where does this leave us with the
old_snapshot_threshold feature?  We remove it in back branches and have
no working version in PG 13?  That seems kind of bad.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Improving connection scalability: GetSnapshotData()

Andres Freund
In reply to this post by Jonathan S. Katz-3
Hi,

On 2020-04-08 09:26:42 -0400, Jonathan S. Katz wrote:
> On 4/8/20 8:59 AM, Alexander Korotkov wrote:
> > On Wed, Apr 8, 2020 at 3:43 PM Andres Freund <[hidden email]> wrote:
> >> Realistically it still needs 2-3 hours of proof-reading.
> >>
> >> This makes me sad :(
> >
> > Can we ask RMT to extend feature freeze for this particular patchset?
> > I think it's reasonable given the extreme importance of this patchset.

> One of the features of RMT responsibilities[1] is to be "hands off" as
> much as possible, so perhaps a reverse ask: how would people feel about
> this patch going into PG13, knowing that the commit would come after the
> feature freeze date?

I'm obviously biased, so I don't think there's much point in responding
directly to that question. But I thought it could be helpful if I
described my thoughts about where the patchset is:

What made me not commit it "earlier" yesterday was not that I had/have
any substantial concerns about the technical details of the patch. But
that there were a few too many comments that didn't yet sound quite
right, that the commit messages didn't yet explain the architecture
/ benefits well enough, and that I noticed that a few variable names
were too easy to be misunderstood by others.

By 5 AM I had addressed most of that, except that some technical details
weren't yet mentioned in the commit messages ([1], they are documented
in the code). I also produce enough typos / odd grammar even when fully
awake, so even though I did proofread my changes, I thought I needed
to do that again while awake.

There have been no substantial code changes since yesterday. The
variable renaming prompted by Robert (which I agree is an improvement),
as well as reducing the diff size by deferring some readability
improvements (probably also a good idea) did however produce quite a few
conflicts in subsequent patches that I needed to resolve. Another awake
read-through to confirm that I resolved them correctly seemed the
responsible thing to do before a commit.


> Lastly, with the ongoing world events, perhaps time that could have been
> dedicated to this and other patches likely affected their completion. I
> know most things in my life take way longer than they used to (e.g.
> taking out the trash/recycles has gone from a 15s to 240s routine). The
> same could be said about other patches as well, but this one has a far
> greater impact (a double-edged sword, of course) given it's a feature
> that everyone uses in PostgreSQL ;)

I'm obviously not alone in that, so I agree that it's not an argument
pro/con anything.

But this definitely is the case for me. Leaving aside the general dread,
not having a quiet home-office, nor good exercise, is definitely not
helping.

I'm really bummed that I didn't have the cycles to help get the shared
memory stats patch ready as well. It's clearly not yet there (but
improved a lot during the CF). But it's been around for so long, and
there's so many improvements blocked by the current stats
infrastructure...


[1] the "mirroring" of values between dense arrays and PGPROC, the
changed locking regimen for ProcArrayAdd/Remove, the widening of
lastCompletedXid to be a 64bit xid
[2] https://www.postgresql.org/message-id/20200407121503.zltbpqmdesurflnm%40alap3.anarazel.de

Greetings,

Andres Freund



Re: Improving connection scalability: GetSnapshotData()

Andres Freund
In reply to this post by Robert Haas
Hi,

On 2020-04-08 09:44:16 -0400, Robert Haas wrote:
> Moreover, shakedown time will be minimal because we're so late in the
> release cycle

My impression increasingly is that there's very little actual shakedown
before beta :(. As e.g. evidenced by the fact that 2PC basically did not
work for several months until I did new benchmarks for this patch.

I don't know what to do about that, but...

Greetings,

Andres Freund



Re: Improving connection scalability: GetSnapshotData()

Andres Freund
In reply to this post by Bruce Momjian
Hi,

On 2020-04-08 18:06:23 -0400, Bruce Momjian wrote:
> If we don't commit this, where does this leave us with the
> old_snapshot_threshold feature?  We remove it in back branches and have
> no working version in PG 13?  That seems kind of bad.

I don't think this patch changes the situation for
old_snapshot_threshold in a meaningful way.

Sure, this patch makes old_snapshot_threshold scale better, and triggers
fewer unnecessary query cancellations.  But there still are wrong query
results, the tests still don't test anything meaningful, and the
determination of which query is cancelled is still wrong.

- Andres



Re: Improving connection scalability: GetSnapshotData()

Bruce Momjian
On Wed, Apr  8, 2020 at 03:25:34PM -0700, Andres Freund wrote:

> Hi,
>
> On 2020-04-08 18:06:23 -0400, Bruce Momjian wrote:
> > If we don't commit this, where does this leave us with the
> > old_snapshot_threshold feature?  We remove it in back branches and have
> > no working version in PG 13?  That seems kind of bad.
>
> I don't think this patch changes the situation for
> old_snapshot_threshold in a meaningful way.
>
> Sure, this patch makes old_snapshot_threshold scale better, and triggers
> fewer unnecessary query cancellations.  But there still are wrong query
> results, the tests still don't test anything meaningful, and the
> determination of which query is cancelled is still wrong.

Oh, OK, so it still needs to be disabled.  I was hoping we could paint
this as a fix.

Based on Robert's analysis of the risk (no SQL syntax, no storage
changes), I think, if you are willing to keep working at this until the
final release, it is reasonable to apply it.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Improving connection scalability: GetSnapshotData()

Michael Paquier-2
In reply to this post by Andres Freund
On Wed, Apr 08, 2020 at 03:17:41PM -0700, Andres Freund wrote:

> On 2020-04-08 09:26:42 -0400, Jonathan S. Katz wrote:
>> Lastly, with the ongoing world events, perhaps time that could have been
>> dedicated to this and other patches likely affected their completion. I
>> know most things in my life take way longer than they used to (e.g.
>> taking out the trash/recycles has gone from a 15s to 240s routine). The
>> same could be said about other patches as well, but this one has a far
>> greater impact (a double-edged sword, of course) given it's a feature
>> that everyone uses in PostgreSQL ;)
>
> I'm obviously not alone in that, so I agree that it's not an argument
> pro/con anything.
>
> But this definitely is the case for me. Leaving aside the general dread,
> not having a quiet home-office, nor good exercise, is definitely not
> helping.
Another factor to be careful of is that by committing a new feature in
a release cycle, you actually need to think about the extra amount of
resources you may need to address comments and issues about it in time
during the beta/stability period, and that more care is likely needed
if you commit something at the end of the cycle.  On top of that,
currently, it's a bit hard to plan one or two weeks ahead if help is
needed to stabilize something you worked on.  I am pretty sure that
we'll be able to sort things out with a collective effort though.
--
Michael


Re: Improving connection scalability: GetSnapshotData()

Michail Nikolaev
Hello, hackers.
Andres, nice work!

Sorry for going off-topic.

Some of my work [1] related to the support of index hint bits on
standby is highly interfering with this patch.
Is it safe to consider it committed and start rebasing on top of the patches?

Thanks,
Michail.

[1]: https://www.postgresql.org/message-id/CANtu0ojmkN_6P7CQWsZ%3DuEgeFnSmpCiqCxyYaHnhYpTZHj7Ubw%40mail.gmail.com



Re: Improving connection scalability: GetSnapshotData()

Daniel Gustafsson
In reply to this post by Andres Freund
This patch no longer applies to HEAD, please submit a rebased version.  I've
marked it Waiting on Author in the meantime.

cheers ./daniel



Re: Improving connection scalability: GetSnapshotData()

Andres Freund
Hi,

On 2020-07-01 14:42:59 +0200, Daniel Gustafsson wrote:
> This patch no longer applies to HEAD, please submit a rebased version.  I've
> marked it Waiting on Author in the meantime.

Thanks!

Here's a rebased version.  There's a good bit of commit message
polishing and some code and comment cleanup compared to the last
version. Oh, and obviously the conflicts are resolved.

It could make sense to split the conversion of
VariableCacheData->latestCompletedXid to FullTransactionId out from 0001
into its own commit. Not sure...

I've played with splitting 0003, to have the "infrastructure" pieces
separate, but I think it's not worth it. Without a user the changes look
weird and it's hard to have the comment make sense.

Greetings,

Andres Freund

v11-0001-snapshot-scalability-Don-t-compute-global-horizo.patch (138K) Download Attachment
v11-0002-snapshot-scalability-Move-PGXACT-xmin-back-to-PG.patch (20K) Download Attachment
v11-0003-snapshot-scalability-Introduce-dense-array-of-in.patch (51K) Download Attachment
v11-0004-snapshot-scalability-Move-PGXACT-vacuumFlags-to-.patch (19K) Download Attachment
v11-0005-snapshot-scalability-Move-subxact-info-to-ProcGl.patch (26K) Download Attachment
v11-0006-snapshot-scalability-cache-snapshots-using-a-xac.patch (11K) Download Attachment

Re: Improving connection scalability: GetSnapshotData()

Alvaro Herrera-9
On 2020-Jul-15, Andres Freund wrote:

> It could make sense to split the conversion of
> VariableCacheData->latestCompletedXid to FullTransactionId out from 0001
> into its own commit. Not sure...

+1, the commit is large enough and that change can be had in advance.

Note you forward-declare struct GlobalVisState twice in heapam.h.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improving connection scalability: GetSnapshotData()

Andres Freund
Hi,

On 2020-07-15 21:33:06 -0400, Alvaro Herrera wrote:
> On 2020-Jul-15, Andres Freund wrote:
>
> > It could make sense to split the conversion of
> > VariableCacheData->latestCompletedXid to FullTransactionId out from 0001
> > into its own commit. Not sure...
>
> +1, the commit is large enough and that change can be had in advance.

I've done that in the attached.

I wonder if somebody has an opinion on renaming latestCompletedXid to
latestCompletedFullXid. That's the pattern we already had (cf
nextFullXid), but it also leads to pretty long lines and quite a few
comment etc changes.

I'm somewhat inclined to drop the "Full" from the variable name, and to
do the same for nextFullXid. Encoding it in the variable name is
basically a poor copy of the (also not great) C type system.  If we
hadn't made FullTransactionId a struct (and thus incompatible with
TransactionId) I'd see it differently, but we have ...


> Note you forward-declare struct GlobalVisState twice in heapam.h.

Oh, fixed, thanks.


I've also fixed a correctness bug that Thomas's cfbot found (and he
personally pointed out). There were occasional make check runs with
vacuum erroring out. That turned out to be because it was possible for
the horizon used to make decisions in heap_page_prune() and
lazy_scan_heap() to differ a bit. I've started a thread about my
concerns around the fragility of that logic [1].  The code around that
can use a bit more polish, I think. I mainly wanted to post a new
version so that the patch separated out above can be looked at.

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/20200723181018.neey2jd3u7rfrfrn%40alap3.anarazel.de

v12-0001-Track-latest-completed-xid-as-a-FullTransactionI.patch (21K)
v12-0002-snapshot-scalability-Don-t-compute-global-horizo.patch (125K)
v12-0003-snapshot-scalability-Move-PGXACT-xmin-back-to-PG.patch (20K)
v12-0004-snapshot-scalability-Introduce-dense-array-of-in.patch (51K)
v12-0005-snapshot-scalability-Move-PGXACT-vacuumFlags-to-.patch (19K)
v12-0006-snapshot-scalability-Move-subxact-info-to-ProcGl.patch (26K)
v12-0007-snapshot-scalability-cache-snapshots-using-a-xac.patch (11K)

Re: Improving connection scalability: GetSnapshotData()

Ranier Vilela-2
In reply to this post by Andres Freund
Latest Postgres
Windows 64 bits
msvc 2019 64 bits

Patches applied v12-0001 to v12-0007:

 C:\dll\postgres\contrib\pgstattuple\pgstatapprox.c(74,28): warning C4013: 'GetOldestXmin' undefined; assuming extern returning int [C:\dll\postgres\pgstattuple.vcxproj]
 C:\dll\postgres\contrib\pg_visibility\pg_visibility.c(569,29): warning C4013: 'GetOldestXmin' undefined; assuming extern returning int [C:\dll\postgres\pg_visibility.vcxproj]
 C:\dll\postgres\contrib\pgstattuple\pgstatapprox.c(74,56): error C2065: 'PROCARRAY_FLAGS_VACUUM': undeclared identifier [C:\dll\postgres\pgstattuple.vcxproj]
 C:\dll\postgres\contrib\pg_visibility\pg_visibility.c(569,58): error C2065: 'PROCARRAY_FLAGS_VACUUM': undeclared identifier [C:\dll\postgres\pg_visibility.vcxproj]
 C:\dll\postgres\contrib\pg_visibility\pg_visibility.c(686,70): error C2065: 'PROCARRAY_FLAGS_VACUUM': undeclared identifier [C:\dll\postgres\pg_visibility.vcxproj]

regards,
Ranier Vilela



Re: Improving connection scalability: GetSnapshotData()

Andres Freund
On 2020-07-24 14:05:04 -0300, Ranier Vilela wrote:

> Latest Postgres
> Windows 64 bits
> msvc 2019 64 bits
>
> Patches applied v12-0001 to v12-0007:
>
>  C:\dll\postgres\contrib\pgstattuple\pgstatapprox.c(74,28): warning C4013: 'GetOldestXmin' undefined; assuming extern returning int [C:\dll\postgres\pgstattuple.vcxproj]
>  C:\dll\postgres\contrib\pg_visibility\pg_visibility.c(569,29): warning C4013: 'GetOldestXmin' undefined; assuming extern returning int [C:\dll\postgres\pg_visibility.vcxproj]
>  C:\dll\postgres\contrib\pgstattuple\pgstatapprox.c(74,56): error C2065: 'PROCARRAY_FLAGS_VACUUM': undeclared identifier [C:\dll\postgres\pgstattuple.vcxproj]
>  C:\dll\postgres\contrib\pg_visibility\pg_visibility.c(569,58): error C2065: 'PROCARRAY_FLAGS_VACUUM': undeclared identifier [C:\dll\postgres\pg_visibility.vcxproj]
>  C:\dll\postgres\contrib\pg_visibility\pg_visibility.c(686,70): error C2065: 'PROCARRAY_FLAGS_VACUUM': undeclared identifier [C:\dll\postgres\pg_visibility.vcxproj]

I don't know what that's about - there's no call to GetOldestXmin() in
pgstatapprox or pg_visibility after patch 0002? And similarly, the
PROCARRAY_* references are also removed by that same patch?

Greetings,

Andres Freund

