Speed up transaction completion faster after many relations are accessed in a transaction

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
89 messages Options
12345
Reply | Threaded
Open this post in threaded view
|

Speed up transaction completion faster after many relations are accessed in a transaction

Tsunakawa, Takayuki
Hello,

The attached patch speeds up transaction completion when any prior transaction accessed many relations in the same session.

The transaction records its own acquired lock information in the LOCALLOCK structure (a pair of lock object and lock mode).  It stores LOCALLOCKs in a hash table in its local memory.  The hash table automatically expands when the transaction acquires many relations.  The hash table doesn't shrink.  When the transaction commits or aborts, it scans the hash table to find LOCALLOCKs to release locks.

The problem is that once some transaction accesses many relations, even subsequent transactions in the same session that only access a few relations take unreasonably long time to complete, because it has to scan the expanded hash table.

The attached patch links LOCALLOCKS to PGPROC, so that releasing locks should only scan the list instead of the hash table.  The hash table is kept because some functions want to find a particular LOCALLOCK quickly based on its hash value.

This problem was uncovered while evaluating partitioning performance.  When the application PREPAREs a statement once and then EXECUTE-COMMIT repeatedly, the server creates a generic plan on the 6th EXECUTE.  Unfortunately, creation of the generic plan of UPDATE/DELETE currently accesses all partitions of the target table (this itself needs improvement), expanding the LOCALLOCK hash table.  As a result, 7th and later EXECUTEs get slower.

Imai-san confirmed performance improvement with this patch:

https://commitfest.postgresql.org/22/1993/


Regards
Takayuki Tsunakawa


faster-locallock-scan.patch (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Tom Lane-2
"Tsunakawa, Takayuki" <[hidden email]> writes:
> The attached patch speeds up transaction completion when any prior transaction accessed many relations in the same session.

Hm.  Putting a list header for a purely-local data structure into shared
memory seems quite ugly.  Isn't there a better place to keep that?

Do we really want a dlist here at all?  I'm concerned that bloating
LOCALLOCK will cost us when there are many locks involved.  This patch
increases the size of LOCALLOCK by 25% if I counted right, which does
not seem like a negligible penalty.

My own thought about how to improve this situation was just to destroy
and recreate LockMethodLocalHash at transaction end (or start)
if its size exceeded $some-value.  Leaving it permanently bloated seems
like possibly a bad idea, even if we get rid of all the hash_seq_searches
on it.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

David Rowley-3
On Tue, 19 Feb 2019 at 12:42, Tom Lane <[hidden email]> wrote:
> My own thought about how to improve this situation was just to destroy
> and recreate LockMethodLocalHash at transaction end (or start)
> if its size exceeded $some-value.  Leaving it permanently bloated seems
> like possibly a bad idea, even if we get rid of all the hash_seq_searches
> on it.

That seems like a good idea. Although, it would be good to know that
it didn't add too much overhead dropping and recreating the table when
every transaction happened to obtain more locks than $some-value.  If
it did, then maybe we could track the average locks per of recent
transactions and just ditch the table after the locks are released if
the locks held by the last transaction exceeded the average *
1.something. No need to go near shared memory to do that.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Andres Freund
Hi,

On 2019-02-19 12:52:08 +1300, David Rowley wrote:

> On Tue, 19 Feb 2019 at 12:42, Tom Lane <[hidden email]> wrote:
> > My own thought about how to improve this situation was just to destroy
> > and recreate LockMethodLocalHash at transaction end (or start)
> > if its size exceeded $some-value.  Leaving it permanently bloated seems
> > like possibly a bad idea, even if we get rid of all the hash_seq_searches
> > on it.
>
> That seems like a good idea. Although, it would be good to know that
> it didn't add too much overhead dropping and recreating the table when
> every transaction happened to obtain more locks than $some-value.  If
> it did, then maybe we could track the average locks per of recent
> transactions and just ditch the table after the locks are released if
> the locks held by the last transaction exceeded the average *
> 1.something. No need to go near shared memory to do that.

Isn't a large portion of benefits in this patch going to be mooted by
the locking improvements discussed in the other threads? I.e. there's
hopefully not going to be a ton of cases with low overhead where we
acquire a lot of locks and release them very soon after. Sure, for DDL
etc we will, but I can't see this mattering from a performance POV?

I'm not against doing something like Tom proposes, but heuristics with
magic constants like this tend to age purely / are hard to tune well
across systems.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Tom Lane-2
In reply to this post by David Rowley-3
David Rowley <[hidden email]> writes:
> On Tue, 19 Feb 2019 at 12:42, Tom Lane <[hidden email]> wrote:
>> My own thought about how to improve this situation was just to destroy
>> and recreate LockMethodLocalHash at transaction end (or start)
>> if its size exceeded $some-value.  Leaving it permanently bloated seems
>> like possibly a bad idea, even if we get rid of all the hash_seq_searches
>> on it.

> That seems like a good idea. Although, it would be good to know that
> it didn't add too much overhead dropping and recreating the table when
> every transaction happened to obtain more locks than $some-value.  If
> it did, then maybe we could track the average locks per of recent
> transactions and just ditch the table after the locks are released if
> the locks held by the last transaction exceeded the average *
> 1.something. No need to go near shared memory to do that.

Yeah, I'd deliberately avoided saying how we'd choose $some-value ;-).
Making it adaptive might not be a bad plan.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Tom Lane-2
In reply to this post by Andres Freund
Andres Freund <[hidden email]> writes:
> Isn't a large portion of benefits in this patch going to be mooted by
> the locking improvements discussed in the other threads? I.e. there's
> hopefully not going to be a ton of cases with low overhead where we
> acquire a lot of locks and release them very soon after. Sure, for DDL
> etc we will, but I can't see this mattering from a performance POV?

Mmm ... AIUI, the patches currently proposed can only help for what
David called "point lookup" queries.  There are still going to be
queries that scan a large proportion of a partition tree, so if you've
got tons of partitions, you'll be concerned about this sort of thing.

> I'm not against doing something like Tom proposes, but heuristics with
> magic constants like this tend to age purely / are hard to tune well
> across systems.

I didn't say it had to be a constant ...

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

David Rowley-3
In reply to this post by Andres Freund
On Tue, 19 Feb 2019 at 12:56, Andres Freund <[hidden email]> wrote:
> Isn't a large portion of benefits in this patch going to be mooted by
> the locking improvements discussed in the other threads? I.e. there's
> hopefully not going to be a ton of cases with low overhead where we
> acquire a lot of locks and release them very soon after. Sure, for DDL
> etc we will, but I can't see this mattering from a performance POV?

I think this patch was born from Amit's partition planner improvement
patch. If not that one, which other threads did you have in mind?

A problem exists where, if using a PREPAREd statement to plan a query
to a partitioned table containing many partitions that a generic plan
will never be favoured over a custom plan since the generic plan might
not be able to prune partitions like the custom plan can.   The actual
problem is around that we do need to at some point generate a generic
plan in order to know it's more costly and that requires locking
possibly every partition.  When plan_cache_mode = auto, this is done
on the 6th execution of the statement.  After Amit's partition planner
changes [1], the custom plan will only lock partitions that are not
pruned, so the 6th execution of the statement bloats the local lock
table.

[1] https://commitfest.postgresql.org/22/1778/

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Andres Freund
In reply to this post by Tom Lane-2
Hi,

On 2019-02-18 19:01:06 -0500, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > Isn't a large portion of benefits in this patch going to be mooted by
> > the locking improvements discussed in the other threads? I.e. there's
> > hopefully not going to be a ton of cases with low overhead where we
> > acquire a lot of locks and release them very soon after. Sure, for DDL
> > etc we will, but I can't see this mattering from a performance POV?
>
> Mmm ... AIUI, the patches currently proposed can only help for what
> David called "point lookup" queries.  There are still going to be
> queries that scan a large proportion of a partition tree, so if you've
> got tons of partitions, you'll be concerned about this sort of thing.

Agreed - but it seems not unlikely that for those the rest of the
planner / executor overhead will entirely swamp any improvement we could
make here. If I understand correctly the benchmarks here were made with
"point" update and select queries, although the reference in the first
post in this thread is a bit vague.


> > I'm not against doing something like Tom proposes, but heuristics with
> > magic constants like this tend to age purely / are hard to tune well
> > across systems.
>
> I didn't say it had to be a constant ...

Do you have good idea?

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2019-02-18 19:01:06 -0500, Tom Lane wrote:
>> Mmm ... AIUI, the patches currently proposed can only help for what
>> David called "point lookup" queries.  There are still going to be
>> queries that scan a large proportion of a partition tree, so if you've
>> got tons of partitions, you'll be concerned about this sort of thing.

> Agreed - but it seems not unlikely that for those the rest of the
> planner / executor overhead will entirely swamp any improvement we could
> make here. If I understand correctly the benchmarks here were made with
> "point" update and select queries, although the reference in the first
> post in this thread is a bit vague.

I think what Maumau-san is on about here is that not only does your
$giant-query take a long time, but it has a permanent negative effect
on all subsequent transactions in the session.  That seems worth
doing something about.

>> I didn't say it had to be a constant ...

> Do you have good idea?

I think David's on the right track --- keep some kind of moving average of
the LOCALLOCK table size for each transaction, and nuke it if it exceeds
some multiple of the recent average.  Not sure offhand about how to get
the data cheaply --- it might not be sufficient to look at transaction
end, if we release LOCALLOCK entries before that (but do we?)

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Andres Freund
In reply to this post by Tom Lane-2
Hi,

On 2019-02-18 18:42:32 -0500, Tom Lane wrote:
> "Tsunakawa, Takayuki" <[hidden email]> writes:
> > The attached patch speeds up transaction completion when any prior transaction accessed many relations in the same session.
>
> Hm.  Putting a list header for a purely-local data structure into shared
> memory seems quite ugly.  Isn't there a better place to keep that?

Yea, I think it'd be just as fine to store that in a static
variable (best defined directly besides LockMethodLocalHash).

(Btw, I'd be entirely unsurprised if moving away from a dynahash for
LockMethodLocalHash would be beneficial)


> Do we really want a dlist here at all?  I'm concerned that bloating
> LOCALLOCK will cost us when there are many locks involved.  This patch
> increases the size of LOCALLOCK by 25% if I counted right, which does
> not seem like a negligible penalty.

It's currently

struct LOCALLOCK {
        LOCALLOCKTAG               tag;                  /*     0    20 */

        /* XXX 4 bytes hole, try to pack */

        LOCK *                     lock;                 /*    24     8 */
        PROCLOCK *                 proclock;             /*    32     8 */
        uint32                     hashcode;             /*    40     4 */

        /* XXX 4 bytes hole, try to pack */

        int64                      nLocks;               /*    48     8 */
        _Bool                      holdsStrongLockCount; /*    56     1 */
        _Bool                      lockCleared;          /*    57     1 */

        /* XXX 2 bytes hole, try to pack */

        int                        numLockOwners;        /*    60     4 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        int                        maxLockOwners;        /*    64     4 */

        /* XXX 4 bytes hole, try to pack */

        LOCALLOCKOWNER *           lockOwners;           /*    72     8 */

        /* size: 80, cachelines: 2, members: 10 */
        /* sum members: 66, holes: 4, sum holes: 14 */
        /* last cacheline: 16 bytes */
};

seems we could trivially squeeze most of the bytes for a dlist node out
of padding.


> My own thought about how to improve this situation was just to destroy
> and recreate LockMethodLocalHash at transaction end (or start)
> if its size exceeded $some-value.  Leaving it permanently bloated seems
> like possibly a bad idea, even if we get rid of all the hash_seq_searches
> on it.

OTOH, that'll force constant incremental resizing of the hashtable, for
workloads that regularly need a lot of locks. And I'd assume in most
cases if one transaction needs a lot of locks it's quite likely that
future ones will need a lot of locks, too.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Andres Freund
In reply to this post by Tom Lane-2
Hi,

On 2019-02-18 19:13:31 -0500, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > On 2019-02-18 19:01:06 -0500, Tom Lane wrote:
> >> Mmm ... AIUI, the patches currently proposed can only help for what
> >> David called "point lookup" queries.  There are still going to be
> >> queries that scan a large proportion of a partition tree, so if you've
> >> got tons of partitions, you'll be concerned about this sort of thing.
>
> > Agreed - but it seems not unlikely that for those the rest of the
> > planner / executor overhead will entirely swamp any improvement we could
> > make here. If I understand correctly the benchmarks here were made with
> > "point" update and select queries, although the reference in the first
> > post in this thread is a bit vague.
>
> I think what Maumau-san is on about here is that not only does your
> $giant-query take a long time, but it has a permanent negative effect
> on all subsequent transactions in the session.  That seems worth
> doing something about.

Ah, yes, that makes sense.  I'm inclined to think however that the
original approach presented in this thread is better than the
reset-the-whole-hashtable approach. Because:


> I think David's on the right track --- keep some kind of moving average of
> the LOCALLOCK table size for each transaction, and nuke it if it exceeds
> some multiple of the recent average.  Not sure offhand about how to get
> the data cheaply --- it might not be sufficient to look at transaction
> end, if we release LOCALLOCK entries before that (but do we?)

Seems too complicated for my taste. And it doesn't solve the issue of
having some transactions with few locks (say because the plan can be
nicely pruned) interspersed with transactions where a lot of locks are
acquired.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Tom Lane-2
In reply to this post by Andres Freund
Andres Freund <[hidden email]> writes:
> On 2019-02-18 18:42:32 -0500, Tom Lane wrote:
>> Do we really want a dlist here at all?  I'm concerned that bloating
>> LOCALLOCK will cost us when there are many locks involved.  This patch
>> increases the size of LOCALLOCK by 25% if I counted right, which does
>> not seem like a negligible penalty.

> It's currently [ 80 bytes with several padding holes ]
> seems we could trivially squeeze most of the bytes for a dlist node out
> of padding.

Yeah, but if we want to rearrange the members into an illogical order
to save some space, we should do that independently of this patch ---
and then the overhead of this patch would be even worse than 25%.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Andres Freund
Hi,

On 2019-02-18 19:24:54 -0500, Tom Lane wrote:
> Yeah, but if we want to rearrange the members into an illogical order
> to save some space, we should do that independently of this patch ---

Sure, we should do that. I don't buy the "illogical" bit, just moving
hashcode up to after tag isn't more or less logical, and saves most of
the padding, and moving the booleans to the end isn't better/worse
either.

You always bring up that argument. While I agree that sometimes the most
optimal ordering can be less natural, I think most of the time it vastly
overstates how intelligent the original ordering was. Often new elements
were either just added iteratively without consideration for padding, or
the attention to padding was paid in 32bit times.

I don't find

struct LOCALLOCK {
        LOCALLOCKTAG               tag;                  /*     0    20 */
        uint32                     hashcode;             /*    20     4 */
        LOCK *                     lock;                 /*    24     8 */
        PROCLOCK *                 proclock;             /*    32     8 */
        int64                      nLocks;               /*    40     8 */
        int                        numLockOwners;        /*    48     4 */
        int                        maxLockOwners;        /*    52     4 */
        LOCALLOCKOWNER *           lockOwners;           /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        _Bool                      holdsStrongLockCount; /*    64     1 */
        _Bool                      lockCleared;          /*    65     1 */

        /* size: 72, cachelines: 2, members: 10 */
        /* padding: 6 */
        /* last cacheline: 8 bytes */
};

less clear than

struct LOCALLOCK {
        LOCALLOCKTAG               tag;                  /*     0    20 */

        /* XXX 4 bytes hole, try to pack */

        LOCK *                     lock;                 /*    24     8 */
        PROCLOCK *                 proclock;             /*    32     8 */
        uint32                     hashcode;             /*    40     4 */

        /* XXX 4 bytes hole, try to pack */

        int64                      nLocks;               /*    48     8 */
        _Bool                      holdsStrongLockCount; /*    56     1 */
        _Bool                      lockCleared;          /*    57     1 */

        /* XXX 2 bytes hole, try to pack */

        int                        numLockOwners;        /*    60     4 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        int                        maxLockOwners;        /*    64     4 */

        /* XXX 4 bytes hole, try to pack */

        LOCALLOCKOWNER *           lockOwners;           /*    72     8 */

        /* size: 80, cachelines: 2, members: 10 */
        /* sum members: 66, holes: 4, sum holes: 14 */
        /* last cacheline: 16 bytes */
};

but it's smaller (althoug there's plenty trailing space).


> and then the overhead of this patch would be even worse than 25%.

IDK, we, including you, very often make largely independent improvements
to make the cost of something else more palpable. Why's that not OK
here?  Especially because we're not comparing to an alternative where no
cost is added, keeping track of e.g. a running average of the hashtable
size isn't free either; nor does it help in the intermittent cases.

- Andres

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2019-02-18 19:24:54 -0500, Tom Lane wrote:
>> Yeah, but if we want to rearrange the members into an illogical order
>> to save some space, we should do that independently of this patch ---

> Sure, we should do that. I don't buy the "illogical" bit, just moving
> hashcode up to after tag isn't more or less logical, and saves most of
> the padding, and moving the booleans to the end isn't better/worse
> either.

I hadn't looked at the details closely, but if we can squeeze out the
padding space without any loss of intelligibility, sure let's do so.
I still say that's independent of whether to adopt this patch though.

> but it's smaller (althoug there's plenty trailing space).

I think you're supposing that these things are independently palloc'd, but
they aren't --- dynahash lays them out in arrays without palloc padding.

> IDK, we, including you, very often make largely independent improvements
> to make the cost of something else more palpable. Why's that not OK
> here?

When we do that, we aren't normally talking about overheads as high as
25% (even more, if it's measured as I think it ought to be).  What I'm
concerned about is that the patch is being advocated for cases where
there are lots of LOCALLOCK entries --- which is exactly where the
space overhead is going to hurt the most.

> Especially because we're not comparing to an alternative where no
> cost is added, keeping track of e.g. a running average of the hashtable
> size isn't free either; nor does it help in the intermittent cases.

What I was hoping for --- though perhaps it's not achievable --- was
statistical overhead amounting to just a few more instructions per
transaction.  Adding dlist linking would add more instructions per
hashtable entry/removal, which seems like it'd be a substantially
bigger time penalty.  As for the intermittent-usage issue, that largely
depends on the details of the when-to-reset heuristic, which we don't
have a concrete design for yet.  But I could certainly imagine it waiting
for a few transactions before deciding to chomp.

Anyway, I'm not trying to veto the patch in this form, just suggesting
that there are alternatives worth thinking about.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Andres Freund
Hi,

On 2019-02-18 20:29:29 -0500, Tom Lane wrote:
> Andres Freund <[hidden email]> writes:
> > but it's smaller (althoug there's plenty trailing space).
>
> I think you're supposing that these things are independently palloc'd, but
> they aren't --- dynahash lays them out in arrays without palloc padding.

I don't think that matters, given that the trailing six bytes are
included in sizeof() (and have to, to guarantee suitable alignment in
arrays etc).

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Simon Riggs
In reply to this post by Andres Freund
On Tue, 19 Feb 2019 at 00:20, Andres Freund <[hidden email]> wrote:
 
On 2019-02-18 19:13:31 -0500, Tom Lane wrote:
> Andres Freund <[hidden email]> writes:
> > On 2019-02-18 19:01:06 -0500, Tom Lane wrote:
> >> Mmm ... AIUI, the patches currently proposed can only help for what
> >> David called "point lookup" queries.  There are still going to be
> >> queries that scan a large proportion of a partition tree, so if you've
> >> got tons of partitions, you'll be concerned about this sort of thing.
 
> I think what Maumau-san is on about here is that not only does your
> $giant-query take a long time, but it has a permanent negative effect
> on all subsequent transactions in the session.  That seems worth
> doing something about.

Ah, yes, that makes sense.  I'm inclined to think however that the
original approach presented in this thread is better than the
reset-the-whole-hashtable approach.

If it was just many-tables then blowing away the hash table would work fine.

The main issue seems to be with partitioning, not with the general case of many-tables. For that case, it seems like reset hashtable is too much.

Can we use our knowledge of the structure of locks, i.e. partition locks are all children of the partitioned table, to do a better job?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Tomas Vondra-4
In reply to this post by Tsunakawa, Takayuki


On 2/12/19 7:33 AM, Tsunakawa, Takayuki wrote:

> ...
>
> This problem was uncovered while evaluating partitioning performance.
> When the application PREPAREs a statement once and then
> EXECUTE-COMMIT repeatedly, the server creates a generic plan on the
> 6th EXECUTE.  Unfortunately, creation of the generic plan of
> UPDATE/DELETE currently accesses all partitions of the target table
> (this itself needs improvement), expanding the LOCALLOCK hash table.
> As a result, 7th and later EXECUTEs get slower.
>
> Imai-san confirmed performance improvement with this patch:
>
> https://commitfest.postgresql.org/22/1993/
>

Can you quantify the effects? That is, how much slower/faster does it get?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

RE: Speed up transaction completion faster after many relations are accessed in a transaction

Tsunakawa, Takayuki
From: Tomas Vondra [mailto:[hidden email]]
> On 2/12/19 7:33 AM, Tsunakawa, Takayuki wrote:
> > Imai-san confirmed performance improvement with this patch:
> >
> > https://commitfest.postgresql.org/22/1993/
> >
>
> Can you quantify the effects? That is, how much slower/faster does it get?

Ugh, sorry, I wrote a wrong URL.  The correct page is:

https://www.postgresql.org/message-id/0F97FA9ABBDBE54F91744A9B37151A512787EC%40g01jpexmbkw24

The quoted figures re:

[v20 + faster-locallock-scan.patch]
auto:   9,069 TPS
custom: 9,015 TPS

[v20]
auto:   8,037 TPS
custom: 8,933 TPS


In the original problematic case, plan_cache_mode = auto (default), we can see about 13% improvement.


Regards
Takayuki Tsunakawa



Reply | Threaded
Open this post in threaded view
|

RE: Speed up transaction completion faster after many relations are accessed in a transaction

Tsunakawa, Takayuki
In reply to this post by Simon Riggs
From: Tom Lane [mailto:[hidden email]]
> Hm.  Putting a list header for a purely-local data structure into shared
> memory seems quite ugly.  Isn't there a better place to keep that?

Agreed.  I put it in the global variable.


> Do we really want a dlist here at all?  I'm concerned that bloating
> LOCALLOCK will cost us when there are many locks involved.  This patch
> increases the size of LOCALLOCK by 25% if I counted right, which does
> not seem like a negligible penalty.

To delete the LOCALLOCK in RemoveLocalLock(), we need a dlist.  slist requires the list iterator to be passed from callers.


From: Andres Freund [mailto:[hidden email]]
> Sure, we should do that. I don't buy the "illogical" bit, just moving
> hashcode up to after tag isn't more or less logical, and saves most of
> the padding, and moving the booleans to the end isn't better/worse
> either.
>
> I don't find

Thanks, I've done it.  


From: Simon Riggs [mailto:[hidden email]]
> Can we use our knowledge of the structure of locks, i.e. partition locks
> are all children of the partitioned table, to do a better job?

I couldn't come up with a idea.


Regards
Takayuki Tsunakawa


faster-locallock-scan_v2.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Speed up transaction completion faster after many relations are accessed in a transaction

Peter Eisentraut-6
On 2019-02-20 07:20, Tsunakawa, Takayuki wrote:
> From: Tom Lane [mailto:[hidden email]]
>> Hm.  Putting a list header for a purely-local data structure into shared
>> memory seems quite ugly.  Isn't there a better place to keep that?
>
> Agreed.  I put it in the global variable.

I think there is agreement on the principles of this patch.  Perhaps it
could be polished a bit.

Your changes in LOCALLOCK still refer to PGPROC, from your first version
of the patch.

I think the reordering of struct members could be done as a separate
preliminary patch.

Some more documentation in the comment before dlist_head LocalLocks to
explain this whole mechanism would be nice.

You posted a link to some performance numbers, but I didn't see the test
setup explained there.  I'd like to get some more information on this
impact of this.  Is there an effect with 100 tables, or do you need 100000?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

12345