SubtransControlLock and performance problems

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

SubtransControlLock and performance problems

Lars Aksel Opsahl-2

Hi


On a server with 32 cores and 250 GB memory, with CentOS 7 and kernel 4.4.214-1.el7.elrepo.x86_64, I try to run 30 parallel threads using dblink. (https://github.com/larsop/postgres_execute_parallel) . I have tried to disconnect and reconnect in the dblink code and that did not help.

If I reduce the number of threads I get less CPU usage and much less SubtransControlLock.

Each thread are inserting many lines into a Postgis Topology layer. I have a lot of try catch in this code to avoid missing lines (https://github.com/larsop/resolve-overlap-and-gap/tree/add_postgis_topology_using_proc) .


What happens is that after some minutes the CPU can fall to maybe 20% usage and most of the threads are blocked by SubtransControlLock, and when the number SubtransControlLock goes down the CPU load increases again. The jobs usually goes through without any errors, but it takes to long time because of the SubtransControlLock blocks.


There is no iowait on the server and there is plenty of free memory on the server. There seems to be no locks on the common tables. 

“SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED;” is always empty. 


I am using a lot temp tables and unlogged tables.


To reduce the number locks I have a simple check before I kick off new jobs like the one below, but that did not help very much either. Yes it does a lot waiting, but SubtransControlLock kick in when all threads are up running again.


LOOP

EXECUTE Format('SELECT count(*) from pg_stat_activity where wait_event = %L and query like %L',

'SubtransControlLock',

'CALL resolve_overlap_gap_save_single_cells%') into subtransControlLock;

EXIT WHEN subtransControlLock = 0;

subtransControlLock_count := subtransControlLock_count + 1;

PERFORM pg_sleep(subtransControlLock*subtransControlLock_count*0.1);

END LOOP;


I have tested with postgres 11, postgres 12, postgis 2.5 , postgis 3.0 and it seems to behave save.


I have also tried to recompile postgres with the setting below and that did not solve the problem either.

/* Number of SLRU buffers to use for subtrans */

#define NUM_SUBTRANS_BUFFERS 2048



I have tested different values for memory and other settings nothing seems to matter. Here are the settings right now.


maintenance_work_mem = 8GB

max_connections = 600

work_mem = 500MB

temp_buffers = 100MB

shared_buffers = 64GB

effective_cache_size = 124GB

wal_buffers = 640MB

seq_page_cost = 2.0

random_page_cost = 2.0

checkpoint_flush_after = 2MB

checkpoint_completion_target = 0.9

default_statistics_target = 1000

shared_preload_libraries = 'pg_stat_statements'

pg_stat_statements.max = 10000

pg_stat_statements.track = all

effective_io_concurrency = 500 # 1-1000; 0 disables prefetching

# test to avoid SubtransControlLock

#bgwriter_lru_maxpages = 100000

#bgwriter_lru_maxpages=0

#bgwriter_delay = 20ms

synchronous_commit = off


Any idea about how to solve this ?


Lars


Reply | Threaded
Open this post in threaded view
|

Re: SubtransControlLock and performance problems

Laurenz Albe
Lars Aksel Opsahl wrote:
> What happens is that after some minutes the CPU can fall to maybe 20% usage and most of
> the threads are blocked by SubtransControlLock, and when the number SubtransControlLock
> goes down the CPU load increases again. The jobs usually goes through without any errors,
> but it takes to long time because of the SubtransControlLock blocks.

That's typically a sign that you are using more than 64 subtransactions per transaction.

That could either be SAVEPOINT SQL statements or PL/pgSQL code with blocks
containing the EXCEPTION clause.

The data structure in shared memory that holds information for each session
can cache 64 subtransactions, beyond that it has to access "pg_subtrans" to get
the required information, which leads to contention.

Often the problem is caused by a misguided attempt to wrape every single
statement in a subtransaction to emulate the behavior of other database
systems, for example with the "autosave = always" option of the JDBC driver.

The solution is to use fewer subtransactions per transaction.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



Reply | Threaded
Open this post in threaded view
|

Re: SubtransControlLock and performance problems

Lars Aksel Opsahl-2


>From: Laurenz Albe <laurenz.albe@cybertec.at>

>Sent: Monday, February 17, 2020 10:53 AM

>To: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>; pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>

>Subject: Re: SubtransControlLock and performance problems

> 

>Lars Aksel Opsahl wrote:

>> What happens is that after some minutes the CPU can fall to maybe 20% usage and most of

>> the threads are blocked by SubtransControlLock, and when the number SubtransControlLock

>> goes down the CPU load increases again. The jobs usually goes through without any errors,

>> but it takes to long time because of the SubtransControlLock blocks.

>

>That's typically a sign that you are using more than 64 subtransactions per transaction.

>

>That could either be SAVEPOINT SQL statements or PL/pgSQL code with blocks

>containing the EXCEPTION clause.

>

>The data structure in shared memory that holds information for each session

>can cache 64 subtransactions, beyond that it has to access "pg_subtrans" to get

> the required information, which leads to contention.

>

> Often the problem is caused by a misguided attempt to wrape every single

> statement in a subtransaction to emulate the behavior of other database

> systems, for example with the "autosave = always" option of the JDBC driver.

>

> The solution is to use fewer subtransactions per transaction.

>


Hi


I have tested in branch ( https://github.com/larsop/resolve-overlap-and-gap/tree/add_postgis_topology_using_func) where I use only have functions and no procedures and I still have the same problem with subtransaction locks. 


Can I based on this assume that the problem is only related to exceptions  ?


Does this mean that if have 32 threads running in parallel and I get 2 exceptions in each thread I have reached a state where I will get contention ?


Is it any way increase from 64 to a much higher level, when compiling the code ?


Basically what I do here is that I catch exceptions when get them and tries to solve the problem in a alternative way.


Thanks a lot.

 

Lars


Reply | Threaded
Open this post in threaded view
|

Re: SubtransControlLock and performance problems

Laurenz Albe
On Mon, 2020-02-17 at 15:03 +0000, Lars Aksel Opsahl wrote:
> I have tested in branch ( https://github.com/larsop/resolve-overlap-and-gap/tree/add_postgis_topology_using_func)
> where I use only have functions and no procedures and I still have the same problem with subtransaction locks.
>
> Can I based on this assume that the problem is only related to exceptions  ?

No, it is related to BEGIN ... EXCEPTION ... END blocks, no matter if
an exception is thrown or not.

As soon as execution enters such a block, a subtransaction is started.

> Does this mean that if have 32 threads running in parallel and I get 2 exceptions in each thread I have reached a state where I will get contention ?

No, it means that if you enter a block with an EXCEPTION clause more
than 64 times in a single transaction, performance will drop.

> Is it any way increase from 64 to a much higher level, when compiling the code ?

Yes, you can increase PGPROC_MAX_CACHED_SUBXIDS in src/include/storage/proc.h

> Basically what I do here is that I catch exceptions when get them and tries to solve the problem in a alternative way.

Either use shorter transactions, or start fewer subtransactions.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



Reply | Threaded
Open this post in threaded view
|

Re: SubtransControlLock and performance problems

Pavel Stehule

Hi

po 17. 2. 2020 v 17:36 odesílatel Laurenz Albe <[hidden email]> napsal:
On Mon, 2020-02-17 at 15:03 +0000, Lars Aksel Opsahl wrote:
> I have tested in branch ( https://github.com/larsop/resolve-overlap-and-gap/tree/add_postgis_topology_using_func)
> where I use only have functions and no procedures and I still have the same problem with subtransaction locks.
>
> Can I based on this assume that the problem is only related to exceptions  ?

No, it is related to BEGIN ... EXCEPTION ... END blocks, no matter if
an exception is thrown or not.

As soon as execution enters such a block, a subtransaction is started.

> Does this mean that if have 32 threads running in parallel and I get 2 exceptions in each thread I have reached a state where I will get contention ?

No, it means that if you enter a block with an EXCEPTION clause more
than 64 times in a single transaction, performance will drop.

> Is it any way increase from 64 to a much higher level, when compiling the code ?

Yes, you can increase PGPROC_MAX_CACHED_SUBXIDS in src/include/storage/proc.h

> Basically what I do here is that I catch exceptions when get them and tries to solve the problem in a alternative way.

Either use shorter transactions, or start fewer subtransactions.

Yours,
Laurenz Albe

it is interesting topic, but I don't see it in my example

CREATE OR REPLACE FUNCTION public.fx(integer)
 RETURNS void
 LANGUAGE plpgsql
AS $function$
begin
  for i in 1..$1 loop
  begin
    --raise notice 'xx';
exception when others then
  raise notice 'yyy';
end;
end loop;
end;
$function$

the execution time is without performance drops.

Is there some prerequisite to see performance problems?

Pavel

--
Cybertec | https://www.cybertec-postgresql.com



Reply | Threaded
Open this post in threaded view
|

Re: SubtransControlLock and performance problems

Tom Lane-2
Pavel Stehule <[hidden email]> writes:
> po 17. 2. 2020 v 17:36 odesílatel Laurenz Albe <[hidden email]>
> napsal:
>> Either use shorter transactions, or start fewer subtransactions.

> it is interesting topic, but I don't see it in my example

> CREATE OR REPLACE FUNCTION public.fx(integer)
>  RETURNS void
>  LANGUAGE plpgsql
> AS $function$
> begin
>   for i in 1..$1 loop
>   begin
>     --raise notice 'xx';
> exception when others then
>   raise notice 'yyy';
> end;
> end loop;
> end;
> $function$

This example doesn't create or modify any table rows within the
subtransactions, so (I think) we won't assign XIDs to them.
It's consumption of subtransaction XIDs that causes the issue.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: SubtransControlLock and performance problems

Pavel Stehule


po 17. 2. 2020 v 19:23 odesílatel Tom Lane <[hidden email]> napsal:
Pavel Stehule <[hidden email]> writes:
> po 17. 2. 2020 v 17:36 odesílatel Laurenz Albe <[hidden email]>
> napsal:
>> Either use shorter transactions, or start fewer subtransactions.

> it is interesting topic, but I don't see it in my example

> CREATE OR REPLACE FUNCTION public.fx(integer)
>  RETURNS void
>  LANGUAGE plpgsql
> AS $function$
> begin
>   for i in 1..$1 loop
>   begin
>     --raise notice 'xx';
> exception when others then
>   raise notice 'yyy';
> end;
> end loop;
> end;
> $function$

This example doesn't create or modify any table rows within the
subtransactions, so (I think) we won't assign XIDs to them.
It's consumption of subtransaction XIDs that causes the issue.

I tested

CREATE OR REPLACE FUNCTION public.fx(integer)
 RETURNS void
 LANGUAGE plpgsql
AS $function$
begin
  for i in 1..$1 loop
  begin
    insert into foo values(i);
exception when others then
  raise notice 'yyy';
end;
end loop;
end;
$function$

and I don't see any significant difference between numbers less than 64 and higher



                        regards, tom lane
Reply | Threaded
Open this post in threaded view
|

Re: SubtransControlLock and performance problems

Alvaro Herrera-9
In reply to this post by Lars Aksel Opsahl-2
On 2020-Feb-16, Lars Aksel Opsahl wrote:

> On a server with 32 cores and 250 GB memory, with CentOS 7 and kernel
> 4.4.214-1.el7.elrepo.x86_64, I try to run 30 parallel threads using
> dblink. (https://github.com/larsop/postgres_execute_parallel) . I have
> tried to disconnect and reconnect in the dblink code and that did not
> help.

I think one issue is that pg_clog has 128 buffers (per commit
5364b357fb1) while subtrans only has 32.  It might be productive to
raise the number of subtrans buffers (see #define NUM_SUBTRANS_BUFFERS
in src/include/access/subtrans.h; requires a recompile.)  Considering
that each subtrans entry is 16 times larger than clog (2 bits vs. 4
bytes), you'd require 2048 subtrans buffers to cover the same XID range
without I/O if my math is right.  That's only 16 MB ...  though slru.c
code might not be prepared to deal with that many buffers.  Worth some
experimentation, I guess.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: SubtransControlLock and performance problems

Laurenz Albe
In reply to this post by Pavel Stehule
On Mon, 2020-02-17 at 19:41 +0100, Pavel Stehule wrote:

> I tested
>
> CREATE OR REPLACE FUNCTION public.fx(integer)
>  RETURNS void
>  LANGUAGE plpgsql
> AS $function$
> begin
>   for i in 1..$1 loop
>   begin
>     insert into foo values(i);
> exception when others then
>   raise notice 'yyy';
> end;
> end loop;
> end;
> $function$
>
> and I don't see any significant difference between numbers less than 64 and higher

Did you have several concurrent sessions accessing the rows that others created?

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



Reply | Threaded
Open this post in threaded view
|

Re: SubtransControlLock and performance problems

Pavel Stehule


út 18. 2. 2020 v 18:27 odesílatel Laurenz Albe <[hidden email]> napsal:
On Mon, 2020-02-17 at 19:41 +0100, Pavel Stehule wrote:
> I tested
>
> CREATE OR REPLACE FUNCTION public.fx(integer)
>  RETURNS void
>  LANGUAGE plpgsql
> AS $function$
> begin
>   for i in 1..$1 loop
>   begin
>     insert into foo values(i);
> exception when others then
>   raise notice 'yyy';
> end;
> end loop;
> end;
> $function$
>
> and I don't see any significant difference between numbers less than 64 and higher

Did you have several concurrent sessions accessing the rows that others created?

no, I didn't

Pavel


Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Reply | Threaded
Open this post in threaded view
|

Re: SubtransControlLock and performance problems

Lars Aksel Opsahl-2
In reply to this post by Laurenz Albe
Hi


>From: Laurenz Albe <laurenz.albe@cybertec.at>

>Sent: Tuesday, February 18, 2020 6:27 PM

>ATo: Pavel Stehule <pavel.stehule@gmail.com>; Tom Lane <tgl@sss.pgh.pa.us>

>Cc: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>; pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>

>Subject: Re: SubtransControlLock and performance problems

>

>Did you have several concurrent sessions accessing the rows that others created?


Hi


Thanks every body, I have done more testing here..


- I was not able fix this problem by increasing this values

src/include/access/subtrans.h, define NUM_SUBTRANS_BUFFERS 8196

src/include/storage/proc.h , PGPROC_MAX_CACHED_SUBXIDS 128


If tried to increase PGPROC_MAX_CACHED_SUBXIDS more than 128 Postgres core dumped. I tried to increase shared memory and other settings but I was not able to get it statble.


With the values above I did see same performance problems and we ended with a lot of subtransControlLock.


So I started to change the code based on your feedbacks.


- What seems to work very good in combination with a catch exception and retry pattern is to insert the data in to separate table for each job. (I the current testcase we reduced the number of subtransControlLock from many hundreds to almost none.)


Then I later can pick up these results from different the tables with another job that inserts data in to common data structure and in this job I don’t have any catch retry pattern. Then I was able to handle 534 of 592 jobs/cells with out any subtransControlLock at all.


But 58 jobs did not finish so for these I had to use a catch retry pattern and then then I got the subtransControlLock problems, but thats for a limited sets of the data.


Between each job I also close open the connections I dblink.


In this test I used dataset with data set 619230 surface with total of 25909671 and it did finish in 24:42.363, with NUM_SUBTRANS_BUFFERS 8196 and PGPROC_MAX_CACHED_SUBXIDS 128. When I changed this back to the original values the same test took 23:54.973.


For me it’s seems like in Postgres it’s better to have functions that returns an error state together with the result and not throws an exceptions, because exceptions leads performance degeneration when working with big datasets.


Thanks


Lars


Reply | Threaded
Open this post in threaded view
|

Re: SubtransControlLock and performance problems

Alvaro Herrera-9
On 2020-Feb-19, Lars Aksel Opsahl wrote:

> With the values above I did see same performance problems and we ended
> with a lot of subtransControlLock.
>
> So I started to change the code based on your feedbacks.
>
> - What seems to work very good in combination with a catch exception
> and retry pattern is to insert the data in to separate table for each
> job. (I the current testcase we reduced the number of
> subtransControlLock from many hundreds to almost none.)

I think at this point your only recourse is to start taking profiles to
see where the time is going.  Without that, you're just flying blind and
whatever you do will not necessarily move your needle at all.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: SubtransControlLock and performance problems

Lars Aksel Opsahl-2
Hi


>From: Alvaro Herrera <[hidden email]>

>Sent: Wednesday, February 19, 2020 4:23 PM

>To: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>

>Cc: Laurenz Albe <laurenz.albe@cybertec.at>; Pavel Stehule <pavel.stehule@gmail.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>

>Subject: Re: SubtransControlLock and performance problems

> 

>On 2020-Feb-19, Lars Aksel Opsahl wrote:

>

>> With the values above I did see same performance problems and we ended

>> with a lot of subtransControlLock.

>> 

>> So I started to change the code based on your feedbacks.

>> 

>> - What seems to work very good in combination with a catch exception

>> and retry pattern is to insert the data in to separate table for each

>> job. (I the current testcase we reduced the number of

>> subtransControlLock from many hundreds to almost none.)

>

>I think at this point your only recourse is to start taking profiles to

>see where the time is going.  Without that, you're just flying blind and

>whatever you do will not necessarily move your needle at all.


Hi

Yes I totally agree with you and yes I have tried to do some profiling and testing while developing.

From the worst case to best case the time is reduced 15 times (from 300 minutes to 20 minutes) when testing a small dataset for with 619230 surface (25909671 total line points) with the test below “resolve_overlap_gap_run('org_jm.jm_ukomm_flate','figurid','geo',4258,false,'test_topo_jm',0.000001,31,3000); “

The reason for this seems to be related to the problems described by Laurenz Albe related to how Postgres handles try and catch and sub transactions, which I did not know about. If we don't have this is mind and we start to get subtranslocks it seems to kill the performance in some cases.

In this test I ran with 31 parallel threads which is very high on a server with only 32 cores and maybe not realistic. I just did this now see what happens when I try to push a server to it’s limits and maximise the performance increase. If I reduce this to 1 single thread, there should be now difference and if run on 16 threads the difference would much much smaller. 

I will now start to run on datasets which are 10 times bigger to check how thing scales, but then run with around maybe 28 parallel jobs. 

The two branches I have tested on now which should show the main difference are here.

https://github.com/larsop/resolve-overlap-and-gap/tree/add_postgis_topology_using_TopoGeo_addLinestringwhich is the faster one.

https://github.com/larsop/resolve-overlap-and-gap/tree/add_postgis_topology which is slower one, but here I have now added a check on number of subtranslocks before I kick of new jobs and that reduced time form 9 hours  to 3 hours.


Thanks.


Lars