reWriteBatchedInserts causes Inserts to freeze/lock


reWriteBatchedInserts causes Inserts to freeze/lock

Semen Yefimenko
Hi all,

Recently I came across an issue while inserting data.

I have a rather small table with 23 columns, none of them particularly wide: the biggest column holds 128 characters and the average is about 50. While looking into optimizing the performance of our table generation, I found the 'reWriteBatchedInserts' parameter. After setting it to 'true', we encounter a freeze/lock on the table I'm inserting into.
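For reference, this is how the parameter is enabled on the connection URL (host, port, and database name are placeholders); it can equivalently be set on the Properties object passed to DriverManager.getConnection:

```
jdbc:postgresql://host:5432/database?reWriteBatchedInserts=true
```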

CREATE TABLE schema.tableplaceholder
(
    col1 character varying(24) COLLATE pg_catalog."default" NOT NULL,
    col2 timestamp without time zone,
    col3 character varying(24) COLLATE pg_catalog."default",
    col4 character varying(24) COLLATE pg_catalog."default",
    col5 character varying(24) COLLATE pg_catalog."default",
    col6 character varying(15) COLLATE pg_catalog."default",
    col7 boolean,
    col8 timestamp without time zone,
    col9 timestamp without time zone,
    col10 character varying(12) COLLATE pg_catalog."default",
    col11 character varying(53) COLLATE pg_catalog."default",
    col12 character varying(24) COLLATE pg_catalog."default",
    col13 character varying(30) COLLATE pg_catalog."default",
    col14 character varying(45) COLLATE pg_catalog."default",
    col15 character varying(4) COLLATE pg_catalog."default",
    col16 character varying(5) COLLATE pg_catalog."default",
    col17 boolean,
    col18 character varying(128) COLLATE pg_catalog."default",
    col19 numeric,
    col20 numeric,
    col21 character varying(4) COLLATE pg_catalog."default",
    col22 character varying(24) COLLATE pg_catalog."default",
    col23 character varying[] COLLATE pg_catalog."default",
    CONSTRAINT pkeyconst PRIMARY KEY (col1)
        USING INDEX TABLESPACE tablespace
)  


SELECT a.datname, l.relation::regclass, l.transactionid, l.mode, l.GRANTED, a.usename, a.query, a.query_start, age(now(), a.query_start) AS "age", a.pid
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
ORDER BY a.query_start;

Result (all four lock rows belong to the same session: pid 117624, usename "schema", query_start 2021-05-19 09:26:54.106769+00, age 00:19:30.578581):

datname      | locked object           | mode             | granted
-------------+-------------------------+------------------+--------
databaseName | schema.sequenceName     | RowExclusiveLock | true
databaseName | 107270                  | ExclusiveLock    | true
databaseName |                         | ExclusiveLock    | true
databaseName | schema.tableplaceholder | RowExclusiveLock | true

The query is the same rewritten multi-row INSERT in every row:

INSERT INTO tableplaceholder ( col1 ... col22 )
VALUES ( rpad('...')||lpad(to_char(nextval('...'), '...'),16,'0'), $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22 ),(... ),(...


We are on PostgreSQL 12.5 (PostgreSQL 12.5 on x86_64-suse-linux-gnu, compiled by gcc (SUSE Linux) 4.8.5, 64-bit) and are running batches of 25,000 rows while committing every 100,000 rows. The JDBC driver version is 42.2.20.
While optimizing performance in another project, I found the pgjdbc bug https://github.com/pgjdbc/pgjdbc/issues/194, which shows exactly the same behaviour. There I was inserting much more data, though: a table with 55 columns, 15 of them 2000 characters wide and most of the rest 100+ characters.
Back then I found that manually setting the bufferSize of the pgStream on the connection made it possible to avoid the issue. Now, with much less data AND with that workaround in place, we still hit the issue.
In the server's trace log I can see the statements being rewritten into multi-row inserts of 128 rows each, so it looks roughly like this on the database side:

INSERT INTO tableplaceholder (col1 .. col22) VALUES (val1 .. val22), .. (val2795 .. val2816)
INSERT INTO tableplaceholder (col1 .. col22) VALUES (val1 .. val22), .. (val2795 .. val2816)
...
      x42

This runs a total of 42 times, so around 5,376 rows should have been inserted (42 x 128 = 5,376).
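The rewriting itself can be sketched as pure string manipulation (illustrative Java; the class and method names below are mine, not pgjdbc's actual internals):

```java
import java.util.Collections;

// Sketch of what reWriteBatchedInserts does conceptually: it folds N
// identical single-row INSERTs from a batch into one multi-row VALUES
// statement, so N round trips become one statement with N row groups.
public final class BatchRewriteSketch {

    // Builds the rewritten statement for `rows` batched rows.
    static String rewrite(String table, String[] columns, int rows) {
        String placeholder = "(" + String.join(", ",
                Collections.nCopies(columns.length, "?")) + ")";
        return "INSERT INTO " + table
                + " (" + String.join(", ", columns) + ") VALUES "
                + String.join(", ", Collections.nCopies(rows, placeholder));
    }

    public static void main(String[] args) {
        // With 22 columns and 128 rows per rewritten statement, this grows
        // to 2,816 placeholders, like the statements in the trace log.
        System.out.println(rewrite("tableplaceholder",
                new String[]{"col1", "col2"}, 3));
        // prints: INSERT INTO tableplaceholder (col1, col2) VALUES (?, ?), (?, ?), (?, ?)
    }
}
```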
In pgAdmin I can see the inserting session stuck on the wait event Client: ClientWrite the whole time. When I kill the session, the "tuples in" graph shows 5,436, but no data is actually inserted.

Any ideas on what else I could try?
As I said, the issue is gone once I deactivate reWriteBatchedInserts, but that more than doubles the runtime of our inserts.

Re: reWriteBatchedInserts causes Inserts to freeze/lock

Dave Cramer-7


On Wed, 19 May 2021 at 09:26, Semen Yefimenko <[hidden email]> wrote:


Commit more often would be my recommendation.

Dave 

Re: reWriteBatchedInserts causes Inserts to freeze/lock

Semen Yefimenko
Commit size has had no effect; only reducing the batch size made it work again, at around size = 2000.
Unfortunately, setting the bufferSize doesn't change the behaviour here either. I'm not sure why that works on one system and not the other.
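The batch-size cap amounts to a flush cadence: calling executeBatch() every N rows bounds how many rows end up in each rewritten multi-row INSERT. A minimal sketch of the arithmetic (illustrative only, no JDBC calls):

```java
public final class BatchChunking {

    // Number of executeBatch() calls when flushing every `batchSize` rows;
    // a trailing partial batch needs one extra flush.
    static int flushes(int totalRows, int batchSize) {
        return (totalRows + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        // 100,000 rows flushed every 2,000 rows -> 50 executeBatch() calls
        System.out.println(flushes(100_000, 2_000));
    }
}
```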

On Wed, 19 May 2021 at 18:47, Dave Cramer <[hidden email]> wrote:


Commit more often would be my recommendation.

Dave 

Re: reWriteBatchedInserts causes Inserts to freeze/lock

Dave Cramer-7


On Fri, 21 May 2021 at 05:33, Semen Yefimenko <[hidden email]> wrote:
Commit size has had no effect; only reducing the batch size made it work again, at around size = 2000.
Unfortunately, setting the bufferSize doesn't change the behaviour here either. I'm not sure why that works on one system and not the other.



So yes, there is an issue with the size of the buffers being sent back and forth; see the pgjdbc FAQ: https://github.com/pgjdbc/pgjdbc/blob/f61fbfe7b72ccf2ca0ac2e2c366230fdb93260e5/faq.md

Dave
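The failure mode the FAQ and issue #194 describe is a write-write deadlock: the client keeps writing the large rewritten batch while the server is simultaneously writing responses back; once the OS buffers on both sides fill, each end blocks in write() waiting for the other to read, which matches the Client: ClientWrite wait event above. A minimal demonstration of a kernel buffer filling up when nobody reads (Java NIO pipe, non-blocking so it returns instead of hanging; this illustrates the mechanism and is not pgjdbc code):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

// Demonstrates a kernel buffer filling up when the other side never reads.
// A non-blocking channel's write() returns 0 at the point where a blocking
// socket write would freeze.
public final class BufferFillSketch {

    // Writes into a pipe nobody reads from; returns the bytes accepted
    // before the buffer was full, or -1 on I/O error.
    static long fillPipe() {
        try {
            Pipe pipe = Pipe.open();
            pipe.sink().configureBlocking(false);
            ByteBuffer chunk = ByteBuffer.allocate(4096);
            long total = 0;
            while (true) {
                chunk.clear();
                int n = pipe.sink().write(chunk);
                if (n == 0) break; // buffer full: a blocking write would hang here
                total += n;
            }
            pipe.sink().close();
            pipe.source().close();
            return total;
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("pipe buffer capacity ~ " + fillPipe() + " bytes");
    }
}
```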