Generate test data inserts - 1GB


Generate test data inserts - 1GB

Shital A
Hello

Postgresql 9.6

I need to generate 1GB of test data as quickly as possible. The techniques I found online take around 40 minutes for 400GB. Is there a quicker way?


Thanks.


Re: Generate test data inserts - 1GB

Adrian Klaver
On 8/9/19 4:12 AM, Shital A wrote:
> Hello
>
> Postgresql 9.6
>
> Need to generate 1GB test data in very less time. I got some techniques
> online but they take around 40mins for 400GB. Any quicker way?

1) Postgres version?

2) Data going into single table or multiple tables?

3) What is the time frame you are trying to achieve?

4) What techniques have you tried?

5) If you need only 1GB why the 400GB number?


>
>
> Thanks.
>


--
Adrian Klaver
[hidden email]



Re: Generate test data inserts - 1GB

Shital A



Hello,

Sorry, 400GB was a typo; it's 400MB. Details are below:

1) Postgres version?
9.6

2) Data going into single table or multiple tables?
A single table with multiple columns.

3) What is the time frame you are trying to achieve?
As quickly as possible.

4) What techniques have you tried?
INSERT INTO using a WITH statement, inserting 2,000,000 rows at a time. This takes 40 minutes. (A rough sketch of this approach is included after this list.)

5) If you need only 1GB why the 400GB number?
That was 400MB. I'm checking the size using the \l+ option.
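
For reference, a minimal sketch of what such a batched insert might look like; the table and column definitions are made up for illustration, since the original statement isn't shown here:

-- Hypothetical target table; the real table and columns are not given in this thread.
CREATE TABLE IF NOT EXISTS test_data (
    id      bigint,
    payload text,
    created timestamptz
);

-- One batch of 2,000,000 rows built from generate_series inside a CTE.
-- Repeat with a new id range until the table reaches roughly 1GB.
WITH batch AS (
    SELECT g AS id,
           md5(g::text) AS payload,
           now() AS created
    FROM generate_series(1, 2000000) AS g
)
INSERT INTO test_data (id, payload, created)
SELECT id, payload, created FROM batch;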


Thanks. 


Re: Generate test data inserts - 1GB

Adrian Klaver
On 8/9/19 8:14 AM, Shital A wrote:

>
>
> On Fri, 9 Aug 2019, 20:08 Adrian Klaver, <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     On 8/9/19 4:12 AM, Shital A wrote:
>      > Hello
>      >
>      > Postgresql 9.6
>      >
>      > Need to generate 1GB test data in very less time. I got some
>     techniques
>      > online but they take around 40mins for 400GB. Any quicker way?
>
>     1) Postgres version?
>
>     2) Data going into single table or multiple tables?
>
>     3) What is the time frame you are trying to achieve?
>
>     4) What techniques have you tried?
>
>     5) If you need only 1GB why the 400GB number?
>
>
>      >
>      >
>      > Thanks.
>      >
>
>
>     --
>     Adrian Klaver
>     [hidden email] <mailto:[hidden email]>
>
>
>
> Hello,
>
> Sorry 400GB was a typo. Its 400 MB Details are below:
>
> 1) Postgres version?
> 9.6
>
> 2) Data going into single table or multiple tables?
> Single table having Multiple columns
>
> 3) What is the time frame you are trying to achieve?
> As quick as possible.
>
> 4) What techniques have you tried?
> Insert into with With statement, inserting 2000000 rows at a time. This
> takes 40 mins.

Might take a look at COPY:

https://www.postgresql.org/docs/11/sql-copy.html

or its psql equivalent:

\copy.

COPY runs as the server user and sees files relative to the server's location.
\copy runs as the psql (client) user and reads files on the client machine.

In either case suitability will depend on where you are sourcing the
test data from and what format it is in to begin with.
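
As a rough illustration (the table name and file paths below are placeholders), loading a CSV file could look like:

-- Server-side COPY: the path is resolved on the database server and must be
-- readable by the postgres server process.
COPY test_data FROM '/var/lib/postgresql/test_data.csv' WITH (FORMAT csv);

-- Client-side equivalent in psql: the file is read on the client machine.
\copy test_data FROM 'test_data.csv' WITH (FORMAT csv)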


If you want a more sophisticated wrapper over the above then:

https://ossc-db.github.io/pg_bulkload/pg_bulkload.html

>
> 5) If you need only 1GB why the 400GB number?
> That was 400MB. M checking size using \l+ option.
>
>
> Thanks.
>
>


--
Adrian Klaver
[hidden email]



Re: Generate test data inserts - 1GB

Adrian Klaver
In reply to this post by Shital A
On 8/9/19 8:14 AM, Shital A wrote:
>

> Hello,

>
> 4) What techniques have you tried?
> Insert into with With statement, inserting 2000000 rows at a time. This
> takes 40 mins.
>

To add to my previous post. If you already have data in a Postgres
database then you could do:

pg_dump -d db -t some_table -a -f test_data.sql

That will dump only the data for that table, in COPY format. Then you
could apply it to your test database (after a TRUNCATE on the table, assuming
you want to start fresh):

psql -d test_db -f test_data.sql
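
Run interactively in psql against the test database, the same restore (with the TRUNCATE first) might look like:

-- Start fresh, then replay the COPY-format dump produced by pg_dump above.
TRUNCATE some_table;
\i test_data.sql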




--
Adrian Klaver
[hidden email]



Re: Generate test data inserts - 1GB

Shital A



Thanks for the reply Adrian. 

I missed one requirement: will these methods generate the WAL needed for replication?

Actually, the data is to check whether replication catches up. The scenario is:

1. Have a master-slave cluster with replication set up.

2. Kill the master so that the standby takes over. We are using Pacemaker for automatic failover.
Insert 1GB of data into the new master while replication is broken.

3. Start the old node as a standby and check whether the 1GB of data gets replicated (a rough way to check this is sketched below).

As such testing might be frequent, we need to spend minimal time generating the data.
Master and slave are on the same network.
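
For reference, one rough way to check catch-up once the old node rejoins as a standby (the table name is a placeholder):

-- On the current master: the rejoined standby should appear here, with
-- sent_location and replay_location converging as it catches up (9.6 column names).
SELECT application_name, state, sent_location, replay_location
FROM pg_stat_replication;

-- On both nodes: row counts for the loaded table should match once caught up.
SELECT count(*) FROM test_data;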

Thanks ! 

Re: Generate test data inserts - 1GB

lup




How quickly does your production instance grow, in terms of GB/per-unit-time?

Re: Generate test data inserts - 1GB

Adrian Klaver
In reply to this post by Shital A
On 8/9/19 9:51 AM, Shital A wrote:

>
>
> On Fri, 9 Aug 2019, 21:25 Adrian Klaver, <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     On 8/9/19 8:14 AM, Shital A wrote:
>      >
>
>      > Hello,
>
>      >
>      > 4) What techniques have you tried?
>      > Insert into with With statement, inserting 2000000 rows at a
>     time. This
>      > takes 40 mins.
>      >
>
>     To add to my previous post. If you already have data in a Postgres
>     database then you could do:
>
>     pg_dump -d db -t some_table -a -f test_data.sql
>
>     That will dump the data only for the table in COPY format. Then you
>     could apply that to your test database(after TRUNCATE on table,
>     assuming
>     you want to start fresh):
>
>     psql -d test_db -f test_data.sql
>
>
>
>
>     --
>     Adrian Klaver
>     [hidden email] <mailto:[hidden email]>
>
>
> Thanks for the reply Adrian.
>
> Missed one requirement. Will these methods generate wal logs needed for
> replication?

For COPY, AFAIK yes.

To verify, set up a small test table, COPY into it, and see if the data
shows up on the standby.
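
A minimal sketch of such a check (the table and CSV file here are made up):

-- On the primary: create a small test table and load it with COPY/\copy.
CREATE TABLE copy_wal_test (id int, val text);
\copy copy_wal_test FROM 'sample.csv' WITH (FORMAT csv)

-- Shortly afterwards, on the standby: the rows should be visible, since a
-- COPY into a regular (non-UNLOGGED) table is WAL-logged and replicated.
SELECT count(*) FROM copy_wal_test;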

pg_bulkload:

https://ossc-db.github.io/pg_bulkload/pg_bulkload.html

"IMPORTANT NOTE: Under streaming replication environment, pg_bulkload
does not work properly. See here for details.



>
> Actually the data is to check if replication catches up. Below is scenario :
>
> 1. Have a master slave cluster with replication setup
>
> 2. Kill master so that standby takes over. We are using pacemaker for
> auto failure.
> Insert 1 GB data in new master while replication is broken.
>
> 3 Start oldnode as standby and check if 1GB data gets replicated.
>
> As such testing might be frequent we needed to spend minimum time in
> generating data.
> Master slave are in same network.
>
> Thanks !


--
Adrian Klaver
[hidden email]