Conflict handling for COPY FROM

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
30 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Conflict handling for COPY FROM

Surafel Temesgen

Hellow hackers,

A few commitfest ago there was same effort to add errors handling to COPY FROM[1] and i see there that we already have infrastructure for supporting handling of unique violation or exclusion constraint violation error and I think it is independently useful too. Attached is a patch to do that.

In order to prevent extreme condition the patch also add a new GUC variable called copy_max_error_limit that control the amount of error to swallow before start to error and new failed record file options for copy to write a failed record so the user can examine it.

With the new option COPY FROM can be specified like:

COPY table_name [ ( column_name [, ...] ) ]

FROM { 'filename' | PROGRAM 'command' | STDIN }[ON CONFLICT IGNORE failed_record_filename] [ [ WITH ] ( option [, ...] ) ]

[1].https://www.postgresql.org/message-id/flat/7179F2FD-49CE-4093-AE14-1B26C5DFB0DA@...

Comment?

Regards

Surafel



conflict-handling-onCopy-from-v1.patch (27K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Karen Huddleston
Hi Surafel,

Andrew and I began reviewing your patch. It applied cleanly and seems to mostly have the functionality you describe. We did have some comments/questions.

1. It sounded like you added the copy_max_error_limit GUC as part of this patch to allow users to specify how many errors they want to swallow with this new functionality. The GUC didn't seem to be defined and we saw no mention of it in the code. My guess is this might be good to address one of the concerns mentioned in the initial email thread of this generating too many transaction IDs so it would probably be good to have it.
2. I was curious why you only have support for skipping errors on UNIQUE and EXCLUSION constraints and not other kinds of constraints? I'm not sure how difficult it would be to add support for those, but it seems they could also be useful.
3. We think the wording "ON CONFLICT IGNORE" may not be the clearest description of what this is doing since it is writing the failed rows to a file for a user to process later, but they are not being ignored. We considered things like STASH or LOG as alternatives to IGNORE. Andrew may have some other suggestions for wording.
4. We also noticed this has no tests and thought it would be good to add some to ensure this functionality works how you intend it and continues to work. We started running some SQL to validate this, but haven't gotten the chance to put it into a clean test yet. We can send you what we have so far, or we are also willing to put a little time in to turn it into tests ourselves that we could contribute to this patch.

Thanks for writing this patch!
Karen
Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Surafel Temesgen
Thanks for looking at it
1. It sounded like you added the copy_max_error_limit GUC as part of this patch to allow users to specify how many errors they want to swallow with this new functionality. The GUC didn't seem to be defined and we saw no mention of it in the code. My guess is this might be good to address one of the concerns mentioned in the initial email thread of this generating too many transaction IDs so it would probably be good to have it.
By mistake I write it copy_max_error_limit  here but in the patch it is  copy_maximum_error_limit with a default value of 100 sorry for mismatch
2. I was curious why you only have support for skipping errors on UNIQUE and EXCLUSION constraints and not other kinds of constraints? I'm not sure how difficult it would be to add support for those, but it seems they could also be useful.
I see it now that most of formatting error can be handle safely I will attache the patch for it this week
3. We think the wording "ON CONFLICT IGNORE" may not be the clearest description of what this is doing since it is writing the failed rows to a file for a user to process later, but they are not being ignored. We considered things like STASH or LOG as alternatives to IGNORE. Andrew may have some other suggestions for wording.
I agree.I will change it to ON CONFLICT LOG if we can’t find better naming
4. We also noticed this has no tests and thought it would be good to add some to ensure this functionality works how you intend it and continues to work. We started running some SQL to validate this, but haven't gotten the chance to put it into a clean test yet. We can send you what we have so far, or we are also willing to put a little time in to turn it into tests ourselves that we could contribute to this patch.
okay
Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Robert Haas
In reply to this post by Surafel Temesgen
On Sat, Aug 4, 2018 at 9:10 AM Surafel Temesgen <[hidden email]> wrote:

In order to prevent extreme condition the patch also add a new GUC variable called copy_max_error_limit that control the amount of error to swallow before start to error and new failed record file options for copy to write a failed record so the user can examine it.


Why should this be a GUC rather than a COPY option?

In fact, try doing all of this by adding more options to COPY rather than new syntax. 

COPY ... WITH (ignore_conflicts 1000, ignore_logfile '...')

It kind of stinks to use a log file written by the server as the dup-reporting channel though. That would have to be superuser-only.

...Robert
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Surafel Temesgen

Hello,

The attached patch add error handling for
Extra data

missing data

invalid oid

null oid and 

row count mismatch

And the record that field on the above case write to the file with appended error message in it and in case of unique violation or exclusion constraint violation error the failed record write as it is because the case of the error can not be identified specifically

The new syntax became :

COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';


Regards

Surafel


conflict-handling-onCopy-from-v2.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Dmitry Dolgov
> On Thu, Aug 23, 2018 at 4:16 PM Surafel Temesgen <[hidden email]> wrote:
>
> The attached patch add error handling for
> Extra data
>
> missing data
>
> invalid oid
>
> null oid and
>
> row count mismatch

Hi,

Unfortunately, the patch conflict-handling-onCopy-from-v2.patch has some
conflicts now, could you rebase it? I'm moving it to the next CF as "Waiting on
Author". Also I would appreciate if someone from the reviewers (Karen
Huddleston ?) could post a full patch review.

Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Nasby, Jim-2
In reply to this post by Robert Haas
On Aug 20, 2018, at 5:14 AM, Robert Haas <[hidden email]> wrote:

>
> On Sat, Aug 4, 2018 at 9:10 AM Surafel Temesgen <[hidden email]> wrote:
> In order to prevent extreme condition the patch also add a new GUC variable called copy_max_error_limit that control the amount of error to swallow before start to error and new failed record file options for copy to write a failed record so the user can examine it.
>
> Why should this be a GUC rather than a COPY option?
>
> In fact, try doing all of this by adding more options to COPY rather than new syntax.
>
> COPY ... WITH (ignore_conflicts 1000, ignore_logfile '...')
>
> It kind of stinks to use a log file written by the server as the dup-reporting channel though. That would have to be superuser-only.

Perhaps a better option would be to allow the user to specify a name for a cursor, and have COPY do the moral equivalent of DECLARE name? Calling a function for each bad row would be another option.
Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Surafel Temesgen
In reply to this post by Dmitry Dolgov


On Thu, Nov 29, 2018 at 3:15 PM Dmitry Dolgov <[hidden email]> wrote:
 

Unfortunately, the patch conflict-handling-onCopy-from-v2.patch has some
conflicts now, could you rebase it?

Thank you for informing, attach is rebased patch against current master

Regards

Surafel

 

conflict-handling-onCopy-from-v3.patch (19K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Michael Paquier-2
On Wed, Dec 19, 2018 at 02:48:14PM +0300, Surafel Temesgen wrote:
> Thank you for informing, attach is rebased patch against current
> master

copy.c conflicts on HEAD, please rebase.  I am moving the patch to
next CF, waiting on author.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Andres Freund
In reply to this post by Surafel Temesgen
Hi,

On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:
> COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';

This doesn't seem to address Robert's point that a log file requires to
be super user only, which seems to restrict the feature more than
necessary?

- Andres

Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Andrew Dunstan-8

On 2/16/19 12:24 AM, Andres Freund wrote:
> Hi,
>
> On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:
>> COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';
> This doesn't seem to address Robert's point that a log file requires to
> be super user only, which seems to restrict the feature more than
> necessary?
>

I liked Jim Nasby's idea of having it call a function rather than
writing to a log file.


cheers


andrew


--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Surafel Temesgen
In reply to this post by Michael Paquier-2


On Mon, Feb 4, 2019 at 9:06 AM Michael Paquier <[hidden email]> wrote:
On Wed, Dec 19, 2018 at 02:48:14PM +0300, Surafel Temesgen wrote:
> Thank you for informing, attach is rebased patch against current
> master

copy.c conflicts on HEAD, please rebase.  I am moving the patch to
next CF, waiting on author.
--

Thank you, here is a rebased patch against current master

regards
Surafel

conflict-handling-onCopy-from-v4.patch (19K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Surafel Temesgen
In reply to this post by Andres Freund


On Sat, Feb 16, 2019 at 8:24 AM Andres Freund <[hidden email]> wrote:
Hi,

On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:
> COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';

This doesn't seem to address Robert's point that a log file requires to
be super user only, which seems to restrict the feature more than
necessary?

- Andres


I think having write permission on specified directory is enough.
we use out put file name in COPY TO similarly.

regards
Surafel
Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Andres Freund


On February 19, 2019 3:05:37 AM PST, Surafel Temesgen <[hidden email]> wrote:

>On Sat, Feb 16, 2019 at 8:24 AM Andres Freund <[hidden email]>
>wrote:
>
>> Hi,
>>
>> On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:
>> > COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';
>>
>> This doesn't seem to address Robert's point that a log file requires
>to
>> be super user only, which seems to restrict the feature more than
>> necessary?
>>
>> - Andres
>>
>
>
>I think having write permission on specified directory is enough.
>we use out put file name in COPY TO similarly.

Err, what? Again, that requires super user permissions (in contrast to copy from/to stdin/out). Backends run as the user postgres runs under - it will always have write permissions to at least the entire data directory.  I think not addressing this just about guarantees the feature will be rejected.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Surafel Temesgen


On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <[hidden email]> wrote:


Err, what? Again, that requires super user permissions (in contrast to copy from/to stdin/out). Backends run as the user postgres runs under

 
okay i see it now and modified the patch similarly 

regards
Surafel

conflict-handling-onCopy-from-v5.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Andrew Dunstan-8

On 2/20/19 8:01 AM, Surafel Temesgen wrote:

>
>
> On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>
>
>     Err, what? Again, that requires super user permissions (in
>     contrast to copy from/to stdin/out). Backends run as the user
>     postgres runs under
>
>
>  
> okay i see it now and modified the patch similarly 
>
>


Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.


cheers


andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Andres Freund


On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <[hidden email]> wrote:

>
>On 2/20/19 8:01 AM, Surafel Temesgen wrote:
>>
>>
>> On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>>
>>
>>     Err, what? Again, that requires super user permissions (in
>>     contrast to copy from/to stdin/out). Backends run as the user
>>     postgres runs under
>>
>>
>>  
>> okay i see it now and modified the patch similarly 
>>
>>
>
>
>Why log to a file at all? We do have, you know, a database handy, where
>we might more usefully log errors. You could usefully log the offending
>row as an array of text, possibly.

Or even just return it as a row. CopyBoth is relatively widely supported these days.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Reply | Threaded
Open this post in threaded view
|

Re: Re: Conflict handling for COPY FROM

David Steele
Hi Surafel,

On 2/20/19 8:03 PM, Andres Freund wrote:
> On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <[hidden email]> wrote:
>>
>> Why log to a file at all? We do have, you know, a database handy, where
>> we might more usefully log errors. You could usefully log the offending
>> row as an array of text, possibly.
>
> Or even just return it as a row. CopyBoth is relatively widely supported these days.

This patch no longer applies so marked Waiting on Author.

Also, it appears that you have some comments from Andrew and Andres that
you should reply to.

Regards,
--
-David
[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Andres Freund
Hi,

On 2019-03-25 12:50:13 +0400, David Steele wrote:

> On 2/20/19 8:03 PM, Andres Freund wrote:
> > On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <[hidden email]> wrote:
> > >
> > > Why log to a file at all? We do have, you know, a database handy, where
> > > we might more usefully log errors. You could usefully log the offending
> > > row as an array of text, possibly.
> >
> > Or even just return it as a row. CopyBoth is relatively widely supported these days.
>
> This patch no longer applies so marked Waiting on Author.
>
> Also, it appears that you have some comments from Andrew and Andres that you
> should reply to.

As nothing has happened the last weeks, I've now marked this as
returned with feedback.

- Andres


Reply | Threaded
Open this post in threaded view
|

Re: Conflict handling for COPY FROM

Surafel Temesgen
In reply to this post by Andres Freund

On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <[hidden email]> wrote:


On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <[hidden email]> wrote:
>
>On 2/20/19 8:01 AM, Surafel Temesgen wrote:
>>
>>
>> On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>>
>>
>>     Err, what? Again, that requires super user permissions (in
>>     contrast to copy from/to stdin/out). Backends run as the user
>>     postgres runs under
>>
>>
>>  
>> okay i see it now and modified the patch similarly 
>>
>>
>
>
>Why log to a file at all? We do have, you know, a database handy, where
>we might more usefully log errors. You could usefully log the offending
>row as an array of text, possibly.

Or even just return it as a row. CopyBoth is relatively widely supported these days.


hello,
i think generating warning about it also sufficiently meet its propose of
notifying user about skipped record with existing logging facility
and we use it for similar propose in other place too. The different
i see is the number of warning that can be generated

In addition to the above change in the attached patch i also change
the syntax to ERROR LIMIT because it is no longer only skip
unique and exclusion constrain violation
regards
Surafel 
 

conflict-handling-onCopy-from-v6.patch (19K) Download Attachment
12