copy command bug

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

copy command bug

Mary LaClair
The copy command fails to load a large csv file (>2GB) with a "Could not stat file" error.  This occurs on Windows with all versions since 10.7.
It happens because fstat() on Windows fails with large csv files. 

Example:
COPY table_name FROM 'filename.csv' WITH CSV HEADER
succeeds with small files and fails with larger ones.

In my tests, using other methods to load the file, e.g.
COPY table_name FROM PROGRAM 'cmd /c \"type filename.csv"' WITH CSV HEADER
succeeds with all file sizes but runs 10-30% slower, which is a significant difference.

Thanks, Mary

Mary LaClair

Vice President, Software Development | Caliper Corporation

|||||||||||||||||||||||||||||||||||||||||||||||||

1172 Beacon St, Ste 300 • Newton MA 02461 USA

Direct: 617-340-2003 • Main: 617-527-4700

[hidden email]www.caliper.com


P.S.
I downloaded the  master source and implemented a patch but was unable to test because I do not have a version of Visual Studio (11 or 12) old enough to try it:
Here's the change to BeginCopyFrom() in copy.c, starting at line 3529:
  {
            unsigned short st_mode;

            cstate->copy_file = AllocateFile(cstate->filename, PG_BINARY_R);
            if (cstate->copy_file == NULL)
            {
                /* copy errno because ereport subfunctions might change it */
                int            save_errno = errno;

                ereport(ERROR,
                        (errcode_for_file_access(),
                         errmsg("could not open file \"%s\" for reading: %m",
                                cstate->filename),
                         (save_errno == ENOENT || save_errno == EACCES) ?
                         errhint("COPY FROM instructs the PostgreSQL server process to read a file. "
                                 "You may want a client-side facility such as psql's \\copy.") : 0));
            }

#ifndef WIN32
            {
            struct _stat32i64 st;
            // need the i64 version to handle files > 1GB
            if (_fstat32i64(fileno(cstate->copy_file), &st))
                ereport(ERROR,
                        (errcode_for_file_access(),
                         errmsg("could not stat file \"%s\": %m",
                                cstate->filename)));
            st_mode = st.st_mode;
            }
#else
            {
            struct stat st;
   
        if (fstat(fileno(cstate->copy_file), &st))
                ereport(ERROR,
                        (errcode_for_file_access(),
                         errmsg("could not stat file \"%s\": %m",
                                cstate->filename)));
            st_mode = st.st_mode;
            }
#endif

            if (S_ISDIR(st_mode))
                ereport(ERROR,
                        (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                         errmsg("\"%s\" is a directory", cstate->filename)));
        }

Reply | Threaded
Open this post in threaded view
|

Re: copy command bug

Tom Lane-2
Mary LaClair <[hidden email]> writes:
> The copy command fails to load a large csv file (>2GB) with a "Could not
> stat file" error.  This occurs on Windows with all versions since 10.7.
> It happens because fstat() on Windows fails with large csv files.

Yeah, that's a well-known problem.  It has been fixed in HEAD, cf
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=bed90759fcbcd72d4d06969eebab81e47326f9a2
and some follow-up patches.  But unfortunately that's too invasive
for a back-patch, since it causes an ABI break for everything using
"struct stat".  I don't think we'll be terribly interested in hacking
things to fix individual call sites instead, though.

                        regards, tom lane