Creating 2D arrays for pg_copy_from, reading tab-delimted text file that contains comma and double quotes

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Creating 2D arrays for pg_copy_from, reading tab-delimted text file that contains comma and double quotes

s400t
Hello Good People of the Forum!

I am trying to insert some data into a PostgreSQL database using PHP and struggling to create an array so that pg_copy_from function will accept and process the data.

I can insert data but not the way I want- my data this case contains comma, space, double quotes and unpaired double quote.
I need to use Excel to create the data and save it as tab delimited text file, and then convert encoding to UTF-8 (from Excel's ANSI) before passing it to the PHP.

Here is my dummy data in Excel:
rec_no    item1    item2    item3    item4    item5
1    Denny's    orange juice    1,500 yen    "Dear John"    32" TV

Explanation: the first row is header.
The second row is data for each column.
I tried to create variation like apostrophe, space between words, comma, double quotes and unpaired double quote (32" TV).

When I save that Excel as a tab delimited text file, I get this:
rec_no    item1    item2    item3    item4    item5
1    Denny's    orange juice    "1,500 yen"    """Dear John"""    "32"" TV"
(As seen when I opened that file with Notepad)

Because my data also contains non-ascii, I saved the file with UTF-8 encoding using the Notepad.

Then I created a two-dimensional array with PHP:

1.$file = 'test.txt'; //tab delimited file
2.$fileRead = fopen($file, 'r');
           
3.$row = 1;
4.$twoDarray = array();
5.while (($line = fgetcsv($fileRead, 0, "\t")) !== FALSE) { //0 means I can read row whatever its length
6.    if($row == 1){ $row++; continue; } //skip header
7.    $line = implode(" ",$line). "\n";
8.    $twoDarray[] = $line;
9.}
10.fclose($fileRead);

Then I passed that twoDarray to pg_copy_from.

$con=pg_connect("host=$host dbname=$dbname port=5432 user=$user password=$password");

11.if (!$con) {
12.    die("Couldn't open..<br>\n");
13.}

14.if (pg_copy_from($con, $tableName, $twoDarray) !== FALSE) {
15.    print "OK!";
16.}
17.else{
18.    print "Not OK.";   
19.}

When I run the program, I have this error:
Warning: pg_copy_from(): Copy command failed:
ERROR: value too long for type character varying(32) CONTEXT: COPY test_table, line 1, column rec_no: "1 Denny's orange juice 1,500 yen "Dear John" 32" TV" in testProgram.php line xx.

My table definition is:
CREATE TABLE test_table (
rec_no VARCHAR(32) PRIMARY KEY NOT NULL,
item1 VARCHAR(255),..item2 .. until item5.);

Obviously, my program thinks everything in the data row is for the first field.
No, no.

How to make it think that
1 is for the 'rec_no' field,
Denny's is for the 'item1' field,
orange juice is for the 'item2' field,
1,500 yen is for the 'item3' field,
"Dear John" is for the 'item4' field and
32" TV is for the 'item5' field?

When I tried removing '0' from line 5, that is,
while (($line = fgetcsv($fileRead, "\t")) !== FALSE) { //without read length

I can see data written in the database, but with some extra double quotes and a missing comma!
That is data was saved as
Denny's, orange juice, "1 500 yen", """Dear John""", and  "32"" TV"
into the respective fields.
I cannot have those extra double quotes, and I cannot have missing comma in my data.
Reply | Threaded
Open this post in threaded view
|

Re: Creating 2D arrays for pg_copy_from, reading tab-delimted text file that contains comma and double quotes

rob stone-2
Hello,

On Mon, 2018-12-17 at 21:34 +0900, [hidden email] wrote:

> Hello Good People of the Forum!
>
> I am trying to insert some data into a PostgreSQL database using PHP
> and struggling to create an array so that pg_copy_from function will
> accept and process the data.
>
> I can insert data but not the way I want- my data this case contains
> comma, space, double quotes and unpaired double quote.
> I need to use Excel to create the data and save it as tab delimited
> text file, and then convert encoding to UTF-8 (from Excel's ANSI)
> before passing it to the PHP.
>
> Here is my dummy data in Excel:
> rec_no    item1    item2    item3    item4    item5
> 1    Denny's    orange juice    1,500 yen    "Dear John"    32" TV
>
> Explanation: the first row is header.
> The second row is data for each column.
> I tried to create variation like apostrophe, space between words,
> comma, double quotes and unpaired double quote (32" TV).
>
> When I save that Excel as a tab delimited text file, I get this:
> rec_no    item1    item2    item3    item4    item5
> 1    Denny's    orange juice    "1,500 yen"    """Dear John"""  
> "32"" TV"
> (As seen when I opened that file with Notepad)
>
> Because my data also contains non-ascii, I saved the file with UTF-8
> encoding using the Notepad.
>
> Then I created a two-dimensional array with PHP:
>
> 1.$file = 'test.txt'; //tab delimited file
> 2.$fileRead = fopen($file, 'r');
>            
> 3.$row = 1;
> 4.$twoDarray = array();
> 5.while (($line = fgetcsv($fileRead, 0, "\t")) !== FALSE) { //0 means
> I can read row whatever its length
> 6.    if($row == 1){ $row++; continue; } //skip header
> 7.    $line = implode(" ",$line). "\n";
> 8.    $twoDarray[] = $line;
> 9.}
> 10.fclose($fileRead);
>
> Then I passed that twoDarray to pg_copy_from.
>
> $con=pg_connect("host=$host dbname=$dbname port=5432 user=$user
> password=$password");
>
> 11.if (!$con) {
> 12.    die("Couldn't open..<br>\n");
> 13.}
>
> 14.if (pg_copy_from($con, $tableName, $twoDarray) !== FALSE) {
> 15.    print "OK!";
> 16.}
> 17.else{
> 18.    print "Not OK.";    
> 19.}
>
> When I run the program, I have this error:
> Warning: pg_copy_from(): Copy command failed:
> ERROR: value too long for type character varying(32) CONTEXT: COPY
> test_table, line 1, column rec_no: "1 Denny's orange juice 1,500 yen
> "Dear John" 32" TV" in testProgram.php line xx.
>
> My table definition is:
> CREATE TABLE test_table (
> rec_no VARCHAR(32) PRIMARY KEY NOT NULL,
> item1 VARCHAR(255),..item2 .. until item5.);
>
> Obviously, my program thinks everything in the data row is for the
> first field.
> No, no.
>
> How to make it think that
> 1 is for the 'rec_no' field,
> Denny's is for the 'item1' field,
> orange juice is for the 'item2' field,
> 1,500 yen is for the 'item3' field,
> "Dear John" is for the 'item4' field and
> 32" TV is for the 'item5' field?
>
> When I tried removing '0' from line 5, that is,
> while (($line = fgetcsv($fileRead, "\t")) !== FALSE) { //without read
> length
>
> I can see data written in the database, but with some extra double
> quotes and a missing comma!
> That is data was saved as
> Denny's, orange juice, "1 500 yen", """Dear John""", and
> "32"" TV"
> into the respective fields.
> I cannot have those extra double quotes, and I cannot have missing
> comma in my data.



fgetscsv returns an array from reading a record from a text file.
So  $line = fgetcsv($fileRead, 0, "\t", '"') would use tab as the
delimiter and remove any enclosure character equal to ".

I don't use Excel. With Libreoffice you can set the delimiter to the
pipe character and tell it not to use enclosures. So I don't know how
to massage your Excel file so that 32" TV is rendered correctly.

Secondly, in PHP implode takes an array and turns it into a string with
a specified character used to delimit the values from the array.

I can only suggest that you read the PHP manual, as well as do some
searches for the use of pg_copy_from, although I doubt anything useful
will turn up.

Cheers,
Rob


Reply | Threaded
Open this post in threaded view
|

Re: Creating 2D arrays for pg_copy_from, reading tab-delimted text file that contains comma and double quotes

Daniel Verite
In reply to this post by s400t
          <[hidden email]> wrote:

> When I save that Excel as a tab delimited text file, I get this:rec_no  
> item1    item2    item3    item4    item5
> 1    Denny's    orange juice    "1,500 yen"    """Dear John"""    "32""
> TV"(As seen when I opened that file with Notepad)

This looks good. Fields are properly enclosed and double quotes
in contents are doubled, as expected in CSV.

> 5.while (($line = fgetcsv($fileRead, 0, "\t")) !== FALSE) { //0 means I can
> read row whatever its length
> 6.    if($row == 1){ $row++; continue; } //skip header
> 7.    $line = implode(" ",$line). "\n";
> 8.    $twoDarray[] = $line;
> ...
> 14.if (pg_copy_from($con, $tableName, $twoDarray) !== FALSE) {

It goes wrong at line 7. pg_copy_from() expects lines in the
COPY "text format" documented at
https://www.postgresql.org/docs/current/sql-copy.html

It implies that:
- since your call to pg_copy_from() doesn't specify a delimiter
it uses tab, not a space, so implode() must be passed a tab,
not a space.
- if there are backslashes in the contents they must be quoted
by doubling them.
- if there are newline or carriage return characters in the contents
they must be replaced by \n and \r respectively, so as to
not be confused with an end of record.
- if there are tabs in the contents they must be replaced by \t.

These replacements can all be done by a single strtr() call in php.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Reply | Threaded
Open this post in threaded view
|

Re: Creating 2D arrays for pg_copy_from, reading tab-delimted text file that contains comma and double quotes

s400t
Hi Daniel,

Thank you for the comments.

I changed my line 7 to this:
$line = implode("\t",$line). "\n";

Now, I get the following.

Warning: pg_copy_from(): Copy command failed: ERROR: extra data after last expected column CONTEXT: COPY test_table, line 1: "1    Denny's    orange juice    "1    500 yen"    """Dear John""" "32"" TV"" in






----- Original Message -----

> From: Daniel Verite <[hidden email]>
> To: [hidden email]
> Cc: [hidden email]
> Date: 2018/12/18, Tue 00:35
> Subject: Re: Creating 2D arrays for pg_copy_from, reading tab-delimted text file  that contains comma and double quotes
>
>       <[hidden email]> wrote:
>
>>  When I save that Excel as a tab delimited text file, I get this:rec_no 
>>  item1    item2    item3    item4    item5
>>  1    Denny's    orange juice    "1,500 yen"   
> """Dear John"""    "32""
>>  TV"(As seen when I opened that file with Notepad)
>
> This looks good. Fields are properly enclosed and double quotes
> in contents are doubled, as expected in CSV.
>
>>  5.while (($line = fgetcsv($fileRead, 0, "\t")) !== FALSE) {
> //0 means I can
>>  read row whatever its length
>>  6.    if($row == 1){ $row++; continue; } //skip header
>>  7.    $line = implode(" ",$line). "\n";
>>  8.    $twoDarray[] = $line;
>>  ...
>>  14.if (pg_copy_from($con, $tableName, $twoDarray) !== FALSE) {
>
> It goes wrong at line 7. pg_copy_from() expects lines in the
> COPY "text format" documented at
> https://www.postgresql.org/docs/current/sql-copy.html 
>
> It implies that:
> - since your call to pg_copy_from() doesn't specify a delimiter
> it uses tab, not a space, so implode() must be passed a tab,
> not a space.
> - if there are backslashes in the contents they must be quoted
> by doubling them.
> - if there are newline or carriage return characters in the contents
> they must be replaced by \n and \r respectively, so as to
> not be confused with an end of record.
> - if there are tabs in the contents they must be replaced by \t.
>
> These replacements can all be done by a single strtr() call in php.
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org 
> Twitter: @DanielVerite
>


Reply | Threaded
Open this post in threaded view
|

Re: Creating 2D arrays for pg_copy_from, reading tab-delimted text file that contains comma and double quotes

s400t
In reply to this post by Daniel Verite
Hello Daniel!

This afternoon, I tried it again, and yes, your suggestion that I use a tab
delimited symbol ("\t") instead of a space for the implode works flawlessly.

Yesterday, I reported I got "Warning: pg_copy_from(): Copy command failed: ERROR: extra data after last expected..".
That was because while reading file, I had used this:
     while (($line = fgetcsv($fileRead, "\t")) !== FALSE) {


Today, I used this:
     while (($line = fgetcsv($fileRead, 0, "\t")) !== FALSE) {


.. and now I can see my data saved in the database without those extra double quotes.

It made my day!

Thank you.





----- Original Message -----

> From: Daniel Verite <[hidden email]>
> To: [hidden email]
> Cc: [hidden email]
> Date: 2018/12/18, Tue 00:35
> Subject: Re: Creating 2D arrays for pg_copy_from, reading tab-delimted text file  that contains comma and double quotes
>
>       <[hidden email]> wrote:
>
>>  When I save that Excel as a tab delimited text file, I get this:rec_no 
>>  item1    item2    item3    item4    item5
>>  1    Denny's    orange juice    "1,500 yen"   
> """Dear John"""    "32""
>>  TV"(As seen when I opened that file with Notepad)
>
> This looks good. Fields are properly enclosed and double quotes
> in contents are doubled, as expected in CSV.
>
>>  5.while (($line = fgetcsv($fileRead, 0, "\t")) !== FALSE) {
> //0 means I can
>>  read row whatever its length
>>  6.    if($row == 1){ $row++; continue; } //skip header
>>  7.    $line = implode(" ",$line). "\n";
>>  8.    $twoDarray[] = $line;
>>  ...
>>  14.if (pg_copy_from($con, $tableName, $twoDarray) !== FALSE) {
>
> It goes wrong at line 7. pg_copy_from() expects lines in the
> COPY "text format" documented at
> https://www.postgresql.org/docs/current/sql-copy.html 
>
> It implies that:
> - since your call to pg_copy_from() doesn't specify a delimiter
> it uses tab, not a space, so implode() must be passed a tab,
> not a space.
> - if there are backslashes in the contents they must be quoted
> by doubling them.
> - if there are newline or carriage return characters in the contents
> they must be replaced by \n and \r respectively, so as to
> not be confused with an end of record.
> - if there are tabs in the contents they must be replaced by \t.
>
> These replacements can all be done by a single strtr() call in php.
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org 
> Twitter: @DanielVerite
>


Reply | Threaded
Open this post in threaded view
|

Re: Creating 2D arrays for pg_copy_from, reading tab-delimted text file that contains comma and double quotes

s400t
>and now I can see my data saved in the database without those extra double quotes.
.. and with my comma intact as well.




----- Original Message -----

> From: "[hidden email]" <[hidden email]>
> To: Daniel Verite <[hidden email]>
> Cc: "[hidden email]" <[hidden email]>
> Date: 2018/12/18, Tue 14:15
> Subject: Re: Creating 2D arrays for pg_copy_from, reading tab-delimted text file  that contains comma and double quotes
>
> Hello Daniel!
>
> This afternoon, I tried it again, and yes, your suggestion that I use a tab
> delimited symbol ("\t") instead of a space for the implode works
> flawlessly.
>
> Yesterday, I reported I got "Warning: pg_copy_from(): Copy command failed:
> ERROR: extra data after last expected..".
> That was because while reading file, I had used this:
>      while (($line = fgetcsv($fileRead, "\t")) !== FALSE) {
>
>
> Today, I used this:
>      while (($line = fgetcsv($fileRead, 0, "\t")) !== FALSE) {
>
>
> .. and now I can see my data saved in the database without those extra double
> quotes.
>
> It made my day!
>
> Thank you.
>
>
>
>
>
> ----- Original Message -----
>>  From: Daniel Verite <[hidden email]>
>>  To: [hidden email]
>>  Cc: [hidden email]
>>  Date: 2018/12/18, Tue 00:35
>>  Subject: Re: Creating 2D arrays for pg_copy_from, reading tab-delimted text
> file  that contains comma and double quotes
>>
>>        <[hidden email]> wrote:
>>
>>>   When I save that Excel as a tab delimited text file, I get
> this:rec_no 
>>>   item1    item2    item3    item4    item5
>>>   1    Denny's    orange juice    "1,500 yen"   
>>  """Dear John"""    "32""
>>>   TV"(As seen when I opened that file with Notepad)
>>
>>  This looks good. Fields are properly enclosed and double quotes
>>  in contents are doubled, as expected in CSV.
>>
>>>   5.while (($line = fgetcsv($fileRead, 0, "\t")) !==
> FALSE) {
>>  //0 means I can
>>>   read row whatever its length
>>>   6.    if($row == 1){ $row++; continue; } //skip header
>>>   7.    $line = implode(" ",$line). "\n";
>>>   8.    $twoDarray[] = $line;
>>>   ...
>>>   14.if (pg_copy_from($con, $tableName, $twoDarray) !== FALSE) {
>>
>>  It goes wrong at line 7. pg_copy_from() expects lines in the
>>  COPY "text format" documented at
>>  https://www.postgresql.org/docs/current/sql-copy.html 
>>
>>  It implies that:
>>  - since your call to pg_copy_from() doesn't specify a delimiter
>>  it uses tab, not a space, so implode() must be passed a tab,
>>  not a space.
>>  - if there are backslashes in the contents they must be quoted
>>  by doubling them.
>>  - if there are newline or carriage return characters in the contents
>>  they must be replaced by \n and \r respectively, so as to
>>  not be confused with an end of record.
>>  - if there are tabs in the contents they must be replaced by \t.
>>
>>  These replacements can all be done by a single strtr() call in php.
>>
>>
>>  Best regards,
>>  --
>>  Daniel Vérité
>>  PostgreSQL-powered mailer: http://www.manitou-mail.org 
>>  Twitter: @DanielVerite
>>
>


Reply | Threaded
Open this post in threaded view
|

Re: Creating 2D arrays for pg_copy_from, reading tab-delimited text file that contains comma and double quotes

s400t
In reply to this post by rob stone-2
Hello Rob,

Thank you for your comments.
As I wrote just a few minutes ago, using a tab delimiter inside the implode as suggested by Daniel (and specifying the read length ("0") for the fgetcsv has solved this problem.

At one point, I had tried to use the last parameter - enclosure character - in your reply, but I got an error, something like,
Warning: pg_copy_from(): Copy command failed: ERROR: missing data for column
"item1" CONTEXT: COPY test_table, line 1: "1 Denny's orange juice 1,500
yen" ...

Then Daniel's reply came and I tried that, still made errors here and there
(removed the read length- but forgot to put it back in subsequent
tests), and was looking for ways to strip off the leading and trailing
double quotes while leaving the comma intact.

FF now..


Anyway, crisis averted this time!

Thank you again for taking time to share your thoughts.

Cheers!




----- Original Message -----

> From: rob stone <[hidden email]>
> To: [hidden email]; "[hidden email]" <[hidden email]>
> Cc:
> Date: 2018/12/17, Mon 23:38
> Subject: Re: Creating 2D arrays for pg_copy_from, reading tab-delimted text file that contains comma and double quotes
>
> Hello,
>
> On Mon, 2018-12-17 at 21:34 +0900, [hidden email] wrote:
>>  Hello Good People of the Forum!
>>
>>  I am trying to insert some data into a PostgreSQL database using PHP
>>  and struggling to create an array so that pg_copy_from function will
>>  accept and process the data.
>>
>>  I can insert data but not the way I want- my data this case contains
>>  comma, space, double quotes and unpaired double quote.
>>  I need to use Excel to create the data and save it as tab delimited
>>  text file, and then convert encoding to UTF-8 (from Excel's ANSI)
>>  before passing it to the PHP.
>>
>>  Here is my dummy data in Excel:
>>  rec_no    item1    item2    item3    item4    item5
>>  1    Denny's    orange juice    1,500 yen    "Dear John"   
> 32" TV
>>
>>  Explanation: the first row is header.
>>  The second row is data for each column.
>>  I tried to create variation like apostrophe, space between words,
>>  comma, double quotes and unpaired double quote (32" TV).
>>
>>  When I save that Excel as a tab delimited text file, I get this:
>>  rec_no    item1    item2    item3    item4    item5
>>  1    Denny's    orange juice    "1,500 yen"   
> """Dear John""" 
>>  "32"" TV"
>>  (As seen when I opened that file with Notepad)
>>
>>  Because my data also contains non-ascii, I saved the file with UTF-8
>>  encoding using the Notepad.
>>
>>  Then I created a two-dimensional array with PHP:
>>
>>  1.$file = 'test.txt'; //tab delimited file
>>  2.$fileRead = fopen($file, 'r');
>>            
>>  3.$row = 1;
>>  4.$twoDarray = array();
>>  5.while (($line = fgetcsv($fileRead, 0, "\t")) !== FALSE) {
> //0 means
>>  I can read row whatever its length
>>  6.    if($row == 1){ $row++; continue; } //skip header
>>  7.    $line = implode(" ",$line). "\n";
>>  8.    $twoDarray[] = $line;
>>  9.}
>>  10.fclose($fileRead);
>>
>>  Then I passed that twoDarray to pg_copy_from.
>>
>>  $con=pg_connect("host=$host dbname=$dbname port=5432 user=$user
>>  password=$password");
>>
>>  11.if (!$con) {
>>  12.    die("Couldn't open..<br>\n");
>>  13.}
>>
>>  14.if (pg_copy_from($con, $tableName, $twoDarray) !== FALSE) {
>>  15.    print "OK!";
>>  16.}
>>  17.else{
>>  18.    print "Not OK.";   
>>  19.}
>>
>>  When I run the program, I have this error:
>>  Warning: pg_copy_from(): Copy command failed:
>>  ERROR: value too long for type character varying(32) CONTEXT: COPY
>>  test_table, line 1, column rec_no: "1 Denny's orange juice 1,500
> yen
>>  "Dear John" 32" TV" in testProgram.php line xx.
>>
>>  My table definition is:
>>  CREATE TABLE test_table (
>>  rec_no VARCHAR(32) PRIMARY KEY NOT NULL,
>>  item1 VARCHAR(255),..item2 .. until item5.);
>>
>>  Obviously, my program thinks everything in the data row is for the
>>  first field.
>>  No, no.
>>
>>  How to make it think that
>>  1 is for the 'rec_no' field,
>>  Denny's is for the 'item1' field,
>>  orange juice is for the 'item2' field,
>>  1,500 yen is for the 'item3' field,
>>  "Dear John" is for the 'item4' field and
>>  32" TV is for the 'item5' field?
>>
>>  When I tried removing '0' from line 5, that is,
>>  while (($line = fgetcsv($fileRead, "\t")) !== FALSE) {
> //without read
>>  length
>>
>>  I can see data written in the database, but with some extra double
>>  quotes and a missing comma!
>>  That is data was saved as
>>  Denny's,    orange juice,    "1 500 yen",   
> """Dear John""", and
>>      "32"" TV"
>>  into the respective fields.
>>  I cannot have those extra double quotes, and I cannot have missing
>>  comma in my data.
>
>
>
> fgetscsv returns an array from reading a record from a text file.
> So  $line = fgetcsv($fileRead, 0, "\t", '"') would
> use tab as the
> delimiter and remove any enclosure character equal to ".
>
> I don't use Excel. With Libreoffice you can set the delimiter to the
> pipe character and tell it not to use enclosures. So I don't know how
> to massage your Excel file so that 32" TV is rendered correctly.
>
> Secondly, in PHP implode takes an array and turns it into a string with
> a specified character used to delimit the values from the array.
>
> I can only suggest that you read the PHP manual, as well as do some
> searches for the use of pg_copy_from, although I doubt anything useful
> will turn up.
>
> Cheers,
> Rob
>