(no subject)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

(no subject)

Mark Teper

Hi,

I'm trying to build some numerical processing algorithms on Postgres Array data types.  Using PL/Python I can get access to the numpy libraries but the performance is not great.  A guess is that there is a lot of overhead going from Postgres -> Python List -> Numpy and back again.

I'd like to test if that's the issue, and potentially fix by creating a C extension to convert directly from Postgres Array types to Numpy Array types.  I _think_ I have the C side somewhat working, but I can't get Postgres to use the transform.

What I have:

---

CREATE FUNCTION arr_to_np(val internal) RETURNS internal LANGUAGE C AS 'MODULE_PATHNAME', 'arr_to_np';

CREATE FUNCTION np_to_arr(val internal) RETURNS real[] LANGUAGE C AS 'MODULE_PATHNAME', 'np_to_arr';

CREATE TRANSFORM FOR real[] LANGUAGE plpythonu (

    FROM SQL WITH FUNCTION arr_to_np(internal),

    TO SQL WITH FUNCTION np_to_arr(internal)

);

CREATE FUNCTION fn (a integer[]) RETURNS integer

    TRANSFORM FOR TYPE real[]  

     AS $$  return a $$ LANGUAGE plpythonu;

----

The problem is this produces an error that transforms for type "real" doesn't work.  It doesn't seem to allow for transforms on array's as opposed to underlying types.  Is it possible to tell it to apply the transform to the array?  

Thanks,

Mark

Reply | Threaded
Open this post in threaded view
|

Re: plpython transforms vs. arrays

Tom Lane-2
Mark Teper <[hidden email]> writes:
> The problem is this produces an error that transforms for type "real"
> doesn't work.  It doesn't seem to allow for transforms on array's as
> opposed to underlying types.  Is it possible to tell it to apply the
> transform to the array?

Yeah, see PLy_input_setup_func and PLy_output_setup_func, which both
say

     * Choose conversion method.  Note that transform functions are checked
     * for composite and scalar types, but not for arrays or domains.  This is
     * somewhat historical, but we'd have a problem allowing them on domains,
     * since we drill down through all levels of a domain nest without looking
     * at the intermediate levels at all.

At least for arrays, it might be sufficient to switch the order of the
array-lookup and transform-lookup cases to fix this.  I don't think
anyone's felt motivated to look into that, up to now.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: plpython transforms vs. arrays

Jiří Fejfar
In reply to this post by Mark Teper

Dear Mark,


I am also looking around to find way how to do effectively (in parallel?) computations (with potentially large and sometimes sparse) matrices directly in PostgreSQL. Some time ago I have found this experimental extension https://github.com/PandaPost/panda_post which "allow you to represent Python NumPy/Pandas objects in Postgres". There is also http://madlib.apache.org/ but is seems too much "heavyweight" for my use-case.

Now I do not have much time to spent with this, but I hope in late summer it will be better. I am looking forward what will be your conclusions.


Good luck, Jiří.


On 5/2/19 7:51 PM, Mark Teper wrote:

Hi,

I'm trying to build some numerical processing algorithms on Postgres Array data types.  Using PL/Python I can get access to the numpy libraries but the performance is not great.  A guess is that there is a lot of overhead going from Postgres -> Python List -> Numpy and back again.

I'd like to test if that's the issue, and potentially fix by creating a C extension to convert directly from Postgres Array types to Numpy Array types.  I _think_ I have the C side somewhat working, but I can't get Postgres to use the transform.

What I have:

---

CREATE FUNCTION arr_to_np(val internal) RETURNS internal LANGUAGE C AS 'MODULE_PATHNAME', 'arr_to_np';

CREATE FUNCTION np_to_arr(val internal) RETURNS real[] LANGUAGE C AS 'MODULE_PATHNAME', 'np_to_arr';

CREATE TRANSFORM FOR real[] LANGUAGE plpythonu (

    FROM SQL WITH FUNCTION arr_to_np(internal),

    TO SQL WITH FUNCTION np_to_arr(internal)

);

CREATE FUNCTION fn (a integer[]) RETURNS integer

    TRANSFORM FOR TYPE real[]  

     AS $$  return a $$ LANGUAGE plpythonu;

----

The problem is this produces an error that transforms for type "real" doesn't work.  It doesn't seem to allow for transforms on array's as opposed to underlying types.  Is it possible to tell it to apply the transform to the array?  

Thanks,

Mark

Reply | Threaded
Open this post in threaded view
|

AW:

Sonnenberg-Carstens, Stefan
In reply to this post by Mark Teper

Hi,

 

is the Python code running inside the PostgreSQL instance?

If not, the performance penalty might only get marginally better because it always involves to copy the data to the client and back again.

Mit freundlichen Grüßen
Stefan Sonnenberg-Carstens

Teamleiter Softwareentwicklung WECO / DevOps Engineer

B.Sc. Wirtschaftsinformatik

Signatur_50-Jahre-ingo-man

OPHARDT HYGIENE-TECHNIK GmbH + Co. KG
Lindenau 27
D - 47661 Issum

Tel:  +49 2835 18-492
Fax: +49 2835 18-28492
Mail: [hidden email]

www.OPHARDT.com

Handelsregistereintrag beim Amtsgericht Kleve:
Kommanditgesellschaft HRA 1681, Komplementärin:
OPHARDT HYGIENE-TECHNIK VerwaltungsGmbH HRB 3910

Geschäftsführer: Thomas Houcken

USt.-Identifikations-Nr.: DE813067257


CONFIDENTIALITY NOTICE:

This e-mail communication and any attachments may contain confidential and privileged information for the use
of the designated recipients named above.
Any unauthorized review, use, disclosure or distribution is prohibited.
If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies
of the original message.

 

Von: Mark Teper [mailto:[hidden email]]
Gesendet: Donnerstag, 2. Mai 2019 19:51
An: [hidden email]
Betreff:

 

Hi,

I'm trying to build some numerical processing algorithms on Postgres Array data types.  Using PL/Python I can get access to the numpy libraries but the performance is not great.  A guess is that there is a lot of overhead going from Postgres -> Python List -> Numpy and back again.

I'd like to test if that's the issue, and potentially fix by creating a C extension to convert directly from Postgres Array types to Numpy Array types.  I _think_ I have the C side somewhat working, but I can't get Postgres to use the transform.

What I have:

---

CREATE FUNCTION arr_to_np(val internal) RETURNS internal LANGUAGE C AS 'MODULE_PATHNAME', 'arr_to_np';

CREATE FUNCTION np_to_arr(val internal) RETURNS real[] LANGUAGE C AS 'MODULE_PATHNAME', 'np_to_arr';

CREATE TRANSFORM FOR real[] LANGUAGE plpythonu (

    FROM SQL WITH FUNCTION arr_to_np(internal),

    TO SQL WITH FUNCTION np_to_arr(internal)

);

CREATE FUNCTION fn (a integer[]) RETURNS integer

    TRANSFORM FOR TYPE real[]  

     AS $$  return a $$ LANGUAGE plpythonu;

----

The problem is this produces an error that transforms for type "real" doesn't work.  It doesn't seem to allow for transforms on array's as opposed to underlying types.  Is it possible to tell it to apply the transform to the array?  

Thanks,

Mark

Reply | Threaded
Open this post in threaded view
|

Re: plpython transforms vs. arrays

Mark Teper
In reply to this post by Jiří Fejfar
Hi Stefan,

is the Python code running inside the PostgreSQL instance?  

Yes, it is using the PL Python language so it runs in the PostgreSQL instance.  With some help from Tom, I've been able to make the change needed to allow this transform.  

I need to do some more testing, but my preliminary results for the arrays I'm interested in show transforming Postgres to Numpy is about 5x faster and transforming Numpy back to Postgres about 2x faster.

Regards,
Mark
Reply | Threaded
Open this post in threaded view
|

AW: plpython transforms vs. arrays

Sonnenberg-Carstens, Stefan

Understood, a great example what is possible if the toolset is open ;-)

 

Mit freundlichen Grüßen
Stefan Sonnenberg-Carstens

Teamleiter Softwareentwicklung WECO / DevOps Engineer

B.Sc. Wirtschaftsinformatik

Signatur_50-Jahre-ingo-man

OPHARDT HYGIENE-TECHNIK GmbH + Co. KG
Lindenau 27
D - 47661 Issum

Tel:  +49 2835 18-492
Fax: +49 2835 18-28492
Mail: [hidden email]

www.OPHARDT.com

Handelsregistereintrag beim Amtsgericht Kleve:
Kommanditgesellschaft HRA 1681, Komplementärin:
OPHARDT HYGIENE-TECHNIK VerwaltungsGmbH HRB 3910

Geschäftsführer: Thomas Houcken

USt.-Identifikations-Nr.: DE813067257


CONFIDENTIALITY NOTICE:

This e-mail communication and any attachments may contain confidential and privileged information for the use
of the designated recipients named above.
Any unauthorized review, use, disclosure or distribution is prohibited.
If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies
of the original message.

 

Von: Mark Teper [mailto:[hidden email]]
Gesendet: Montag, 6. Mai 2019 11:33
An: Sonnenberg-Carstens, Stefan; [hidden email]
Betreff: Re: plpython transforms vs. arrays

 

Hi Stefan,

 

is the Python code running inside the PostgreSQL instance?  

 

Yes, it is using the PL Python language so it runs in the PostgreSQL instance.  With some help from Tom, I've been able to make the change needed to allow this transform.  

 

I need to do some more testing, but my preliminary results for the arrays I'm interested in show transforming Postgres to Numpy is about 5x faster and transforming Numpy back to Postgres about 2x faster.

 

Regards,

Mark