One process per session lack of sharing


AMatveev
Hi

Is there any plan to implement "session per thread" or "shared
sessions between threads"?
We have analyzed the possibility of contributing a pgSql-to-JVM-bytecode compiler, but with
the current threading model this idea is far from optimal. (The VM could be a different one, of course,
but we currently use Oracle, and the JVM is important for us.)

We have run into a lack of resource sharing.
In our tests, memory usage per session was:
Oracle: about 5M
MSSqlServer: about 4M
PostgreSQL: about 160M

It's discussed on [hidden email]:
http://www.mail-archive.com/pgsql-general@.../msg206452.html


>I think the "problem" that he is having is fixable only by changing how
>PostgreSQL itself works. His problem is a PL/pgSQL function which is 11K
>lines in length. When invoked, this function is "compiled" into a large
>tokenized parse tree. This parse tree is only usable in the session which
>invoked the function. Apparently this parse tree takes a lot of memory.
>And "n" concurrent users of this, highly used, function will therefore
>require "n" times as much memory because the parse tree is _not_
>shareable.  This is explained in:
>https://www.postgresql.org/docs/9.5/static/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
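The "n times as much memory" point in the quoted explanation can be put into rough numbers. The following is only an arithmetic sketch: the per-session figures are the approximate ones reported above, and the session count of 500 is a hypothetical value chosen for illustration.

```python
# Illustrative arithmetic only. The per-session figures are the rough numbers
# reported in the post above, not measurements of any particular system.
PER_SESSION_MB = {"Oracle": 5, "MSSqlServer": 4, "PostgreSQL": 160}


def total_memory_mb(per_session_mb: float, sessions: int) -> float:
    """Memory needed when every session holds its own private copy."""
    return per_session_mb * sessions


if __name__ == "__main__":
    sessions = 500  # hypothetical number of concurrent sessions
    for product, mb in PER_SESSION_MB.items():
        gib = total_memory_mb(mb, sessions) / 1024
        print(f"{product}: about {gib:.1f} GiB for {sessions} sessions")
```

Because nothing is shared between sessions in this model, the total grows linearly with the number of sessions.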

Another interesting answer (from Karl Czajkowski <[hidden email]>, in
private):
>  But, I search the
> archives of the mailing list, and when others have previously
> suggested such caching or reuse, it was immediately shot down by core
> developers.




--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: One process per session lack of sharing

Tom Lane-2
[hidden email] writes:
> Is  there  any  plan  to  implement  "session  per  thread" or "shared
> sessions between thread"?

No, not really.  The amount of overhead that would add --- eg, the need
for locking on what used to be single-use caches --- makes the benefit
highly questionable.  Also, most people who need this find that sticking
a connection pooler in front of the database solves their problem, so
there's not that much motivation to do a ton of work inside the database
to solve it there.

                        regards, tom lane



Re: One process per session lack of sharing

AMatveev
Hi

> [hidden email] writes:
>> Is  there  any  plan  to  implement  "session  per  thread" or "shared
>> sessions between thread"?

> No, not really.  The amount of overhead that would add --- eg, the need
> for locking on what used to be single-use caches --- makes the benefit
> highly questionable.
A two-layer cache is the best answer.
>  Also, most people who need this find that sticking
> a connection pooler in front of the database solves their problem
It has some disadvantages: the lack of temporary tables, for example.
Practical use of such tables with a connection pooler is highly
questionable.
And so on.
> , so
> there's not that much motivation to do a ton of work inside the database
> to solve it there.
It is clear. Thank you.
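
For readers unfamiliar with the term, a "two-layer cache" could look roughly like the sketch below: a lock-protected shared layer that compiles each entry once, fronted by a private per-session layer whose hits need no locking. This is a minimal illustration in Python, not PostgreSQL code. The class names `SharedCache` and `SessionCache` and the `compile_fn` callback are hypothetical, and cache invalidation is deliberately left out.

```python
import threading


class SharedCache:
    """Second layer: one copy for all sessions, guarded by a lock."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn  # hypothetical "compile this function" callback
        self._lock = threading.Lock()
        self._entries = {}
        self.compilations = 0  # counts how often the expensive step really ran

    def get(self, key):
        with self._lock:
            if key not in self._entries:
                self._entries[key] = self._compile_fn(key)
                self.compilations += 1
            return self._entries[key]


class SessionCache:
    """First layer: private to one session, so a hit takes no lock."""

    def __init__(self, shared):
        self._shared = shared
        self._local = {}

    def get(self, key):
        try:
            return self._local[key]
        except KeyError:
            # Miss in the private layer: fall through to the shared layer
            # and remember a reference to the shared entry.
            value = self._local[key] = self._shared.get(key)
            return value


if __name__ == "__main__":
    shared = SharedCache(lambda name: ("parse-tree", name))
    sessions = [SessionCache(shared) for _ in range(3)]
    for s in sessions:
        s.get("big_function")
    print("compilations:", shared.compilations)
```

The locking cost is paid only on a first-layer miss; whether that is cheap enough in practice is exactly the point under dispute in this thread.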



Re: One process per session lack of sharing

Robert Haas
In reply to this post by Tom Lane-2
On Tue, Jul 12, 2016 at 9:18 AM, Tom Lane <[hidden email]> wrote:

> [hidden email] writes:
>> Is  there  any  plan  to  implement  "session  per  thread" or "shared
>> sessions between thread"?
>
> No, not really.  The amount of overhead that would add --- eg, the need
> for locking on what used to be single-use caches --- makes the benefit
> highly questionable.  Also, most people who need this find that sticking
> a connection pooler in front of the database solves their problem, so
> there's not that much motivation to do a ton of work inside the database
> to solve it there.

I agree that there's not really a plan to implement this, but I don't
agree that connection pooling solves the whole problem.  Most people
can't get by with statement pooling, so in practice you are looking at
transaction pooling or session pooling.  And that means that you can't
really keep the pool size as small as you'd like because backends can
be idle in transaction for long enough to force the pool size to be
pretty large.  Also, pooling causes the same backends to get reused
for different sessions which touch different relations and different
functions so that, for example, the relcache and the PL/pgsql function
caches grow until every one of those sessions has everything cached
that any client needs.  That can cause big problems.
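
The cache-growth effect described above can be illustrated with a toy simulation. It is a sketch under simplifying assumptions, not a model of the real relcache: each client touches only its own fixed working set, and a pooled request is served by an arbitrary backend. The function name `cache_sizes` and all parameters are hypothetical.

```python
import random


def cache_sizes(num_clients, objects_per_client, requests, pool_size=None, seed=0):
    """Return the number of distinct objects cached by each backend.

    pool_size=None models one dedicated backend per client; otherwise every
    request is served by an arbitrary backend from a pool of that size.
    """
    rng = random.Random(seed)
    num_backends = num_clients if pool_size is None else pool_size
    caches = [set() for _ in range(num_backends)]
    for _ in range(requests):
        client = rng.randrange(num_clients)
        # Pick one object from this client's own working set.
        obj = (client, rng.randrange(objects_per_client))
        backend = client if pool_size is None else rng.randrange(num_backends)
        caches[backend].add(obj)
    return [len(cache) for cache in caches]


if __name__ == "__main__":
    print("dedicated:", max(cache_sizes(20, 10, 20000)))
    print("pooled:   ", max(cache_sizes(20, 10, 20000, pool_size=4)))
```

With dedicated backends, a cache never grows beyond one client's working set; with pooling, each backend drifts toward caching the union of all working sets.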

So, I actually think it would be a good idea to think about this.  The
problem, of course, is that as long as we allow arbitrary parts of the
code - including extension code - to declare global variables and
store arbitrary stuff in them without any coordination, it's
impossible to imagine hibernating and resuming a session without a
risk of things going severely awry.  This was a major issue for
parallel query, but we've solved it, mostly, by designating the things
that rely on global variables as parallel-restricted, and there
actually aren't a ton of those.  So I think it's imaginable that we
can get to a point where we can, at least in some circumstances, let a
backend exit and reconstitute its state at a later time.  It's not an
easy project, but I think it is one we will eventually need to do.
Insisting that the current model is working is just sticking our head
in the sand.  It's mostly working, but there are workloads where it
fails badly - and competing database products survive a number of
scenarios where we just fall on our face.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: One process per session lack of sharing

Craig Ringer-3
On 14 July 2016 at 03:59, Robert Haas <[hidden email]> wrote:
 
> I agree that there's not really a plan to implement this, but I don't
> agree that connection pooling solves the whole problem.  Most people
> can't get by with statement pooling, so in practice you are looking at
> transaction pooling or session pooling.  And that means that you can't
> really keep the pool size as small as you'd like because backends can
> be idle in transaction for long enough to force the pool size to be
> pretty large.  Also, pooling causes the same backends to get reused
> for different sessions which touch different relations and different
> functions so that, for example, the relcache and the PL/pgsql function
> caches grow until every one of those sessions has everything cached
> that any client needs.  That can cause big problems.
>
> So, I actually think it would be a good idea to think about this.

I agree. It's been on my mind for a while, but I've been assuming it's likely to involve such architectural upheaval as to be impractical.

Right now PostgreSQL conflates "user session state" and "execution engine" into one process. This means we need an external connection pooler to handle things if we want more user connections than we can efficiently handle in terms of number of executors. Current poolers don't do much to keep track of user state, they just arbitrate access to executors and expect applications to re-establish any needed state (SET vars, LISTEN, etc) or not use features that require persistence across the current pooling level.

This leaves users in the hard position of using very high, inefficient max_connections values to keep track of application<->DB state or jump through awkward hoops to use transaction pooling, either at the application level (Java appserver pools, etc) or through a proxy. 

If using the high max_connections approach the user must also ensure that they don't have all those max_connections actually doing work at the same time using some kind of external coordination. Otherwise they'll thrash the server and face out of memory issues (especially with our rather simplistic work_mem management, etc) and poor performance.

The solution, to me, is to separate "user state" and "executor". Sounds nice, but we use global variables _everywhere_ and it's assumed throughout the code that we have one user session for the life of a backend, though with some exceptions for SET SESSION AUTHORIZATION. It's not likely to be fun.

> The problem, of course, is that as long as we allow arbitrary parts of the
> code - including extension code - to declare global variables and
> store arbitrary stuff in them without any coordination, it's
> impossible to imagine hibernating and resuming a session without a
> risk of things going severely awry.

Yeah. We'd definitely need a session state management mechanism with save and restore functionality.

There's also stuff like:

* LISTEN
* advisory locking at the session level
* WITH HOLD cursors

that isn't simple to just save and restore. Those are some of the same things that are painful with transaction pooling right now.
 
> This was a major issue for
> parallel query, but we've solved it, mostly, by designating the things
> that rely on global variables as parallel-restricted, and there
> actually aren't a ton of those.  So I think it's imaginable that we
> can get to a point where we can, at least in some circumstances, let a
> backend exit and reconstitute its state at a later time.  It's not an
> easy project, but I think it is one we will eventually need to do.

I agree on both points, but I think "not easy" is rather an understatement.

Starting with a narrow scope would help. Save/restore GUCs and the other easy stuff, and disallow sessions that are actively LISTENing, hold advisory locks, have open cursors, etc from being saved and restored.
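
The "narrow scope" idea could be sketched as follows: save and restore only the easy state (GUCs), and refuse to suspend a session that holds any of the hard state listed above. This is a conceptual illustration only. The `SessionState` type and the `save`/`restore` helpers are hypothetical names, not existing PostgreSQL interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Hypothetical model of the per-session state discussed above."""
    gucs: dict = field(default_factory=dict)             # e.g. {"search_path": "app"}
    listen_channels: set = field(default_factory=set)    # active LISTEN registrations
    advisory_locks: set = field(default_factory=set)     # session-level advisory locks
    open_cursors: set = field(default_factory=set)       # e.g. WITH HOLD cursors

    def blockers(self):
        """Names of the state kinds that make this session unsafe to suspend."""
        found = []
        if self.listen_channels:
            found.append("LISTEN")
        if self.advisory_locks:
            found.append("advisory locks")
        if self.open_cursors:
            found.append("open cursors")
        return found


def save(state):
    """Snapshot the restorable state, or raise if suspension is disallowed."""
    blockers = state.blockers()
    if blockers:
        raise RuntimeError("cannot suspend session: " + ", ".join(blockers))
    return dict(state.gucs)


def restore(snapshot):
    """Rebuild a session from a snapshot produced by save()."""
    return SessionState(gucs=dict(snapshot))


if __name__ == "__main__":
    simple = SessionState(gucs={"search_path": "app"})
    print(restore(save(simple)).gucs)
```

The design choice is to fail loudly rather than silently drop state that cannot be reconstructed.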

BTW, I think this would also give us a useful path toward allowing connection poolers to change the active user and re-authenticate on an existing backend. Right now you have to use SET ROLE or SET SESSION AUTHORIZATION (ugh) and can't stop the client you hand the connection to from just RESETing back to the pooler's user and doing whatever it wants.
 
> Insisting that the current model is working is just sticking our head
> in the sand.  It's mostly working, but there are workloads where it
> fails badly - and competing database products survive a number of
> scenarios where we just fall on our face.

Yep, and like parallel query it's a long path, but it's one we've got to face sooner or later.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

Craig Ringer-3
In reply to this post by AMatveev
On 12 July 2016 at 21:57, <[hidden email]> wrote:
> Hi
>
> Is there any plan to implement "session per thread" or "shared
> sessions between thread"?

As has been noted by others, there isn't any such plan right now.

PostgreSQL isn't threaded. It uses a multi-processing shared-nothing-by-default memory with explicit shared memory, plus copy-on-write memory forking from the postmaster for initial backend state (except on Windows, where we emulate that). It relies on processes being cheap and light-weight.

This is a very poor fit with Java's thread-based shared-by-default model with expensive heavyweight process startup and cheap threads. Process forking doesn't clone all threads and the JVM has lots of worker threads, so we can't start the JVM once in the postmaster then clone it with each forked postgres backend. Plus the JVM just isn't designed to cope with that and would surely get thoroughly confused when its file handles are cloned, its process ID changes, etc. This is one of the reasons PL/Java has never really taken off. We can mitigate the JVM startup costs a bit by preloading the JVM libraries into the postmaster and using the JVM's base class library preloading, but unless you're running trivial Java code you still do a lot of work at each JVM start after the postgres backend forks.
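
The statement that forking does not clone all threads can be observed from any POSIX process, since only the thread that calls fork() exists in the child. The sketch below uses Python rather than a JVM and assumes a POSIX system where `os.fork` is available; the function name is hypothetical.

```python
import os
import threading


def thread_count_in_forked_child(num_workers=3):
    """Start worker threads, fork, and return (parent_count, child_count)."""
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait, daemon=True) for _ in range(num_workers)]
    for worker in workers:
        worker.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    pid = os.fork()  # POSIX only
    if pid == 0:
        # Child process: only the thread that called fork() was carried over.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)

    # Parent process: read the child's report, then clean up.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    for worker in workers:
        worker.join()
    return parent_count, child_count


if __name__ == "__main__":
    parent, child = thread_count_in_forked_child()
    print(f"threads in parent: {parent}, threads in forked child: {child}")
```

A runtime that depends on its worker threads, as described above for the JVM, would find them missing in the child.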
 
> We have analyzed the ability to contribute pgSql to jvm bytecode compiler but with
> current thread model this idea is far from optimal.(Vm can be different of course.
> But currently we use oracle and jvm is important for us)

Yep, that's a real sticking point for a number of people.

The usual solution at this point is to move most of the work into an application-server mid-layer. That moves work further away from the DB, which has its own costs, and isn't something you're likely to be happy with if you're looking at things like optimising PL/PgSQL with a bytecode compiler. But it's the best we have right now.
 
> We have faced with some lack of sharing resources.
> So in our test memory usage per session:
> Oracle: about 5M
> MSSqlServer: about 4M
> postgreSql: about 160M

Yep, that sounds about right. Unfortunately.

You may be able to greatly reduce that cost if you can store your cached compiled data in a shared memory segment created by your extension. This will get a bit easier with the new dynamic shared memory infrastructure, but it's going to be no fun at all to make that play with the JVM. You'll probably need a lot of JNI.
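
As a rough illustration of the idea of publishing cached data once in a shared segment and attaching to it by name, here is a sketch using Python's `multiprocessing.shared_memory`. It only shows the general pattern; it is not PostgreSQL's dynamic shared memory API, and the helper names `publish` and `read` are hypothetical.

```python
from multiprocessing import shared_memory


def publish(payload: bytes) -> shared_memory.SharedMemory:
    """Create a named segment and copy the 'compiled' payload into it once."""
    segment = shared_memory.SharedMemory(create=True, size=len(payload))
    segment.buf[:len(payload)] = payload
    return segment


def read(name: str, length: int) -> bytes:
    """Attach to an existing segment by name and read the payload.

    The length is passed explicitly because the platform may round the
    segment size up. The bytes are copied out here only for simplicity;
    a real consumer would use the buffer in place.
    """
    segment = shared_memory.SharedMemory(name=name)
    try:
        return bytes(segment.buf[:length])
    finally:
        segment.close()


if __name__ == "__main__":
    payload = b"compiled-plan"
    owner = publish(payload)
    try:
        print(read(owner.name, len(payload)))
    finally:
        owner.close()
        owner.unlink()  # the creator removes the segment when done
```

Any process that knows the segment name can attach to the same bytes, which is what lets many sessions avoid holding private copies.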

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

Vladimir Sitnikov
Craig>That moves work further away from the DB, which has its own costs, and isn't something you're likely to be happy with if you're looking at things like optimising PL/PgSQL with a bytecode compiler. But it's the best we have right now.

What if JVM was started within a background worker?
Then JVM can spawn several threads that serve PL requests on a "thread per backend" basis.

Craig>You may be able to greatly reduce that cost if you can store your cached compiled data in a shared memory segment created by your extension.
Craig>This will get a bit easier with the new dynamic shared memory infrastructure, but it's going to be no fun at all to make that play with the JVM. You'll probably need a lot of JNI.

There's https://github.com/jnr/jnr-ffi that enables calling C functions without resorting to writing JNI wrappers.

Vladimir

Re: One process per session lack of sharing

Craig Ringer-3
On 14 July 2016 at 14:28, Vladimir Sitnikov <[hidden email]> wrote:
> Craig>That moves work further away from the DB, which has its own costs, and isn't something you're likely to be happy with if you're looking at things like optimising PL/PgSQL with a bytecode compiler. But it's the best we have right now.
>
> What if JVM was started within a background worker?
> Then JVM can spawn several threads that serve PL requests on a "thread per backend" basis.

You can't really execute the plpgsql-compiled-to-bytecode outside the user session, so you need a JVM in it anyway. 
 
You probably could have a bgworker or pool of bgworkers doing your plpgsql compilation and caching. But because your plpgsql might reference uncommitted catalog entries in the local backend, you'd need the bgworker to join an exported snapshot from the backend you're compiling the plpgsql for. If it doesn't let you avoid having a jvm in each backend it's not likely to be too useful.

> Craig>You may be able to greatly reduce that cost if you can store your cached compiled data in a shared memory segment created by your extension.
> Craig>This will get a bit easier with the new dynamic shared memory infrastructure, but it's going to be no fun at all to make that play with the JVM. You'll probably need a lot of JNI.
>
> There's https://github.com/jnr/jnr-ffi that enables calling C functions without resorting to writing JNI wrappers.

Yes, and JNA as well.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

AMatveev
In reply to this post by Robert Haas
Hi

> On Tue, Jul 12, 2016 at 9:18 AM, Tom Lane <[hidden email]> wrote:
>> [hidden email] writes:
>>> Is  there  any  plan  to  implement  "session  per  thread" or "shared
>>> sessions between thread"?
>>...
>> so
>> there's not that much motivation to do a ton of work inside the database
>> to solve it there.

> I agree that there's not really a plan to implement this, but I don't
> ...

> So, I actually think it would be a good idea to think about this.

I just want to comment on converting global variables to thread-specific variables.
It's a large job, of course.
But it doesn't seem to be a ton of work.
And it's the largest part of the refactoring needed for "session per thread".
Of course that's not all.
But it seemed to be the main reason given for not doing that work.







Re: One process per session lack of sharing

AMatveev
In reply to this post by Robert Haas
Hi

>  It's mostly working, but there are workloads where it
> fails badly - and competing database products survive a number of
> scenarios where we just fall on our face.

> So, I actually think it would be a good idea to think about this.

Just to think.

http://www.tiobe.com/tiobe_index

PL/SQL is in 18th position.
Where is pgSql?

I've looked for an IDE for pgSql. Compared with Oracle, I can say
there are no tools that can compete with "PL/SQL Developer", for
example (functionality / price).
https://www.allroundautomations.com/plsqldev.html?gclid=CjwKEAjw8Jy8BRCE0pOC9qzRhkMSJABC1pvJepfRpWeyMJ7CTZzlQE_PojlBO0vqGIZvVSW4jiQxShoC4PLw_wcB
Why?
Maybe because choosing another database is more profitable?

I can't speak for others, but for us:
Of course we can implement some of our tasks in PostgreSQL.
But when I think about a full migration, it's just not realistic.
We can contribute something, but we can't work against the PostgreSQL architecture.
Our calculation shows that it would be cheaper to implement "session per
thread" ourselves, for example. But it's cheaper still to buy Oracle (even if we
were writing from scratch).
And there are simply no customers who want to pay for that.
Note that we don't have enough PostgreSQL skill, and something the PostgreSQL core team could do in a month could take us years.
So at our level we just can't attract resources for that task.

On the other side there are people who have the infrastructure, skills and experience, but they
feel comfortable with things as they are:

> there's not that much motivation to do a ton of work inside the database
> to solve it there.
That's clear: they work on their tasks. We all work on our own tasks.
But it's just a wall.
It's sad.

There is a saying in Russia: "The splendor and poverty of open source"

Maybe this is such a case :)




Re: One process per session lack of sharing

Craig Ringer-3
In reply to this post by AMatveev
On 14 July 2016 at 16:41, <[hidden email]> wrote:
> Hi
>
> > On Tue, Jul 12, 2016 at 9:18 AM, Tom Lane <[hidden email]> wrote:
> >> [hidden email] writes:
> >>> Is there any plan to implement "session per thread" or "shared
> >>> sessions between thread"?
> >>...
> >> so
> >> there's not that much motivation to do a ton of work inside the database
> >> to solve it there.
>
> > I agree that there's not really a plan to implement this, but I don't
> > ...
>
> > So, I actually think it would be a good idea to think about this.
>
> I just want to note that converting global variables to thread-specific variables.

I don't think anyone's considering moving from multi-processing to multi-threading in PostgreSQL. I really, really like the protection that the shared-nothing-by-default process model gives us, among other things.

I'm personally not absolutely opposed to threading, but you'll find it hard to convince anyone it's worth the huge work required to ensure that everything in PostgreSQL is done thread-safely, adapt all our logic to handle thread IDs where we use process IDs, etc. It'd be a massive amount of work for no practical gain for most users, and a huge reliability loss in the short to medium term as we ironed out all the bugs.

Where I agreed with you, and where I think Robert sounded like he was agreeing, was that our current design where we have one executor per user session and can't suspend/resume sessions is problematic.
 
> It's large work offcourse.
> But it's not seemed to be a ton of work.

Er.... yeah, it really is. It's not just the mechanical changes. It's verifying that everything's correct on all the supported platforms. Ensuring that all the C library stuff we do is thread-safe, all the SSL stuff, etc. Getting rid of all the function-static variable use. Lots more.
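
To make the global-variable issue concrete, here is a toy sketch of what goes wrong when code written for one session per process runs with several sessions as threads, and what the mechanical conversion to thread-specific storage looks like. It is in Python rather than C, all names are hypothetical, and it shows only the mechanical part, not the verification work described above.

```python
import threading

current_user = None          # process-wide global: fine with one session per process
session = threading.local()  # thread-specific storage: one value per thread


def run_with_global(user, barrier, results):
    global current_user
    current_user = user
    barrier.wait()  # every thread has overwritten the global by this point
    results[user] = current_user


def run_with_thread_local(user, barrier, results):
    session.user = user
    barrier.wait()
    results[user] = session.user


def run_sessions(worker, users):
    """Run one thread per user and collect what each thread observed."""
    barrier = threading.Barrier(len(users))
    results = {}
    threads = [threading.Thread(target=worker, args=(u, barrier, results)) for u in users]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    users = ["alice", "bob", "carol"]
    print("global:      ", run_sessions(run_with_global, users))
    print("thread-local:", run_sessions(run_with_thread_local, users))
```

With the shared global, every thread ends up observing whichever user was written last; with thread-specific storage each thread sees its own value.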

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

AMatveev
Hi


>>>> Is  there  any  plan  to  implement  "session  per  thread" or "shared
>>>> sessions between thread"?


> I'm personally not absolutely opposed to threading, but you'll find
> it hard to convince anyone it's worth the huge work required to
> ensure that everything in PostgreSQL is done thread-safely
It's clear to me; I understand that organizing that work is really very
hard. It's work aimed at a new market segment, over the long term.
For most open source projects this is very difficult. In some cases
it may not be possible at all.

But in most cases the proverb applies: "We make the road by walking on it"

It's very important just to start.

And maybe the right start is to fix the FAQ:
https://wiki.postgresql.org/wiki/FAQ#Why_does_PostgreSQL_use_so_much_memory.3F
>Why does PostgreSQL use so much memory?
>Despite appearances, this is absolutely normal
It's not normal. It's "as is". You should use pgBouncer. See "Re: [HACKERS] One process per session lack of sharing".
And that is why
>there are workloads where it
>fails badly - and competing database products survive a number of
>scenarios where we just fall on our face


> Er.... yeah, it really is. It's not just the mechanical changes.
> It's verifying that everything's correct on all the supported
> platforms. Ensuring that all the C library stuff we do is
> thread-safe, all the SSL stuff, etc. Getting rid of all the
> function-static variable use. Lots more.
In most cases the work can be done piece by piece.
Maybe there are such pieces here. It's not necessary to do everything at once.





Re: One process per session lack of sharing

Pavel Stehule


2016-07-15 11:29 GMT+02:00 <[hidden email]>:
> Hi
>
>
> >>>> Is there any plan to implement "session per thread" or "shared
> >>>> sessions between thread"?
>
>
> > I'm personally not absolutely opposed to threading, but you'll find
> > it hard to convince anyone it's worth the huge work required to
> > ensure that everything in PostgreSQL is done thread-safely
> It's clear for me, I understand that organizing that work is really very
> hard. It's work for new segment of market in long perspective.
> For most open source project this is very difficult. In some case
> it may be not possible at all.
>
> But in the most cases there is proverb: "We make the road by walking on it"
>
> It's very important just to start.

I disagree - there are lots of possible targets with much higher benefits: column storage, more efficient execution (compiled execution), implementation of temporal databases, better support for dynamic structures, better support for XML and JSON, integration of connection pooling, ...

There are only a few use cases, mostly related to Oracle emulation, where multi-threading is necessary, and some of them can be solved better by other means: PL/pgSQL to C compilation and similar techniques.

Organizing the work is hard, but doing the work is much harder, especially doing it without impact on the current code base and current users. MySQL is a thread-based database: is it better than Postgres, or have more users migrated to it from Oracle? No.

Regards

Pavel

 






Re: One process per session lack of sharing

AMatveev
Hi


> I disagree - there is lot of possible targets with much higher
> benefits - columns storage, effective execution - compiled
> execution, implementation of temporal databases, better support for
> dynamic structures, better support for XML, JSON, integration of connection pooling, ...
Of course, the tasks are different, so the optimal configuration is different too.
So the best balance between processes and threads can vary.
But right now it sits at one extreme.


> There is only few use cases - mostly related to Oracle emulation
They are few cases for some and most cases for others.
> when multi threading is necessary - and few can be solved better -
> PLpgSQL to C compilation and similar techniques.
They are few cases for some and most cases for others.
In our case we would just buy Oracle, and it would be cheaper.
Of course, if our customers for some reason agreed to pay for that
technique, we would have nothing against it.

> The organization of work is hard, but pretty harder is doing this
> work - and doing it without impact on current code base, current
> users. MySQL is thread based database - is better than Postgres, or
> there is more users migrated from Orace? Not.

We want to solve our tasks with PostgreSQL as easily as with Oracle.
So you can say "you should buy Oracle", and you will be right.

I'm just interested in whether this is the position of the majority.




Re: One process per session lack of sharing

Pavel Stehule


2016-07-15 12:20 GMT+02:00 <[hidden email]>:
> Hi
>
>
> > I disagree - there is lot of possible targets with much higher
> > benefits - columns storage, effective execution - compiled
> > execution, implementation of temporal databases, better support for
> > dynamic structures, better support for XML, JSON, integration of connection pooling, ...
> Off course the task is different so optimal configuration is different too.
> So the best balance between process per thread can change.
> But now he is in one extreme point.
>
>
> > There is only few use cases - mostly related to Oracle emulation
> It's few cases for one and it's most cases for others.
> > when multi threading is necessary - and few can be solved better -
> > PLpgSQL to C compilation and similar techniques.
> It's few cases for one and it's most cases for others.
> In our cases we just buy oracle and it's would be cheeper.
> Off course if our customers for some reason would agree to pay for that
> technique. We have nothing against.
>
> > The organization of work is hard, but pretty harder is doing this
> > work - and doing it without impact on current code base, current
> > users. MySQL is thread based database - is better than Postgres, or
> > there is more users migrated from Orace? Not.
>
> We want to decide our task by PostgreSql as easy as by Oracle.
> So you can say You should buy oracle and You will be right.

It would be nice if we could help all Oracle users, but that is not possible in this world :( There are lots of barriers: threading is only one; a second is the different design of PL/SQL, which is based on out-of-process execution; next come libraries, Java integration, and lots of others. I believe a lot of users can be migrated simply: NTT has statistics showing that 60% are migrated just by using Orafce. But there will still be 10% where migration is not possible without significant refactoring. I don't believe it is cheaper to modify Postgres to support threads than to modify some Oracle applications.

Threading for Postgres is not a small project - it could require hundreds of man-days.

 

> I'm just interested if this is the position of the majority.


sure - it is my personal opinion.

Regards

Pavel

Re: One process per session lack of sharing

Craig Ringer-3
In reply to this post by Pavel Stehule
On 15 July 2016 at 18:05, Pavel Stehule <[hidden email]> wrote:
 
> There is only few use cases - mostly related to Oracle emulation when multi threading is necessary - and few can be solved better - PLpgSQL to C compilation and similar techniques.


Right.

If amatveev (username, unsure of full name) wants to improve PL/PgSQL performance and the ability of a JVM to share resources between backends, then it would be more productive to focus on that than on threading.

As for "fixing" the FAQ... for the great majority of people the FAQ entry on memory use is accurate. Sure, if you load a JVM into each backend and load a bunch of cached data in it, you'll get bad memory use. So don't do that. You're not measuring PostgreSQL, you're measuring PostgreSQL-plus-my-JVM-extension. Why does it use so much memory? 'cos it loads a whole bunch of stuff into each backend.

Now, there are other cases where individual PostgreSQL backends use lots of memory. But that FAQ entry refers to the common misconception that each PostgreSQL process's reported memory use is the actual system memory it uses. That isn't the case because most systems account badly for shared memory, and it confuses a lot of people. The FAQ entry doesn't need fixing.

Maybe the FAQ entry needs rewording to qualify it so it says that "in most cases" it's just shared memory mis-accounting. But that's about it.
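
The mis-accounting described above comes down to simple arithmetic: if every process is charged for all the shared memory it has touched, adding up the per-process figures counts the shared segment once per backend. The numbers in the sketch below are hypothetical and chosen only to show the size of the effect.

```python
def naive_total_mb(shared_mb, private_mb, backends):
    """Sum of per-process figures when each process is charged for the shared memory too."""
    return backends * (shared_mb + private_mb)


def actual_total_mb(shared_mb, private_mb, backends):
    """Shared memory exists only once; only private memory scales with the backend count."""
    return shared_mb + backends * private_mb


if __name__ == "__main__":
    # Hypothetical figures: a 4 GiB shared segment, 10 MiB private per backend.
    shared_mb, private_mb, backends = 4096, 10, 100
    print("sum of per-process figures:", naive_total_mb(shared_mb, private_mb, backends), "MiB")
    print("memory actually in use:    ", actual_total_mb(shared_mb, private_mb, backends), "MiB")
```

The gap between the two figures grows with both the shared segment size and the number of backends.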

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: One process per session lack of sharing

AMatveev
In reply to this post by Pavel Stehule
Hi


> It would be nice if we could help all Oracle users, but that is not
> possible in this world :( There are a lot of barriers: threading is
> only one; the second is the different design of PL/SQL, which is based
> on out processes; next come libraries, Java integration, and a lot
> of others. I believe a lot of users can be migrated simply; NTT has
> statistics showing 60% are migrated just by using Orafce. But there
> will still be 10% where migration is not possible without significant
> refactoring.

Most of our customers currently use Oracle Enterprise Edition.
You probably know better than I do how important that is.

But I agree with you that in other cases we can use PostgreSQL.
We can use PostgreSQL, accepting some disadvantages of pgBouncer, anywhere
scalability is not the main risk. (Such customers usually don't
buy Enterprise.)

> I don't believe it is cheaper to modify Postgres to
> support threads than to modify some Oracle applications.

The key is scaling.
Some parallel processing simply cannot be separated from the data without reducing performance.
It is a very difficult question whether comparable performance could be
achieved at all in an application server for such cases.
If we "inject" the application server into PostgreSQL to get that scalability and functionality, we need multithreading.

If the customization for each project is small, it can be tuned.
But beyond some point the tuning is no longer profitable.
(The database runs 24x7 and we need the ability to fix bugs on the fly.)
So if for some reason we started using PostgreSQL,
there would always be the question of which to choose: functionality or scalability.
And usually our customers need both.

> I don't believe it is cheaper
For us it may not be a question of cheaper. It is simply impossible.




Re: One process per session lack of sharing

AMatveev
In reply to this post by Craig Ringer-3
Hi


> If amatveev (username, unsure of full name) wants to improve
> PL/PgSQL performance and the ability of a JVM to share resources
> between backends, then it would be more productive to focus on that than on threading.

Note that I started this thread with:
https://www.postgresql.org/message-id/flat/409604420.20160711111532%40bitec.ru#409604420.20160711111532@...

Oracle: about 5M
MSSqlServer: about 4M
PostgreSQL: about 160M


That is for 11K lines of pgSql code.

And our code base is more than 4000K lines of code (for pgSql).





Re: One process per session lack of sharing

Pavel Stehule
In reply to this post by AMatveev


2016-07-15 13:25 GMT+02:00 <[hidden email]>:
> Hi
>
>> It would be nice if we could help all Oracle users, but that is not
>> possible in this world :( There are a lot of barriers: threading is
>> only one; the second is the different design of PL/SQL, which is based
>> on out processes; next come libraries, Java integration, and a lot
>> of others. I believe a lot of users can be migrated simply; NTT has
>> statistics showing 60% are migrated just by using Orafce. But there
>> will still be 10% where migration is not possible without significant
>> refactoring.
>
> Most of our customers currently use Oracle Enterprise Edition.
> You probably know better than I do how important that is.
>
> But I agree with you that in other cases we can use PostgreSQL.
> We can use PostgreSQL, accepting some disadvantages of pgBouncer, anywhere
> scalability is not the main risk. (Such customers usually don't
> buy Enterprise.)
>
>> I don't believe it is cheaper to modify Postgres to
>> support threads than to modify some Oracle applications.
>
> The key is scaling.
> Some parallel processing simply cannot be separated from the data without reducing performance.
> It is a very difficult question whether comparable performance could be
> achieved at all in an application server for such cases.
> If we "inject" the application server into PostgreSQL to get that scalability and functionality, we need multithreading.

But parallel processing doesn't require threading support; see the PostgreSQL 9.6 features.

I am not sure, but I think PL/SQL is based on processes too, not on threads. So maybe this discussion is a little bit off, because we are using different terms.
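
As a rough illustration of the general idea (PostgreSQL 9.6 parallel query runs its workers as separate processes that communicate through shared memory), here is a minimal Python sketch of attaching to a named shared-memory segment. It is not PostgreSQL code. The function name is hypothetical, and for brevity both handles are opened in the same interpreter, whereas in real use the second handle would be opened by a different process that only knows the segment name.

```python
from multiprocessing import shared_memory


def share_bytes(payload: bytes) -> bytes:
    """Write payload through one handle and read it back through another.

    payload must be non-empty, because a segment cannot have size 0.
    """
    # "Owner" side: create a named segment and copy the payload into it.
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        owner.buf[:len(payload)] = payload
        # "Peer" side: attach by name only, as another process would.
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            return bytes(peer.buf[:len(payload)])
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()  # remove the segment once nobody needs it


print(share_bytes(b"hello"))  # b'hello'
```

Flat bytes are the easy case. Structures that contain pointers need more care, because a segment may be mapped at different addresses in different processes.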

Regards

Pavel

 


Re: One process per session lack of sharing

AMatveev
Hi


> But parallel processing doesn't require threading support; see the PostgreSQL 9.6 features.

Sharing dynamic executable code between threads is much easier (if sharing such code between processes is possible at all).
There are also many other interaction techniques available between threads
that are absent between processes.


