JIT compiling expressions/deform + inlining prototype v2.0


Andres Freund
Hi,

I previously posted an early prototype [1] of JITing expression evaluation
and tuple deforming.  I've since then worked a lot on this.

Here's an initial, not really pretty but functional, submission. This
supports all types of expressions, and tuples, and allows, albeit with
some drawbacks, inlining of builtin functions.  Between the version at
[1] and this one I did some work in C++, because that allowed me to
experiment more with LLVM, but I've now translated everything back to C.
I had to re-implement some features due to limitations of the LLVM C API.

As a teaser:
tpch_5[9586][1]=# set jit_expressions=0;set jit_tuple_deforming=0;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
┌──────────────┬──────────────┬───────────┬──────────────────┬──────────────────┬──────────────────┬──────────────────┬──────────────────┬────────────────────┬─────────────┐
│ l_returnflag │ l_linestatus │  sum_qty  │  sum_base_price  │  sum_disc_price  │    sum_charge    │     avg_qty      │    avg_price     │      avg_disc      │ count_order │
├──────────────┼──────────────┼───────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────┼─────────────┤
│ A            │ F            │ 188818373 │ 283107483036.109 │ 268952035589.054 │  279714361804.23 │ 25.5025937044707 │ 38237.6725307617 │ 0.0499976863510723 │     7403889 │
│ N            │ F            │   4913382 │ 7364213967.94998 │  6995782725.6633 │ 7275821143.98952 │ 25.5321530459003 │ 38267.7833908406 │ 0.0500308669240696 │      192439 │
│ N            │ O            │ 375088356 │ 562442339707.852 │ 534321895537.884 │ 555701690243.972 │ 25.4978961033505 │ 38233.9150565265 │ 0.0499956453049625 │    14710561 │
│ R            │ F            │ 188960009 │ 283310887148.206 │ 269147687267.211 │ 279912972474.866 │ 25.5132328961366 │ 38252.4148049933 │ 0.0499958481590264 │     7406353 │
└──────────────┴──────────────┴───────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴────────────────────┴─────────────┘
(4 rows)

Time: 4367.486 ms (00:04.367)
tpch_5[9586][1]=# set jit_expressions=1;set jit_tuple_deforming=1;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
<repeat>
(4 rows)

Time: 3158.575 ms (00:03.159)

tpch_5[9586][1]=# set jit_expressions=0;set jit_tuple_deforming=0;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
<repeat>
(4 rows)
Time: 4383.562 ms (00:04.384)

The potential wins of the JITing itself are considerably larger than the
already significant gains demonstrated above - this version here doesn't
exactly generate the nicest native code around.  After these patches the
bottlenecks for TPC-H's Q01 are largely inside the float* functions and
the non-expressionified execGrouping.c code.  The latter needs to be
expressionified to benefit from JIT - that shouldn't be very hard.

The code generation can be improved by moving more of the variable data
into LLVM-allocated stack data, which also has other benefits.

The patch series currently consists of the following:

0001-Rely-on-executor-utils-to-build-targetlist-for-DML-R.patch
- boring prep work

0002-WIP-Allow-tupleslots-to-have-a-fixed-tupledesc-use-i.patch
- for JITed deforming we need to know whether a slot's tupledesc will
  change

0003-WIP-Add-configure-infrastructure-to-enable-LLVM.patch
- boring

0004-WIP-Beginning-of-a-LLVM-JIT-infrastructure.patch
- infrastructure for llvm, including memory lifetime management, and
  bulk emission of functions.

0005-Perform-slot-validity-checks-in-a-separate-pass-over.patch
- boring, prep work for expression jiting

0006-WIP-deduplicate-int-float-overflow-handling-code.patch
- boring

0007-Pass-through-PlanState-parent-to-expression-instanti.patch
- boring

0008-WIP-JIT-compile-expression.patch
- that's the biggest patch, actually adding JITing
- code needs to be better documented, tested, and deduplicated

0009-Simplify-aggregate-code-a-bit.patch
0010-More-efficient-AggState-pertrans-iteration.patch
0011-Avoid-dereferencing-tts_values-nulls-repeatedly.patch
0012-Centralize-slot-deforming-logic-a-bit.patch
- boring, mostly to make comparison between JITed and non-jitted a bit
  fairer and to remove unnecessary other bottlenecks.

0013-WIP-Make-scan-desc-available-for-all-PlanStates.patch
- this isn't clean enough.

0014-WIP-JITed-tuple-deforming.patch

- do JITing of deforming, but only when called from within an expression,
  since there we know which columns we want to be deformed etc.

- Not clear what'd be a good way to also JIT other deforming without
  additional infrastructure - doing a separate function emission for
  every slot_deform_tuple() is unattractive performancewise and
  memory-lifetime wise, I did have that at first.

0015-WIP-Expression-based-agg-transition.patch
- allows to JIT aggregate transition invocation, but also speeds up
  aggregates without JIT.

0016-Hacky-Preliminary-inlining-implementation.patch
- allows to inline functions, by using bitcode. That bitcode can be
  loaded from a list of directories - as long as compatibly configured
  the bitcode doesn't have to be generated by the same compiler as the
  postgres binary. i.e. gcc postgres + clang bitcode works.

I've whacked this around quite heavily today, this likely has some new
bugs, sorry for that :(


I plan to spend some considerable time over the next weeks to clean this
up and address some of the areas where the performance isn't yet as good
as desirable.


Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20161206034955.bh33paeralxbtluv%40alap3.anarazel.de


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

0001-Rely-on-executor-utils-to-build-targetlist-for-DML-R.patch (2K)
0002-WIP-Allow-tupleslots-to-have-a-fixed-tupledesc-use-i.patch (70K)
0003-WIP-Add-configure-infrastructure-to-enable-LLVM.patch (6K)
0004-WIP-Beginning-of-a-LLVM-JIT-infrastructure.patch (26K)
0005-Perform-slot-validity-checks-in-a-separate-pass-over.patch (13K)
0006-WIP-deduplicate-int-float-overflow-handling-code.patch (10K)
0007-Pass-through-PlanState-parent-to-expression-instanti.patch (3K)
0008-WIP-JIT-compile-expression.patch (79K)
0009-Simplify-aggregate-code-a-bit.patch (9K)
0010-More-efficient-AggState-pertrans-iteration.patch (4K)
0011-Avoid-dereferencing-tts_values-nulls-repeatedly.patch (2K)
0012-Centralize-slot-deforming-logic-a-bit.patch (8K)
0013-WIP-Make-scan-desc-available-for-all-PlanStates.patch (1K)
0014-WIP-JITed-tuple-deforming.patch (26K)
0015-WIP-Expression-based-agg-transition.patch (64K)
0016-Hacky-Preliminary-inlining-implementation.patch (16K)

Re: JIT & function naming

Andres Freund
Hi,

On 2017-08-31 23:41:31 -0700, Andres Freund wrote:
> I previously had an early prototype of JITing [1] expression evaluation
> and tuple deforming.  I've since then worked a lot on this.
>
> Here's an initial, not really pretty but functional, submission.

One of the things I'm not really happy about yet is the naming of the
generated functions. Those primarily matter when doing profiling, where
the function name will show up when the profiler supports JIT stuff
(e.g. with a patch I proposed to LLVM that emits perf compatible output,
there's also existing LLVM support for a profiler by intel and
oprofile).

Currently there's essentially a per EState counter and the generated
functions get named deform$n and evalexpr$n. That allows for profiling
of a single query, because different compiled expressions are
disambiguated. It even allows to run the same query over and over, still
giving meaningful results.  But it breaks down when running multiple
queries while profiling - evalexpr0 can mean something entirely
different for different queries.
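
The naming scheme described above can be sketched roughly as follows. This is only an illustration, not the patch's code: `DemoEState`, `next_func_id` and `demo_func_name` are hypothetical names, and sharing one counter between the deform and evalexpr functions is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-query executor state (EState). */
typedef struct DemoEState
{
    int next_func_id;   /* counter used to number generated functions */
} DemoEState;

/*
 * Write "<prefix><n>" (e.g. "evalexpr0") into buf and advance the
 * per-query counter.  The counter starts at 0 for every new query, so
 * two unrelated queries produce the same names.
 */
void
demo_func_name(DemoEState *estate, const char *prefix,
               char *buf, size_t buflen)
{
    int written;

    assert(estate != NULL && prefix != NULL && buf != NULL);
    written = snprintf(buf, buflen, "%s%d", prefix, estate->next_func_id++);
    assert(written >= 0 && (size_t) written < buflen);
}
```

With two separate `DemoEState` instances, the first expression of each query is named `evalexpr0`, which is the ambiguity a profile spanning several queries runs into.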

The best idea I have so far would be to name functions like
evalexpr_$fingerprint_$n, but for that we'd need fingerprinting support
outside of pg_stat_statements, which seems painful-ish.

Perhaps somebody has a better idea?

Regards,

Andres



Re: JIT & function naming

konstantin knizhnik
On 09/03/2017 02:59 AM, Andres Freund wrote:

> Hi,
>
> On 2017-08-31 23:41:31 -0700, Andres Freund wrote:
>> I previously had an early prototype of JITing [1] expression evaluation
>> and tuple deforming.  I've since then worked a lot on this.
>>
>> Here's an initial, not really pretty but functional, submission.
> One of the things I'm not really happy about yet is the naming of the
> generated functions. Those primarily matter when doing profiling, where
> the function name will show up when the profiler supports JIT stuff
> (e.g. with a patch I proposed to LLVM that emits perf compatible output,
> there's also existing LLVM support for a profiler by intel and
> oprofile).
>
> Currently there's essentially a per EState counter and the generated
> functions get named deform$n and evalexpr$n. That allows for profiling
> of a single query, because different compiled expressions are
> disambiguated. It even allows to run the same query over and over, still
> giving meaningful results.  But it breaks down when running multiple
> queries while profiling - evalexpr0 can mean something entirely
> different for different queries.
>
> The best idea I have so far would be to name queries like
> evalexpr_$fingerprint_$n, but for that we'd need fingerprinting support
> outside of pg_stat_statement, which seems painful-ish.
>
> Perhaps somebody has a better idea?

As far as I understand, we do not need a precise fingerprint.
So maybe just calculate some lightweight fingerprint?
For example, take the query text (es_sourceText from EState), replace all non-alphanumeric characters (including spaces) with '_', and take the first N (16?) characters of the result.
It seems to me that in most cases that will be enough to identify the query...
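
A minimal sketch of such a lightweight tag, under the assumption that the query text is available as a NUL-terminated string; `demo_query_tag` is a hypothetical helper, not an existing PostgreSQL function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most maxlen characters of the query text into out, replacing
 * every non-alphanumeric character with '_'.  out must have room for
 * maxlen + 1 bytes.
 */
void
demo_query_tag(const char *query, size_t maxlen, char *out)
{
    size_t i;

    assert(query != NULL && out != NULL);
    for (i = 0; i < maxlen && query[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char) query[i];

        out[i] = isalnum(c) ? (char) c : '_';
    }
    out[i] = '\0';
}
```

Note that queries sharing the same first N characters get the same tag, so this helps to identify a query by eye rather than to guarantee unique names.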


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: JIT & function naming

Tom Lane-2
In reply to this post by Andres Freund
Andres Freund <[hidden email]> writes:
> Currently there's essentially a per EState counter and the generated
> functions get named deform$n and evalexpr$n. That allows for profiling
> of a single query, because different compiled expressions are
> disambiguated. It even allows to run the same query over and over, still
> giving meaningful results.  But it breaks down when running multiple
> queries while profiling - evalexpr0 can mean something entirely
> different for different queries.

> The best idea I have so far would be to name queries like
> evalexpr_$fingerprint_$n, but for that we'd need fingerprinting support
> outside of pg_stat_statement, which seems painful-ish.

Yeah.  Why not just use a static counter to give successive unique IDs
to each query that gets JIT-compiled?  Then the function names would
be like deform_$querynumber_$subexprnumber.
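
As a rough sketch of this suggestion (the names `demo_query_counter`, `demo_next_query_number` and `demo_qualified_name` are made up for illustration and are not part of the patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Backend-wide counter; each JIT-compiled query takes the next value. */
static unsigned long demo_query_counter = 0;

/* Hand out a fresh number for a query that is about to be JIT-compiled. */
unsigned long
demo_next_query_number(void)
{
    return demo_query_counter++;
}

/* Build "<prefix>_<querynumber>_<subexprnumber>", e.g. "deform_0_3". */
void
demo_qualified_name(const char *prefix, unsigned long querynumber,
                    int subexprnumber, char *buf, size_t buflen)
{
    int written;

    assert(prefix != NULL && buf != NULL);
    written = snprintf(buf, buflen, "%s_%lu_%d",
                       prefix, querynumber, subexprnumber);
    assert(written >= 0 && (size_t) written < buflen);
}
```

In this sketch every compilation takes a new number, so names are unique within a session, but a rerun of the same query is named differently from its first run.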

                        regards, tom lane



Re: JIT & function naming

Andres Freund
On 2017-09-03 10:11:37 -0400, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > Currently there's essentially a per EState counter and the generated
> > functions get named deform$n and evalexpr$n. That allows for profiling
> > of a single query, because different compiled expressions are
> > disambiguated. It even allows to run the same query over and over, still
> > giving meaningful results.  But it breaks down when running multiple
> > queries while profiling - evalexpr0 can mean something entirely
> > different for different queries.
>
> > The best idea I have so far would be to name queries like
> > evalexpr_$fingerprint_$n, but for that we'd need fingerprinting support
> > outside of pg_stat_statement, which seems painful-ish.
>
> Yeah.  Why not just use a static counter to give successive unique IDs
> to each query that gets JIT-compiled?  Then the function names would
> be like deform_$querynumber_$subexprnumber.

That works, but unfortunately it doesn't keep the names the same over
reruns. So if you rerun the query inside the same session - a quite
reasonable thing to get more accurate profiles - the names in the
profile will change. That makes it quite hard to compare profiles,
especially when a single execution of the query is too quick to see
something meaningful.

Greetings,

Andres Freund



Re: JIT compiling expressions/deform + inlining prototype v2.0

konstantin knizhnik
In reply to this post by Andres Freund


On 01.09.2017 09:41, Andres Freund wrote:

> Hi,
>
> I previously had an early prototype of JITing [1] expression evaluation
> and tuple deforming.  I've since then worked a lot on this.
>
> Here's an initial, not really pretty but functional, submission. This
> supports all types of expressions, and tuples, and allows, albeit with
> some drawbacks, inlining of builtin functions.  Between the version at
> [1] and this I'd done some work in c++, because that allowed to
> experiment more with llvm, but I've now translated everything back.
> Some features I'd to re-implement due to limitations of C API.
>
>
> I've whacked this around quite heavily today, this likely has some new
> bugs, sorry for that :(

Can you please clarify the following fragment calculating attributes
alignment:


         /* compute what following columns are aligned to */
+        if (att->attlen < 0)
+        {
+            /* can't guarantee any alignment after varlen field */
+            attcuralign = -1;
+        }
+        else if (att->attnotnull && attcuralign >= 0)
+        {
+            Assert(att->attlen > 0);
+            attcuralign += att->attlen;
+        }
+        else if (att->attnotnull)
+        {
+            /*
+             * After a NOT NULL fixed-width column, alignment is
+             * guaranteed to be the minimum of the forced alignment and
+             * length.  XXX
+             */
+            attcuralign = alignto + att->attlen;
+            Assert(attcuralign > 0);
+        }
+        else
+        {
+            //elog(LOG, "attnotnullreset: %d", attnum);
+            attcuralign = -1;
+        }


I wonder why in the branch (att->attnotnull && attcuralign >= 0)
we are not adding "alignto". Also, the comment in the following branch,
else if (att->attnotnull), seems unrelated to that branch, because there
attcuralign is expected to be less than zero, which means that the
previous attribute is a varlen field.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: JIT compiling expressions/deform + inlining prototype v2.0

Andres Freund
Hi,

On 2017-09-04 20:01:03 +0300, Konstantin Knizhnik wrote:

> > I previously had an early prototype of JITing [1] expression evaluation
> > and tuple deforming.  I've since then worked a lot on this.
> >
> > Here's an initial, not really pretty but functional, submission. This
> > supports all types of expressions, and tuples, and allows, albeit with
> > some drawbacks, inlining of builtin functions.  Between the version at
> > [1] and this I'd done some work in c++, because that allowed to
> > experiment more with llvm, but I've now translated everything back.
> > Some features I'd to re-implement due to limitations of C API.
> >
> >
> > I've whacked this around quite heavily today, this likely has some new
> > bugs, sorry for that :(
>
> Can you please clarify the following fragment calculating attributes
> alignment:

Hi. That piece of code isn't particularly clear (and has a bug in the
submitted version), I'm revising it.

>
>         /* compute what following columns are aligned to */
> +        if (att->attlen < 0)
> +        {
> +            /* can't guarantee any alignment after varlen field */
> +            attcuralign = -1;
> +        }
> +        else if (att->attnotnull && attcuralign >= 0)
> +        {
> +            Assert(att->attlen > 0);
> +            attcuralign += att->attlen;
> +        }
> +        else if (att->attnotnull)
> +        {
> +            /*
> +             * After a NOT NULL fixed-width column, alignment is
> +             * guaranteed to be the minimum of the forced alignment and
> +             * length.  XXX
> +             */
> +            attcuralign = alignto + att->attlen;
> +            Assert(attcuralign > 0);
> +        }
> +        else
> +        {
> +            //elog(LOG, "attnotnullreset: %d", attnum);
> +            attcuralign = -1;
> +        }
>
>
> I wonder why in this branch (att->attnotnull && attcuralign >= 0)
> we are not adding "alignto" and comment in the following branch else if
> (att->attnotnull)
> seems to be not related to this branch, because in this case attcuralign is
> expected to be less then zero wjhich means that previous attribute is varlen
> field.

Yea, I've changed that already, although it's currently added earlier,
because the alignment is needed before, to access the column correctly.
I've also made a number of efficiency improvements, primarily to access
columns with an absolute offset if all preceding ones are fixed width
not null columns - that is quite noticeable performancewise.
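
To illustrate the absolute-offset idea, here is a simplified sketch, not the patch's code. `DemoAttr` is a hypothetical, cut-down attribute descriptor, offsets are relative to the start of the tuple's data area, and the sketch stops at the first variable-length column on the assumption that its position is not treated as a constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified column descriptor. */
typedef struct DemoAttr
{
    int  attlen;        /* > 0: fixed width in bytes, < 0: variable length */
    int  attalign;      /* required alignment in bytes (power of two) */
    bool attnotnull;    /* column declared NOT NULL */
} DemoAttr;

/* Round off up to the next multiple of align (a power of two). */
size_t
demo_align_up(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

/*
 * Fill offsets[] for the leading columns whose position is a constant,
 * i.e. every earlier column is fixed width and NOT NULL.  Returns the
 * number of columns with a known offset.
 */
int
demo_known_offsets(const DemoAttr *atts, int natts, size_t *offsets)
{
    size_t off = 0;
    int    known = 0;

    assert(atts != NULL && offsets != NULL);
    for (int i = 0; i < natts; i++)
    {
        if (atts[i].attlen <= 0)
            break;              /* variable length: stop here */

        off = demo_align_up(off, (size_t) atts[i].attalign);
        offsets[known++] = off;

        if (!atts[i].attnotnull)
            break;              /* later offsets depend on this NULL bit */
        off += (size_t) atts[i].attlen;
    }
    return known;
}
```

For columns (int4 NOT NULL, int8 NOT NULL, int2 nullable, int4 NOT NULL) this yields constant offsets 0, 8 and 16 for the first three columns; the fourth depends on whether the third is NULL.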


Greetings,

Andres Freund



Re: JIT compiling expressions/deform + inlining prototype v2.0

konstantin knizhnik


On 04.09.2017 23:52, Andres Freund wrote:
>
> Yea, I've changed that already, although it's currently added earlier,
> because the alignment is needed before, to access the column correctly.
> I've also made number of efficiency improvements, primarily to access
> columns with an absolute offset if all preceding ones are fixed width
> not null columns - that is quite noticeable performancewise.

Unfortunately, in most real tables the columns are nullable.
I wonder if we can perform some optimization in this case (assuming that
in typical cases a column contains either mostly non-null values or
mostly null values).

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: JIT compiling expressions/deform + inlining prototype v2.0

Andres Freund
On 2017-09-05 13:58:56 +0300, Konstantin Knizhnik wrote:

>
>
> On 04.09.2017 23:52, Andres Freund wrote:
> >
> > Yea, I've changed that already, although it's currently added earlier,
> > because the alignment is needed before, to access the column correctly.
> > I've also made number of efficiency improvements, primarily to access
> > columns with an absolute offset if all preceding ones are fixed width
> > not null columns - that is quite noticeable performancewise.
>
> Unfortunately, in most of real table columns are nullable.

I'm not sure I agree with that assertion, but:


> I wonder if we can perform some optimization in this case (assuming that in
> typical cases column either contains mostly non-null values, either mostly
> null values).

Even if all columns are NULLABLE, the JITed code is still a good chunk
faster (a significant part of that is the slot->tts_{nulls,values}
accesses). Alignment is still cheaper with constants, and often enough
the alignment can be avoided (consider e.g. a table full of nullable
ints - everything is guaranteed to be aligned; columns after an
individual NOT NULL column are also guaranteed to be aligned). What
largely changes is that the 'offset' from the start of the tuple has to
be tracked.

Greetings,

Andres Freund



Re: JIT compiling expressions/deform + inlining prototype v2.0

Greg Stark
In reply to this post by konstantin knizhnik
On 5 September 2017 at 11:58, Konstantin Knizhnik
<[hidden email]> wrote:
>
> I wonder if we can perform some optimization in this case (assuming that in
> typical cases column either contains mostly non-null values, either mostly
> null values).

If you really wanted to go crazy here you could do lookup tables of
bits of null bitmaps. I.e., you look at the first byte of the null
bitmap, index into an array, and it points to 8 offsets for the 8
fields covered by that much of the bitmap. The lookup table might be
kind of large since offsets are 16 bits, so you're talking 256 * 16
bytes or 4kB for every 8 columns up until the first variable size
column (or I suppose you could even continue in the case where the
variable size column is null).
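
A sketch of how such a table could be built for one group of 8 fixed-width columns. The names are hypothetical, and the bit convention (a set bit means the value is present) is an assumption of this sketch. For this layout the table occupies 256 x 8 x 2 = 4096 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_COLS        8
#define DEMO_NULL_OFFSET UINT16_MAX    /* marker: column is NULL */

/* Hypothetical descriptor of a fixed-width column. */
typedef struct DemoFixedAttr
{
    uint16_t len;      /* width in bytes */
    uint16_t align;    /* alignment in bytes (power of two) */
} DemoFixedAttr;

/*
 * For every possible null-bitmap byte, precompute the data offset of
 * each of the 8 columns it covers: table[bitmap_byte][column].
 */
void
demo_build_offset_table(const DemoFixedAttr atts[DEMO_COLS],
                        uint16_t table[256][DEMO_COLS])
{
    assert(atts != NULL && table != NULL);
    for (int bits = 0; bits < 256; bits++)
    {
        uint16_t off = 0;

        for (int col = 0; col < DEMO_COLS; col++)
        {
            if (!(bits & (1 << col)))
            {
                /* NULL columns occupy no space in the data area */
                table[bits][col] = DEMO_NULL_OFFSET;
                continue;
            }
            /* align, record the offset, then skip over the value */
            off = (uint16_t) ((off + atts[col].align - 1)
                              & ~(atts[col].align - 1));
            table[bits][col] = off;
            off = (uint16_t) (off + atts[col].len);
        }
    }
}
```

Deforming would then read one bitmap byte and index the table once per group of columns, trading the per-column bit test and offset arithmetic for a memory load.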

--
greg



Re: JIT compiling expressions/deform + inlining prototype v2.0

Andres Freund
On 2017-09-05 19:43:33 +0100, Greg Stark wrote:

> On 5 September 2017 at 11:58, Konstantin Knizhnik
> <[hidden email]> wrote:
> >
> > I wonder if we can perform some optimization in this case (assuming that in
> > typical cases column either contains mostly non-null values, either mostly
> > null values).
>
> If you really wanted to go crazy here you could do lookup tables of
> bits of null bitmaps. I.e., you look at the first byte of the null
> bitmap, index into an array, and it points to 8 offsets for the 8
> fields covered by that much of the bitmap. The lookup table might be
> kind of large since offsets are 16 bits, so you're talking 256 * 16
> bytes or 4kB for every 8 columns up until the first variable size
> column (or I suppose you could even continue in the case where the
> variable size column is null).

I'm missing something here. What's this saving?  The code for lookups
with NULLs after jitting effectively is
a) one load for every 8 columns (could be optimized to one load every
   sizeof(void*) cols)
b) one bitmask for every column + one branch for null
c) load for the datum, indexed by register
d) saving the column value, that's independent of NULLness
e) one addi adding the length to the offset
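
Written as plain C instead of generated code, steps a) to e) correspond roughly to the loop below. This is a simplified sketch for int4-only tuples with hypothetical names; it assumes a set bitmap bit means "value present" and ignores alignment padding, which int4-only data does not need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Deform natts int4 columns from data into values[]/isnull[], using a
 * null bitmap with one bit per column.
 */
void
demo_deform_int4(const uint8_t *bitmap, const uint8_t *data, int natts,
                 int32_t *values, bool *isnull)
{
    size_t  off = 0;
    uint8_t bits = 0;

    assert(bitmap != NULL && data != NULL);
    assert(values != NULL && isnull != NULL);
    for (int col = 0; col < natts; col++)
    {
        if ((col & 7) == 0)
            bits = bitmap[col >> 3];        /* a) one load per 8 columns */

        if (!(bits & (1 << (col & 7))))     /* b) bitmask + branch on NULL */
        {
            values[col] = 0;
            isnull[col] = true;
            continue;
        }

        /* c) load the datum at the running offset, d) store the value */
        memcpy(&values[col], data + off, sizeof(int32_t));
        isnull[col] = false;

        off += sizeof(int32_t);             /* e) add the length to the offset */
    }
}
```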

Greetings,

Andres Freund



Re: JIT compiling expressions/deform + inlining prototype v2.0

konstantin knizhnik
In reply to this post by Andres Freund


On 04.09.2017 23:52, Andres Freund wrote:
>
> Hi. That piece of code isn't particularly clear (and has a bug in the
> submitted version), I'm revising it.

...
> Yea, I've changed that already, although it's currently added earlier,
> because the alignment is needed before, to access the column correctly.
> I've also made number of efficiency improvements, primarily to access
> columns with an absolute offset if all preceding ones are fixed width
> not null columns - that is quite noticeable performancewise.
>
>
Should I wait for a new version of your patch or continue reviewing this code?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: JIT compiling expressions/deform + inlining prototype v2.0

Andres Freund
On 2017-09-19 12:57:33 +0300, Konstantin Knizhnik wrote:

>
>
> On 04.09.2017 23:52, Andres Freund wrote:
> >
> > Hi. That piece of code isn't particularly clear (and has a bug in the
> > submitted version), I'm revising it.
>
> ...
> > Yea, I've changed that already, although it's currently added earlier,
> > because the alignment is needed before, to access the column correctly.
> > I've also made number of efficiency improvements, primarily to access
> > columns with an absolute offset if all preceding ones are fixed width
> > not null columns - that is quite noticeable performancewise.
> >
> >
> Should I wait for new version of your patch or continue review of this code?

I'll update the posted version later this week, sorry for the delay.

Regards,

Andres



Re: JIT compiling - v4.0

Andres Freund
In reply to this post by Andres Freund
Hi,

Here's an updated version of the patchset.  There are some substantial
changes here, but it's still very obviously very far from committable as
a whole. There are some helper commits that are simple and independent
enough to be committable earlier on.

The git tree of this work, which is *frequently* rebased, is at:
https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/jit

The biggest changes are:

- The JIT "infrastructure" is less bad than before, and starting to
  shape up.
- The tuple deforming logic is considerably faster than before due to
  various optimizations. The optimizations are:
  - build deforming exactly to the required natts for the specific caller
  - avoid checking the tuple's natts for attributes that have
    "following" NOT NULL columns.
  - a bunch of minor codegen improvements.
- The tuple deforming codegen also got simpler by relying on LLVM to
  promote a stack variable to a register, instead of working with a
  register manually - the need to keep IR in SSA form makes doing so
  manually rather painful.
- WIP patch to do execGrouping.c TupleHashTableMatch() via JIT. That
  makes the column comparison faster, but more importantly it JITs the
  deforming (one side at least always is a MinimalTuple).
- All tests pass with JITed expression, tuple deforming, agg transition
  value computation and execGrouping logic. There were a number of bugs,
  who would have imagined that.
- some more experimental changes later in the series to address some
  bottlenecks.

Functionally this covers all of what I think a sensible goal for v11
is. There's a lot of details to figure out, and the inlining
*implementation* isn't what I think we should do.  I'll follow up, not
tonight though, with an email outlining the first few design decisions
we're going to have to finalize, which'll be around the memory/lifetime
management of functions, and other infrastructure pieces (currently
patch 0006).

As the patchset is pretty large already, and not going to get any
smaller, I'll make smaller adjustments solely via the git tree, rather
than full reposts.

Greetings,

Andres Freund



0001-Rely-on-executor-utils-to-build-targetlist-for-DM.v4.patch.gz (1K)
0002-WIP-Allow-tupleslots-to-have-a-fixed-tupledesc-us.v4.patch.gz (16K)
0003-Perform-slot-validity-checks-in-a-separate-pass-o.v4.patch.gz (4K)
0004-Pass-through-PlanState-parent-to-expression-insta.v4.patch.gz (1K)
0005-Add-configure-infrastructure-to-enable-LLVM.v4.patch.gz (3K)
0006-Beginning-of-a-LLVM-JIT-infrastructure.v4.patch.gz (9K)
0007-JIT-compile-expressions.v4.patch.gz (17K)
0008-Centralize-slot-deforming-logic-a-bit.v4.patch.gz (3K)
0009-WIP-Make-scan-desc-available-for-all-PlanStates.v4.patch.gz (1K)
0010-JITed-tuple-deforming.v4.patch.gz (9K)
0011-Simplify-aggregate-code-a-bit.v4.patch.gz (3K)
0012-More-efficient-AggState-pertrans-iteration.v4.patch.gz (1K)
0013-Avoid-dereferencing-tts_values-nulls-repeatedly-i.v4.patch.gz (1K)
0014-WIP-Expression-based-agg-transition.v4.patch.gz (20K)
0015-Hacky-Preliminary-inlining-implementation.v4.patch.gz (6K)
0016-WIP-Inline-ExecScan-mostly-to-make-profiles-easie.v4.patch.gz (6K)
0017-WIP-Do-execGrouping.c-via-expression-eval-machine.v4.patch.gz (6K)
0018-WIP-deduplicate-int-float-overflow-handling-code.v4.patch.gz (2K)
0019-Make-timestamp_cmp_internal-an-inline-function.v4.patch.gz (1K)
0020-Make-hot-path-of-pg_detoast_datum-an-inline-funct.v4.patch.gz (1K)
0021-WIP-Inline-additional-function.v4.patch.gz (972 bytes)
0022-WIP-Faster-order.v4.patch.gz (886 bytes)

Re: JIT compiling - v4.0

Ants Aasma
On Wed, Oct 4, 2017 at 9:48 AM, Andres Freund <[hidden email]> wrote:
> Here's an updated version of the patchset.  There's some substantial
> changes here, but it's still very obviously very far from committable as
> a whole. There's some helper commmits that are simple and independent
> enough to be committable earlier on.

Looks pretty impressive already.

I wanted to take it for a spin, but got errors about the following
symbols being missing:

LLVMOrcUnregisterPerf
LLVMOrcRegisterGDB
LLVMOrcRegisterPerf
LLVMOrcGetSymbolAddressIn
LLVMLinkModules2Needed

As far as I can tell these are not in mainline LLVM. Is there a branch
or patchset of LLVM available somewhere that I need to use this?

Regards,
Ants Aasma



Re: JIT compiling - v4.0

Andres Freund
On 2017-10-04 11:56:47 +0300, Ants Aasma wrote:
> On Wed, Oct 4, 2017 at 9:48 AM, Andres Freund <[hidden email]> wrote:
> > Here's an updated version of the patchset.  There's some substantial
> > changes here, but it's still very obviously very far from committable as
> > a whole. There's some helper commmits that are simple and independent
> > enough to be committable earlier on.
>
> Looks pretty impressive already.

Thanks!


> I wanted to take it for a spin, but got errors about the following
> symbols being missing:
>
> LLVMOrcUnregisterPerf
> LLVMOrcRegisterGDB
> LLVMOrcRegisterPerf
> LLVMOrcGetSymbolAddressIn
> LLVMLinkModules2Needed
>
> As far as I can tell these are not in mainline LLVM. Is there a branch
> or patchset of LLVM available somewhere that I need to use this?

Oops, I'd forgotten about the modifications. Sorry. I've attached them
here.  The GDB and Perf stuff should now be an optional dependency,
too.  The required changes are fairly small, so they hopefully shouldn't
be too hard to upstream.

Please check the git tree for a rebased version of the pg patches, with
a bunch of bugfixes (oops, some last minute "cleanups") and performance
fixes.

Here are some numbers for a TPC-H scale 5 run. Obviously the Q01 numbers
are pretty nice in particular. But it's also visible that the shorter
queries can lose, which is largely due to the JIT overhead - that can be
ameliorated to some degree, but JITing obviously isn't always going to
be a win.

It's pretty impressive that in q01, even after all of this, expression
evaluation *still* is 35% of the total time (25% in the aggregate
transition function). That's partially just because the query does
primarily aggregation, but also because the generated code can stand a
good chunk of improvements.

master q01 min: 14146.498     dev min: 11479.05 [diff -23.24]     dev-jit min: 8659.961 [diff -63.36]     dev-jit-deform min: 7279.395 [diff -94.34]     dev-jit-deform-inline min: 6997.956 [diff -102.15]
master q02 min: 1234.229     dev min: 1208.102 [diff -2.16]     dev-jit min: 1292.983 [diff +4.54]     dev-jit-deform min: 1580.505 [diff +21.91]     dev-jit-deform-inline min: 1809.046 [diff +31.77]
master q03 min: 6220.814     dev min: 5424.107 [diff -14.69]     dev-jit min: 5175.125 [diff -20.21]     dev-jit-deform min: 4257.368 [diff -46.12]     dev-jit-deform-inline min: 4218.115 [diff -47.48]
master q04 min: 947.476     dev min: 970.608 [diff +2.38]     dev-jit min: 969.944 [diff +2.32]     dev-jit-deform min: 999.006 [diff +5.16]     dev-jit-deform-inline min: 1033.78 [diff +8.35]
master q05 min: 4729.9     dev min: 4059.665 [diff -16.51]     dev-jit min: 4182.941 [diff -13.08]     dev-jit-deform min: 4147.493 [diff -14.04]     dev-jit-deform-inline min: 4284.473 [diff -10.40]
master q06 min: 1603.708     dev min: 1592.107 [diff -0.73]     dev-jit min: 1556.216 [diff -3.05]     dev-jit-deform min: 1516.078 [diff -5.78]     dev-jit-deform-inline min: 1579.839 [diff -1.51]
master q07 min: 4549.738     dev min: 4331.565 [diff -5.04]     dev-jit min: 4475.654 [diff -1.66]     dev-jit-deform min: 4645.773 [diff +2.07]     dev-jit-deform-inline min: 4885.781 [diff +6.88]
master q08 min: 1394.428     dev min: 1350.363 [diff -3.26]     dev-jit min: 1434.366 [diff +2.78]     dev-jit-deform min: 1716.65 [diff +18.77]     dev-jit-deform-inline min: 1938.152 [diff +28.05]
master q09 min: 5958.198     dev min: 5700.329 [diff -4.52]     dev-jit min: 5491.683 [diff -8.49]     dev-jit-deform min: 5582.431 [diff -6.73]     dev-jit-deform-inline min: 5797.475 [diff -2.77]
master q10 min: 5228.69     dev min: 4475.154 [diff -16.84]     dev-jit min: 4269.365 [diff -22.47]     dev-jit-deform min: 3767.888 [diff -38.77]     dev-jit-deform-inline min: 3962.084 [diff -31.97]
master q11 min: 281.201     dev min: 280.132 [diff -0.38]     dev-jit min: 351.85 [diff +20.08]     dev-jit-deform min: 455.885 [diff +38.32]     dev-jit-deform-inline min: 532.093 [diff +47.15]
master q12 min: 4289.268     dev min: 4082.359 [diff -5.07]     dev-jit min: 4007.199 [diff -7.04]     dev-jit-deform min: 3752.396 [diff -14.31]     dev-jit-deform-inline min: 3916.653 [diff -9.51]
master q13 min: 7110.545     dev min: 6898.576 [diff -3.07]     dev-jit min: 6579.554 [diff -8.07]     dev-jit-deform min: 6304.15 [diff -12.79]     dev-jit-deform-inline min: 6135.952 [diff -15.88]
master q14 min: 678.024     dev min: 650.943 [diff -4.16]     dev-jit min: 682.387 [diff +0.64]     dev-jit-deform min: 746.354 [diff +9.16]     dev-jit-deform-inline min: 878.437 [diff +22.81]
master q15 min: 1641.897     dev min: 1650.57 [diff +0.53]     dev-jit min: 1661.591 [diff +1.19]     dev-jit-deform min: 1821.02 [diff +9.84]     dev-jit-deform-inline min: 1863.304 [diff +11.88]
master q16 min: 1890.246     dev min: 1819.423 [diff -3.89]     dev-jit min: 1838.079 [diff -2.84]     dev-jit-deform min: 1962.274 [diff +3.67]     dev-jit-deform-inline min: 2096.154 [diff +9.82]
master q17 min: 502.605     dev min: 462.881 [diff -8.58]     dev-jit min: 495.648 [diff -1.40]     dev-jit-deform min: 537.666 [diff +6.52]     dev-jit-deform-inline min: 613.144 [diff +18.03]
master q18 min: 12863.972     dev min: 11257.57 [diff -14.27]     dev-jit min: 10847.61 [diff -18.59]     dev-jit-deform min: 10119.769 [diff -27.12]     dev-jit-deform-inline min: 10103.051 [diff -27.33]
master q19 min: 281.991     dev min: 264.191 [diff -6.74]     dev-jit min: 331.102 [diff +14.83]     dev-jit-deform min: 373.759 [diff +24.55]     dev-jit-deform-inline min: 531.07 [diff +46.90]
master q20 min: 541.154     dev min: 511.372 [diff -5.82]     dev-jit min: 565.378 [diff +4.28]     dev-jit-deform min: 662.926 [diff +18.37]     dev-jit-deform-inline min: 805.835 [diff +32.85]
master q22 min: 678.266     dev min: 656.643 [diff -3.29]     dev-jit min: 676.886 [diff -0.20]     dev-jit-deform min: 735.058 [diff +7.73]     dev-jit-deform-inline min: 943.013 [diff +28.07]

master total min: 76772.848     dev min: 69125.71 [diff -11.06]     dev-jit min: 65545.522 [diff -17.13]     dev-jit-deform min: 62963.844 [diff -21.93]     dev-jit-deform-inline min: 64925.407 [diff -18.25]


Greetings,

Andres Freund



0001-ORC-Add-findSymbolIn-wrapper-to-C-bindings.patch (3K)
0002-C-API-WIP-Add-LLVMGetHostCPUName.patch (1K)
0003-C-API-Add-LLVMLinkModules2Needed.patch (1K)
0004-MCJIT-Call-JIT-notifiers-only-after-code-sections-ar.patch (2K)
0005-Add-PerfJITEventListener-for-perf-profiling-support.patch (23K)
0006-ORC-JIT-event-listener-support.patch (9K)

Re: JIT compiling - v4.0

David Rowley-3
On 5 October 2017 at 19:57, Andres Freund <[hidden email]> wrote:
> Here are some numbers for a TPC-H scale 5 run. Obviously the Q01 numbers
> are pretty nice in particular. But it's also visible that the shorter
> queries can lose, which is largely due to the JIT overhead - that can be
> ameliorated to some degree, but JITing obviously isn't always going to
> be a win.

It's pretty exciting to see this being worked on.

I've not looked at the code, but I'm thinking, could you not just JIT
if the total cost of the plan is estimated to be > X ? Where X is some
JIT threshold GUC.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: JIT compiling - v4.0

Andres Freund
On 2017-10-05 23:43:37 +1300, David Rowley wrote:

> On 5 October 2017 at 19:57, Andres Freund <[hidden email]> wrote:
> > Here are some numbers for a TPC-H scale 5 run. Obviously the Q01 numbers
> > are pretty nice in particular. But it's also visible that the shorter
> > queries can lose, which is largely due to the JIT overhead - that can be
> > ameliorated to some degree, but JITing obviously isn't always going to
> > be a win.
>
> It's pretty exciting to see this being worked on.
>
> I've not looked at the code, but I'm thinking, could you not just JIT
> if the total cost of the plan is estimated to be > X ? Where X is some
> JIT threshold GUC.

Right, that's the plan. But it seems fairly important to make the
envelope in which it is beneficial as broad as possible. Also, test
coverage is more interesting for me right now ;)
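
The cost-based gate discussed above could look roughly like the following. This is only a minimal sketch, written in Python rather than PostgreSQL's C; `Plan`, `should_jit`, and `JIT_COST_THRESHOLD` are hypothetical names for illustration and are not taken from the patchset, and the threshold value is made up.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    # Planner's estimated total cost, in the planner's arbitrary cost units.
    total_cost: float


# Hypothetical stand-in for the "JIT threshold GUC" (the X in the proposal).
# Both the name and the value are assumptions made for this sketch.
JIT_COST_THRESHOLD = 100_000.0


def should_jit(plan: Plan, threshold: float = JIT_COST_THRESHOLD) -> bool:
    """Return True only if the estimated plan cost exceeds the threshold."""
    return plan.total_cost > threshold


cheap = Plan(total_cost=1_500.0)
expensive = Plan(total_cost=2_500_000.0)
print(should_jit(cheap), should_jit(expensive))
```

A gate like this only avoids the JIT overhead for plans estimated to be cheap; it does nothing to widen the range of queries for which JITing pays off, which is the concern raised above.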

Greetings,

Andres Freund



Re: JIT compiling - v4.0

Robert Haas
In reply to this post by Andres Freund
On Thu, Oct 5, 2017 at 2:57 AM, Andres Freund <[hidden email]> wrote:
> master q01 min: 14146.498     dev min: 11479.05 [diff -23.24]     dev-jit min: 8659.961 [diff -63.36]     dev-jit-deform min: 7279.395 [diff -94.34]     dev-jit-deform-inline min: 6997.956 [diff -102.15]

I think this is a really strange way to display this information.
Instead of computing the percentage of time that you saved, you've
computed the negative of the percentage that you would have lost if
the patch were already committed and you reverted it.  That's just
confusing.
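
To make the difference concrete, here is a small sketch using the q01 numbers from the quoted line. The function names are made up for illustration, and the first formula is inferred from the posted numbers (it reproduces the -102.15 shown for q01) rather than taken from the benchmark script.

```python
def posted_diff(master_ms: float, patched_ms: float) -> float:
    """Percentage as it appears in the posted table: change relative to the patched time."""
    return (patched_ms - master_ms) / patched_ms * 100.0


def time_saved(master_ms: float, patched_ms: float) -> float:
    """Percentage of the master runtime that the patched build saves."""
    return (master_ms - patched_ms) / master_ms * 100.0


master = 14146.498  # master q01 min, from the quoted line
inline = 6997.956   # dev-jit-deform-inline q01 min, from the quoted line

print(f"posted diff: {posted_diff(master, inline):+.2f}")  # -102.15
print(f"time saved:  {time_saved(master, inline):.2f}%")   # 50.53%
```

In other words, the "-102.15" for q01 corresponds to roughly half of the master runtime being saved.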

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: JIT compiling expressions/deform + inlining prototype v2.0

Michael Paquier
In reply to this post by Andres Freund
On Thu, Sep 21, 2017 at 2:52 AM, Andres Freund <[hidden email]> wrote:

> On 2017-09-19 12:57:33 +0300, Konstantin Knizhnik wrote:
>>
>>
>> On 04.09.2017 23:52, Andres Freund wrote:
>> >
>> > Hi. That piece of code isn't particularly clear (and has a bug in the
>> > submitted version), I'm revising it.
>>
>> ...
>> > Yea, I've changed that already, although it's currently added earlier,
>> > because the alignment is needed before, to access the column correctly.
>> > I've also made a number of efficiency improvements, primarily to access
>> > columns with an absolute offset if all preceding ones are fixed width
>> > not null columns - that is quite noticeable performancewise.
>> >
>> >
>> Should I wait for new version of your patch or continue review of this code?
>
> I'll update the posted version later this week, sorry for the delay.

I know that you are working on this actively per the set of patches
you have sent lately, but this thread has stalled, so I am marking it
as returned with feedback. There is now only one CF entry to track
this work: https://commitfest.postgresql.org/15/1285/. Depending on
the work you are doing you may want to spawn a CF entry for each
sub-item. Just an idea.
--
Michael
