gs_group_1 crashing on 13beta2/s390x


gs_group_1 crashing on 13beta2/s390x

Christoph Berg-2
On the Debian s390x buildd, the 13beta2 build is crashing:

2020-07-15 01:19:59.149 UTC [859] LOG:  server process (PID 1415) was terminated by signal 11: Segmentation fault
2020-07-15 01:19:59.149 UTC [859] DETAIL:  Failed process was running: create table gs_group_1 as
        select g100, g10, sum(g::numeric), count(*), max(g::text)
        from gs_data_1 group by cube (g1000, g100,g10);

Full build log at https://buildd.debian.org/status/fetch.php?pkg=postgresql-13&arch=s390x&ver=13%7Ebeta2-1&stamp=1594776007&raw=0

The failure is reproducible there: https://buildd.debian.org/status/logs.php?pkg=postgresql-13&ver=13%7Ebeta2-1&arch=s390x

I tried a manual build on an s390x machine, but that one went through
fine, so I can't provide a backtrace at the moment.

Christoph



Re: gs_group_1 crashing on 13beta2/s390x

Christoph Berg-2
Re: To PostgreSQL Hackers
> On the Debian s390x buildd, the 13beta2 build is crashing:
>
> 2020-07-15 01:19:59.149 UTC [859] LOG:  server process (PID 1415) was terminated by signal 11: Segmentation fault
> 2020-07-15 01:19:59.149 UTC [859] DETAIL:  Failed process was running: create table gs_group_1 as
> select g100, g10, sum(g::numeric), count(*), max(g::text)
> from gs_data_1 group by cube (g1000, g100,g10);

I wired gdb into the build process and got this backtrace:

2020-07-15 16:03:38.310 UTC [21073] LOG:  server process (PID 21575) was terminated by signal 11: Segmentation fault
2020-07-15 16:03:38.310 UTC [21073] DETAIL:  Failed process was running: create table gs_group_1 as
        select g100, g10, sum(g::numeric), count(*), max(g::text)
        from gs_data_1 group by cube (g1000, g100,g10);

******** build/src/bin/pg_upgrade/tmp_check/data.old/core ********
[New LWP 21575]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: buildd regression [local] CREATE TABLE AS                           '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  datumCopy (typByVal=false, typLen=-1, value=0) at ./build/../src/backend/utils/adt/datum.c:142
142 if (VARATT_IS_EXTERNAL_EXPANDED(vl))
#0  datumCopy (typByVal=false, typLen=-1, value=0) at ./build/../src/backend/utils/adt/datum.c:142
        vl = 0x0
        res = <optimized out>
        res = <optimized out>
        vl = <optimized out>
        eoh = <optimized out>
        resultsize = <optimized out>
        resultptr = <optimized out>
        realSize = <optimized out>
        resultptr = <optimized out>
        realSize = <optimized out>
        resultptr = <optimized out>
#1  datumCopy (value=0, typByVal=false, typLen=-1) at ./build/../src/backend/utils/adt/datum.c:131
        res = <optimized out>
        vl = <optimized out>
        eoh = <optimized out>
        resultsize = <optimized out>
        resultptr = <optimized out>
        realSize = <optimized out>
        resultptr = <optimized out>
#2  0x000002aa04423af8 in finalize_aggregate (aggstate=aggstate@entry=0x2aa05775920, peragg=peragg@entry=0x2aa056e02f0, resultVal=0x2aa056e0208, resultIsNull=0x2aa056e022a, pergroupstate=<optimized out>, pergroupstate=<optimized out>) at ./build/../src/backend/executor/nodeAgg.c:1128
        fcinfodata = {fcinfo = {flinfo = 0x2aa056e0250, context = 0x2aa05775920, resultinfo = 0x0, fncollation = 0, isnull = false, nargs = 1, args = 0x3fff6a7b578}, fcinfo_data = "\000\000\002\252\005n\002P\000\000\002\252\005wY ", '\000' <repeats 13 times>, "\247\000\001\000\000\002\252\005t\265\250\000\000\003\377\211\341\207F\000\000\003\377\000\000\002\001\000\000\000\000\000\000\003\376\000\000\000\000\000\000\017\370\000\000\000\000\000\000\001\377\000\000\000\000\000\000\000\260\000\000\000k\000\000\000k\000\000\000\000\000\000 \000\000\000\003\377\213\016J \000\000\000p\000\000\000k\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\020", '\000' <repeats 11 times>, "w\000\000\000|\000\000\000\000\000\000\000\002\000\000\002\252\006&9\250\000\000\002\252\005wZh\000\000\002\252\005wZH\000\000\003\377\213\n_\210\000\000\000\000\000\000\000\000\000\000"...}
        fcinfo = 0x3fff6a7b558
        anynull = <optimized out>
        oldContext = <optimized out>
        i = <optimized out>
        lc = <optimized out>
        pertrans = <error reading variable pertrans (value has been optimized out)>
#3  0x000002aa04423ff4 in finalize_aggregates (aggstate=aggstate@entry=0x2aa05775920, peraggs=peraggs@entry=0x2aa056e0240, pergroup=0x2aa056c8ed8) at ./build/../src/backend/executor/nodeAgg.c:1345
        peragg = 0x2aa056e02f0
        transno = <optimized out>
        pergroupstate = 0x2aa056c8ef8
        econtext = <optimized out>
        aggvalues = 0x2aa056e01f8
        aggnulls = 0x2aa056e0228
        aggno = 2
        transno = <optimized out>
#4  0x000002aa04424f5c in agg_retrieve_direct (aggstate=0x2aa05775920) at ./build/../src/backend/executor/nodeAgg.c:2480
        econtext = 0x2aa05776080
        firstSlot = 0x2aa062639a8
        numGroupingSets = <optimized out>
        node = <optimized out>
        tmpcontext = 0x2aa05775d60
        peragg = 0x2aa056e0240
        outerslot = <optimized out>
        nextSetSize = <optimized out>
        pergroups = 0x2aa056c8ea8
        result = <optimized out>
        hasGroupingSets = <optimized out>
        currentSet = <optimized out>
        numReset = <optimized out>
        i = <optimized out>
        node = <optimized out>
        econtext = <optimized out>
        tmpcontext = <optimized out>
        peragg = <optimized out>
        pergroups = <optimized out>
        outerslot = <optimized out>
        firstSlot = <optimized out>
        result = <optimized out>
        hasGroupingSets = <optimized out>
        numGroupingSets = <optimized out>
        currentSet = <optimized out>
        nextSetSize = <optimized out>
        numReset = <optimized out>
        i = <optimized out>
#5  ExecAgg (pstate=0x2aa05775920) at ./build/../src/backend/executor/nodeAgg.c:2140
        node = 0x2aa05775920
        result = 0x0
#6  0x000002aa0441001a in ExecProcNode (node=0x2aa05775920) at ./build/../src/include/executor/executor.h:245
No locals.
#7  ExecutePlan (execute_once=<optimized out>, dest=0x2aa0565fa58, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x2aa05775920, estate=0x2aa057756f8) at ./build/../src/backend/executor/execMain.c:1646
        slot = <optimized out>
        current_tuple_count = 0
        slot = <optimized out>
        current_tuple_count = <optimized out>
#8  standard_ExecutorRun (queryDesc=0x2aa062df508, direction=<optimized out>, count=0, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:364
        estate = 0x2aa057756f8
        operation = CMD_SELECT
        dest = 0x2aa0565fa58
        sendTuples = <optimized out>
        oldcontext = 0x2aa0565f830
        __func__ = "standard_ExecutorRun"
#9  0x000002aa043933fa in ExecCreateTableAs (pstate=pstate@entry=0x2aa0565f948, stmt=stmt@entry=0x2aa055bea28, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, qc=0x3fff6a7ca30) at ./build/../src/backend/commands/createas.c:354
        query = <optimized out>
        into = <optimized out>
        is_matview = <optimized out>
        dest = 0x2aa0565fa58
        save_userid = 0
        save_sec_context = 0
        save_nestlevel = 0
        address = {classId = 70885274, objectId = 1023, objectSubId = -156778696}
        rewritten = <optimized out>
        plan = 0x2aa062df3f8
        queryDesc = 0x2aa062df508
        __func__ = "ExecCreateTableAs"
#10 0x000002aa0459f378 in ProcessUtilitySlow (pstate=pstate@entry=0x2aa0565f948, pstmt=pstmt@entry=0x2aa055bead8, queryString=queryString@entry=0x2aa055bcfb8 "create table gs_group_1 as\nselect g100, g10, sum(g::numeric), count(*), max(g::text)\nfrom gs_data_1 group by cube (g1000, g100,g10);", context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=0x0, qc=0x3fff6a7ca30, dest=<error reading variable: value has been optimized out>) at ./build/../src/backend/tcop/utility.c:1600
        _save_exception_stack = 0x3fff6a7c820
        _save_context_stack = 0x0
        _local_sigjmp_buf = {{__jmpbuf = {{__gregs = {0, 2929258264904, 2929257605848, 0, 2929257605672, 2929257598904, 4396084256648, 1, 18739560736704083, 18738779339296963}, __fpregs = {2929259797352, 2929259797352, 2929257598904, 2929258351376, 0, 4397889735684, 4397889735216, 0}}}, __mask_was_saved = 0, __saved_mask = {__val = {4397889733480, 10, 2929257605400, 0, 0, 2929257606120, 4096, 1, 0, 4396084256648, 2929242951088, 2929239505962, 4397889733120, 18736975094525723, 2929257598904, 18446744069489843728}}}}
        _do_rethrow = false
        parsetree = 0x2aa055bea28
        isTopLevel = true
        isCompleteQuery = true
        needCleanup = false
        commandCollected = false
        address = {classId = 1023, objectId = 4138189632, objectSubId = 682}
        secondaryObject = {classId = 0, objectId = 0, objectSubId = 0}
        __func__ = "ProcessUtilitySlow"
#11 0x000002aa0459dd36 in standard_ProcessUtility (pstmt=0x2aa055bead8, queryString=0x2aa055bcfb8 "create table gs_group_1 as\nselect g100, g10, sum(g::numeric), count(*), max(g::text)\nfrom gs_data_1 group by cube (g1000, g100,g10);", context=<optimized out>, params=0x0, queryEnv=<optimized out>, dest=0x2aa057d5b68, qc=0x3fff6a7ca30) at ./build/../src/backend/tcop/utility.c:1069
        parsetree = 0x2aa055bea28
        isTopLevel = <optimized out>
        isAtomicContext = <optimized out>
        pstate = 0x2aa0565f948
        readonly_flags = <optimized out>
        __func__ = "standard_ProcessUtility"
#12 0x000002aa0459e874 in ProcessUtility (pstmt=pstmt@entry=0x2aa055bead8, queryString=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=<optimized out>, queryEnv=queryEnv@entry=0x0, dest=0x2aa057d5b68, qc=0x3fff6a7ca30) at ./build/../src/backend/tcop/utility.c:524
No locals.
#13 0x000002aa0459b210 in PortalRunUtility (portal=0x2aa05620008, pstmt=0x2aa055bead8, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=<optimized out>, qc=0x3fff6a7ca30) at ./build/../src/backend/tcop/pquery.c:1157
        utilityStmt = <optimized out>
        snapshot = 0x2aa05674c58
#14 0x000002aa0459bca0 in PortalRunMulti (portal=portal@entry=0x2aa05620008, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=<optimized out>, dest@entry=0x2aa057d5b68, altdest=<optimized out>, altdest@entry=0x2aa057d5b68, qc=0x3fff6a7ca30) at ./build/../src/backend/tcop/pquery.c:1303
        pstmt = <optimized out>
        stmtlist_item__state = {l = 0x2aa057d5b18, i = 0}
        active_snapshot_set = false
        stmtlist_item = 0x2aa057d5b30
#15 0x000002aa0459c9f4 in PortalRun (portal=portal@entry=0x2aa05620008, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x2aa057d5b68, altdest=0x2aa057d5b68, qc=0x3fff6a7ca30) at ./build/../src/backend/tcop/pquery.c:779
        _save_exception_stack = 0x3fff6a7ccc0
        _save_context_stack = 0x0
        _local_sigjmp_buf = {{__jmpbuf = {{__gregs = {2929258004488, 2, 2929258004488, 2929167695872, 2929257605768, 1, 4396084256648, 4397889736544, 18739560736712267, 18738779339319315}, __fpregs = {89, 0, 2929257598904, 2929258351376, 4397889735216, 4397889735684, 4397889735214, 0}}}, __mask_was_saved = 0, __saved_mask = {__val = {0, 8192, 2, 1, 1, 2929243650206, 2929258004488, 2929257605768, 2, 4396084256648, 4397889736544, 2929237235946, 4397889734816, 4397889736544, 2929239411110, 4397889734656}}}}
        _do_rethrow = <optimized out>
        result = <optimized out>
        nprocessed = <optimized out>
        saveTopTransactionResourceOwner = 0x2aa055e9bf0
        saveTopTransactionContext = 0x2aa05674b10
        saveActivePortal = 0x0
        saveResourceOwner = 0x2aa055e9bf0
        savePortalContext = 0x0
        saveMemoryContext = 0x2aa05674b10
        __func__ = "PortalRun"
#16 0x000002aa0459830a in exec_simple_query (query_string=<optimized out>) at ./build/../src/backend/tcop/postgres.c:1239
        snapshot_set = <optimized out>
        per_parsetree_context = <optimized out>
        plantree_list = <optimized out>
        parsetree = 0x2aa055bea58
        commandTag = <optimized out>
        qc = {commandTag = CMDTAG_UNKNOWN, nprocessed = 0}
        querytree_list = <optimized out>
        portal = 0x2aa05620008
        receiver = 0x2aa057d5b68
        format = 0
        parsetree_item__state = {l = 0x2aa055bea88, i = 0}
        dest = DestRemote
        oldcontext = <optimized out>
        parsetree_list = 0x2aa055bea88
        parsetree_item = 0x2aa055beaa0
        save_log_statement_stats = false
        was_logged = false
        use_implicit_block = false
        msec_str = "\000\000\002\252\000\000\000Q\000\000\002\252\005[ϸ\000\000\002\252\000\000\000\000\000\000\003\377\213\n_\210"
        __func__ = "exec_simple_query"
#17 0x000002aa0459a05e in PostgresMain (argc=<optimized out>, argv=argv@entry=0x2aa055e8190, dbname=<optimized out>, username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4315
        query_string = 0x2aa055bcfb8 "create table gs_group_1 as\nselect g100, g10, sum(g::numeric), count(*), max(g::text)\nfrom gs_data_1 group by cube (g1000, g100,g10);"
        firstchar = 81
        input_message = {data = 0x2aa055bcfb8 "create table gs_group_1 as\nselect g100, g10, sum(g::numeric), count(*), max(g::text)\nfrom gs_data_1 group by cube (g1000, g100,g10);", len = 133, maxlen = 1024, cursor = 133}
        local_sigjmp_buf = {{__jmpbuf = {{__gregs = {8388608, 64, 2929244860962, 2929257583240, 2929244863032, 2929244863024, 4396084256648, 4397889736544, 18739560736695925, 18738779339318659}, __fpregs = {4397889736544, 4397889736532, 6, 4397419976768, 2929690573168, 2930182562592, 4397889736696, 0}}}, __mask_was_saved = 1, __saved_mask = {__val = {0, 2929239505782, 2929257748144, 4397889736288, 4396084256648, 4397889736544, 2929242274994, 4397889735960, 1024, 4396084256648, 4397889736544, 2929242135450, 4, 0, 4397889736320, 4397889736160}}}}
        send_ready_for_query = false
        disable_idle_in_transaction_timeout = false
        __func__ = "PostgresMain"
#18 0x000002aa04512066 in BackendRun (port=0x2aa055e16b0, port=0x2aa055e16b0) at ./build/../src/backend/postmaster/postmaster.c:4523
        av = 0x2aa055e8190
        maxac = <optimized out>
        ac = 1
        i = 1
        av = <optimized out>
        maxac = <optimized out>
        ac = <optimized out>
        i = <optimized out>
        __func__ = "BackendRun"
        __errno_location = <optimized out>
        __errno_location = <optimized out>
        __errno_location = <optimized out>
#19 BackendStartup (port=0x2aa055e16b0) at ./build/../src/backend/postmaster/postmaster.c:4215
        bn = <optimized out>
        pid = <optimized out>
        bn = <optimized out>
        pid = <optimized out>
        __func__ = "BackendStartup"
        __errno_location = <optimized out>
        __errno_location = <optimized out>
        save_errno = <optimized out>
        __errno_location = <optimized out>
        __errno_location = <optimized out>
#20 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1727
        port = 0x2aa055e16b0
        i = <optimized out>
        rmask = {fds_bits = {32, 0 <repeats 15 times>}}
        selres = <optimized out>
        now = <optimized out>
        readmask = {fds_bits = {32, 0 <repeats 15 times>}}
        nSockets = 0
        last_lockfile_recheck_time = 1594829011
        last_touch_time = 1594829011
        __func__ = "ServerLoop"
#21 0x000002aa04513128 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:1400
        opt = <optimized out>
        status = <optimized out>
        userDoption = <optimized out>
        listen_addr_saved = false
        i = <optimized out>
        output_config_variable = <optimized out>
        __func__ = "PostmasterMain"
#22 0x000002aa04243fb4 in main (argc=<optimized out>, argv=0x2aa055b71b0) at ./build/../src/backend/main/main.c:210
        do_check_root = <optimized out>

Christoph



Re: gs_group_1 crashing on 13beta2/s390x

Tom Lane-2
Christoph Berg <[hidden email]> writes:
>> On the Debian s390x buildd, the 13beta2 build is crashing:

> I wired gdb into the build process and got this backtrace:

> #0  datumCopy (typByVal=false, typLen=-1, value=0) at ./build/../src/backend/utils/adt/datum.c:142
>         vl = 0x0
>         res = <optimized out>
>         res = <optimized out>
>         vl = <optimized out>
>         eoh = <optimized out>
>         resultsize = <optimized out>
>         resultptr = <optimized out>
>         realSize = <optimized out>
>         resultptr = <optimized out>
>         realSize = <optimized out>
>         resultptr = <optimized out>
> #1  datumCopy (value=0, typByVal=false, typLen=-1) at ./build/../src/backend/utils/adt/datum.c:131
>         res = <optimized out>
>         vl = <optimized out>
>         eoh = <optimized out>
>         resultsize = <optimized out>
>         resultptr = <optimized out>
>         realSize = <optimized out>
>         resultptr = <optimized out>
> #2  0x000002aa04423af8 in finalize_aggregate (aggstate=aggstate@entry=0x2aa05775920, peragg=peragg@entry=0x2aa056e02f0, resultVal=0x2aa056e0208, resultIsNull=0x2aa056e022a, pergroupstate=<optimized out>, pergroupstate=<optimized out>) at ./build/../src/backend/executor/nodeAgg.c:1128

Hmm.  If gdb isn't lying to us, that has to be coming from here:

    /*
     * If result is pass-by-ref, make sure it is in the right context.
     */
    if (!peragg->resulttypeByVal && !*resultIsNull &&
        !MemoryContextContains(CurrentMemoryContext,
                               DatumGetPointer(*resultVal)))
        *resultVal = datumCopy(*resultVal,
                               peragg->resulttypeByVal,
                               peragg->resulttypeLen);

The line numbers in HEAD are a bit different, but that's the only
call of datumCopy() in finalize_aggregate().

It's hardly surprising that datumCopy would segfault when given
a null "value" and told it is pass-by-reference.  However, to get to
the datumCopy call, we must have passed the MemoryContextContains
check on that very same pointer value, and that would surely have
segfaulted as well, one would think.
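
For reference, datumCopy reads approximately as below (a paraphrase of
src/backend/utils/adt/datum.c as of 13, not the exact source); with
value = 0 it is the VARATT_IS_EXTERNAL_EXPANDED test that dereferences
the null pointer:

    Datum
    datumCopy(Datum value, bool typByVal, int typLen)
    {
        Datum       res;

        if (typByVal)
            res = value;
        else if (typLen == -1)
        {
            /* It is a varlena datatype */
            struct varlena *vl = (struct varlena *) DatumGetPointer(value);

            if (VARATT_IS_EXTERNAL_EXPANDED(vl))    /* reads *vl: crashes if vl is NULL */
            {
                /* Flatten the expanded object into a plain varlena */
                ExpandedObjectHeader *eoh = DatumGetEOHP(value);
                Size        resultsize = EOH_get_flat_size(eoh);
                char       *resultptr = (char *) palloc(resultsize);

                EOH_flatten_into(eoh, (void *) resultptr, resultsize);
                res = PointerGetDatum(resultptr);
            }
            else
            {
                /* Ordinary varlena: copy it verbatim */
                Size        realSize = (Size) VARSIZE_ANY(vl);
                char       *resultptr = (char *) palloc(realSize);

                memcpy(resultptr, vl, realSize);
                res = PointerGetDatum(resultptr);
            }
        }
        else
        {
            /* Fixed-length pass-by-reference type */
            Size        realSize = datumGetSize(value, typByVal, typLen);
            char       *resultptr = (char *) palloc(realSize);

            memcpy(resultptr, DatumGetPointer(value), realSize);
            res = PointerGetDatum(resultptr);
        }
        return res;
    }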

Given the apparently-can't-happen situation at the call site,
and the fact that we're not seeing similar failures reported
elsewhere (and note that every line shown above is at least
five years old), I'm kind of forced to the conclusion that this
is a compiler bug.  Does adjusting the -O level make it go away?

                        regards, tom lane



Re: gs_group_1 crashing on 13beta2/s390x

Andrew Gierth
>>>>> "Tom" == Tom Lane <[hidden email]> writes:

 Tom> It's hardly surprising that datumCopy would segfault when given a
 Tom> null "value" and told it is pass-by-reference. However, to get to
 Tom> the datumCopy call, we must have passed the MemoryContextContains
 Tom> check on that very same pointer value, and that would surely have
 Tom> segfaulted as well, one would think.

Nope, because MemoryContextContains just returns "false" if passed a
NULL pointer.
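
For reference, the guard at the top of MemoryContextContains reads
roughly as follows (paraphrased from src/backend/utils/mmgr/mcxt.c, not
the exact source):

    bool
    MemoryContextContains(MemoryContext context, void *pointer)
    {
        MemoryContext ptr_context;

        /* A NULL or non-MAXALIGNED pointer can't be a palloc'd chunk */
        if (pointer == NULL || pointer != (void *) MAXALIGN(pointer))
            return false;

        /* Otherwise peek at the chunk header just before the chunk */
        ptr_context = *(MemoryContext *) (((char *) pointer) - sizeof(void *));

        return ptr_context == context;
    }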

--
Andrew (irc:RhodiumToad)



Re: gs_group_1 crashing on 13beta2/s390x

Christoph Berg-2
In reply to this post by Tom Lane-2
Re: Tom Lane
> Given the apparently-can't-happen situation at the call site,
> and the fact that we're not seeing similar failures reported
> elsewhere (and note that every line shown above is at least
> five years old), I'm kind of forced to the conclusion that this
> is a compiler bug.  Does adjusting the -O level make it go away?

The problem is that a manual build doesn't crash, and I'm somewhat
reluctant to do a full new package upload (which will keep buildds for
all architectures busy) just for a -O0 test unless we are sure it
helps.

I'd rather play more with the manual build artifacts (which should be
using the same compiler and everything), if anyone has ideas what I
should be trying.

Christoph



Re: gs_group_1 crashing on 13beta2/s390x

Tom Lane-2
In reply to this post by Andrew Gierth
Andrew Gierth <[hidden email]> writes:
> "Tom" == Tom Lane <[hidden email]> writes:
>  Tom> It's hardly surprising that datumCopy would segfault when given a
>  Tom> null "value" and told it is pass-by-reference. However, to get to
>  Tom> the datumCopy call, we must have passed the MemoryContextContains
>  Tom> check on that very same pointer value, and that would surely have
>  Tom> segfaulted as well, one would think.

> Nope, because MemoryContextContains just returns "false" if passed a
> NULL pointer.

Ah, right.  So you could imagine getting here if the finalfn had returned
PointerGetDatum(NULL) with isnull = false.  We have some aggregate
transfns that are capable of doing that for internal-type transvalues,
I think, but the finalfn never should do it.
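
Schematically, the forbidden case would be a finalfn coded like this
hypothetical one (the function name is made up; no in-core finalfn does
this):

    Datum
    broken_finalfn(PG_FUNCTION_ARGS)
    {
        /* fcinfo->isnull stays false, yet the Datum is a null pointer */
        PG_RETURN_POINTER(NULL);
    }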

In any case we still have the fact that this isn't being seen in our
buildfarm; and that's not for lack of s390 machines.  So I still think
the most likely explanation is a compiler bug in bleeding-edge gcc.

Probably what Christoph should be trying to figure out is why he can't
reproduce it manually.  There must be some discrepancy between his
environment and the build machines; but what?

                        regards, tom lane



Re: gs_group_1 crashing on 13beta2/s390x

Christoph Berg-2
Re: Tom Lane

> >  Tom> It's hardly surprising that datumCopy would segfault when given a
> >  Tom> null "value" and told it is pass-by-reference. However, to get to
> >  Tom> the datumCopy call, we must have passed the MemoryContextContains
> >  Tom> check on that very same pointer value, and that would surely have
> >  Tom> segfaulted as well, one would think.
>
> > Nope, because MemoryContextContains just returns "false" if passed a
> > NULL pointer.
>
> Ah, right.  So you could imagine getting here if the finalfn had returned
> PointerGetDatum(NULL) with isnull = false.  We have some aggregate
> transfns that are capable of doing that for internal-type transvalues,
> I think, but the finalfn never should do it.

So I had another stab at this. As expected, the 13.0 upload to
Debian/unstable crashed again on the buildd, while a manual
everything-should-be-the-same build succeeded. I don't know why I
didn't try this before, but this time I took this manual build and
started a PG instance from it. Pasting the gs_group_1 queries made it
segfault instantly.

So here we are:

#0  datumCopy (value=0, typLen=-1, typByVal=false) at ./build/../src/backend/utils/adt/datum.c:142
#1  0x000002aa3bf6322e in datumCopy (value=<optimized out>, typByVal=<optimized out>, typLen=<optimized out>)
    at ./build/../src/backend/utils/adt/datum.c:178
#2  0x000002aa3bda6dd6 in finalize_aggregate (aggstate=aggstate@entry=0x2aa3defbfd0, peragg=peragg@entry=0x2aa3e0671f0,
    pergroupstate=pergroupstate@entry=0x2aa3e026b78, resultVal=resultVal@entry=0x2aa3e067108, resultIsNull=0x2aa3e06712a)
    at ./build/../src/backend/executor/nodeAgg.c:1153

(gdb) p *resultVal
$3 = 0
(gdb) p *resultIsNull
$6 = false

(gdb) p *peragg
$7 = {aggref = 0x2aa3deef218, transno = 2, finalfn_oid = 0, finalfn = {fn_addr = 0x0, fn_oid = 0, fn_nargs = 0, fn_strict = false,
    fn_retset = false, fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, numFinalArgs = 1, aggdirectargs = 0x0,
  resulttypeLen = -1, resulttypeByVal = false, shareable = true}

Since finalfn_oid is 0, resultVal/resultIsNull were set by the `else`
branch of the if (OidIsValid) in finalize_aggregate():

    else
    {
        /* Don't need MakeExpandedObjectReadOnly; datumCopy will copy it */
        *resultVal = pergroupstate->transValue;
        *resultIsNull = pergroupstate->transValueIsNull;
    }

(gdb) p *pergroupstate
$12 = {transValue = 0, transValueIsNull = false, noTransValue = false}

That comes from finalize_aggregates:

#3  0x000002aa3bda7e10 in finalize_aggregates (aggstate=aggstate@entry=0x2aa3defbfd0, peraggs=peraggs@entry=0x2aa3e067140,
    pergroup=0x2aa3e026b58) at ./build/../src/backend/executor/nodeAgg.c:1369

    /*
     * Run the final functions.
     */
    for (aggno = 0; aggno < aggstate->numaggs; aggno++)
    {
        AggStatePerAgg peragg = &peraggs[aggno];
        int         transno = peragg->transno;
        AggStatePerGroup pergroupstate;

        pergroupstate = &pergroup[transno];

        if (DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit))
            finalize_partialaggregate(aggstate, peragg, pergroupstate,
                                      &aggvalues[aggno], &aggnulls[aggno]);
        else
            finalize_aggregate(aggstate, peragg, pergroupstate,
                               &aggvalues[aggno], &aggnulls[aggno]);
    }

... but at that point I'm lost. Maybe "aggno" and "transno" got mixed
up here?
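
One way to probe that from the open session might be to dump the
aggno-to-transno mapping and all the per-group states (a sketch;
variable names as in frames 2 and 3 above):

    (gdb) p aggstate->numaggs
    (gdb) p aggstate->numtrans
    (gdb) p peraggs[0].transno
    (gdb) p peraggs[1].transno
    (gdb) p peraggs[2].transno
    (gdb) p pergroup[0]
    (gdb) p pergroup[1]
    (gdb) p pergroup[2]

If some pergroup entry was never initialized for this grouping set, its
transValue would show up as 0 just like the one above.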

(I'll leave the gdb session open for further suggestions.)

Christoph



Re: gs_group_1 crashing on 13beta2/s390x

Christoph Berg-2
I poked around with the SET statements in the offending tests, and the
crash only happens if `set jit_above_cost = 0;` is present. Removing that
makes it pass. Removing the work_mem or enable_hashagg settings makes no
difference. The llvm version is 10.0.1.


Test file:

--
-- Compare results between plans using sorting and plans using hash
-- aggregation. Force spilling in both cases by setting work_mem low
-- and altering the statistics.
--

create table gs_data_1 as
select g%1000 as g1000, g%100 as g100, g%10 as g10, g
   from generate_series(0,1999) g;

analyze gs_data_1;
alter table gs_data_1 set (autovacuum_enabled = 'false');
update pg_class set reltuples = 10 where relname='gs_data_1';

SET work_mem='64kB';

-- Produce results with sorting.

set enable_hashagg = false;
set jit_above_cost = 0; -- remove this to remove crash

explain (costs off)
select g100, g10, sum(g::numeric), count(*), max(g::text)
from gs_data_1 group by cube (g1000, g100,g10);

create table gs_group_1 as
select g100, g10, sum(g::numeric), count(*), max(g::text)
from gs_data_1 group by cube (g1000, g100,g10);



Christoph



Re: gs_group_1 crashing on 13beta2/s390x

Christoph Berg-2
Re: To Tom Lane
> I poked around with the SET statements in the offending tests, and the
> crash only happens if `set jit_above_cost = 0;` is present. Removing that
> makes it pass. Removing the work_mem or enable_hashagg settings makes no
> difference. The llvm version is 10.0.1.

I put jit_above_cost=0 into postgresql.conf and ran "make installcheck"
again. There are more crashes:

From src/test/regress/sql/interval.sql:

2020-09-25 17:00:25.130 CEST [8135] LOG:  server process (PID 8369) was terminated by signal 11: Segmentation fault
2020-09-25 17:00:25.130 CEST [8135] DETAIL:  Failed process was running: select avg(f1) from interval_tbl;

From src/test/regress/sql/tid.sql:

2020-09-25 17:01:20.593 CEST [8135] LOG:  server process (PID 9015) was terminated by signal 11: Segmentation fault
2020-09-25 17:01:20.593 CEST [8135] DETAIL:  Failed process was running: SELECT max(ctid) FROM tid_tab;

From src/test/regress/sql/collate*.sql:

2020-09-25 17:07:17.852 CEST [8135] LOG:  server process (PID 12232) was terminated by signal 11: Segmentation fault
2020-09-25 17:07:17.852 CEST [8135] DETAIL:  Failed process was running: SELECT min(b), max(b) FROM collate_test1;

From src/test/regress/sql/plpgsql.sql:

2020-09-25 17:11:56.495 CEST [8135] LOG:  server process (PID 15562) was terminated by signal 11: Segmentation fault
2020-09-25 17:11:56.495 CEST [8135] DETAIL:  Failed process was running: select * from returnqueryf();

Contrary to the others this one is not related to aggregates:

   -- test RETURN QUERY with dropped columns
   
   create table tabwithcols(a int, b int, c int, d int);
   insert into tabwithcols values(10,20,30,40),(50,60,70,80);
   
   create or replace function returnqueryf()
   returns setof tabwithcols as $$
   begin
     return query select * from tabwithcols;
     return query execute 'select * from tabwithcols';
   end;
   $$ language plpgsql;
   
   select * from returnqueryf();
   
   alter table tabwithcols drop column b;
   
   select * from returnqueryf();
   
   alter table tabwithcols drop column d;
   
   select * from returnqueryf();
   
   alter table tabwithcols add column d int;
   
   select * from returnqueryf();
   
   drop function returnqueryf();
   drop table tabwithcols;

src/test/regress/sql/rangefuncs.sql:

2020-09-25 17:16:04.209 CEST [8135] LOG:  server process (PID 17372) was terminated by signal 11: Segmentation fault
2020-09-25 17:16:04.209 CEST [8135] DETAIL:  Failed process was running: select * from usersview;

src/test/regress/sql/alter_table.sql:

2020-09-25 17:21:36.187 CEST [8135] LOG:  server process (PID 19217) was terminated by signal 11: Segmentation fault
2020-09-25 17:21:36.187 CEST [8135] DETAIL:  Failed process was running: update atacc3 set test2 = 4 where test2 is null;

src/test/regress/sql/polymorphism.sql:

2020-09-25 17:23:55.509 CEST [8135] LOG:  server process (PID 21010) was terminated by signal 11: Segmentation fault
2020-09-25 17:23:55.509 CEST [8135] DETAIL:  Failed process was running: select myleast(1.1, 0.22, 0.55);

2020-09-25 17:28:26.222 CEST [8135] LOG:  server process (PID 22325) was terminated by signal 11: Segmentation fault
2020-09-25 17:28:26.222 CEST [8135] DETAIL:  Failed process was running: select f.longname from fullname f;

(stopping here)


There are also a lot of these log lines (without prefix):

ORC error: No callback manager available for s390x-ibm-linux-gnu

Is that worrying? I'm not sure but I think I've seen these on other
architectures as well.


I guess that suggests two things:
* jit is not ready for prime time on s390x and I should disable it
* jit is not exercised enough by "make installcheck"

Christoph



Re: gs_group_1 crashing on 13beta2/s390x

Andres Freund
Hi,

On 2020-09-25 17:29:07 +0200, Christoph Berg wrote:
> I guess that suggests two things:
> * jit is not ready for prime time on s390x and I should disable it

I don't know how good LLVM's support for s390x JITing is, and given that
it's unrealistic for people to get access to s390x...


> * jit is not exercised enough by "make installcheck"

So far we've exercised it more widely by setting up machines that use it
for all queries (by setting the config option). I'm doubtful it's worth
doing it differently.

Greetings,

Andres Freund



Re: gs_group_1 crashing on 13beta2/s390x

Christoph Berg
On September 25, 2020 18:42:04 CEST, Andres Freund <[hidden email]> wrote:
>> * jit is not exercised enough by "make installcheck"
>
>So far we've exercised it more widely by setting up machines that use
>it for all queries (by setting the config option). I'm doubtful it's
>worth doing it differently.

Ok, but given that Debian is currently targeting 22 architectures, I doubt the PostgreSQL buildfarm covers all of them with the extra JIT option, so I should probably make sure to do that here when running tests.



Re: gs_group_1 crashing on 13beta2/s390x

Ranier Vilela-2
In reply to this post by Christoph Berg-2
On Fri, Sep 25, 2020 at 11:30, Christoph Berg <[hidden email]> wrote:
> Re: Tom Lane
> > Ah, right.  So you could imagine getting here if the finalfn had returned
> > PointerGetDatum(NULL) with isnull = false.  We have some aggregate
> > transfns that are capable of doing that for internal-type transvalues,
> > I think, but the finalfn never should do it.
>
> So I had another stab at this. As expected, the 13.0 upload to
> Debian/unstable crashed again on the buildd, while a manual
> everything-should-be-the-same build succeeded. [...]
>
> (gdb) p *pergroupstate
> $12 = {transValue = 0, transValueIsNull = false, noTransValue = false}
Shouldn't transValueIsNull be "true" here? Then datumCopy would be
protected by the "!*resultIsNull" test.
 
regards,
Ranier Vilela

Re: gs_group_1 crashing on 13beta2/s390x

Ranier Vilela-2
On Fri, Sep 25, 2020 at 14:36, Ranier Vilela <[hidden email]> wrote:
> On Fri, Sep 25, 2020 at 11:30, Christoph Berg <[hidden email]> wrote:
> > (gdb) p *pergroupstate
> > $12 = {transValue = 0, transValueIsNull = false, noTransValue = false}
>
> Shouldn't transValueIsNull be "true" here? Then datumCopy would be
> protected by the "!*resultIsNull" test.
Observe this excerpt (nodeAgg.c line 1129):

    /* don't call a strict function with NULL inputs */
    *resultVal = (Datum) 0;
    *resultIsNull = true;

Doesn't the state above contradict that convention? If a value is null,
its null flag should be set to true (1), not left at 0 (false).

Compare line 4711, in ExecReScanAgg:

    MemSet(econtext->ecxt_aggvalues, 0, sizeof(Datum) * node->numaggs);
    MemSet(econtext->ecxt_aggnulls, true, sizeof(bool) * node->numaggs);

There, a zeroed datum in ecxt_aggvalues goes together with its
ecxt_aggnulls entry being set to true.

regards,
Ranier Vilela

Re: gs_group_1 crashing on 13beta2/s390x

Tom Lane-2
In reply to this post by Christoph Berg
Christoph Berg <[hidden email]> writes:
> Ok, but given that Debian is currently targeting 22 architectures, I doubt the PostgreSQL buildfarm covers all of them with the extra JIT option, so I should probably make sure to do that here when running tests.

+1.  I rather doubt our farm is running this type of test on anything
but x86_64.

Of course, we can't actually *fix* any LLVM bugs, but it'd be nice
to know whether they're there.

                        regards, tom lane



Re: gs_group_1 crashing on 13beta2/s390x

Andres Freund
In reply to this post by Christoph Berg
Hi,

On 2020-09-25 19:05:52 +0200, Christoph Berg wrote:

> On September 25, 2020 18:42:04 CEST, Andres Freund <[hidden email]> wrote:
> >> * jit is not exercised enough by "make installcheck"
> >
> >So far we've exercised it more widely by setting up machines that use
> >it for all queries (by setting the config option). I'm doubtful it's
> >worth doing it differently.
>
> Ok, but given that Debian is currently targeting 22 architectures, I
> doubt the PostgreSQL buildfarm covers all of them with the extra JIT
> option, so I should probably make sure to do that here when running
> tests.

Forcing a lot of queries that are otherwise really fast to be JITed
unfortunately has a significant time cost. Doing that on slow
architectures might be prohibitively slow.  Kinda wonder if we shouldn't
just restrict JIT to a few architectures that we have a bit more regular
access to (x86, arm, maybe also ppc?). It's not like anybody would run
large analytics queries on mips.

Greetings,

Andres Freund



Re: gs_group_1 crashing on 13beta2/s390x

Andres Freund
In reply to this post by Tom Lane-2
Hi,

On 2020-09-25 14:11:46 -0400, Tom Lane wrote:
> Christoph Berg <[hidden email]> writes:
> > Ok, but given that Debian is currently targeting 22 architectures, I doubt the PostgreSQL buildfarm covers all of them with the extra JIT option, so I should probably make sure to do that here when running tests.
>
> +1.  I rather doubt our farm is running this type of test on anything
> but x86_64.

There's quite a few arm animals and at least one mips animal that do
some minimal coverage of JITing (i.e. the queries that are actually
somewhat expensive). I pinged two owners asking whether one of the arm
animals could be changed to force JITing.

Greetings,

Andres Freund



Re: gs_group_1 crashing on 13beta2/s390x

Christoph Berg-2
Re: Andres Freund
> > > Ok, but given that Debian is currently targeting 22 architectures, I doubt the PostgreSQL buildfarm covers all of them with the extra JIT option, so I should probably make sure to do that here when running tests.
> >
> > +1.  I rather doubt our farm is running this type of test on anything
> > but x86_64.
>
> There's quite a few arm animals and at least one mips animal that do
> some minimal coverage of JITing (i.e. the queries that are actually
> somewhat expensive). I pinged two owners asking whether one of the arm
> animals could be changed to force JITing.

I pushed a change that should enable LLVM-10-JIT-testing everywhere [*]
and (admittedly to my surprise) all other architectures passed just
fine:

https://buildd.debian.org/status/logs.php?pkg=postgresql-13&ver=13.0-2

For the record, the architectures with llvm disabled are these:

clang-10 [!alpha !hppa !hurd-i386 !ia64 !kfreebsd-amd64 !kfreebsd-i386 !m68k !powerpc !riscv64 !s390x !sh4 !sparc64 !x32],

After the tests I realized that LLVM 11 is also already packaged, but
s390x still segfaults with that version.

Christoph

[*] apparently pgbench --temp-config=/no/such/file is not an error,
which makes verifying this change a bit harder



Re: gs_group_1 crashing on 13beta2/s390x

Andres Freund
Hi,

On 2020-09-28 14:22:01 +0200, Christoph Berg wrote:

> Re: Andres Freund
> > > > Ok, but given that Debian is currently targeting 22 architectures, I doubt the PostgreSQL buildfarm covers all of them with the extra JIT option, so I should probably make sure to do that here when running tests.
> > >
> > > +1.  I rather doubt our farm is running this type of test on anything
> > > but x86_64.
> >
> > There's quite a few arm animals and at least one mips animal that do
> > some minimal coverage of JITing (i.e. the queries that are actually
> > somewhat expensive). I pinged two owners asking whether one of the arm
> > animals could be changed to force JITing.
>
> I pushed a change that should enable LLVM-10-JIT-testing everywhere [*]
> and (admittedly to my surprise) all other architectures passed just
> fine:
>
> https://buildd.debian.org/status/logs.php?pkg=postgresql-13&ver=13.0-2

Thanks!


> For the record, the architectures with llvm disabled are these:
>
> clang-10 [!alpha !hppa !hurd-i386 !ia64 !kfreebsd-amd64 !kfreebsd-i386 !m68k !powerpc !riscv64 !s390x !sh4 !sparc64 !x32],

!powerpc doesn't exclude ppc64, I assume?


> After the tests I realized that LLVM 11 is also already packaged, but
> s390x still segfaults with that version.
>
> Christoph
>
> [*] apparently pgbench --temp-config=/no/such/file is not an error,
> which makes verifying this change a bit harder

pgbench? I assume you mean pg_regress?

FWIW, an easy way to enable JIT for just about all tests, including tap
tests, is to set
PGOPTIONS='-c jit=1 -c jit_above_cost=0 ...'
in the environment before starting the tests.
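
For example, a full installcheck run with forced JITing could look like
this (a sketch; the settings are applied by every connection the test
clients open):

    PGOPTIONS='-c jit=1 -c jit_above_cost=0' make installcheck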


Can a non-Debian dev get access to an s390x? It'd be nice to isolate this
enough to report a bug to LLVM - and that's probably a lot easier for me
than for you... My guess would be that some relocation processing or such
is off.

Greetings,

Andres Freund



Re: gs_group_1 crashing on 13beta2/s390x

Christoph Berg-2
Re: Andres Freund
> > clang-10 [!alpha !hppa !hurd-i386 !ia64 !kfreebsd-amd64 !kfreebsd-i386 !m68k !powerpc !riscv64 !s390x !sh4 !sparc64 !x32],
>
> !powerpc doesn't exclude ppc64, I assume?

That's direct matches only, there's no architecture-family logic in
there.

> > [*] apparently pgbench --temp-config=/no/such/file is not an error,
> > which makes verifying this change a bit harder
>
> pgbench? I assume you mean pg_regress?

Err yes of course.

> FWIW, an easy way to enable JIT for just about all tests, including tap
> tests, is to set
> PGOPTIONS='-c jit=1 -c jit_above_cost=0 ...'
> in the environment before starting the tests.

Ok, that might simplify the setup a bit.

> Can a non-debian dev get access to a s390x? It'd be nice to isolate this
> enough to report a bug to LLVM - and that's probably a lot easier for me
> than you... My guess would be that some relocation processing or such is
> off.

You already had an account there in the past I think. I'll see if I
can get that reactivated. Thanks for the offer!

Christoph



Re: gs_group_1 crashing on 13beta2/s390x

Andres Freund
In reply to this post by Christoph Berg-2
Hi,

Christoph helped me get access to an s390x machine - I wasn't able to
reproduce exactly the error he hit. Initially all tests passed, but
after recompiling with build flags more similar to Christoph's I was able
to hit another instance of what I assume to be the same bug.

I am fairly sure that I see the problem. Before a post-LLVM-10 change,
the "runtime linker" for JITed code only asserted that the relocations to
be performed are of a known type. Since the Debian build - correctly -
uses a release version of LLVM, that assertion is compiled out, and
unhandled relocations end up basically being resolved to 0.

I suspect that building with LDFLAGS="-Wl,-z,relro -Wl,-z,now" - which
is what I think the Debian package does - creates the types of
relocations that LLVM doesn't handle for elf + s390.

10 release branch:

void RuntimeDyldELF::resolveSystemZRelocation(const SectionEntry &Section,
                                              uint64_t Offset, uint64_t Value,
                                              uint32_t Type, int64_t Addend) {
  uint8_t *LocalAddress = Section.getAddressWithOffset(Offset);
  switch (Type) {
  default:
    llvm_unreachable("Relocation type not implemented yet!");
    break;

11/master:

void RuntimeDyldELF::resolveSystemZRelocation(const SectionEntry &Section,
                                              uint64_t Offset, uint64_t Value,
                                              uint32_t Type, int64_t Addend) {
  uint8_t *LocalAddress = Section.getAddressWithOffset(Offset);
  switch (Type) {
  default:
    report_fatal_error("Relocation type not implemented yet!");
    break;
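
The difference matters because llvm_unreachable is compiled out of
release builds; paraphrasing llvm/Support/ErrorHandling.h (not the
exact source):

    #ifndef NDEBUG
    // Assertions enabled: print the message and abort.
    #define llvm_unreachable(msg) \
      ::llvm::llvm_unreachable_internal(msg, __FILE__, __LINE__)
    #else
    // Release build: undefined behavior -- in practice the switch falls
    // through and the relocation is silently left unapplied.
    #define llvm_unreachable(msg) __builtin_unreachable()
    #endif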

Verifying that that's the case by rebuilding against 11 (and then an
LLVM debug build, which will take a day or two).

Greetings,

Andres Freund

