tsvector extraction patch

tsvector extraction patch

"Hans-Jürgen Schönig (PostgreSQL)"
hello,

this patch did not make it through yesterday, so i am trying to send it
again.
i wrote a small patch which i found useful for my own work.
it would be nice to see this in 8.5; if not in core, then maybe in contrib.
it transforms a tsvector into table format, which is really nice for text
processing and comparison.

test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure this is a good patch'));
  lex   | rank
--------+------
 good   |    8
 patch  |    9
 pretti |    3
 sure   |    4
(4 rows)

   many thanks,

      hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: tsvector extraction patch

"Hans-Jürgen Schönig (PostgreSQL)"
Hans-Juergen Schoenig -- PostgreSQL wrote:

> hello,
>
> this patch has not made it through yesterday, so i am trying to send
> it again.
> i made a small patch which i found useful for my personal tasks.
> it would be nice to see this in 8.5. if not core then maybe contrib.
> it transforms a tsvector to table format which is really nice for text
> processing and comparison.
>
> test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty
> sure this is a good patch'));
> lex   | rank
> --------+------
> good   |    8
> patch  |    9
> pretti |    3
> sure   |    4
> (4 rows)
>
>   many thanks,
>
>      hans
>

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


diff -dcrpN postgresql-8.4.0.old/contrib/Makefile postgresql-8.4.0/contrib/Makefile
*** postgresql-8.4.0.old/contrib/Makefile 2009-03-26 00:20:01.000000000 +0100
--- postgresql-8.4.0/contrib/Makefile 2009-06-29 11:03:04.000000000 +0200
*************** WANTED_DIRS = \
*** 39,44 ****
--- 39,45 ----
  tablefunc \
  test_parser \
  tsearch2 \
+ tsvcontent \
  vacuumlo
 
  ifeq ($(with_openssl),yes)
diff -dcrpN postgresql-8.4.0.old/contrib/tsvcontent/Makefile postgresql-8.4.0/contrib/tsvcontent/Makefile
*** postgresql-8.4.0.old/contrib/tsvcontent/Makefile 1970-01-01 01:00:00.000000000 +0100
--- postgresql-8.4.0/contrib/tsvcontent/Makefile 2009-06-29 11:20:21.000000000 +0200
***************
*** 0 ****
--- 1,19 ----
+ # $PostgreSQL: pgsql/contrib/tablefunc/Makefile,v 1.9 2007/11/10 23:59:51 momjian Exp $
+
+ MODULES = tsvcontent
+ DATA_built = tsvcontent.sql
+ DATA = uninstall_tsvcontent.sql
+
+
+ SHLIB_LINK += $(filter -lm, $(LIBS))
+
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/tsvcontent
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
diff -dcrpN postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.c postgresql-8.4.0/contrib/tsvcontent/tsvcontent.c
*** postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.c 1970-01-01 01:00:00.000000000 +0100
--- postgresql-8.4.0/contrib/tsvcontent/tsvcontent.c 2009-06-29 11:18:35.000000000 +0200
***************
*** 0 ****
--- 1,169 ----
+ #include "postgres.h"
+
+ #include "fmgr.h"
+ #include "funcapi.h"
+ #include "miscadmin.h"
+ #include "executor/spi.h"
+ #include "lib/stringinfo.h"
+ #include "nodes/nodes.h"
+ #include "utils/builtins.h"
+ #include "utils/lsyscache.h"
+ #include "utils/syscache.h"
+ #include "utils/memutils.h"
+ #include "tsearch/ts_type.h"
+ #include "tsearch/ts_utils.h"
+ #include "catalog/pg_type.h"
+
+ #include "tsvcontent.h"
+
+ PG_MODULE_MAGIC;
+
+ PG_FUNCTION_INFO_V1(tsvcontent);
+
+ Datum
+ tsvcontent(PG_FUNCTION_ARGS)
+ {
+ FuncCallContext *funcctx;
+ TupleDesc ret_tupdesc;
+ AttInMetadata *attinmeta;
+ int call_cntr;
+ int max_calls;
+ ts_to_txt_fctx *fctx;
+ Datum result[2];
+ bool isnull[2] = { false, false };
+ MemoryContext oldcontext;
+
+ /* input value containing the TS vector */
+ TSVector         in = PG_GETARG_TSVECTOR(0);
+
+ /* stuff done only on the first call of the function */
+ if (SRF_IS_FIRSTCALL())
+ {
+ TupleDesc tupdesc;
+ int i, j;
+ char *wepv_base;
+
+ /* create a function context for cross-call persistence */
+ funcctx = SRF_FIRSTCALL_INIT();
+
+ /*
+ * switch to memory context appropriate for multiple function calls
+ */
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+ switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+ {
+ case TYPEFUNC_COMPOSITE:
+ /* success */
+ break;
+ case TYPEFUNC_RECORD:
+ /* failed to determine actual type of RECORD */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("function returning record called in context "
+ "that cannot accept type record")));
+ break;
+ default:
+ /* result type isn't composite */
+ elog(ERROR, "return type must be a row type");
+ break;
+ }
+
+ /* make sure we have a persistent copy of the tupdesc */
+ tupdesc = CreateTupleDescCopy(tupdesc);
+
+ /*
+ * Generate attribute metadata needed later to produce tuples from raw
+ * C strings
+ */
+ attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ funcctx->attinmeta = attinmeta;
+
+ /* allocate memory */
+ fctx = (ts_to_txt_fctx *) palloc(sizeof(ts_to_txt_fctx));
+
+ wepv_base = (char *)in + offsetof(TSVectorData, entries) + in->size * sizeof(WordEntry);
+
+ fctx->n_tsvt = 0;
+ for (i = 0; i < in->size; i++)
+ {
+ if (in->entries[i].haspos)
+ {
+ WordEntryPosVector *wepv = (WordEntryPosVector *)
+ (wepv_base + SHORTALIGN(in->entries[i].pos + in->entries[i].len));
+
+ fctx->n_tsvt += wepv->npos;
+ }
+ else
+ fctx->n_tsvt++;
+ }
+
+ fctx->tsvt = palloc(fctx->n_tsvt * sizeof(tsvec_tuple));
+
+ for (i = 0, j = 0; i < in->size; i++)
+ {
+ int pos = in->entries[i].pos;
+ int len = in->entries[i].len;
+
+ if (in->entries[i].haspos)
+ {
+ WordEntryPosVector *wepv = (WordEntryPosVector *)
+ (wepv_base + SHORTALIGN(pos + len));
+ uint16 npos = wepv->npos;
+ int o;
+ for (o = 0; o < npos; o++)
+ {
+ fctx->tsvt[j].txt = palloc(len + 1);
+ memcpy(fctx->tsvt[j].txt, wepv_base + pos, len);
+ fctx->tsvt[j].txt[len] = '\0';
+ fctx->tsvt[j].pos = WEP_GETPOS(wepv->pos[o]); /* drop weight bits */
+ j++;
+ }
+ }
+ else
+ {
+ fctx->tsvt[j].txt = palloc(len + 1);
+ memcpy(fctx->tsvt[j].txt, wepv_base + pos, len);
+ fctx->tsvt[j].txt[len] = '\0';
+ fctx->tsvt[j].pos = 0;
+ j++;
+ }
+ }
+
+ /* total number of tuples to be returned */
+ funcctx->max_calls = fctx->n_tsvt;
+
+ funcctx->user_fctx = fctx;
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ funcctx = SRF_PERCALL_SETUP();
+
+ call_cntr = funcctx->call_cntr;
+ max_calls = funcctx->max_calls;
+ fctx = funcctx->user_fctx;
+
+ /* attribute return type and return tuple description */
+ attinmeta = funcctx->attinmeta;
+ ret_tupdesc = attinmeta->tupdesc;
+
+ /* are there any records inside the tsvector left? */
+ if (call_cntr < max_calls && call_cntr < fctx->n_tsvt) /* do when there is more left to send */
+ {
+ HeapTuple tuple;
+
+ result[0] = DirectFunctionCall1(textin, CStringGetDatum(fctx->tsvt[call_cntr].txt));
+ result[1] = Int32GetDatum(fctx->tsvt[call_cntr].pos);
+
+ tuple = heap_form_tuple(ret_tupdesc, result, isnull);
+
+ /* send the result */
+ SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
+ }
+ else
+ {
+ /* do when there is no more left */
+ SRF_RETURN_DONE(funcctx);
+ }
+ }
+
diff -dcrpN postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.h postgresql-8.4.0/contrib/tsvcontent/tsvcontent.h
*** postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.h 1970-01-01 01:00:00.000000000 +0100
--- postgresql-8.4.0/contrib/tsvcontent/tsvcontent.h 2009-06-29 11:18:13.000000000 +0200
***************
*** 0 ****
--- 1,13 ----
+ typedef struct
+ {
+ char *txt;
+ int pos;
+ } tsvec_tuple;
+
+ typedef struct
+ {
+ int n_tsvt;
+ tsvec_tuple *tsvt;
+ } ts_to_txt_fctx;
+
+ extern Datum tsvcontent(PG_FUNCTION_ARGS);
diff -dcrpN postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.sql.in postgresql-8.4.0/contrib/tsvcontent/tsvcontent.sql.in
*** postgresql-8.4.0.old/contrib/tsvcontent/tsvcontent.sql.in 1970-01-01 01:00:00.000000000 +0100
--- postgresql-8.4.0/contrib/tsvcontent/tsvcontent.sql.in 2009-06-29 11:19:04.000000000 +0200
***************
*** 0 ****
--- 1,6 ----
+ CREATE TYPE tsvcontent AS (lex text, rank integer);
+
+ -- List words in "tsvector format" and their occurrences found in a tsvector.
+ CREATE OR REPLACE FUNCTION tsvcontent(vec tsvector) RETURNS SETOF tsvcontent
+ AS '$libdir/tsvcontent', 'tsvcontent'
+ LANGUAGE C STRICT;



Re: tsvector extraction patch

Peter Eisentraut
In reply to this post by "Hans-Jürgen Schönig (PostgreSQL)"
On Friday 03 July 2009 10:49:41 Hans-Juergen Schoenig -- PostgreSQL wrote:

> hello,
>
> this patch has not made it through yesterday, so i am trying to send it
> again.
> i made a small patch which i found useful for my personal tasks.
> it would be nice to see this in 8.5. if not core then maybe contrib.
> it transforms a tsvector to table format which is really nice for text
> processing and comparison.
>
> test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure
> this is a good patch'));
>  lex   | rank
> --------+------
> good   |    8
> patch  |    9
> pretti |    3
> sure   |    4
> (4 rows)

Sounds useful.  But in the interest of orthogonality (or whatever), how about
instead you write a cast from tsvector to text[], and then you can use
unnest() to convert that to a table, e.g.,

SELECT * FROM unnest(CAST(to_tsvector('...') AS text[]));



Re: tsvector extraction patch

Mike Rylander
In reply to this post by "Hans-Jürgen Schönig (PostgreSQL)"
On Fri, Jul 3, 2009 at 3:49 AM, Hans-Juergen Schoenig --
PostgreSQL<[hidden email]> wrote:

> hello,
>
> this patch has not made it through yesterday, so i am trying to send it
> again.
> i made a small patch which i found useful for my personal tasks.
> it would be nice to see this in 8.5. if not core then maybe contrib.
> it transforms a tsvector to table format which is really nice for text
> processing and comparison.
>
> test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure
> this is a good patch'));
> lex   | rank
> --------+------
> good   |    8
> patch  |    9
> pretti |    3
> sure   |    4
> (4 rows)
>

This looks very useful!  I wonder if providing a "weight" column would
be relatively simple?  I think this would present problems with the
cast-to-text[] idea that Peter suggests, though.

--
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  [hidden email]
 | web:  http://www.esilibrary.com


Re: tsvector extraction patch

Alvaro Herrera
Mike Rylander wrote:
> On Fri, Jul 3, 2009 at 3:49 AM, Hans-Juergen Schoenig --
> PostgreSQL<[hidden email]> wrote:

> > test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure
> > this is a good patch'));
> > lex   | rank
> > --------+------
> > good   |    8
> > patch  |    9
> > pretti |    3
> > sure   |    4
> > (4 rows)
> >
>
> This looks very useful!  I wonder if providing a "weight" column would
> be relatively simple?  I think this would present problems with the
> cast-to-text[] idea that Peter suggests, though.

Where would the weight come from?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Fwd: tsvector extraction patch

Mike Rylander
Sorry, forgot to reply-all.


---------- Forwarded message ----------
From: Mike Rylander <[hidden email]>
Date: Wed, Jul 8, 2009 at 4:17 PM
Subject: Re: [HACKERS] tsvector extraction patch
To: Alvaro Herrera <[hidden email]>


On Wed, Jul 8, 2009 at 3:38 PM, Alvaro
Herrera<[hidden email]> wrote:

> Mike Rylander wrote:
>> On Fri, Jul 3, 2009 at 3:49 AM, Hans-Juergen Schoenig --
>> PostgreSQL<[hidden email]> wrote:
>
>> > test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure
>> > this is a good patch'));
>> > lex   | rank
>> > --------+------
>> > good   |    8
>> > patch  |    9
>> > pretti |    3
>> > sure   |    4
>> > (4 rows)
>> >
>>
>> This looks very useful!  I wonder if providing a "weight" column would
>> be relatively simple?  I think this would present problems with the
>> cast-to-text[] idea that Peter suggests, though.
>
> Where would the weight come from?
>

From a tsvector column that has weights set via setweight().

--
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  [hidden email]
 | web:  http://www.esilibrary.com




Re: tsvector extraction patch

Robert Haas
In reply to this post by "Hans-Jürgen Schönig (PostgreSQL)"
On Fri, Jul 3, 2009 at 3:01 AM, Hans-Juergen Schoenig -- PostgreSQL
<[hidden email]> wrote:

> Hans-Juergen Schoenig -- PostgreSQL wrote:
>>
>> hello,
>>
>> this patch has not made it through yesterday, so i am trying to send it
>> again.
>> i made a small patch which i found useful for my personal tasks.
>> it would be nice to see this in 8.5. if not core then maybe contrib.
>> it transforms a tsvector to table format which is really nice for text
>> processing and comparison.
>>
>> test=# SELECT * FROM tsvcontent(to_tsvector('english', 'i am pretty sure
>> this is a good patch'));
>> lex   | rank
>> --------+------
>> good   |    8
>> patch  |    9
>> pretti |    3
>> sure   |    4
>> (4 rows)
>>
>>  many thanks,
>>
>>     hans

Hmm, looks like we never did anything about this.  Hans-Juergen, you
should probably update this and add it to the open CommitFest if you
want it to be considered for 8.5.

https://commitfest.postgresql.org/action/commitfest_view/open

...Robert
