kind of a bag of attributes in a DB . . .


kind of a bag of attributes in a DB . . .

Albretch Mueller-3
Say, you get lots of data and their corresponding metadata, which in
some cases may be undefined or undeclared (left as an empty string).
Think of youtube json files or the result of the "file" command.

I need to be able to search that metadata "instantly" and get some
metrics out of it, and I think DBs are best for such jobs.

I know this is not exactly a kosher way to deal with data which can't
be represented in a nice tabular form, but I don't find the idea that
half way off either.

What is the pattern, anti-pattern or whatever relating to such design?

Do you know of such implementations with such data?

lbrtchx



Re: kind of a bag of attributes in a DB . . .

Adrian Klaver-4
On 9/7/19 5:45 AM, Albretch Mueller wrote:
> Say, you get lots of data and their corresponding metadata, which in
> some cases may be undefined or undeclared (left as an empty string).
> Think of youtube json files or the result of the "file" command.
>
> I need to be able to search that metadata "instantly" and get some
> metrics out of it, and I think DBs are best for such jobs.

Is the metadata uniform or are you dealing with a variety of different data?


>
> I know this is not exactly a kosher way to deal with data which can't
> be represented in a nice tabular form, but I don't find the idea that
> half way off either.
>
> What is the pattern, anti-pattern or whatever relating to such design?
>
> Do you know of such implementations with such data?
>
> lbrtchx
>
>
>


--
Adrian Klaver
[hidden email]



Re: kind of a bag of attributes in a DB . . .

Chris Travers-5
In reply to this post by Albretch Mueller-3


On Sat, Sep 7, 2019 at 5:17 PM Albretch Mueller <[hidden email]> wrote:
> Say, you get lots of data and their corresponding metadata, which in
> some cases may be undefined or undeclared (left as an empty string).
> Think of youtube json files or the result of the "file" command.
>
> I need to be able to search that metadata "instantly" and get some
> metrics out of it, and I think DBs are best for such jobs.
>
> I know this is not exactly a kosher way to deal with data which can't
> be represented in a nice tabular form, but I don't find the idea that
> half way off either.
>
> What is the pattern, anti-pattern or whatever relating to such design?
>
> Do you know of such implementations with such data?

We store debug logs as JSONB with some indexing. It works in some limited cases, but you need a good sense of the index possibilities and of how the indexes actually work.


> lbrtchx




--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: kind of a bag of attributes in a DB . . .

Albretch Mueller-3
In reply to this post by Adrian Klaver-4
On 9/7/19, Adrian Klaver <[hidden email]> wrote:
> Is the metadata uniform or are you dealing with a variety of different
> data?

 You can expect all files to have a filename and size, but their
kinds (the metadata describing them) can be really colorful and wild
when it comes to formatting.

 lbrtchx



Re: kind of a bag of attributes in a DB . . .

Adrian Klaver-4
On 9/10/19 9:59 AM, Albretch Mueller wrote:
> On 9/7/19, Adrian Klaver <[hidden email]> wrote:
>> Is the metadata uniform or are you dealing with a variety of different
>> data?
>
>   You can expect all files to have a filename and size, but their
> kinds (the metadata describing them) can be really colorful and wild
> when it comes to formatting.

If there is no rhyme or reason to the metadata I am not sure how you
could come up with an efficient search strategy. Seems it would be a
brute search over everything.

>
>   lbrtchx
>


--
Adrian Klaver
[hidden email]



Re: kind of a bag of attributes in a DB . . .

Albretch Mueller-3
On 9/10/19, Adrian Klaver <[hidden email]> wrote:
> If there is no rhyme or reason to the metadata I am not sure how you
> could come up with an efficient search strategy. Seems it would be a
> brute search over everything.

 Not exactly. Say some things have colours but no weight. You could
still group the others as being "weighty" and then tell how heavy they
are; with the colourful ones you could specify the colours and then see
if there is some correlation between weights and colours ...
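
The grouping idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the attribute names "colour" and "weight", and the helper functions are all made up for the example, and an empty string is treated the same as a missing attribute.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records: every item has a name, everything else is optional.
ITEMS = [
    {"name": "a", "colour": "red", "weight": 2.0},
    {"name": "b", "colour": "red"},
    {"name": "c", "weight": 5.0},
    {"name": "d", "colour": "blue", "weight": 4.0},
    {"name": "e", "colour": ""},  # declared but left as an empty string
]


def has_value(item, key):
    """True if the attribute is present and not an empty string."""
    return item.get(key) not in (None, "")


def group_by_presence(items, key):
    """Split items into those that define `key` (True) and those that do not (False)."""
    groups = {True: [], False: []}
    for item in items:
        groups[has_value(item, key)].append(item)
    return groups


def mean_weight_by_colour(items):
    """Average weight per colour, using only items that define both attributes."""
    buckets = defaultdict(list)
    for item in items:
        if has_value(item, "colour") and has_value(item, "weight"):
            buckets[item["colour"]].append(item["weight"])
    return {colour: mean(weights) for colour, weights in buckets.items()}
```

Here group_by_presence(ITEMS, "weight")[True] holds the "weighty" items, and mean_weight_by_colour(ITEMS) is one simple metric relating the two attributes.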

 lbrtchx



Re: kind of a bag of attributes in a DB . . .

Adrian Klaver-4
On 9/11/19 9:46 AM, Albretch Mueller wrote:
> On 9/10/19, Adrian Klaver <[hidden email]> wrote:
>> If there is no rhyme or reason to the metadata I am not sure how you
>> could come up with an efficient search strategy. Seems it would be a
>> brute search over everything.
>
>   Not exactly. Say some things have colours but no weight. You could
> still group the others as being "weighty" and then tell how heavy they
> are; with the colourful ones you could specify the colours and then see
> if there is some correlation between weights and colours ...

It would help to see some sample data, otherwise any answer would be
pure speculation.

>
>   lbrtchx
>


--
Adrian Klaver
[hidden email]



Re: kind of a bag of attributes in a DB . . .

Albretch Mueller-3
 Just download a bunch of JSON info files from YouTube data feeds.

 Actually, does PostgreSQL have a JSON driver or import feature?

 The metadata contained in JSON files would require more than one
small database, but such an import feature should be trivial.

 C



Re: kind of a bag of attributes in a DB . . .

Adrian Klaver-4
On 9/14/19 2:06 AM, Albretch Mueller wrote:
>   Just download a bunch of JSON info files from YouTube data feeds.
>
>   Actually, does PostgreSQL have a JSON driver or import feature?

Not sure what you mean by the above.

Postgres has json(b) data types that you can import JSON into:

https://www.postgresql.org/docs/11/datatype-json.html
>
>   The metadata contained in JSON files would require more than one
> small database, but such an import feature should be trivial.

Again, not sure I understand why small databases are required?

>
>   C
>


--
Adrian Klaver
[hidden email]



Re: kind of a bag of attributes in a DB . . .

Adrian Klaver-4
In reply to this post by Albretch Mueller-3
On 9/14/19 2:06 AM, Albretch Mueller wrote:
>   Just download a bunch of JSON info files from YouTube data feeds.
>
>   Actually, does PostgreSQL have a JSON driver or import feature?

I'm working without a net (coffee), so I forgot to mention that for
Python there is:

http://initd.org/psycopg/docs/extras.html?highlight=json

Not sure if this is what you are looking for or not.
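
For illustration, a minimal sketch of loading JSON files into a jsonb column with the Json adapter from psycopg2.extras (the module documented at the link above). The table name file_meta, its columns, the index name and the helper functions are all hypothetical, and the two database functions assume a live psycopg2 connection to a PostgreSQL server recent enough to have jsonb; treat this as a starting point, not a tested importer.

```python
import json
from pathlib import Path

# Hypothetical schema: fixed columns for what every file has (name, size),
# one jsonb column for the free-form metadata. A GIN index is the index type
# the PostgreSQL docs describe for jsonb containment (@>) searches.
DDL = """
CREATE TABLE IF NOT EXISTS file_meta (
    id       bigserial PRIMARY KEY,
    filename text   NOT NULL,
    size     bigint NOT NULL,
    meta     jsonb  NOT NULL
);
CREATE INDEX IF NOT EXISTS file_meta_meta_idx ON file_meta USING GIN (meta);
"""

INSERT_SQL = "INSERT INTO file_meta (filename, size, meta) VALUES (%s, %s, %s)"
SEARCH_SQL = "SELECT filename FROM file_meta WHERE meta @> %s::jsonb"


def read_rows(paths):
    """Yield (filename, size in bytes, parsed JSON) for each path."""
    for path in map(Path, paths):
        text = path.read_text(encoding="utf-8")
        yield path.name, path.stat().st_size, json.loads(text)


def load(conn, paths):
    """Create the table if needed and insert one row per JSON file."""
    from psycopg2.extras import Json  # third-party, imported lazily

    with conn.cursor() as cur:
        cur.execute(DDL)
        for filename, size, meta in read_rows(paths):
            cur.execute(INSERT_SQL, (filename, size, Json(meta)))
    conn.commit()


def search(conn, fragment):
    """Return filenames whose metadata contains the given JSON fragment."""
    from psycopg2.extras import Json

    with conn.cursor() as cur:
        cur.execute(SEARCH_SQL, (Json(fragment),))
        return [row[0] for row in cur.fetchall()]
```

With a connection in hand, something like load(conn, ["video.info.json"]) followed by search(conn, {"uploader": "someone"}) would be the intended use; the file name and the "uploader" key are made up for the example.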

>
>   The metadata contained in JSON files would require more than one
> small database, but such an import feature should be trivial.
>
>   C
>


--
Adrian Klaver
[hidden email]



Re: kind of a bag of attributes in a DB . . .

Chris Travers-5
In reply to this post by Albretch Mueller-3


On Sat, Sep 14, 2019 at 5:11 PM Albretch Mueller <[hidden email]> wrote:
> Just download a bunch of JSON info files from YouTube data feeds.
>
> Actually, does PostgreSQL have a JSON driver or import feature?

Sort of. There are a number of features around the JSON and JSONB data types which could be useful.

> The metadata contained in JSON files would require more than one
> small database, but such an import feature should be trivial.

It is not at all trivial for a bunch of reasons inherent to the JSON specification.  How to handle duplicate keys, for example.

However writing an import for JSON objects into a particular database is indeed trivial. 
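
To make the duplicate-key point concrete: the JSON RFC only says that names within an object should be unique, so each parser picks its own behaviour. Python's json module silently keeps the last value, and the PostgreSQL docs describe the same last-value-wins behaviour for jsonb (while json stores the input text as given). An importer that would rather fail loudly can do something like the sketch below; the function names and the sample document are made up for the example.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises if a key repeats within one JSON object."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def parse_strict(text):
    """Parse JSON, refusing objects that contain the same key twice."""
    return json.loads(text, object_pairs_hook=reject_duplicates)


# Hypothetical input with a repeated key.
DOC = '{"colour": "red", "colour": "blue"}'
```

json.loads(DOC) quietly returns {"colour": "blue"}, whereas parse_strict(DOC) raises ValueError. The same key at different nesting levels is not a duplicate and parses normally.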

> C




--
Best Wishes,
Chris Travers
