Ideas for building a system that parses medical research publications/articles


Achilleas Mantzios
Hello

I am imagining a system that can parse papers from various sources
(web, files, etc.) and in various formats (text, PDF, etc.) and can store
metadata for each paper: some kind of global ID if applicable, authors,
areas of research, whether the paper is "new", "highlighted", or
"historical", its type (e.g. case reports, clinical trials), symptoms (e.g.
tics, GI pain, psychological changes, anxiety), and other key
attributes (dynamic, I guess). It must also be full-text searchable.
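
To make the attribute list above concrete, here is a minimal Python sketch of one possible metadata record. Every name in it (`PaperMetadata`, `Status`, `source_url`, `extra`) is a hypothetical choice made for illustration, not part of any existing tool, and the `extra` dictionary is only one way the "dynamic" attributes could be held.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Status(str, Enum):
    """The three flags named in the post (hypothetical enum)."""
    NEW = "new"
    HIGHLIGHTED = "highlighted"
    HISTORICAL = "historical"


@dataclass
class PaperMetadata:
    """One paper's metadata; all field names are illustrative assumptions."""
    title: str
    source_url: str                      # assumed: where the paper can be read
    global_id: Optional[str] = None      # some kind of global ID, if applicable
    authors: List[str] = field(default_factory=list)
    research_areas: List[str] = field(default_factory=list)
    status: Status = Status.NEW
    paper_type: Optional[str] = None     # e.g. "case report", "clinical trial"
    symptoms: List[str] = field(default_factory=list)
    extra: Dict[str, object] = field(default_factory=dict)  # dynamic attributes

    def to_json(self) -> str:
        # Status subclasses str, so json serialises it as its plain value.
        return json.dumps(asdict(self), sort_keys=True)


paper = PaperMetadata(
    title="Example paper",
    source_url="https://example.org/papers/1",
    paper_type="case report",
    symptoms=["tics", "anxiety"],
    extra={"cohort_size": 12},
)
print(paper.to_json())
```

A JSON document like this could be kept in a single column next to a few fixed columns, which leaves room for attributes that are not known in advance.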

I am at the very beginning in this and it is done on a fully volunteer
basis.

Lots of questions: is there any scientific/scholarly analysis software
already available? If so, and it is really good and open source, then this
will influence the rest of the decisions. Otherwise, I'll have to form a
team that can write one, in which case I'll have to decide on the DB,
language, etc. I have worked with pgsql for 20 years, so it is the natural
choice for any kind of data; I just ask this for the sake of completeness.

All ideas welcome.




Re: Ideas for building a system that parses medical research publications/articles

Laura Smith

Hello Achilleas

Not wishing to be discouraging, but you have very ambitious goals for what sounds like a one-person project?

You are effectively looking at competing with platforms such as Elsevier Scopus/SciVal, which are market leaders in the area for good reason (i.e. it takes a lot of manpower to write algorithms, manage metadata, etc., and the only way to consistently maintain that manpower is to employ people, lots of them). There are also things like Google Scholar around.

I think before starting on the technical side of Postgres etc., the honest truth is that you need to do more planning, both in terms of implementation and long-term sustainability.

For example, before we even get to metadata, you talk of various sources and formats. Have you considered licensing issues? Have you considered how to keep the dataset clean? (If you are thinking you can just scrape the web, then you'll be in for a surprise.)

Laura



Re: Ideas for building a system that parses medical research publications/articles

Achilleas Mantzios
On 5/6/21 1:52 PM, Laura Smith wrote:

> Hello Achilleas
>
> Not wishing to be discouraging, but you have very ambitious goals for what sounds like a one-person project ?
>
> You are effectively looking at competing with platforms such as Elsevier Scopus/Scival which are market-leaders in the area for good reason (i.e. it takes a lot of manpower to write algorithms, manage metadata etc., and the only way to consistently maintain that manpower is to employ people, lots of them).   There are also things like Google Scholar around the place.
>
> I think before starting on the technical side of Postgres etc., the honest truth is that you need to do more planning, both in terms of implementation and long-term sustainability.
>
> For example, before we even get to metadata, you talk of various sources and formats.  Have you considered licensing issues ?  Have you considered how to keep the dataset clean ? (If you are thinking you can just scrape the web, then you'll be in for a surprise).

All I have are some very vague descriptions coming from people on either
the advocacy side or the medical side.

I have no idea about the legal status of those documents; as you know, some
are covered by the artistic license (a few in PubMed), some are not.

I am not a lawyer. The data are not to be stored locally AFAIK, so only
metadata will be kept locally, and it can be reset, refreshed, amended, etc.

Parsing will be equivalent to a one-off human reading of the article on the
web. There is a lawyer handling all of that. Of the whole network of people
interested in this endeavor, I am the only one with DB/software
knowledge, which is why I volunteered.

I know it's a huge amount of work, but you are missing a point. Nobody wishes to
compete with anyone. This is about a project, a parent-advocacy
non-profit that *ONLY* aims to save the sick children (or maybe also
very young adults) of a certain spectrum. So the goal is to make the
right tools for researchers, clinicians and parents. This market is too
small to even consider making any money out of it, but the research is
still very expensive and the progress slower than optimal.

> Laura



Re: Ideas for building a system that parses medical research publications/articles

Karsten Hilbert
In reply to this post by Achilleas Mantzios
> I am imagining a system that can parse papers from various sources
> (web/files/etc) and in various formats (text, pdf, etc) and can store
> metadata for this paper ,some kind of global ID if applicable, authors,
> areas of research, whether the paper is "new", "highlighted",
> "historical", type

Those three categories won't help much. I'm sure, though, that you had
something specific in mind with them?

Karsten



Re: Ideas for building a system that parses medical research publications/articles

Laura Smith
In reply to this post by Achilleas Mantzios




‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, 5 June 2021 12:14, Achilleas Mantzios <[hidden email]> wrote:


>
> I know it's a huge amount of work, but you are missing a point. Nobody wishes to
> compete with anyone. This is about a project, a parent-advocacy
> non-profit that ONLY aims to save the sick children (or maybe also
> very young adults) of a certain spectrum. So the goal is to make the
> right tools for researchers, clinicians and parents. This market is too
> small to even consider making any money out of it, but the research is
> still very expensive and the progress slower than optimal.


Unfortunately, I'm not "missing a point"; your final paragraph summarises your position.

You have been taken in by the very charitable goal of saving sick children.

Unfortunately, your head has been disconnected from your heart.

If we put the charitable purpose to one side and take a purely objective look at what you want to do, my original statement still stands, i.e. it is certain that you are grossly underestimating the technical and practical complexities of what you want to achieve.



Re: Ideas for building a system that parses medical research publications/articles

Vijaykumar Jain-2
To get started with collecting document metadata, it looks like this tool can help.
Postgres does support fuzzy text search, so I think dumping the metadata/abstracts into PostgreSQL and then using extensions such as trigram, tsearch, etc. should work well for a POC.
This being a pg mailing list :), what are your expectations for the type of data, the growth of the data, and your queries?
If you store data to support papers in multiple languages, will PostgreSQL be able to handle it?
Ideally the documents would be stored somewhere on object storage, and a link to each one would be stored in the DB for when someone requests to read the whole paper.
A long time ago I read this.
So if this could work, your POC should too :) with PostgreSQL.
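
As a rough illustration of the trigram idea mentioned above, the following pure-Python sketch approximates how trigram similarity is usually described for PostgreSQL's `pg_trgm` extension: lower-case each word, pad it with two leading spaces and one trailing space, and divide the number of shared trigrams by the number of distinct trigrams in both strings. It is not the extension's implementation, and the record layout (`title`, `url`) and the `search` helper are assumptions made only for this example. The default threshold of 0.3 mirrors the documented default of `pg_trgm.similarity_threshold`.

```python
import re
from typing import Dict, List, Set, Tuple


def trigrams(text: str) -> Set[str]:
    """Return the set of padded, lower-cased trigrams for every word in text."""
    grams: Set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "   # two spaces before, one after
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def search(query: str, records: List[Dict[str, str]],
           threshold: float = 0.3) -> List[Tuple[float, Dict[str, str]]]:
    """Rank records whose title is at least `threshold` similar to the query."""
    scored = [(similarity(query, rec["title"]), rec) for rec in records]
    hits = [item for item in scored if item[0] >= threshold]
    return sorted(hits, key=lambda item: item[0], reverse=True)


# Hypothetical records: only metadata plus a link to wherever the document lives.
records = [
    {"title": "Immunoglobulin therapy", "url": "https://example.org/store/a.pdf"},
    {"title": "Unrelated topic", "url": "https://example.org/store/b.pdf"},
]
for score, rec in search("imunoglobulin", records):   # note the misspelling
    print(round(score, 2), rec["title"], rec["url"])
```

In a real POC the database would do this work through an index; the sketch only shows why a misspelled query can still find the intended title.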




--
Thanks,
Vijay
Mumbai, India

Re: Ideas for building a system that parses medical research publications/articles

Adrian Klaver-4
In reply to this post by Achilleas Mantzios
A quick search found this:

https://solutionsreview.com/data-management/the-best-open-source-data-catalog-tools-to-consider/

Might be a good starting point on what is already out there.

There is also this:

The Directory of Open Access Journals
https://doaj.org/

It seems to be a service, not downloadable software.




--
Adrian Klaver
[hidden email]



Re: Ideas for building a system that parses medical research publications/articles

Achilleas Mantzios

On 5/6/21 6:34 PM, Adrian Klaver wrote:

> A quick search found this:
>
> https://solutionsreview.com/data-management/the-best-open-source-data-catalog-tools-to-consider/ 
>
>
> Might be a good starting point on what is already out there.

This is interesting, so the keywords are "Data Catalog" ?

>
> There is also this:
>
> The Directory of Open Access Journals
> https://doaj.org/
>
This seems very very poor. Just try a search there and then repeat in
PMC (PubMed Central).

Re: Ideas for building a system that parses medical research publications/articles

Achilleas Mantzios
In reply to this post by Vijaykumar Jain-2


On 5/6/21 4:45 PM, Vijaykumar Jain wrote:

> To get started with collecting doc metadata. It looks this tool can help you started.

I checked: it behaves better with downloaded PDFs than with PDFs referenced by URL; in the second case the metadata are poor.

It does not work with NIH articles (but this is a general problem, not Tika's).



Re: Ideas for building a system that parses medical research publications/articles

Adrian Klaver-4
In reply to this post by Achilleas Mantzios
On 6/5/21 9:56 AM, Achilleas Mantzios wrote:

> This is interesting, so the keywords are "Data Catalog" ?

What I searched on was 'open source article catalog'.

>
>>
>> There is also this:
>>
>> The Directory of Open Access Journals
>> https://doaj.org/
>>
> This seems very very poor. Just try a search there and then repeat in
> PMC (PubMed Central).

This is down to copyright issues, I'm sure. For PubMed Central see:

https://www.ncbi.nlm.nih.gov/pmc/about/copyright/

for the ifs/ands/buts that restrict what you can do with the information
and stay legal.


--
Adrian Klaver
[hidden email]



Re: Ideas for building a system that parses medical research publications/articles

Achilleas Mantzios

On 5/6/21 8:03 PM, Adrian Klaver wrote:

> This is down to copyright issues I'm sure. For PubMed Central see:
>
> https://www.ncbi.nlm.nih.gov/pmc/about/copyright/
>
> for the if/ands/buts that restrict what you can do with the
> information and stay legal.

Maybe, but still:

https://www.ncbi.nlm.nih.gov/pmc/?term=open+access%5Bfilter%5D+PANDAS+IVIG

 >

https://doaj.org/search/articles?ref=homepage-box&source=%7B%22query%22%3A%7B%22query_string%22%3A%7B%22query%22%3A%22IVIG%20PANDAS%22%2C%22default_operator%22%3A%22AND%22%7D%7D%7D


Re: Ideas for building a system that parses medical research publications/articles

Adrian Klaver-4
On 6/5/21 10:39 AM, Achilleas Mantzios wrote:

> maybe but still :
>
> https://www.ncbi.nlm.nih.gov/pmc/?term=open+access%5Bfilter%5D+PANDAS+IVIG

Yeah it is nice to have the resources of the NIH behind you. Still I
would point out under Copyright and License information:

"This article is made available via the PMC Open Access Subset for
unrestricted research re-use and secondary analysis in any form or by
any means with acknowledgement of the original source. These permissions
are granted for the duration of the World Health Organization (WHO)
declaration of COVID-19 as a global pandemic."

Further on PMC Open Access Subset:

https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/

Again more ifs/ands/buts.

The point being, dealing with articles is a descent into legalese. I am
not saying this is a show stopper, just that it will consume considerable
resources to sort out. I, for one, applaud your effort, and given what I
have seen you do with the shipping software over the years, I don't see
this project as out of the realm of possibility.



--
Adrian Klaver
[hidden email]



Re: Ideas for building a system that parses medical research publications/articles

Achilleas Mantzios

On 5/6/21 10:12 PM, Adrian Klaver wrote:

> The point being, dealing with articles is a descent into legalese.  I
> am not saying this is show stopper, just that it will consume
> considerable resources to sort out. I for one applaud your effort and
> given what I have seen you do with the shipping software over the
> years I don't see this project as out of the realm of possibility.
Thank you, Adrian. There is no money in this project, but the stakes are
much, much higher.


RE: Ideas for building a system that parses medical research publications/articles [EXT]

Daniel Perrett
In reply to this post by Achilleas Mantzios
I think the key word here that will help you is "biocuration". It is an established field involving people with scientific, computational, and linguistic backgrounds who are familiar with the problem space, so I would suggest talking to people working in this area first to get an idea of what's feasible, what's already out there, etc., as they will know this better than the Postgres community.

You can see an example of the sort of annotation that is fully automated at the moment here:

https://monarchinitiative.org/tools/text-annotate

Given the potential impact on human health, some level of manual involvement in annotation is frequently part of the workflow.
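
To give a feel for what the simplest form of automated annotation looks like, here is a toy dictionary-based tagger in Python. It is only a sketch under stated assumptions: the vocabulary and the `annotate` function are made up for this example (the symptom labels come from the original post), and it does not reflect how the annotator linked above works. Real biocuration pipelines rely on curated ontologies and, as noted, on human review.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical vocabulary: label -> phrases that should map to it.
VOCABULARY: Dict[str, List[str]] = {
    "tics": ["tic", "tics"],
    "GI pain": ["gi pain", "abdominal pain", "stomach pain"],
    "anxiety": ["anxiety", "anxious"],
}


def annotate(text: str,
             vocabulary: Dict[str, List[str]] = VOCABULARY
             ) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) for each whole-word phrase match, in text order."""
    found: List[Tuple[int, int, str]] = []
    for label, phrases in vocabulary.items():
        for phrase in phrases:
            # \b keeps "tic" from matching inside words such as "arthritic".
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append((match.start(), match.end(), label))
    return sorted(found)


sentence = ("The child developed tics and severe anxiety; "
            "abdominal pain was also reported.")
for start, end, label in annotate(sentence):
    print(label, "<-", sentence[start:end])
```

Each hit keeps its character offsets, so a human reviewer could be shown the exact span before an annotation is accepted.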

Daniel
