Curation Cafes supporting the FAIR-ness of content on WikiPathways

Since last year, we have tried to organise a so-called Curation Cafe every month to do some "hard-core" curation on a specific topic related to the pathways in WikiPathways.
Every database needs curation to maintain the value of its data over time and to keep that data available for reuse and preservation. Since we want our database to adhere to the FAIR principles, we curate the content of WikiPathways critically in several ways. These include (described in more detail here):
* Quality tags for each pathway
* An Academy to show (new) users how pathways can be built for WikiPathways
* A dedicated Quality Assurance protocol
* Several people volunteering for the Curation team (following the QA protocol every week)
* Computer-aided curation with Jenkins
And since Sept. 2017, there has also been the Curation Cafe, where we sit together for a few hours (outside the regular office space) and work on a specific topic. Fig. 1 give…

Biomarkers of Diseases

Since November, six students from three different faculties ('Faculty of Psychology and Neuroscience', 'Department of Data Science and Knowledge Engineering' and the 'Faculty of Law') have been helping me digitise biomarker information related to inherited metabolic diseases. They are working on this project as an extracurricular activity called Honours+. What we want to do is the following:

1. Add the biological pathways related to these diseases to WikiPathways (for an example, see this pathway on Neurotransmitter Diseases).
2. Add information on biomarkers to a .ttl file on GitHub (which we can use to transform our data into an RDF-structured database).
3. Perform queries on the digitised biomarker information. Perhaps we could run a federated SPARQL query with Wikidata to find literature? Or find out whether similar biomarkers are measured for different diseases? Validate the OMIM links against the proteins/genes we can retrieve from BridgeDb? Etc.
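Step 2 can be pictured with a small sketch. Assuming each biomarker is recorded with a label, a disease and an OMIM number, the Python below writes such records as Turtle (the syntax of a .ttl file). The `ex:` namespace, the class `ex:Biomarker` and the predicates `ex:indicatesDisease` and `ex:omim` are hypothetical placeholders, not the vocabulary of the actual file on GitHub.

```python
# Minimal sketch: serialise biomarker records as Turtle triples.
# The "ex:" namespace, ex:Biomarker, ex:indicatesDisease and ex:omim are
# HYPOTHETICAL placeholders, not the project's real vocabulary.
from dataclasses import dataclass

PREFIXES = (
    "@prefix ex: <http://example.org/biomarker#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


@dataclass
class Biomarker:
    local_id: str  # simple token used as a prefixed name, e.g. "bm1"
    label: str     # human-readable name of the measured compound
    disease: str   # simple token identifying the disease
    omim: str      # OMIM number of the disease, kept as a string


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(markers: list[Biomarker]) -> str:
    """Return a Turtle document with one subject block per biomarker."""
    blocks = [PREFIXES]
    for m in markers:
        blocks.append(
            f"ex:{m.local_id} a ex:Biomarker ;\n"
            f'    rdfs:label "{escape_literal(m.label)}" ;\n'
            f"    ex:indicatesDisease ex:{m.disease} ;\n"
            f'    ex:omim "{escape_literal(m.omim)}" .\n'
        )
    return "\n".join(blocks)
```

For example, `print(to_turtle([Biomarker("bm1", "example compound", "disease1", "placeholder")]))` prints the two prefix lines followed by one subject block (all values there are dummies). Local identifiers are assumed to be simple tokens (letters, digits, underscores) so that they form valid prefixed names; only quotes and backslashes are escaped in literals.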

Ideas and thoughts are welcome! And the progre…

Nice to be noticed!

A few weeks back, I got an account for PubMed Commons, which can be used to add comments to articles listed in the PubMed database. I used my account to connect pathways created in PathVisio and available in WikiPathways with the articles from which the pathways originated (for examples, see "the mevalonate arm of cholesterol biosynthesis pathway" or a more specific part of this pathway extended with drugs that act as inhibitors of several proteins). When you click on the reference links below the pathways (under Bibliography in WikiPathways), you go directly to the PubMed database, where the comments are listed under the abstract of the article; see, for example, the article used for the drug inhibitors in the cholesterol pathway.

By linking the original articles to their machine-readable counterparts (at least for the pathway figures), other researchers who are interested in an article can directly see whether the pathway it describes is available for data analysis. Since …

Significant drug interactions with narcotics

Just a small blog post before the weekend starts... Since I supervised a student from the Forensic Science Master's programme (University of Amsterdam) during her literature thesis on Volatile Organic Compounds from narcotics in breath (presenting her findings on Dec. 13th, 2017, 13:45-14:30, Location A1.06), I wanted to investigate how much information I can find on narcotics in Wikidata. However, there was no label to identify which compounds are listed under the Opium law (the Dutch law on prohibited substances, which you are allowed to use, but not to sell/traffic etc.). So, I added this label for the compounds from List 1 and 2 (they differ in harm, but all compounds on these lists are considered to be narcotics) that had an International Nonproprietary Name (INN) listed (Note: I hope to get to the compounds that do not have an INN yet in the near future). So, after I did all this manual work (261 unique chemical compounds), I wanted to see what I could do with all th…

Physical interactions of compounds

New day, new blog. Since I like the bubble chart visualisation from Wikidata, I tried to see what else I could do with it. This time, I am looking at all chemical compounds that physically interact with another compound (an idea I got after reading an interesting paper on Key Characteristics of Carcinogens). When I do this for all chemical compounds in Wikidata, I get the following visualisation (click on the link for a direct visualisation [query 1]):
That looks like a big hairball of information to me, so I narrowed down the search results to metabolites that are present in WikiPathways (click on the link for a direct visualisation [query 2]):

So, I think that we are doing all right in WikiPathways concerning metabolites that physically interact with another one. However, it would be nice to include how these compounds interact with each other (agonist/antagonist) and to group them according to these roles. Hopefully I will try that later.
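The difference between the hairball and the narrowed-down chart comes down to a little bookkeeping. A minimal Python sketch, assuming the query results have already been exported as (compound, partner) pairs, for instance from Wikidata's "physically interacts with" property (P129): it merges A-B and B-A into one interaction, optionally keeps only compounds from a given set (such as the metabolites in WikiPathways), and counts distinct partners per compound, the kind of value a bubble chart can use for bubble size. The function name and the export step are illustrative assumptions and are not taken from the linked queries.

```python
# Sketch: turn (compound, partner) pairs into partner counts per compound.
# Assumes the pairs were exported beforehand (e.g. from a query on P129);
# partner_counts is a hypothetical helper, not part of any Wikidata tool.
from collections import Counter
from typing import Iterable, Optional


def partner_counts(
    pairs: Iterable[tuple[str, str]],
    keep: Optional[set[str]] = None,
) -> Counter:
    """Count distinct interaction partners per compound.

    A-B and B-A are treated as one undirected interaction, so a statement
    recorded on both items is not counted twice. If `keep` is given, only
    interactions where both compounds are in `keep` are counted.
    """
    edges = set()
    for a, b in pairs:
        if a == b:
            continue  # ignore self-interactions
        if keep is not None and (a not in keep or b not in keep):
            continue  # drop interactions outside the selected set
        edges.add(frozenset((a, b)))

    counts: Counter = Counter()
    for edge in edges:
        for compound in edge:
            counts[compound] += 1
    return counts
```

Grouping by agonist/antagonist role, as wished for above, would need an extra role field per pair; the sketch does not model that.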
If you want to try th…

More details on amino acids from Wikidata

From the previous blog post it was apparent that Wikidata could use some additional information on amino acids. I added some content, for example which triplet (codon) codes for which amino acid (the active L-forms only) and which amino acids are considered to be essential (and have to be taken in via the diet). There are also amino acids that are considered to be dispensable (non-essential), because the human body can synthesise them itself. I wondered in which pathways I could find these non-essential amino acids and did a query on it in Wikidata:

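 SELECT ?ID ?IDLabel (COUNT(DISTINCT ?PWID) AS ?count) WHERE {
  ?ID wdt:P279 wd:Q8066 .          # subclass of amino acid
  ?ID wdt:P279 wd:Q44266770 .      # subclass of the non-essential amino acid class
  ?PWID wdt:P31 wd:Q4915012 .      # ?PWID is an instance of biological pathway
  ?PWID wdt:P527 ?ID .             # the pathway has this amino acid as a part
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
 }
 GROUP BY ?ID ?IDLabel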

This revealed that there are 5 non-essential amino acids, but only 4 of them can be found in multiple pathways in WikiPathways:

Query results (columns: ID, IDLabel, count)
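To reuse results like these outside the Query Service, one option is to run the query from a script. The Python sketch below POSTs a query to the public endpoint https://query.wikidata.org/sparql, flattens the standard SPARQL 1.1 JSON results format into plain dictionaries, and keeps the amino acids whose pathway count is above one. The helper names and the User-Agent string are placeholders, and the variable names `IDLabel` and `count` assume the query selects them under those names.

```python
# Sketch: query the Wikidata Query Service and post-process the JSON results.
# run_query, flatten and in_multiple_pathways are hypothetical helper names.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def run_query(query: str) -> dict:
    """POST a SPARQL query (form-encoded) and return the decoded JSON response."""
    data = urllib.parse.urlencode({"query": query}).encode()
    request = urllib.request.Request(
        ENDPOINT,
        data=data,
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia asks clients to identify themselves; placeholder value.
            "User-Agent": "amino-acid-example/0.1 (placeholder contact)",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def flatten(results: dict) -> list[dict[str, str]]:
    """Map each result row to {variable: value}; unbound variables are absent."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def in_multiple_pathways(rows: list[dict[str, str]]) -> list[str]:
    """Return the labels of rows whose pathway count is greater than one."""
    return [row["IDLabel"] for row in rows if int(row["count"]) > 1]
```

For example, `in_multiple_pathways(flatten(run_query(query)))`, with `query` holding the query text, would return the labels of the amino acids found in more than one pathway. Every value in the JSON results arrives as a string, hence the `int()` conversion.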


Yesterday I worked on annotating a data set that contained lots of amino acids. For this annotation I made use of the Wikidata database. The fun thing about this database is that it is very structured. For example, you can run SPARQL queries on it (and even though I wasn't familiar with these before I started my PhD, I rather enjoy them now). Below is an example query that gives all proteinogenic coding L-amino acids (the active forms of the amino acids, which are built into proteins through translation).

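   SELECT ?ID ?IDLabel WHERE {
     # subclasses of amino acid (Q8066) that are also subclasses of Q24301658 and Q3241589
     ?ID wdt:P279 wd:Q8066 .
     ?ID wdt:P279 wd:Q24301658 .
     ?ID wdt:P279 wd:Q3241589 .
     SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
   }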

This gave me 19 results (in 244 ms), and to my surprise there was a mistake in them: D-isoleucine was labelled as an L-amino acid. So I went into Wikidata again and fixed this issue (which was quite easy since the result from the query cont…
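A mistake like the D-isoleucine one can also be caught automatically. This is a minimal sketch that assumes the result labels follow the usual "L-"/"D-" naming. It flags every label whose stereo prefix differs from the one expected for the queried class. Labels without such a prefix (glycine, for instance, is achiral) or with a combined prefix such as "DL-" are not flagged. That makes it a cheap first check on query results, not a full validation of the subclass statements. The function name is hypothetical.

```python
# Sketch: flag result labels whose D-/L- prefix contradicts the queried class.
# Assumes labels use the conventional "L-..." / "D-..." naming.


def suspicious_labels(labels: list[str], expected: str = "L") -> list[str]:
    """Return labels carrying a D- or L- prefix other than `expected`."""
    flagged = []
    for label in labels:
        prefix, sep, _ = label.partition("-")
        # Only a bare "D" or "L" before the first hyphen counts as a stereo prefix.
        if sep and prefix in {"D", "L"} and prefix != expected:
            flagged.append(label)
    return flagged
```

Run on the labels of an L-amino acid query, a result such as "D-isoleucine" would be returned for manual inspection in Wikidata.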