Bio2RDF atlas of post genomic knowledge
http://bio2rdf.blogspot.com/feeds/posts/default

Bio2RDF: Linked Data for the Life Sciences



The Bio2RDF project uses open-source Semantic Web technologies to provide interlinked life science data to support biological knowledge discovery. Using both syntactic and semantic data integration techniques, Bio2RDF puts into practice a simple methodology [...]



Updated: 2017-09-24T18:47:45.535-04:00

 



Happy 10th birthday, Bio2RDF, and welcome to its 500th citation!

2015-12-22T13:00:42.020-05:00

That's it: 10 years of the http://bio2rdf.org linked data service returning RDF from dereferenceable URIs according to the Linked Data principles.

It all started on November 23, 2005



and here we are today with the 500th citation.








KaBOB VS Bio2RDF

2015-05-09T15:17:37.613-04:00

KaBOB: ontology-based semantic integration of biomedical databases
http://www.biomedcentral.com/1471-2105/16/126/abstract

KaBOB's recent paper describes how a mashup was created from 14 ontologies and 18 data sources converted to RDF, all loaded into a triplestore that has not been made public. It is great work: a well-designed mashup based on ontologies and data normalization, a quality standard never really reached in Bio2RDF's triplestores. But it is not available to the bioinformatics community, and rebuilding it from scratch is a lot of work.

The first step of my hackathon project is to rebuild such a mashup from the same data collection and expose it on the web as linked data, using the kabob.bio2rdf.org namespace. In the past I would have created a triplestore for it; Virtuoso can easily handle a 500-million-triple beast. This time I will try something different: Elasticsearch for storage, with Kibana as the user interface, available at http://melina.bio2rdf.org.

KaBOB currently imports the following 14 ontologies:
1. Basic Formal Ontology (BFO) [9]
2. BRENDA Tissue / Enzyme Source (BTO) [10]
3. Chemical Entities of Biological Interest (ChEBI) [11] (54,838 from ONTOBEE)
4. Cell Type Ontology (CL) [12]
5. Gene Ontology, including biological process, molecular function, and cellular component (GO) [7] (42,807 from ONTOBEE)
6. Information Artifact Ontology (IAO) [6]
7. Protein-Protein Interaction Ontology (MI) [13]
8. Mammalian Phenotype Ontology (MP) [14]
9. NCBI Taxonomy [15]
10. Ontology for Biomedical Investigation (OBI) [16]
11. Protein Modification (MOD) [17]
12. Protein Ontology (PR) [18]
13. Relation Ontology (RO) [19]
14. Sequence Ontology (SO) [8]

KaBOB currently imports data from the following 18 data sources:
1. Database of Interacting Proteins (DIP) [20]
2. DrugBank [21] (19,844 from Bio2RDF)
3. Genetic Association Database (GAD) [22] ()
4. UniProt Gene Ontology Annotation (GOA) [23]
5. HUGO Gene Nomenclature Committee (HGNC) [24] (43,407 from Bio2RDF)
6. HomoloGene [25] (18,712 from Bio2RDF)
7. Human Protein Reference Database (HPRD) [26]
8. InterPro [27] (25,272 from Bio2RDF)
9. iRefWeb [28]
10. Mouse Genome Informatics (MGI) [29] ()
11. miRBase [30]
12. NCBI Gene [31] (47,728 from Bio2RDF)
13. Online Mendelian Inheritance in Man (OMIM) [32] (14,609 from Bio2RDF)
14. PharmGKB [33] ()
15. Reactome [34] ()
16. Rat Genome Database (RGD) [35]
17. Transfac [36]
18. UniProt [37] (124,567)

The numbers in parentheses (shown in red in the original post) are the number of documents/graphs loaded in ES.

Data sources:
OBO: http://www.ontobee.org/sparql
UniProt: http://beta.sparql.uniprot.org/sparql
and the corresponding Bio2RDF SPARQL endpoints.[...]



Bio2RDF's 10th birthday is this year, and I am back on the biohacking road

2015-05-09T02:25:53.174-04:00

This weekend is the first BD2K biohackathon in San Diego:
https://github.com/Network-of-BioThings/nob-hq/wiki/1st-BD2K-3rd-Network-of-BioThings-Hackathon

It is a good occasion to explore new avenues for exposing RDF biological knowledge in the big data era. So let's try Elasticsearch (https://www.elastic.co/products/elasticsearch): it is free, fast and it scales. This would not be doable without the recent availability of a JSON format for RDF, the JSON-LD project (http://json-ld.org/). I will use the JSON-LD converter written by Peter Ansell, one of the major contributors to Bio2RDF (https://github.com/jsonld-java).

So let's try to load some of Bio2RDF's triples into Elasticsearch! I have 24 hours to explore this new approach. Here is what we will try to achieve:

RDF2ES: Bring KaBOB online as RDF REST services using Elasticsearch

Description. KaBOB is a semantic integration of 18 different biomedically relevant knowledge sources. The linked paper describes processes for instantiating it as RDF, but does not provide a functional implementation, likely because of the significant challenges involved in stably hosting a very large SPARQL endpoint. Perhaps SPARQL isn't the best way to share this content. This project is to figure out a way to make the useful data integration work done in KaBOB available via a set of web services that are both fast and reliable, sacrificing some of the flexibility of a full SPARQL endpoint to gain a functional app. Perhaps using Elasticsearch.

First, we will load part of the KaBOB data sources for human into an Elasticsearch cluster (OMIM, GO, ChEBI, DrugBank, OBO ontologies, Reactome, UniProt and Entrez Gene). Second, we will build REST services to access it; they will be available for hacking. Third, we will explore this data using the Kibana tool. Finally, we will illustrate how a Talend workflow consuming RDF data can replace a complex SPARQL query. The querying workflow will be exposed at MyExperiment.

Input. Instructions for integrating 18 different biological data sources, plus code, at:
https://github.com/UCDenver-ccp/datasource
https://github.com/drlivingston/kr
https://github.com/drlivingston/kabob
I will use the Bio2RDF version of the selected KaBOB datasets. If someone has access to the KaBOB RDF data, we could load it into the ES store.

Output. Web services that provide useful answers to questions about genes, biological processes, and diseases. Those REST services will be created the same way the Bio2RDF API was built: they are generated using the Talend ESB tool (http://bio2rdf.org/test), with the Virtuoso triplestore replaced by ES storage. We will try to create a type-ahead user experience over those datasets, a feature that Bio2RDF (bio2rdf.org) has always been missing. Finally, we will explore the data visualisation potential of the Kibana tool over Elasticsearch data in JSON-LD format.[...]
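The first loading step can be sketched in Python. This is a minimal illustration, not the converter used at the hackathon (the post names Peter Ansell's jsonld-java for that): it groups RDF triples by subject into one JSON-LD-style node object per resource and serialises them as an Elasticsearch `_bulk` request body, which alternates an action line and a source line per document. The sample triples and the index name `kabob` are hypothetical, and treating anything that starts with `http://` as a resource reference is a simplifying assumption.

```python
import json
from collections import defaultdict

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical sample triples (subject, predicate, object), not real KaBOB data.
TRIPLES = [
    ("http://bio2rdf.org/geneid:6647", RDFS_LABEL, "superoxide dismutase 1"),
    ("http://bio2rdf.org/geneid:6647", RDFS_SEE_ALSO, "http://bio2rdf.org/hugo:SOD1"),
    ("http://bio2rdf.org/hugo:SOD1", RDFS_LABEL, "SOD1"),
]


def group_by_subject(triples):
    """Build one expanded JSON-LD-style node object per subject."""
    nodes = defaultdict(dict)
    for subject, predicate, obj in triples:
        node = nodes[subject]
        node["@id"] = subject
        # Simplifying assumption: strings that look like HTTP(S) URLs are resource
        # references, everything else is a plain literal.
        if obj.startswith(("http://", "https://")):
            value = {"@id": obj}
        else:
            value = {"@value": obj}
        node.setdefault(predicate, []).append(value)
    return list(nodes.values())


def to_bulk_body(nodes, index):
    """Serialise nodes as newline-delimited JSON for the Elasticsearch _bulk API."""
    lines = []
    for node in nodes:
        # Action line: index the document under its subject URI.
        lines.append(json.dumps({"index": {"_index": index, "_id": node["@id"]}}))
        # Source line: the node object itself.
        lines.append(json.dumps(node))
    return "\n".join(lines) + "\n"  # the _bulk body must end with a newline


if __name__ == "__main__":
    print(to_bulk_body(group_by_subject(TRIPLES), "kabob"))
```

The resulting body would be POSTed to the cluster's `_bulk` endpoint with `Content-Type: application/x-ndjson`. One design caveat: Elasticsearch treats dots in field names as object paths, so full predicate URIs make awkward field names; compacting them with a JSON-LD `@context` (for example to `rdfs:label`) before indexing is probably the more practical choice.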



Bio2RDF: moving forward as a community

2011-09-12T10:46:20.579-04:00

Last week we held our first virtual meeting towards re-invigorating the Bio2RDF project with a significantly larger and vested community. From the discussions, we plan to establish 3 focus groups around:
A. policy (information, governance, sustainability, outreach)
B. technical (architecture, infrastructure and RDFization)
C. social (user experience and social networking)

The next step is for the groups to:
1. identify and certify discussion leads (responsibilities: set meeting times and agenda, facilitate and encourage discussion among members, draft reports)
2. identify additional people to recruit from the wider community who would provide additional expertise (interested, but didn't attend the first meeting? sign up now!)
3. extend and prioritize discussion items (what exactly will this group focus its efforts on in the short and long term)
4. identify and assign bite-sized tasks (so we can get things done one step at a time :)
5. collate results and present them to the wider community

I suggest that groups self-organize a first meeting in the next two weeks to deal with items 1-4, and either meet again or use the Google documents to collaboratively report findings.

Finally, I'd like us to hold another meeting at times that are much more accommodating for Europe + North America ;) Please fill in the doodle poll (http://www.doodle.com/fsuz6mgs5cztf2e2).

As always, feel free to contact me if you have any questions, and please sign up to the Bio2RDF mailing list for all future discussions.[...]



Biocuration 2010 presentation

2010-10-20T16:05:35.528-04:00

Here is the presentation given at Biocuration 2010 on October 13th.

(Embedded presentation: W4 4 marc-alexandre-nolin-v2, by nolmar01)



Bio2RDF returns to Japan

2010-10-05T18:15:29.417-04:00

Bio2RDF is returning to Japan again this year. We will give a talk about Bio2RDF at Biocuration 2010, which runs from October 11th to October 14th at Odaiba, Tokyo.



Video on Bio2RDF and Beyond! Large Scale, Distributed Biological Data Integration

2010-03-15T10:07:53.090-04:00

Invited by former student Alexander De Leon and host Oscar Corcho, Michel Dumontier gave a talk on "Bio2RDF and Beyond! Large Scale, Distributed Biological Data Integration" [video] to the Ontology Engineering group at the Universidad Politécnica de Madrid (UPM).




Bio2RDF Cognoscope presentation at BioHackathon 2010 in Tokyo

2010-02-10T17:10:45.824-05:00

François Belleau from the Bio2RDF project was invited as an early Semantic Web technology adopter to present the Bio2RDF project at BioHackathon 2010, held annually in Tokyo.

(Embedded presentation: Bio2RDF@BH2010)




Registry for original provider HTML pages

2009-11-30T00:52:12.268-05:00

If you weren't aware, the Bio2RDF project offers both RDF and a service that redirects to HTML pages, images, or other non-RDF sources that could be useful.

The HTML redirect service is particularly useful, because one can start at the Bio2RDF page and follow a link that looks like "http://bio2rdf.org/html/namespace:identifier" to get to the original provider's web page.

There are currently 142 namespaces registered with HTML pages. Examples of these links are the NextBio page for Amyloid beta precursor protein (http://bio2rdf.org/nextbio:1445), the NCBI Entrez Gene page for Superoxide dismutase 1 (http://bio2rdf.org/geneid:6647), the PharmGKB page for Superoxide dismutase 1 (http://bio2rdf.org/pharmgkb:PA334), and the HGNC page for Superoxide dismutase 1 (http://bio2rdf.org/hugo:SOD1).

The list below details the namespace prefixes that are currently registered with Bio2RDF for this service. A full set of details about which services are provided for any particular namespace is provided here, and the entire RDF configuration that makes the Bio2RDF system work is available here (RDF/XML).

aceview, agi_locuscode, arrayexpress, asap, aspgd, aspgd_locus, aspgd_ref, bind, biogrid, biomodels, biopatml, biosystems, brenda, cas, cath, ccds, cdd, cgd, cgsc, chebi, chemidplus, cid, citations, cog, cpath, cpd, dbpedia, dbsnp, ddbj, dictybase, dictybase_trials, dip, doi, dr, drugbank_drugs, ec, echobase, eck, ecogene, embl, ensembl, enzyme, flybase, gdb, genbank, genedb_pfalciparum, genedb_spombe, geneid, gi, gl, go, goa_ref, gopubmed, gr, gr_gene, gr_protein, gr_qtl, gr_ref, h-invdb, h_inv, hgnc, homologene, hpa, hpa_antibody, hprd, huge_navigator, hugo, intact, interpro, ipi, iproclass, isbn, issn, keywords, lifedb, linkedct_trials, ma, mesh, metacyc, mgc, mgi, msdchem, myexp_user, myexp_workflow, nar, ncbi, nextbio, nist_chemistry_webbook, nmrshiftdb_molecule, oclc, omim, pamgo_vmd, path, pathguide, pdb, pdbsum, pfam, pharmgkb, phosphosite, po, prints, prodom, prosite, pseudocap, psimod, pubchem, pubmed, reactome, rebase, refseq, rgd, rn, scop, seed, sgd, sgd_locus, sgd_ref, sgn, sgn_ref, sid, sider_drugs, sider_sideeffects, smart, so, srs, swoogle, symbol, tair_arabidopsis, taxon, taxonomy, tc, tgd_locus, tgd_ref, um-bbd, uniparc, uniprot, uniref, unists, wikipathways, wikipedia, xenbase, zfin
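As a rough illustration of how a client might use this registry, the Python sketch below parses a Bio2RDF URI into its namespace prefix and identifier and builds the /html/ redirect URL only when the prefix is registered. `HTML_NAMESPACES` is a hand-picked subset of the list above and the function name is hypothetical; a real client would load the full registry from the Bio2RDF RDF configuration.

```python
from urllib.parse import urlparse

# Hand-picked subset of the registered prefixes listed above (not the full registry).
HTML_NAMESPACES = {"nextbio", "geneid", "pharmgkb", "hugo", "drugbank_drugs", "chebi", "omim"}


def html_redirect_url(bio2rdf_uri):
    """Map http://bio2rdf.org/namespace:identifier to its /html/ redirect URL.

    Returns None when the namespace has no registered HTML page.
    """
    parsed = urlparse(bio2rdf_uri)
    if parsed.netloc != "bio2rdf.org":
        raise ValueError("not a bio2rdf.org URI: %r" % bio2rdf_uri)
    # The path looks like "/namespace:identifier"; split on the first colon.
    namespace, sep, identifier = parsed.path.lstrip("/").partition(":")
    if not sep or not identifier:
        raise ValueError("expected namespace:identifier in %r" % bio2rdf_uri)
    if namespace not in HTML_NAMESPACES:
        return None
    return "http://bio2rdf.org/html/" + namespace + ":" + identifier
```

For example, `html_redirect_url("http://bio2rdf.org/geneid:6647")` yields the redirect URL for the Entrez Gene page mentioned above.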


If you know of a biological database that has webpages for their items and is not listed here then feel free to comment about it here or email the group at bio2rdf@googlegroups.com



Linking Open Drug Data wins the Triplify challenge

2009-09-14T11:54:46.606-04:00

Congratulations to Kei's group and their Linking Open Drug Data (LODD) project for winning the Triplify challenge.

http://blog.aksw.org/2009/triplification-challenge-2009-winners/

http://triplify.org/files/challenge_2009/LODD.pdf


It is a new contribution to the LOD cloud, and they have linked the new datasets to Bio2RDF and DBpedia URIs. That is the right way to do it!




HOWTO: Using Bio2RDF

2009-08-16T18:44:08.176-04:00

The Bio2RDF URI is formed by taking a data source and assigning a prefix to it. The prefix is a string that may contain only letters, numbers, the underscore (_), and the hyphen (-). The unique identifier for each object inside the namespace (the primary key for the object) is then appended to the namespace prefix to make up the Bio2RDF URI, http://bio2rdf.org/namespaceprefix:identifier.

In this example a user wants to find information about Propranolol, and they know there is a Wikipedia article about the topic. Since DBpedia mirrors the Wikipedia structure and represents it using RDF, they could go to http://bio2rdf.org/dbpedia:Propranolol. If the user then wants to find out where the Wikipedia article on Propranolol is referenced in other databases, they can go to http://bio2rdf.org/links/dbpedia:Propranolol (this may take a long time given the number of databases being used). If they only need to find out where the article is referenced in DrugBank, they can use http://bio2rdf.org/linksns/drugbank_drugs/dbpedia:Propranolol (this should be much quicker because fewer databases are queried).

There is also search functionality embedded into the Bio2RDF system. Searches can be conducted on particular namespaces, or across the entire Bio2RDF system. If a user wants to search namespace "chebi" for "propanolol", for instance, they could go to http://bio2rdf.org/searchns/chebi/propanolol. If they also wish to search for "propanolol" in the other namespaces, they can go to http://bio2rdf.org/search/propanolol (this may be slow because of the number of databases available for search).

If a namespace has been configured with the ability to redirect to its original interface, the redirection can be triggered by sending users to http://bio2rdf.org/html/namespace:identifier. For example, a user might be interested in http://bio2rdf.org/drugbank_drugs:DB00571 (the DrugBank identifier for Propranolol) and want to see the original DrugBank interface. They could then go to http://bio2rdf.org/html/drugbank_drugs:DB00571, and their browser would be redirected to the description of that drug on the original DrugBank interface. Not all namespaces have their original HTML interfaces encoded into the Bio2RDF system, but some do, and it is a useful way of getting back to the non-RDF web.

If someone is interested in taking the Bio2RDF RDF versions and using them internally, they can request either of the supported RDF formats (RDF/XML and N3) by adding /rdfxml/ or /n3/ to the front of any of the URLs they want. Each of the URI links in this post requests the Bio2RDF HTML version using /page/, but the RDF can equivalently be requested using, for example, http://bio2rdf.org/rdfxml/linksns/drugbank_drugs/dbpedia:Propranolol for RDF/XML or http://bio2rdf.org/n3/search/propanolol for N3.

There are also advanced features for people wanting to determine the provenance of particular documents, since RDF doesn't natively support provenance for individual statements when multiple sources are merged into a single document, as Bio2RDF does. If the user wishes to know which sources of information were used in a particular document, they can insert /queryplan/ at the start of the URI to get its provenance information, e.g. http://bio2rdf.org/queryplan/linksns/drugbank_drugs/dbpedia:Propranolol. This information is returned as a set of objects, including Query Types, Providers and Namespaces, among other things. It can then be used to recreate the exact set of queries, both SPARQL and otherwise, that were used to access the information, as long as the user has access to all of the provider endpoints in the query plan. In order to [...]
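The URL patterns described in this post can be collected into a few helper functions. This is a sketch written for illustration: the function names are hypothetical, not part of the Bio2RDF software, and only the patterns quoted above (links, linksns, search, searchns, html, and the /rdfxml/, /n3/ and /queryplan/ prefixes) are covered.

```python
import re
from urllib.parse import quote

BASE = "http://bio2rdf.org/"
# Namespace prefixes may contain only letters, digits, underscore and hyphen.
PREFIX_RE = re.compile(r"[A-Za-z0-9_-]+")
OPTIONS = ("rdfxml", "n3", "queryplan")


def check_prefix(prefix):
    if not PREFIX_RE.fullmatch(prefix):
        raise ValueError("invalid namespace prefix: %r" % prefix)
    return prefix


def bio2rdf_uri(prefix, identifier):
    """http://bio2rdf.org/namespaceprefix:identifier"""
    return BASE + check_prefix(prefix) + ":" + identifier


def links_url(prefix, identifier, from_namespace=None):
    """Reverse links from all namespaces (/links/) or from one namespace (/linksns/)."""
    if from_namespace is None:
        return BASE + "links/" + check_prefix(prefix) + ":" + identifier
    return (BASE + "linksns/" + check_prefix(from_namespace) + "/"
            + check_prefix(prefix) + ":" + identifier)


def search_url(term, namespace=None):
    """Global search (/search/) or search within one namespace (/searchns/)."""
    if namespace is None:
        return BASE + "search/" + quote(term)
    return BASE + "searchns/" + check_prefix(namespace) + "/" + quote(term)


def html_url(prefix, identifier):
    """Redirect to the original provider's HTML page, where one is registered."""
    return BASE + "html/" + check_prefix(prefix) + ":" + identifier


def with_option(url, option):
    """Insert /rdfxml/, /n3/ or /queryplan/ directly after the host."""
    if option not in OPTIONS or not url.startswith(BASE):
        raise ValueError("cannot apply %r to %r" % (option, url))
    return BASE + option + "/" + url[len(BASE):]
```

For instance, `with_option(links_url("dbpedia", "Propranolol", "drugbank_drugs"), "rdfxml")` yields the RDF/XML URL quoted above. Search terms are percent-encoded with `urllib.parse.quote` so that spaces survive; identifiers are passed through unchanged.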



The story so far of Linked Data, Bio2RDF is part of it !

2009-07-20T11:24:12.284-04:00

In the latest publication co-authored by Tim Berners-Lee, which tells the recent story of emerging Linked Data, Bio2RDF is mentioned as an important contributor in biology. This paper is a must for anyone interested in this fantastic new approach.

http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf

In this map of Linked Data, Bio2RDF's contribution is shown in purple. The corresponding SPARQL endpoints are available here:

http://delicious.com/tag/bio2rdf:sparql





Bio2RDF is now using Virtuoso 6 and its new facet browser

2009-07-01T22:11:56.631-04:00

Bio2RDF is moving from Virtuoso 5 to the Virtuoso 6 server. The new software supports faceted browsing in real time.

We invite you to explore our graph with a full-text search query for hexokinase. Once the results list is shown, try the options in the right menu. Enjoy the discovery experience.

Try the 2009 version of the "Atlas about Human and Mouse":

http://atlas.bio2rdf.org/fct/

The graph can also be queried with SPARQL:

http://atlas.bio2rdf.org/sparql


The list of Bio2RDF converted graphs will be published and updated here:

Facet browsers:
http://delicious.com/tag/bio2rdf:fct
SPARQL endpoints:
http://delicious.com/tag/bio2rdf:sparql



Bio2RDF visit at HCLS annual meeting

2009-07-01T21:52:44.583-04:00

Bio2RDF team members Marc-Alexandre Nolin, Michel Dumontier and François Belleau have been invited to present the current state of the Bio2RDF project at the annual face-to-face meeting of the HCLS community. Here is a link to the presentation:

http://www.slideshare.net/fbelleau/bio2rdf-w3c-hcls2009


Thanks to the organizers of the event.




0.6.1 bug fix release now available

2009-06-29T01:08:20.733-04:00

A maintenance release, version 0.6.1, was released today on sourceforge [1]. There were a few coding bugs in the 0.6.0 release: one related to the namespace match method "all"; the RDF rule order was not being imported from the configuration properly, so queries that relied on more than one rule got no results back; and included static RDF/XML sections were not being included. There was also a fix related to default providers that eliminates duplicate queries for namespaces that were assigned to a default provider for a query that allowed default providers.

The configuration files have also been updated, although people using the live configuration method (the default) will have received the configuration changes already. Some logging-related performance improvements have also been made that in some circumstances will dramatically improve the performance of the package, although most of the overall request latency still comes from network latency in the SPARQL queries.

From this version on, I will also be releasing MD5 hashes for each of the downloaded files so people can check that their downloaded file matches the release on sourceforge.
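Checking a download against a published MD5 hash takes only a few lines with Python's standard `hashlib` module. This is a generic sketch, not a tool shipped with Bio2RDF; the file name in the usage line is hypothetical, and the expected hash would be copied from the release page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_release(path, published_md5):
    """Compare a local file with the MD5 hash published alongside the release."""
    return md5_of_file(path) == published_md5.strip().lower()
```

Usage would look like `matches_release("bio2rdf-0.6.1.zip", "<hash from the release page>")`, where the archive name is only a placeholder. An MD5 match confirms the download is not corrupted, but MD5 is not collision resistant, so it is not proof against deliberate tampering.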

[1] https://sourceforge.net/project/platformdownload.php?group_id=142631



Version 0.6.0 of the Bio2RDF server software released

2009-06-23T00:15:50.089-04:00

The next version of the Bio2RDF software, version 0.6.0, was released today on sourceforge [1].

It has some major feature additions over the previous version, with the highlights being an RDF based configuration, the ability to update the configuration while the server is running, and support for sophisticated profiles so that users can pick and choose sources without having to change the basic configuration sources that are shared between different users. If users want to add or subtract from the base configuration they can create a small RDF file on their server and use that file to pick which sources they want to use and which queries they want to be able to execute.

If anyone wants to check out the example [2] and use it as a guide to mock up some SPARQL queries, or definitions for endpoints that go with the queries, it would be great to see what other resources we can combine into the global Bio2RDF configuration. If you need pointers on how to get your own configuration working, feel free to ask me.

[1] https://sourceforge.net/project/platformdownload.php?group_id=142631
[2] http://bio2rdf.wiki.sourceforge.net/sample+configuration



Version 0.5.0 of the Bio2RDF server software released

2009-05-08T02:07:48.089-04:00

The next version of the server software has been released on sourceforge. [1] It contains a number of changes that will hopefully make it more useful for the tasks we want to do with linked RDF queries.

One major change is the introduction of content negotiation, which has been tested for N3 (using text/rdf+n3) and RDF/XML (using application/rdf+xml). It was possible to add this so soon after the last release thanks to the content negotiation code from Pubby, the driver behind the DBpedia web interface and URI resolution mechanism. It is currently also possible to explicitly get the N3 format by prefixing the URL with /n3/; see [2] for an example. The ability to explicitly get RDF/XML will be added in future.

Another change that will hopefully be useful is the introduction of clear RDF-level error messages when either the syntax of a URI is not recognised, or the syntax was recognised but no providers were relevant to the URI. See [3], [4] and [5] for a demonstration of the error messages.

There is also the ability to page through the results, which is necessary when a query to a particular endpoint returns more than 2000 results. To use the paging facility, the URI needs to be prefixed by /pageoffsetNN/, where NN is a number indicating which page you would like to look at. The query results are not currently ordered, but in the short term they should be consistent enough to get through all of the results. Ordered queries take a lot longer than unordered queries, so it is unlikely that the public mirrors will ever introduce ordered queries. Example paging URLs are [6] and [7].

There is also the ability to get an RDF document describing what actions would be taken for a particular query. It is interoperable with the /n3/ and /pageoffsetNN/ URI manipulations, so URIs like [8] can be made up and resolved. This RDF document is set up to contain all of the information the client needs to complete the query with its own network resources if necessary. In future, clients should be able to patch into this functionality without having to keep a local copy of the configuration on hand, and a distributed configuration is also planned for some time in the future. Currently the configuration is distributed read-only from [9]. The [9] URL has also been made content-negotiable for HTML/RDFXML/N3 content types, defaulting to HTML if the content type is not recognised by the Sesame Rio library, but it can still be accessed in a particular format without content negotiation by appending /html, /n3 or /rdfxml.

Since the last release the GeoSpecies dataset has also been partially integrated, although it doesn't seem to have a SPARQL endpoint, so currently it is only available for basic construct queries. [10] Not all of the namespaces inside the GeoSpecies dataset have rules for normalisation to Bio2RDF URI syntax, but the rest will be integrated eventually.

The order of normalisation rules is now respected when applying them, with lower numbers applied before higher numbers. Rules with the same order number cannot be relied on to be applied in a consistent order if they overlap syntactically.

The MyExperiment SPARQL endpoint [11] has also been integrated into Bio2RDF since the last release, so, for instance, a user in the MyExperiment system can be resolved using [12]. There are also other things, like workflows, which could in the future provide valuable interconnections for the linked RDF web. Further integration with MyExperiment would be invaluable to the future of the Bio2RDF network, I think.

Partial support for INCHI re[...]
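A client can ask for a particular RDF serialisation either through content negotiation or through the URL prefixes. The Python sketch below only builds the requests: it uses the two MIME types the release notes say were tested, plus the /pageoffsetNN/ prefix. The helper names are hypothetical, and nothing here assumes the public mirrors still answer such requests.

```python
from urllib.request import Request

BASE = "http://bio2rdf.org/"
# MIME types that the 0.5.0 release notes say were tested for content negotiation.
ACCEPT = {"n3": "text/rdf+n3", "rdfxml": "application/rdf+xml"}


def negotiated_request(uri, fmt="rdfxml"):
    """Build (without sending) a GET request that asks for RDF via the Accept header."""
    return Request(uri, headers={"Accept": ACCEPT[fmt]})


def paged_uri(uri, page):
    """Prefix the path of a Bio2RDF URI with /pageoffsetNN/ to select a page of results."""
    if not uri.startswith(BASE):
        raise ValueError("not a bio2rdf.org URI: %r" % uri)
    if page < 0:
        raise ValueError("page must be non-negative")
    return BASE + "pageoffset%d/" % page + uri[len(BASE):]
```

Passing the request to `urllib.request.urlopen` would send it. The post does not say whether page numbering starts at 0 or 1, so the sketch accepts any non-negative number.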



2.4 billion triples of bioinformatics RAW DATA NOW

2009-04-22T00:40:58.340-04:00

In his recent talk at TED, Tim Berners-Lee invited data providers to make their data available in RDF format to help build the linked data web. He asked them for RAW DATA NOW.

We totally share this approach in the Bio2RDF community: our goal is to make public datasets from the bioinformatics community available in RDF format via standard SPARQL endpoints (a Virtuoso server is used for that). We strongly believe in the semantic web approach to solving science problems, but we do not want to wait for data providers to do the RAW DATA conversion job. Converting data to RDF is not fun; we did a lot of this dirty job, and here are the results for the current Bio2RDF release of 34 data sources.

Our current datasets in N3 format are available here:
http://quebec.bio2rdf.org/download/n3/
We invite semantic search engine providers to index these files.

The way we produce them is documented in our wiki at SourceForge, in the Cookbook section:
http://bio2rdf.wiki.sourceforge.net/Namespace%27s+update

The current list of SPARQL endpoints in the linked data cloud is hosted here:
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics

Bio2RDF's 2.4 billion triple graph of linked data represents 51% of the current global linked data graph size.

Finally, this is what this highly connected knowledge world looks like.

I would like to take this occasion to thank all the enthusiastic biologists and researchers who invest themselves in annotating articles, proteins and gene products. Without this essential work of connecting documents and concepts together, this project would not have been possible.

For the 20th anniversary of the web, I would also like to thank Tim Berners-Lee for his inspiring vision. Bio2RDF may not be the long-awaited killer app of the life sciences to demonstrate the semantic web's potential, but let's say that it is only the beginning of a linked data cloud built by and for scientists.

The WWW2009 workshop Linked Data on the Web (LDOW2009) was held today, and I would like to say how important the work of this community is. Finally, a last word to congratulate the Virtuoso team, and especially Orri Erling, for their fantastic work on the new Virtuoso 6.0 server, soon to be released. I cannot wait to see Bio2RDF data in this amazing engine.[...]



A new graphic representation of Bio2RDF's map

2009-04-21T23:50:20.359-04:00

This word net represents the current namespace connections between Bio2RDF SPARQL endpoints. The RDF datasets that were analyzed come from Bio2RDF's download page. These representations are generated with the Many Eyes visualization tools.


Static version.



New Bio2RDF query services

2009-04-02T21:34:11.699-04:00

The 0.3 release provides the ability to link to license providers, so the applicable license for a namespace may be available by following a URL. The URL syntax for this is /license/namespace:identifier. It was easier to require the identifier to be present than to make it optional. So far the identifier portion is not used, so it merely has to be present for the URL to resolve, but in future different licenses could be given based on the identifier, which is useful for databases that are not completely released under a single license.

We provide countlinks and countlinksns, which count the number of reverse links to a particular namespace and identifier from all namespaces, or from within a given namespace, respectively. Currently these only function on Virtuoso endpoints, due to their use of aggregation extensions to SPARQL. The URL syntax is /countlinks/namespace:identifier and /countlinksns/targetnamespace/namespace:identifier.
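To make the counting idea concrete, here is a Python sketch that builds a SPARQL query counting reverse links to a Bio2RDF URI, optionally restricted to subjects from one namespace. It is not the query template shipped with the servlet: it uses SPARQL 1.1 `COUNT` and `STRSTARTS`, which later standardised what the post describes as Virtuoso-specific aggregation extensions, and the function name is hypothetical.

```python
import re

PREFIX_RE = re.compile(r"[A-Za-z0-9_-]+")


def countlinks_query(namespace, identifier, from_namespace=None):
    """SPARQL query counting triples whose object is http://bio2rdf.org/namespace:identifier."""
    prefixes = [namespace] if from_namespace is None else [namespace, from_namespace]
    for prefix in prefixes:
        if not PREFIX_RE.fullmatch(prefix):
            raise ValueError("invalid namespace prefix: %r" % prefix)
    if re.search(r'[<>"\s]', identifier):
        raise ValueError("identifier cannot be embedded in an IRI: %r" % identifier)
    lines = [
        "SELECT (COUNT(?s) AS ?count)",
        "WHERE {",
        "  ?s ?p <http://bio2rdf.org/%s:%s> ." % (namespace, identifier),
    ]
    if from_namespace is not None:
        # countlinksns: only count links whose subject lives in from_namespace.
        lines.append('  FILTER(STRSTARTS(STR(?s), "http://bio2rdf.org/%s:"))' % from_namespace)
    lines.append("}")
    return "\n".join(lines)
```

As written, `COUNT(?s)` counts linking triples, so a resource that links through two predicates is counted twice; `COUNT(DISTINCT ?s)` would count linking resources instead.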

There is also the ability to count the number of triples in each SPARQL endpoint that point to a given Bio2RDF URI (or its equivalent identifier for non-Bio2RDF SPARQL endpoints). This ability is provided using /counttriples/namespace:identifier.

We also provide search and searchns, which attempt to search globally using SPARQL, or within a particular namespace, for text. (These are not currently linked to the RDFiser search pages, which may be accessed using certain searchns URIs.) The searches are all performed using the Virtuoso full-text search paradigm, i.e. bif:contains. Searches on other SPARQL endpoints haven't been implemented yet, even with regex, because regex is relatively slow, but it would be simple to construct such a query if people thought it was necessary. The URL syntax is /search/searchTerm and /searchns/targetnamespace:searchTerm.
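The full-text search can be sketched the same way. The Python function below, whose name is hypothetical, builds a Virtuoso query using `bif:contains` with the term quoted as a phrase, following Virtuoso's free-text expression syntax, and narrows it to one namespace with a `STRSTARTS` filter for the searchns case. It will only run on Virtuoso, since `bif:contains` is a Virtuoso built-in, and the default result limit is an arbitrary choice for this sketch.

```python
import re

PREFIX_RE = re.compile(r"[A-Za-z0-9_-]+")


def search_query(term, namespace=None, limit=100):
    """Virtuoso full-text search over literal objects, optionally within one namespace."""
    # Keep the sketch simple: refuse characters that would need escaping.
    if not term.strip() or re.search(r"[\"'\\]", term):
        raise ValueError("unsupported search term: %r" % term)
    lines = [
        "SELECT DISTINCT ?s ?o",
        "WHERE {",
        "  ?s ?p ?o .",
        # Double quotes inside the single-quoted literal mark the term as a phrase.
        "  ?o bif:contains '\"%s\"' ." % term,
    ]
    if namespace is not None:
        if not PREFIX_RE.fullmatch(namespace):
            raise ValueError("invalid namespace prefix: %r" % namespace)
        lines.append('  FILTER(STRSTARTS(STR(?s), "http://bio2rdf.org/%s:"))' % namespace)
    lines.append("}")
    lines.append("LIMIT %d" % limit)
    return "\n".join(lines)
```

For example, `search_query("hexokinase")` produces a query for the same kind of full-text search the facet browser offers, and `search_query("propanolol", "chebi")` corresponds to a searchns request.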

The coverage of each of these queries over the current Bio2RDF namespaces can be found here.

If anyone has queries (possibly already in SPARQL) on biology-related databases that they regularly execute and that can either be parameterised or turned into Pipes, it would be great to include them in future distributions for others to use.



RDF use and generation improvements

2009-04-02T21:21:59.693-04:00

The 0.3 version of the Bio2RDF Servlet implements true RDF handling in the background to provide consistency of output and the potential to support multiple output formats such as NTriples and Turtle in the future, although the only output currently supported is RDF/XML. The Sesame library is being used to provide this functionality.

The release also provides more RDFiser scripts as part of the source distribution, including ChEBI, GO, HomoloGene, NCBI Geneid, HGNC, OBO and EcoCyc, along with guides on the Bio2RDF wiki about how to use the scripts to regenerate new RDF versions from future versions of each database.



Live recent network statistics available

2009-04-02T21:07:15.178-04:00

The 0.3 releases provide the ability to show live statistics to diagnose some network issues without having to look at log files. The URL is /admin/stats
  • Shows the last time the internal provider blacklist was reset, which indicates how much activity is being displayed, since the statistics are reset every time the blacklist is reset. This blacklist exists only to stop further communication with providers whose queries are malfunctioning.
  • By default, shows the IPs accessing the server, with the total number and duration of their queries. In low-use and private situations it can be configured to also show the queries being performed.
  • Shows the servers that have been unresponsive since the last blacklist reset, with a basic reason such as an HTTP 503 or 400 error.
There is also live blacklisting functionality in version 0.3.2 to block crawlers that regularly use functionality they shouldn't, according to the Bio2RDF robots.txt file. The thresholds have been set rather high by default, and this functionality can be turned off completely by people who download and install the package and datasets locally. Specifically, a regular user of the public mirrors should make sure that they either make no more than 40 requests in each 12-minute statistics period or, if they make more than 40 requests in a period, that more than 25% of those queries are non-robots.txt queries. These parameters may change depending on further investigation. Anyone can access /error/blacklist, even if they are not currently blacklisted, to see a list of requests from their IP address since the start of the last 12-minute statistics period.
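The fair-use thresholds can be written as a small predicate. This is a sketch of the rule as worded above, not the servlet's code: the 40-request and 25% figures come from the post, but the post does not define "non-robots.txt queries" precisely, so the function simply takes their count as an input, and the function name is hypothetical.

```python
def within_fair_use(total_requests, non_robots_requests,
                    max_requests=40, min_non_robots_share=0.25):
    """Apply the stated limits to one 12-minute statistics period.

    A client is fine if it made at most max_requests requests, or if more than
    min_non_robots_share of its requests were non-robots.txt queries.
    """
    if not 0 <= non_robots_requests <= total_requests:
        raise ValueError("counts are inconsistent")
    if total_requests <= max_requests:
        return True
    return non_robots_requests / total_requests > min_non_robots_share
```

The thresholds are keyword parameters because, as the post notes, they may change.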



Support provided for more non-Bio2RDF providers

2009-04-02T20:53:19.036-04:00

The 0.3 Bio2RDF Servlet release implements support for more non-Bio2RDF SPARQL endpoints such as LinkedCT, DrugBank, Dailymed, Diseasome, Neurocommons, DBpedia, and Flyted/Flybase.

The relevant namespaces for these inside of Bio2RDF are:
  • DBpedia - dbpedia, dbpedia_property, dbpedia_class
  • LinkedCT - linkedct_ontology, linkedct_intervention, linkedct_trials, linkedct_collabagency, linkedct_condition, linkedct_link, linkedct_location, linkedct_overall_official, linkedct_oversight, linkedct_primary_outcomes, linkedct_reference, linkedct_results_reference, linkedct_secondary_outcomes, linkedct_arm_group
  • Dailymed - dailymed_ontology, dailymed_drugs, dailymed_inactiveingredient, dailymed_routeofadministration, dailymed_organization
  • DrugBank - drugbank_ontology, drugbank_druginteractions, drugbank_drugs, drugbank_enzymes, drugbank_drugtype, drugbank_drugcategory, drugbank_dosageforms, drugbank_targets
  • Diseasome - diseasome_ontology, diseasome_diseases, diseasome_genes, diseasome_chromosomallocation, diseasome_diseaseclass
  • Neurocommons - Uses the equivalent Bio2RDF namespaces, with live owl:sameAs links back to the relevant Neurocommons namespaces. Used for pubmed, geneid, taxonomy, mesh, prosite and go so far
  • Flyted/Flybase - Not converted yet, only direct access provided using search functionalities
The release also provides live owl:sameAs references that match the URIs used in SPARQL queries, keeping linkages to the original databases without leaving the Bio2RDF database:identifier paradigm, so people who know the DBpedia (or other) URIs are given the link to their current knowledge.

Some http://database.bio2rdf.org/database:identifier URIs are produced by the owl:sameAs additions, but these aren't standard, and are only shown where at least one SPARQL endpoint still uses them. People should use the http://bio2rdf.org/database:identifier versions when linking to Bio2RDF.
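A client that has collected such owl:sameAs targets could map them back to the recommended form. The Python sketch below does this with a regular expression; it assumes the legacy host label always equals the namespace prefix, as in the pattern quoted above, and the function name is hypothetical.

```python
import re

# Non-standard form: http://database.bio2rdf.org/database:identifier
# (the backreference requires the host label and the namespace to match).
LEGACY_RE = re.compile(r"http://([A-Za-z0-9_-]+)\.bio2rdf\.org/\1:(.+)")


def canonical_bio2rdf_uri(uri):
    """Rewrite a legacy database.bio2rdf.org URI to http://bio2rdf.org/database:identifier."""
    match = LEGACY_RE.fullmatch(uri)
    if match is None:
        return uri  # already canonical, or not a legacy Bio2RDF URI at all
    namespace, identifier = match.groups()
    return "http://bio2rdf.org/%s:%s" % (namespace, identifier)
```

URIs that do not follow the legacy pattern, such as http://atlas.bio2rdf.org/sparql, are returned unchanged.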

Any further contributions to this list, or additions of other datasets which already utilise Bio2RDF URI's would be very useful! See the list of namespaces already implemented here.



Provider, query and namespace statistics now available

2009-04-02T20:17:12.151-04:00

At the time of posting Bio2RDF supported:
  • 230 namespaces
  • 35 different internal query titles (some of these map to the same URI pattern, so there are not this many URI query options)
  • 140 provider options, including a large number of /html/database:identifier providers which redirect to HTML pages which describe the Bio2RDF Identifier as well as the Bio2RDF SPARQL endpoints
More statistics can be found here.

A list of the actual provider URLs mapped back to namespaces and queries can be found by downloading the Bio2RDF Servlet and changing a setting in log4j.properties to make the page more verbose. If the setting were turned on for the public mirrors, it would produce a very large file each time.



LSID support for Bio2RDF

2009-04-02T19:38:17.757-04:00

From release 0.3.2 of the Bio2RDF Servlet, any URI similar to http://bio2rdf.org/namespace:identifier will be accessible using its equivalent LSID, with http://bio2rdf.org/ as the proxy, using http://bio2rdf.org/urn:lsid:bio2rdf.org:namespace:identifier . The LSID syntax will not be available for use with custom services such as http://bio2rdf.org/links/namespace:identifier or http://bio2rdf.org/search/searchterm.

This will NOT become the standard identifier, but it provides compatibility for users who wish to use LSIDs.
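The mapping between the two forms is purely syntactic, so it can be sketched in a few lines of Python. The helper names are hypothetical; following the post, only plain namespace:identifier URIs get an LSID form, so service paths such as /links/ are rejected.

```python
BASE = "http://bio2rdf.org/"
LSID_PREFIX = "urn:lsid:bio2rdf.org:"


def _split(rest):
    namespace, sep, identifier = rest.partition(":")
    # Service paths such as links/namespace:identifier have no LSID equivalent.
    if not sep or not namespace or not identifier or "/" in namespace:
        raise ValueError("expected namespace:identifier, got %r" % rest)
    return namespace, identifier


def to_lsid(uri):
    """http://bio2rdf.org/namespace:identifier -> urn:lsid:bio2rdf.org:namespace:identifier"""
    if not uri.startswith(BASE):
        raise ValueError("not a bio2rdf.org URI: %r" % uri)
    namespace, identifier = _split(uri[len(BASE):])
    return LSID_PREFIX + namespace + ":" + identifier


def lsid_proxy_url(uri):
    """The LSID form resolved through the http://bio2rdf.org/ proxy."""
    return BASE + to_lsid(uri)


def from_lsid(lsid):
    """urn:lsid:bio2rdf.org:namespace:identifier -> http://bio2rdf.org/namespace:identifier"""
    if not lsid.startswith(LSID_PREFIX):
        raise ValueError("not a bio2rdf.org LSID: %r" % lsid)
    namespace, identifier = _split(lsid[len(LSID_PREFIX):])
    return BASE + namespace + ":" + identifier
```

In the LSID syntax a further colon-separated field is read as a revision, so identifiers that themselves contain a colon would be ambiguous in this form; the sketch does not try to resolve that.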