Subscribe: Kingsley Idehen's Blog Data Space
http://www.openlinksw.com/weblog/public/search.vspx?blogid=127&q=rdf%20data%20integration&type=text&output=xml
Language: English
Tags:
access  based  data access  data web  data  information  integration  linked data  linked  semantic  via  virtuoso  web 



Kingsley Idehen's Blog Data Space



About rdf data integration



Published: Sat, 16 Dec 2017 06:58:12 GMT

 



Semantic Web & Data Integration

Thu, 18 Jan 2007 14:25:51 GMT

Stefano Mazzocchi, via his blog, Stefano's Linotype, delivers an insightful contribution to the ongoing effort to recapture the essence of the original Semantic Web vision.

The Semantic Web is about granular exposure of the underlying web-of-data that fuels the World Wide Web. It models "Web Data" using a Directed Graph Data Model (back-to-the-future: Network Model Database) called RDF.

In line with contemporary database technology thinking, the Semantic Web also seeks to expose Web Data to architects, developers, and users via a concrete Conceptual Layer that is defined using RDF Schema.
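As a rough illustration of the two points above -- a directed graph of instance data sitting beneath a conceptual layer described in RDF Schema -- here is a minimal sketch using the Python rdflib library. The example.org vocabulary and the two person URIs are placeholders invented for illustration, not anything mandated by the Semantic Web specs.

    from rdflib import Graph, Literal, Namespace, URIRef, RDF, RDFS

    EX = Namespace("http://example.org/schema#")   # placeholder vocabulary
    g = Graph()

    # Conceptual layer: a class and a property declared with RDF Schema
    g.add((EX.Person, RDF.type, RDFS.Class))
    g.add((EX.knows, RDF.type, RDF.Property))
    g.add((EX.knows, RDFS.domain, EX.Person))
    g.add((EX.knows, RDFS.range, EX.Person))

    # Instance data: two nodes and a directed edge between them
    alice = URIRef("http://example.org/people/alice")
    bob = URIRef("http://example.org/people/bob")
    g.add((alice, RDF.type, EX.Person))
    g.add((alice, RDFS.label, Literal("Alice")))
    g.add((alice, EX.knows, bob))

    print(g.serialize(format="turtle"))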

The abstract nature of Conceptual Models implies that actual instance data (Entities, Attributes, and Relationships/Associations) occurs by way of "Logical to Conceptual" schema mapping and data generation that can involve a myriad of logical data sources (SQL, XML, Object databases, traditional web content, RSS/Atom feeds etc.). Thus, by implication, it is safe to assume that the Semantic Web's construction is basically a Data Integration and exposure effort. This is the point that Stefano alludes to in the blog post excerpts that follow:

The semantic web is really just data integration at a global scale. Some of this data might end up being consistent, detailed and small enough to perform symbolic reasoning on, but even if this is the case, that would be such a small, expensive and fragile island of knowledge that it would have the same impact on the world as calculus had on deciding to invade Iraq.

The biggest problem we face right now is a way to 'link' information that comes from different sources that can scale to hundreds of millions of statements (and hundreds of thousands of equivalences). Equivalences and subclasses are the only things that we have ever needed of OWL and RDFS, we want to 'connect' dots that otherwise would be unconnected. We want to suggest people to use whatever ontology pleases them and then think of just mapping it against existing ones later. This is easier to bootstrap than to force them to agree on a conceptualization before they even know how to start!

Additional insightful material from Stefano:

  1. A No-Nonsense Guide to Semantic Web Specs for XML People [Part I]
  2. A No-nonsense Guide to Semantic Web Specs for XML People [Part II]

Benjamin Nowack also chimes in on this conversation via his simple guide to understanding Data, Information, and Knowledge in relation to the Semantic Web.




Data Spaces and Web of Databases

Mon, 04 Sep 2006 22:58:56 GMT

Note: An updated version of a previously unpublished blog post. Continuing from our recent Podcast conversation, Jon Udell sheds further insight into the essence of our conversation via a “Strategic Developer” column article titled: Accessing the web of databases. Below, I present an initial dump of a DataSpace FAQ that hopefully sheds light on the DataSpace vision espoused during my podcast conversation with Jon.

What is a DataSpace? A moniker for Web-accessible atomic containers that manage and expose Data, Information, Services, Processes, and Knowledge.

What would you typically find in a Data Space? Examples include: Raw Data - SQL, HTML, XML (raw), XHTML, RDF etc. Information (Data In Context) - XHTML (various microformats), Blog Posts (in RSS, Atom, RSS-RDF formats), Subscription Lists (OPML, OCS, etc), Social Networks (FOAF, XFN etc.), and many other forms of applied XML. Web Services (Application/Service Logic) - REST or SOAP based invocation of application logic for context sensitive and controlled data access and manipulation. Persisted Knowledge - Information in actionable context that is also available in transient or persistent forms expressed using a Graph Data Model. A modern knowledgebase would more than likely have RDF as its Data Language, RDFS as its Schema Language, and OWL as its Domain Definition (Ontology) Language. Actual Domain, Schema, and Instance Data would be serialized using formats such as RDF/XML, N3, Turtle, etc.

How do Data Spaces and Databases differ? Data Spaces are fundamentally problem-domain-specific database applications. They offer functionality that you would instinctively expect of a database (e.g., ACID data management) with the additional benefit of being data model and query language agnostic. Data Spaces are for the most part DBMS Engine and Data Access Middleware hybrids in the sense that ownership and control of data is inherently loosely-coupled.

How do Data Spaces and Content Management Systems differ? Data Spaces are inherently more flexible; they support multiple data models and data representation formats. Content management systems do not possess the same degree of data model and data representation dexterity.

How do Data Spaces and Knowledgebases differ? A Data Space cannot dictate the perception of its content. For instance, what I may consider as knowledge relative to my Data Space may not be the case to a remote client that interacts with it from a distance. Thus, defining my Data Space purely as a Knowledgebase introduces constraints that reduce its broader effectiveness to third party clients (applications, services, users etc.). A Knowledgebase is based on a Graph Data Model, resulting in significant impedance for clients that are built around alternative models. To reiterate, Data Spaces support multiple data models.

What Architectural Components make up a Data Space? ORDBMS Engine - for Data Modeling agility (via complex purpose specific data types and data access methods), Data Atomicity, Data Concurrency, Transaction Isolation, and Durability (aka ACID). Virtual Database Engine - for creating a single view of, and access point to, heterogeneous SQL, XML, Free Text, and other data. This is all about Virtualization at the Data Access Level. Web Services Platform - enabling controlled access and manipulation (via application, service, or protocol logic) of Virtualized or Disparate Data. This layer handles the decoupling of functionality from monolithic wholes for function specific invocation via Web Services using either the SOAP or REST approach.

Where do Data Spaces fit into the Web's rapid evolution? They are an essential part of the burgeoning Data Web / Semantic Web. In short, they will take us from data “Mash-ups” (combining web accessible data that exists without integration and repurposing in mind) to “Mesh-ups” (combining web accessible data that exists with integration and repurposing in mind).

Where can I see a DataSpace along the lines described, in action[...]



Email As A Platform

Thu, 22 Jun 2006 12:56:58 GMT

It looks like more people are starting to realize that email is more than it seems. Especially given the drastic increase in storage size of web-based email applications, more people are realizing that email is basically a personal database. People simply store information in their email, from contact information that was emailed to them, to schedule information, to purchase tracking from emailed receipts. Lots of people email messages to themselves, realizing that email is basically the best "permanent" filing system they have. That's part of the reason why good email search is so important. Of course, what the article doesn't discuss is the next stage of this evolution. If you have a database of important information, the next step is to build useful applications on top of it. In other words, people are starting to realize that email, itself, is a platform for personal information management.

[via Techdirt]
 
Yep! And this is where the Unified Storage vision comes into play. Many years ago the same issues emerged in the business application realm, and at the time the issue at hand was: separating the DBMS engine from the Application logic. This is what the SQL Access Group (SAG) addressed via the CLI that laid the foundation for ODBC, JDBC, and recent derivatives: OLE DB and ADO.NET.
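To make that separation concrete: with a CLI-style interface such as ODBC, the application below neither knows nor cares which engine sits behind the Data Source Name. This is only a sketch using Python's pyodbc module; the DSN, credentials, and the messages table are hypothetical.

    import pyodbc

    # "MyStore" is a hypothetical ODBC Data Source Name; the engine behind it
    # (Virtuoso, Oracle, SQLite via an ODBC driver, ...) is invisible to this code.
    conn = pyodbc.connect("DSN=MyStore;UID=demo;PWD=demo")
    cursor = conn.cursor()

    # Application logic speaks generic SQL through the call-level interface
    cursor.execute("SELECT subject, sent_date FROM messages WHERE sender = ?",
                   "jane@example.org")
    for subject, sent_date in cursor.fetchall():
        print(sent_date, subject)

    conn.close()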
 
Most of us live inside our email applications, and the need to integrate the content of emails, address books, notes, and calendars with other data sources (Web Portal, Blogs, Wikis, CRM, ERP, and more) as part of our application interaction cycles and domain specific workflow is finally becoming obvious. There is a need for separation of the application/service layer from the storage engine across each one of these functionality realms. XML, RDF, and Triple Stores (RDF / Semantic Data Stores) collectively provide a standards based framework for achieving this goal. On the other hand, so does WinFS, albeit totally proprietary (by this I mean not standards-compliant) at the current time.
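Under that model, mail, contacts, and calendar entries become records in one graph store, and any application layer can ask the same questions of them via SPARQL. A rough sketch with rdflib follows; the pim: vocabulary and the identifiers are made up for illustration (in practice one would reach for terms from FOAF, SIOC, calendar vocabularies, and so on).

    from rdflib import Graph, Literal, Namespace, RDF

    PIM = Namespace("http://example.org/pim#")   # made-up personal-data vocabulary
    store = Graph()

    # An email message and an address-book entry live in the same store,
    # independent of whichever application wrote them.
    msg = PIM["message/123"]
    store.add((msg, RDF.type, PIM.EmailMessage))
    store.add((msg, PIM.subject, Literal("Re: project kickoff")))
    store.add((msg, PIM.sender, PIM["contact/jane"]))

    store.add((PIM["contact/jane"], RDF.type, PIM.Contact))
    store.add((PIM["contact/jane"], PIM.name, Literal("Jane Doe")))

    # Any front end -- mail client, portal, wiki -- can ask the same question.
    query = """
        PREFIX pim: <http://example.org/pim#>
        SELECT ?subject ?name WHERE {
            ?m a pim:EmailMessage ; pim:subject ?subject ; pim:sender ?p .
            ?p pim:name ?name .
        }
    """
    for row in store.query(query):
        print(f"{row.subject} (from {row.name})")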
 
As you can already see, there are numerous applications (conventional or hosted) that address email, address books, bookmarking, notes, calendars, blogs, wikis, CRM, etc. specifically, but next to none that address the obvious need for transparent integration across each functionality realm - the ultimate goal.
 
Yes, you know what I am about to say! OpenLink Virtuoso is the platform for developing and/or implementing these next generation solutions. We have also decided to go one step further by developing a number of applications that demonstrate the vision (and ultimate reality); and each of these applications (and the inherent integration tapestry) will be the subject of a future Virtuoso Application specific post.



What is Linked Data, really?

Tue, 09 Nov 2010 18:53:01 GMT

Linked Data is simply hypermedia-based structured data. Linked Data offers everyone a Web-scale, Enterprise-grade mechanism for platform-independent creation, curation, access, and integration of data.

The fundamental steps to creating Linked Data are as follows:

  1. Choose a Name Reference Mechanism — i.e., URIs.
  2. Choose a Data Model with which to Structure your Data — minimally, you need a model which clearly distinguishes Subjects (also known as Entities), Subject Attributes (also known as Entity Attributes), and Attribute Values (also known as Subject Attribute Values or Entity Attribute Values).
  3. Choose one or more Data Representation Syntaxes (also called Markup Languages or Data Formats) to use when creating Resources with Content based on your chosen Data Model. Some Syntaxes in common use today are HTML+RDFa, N3, Turtle, RDF/XML, TriX, XRDS, GData, OData, OpenGraph, and many others.
  4. Choose a URI Scheme that facilitates binding Referenced Names to the Resources which will carry your Content -- your Structured Data.
  5. Create Structured Data by using your chosen Name Reference Mechanism, your chosen Data Model, and your chosen Data Representation Syntax, as follows: Identify Subject(s) using Resolvable URI(s); Identify Subject Attribute(s) using Resolvable URI(s); Assign Attribute Values to Subject Attributes. These Values may be either Literals (e.g., STRINGs, BLOBs) or Resolvable URIs.

You can create Linked Data (hypermedia-based data representations) Resources from or for many things. Examples include: personal profiles, calendars, address books, blogs, photo albums; there are many, many more.

Related

Linked Data an Introduction -- simple introduction to Linked Data and its virtues
How Data Makes Corporations Dumb -- Jeff Jonas (IBM) interview
Hypermedia Types -- evolving information portal covering different aspects of Hypermedia resource types
URIBurner -- service that generates Linked Data from a plethora of heterogeneous data sources
Linked Data Meme -- TimbL design issues note about Linked Data
Data 3.0 Manifesto -- note about format agnostic Linked Data
DBpedia -- large Linked Data Hub
Linked Open Data Cloud -- collection of Linked Data Spaces
Linked Open Commerce Cloud -- commerce (clicks & mortar and/or clicks & clicks) oriented Linked Data Space
LOD Cloud Cache -- massive Linked Data Space hosting most of the LOD Cloud Datasets
LOD2 Initiative -- EU Co-Funded Project to develop global knowledge space from LOD.
[...]
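The numbered steps above can be sketched in a few lines of Python with rdflib. The URIs below are placeholders rather than real, resolvable names, so treat this as an outline of the recipe, not finished Linked Data.

    from rdflib import Graph, Literal, Namespace, URIRef, RDF

    FOAF = Namespace("http://xmlns.com/foaf/0.1/")

    # Steps 1 & 4: an HTTP URI names the Subject (placeholder, not actually resolvable)
    me = URIRef("http://example.org/about#me")

    g = Graph()
    # Steps 2 & 5: Subject, Attributes, and Values (Values are Literals or URIs)
    g.add((me, RDF.type, FOAF.Person))                           # URI value
    g.add((me, FOAF.name, Literal("Jane Example")))              # Literal value
    g.add((me, FOAF.homepage, URIRef("http://example.org/")))    # URI value

    # Step 3: one or more Data Representation Syntaxes
    print(g.serialize(format="turtle"))
    print(g.serialize(format="xml"))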



What is Linked Data, really?

Tue, 15 Feb 2011 22:28:06 GMT

Linked Data is simply hypermedia-based structured data. Linked Data offers everyone a Web-scale, Enterprise-grade mechanism for platform-independent creation, curation, access, and integration of data.

The fundamental steps to creating Linked Data are as follows:

  1. Choose a Name Reference Mechanism — i.e., URIs.
  2. Choose a Data Model with which to Structure your Data — minimally, you need a model which clearly distinguishes Subjects (also known as Entities), Subject Attributes (also known as Entity Attributes), and Attribute Values (also known as Subject Attribute Values or Entity Attribute Values).
  3. Choose one or more Data Representation Syntaxes (also called Markup Languages or Data Formats) to use when creating Resources with Content based on your chosen Data Model. Some Syntaxes in common use today are HTML+RDFa, N3, Turtle, RDF/XML, TriX, XRDS, GData, and OData; there are many others.
  4. Choose a URI Scheme that facilitates binding Referenced Names to the Resources which will carry your Content -- your Structured Data.
  5. Create Structured Data by using your chosen Name Reference Mechanism, your chosen Data Model, and your chosen Data Representation Syntax, as follows: Identify Subject(s) using Resolvable URI(s); Identify Subject Attribute(s) using Resolvable URI(s); Assign Attribute Values to Subject Attributes. These Values may be either Literals (e.g., STRINGs, BLOBs) or Resolvable URIs.

You can create Linked Data (hypermedia-based data representations) Resources from or for many things. Examples include: personal profiles, calendars, address books, blogs, photo albums; there are many, many more.

Related

Hypermedia Types -- evolving information portal covering different aspects of Hypermedia resource types
URIBurner -- service that generates Linked Data from a plethora of heterogeneous data sources
Linked Data Meme -- TimbL design issues note about Linked Data
Data 3.0 Manifesto -- note about format agnostic Linked Data
DBpedia -- large Linked Data Hub
Linked Open Data Cloud -- collection of Linked Data Spaces
Linked Open Commerce Cloud -- commerce (clicks & mortar and/or clicks & clicks) oriented Linked Data Space
LOD Cloud Cache -- massive Linked Data Space hosting most of the LOD Cloud Datasets
LOD2 Initiative -- EU Co-Funded Project to develop global knowledge space from LOD.
[...]



OpenLink Virtuoso - Product Value Proposition Overview

Sat, 27 Feb 2010 17:46:36 GMT

Situation Analysis

Since the beginning of the modern IT era, each period of innovation has inadvertently introduced its fair share of Data Silos. The driving force behind this anomaly remains an overemphasis on the role of applications when selecting problem solutions. Unfortunately, most solution selecting decision makers remain oblivious to the fact that most applications are architecturally monolithic; i.e., they fail to separate the following five layers that are critical to all solutions: Data Unit (Datum or Data Object) Identity, Data Storage/Persistence, Data Access, Data Representation, and Data Presentation/Visualization.

The rise of the Internet, and its exponentially-growing user-friendly enclave known as the World Wide Web, is bringing the intrinsic costs of the monolithic application architecture anomaly to bear -- in manners unanticipated by many. For example, the emergence of network-oriented solutions across the realms of Enterprise 2.0-based Collaboration and Web 2.0-based Software-as-a-Service (SaaS), combined with the overarching influence of Social Media, is producing more heterogeneously-structured and disparately-located data sources than people can effectively process.

As is often the case, a variety of problem and product monikers have emerged for the data access and integration challenges outlined above. Contemporary examples include Enterprise Information Integration, Master Data Management, and Data Virtualization. Labeling aside, the fundamental issues of the unresolved Data Integration challenge boil down to the following: Data Model Heterogeneity; Data Quality (Cleanliness); and Semantic Variance across Contexts (e.g., weights and measures).

Effectively solving today's data integration challenges requires a move away from monolithic application architecture to loosely-coupled, network-centric application architectures. Basically, we need a ubiquitous network-centric application protocol that lends itself to loosely-coupled across-the-wire orchestration of data interactions. In short, this will be what revitalizes the art of application development and deployment.

The World Wide Web is built around a network application protocol called HTTP. This protocol intrinsically separates the five layers listed earlier, thereby enabling: Use of Generic HTTP URIs as Data Object (Entity) Identifiers; Identifier Co-reference, such that multiple Data Object Identifiers may reference the same Data Object; Use of the Entity-Attribute-Value Model to describe Data Objects using real world modeling friendly conceptual graphs; Use of HTTP URLs to Identify Locations of Resources that bear (host) Data Object Descriptions (Representations); Data Access mechanism for retrieving Data Object Representations from persistent or transient storage locations.

What is Virtuoso? A Universal Server uniquely designed to address today's escalating Data Access and Integration challenges without compromising performance, security, or platform independence. At its core lies an unrivaled commitment to industry standards combined with unique technology innovation that transcends erstwhile distinct realms such as: Data Management (Relational, RDF Graph, or Document), Data Access Middleware, Web Application & Services Deployment, Linked Data Deployment, and Messaging. When Virtuoso is installed and running, HTTP-based Data Objects are automatically created as a by-product of its powerful data virtualization, transcending data sources and data representati[...]



Re-introducing the Virtuoso Virtual Database Engine

Wed, 17 Feb 2010 21:46:53 GMT

In recent times a lot of the commentary and focus re. Virtuoso has centered on the RDF Quad Store and Linked Data. What sometimes gets overlooked is the sophisticated Virtual Database Engine that provides the foundation for all of Virtuoso's data integration capabilities. In this post I provide a brief re-introduction to this essential aspect of Virtuoso.

What is it? This component of Virtuoso is known as the Virtual Database Engine (VDBMS). It provides transparent high-performance and secure access to disparate data sources that are external to Virtuoso. It enables federated access and integration of data hosted by any ODBC- or JDBC-accessible RDBMS, RDF Store, XML database, or Document (Free Text)-oriented Content Management System. In addition, it facilitates integration with Web Services (SOAP-based SOA RPCs or REST-fully accessible Web Resources).

Why is it important? In the most basic sense, you shouldn't need to upgrade your existing database engine version simply because your current DBMS and Data Access Driver combo isn't compatible with ODBC-compliant desktop tools such as Microsoft Access, Crystal Reports, BusinessObjects, Impromptu, or other ODBC-, JDBC-, ADO.NET-, or OLE DB-compliant applications. Simply place Virtuoso in front of your so-called "legacy database," and let it deliver the compliance levels sought by these tools. In addition, it's important to note that today's enterprise, through application evolution, company mergers, or acquisitions, is often faced with disparately-structured data residing in any number of line-of-business-oriented data silos. Compounding the problem is the exponential growth of user-generated data via new social media-oriented collaboration tools and platforms. For companies to cost-effectively harness the opportunities accorded by the increasing intersection between line-of-business applications and social media, virtualization of data silos must be achieved, and this virtualization must be delivered in a manner that doesn't prohibitively compromise performance or completely undermine security at either the enterprise or personal level. Again, this is what you get by simply installing Virtuoso.

How do I use it? The VDBMS may be used in a variety of ways, depending on the data access and integration task at hand. Examples include:

Relational Database Federation. You can make a single ODBC, JDBC, ADO.NET, OLE DB, or XMLA connection to multiple ODBC- or JDBC-accessible RDBMS data sources, concurrently, with the ability to perform intelligent distributed joins against externally-hosted database tables. For instance, you can join internal human resources data against internal sales and external stock market data, even when the HR team uses Oracle, the Sales team uses Informix, and the Stock Market figures come from Ingres!

Conceptual Level Data Access using the RDF Model. You can construct RDF Model-based Conceptual Views atop Relational Data Sources. This is about generating HTTP-based Entity-Attribute-Value (E-A-V) graphs using data culled "on the fly" from native or external data sources (Relational Tables/Views, XML-based Web Services, or User Defined Types). You can also derive RDF Model-based Conceptual Views from Web Resource transformations "on the fly" -- the Virtuoso Sponger (RDFizing middleware component) enables you to generate RDF Model Linked Data via a RESTful Web Service or within the process pipeline of the SPARQL query engine (i.e., you simply use the URL of a Web Resource in the FROM clause of a SPARQL query).
It's important to note that Views take the form of HTTP links that serve as both Data Source Names and Data Source Addresses. This enables you to query and explore relationships across entities (i.e., People, Places, and other Real World Things) via HTTP clients (e.g., Web Browsers) or directly via SPARQL Query Language [...]
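As a hedged sketch of the "URL in the FROM clause" usage described above, the query below asks a Virtuoso SPARQL endpoint to pull a Web resource through the Sponger and return whatever triples it finds. The endpoint address and the target document are placeholders; the snippet uses the Python SPARQLWrapper package.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Placeholder endpoint; any Virtuoso instance with the Sponger enabled would do
    endpoint = SPARQLWrapper("http://example.org/sparql")

    endpoint.setQuery("""
        SELECT ?s ?p ?o
        FROM <http://example.org/some-web-page.html>   # resource RDF-ized on the fly
        WHERE { ?s ?p ?o }
        LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)

    results = endpoint.query().convert()
    for b in results["results"]["bindings"]:
        print(b["s"]["value"], b["p"]["value"], b["o"]["value"])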



Virtuoso Chronicles from the Field: Nepomuk, KDE, and the quest for a sophisticated RDF DBMS.

Mon, 01 Feb 2010 14:02:55 GMT

For this particular user experience chronicle, I've simply inserted the content of Sebastian Trueg's post titled: What We Did Last Summer (And the Rest of 2009) – A Look Back Onto the Nepomuk Development Year ..., directly into this post, without any additional commentary or modification. 2009 is over. Yeah, sure, trueg, we know that, it has been over for a while now! Ok, ok, I am a bit late, but still I would like to get this one out - if only for my archive. So here goes. Virtuoso Let’s start with the major topic of 2009 (and also the beginning of 2010): The new Nepomuk database backend: Virtuoso. Everybody who used Nepomuk had the same problems: you either used the sesame2 backend which depends on Java and steals all of your memory or you were stuck with Redland which had the worst performance and missed some SPARQL features making important parts of Nepomuk  like queries unusable. So more than a year ago I had the idea to use the one GPL’ed database server out there that supported RDF in a professional manner: OpenLink’s Virtuoso. It has all the features we need, has a very good performance, and scales up to dimensions we will probably never reach on the desktop (yeah, right, and 64k main memory will be enough forever!). So very early I started coding the necessary Soprano plugin which would talk to a locally running Virtuoso server through ODBC. But since I ran into tons of small problems (as always) and got sidetracked by other tasks I did not finish it right away. OpenLink, however, was very interested in the idea of their server being part of every KDE installation (why wouldn’t they ;)). So they not only introduced a lite-mode which makes Virtuoso suitable for the desktop but also helped in debugging all the problems that I had left. Many test runs, patches, and a Virtuoso 5.0.12 release later I could finally announce the Virtuoso integration as usable. Then end of last year I dropped the support for sesame2 and redland. Virtuoso is now the only supported database backend. The reason is simple: Virtuoso is way more powerful than the rest - not only in terms of performance - and it is fully implemented in C(++) without any traces of Java. Maybe even more important is the integration of the full text index which makes the previously used CLucene index unnecessary. Thus, we can finally combine full text and graph queries in one SPARQL query. This results in a cleaner API and way faster return of  search results since there is no need to combine the results from several queries anymore. A direct result of that is the new Nepomuk Query API which I will discuss later. So now the only thing I am waiting for is the first bugfix release of Virtuoso 6, i.e. 6.0.1 which will fix the bugs that make 6.0.0 fail with Nepomuk. Should be out any day now. :) The Nepomuk Query API Querying data in Nepomuk pre-KDE-4.4 could be done in one of two ways: 1. Use the very limited capabilities of the ResourceManager to list resources with certain properties or of a certain type; or 2. Write your own SPARQL query using ugly QString::arg replacements. With the introduction of Virtuoso and its awesome power we can now do pretty much everything in one query. This allowed me to finally create a query API for KDE: Nepomuk::Query::Query and friends. I won’t go into much detail here since I did that before. All in all you should remember one thing: whenever you think about writing your own SPARQL query in a KDE application - have a look at libnepomukquery. 
It is very likely that you can avoid the hassle of debugging a query by using the query API. The first nice effect of the new API (apart from me using it all over the place obviously) is the new query interface in Dolphin. Internally it simply combines a bunch of Nepomuk::Query::Term objects into a Nepomuk::Query::AndTerm. All very readable and no ugly query strings. D[...]



The URI, URL, and Linked Data Meme's Generic HTTP URI (Updated)

Sun, 28 Mar 2010 16:19:00 GMT

Situation Analysis

As the "Linked Data" meme has gained momentum you've more than likely been on the receiving end of dialog with Linked Open Data community members (myself included) that goes something like this: "Do you have a URI", "Get yourself a URI", "Give me a de-referencable URI" etc.. And each time, you respond with a URL -- which to the best of your Web knowledge is a bona fide URI. But to your utter confusion you are told: Nah! You gave me a Document URI instead of the URI of a real-world thing or object etc.. What's up with that?

Well, our everyday use of the Web is an unfortunate conflation of two distinct things, which have Identity: Real World Objects (RWOs) & Address/Location of Documents (Information bearing Resources). The "Linked Data" meme is about enhancing the Web by unobtrusively reintroducing its core essence: the generic HTTP URI, a vital piece of Web Architecture DNA. Basically, it's about realizing the full capabilities of the Web as a platform for Open Data Identification, Definition, Access, Storage, Representation, Presentation, and Integration.

What is a Real World Object? People, Places, Music, Books, Cars, Ideas, Emotions etc..

What is a URI? A Uniform Resource Identifier. A global identifier mechanism for network addressable data items. Its sole function is Name oriented Identification. The constituent parts of a URI are defined in the URI Generic Syntax RFC.

What is a URL? A location oriented HTTP scheme based URI. The HTTP scheme introduces a powerful and inherent duality that delivers: a Resource Address/Location Identifier, and a Data Access mechanism for an Information bearing Resource (Document, File, etc.). So far so good!

What is an HTTP based URI? The kind of URI Linked Data aficionados mean when they use the term: URI. An HTTP URI is an HTTP scheme based URI. Unlike a URL, this kind of HTTP scheme URI is devoid of any Web Location orientation or specificity. Thus, its inherent duality provides a more powerful level of abstraction. Hence, you can use this form of URI to assign Names/Identifiers to Real World Objects (RWO). Even better, courtesy of the Identity/Address duality of the HTTP scheme, a single URI can deliver the following: RWO Identifier/Name; RWO Metadata document Locator (courtesy of the URL aspect); Negotiable Representation of the Located Document (courtesy of HTTP's content negotiation feature).

What is Metadata? Data about Data. Put differently, data that describes other data in a structured manner.

How Do we Model Metadata? The predominant model for metadata is the Entity-Attribute-Value + Classes & Relationships model (EAV/CR). A model that's been with us since the inception of modern computing (long before the Web).

What about RDF? The Resource Description Framework (RDF) is a framework for describing Web addressable resources. In a nutshell, it's a framework for adding Metadata bearing Information Resources to the current Web. It's comprised of: an Entity-Attribute-Value (aka. Subject-Predicate-Object) plus Classes & Relationships (Data Dictionaries e.g., OWL) metadata model, and a plethora of instance data representation formats that include: RDFa (when doing so within (X)HTML docs), Turtle, N3, TriX, RDF/XML etc.

What's the Problem Today? The ubiquitous use of the Web is primarily focused on a Linked Mesh of Information bearing Documents.
URLs rather than generic HTTP URIs are the prime mechanism for Web tapestry; basically, we use URLs to conduct Information -- which is inherently subjective -- instead of using HTTP URIs to conduct "Raw Data" -- which is inherently objective. Note: Information is "data in context", it isn't the same thing as &[...]
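The Identity/Address duality and content negotiation described above can be observed with nothing more than an HTTP request: ask for the entity's URI with an RDF media type in the Accept header, and the server (typically via a 303 redirect) hands back a document that describes the thing. A sketch with Python's urllib follows; the DBpedia URI is used only as a well-known example of an RWO identifier, and the exact redirect behavior depends on how that service is configured at the time you run it.

    from urllib.request import Request, urlopen

    # URI naming a real-world thing (DBpedia is cited elsewhere in these posts as a Linked Data hub)
    uri = "http://dbpedia.org/resource/Tim_Berners-Lee"

    # Ask for an RDF representation rather than the default HTML page
    req = Request(uri, headers={"Accept": "text/turtle"})
    with urlopen(req) as resp:                  # urllib follows the redirect for us
        print("Document URL :", resp.geturl())  # the *document* describing the thing
        print("Content-Type :", resp.headers.get("Content-Type"))
        print(resp.read(300).decode("utf-8", errors="replace"))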



Exploring the Value Proposition of Linked Data

Fri, 24 Jul 2009 12:20:01 GMT

What is Linked Data? The primary topic of a meme penned by TimBL in the form of a Design Issues Doc (note: this is how TimBL has shared his thoughts since the Beginning of the Web). There are a number of dimensions to the meme, but its primary purpose is the reintroduction of the HTTP URI -- a vital component of the Web's core architecture.

What's Special about HTTP URIs? They possess an intrinsic duality that combines persistent and unambiguous Data Identity with platform & representation format independent Data Access. Thus, you can use a string of characters that look like a contemporary Web URL to unambiguously achieve the following: Identify or Name Anything of Interest; Describe Anything of Interest by associating the Description Subject's Identity with a constellation of Attribute and Value pairs (technically: an Entity-Attribute-Value or Subject-Predicate-Object graph); Make the Description of Named Things of Interest discoverable on the Web by implicitly binding the aforementioned to Documents that hold their descriptions (technically: metadata documents or information resources).

What's the basic value proposition of the Linked Data meme? Enabling more productive use of the Web by users and developers alike. All of which is achieved by tweaking the Web's Hyperlinking feature such that it now includes Hypertext and Hyperdata as link types. Note: Hyperdata Linking is simply what an HTTP URI facilitates.

Example problems solved by injecting Linked Data into the Web: Federated Identity, by enabling Individuals to unambiguously Identify themselves (Profiles++) courtesy of existing Internet and Web protocols (e.g., FOAF+SSL's WebIDs, which combine Personal Identity with X.509 certificates and HTTPS based client side certification); Security and Privacy challenge alleviation, by delivering a mechanism for policy based data access that feeds off federated individual identity and social network (graph) traversal; Spam Busting, via the above; Increasing the Serendipitous Discovery Quotient (SDQ) of Web accessible resources, by embedding Rich Metadata into (X)HTML Documents, e.g., structured descriptions of your "WishLists" and "OfferLists" via a common set of terms offered by vocabularies such as GoodRelations and SIOC; Coherent integration of disparate data across the Web and/or within the Enterprise via "Data Meshing" rather than "Data Mashing"; Moving beyond imprecise statistically driven "Keyword Search" (e.g. Page Rank) to "Precision Find" driven by typed link based Entity Rank plus Entity Type and Entity Property filters.

Conclusion: If all of the above still falls into the technical mumbo-jumbo realm, then simply consider Linked Data as delivering Open Data Access in granular form to Web accessible data -- access that goes beyond data containers (documents or files). The value proposition of Linked Data is inextricably linked to the value proposition of the World Wide Web. This is true because the Linked Data meme is ultimately about an enhancement of the current Web, achieved by reintroducing its architectural essence -- in new context -- via a new level of link abstraction, courtesy of the Identity and Access duality of HTTP URIs. As a result of Linked Data, you can now have Links on the Web for a Person, Document, Music, Consumer Electronics, Products & Services, Business Opening & Closing Hours, Personal "WishLists" and "OfferLists", an Idea, etc., in addition to links for Properties (Attributes & Values) of the aforementioned.
Ultimately, all of these links will be indexed in a myriad of ways providing the substrate for the next major period of Internet & Web driven innovation, within our larger human-ingenuity driven innovation continuum. Relat[...]
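To ground the "Precision Find" point above: instead of a keyword match, one filters by entity type and property values. A minimal sketch follows, using rdflib and the FOAF vocabulary over a tiny invented dataset; a real query would run against a Linked Data index rather than an inline string.

    from rdflib import Graph

    # Tiny stand-in dataset; the people, places, and URIs are invented
    data = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    <http://example.org/people/ann> a foaf:Person ; foaf:name "Ann" ;
        foaf:based_near <http://example.org/places/boston> .
    <http://example.org/people/bob> a foaf:Person ; foaf:name "Bob" ;
        foaf:based_near <http://example.org/places/berlin> .
    <http://example.org/places/boston> foaf:name "Boston" .
    """
    g = Graph()
    g.parse(data=data, format="turtle")

    # "Precision Find": filter by entity type and property values, not keywords
    results = g.query("""
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?person ?name WHERE {
            ?person a foaf:Person ; foaf:name ?name ; foaf:based_near ?place .
            ?place foaf:name "Boston" .
        }
    """)
    for row in results:
        print(row.person, row.name)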



ebiz RDF & Data Integration Article Retort

Thu, 29 Jan 2009 21:25:58 GMT

Yesterday, I stumbled across an ebiz article by David Linthicum titled: RDF & Data Integration. Naturally, I read it, and while reading encountered a number of inaccuracies that compelled me to comment on the post. Today, I revisited the same article -- and to my shock and horror -- my comments do not exist (note: the site did accept my comments yesterday!). Even more frustrating for me, I now have to expend time I don't have re-writing my comments due to the depth and danger of the inaccuracies in this post re. RDF in general.

Important Note to ebiz and David: Please look into what happened to my comments. It's too early for me to conclude that subjective censorship is at play on the Web -- which isn't a hard copy journalistic style of platform where editors get away with such shenanigans. The Web is a sticky database, and outer joining is well and truly functional (meaning: exclusion and omission ultimately come back to bite via full outer join query results against the Web DB). By the way, if you publish the comments I made to the post (yesterday), I will add a note to this post, accordingly. Yes! David just confirmed to me via Twitter that this is yet another comment system related issue and absolutely no intent to censor etc. His words Twervatim :-)

For sake of clarity, I've itemized the inaccuracies and applied my correction comments (inline) accordingly:

Inaccuracy #1: Resource Description Framework (RDF), a part of the XML story, provides interoperability between applications that exchange information.

Correction #1: RDF and XML are not inextricably linked in any way. RDF is part Data Model (EAV/CR style Graph) with associated markup and data serialization formats that include: N3, Turtle, TriX, RDF/XML etc.

Inaccuracy #2: RDF uses XML to define a foundation for processing metadata and to provide a standard metadata infrastructure for both the Web and the enterprise.

Correction #2: RDF/XML is an XML based markup and data serialization format. As a markup language it can be used for creating RDF model records/statements (using Subject, Predicate, Object or Entity, Attribute, Value). As a serialization format, it provides a mechanism for marshaling RDF data across data managers and data consumers.

Inaccuracy #3: The difference between the two is that XML is used to transport data using a common format, while RDF is layered on top of XML defining a broad category of data.

Correction #3: See earlier corrections above.

Inaccuracy #4: When the XML data is declared to be of the RDF format, applications are then able to understand the data without understanding who sent it.

Correction #4: You do not declare data to be of RDF format. RDF isn't a format; it is a data model (as stated above). You can "up lift" or map data from XML to RDF (hierarchical to graph model mapping). Likewise you can "down shift" or map data from RDF to XML (example: SPARQL SELECT query patterns "down shift" to SPARQL Results XML, which isn't RDF/XML, while keeping access to graphs via URIs or Entity Identifiers that reside within the serialization).

Inaccuracy #5: RDF extends the XML model and syntax to be specified for describing either resources or a collection of information. (XML points to a resource in order to scope and uniquely identify a set of properties known as the schema.)

Correction #5: See earlier comments.
The single accurate paragraph in this ebiz article lies right at the end and it states the following: "I've always thought RDF has been underutilized for data integration, and it's really an old standard. Now that we're focused on both understanding and integrating data, perhaps RDF should make a comeback." Related: Semantic Web FAQ fragm[...]
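The "RDF is a data model, not a format" point from Correction #4 can be demonstrated in a few lines: parse a record from the RDF/XML serialization and write the very same graph back out as Turtle. The record below is invented purely for illustration.

    from rdflib import Graph

    rdf_xml = """<?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <foaf:Person rdf:about="http://example.org/people/alice">
        <foaf:name>Alice</foaf:name>
      </foaf:Person>
    </rdf:RDF>
    """

    g = Graph()
    g.parse(data=rdf_xml, format="xml")   # "up lift" from the XML serialization

    # Same statements, different representation format
    print(g.serialize(format="turtle"))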



Time for RDBMS Primacy Downgrade is Nigh! (No Embedded Images Edition - Update 1)

Tue, 17 Mar 2009 15:50:58 GMT

As the world works its way through a "once in a generation" economic crisis, the long overdue downgrade of the RDBMS, from its pivotal position at the apex of the data access and data management pyramid, is nigh.

What is the Data Access, and Data Management Value Pyramid? As depicted below, a top-down view of the data access and data management value chain. The term: apex, simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services. See: AVF Pyramid Diagram.

The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g. people), without compromising concurrency, data durability, and security, collectively determines the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility as the cornerstone of environmental adaptation is as old as the concept of evolution, and intrinsic to all pursuits of primacy. In simpler business oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement "Market Leadership Discipline" along the following pathways: innovation, operational excellence, or customer intimacy.

Why has RDBMS Primacy Endured? Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres en route to primacy within the value pyramid. The desire to improve on paper reports and spreadsheets is basically what DBMS technology has fundamentally addressed to date, even though conceptual level interaction with data has never been its forte. See: RDBMS Primacy Diagram.

For more than 10 years -- at the very least -- limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) have been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:

"Future of Database Research is excellent, but what is the future of data?" "..it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular." -- Dr. Anant Jhingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.

"One size fits all: A concept whose time has come and gone. They are direct descendants of System R and Ingres and were architected more than 25 years ago. They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs." -- Prof. Michael Stonebraker, one of the founding fathers of the RDBMS industry.

Until this point in time, the requisite confluence of "circumstantial pain" and "open standards" based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has endured its position of primacy, albeit on a "one size fits all" basis.

Circumstantial Pain. As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries with high doses of schematic heterogeneity. Ironically, in an era of the dot-com, we haven't been able to make meaningful connections between relevant &quo[...]



The Time for RDBMS Primacy Downgrade is Nigh!

Wed, 03 Jun 2009 22:09:58 GMT

As the world works its way through a "once in a generation" economic crisis, the long overdue downgrade of the RDBMS, from its pivotal position at the apex of the data access and data management pyramid, is nigh.

What is the Data Access, and Data Management Value Pyramid? As depicted below, a top-down view of the data access and data management value chain. The term: apex, simply indicates value primacy, which takes the form of a data access API based entry point into a DBMS realm -- aligned to an underlying data model. Examples of data access APIs include: Native Call Level Interfaces (CLIs), ODBC, JDBC, ADO.NET, OLE-DB, XMLA, and Web Services.

The degree to which ad-hoc views of data managed by a DBMS can be produced and dispatched to relevant data consumers (e.g. people), without compromising concurrency, data durability, and security, collectively determines the "Agility Value Factor" (AVF) of a given DBMS. Remember, agility as the cornerstone of environmental adaptation is as old as the concept of evolution, and intrinsic to all pursuits of primacy. In simpler business oriented terms, look at AVF as the degree to which DBMS technology affects the ability to effectively implement "Market Leadership Discipline" along the following pathways: innovation, operational excellence, or customer intimacy.

Why has RDBMS Primacy Endured? Historically, at least since the late '80s, the RDBMS genre of DBMS has consistently offered the highest AVF relative to other DBMS genres en route to primacy within the value pyramid. The desire to improve on paper reports and spreadsheets is basically what DBMS technology has fundamentally addressed to date, even though conceptual level interaction with data has never been its forte.

For more than 10 years -- at the very least -- limitations of the traditional RDBMS in the realm of conceptual level interaction with data across diverse data sources and schemas (enterprise, Web, and Internet) have been crystal clear to many RDBMS technology practitioners, as indicated by some of the quotes excerpted below:

"Future of Database Research is excellent, but what is the future of data?" "..it is hard for me to disagree with the conclusions in this report. It captures exactly the right thoughts, and should be a must read for everyone involved in the area of databases and database research in particular." -- Dr. Anant Jhingran, CTO, IBM Information Management Systems, commenting on the 2007 RDBMS technology retreat attended by a number of key DBMS technology pioneers and researchers.

"One size fits all: A concept whose time has come and gone. They are direct descendants of System R and Ingres and were architected more than 25 years ago. They are advocating "one size fits all"; i.e. a single engine that solves all DBMS needs." -- Prof. Michael Stonebraker, one of the founding fathers of the RDBMS industry.

Until this point in time, the requisite confluence of "circumstantial pain" and "open standards" based technology required to enable an objective "compare and contrast" of RDBMS engine virtues and viable alternatives hasn't occurred. Thus, the RDBMS has endured its position of primacy, albeit on a "one size fits all" basis.

Circumstantial Pain. As mentioned earlier, we are in the midst of an economic crisis that is ultimately about a consistent inability to connect dots across a substrate of interlinked data sources that transcend traditional data access boundaries with high doses of schematic heterogeneity. Ironically, in an era of the dot-com, we haven't been able to make meaningful connections between relevant "real-world things&quo[...]



My Hopes for Linked Data in 2009 (Update #2)

Wed, 07 Jan 2009 02:35:19 GMT

Happy New Year!

In 2009 I hope the following happens re. "Linked Data":

  1. We realize it's a Meme
  2. We collectively connect the Meme to the concept of granular hyperlinks between data entities/objects (datum to datum linkage aka. Hyperdata Linking)
  3. We generally connect the Meme to technology ancestry such as the Entity-Attribute-Value with Classes & Relationships (EAV/CR) data model (then broader commonality with erstwhile unrelated realms will be unveiled e.g., Entity Frameworks from Microsoft, Core Data from Apple, SimpleDB from Amazon, and the Freebase Graph Model DB amongst others)
  4. We instinctively connect the Meme to the concept of Entity Oriented Data Access and Management (RDF based Linked Data is basically EAV/CR scheme that uses HTTP based Pointers for Entity, Attribute, and Relationship Identifiers)
  5. We naturally connect the Meme with the notion that an identifier for a unit of data (aka. Datum) should be the conduit to a negotiable representation of said Datum's description (i.e., its attribute and relationship properties in HTML, XHTML, RDFa, Turtle, N3, RDF/XML etc., for example)
  6. We ultimately connect the Meme with a conceptual-level approach to data integration across disparate data sources (also known as Master Data Management (MDM) ).

2009 is about a reboot on a monumental scale. We need new thinking, new technology, new approaches, and new solutions. No matter what route we take, we can't negate the importance of "Data". When dealing with organic or inorganic computer systems -- Data is simply everything!

The ability of individuals and enterprises to access, mesh, and disseminate data to relevant nodes across public and private networks will ultimately determine the winners and losers in the new frontier, ushered in by 2009.

Do not take data access and data management technology for granted. User interfaces come and go, application logic comes and goes, but your data stays with you forever. If you are mystified by data access technology, then make 2009 the year of data access technology demystification :-)

Related




Introducing Virtuoso Universal Server (Cloud Edition) for Amazon EC2

Fri, 28 Nov 2008 21:06:02 GMT

What is it? A pre-installed edition of Virtuoso for Amazon's EC2 Cloud platform.

What does it offer? From a Web Entrepreneur perspective it offers: Low cost entry point to a game-changing Web 3.0+ (and beyond) platform that combines SQL, RDF, XML, and Web Services functionality; Flexible variable cost model (courtesy of EC2 DevPay) tightly bound to revenue generated by your services; Federated and/or centralized model flexibility for your SaaS based solutions; Simple entry point for developing and deploying sophisticated database driven applications (SQL or RDF Linked Data Web oriented); Complete framework for exploiting OpenID and OAuth (including Role enhancements) that simplifies exploitation of these vital Identity and Data Access technologies; Easy implementation of RDF Linked Data based Mail, Blogging, Wikis, Bookmarks, Calendaring, Discussion Forums, Tagging, and Social-Networking as Data Space (data container) features of your application or service offering; Instant alleviation of challenges (e.g. service costs and agility) associated with Data Portability and Open Data Access across Web 2.0 data silos; LDAP integration for Intranet / Extranet style applications.

From the DBMS engine perspective it provides you with one or more pre-configured instances of Virtuoso that enable immediate exploitation of the following services: RDF Database (a Quad Store with SPARQL & SPARUL Language & Protocol support); SQL Database (with ODBC, JDBC, OLE-DB, ADO.NET, and XMLA driver access); XML Database (XML Schema, XQuery/XPath, XSLT, Full Text Indexing); Full Text Indexing.

From a Middleware perspective it provides: RDF Views (Wrappers / Semantic Covers) over SQL, XML, and other data sources accessible via SOAP or REST style Web Services; Sponger Service for converting non RDF information resources into RDF Linked Data "on the fly" via a large collection of pre-installed RDFizer Cartridges.

From the Web Server Platform perspective it provides an alternative to LAMP stack components such as MySQL and Apache by offering: HTTP Web Server; WebDAV Server; Web Application Server (includes PHP runtime hosting); SOAP or REST style Web Services Deployment; RDF Linked Data Deployment; SPARQL (SPARQL Query Language) and SPARUL (SPARQL Update Language) endpoints; Virtuoso Hosted PHP packages for MediaWiki, Drupal, Wordpress, and phpBB3 (just install the relevant Virtuoso Distro. Package).

From the general System Administrator's perspective it provides: Online Backups (Backup Set dispatched to S3 buckets, FTP, or HTTP/WebDAV server locations); Synchronized Incremental Backups to Backup Set locations; Backup Restore from Backup Set location (without exiting to EC2 shell).

Higher level user oriented offerings include: OpenLink Data Explorer front-end for exploring the burgeoning Linked Data Web; Ajax based SPARQL Query Builder (iSPARQL) that enables SPARQL Query construction by Example; Ajax based SQL Query Builder (QBE) that enables SQL Query construction by Example.

For Web 2.0 / 3.0 users, developers, and entrepreneurs it includes Distributed Collaboration Tools & Social Media realm functionality courtesy of ODS that includes: Point of presence on the Linked Data Web that meshes your Identity and your Data via URIs; System generated Social Network Profile & Contact Data via FOAF;
System generated SIOC (Semantically Interconnected Online Community) Data Space (that includes a Social Graph) exposing all your Web data in RDF Linked Data form System generated OpenID and automatic integration with FOAF Transparent Data Integration across Facebook, Digg, LinkedIn, FriendFeed, Twitter, and any other Web 2.0 d[...]



Entity Oriented Data Access

Tue, 04 Nov 2008 03:51:48 GMT

Recent perturbations in the Data Access and Data Management technology realms are clear signs of an imminent inflection. In a nutshell, the focus of data access is moving from the "Logical Level" (what you see if you've ever looked at a DBMS schema derived from an Entity Data Model) to the "Conceptual Level" (i.e., the Entity Model becoming concrete). In recent times I've stumbled across Master Data Management (MDM), which is all about entities that provide holistic views of enterprise data (or what I call: Context Lenses). I've also stumbled across emerging tensions in the .NET realm between Linq to Entities and Linq to SQL, where in either case the fundamental issue comes down to the optimal path to "Conceptual Level Access" over the "Logical Level" when dealing with data access in the .NET realm. Strangely, the emerging realms of RDF Linked Data, MDM, and .NET's Entity Frameworks remain disconnected. Another oddity is the obvious, but barely acknowledged, blurring of the lines between the "traditional enterprise employee" and the "individual Web netizen". The fusion between these entities is one of the most defining characteristics of how the Web is reshaping the data landscape.

At the current time, I tend to crystalize my data access world view under the moniker: YODA ("You" Oriented Data Access), based on the following: Entities are the new focal point of data access, management, and integration. "You" are the entry point (Data Source Name) into this new realm of interconnected Entities that the Web exposes. "You" the "Person" Entity is associated with many other "Things" such as "Organizations", "Other People", "Books", "Music", "Subject Matter" etc. "You" the "Person" needs Identity in this new global database, which is why "You" need to Identify "Yourself" using an HTTP based Entity ID (aka. URI). When "You" have an ID for "Yourself" it becomes much easier for the essence of "You" to be discovered via the Web. When "Others" have IDs for "Themselves" on the Web it becomes much easier for "You" to serendipitously discover or explicitly "Find" things on the Web.

Related
Is LINQ to SQL truly dead?
Virtuoso, Linked Data, and Linq2Rdf
Enterprise 0.0, Linked Data, and the Semantic Data Web (*an old post*)
[...]
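A small sketch of the "You as entry point" idea: give the person an HTTP-based Entity ID, attach a handful of typed links, and then walk outward from that single node. All identifiers below are placeholders, and FOAF is used simply as a convenient, real vocabulary for the person-centric links.

    from rdflib import Graph, Literal, Namespace, URIRef, RDF

    FOAF = Namespace("http://xmlns.com/foaf/0.1/")
    you = URIRef("http://example.org/id/you#this")   # placeholder personal Entity ID

    g = Graph()
    g.add((you, RDF.type, FOAF.Person))
    g.add((you, FOAF.name, Literal("You")))
    g.add((you, FOAF.knows, URIRef("http://example.org/id/colleague#this")))
    g.add((you, FOAF.made, URIRef("http://example.org/blog/post-1")))
    g.add((you, FOAF.interest, URIRef("http://example.org/topics/linked-data")))

    # "You" as the Data Source Name: everything reachable starts from this one node
    for predicate, obj in g.predicate_objects(subject=you):
        print(predicate, "->", obj)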



The Trouble with Labels (Contd.): Data Integration & SOA

Sun, 12 Oct 2008 22:54:22 GMT

I just stumbled across a post from ITBusiness Edge titled: How Semantic Technology Can Help Companies with Integration. While reading the post I encountered the term: Master Data Manager (MDM), and wondered to myself, "what's that?", only to realize it's the very same thing I described as a Data Virtualization or Virtual Database technology (circa 1998). Now, if re-labeling can confuse me when applied to a realm I've been intimately involved with for eons (internet time), I don't want to imagine what it does for others who aren't that intimately involved with the important data access and data integration realms.

On the more refreshing side, the article does shed some light on the potency of RDF and OWL when applied to the construction of conceptual views of heterogeneous data sources. "How do you know that data coming from one place calculates net revenue the same way that data coming from another place does? You’ve got people using the same term for different things and different terms for the same things. How do you reconcile all of that? That’s really what semantic integration is about."

BTW - I discovered this article via another titled: Understanding Integration And How It Can Help with SOA, that covers SOA and Integration matters. Again, in this piece I feel the gradual realization of the virtues that RDF, OWL, and RDF Linked Data bring to bear in the vital realm of data integration across heterogeneous data silos.

Conclusion: A number of events, at the micro and macro economic levels, are forcing attention back to the issue of productive use of existing IT resources. The trouble with the aforementioned quest is that it ultimately unveils the global IT affliction known as: heterogeneous data silos, and the challenges of pain alleviation that have been ignored forever or approached inadequately, as clearly shown by the rapid build up of SOA horror stories in the data integration realm. Data Integration via conceptualization of heterogeneous data sources, resulting in concrete conceptual layer data access and management, remains the greatest and most potent application of technologies associated with the "Semantic Web" and/or "Linked Data" monikers.

Related
InfoWorld 2003 Innovator article
2006 Podcast Interview with Jon Udell
Enterprise Information Integration
One of several posts about our Virtuoso Universal Server and Conceptual Model based data integration
History of Virtuoso
Mike Bergman's post titled: WOA: A New Enterprise Partner for Linked Data
[...]



Is the Semantic Web necessary (and feasible)?

Fri, 29 Aug 2008 15:08:12 GMT

Here is another "Linked Discourse" effort via a blog post that attempts to add perspective to a developing Web based conversation. In this case, the conversation originates from Juan Sequeda's recent interview with Jana Thompson titled: Is the Semantic Web necessary (and feasible)?

Jana: What are the benefits you see to the business community in adopting semantic technology?

Me: Exposure and exploitation of an untapped treasure trove of interlinked data, information, and knowledge across disparate IT infrastructure via conceptual entry points (Entity IDs / URIs / Data Source Names) that I refer to as "Context Lenses".

Jana: Do you think these benefits are great enough for businesses to adopt the changes?

Me: Yes, infrastructural heterogeneity is a fact of corporate life (growth, mergers, acquisitions etc). Any technology that addresses these challenges is extremely important and valuable. Put differently, the opportunity costs associated with IT infrastructural heterogeneity remain high!

Jana: How large do you think this impact will actually be?

Me: Huge. Enterprises have been aware of their data, information, and knowledge treasure troves for eons. Tapping into these via a materialization of the "information at your fingertips" vision is something they've simply been waiting to pursue without any platform lock-in, for as long as I've been in this industry.

Jana: I’ve heard, from contacts in the Bay Area, that they are skeptical of how large this impact of semantic technology will actually be on the web itself, but that the best uses of the technology are for fields such as medical information, or as you mentioned, geo-spatial data.

Me: Unfortunately, those people aren't connecting the Semantic Web and open access to heterogeneous data sources, or the intrinsic value of holistic exploration of entity based data networks (aka Linked Data).

Jana: Are semantic technologies going to be part of the web because of people championing the cause or because it is actually a necessary step?

Me: Linked Data technology on the Web is a vital extension of the current Web. Semantic Technology without the "Web" component, or what I refer to as "Semantics Inside only" solutions, simply offers little or no value as a Web enhancement, based on its incongruence with the essence of the Web i.e., "Open Linkage" and no Silos! A nice looking Silo is still a Silo.

Jana: In the early days of the web, there was an explosion of new websites, due to the ease of learning HTML, from a business to a person to some crackpot talking about aliens. Even today, CSS and XHTML are not so difficult to learn that a determined person can’t learn them from W3C or other tutorials easily. If OWL becomes the norm for websites, what do you think the effects will be on the web? Do you think it is easy enough to learn that it will be readily adopted as part of the standard toolkit for web developers for businesses?

Me: Correction: learning HTML had nothing to do with the Web's success. The value proposition of the Web simply reached critical mass and you simply couldn't afford to not be part of it. The easiest route to joining the Web juggernaut was a Web Page hosted on a Web Site. The question right now is: what's the equivalent driver for the Linked Data Web, bearing in mind the initial Web bootstrap? My answer is simply this: Open Data Access, i.e., getting beyond the data silos that have inadvertently emerged from Web 2.0.

Jana: Following the same theme, do you think this will le[...]



Time for Context Lenses (Update)

Mon, 04 Aug 2008 15:24:50 GMT

As the Linked Data meme continues on its quest to unravel the mysteries of the Semantic Web vision, it's quite gratifying to see that comprehension of data virtualization (the creation of "Conceptual Views" over logically organized "Disparate & Heterogeneous Data Sources" via "Context Lenses") is taking shape, as illustrated in the "note-to-self" post by David Provost.




Virtualization of heterogeneous data sources is only achievable if you have a dexterous data model based "Bus" into which the data sources are plugged. RDF has offered such a model for a long time.






When heterogeneous data sources are plugged into an RDF based integration bus (e.g., customer records sourced from a variety of tables, across a plethora of databases), you only end up with true value if the entities that emerge from such an effort are coherently linked and (de)referenceable; which is what Linked Data's fundamental preoccupation with dereferenceable URIs is all about. Of course, even when you have all of the above in place, you also need to be able to construct "Context Lenses", i.e., context driven views of the Linked Data Mesh (or Linked Data Spaces).
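To make the "Context Lens" idea a little more concrete, here is a minimal SPARQL sketch; the graph names, class, and property terms are hypothetical placeholders (not an actual deployment), chosen only to show how customer records surfaced from two separate databases can be joined and traversed once they sit on the same RDF bus:

    # Hypothetical context lens: one coherent view over customers whose records
    # originate in two different databases plugged into the RDF bus.
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX owl:  <http://www.w3.org/2002/07/owl#>

    SELECT ?customer ?name ?sameRecord
    FROM <http://example.com/crm>        # RDF view over source database #1
    FROM <http://example.com/billing>    # RDF view over source database #2
    WHERE
    {
      ?customer a foaf:Organization ;
                foaf:name ?name .
      OPTIONAL { ?customer owl:sameAs ?sameRecord . }   # linkage that makes the entities coherent
    }

Because every ?customer binding is a dereferenceable URI, each row in the result set is itself an entry point for further exploration.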


Additional Diagrams:


1. Clients of the RDF Bus
2. RDF Bus Server plugins: Scripts that emit RDF
3. RDF Bus Servers: RDF Data Managers (Triple or Quad Stores)
4. RDF Bus Servers: Relational to RDF Mappers (RDF Views, Semantic Covers etc.)
5. RDF Bus Server plugins: XML to RDF Mappers
6. RDF Bus Server plugins: GRDDL based XSLT stylesheets that emit RDF
7. RDF Bus Server plugins: Intelligent RDF Middleware









Missing Bits from semanticweb.com Interview

Fri, 13 Jun 2008 13:01:40 GMT

Yikes! I've just discovered that the final part of semanticweb.com's interview with Jim Hendler and me includes critical paragraphs that omit my example links :-( As you can imagine, this is quite excruciating, bearing in mind that "Literals" are of marginal value in a Linked Data world. Anyway, thanks to the Blogosphere, I can attempt to fix this problem myself -- via this post :-)

Q. If you wanted to provide a bewildered but still curious novice a public example of Linked Data at work in their everyday life, what would it be?

Kingsley Idehen: Any one of the following: My Linking Open Data community Profile Page - the Linked Data integration is exposed via the "Explore Data" Tab; My Linked Data Space - viewed via OpenLink's AJAR (Asynchronous Javascript and RDF) based Linked Data Browser; My Events Calendar Tag Cloud - a Linked Data view of my Calendar Space using an RDF-aware browser. In all cases, you have the ability to explore my data spaces by simply clicking on the links, which on the surface appear to be standard hypertext links, although in reality you are dealing with hyperdata links (i.e., links to entities that result in the generation of entity description pages that expose entity properties via hyperdata links). Thus, you have a single page that describes me in a very rich way, since it encompasses all data associated with me, covering: personal profile, blog posts, bookmarks, tag clouds, social networks etc.

Q. What would you show the CEO or CTO of a company outside the tech industry?

Kingsley Idehen: A link to the Entity ALFKI, from the popular Northwind Database associated with Microsoft Access and SQL Server database installations. This particular link exposes a typical enterprise data space (orders, customers, employees, suppliers ...) in a single page. The hyperdata links represent intricate data relationships common to most business systems that will ultimately seek to repurpose existing legacy data sources and SOA services as Linked Data. Alternatively, I would show the same links via the Zitgist Data Viewer (another Linked Data-aware browser). In both cases, I am exploiting direct access to entities via HTTP due to the protocol's incorporation into the Data Source Naming scheme. [...]
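Since the original example links were dropped from the published interview, here is a hedged, query-level approximation of what clicking one of those hyperdata links amounts to; the profile URI below is a placeholder rather than my actual Data Space URI:

    # An entity description page is, in effect, a rendering of a DESCRIBE
    # over the entity's URI: every property value that is itself a URI
    # becomes another hyperdata link to follow.
    DESCRIBE <http://example.com/dataspace/kidehen#this>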



DBpedia receives shot #1 of CLASSiness vaccine

Tue, 13 Jul 2010 14:45:40 GMT

The current live instance of DBpedia has just received dose #1 of a series of planned "Context" oriented booster shots. These shots seek to protect DBpedia from contextual incoherence as it grows in data set expanse and popularity. Dose #1 (vaccine label: Yago) equips DBpedia with a functional (albeit non-exclusive) Data Dictionary component courtesy of the Yago Class Hierarchy. When the DBpedia & Yago integration took place last year (around WWW2007, Banff) there was a little, but costly, omission: nobody sought to load the Yago Class Hierarchy into Virtuoso's Inference Engine :-( Anyway, the Class Hierarchy has now been loaded into Virtuoso's inference engine (as Virtuoso Inference Rules) and the following queries are now feasible using the live Virtuoso based DBpedia instance hosted by OpenLink Software:

-- Find all Fiction Books associated with a property "dbpedia:name" that has the literal value "The Lord of the Rings".

    DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dbpedia: <http://dbpedia.org/property>
    PREFIX yago: <http://dbpedia.org/class/yago>
    SELECT DISTINCT ?s
    FROM <http://dbpedia.org>
    WHERE
    {
      ?s a yago:Fiction106367107 .
      ?s dbpedia:name "The Lord of the Rings"@en .
    }

-- Variant of the query using Virtuoso's Full Text Index extension via the bif:contains function/magic predicate.

    DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dbpedia: <http://dbpedia.org/property>
    PREFIX yago: <http://dbpedia.org/class/yago>
    SELECT DISTINCT ?s ?n
    FROM <http://dbpedia.org>
    WHERE
    {
      ?s a yago:Fiction106367107 .
      ?s dbpedia:name ?n .
      ?n bif:contains 'Lord and Rings' .
    }

-- Retrieve all individual instances of the Fiction Class, which should include all Books.

    DEFINE input:inference "http://dbpedia.org/resource/inference/rules/yago#"
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dbpedia: <http://dbpedia.org/property>
    PREFIX yago: <http://dbpedia.org/class/yago>
    SELECT DISTINCT ?s
    FROM <http://dbpedia.org>
    WHERE
    {
      ?s a yago:Fiction106367107 .
    }
    LIMIT 50

Note: you can also move the inference pragmas to the Virtuoso server side, i.e., place the inference rules in a server instance config file, thereby negating the need to place "define input:inference 'http://dbpedia.org/resource/inference/rules/yago#'" pragmas directly in your SPARQL queries.

Related:
Mike's UMBEL: Making Linked Data Classy post
Fred's announcement about the Yago revamp en route to UMBEL
Expanding Data Object Domains via UMBEL
My prior posts about UMBEL
[...]



Comments about recent Semantic Gang Podcast

Tue, 06 May 2008 00:06:42 GMT

After listening to the latest Semantic Web Gang podcast, I found myself agreeing with some of the points made by Alex Iskold, specifically:

-- Linked Data does not implicitly imply making all your data public.
-- Linked Data principles benefit Intranet and Extranet style data integration (and trump alternative distributed database integration approaches any day).
-- Business exploitation of Linked Data on the Web will certainly be driven by the correlation of opportunity costs (which is more than likely what Alex meant by "use cases") associated with the lack of URIs originating from the domain of a given business (Tom Heath also effectively alluded to this via his BBC and URI land grab anecdotes; the same applies to Georgi's examples).
-- History is a great tutor; answers to many of today's problems always lie somewhere in plain sight of the past.

Of course, I also believe that Linked Data serves Web Data Integration across the Internet very well too, and that it will be beneficial to businesses in a big way. No individual or organization is an island; I think the Internet and Web have done a good job of demonstrating that thus far :-) We're all data nodes in a Giant Global Graph. Daniel Lewis did shed light on the read-write aspects of the Linked Data Web, which is actually very close to the callout for a Wikipedia for Data. TimBL has been working on this via Tabulator (see the Tabulator Editing Screencast), Benjamin Nowack also added similar functionality to ARC, and of course we support the same SPARQL UPDATE into an RDF information resource via the RDF Sink feature of our WebDAV and ODS-Briefcase implementations.[...]
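For readers curious about the read-write aspect Daniel touched on, here is a minimal sketch using SPARQL 1.1 Update syntax (the graph and resource URIs are placeholders; an RDF Sink folder would accept the resulting triples over HTTP rather than via this exact statement):

    # Write a bookmark-style entry into a writable named graph.
    PREFIX dct: <http://purl.org/dc/terms/>

    INSERT DATA
    {
      GRAPH <http://example.com/dataspace/kidehen/bookmarks>
      {
        <http://example.com/dataspace/kidehen/bookmarks#item1>
            dct:title  "Linked Data Design Issues" ;
            dct:source <http://www.w3.org/DesignIssues/LinkedData.html> .
      }
    }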



Linked Data is vital to Enterprise Integration driven Agility

Sat, 22 Mar 2008 18:13:41 GMT

John Schmidt, from Informatica, penned an interesting post titled: IT Doesn't Matter - Integration Does. Yes, integration is hard, but I do profoundly believe that what's been happening on the Web over the last 10 or so years also applies to the Enterprise, and by this I absolutely do not mean "Enterprise 2.0", since "2.0" and productive agility do not compute in my realm of discourse. Large collections of RSS feeds, Wikiwords, Shared Bookmarks, Discussion Forums etc., when disconnected at the data level (i.e., hosted in pages with no access to the "data behind"), simply offer information deluge and inertia (there are only so many hours for processing opaque information sources in a given day). Enterprises fundamentally need to process information efficiently as part of a perpetual assessment of their relative competitive Strengths, Weaknesses, Opportunities, and Threats (SWOT), in existing and/or future markets. Historically, IT acquisitions have run counter to the aforementioned quest for "Agility" due to the predominance of the "rip and replace" approach to technology acquisition that repeatedly creates and perpetuates information silos across Application, Database, Operating System, and Development Environment boundaries. The sequence of events typically occurs as follows:

1. applications are acquired on a problem by problem basis
2. back-end application databases are discovered once ad-hoc information views are sought by information workers
3. back-end database disparity across applications is discovered once holistic views are sought by knowledge workers (typically domain experts).

In the early to mid 90's (pre ubiquitous Web), programming language, operating system, and development framework independence inside the enterprise was technically achievable via ODBC (due to its platform independence). That said, DBMS specific ODBC channels alone couldn't address the holistic requirements associated with Conceptual Views of disparate data sources, hence the need for Data Access Virtualization via Virtual Database Engine technology. Just as is the case on the Web today, with the emergence of the "Linked Data" meme, enterprises now have a powerful mechanism for exploiting the Data Integration benefits associated with generating Data Objects from disparate data sources, endowed with HTTP based IDs (URIs). Conceptualizing access to data exposed via Database APIs, SOA based Web Services (SOAP style Web Services), Web 2.0 APIs (REST style Web Services), XML Views of SQL Data (SQLX), pure XML etc. is the problem area addressed by RDF aware middleware (RDFizers, e.g., the Virtuoso Sponger). Here are examples of what SQL Rows exposed as RDF Data Objects (identified using HTTP based URIs) would look like outside or behind a corporate firewall:

Customer - Alfreds Futterkiste
Customer Contact - Maria Anders
Salesrep - Nancy Davolio
Customer Order Numbers - 11084, 11011, 11078, 11085

What's good for the Web Goose (Personal Data Space URIs) is good for the Enterprise Gander (Enterprise Data Space URIs).

Related: Data Access - A Cultural or Technical Challenge? [...]
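As a sketch of how such HTTP-identified Data Objects might be traversed once the SQL rows are surfaced as RDF, consider the query below; the vocabulary terms and graph IRI are illustrative placeholders rather than the actual Northwind mapping, but the shape of the traversal (customer, to contact, to sales rep, to order numbers) mirrors the example entities listed above:

    # Illustrative walk across entities that originate in separate SQL tables.
    PREFIX nw: <http://example.com/northwind/schema#>   # hypothetical vocabulary

    SELECT ?company ?contact ?salesRep ?orderNo
    FROM <http://example.com/northwind>                  # hypothetical graph of mapped SQL rows
    WHERE
    {
      ?customer a nw:Customer ;
                nw:companyName ?company ;
                nw:contactName ?contact .
      ?order    nw:customer    ?customer ;
                nw:orderNumber ?orderNo ;
                nw:salesRep    ?rep .
      ?rep      nw:name        ?salesRep .
    }
    ORDER BY ?orderNo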



Semantic Web Advocate of Tribe Linked Data! (Updated)

Thu, 20 Mar 2008 20:29:47 GMT

These days I increasingly qualify myself and my Semantic Web advocacy as falling under the realm of Linked Data. Thus, I tend to use the following introduction: I am Kingsley Idehen, of the Tribe Linked Data. The aforementioned qualification is increasingly necessary for the following reasons: The Semantic Web vision is broad and comprised of many layers. A new era of confusion is taking shape just as we thought we had quelled the prior AI dominated realm of confusion. None of the Semantic Web vision layers are comprehensible in practical ways without a basic foundation. Open Data Access is the foundation of the Semantic Web (in a prior post I used the term: Semantic Web Layer 1). URIs are the units of Open Data Access in Semantic Web parlance, i.e., each datum on the Web must have an ID (minted by the host Data Space). The terms GGG, Linked Data, Data Web, Web of Data, and Web 3.0 (when I use this term) all imply URI driven Open Data Access for the Web Database (maybe call this ODBC for the Web) -- the ability to point to records across data spaces without any adverse effect on the remote data spaces.

It's really important to note that none of the aforementioned terms has anything to do with the "Linguistic Meaning of blurb". Building a smarter document exposed via a URL, without exposing descriptive data links, doesn't provide open access to the data behind information sources. As human beings we are all endowed with reasoning capability. But we can't reason without access to data. Dearth of openly accessible structured data is the source of many ills in cyberspace and across society in general. Today we still have Subjectivity reigning over Objectivity due to the prohibitive costs of open data access. We can't cost-effectively pursue objectivity without cost-effective infrastructure for creating alternative views of the data behind information sources (e.g. Web Pages). More Objectivity and less Subjectivity is what the next Web Frontier is about. At OpenLink we simply use the moniker: Analysis for All! Everyone becomes a data analyst in some form, and even better, the analyses are easily accessible to anyone connected to the Web. Of course, you will be able to share special analyses with your private network of friends and family, or if you so choose, not at all :-)

To recap, it's important to note that Linked Data is the foundation layer of the Semantic Web vision. It not only facilitates open data access, it also enables data integration (Meshing as opposed to Mashing) across disparate data schemas. As demonstrated by DBpedia and the Linked Data Solar system emerging around it, if you URI everything, then everything is Cool. Linked Data and Information Silos are mutually exclusive concepts. Thus, you cannot produce a web accessible Information Silo and then refer to it as "Semantic Web" technology. Of course, it might be very Semantic, but it's fundamentally devoid of critical "Semantic Web" essence (DNA). My acid test for any Semantic Web solution is simply this (using a Web User Agent or Client):

1. go to the profile page of the service
2. ask for an RDF representation of my profile (by this I mean "get me the raw data in structured form")
3. attempt to traverse the structured data graph (RDF) that the service provides via live de-referenceable URIs.

Here is the Acid test against my Data Space: My Profile Page (HTML representation dispatched via an instance o[...]
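Step 3 of the acid test can also be approximated at the query level; in the sketch below the graph and profile URIs are placeholders, and the point is simply that the traversal only works when the URIs bound to ?friend are live, de-referenceable Linked Data identifiers:

    # Follow foaf:knows links outward from a personal profile.
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    SELECT ?friend ?friendName
    FROM <http://example.com/dataspace/person>          # placeholder profile graph
    WHERE
    {
      <http://example.com/dataspace/person#this> foaf:knows ?friend .
      OPTIONAL { ?friend foaf:name ?friendName . }
    }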



Reminder: Why We Need Linked Data!

Fri, 02 Nov 2007 22:52:34 GMT

"The phrase Open Social implies portability of personal and social data. That would be exciting but there are entirely different protocols underway to deal with those ideas. As some people have told me tonight, it may have been more accurate to call this "OpenWidget" - though the press wouldn't have been as good. We've been waiting for data and identity portability - is this all we get?" [Source: Read/Write Web's Commentary & Analysis of Google's OpenSocial API] ..Perhaps the world will read the terms of use of the API, and realize this is not an open API; this is a free API, owned and controlled by one company only: Google. Hopefully, the world will remember another time when Google offered a free API and then pulled it. Maybe the world will also take a deeper look and realize that the functionality is dependent on Google hosted technology, which has its own terms of service (including adding ads at the discretion of Google), and that building an OpenSocial application ties Google into your application, and Google into every social networking site that buys into the Dream. Hopefully the world will remember. Unlikely, though, as such memories are typically filtered in the Great Noise....[Source: Poignant commentary excerpt from Shelly Power's Blog (as always)] The "Semantic Data Web" vision has always been about "Data & Identity" portability across the Web. Its been that and more from day one. In a nutshell, we continue to exhibit varying degrees of Cognitive Dissonance re the following realities: The Network is the Computer (Internet/Intranet/Extranet depending on your TCP/IP usage scenarios) The Web is the OS (ditto) and it provides a communications subsystem (Information BUS) comprised of - HTTP Protocol - URIs (pointer system for identifying, accessing, and manipulating data) HTTP based Interprocess (i.e Web Apps are processes when you discard the HTML UI and interact with the application logic containers called "Web Services" behind the pages) ultimately hit data Web Data is best Modeled as a Graph (RDF, Containers/Items/Item Types, Property & Value Pairs associated with something, and other labels) Network are Graphs and vice versa Social Networks are graphs where nodes are connected via social connectors ( [x]--knows-->[y] ) The Web is a Graph that exposes a People and Data Network (to the degree we allude to humans not being data containers i.e. just nodes in a network, otherwise we are talking about a Data Network) Data access and manipulation depends inherently on canonical Data Access mechanisms such as Data Source Identifiers / Names (time-tested practice in various DBMS realms) Data is forever, it is the basis of Information, and it is increasing exponentially due to proliferation of Web Services induced user activities (User Generated Content) Survival, Vitality, Longevity, Efficiency, Productivity etc.. are all depend on our ability to process data effectively in a shrinking time continuum where Data and/or Information overload is the alternative. The Data Web is about Presence over Eyeballs due to the following realities: Eyeballs are input devices for a DNA based processing system (Humans). The aforementioned processing system can reason very well, but simply cannot effectively process masses of data or information Widgets offer little value long term re. t[...]



Virtuoso 5.0.2 Released!

Mon, 08 Oct 2007 14:27:27 GMT

A new release of Virtuoso is now available in both Open Source and Commercial variants. The main features and Enhancements associated with this release include:

    * 64-bit Integer Support
    * RDF Sink Folders for WebDAV - enabling RDF Quad Store population by simply dropping RDF files into WebDAV or via HTTP (meaning you can use CURL as an RDF input mechanism, for instance)
    * Additional Sponger Cartridges for Audio binary files (i.e., ID3 tag extraction and Music Ontology mapping, which exposes the fine details of music as RDF based Structured Data; one for the DJs & Remixers out there!)
    * New Sponger Cartridges for Facebook, Freebase, Wikipedia, GRDDL, RDFa, eRDF and more
    * Support for PHP 5.2 runtime hosting (Virtuoso is a bona fide deployment platform for: Wordpress, MediaWiki, phpBB, Drupal etc.)
    * Enhanced UI for managing RDF Linked Data deployment (covering Multi Homed domains and Virtual Directories associated with URL-rewrite rules)
    * Demonstration Database includes SQL-RDF Views & SQL Table samples for the THALIA Web Data Integration benchmark and test-suite
    * Tutorial Application includes Linked Data style SQL-RDF Views for the Northwind SQL DBMS schema (which is the same as the standard Virtuoso demo database schema)
    * SQL-RDF Views implementation of the TPC-D benchmark (Yes, we can run this grueling SQL benchmark via RDF views of SQL Data!)
    * A new Amazon EC2 Image for Virtuoso that enables you to instantiate a fully configured instance comprising the Virtuoso core, OpenLink Data Spaces platform and the OpenLink Ajax Toolkit (OAT) (we now have bona fide Data Spaces in the Clouds as an addition to the emerging Semantic Data Web mesh).

Download Links:




Fourth Platform: Data Spaces in The Cloud (Update)

Sun, 26 Oct 2008 21:59:33 GMT

I've written extensively on the subject of Data Spaces in relation to the Data Web for a while. I've also written sparingly about OpenLink Data Spaces (a Data Web Platform built using Virtuoso). On the other hand, I haven't shed much light on installation and deployment of OpenLink Data Spaces. Jon Udell recently penned a post titled: The Fourth Platform. The post arrives at a spookily coincidental time (this happens quite often between Jon and me, as demonstrated last year during our podcast; the "Fourth" in his Innovators Podcast series). The platform that Jon describes is "Cloud Based" and comprised of Storage and Computation. I would like to add Data Access and Management (native and virtual) under the fourth platform banner, with the end product called: "Cloud based Data Spaces". As I write, we are releasing a Virtuoso AMI (Amazon Image) labeled: virtuoso-dataspace-server. This edition of Virtuoso includes the OpenLink Data Spaces layer and all of the OAT applications we've been developing for a while.

What benefits does this offer?

Personal Data Spaces in the Cloud - a place where you can control and consolidate data across your Blogs, Wikis, RSS/Atom Feed Subscriptions, Shared Bookmarks, Shared Calendars, Discussion Threads, Photo Galleries etc. All the data in your Data Space is SPARQL or GData accessible. All of the data in your Personal Data Space is Linked Data from the get go; each item of data is URI addressable.
SIOC support - your Blogs, Wikis, Bookmarks etc. are based on the SIOC ontology for Semantically Interlinking Online Communities (think: Open social-graph++).
FOAF support - your FOAF Profile page provides a URI that is an in-road to all data in your Data Space.
OpenID support - your Personal Data Space ID is usable wherever OpenID is supported; OpenID and FOAF are integrated as per the latest FOAF specs.
Two-way integration with Facebook - you can access your Data Space from Facebook or access Facebook from your Data Space.
Unified Storage - the WebDAV based filesystem provides Cloud Storage that's integrated with Amazon S3; it also exposes all of your Data Space data via a traditional filesystem UI (think virtual Spotlight); you can also mount this drive to your local filesystem via your native operating system's WebDAV support.
SyncML - you can sync calendar and contact details with your Data Space in the cloud from your mobile phone.
A practical Semantic Data Web solution - based on Web infrastructure; it doesn't require you to do anything beyond exposing URIs for data in your Data Spaces.

EC2-AMI Details:
AMI ID: ami-e2ca2f8b
Manifest file: virtuoso-images/virtuoso-dataspace-server.manifest.xml

Installation Guide:
1. Get an Amazon Web Services (AWS) account
2. Sign up for the S3 and EC2 services
3. Install the EC2 plugin for Firefox
4. Start the EC2 plugin
5. Locate the row containing ami-7c31d515  Manifest virtuoso-test/virtuoso-cloud-beta-9-i386.manifest.xml (sort using the AMI ID or Manifest columns, or search on the pattern: virtuoso, due to name flux)
6. Start the Virtuoso Data Space Server AMI
7. Wait 4-5 minutes (it takes a few minutes to create the pre-configured Linux image)
8. Connect to http://your-ec2-instance-cname:8890/
9. Log in with user/password dba/dba
10. Go to the Admin UI (Virtuoso Conductor) and change the PWDs for the 'dba' and 'dav'[...]
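As a hedged illustration of "SPARQL accessible" in practice, the sketch below lists recent posts across the containers (blog, wiki, bookmarks, etc.) of a Data Space; the graph IRI is a placeholder, while the sioc: and dct: terms are the standard SIOC and Dublin Core namespaces:

    # List recent items across all containers in one personal Data Space.
    PREFIX sioc: <http://rdfs.org/sioc/ns#>
    PREFIX dct:  <http://purl.org/dc/terms/>

    SELECT ?container ?post ?created
    FROM <http://example.com/dataspace/kidehen>          # placeholder Data Space graph
    WHERE
    {
      ?post a sioc:Post ;
            sioc:has_container ?container .
      OPTIONAL { ?post dct:created ?created . }
    }
    ORDER BY DESC(?created)
    LIMIT 25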



Semantic Web Value Proposition

Fri, 21 Sep 2007 12:05:07 GMT

The motivation behind this post is a response to the Read/WriteWeb post titled: Semantic Web: Difficulties with the Classic Approach. First off, I am going to focus on the Semantic Data Web aspect of the overall Semantic Web vision (a continuum), as this is what we have now. I am also writing this post as a deliberate contribution to the discourse swirling around the real topic: Semantic Web Value Proposition.

Situation Analysis: We are in the early stages of the long anticipated Knowledge Economy. That being the case, it would be safe to assume that information access, processing, and dissemination are of utmost importance to individuals and organizations alike. You don't produce knowledge in a vacuum! Likewise, you can't produce Information in a vacuum; you need Data.

The Semantic Data Web's value to Individuals. Problem: Increasingly, Blogs, Wikis, Shared Bookmarks, Photo Galleries, Discussion Forums, Shared Calendars and the like have become invaluable tools for individual and organizational participation in Web enabled global discourse (where a lot of knowledge is discovered). These tools are typically associated with Web 2.0, implying Read-Write access via Web Services, centralized application hosting, and data lock-in (silos). The reality expressed above is a recipe for "Information Overload" and complete annihilation of one's effective pursuit and exploitation of knowledge due to "Time Scarcity" (note: disconnecting is not an option). Information abundance is inversely related to available processing time (for humans in particular). In my case, for instance, I was actively subscribed to over 500 RSS feeds in 2003. As of today, I've simply stopped counting, and that's just my Weblog Data Space. Then add to that all of the discussions I track across blogs, wikis, message boards, mailing lists, traditional usenet discussion forums, and the like, and I think you get the picture. Beyond information overload, Web 2.0 data is "Semi-Structured" by way of its dominant data containers ((X)HTML, RSS, Atom documents and data streams etc.) lacking semantics that formally expose individual data items as distinct entities, endowed with unambiguous naming / identification, descriptive attributes (a type of property/predicate), and relationships (a type of property/predicate).

Solution: Devise a standard for Structured Data Semantics that is compatible with the Web Information BUS. Produce structured data (entities, entity types, entity relationships) from Web 1.0 and Web 2.0 resources that already exist on the Web, such that individual entities, their attributes, and relationships are accessible and discernible to software agents (machines). Once the entities are individually exposed, the next requirement is a mechanism for selective access to these entities, i.e., a query language. Semantic Data Web Technologies that facilitate the solution described above include: Structured Data Standards: RDF - Data Model for structured data; RDF/XML - a serialization format for RDF based structured data; N3 / Turtle - more human friendly serialization formats for RDF based structured data. Entity Exposure & Generation: GRDDL - enables association between XHTML pages and XSLT stylesheets that facilitates loosely coupled "on the fly" extraction of RD[...]



Linked Data & The Web Information BUS

Wed, 08 Aug 2007 22:26:55 GMT

Chris Bizer, Richard Cyganiak, and Tom Heath have just published a Linked Data Publishing Tutorial that provides a guide to the mechanics of Linked Data injection into the Semantic Data Web. On a different, but related, thread, Mike Bergman recently penned a post titled: What is the Structured Web?. Both of these public contributions shed light on the "Information BUS" essence of the World Wide Web by describing the evolving nature of the payload shuttled by the BUS.

What is an Information BUS? Middleware infrastructure for shuttling "Information" between endpoints using a messaging protocol. The Web is the dominant Information BUS within the Network Computer we know as the "Internet". It uses HTTP to shuttle information payloads between "Data Sources" and "Information Consumers" - which is what happens when we interact with the Web via User Agents / Clients (e.g., Browsers).

What are Web Information Payloads? HTTP transported streams of contextualized data. Hence the terms "Information Resource" and "Non Information Resource" when reading material related to http-range-14 and Web Architecture. For example, an (X)HTML document is a specific data context (representation) that enables us to perceive, or comprehend, a data stream originating from a Web Server as a Web Page. On the other hand, if the payload lacks contextualized data, a fundamental Web requirement, then the resource is referred to as a "Non Information" resource. Of course, there is really no such thing as a "Non Information" resource, but with regards to Web Architecture, it's the short way of saying: "the Web transmits Information only". That said, I prefer to refer to these "Non Information" resources as "Data Sources", a term well understood in the world of Data Access Middleware (ODBC, JDBC, OLEDB, ADO.NET etc.) and Database Management Systems (Relational, Object-Relational, Object etc.).

Examples of Information Resource and Data Source URIs:
http://demo.openlinksw.com/Northwind/Customer/ALFKI (Information Resource)
http://demo.openlinksw.com/Northwind/Customer/ALFKI#this (Data Source)

Explanation: The Information Resource is a conduit to the Entity identified by the Data Source URI (an entity in my RDF Data Space that is the Subject or Object of one or more Triple based Statements. The triples in question can be represented as an RDF resource when transmitted over the Web via an Information Resource that takes the form of a SPARQL REST Service URL or a physical RDF based Information Resource URL).

What about Structured Data? Prior to the emergence of the Semantic Data Web, the payloads shuttled across the Web Information BUS comprised primarily of the following: HTML - Web Resource with presentation focused structure (Web 1.0's dominant payload form); XML - Web Resource with structure that separates presentation and data (Web 2.0's dominant payload form). The Semantic Data Web simply adds RDF to the payload formats that shuttle across the Web Information BUS. RDF addresses formal data structure, which XML doesn't cover since it is semi-structured (distinct data entities aren't formally discernible). In a nutshell, an RDF payload is basically a conceptual model database packaged as [...]
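The practical difference between the two URIs can be shown with a short query: the Data Source URI (the one ending in #this) names the entity itself, so it is the term you place inside the graph pattern, while the Information Resource URL is what a user agent actually fetches. A hedged sketch (the graph IRI is illustrative, not necessarily how the demo server names its graph):

    # Ask what is asserted about the ALFKI customer entity itself.
    SELECT ?property ?value
    FROM <http://demo.openlinksw.com/Northwind>          # illustrative graph IRI
    WHERE
    {
      <http://demo.openlinksw.com/Northwind/Customer/ALFKI#this> ?property ?value .
    }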



Enterprise 0.0, Linked Data, and Semantic Data Web

Tue, 05 Feb 2008 04:19:26 GMT

Last week we officially released Virtuoso 5.0.1 (in Commercial and Open Source Editions). The press release provided us with an official mechanism and timestamp for the current Virtuoso feature set. A vital component of the new Virtuoso release is the finalization of our SQL to RDF mapping functionality -- enabling the declarative mapping of SQL Data to RDF. Additional technical insight covering other new features (delivered and pending) is provided by Orri Erling, as part of a series of post-Banff posts.

Why is SQL to RDF Mapping a Big Deal? A majority of the world's data (especially in the enterprise realm) resides in SQL Databases. In addition, open access to the data residing in said databases remains the biggest challenge to enterprises for the following reasons: SQL Data Sources are inherently heterogeneous because they are acquired with business applications that are in many cases inextricably bound to a particular DBMS engine; data is predictably dirty; and DBMS vendors ultimately hold the data captive and have traditionally resisted data access standards such as ODBC (trust me they have; just look at the unprecedented bad press associated with ODBC, the only truly platform independent data access API, and then look at how this bad press arose).

Enterprises have known from the beginning of modern corporate times that data access, discovery, and manipulation capabilities are inextricably linked to the "Real-time Enterprise" nirvana (hence my use of 0.0 before this becomes 3.0). In my experience, as someone who has operated in the data access and data integration realms since the late '80s, I've painfully observed enterprises pursue, but never quite attain, full control over enterprise data (the prized asset of any organization) such that data-, information-, and knowledge-workers are just a click away from commencing coherent platform and database independent data drill-downs and/or discovery that transcend intranet, internet, and extranet boundaries -- serendipitous interaction with relevant data, without compromise! Okay, situation analysis done, we move on..

At our most recent (12th June) monthly Semantic Web Gathering, I unveiled to TimBL and a host of other attendees a simple, but powerful, demonstration of how Linked Data, as an aspect of the Semantic Data Web, can be applied to enterprise data integration challenges.

Actual SQL to RDF Mapping Demo / Experiment. Hypothesis: A SQL Schema can be effectively mapped declaratively to RDF such that SQL Rows morph into RDF Instance Data (Entity Sets) based on the Concepts & Properties defined in a Concrete Conceptual Data Model oriented Data Dictionary (RDF Schema and/or OWL Ontology). In addition, the solution must demonstrate how "Linked Data in the Web" is completely different from "Data on the Web" or "Linked Data on the Web" (btw - Tom Heath eloquently unleashed this point in his recent podcast interview with Talis). Apparatus: An Ontology - in this case we simply derived the Northwind Ontology from the XML Schema based CSDL (Conceptual Schema Definition Language) used by Microsoft's public Astoria demo (specifically the Northwind Data Services demo). SQL Database Schema - Northwind (comes bundled with ACCESS, SQL S[...]



RDF based Integration Challenges (update)

Fri, 30 Mar 2007 23:35:35 GMT

Danny Ayers responds, via his post titled: Sampling, to Stefano Mazzocchi's post about Data Integration using Semantic Web Technologies: "There is a potential problem with republication of transformed data, in that right away there may be inconsistency with the original source data. Here provenance tracking (probably via named graphs) becomes a must-have. The web data space itself can support very granular separation. Whatever, data integration is a hard problem. But if you have a uniform language for describing resources, at least it can be possible." Alex James also chimes in with valuable insights in his post: Sampling the global data model, where he concludes: "Exactly we need to use projected views, or conceptual models. See, a projected view can be thought of as a conceptual model that has some mapping to a *sampling* of the global data model. The benefits of introducing this extra layer are many and varied: Simplicity, URI predictability, Domain Specificity and the ability to separate semantics from lower level details like data mapping. Unfortunately if you look at today’s ORMs you will quickly notice that they simply map directly from Object Model to Data Model in one step. This naïve approach provides no place to manage the mapping to a conceptual model that sampling the world’s data requires. What we need to solve the problems Stefano sees is to bring together the world of mapping and semantics. And the place they will meet is simply the Conceptual Model."

Data Integration challenges arise because the following facts hold true all of the time (whether we like it or not): Data heterogeneity is a fact of life at the intranet and internet levels. Data is rarely clean. Data Integration prowess is ultimately measured by pain alleviation. At some point human participation is required, but the trick is to move human activity up the value chain. Glue code size and Data Integration success are inversely related. Data Integration is best addressed via "M" rather than "C" (if we use the MVC pattern as a guide; "V" is dead on arrival for the scrapers out there).

In 1997 we commenced the Virtuoso Virtual DBMS Project that morphed into the Virtuoso Universal Server; a fusion of DBMS functionality and Middleware functionality in a single product. The goal of this undertaking remains alleviation of the costs associated with Data Integration challenges by virtualizing data at the Logical and Conceptual Layers. The Logical Data Layer has been concrete for a while (e.g., Relational DBMS Engines); what hasn't reached the mainstream is the Concrete Conceptual Model, but this is changing fast courtesy of the activity taking place in the realm of RDF. RDF provides an open and standards compliant vehicle for developing and exploiting Concrete Conceptual Data Models that ultimately move the human aspect of the "Data Integration alleviation quest" higher up the value chain. [...]



Hello Data Web (Take 3 - Feel The "RDF" Force)

Sat, 24 Feb 2007 22:01:28 GMT

As I have stated, and implied, in various posts about the Data Web and the burgeoning Semantic Web in general, the value of RDF is felt rather than seen (driven by presence as opposed to web sites). That said, it is always possible to use the visual Interactive-Web dimension (Web 1.0) as a conduit to the Data-Web dimension. In this third take on my introduction to the Data Web I would like to share a link with you (a Dynamic Start Page in Web 2.0 parlance) with a Data Web twist: you do not have to preset the Start Page Data Sources (this is a small-big thing, if you get my drift, hopefully!). Here are some Data Web based Dynamic Start Pages that I have built for some key players from the Semantic Web realm (in random order): Dan Brickley, Tim Berners-Lee, Dan Connolly, Danny Ayers, Planet RDF. "These are RDF prepped Data Sources....", you might be thinking, right? Well here is the reminder: the Data Web is a Global Data Generation and Integration Effort. Participation may be active (Semantic Web & Microformats Community), or passive (web sites, weblogs, wikis, shared bookmarks, feed subscriptions, discussion forums, mailing lists etc.). Irrespective of participation mode, RDF instance data can be generated from close to anything (I say this because I plan to add binary files holding metadata to this mix shortly). Here are examples of Dynamic Start Pages for non RDF Data Sources: del.icio.us Web 2.0 Events Bookmarks, Vecosys, Techcrunch, Jon Udell's Blog, Dave Winer's Scripting News, Robert Scoble's Blog. What about Microformats, you may be wondering? Here goes: Microformats Wiki (click on the Brian Suda link for instance), Microformats Planet, Del.icio.us Microformats Bookmarks, Ben Adida's home page (RDFa). Let's carry on. How about some traditional Web Sites? Here goes: OpenLink Software's Home Page, Oracle's Home Page, Apple's Home Page, Microsoft's Home Page, IBM's Home Page. And before I forget, here is My Data Web Start Page. Due to the use of Ajax in the Data Web Start Pages, IE6 and Safari will not work. For Mac OS X users, WebKit works fine. Ditto re. IE7 on Windows. [...]



Our Basic Human Instincts

Sat, 24 Feb 2007 00:55:49 GMT

I just overheard the following dialog between my six year old son and his play date:

Play Date: What is that thing on the wall?
My Son: Security Alarm.
Play Date: How does it work?
My Son: If you click on that top button and then open the door, I will have to enter a code when we come back in or the alarm will go off.
Play Date: What is the code?
My Son: I can't tell you that!
Play Date: Why not?
My Son: You might come and steal something from our house!
Play Date: No I won't!
My Son: Well, you might tell someone that might come and steal something from our house! Or that person could tell someone who could tell someone that would steal from our house.

LOL!! Of course! At the same time I was wondering: how come a majority of adults don't quite see the need for granular access to Web Data in a manner that enables computers and humans to collectively arrive at similar decisions? Putting Data in context en route to producing actionable knowledge is a transient endeavor that engages a myriad of human senses. We demonstrate comprehension of this fact in our daily existence as social creatures (at a very early age, as depicted above). That said, we seem to forget this fact when engaging the Web: if we can't see it then it can't be valuable. BTW - I just received a ping about the "Sensory Web" (which is just another way of describing a Data Driven Web experience from my vantage point). In the popular M-V-C pattern you don't see the "M", but the "M" will kill you if you get it wrong (it is the FORCE)! Come to think of it, the pattern could have been coined V-C-M or C-M-V, but isn't, for obvious reasons :-)

RDF is the vehicle that enables us to tap into the Data aspect of the Web. We started off with pages of blurb linked via hypertext (Web 1.0) and then looked to "Keywords" for some kind of data access; we then isolated some "Verbs" and discovered another dimension of Web Interaction (Web 2.0) but looked to these "Verbs" for data access, which left us with Mashups; and now we are starting to extract "Nouns" and "Adjectives" from sentences (Subject, Predicate, Object - Triples) associated with resources on the Web (Data Web / Web 3.0 / Semantic Web Layer 1), which provides a natural data access substrate for Meshups (natural joining of disparate data from a plethora of data sources) while providing the foundation layer for the Semantic Web. For those who need use-cases that demonstrate tangible value re. the Semantic Web, here are some projects to note, courtesy of the Semantic Web Education and Outreach (SWEO) interest group: FOAF based White-lists - attacking SPAM (see the query sketch after this post excerpt); Open Data Access and Linking for the Data Web - a Data Integration and Generation effort that creates a cluster of RDF instance data from a myriad of data sources relating to every day things such as: People, Places, Events, Projects, Discussions, Music, Books, and other things; Content Labeling - protecting our kids on the Web amongst other matters relating to knowledge about data sources; Others..

Related posts: Data Web and Global Data Integration & Generation Effort; Previous Data W[...]
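Here is what the FOAF based white-list use case might look like as a single query; the graph IRI and mailbox hash are placeholders, the idea being that a mail agent accepts a sender if someone in your trusted graph claims to know them:

    # Does anyone I know claim this mailbox (identified by its sha1 hash)?
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    ASK
    FROM <http://example.com/dataspace/kidehen>          # placeholder trust graph
    WHERE
    {
      <http://example.com/dataspace/kidehen#this> foaf:knows ?contact .
      ?contact foaf:mbox_sha1sum "0000000000000000000000000000000000000000" .   # placeholder hash
    }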



OAT: OpenAjax Alliance Compliant Toolkit (Live Links Version)

Fri, 02 Feb 2007 15:29:55 GMT

OAT: OpenAjax Alliance Compliant Toolkit: "Ondrej Zara and his team at OpenLink Software have created an OpenLink Software JS Toolkit, known as OAT. It is a full-blown JS framework, suitable for developing rich applications with a special focus on data access. OAT works standalone, offers a vast number of widgets, and has some rarely seen features, such as on-demand library loading (which reduces the total amount of downloaded JS code). OAT is one of the first JS toolkits which show full OpenAjax Alliance conformance: see the appropriate wiki page and conformance test page. There is a lot to see with this toolkit: You can see some of the widgets in a Kitchen Sink application. Sample data access applications: SQL Query By Example, Forms Designer, DB Designer. OAT is Open Source and GPL’ed over at SourceForge and the team has recently managed to incorporate our OAT data access layer as a module to the dojo datastore." (Via Ajaxian Blog.)

This is a corrected version of the initial post. Unfortunately, the initial post was inadvertently littered with invalid links :-( Also, since the original post we have released OAT 1.2, which includes integration of our iSPARQL QBE into the OAT Forms Designer application. Re. Data Access, it is important to note that OAT's Ajax Database Connectivity layer supports data binding to the following data source types: RDF - via SPARQL (Query Language, Protocol, and Resultset Serialization formats: RDF/XML, RDF/N3, RDF/Turtle, XML, and JSON); SQL - via XMLA (a somewhat forgotten SOAP protocol for SQL Data Access that can sit atop ODBC, ADO.NET, OLE-DB, and even JDBC); XML - via SOAP or REST style Web Services. In all cases, OAT also provides Data Aware controls for the above that include: Tabular Grids, Pivot Tables, TimeLines, Extended Anchor Tags, Map Service Controls (Google, Yahoo!, OpenLayers, Microsoft Virtual Earth), and an SVG based RDF Graph Control (Opera 9.x provides the best viewing experience at the current time). OAT also includes a number of prototype applications that are completely developed using OAT Controls and Libraries: Visual SPARQL Query Builder, Visual SQL Query Builder, Web Forms Designer (includes Drag-Drop usage of Data Aware Controls etc.), Visual DB Designer. Note: Pick "Local DSN" from the page initialization dialog's drop-down list control when prompted[...]



Virtuoso's SQL Schema to RDF Ontology Mapping Language (1.0)

Fri, 17 Nov 2006 23:24:25 GMT

A new technical white paper about our declarative language for SQL Schema to RDF Ontology Mapping has just been published.

What is this?

A declarative language adapted from SPARQL's graph pattern language (N3/Turtle) for mapping SQL Data to RDF Ontologies. We currently refer to this as a Graph Pattern based RDF VIEW Definition Language.

Why is it important?

It provides an effective mechanism for exposing existing SQL Data as virtual RDF Data Sets (Graphs) negating the data duplication associated with generating physical RDF Graphs from SQL Data en route to persistence in a dedicated Triple Store.

Enterprise applications (traditional and web based) and most Web Applications (Web 1.0 and Web 2.0) sit atop relational databases, implying that SQL/RDF model and data integration is an essential element of the burgeoning "Data Web" (Semantic Web - Layer 1) comprehension and adoption process.

In a nutshell, this is a quick route for non disruptive exposure of existing SQL Data to SPARQL supporting RDF Tools and Development Environments.

How does it work?

RDF Side

  1. Locate one or more Ontologies (e.g., FOAF, SIOC, AtomOWL, SKOS etc.) that effectively define the Concepts (Classes) and Terms (Predicates) to be exposed via your RDF Graph
  2. Using Virtuoso's RDF View Definition Language, declare an International Resource Identifier (or URI) for your Graph. Example:
    CREATE GRAPH IRI("http://myopenlink.net/dataspace")
  3. Then create Classes (Concepts), Class Properties/Predicates (Memb), and Class Instances (Inst) for the new Graph. Example:
    CREATE IRI CLASS odsWeblog:feed_iri  "http://myopenlink.net/dataspace/kidehen/weblog/MyFeeds" (
      in memb varchar not null, in inst varchar not null)

SQL Side

  1. If Virtuoso isn't your SQL Data Store, identify the ODBC or JDBC SQL data source(s) containing the SQL data to be mapped to RDF and then link the relevant tables into Virtuoso's Virtual DBMS Layer
  2. Then use the RDF View Definition Language's graph pattern feature to generate a SQL to RDF Mapping Template for your Graph, as shown in this ODS Weblog -> AtomOWL Mapping example.
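Once the mapping templates are in place, the resulting virtual graph is queried like any other named graph; no physical triples need to be generated. A hedged sketch against the graph IRI declared above (the aowl: prefix is a placeholder standing in for the actual AtomOWL namespace, and the property names are illustrative rather than the actual ODS Weblog mapping):

    # Query the virtual RDF view produced by the SQL-to-RDF mapping.
    PREFIX aowl: <http://example.com/atom-owl#>          # placeholder for the AtomOWL namespace

    SELECT ?feed ?entryTitle
    FROM <http://myopenlink.net/dataspace>
    WHERE
    {
      ?entry aowl:source ?feed ;
             aowl:title  ?entryTitle .
    }
    LIMIT 10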