The Wallace Line

Updated: 2018-03-06T13:20:55.974+00:00


Gone to Posterous


I should have written this a long time ago, but I had so much trouble putting code examples on this Blogger blog that I switched over to Posterous.

Later: well, Posterous has now gone too, so I've set up my own blog using XQuery and eXist-db.

Even later: I've moved to using Tumblr.

Weather Data on the Web


In preparation for our cruise up to Scotland this summer, I'm setting up some SMS services so I can get weather reports on board, provided we're in mobile phone range. This is based on the two-way service rented from Clickatell. I recently rewrote a PHP/MySQL router which routes mobile-originated (MO) messages to an application based on the first word of the message and returns the reply, if any, to the originator. It is much simpler in XQuery, because the routing table is now just a simple XML configuration file and the XQuery code is much cleaner.

So far I've written services to get the UK shipping forecast, the UK inshore waters forecast and the latest weather conditions at weather buoys. Each has presented different challenges, both technical and legal, in acquiring the raw data. In the domain of weather information at least, we seem a very long way from an integrated, easy-to-use web of data.

First, the inshore waters forecast. The only publicly available format is a web page. The Met Office does provide a few RSS feeds, but none for shipping forecasts. The page looks technically promising for analysis, even if I'm unsure of the legal status of scraping it. I'd like to know how the Met Office is currently funded, but failed to discover this from a quick Google search. I'd like to know the extent to which this is 'Our Data', and despite the Met Office legal notices and Freedom of Information pages, I'm none the wiser really. I console myself with the fact that I'm only playing, with no intention of producing a commercial service in competition with the Met Office's own services.

The inshore waters page looks promising, with sections for each area split into meaningful headings. However, on closer inspection the page suffers from that increasingly common bane of the scraper: a complex mixture of data and JavaScript. The page appearance is the result of JavaScript processing of bland text.
Here is the raw forecast for my bit of the coast:

    Lands End to St Davids Head including the Bristol Channel
    24 hour forecast: Variable 3 or 4. Slight becoming moderate later in southwest. Fair. Moderate or good.
    Outlook: Variable 3 or 4, becoming west or northwest 4 or 5, occasionally 6 later. Slight or moderate. Thundery rain or showers. Moderate or good.

Well now. Firstly, this is not all the data in the displayed section; the time span and the strong winds warning (if any) are elsewhere in the HTML. The nice sections are not there: instead, the four parts of the forecast are separated by full stops, so the last sentence, 'Moderate or good', is the visibility. Secondly, the limits of the areas are identified by place identifiers in the maplet, but these do not appear in the text, so only the full area name can be used for identification. Of course, the ardent scraper can cope with this. I've been forced to add my own area ids, however, to support the SMS interface, mapping each id to a full area name such as 'Lands End to St Davids Head'. But it's horrible, unstable, and makes me wonder if this design is a form of obfuscation. I suppose if they wanted to, they could switch randomly between different HTML/JavaScript layers generating the same appearance, and then scrapers would be really stuffed; thankfully that seems not to be the case.

Next stop, the shipping forecast. In this case the forecast text is not on the page at all, but in a generated JavaScript file which defines JavaScript arrays and their values. In a way that's simpler, because I just have to fetch the JavaScript source and parse it. This application and its design is described in detail in the XQuery Wikibook.

Over in the States, their freedom of information creates a very different data climate, and NOAA provides a wonderful array of RSS and XML feeds. However, reusing even this data is not without its problems. One set of feeds I want to tap into are the data from weather buoys around the world.
Many are operated by NOAA and others by local Met services or commercial operations. The UK coverage shows the locations and identifiers for UK station and there is an RSS feed of the current conditions at a buoy. The nearest up-weather buoy[...]
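Splitting a raw forecast into its four parts is simple enough in XQuery; here is a minimal sketch (the function and element names are mine, not those of the actual SMS service):

```xquery
declare function local:parse-forecast($text as xs:string) as element(forecast) {
  (: the four parts - wind, sea state, weather, visibility - are separated by full stops :)
  let $parts := tokenize($text, "\.\s*")[. ne '']
  return
    <forecast>
      <wind>{$parts[1]}</wind>
      <seaState>{$parts[2]}</seaState>
      <weather>{$parts[3]}</weather>
      <visibility>{$parts[4]}</visibility>
    </forecast>
};

local:parse-forecast(
  "Variable 3 or 4. Slight becoming moderate later in southwest. Fair. Moderate or good."
)
```

Of course this breaks down as soon as a part itself contains a full stop, which is one more reason the format is so fragile to scrape.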

Twitter Radio


Thought I'd try to get my XQuery Twitter Radio application going to listen to the tweets from the Mark Logic conference. It's only a simple script: it requires Opera with Voice enabled and uses a meta http-equiv="refresh" tag to refresh the page. It only works if the window is active, which rather limits my use of the computer - I just need another machine to run the radio, I guess. If I wasn't marking, I'd write an AJAX-based version. XHTML+Voice is quite tricky to get right, however.

Twitter Radio on #mluc09

I rather like the idea of following the Mark Logic conference with an eXist-based mashup. Perhaps we should organise an eXist conference in Bristol - with my part-time status next academic year, perhaps I should put some effort into an event there.

Matching sequences in XQuery


Collation is a core algorithm in processing sequences. In XQuery, the straightforward expression of the algorithm is as a recursive function:

declare function local:merge($a as item()*, $b as item()*)
as item()* {
  if (empty($a) and empty($b))
  then ()
  else if (empty($b) or (exists($a) and $a[1] lt $b[1]))
  then ($a[1], local:merge(subsequence($a, 2), $b))
  else if (empty($a) or $a[1] gt $b[1])
  then ($b[1], local:merge($a, subsequence($b, 2)))
  else (: matched :)
    ($a[1], $b[1], local:merge(subsequence($a, 2), subsequence($b, 2)))
};

Coincidentally, Dan McCreary was writing an article in the XQuery Wikibook on matching sequences using iteration over one sequence and indexing into the second. The task is to locate missing items. Collation is one approach to this task, albeit one which requires that the sequences are in order.

Here is a test suite comparing three methods of sequence comparison.

I also did some volume tests with two sequences differing by a single, central value. Here are the tests on a sequence of 500 items. In summary, the timings are:

* Iteration with lookup: 6984 ms (not repeatable - the average is about 2600 ms)
* Iteration with a quantified expression: 1399 ms
* Recursive collate: 166 ms

The collate result is surprising and rather impressive. Well done eXist!
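For reference, the first two methods for finding the missing items are roughly of this shape (variable names are mine; the real tests are in the linked suite):

```xquery
let $a := (1 to 500)
let $b := (1 to 500)[. ne 250]   (: differs by a single central value :)

(: 1. iteration with lookup: a general comparison against the whole sequence :)
let $missing1 := for $i in $a where not($i = $b) return $i

(: 2. iteration with a quantified expression :)
let $missing2 := for $i in $a where not(some $j in $b satisfies $j eq $i) return $i

return ($missing1, $missing2)
```

The third method walks both ordered sequences once with a recursive collate, as in the local:merge function above, so the unmatched items fall out of a single pass.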

More XQuery performance tests


I noticed this morning that Dan had added an alternative implementation to an article in the XQuery Wikibook on matching words against a list. It got me wondering which implementation was preferable. I wrote a few tests and was surprised at the result: my initial implementation, based on element comparisons, was five times slower than comparing with a sequence of atoms, and Dan's suggestion of using a quantified expression was worse still.

Here is the test run and the Wikibook article.
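The three implementations are roughly of this shape (names are illustrative, not the Wikibook's):

```xquery
let $words := ("alpha", "beta", "gamma")
let $list := <list><word>beta</word><word>delta</word></list>

(: element comparison - five times slower in my tests :)
let $hits1 := $words[. = $list/word]

(: comparison with a sequence of atoms - the fastest :)
let $atoms := $list/word/string()
let $hits2 := $words[. = $atoms]

(: quantified expression - worse still :)
let $hits3 := $words[some $w in $list/word satisfies string($w) eq .]

return ($hits1, $hits2, $hits3)
```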

XQuery Unit Tests


I had a fright last week - Wolfgang asked for a copy of the test harness I'd used to evaluate different implementations of a lookup table. This is code I wrote some time ago and tinkered with: good enough for our internal use but ... well, pretty bad code. I have to confess here that, as a lone XQuery programmer, my code doesn't get the level of critique it needs. The Wikibook has been disappointing in that regard: I've published thousands of lines of code there and there has not been a single criticism or improvement posted. Typos in the descriptions are occasionally corrected by helpful souls, and graffiti erased by others, but as a forum for honing coding skills - forget it. In my task as project leader on our FOLD project (now coming to an end), I see and review lots of my students' code, as well as the code Dan McCreary contributes to the Wikibook, so I do quite a bit of reviewing. However, I am only too conscious of the lacunae in my own XQuery knowledge which, perhaps through over-kindness or because everyone is so busy, remain for too long. I'm envious of my agile friends who have been pair-programming for years. Perhaps there should be a site to match up lonely programmers for occasional pairing.

Anyway, the test suite got a bit of work one day last week and it's looking a bit better. Here is a sample test script. As a test script to test the test runner, it has the unusual property that some failed tests are good, since failing is what's being tested. Here it is running. Here is another, used to test the lookup implementations, and one to test the geodesy functions.

Version 1 of the test runner executed tests and generated a report in parallel. A set of tests may have a common set of modules to import, plus prefix and suffix code. For each test, modules are dynamically loaded, the code concatenated and then evaluated inside a catch:
let $extendedCode := concat($test/../prolog, $test/code, $test/../epilog)
let $output := util:catch("*", util:eval($extendedCode), "Compile error")

The output is compared with a number of expected values. Comparison may be string-based, element-based, or based on a substring being present or absent. (I also need to add numerical comparison with a defined tolerance.) A test must meet all expectations to pass.

To get a summary of the results requires either running the sequence of tests recursively, or constructing the test results as a constructed element and then analysing the results. Recursion would be suitable for a simple sum of passes and fails, but it closely binds the analysis to the testing. An intermediate document decouples testing from reporting, thus providing greater flexibility in the analysis but requiring temporary documents. So version 2 constructed a sequence of test results, and then merged these results with the original test set to generate the report. Collating two sequences is a common idiom which in functional languages must either recurse over both, iterate over one sequence whilst indexing into the other, or iterate over an extracted common key and index into both. The reporting is currently done in XQuery, but it should be possible to use XSLT. Either the collating would need to be done before the XSLT step, or XSLT would have the collating task. Not a happy situation.

So last week in comes version 3. Now the step which executes the tests augments each test with new attributes (pass, timing) and elements (output), and similarly augments each expectation with the results of its evaluation, so that one single, enhanced document is produced with the same schema as the original. (The augmented data has to be optional anyway, since some tests may be tagged to be ignored.) Transformation of the complete document to HTML is then straightforward, either inline or in a pipeline with XQuery or XSLT.
The same transformation can be run on the un-executed test set. Augmenting the test set is slightly harder in XQuery than it would be in XSLT. For example, after executing each [...]
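The version 3 augmentation step can be sketched like this (names are simplified from the real harness, and the timing via eXist's util:system-time() is my assumption about the implementation):

```xquery
declare function local:run-test($test as element(test)) as element(test) {
  let $code := concat($test/../prolog, $test/code, $test/../epilog)
  let $start := util:system-time()
  let $output := util:catch("*", util:eval($code), <error>Compile error</error>)
  let $elapsed := util:system-time() - $start
  (: string comparison only - the real runner supports several comparison types :)
  let $pass := every $e in $test/expected satisfies string($output) = string($e)
  return
    element test {
      $test/@*,
      attribute pass {$pass},
      attribute timing {$elapsed},
      $test/node(),
      <output>{$output}</output>
    }
};
```

Because the result has the same schema as the input, with optional extras, the same report transformation serves both.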

Implementing a table look-up in XQuery


Handling temporary XML fragments in the eXist XML db has improved markedly in version 1.3. I have been looking again at an example of processing MusicXML documents which I first wrote up in the XQuery Wikibook. The code requires a translation from the note name (A, B, ...) to the MIDI note value for each note. The pitch of a note is defined by a structure like:

<pitch>
  <step>C</step>
  <alter>1</alter>
  <octave>3</octave>
</pitch>

One approach is to use an if-then-else construct:

declare function local:MidiNote($thispitch as element(pitch)) as xs:integer {
  let $step := $thispitch/step
  let $alter := if (empty($thispitch/alter)) then 0 else xs:integer($thispitch/alter)
  let $octave := xs:integer($thispitch/octave)
  let $pitchstep :=
    if ($step = "C") then 0
    else if ($step = "D") then 2
    else if ($step = "E") then 4
    else if ($step = "F") then 5
    else if ($step = "G") then 7
    else if ($step = "A") then 9
    else if ($step = "B") then 11
    else 0
  return 12 * ($octave + 1) + $pitchstep + $alter
};

but this cries out for a table lookup, either as a sequence of elements:

declare variable $noteStep := (
  <note name="C" step="0"/>,
  <note name="D" step="2"/>,
  <note name="E" step="4"/>,
  <note name="F" step="5"/>,
  <note name="G" step="7"/>,
  <note name="A" step="9"/>,
  <note name="B" step="11"/>
);

declare function local:MidiNote($thispitch as element(pitch)) as xs:integer {
  let $alter := xs:integer(($thispitch/alter, 0)[1])
  let $octave := xs:integer($thispitch/octave)
  let $pitchstep := xs:integer($noteStep[@name = $thispitch/step]/@step)
  return 12 * ($octave + 1) + $pitchstep + $alter
};

or as a single XML element:

declare variable $noteStep :=
  <steps>
    <note name="C" step="0"/>
    <note name="D" step="2"/>
    <note name="E" step="4"/>
    <note name="F" step="5"/>
    <note name="G" step="7"/>
    <note name="A" step="9"/>
    <note name="B" step="11"/>
  </steps>;

declare function local:MidiNote($thispitch as element(pitch)) as xs:integer {
  let $alter := xs:integer(($thispitch/alter, 0)[1])
  let $octave := xs:integer($thispitch/octave)
  let $pitchstep := xs:integer($noteStep/note[@name = $thispitch/step]/@step)
  return 12 * ($octave + 1) + $pitchstep + $alter
};

We could also store the table in the database, since it is constant. eXist does some optimisation of XPath expressions, but it does not factor out the invariant sub-expression $thispitch/step in the XPath predicate. I wrote a test suite to time these various implementations.
Typically this shows that factoring out the sub-expression reduces the execution time by 25%. However, even with this optimisation, the structure lookup is disappointingly slow: it is about 50% slower than the if/then expression when the table is stored on disk, and 100% slower when in memory. This aspect of XQuery performance is important if XQuery is to be used for general application development, since data structures such as indexed and associative arrays have to be represented as sequences of atomic values or elements. This performance is not really surprising, and there may be more performance to be gained by indexing the database element. [...]
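For completeness, the factored version simply hoists the invariant sub-expression out of the predicate (a sketch; the table here is truncated to two notes):

```xquery
declare variable $noteStep := (
  <note name="C" step="0"/>,
  <note name="D" step="2"/>
);

declare function local:MidiNoteFactored($thispitch as element(pitch)) as xs:integer {
  let $alter := xs:integer(($thispitch/alter, 0)[1])
  let $octave := xs:integer($thispitch/octave)
  (: evaluate $thispitch/step once, outside the predicate :)
  let $step := string($thispitch/step)
  let $pitchstep := xs:integer($noteStep[@name = $step]/@step)
  return 12 * ($octave + 1) + $pitchstep + $alter
};

local:MidiNoteFactored(<pitch><step>D</step><octave>3</octave></pitch>)
(: 12 * (3 + 1) + 2 + 0 = 50 :)
```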

XQuery module for Geodesy


I wrote my first attempt at Mercator to latitude/longitude conversion functions about two years ago, while working on a case study for SPA2007. Part of this was documented in the XQuery Wikibook article on UK bus stops and Ordnance Survey coordinates. At the time I did not appreciate why my coordinates were just a bit off, but fudged the difference. Last month I used the same functions to map pedal cycle accident data, but as gardens and roofs appeared to be much more dangerous than roads, I thought I'd better try harder, and learnt about Helmert transformations.

My latest attempt is now available in the XQuery Examples Google Project, and the Wikibook article has been revised to use this module. The formulae come mainly from the OS Guide to coordinate systems. PHP code on Barry Hunter's site was also useful. The test suite for this module is still being added to. I have struggled with rounding, which is needed both to get positions with sensible resolution and for testing, and I'm not yet happy. Some tests need visual inspection, and there is a problem with heights.

The module depends on the eXist math module, a prime candidate for cross-implementation standardization by the EXQuery initiative. In the latest version (v1-3) of the module, the math:atan2() function has its parameters in the correct order (y, x), but older releases had these parameters reversed, as in v1-2.
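One way to insulate the geodesy code from that change is to call atan2 through a single local wrapper (a sketch, not part of the released module; the module URI is my assumption about the eXist 1.x release in use):

```xquery
import module namespace math = "http://exist-db.org/xquery/math";

(: v1-3 order is (y, x); for an older release, swap the arguments in this one place :)
declare function local:atan2($y as xs:double, $x as xs:double) as xs:double {
  math:atan2($y, $x)
};
```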

The design of the module uses elements with attributes, in the same namespace as the module, to define compound structures such as LatLongs, Ellipsoids and Projections. These are defined in an associated schema. Compile-time checking in eXist is limited to checking the element name, since eXist is not schema-aware, although full schema-awareness would be of benefit in this module.

Suggestions for additions to this module are most welcome, as of course is any review of the code.

Dashboards and Widgets in XQuery


Jim Fuller's recent article on dashboards in XQuery makes a very good case for using XQuery for mashups generally. Jim's dashboard application reminded me of work I had been doing with my students last term on a configurable web page containing widgets to display NOAA weather data, RSS feeds, Google Maps and their own choice of data source. For this we used PHP with SimpleXML, but to demonstrate the power of XQuery, I needed to show them the same application built on XQuery. It also seemed to me that the business dashboard would benefit from a design which splits widget code from company-specific data.

The basic architecture of the approach here is to create a set of generalised widgets as XQuery functions and to define the specifics of each widget in a configuration file. Here are a couple of configurations: jim.xml, which is based on Jim's dashboard example, and dsa.xml, which is based on our course weather display. The second example has a page refresh rate set, but I'm working on making the refresh rate widget- rather than page-specific using AJAX. In the demo interface code, each widget links to its XQuery source code and hence to the code of called functions. Widget executions are timed, and there is a link to the configuration file itself. Here is a basic main script:

import module namespace widgets = "" at "widgets.xqm";
declare option exist:serialize "method=xhtml media-type=text/html omit-xml-declaration=no indent=yes
  doctype-public=-//W3C//DTD XHTML 1.0 Transitional//EN doctype-system=";

let $configuri := request:get-parameter("config", ())
let $config := doc($configuri)/config
return
  (: the literal result elements were lost from this page - this wrapper is illustrative :)
  <html>
    <head><title>{$config/title/text()}</title></head>
    <body>
      <h1>{$config/title/text()}</h1>
      {
      for $section in $config/section
      return widgets:render($section)
      }
    </body>
  </html>

Run it.

There are a couple of ideas used in the code. Each widget is defined by its own section in the configuration file. As a simple example, the widget to get the date and time is configured with a format string such as EE dd/MM HH:mm.

The main script processes the configuration file, and each widget is rendered by calling a function of the same name. The code for this dispatching currently uses the eXist function util:eval(), within a util:catch() call, to implement late binding:

declare function widgets:render($section as element(section)) as element(div) {
  (: dispatch section to the matching function :)
  let $component := $section/*[1]
  let $widget := local-name($component)
  let $function := concat("widgets:", $widget, "($component)")
  return
    <div>
      {util:catch("*", util:eval($function), <p>Missing or bad widget.</p>)}
    </div>
};

A safer alternative would be to use typeswitch:

declare function widgets:render2($section as element(section)) as element(div) {
  let $component := $section/*[1]
  return
    typeswitch ($component)
      case element(datetime) return widgets:datetime($component)
      case element(SQL) return widgets:SQL($component)
      case element(monitor) return widgets:monitor($component)
      ....
      default return
        <div>Missing widget {local-name($component)}</div>
};

but this needs updating every time a new widget is added to the module. To help with processing table data from different sources such as SQL, Excel [...]
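A configuration file for such a dashboard might look like the following (element names inferred from the dispatch code above; the format string is the one used for the date-time widget):

```xml
<config>
  <title>Course dashboard</title>
  <section>
    <datetime>EE dd/MM HH:mm</datetime>
  </section>
  <section>
    <monitor/>
  </section>
</config>
```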

Parameterised MS Word Documents with XQuery


It's coming round to exam time again at UWE, Bristol, and as usual I've been struggling to get mine written. The XQuery-based FOLD application (which supports staff and students in our School) generates exam front pages containing exam details such as module code and title, examination date, length and time as HTML, which had to be copied (poorly) into MS Word. This wasn't very satisfactory, and it would be better to generate a Word document with the completed front page and sample pages with headers and footers. I'd put this off as it seemed too complicated. The Word XML format, wordml, is one route, but it looked daunting to generate from scratch. However, for this application I only need to make some small edits to a base document. The most obvious approach was to 'parameterise' the Word document with place-holders. Unique place-holders can be edited in with Word before the document is saved as XML. Fields which are not editable in MS Word, such as the author and timestamps, can be parameterised by editing the wordml directly. To instantiate a new Word document, the place-holders in the wordml are replaced with their values. Treating this as string replacement is easier than editing the XML directly, even if that were possible in XQuery. The XQuery script reads the wordml document, serializes the XML as a string, replaces the place-holders in the string with their values and then converts back to XML for output. Although this is not a typical task for XQuery and would be written in a similar way in other scripting languages, it is possible in XQuery with the help of a pair of functions which should be part of a common XQuery function library.
In eXist these are util:serialize(), to convert from XML to a string, and its inverse, util:parse(). The function needs to replace multiple strings, so we use an XML element to define the name/value pairs:

let $moduleCode := request:get-parameter("moduleCode", ())
..
let $replacement := ..

and a recursive function to do the replacements:

declare function local:replace($string, $replacements) {
  if (empty($replacements))
  then $string
  else
    let $replace := $replacements[1]
    let $rstring := replace($string, string($replace/@string), string($replace/@value))
    return local:replace($rstring, subsequence($replacements, 2))
};

After gathering the parameter values and formatting a replacement element, the new document is generated by:

let $template := doc("/db/FOLD/doc/examtemplate.xml")
let $stemplate := util:serialize($template, "method=xml")
let $mtemplate := local:replace($stemplate, $replaceStrings/*)
return util:parse($mtemplate)

Here the generated wordml is displayed in the browser, from where it can be saved and then loaded into Word. I found out that the directive at the front of the wordml is used by the Windows OS to associate the file with MS Word, so the media type is just the standard text/xml. However, it is helpful to define a suitable default file name using a function in eXist's HTTP response module, the pair to the request module used to access URL parameters:

let $dummy := response:set-header('Content-Disposition',
    concat('attachment;filename=', concat("Exam_", $moduleCode, ".xml")))
let $dummy := response:set-header('Content-Type', 'application/msword')

The document could also be saved directly to the database, or all documents generated eagerly, ready for use. This approach feels like a bit of a hack, but it took only an hour to develop and is a major improvement on the previous approach. Changes to the base document will need re-parameterisation, but that seems a small overhead for slowly changing standard documents. XQuery forces a recursi[...]

Review of IBM developerWorks article by Brian Carey


I've just come across an article published on IBM's developerWorks, "Use XQuery for the presentation layer" by Brian Carey. It illustrates the value of storing complex data in XML form and using XQuery to select it and transform it to HTML. Whilst the main message is well presented, the implementation, below layers of Java, is over-complicated in a couple of ways.

Brian makes the curious observation, under the heading Using XQuery prevents "cheating", that "You cannot write business logic in XQuery because XQuery focuses exclusively on querying and transformation". This rather ignores the fact that a significant number of applications are built solely on XQuery as the server-side language. The consequence is that a simple web application retains a complex and unnecessary Java middle tier acting as Controller in an MVC architecture.

Brian's web application provides a browsing interface to a collection of products, using AJAX to update the page and XQuery to select products and transform them to an HTML table. Implemented in XQuery on the open source eXist XML database, we need only use the HTTP interface functions provided to couple the HTTP requests directly to the XQuery script. For eXist the additions would be:

declare variable $docName as xs:string := "lures.xml";
declare variable $configuration as xs:string := request:get-parameter("configuration", ());
declare variable $usage as xs:string := request:get-parameter("usage", ());

It might be objected that this script binds the resources and interfaces too closely to the script. Indeed, the only benefit of the Java controller layer is this de-coupling.
We can achieve the same effect in XQuery with a local configuration file (/db/Wiki/BC/lures2.xml) to aid portability and testing, adding these lines to the XQuery script:

declare variable $config := /ConfigurationFile;
declare variable $docName := $config/lureFileName;

I've implemented Brian's code with eXist on the XQuery Wikibook server, and the relevant scripts are here, changing only the script address in the HTML code and correcting an error in the AJAX function where request.readyState was mistyped (took me a while to track that down!). Middle layer all gone. Storage of the whole application in eXist would be a normal deployment, but this was not possible without editing, because the HTML page is not valid XHTML.

One impediment to the use of XQuery as an application development language is that functions which extend the XPath function library with functionality such as HTTP interfacing are implementation-dependent, limiting portability. A new initiative, EXQuery, seeks to remedy this problem by developing a cross-platform library.

One other feature of Brian's implementation is the structure of the XML file. Since the data is intended to be searched by usage (e.g. casting) and configuration (e.g. minnow), the nesting leads to XQuery code like:

if ($configuration = 'minnow' and $usage = 'casting') then
  for $minnows in doc($docName)//casting/minnows
  return ...
else if ($configuration = 'minnow' and $usage = 'trolling') then
  ...

and because the child structures are actually all the same, this leads to unmaintainable and repetitive code. A fix is possible in XQuery using local-name() to filter on the node names themselves. A better approach would be to flatten the whole file, changing the representation of the configuration and usage concepts from elements to attributes.

Data Normalization


My latest teaching program is for model inference from un-normalised data. This had its inception in a PHP tool written some years ago. The new version uses my ER XML schema to integrate the output with the other data modelling tools.

Normalisation is usually taught in database courses via the abstract concepts of first, second, third and higher normal forms. In my introductory module I just want to get over the basic idea of reducing data duplication through the actual factorisation of a first normal-form table (atomic values) into a set of related third normal-form tables.

Here is the tool, written of course in XQuery:

I get the students to take a nominated data set, factorise it, generate the SQL table declaration and INSERT statements, load them into a MySQL database and then reconstruct the original table using a select statement joining all the tables. This allows the student to check that the factorisation is loss-less but of course it does not check that it is optimal. At present the tool allows the student to explore different possibilities and create any factorisation they like.

The state of the factorisation is the current dataset URI and the current factorisation, defined by an ER model. Currently these are passed between client and server in the URL. This limits the size of the model, and I guess I should change to POST, but the interface behaviour would not be as clean (the back button works as a simple Undo) and I can't have simple generated links on buttons. I guess I need help in interface programming here.

For the record, the code is 600 lines split into 27 functions and using two functions in the larger er module to transform the XML model to DDL and an ER diagram. Code by request until I find a suitable home.

Data Modelling Tutor


The SQL tutor is now in use and seems to be finding favour with students and other tutors. There is a long list of things to add, like the ability to discuss an exercise but the course moves on and now I want to apply the same ideas to the teaching of data modelling. Students often find this rather difficult.

For the past few years we have used an excellent CASE tool called QSEE, developed by Mark Dixon at Leeds Metropolitan University. We have mainly used this multi-diagram tool for ER diagrams. QSEE supports conceptual ER models and handles foreign keys, link tables and weak entities when generating the SQL DDL. I have a running battle with some other tutors over the use of conceptual ER diagrams versus relational ER diagrams, complete with foreign keys and link tables. In my multi-paradigm teaching, conceptual models, which treat the latter as artefacts of a relational data model, make more sense. Of course I'd like to see a few improvements, but sadly development of this tool seems to have ceased. A pity that it hasn't been open-sourced.

My teaching emphasises the difference between a model and its various representations as diagrams, text and code. Since we have already studied XML, it is natural to think of representing the conceptual data model as an XML document and writing the transformations in XSLT or XQuery. Having used Graphviz for a number of years, the XML can be transformed to the dot notation to create diagrams in different diagramming conventions. Moreover, the goal of providing an interactive data modelling tutor seems more easily realised by processing textual descriptions.

So this weekend, snowed in from work on the boat, I've been working on this tutor and data modelling tool. The really hard part has been to write the model comparator so that differences between a student model and the 'master' model can be detected and explained. This has to take account of variations in the names the student may use as well as differences in order of definition, so a straight XML diff isn't enough. What I have now is not quite right but it will have to be good enough if I want to get this tutor out to students this week.

So here is the link to the index of worksheets so far written:

The transformations currently supported include Chen diagrams, ER diagrams with or without attributes, SQL DDL, SQL DDL with foreign keys and many-many resolution, and a rather clunky English text representation.

One feature which was unplanned and just emerged as an obvious addition, was the ability to provide a default model template so the student could solve initial problems by filling in the blanks rather than starting with a blank page.

There is still a lot to do, but I'm pleased to have got the prototype off the ground - a long-held idea finally coming to fruition - made possible by the power of XQuery and the eXist XML database, for which I give thanks to Wolfgang and the guys every day.



Some years ago I wrote an SQL workbook which was used on a couple of courses, but although it had an interactive interface so that a student could test their SQL statements against the example database, the results were not checked. I planned to create an interactive site which would present tasks to the student, accept the SQL statement input, execute the statement and compare the result with that of the model answer. Since I was teaching PHP/MySQL at the time, in good dog-fooding tradition, I started on an implementation using this platform, but it got sticky and stalled. Then I discovered XQuery and, the other day, wrote an XQuery/eXist implementation. [I'm really supposed to be marking, but I find my most creative streak when marking's about - the root perhaps of the love/hate relationship I have with my job.]

The relative ease with which this version was created well illustrates the power of the XQuery/native XML database development approach. This application lies in the sweet spot for this technology, and here is why I think that is:

XML handles composition. Each worksheet is represented by an XML document. The document describes properties of the worksheet - database used, title, tags - and each of the steps in the worksheet. It is too weak to say that steps are part of the worksheet; they -are- the worksheet. A normalised relational implementation requires one table for the worksheet properties, another for the steps, and a foreign key to link these two types together. This artificial separation into master and child records complicates the model and its processing. A symptom of the need for a composition can be found in the naming problem - what do you call the master table? Worksheet perhaps? But that's not right - in domain terms a worksheet is the master record AND its children as a single entity. worksheetBody? - yeech.

XML handles order. The steps in a worksheet are ordered. To represent this in SQL requires an additional sequence number. Then editing to insert and delete steps requires re-numbering. In XML, order is implicit in the document order of the steps.

XML handles heterogeneity. Steps in a worksheet are of different types. Some are simple explanations, some are demonstrations of SQL constructs, many are exercises, and others have yet to be designed. A relational approach would either coerce all types into a common structure, with lots of null fields, or use multiple tables, one for each type, and a super-type table. [Object-relational databases support a more transparent implementation, but who uses those?] In XML, different types of step can be freely interleaved in the step sequence.

XML handles multi-valued attributes. As with most of my applications these days, I use tags to support searching and classifying resources. In a normalised relational database, I should break these out into a separate table with a foreign key, but would probably fudge the problem by putting all tags into an arbitrarily concatenated field. In XML, tags are simply repeated elements, with no temptation to step outside the data model.

XML supports documents. Worksheets are standalone entities, which are authored, edited, deployed and removed as units. In a relational approach, all worksheets would be stored in common database tables, and the domain concept of the worksheet as a unit is lost. This is a particular problem in development: in the SQL tutor, worksheets may be located in the XML database or accessed from anywhere on the web - it makes little difference to the script providing the interactive interface. So new worksheets can be tested with ease before installation. They can also be exchanged, [...]
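A minimal sketch of what such a worksheet document might look like - the element and attribute names here are invented for illustration, not the SQL tutor's actual schema:

```xml
<!-- Hypothetical worksheet document: names are illustrative only -->
<worksheet db="emp-dept" title="Simple SELECTs">
   <tag>sql</tag>
   <tag>select</tag>            <!-- multi-valued attribute: just repeat the element -->
   <note>A SELECT statement retrieves rows from a table.</note>
   <demo sql="SELECT * FROM emp"/>
   <exercise answer="SELECT ename FROM emp WHERE sal &gt; 2000">
      List the names of employees earning over 2000.
   </exercise>                  <!-- heterogeneous steps, ordered by document order -->
</worksheet>
```

The whole worksheet, properties and steps together, is one document: composition, order, heterogeneous steps and repeated tags all fall out of the XML data model directly.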

Listen to Twitter


Finding myself working but wanting to know how Lewis Hamilton was getting on, I wondered if Twitter would be able to let me know. I was looking for interesting feeds for the students, so I knocked up a bit of XQuery to fetch the Atom feed for a Twitter search and turn that into VoiceXML for use with Opera. It works pretty well, even if it is rather unsophisticated and uses page refresh rather than AJAX. The script uses the md5 hash of the last tweet spoken to determine which tweets are new. I plan to have this running on Tuesday in the lecture. One problem is that it only works if the Opera window is active, so I can't have it running in the background. The main problem, however, is that tweets don't indicate their language, so a lot of very poor, and probably disappointed, Portuguese is being tweeted now on the Hamilton stream.
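The md5 bookkeeping is easy to sketch. The post's script is XQuery; this Python version is just illustrative, and the function and variable names are invented:

```python
import hashlib

def new_tweets(tweets, last_hash):
    """Return the tweets that appeared after the one whose md5 we spoke last.

    tweets    -- newest-first list of tweet texts, as from the Atom feed
    last_hash -- md5 hex digest of the last tweet already spoken, or None
    """
    fresh = []
    for text in tweets:
        if hashlib.md5(text.encode("utf-8")).hexdigest() == last_hash:
            break                      # everything after this was already spoken
        fresh.append(text)
    return fresh                       # still newest-first

# On each page refresh, speak `fresh` and remember the hash of
# tweets[0] for the next poll.
```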

Wikipedia Categories for Posters


I'd planned to extend the Alphabet maker into a site that assisted Charlie to find appropriate names by inducing the category of terms and then either warning about names not in the category, or correcting spelling based on names in that category, or even suggesting a name for a missing letter.

First I thought I should understand the categories available in dbPedia and started with the Wikipedia categories using the skos vocabulary . I wrote a small skos-based browser:

This has two pages: a category page showing the category, the list of resources in that category with broader and narrower categories and a resource page showing the English abstract and the Wikipedia thumbnail if there is one.

From a category, you can link to a gallery of all thumbnails for resources in that category, and hence to a random Alphabet poster based on that category. There is a significant proportion of dead links among the thumbnails, however, and I need to look ahead to exclude them.

One feature of this application which I haven't seen elsewhere (I live a sheltered life!) is the use of key-bindings to perform common searches on selected text. Text selected in the abstract can, with one key-stroke, link to Wikipedia, Google, Google Maps or Google Images. I like the idea of giving more control to the user over what is linked, and I have implemented this on my prototype presentation software, which I'm trialling on a couple of courses to see if students find it useful.

Browsing around dbPedia using Wikipedia categories and foaf:depiction is not without its problems. For example the category Amphibians includes:
  • common names of amphibians - Cave Salamander
  • species of amphibians - Gerobactrus
  • groups, families and orders of Amphibians - Oreolalax
  • parts of amphibians - Vocal Sac
  • lists of amphibians - List of all Texas amphibians
  • lists of related subjects - Federal Inventory of Amphibian Spawning Areas
This puts me in mind of Borges' invention of a Chinese classification of animals. Aren't categories like "suckling pigs" and "those that from a long way off look like flies" just delicious? Perhaps a subject's other categories might help, but there is no "List" category, for example, so there is no way to disambiguate the various usages of a category.

foaf:depiction has a similar problem. The Modern Painters category shows an equal mixture of depictions of the painter and depictions of works by the painter, with a few depictions of where the artist lived. This is particularly confusing when the image is a portrait! However, these categories are much cleaner than others, if somewhat incomplete.

It has often been observed that tools based on dbPedia should help to improve Wikipedia. For example, it is clear that Painters by Nationality should not contain any Painter resources directly, so it would be nice to edit the categories of the two errant painters from an interface like this.

Alphabet Poster


Grandson Charlie (age nearly 6) rang the other night to tell me the animals he had found for the animal alphabet we had discussed the previous night. I thought it would be a neat present to write a program that creates a poster by fetching an image from the web for each of his words and laying them out. I like the idea of writing programs as gifts, but Charlie would prefer something real - like a climbing wall!

I thought of using Flickr, or Google images, then settled on using Wikipedia, searched via dbpedia.

There are generally two images included in the dbpedia data: foaf:img, a full-size JPEG image, and foaf:depiction, a GIF thumbnail. The thumbnails are fine for this job.

The SPARQL query to get the thumbnail for an image is rather simple:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX : <http://dbpedia.org/resource/>
SELECT ?img WHERE { :Hedgehog foaf:depiction ?img }

The XQuery script parses the list of words and for each word, uses this query to get the uri of the wikipedia image. The trickiest part was laying out the poster. I struggled to do the gallery layout in CSS alone but could not get this to work with an image + caption. In the end I reverted to a table layout with a width parameter.

The functional XQuery requires the layout to be done in two stages: first generate the table cells in the right, sorted order; then compute the number of rows required for the given number of columns and generate the table, indexing into the cell sequence to lay out the cells in order. In an imperative language, or a language which did not require that every constructed element be well-formed, the two tasks could be merged. The approach necessitated by the functional language feels cleaner, but I'd prefer to write this as a pipeline - sort the words > generate the image cells > lay out the table - without the need to resolve the structure clash (a Jackson Structured Programming term) between the image order and the table order via random access into a sequence. Co-routines, as in Python, would make for a better solution, I feel. XML pipelines might help, but they feel too heavyweight for this minor task.
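The two-stage, cells-then-rows indexing is easy to see in miniature. The post's implementation is XQuery; this Python sketch just illustrates the idea, with invented names:

```python
import math

def layout(words, columns):
    """Two-stage table layout: sort the cells, then slice them into rows."""
    cells = sorted(words)                          # stage 1: cells in order
    rows = math.ceil(len(cells) / columns)         # rows needed for this width
    return [cells[r * columns:(r + 1) * columns]   # stage 2: index into the cells
            for r in range(rows)]
```

For example, `layout(["zebra", "ant", "cat", "dog", "emu"], 2)` gives three rows: ant/cat, dog/emu, and zebra alone on the last row.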

Charlie's Animals so far.

The XQuery Script is in the Wikibook

RDF Vocab work


I'm off to Oxford to learn about RDF Vocabularies at the Oxford Vocamp.

My own meanderings in this field have been limited to a rather hacked Vocabulary Browser written in XQuery:

and my rather limited attempts to provide an RDF extract from the FOLD Information System.

with a current dump of the RDF

SPARQLing Country Calling Codes


Stimulated by Henry Story's blog entry, I wrote the equivalent in XQuery, and in doing so, bumped into some issues with the dbpedia data. In particular, there is no category I could find to identify a country; but then what constitutes a country depends on the purpose for which the geographical entity is being classified, so this is to be expected.

In the end I resorted to scraping the wikipedia page which lists the codes directly.

Wikibook module

XQuery SMS service


I've recently resurrected our two-way SMS service for use by my students in their current coursework, a site to gather and report results for their chosen team sport. I require an exotic interface to the data, for example a speech interface with Opera or an SMS interface. In my SMS installation, the first word of an incoming message is used to determine the service to which the message is routed via HTTP, and the reply, if any, is then sent via our outbound service to the originating phone. The framework was originally implemented in PHP, but individual services can be in any language, and a number of mainly demonstration services have been implemented. XQuery is used to implement a decoder for UK vehicle licence numbers - also a nice example of the use of regular expressions. By comparison with the original PHP script, the XQuery version is both cleaner and more general. However, there is no regexp function in XQuery which returns the matched groups in an expression, so this has to be bodged with a wrapper around the XSLT2 analyze-string function.
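A Python sketch of the first-word routing and of the matched-groups convenience the post finds missing in XQuery. The routing table, service URLs and the registration-mark pattern are all invented for illustration, not the framework's real configuration:

```python
import re

# Hypothetical routing table: first word of the SMS -> service URL
ROUTES = {
    "ship":    "http://example.org/shipping-forecast",
    "inshore": "http://example.org/inshore-waters",
    "reg":     "http://example.org/vehicle-decoder",
}

def route(message):
    """Pick the service for an incoming SMS from its first word."""
    first, _, rest = message.strip().partition(" ")
    return ROUTES.get(first.lower()), rest

def split_reg(mark):
    """Split a current-style UK registration mark into its groups.

    Pattern is illustrative (two letters, two digits, three letters),
    not a full validator.  re.match exposes the matched groups directly,
    which is the facility the XQuery version has to bodge.
    """
    m = re.match(r"([A-Z]{2})(\d{2})\s?([A-Z]{3})$", mark.upper())
    return m.groups() if m else None
```

So `route("SHIP dover wight")` dispatches to the (hypothetical) shipping-forecast service with `"dover wight"` as the payload, and `split_reg("AB12 CDE")` yields the three groups in one call.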

RDF /Sparql with XQuery


As part of my learning about RDF, Sparql and the semantic web, I thought I would take the familiar employee/department/salary grade example which I used in the XQuery/SQL comparison as a case study. To this end I wrote two XQuery scripts:
  • XML to RDF - a script using a generic function, guided by a map, to translate flat XML tables to RDF and RDFS
  • Sparql query interface - an XQuery interface to a Joseki Sparql service to allow the user to execute Sparql queries against the emp-dept RDF
This is documented in an article in the wikibook.
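The map-guided translation idea can be sketched in Python; the wikibook scripts themselves are XQuery, and the vocabulary URIs and element names below are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical column-to-property map for the emp table
EMP_MAP = {"ename": "http://example.org/vocab#name",
           "sal":   "http://example.org/vocab#salary"}

def rows_to_ntriples(xml_text, row_tag, key, prop_map, base):
    """Translate a flat XML table into N-Triples, guided by a column map."""
    triples = []
    for row in ET.fromstring(xml_text).iter(row_tag):
        subject = f"<{base}{row.findtext(key)}>"      # key column mints the URI
        for col, prop in prop_map.items():
            value = row.findtext(col)
            if value is not None:
                triples.append(f'{subject} <{prop}> "{value}" .')
    return triples

emp_xml = ("<table><emp><empno>7839</empno>"
           "<ename>KING</ename><sal>5000</sal></emp></table>")
triples = rows_to_ntriples(emp_xml, "emp", "empno",
                           EMP_MAP, "http://example.org/emp/")
```

The same generic function serves any flat table; only the map changes per table, which is the appeal of the approach.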

AJAX, AHAH and XQuery


Today [well, some days ago now - this item got stuck in draft], I came across the abbreviation AHAH, referring to the style of using AJAX to request XHTML fragments to be inserted into an HTML page. The example of XQuery and AJAX to search employee data in the wikibook used this pattern - like the gentleman in Molière's play, I had been speaking AHAH all these years without realising it.

I also happened on an item in Mark McLaren's blog in which he describes the use of this pattern to provide an incremental search of the chemical elements. He advocates using a JavaScript library, but I'm not sure one is warranted for a simple task like this (tempting fate here, I fear). For teaching purposes, minimal code is best, I feel. So I implemented a version using XQuery and minimal JavaScript.
XQuery and AHAH make a pretty good pair I think.

GoogleChart API and sparklines


As a long-time fan of Edward Tufte's work, I've often wanted to make use of his sparkline idea, but haven't come across a suitable tool to make them. Now the GoogleChart API can generate these and a plethora of other chart types via a web service.

Here is an XQuery script to demo the interface, using the character-based simple encoding of the data:
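The simple encoding maps values, scaled to 0-61, onto the characters A-Z, a-z, 0-9. A Python sketch of the idea (the demo script itself is XQuery; the endpoint and parameters belong to the old, now-retired Google Chart API):

```python
SIMPLE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

def simple_encode(values):
    """Scale values to 0-61 and map them to the Chart API's simple encoding."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1                 # avoid dividing by zero on flat data
    return "".join(SIMPLE[round((v - lo) * 61 / span)] for v in values)

def sparkline_url(values, size="100x30"):
    # cht=lc draws a basic line chart - axes and all
    return (f"http://chart.apis.google.com/chart?cht=lc&chs={size}"
            f"&chd=s:{simple_encode(values)}")
```

With the encoding in hand, the chart is just a URL: `sparkline_url([3, 5, 4, 8])` yields an image request a page can embed directly.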

I have one small problem - I don't know how to get rid of the axes.


I've just discovered the undocumented chart type lfi, so the sparkline can be shown without the axes - I found out from Brian Suda's blog



As we start to think about the equipment we need aboard Aremiti, the Westerly ketch we are currently re-fitting, one new item that is on our shopping list is AIS.

All vessels over 300 tons and passenger vessels over 100 tons are required to carry an AIS transmitter. This broadcasts vessel data such as identification, location, speed and course on a VHF frequency. This is picked up by shore- or vessel-based receivers and decoded into NMEA sentences. The data can then be used to map the vessel on an electronic chart or radar, or combined with the receiving vessel's own location and course for collision avoidance. AIS data may also be broadcast by, or on behalf of, static navigational aids like lighthouses and buoys.
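The NMEA sentences carrying AIS data are AIVDM sentences, whose payload is 6-bit armoured ASCII. A minimal Python sketch of just the de-armouring step (a tiny fraction of a real AIS decoder):

```python
def ais_sixbit(char):
    """Decode one AIVDM payload character to its 6-bit value (0-63)."""
    value = ord(char) - 48
    if value > 40:          # the armouring skips 8 code points after 'W'
        value -= 8
    return value

def payload_bits(payload):
    """Concatenate the 6-bit groups of an AIVDM payload into a bit string."""
    return "".join(f"{ais_sixbit(c):06b}" for c in payload)

# The first six bits of the bit string give the message type,
# e.g. a payload starting "1" is a type 1 position report.
```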

There are a number of manufacturers of AIS 'engines' (receiver/decoders), such as NASA (misleadingly called a 'radar' system) and KATAS, and software such as ShipPlotter.

Since the setup cost for an amateur shore station is minimal, anyone with line of sight of a busy stretch of water can set up their own. Some publish the results on the web.
A site which I came across tonight,
is a wonderful example of what an enthusiastic web engineer can do with this data. No longer is that ship in the distance a grey blob - it's a vessel with a name, a speed, a destination, a closeup when mashed up with images from this site or
and possibly a story, a history of visits and voyages. In a small boat, that data, broadcast to all and sundry, could be life-or-death information to you. That distant blob on an apparent collision course is no longer anonymous, routeless and inhuman. If you are still uncertain about the ship's intentions, it's so much less confusing to call up a vessel by name than by some vague lat/long and bearing.

All this depends on the globally unique, stable IMO number, introduced to improve the safety of shipping. On the web, it is this identifier which is the basis of any semantic web data and tools that bring this information together.
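The IMO number's stability is backed by a check digit: the first six digits, weighted 7 down to 2, sum to a number whose last digit is the seventh. A quick sketch (a fact about the numbering scheme, not from the post):

```python
def valid_imo(number):
    """Validate the check digit of a 7-digit IMO ship identification number."""
    digits = [int(d) for d in str(number)]
    if len(digits) != 7:
        return False
    total = sum(d * w for d, w in zip(digits[:6], range(7, 1, -1)))
    return total % 10 == digits[6]

# e.g. valid_imo(9074729) is True: 9*7 + 0*6 + 7*5 + 4*4 + 7*3 + 2*2 = 139,
# and 139 ends in 9, the seventh digit.
```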

The problem for both the above sites is to garner a modicum of funds to support the engineer's passion. One key question for the semantic web is how to reward them for making their deep pot of information available as RDF. It would seem so wrong to scrape their pages, tempting though it is.

More XQuery and Semantic web mashups.


Somewhat rested after a short, breezy holiday in Falmouth, with the server now working, I completed my two case studies of XQuery/DBpedia mashups. Both are described in the XQuery Wikibook. The implementation is still a bit hairy, but now makes use of the SPARQL Query XML Result format, although I still find it useful to transform to tuples with named elements.

The first is the mapping of the birth places of football players by club. [Wikibook]

The starting page is an index of clubs in the top English and Scottish leagues:
The second shows the discography of rock artists and groups, shown as an HTML table and using SIMILE timeline. [Wikibook].

The starting page is an index of artists in a selected Wikipedia category, by default the Rock and Roll Hall of Fame: