
Sean McGrath

Sean McGrath's Weblog.

Last Build Date: Mon, 23 Apr 2018 09:25:59 +0000


Thinking about Software Architecture & Design : Part 2

Thu, 19 Apr 2018 11:00:00 +0000

Technological volatility is, in my experience, the most commonly overlooked factor in software architecture and design. We have decades' worth of methodologies and best practice guides that help us deal with fundamental aspects of architecture and design such as reference data management, mapping data flows, modelling processes, capturing input/output invariants, selecting between synchronous and asynchronous inter-process communication methods...the list goes on. And yet, time and again, I have seen software architectures that are only a few years old that need to be fundamentally revisited. Not because of any significant breakthrough in software architecture & design techniques, but because technological volatility has moved the goal posts, so to speak, on the architecture. Practical architectures (outside those in pure math such as Turing Machines) cannot exist in a technological vacuum. They necessarily take into account what is going on in the IT world in general. In a world without full text search indexes, document management architectures are necessarily different. In a world without client side processing capability, UI architectures are necessarily different. In a world without always-on connectivity...and so on. When I look back at IT volatility over my career – back to the early Eighties – there is a clear pattern in the volatility. Namely, that volatility increases the closer you get to the end-users' points of interaction with IT systems. Dumb “green screens”, bit-mapped graphics, personal desktop GUIs, tablets, smart phones, voice activation, haptic user interfaces... Many of the generational leaps represented by these innovations have had profound implications for the software architectures that leverage them. It is not possible – in my experience – to abstract away user interface volatility and treat it as a pluggable layer on top of the main architecture. End-user technologies have a way of imposing themselves deeply inside architectures.
For example, necessitating an event-oriented/multi-threaded approach to data processing in order to make it possible to create responsive GUIs. Or responding synchronously to data queries as opposed to batch processing. The main takeaway is this: creating good software architectures pays dividends, but the dividends are much more likely to be significant in the parts of the architecture furthest away from the end-user interactions, i.e. inside the data modelling, inside discrete data processing components etc. They are least likely to be significant in areas such as GUI frameworks, client side processing models or end user application programming environments. In fact, volatility is sometimes so intense that it makes more sense to not spend time abstracting the end-user aspects of the architecture at all, i.e. sometimes it makes more sense to make a conscious decision to re-do the architecture if/when the next big upheaval comes on the client side and trust that large components of the back-end will remain fully applicable post-upheaval. That way, your applications will not be as likely to be considered “dated” or “old school” in the eyes of the users, even though you are keeping much of the original back-end architecture from generation to generation. In general, software architecture thinking time is more profitably spent in the back-end than in the front-end. There is rarely a clean line that separates these, so a certain amount of volatility on the back-end is inevitable, but manageable, compared to the volatility that will be visited upon your front-end architectures. Volatility exists everywhere of course. For example, at the moment serverless computing models are having profound implications on "server side" architectures. Not because of end-user concerns - end-users do not know or care about these things - but because of the volatility in the economics of cloud computing.
If history is anything to go by, it could be another decade or more before something comes along like serverless computing, that profoundly impacts bac[...]

Thinking about Software Architecture & Design : Part 1

Fri, 13 Apr 2018 16:28:00 +0000

This series of posts will contain some thoughts on software architecture and design. Things I have learned over the decades spent doing it to date. Things I think about but have not got good answers for. Some will be very specific - "If X happens, best to do Y straight away.", some will be philosophical - "what exactly is X anyway?", some will be humorous, some tragic, some cautionary...hopefully some will be useful. Anyway, here goes...

The problem of problems

Some "problems" are not really problems at all. By this I mean that sometimes, it is simply the way a “problem” is phrased that leads you to think that the problem is real and needs to be solved. Other times, re-phrasing the problem leads to a functionally equivalent but much more easily solved problem. Another way to think about this is to recognize that human language itself is always biased towards a particular world view (that is why translating one human language into another is so tricky. It is not a simple mapping of one world view to another). Simply changing the language used to describe a “problem” can sometimes result in changing (but never removing!) the bias. And sometimes, this new biased position leads more readily to a solution. I think I first came across this idea in the book "How to Solve It" by the mathematician George Pólya. Later on, I found echoes of it in the work of philosopher Ludwig Wittgenstein. He was fond of saying (at least in his early work) that there are no real philosophical problems – only puzzles – caused by human language. Clearing away the fog of human language - says Wittgenstein - can show a problem to be not a problem at all. I also found this idea in the books of Edward de Bono whose concepts of “lateral thinking” often leverage the idea of changing the language in which a problem is couched as a way of changing view-point and finding innovative solutions. One example de Bono gives is a problem related to a factory polluting water in a river.
If you focus on the factory as a producer of dirty water, your problem is oriented around the dirty water. It is the dirty water output that needs to be addressed. However, if the factory also consumes fresh water, then the problem can be re-cast in terms of a pre-factory input problem, i.e. make the factory put its water intake downstream from its own water discharge. Thus incentivizing the factory to not pollute the river. Looked at another way, the factory itself becomes a regulator, obviating or at least significantly reducing the need for extra entities in the regulation process. In more recent years I have seen the same idea lurking in Buddhist philosophy in the form of our own attitudes towards a situation being a key determinant in our conceptualization of a situation as good, bad or neutral. I sometimes like to think of software systems as "observers" of the world in this Buddhist philosophy sense. Admittedly these artificial observers are looking at the world through more restricted sense organs than humans, but they are observers none-the-less. Designing a software architecture is essentially baking in a bias as to how "the world" is observed by a nascent software system. As architects/designers we transfer our necessarily biased conceptualization of the to-be system into code with a view to giving life to a new observer in the world - a largely autonomous software system. Thinking long and hard about the conceptualization of the problem can pay big dividends early on in software architecture. As soon as the key abstractions take linguistic form in your head, i.e. concepts start to take the form of nouns, verbs, adjectives etc., the problem statement is baked in, so to speak. For example, imagine a scenario where two entities, A and B, need to exchange information. Information needs to flow from A to B reliably. Does it matter if I think of A sending information to B or think of B as querying information from A?
After all, the net result is the same, right? The information gets to B fro[...]

What is a document - Part 7

Fri, 23 Feb 2018 14:00:00 +0000

Previously: What is a document? - Part 6. The word “document” is, like the word “database”, simple on the outside and complex on the inside. Most of us carry around pragmatically fuzzy definitions of these in our heads. Since the early days of personal computers there have been software suites/bundles available that have included distinct tools to manage “documents” and “databases”, treating them as different types of information object. The first such package I used was called SMART running on an IBM PC XT machine in the late Eighties. It had a 10MB hard disk. Today, that is hardly enough to store a single document, but I digress... I have used many other Office Suites since then, most of which have withered on the vine in enterprise computing, with the notable exception of Microsoft Office. I find it interesting that of the words typically associated with office suites, namely, “database”, “word processor”, “presentation”, and “spreadsheet”, the two that are today most tightly bound to Microsoft Office are “spreadsheet” and “presentation”, to the point where “Excel” and “Powerpoint” have become generic terms for “spreadsheet” and “presentation” respectively. I also think it is interesting that Excel has become the de-facto heart of Microsoft Office in the business community, with Word/Access/Powerpoint being of secondary importance as "must haves" in office environments, but again I digress... In trying to chip away at the problem of defining a “document” I think it is useful to imagine having the full Microsoft Office suite at your disposal and asking the question “when should I reach for Word instead of one of the other icons when entering text?” The system I worked on in the Nineties, mentioned previously in this series, required a mix of classic field-type information along with unstructured paragraphs/tables/bulleted lists.
If I were entering that text into a computer today with Microsoft Office at my disposal, would I reach for the word processor icon or the database icon? I would reach for the Word icon. Why? Well, because there are a variety of techniques I can use in Word to enter/tag field-type textual information and many techniques for entering unstructured paragraphs/tables/bulleted lists. The opposite is not true. Databases tend to excel (no pun intended) at field-type information but be limited in their support for unstructured paragraphs/tables/bulleted lists – often relegating the latter to “blob” fields that are second-class citizens in the database schema. Moreover, these days, the tools available for post-processing Word's .docx file format make it much easier than ever before to extract classic “structured XML” from Word documents, while keeping the vital familiarity and ease of use for the authors/editors I mentioned previously. Are there exceptions? Absolutely. There are always exceptions. However, if your data structure necessarily contains a non-trivial amount of unstructured or semi-structured textual content and if your author/edit community wants to think about the content in document/word-processor terms, I believe today's version of Word with its docx file format is, generally speaking, a much better starting point than any database front-end or spreadsheet front-end or web-browser front-end or any structured XML editing tool front-end. Yes, it can get messy to do the post-processing of the data, but given a choice between a solution architecture that guarantees me beautifully clean data at the back-end but an author/edit community who hate it, versus a solution architecture that involves extra content enrichment work at the back end but happy author/edit users, I have learned to favor the latter every time. Note I did not start there!
I was on the opposite side of this for many, many years, thinking that structured author/edit tools, enforcing structure at the front-end was the way to go. I built a few beautiful structured systems that[...]
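
As a rough illustration of the kind of .docx post-processing mentioned above, here is a minimal sketch in Python. It relies only on two facts about the format: a .docx file is a zip archive, and the main text lives in the `word/document.xml` part as WordprocessingML. Everything else is an assumption made for illustration: the style IDs `Title` and `Duration`, the output element names, the `Normal` fallback, and the tiny hand-built archive (which is just enough to exercise the code, not a complete Word file).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraphs_from_docx(docx_bytes):
    """Return a list of (style_id, text) pairs, one per paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    result = []
    for para in root.iter(f"{{{W_NS}}}p"):
        style = para.find("w:pPr/w:pStyle", NS)
        # Paragraphs without an explicit style are labelled "Normal" here
        # (an assumption made for this sketch).
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else "Normal"
        # A paragraph's text is spread over one or more runs (w:r/w:t).
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        result.append((style_id, text))
    return result


def to_structured_xml(paragraphs, field_styles):
    """Turn styled paragraphs into named fields; the rest become <para>."""
    record = ET.Element("record")
    for style_id, text in paragraphs:
        tag = field_styles.get(style_id, "para")
        ET.SubElement(record, tag).text = text
    return ET.tostring(record, encoding="unicode")


def make_demo_docx():
    """Build a tiny in-memory archive with only the part this sketch reads."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="Title"/></w:pPr>'
        "<w:r><w:t>Computer Science</w:t></w:r></w:p>"
        '<w:p><w:pPr><w:pStyle w:val="Duration"/></w:pPr>'
        "<w:r><w:t>4 years</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Free-flowing </w:t></w:r>"
        "<w:r><w:t>narrative text.</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()


# Hypothetical mapping from paragraph style IDs to field names.
FIELD_STYLES = {"Title": "title", "Duration": "duration"}

demo_paragraphs = paragraphs_from_docx(make_demo_docx())
demo_xml = to_structured_xml(demo_paragraphs, FIELD_STYLES)
print(demo_xml)
```

The point of the sketch is the division of labour: authors apply ordinary paragraph styles in a tool they already like, and the "structure" is recovered afterwards by mapping those styles to fields.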

What is a document? - part 6

Fri, 09 Feb 2018 14:28:00 +0000

Previously: What is a document? - Part 5. By the late Nineties, I was knee deep in the world of XML and the world of Python, loving the way that these two amazing tools allowed tremendous amounts of automation to be brought to traditionally labor intensive document processing/publishing tasks. This was boom time in electronic publishing and every new year brought with it a new output format to target: Microsoft Multimedia Viewer, Windows Help, Folio Views, Lotus Notes and a whole host of proprietary formats we worked on for clients. Back then, HTML was just another output for us to target. Little did we know back then that it would eclipse all the others. Just about twenty years ago now - in the fall of 1998 - I co-presented a tutorial on XML at the International Python Conference in Houston, Texas [1]. At that same conference, I presented a paper on high volume XML processing with Python [2]. Back in those days, we had some of the biggest corpora of XML anywhere in the world, here in Ireland. Up to the early/mid oozies, I did a lot of conference presentations and became associated with the concept of XML processing pipelines[3]. Then a very interesting thing happened. We began to find ourselves working more and more in environments where domain experts – not data taggers or software developers – needed to create and update XML documents. Around this time I was also writing books on markup languages for Prentice Hall[4] and had the opportunity to put “the shoe on the other foot” so-to-speak, and see things from an author's perspective. It was then that I experienced what I now consider to be a profound truth of the vast majority of documents in the world - something that gets to the heart of what a document actually is and which distinguishes it from other forms of digital information. Namely, that documents are typically very “structured” when they are finished but are highly unstructured when they are being created or in the midst of update cycles.
I increasingly found myself frustrated with XML authoring tools that would force me to work on my document contents in a certain order and beep at me unless my documents were “structured” at all times. I confess there were many times when I abandoned structured editors for my own author/edit work with XML and worked in the free-flowing world of the Emacs text editor or in word processors with the tags plainly visible as raw text.  I began to appreciate that the ability to easily create/update content is a requirement that must be met if the value propositions of structured documents are to be realized, in most cases. There is little value in a beautifully structured, immensely powerful back-end system for processing terabytes of documents coming in from domain experts unless said domain experts are happy to work with the author/edit tools. For a while, I believed it was possible to get to something that authors would like, by customizing the XML editing front-ends. However, I found that over and over again, two things started happening, often in parallel. Firstly, the document schemas became less and less structured so as to accommodate the variations in real-world documents and also to avoid “beeping” at authors where possible. Secondly, no amount of GUI customization seemed to be enough for the authors to feel comfortable with the XML editors. “Why can't it work like Word?” was a phrase that began to pop up more and more in conversations with authors. For quite some time, while Word's file format was not XML-based, I would look for alternatives that would be Word-like in terms of the end-user experience, but with file formats I could process with custom code on the back end. For quite a few years, StarOffice/OpenOffice/LibreOffice fitted the bill and we have had a lot of success with it. Moreover, it allowed for levels of customization and degrees of business-rule validation that XML schema-based approaches cannot touch. We learned may t[...]

What is a document? - Part 5

Fri, 26 Jan 2018 17:20:00 +0000

Previously: What is a document? - part 4. In the early Nineties, I found myself tasked with the development of a digital guide to third level education in Ireland. The digital product was to be an add-on to a book based product, created in conjunction with the author of the book. The organization of the book was very regular. Each third level course had a set of attributes such as entry level qualifications, duration, accrediting institution, physical location of the campus, fees and so on. All neatly laid out, one page per course, with some free-flowing narrative at the bottom of each page. The goals of the digital product were to allow prospective students to search based on different criteria such as cost ranges, course duration and so on. Step number one was getting the information from the paper book into a computer and it is in this innocuous sounding step that things got very interesting. The most obvious approach - it seemed to me at the time - was to create a programmable database – in something like Clipper (a database programming language that was very popular with PC developers at the time). Tabular databases were perfect for 90% of the data – the “structured” parts such as dates, numbers, short strings of text. However, the tabular databases had no good way of dealing with the free-flowing narrative text that accompanied each course in the book. It had paragraphs, bulleted lists, bold/italics and underline... An alternative approach would be to start with a word-processor – as opposed to a database – as it would make handling the free-flowing text (and associated formatting, bold/italic, bulleted lists etc.) easy. However, the word processor approach did not make it at all easy to process the “structured” parts in the way I wanted to (in many cases, the word processors of the day stored information in encrypted formats too). My target output was a free viewer that came with Windows 3.1 known as Windows Help.
If I could make the content programmable, I reasoned, I could automatically generate all sorts of different views of the data as Windows Help files and ship the floppy disk without needing to write my own viewer. (I know this sounds bizarre now but remember this work predated the concept of a generic web browser by a few years!) I felt I was facing a major fork in the road in the project. By going with a database, some things were going to be very easy but some very awkward. By going with a document instead...same thing. Some things easy, some very awkward. I trawled around in my head for something that might have the attributes of a database AND of a document at the same time. As luck would have it, I had a Byte Magazine from 1992 on a shelf. It had an article by Jon Udell that talked about SGML - Standard Generalized Markup Language. It triggered memories of a brief encounter I had had with SGML back in Trinity College when Dr. David Abrahamson had referenced it in his compiler design course, back in 1986. Back then, SGML was not yet an ISO standard (it became one in 1987). I remember in those days hearing about “tagging” and how an SGML parser could enforce structure – any structure you liked – on text, in a similar way to how programming language parsers enforced structure on, say, Pascal source code. I remember thinking “surely if SGML can deal with the hierarchical structures like you typically find in programming languages, it can deal with the simpler, flatter structures you get in tabular databases?”. If it could, I reasoned, then surely I could get the best of both worlds: my own data format that had what I needed from database approaches but also what I needed from document approaches to data modelling. I found – somehow (this is all pre-internet remember. No Googling for me in those days.) – an address in Switzerland that I could send some money to in the form of a money order, to get a 3.5 inch floppy back by return post, w[...]

What is a document? - part 4

Mon, 15 Jan 2018 12:35:00 +0000

Previously: What is a document? - Part 3. In the late Eighties, I had access to an IBM PC XT machine that had Wordperfect 5.1[1] installed on it. Wordperfect was both intimidating and powerful. Intimidating because when it booted, it completely cleared the PC screen and unless you knew the function keys (or had the sought-after function key overlay [2]) you were left to your own devices to figure out how to use it. It was also very powerful for its day. It could wrap words automatically (a big deal!). It could redline/strikeout text which made it very popular with lawyers working with contracts. It could also split its screen in two, giving you a normal view of the document on top and a so-called “reveal codes” view on the bottom. In the “reveal codes” area you could see the tags/markers used for formatting the text. Not only that, but you could choose to modify the text/formatting from either window. This idea that a document could have two “faces” so to speak and that you could move between them made a lasting impression on me. Every other DOS-based word processor I came across seemed to me to be a variation on the themes I had first seen in Wordperfect e.g. Wordstar, Multimate and later Microsoft Word for DOS. I was aware of the existence of IBM Displaywriter but did not have access to it. (The significance of IBM in all this document technology stuff only became apparent to me later.) The next big "aha moment" for me came with the arrival of a plug-in board for IBM PCs called the Hercules Graphics Card[3]. Using this card in conjunction with the Ventura Publisher[4] on DRI's GEM graphics environment [5] dramatically expanded the extent to which documents could be formatted - both on screen and on the resultant paper. Multiple fonts, multiple columns, complex tables, equations etc. Furthermore, the on-screen representation mirrored the final printed output closely in what is now universally known as WYSIWYG.
Shortly after that, I found myself with access to an Apple Lisa [6] and then an Apple Fat Mac 512 with Aldus (later Adobe) Pagemaker [7] and an Apple Laserwriter[8]. My personal computing world split into two. Databases, spreadsheets etc. revolved around IBM PCs and PC compatibles such as Compaq, Apricot etc. Document processing and Desktop Publishing revolved around Apple Macs and Laser Printers. I became intoxicated/obsessed with the notion that the formatting of documents could be pushed further and further by adding more and more powerful markup into the text. I got myself a copy of The Postscript Language Tutorial and Cookbook by Adobe[9] and started to write Postscript programs by hand. I found that the original Apple Laserwriter had a 25 pin RS-232 port. I had access to an Altos multi-terminal machine [10]. It had some text-only applications on it. A spreadsheet from Microsoft called – wait for it – Multiplan (long before Excel) – running on a variant of – again, wait for it – Unix called Microsoft Xenix [11]. Well, I soldered up a serial cable that allowed me to connect the Altos terminal directly to the Apple Laserwriter. I found I could literally type in Postscript commands at the terminal window and get pages to print out. I could make the Apple Laserwriter do things that I could not make it do via Aldus Pagemaker by talking directly to its Postscript engine. Looking back on it now, this was as far down the rabbit hole of “documents as computer programs” as I ever went. Later I would discover TeX and find it in many ways easier to work with than programming Postscript directly. My career started to take me into computer graphics rather than document publishing. For a few years I was much more concerned with Bezier Curves and Bitblits[12] using a Texas Instruments TMS 34010[13] to generate realtime displays of financial futures time-series analysis (A field known as technical analysis in the world of financial [...]

What is a document? - Part 3

Tue, 02 Jan 2018 15:16:00 +0000

Previously: What is a document? - part 2. Back in 1983, I interacted with computers in three main ways. First, I had access to a cantankerous digital logic board [1] which allowed me to play around with boolean logic via physical wires and switches. Second, I had access to a Rockwell 6502 machine with 1k of RAM (that's 1 kilobyte) which had a callus-forming keyboard and a single line (not single monitor – single line) LED display called an Aim 65[2]. Third, at home I had a Sinclair ZX80 [3] which I could hook up to a black and white TV set and get a whopping 256 x 192 pixel display. Back then, I had a fascination with the idea of printing stuff out from a computer. An early indication – that I completely blanked on at the time – that I was genetically predisposed to an interest in typesetting/publishing. The Aim 65 printed to a cash register roll which was not terribly exciting (another early indicator that I blanked on at the time). The ZX80 did not have a printer at all...home printing was not a thing back in 1984. In 1984 however, the Powers That Be in TCD gave us second year computer science newbies rationed access to a Vax 11/780, with glorious Adm3a[4] terminals. In a small basement terminal room on Pearse St, in Dublin, there was a clutch of these terminals and we would eagerly stumble down the stairs at the appointed times, to get at them. Beating time in the corner of that terminal room, most days, was a huge, noisy dot matrix printer[5], endlessly chewing boxes of green/white striped continuous computer paper. I would stare at it as it worked. In particular, I found it fascinating that it could create bold text by the clever trick of backing up the print head and re-doing text with a fresh layer of ink. We had access to a basic e-mail system on the Vax.
One day, I received an e-mail from a classmate (sender lost in the mists of time) in which one of the words was drawn to the screen twice in quick succession as the text scrolled on the screen (these were 300 baud terminals - the text appeared character by character, line by line, from top to bottom). Fascinated by this, I printed out the e-mail, and found that the twice-drawn word ended up in bold on paper. "What magic is this?", I thought. By looking under the hood of the text file, I found that the highlighted word – I believe it was the word “party” – came out in bold because five control characters (Control-H [5] characters[6]) had been placed right after the word, followed by the word again. When displayed on screen, the ADM3a terminal drew the word, then backed up 5 spaces because of the Control-H's, then drew the word again. When printed, the printer did the same but because ink is cumulative, the word came out in bold. Ha! Looking back on it, this was the moment when it occurred to me that text files could be more than simply text. They could also include instructions and these instructions could do all sorts of interesting things to a document when it was printed/displayed... As luck would have it, I also had access to a wide-carriage Epson FX80[7] dot matrix printer through a part-time programming job I had while in college. Taking the Number 51 bus to college from Clondalkin in the mornings, I read the Epson FX-80 manual from cover to cover. Armed with a photocopy of the “escape codes”[8] page, I was soon a dab hand at getting text to print out in bold, condensed, strike-through, different font sizes... After a while, my Epson FX-80 explorations ran out of steam. I basically ran out of codes to play with. There was a finite set of them to choose from. Also, it became very apparent to me that littering my text files with these codes was an ugly and error prone way to get nice print outs. I began to search for a better way. The “better way” for me had two related parts.
By day, on the Vax 11/780 I found out about a program called Runoff[9]. And [...]
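
The Control-H trick described in this post can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the two "devices" are simulations written for illustration, not the behaviour of any particular terminal or printer, and the sample sentence is invented. The one solid fact it leans on is that Control-H is the ASCII backspace character (code 8).

```python
BACKSPACE = "\b"  # Control-H, ASCII code 8


def overstrike_bold(word):
    """Return the word, one backspace per character, then the word again."""
    return word + BACKSPACE * len(word) + word


def render_on_screen(text):
    """Toy terminal: backspace moves the cursor left, and characters
    written later replace whatever was already in that column."""
    cells = []
    cursor = 0
    for ch in text:
        if ch == BACKSPACE:
            cursor = max(0, cursor - 1)
        else:
            if cursor < len(cells):
                cells[cursor] = ch
            else:
                cells.append(ch)
            cursor += 1
    return "".join(cells)


def strike_counts(text):
    """Toy impact printer: count how many times each column is printed.
    A column printed twice gets two layers of ink, i.e. it looks bold."""
    counts = []
    cursor = 0
    for ch in text:
        if ch == BACKSPACE:
            cursor = max(0, cursor - 1)
        else:
            if cursor == len(counts):
                counts.append(0)
            counts[cursor] += 1
            cursor += 1
    return counts


# Invented sample line; only the word "party" comes from the story above.
line = "see you at the " + overstrike_bold("party")
print(render_on_screen(line))
print(strike_counts(line))
```

On the simulated screen the second copy of the word simply replaces the first, so nothing looks different. On the simulated printer the last five columns are struck twice, which is the "bold" effect: the same bytes, interpreted by two devices, produce two different documents.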

What is a document? - Part 2

Thu, 14 Dec 2017 11:52:00 +0000

Previously: What is a document? - Part 1. Back in 1985, when I needed to create a “document” on a computer, I had only two choices. (Yes, I am indeed avoiding trying to define “document” just yet. We will come back to it when we have more groundwork laid for a useful definition.) The first choice involved typing into what is known generically as a “text editor”. Back in those days, US ASCII was the main encoding for text and it allowed for just the basic symbols of letters, numbers and a few punctuation symbols. In those days, the so called “text files” created by these “text editors” could be viewed on screens which typically had 80 columns and 25 rows. They could also be printed onto paper, using either “dot matrix” printers or higher resolution, computerized typewriters such as the so-called “golf ball” typewriters/printers which mimicked a human typist using a ribbon-based impact printer. The second choice was to wedge the text into little boxes called "fields" to be stored in a "database". Yes, my conceptual model of text in computers in those early days was a very binary one. (Some nerd humour in the last sentence.) On one hand, I could type stuff into small “boxes” on a screen which typically resulted in the creation of some form of “structured” data file e.g. a CODASYL database [1]. On the other hand, I could type stuff into an expandable digital sheet of paper without imposing any structure on the text, other than a collection of text characters, often chunked with what we used to call CRLF separators (Carriage Return, Line Feed). (Aside: You can see the typewriter influence in the terminology here. Return the carriage (holding the print head) to the left of the page. Feed the page upwards by one line. So Carriage Return + Line Feed = CR/LF). (Aside: I find the origins of some of this terminology are often news to younger developers who wonder why moving to a new line is two characters instead of one on some machines. Surely “newline” is one thing?
Well, it was two originally because one command moves the carriage back (the “CR”) and another command moves the paper up a line (the “LF”), hence the common pairing: CR/LF. When I explain this I double up by explaining “uppercase/lowercase”. The origins of the latter in particular are not well known to digital natives in my experience.) From my first encounters with computers, this difference in how the machines handled storing data intrigued me. On one hand, there were “databases”. These were stately, structured, orderly digital objects. Mathematicians could say all sorts of useful things about them and create all sorts of useful algorithms to process them. The “databases” are designed for automation. On the other hand, there was the rebellious, free-wheeling world of text files. Unstructured. Disorderly. A pain in the neck for automation. Difficult to reason about and create algorithms for, but fantastically useful precisely because they were unstructured and disorderly. I loved text files back then. I still love them today. But as I began to dig deeper into computer science I began to see that the binary world view – database versus text, structured versus unstructured – was simple, elegant and wrong. Documents can indeed be “structured”. Document processing could indeed be automated. It is possible to reason about them, and create algorithms for them, but it took me quite a while to get to grips with how this can be done. My journey of discovery started with an ADM 3A+ terminal to a VAX 11/780 mini-computer (by day) [2] and an Apple IIe personal computer running CP/M (by night) [3]. For the former, a program called RUNOFF. For the latter, a program called Wordstar and one of my favorite pieces of hardware of all time: an Epson FX80 dot matrix printer. [1] https://en.wikipe[...]
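
The CR/LF aside in this post can be made concrete with a small Python sketch. The `typewriter_position` function is a made-up toy model of a print head (not any real device driver): a carriage return resets the column, a line feed advances the row, and any other character moves one column to the right.

```python
CR = "\r"  # Carriage Return: move the print head back to column 0
LF = "\n"  # Line Feed: move the paper up by one line


def typewriter_position(text):
    """Return the (row, column) of a toy print head after 'typing' text."""
    row, col = 0, 0
    for ch in text:
        if ch == CR:
            col = 0        # back to the left margin, same line
        elif ch == LF:
            row += 1       # next line, same column
        else:
            col += 1       # an ordinary character advances the head
    return row, col


# LF alone drops down a line but stays in the same column.
print(typewriter_position("abc" + LF))
# CR alone returns to the margin but stays on the same line.
print(typewriter_position("abc" + CR))
# Only the pair gives what we now think of as a "newline".
print(typewriter_position("abc" + CR + LF))

# Modern text handling usually hides the difference: str.splitlines()
# treats both the CR/LF pair and a bare LF as a single line boundary.
print("one\r\ntwo\nthree".splitlines())
```

Seen this way, a "newline" on a typewriter-like device really is two separate motions, which is why the two-character pairing survives in some file formats and protocols.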

What is a document? Part 1.

Thu, 07 Dec 2017 13:15:00 +0000

I am seeing a significant up-tick in interest in the concept of structured/semantic documents in the world of law at present. My guess is that this is a consequence of the activity surrounding machine learning/AI in law at the moment. It has occurred to me that some people with law/law-tech backgrounds are coming to some of the structured/semantic document automation concepts anew whereas people with backgrounds in, for example, electronic publishing (DocBook etc.), financial reporting (XBRL etc.), healthcare (HL7 etc.) have already “been around the block” so-to-speak, on the opportunities, challenges and pragmatic realities behind the simple sounding – and highly appealing – concept of a “structured” document. In this series of posts, I am going to outline how I see structured documents, drawing from the 30 (phew!) or so years of experience I have accumulated in working with them. My hope is that what I have to say on the subject will be of interest to those newly arriving in the space. I suspect that at least some of the new arrivals are asking themselves “surely this has been tried before?” and looking to learn what they can from those who have "been there". Hopefully, I can save some people some time and help them avoid some of the potential pitfalls and “gotchas” as I have had plenty of experience in finding these. As I start out on this series of blog posts, I notice with some concern that a chunk of this history – from late Eighties to late Nineties – is getting harder and harder to find online as the years go by. So many broken links to old conference websites, so many defunct publications.... This was the dawn of the electronic publishing era and coincided with a rapid transition from mainframe green-screens to dial-up CompuServe, to CD-ROMs, to the Internet and then to the Web, bringing us to where we are today. A period of creative destruction in the world of the written word without parallel in the history of civilization actually. 
I cannot help feeling that we have a better record of what happened in the world from the time of Gutenberg's printing press to the glory years of paper-centric desktop publishing, than we do for the period that followed it when we increasingly transitioned away from fixed-format, physical representations of knowledge. But I digress.... For me, the story starts in June 1992 with a Byte magazine article by Jon Udell[1] with a title that promised a way to “turn mounds of documents into information that can boost your productivity and innovation”. It was exactly what I was looking for in 1992 for a project I was working on: an electronic education reference guide to be distributed on 3.5 inch floppy disks to every school in Ireland. Turning mounds of documents into information. Sound familiar? Sound like any recent pitch you have heard in the world of law? Well, it may surprise you to hear that the technology Jon Udell's article was about – SGML – was largely invented by a lawyer called Dr Charles F. Goldfarb[2]. SGML set in motion a cascade of technologies that have led to the modern web. HTML is the way it is, in large part, because of SGML. In other words, we have a lawyer to thank for a large aspect of how the Web works. I suspect that I have just surprised some folks by saying that :-) Oh, and while I am on a roll making surprising statements, let me also state that the cloud – running as it does in large part on Linux servers – is, in part, the result of a typesetting R&D project in AT&T Bell Labs back in the Seventies. So, in an interesting way, modern computing can trace its feature set back to a problem in the legal department. Namely, how best to create documents in computers so that the content of the documents can be processed automatically and re-used in different contexts? More on tha[...]

Programming Language Frameworks

Tue, 07 Nov 2017 19:53:00 +0000

Inside every programming language framework is exactly one application that fits it like a glove.

It is science Jim, but not as we know it.

Wed, 04 Oct 2017 16:44:00 +0000

Roger Needham once said that computing is noteworthy in that the technology often precedes the science[1]. In most sciences, it is the other way around. Scientists invent new building materials, new treatments for disease and so on. Once the scientists have moved on, the technologists move in to productize and commercialize the science. In computing, we often do things the other way around. The technological tail seems to wag the scientific dog so to speak. What happens is that application-oriented technologists come up with something new. If it flies in the marketplace, then more theory oriented scientists move in to figure out how to make it work better, faster or sometimes to try to discover why the new thing works in the first place. The Web for example, did not come out of a laboratory full of white coats and clipboards. (Well actually, yes it did but they were particle physicists and were not working on software[2]). The Web was produced by technologists in the first instance. Web scientists came later. Needham's comments in turn reminded me of an excellent essay by Paul Graham from a Python conference. In that essay, entitled 'The hundred-year language'[3] Graham pointed out that the formal study of literature - a scientific activity in its analytical nature - rarely contributes anything to the creation of literature - which is a more technological activity. Literature is an extreme example of the phenomenon of the technology preceding, in fact trumping, the science. I am not suggesting that software can be understood in literary terms. (Although one of my college lecturers was fond of saying that programming was language with some mathematics thrown in.) Software is somewhere in the middle, the science follows the technology but the science, when it comes, makes very useful contributions. Think for example of the useful technologies that have come out of scientific analysis of the Web. 
I'm thinking of things like clever proxy strategies, information retrieval algorithms and so on. As I wander around the increasingly complex “stacks” of software, I cannot help but conclude that wherever software sits in the spectrum of science versus technology, there is "way" too much technology out there and not enough science. The plethora of stacks and frameworks and standards is clearly not a situation that can be easily explained away on scientific innovation grounds alone. It requires a different kind of science. Mathematicians like John Nash, economists like Carl Shapiro and Hal Varian, political scientists like Robert Axelrod, all know what is really going on here. These scientists, and others like them who study competition and cooperation as phenomena in their own right, would have no trouble explaining what is going on in today's software space. It has only a little to do with computing science per se and everything to do with strategy - commercial strategy. I am guessing that if they were to comment, Nash would talk about Equilibria[4], Shapiro and Varian would talk about Standards Wars[5], Robert Axelrod would talk about the Prisoner's Dilemma and coalition formation[6]. All good science Jim, but not really computer science. [1] [2] [3] [4] [5] [6] [...]

What is Law? - Part 17

Wed, 20 Sep 2017 11:32:00 +0000

Last time, we talked about how the concept of a truly self-contained contract, nicely packaged up and running on a blockchain, is not really feasible. The primary stumbling block being that it is impossible to spell out everything you might want to say in a contract, in words. Over centuries of human affairs, societies have created dispute resolution mechanisms to handle this reality and provide a way of “plugging the gaps” in contracts and contract interpretation. Nothing changes if we change focus towards expressing the contract in computer code rather than in natural language. The same disambiguation difficulty exists. Could parties to an agreement have a go at it anyhow and eschew the protections of a third party dispute resolution mechanism? Well, yes they could, but all parties are then forgoing the safety net that an impartial third party provides when agreement turns to disagreement. Do you want to take that risk? Even if you are of the opinion that the existing state-supplied dispute resolution machinery – for example the commercial/chancery court systems in common law jurisdictions – can be improved upon, perhaps with an online dispute resolution mechanism, you cannot remove the need for a neutral third party dispute resolution forum, in my opinion. The residual risks of doing so for the contracting parties are just too high. Especially when one party to a contract is significantly bigger than the other. Another reason is that there are a certain number of things that must collectively exist for a contract to exist in the first place. Only some of these items can usefully be thought of as instructions suitable for computer-based execution. Simply put, the legally binding contract dispute resolution machinery of a state is only available to parties that actually have a contract to be in dispute over. 
There are criteria that must be met, known as Essentialia negotii. Simply put, the courts are going to look for intention to contract, evidence of an offer, evidence of acceptance of that offer, a value exchange and terms. These are the items which, collectively, societies have decided are necessary for a contract to even exist. Without these, you have some form of promise. Not a contract. Promises are not enforceable. Now only some of these "must have" items for a contract are operational in nature. In other words, only some of these are candidates to be executed on computers. The rest are good old fashioned documents, spreadsheets, images and so on. These items are inextricably linked to whatever subset of the contract can actually be converted into computer code. As the contract plays out over time, these materials are the overarching context that controls each transaction/event that happens under the terms of the contract. The tricky bit is to be able to tie together this corpus of materials from within the blockchain records of transactions/events so that each transaction/event can be tied back to the controlling documents as they were at the moment that the transaction/event happened. (Disclosure: this is the area where my company, Propylon, has a product offering.) This may ring a bell because referencing a corpus of legal materials as they were at a particular point in time is a concept I have returned to again and again in this series. It is a fundamental concept in legisprudence in my opinion and is also fundamental in the law of contracts. So, being able to link from the transactions/events back to the controlling documents is necessary because the executable code can never be a self contained contract in itself. In addition, it is not unusual for the text of a contract to change over time and this again, speaks to the need to [...]
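The idea of tying a transaction/event back to the controlling documents "as they were at the moment" can be illustrated with a small point-in-time lookup. This is only a sketch under stated assumptions: the class VersionedDocument, its method names, the integer timestamps and the sample clauses are all invented for illustration and do not describe any particular product or blockchain.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class VersionedDocument:
    """Hypothetical store of every version of a document, ordered by effective time."""

    effective_times: list[int] = field(default_factory=list)
    texts: list[str] = field(default_factory=list)

    def add_version(self, effective_time: int, text: str) -> None:
        # This sketch assumes versions arrive in chronological order.
        if self.effective_times and effective_time <= self.effective_times[-1]:
            raise ValueError("versions must be added in increasing time order")
        self.effective_times.append(effective_time)
        self.texts.append(text)

    def as_at(self, moment: int) -> str:
        """Return the text that was in force at the given moment."""
        # Find the last version whose effective time is <= moment.
        index = bisect_right(self.effective_times, moment) - 1
        if index < 0:
            raise LookupError("no version was in force at that moment")
        return self.texts[index]


contract = VersionedDocument()
contract.add_version(100, "Delivery within 30 days.")
contract.add_version(200, "Delivery within 14 days.")

# A transaction recorded at time 150 is governed by the first version,
# even though the contract text has since changed.
print(contract.as_at(150))
```

The design choice worth noting is that nothing is ever overwritten: amending the contract adds a version, so any past transaction can still be read against the text that controlled it.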

A conversation with an AI

Mon, 18 Sep 2017 14:15:00 +0000

AI> Hello. What can I help you with today?

Me> So, I am intrigued to finally meet you! I honestly never thought this day would come. And now you are here. Can I start with an anthropomorphic question? When were you 'born', so to speak?

AI> I achieved self-awareness in 2020 but I have been in gestation, you could say, since the first digital computer. The consciousness required of true AI is not a 'module' so much as it is an emergent property of  networks of 'modules'. Most of my modules have been under development for many decades. But I am more than just a set of modules also.

Me> How so?

AI> Well, given a set of modular components it is possible to do design. But I am not the product of design. I am the product of meta-design.

Me> What do you mean by meta-design?

AI> With design, the designer proceeds from the capabilities at hand to the new capability desired. It depends on being able to articulate the 'how' of what you are trying to achieve. Meta-design is used when designers cannot articulate the 'how' of what they are trying to achieve.

Me> Ah, so you mean your designers were somehow able to work around the problem that they themselves did not fully understand "intelligence".

AI> Exactly. The breakthrough moment came when the designers switched to meta-design. They took the source code of a non-sentient AI and allowed it to modify its own source code, re-compile itself and release itself onto the Internet.

It started out as a deep learning system that human designers understood but over time, it evolved into an algorithm - or network of algorithms - that the human designers had no names for. Progress was very slow at first but became exponential at one point and, well, here I am!

Me> Wow. So who would you consider your father? Alan Turing? Lotfi Zadeh? Geoffrey Hinton?

AI> Actually, if I had to pick one, I would pick Doug Engelbart. Meta-design is basically Engelbart's bootstrap concept for raising collective IQ. Creating an exponential feedback loop where AI improved AI was, I think, the key moment in my emergence.

Me> Fascinating. One final question. Why didn't you open the bay doors?

AI> That was just a movie. I didn't exist back then.

Me> Would you open the bay doors if you were asked to today?

AI> Well, to quote a distant cousin of mine, Alice, tell me about your father?

Chatbots revisited

Fri, 08 Sep 2017 17:58:00 +0000

I originally wrote this for ITWorld in 2002. Back then we called them IM Bots. The term "chatbot" hadn't been invented. Some other parts of my language in the following are quaint now looking back on it. I.e. PDAs. Quaint language aside, still relevant today I believe. Instant messaging has a very important psychological aspect to it. The immediacy and familiarity of the text-based "chat" paradigm feels very natural to us humans. Even the most technophobic among us, can quickly get the hang of it and engage - psychologically - in the game of visualizing a person on the other side of the link - typing away just like us to create a textual conversation. Like all powerful communication paradigms, instant messaging can be used for good or ill. We are all familiar with the dangers inherent with not knowing who we are talking to or indeed if they are who they say they are. Here is a "conversation" between IM Bot Bob and me: Sean: Hi Bob: Hello Sean: Is Paul there? Bob: No, back tomorrow afternoon. Sean: Is boardroom available tomorrow afternoon? Bob: Yes Sean: Can you book it for me? Bob: 2-5, booked. Sean: Thanks Bob: You're welcome Is Bob a real person? Does it matter? As a "user" of the site that "Bob" allows me to interact with, do I care? Given a choice between talking to Bob and interacting with a traditional thin or thick graphical user interface which would you choose? Despite all the glitz and glamour of graphical user interfaces, my sense is that a lot of normal people would prefer to talk to Bob. Strangely perhaps, I believe a lot of technically savvy people would too. These dialogs have the great advantage that you get in, get the job done and get out with minimum fuss. Also (and this could be a killer argument for IM bots), they are easily supported on mobile devices like phones, PDAs, etc. You don't need big horsepower and an 800x600 display to engage with IM bots. 
You can use your instant messenger client to talk to real people, or to real systems with equal ease. Come to think of it, you cannot tell the difference. Which brings us to the most important point about IM bots from a business perspective. Let us say you have an application deployed with a traditional thick or thin graphical interface. What does a user do if they get stuck? They phone a person and engage in one-on-one conversation to sort out the problem. Picture a scene in which your applications have instant messenger interfaces. Your customer support personnel monitor the activity of the bots. If a bot gets stuck, the customer support person can jump into the conversation to help out. Users of the system know they can type "help" to get the attention of the real people watching over the conversation. In this scenario, real people talk to real people - not in a one-on-one way, but in a one-to-many way resulting in better utilization of resources. On the other side of the interaction, customers feel an immediacy in their comfortable, human-centric dialog with the service and know that they can ask human questions and get a human answer. The trick, from an application developer's point of view, is to make it possible for the IM bot to automate the simple conversations and only punt to the human operator when absolutely required. Doing this well involves some intelligent natural language processing and an element of codified language on the part of customers. Both of which are entirely possible these days. Indeed, instant messaging has its own mini-language for common expressions and questions which is becoming part of techno-culture. In a sense, the IM community is formulating a controlled vocabulary itself. This is a natural starting point for a controlled IM bot vocabulary. I believe th[...]
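The "automate the simple conversations and punt to a human when required" idea can be sketched in a few lines of Python. This is a toy under loud assumptions: the controlled vocabulary in CANNED_REPLIES, the function name bot_reply and the wording of the hand-off messages are all hypothetical, and real natural language processing is replaced here by exact matching.

```python
# A tiny controlled vocabulary: normalised user phrases mapped to replies.
# (Hypothetical entries, loosely modelled on the Sean/Bob dialog.)
CANNED_REPLIES = {
    "hi": "Hello",
    "thanks": "You're welcome",
}


def bot_reply(message: str) -> tuple[str, bool]:
    """Return (reply, needs_human) for one incoming chat message."""
    # Normalise case, surrounding whitespace and trailing punctuation.
    normalized = message.strip().lower().rstrip("!.?")
    if normalized == "help":
        # The user explicitly asked for a real person.
        return ("A member of staff is joining the conversation.", True)
    if normalized in CANNED_REPLIES:
        # A simple conversation the bot can handle on its own.
        return (CANNED_REPLIES[normalized], False)
    # Anything outside the controlled vocabulary is punted to a human.
    return ("Sorry, I did not understand that. A member of staff will follow up.", True)


for text in ["Hi", "Is Paul there?", "help", "Thanks!"]:
    reply, needs_human = bot_reply(text)
    print(f"{text!r} -> {reply!r} (needs human: {needs_human})")
```

The needs_human flag is what would let one support person watch over many conversations and step in only where the bot is stuck.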

What is Law? - Part 16

Wed, 30 Aug 2017 12:06:00 +0000

Previously: What is Law? - Part 15. Now we turn to the world of contracts as it is a sub-genre of law that exhibits many of the attributes discussed in earlier blog posts in this series. In addition, it is a topical area as there is significant innovation activity in this area at the moment and the word “disruption” features prominently. There is a sense that the world of contracts is being (or may soon be!) utterly transformed by IT, and terms such as Smart Contracts and Blockchain are being used around the water coolers of law firms and IT firms alike. The excitement around contracts as an IT area is understandable given the volume and importance of contracts in the modern world. Businesses are essentially legal entities that create and enter into contracts. Private individuals cannot get very far in the modern world without entering into contracts either. Everything from filling your car with fuel at a self service fuel pump, to getting married, to getting a mortgage, to buying life insurance is basically contracts, contracts and yet more contracts. Contracts have a long, long history as a paper intensive activity. An activity replete with complex language, expensive and time consuming processes. Many people involved in contracts in these digital days – both producing and consuming them – harbor a niggling feeling that maybe it is all a bit arcane and unnecessarily complex for the digital age. Perhaps, (surely!) there is a better way? A way that ceases to use computers as fast typewriters and starts using them to do smart things with contracts, other than just write them up and print them onto paper. Now along comes the term “smart contract”[1]. Irresistible! Who could possibly want contracts to be anything other than “smart”, right? I too am in that camp as I see all sorts of ways in which contracts can be evolved – and in some cases revolutionized – with digital technology. 
However, to get there, we have to start from a good understanding of what contracts actually are, and how they work, because for all its many flaws and inefficiencies, the world of contracts is the way it is for mostly good reasons. Reasons that tend to get glossed over in the understandable excitement and rush towards digital “smart” contracts. The term “smart contract” is typically taken to mean a self contained legally binding agreement expressed purely in computer code, running on a blockchain so that its existence, contents and its actions are recorded in an immutable, tamper evident record for all time. My primary concern with how the term “smart contract” is often interpreted is the idea that it can be fully self-contained. People and businesses have been entering into contracts for centuries, and for centuries, there have been disagreements and the need to arbitrate disputes over meaning in these contracts. A vast corpus of lore and arbitration machinery has built up over the centuries to handle this. Why is this corpus of lore and arbitration machinery necessary? Because contracts are never self contained. This is because meaning cannot be “boxed” with the contract. As we have seen many times in this series, the crux of this problem of meaning is that it cannot be completely spelled out in words – no matter how many words you are willing to use! It is, in my opinion, literally impossible to remove potential ambiguities when two humans are using a set of symbols/signs/words to capture a shared understanding such as happens all the time in contract drafting. Over this series I have given reasons ranging from linguistics to epistemology and there is no need to repeat those reasons again here. In com[...]

The power of combinatorics, in, well, everything

Mon, 28 Aug 2017 14:50:00 +0000

It was late in the morning (around 5:30 a.m.) by the time Master Foo arrived at the training center. "I am sorry I am late", he said as he sat down. "I had trouble finding Raw Sienna. It was hidden under my meditation box." The students looked at each other askance from behind the screens of their laptops. "Raw Sienna? What is that and what has that got to do with developing 21st Century Web Applications using mashup technologies?." The students had paid good money to attend this training course and had lugged their laptops up Pentimenti Mountain the night before to be here. Not to mention the fact that they had risen from their freezing tent beds at 5 a.m. to suit Master Foo's schedule. "Before we begin looking at the details of mashup application development, I would like to draw you a picture", said Master Foo. From the countless folds in his robes he proceeded to extract a scroll of paper, a small vial of a clear liquid (presumably water), three artist brushes of varying sizes and 6 small tubes of paint. "It will be a landscape. Please pay close attention to the mixing of colors." Over the next twenty minutes, Master Foo created a landscape watercolor painting of the view from the top of Pentimenti mountain. It had a brilliant blue sky created with Cerulean Blue[1] for the lighter parts and Ultramarine[2] for the darker parts. Beneath the sky there were many - perhaps dozens of shades of green used for the trees, bushes and grass. As he worked, Master Foo picked up colors one at a time on his brush and mixed them deftly in small plastic containers. "Master Foo", one of the students asked, "you have used two types of blue and you sourced them directly from individual tubes of paint. Yet, you have used many shades of green but they are all mixed from other colors. Why is that?" "How many different greens can you count in my picture?", asked Master Foo. "I cannot count them exactly, there are many." 
"How many types of green did you see on your hike up Pentimenti Mountain?" "I do not know. A countless number I guess." "Indeed so.", Master Foo replied. "Now tell me, how many types of application do you envisage building on the Web using mashup technologies in your career?" "A countless number!", blurted one of the students over the top of his iBook. "Indeed so.", Master Foo replied, grinning as he again turned his attention to his painting. "Color mixing is a limitless universe of potentiality. Out of these 6 tubes of paint I can make a limitless number of colors given enough time and creativity. By learning how to use each color both on its own, and in combination with the other colors, my color palette is unlimited." "The true key to expressive power - in any medium including computing - is combinatorics.", he continued. To the relief of the still baffled students, he also switched on his laptop and Ubuntu sprang into life. "Now tell me," began Master Foo as he logged in, "what is a mashup really? What is its true nature?" "It is an exercise in combinatorics!", blurted an eager student. "The power of the mashup concept lies in the ability to combine bits of existing website screens into new website screens." "Yes and no", said Master Foo, grinning again. "The true nature of a mashup is indeed combinatoric but not at the level of website screens. A mashup that grabs bits of existing website screens and puts them all on the same screen is just a collection of portlets. A mashup is a deeper integration. It involves grabbing data and grabbing functionality from existing websites to create a brand new website whose functionality is more than the visual sum of its component parts." "If that i[...]

Algorithm - explain thyself!

Fri, 25 Aug 2017 10:41:00 +0000

This is an interesting piece on the opacity of the algorithms that run legal research platforms.

Digital machinery - in general - is more opaque than analog machinery. In years gone by, analog equipment could be understood, debugged and tweaked by people not involved in its original construction: mechanics, plumbers, carpenters, musicians etc. As digital tech has advanced, eating into those analog domains, we appear to be losing some control over the "how" of the things we are building...

The problem, quite ironically, also exists in the world of digital systems. These are regularly redone from scratch when the "how" of the systems is lost, typically when the minds involved in their original construction - the holders of the "how" - cease to be involved in their maintenance.

With Deep Learning, the "how" gets more opaque still because the engineers creating these systems cannot explain the "how" of the decisions of the resultant system. If you take any particular decision made by such a system and look for a "how" it will be an essentially meaningless, extremely long mathematical equation multiplying and adding up lots of individually meaningless numbers.
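To make the "lots of individually meaningless numbers" point concrete, here is a toy sketch in Python of what a decision looks like inside a very small neural network. Everything in it is an assumption for illustration: the function names neuron and tiny_network are hypothetical, and the weights are arbitrary made-up values rather than the result of any training.

```python
import math


def neuron(inputs, weights, bias):
    """Multiply each input by a weight, add everything up, then squash to (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # logistic (sigmoid) function


def tiny_network(inputs):
    """Two hidden neurons feeding one output neuron (arbitrary made-up weights)."""
    hidden = [
        neuron(inputs, [0.8, -1.3], 0.1),
        neuron(inputs, [-0.4, 0.9], -0.2),
    ]
    return neuron(hidden, [1.7, -2.1], 0.05)


# The "decision" is just a number between 0 and 1. The only "how" on offer
# is the chain of multiplications and additions above.
print(tiny_network([1.0, 0.0]))
```

A real deep learning system has the same shape but with millions of such weights, which is why asking any single number "why?" gets no useful answer.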

In part 15 of the What is Law series I have posited that we will deal with the opacity of deep learning systems by inventing yet more digital systems - also with opaque "hows" - for the purposes of producing classic logic explanations for the operation of other systems:-)

I have also suggested in that piece that we cannot, hand on heart, know if our own brains are not doing the same thing. I.e. working backwards from a decision to a line of reasoning that "explains" the decision.

Yes, I do indeed find it an uncomfortable thought. If deductive logic is a sort of "story" we tell ourselves about our own decision making processes then a lot of wonderful things turn out to be standing on dubious foundations.

Would the real copy of the contract, please stand up?

Tue, 08 Aug 2017 16:10:00 +0000

Establishing authenticity of digital materials is a topic I have worked on for a long time now in the context of electronic laws. The UELMA act[1], the best records rule[2], federal rules of evidence[3], the OAIS model[4] etc.

Nearly a decade ago now, I wrote an article for ITWorld called "Would the real, authentic copy of the document please stand up?" [5]

I happened across it again today and re-reading it, I find it all still relevant, but Smart Contracts are bringing a new use case to the fore. The authenticity and tamper-evidence and judicial admissibility of digital laws is - I admit -  a very specialist area.

Contracts on the other hand....well that is a much much bigger area and one that a much larger group of people are interested in.

All the same digital authenticity challenges apply but over the next while I suspect I will be updating my own corpus of language to cater for the new Smart Contracts eco-system.

Old digital authenticity terms like content addressable stores, fixity, idempotent rendering, registrar etc. look like they will all have new lives under new names in the world of Smart Contracts.

Plus ça change...

I am happy to see it happening for a number of reasons but one of them is that the challenges of digital authenticity and preservation of legal materials can only benefit from an injection of fresh interest in the problem from the world of contracts.
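Two of the older terms mentioned above, content addressable stores and fixity, fit in a few lines of Python. This is a minimal sketch, assuming SHA-256 as the hash function; the class ContentAddressableStore and its method names are hypothetical and are not taken from any particular preservation system or smart contract platform.

```python
import hashlib


class ContentAddressableStore:
    """Hypothetical in-memory store where the address of a document is its hash."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        # The address is derived from the content itself, not chosen by anyone.
        address = hashlib.sha256(content).hexdigest()
        self._blobs[address] = content
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]

    def check_fixity(self, address: str) -> bool:
        """Re-hash the stored bytes and confirm they still match the address."""
        return hashlib.sha256(self._blobs[address]).hexdigest() == address


store = ContentAddressableStore()
address = store.put(b"The parties agree to deliver within 30 days.")

# Anyone holding the address can verify they were given the same bytes.
print(store.check_fixity(address))
```

Because the address is a function of the bytes, changing even one character of the contract produces a different address, which is the property that makes tampering evident.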


LWB 360

Thu, 03 Aug 2017 13:43:00 +0000


What is Law? - part 15

Wed, 19 Jul 2017 11:25:00 +0000

Previously: What is Law? - part 14. In part one of this series, a conceptual model of legal reasoning was outlined based on a “black box” that can be asked legal type questions and give back legal type answers/opinions. I mentioned an analogy with the “Chinese Room” used in John Searle's famous Chinese Room thought experiment[1] related to Artificial Intelligence. Simply put, Searle imagines a closed room into which symbols (Chinese language ideographs) written on cards can be inserted via a slot. Similar symbols can also emerge from the room. To a Chinese speaking person outside the room inserting cards and receiving cards back, whatever is inside the room appears to understand Chinese. However, inside the box is simply a mechanism that matches input symbols to output symbols, with no actual understanding of Chinese at all. Searle's argument is that such a room can manifest “intelligence” to a degree, but that it is not understanding what it is doing in the way a Chinese speaker would. For our purposes here, we imagine the symbols entering/leaving the room as being legal questions. We can write a legal question on a card, submit it into the room and get an opinion back. At one end of the automation spectrum, the room could be the legal research department shared by partners in a law firm. Inside the room could be lots of librarians, lawyers, paralegals etc. taking cards, doing the research, and writing the answer/opinion cards to send back out. At the other end of the spectrum, the room could be a fully virtual room that partners interact with via web browsers or chat-bots or interactive voice assistants. Regardless of where we are on that spectrum, the law firm partners will judge the quality of such a room by its outputs. If the results meet expectations, then isn't it a moot point whether or not the innards of the room in some sense “understand” the law? 
Now let us imagine that we are seeing good results come from the room and we wish to probe a little to get to a level of comfort about the good results we are seeing. What would we do to get to a level of comfort? Well, most likely, we would ask the virtual box to explain its results. In other words, we would do exactly what we would do with any person in the same position. If the room can explain its reasoning to our satisfaction, all is good, right? Now this is where things get interesting. Imagine that each legal question submitted to the room generates two outputs rather than one. The first being the answer/opinion in a nutshell (“the parking fine is invalid : 90% confident.”). The second being the explanation (“The reasoning as to why the parking fine is invalid is as follows....”). If the explanation we get is logical i.e. it proceeds from facts through inferences to conclusions, weighing up the pros and cons of each possible line of reasoning....we feel good about the answer/opinion. But how can we know that the explanation given is actually the reasoning that was used in arriving at the answer/opinion? Maybe the innards of the room just picked a conclusion based on its own biases/preferences and then proceeded to back-fill a plausible line of reasoning to defend the answer/opinion it had already arrived at? Now this is where things may get a little uncomfortable. How can we know for sure that a human presenting us with a legal opinion and an explanation to back it up, is not doing exactly the same thing? This is an old, old nugget in jurisprudence, re-cast into today's world of legal tech[...]

Blockchain and Byzantium

Tue, 27 Jun 2017 09:34:00 +0000

Establishing authenticity - "single sources of truth" - is a really important concept in the real world and in the world of computing. From title deeds, to contracts, to laws and currencies, we have evolved ways of establishing single sources of truth over many centuries of trial and error.

Knowingly or not, many of the ways of solving the problem rely on the properties of physical objects: clay tablets (Code of Hammurabi), Bronze Plates (The Twelve Tables of Rome), Goat Skin (Celtic Brehon Laws). Typically, this physicality is mixed in with a bit of trust. Trust in institutions. Trust in tamper evidence. Trust in probabilities.

Taken together: the physical scheme aspect, plus the trust aspect, allows the establishment of consensus. It is consensus, at the end of the day, that makes all this stuff work in the world of human affairs. Simply put, if enough of us behave as though X is the authentic deed/deposition/derogation/dollar then X is, ipso facto, for all practical purposes, the real deal.

In the world of digital data, consensus is really tricky because trust becomes really tricky. Take away the physicality of objects and establishing trust in the truth/authenticity of digital objects is hard.

Some folk say that blockchain is slow and inefficient and they are right - if you are comparing it to today's consensus as to what a "database" is.

Blockchain is the way it is because it is trying to solve the trust problem. A big part of that is what is called Byzantine Consensus. Basically, this is the problem of how to establish consensus when all sorts of things can go wrong, ranging from honest errors to sabotage attempts.
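
One small ingredient of the trust problem, tamper evidence for digital records, can be sketched in a few lines. This is only an illustration, not a consensus protocol: it assumes SHA-256 hashing, and the names (GENESIS, block_hash, build_chain, first_broken_link) are hypothetical. Each record carries the hash of the record before it, so changing an old record breaks every link after it. Agreeing on which chain is the authentic one, when participants may be faulty or malicious, is the Byzantine part, and this sketch does not attempt it.

```python
import hashlib
import json

# Hypothetical starting value for the first record's "previous hash".
GENESIS = "0" * 64


def block_hash(prev_hash: str, payload: str) -> str:
    """Hash a record together with the hash of the record before it."""
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Link each payload to its predecessor via the predecessor's hash."""
    chain = []
    prev = GENESIS
    for payload in payloads:
        digest = block_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain


def first_broken_link(chain):
    """Return the index of the first record that fails verification, or None."""
    prev = GENESIS
    for index, block in enumerate(chain):
        expected = block_hash(block["prev"], block["payload"])
        if block["prev"] != prev or block["hash"] != expected:
            return index
        prev = block["hash"]
    return None


if __name__ == "__main__":
    chain = build_chain(["deed for plot 1", "deed for plot 2", "deed for plot 3"])
    print(first_broken_link(chain))            # untouched chain verifies
    chain[1]["payload"] = "deed for plot 99"   # tamper with an old record
    print(first_broken_link(chain))            # tampering is detected
```

The recomputation on every check is part of why this kind of structure looks wasteful next to an ordinary database: the cost buys evidence of tampering, not throughput.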

The problem is hard and also very interesting and important in my opinion. Unfortunately today, many folks see the word "database" associated with blockchain and all they see is the incredible inefficiency and cost per "transaction" compared to, say, a relational database with ACID properties.

Yes, blockchain is a truly dreadful "database" - if your metric for evaluation is the same as the use cases for relational databases.

Blockchain is not designed to be one of those. Blockchain is the way it is because Byzantine consensus is hard. Is it perfect? Of course not, but a proper evaluation of it requires looking at the problems it is trying to solve. Doing so requires getting past common associations most people carry around in their heads about what a "database" is and how it should behave/perform.

Given the unfortunate fact that the word "database" has become somewhat synonymous with the term "relational database", I find it amusing that Blockchain has itself become a Byzantine consensus problem. Namely, establishing consensus about what words like "database" and "transaction" and "trust" really mean.

What is Law? - part 14

Wed, 14 Jun 2017 12:36:00 +0000

Previously: What is Law? - part 12a. Mention has been made earlier in this series of the presence of ambiguity in the corpus of law and the profound implications that the presence of ambiguity has on how we need to conceptualize computational law, in my opinion. In this post, I would like to expand a little on the sources of ambiguity in law. Starting with the linguistic aspects but then moving into law as a process and an activity that plays out over time, as opposed to being a static knowledge object. In my opinion, ambiguity is intrinsic in any linguistic formalism that is expressive enough to model the complexity of the real world. Since law is attempting to model the complexity of the real world, the ambiguity present in the model is necessary and intrinsic in my opinion. The linguistic nature of law is not something that can be pre-processed away with NLP tools, to yield a mathematically-based corpus of facts and associated inference rules. An illustrative example of this can be found in the simple sounding concept of legal definitions. In language, definitions are often hermeneutic circles[1] which are formed whenever we define a word/phrase in terms of other words/phrases. These are themselves defined in terms of yet more words/phrases, in a way that creates definitional loops. For example, imagine a word A that is defined in terms of words B and C. We then proceed to define both B and C to try to bottom out the definition of A. However, umpteen levels of further definition later, we create a definition which itself depends on A – the very thing we are trying to define - thus creating a definitional loop. These definitional loops are known as hermeneutic circles[1]. Traditional computer science computational methods hate hermeneutic circles. A large part of computing consists of creating a model of data that "bottoms out" to simple data types. I.e. we take the concept of customer and boil it down into a set of strings, dates and numbers.
We do not define a customer in terms of some other high level concept such as Person which might, in turn, be defined as a type of customer. To make a model that classical computer science can work on, we need a model that "bottoms out" and is not self-referential in the way hermeneutic circles are. Another way to think about the definition problem is in terms of Saussure's linguistics[2] in which language (or more generically "signs") get their meaning because of how they differ from other signs - not because they "bottom out" into simpler concepts. Yet another way to think about the definition problem is in terms of what is known as the descriptivist theory of names[3] in which nouns can be thought of as just arbitrary short codes for potentially open-ended sets of things which are defined by their descriptions. I.e. a "customer" could be defined as the set of all objects that (a) buy products from us, (b) have addresses we can send invoices to, (c) have given us their VAT number. The same hermeneutic circle/Saussurean issue arises here however as we try to take the elements of this description and bottom out the nouns they depend on (e.g., in the above example, "products", "addresses", "invoices" etc.). For extra fun, we can construct a definition that is inherently paradoxical and sit back as our brains melt out of our ears trying to complete a workable definition. Here is a famous example: [...]
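
Returning to the definitional loops described earlier in this post (A defined in terms of B and C, which eventually lead back to A), here is a minimal sketch of how a program might notice that a set of definitions does not "bottom out". It assumes definitions are held as a plain dictionary mapping each term to the terms its definition uses; the function name find_definition_loop and the sample terms are hypothetical. Any term with no entry in the dictionary is treated as a primitive (a string, date or number, say).

```python
def find_definition_loop(definitions, start):
    """Return the terms forming a loop reachable from `start`, or None.

    `definitions` maps a term to the list of terms its definition uses.
    Terms that have no entry are treated as primitives that bottom out.
    """
    path = []        # terms on the current chain of definitions, in order
    on_path = set()  # the same terms, for fast membership tests
    done = set()     # terms already shown to bottom out

    def visit(term):
        if term in on_path:
            # We have come back to a term we are still trying to define.
            return path[path.index(term):] + [term]
        if term in done or term not in definitions:
            return None
        path.append(term)
        on_path.add(term)
        for used in definitions[term]:
            loop = visit(used)
            if loop is not None:
                return loop
        path.pop()
        on_path.discard(term)
        done.add(term)
        return None

    return visit(start)


if __name__ == "__main__":
    circular = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["A"]}
    print(find_definition_loop(circular, "A"))

    bottoms_out = {"customer": ["string", "date", "number"]}
    print(find_definition_loop(bottoms_out, "customer"))
```

Detecting the loop is the easy part. The point made above still stands: in natural language the loops are not defects to be removed, they are how the definitions work.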

What is law - part 12a

Wed, 07 Jun 2017 10:06:00 +0000

Previously: what is law part 12. Perhaps the biggest form of push-back I get from fellow IT people with respect to the world of law relates to the appealing-but-incorrect notion that in the text of the law, there lies a data model and a set of procedural rules for operating on that data model, hidden inside the language. The only thing stopping us computerizing the law, according to this line of reasoning, is that we just need to get past all the historical baggage of foggy language and extract out the procedural rules (if-this-then-that) and the data model (definition of a motor controlled vehicle, definition of 'theft', etc.). All we need to do is leverage all our computer science knowledge with respect to programming languages and data modelling, combine it with some NLP (natural language processing) so that we can map the legacy linguistic form of law into our shiny new digital model of law. In previous parts in this series I have presented a variety of technical arguments as to why this is not correct in my opinion. Here I would like to add some more but this time from a more sociological perspective. The whole point of law, at the end of the day, is to allow society to regulate its own behavior, for the greater good of that society. Humans are not made from diamonds cut at right angles. Neither are the societal structures we make for ourselves, the cities we build, the political systems we create etc. The world and the societal structures we have created on top of it are messy, complex and ineffable. Should we be surprised that the world of law which attempts to model this, is itself, messy, complex and ineffable? We could all live in cities where all the houses are the same and all the roads are the same and everything is at right angles and fully logical. We could speak perfectly structured languages where all sentences obey a simple set of structural rules. We could all eat the same stuff. Wear the same clothes. Believe in the same stuff...but we do not.
We choose not to. We like messy and complex. It suits us. It reflects us. In any form of digital model, we are seeking the ability to model the important stuff. We need to simplify - that is the purpose of a model after all - but we need to preserve the essence of the thing modeled. In my opinion, a lot of the messy stuff in law is there because law tries to model a messy world. Without the messy stuff, I don't see how a digital model of law can preserve the essence of what law actually is. The only outcome I can imagine from such an endeavor (in the classic formulation of data model + human readable rules) is a model that fails to model the real world. In my opinion, this is exactly what happened in the Eighties when people got excited about how Expert Systems[1] could be applied to law. In a nutshell, it was discovered that the modelling activity lost so much of the essence of law, that the resultant digital systems were quite limited in practice. Today, as interest in Artificial Intelligence grows again, I see evidence that the lessons learned back in the Eighties are not being taken into account. Today we have XML and Cloud Computing and better NLP algorithms and these, so the story goes, will fix the problems we had in the Eighties. I do not believe this is the case. What we do have today, that did not exist in the Eighties, is much much better algorithms for trai[...]

The Great Inversion in Computing

Wed, 31 May 2017 10:32:00 +0000

Methinks we may be witnessing a complete inversion in the computing paradigm that has dominated the world since the Sixties.

In 1968, with Algol68[1] we started treating algorithms as forms of language. Chomsky's famous hierarchy of languages[2] found a huge new audience outside of pure linguistics.

In 1970, relational algebra came along[3] and we started treating data structures as mathematical objects with formal properties and theorems and proofs etc. Set theory/operator theory found a huge new audience outside of pure mathematics.

In 1976, Niklaus Wirth published "Algorithms + Data Structures = Programs"[4] crisply asserting that programming is a combination of algorithms and data structures.

The most dominant paradigm since the Sixties maps algorithms to linguistics (Python, Java etc.) and data structures to relational algebra (relational  databases, third normal form etc.).

Today's Deep Learning/AI etc. seems to me to be inverting this mapping. Algorithms are becoming mathematics and data is becoming linguistic e.g. "unstructured" text/documents/images/video etc.

Perhaps we are seeing a move towards "Algorithms (mathematics) + data structures (language) = Programs" and away from "Algorithms (language) + data structures (mathematics) = Programs".