Language: English
Victor de Boer's Blog on Stuff, Life and Things

Updated: 2015-09-16T17:49:53.241+02:00


So I kinda stopped posting here a long time ago, but I am contributing to a Dutch group blog: geencommentaar, so everybody head over there!

Most Dangerous idea


Every year, The Edge asks a number of top scientists and other thinkers a single question. This year the question was: "WHAT IS YOUR DANGEROUS IDEA?"

Some very cool people contributed (Rodney Brooks, Daniel Dennett, Freeman Dyson, Richard Dawkins, V.S. Ramachandran, Robert Shapiro, to name a few). One of my favorites is Daniel Dennett's dangerous idea: There aren't enough minds to house the population explosion of memes. I love memes...

Sint Game


Our first annual Sinterklaas game is coming up, and these are the rules:
Each one of us buys 2 or more pakjes (presents) with a combined value of 5 Euros. These pakjes are put on a big pile. Then everyone gathers around in a circle and, on their turn, throws two regular dice:

2 - Give one pakje away
3 - Put one pakje back on the pile
4 - Choose someone to take a pakje from the pile or from someone else
5 - Everyone gives their pakjes to their left neighbour
6 - Choose someone with a pakje to unwrap a pakje
7 - Nothing happens
8 - Take one pakje from the pile
9 - Unwrap one pakje of your own
10 - Swap a pakje with someone or the pile (if you have nothing, just take one)
11 - Take two pakjes from the pile or someone
12 - Take all pakjes from one person

Continue throwing the dice until all pakjes have been distributed. After that, three more rounds are played. Then everyone keeps their pakjes and unwraps any unopened ones.

Blog du Olivier + Bullettime


A journalist friend of mine started a blog, so I thought I would help him skyrocket his Google ranking by adding a link from my little blog. Here you go:

Olivier van Beemen's blog

Ok, while I'm typing away: I made a Matrix-esque 'bullet-time' video at a friend's party. I guess it is pretty much an in-crowd thing, but why hide my art from the world:

Biktie Bullettime (.wmv, not too big)

New Grouplog


Mattijs (the one from the tijsepijs log) came up with a great idea today. We should create a group blog for anyone at the HCS Lab.
Creating blogs being so easy, we did just that and now we have the group blog: Human-Computer Blog (for current lack of a better name). In the following weeks we will have to see if this will be as big a success as the Biktorrr blog :)

SSSW 2005


By the way, I attended the Semantic Web Summer School 2005 in Cercedilla, Spain. It was really cool and I met a lot of nice people working on similar topics. And it is really nice to hear the SW stuff from the 'Awesome Superstars of the Semantic Web' themselves.

Most of the students are now members of the sssw05 yahoo group, and a lot of people put their pictures online. I stole a couple of them and put them up on my own webspace.

One of the cool things about the Summer School was that we got to do a lot of practical stuff. This was either in the form of the hands-on sessions or a mini-project. I was in a mini-project with three other students (Rinke, Tom and Jan). Tom came up with a cool idea to envision a semantic framework for lonely hearts ads. This actually landed us the third prize! (An alarm clock)

Google API


Ok, so it has been over three months since my last post, but hey, who said I would post regularly?

I am actually making this new post for two reasons. First, a Greek Ph.D. student read my blog post on Ontology Learning and sent me a link to his homepage. It seems he is working on this kind of stuff. That encouraged me to make a new blog post. The second reason is that I did some experiments with the Google API: content!

So I read this paper by Rudi Cilibrasi and Paul Vitanyi on the Normalized Google Distance (NGD), which deals with using Google to extract meaning from the web by exploiting the redundancy of knowledge. It is a really nice paper, very intuitive and firmly grounded in complexity theory as well. In short: the NGD between two concepts is determined by taking the number of hits each concept (term) has on a Google query (NrHits1 and NrHits2) and comparing this to the number of hits for the Google query that combines the two concepts with a boolean AND (NrHits1+2). This is normalized by the total number of Google-indexed pages (M). The complete formula is:

NGD = (MAX(LOG(NrHits1), LOG(NrHits2)) - LOG(NrHits1+2)) / (LOG(M) - MIN(LOG(NrHits1), LOG(NrHits2)))

Anyway, I decided to play around a little bit with this Google distance in my chosen domain: Artists and Art styles. I ran a couple of tests. One consisted of calculating the NGD between an Art Style concept ('Impressionism') and Artist Names ('Vincent van Gogh', 'Manet', ...). I found these results (the table shows the NGD and the name of the artist):

0.171111166168 Monet, Claude
0.41242976086 Hassam, Childe
0.42292288857 Frieseke, Frederick Carl
0.425858890051 Gogh, Vincent van
0.438108128715 Pissarro, Camille
0.456466856759 Morisot, Berthe
0.479547692307 Caravaggio, Michelangelo Merisi da
0.488549764746 Nolde, Emil
0.488922434217 Manet, Edouard
0.496153921128 Rembrandt Harmensz. van Rijn
0.497940393553 Degas, Edgar
0.505376793035 Warhol, Andy
0.519882855212 Goya y Lucientes, Francisco Jose de
0.525532648974 Picasso, Pablo
0.55359075495 Munch, Edvard
0.565804822367 Dali, Salvador

Especially the big difference between number 1 and 2 is pretty weird. Anyway, I decided to check the actual number of hits the Google API returned to me. This is what I found (the table shows the name of the artist, the number of hits according to the Google API, the number of hits according to a manual search on the Google web page, and the ratio of the web page count to the API count):

Artist                                 API      Google Web Page   Web Page / API
Monet, Claude                          67200    368000            5.476190476
Gogh, Vincent van                      7520     40200             5.345744681
Manet, Edouard                         34500    184000            5.333333333
Warhol, Andy                           44500    237000            5.325842697
Goya y Lucientes, Francisco Jose de    318      681               2.141509434
Degas, Edgar                           44400    237000            5.337837838

As you can see, the Google API count is normally lower than the web page count by a factor of about 5.3 to 5.5, but sometimes (in Goya's case) by a completely different factor. Either the Google API is wrong or the normal web-based search estimate of the number of hits is off.

The Google API newsgroup actually reported the same problem with the number of hits the Google API returns. Apparently Google is aware of this problem but hasn't really fixed it. And since automatic invocation of Google web pages through HTTP modules is not in compliance with Google's Terms of Use, I wonder if there is a legal way to obtain good values to use in calculating NGD values. (I hope Cilibrasi and Vitanyi didn't use the values the API gave them in their experiments)[...]
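For anyone who wants to play with the formula, here is a minimal Python sketch of the NGD calculation, plus the web page / API ratio check on the hit counts quoted above. The function and variable names are my own, and the numbers passed to `ngd` are purely illustrative (they are not from the experiments in this post, which do not list the combined hit counts or the value of M). The base of the logarithm does not matter, since it cancels out in the division.

```python
import math


def ngd(hits_x, hits_y, hits_xy, total_pages):
    """Normalized Google Distance from raw hit counts (all counts must be > 0).

    hits_x, hits_y : hits for each term on its own (NrHits1, NrHits2)
    hits_xy        : hits for the query containing both terms (NrHits1+2)
    total_pages    : assumed number of indexed pages (M)
    """
    log_x, log_y = math.log(hits_x), math.log(hits_y)
    numerator = max(log_x, log_y) - math.log(hits_xy)
    denominator = math.log(total_pages) - min(log_x, log_y)
    return numerator / denominator


# Illustrative hit counts only, not measurements from this post.
example = ngd(46_700_000, 12_200_000, 2_630_000, 8_058_044_651)

# Hit counts quoted in the post: artist -> (API count, web page count).
HITS = {
    "Monet, Claude": (67200, 368000),
    "Gogh, Vincent van": (7520, 40200),
    "Manet, Edouard": (34500, 184000),
    "Warhol, Andy": (44500, 237000),
    "Goya y Lucientes, Francisco Jose de": (318, 681),
    "Degas, Edgar": (44400, 237000),
}

# How many times larger the web page count is than the API count.
ratios = {artist: web / api for artist, (api, web) in HITS.items()}

if __name__ == "__main__":
    print(f"example NGD: {example:.3f}")
    for artist, ratio in ratios.items():
        print(f"{artist}: {ratio:.3f}")
```

Two terms that always occur together (all three hit counts equal) get a distance of 0. Because the API counts are not off by one constant factor, they shift both the numerator and the denominator in ways that do not cancel, so the resulting NGD values change as well.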

Films I have seen and you should too


(A highly subjective, yet alphabetical, list)

2001: A Space Odyssey
Aguirre, der Zorn Gottes
Akira
Alien
Aliens
Apocalypse Now
Barbarella
Ben-Hur
Blade Runner
Blow Out
Blow Up
Blue Velvet
Brazil
Breaking the Waves
Casablanca
Citizen Kane
Clerks
Close Encounters of the Third Kind
Das Boot
Delicatessen
Dogville
Donnie Darko
Dr. Strangelove
Epidemic
E.T.
Eternal Sunshine of the Spotless Mind
Europa
eXistenZ
Fargo
Faster, Pussycat! Kill! Kill!
Festen
Fitzcarraldo
Full Metal Jacket
Ghost in the Shell 1 + 2
Himmel über Berlin
Iron Monkey
Jaws
Jules et Jim
Lost in Translation
M
Magnolia
Manhattan
Metropolis
Monty Python and the Holy Grail
Mulholland Drive
Nosferatu
On the Waterfront
One Flew Over the Cuckoo's Nest
Paris, Texas
Pi
Raging Bull
Raiders of the Lost Ark
Ran
Rashomon
Rear Window
Riget I+II
Salò: 120 Days of Sodom
Seven Samurai
Seventh Seal
Spirited Away
Star Wars
Stranger Than Paradise
Taxi Driver
The Big Lebowski
The Day the Earth Stood Still
The Deer Hunter
The Godfather 1 + 2
The Good, the Bad and the Ugly
The Graduate
The Shining
The Silence of the Lambs
The Wizard of Oz
Tonari no Totoro
Twelve Monkeys

OLP-AIO's Workshop


Argh! I had to re-type this whole post since blogger crashed on me! Well here we go again...

Yesterday we organised a workshop at the HCS Lab for the Ontology Learning and Population AIO's (Ph.D. students) in the Netherlands. We ended up with seven AIO's and two M.Sc. students. Four talks focussed on actual Ontology Learning and Population, and three talks described the periphery of the field I'm working in.

All in all I think it went really well. I guess the main reason for this was the relatively small group of participants and the length of the talks (15-20 mins). This ensured that we could dive into the subjects we are interested in immediately and did not have to waste our time with introduction slides (layercakes and OWL and such :).

Things we concluded:
- Ontology Learning and Population is a useful thing to try to do
- Generic is a new buzzword
- Coffee helps
- Evaluation still is a problem and we should stress this everywhere we can.
- Cool names for the group are: Ontology Club, OLP-AIO's, Generic something something...
- We should do this again

Ps. I've added another Fellow Blogger to my list. Welcome to the Blogosphere, Bikkie

How people look at Google result pages


A nice bit of popular Human-Computer interaction: This page has a picture visualising how people look at the result pages of Google.


HCS Snowman


This Wednesday, after the 'laboratoriumvergadering' (lab meeting), we decided to build an HCS snowman. I didn't want to hold back the pictures I took any longer, so here they are:

The HCS Snowman

Snowman, Sophia and Mattijs

Snowman and Simon

Short Animation Oscar Winner


Last Sunday the 77th Academy Awards were handed out. Of course, the awards are pretty meaningless and don't tell you much about which films you forgot to see this year. There are only a handful of categories that really matter, and the one that interests me most is best animated short film. One nice thing about this year's winner is that you can view it on the web.
I watched it just now and it is sheer genius.

So please, do yourself a favor and watch 'Ryan' by Chris Landreth.


Framework for using ontological information for OLP


I was just thinking about a possible next step in my Ph.D. research and I wanted to write it down somewhere. So I thought, why not on my weblog. I don't yet know if it makes sense at all or if it has been done before, but just look at it as a stream-of-consciousness type of post.

So, I am trying to do all kinds of smart stuff to extract instances for given ontologies. Up until now I have not actually been using any information from the ontology itself other than the descriptors (and synonyms). I have always suggested that one day the information that is actually in the ontology should be used in extracting new instances for that ontology (population) or even to expand the ontology (enrichment).
A framework or methodology to do this would be nice :) Now of course I would like this framework to be as generic as possible. So my idea was to start with guidelines for extraction using ontological information stated in OWL.

For instance:
Suppose we know that two classes are disjoint (like Birds and Mammals). Then once we extract an instance for one class (say 'parrot' as an instance of Birds), we know it can't also be an instance of Mammals. This is a pretty obvious example, but it could be of use for the Ontology Population problems I try to deal with now: determining which painters are considered Impressionist or Expressionist.
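Just to make the idea concrete, here is a rough Python sketch of how such a disjointness check could filter extracted candidates. Everything in it is hypothetical: the names (`DISJOINT`, `add_candidate`) are made up for illustration, the disjointness axioms are hard-coded rather than read from an OWL file, and it simply lets the first accepted class win, which is only one of many possible policies.

```python
# Pairs of classes declared disjoint in the (hypothetical) ontology.
# In OWL this information would come from owl:disjointWith axioms.
DISJOINT = {
    frozenset({"Birds", "Mammals"}),
}


def conflicting_classes(known, instance, cls, disjoint_pairs=DISJOINT):
    """Return the classes already assigned to `instance` that are disjoint with `cls`."""
    return [
        existing
        for existing in sorted(known.get(instance, set()))
        if frozenset({existing, cls}) in disjoint_pairs
    ]


def add_candidate(known, instance, cls, disjoint_pairs=DISJOINT):
    """Add an extracted (instance, class) pair unless it violates disjointness.

    Returns True if the candidate was accepted, False if it was rejected.
    """
    if conflicting_classes(known, instance, cls, disjoint_pairs):
        return False
    known.setdefault(instance, set()).add(cls)
    return True


if __name__ == "__main__":
    population = {}
    print(add_candidate(population, "parrot", "Birds"))    # accepted
    print(add_candidate(population, "parrot", "Mammals"))  # rejected: disjoint with Birds
    print(population)
```

A real extractor would probably attach a confidence to each candidate and keep the stronger one instead of the first one, but the lookup against the disjointness axioms would stay the same.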

The same thing could be done for OWL constructs such as 'sameAs' and more elaborate class descriptions with restrictions and such. If anyone has any ideas off the top of their heads about this I would like to hear about it.

Anyway, I will discuss it with my supervisors but first I will have to wrap up and write down what I have been doing up 'till now.

SIKS course


So as I said, I went to a SIKS course on Thursday and Friday.
SIKS is my research school and one thing it does is organise courses on topics relevant to its students.
This one was on 'computational intelligence', a title that didn't reveal too much. It turned out to be mainly about (Fuzzy) Machine Learning and Data Mining. This course was an 'advanced' one and it proved to be. A number of speakers presented their research. I found it to be the most interesting SIKS course I have had to date.
I include two pictures here.

One of the speakers.

The venue had an actual 'koffie concept'.

Notable quotes:
"Reality is here"
"Let's go fuzzy"
"We could do as if"



I haven't been blogging for a couple of days because of two things:
First of all I went to a SIKS course (see also my next post) and second, I was sick all weekend, Monday and Tuesday.

Although there seems to be a flu epidemic around, it took me so little time to recover that it was probably just a common cold. Anyway, I got sick of sitting at home, so I went to the Watergraafsmeer to check my mail and do some work.

And some blogging of course.

Swi-Prolog and Google


I was wondering what percentage of Prolog users out there use SWI-Prolog as their implementation (I know I do!). I ran Google queries on the different implementations and compared the number of page hits. This gives at least an indication of the percentage of SWI-Prolog users:

Implementation       Hits      Share (%)
SWI-Prolog           97,200    47.4
SICStus Prolog       40,300    19.7
GNU Prolog           19,300    9.4
Strawberry Prolog    18,000    8.8
XSB + Prolog         13,800    6.7
Amzi! Prolog         5,390     2.6
B-Prolog             3,810     1.9
Trinc Prolog         2,500     1.2
Open Prolog          1,570     0.8
TuProlog             1,040     0.5
LPA Prolog           818       0.4
YAP Prolog           709       0.3
hProlog              190       0.1
ilProlog             155       0.1
NanoProlog           47        <0.1
CxProlog             27        <0.1

So, 47.4%: that's not too bad, Jan!
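For the record, the percentages are just each implementation's hit count divided by the total over all sixteen queries. A small Python sketch of that arithmetic, using the hit counts from the list above (the variable names are mine, and treating page hits as a stand-in for users is of course a rough assumption):

```python
# Google hit counts per Prolog implementation, as listed in the post.
HITS = {
    "SWI-Prolog": 97200,
    "SICStus Prolog": 40300,
    "GNU Prolog": 19300,
    "Strawberry Prolog": 18000,
    "XSB + Prolog": 13800,
    "Amzi! Prolog": 5390,
    "B-Prolog": 3810,
    "Trinc Prolog": 2500,
    "Open Prolog": 1570,
    "TuProlog": 1040,
    "LPA Prolog": 818,
    "YAP Prolog": 709,
    "hProlog": 190,
    "ilProlog": 155,
    "NanoProlog": 47,
    "CxProlog": 27,
}

total_hits = sum(HITS.values())

# Share of all hits per implementation, as a percentage with one decimal.
shares = {name: round(100 * hits / total_hits, 1) for name, hits in HITS.items()}

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {share}")
```

With one-decimal rounding, the two smallest entries come out as 0.0.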



I got this neat Bluetooth dongle thingy for Christmas this year, so now I can download cool hi-res pictures from my even cooler T630 phone and start MOBLOGGING (don't you just love the language virtuosos that inhabit the Web).

So here's my first attempt at moblogging:

I went to Zeeland recently and took this picture; I think it looks windy. In the front you can see the AutoVanDeMoederVanDeBiktie (the car of Biktie's mother), the coolest car on the planet.

Grave of the Fireflies


I saw 'Grave of the Fireflies' ('Hotaru no Haka') last night. It has been on my wish list for quite some time now, as one of the few Studio Ghibli films I hadn't seen yet. Studio Ghibli is the Anime studio that produced Miyazaki classics like Mononoke Hime, Spirited Away and one of my favorite movies of all time: My Neighbour Totoro.

Grave of the Fireflies is a bit different, though. It must be the saddest film I know, without being melodramatic. It's about how, after the Kobe firebombing, a boy and his little sister try to make ends meet. After they have lost their mother in the bombing, they live at a distant relative's house, but they aren't welcome there for long. They move to an abandoned bomb shelter and try to stay alive there.

Man, it's sad... Please go see it if you can!

Page on Hotaru no Haka

Ontology Learning and Population for the SW


In this post I will try to sum up what I believe my research is about. Let's see if I can get the message across in a couple of lines.
The one-sentence-description is something like:
Ontology Learning and Population for the Semantic Web using Heterogeneous Sources on the Web

Semantic Web
The Semantic Web is both a vision and a project to boost the World Wide Web to the next level. In the Semantic Web, information is represented in a knowledge-rich way and, more importantly, that knowledge can be read and processed by computer programs (it is machine-readable). This is in contrast to the mostly human-only-readable format it is in now (i.e. natural language).

The backbone of the Semantic Web is formed by ontologies, formal representations of the knowledge from a certain domain. Ontologies are actually taxonomies with relations between concepts ("horse" is_a "animal", "animal" is_a "thing", "animal" eat "food", etc.). The ontologies for the Semantic Web provide us with formal and unambiguous representations of certain knowledge.
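To give a feel for what "machine readable" buys you, here is a toy Python sketch that stores the example relations as (subject, relation, object) triples and lets a program follow the is_a links. This is only an illustration with made-up function names; real Semantic Web ontologies are written in languages like RDF and OWL, not in Python lists.

```python
# The example relations from the text, as (subject, relation, object) triples.
TRIPLES = [
    ("horse", "is_a", "animal"),
    ("animal", "is_a", "thing"),
    ("animal", "eat", "food"),
]


def superclasses(concept, triples=TRIPLES):
    """Follow is_a links upwards and return every more general concept found."""
    found = []
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in triples:
            if subject == current and relation == "is_a" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


def holds(subject, relation, obj, triples=TRIPLES):
    """Check a relation, letting a concept inherit relations from its superclasses."""
    candidates = [subject] + superclasses(subject, triples)
    return any((candidate, relation, obj) in triples for candidate in candidates)


if __name__ == "__main__":
    print(superclasses("horse"))
    print(holds("horse", "eat", "food"))
    print(holds("thing", "eat", "food"))
```

Nobody ever stated that a horse eats food, but because a horse is_a animal, a program can work that out from the taxonomy.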

Learning Ontologies
However, there aren't a lot of these ontologies around, and constructing them by hand is time-consuming and error-prone. So, what we are trying to do is to construct these formal, unambiguous ontologies automatically from the available knowledge hidden in the informal, highly ambiguous pages of the World Wide Web. Extracting the concepts, the hierarchical structure and the inter-conceptual links is called ontology learning, and finding the actual instances of those concepts is called ontology population.

All this, people, is not easy.

I am currently working on a document detailing the exact task of Ontology Learning and population and the different subtasks. I will put that or an abstract on this Blog as well.

I understand that this post doesn't fully explain anything, so here are a couple of links to WWW pages that might answer your questions:
Wikipedia on the Semantic Web
Wikipedia on ontologies
Ontology Learning and Population Workshop at ECAI 2004
Only book I know of on OL&P



Ah, I've sinned against my first post already and didn't manage to post here yesterday. I am still thinking about what this blog should be about. Should it be a life-log or should I just put my research things up? A third option is to not make this decision yet. That sounds more like me! I'll take it!

Anyway, I have added a link to the new blog of my dear friend Merlijn, who tomorrow will undergo a pretty cool operation that should restore full capabilities in his knee, hence the name knielog. They will take a piece of his hamstring to replace his kneestring.

Hope all goes well and I wish you a speedy recovery!

Trackback added


So I installed a trackback service (from Haloscan). I'm not quite sure what the benefit is exactly, but as I understand it, a link is created between a log and the log it links to. I did this because of Anjo's talk, in which he mentioned that the trackback feature allows him to research the interlinkedness of posts and blogs.
So, I tried to trackback to this article on Anjo's page.



So let's see how this Blogging thing works.
This being my first post, I guess it would be a good idea to link to some other bloggers.
First of all, I think the final thing that made me try out this blogging was a talk about Weblog research from Anjo Anjewierden at the HCS Lab (where I work and do my Ph.D.). You should visit his blog to find out more cool stuff about blog research.
Another interesting blog is from a friend who went to Palestine to research the how and why of the Palestinian resistance. For more information I suggest you check out Martijn Dekker's Blog.

Update: Ok, I am fairly happy now with the way everything looks and works. To commit myself to this blogging business I'll try to post at least one post per day on weekdays and we'll see where it goes from there.

Update 2: I have added a new fellow blogger to my list: Mattijs Ghijssen is another AIO at the HCS Lab. He also (re)started blogging thanks to Anjo.