Yes, it's been a very long time since the last post here on the ZoomClouds blog, and I think, like the title says, it's about time I posted an update on the project.
While there hasn't been any major development on ZoomClouds in the last several months, we have fixed a few bugs here and there, and, more importantly, last week we moved the ZoomClouds database to a dedicated server. Until now, ZoomClouds had been sharing the database server with other projects - each of them using its own database, of course, but all running on the same server.
However, ZoomClouds and the other projects were starting to slow each other down to the point that I figured it was about time ZoomClouds got its own database server. And after a brief downtime of about an hour, all data was safely moved to the new server.
With this move, first I want to make clear that ZoomClouds is far from abandoned. Quite the opposite. Having a dedicated database server is above all a commitment: we plan to keep ZoomClouds running for many years to come. This move should also have sped things up a bit, even though the data in ZoomClouds keeps growing and growing every time a cloud is reanalyzed.
Now let me tell you what we're planning to do in the next few months.
First, complete and absolute internationalization. Someone has argued that ZoomClouds is good at extracting English terms and nothing else. That isn't quite true, because the two word-analysis tools used by ZoomClouds don't discriminate words based on language. It is true, though, that our content analysis tool learns more as it processes more data, and since most of the content that ZoomClouds analyzes is in English, it may show that it knows a bit better how to handle English terms.
The "complete and absolute internationalization" project refers to enabling ZoomClouds to use and understand non-Latin languages, or in other words, to make it UTF-8 compliant. Today it is not, and so for example, clouds from a CJK language (Chinese, Japanese and Korean) look like crap.
The second improvement I plan to make within the next few months is a better caching system, so that clouds don't need to be recalculated every time they're drawn. The current caching system is not terrible, but it can be improved a lot, and that's what I plan to do.
The third improvement - but only after these other two are complete - is to offer a private-label, ad-free version of ZoomClouds, so that you can display your tag cloud on your blog or website and, when people click on a tag, they're presented a page without ads (at least without our ads) and branded to resemble your own site, so your visitors don't get the feeling that they've left it. The ad-free service will not be free, but the private-label service will.
Anyway, that's it for now. I apologize for not having kept this blog a bit more up-to-date. As these new developments are made, I'll be posting them here.
We said stats would be coming. Well, they've come!
Now, when you're in your ZoomClouds account and click on any of your clouds, along with the usual menu options Design, Edit and Filters, you'll see a new option, Stats.
There you can see how many clicks were made on the tags of your clouds. You can see clicks by the hour, by day, week, month or year, and for each of them, you can get a small report to see what tags were clicked the most, as well as from what countries those clicks were made.
We've been noticing that as clouds were built, there were sometimes terms that, despite being valid words that would normally be taken into account, should actually have been ignored due to the context around them - yet ZoomClouds was still considering them "important".
One of the most obvious examples is the text "Technorati tags: ...", so we'll use it as an example of what's been going on so far and how we've fixed it.
The text "Technorati tags: ..." often appears at the end of many posts. When ZoomClouds found the word "Technorati", it assumed it was a relevant term and included it in the cloud. And since in these cases the term appears in each and every post of the blog, the cloud ended up with Technorati as one of its most relevant terms, if not the single most relevant one. Yet chances are the blogger never even mentioned the word Technorati in his/her posts.
This forced some people to add "Technorati" as an unwanted tag, and while that somewhat fixed the problem, it meant that should this person ever write something about Technorati, it would never appear in the cloud - so it certainly wasn't a perfect solution.
Now we've added a small behind-the-scenes context filter to ZoomClouds, so that when certain terms are found within certain irrelevant contexts, those terms are not considered when building the cloud. That's the case with the "Technorati tags" example, as well as a few others.
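To illustrate the idea, a context filter of this sort can be as simple as stripping known boilerplate spans from a post before any terms are counted. This is only a sketch - the pattern list and function below are hypothetical, not ZoomClouds' actual code:

```python
import re

# Hypothetical patterns marking "irrelevant contexts": anything matched
# inside one of these spans is removed before the term counter runs.
BOILERPLATE_PATTERNS = [
    re.compile(r"technorati\s+tags?:.*$", re.IGNORECASE | re.MULTILINE),
    re.compile(r"filed\s+under:.*$", re.IGNORECASE | re.MULTILINE),
]

def strip_irrelevant_contexts(post_text: str) -> str:
    """Remove boilerplate spans so their words never reach the term counter."""
    for pattern in BOILERPLATE_PATTERNS:
        post_text = pattern.sub("", post_text)
    return post_text

post = "Great article about cats.\nTechnorati tags: cats, pets, blog"
cleaned = strip_irrelevant_contexts(post)
# The "Technorati tags: ..." line is gone; the real content is untouched.
```

With a filter like this in front of the analyzer, "Technorati" only makes it into the cloud when the blogger actually writes about Technorati.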
Starting right now, ZoomClouds is also keeping track of clicks made on each and every tag in each and every cloud.
Now, we haven't yet prepared the pages where you can see how many clicks your cloud is getting, so don't look for them yet, but we will be implementing them soon and expect to have them "live" early next week.
When that's done - and we'll announce it right here - you'll be able to see things like:
How many people are clicking on tags in your clouds, with daily, weekly, monthly and yearly reports.
How many clicks any given tag is getting, as well as a list of "most clicked tags".
From what countries these clicks are coming.
Whether the click comes from a human or a robot or spider (or we might simply exclude robots and spiders from the reports, we're still thinking about this).
And perhaps a few more details.
For each click, what gets recorded is: the cloud and tag (obviously) that get the click.
The IP from where the click is made. This is used only to figure out the country from where the click is made. We will not disclose IP addresses.
The country from where the click is made.
And the time and date when the click was made.
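For the curious, the per-click record described above could be modeled something like this. The field names are our own invention for illustration; ZoomClouds' actual schema isn't public:

```python
from dataclasses import dataclass
from datetime import datetime

# A sketch of the per-click record described in the post.
@dataclass
class TagClick:
    cloud: str            # which cloud got the click
    tag: str              # which tag was clicked
    ip: str               # used only to resolve the country, never disclosed
    country: str          # resolved from the IP
    clicked_at: datetime  # time and date of the click

# Example record: a click on the "tomato" tag of cloud "BDSV" from Spain.
click = TagClick("BDSV", "tomato", "203.0.113.7", "ES", datetime(2006, 1, 15, 12, 30))
```

Aggregating records like these by hour, day, week, month or year is what produces the reports.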
And that's it! Our approach to ZoomClouds is to keep it simple, but we just couldn't pass up offering this feature, as we believe it makes a lot of sense, and we hope it helps you see how, and how often, your cloud is being used.
Shortly after we posted the "We've been Techcrunched" post, we fixed the slowness of our in-house content analysis tool. We didn't announce it right away because, well, yesterday was a really busy day on all counts.
In terms of speed, the improvements are amazing. We tested a 20-post feed (200 words per post on average) with our previous engine, and it took about 75 seconds to process - while putting quite some stress on the server; nothing significant on its own, but it would be if we were to do that for 100 feeds simultaneously.
Then we tested the same feed with our new engine, and it took less than 9 seconds!! That includes the time to connect to the feed and fetch it, which on average can take between 1 and 4 seconds, depending on how fast the other server responds. OK, now that's a lot better. In fact, we believe that time-wise it beats Yahoo's API, although to be fair, Yahoo's API also does things that our content analysis tool doesn't.
We had to make one small sacrifice, though: as we analyze the content and select terms, we now ignore all words of three characters or less, unless they contain at least one capital letter. In other words, if ZoomClouds updates your blog/feed and finds the word "drm" 25 times, it will ignore it anyway (unless you've added that word to your list of "wanted terms", of course). However, if it finds DRM, Drm, dRM, etc. (you get the picture), then it will consider it. One- and two-letter words were already ignored before, and still are.
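The rule is easy to express in code. Here's a minimal sketch of the length-and-capitalization check; the "wanted terms" override isn't modeled, and treating one- and two-letter words as always ignored is our reading of the rule:

```python
def keep_term(word: str) -> bool:
    """Short-word rule sketch: drop words of three characters or less,
    unless a three-letter word contains at least one capital letter."""
    if len(word) <= 2:          # one- and two-letter words: always ignored
        return False
    if len(word) == 3:          # three-letter words need a capital letter
        return any(ch.isupper() for ch in word)
    return True                 # longer words are always considered

print(keep_term("drm"))   # False: three lowercase letters, dropped
print(keep_term("DRM"))   # True: capital letters rescue it
print(keep_term("blog"))  # True: four letters, always considered
```

A cheap check like this lets the analyzer skip a large share of candidate words before doing any expensive scoring, which is where the speedup would come from.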
Other than that, the quality of our content analysis is exactly the same as before, but now it's over 8 times faster. Why didn't we do this before launch??? Ah...
By the way, would you be interested to know the three most popular terms across ZoomClouds clouds so far? The winner is blog, followed by Google and then podcast. That's interesting :-)
Well, sort of. No, the site hasn't gone down or anything like that, but at some points, updating a cloud might have seemed to take forever. Here's why, and what we're going to do about it right away.
In my previous post I mentioned the two content analysis tools we use with ZoomClouds to extract relevant terms: Yahoo's and ours. I also mentioned that for user-generated updates (like when you build a cloud for the very first time) we rely only on Yahoo's content analysis API, mainly because it's a lot faster, leaving our homemade content analysis tool for the behind-the-scenes updates.
But as I also mentioned, Yahoo currently has a limit of 5,000 calls every 24 hours, and when that limit is reached, we have no choice but to use our - not worse, but a lot slower - content analysis tool. And that's exactly what happened today. We reached the 5,000-call limit rather quickly, and ZoomClouds started to use our tool instead, resulting in some clouds taking well over a minute, sometimes even two, to get fully analyzed and built for the very first time.
That is not acceptable. So we went back to the drawing board and tried to come up with a way to do the processing a lot faster without giving up any functionality. After some time brainstorming and scratching our heads, I think we've got it. Now we'll be working on developing, testing and implementing this new approach, and I expect to have it live on the site within 24 hours. If it actually turns out to be really efficient, we might even use it on each and every update, whether user-generated or not. I'll keep you posted!
ZoomClouds uses two different content analysis tools. One is the Yahoo! Content Analysis API. The other is our own content analysis tool.
When ZoomClouds uses the Y! API, then ZoomClouds is acting as a mash-up, by making use of the data sent by the Y! API. When ZoomClouds doesn't use the Y! API, then it isn't.
So when does ZoomClouds use one tool or the other? Simple. When you build a cloud, update it, reload it, etc., ZoomClouds tries to use the Yahoo! API first. Why? Because it's much faster than ours (where the Y! API takes 2 seconds, our content analysis tool takes 6) and it's sometimes better at picking new terms.
On the other hand, our content analysis tool takes a bit longer, but it's really good at remembering things. More often than not, it does a better job at extracting terms from blogs it already knows. But because it's somewhat slower, we let it do its job behind the scenes, updating the cloud when nobody's looking. Unfortunately, that happens at most once a day.
Let's look at some examples. Here's a cloud that used only Yahoo's content analysis tool.
(image) And here's the same feed but using only our content analysis tool: (image)
They're a bit different, but they're both, um, good looking clouds. Notice "our" cloud has a nicer weight distribution though.
Sometimes Yahoo's tool fails miserably; we're not sure why. This is a cloud from a blog after using Yahoo's API:
(image) And this is the cloud we got when we noticed what a terrible job Y!'s API had done and ran our tool instead: (image)
Now we're talking!
There's one more instance where our content analysis tool takes over from Yahoo's, and that's when we've run out of our daily Y! API quota. Basically, after 5,000 calls in a 24-hour period, Yahoo won't accept any more API calls. If we went over that quota, we wouldn't be able to create or update any cloud until that 24-hour period was over, so when that happens - or anytime the Y! API fails for whatever reason - our content analysis tool takes over.
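The fallback order described above can be sketched roughly like this. The function names, the quota bookkeeping, and the dummy results are all assumptions for illustration, not ZoomClouds' real code:

```python
# Sketch of "try Yahoo's API first, fall back to the in-house tool".
DAILY_QUOTA = 5000
calls_today = 0

def yahoo_extract(feed_text):
    """Stand-in for the Yahoo! Content Analysis API call."""
    global calls_today
    if calls_today >= DAILY_QUOTA:
        raise RuntimeError("daily quota exhausted")
    calls_today += 1
    return ["blog", "Google"]          # dummy result

def in_house_extract(feed_text):
    """Stand-in for the slower in-house content analysis tool."""
    return ["blog", "podcast"]         # dummy result

def extract_terms(feed_text):
    try:
        return yahoo_extract(feed_text)      # preferred: faster
    except Exception:
        return in_house_extract(feed_text)   # fallback: quota hit or API error

terms = extract_terms("<rss>sample feed</rss>")      # under quota: Yahoo path
calls_today = DAILY_QUOTA                            # simulate hitting the limit
fallback_terms = extract_terms("<rss>sample feed</rss>")  # in-house tool runs
```

Catching any API failure, not just the quota error, is what makes the in-house tool a general safety net rather than only a quota fallback.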
Ideally, what we'll eventually do is run Yahoo's API first when you build your cloud for the first time, then immediately run our own content analysis tool. But for that we need to optimize our algorithms a bit more, so that you don't have to wait, say, 30 seconds to get your cloud updated.
Could we live without Yahoo's API? Yes, certainly we could. But as long as we can use two content analysis tools instead of one, what's the harm? :-)
As you know, when you create your cloud, you are asked how many days worth of content you'd like to be taken into account when building your cloud.
Some people mistakenly think that if they select "Forever" the cloud they immediately get is based on everything they've published on their entire blog since day 1, or if they select "the last 365" the cloud they get right away is based on the last year, and so on.
That's not true.
When you create your cloud, ZoomClouds will go and fetch your feed, and the cloud that you get at that moment is based only on whatever content is in your feed at that time, which usually is the last 10-15 posts.
What the "Days worth" option indicates is, as time goes by, how many days worth of content you want your cloud to reflect from the date you create your cloud.
For example, a simple call, without drawing dots or writing the weight of the tags, looks something like this:
There, the letter d tells ZoomClouds that it must draw the dots between tags, and the letter s instructs ZoomClouds to write the weight of each tag. If you remove the letter s, then the weight won't show up. We call these letters "extra commands". They tell ZoomClouds to do certain things that usually cannot be done just by manipulating the CSS code.
The million dollar question now is - are there other extra commands? Sure there are. Not many so far, but just enough to start documenting them, so you can take advantage of them if you like.
Each command is indeed a single letter (all lowercase so far). Not very intuitive, but it helps keep the URLs short and to the point.
The other active extra commands so far are:
If a tag takes up more than one word, the n command tells ZoomClouds not to break the tag into two lines, keeping it always on one line. You can think of this as a nowrap command.
This command forces each tag to appear on a single line. This is the same as saying that instead of a tag cloud, you'll get a list of tags, one tag per line.
Instead of showing the tags in alphabetical order, sort them by weight, starting with the most relevant tags (the ones using the largest font). I personally like this option.
Normally, when you click on a tag, you get a results page on the same window. When you use the o command, you force the browser to show the results page on a new browser window.
Ignore the assigned tag colors and instead assign a random color to each and every tag, out of a total of 12 colors. The random colors look better over a white background, so we suggest not using this option if you're using a background color other than white.
The order of the commands is irrelevant. If you want to use the wods commands, it doesn't matter whether you add dows, owsd or wosd. It is important however to use only lowercase letters.
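Since the order of the letters doesn't matter, a natural way to think of a command string is as a set. Here's a small illustrative sketch (our model, not ZoomClouds' actual parser), covering only the letters explicitly documented above:

```python
# Command meanings are taken from the post; the parsing is our own sketch.
KNOWN_COMMANDS = {
    "d": "draw dots between tags",
    "s": "show each tag's weight",
    "n": "never break a multi-word tag across lines",
    "o": "open results in a new browser window",
}

def parse_commands(raw: str) -> set:
    """Turn a command string like 'ds' into a set of flags, rejecting
    letters that aren't known commands."""
    flags = set(raw)
    unknown = flags - KNOWN_COMMANDS.keys()
    if unknown:
        raise ValueError(f"unknown command(s): {sorted(unknown)}")
    return flags

print(parse_commands("ds") == parse_commands("sd"))  # True: order is irrelevant
```

Treating the letters as a set is exactly why "dows", "owsd" and "wosd" all mean the same thing.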
And that's it for now. If you have any other suggestions, comments are open!
ZoomClouds has a very simple API that allows you to take the results from your cloud and do with them, well, whatever you want. Let's first describe how it works; later I'll throw in a couple of suggestions.

Call format

Let's say that the name you gave your cloud is BDSV. Then you can call ZoomClouds with the following URL:
http://www.zoomclouds.com/xml/BDSV/30/

And ZoomClouds will return an XML page that is very easy to parse. In the example above, the number 30 indicates how many tags we want the tag cloud to have. You can enter a number between 5 and 100. Therefore, the format of the call is:
http://www.zoomclouds.com/xml/[name of your cloud]/[number of tags]/

Response format

The response format will be something like this (we use colors here just to emphasize the different elements):
(XML sample: a tag named API with weight 3 and URL http://www.zoomclouds.com/tag/BDSV/API; a tag named tomato with weight 12 and URL http://www.zoomclouds.com/tag/BDSV/tomato; etc.)

That is: the response starts with an entity that has a "name" attribute, where the cloud name is given. Within that entity, there is one entity with three attributes:
"count" tells you how many tags are included in the results.
"maxweight" is the largest tag weight you can find in this cloud.
"minweight" is the smallest tag weight you can find in the cloud.
Within that entity you can find all the tags in the cloud, each of them within its own ... block. Each ... block comes with three sub-entities:
The name of the tag, UTF-8 encoded
A positive integer indicating the weight of this tag.
Possibly redundant, so we'll describe it as optional. It indicates the URL associated with this tag and cloud.

And that's it. What could you do with something like this? Well, a lot of things, although obviously they won't be as trivial as copying & pasting a given piece of code.

Some ideas

The first advantage is that by using this sort-of API, you get 100% control over your cloud's data. That's not to say that you had no control over it without the API - we never stop claiming that you can customize it to your heart's content - but now you can also "take" the tags and use them in any other way you want, not just to build a tag cloud.

For example, something interesting could be to link your cloud data with tags in Flickr and build a mosaic of Flickr pictures based on your tags. Or to build a Flash app that uses your tags in whatever way you want. You could also build your own tag cloud, or a keyword catalog, or a directory, among many other things.

The only requirement is that whatever you build, you reference it with a link to ZoomClouds, whether using the icon or a small text link such as "Powe[...]
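Here's a sketch of how a client might parse a response shaped as described above. Note that the element names (cloud, tags, tag, name, weight, url) are our own guesses for illustration, since the post doesn't spell them out; the attributes and the three sub-entities per tag follow the description:

```python
import xml.etree.ElementTree as ET

# Sample response with guessed element names; the data (tags "API" and
# "tomato", weights 3 and 12) comes from the example in the post.
SAMPLE = """\
<cloud name="BDSV">
  <tags count="2" maxweight="12" minweight="3">
    <tag><name>API</name><weight>3</weight><url>http://www.zoomclouds.com/tag/BDSV/API</url></tag>
    <tag><name>tomato</name><weight>12</weight><url>http://www.zoomclouds.com/tag/BDSV/tomato</url></tag>
  </tags>
</cloud>
"""

root = ET.fromstring(SAMPLE)
tags = root.find("tags")
print(root.get("name"), tags.get("count"))          # cloud name and tag count
for tag in tags.findall("tag"):
    # Each tag block: name (UTF-8), positive integer weight, optional URL.
    print(tag.findtext("name"), int(tag.findtext("weight")), tag.findtext("url"))
```

From the parsed names and weights you could drive a Flickr mosaic, a Flash app, or your own hand-rolled tag cloud, as suggested above.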
It's not unusual to see ZoomClouds, at the very beginning, fail to select the very best results you'd expect. Seeing virtually all results completely out of focus is not normal either, but finding a few terms that make you go "ok, that shouldn't be there" is not a rare occurrence. It is something that will self-correct within a short period of time, and it has a very logical explanation, due to several reasons:
At first, ZoomClouds will analyze the content it finds in your RSS feed, which will likely only include your last 10-15 posts. As you write more articles, ZoomClouds will have more content to analyze so when it has to calculate what the most relevant terms are from new content, the results will be more accurate.
Even more important: ZoomClouds doesn't just analyze what's in your feed and accumulate the data. When it has to analyze new content, it remembers the results of all its previous analyses and takes that knowledge into account when determining what's relevant and what's not. You could say - in fact, you can say - that ZoomClouds becomes smarter and sharper as it finds new content from you.
One important detail: when ZoomClouds analyzes new content for your cloud, it not only takes into account the results of analyzing your past content, but also takes advantage of anything else it has learned from other blogs. In general terms, the intelligence ZoomClouds acquires from each and every feed is later applied as it analyzes new content for each and every cloud.
And last, how could you have a great content analysis system without some human touch? When you look at the results of analyzing your feeds, you can make those results even better by defining not only unwanted terms (terms you don't want to see in your cloud no matter what) but also wanted terms - terms ZoomClouds apparently missed but that you find relevant. Obviously, when you enter these wanted or unwanted terms, you're not only helping ZoomClouds better analyze your content, but also helping it, in general terms, better analyze future content.
Therefore, if you just created a cloud and you're not completely happy with the results, it's just a matter of time - hopefully not long - until your cloud starts to show much more relevant results.
Hint: if all you do is create your cloud but don't place it anywhere, ZoomClouds won't learn much from your content. That's because ZoomClouds doesn't update your cloud periodically simply because you created it; it needs to see that your cloud is actually being shown to people. Just one visit a day will do. If you don't place the cloud anywhere, ZoomClouds will think "well, nobody's visiting this page, so why should I update the cloud?". You can of course log into your ZoomClouds account and update it manually every day, but it's much better - and more convenient - to simply place your cloud and let the magic happen behind the scenes. This may change in the future, but for now, that's how it works.
When you're asked to enter the URL, you're not supposed to enter your blog's URL, but your blog's feed URL, more specifically your blog's RSS or Atom feed URL.
These are the only two formats ZoomClouds understands so far: RSS and Atom. If you enter a URL that is not an RSS/Atom feed, you will almost certainly get an error. Nowadays virtually all blogging platforms offer RSS or Atom syndication feeds (or both), so if you're not sure what your feed URL is, check with the support team wherever you host your blog, or, if you're using some particular software, check its documentation.
For example, if your blog is in Blogger.com and the blog URL is http://myblog.blogspot.com/ , then the feed URL (the one you need to give to ZoomClouds) is http://myblog.blogspot.com/atom.xml
If you have a blog with ZoomBlog.com, and the blog URL is http://myblog.zoomblog.com/ , then the feed URL is http://myblog.zoomblog.com/rss.xml
After a couple of weeks with ZoomClouds accessible to just a couple dozen beta-tester friends, today we officially open its doors to the public.
I'd like to start by clarifying a few things and answering some FAQs, but since we're going to host the FAQ right here on this blog, I'll be posting about all that in separate posts over the coming days.