
Igor Minar's Blog



A Sudden Burst of Ideas...



Updated: 2018-01-06T08:41:06.530-08:00

 



Change of Status

2010-10-24T15:09:54.421-07:00

$ sqlplus -s
SQL> connect hr/hr@oracle.com
SQL> UPDATE employees SET current = false WHERE email = 'Igor.Minar@oracle.com';
SQL> COMMIT;
SQL> disconnect
SQL> exit
$ curl -X POST -H "Content-Type: application/json" \
   -d '{ "firstName":"Igor", "lastName":"Minar"}' \
   http://google.com/employee/



Thanks for All the Fish

2010-10-24T15:11:35.115-07:00

Hi guys,

Most of you don't know, but today is the 3rd birthday of wikis.sun.com. In 2007 a bunch of us decided that it was worth it to boldly go where no man has gone before and on August 3, 2007 we launched wikis.sun.com.

At that time very few corporations were actively using any kind of wiki internally, and there was no known significant public wiki deployment run by a corporation. We were astonished to see the uptake and user interest, and watched the project grow from a few power users and a few dozen wiki pages to tens of thousands of users and tens of thousands of wiki pages.

Thank you all for contributing, providing feedback and helping us to make the project successful.

Despite being a small team (did you know that officially there wasn't a single person working on wikis full-time?), we managed to get a lot done and I'm very proud of our accomplishments.

I do believe that there is still room for improvement, but I made the decision that these improvements will have to be implemented by someone else. I have found a new challenge that I'm going to pursue, and unfortunately it's time for me to hand the wikis project over to a new group that will oversee the operations and development of the site. I did my best to make this transition as smooth as possible and I'm hopeful that the site will be in good hands.

Just by chance, my last day at Oracle coincides with the wikis' 3rd birthday; I'll take that as a good sign. I wish you all the best and I hope to see you around. The Internet is a small place.

Good luck to you all.

Cheers,
Igor

PS: If you want to stay in touch, you can find me on LinkedIn at: http://www.linkedin.com/in/igorminar




DGC VI: Wiki Organization and Working with the Community

2010-10-24T15:17:34.443-07:00

This blog post is part of the DevOps Guide to Confluence series. In this chapter of the guide, we'll have a look at wiki organization and working with the user community. This post is going to be more subjective than the others, because the recommendations I'm going to make apply to a wiki site with goals and purpose similar to ours. I'm just going to share our experience, and hopefully some of it will be useful for others.

The Purpose

The first thing that should be clear to you when building a wiki site is the purpose it's going to serve. Confluence has been successfully used for many purposes, ranging from team collaboration and documentation writing to a website CMS, just to mention a few. When our team set out to build a wiki site, the goal was to create a wiki platform that could be used by anyone in our company to publicly collaborate with external parties without having to deploy and maintain their own wiki.

It was a pleasant surprise when one of the first groups of users who joined our pilot three years ago were technical writers, eager to drop their heavyweight tools with lots of fancy features in exchange for a lightweight and, more importantly, inclusive collaboration tool. The main issue they were facing was that their processes and tools were very exclusive, and next to impossible for a non-writer to quickly join in order to make small edits. This resulted in lots of proxying of engineering feedback, and inevitable delays. With a wiki, the barrier to entry is very low for almost everyone. There is nothing to install or configure; a browser is all one needs. A wiki allowed a relatively small and overloaded team of technical writers to more efficiently gather and, more importantly, incorporate feedback from subject matter experts into the documentation. Of course there were trade-offs, mainly in the area of post-processing the content for printable documentation (i.e. generating PDFs), but I'm hopeful that as the wiki system matures, more attention will be paid to making this area stronger (Atlassian: hint hint).

Anyway, with the tech writers on board, the purpose, goals and evolution of our site got heavily influenced by their feedback. In exchange we received a lot of high quality content that attracted new users who started using the wiki. This kind of bootstrapping of the site greatly helped to speed up the viral adoption across our thirty-thousand-employee company.

Wiki Organization

When we launched our site three years ago, there were no other big corporations with a public-facing wiki site (many corporations didn't even have an internal wiki yet; boy, has that changed since then). This put us in a position where we had to be the first explorers in search of best practices, as well as of things that didn't work at all. Fortunately, since our team had successfully pioneered the area of corporate blogging before the wikis launch, we had some experience with building communities that we could leverage. Some of the main principles that we reused from our blogs site were:

  • Make the rules and policies as simple as possible.
  • It is a goal shared by all employees to create a good image of the company and make the company successful. We should trust their judgement and empower them to be able to do the right thing.
  • The team running the site is small, so the employees should be able to do as much as possible on their own (self-provisioning FTW!).
  • Since we trust our employees, we should delegate as much decision making and as many responsibilities as possible, and let them delegate some to others; otherwise we won't be able to scale.
  • There should be very little (close to none) policing or content organization done by the core team. We don't have the man-power for that. Besides, the Internet is not being policed by anyone and things tend to just work out. The popular, well organized and valuable content bubbles up, in one way or another.

Implemented Actions

With our principles laid out, we took these actions: We integrated Confluence with our single sign-on an[...]



DGC V: Customizing and Patching Confluence

2010-10-24T15:25:26.217-07:00

This blog post is part of the DevOps Guide to Confluence series. In this chapter of the guide, we'll have a look at how to customize and patch Confluence.

Customizing Confluence

Before we talk about any customization at all, I need to warn you. Any kind of customization of Confluence (or any other software) comes with a maintenance and support cost. The problems usually arise during or after a Confluence upgrade, and if they catch you unprepared, you might get yourself in a lot of trouble. Keep this in mind, and before you customize anything, justify your intent. There are several ways to customize Confluence. For some, the maintenance and support cost is low; others give you lots of flexibility at a higher cost. So depending on your needs and requirements, you can pick one of the following.

Confluence User Macros

I already mentioned these in the Confluence Configuration chapter: they are easy to create and usually don't break during upgrades, but they are a nightmare to maintain. Avoid them.

Confluence Themes, HTML Headers and Footers

You can easily inject html code in the header and footer by editing the appropriate sections of the Admin UI (described in the config chapter). If this html code contains visual elements, it's possible that your code will break during upgrades. In general I would avoid editing headers and footers this way as much as I could, unless I was doing something very simple.

Confluence themes are the way to go. You can either pick a theme that was already built and published by someone else, or you can build your own. Building your own theme will give you the most flexibility, but the cost of maintaining and supporting it will be the highest. You can do some things to cut corners, but be prepared to do some Confluence plugin development (a Confluence theme is really just a type of Confluence plugin).

What worked well for me and our heavily customized theme is to create our theme as a patch for the Confluence default theme. I simply symlink all the relevant files from the Confluence source code into a directory structure that can be built as a Confluence theme/plugin, add my atlassian-plugin.xml, and patch the files with the changes I need, no matter how complex they are. The advantage of this approach is that my theme will always be compatible with my Confluence version (after a rebase) and I get all the new features introduced in the new version. The downside is that I often need to rebase my patches during Confluence upgrades, but with a good patch management solution (see below, and the rough sketch at the end of this post) this headache can be greatly minimized.

Lastly, there is Theme Builder from Adaptavist. I haven't personally used this Confluence plugin because it was not popular when we initially created our theme, and it was not desirable for us to depend on yet another (unknown at that time) vendor during our Confluence upgrades. If I were about to start creating a theme from scratch, I would compare it with my patching method and see which gives me the most benefits. The main concern I have with Theme Builder is my ability to version control the theme, which if not easily possible might be the deal breaker for me and many others.

Confluence Plugins

I mentioned Confluence Plugins already in the previous chapter, so I'm not going to repeat myself here. What I will add is that you really can extend and customize Confluence in crazy ways via plugins. You can either discover existing plugins at the Atlassian Plugin Exchange, or you can build your own with Maven (or the Plugin SDK), Java (or another Java compatible language) and the Atlassian Plugin Framework. The nice thing about plugins is that they are encapsulated pieces of code that interact with the rest of Confluence via a public API, and additionally they are hot-pluggable. This means that in theory they should work after a Confluence upgrade and that you can install and uninstall them on the fly without a need for a restart. While the latter is true in practice, the former is not always the[...]
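To make the patch-as-a-theme workflow described above a bit more concrete, here is a minimal shell sketch of how it could look with Mercurial Queues as the patch management tool (MQ is covered later in this series). The directory layout, paths and patch name below are purely illustrative, not our actual setup:

# keep the Confluence sources under Mercurial with MQ enabled, so the theme
# changes live as patches on top of the vendor code
cd ~/src/confluence-X.Y.Z
hg init && hg add && hg commit -m "vanilla Confluence X.Y.Z sources"
hg qinit

# capture the theme customizations as an MQ patch against the default theme
hg qnew our-theme.patch
#   ...edit the default theme templates/CSS in place...
hg qrefresh

# the theme plugin project symlinks the (patched) files from the Confluence
# tree and adds its own atlassian-plugin.xml, then builds like any plugin
cd ~/src/our-theme-plugin
ln -s ~/src/confluence-X.Y.Z/<path-to-default-theme>/* src/main/resources/
mvn package

# during an upgrade: pop the patches, update to the new Confluence sources,
# then push the patches again and fix any rejects
cd ~/src/confluence-X.Y.Z
hg qpop -a
#   ...import the new Confluence release into the working copy...
hg qpush -a   # resolve rejects, then: hg qrefresh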



DGC IV: Confluence Upgrades

2010-10-24T15:29:07.297-07:00

This blog post is part of the DevOps Guide to Confluence series. In this chapter of the guide, we'll have a look at Confluence upgrades.

Confluence Release History and Track Record

I started using Confluence at around version 2.4.4 (released March 2007). A lot has changed since then, mostly for the better. In my early days, Atlassian was spitting out one release after another, typically 3 weeks or less apart, followed by a major release every 3 months. You can check out the full release history on their wiki. This changed later on, and recently there have been fewer minor releases and bigger major releases delivered every 3.5 to 4 months. Depending on your point of view this is good or bad. It now takes longer to get awaited features and fixes, but on the other hand the releases are more solid and better tested. For major releases, Atlassian now usually offers an Early Access Program, which gives you access to milestone builds so that you can see and mold the new stuff before it ships.

Contrary to the past, the minor versions have been very stable lately and have contained only bugfixes, so it is generally safe to upgrade without a lot of hesitation. The same can't be said about major releases. Even though the stability of x.y.0 releases has been dramatically improving lately, I still consider it risky for a big site to upgrade soon after a major release is announced. Wait for the first bugfix release (x.y.1), monitor the bug tracker, knowledge base and forums, and then consider the upgrade. Having gone through many upgrades myself, I think that it is a good practice to stay up to date with your Confluence site. We have usually been at most one major version behind and frequently on the latest version, but as I mentioned, avoiding the x.y.0 releases. This has been working well for us.

Staying in Touch and Getting Support

In order to know what's going on with Confluence releases, it is a good idea to subscribe to the Confluence Announcements mailing list. This is a very low traffic mailing list used for release and security announcements only. Atlassian's tech writers usually do a good job at creating informative release notes, upgrade notes and security advisories, so be sure to read those for each release (even if you are skipping some). There are several other channels through which people working on Confluence (plugin) development can communicate and support each other; these include:

  • the official Confluence Development forum
  • the official Atlassian Development IRC community
  • the Confluence Development Skype chat - a place where some of us hang out and discuss issues or share Confluence related news

Despite Atlassian's claims about their legendary support, I found the official support channel rarely useful. Being a DIY guy and having reasonable knowledge of Confluence internals, I usually found myself in need of more qualified support than what the support channel was created for. For this reason my occasional support tickets usually ended up being escalated to the development team, instead of being handled by the support team. On the other hand, the public issue tracker has been an invaluable source of information and a great communication tool. I wish that more of my bug reports had been addressed, but for the most part I have been receiving a reasonable amount of attention, even though sometimes I had to request escalation to have someone look at and fix issues that were critical for us.

The biggest hurdle I've been experiencing with bug fixes and support is that sites of our size are not the main focus for Atlassian, and they are not hesitant to be open about it. I often shake my head when I see features of little value (for us, that is, because they target small deployments and have little to do with core wiki functionality) being implemented and promoted, while major architectural issues, bugs and highly anticipated features go without attention for years. Just browse the issue tracker and you'll get the idea. Conf[...]



DGC III: Confluence Configuration and Tuning

2010-10-24T15:36:01.153-07:00

This blog post is part of the DevOps Guide to Confluence series. In this chapter of the guide, we'll have a look at Confluence configuration and tuning. There are four ways to modify Confluence's runtime behavior:

  • Config Files in the Confluence Home directory
  • Config Files in WEB-INF/classes
  • JVM Options
  • Admin UI

Config Files in the Confluence Home Directory

The Confluence Home directory contains one or more config files that control the runtime behavior of Confluence. The most important file is confluence.cfg.xml, which must be present in order for Confluence to start. This file can be modified by hand while Confluence is shut down, but it also gets modified by Confluence occasionally (mostly during upgrades). Your changes will be preserved, as long as you made them while Confluence was offline. Another relevant file is tangosol-coherence-override.xml, which must unfortunately be used to override Confluence's lame multicast configuration needed for cluster configuration (see below). Lastly there is config/confluence-coherence-cache-config-clustered.xml, which contains the configuration of the Confluence cache. Generally you don't want to modify this file by hand; I'll come back to cache configuration later in the Admin UI section of this chapter.

In general it is advisable to be very consistent about your environment, so that you can have a single version of these files that you can distribute to all servers when needed. This includes the directory layout, network interface names, and so on. A combination of the first two files will allow you to configure the following:

Clustering

As I mentioned, this configuration is split between two config files. confluence.cfg.xml contains the confluence.cluster.* properties, which allow you to set the multicast IP, interface and TTL, but not the port. Only tangosol-coherence-override.xml can do that. The cluster IP is by default derived from a "cluster name" specified via the Admin UI or the installation wizard. For some reason Atlassian believes that in an enterprise environment one can just let software pick a random IP and port to run multicast on. I don't know about any serious datacenter where things work this way. You'll likely want to explicitly set the IP, port, interface name and TTL, and the only way to do that is by modifying these files by hand and ignoring the "cluster name" setting in the UI. Make sure that the settings are consistent in both files.

DB Connection Pool

Confluence comes with an embedded connection pool. I believe that you can use your own too (if it comes with your servlet container), but I'd suggest sticking with the embedded one since it is widely used and Atlassian runs their tests with it as well. The pool is configured via confluence.cfg.xml and its hibernate.c3p0.* properties. The most important property is the pool max_size, which will prevent the pool from opening more than a defined number of connections at a time. You want this number to be higher than your typical peak concurrent request count (are you monitoring that?), but not higher than what your db can handle. We have ours set to 300, which is double our occasional peaks. Don't forget that in order to take advantage of these connections, you'll likely need to also increase the worker thread count in your servlet container.

DB Connection

The connection is configured via the hibernate.connection.* properties in confluence.cfg.xml. Depending on your db, you might need to specify several settings for the connection to work well and grok UTF-8. For our MySQL db, we need to set the connection url to something like:

jdbc:mysql://server:3306/wikisdb?autoReconnect=true&useUnicode=true&characterEncoding=utf8

Note that if you are editing this file by hand, you must escape illegal xml characters (see the sketch at the end of this post). More info about the db connection can be found in the Confluence documentation.

Config Files in WEB-INF/classes

Just a side note: if you are building confluence from so[...]
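Coming back to the db connection and XML escaping rules above, here is a small sketch of what the relevant entries could look like; the property names follow the confluence.cfg.xml conventions mentioned in this post, and the values (including the pool size of 300) are just the examples given above:

$ grep -E 'hibernate.connection.url|hibernate.c3p0.max_size' \
    $CONFLUENCE_HOME/confluence.cfg.xml
<property name="hibernate.connection.url">jdbc:mysql://server:3306/wikisdb?autoReconnect=true&amp;useUnicode=true&amp;characterEncoding=utf8</property>
<property name="hibernate.c3p0.max_size">300</property>

Note how every & in the JDBC URL is written as &amp; inside the XML file.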



DGC II: The JVM Tuning

2010-10-24T15:44:33.789-07:00

This blog post is part of the DevOps Guide to Confluence series. In this chapter of the guide, I'll be focusing on JVM tuning with the aim of making our Confluence perform well and operate reliably.

JDK Version

First things first: use a recent JDK. Java 5 (1.5) was EOLed 1.5 years ago; there is absolutely no reason for you to use it with Confluence. As George pointed out in his presentation, there are some significant performance gains to be made just by switching to Java 6, and you can get another performance boost if you upgrade from an older JDK 6 release to a recent one. JDK 6u21 is currently the latest release and that's what I would pick if I were to set up a production Confluence server today. If you are wondering which Java VM to use, I suggest that you stick with Sun's HotSpot (also known as Sun JDK). It's the only VM supported by Atlassian and I really don't see any point in using anything else at the moment. Lastly, it goes without saying that you should use the -server JVM option to enable the server VM. This usually happens automatically on server grade hardware, but it's safer to set it explicitly.

VM Observability

For me, using JDK 6 is not just about performance, but also about observability of the VM. Java 6 contains many enhancements in the monitoring, debugging and probing arena that make JDK 5 and its VM look like an obsolete black box. Just to mention some enhancements: the amount of interesting VM telemetry data exposed via JMX is amazing; just point VisualVM at a local Java VM to see for yourself (no restart or configuration needed). Be sure to install the VisualGC plugin for VisualVM. In order to allow remote connections, you'll need to start the JVM with these flags (a consolidated example appears at the end of this post):

-Dcom.sun.management.jmxremote.port=some_port
-Dcom.sun.management.jmxremote.password.file=/path/to/jmx_pw_file
-Djavax.net.ssl.keyStore=/path/to/your/keystore
-Djavax.net.ssl.keyStorePassword=your_pw

Unless you make the port available only on some special admin-only network, you should password protect the JMX endpoint as well as use SSL. The JMX interface is very powerful and in the wrong hands could result in security issues or outages caused by inappropriate actions. For more info about all the options available, read this document. In addition to JMX, on some platforms there is also good DTrace integration, which helped me troubleshoot some Confluence issues in production without disrupting our users. And lastly there is BTrace, which allowed me to troubleshoot a nasty hibernate issue once. It's a very handy tool that, as opposed to DTrace, works on all OSes. I can't stress enough how important continuous monitoring of your Confluence JVMs is. Only if you know how your JVMs and app are doing can you tell whether your tuning has any effect. George Barnett also has a set of automated performance tests which are handy to load test your test instance and compare results before and after you make some tweaks.

Heap and Garbage Collection Must Haves

After upgrading the JDK version, the next best thing you can do is to give Confluence lots of memory. In the infrastructure chapter of the guide, I mentioned that you should prepare your HW for this, so let's put this memory to use. Before we set the heap size, we should decide between a 32-bit JVM and a 64-bit JVM. A 64-bit VM is theoretically a bit slower, but allows you to create huge heaps. A 32-bit JVM has its heap size limited by the available 32-bit address space and other factors: 32-bit OSes will allow you to create heaps of only up to 1.6-2.0 GB, while 64-bit Solaris will allow you to create 32-bit JVMs with up to a 4GB heap (more info). For anything bigger than that you have to go 64-bit. It's not a big deal if your OS is 64-bit already. The option to start the VM in 64-bit mode is -d64; on almost all platforms the default is -d32. Before I go into any detail, I should explain what are the main objectives of heap and garbage[...]
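Putting the options mentioned in this chapter together, a start script could pass something like the following to the JVM. This is only a sketch: the 4GB heap is an illustrative value (size it based on your own monitoring), and the JMX placeholders are the same ones used above.

# how these flags reach the JVM depends on your servlet container
# (e.g. JAVA_OPTS / CATALINA_OPTS for Tomcat)
JVM_FLAGS="-server -d64 \
  -Xms4g -Xmx4g \
  -Dcom.sun.management.jmxremote.port=some_port \
  -Dcom.sun.management.jmxremote.password.file=/path/to/jmx_pw_file \
  -Djavax.net.ssl.keyStore=/path/to/your/keystore \
  -Djavax.net.ssl.keyStorePassword=your_pw"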



DGC I: The Infrastructure

2010-10-24T15:55:09.822-07:00

In the introductory post, I mentioned that a Confluence cluster is the way to go big. Let's go through some of the main things to consider when you start preparing your infrastructure.

Confluence cluster

To build a Confluence site, you need Confluence :-). Well, make it two... as in a two-node cluster license. I recommend this for any bigger site with relatively high uptime expectations, even if you know that your amount of traffic won't require load balancing between two nodes. I often find myself in need of a restart (e.g. during a patch deployment), and with a cluster you can restart one node at a time and your users won't even know about it.

Network

My team operates other big sites, and from all of them we expect some level of redundancy. Typically we split everything between "odd" (composed of hosts with hostnames ending with an odd number) and "even" strings, and this applies to Confluence nodes as well (that's why you need the two-node license). Each string is composed of a border firewall, load balancer, switches and the actual servers (web/application/database/whathaveyou), and both strings can either share the load or work as primary & standby, depending on your application needs and network configuration. This kind of splitting allows us to take half of our datacenter offline for maintenance when needed, or to absorb a potential failure of any hardware or software within one string without any perceivable interruption of service. Sure, you can make things even more redundant by adding a third or fourth string, but none of our apps requires that level of redundancy, and the cost and complexity of getting there is therefore hard to justify.

There are two important things that matter when it comes to setting up the network, and both can make or break your Confluence clustering:

  • The latency between the two nodes should be minimal. Ideally they should be just one hop apart and on a fast network (1GBit). There will be a lot of communication going on between your Confluence nodes, and you want it to happen as quickly as possible, otherwise the cluster synchronization will drag down your overall cluster performance. Don't even think about putting the two nodes into different datacenters, let alone on different continents. Confluence clustering was not built for that type of scenario.
  • Make absolutely sure that your network (mainly switches, OS, firewall) supports multicast. The best way to check that multicast works reliably is to use the multicast test tool that is bundled with Coherence (a library that is bundled with Confluence). To run it, just run the following command on all nodes and check that all packets are being delivered and no duplicates are present:

java -cp $CONFLUENCE_PATH/WEB-INF/lib/coherence-x.y.jar:$CONFLUENCE_PATH/WEB-INF/lib/tangosol-x.y.jar \
  com.tangosol.net.MulticastTest \
  -group $YOUR_MULTICAST_IP:$YOUR_MULTICAST_PORT \
  -ttl 1 \
  -local $NODE_IP

In our environment, it took us months of waiting for the right patch from our network gear vendor and some OS patching to make things totally stable. Fortunately, our ops guys eventually found the magic combination of patches and settings, and then we were good to go.

Our site uses both http and https protocols for content delivery, and since we already had an SSL accelerator available in our datacenter we utilized it for Confluence, but with current hardware I don't think hw acceleration is very important these days. Another noteworthy suggestion I have for your network is the load balancer configuration. We started off with session-affinity-based load balancing, but at one point people started to notice that sometimes they see different content than their colleagues. This was due to a delay in the propagation of changes throughout the cluster. Usually the delay is unnoticeable, but for some reason that's not always the case. I haven't investigate[...]



DevOps Guide to Confluence (DGC)

2010-10-24T16:03:25.844-07:00

After working with Atlassian Confluence for 3 years, running one of the bigger public Confluence installations, I realized that there is a major lack of information about how to run Confluence on a larger scale and outside of the intranet firewalls. I'm hoping that I can improve this situation with a blog series that will describe some of the (best?) practices that I implemented while running, tweaking, patching and supporting our Confluence-based site. Just to throw out some numbers to give context to what I mean by "relatively large":

  • registered users: 180k+
  • contributing users: 7k+
  • wiki spaces: 1.5k+
  • wiki pages: 65k+
  • page revisions: 570k+
  • comments: 10k+
  • visits per month: ~300k
  • page views per month: ~800k
  • http requests per day: ~1m+ (includes crawlers and users with disabled javascript)

So I'm not talking about a huge site like amazon, twitter, etc, but still bigger than most of the public facing Confluence instances out there. Some of the practices described in this guide might be overkill for smaller deployments, so I'll leave it up to you to pick the right ones for you and your environment. There are many aspects that need careful consideration if you want to go relatively big, and there are even more of them when you run your site on the Internet as opposed to doing it internally within an organization. In my blog series I'm going to focus on these areas that I consider important:

  • The infrastructure: Confluence cluster, hardware (cpu, memory, disk), os, filesystem, web container, logs & monitoring, network, db, backups
  • The JVM tuning: heap, garbage collector, fancy switches
  • Confluence configuration and tuning: where and how to configure Confluence
  • Confluence upgrades: release history and track record, following announcements and getting support, the upgrade procedure
  • Customizing and patching Confluence: customization options, reasons for patching, Mercurial Queues as a patch management tool
  • Wiki Organization and Working with the Community: main principles, global and default permissions, delegating decision-making to space admins
  • Internet-facing deployment and operation: Varnish or another caching reverse proxy, robots.txt

I'm not going to go into details about why to pick Confluence or why not to pick it. I really just want to focus on how to make it run smoothly and reliably while serving a relatively large audience of users (and robots). Given that we run the site on the Internet, we aren't lucky enough to have well defined maintenance windows that we could work with, meaning that any downtime will be perceived by at least a portion of your users as your failure, and the only way you can avoid looking like an idiot is to keep the downtime to an absolute minimum.

You are now probably thinking that a Confluence cluster will solve all your problems with scalability and reliability. Right, that's what the marketing people tell you. Anyone who knows a thing or two about software engineering knows that there is no such thing as "unlimited scalability", and ironically a Confluence cluster can hit several bottlenecks quite quickly in certain situations. That said, a Confluence cluster with all its pros and cons is really the way to go big with Confluence, but you should have realistic expectations about its scalability and reliability. The fact that makes things even more difficult is that if you do things right, your wiki is going to take off. More users, more content, more traffic, more spam, more crawlers, more users unhappy about any kind of downtime... Growth is what you need to take into account from day one. I'm not saying that you have to start big, you just shouldn't paint yourself into a corner, and I'm going to mention some tips on how to avoid just that. I was inspired to write up this guide after [...]



Improving Satan and Solaris SMF

2010-10-24T18:28:38.143-07:00

One of the features of Solaris that we heavily rely on in our production environment at work is the Service Management Facility, or SMF for short. SMF can start/stop/restart services, track dependencies between services and use that to optimize the boot process, and lots more. Often handy in a production environment is that SMF keeps track of the processes that a particular service started, and if a process dies, SMF restarts its service.

One gripe I have with SMF is that its process monitoring capabilities are rather simple. A process associated with a contract (service) must die in order for SMF to get the idea that something is wrong and that the service should be restarted. In practice, more often than not a process gets into a weird state that prevents it from working properly, yet it doesn't die. Failures might include excessive cpu or memory usage, or even application level failures that can be detected only by interacting with the application (e.g. an http health check). SMF in its current implementation is incapable of detecting these failures. And this is where Satan comes into play. Satan is a small ruby script that monitors a process and, following the Crash-only Software philosophy, kills it when a problem is detected. It then relies on SMF to detect the process death(s) and restart the given service (see the sketch at the end of this post for what that looks like from the SMF side).

I fell in love with the simplicity of Satan (which was inspired by God) and started exploring the feasibility of using it to improve the reliability of SMF on our production servers. Upon a code review of the script, I noticed several things that I wished were implemented differently. Here are some:

  • Satan watches processes rather than services as defined via SMF
  • One Satan instance is designed to watch many different processes for different services, which adds unnecessary complexity and lacks isolation
  • Satan is merciless (what a surprise! :-) ) and uses kill -9 without a warning
  • Satan has no test suite!!! :-( (i.e. I must presume that it doesn't work)

Thankfully the source code was out there on GitHub and licensed under the BSD license, so it was just a matter of a few keystrokes to fork it (open source FTW!). By the time I was done with my changes, there wasn't much of the original source code left, but oh well :-) I'm happy to present to you http://github.com/IgorMinar/satan for review and comments. The main changes I made are the following:

  • One Satan instance watches a single SMF service and its one or more processes
  • The single-service-to-monitor design allows for automatic monitoring suspension via SMF dependencies while the monitored service is being started, restarted or disabled
  • Several bugfixes around how rule failures and recoveries are counted before a service is deemed unhealthy
  • At first Satan tries to invoke svcadm restart, and only if that doesn't occur within a specified grace period does it use kill -9 to kill all processes for the given contract (service)
  • Satan now has a decent RSpec test suite (more on that in my previous post)
  • Improved HTTP condition with a timeout setting
  • New JVM free heap space condition to monitor those pesky JVM memory leaks
  • Extensible design now allows for new monitoring conditions (rules) to be defined outside of the main Satan source code

As always there are more things to improve and extend, but I'm hoping that my Satan fork will be a decent version that will allow us to keep our services running more reliably. If you have suggestions or comments, feel free to leave feedback.[...]
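For readers who don't live in SMF land every day, here is a rough sketch of the SMF side of the interplay described above; the service FMRI is made up for illustration:

# which processes does SMF associate with the service's contract?
svcs -p svc:/site/myapp:default

# the gentle restart that the Satan fork asks for first
svcadm restart svc:/site/myapp:default

# only if the restart doesn't happen within the configured grace period does
# Satan fall back to kill -9 on the processes of the service's contract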



Testing matters, even with shell scripts

2010-10-24T18:32:51.296-07:00

A few months ago, we migrated our production environment at work from Solaris 10 to OpenSolaris. We loved the change because it allowed us to take advantage of the latest inventions in Solaris land. All was good and dandy until one day one of our servers ran out of disk space and died. WTH? We have monitoring scripts that alert us long before we get even close to running out of space, yet no alert was issued this time. While investigating the cause of this incident, we found out that our monitoring scripts, which work well on Solaris 10, didn't monitor the disk space correctly on OpenSolaris. When I asked our sysadmins if they didn't have any tests for their scripts that could validate their functionality, they laughed at me.

Fast forward a few months. A few days ago I started looking at Satan, to augment the self healing capabilities of Solaris SMF (think initd or launchd on steroids). At first sight I loved the simplicity of the solution, but one thing that startled me during the code review was that there were no tests for the code, except for some helper scripts that made manual testing a bit less painful. At the same time, I spotted several bugs that would have resulted in unwanted behavior.

Satan relies on invoking Solaris commands from ruby, parsing the output and acting upon it. Thanks to its no-BS nature, ruby makes for an excellent choice when it comes to writing programs that interact with the OS by executing commands. There are several ways to do this, but the most popular looks like this:

ps_output = `ps -o pid,pcpu,rss,args -p #{pid}`

All you need to do is to stick the command into backticks and optionally use #{variable} for variable expansion. To get hold of the output, just assign the return value to a variable. Now if you stick a piece of code like this in the middle of a ruby script, you get something next to untestable:

module PsParser
  def ps(pid)
    out_raw = `ps -o pid,pcpu,rss,args -p #{pid}`
    out = out_raw.split(/\n/)[1].split(/ /).delete_if {|arg| arg == "" or arg.nil? }
    { :pid => out[0].to_i,
      :cpu => out[1].to_i,
      :rss => out[2].to_i*1024,
      :command => out[3..out.size].join(' ') }
  end
end

With the code structured (or unstructured) like this, you'll never be able to test whether the code can parse the output correctly. However, if you extract the command execution into a separate method call:

module PsParser
  def ps(pid)
    out = ps_for_pid(pid).split(/\n/)[1].split(/ /).delete_if {|arg| arg == "" or arg.nil? }
    { :pid => out[0].to_i,
      :cpu => out[1].to_i,
      :rss => out[2].to_i*1024,
      :command => out[3..out.size].join(' ') }
  end

  private

  def ps_for_pid(pid)
    `ps -o pid,pcpu,rss,args -p #{pid}`
  end
end

you can then open the module and redefine ps_for_pid in your tests like this:

require 'ps_parser'

PS_OUT = {
  1 => "  PID %CPU    RSS ARGS\n12790  2.7 707020 java",
  2 => "  PID %CPU    RSS ARGS\n12791 92.7 107020 httpd"
}

module PsParser
  def ps_for_pid(pid)
    PS_OUT[pid]
  end
end

And now you can simply call the ps method and check whether the fake output stored in PS_OUT is being parsed correctly. The concept is the same as when mocking webservices or other complex classes, but applied to running system commands and programs.

To conclude, what makes you more confident about software you want to rely on: an empty test folder, or all-green results from a test/spec suite? [...]



Configuring Common Access Log Format in GlassFish v2 and v3

2010-10-24T18:35:10.139-07:00

For a long time Matthew and I had a dilemma about changing the non-standard access log format used by GlassFish v2 and v3 to the commonly used common or combined format used by Apache. GlassFish does allow one to specify the access log format, but how this works is not obvious. If one tries to create a formatting string which should result in one of the Apache access log formats, the resulting output does contain all the specified fields, and in the right order, but the field delimiters are not preserved from the formatting string; instead, all the fields are quoted and separated by spaces. That's not quite what we want, especially if you plan to feed the logs into a log analyzer that expects the usual Apache syntax.

While getting the Confluence wiki to run on GlassFish v3, I fetched the GF source code, and since I already had it, I thought that it should be trivial to find out how the access log format gets processed in GF. To my big surprise, I found out that there are classes with very suspicious names: CommonAccessLogFormatterImpl, CombinedAccessLogFormatterImpl and DefaultAccessLogFormatterImpl. A minute later I also found this piece of code "hidden" in PEAccessLogValve:

// Predefined patterns
private static final String COMMON_PATTERN = "common";
private static final String COMBINED_PATTERN = "combined";
...

/**
 * Set the format pattern, first translating any recognized alias.
 *
 * @param p The new pattern
 */
public void setPattern(String p) {
    if (COMMON_PATTERN.equalsIgnoreCase(p)) {
        formatter = new CommonAccessLogFormatterImpl();
    } else if (COMBINED_PATTERN.equalsIgnoreCase(p)) {
        formatter = new CombinedAccessLogFormatterImpl();
    } else {
        formatter = new DefaultAccessLogFormatterImpl(p, getContainer());
    }
}

Whoa! So both Apache formats are implemented already and one just needs to know how to "unlock" them. The "common" and "combined" constants looked like the magic keywords to do just that, and sure enough, when one sets either of them as the formatting string, the log will contain the expected output. You can also use asadmin to make this config change:

asadmin set server.http-service.access-log.format="combined"

After a restart the log now uses the requested format:

0:0:0:0:0:0:0:1%0 - - [21/Dec/2009:07:42:45 -0800] "GET /s/1722/3/_/images/icons/star_grey.gif HTTP/1.1" 304 0
0:0:0:0:0:0:0:1%0 - - [21/Dec/2009:07:42:45 -0800] "GET /images/icons/add_space_32.gif HTTP/1.1" 304 0
0:0:0:0:0:0:0:1%0 - - [21/Dec/2009:07:42:45 -0800] "GET /images/icons/feed_wizard.gif HTTP/1.1" 304 0
0:0:0:0:0:0:0:1%0 - - [21/Dec/2009:07:42:45 -0800] "GET /images/icons/people_directory_32.gif HTTP/1.1" 304 0
0:0:0:0:0:0:0:1%0 - - [21/Dec/2009:07:42:45 -0800] "GET /s/1722/3/_/images/icons/add_12.gif HTTP/1.1" 304 0

Believe it or not, this information is not documented anywhere in the official documentation, and even folks Matthew chatted with on Sun's internal support mailing lists had no clue about it. Ideally the GF documentation and UI should be updated to make changing the access log format as simple as it should be. Oh, btw, open source FTW: when documentation is lacking, one can at least read the sources![...]
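If you prefer to script the change and double-check it, a minimal sketch (assuming asadmin is on the PATH and you are targeting the default domain) could be:

# switch the access log to the predefined Apache "combined" alias
asadmin set server.http-service.access-log.format="combined"

# read the setting back to confirm it took effect
asadmin get server.http-service.access-log.format

# the new formatter is only picked up after a restart
# (asadmin restart-domain on GF v3; stop-domain followed by start-domain on v2)
asadmin restart-domain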



Running Confluence on GlassFish v3

2010-10-24T18:35:44.179-07:00

Even though Atlassian considers GlassFish to be an unsupported servlet container for Confluence, it is quite easy to use Confluence with GlassFish v2.1. In fact that's the container that I've been using for a long time during my Confluence and Confluence plugin development.

I've been monitoring progress of GlassFish v3 development for several months and noticed that at some point Confluence 2.x and 3.0.x stopped working due to conflicts between different versions of Apache Felix used by both GFv3 and Confluence.

Fortunately Confluence 3.1 now contains Felix v2.x (an upgrade from 1.x), which solves the previously mentioned issues. Excited about the change, I tried to deploy Confluence 3.1 to GFv3 (final) and observed that there are a few more issues that one needs to deal with. I filed these two bugs and one RFE against Confluence and provided patches that anyone can use to get Confluence to run with GFv3:

Once the patches are applied to the Confluence source code, build a war file in the usual way, deploy it to GFv3 and you should be good to go. (There is a harmless exception thrown when Confluence starts, more info, just ignore it)
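For reference, a minimal sketch of that build-and-deploy step could look like the following; the source path, war path and context root are illustrative, and I'm assuming a standard Maven build of the patched Confluence sources:

# build the war from the patched Confluence source tree
cd ~/src/confluence-3.1
mvn -DskipTests package

# deploy the resulting war to a running GlassFish v3 domain
asadmin deploy --contextroot /wiki ~/src/confluence-3.1/target/confluence-3.1.war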

Oh, and be sure to vote for CONF-6603 to get Atlassian to officially support GlassFish.



Using Mercurial Bisect to Find Bugs

2010-10-24T18:36:40.857-07:00

Yesterday I tried to find a regression bug in Grizzly that was preventing grizzly-sendfile from using blocking IO. I knew that the bug was not present in grizzly 1.9.15, but somewhere between that release and the current head someone introduced a changeset that broke things for me. Here is how I found out who that person was.

Grizzly is unfortunately still stuck with subversion, so the only thing (besides complaining) that I can do to make my life easier, is to convert the grizzly svn repo to some sane SCM, such as mercurial. I used hgsvn to convert the svn repo.

Once I had a mercurial repo, I wrote BlockingIoAsyncTest - a JUnit test for the bug. And that was all I needed to run bisect:
$ echo '#!/bin/bash                                                                          
mvn clean test -Dtest=com.sun.grizzly.http.BlockingIoAsyncTest' > test.sh
$ chmod +x test.sh
$ hg bisect --reset #clean repo from any previous bisect run
$ hg bisect --good 375 #specify the last good revision
$ hg bisect --bad tip #specify a known bad revision
Testing changeset 604:82e43b848ae7 (458 changesets remaining, ~8 tests)
517 files updated, 0 files merged, 158 files removed, 0 files unresolved
$ hg bisect --command ./test.sh #run the automated bisect
...
(output from the test)
...
...
...

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 9 seconds
[INFO] Finished at: Sat Nov 14 11:41:07 PST 2009
[INFO] Final Memory: 25M/79M
[INFO] ------------------------------------------------------------------------
Changeset 500:983a3fc2debe: good
The first good revision is:
changeset:   501:b5239bf9427b
branch:      code
tag:         svn.3343
user:        jfarcand
date:        Wed Jun 17 17:20:32 2009 -0700
summary:     [svn r3343] Fix for https://grizzly.dev.java.net/issues/show_bug.cgi?id=672
In under two minutes I found out who caused all the trouble, and with which revision! Bisect is very useful for large projects developed by multiple people, where the amount of code and changes is not trivial. Finding regressions this way can save a lot of time otherwise spent debugging.

The only caveat I noticed is that you need to create a shell script that is passed in as an argument to the bisect command. It would be a lot easier if I could just specify the maven command directly without the intermediate shell script.



grizzly-sendfile to Become an Official Grizzly Module

2010-10-24T18:37:55.442-07:00

After a chat with JFA about grizzly-sendfile's future, I'm pleased to announce today that grizzly-sendfile 0.4 will be the first version of grizzly-sendfile released as an official module of grizzly. This is huge news for grizzly-sendfile and, I believe, equally important news for grizzly and its community.

What this "merger" means for grizzly-sendfile:

  • great opportunity to extend the reach
  • opportunity to become the default static file handler in Grizzly
  • aspiration to become the default static file handler in GlassFish v3
  • more testing and QA
  • easier and faster access to grizzly developers and contributors

What this "merger" means for grizzly:

  • contribution of 1 year of research, development and testing time in the area of static http downloads
  • several times better performance and scalability of http static file downloads
  • built-in X-Sendfile functionality
  • better JMX instrumentation for http downloads
  • and more

If you can't wait for 0.4, go and get recently released version 0.3.

This is a great day for both projects! :-)

Project site: http://grizzly-sendfile.kenai.com/



grizzly-sendfile 0.3 is out!

2010-10-24T18:38:49.935-07:00

After a few months of late night hacking, grizzly-sendfile 0.3 is finally ready for prime time!

New features include:

I also started using kenai's JIRA for issue tracking. So feel free to file bugs or RFE's there.

Benchmark & Enjoy!

Project Website | The source code | The binaries



grizzly-sendfile and Comparison of Blocking and NonBlocking IO

2010-10-24T18:47:01.171-07:00

From the very early beginnings of my work on grizzly-sendfile (intro) I was curious to compare blocking and non-blocking IO side by side. Since I didn't have any practical experience to understand which one would be more suitable when, I designed grizzly-sendfile to be flexible, so that I could try different strategies and come to conclusions based on some real testing rather than theorizing or relying on the words of others. In this post I'd like to compare blocking and non-blocking IO, benchmark them, and draw some conclusions as to which one is more suitable for specific situations.

grizzly-sendfile has a notion of algorithms that control the IO operations responsible for writing data to a SocketChannel (grizzly-sendfile is based on NIO and leverages lots of great work put into grizzly). Different algorithms can do this in different ways, and this is explained in depth on the project wiki. The point is that this allows me to create algorithms that use blocking or non-blocking IO in different ways, and easily swap them and compare their performance (in a very isolated manner). Two algorithms I implemented right away were SimpleBlockingAlgorithm (SBA) and EqualNonBlockingAlgorithm (ENBA), only recently followed by EqualBlockingAlgorithm (EBA). The first one employs the traditional approach of sending a file via the network (while not EOF, write data to a SocketChannel using blocking writes), while ENBA uses non-blocking writes and Selector re-registration (in place of blocking) to achieve the same task. This means that a download is split into smaller parts, each sequentially streamed by an assigned worker thread to the client. EBA works very similarly to ENBA, but uses blocking writes.

I ran two variations of my faban benchmark[0] against these three algorithms. At first I made my simulated clients hit the server as often as possible and download files of different sizes as quickly as possible. Afterward I throttled the file download speed to 1MB/s per client (throttling was done by the clients). While the first benchmark simulates traffic close to that of a private network in a datacenter, the second better represents client/server traffic on the Internet.

grizzly-sendfile delegates the execution of the selected algorithm to a pool of worker threads, so the maximum number of threads in the pool, along with the selected algorithm, is one of the major factors that affect the performance[1] and scalability[2] of the server. In my tests I kept the pool size relatively small (50 threads), in order to easily simulate situations when there are more concurrent requests than the number of workers, which is common during traffic spikes.

Unthrottled downloads:

Conc. clients | Download limit | Algorithm | Avg init time[3] (sec) | Avg speed[4] (MB/s) | Avg total throughput (MB/s)
 50 | none | SBA  |  0.019 | 4.36 | 208.76
 50 | none | ENBA |  0.021 | 4.15 | 198.79
 50 | none | EBA  |  0.018 | 4.23 | 202.29
100 | none | SBA  |  4.666 | 4.32 | 212.79
100 | none | ENBA |  0.048 | 1.84 | 168.15
100 | none | EBA  |  0.140 | 1.96 | 175.71
200 | none | SBA  | 14.288 | 4.31 | 208.59
200 | none | ENBA |  0.108 | 0.87 | 144.69
200 | none | EBA  |  0.264 | 0.97 | 158.83

Downloads throttled to 1MB/s per client:

Conc. clients | Download limit | Algorithm | Avg init time[3] (sec) | Avg speed[4] (MB/s) | Avg total throughput (MB/s)
 50 | 1MB/s | SBA          |   0.003 | 1.0  |  42.9
 50 | 1MB/s | ENBA         |   0.002 | 0.98 |  41.82
 50 | 1MB/s | EBA (81k)[5] |   0.003 | 0.98 |  41.91
 50 | 1MB/s | EBA (40k)[6] |   0.002 | 0.98 |  42.85
100 | 1MB/s | SBA          |  20.5   | 1.0  |  40.4
100 | 1MB/s | ENBA         |   0.003 | 0.99 |  85.14
100 | 1MB/s | EBA (81k)    |   0.018 | 1.0  |  84.19
100 | 1MB/s | EBA (40k)    |   0.013 | 0.99 |  84.34
200 | 1MB/s | SBA          |  64.8   | 1.0  |  37.12
200 | 1MB/s | ENBA         |   0.112 | 0.86 | 141.8
200 | 1MB/s | EBA (81k)    |   0.2   | 0.95 | 156.59
200 | 1MB/s | EBA (40k)    |   0.159 | 0.96 | 154.2
300 | 1MB/s | SBA          | 113.9   | 1.0  |  34.2
300 | 1MB/s | ENBA         |   0.185 | 0.58 | 127.53
300 | 1MB/s | EBA (81k)    |   0.31  | 0.61 | 133.66
300 | 1MB/s | EBA (40k)    |   0.239 | 0.63 | 132.75

The interpretation of the results is that with SBA the individual download spe[...]



Identifying ThreadLocal Memory Leaks in JavaEE Web Apps

2010-10-24T18:41:17.706-07:00

A few weeks ago wikis.sun.com, powered by Confluence "Enterprise" Wiki, grew beyond yet another invisible line that triggered intermittent instabilities. Oh boy, how I love these moments. This time the issue was that Confluence just kept on running out of memory. Increasing the heap didn't help; even breaking the 32-bit barrier and using a 64-bit JVM was not good enough to keep the app running for more than 24 hours. The Xmx size of the heap suggested that something was out of order. It was time to take a heap dump using jmap and check what was consuming so much memory.

I tried jhat to analyze the heap dump, but the 3.5GB dump was just too much for it. The next tool I used was IBM's Heap Analyzer - a decent tool, which was able to read the dump, but consumed a lot of memory in order to do so (~8GB), and was pretty hard to use once the dump was processed. While looking for more heap analyzing tools, I found SAP Memory Analyzer, now known as Eclipse Memory Analyzer, a.k.a. MAT. I thought "What the heck does SAP know about JVM?" and reluctantly gave it a try, only to find out how prejudiced I was. MAT is a really wonderful tool, which was able to process the heap really quickly, visualize the heap in an easy-to-navigate way, and use special algorithms to find suspicious memory regions, all of that while using only ~2GB of memory. An excellent preso that walks through MAT features and how heap and memory leaks work can be found here.

Thanks to MAT I was able to create two bug reports for folks at Atlassian (CONF-14988, CONF-14989). The only feature I missed was some kind of PDF or HTML export, but I did quite well with using Skitch to take screenshots and annotate them. One of the leaks was confirmed right away, while it wasn't clear what was causing the other one. All we knew was that significant amounts of memory were retained via ThreadLocal variables. More debugging was in order.

I got the idea to create a servlet filter that would inspect the thread-local store for the thread currently processing the request, and log any thread-local references that exist before the request is dispatched down the chain and also when it comes back. Such a filter could be packaged as a Confluence Servlet Filter Plugin, so that it is convenient to develop and deploy. There was only one problem with this idea: the thread-local store is a private field of the Thread class and is in fact implemented as an inner class with package-default access - kinda hard to get your hands on. Thankfully private stuff is not necessarily private in Java, if you get your hands dirty with reflection code:

Thread thread = Thread.currentThread();
Field threadLocalsField = Thread.class.getDeclaredField("threadLocals");
threadLocalsField.setAccessible(true);
Class threadLocalMapKlazz = Class.forName("java.lang.ThreadLocal$ThreadLocalMap");
Field tableField = threadLocalMapKlazz.getDeclaredField("table");
tableField.setAccessible(true);
Object table = tableField.get(threadLocalsField.get(thread));

int threadLocalCount = Array.getLength(table);
StringBuilder sb = new StringBuilder();
StringBuilder classSb = new StringBuilder();
int leakCount = 0;

for (int i = 0; i < threadLocalCount; i++) {
    Object entry = Array.get(table, i);
    if (entry != null) {
        Field valueField = entry.getClass().getDeclaredField("value");
        valueField.setAccessible(true);
        Object value = valueField.get(entry);
        if (value != null) {
            classSb.append(value.getClass().getName()).append(", ");
        } else {
            classSb.append("null, ");
        }
        leakCount++;
    }
}
[...]



Announcing grizzly-sendfile!

2010-10-24T18:42:21.521-07:00

It's my pleasure to finally announce grizzly-sendfile v0.2 - the first stable version of a project that I started after I got one of those "Sudden Burst of Ideas" last summer. For people who follow the grizzly development or mediacast.sun.com, this is not exactly hot news. grizzly-sendfile has been used by mediacast since last September and mentioned on the grizzly mailing list several times since then, but I haven't had time to promote it and explain what it does and how it works, so here I go. If you don't care about my diary notes, skip down to "What is grizzly-sendfile". A bit of background: the whole story goes back to the end of 2007 when a bunch of us where finishing up the rewrite of mediacast.sun.com in JRuby on Rails. At that time we realized that one of the most painful parts of the rewrite would be implementing the file streaming functionality. Back then Rails was single-threaded (not any more, yay!), so sending the data from rails was not an option. Fortunately, my then-colleague Peter, came up with an idea to use a servlet filter to intercept empty download responses from rails and stream the files from this filter. That did the trick for us, but it was a pretty ugly solution that was unreliable from time to time and was PITA to extend and maintain. At around this time, I learned about X-Sendfile - a not well known http header - that some webservers (e.g. apache and ligttpd) support. This header could be used to offload file transfers from an application to the web server. Rails supports it natively via the :x_sendfile option of send_file method. I started looking for the X-Sendfile support in GlassFish, which we have been using at mediacast, but it was missing. After some emails with glassfish and grizzly folks, mainly Jean-Francois, I learned that the core component of glassfish called grizzly could be extended via custom filters, which could implement this functionality. The idea stuck in my head for a few weeks. I looked up some info on grizzly and NIO and then during one overnight drive to San Diego, I designed grizzly-sendfile in my head. It took many nights and a few weekends to get it into reasonable shape and test it under load with some custom faban benchmarks that I had to write, but in late August I had version 0.1 and was able to "sell" it to Rama as a replacement of the servlet filter madness that we were using at mediacast. Except for a few initial bugs that showed up under some unusual circumstances, the 0.1 version was very stable. A few minor 0.1.x releases were followed by 0.2 version, which was installed on mediacast servers some time in November. Since then I've worked on docs and setting up the project at kenai.com. What is grizzly-sendfile?From the wiki: grizzly-sendfile is an extension for grizzly - a NIO framework that among other things powers GlassFish application server. The goal of this extension is to facilitate an efficient file transfer functionality, which would allow applications to delegate file transfers to the application server, while retaining control over which file to send, access control or any other application specific logic. How does it work?By mixing some NIO "magic" and leveraging code of the hard working grizzly team, I was able to come up with an ARP (asynchronous request processing) filter for grizzly. This filter can be easily plugged in to grizzly (and glassfish v2) and will intercept all the responses that contain X-Sendfile header. 
The value of this header is the path of the file that the application that processed the request wants to send to the client. All [...]
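To illustrate the application side of this contract, here is a minimal sketch of a servlet that delegates a download to grizzly-sendfile. The URL path, parameter name and access check are made up for the example:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DownloadServlet extends HttpServlet {

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // hypothetical application-specific access control
        if (request.getRemoteUser() == null) {
            response.sendError(HttpServletResponse.SC_FORBIDDEN);
            return;
        }
        // a real app would validate this parameter instead of trusting it blindly
        String fileName = request.getParameter("file");

        // hand the actual transfer off to the server-side filter: set the header,
        // leave the response body empty and return
        response.setContentType("application/octet-stream");
        response.setHeader("X-Sendfile", "/data/downloads/" + fileName);
    }
}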



Benchmarking JRuby on Rails

2010-10-24T18:43:13.342-07:00

Last night, while working on a project, I found a really neat use of Rails Components, but I also noticed that this part of Rails is deprecated - among other reasons, because it's slow. Well, how slow? During my quest to find out, I collected some interesting data and, even more importantly, put JRuby and MRI Ruby face to face. Disclaimer: the benchmarks were not done on a well-isolated and specially configured test harness, but I did my best to gather data with informational value. All the components were used with OOB settings.

Setup

ruby 1.8.6 (2008-03-03 patchlevel 114) [universal-darwin9.0] + Mongrel Web Server 1.1.4
jruby 1.1.6 (ruby 1.8.6 patchlevel 114) (2008-12-17 rev 8388) [x86_64-java] + GlassFish gem version: 0.9.2
common backend: mysql5 5.0.75 Source distribution (InnoDB table engine, Rails pool set to 30)

Benchmarks

I used the excellent, high quality benchmarking framework Faban for my tests. I was lazy, so I only used fhb (very similar to ab, but without its flaws) to invoke simple benchmarks:

simple request benchmark: bin/fhb -r 60/120/5 -c 10 http://localhost:3000/buckets/1
component request benchmark: bin/fhb -r 60/120/5 -c 10 http://localhost:3000/bucket1/object1

Both tests were run with JRuby as well as with MRI Ruby, and in addition to that I ran the tests with Rails in single-threaded as well as multi-threaded mode. I didn't use mongrel clusters or glassfish pooled instances - there was always only one Ruby instance serving all the requests.

Results

ruby 1.8.6 + mongrel
---------------------------------
simple action + single-threaded:    ops/sec: 210.900  % errors: 0.0  avg. time: 0.047  max time: 0.382  90th %: 0.095
simple action + multi-threaded:     ops/sec: 226.483  % errors: 0.0  avg. time: 0.044  max time: 0.180  90th %: 0.095
component action + single-threaded: ops/sec: 132.950  % errors: 0.0  avg. time: 0.075  max time: 0.214  90th %: 0.130
component action + multi-threaded:  ops/sec: 131.775  % errors: 0.0  avg. time: 0.076  max time: 0.279  90th %: 0.125

jruby 1.1.6 + glassfish gem 0.9.2
----------------------------------
simple action + single-threaded:    ops/sec: 141.417  % errors: 0.0  avg. time: 0.070  max time: 0.259  90th %: 0.115
simple action + multi-threaded:     ops/sec: 247.333  % errors: 0.0  avg. time: 0.040  max time: 0.318  90th %: 0.065
component action + single-threaded: ops/sec: 107.858  % errors: 0.0  avg. time: 0.092  max time: 0.595  90th %: 0.145
component action + multi-threaded:  ops/sec: 179.042  % errors: 0.0  avg. time: 0.055  max time: 0.357  90th %: 0.085

Platform/Action   Simple    +/-       Component   +/-
Ruby ST           210 ops   0%        132 ops     0%
Ruby MT           226 ops   7.62%     131 ops     -0.76%
JRuby ST          141 ops   -32.86%   107 ops     -18.94%
JRuby MT          247 ops   17.62%    179 ops     35.61%
(ST - single-threaded; MT - multi-threaded)

Conclusion

From my tests it appears that MRI is faster in single-threaded mode, but JRuby makes up for the loss big time in the multi-threaded tests. It's also interesting to see that the multi-threaded mode gives MRI (green threads) a performance boost, but it's nowhere close to the boost that JRuby (native threads) can squeeze out of using multiple threads. During the tests I noticed that Rails was reporting more time spent in the db when using JRuby (2-80ms) compared to MRI (1-3ms). I don't know how reliable this data is, but I wonder if this is the bottleneck that is holding JRuby back in single-threaded mode.[...]



Using ZFS with Mac OS X 10.5

2010-10-24T18:44:37.098-07:00

A few days ago I got a new MacBook Pro. While waiting for it to be delivered, I started thinking about how I want to lay out the installation of the OS. For a long, long time I've wanted to try the ZFS file system on a Mac, and this looked like a wonderful opportunity. Getting rid of HFS+, which was causing me lots of problems (especially its case-insensitive incarnation), sounds like a dream come true. If you've never heard of ZFS before, check out this good 5min screencast of some of the important features.

A brief google search revealed that there are several people using and developing ZFS for the Mac. There is a Mac ZFS porting project at http://zfs.macosforge.org and I found a lot of good info at AlBlue's blog. Some noteworthy info:

- The current ZFS port (build 119) is based on the ZFS code that shipped with Solaris build 72
- It's currently not possible to boot Mac OS X from a ZFS filesystem
- Finder integration is not perfect yet - Finder lists a ZFS pool as an unmountable drive under devices
- There are several reports of kernel panics, most of which appeared in connection with the use of cheap external USB disks (I haven't experienced any)
- There are a bunch of minor issues, which I'm sure will eventually go away

None of the above was a show stopper for me, so I went ahead with the installation. My plan was simple - repartition the internal hard drive into a small bootable partition and a large partition used by ZFS, which will hold my home directory and other filesystems.

Install ZFS

Even though Mac OS X 10.5 comes with ZFS support, it's read-only. In order to really use ZFS, the full ZFS implementation must be installed. The installation is very simple and can be done by following these instructions: http://zfs.macosforge.org/trac/wiki/downloads. Alternatively, AlBlue created a fancy installer for the lazy ones out there.

Repartition Disk

Once ZFS was installed and the OS rebooted, I could repartition the internal disk. If you are using an external hard drive, you'll most likely need to use the zpool command instead. First, let's check what the disk looks like:

$ diskutil list /dev/disk0
   #:                     TYPE NAME    SIZE       IDENTIFIER
   0:    GUID_partition_scheme         *298.1 Gi  disk0
   1:                      EFI         200.0 Mi   disk0s1
   2:                Apple_HFS boot    297.8 Gi   disk0s2

Good, the internal disk was identified as /dev/disk0 and it currently contains an EFI (boot) slice and a ~300GB data slice/partition. Let's repartition the disk so that it contains two data partitions.

$ sudo diskutil resizeVolume disk0s2 40G ZFS tank 257G
Password:
Started resizing on disk disk0s2 boot
Verifying
Resizing Volume
Adjusting Partitions
Formatting new partitions
Formatting disk0s3 as ZFS File System with name tank
[ + 0%..10%..20%..30%..40%..50%..60%..70%..80%..90%..100% ]
Finished resizing on disk disk0
/dev/disk0
   #:                     TYPE NAME    SIZE       IDENTIFIER
   0:    GUID_partition_scheme         *298.1 Gi  disk0
   1:                      EFI         200.0 Mi   disk0s1
   2:                Apple_HFS boot    39.9 Gi    disk0s2
   3:                      ZFS tank    252.0 Gi   disk0s3

Great, the disk was repartitioned: the existing data partition, which I call boot, was resized into a smaller 40GB partition and the extra space was used to create a ZFS pool called tank. Btw all the data on the boot pa[...]



Freezing activerecord-jdbc Gems into a Rails Project

2010-10-24T18:46:06.146-07:00

Over the Christmas break a Slovak friend of mine (Hi Martin!) asked me to build a simple book library management app for a school in the Philippines where he's been volunteering for the past year. I thought to myself that if someone can volunteer one year of his life in such an amazing way, I can spend a few hours to help him out too. Since it was obvious from his description that he was looking for a low-maintenance solution, I thought that a Rails application with an embedded database would be a good choice. I had worked with derby (JavaDB) in the past and I knew that derby drivers were already available as an active-record adapter gem, so I expected it to be pretty simple to set up a dev environment using Rails, JRuby, and an embedded derby db. Surprisingly, there were a few issues along the way.

I started by defining the database config in config/database.yml:

development:
  adapter: jdbcderby
  database: db/library_development
  pool: 5
  timeout: 5000
...

The database files for the dev db will be stored under RAILS_ROOT/db/library_development.

Secondly, I specified the gem dependency in config/environment.rb (you gotta love this Rails 2.1+ feature):

Rails::Initializer.run do |config|
  ...
  config.gem "activerecord-jdbcderby-adapter",
             :version => '0.9',
             :lib => 'active_record/connection_adapters/jdbcderby_adapter'
  ...
end

Note that you must specify the :lib parameter, otherwise Rails won't be able to initialize the gem and you'll end up with:

no such file to load -- activerecord-jdbcderby-adapter

So far so good. Now let's install the gems we depend on:

$ jruby -S rake gems:install
(in /Users/me3x/Development/library)
rake aborted!
Please install the jdbcderby adapter: `gem install activerecord-jdbcderby-adapter` (no such file to load -- active_record/connection_adapters/jdbcderby_adapter)
(See full trace by running task with --trace)

Huh? I asked rake to install gems and I get an error that I need to install gems first? It turns out that this error comes from ActiveRecord, which tries to initialize the db according to database.yml, and only then does environment.rb get read. Ok, so let's install the db dependencies manually:

$ sudo jruby -S gem install activerecord-jdbcderby-adapter
Password:
JRuby limited openssl loaded. gem install jruby-openssl for full support.
http://wiki.jruby.org/wiki/JRuby_Builtin_OpenSSL
Successfully installed activerecord-jdbc-adapter-0.9
Successfully installed jdbc-derby-10.3.2.1
Successfully installed activerecord-jdbcderby-adapter-0.9
3 gems installed
Installing ri documentation for activerecord-jdbc-adapter-0.9...
Installing ri documentation for jdbc-derby-10.3.2.1...
Installing ri documentation for activerecord-jdbcderby-adapter-0.9...
Installing RDoc documentation for activerecord-jdbc-adapter-0.9...
Installing RDoc documentation for jdbc-derby-10.3.2.1...
Installing RDoc documentation for activerecord-jdbcderby-adapter-0.9...

Cool, let's check if all the dependencies are available:

$ jruby -S rake gems
(in /Users/me3x/Development/library)
- [I] activerecord-jdbcderby-adapter = 0.9
   - [I] activerecord-jdbc-adapter = 0.9
   - [I] jdbc-derby = 10.3.2.1

I = Installed
F = Frozen
R = Framework (loaded before rails starts)

Yay, all dependencies are installed. In the past, when dependencies couldn't be declared in environment.rb, I found developing with frozen rails and gems much more manageable, especially when the app is being developed by more than one person. This also made for fewer deployment surprises. With the config.gem defi[...]



How to Install a Glassfish Patch

2010-10-24T19:18:23.439-07:00

Recently I've been working quite extensively with glassfish and needed to apply a patch to some of the core code.

To my big surprise, I was not able to find an official way to apply patches to a glassfish installation. I found several patches posted on the issue tracker, and paying customers have access to tested and supported patch bundles, which are released outside of the regular release cycle. But even some extensive googling didn't easily reveal how to apply them.

Then luckily I found this discussion on the glassfish mailing list, which describes the process.

To make life easier for others (and for google to index this information), here is a brief recap:
  1. Create a lib/patches directory under your glassfish installation root (${com.sun.aas.installRoot})
  2. Copy the jar with your patch into this directory
  3. Edit the domain.xml of your domain (domains/<domain_name>/config/domain.xml) and add the attribute classpath-prefix="${com.sun.aas.installRoot}/lib/patches/" to the java-config node (see the fragment after these steps)
  4. Restart your domain
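For illustration, the relevant fragment of domain.xml could end up looking roughly like this (the other attributes and child elements of the java-config node vary per installation, so treat it as a sketch rather than a copy-paste template):

<java-config classpath-prefix="${com.sun.aas.installRoot}/lib/patches/" ...>
    <!-- existing jvm-options and other settings stay as they are -->
</java-config>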
I tested this with Glassfish v2; I'll have to check if it works with the upcoming v3 as well.



My Confluence 3.0 Wishlist

2010-10-24T19:05:52.657-07:00

Confluence 2.9 was released last month and I've seen references to 2.10 in the Confluence issue tracker, so I expect to see it out in 1-2 months. That makes me think about what's next. As part of my adventures working on Sun's external wiki, wikis.sun.com, I've been working on Confluence plugins and even the Confluence core code for a year and a half now, adding new features, enhancing existing ones and very often fixing bugs. Sometimes it was trivial to enhance the code or fix a bug, other times it was not, but what I want to write about today are the things that were not possible at all without irreversibly forking the code. Confluence 3.0 should be a version that really deserves to have the first digit incremented - not because marketing said it's time for that, but because the changes in the application are so significant. I'm sure that Atlassian has lots of ideas about what Confluence 3.0 should look like, but Atlassian guys, in case you start to run out of ideas, here is my wish list:

Fix the Database Schema

Confluence has been in development for years and the database schema definitely shows that. Since the database is the heart of the application, I think it deserves a lot of attention, and a major performance boost could be gained by cleaning it up. Specific improvements:

- Establish and in the future enforce naming conventions
- Replace all the natural foreign keys with surrogate keys, e.g. user name, space key and group name should be replaced with ids in all the referencing tables (this would finally allow CONF-4063 to be implemented)
- Add caches for the lower function (patch) and maybe counter caches

Rework the Clustering

Clustering is usually supposed to fulfill two functions: scalability and robustness. In the case of Confluence, mainly the second attribute is missing. In fact, I'd go as far as saying that a Confluence cluster is less robust than a single instance of Confluence. Why? Because the way it is implemented makes the entire cluster vulnerable when one node has problems. I personally experienced several cluster lock-ups or crashes, usually initiated by a separate Confluence bug, in which the effect was multiplied by the clustering code. One such bug: CONF-12319. Mike's presentation covers quite a few design goals behind the clustering implementation in Confluence. Clustering can really get ugly and complicated, and Mike covered it pretty well. Unfortunately, the distributed-share part of the clustering makes Confluence prone to problems. One of the clustering goals that Mike emphasizes in his presentation is that clustering should be "admin-friendly" (low admin overhead and easy setup). While I agree with the low overhead part, the ease of setup should not compromise the goals which clustering is trying to fulfill in the first place. Clustering is for people who are serious about running Confluence, and as such they should be expected to be qualified for the job. Specific improvements:

- Either reevaluate the distributed-share clustering so that it is super robust, or consider implementing clustering via a centralized share
- Avoid shutting down the entire cluster when a "cluster panic" is detected. A better solution, which avoids unnecessary downtime, would be to shut down all the nodes except for the nodes properly clustered with the oldest node.

Clean Up the HTML and CSS Code

The html code that comes out of Confluence is horrendous. While the rendered output looks pretty pleasant, looki[...]



Unintrusive but Secure Passwordless SSH Authentication

2010-10-24T19:06:57.558-07:00

On a daily basis I need to log in to many remote servers inside or outside of Sun via SSH, often dozens of times per day. This can get pretty tiresome if you need to type in your password with every login. Some suggest setting up so-called "passwordless" authentication by generating ssh keys and specifying an empty passphrase for the private key. This will result in passwordless authentication, but it will also decrease security: should anyone get hold of your private key, (s)he'll get access to all of your remote systems.

ssh-agent can help a lot in keeping the security level high and minimizing the number of times you need to type in the password. However, if you use a terminal with tabs or use both local and remote terminals on your workstation, you'll end up running many ssh-agent processes and having to authenticate every time you start such a process, which diminishes most of the convenience of using ssh-agent. Frustrated with this situation, and with a bit of help from Martin, I created a shell script, which I added to my .bash_profile startup script. All I have to do now is authenticate when my first terminal session starts and I'm good until the next time I restart my OS. sweeeet...

Here is how you could set it up on a workstation and a remote-server. First, if you haven't generated your private/public ssh key pair, do that now:

workstation $ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/user/.ssh/id_rsa):
Enter passphrase (empty for no passphrase): **********************
Enter same passphrase again: **********************
Your identification has been saved in id_rsa.
Your public key has been saved in id_rsa.pub.
The key fingerprint is:
01:a6:95:23:1c:74:53:c7:f4:87:07:a2:50:ef:99:16 user@Computer.local

Now make sure that the file system permissions are set up correctly:

workstation $ cd ~/.ssh/
workstation $ ls -l
total 56
-rw-------  1 user  staff  1743 Aug 31 00:13 id_rsa
-rw-r--r--  1 user  staff   398 Aug 31 00:13 id_rsa.pub

The id_rsa file must be readable only by its owner; if this is not the case, the key will be ignored. On the remote server you need to authorize your newly generated key pair by appending its public key to the ~/.ssh/authorized_keys file under your remote home directory:

workstation $ cat ~/.ssh/id_rsa.pub | ssh user@remote-server 'sh -c "cat - >> ~/.ssh/authorized_keys"'

Now you can try to log in:

workstation $ ssh user@remote-server
Enter passphrase for /Users/user/.ssh/id_rsa: **********************
Identity added: /Users/user/.ssh/id_rsa (/Users/user/.ssh/id_rsa)
Last login: Thu Sep 11 20:19:19 2008
remote-server $

If you open a new tab in your terminal and try to log in again, you'll be asked to enter the passphrase yet again. This is where my script becomes useful. First, download the script from Mediacast: ssh-agent-init.sh and store it somewhere in your home directory:

workstation $ mkdir ~/bin
workstation $ cd ~/bin
workstation $ wget http://mediacast.sun.com/users/IgorMinar/media/ssh-agent-init.sh
workstation $ chmod u+x ~/bin/ssh-agent-init.sh

The next (and last) step is optional - if you want to start the script manually, you can skip it. I wanted to have the script automatically invoked when I start my terminal for the first time in interactive mode. All I needed to modify were my .bash_profile (used for interactive sessions) and .bashrc (used for non-interactive sessions) startup scripts for my bash shell (modifications are in italics):[...]