PHP Everywhere - By John Lim
http://phplens.com/phpeverywhere/?q=node/feed/1

ADOdb 5.12 Released

Sun, 24 Jul 2011 04:11:17 -0400

After a long pause, a new version. Having young children and a busy work schedule have been keeping me occupied.

Mostly bug-fixes. Some minor enhancements for Oracle performance monitoring.

Click here to Download.

Regards, John Lim


Postgres: Added information_schema support for postgresql.

Postgres: Use pg_unescape_bytea() in _decode.

oci8: Fix bulk binding with oci8. http://phplens.com/lens/lensforum/msgs.php?id=18786

oci8 perf: added wait event monitoring. Also, db cache advice now handles multiple buffer pools properly.

sessions2: Fixed setFetchMode problem.

sqlite: Some DSN connection settings were not parsed correctly.

mysqli: GetOne now obeys $ADODB_GETONE_EOF.

memcache: compress option did not work. Fixed. See http://phplens.com/lens/lensforum/msgs.php?id=18899




Moving to PHP 5.3

Fri, 07 Jan 2011 07:25:12 -0500

Now that PHP 5.2 has reached end of life, we are starting to migrate to PHP 5.3.

Here are some of my experiences with our code:

  • The function session_register() is now deprecated. We have created a wrapper function called _session_register with the same functionality.
    
    //-------------------------------------------
    // PHP 5.3-compatible version of session_register
    function _session_register($v)
    {
    	global $$v;

    	if (!isset($_SESSION[$v])) {
    		if (!isset($$v)) $$v = null;
    		$_SESSION[$v] =& $$v;
    	} else {
    		$$v =& $_SESSION[$v];
    	}
    }
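A quick standalone sketch of how the wrapper behaves (the wrapper is repeated here so the snippet runs on its own; the variable name is illustrative, and in a real app $_SESSION would be populated by session_start()):

```php
<?php
// PHP 5.3-compatible replacement for session_register(), as above.
function _session_register($v)
{
    global $$v;
    if (!isset($_SESSION[$v])) {
        if (!isset($$v)) $$v = null;
        $_SESSION[$v] =& $$v;
    } else {
        $$v =& $_SESSION[$v];
    }
}

$_SESSION = array();           // stand-in for session_start() on the CLI
$username = 'john';
_session_register('username'); // $_SESSION['username'] now references $username
$username = 'lim';
echo $_SESSION['username'];    // prints "lim" -- the reference is preserved
```

Because the wrapper binds by reference, later assignments to the global variable are visible in $_SESSION, matching the old session_register() semantics.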
    
  • Legacy PHP4 code with
         $obj = &new ClassName();
    
    has to be converted to
         $obj = new ClassName();
    
  • Lots of other functions have been deprecated, including split(), ereg(), etc. See PHP 5.3 Manual, Deprecated Features.

    Upgrading Zend Server from PHP 5.2 to PHP 5.3

    Backup your php.ini file. Then run the following commands:
    yum remove php-*
    yum remove mod-php-*
    yum install zend-server-php-5.3
    

    After this, you will need to restore the settings in your php.ini file.

    Hiding Deprecated Warnings

    Set your error_reporting to
    error_reporting(E_ALL & ~E_DEPRECATED);
    



The fine art of application virtualization

Tue, 31 Aug 2010 11:23:40 -0400

The new 8-core Intel Xeon 7550 processors are extremely powerful and a good platform for virtualised applications. My company is setting up a PHP application running on Xen-based virtualisation on two HP ProLiant DL580s, for a total of 64 cores, in a high availability environment.

Why Virtualise?

First let's investigate why virtualisation is attractive. The advantages are:

  • Simplified maintenance, as all software is running on virtual machines (VMs). It is easy to stop and start VMs from the VM server console.
  • High availability is easier to achieve, as you can just restart the VM on a secondary server if the primary server fails, assuming that the VM is stored in shared storage accessible by all servers, using shared storage methods such as NFS, iSCSI or a SAN (Storage Area Network).
  • Able to optimise server hardware utilisation globally, as CPUs, memory, hard disk and network resources are all shared.

The disadvantages are, strangely enough, related to the advantages:

  • The maintenance is simplified, but the initial setup is more complex: not merely do you need to set up the hardware and the operating systems, but you have to plan out the virtual environment, such as the virtual network, and make sure that everything is properly sized, as you are buying a few big machines instead of lots of smaller servers.
  • High availability means you need to invest in high quality shared storage to store the shared VMs: typically a SAN, the SAN switches and the Fibre Channel HBA cards to connect to the SAN switches.
  • You need to over-configure the hardware, as there is some overhead in virtualisation, particularly in terms of network I/O and storage I/O. CPU overhead for virtualisation is normally not a concern with modern virtualisation technologies such as VMWare, Xen or Hyper-V.

In this case, our customer was comfortable with virtualisation, as they are big users of IBM AIX Logical Partitions and VMWare. Secondly, they already operate several large Storage Area Networks, and merely had to upgrade the SAN to support us. Lastly, they had the budget to pay for all of this :)

VM Technology

We are using Zend Server CE (PHP 5.2), Apache 2.2 and Oracle 11gR2 running on Red Hat Linux 5.5 and Oracle VM 2.2 (which uses the Xen hypervisor). In the VM world, there are 4 main technologies that are popular:

  • VMWare - the market leader. This company has been doing virtualisation for over 10 years and they have good products.
  • Xen-based products - Xen is an open source technology that arose from research on hypervisors (the bare metal OS that controls all the virtual machines) done at Cambridge University. Today, several companies offer products using Xen, including Red Hat, Citrix and Oracle. It has good support for Linux and Windows.
  • KVM - many kernel hackers were dissatisfied with Xen (because the Xen hypervisor is not Linux-based internally), resulting in the development of KVM, which is fully Linux-based. KVM is not as mature as Xen, but is improving really fast. It supports Linux and Windows.
  • Hyper-V - Microsoft's virtualisation technology. Microsoft is committed to supporting Linux also.

We chose Oracle VM (Xen) because it has good performance with Linux and Oracle databases (naturally). Oracle VM is free, with optional paid support available.

There are also licensing advantages to using Oracle VM with Oracle databases. Oracle database licensing dictates that if you are using virtualisation technologies such as VMWare and are purchasing CPU licenses, you still have to pay database licenses for all the CPU cores of the server, even if the database VM is using only 1 core. However, if you are using Oracle VM, then you only have to pay for the CPU cores you use; this is known as "hard-partition" licensing in Oracle terminology.

I also have experience with VMWare, and can recommend it as a mature alternative. Apparently the Citrix Xen Server is a good product, but I don't have experience with it. I will cover more technical details in part 2, which I will be writing in September after I complete this i[...]



Updated Optimizing PHP Article

Fri, 06 Nov 2009 02:52:50 -0500

I have just updated my popular Optimizing PHP article with additional information on caching. I discuss memcache and Squid. I also updated the section on PHP accelerators and changed the tone of some parts of the article. I quote:

Perhaps the most significant change to PHP performance I have experienced since I first wrote this article is my use of Squid, a web accelerator that is able to take over the management of all static http files from Apache. You may be surprised to find that the overhead of using Apache to serve both dynamic PHP and static images, javascript, css, html is extremely high. From my experience, 40-50% of our Apache CPU utilisation is in handling these static files (the remaining CPU usage is taken up by PHP running in Apache).

It is better to offload downloading of these static files by using Squid in httpd-accelerator mode. In this scenario, you use Squid as the front web server on port 80, and set all .php requests to be dispatched to Apache on port 81, which I have running on the same server. The static files are served by Squid.





Malaysian FOSS Conference 2009 Opening Keynote

Sun, 25 Oct 2009 09:12:02 -0400

Last Saturday, I gave the opening keynote of the Malaysian Free & Open Source Software 2009 conference. The speech was prepared the day before, but as usual, I improvised some parts, so what follows has been amended based on memory:

Ladies and gentlemen, honored guests, good morning!

Today the landscape of information technology has been transformed by the vision of free software and open source. The search engines of Google roar with the sounds of open source Linux. Our Malaysian government encourages the use of open source whenever possible. The sounds of PHP, MySQL, Apache and GPL have become familiar names in the tapestry of IT.

But that was not what it was like when I first started out as a young student in the mid-80s at the University of Melbourne, Australia. Things were different then. Concepts such as open source and the GPL were still unknown. I still remember a fellow student being expelled from university for making copies of the source code of proprietary Unix software for his personal use. I admit I was disturbed by this, because I too had an insatiable curiosity about how software worked, and it was impossible to learn more without access to the source code. I wanted to find and understand the wiring inside the software.

I remember fondly, and today with a bit of guilt, that I used to crack copy-protected games, not for the pursuit of profit, but as an intellectual challenge – well ok, I have to admit I did it to play the games. The trick in doing this (cracking), metaphorically, is finding the wiring behind the copy protection and reversing the wires so that instead of refusing to run, it does the opposite and continues working. Of course, to quickly find the right wires to switch and crack a large program is not easy.

Which brings me to the first piece of advice if you want to be successful in software design: you need to have good taste. Which is kind of weird, because nerdy programmers are notoriously bad dressers, fond of bad hair days, and certainly not fussy about the finer points of fine dining. What I'm talking about, of course, is a taste for good logic. The feel of a beautiful idea, the taste of a mighty logic, or the fun in a great hack. Game designers and coders are a talented bunch of people, and if you understand their logical rhythms and designs, it becomes obvious where the wires you need to reverse to crack the software reside.

The other important element of success is being happy. You have to have passion and enjoy what you are doing. To me, cracking games was like cracking walnuts: a fun thing to do, but after a while it got boring. You need to do something with others and share with others to become really passionate about something. Social responsibility is another important element of life. You need to channel your life productively - only then will you find true happiness.

Cracking games became boring and I found other, better diversions. It was around the time my fellow student was expelled that I learned about the international USENET community. To young people: you have to imagine a time before the World Wide Web, when people used the Internet primarily for email. USENET was a fantastic group of mailing lists with forums and archives. USENET was also used to disseminate programming ideas and knowledge in the form of source code. So even before the concepts of Open Source and licenses such as the GPL became well known, there was this thriving community of programmers who shared their source code and learnt from others.

Which brings me to the next lesson: the typical image of the best programmers as lonely introverted hackers is misleading. People are only successful in a community. Open source software needs to be grown organically, and for that you need social skills. The classic example here is of course Linus Torvalds, author of Linux, who has skillfully led the Linux community from its inception.

It was through USENET that I released software that I had written, [...]



Monitoring and logging CPU Utilization of Virtual Machines in Xen

Wed, 07 Oct 2009 06:45:54 -0400

Oct 6 update: Added logging of disk [d] and network [n] info.
Oct 4 update: Added availability option. Now uses xentop internally.
Oct 2 update: Added graphing to xenstat.pl. xenstat.pl now detects Guest VM start/shutdown and resets itself. Number of vcpus also shown. Misc bug fixes.

You can download xenstat.pl here.

Syntax

    perl xenstat.pl [$mode] [$intervalsecs=5] [$nsamples=0] [$urlToPostStats]

Quick Guide

    perl xenstat.pl                              -- generate cpu stats every 5 secs
    perl xenstat.pl 10                           -- generate cpu stats every 10 secs
    perl xenstat.pl 5 2                          -- generate cpu stats every 5 secs, 2 samples
    perl xenstat.pl d 3                          -- generate disk stats every 3 secs
    perl xenstat.pl n 3                          -- generate network stats every 3 secs
    perl xenstat.pl a 5                          -- generate cpu avail (e.g. cpu idle) stats every 5 secs
    perl xenstat.pl 3 1 http://server/log.php    -- gather 3 secs cpu stats and send to URL
    perl xenstat.pl d 4 1 http://server/log.php  -- gather 4 secs disk stats and send to URL
    perl xenstat.pl n 5 1 http://server/log.php  -- gather 5 secs network stats and send to URL

Requires xentop from Xen 3.2 or newer, or xentop backported to Xen 3.1.

Usage

To use, run "perl xenstat.pl" in domain 0. The following output will be generated, with a new statistic generated every 5 seconds:

    [root@server ~]# perl xenstat.pl
    cpus=2
    40_falcon     2.67%     2.51 cpu hrs  in 1.96 days ( 2 vcpu, 2048 M)
    52_python     0.24%   747.57 cpu secs in 1.79 days ( 2 vcpu, 1500 M)
    54_garuda_0   0.44%  2252.32 cpu secs in 2.96 days ( 2 vcpu,  750 M)
    Dom-0         2.24%     9.24 cpu hrs  in 8.59 days ( 2 vcpu,  564 M)

                        40_falc 52_pyth 54_garu   Dom-0    Idle
    2009-10-02 19:31:20     0.1     0.1    82.5    17.3     0.0 *****
    2009-10-02 19:31:25     0.1     0.1    64.0     9.3    26.5 ****
    2009-10-02 19:31:30     0.1     0.0    50.0    49.9     0.0 *****

In the above output, the first few lines summarise the CPUs and running domains. Then we have the statistics generated every 5 seconds. At the end of each line is a simple graph: 5 stars means 90% or over CPU utilisation, 4 stars is 70% or over, etc.

You can also define the interval to poll (in seconds) and the number of samples, just like vmstat:

    [root@server ~]# perl xenstat.pl 3 2
    cpus=2
    40_falcon     2.67%     2.51 cpu hrs  in 1.96 days ( 2 vcpu, 2048 M)
    52_python     0.24%   748.07 cpu secs in 1.79 days ( 2 vcpu, 1500 M)
    54_garuda_0   0.44%  2258.38 cpu secs in 2.96 days ( 2 vcpu,  750 M)
    Dom-0         2.24%     9.24 cpu hrs  in 8.59 days ( 2 vcpu,  564 M)

                        40_falc 52_pyth 54_garu   Dom-0    Idle
    2009-10-01 12:14:59     0.0     0.0     1.7     5.7    92.5
    2009-10-01 12:15:02     0.0     0.0     0.3    10.4    89.3 *
    [root@server ~]#

Logging Using a REST Web Service

To log the CPU utilisation using the Perl script, I didn't want to install a database client in Dom-0. So I added another parameter, a URL to a web server to call with the CPU info as GET parameters. I assume wget is installed in your Dom-0.

    [root@server ~]# perl xenstat.pl 10 1 http://192.168.0.1/
    cpus=2
    54_garuda_0    0.49%  165.81 cpu sec over 3.62 days (2 vcpu,  750 M)
    59_gyrfalcon   0.62%   69.03 cpu sec over 0.80 days (2 vcpu, 2000 M)
    Dom-0          1.57%    2.15 cpu hrs over 3.62 days (2 vcpu,  564 M)
    --10:46:42--  http://192.168.0.1/?54_garuda_0=0.1&59_gyrfalcon=2.1&Dom%2D0=2.2&
    Connecting to 192.168.0.1:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 498 [text/html]
    Saving to: `STDOUT'
    100%[============================================>] 498  --.-K/s  in 0s
    10:46:42 (67.8 MB/s) - `-' saved [498/498]
    2009-09-29 10:46:42     0.1     2.1     2.2    95.6

This will accumulate statistics for 10 seconds, then send them to the above URL in this format: http://192.168.0.1/?54_garuda_0=0.1&59_gyrfalcon=2.1&Dom%2D0=2.2&. This allows you to log the data using a REST-ful web servic[...]



PHP with Oracle RAC

Sat, 04 Jul 2009 21:14:52 -0400

My article on High Performance and Availability with Oracle RAC and PHP is out on the Oracle web site.

It discusses my experiences creating an Oracle Real Application Cluster and running it with PHP 5.2 and oci8 1.3 for a customer. Since writing that article, I have come to recommend that oci8.events be turned off in php.ini, as I've had some reliability issues with this setting.





Boiling down your Computer Science degree into 4000 words

Thu, 18 Jun 2009 23:30:41 -0400

Four thousand words that distill what you really need to understand to build scalable multiuser servers: Server Design by Jeff Darcy.





HTML5 Gaining Ground?

Tue, 02 Jun 2009 08:33:10 -0400

Tim O'Reilly talks about how Google bets big on HTML5. Sadly, without Internet Explorer support, a lot of this talk is moot to me. Not to say that I wouldn't salivate over using the new HTML5 canvas element. We draw a lot of flow chart graphics, and they need to run on as many browsers as possible. We don't use Flash; we use Walter Zorn's excellent jsgraphics library. It's not very fast, but hey, that's why we have multi-gigahertz PCs on our desktops and on our laps.





The State of Solid State Devices for Databases

Tue, 02 Jun 2009 08:34:03 -0400

Recently I read in AnandTech a good article on Solid State Devices (SSD). It certainly blew away many misconceptions I had about SSD.

From a professional point of view, my main interest would be how databases are affected by the following characteristics of SSD:

  • Both sequential and random reads are fast. Reads and writes have a granularity of 4K; in other words, to modify 1 byte, you still have to write 4K.
  • Writes require the block to be erased first. A block is typically 512K. That means if there are no erased (also called trimmed) blocks available, you need to erase first, and erasing is a s-l-o-w operation.
  • You can only erase a block 10,000 times before it stops working.
  • Good quality SSD controllers with onboard caches and highly parallel architectures make a big difference.

From this summary, it appears that the SSD is ideal for relatively static data, and we can selectively put certain parts of the database on SSD. Examples include:

  1. The typical publishing web site, where articles and messages are rarely edited more than a few times.
  2. Systems with large amounts of static data, e.g. multi-player online games such as the EVE Online case study.

For transaction processing systems it depends on the usage. Assuming a table's blocks are updated 10,000 times a day, the distribution of updates is even across all records, and the table fits into 100 512K blocks, then the lifetime of the SSD for those blocks would be 100 days (this might be acceptable if SSD were sufficiently cheap). And even for data warehouse applications with relatively static data, B-tree index rebalancing will cause frequent rewrites of indexes.
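The lifetime arithmetic above can be sketched directly; the numbers below are the assumptions from the paragraph (erase cycles, block count, update rate), not measurements:

```php
<?php
// Rough lifetime estimate for SSD blocks holding a heavily updated table.
$eraseCyclesPerBlock = 10000;  // a block survives ~10,000 erases
$tableBlocks         = 100;    // table fits into 100 x 512K blocks
$updatesPerDay       = 10000;  // updates spread evenly across the table

$erasesPerBlockPerDay = $updatesPerDay / $tableBlocks;            // 100
$lifetimeDays = $eraseCyclesPerBlock / $erasesPerBlockPerDay;     // 100

echo "Estimated lifetime: $lifetimeDays days\n";
```

Doubling the table's block count (or halving the update rate) doubles the estimated lifetime, which is why spreading hot data across more flash helps.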

It also appears that operations characterized by sequential writes such as transaction logging should continue to be placed on hard disks.

For some information on potential database performance, see these articles testing SSDs with MySQL, DB2 and PostgreSQL. Also see Windows 7 Support and Q&A for Solid-State Drives.





Memories are made of Squid...

Fri, 20 Feb 2009 03:53:30 -0500

Apache is not a particularly fast web server. A single Apache server doesn't handle a mix of static and dynamic data particularly well. Ideally, static data such as gifs, pngs and html pages should be cached in memory for quick access; Apache doesn't do this. And the prefork design of Apache (where we have a simple reliable parent process managing multiple worker processes that do the real work) makes it exceptionally robust, but the overhead of having these parent and child processes makes things run slower.

Squid is a web proxy accelerator. What it does is make Apache look good -- downloads magically become faster because of the caching of static files, and the overhead of connection setup and keep-alives is offloaded from Apache. When a request for a .php page is made, Squid will pass the request to Apache and return the results.

Since 2004, we have had a customer running an Intranet system with Apache 1.3, PHP 4.3, Oracle and Squid. This system runs on a 16-CPU Sun E20K mainframe and has over 3000 users logged in every day. A few months ago, we upgraded them to PHP 5.2. While planning for the upgrade, I did some benchmarks and found that PHP 5.2 was about 30-40% faster than PHP 4.3. So I confidently recommended that we disable Squid...

On the morning of the rollout, I saw to my horror the CPU utilisation surge from 50%, to 60%, to 70%, until it hit 98% -- and it stayed there! Only then did I remember that in my testing back in 2004, Squid had improved performance significantly more than an upgrade to PHP 5 ever would. At eleven, we started Squid and modified Apache to listen on port 3000 again. CPU utilisation dropped from 98% to 40%. Squid had saved the day again.

The moral of the story: never underestimate the power of a good squid.


The Server setup is as below, all software running on a single E20K Solaris Server:


  Squid listens on port 80 ---- Apache on port 3000 (http) ---- Apache Children  ---- Oracle 10g
                                 and port 443 (https)            running PHP 5.2

When web browsers use https for login authentication, they connect directly to Apache. When users are accessing normal data on port 80, Squid forwards the request to Apache, which is listening on port 3000. Squid is set up to cache .png, .gif, .jpg, .htm, .html and .css files.
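A minimal squid.conf sketch of this accelerator setup (the directives shown are Squid 2.x httpd-accelerator style; the addresses and the refresh rule are illustrative, not our production config):

```
# Squid answers on port 80 as an accelerator in front of Apache
http_port 80
httpd_accel_host 127.0.0.1     # Apache's address on the same server
httpd_accel_port 3000          # Apache listens here
httpd_accel_single_host on
httpd_accel_with_proxy off

# Cache static files aggressively
# refresh_pattern -i <regex> <min minutes> <percent> <max minutes>
refresh_pattern -i \.(png|gif|jpg|htm|html|css)$ 1440 50% 10080
```

Note that later Squid releases (2.6+) replaced the httpd_accel_* directives with options on http_port (e.g. "http_port 80 accel"), so check the version you are deploying.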




ADOdb Active Record and the art of redesign

Thu, 25 Dec 2008 23:43:50 -0500

Merry Christmas and Happy New Year everyone. Looking forward to the new year as I expect to be a father in January :)

Let's now talk a little bit about parenting and the past: since 2006, ADOdb has supported Active Record, the object-oriented paradigm for processing records using SQL. One of the most powerful features of Active Record is the ability to define parent-child relationships. The old way was:

$person = new person();
$person->HasMany('children', 'person_id');
$person->Load("id=1");

Where "persons" is the parent table, "children" is the child table and "children.person_id" is a field in "children" pointing to "persons". All the children of person with id=1 would be dynamically loaded into the array $person->children when the property was accessed (lazy loading).

This was confusing for the programmer and had many limitations, as was pointed out by Arialdo Martini in this post.

Firstly, it was confusing to the programmer. Should HasMany() be called every time you create a new person()? The answer was no; it's global, but the implementation made it look like it was local to the instance. The HasMany function really should be defined statically, before new person() is used.

Another problem was that you could not override the class of the child objects, so you couldn't modify the behaviour of child objects easily.

My objective was to fix all this, and still keep backward compatibility so your old code continued to work. The good news is that all the metadata to keep track of all the object-table relationships could still be reused. The problem was one of a weak API, but the internals were sound. The solution implemented in ADOdb 5.07 was to create a new set of static functions that override the default behaviour:

ClassHasMany

The new way defines the relationship in a static function, which makes it clearer that it only needs to be called once in your init php code:

class person extends ADOdb_Active_Record{}
ADODB_Active_Record::ClassHasMany('person', 'children','person_id');
$person = new person();
$person->Load("id=1");

TableHasMany

One of the things that I try to do in ADOdb is maintain backward compatibility. You are able to override the defaults of Active Record (id is the primary key; the name of the table is the plural version of the class name). So if the table name of the parent is not "persons" but "people", you can use:

ADODB_Active_Record::TableHasMany('people', 'children','person_id');
$person = new person();
$person->Load("id=1");

TableKeyHasMany

The default primary key name is "id". You can override it (say "pid" is used) using:

ADODB_Active_Record::TableKeyHasMany('people', 'pid', 'children', 'person_id');

Child Class Definition

Formerly, the child class was always an ADODB_Active_Record instance. Now you can derive the class of the children like this:

class childclass extends ADODB_Active_Record {... };
ADODB_Active_Record::ClassHasMany('person', 'children','person_id', 'childclass');

Works the same way with TableHasMany().

Belongs To

Analogously, there are functions ClassBelongsTo, TableBelongsTo, TableKeyBelongsTo for defining child pointing to parent.


Download ADOdb   ADOdb Active Record docs



Divide-and-conquer and parallel processing in PHP

Fri, 20 Feb 2009 03:57:24 -0500

In my previous post Easy Parallel Processing in PHP, I showed you how to implement parallel batch processing using PHP and a web server. In this post, I want to discuss partitioning your tasks so that they become easily parallelized.

The strategy I prefer is divide-and-conquer. This works by recursively breaking down a problem into two or more sub-problems of the same type, until these become simple (and fast) enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.

To illustrate with an example, let's say you have millions of financial payment records in a database that you want to process in parallel using PHP:

  1. First group your data into logical chunks that need to be processed in one transaction. If you are processing payments for lots of accounts, then grouping them by account number makes lots of sense.
  2. Decide on how many parallel child processes you want to run. For this example, assume we are on a single dual core CPU server, so it makes sense to only run two concurrent child processes.
  3. Split all the records by the median account (the median is a statistical term meaning the middle record in a range).
  4. From the parent process, pass records 1 to median-1 to the first child process, and records median to the last to the second child process, as $_GET parameters.
  5. For simplicity's sake, both child processes will run the same code, but receive different parameters. The results of the processing can be either stored in the database, or returned back to the parent process by echo'ing the results in the child process.

To find the median of a set of records in a database, I have extended ADOdb, the popular PHP open source database abstraction library I maintain with the following function defined in the ADOConnection class:

	// Returns the median value of $field in $table, or false on failure.
	// $where, if given, should include the "where" keyword, e.g. "where x > 1".
	function GetMedian($table, $field, $where = '')
	{
		$total = $this->GetOne("select count(*) from $table $where");
		if (!$total) return false;

		$midrow = (integer) ($total/2);
		$rs = $this->SelectLimit("select $field from $table $where order by 1", 1, $midrow);
		if ($rs && !$rs->EOF) return reset($rs->fields);
		return false;
	}

If you have a quad-core CPU, then you can call GetMedian 3 times to break up the data into 4 approximately equal parts, and pass them to 4 child processes:

  $mid = $db->GetMedian('PAYMENTS', 'ACCOUNTNO');
  if (!$mid) return 'Error';
  $lomid = $db->GetMedian('PAYMENTS', 'ACCOUNTNO', "where ACCOUNTNO < $mid");
  $himid = $db->GetMedian('PAYMENTS', 'ACCOUNTNO', "where ACCOUNTNO >= $mid");

The above GetMedian function is not particularly optimal when you need to run it multiple times on the same dataset. Improvements are left to the reader (or to a future blog entry).
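The same partitioning logic can be sketched without a database, using a plain array in place of the PAYMENTS table and a simple sort-based median in place of GetMedian (all names here are illustrative):

```php
<?php
// Divide-and-conquer partitioning on a plain array, mirroring the
// GetMedian approach above: same "middle row" rule, no database needed.
function array_median($vals)
{
    sort($vals);
    return $vals[(int) (count($vals) / 2)];
}

$accounts = range(1, 1000);          // stand-in for the ACCOUNTNO values

$mid   = array_median($accounts);
$lomid = array_median(array_filter($accounts, function ($a) use ($mid) { return $a < $mid; }));
$himid = array_median(array_filter($accounts, function ($a) use ($mid) { return $a >= $mid; }));

// Four roughly equal account ranges to hand to four child processes:
$ranges = array(
    array(1, $lomid - 1),
    array($lomid, $mid - 1),
    array($mid, $himid - 1),
    array($himid, 1000),
);
print_r($ranges);
```

Each range would then be passed to a child job as $_GET parameters, exactly as in step 4 of the list above.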

PS: Another strategy for parallelisation, popularised by Google, is MapReduce.




Easy Parallel Processing in PHP

Fri, 20 Feb 2009 04:05:14 -0500

The proliferation of multicore CPUs and the inability of our learned CPU vendors to squeeze many more GHz into their designs means that often the only way to get additional performance is by writing clever parallel software.

One problem we were having is that some of our batch processing jobs were taking too long to run. In order to speed up the processing, we tried to split the processing file into half, and let a separate PHP process run each job. Given that we were using a dual core server, each process would be able to run close to full speed (subject to I/O constraints).

Here is our technique for running multiple parallel jobs in PHP. In this example, we have two job files we want to run: j1.php and j2.php. The sample jobs don't do anything fancy. The file j1.php looks like this:

	$jobname = 'j1';
	set_time_limit(0);
	$secs = 60;
	while ($secs) {
		echo $jobname,'::',$secs,"\n";
		flush(); @ob_flush(); ## make sure that all output is sent in real-time
		$secs -= 1;
		$t = time();
		sleep(1); // pause
	}

The reason why we flush(); @ob_flush(); is that when we echo or print, the strings are sometimes buffered by PHP and not sent until later. These two functions ensure that all data is sent immediately.

We then have a 3rd file, control.php, which does the coordination of jobs j1 and j2. This script will call j1.php and j2.php asynchronously using fsockopen in JobStartAsync(), so we are able to run j1.php and j2.php in parallel. The output from j1.php and j2.php is returned to control.php using JobPollAsync().

	#
	# control.php
	#
	function JobStartAsync($server, $url, $port=80, $conn_timeout=30, $rw_timeout=86400)
	{
		$errno = '';
		$errstr = '';
		set_time_limit(0);
		$fp = fsockopen($server, $port, $errno, $errstr, $conn_timeout);
		if (!$fp) {
			echo "$errstr ($errno)\n";
			return false;
		}
		$out = "GET $url HTTP/1.1\r\n";
		$out .= "Host: $server\r\n";
		$out .= "Connection: Close\r\n\r\n";

		stream_set_blocking($fp, false);
		stream_set_timeout($fp, $rw_timeout);
		fwrite($fp, $out);
		return $fp;
	}

	// returns false on HTTP disconnect (EOF), or a string (could be an empty string) if still connected
	function JobPollAsync(&$fp)
	{
		if ($fp === false) return false;
		if (feof($fp)) {
			fclose($fp);
			$fp = false;
			return false;
		}
		return fread($fp, 10000);
	}

	if (1) { /* SAMPLE USAGE BELOW */
		$fp1 = JobStartAsync('localhost', '/jobs/j1.php');
		$fp2 = JobStartAsync('localhost', '/jobs/j2.php');
		while (true) {
			sleep(1);
			$r1 = JobPollAsync($fp1);
			$r2 = JobPollAsync($fp2);
			if ($r1 === false && $r2 === false) break;
			echo "r1 = $r1\n";
			echo "r2 = $r2\n";
			flush(); @ob_flush();
		}
		echo "\nJobs Complete\n";
	}

And the output could look like this:

	r1 = HTTP/1.1 200 OK
	Date: Wed, 03 Sep 2008 07:20:20 GMT
	Server: Apache/2.2.4 (Unix) mod_ssl/2.2.4 OpenSSL/0.9.8d
	X-Powered-By: Zend Core/2.5.0 PHP/5.2.5
	Connection: close
	Transfer-Encoding: chunked
	Content-Type: text/html

	7
	j1::60

	r2 = HTTP/1.1 200 OK
	Date: Wed, 03 Sep 2008 07:20:20 GMT
	Server: Apache/2.2.4 (Unix) mod_ssl/2.2.4 OpenSSL/0.9.8d
	X-Powered-By: Zend Core/2.5.0 PHP/5.2.5
	Connection: close
	Transfer-Encoding: chunked
	Content-Type: text/html

	7
	j2::60
	----
	r1 = 7
	j1::59
	r2 = 7
	j2::59
	----
	r1 = 7
	j1::58
	r2 = 7
	j2::58
	----

Note that "7 j2::60" is returned by JobPollAsync(). The reason is that the response uses chunked transfer encoding, which requires each chunk to be preceded by its length (here 7 bytes) on its own line.

I hope this was helpful. Have fun!

PS: Also see Divide-and-conquer and parallel processing in PHP. Also see popen for a[...]



Microsoft contributes to LGPL project for first time: ADOdb mssqlnative drivers

Mon, 22 Sep 2008 19:52:10 -0400

Last week, I got an email from Garrett Serack, a Microsoft Open Source Community Developer. Microsoft has been kind enough to donate a set of ADOdb drivers for the new MSSQL Native Extension for PHP. You can download the extension here and the ADOdb drivers here.

Garrett also mentions that ADOdb is the first LGPL project that Microsoft has ever contributed to. I quote from his email to me:

ADODB is actually the first LGPL Open Source project that Microsoft has ever contributed to. 
We've got a dozen or so others lined up and ready to go to other open source PHP projects 
(GPL, BSD and others), But ADODB was the *FIRST*. You could say that contributing to ADODB 
is Microsoft going from zero to one.

We announced it at OSCON, (see the post at http://port25.technet.com/archive/2008/07/25/oscon2008.aspx )
along with Microsoft becoming a platinum sponsor of the Apache Software Foundation. Either of
these two steps is such a good move for Microsoft, and both together, is a good sign that the 
Company is learning.

Thanks Garrett.

Story in The Register.

PS: ADOdb is dual licensed as LGPL and BSD. Choose which license you want.