Last Build Date: Thu, 29 Jan 2004 23:29:00 -0000
Thu, 29 Jan 2004 23:29:00 -0000
The JavaDocs for Scoof are online at http://scoof.sourceforge.net
Sat, 03 Jan 2004 13:29:20 -0000
Thanks to Graklaw's Quote DB, I managed to find the original story that irked me, so many moons ago, into writing the initial version of Scoof.
To quote Darl:
"We're talking about line-by-line code copying. That includes not just the function but the exact, word-for-word lines of code. And the developer comments are exactly, 100 percent the same. The developer comments really get to the DNA of the code. It's one thing to have something look the same, but when the developer comments are exactly the same, that tells you everything you need to know that this is in fact lifted, that it has been copied and pasted from Unix into Linux."
This is why current versions of Scoof focus on the comments. If a comment is 100% the same, Scoof will find it. It also normalises the comments by removing the asterisks (*) from the margin, folding to lower case and stripping whitespace, so it will find comments that are less than 100% the same too.
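To make the normalisation step concrete, here is a minimal sketch in Java. It is not lifted from the Scoof source: the class and method names are hypothetical, the regular expressions are one plausible reading of "removing the asterisks from the margin", and SHA-1 is an assumed choice of hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of the real Scoof code base.
final class CommentFingerprint {
    private CommentFingerprint() {}

    /** Strips comment delimiters and margin asterisks, folds to lower case, removes whitespace. */
    static String normalise(String comment) {
        StringBuilder out = new StringBuilder();
        for (String line : comment.split("\\R")) {
            // Drop leading whitespace plus any run of '/' and '*' in the margin.
            String stripped = line.replaceFirst("^\\s*[/*]+", "");
            // Drop a closing "*/" at the end of the line, if present.
            stripped = stripped.replaceFirst("\\*+/\\s*$", "");
            out.append(stripped);
        }
        return out.toString().toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
    }

    /** Hex-encoded digest of the normalised comment (SHA-1 is an assumption). */
    static String fingerprint(String comment) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] raw = digest.digest(normalise(comment).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : raw) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, a block comment spread over several lines with a starred margin and a one-line comment with the same words in different case and spacing produce the same fingerprint, which is the "less than 100% the same" case described above.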
I've done the hard work and generated hashes for all 500k kernel annotations, including file name and line number. If I can do that, so can SCO, which makes the current discovery debacle fascinating.
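The idea of keeping a file name and line number against each hash can be sketched as a simple in-memory index. This is an illustration only: the `HashIndex` and `Location` names are hypothetical, and the real data handling in Scoof may look quite different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index from a comment hash to every place that comment occurs.
final class HashIndex {
    /** Where a hashed comment was found. */
    static final class Location {
        final String file;
        final int line;

        Location(String file, int line) {
            this.file = file;
            this.line = line;
        }

        @Override
        public String toString() {
            return file + ":" + line;
        }
    }

    private final Map<String, List<Location>> byHash = new HashMap<>();

    /** Records one occurrence of a hash. */
    void add(String hash, String file, int line) {
        byHash.computeIfAbsent(hash, k -> new ArrayList<>()).add(new Location(file, line));
    }

    /** Returns every recorded location for a hash, or an empty list if it is unknown. */
    List<Location> lookup(String hash) {
        return byHash.getOrDefault(hash, Collections.emptyList());
    }
}
```

Looking up a hash taken from one source tree in an index built from another then yields the files and lines where the same normalised comment appears, or nothing at all.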
Thu, 13 Nov 2003 23:18:49 -0000
After a night out discussing the future of the IT industry over a Thai meal and a coffee or two, I came up with the following outline plan for the project.
An interesting aspect of the night was that I was talking with two sold-out Microsoft junkies and really didn't want to get into a debate about FOSS. However, FOSS came up as a solution to problems they raised with the packaged software model. We were talking about BIG packages: telco billing, ERP, MIS and CRM, not your average shrink-wrap utility.
The problem is that for businesses to lead the market they need to invest in unique technology to differentiate themselves from the competition, and packages inhibit this. Open software allows them to overcome this inhibition by legally and fairly embracing the work of others and building upon it. SCO are challenging the right of all businesses to do this based on the GPL, and challenging FOSS in general. The solution? Answer all SCO's claims by creating open software to audit open software.
Pretty grand, huh? Here's the plan:
- Modify the search routine to generate Passages based on arbitrary numbers of code lines as well as the comments. This takes the project beyond its initial remit of disproving specific Darlisms and on to answering whole chunks of Scoistic FUD.
- Modify the data handling to support the live offloading of this data to files and databases - otherwise we hit memory issues and generally slow down.
- Create a pluggable output mechanism fitting the Factory pattern.
- Create output mechanisms for relational databases, and a standard schema specification backed by DDL scripts.
- Create output mechanisms for XML (one file in, one file out). This is for people who don't want to bother with relational database servers.
- Finish off the threading model using Listeners and supporting multiple threads. This allows the OS to schedule hashing operations during IO wait time, but it is low priority since poor performance is not a show stopper and the theory is unproven.
- Create a wizzy graphical front end to support analysis of the DB, plus admin tasks like importing the XML file output to the DB, scanning external CVS servers for evidence etc.
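The first item in the plan, Passages built from arbitrary numbers of code lines, could be approached with a sliding window over the lines of a file. The sketch below is an assumption about how that might work, not the planned implementation; `PassageWindows` and `windows` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: every run of `size` consecutive lines becomes one passage.
final class PassageWindows {
    private PassageWindows() {}

    /** Returns each window of `size` consecutive lines, joined with newlines. */
    static List<String> windows(List<String> lines, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<String> passages = new ArrayList<>();
        // Slide one line at a time; stop when a full window no longer fits.
        for (int start = 0; start + size <= lines.size(); start++) {
            passages.add(String.join("\n", lines.subList(start, start + size)));
        }
        return passages;
    }
}
```

Each passage could then be normalised and hashed in the same way as a comment. A file of n lines yields n - size + 1 passages for every window size, which hints at why the plan also calls for offloading data to files and databases.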
If anybody wishes to contribute to this road map, please don't hesitate to contact me. I have said all along that I have little time to devote, so offers are gratefully received.