Rob Gillen's WebLog

Random comments on MPF, Provisioning, and .NET Development


CodeStock 2012: Buffer Overflow Attack

Mon, 18 Jun 2012 01:14:34 GMT

We had a great time at CodeStock a few days ago discussing buffer overflow attacks, showing developers how they are discovered and exploited and a bit about how to avoid creating software that is vulnerable to these types of attacks. Below are the slides and video from the session:

[Embedded video: CodeStock 2012 session recording]

External File Upload Optimizations for Windows Azure

Mon, 26 Apr 2010 20:21:00 GMT

[Cross posted from here:] I'm wrapping up a bit of the work we've been doing on data movement optimizations for cloud computing, and the latest set of data yielded some interesting points I thought I'd share. The work done here is not rocket science, but it is in some ways slightly counter-intuitive and therefore seemed worth posting.

Summary: for those who don't like to read detailed posts or don't have time, the synopsis is that if you are uploading data to Azure, block your data (even down to 1 MB) and upload the blocks in parallel. Set your block size based on your source file size, but if you must choose a fixed value, use 1 MB. Following the above will result in significant performance gains: upwards of 10x-24x, and a reduction in overall file transfer time of upwards of 90% (e.g., uploading a 1 GB file averaged 46.37 minutes prior to optimizations and 1.86 minutes afterwards).

Detail: For those of you who want more detail, or who think the claims above are over-reaching, what follows is information and code supporting those claims. As the title indicates, these tests were run from our research facility pointing to the Azure cloud (specifically US North Central, as it is physically closest to us) and do not represent intra-cloud results. We have performed intra-cloud tests and the overall results are similar in shape, but the data rates are significantly different, as are the tipping points for the various block sizes; those will be detailed separately.

We started by building a very simple console application that loops through a directory and uploads each file to Azure storage. This application used the shipping storage client library from the 1.1 version of the Azure tools. The only real variation from the stock client library is that we added code to collect and record the duration (in ms) and size (in bytes) of each file transferred. The code is available here.
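The block-size advice in the summary above can be captured as a tiny helper. This is a sketch under my own assumptions: only the 1 MB fallback comes from the post, and the other thresholds are illustrative, not the tipping points the study measured.

```python
def choose_block_size(file_size: int) -> int:
    """Pick an upload block size from the source file size.

    The 1 MB fallback is the post's recommendation; the other
    thresholds are illustrative assumptions, not measured values.
    """
    MB = 1024 * 1024
    if file_size <= MB:
        return file_size      # small files: upload as a single block
    if file_size <= 256 * MB:
        return MB             # the recommended fixed value
    return 4 * MB             # very large files: fewer, larger blocks

print(choose_block_size(10 * 1024 * 1024) // (1024 * 1024))  # block size in MB
```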
We then created a directory containing a collection of files of the following sizes: 2KB, 32KB, 64KB, 128KB, 512KB, 1MB, 5MB, 10MB, 25MB, 50MB, 100MB, 250MB, 500MB, 750MB, and 1GB (50 files of each size). These files contained randomly-generated binary data and do not benefit from compression (a separate discussion topic). Our file generation tool is available here.

The baseline was established by running the application described above against the directory containing all of the data files. The application uploads the files in a random order so as to avoid transferring all of the files of a given size sequentially, thereby spreading the effects of periodic Internet delays across the collection of results. We then ran some scripts to split the resulting data and generate some reports. The raw data collected for our non-optimized tests is available via the links in the Related Resources section at the bottom of this post.

For each file size, we calculated the average upload time (and standard deviation) and the average transfer rate (and standard deviation). As you are likely aware, transferring data across the Internet is susceptible to many transient delays which can cause anomalies in the resulting data. It is for this reason that we randomized the order of source file processing and executed the tests 50 times for each file size; we expect these steps yield a sufficiently balanced set of results.

Once the baseline was collected and analyzed, we updated the test harness with methods to split the source file into user-defined block sizes and to upload those blocks in parallel (using the PutBlock() method of Azure storage). The parallelization was handled simply by relying on the Parallel Extensions to .NET to provide a Parallel.For loop (see the linked source for implementation details: Program.cs, line 173 and following; less than 100 lines total).
Once all of the blocks were uploaded, we called PutBlockList() [...]
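The harness source is only linked, not shown, so here is a rough Python equivalent of the pattern it describes: split the file into blocks, upload them in parallel (a thread pool standing in for Parallel.For), then commit the ordered block list. upload_block is a stand-in for the real PutBlock() call, and the function and variable names are mine, not the original code's.

```python
import base64
import concurrent.futures

BLOCK_SIZE = 1024 * 1024  # 1 MB, per the findings above

def upload_block(block_id: str, data: bytes) -> str:
    """Stand-in for PutBlock(); a real client would PUT `data` here."""
    return block_id

def upload_blob(payload: bytes) -> list[str]:
    # Split the source into fixed-size blocks.
    blocks = [payload[i:i + BLOCK_SIZE]
              for i in range(0, len(payload), BLOCK_SIZE)]
    # Block IDs must be Base64 strings of equal pre-encoding length.
    ids = [base64.b64encode(f"block-{n:08d}".encode()).decode()
           for n in range(len(blocks))]
    # Upload the blocks in parallel, mirroring the Parallel.For loop.
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(upload_block, ids, blocks))
    # Stand-in for PutBlockList(): commit the ordered block IDs.
    return ids

committed = upload_blob(b"\x00" * (3 * BLOCK_SIZE + 1))
print(len(committed))  # 4 blocks: three full, one holding the final byte
```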

Time to do some digging…

Mon, 21 Dec 2009 19:50:09 GMT

I've been getting my test harness and reporting tools set up for some performance baselining that I'm doing relative to cloud computing providers, and when I left the office on Friday I set off a test that was uploading a collection of binary files (NetCDF files, if you care) to an Azure container. I was doing nothing fancy: looping through a directory and, for each file found, uploading it to the container using the defaults for BlobBlock, then recording the duration (start/finish) and the file size. The source directory contained 144 files representing roughly 58 GB of data; 32 of the files were roughly 1.5 GB each and the remainder were about 92.5 MB.

I came in this morning expecting to find the script long finished with some numbers to start looking at. Instead, what I found is that, after uploading some 70 files (almost 15 GB), every subsequent upload attempt failed with a timeout error, stating that the operation couldn't be completed in the default 90-second time window. I started digging into what was happening and so far have uncovered the following: by default, the Storage Client that ships with the November CTP breaks your file up into 4 MB blocks (assuming you are using BlobBlock, which you should if your file is over the 64 MB limit). The client then manages four concurrent threads uploading the data; as each thread completes, another is started, keeping four active nearly the entire time. At some point Saturday afternoon (just after 12 noon UTC), the client could no longer successfully upload a 4 MB block in the 90-second window, and all subsequent attempts failed. I initially assumed that my computer had simply tripped up or that a local networking event caused the problem, so I restarted the tool, only to find every request continuing to fail.
I then began to wonder if the problem was the new storage client library (not sure why), so I pulled out a tool to manage Azure storage, Cloud Storage Studio, and noticed that I was able to successfully upload a file. I remembered that CSS (by default) splits the file into fairly small blocks, so I cracked open Fiddler and began monitoring what was going on. I learned that it was using 256 KB blocks (configurable via settings in the app). I then adjusted my upload script to set the ServiceClient.WriteBlockSizeInBytes property (ServiceClient is a property of the CloudBlockBlob object) to 256 KB and re-ran the script. This time I had no troubles at all (other than a painfully slow experience).

So, I can upload data (not a service outage), but while 256 KB blocks work, the 4 MB blocks that worked on Friday no longer do. I'm assuming that there's a networking issue on my end, or something in the Azure platform. To provide more clarity, I adjusted the tool again, this time using a WriteBlockSizeInBytes value of 1 MB, and re-ran it, again seeing successful uploads.

While this last step was running, I thought it might be good to go back and do some crunching on the data I had so far. The following chart represents the upload rates for the files that were successfully uploaded on Friday/Saturday, followed by a chart showing the probability density. The mean rate was 2.74 Mbit/s with a standard deviation of 0.1968. It is interesting to note that there was no drift in the rates at the end of the collection of successful runs, indicating that the fault was more than likely caused by something specific rather than being the result of a gradual shift or usage-based degradation (imagine a scenario wherein, as more data is populated in a container, indexes slow down, causing upload speeds to trail off).
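The arithmetic behind why smaller blocks dodge the timeout is worth making explicit: each block must complete inside the fixed 90-second window, so the minimum sustained throughput required scales linearly with block size. A quick sketch (the helper name is mine):

```python
def min_rate_mbps(block_bytes: int, timeout_s: float = 90.0) -> float:
    """Sustained rate (megabits/s) needed to finish one block in time."""
    return block_bytes * 8 / timeout_s / 1_000_000

# A 4 MB block needs ~0.37 Mbit/s sustained; a 256 KB block needs ~0.02,
# so the smaller blocks tolerate a roughly 16x worse connection.
print(round(min_rate_mbps(4 * 1024 * 1024), 3))
print(round(min_rate_mbps(256 * 1024), 3))
```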
Upload Speeds [click image for full size] Probability Density [click image for full size] I then ran similar reports against the data from this morning's runs. I'm still in the process of generating a full report on the data[...]

Windows Azure, Climate Data, and Microsoft Surface

Fri, 18 Dec 2009 21:57:35 GMT

I've been working on moving a large collection of data to, from, and around Azure as we test the data profile for scientific computing and large-scale experiment post-processing. In order to verify that the data we uploaded and processed turned out as we wanted it to, I built a simple visualization app that does a real-time query against the data in Azure and displays it. Originally the app was built as a simple WPF desktop application, but I got to thinking that it would be particularly interesting on the Surface, and therefore took a day or two to port it over. The video below is a walkthrough of the app; the dialog is a bit cheesy, but the app is interesting as it provides a very tactile means of interacting with otherwise stale data.

Automated Chart Generation

Fri, 18 Dec 2009 21:49:23 GMT

It’s late on the Friday afternoon before Christmas week which means things are pretty quiet around the office. This quiet has the net-effect of allowing me to get quite a bit done. The last few days have been very productive with respect to our research project and Azure work (more on that coming soon) which is now in full swing. We are currently working on collecting performance data from our codes running in Azure (and soon in the Amazon cloud) and are also doing some testing of transfer speeds of data both to/from the cloud as well as between compute and storage in the cloud.

I've been working to automate much of this testing so we can do things in a repeatable fashion, as well as have something that others could run (both other users like ourselves and possibly vendors, should we come across something that requires a repro scenario). So far, running tests and generating data in CSV or XML format is pretty simple, but I found myself wanting to automatically generate charts/graphs of the data as part of the test process to allow a quick visualization of how the test performed. I spent a good bit of the day looking at old tools for command-line generation of charts (i.e., RDTool, etc.) and none of them were exactly what I was looking for – not to mention my proclivity for C# and VS.NET tools and my desire for something that looked refined/polished and not overly raw.

Thankfully, I stumbled upon something I should have remembered existed but simply hadn't had a need to use before: the System.Windows.Forms.DataVisualization.Charting class. If you aren't familiar with this assembly, it was released at PDC08 and has a companion Web class for performing similar operations in ASP.NET applications. In my basic testing I was able to build a console application that ingests the CSV output from my test harness and then generates some fairly nice-looking charts from that data. The following shows a chart (click the chart to see it full size) generated from ~1,800 data points, with automatically generated 50% and 90% bands allowing the viewer to very easily ascertain the averages and the spread of the data points. This was generated using a combination of the FastPoint and BoxPlot chart types.
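The Charting assembly computes those bands itself, but the underlying calculation is just central percentile bounds. A hand-rolled Python equivalent, as an illustration of what the 50% and 90% bands represent (not the class's actual algorithm):

```python
def central_bands(values, fractions=(0.5, 0.9)):
    """Return (low, high) bounds containing each central fraction."""
    ordered = sorted(values)
    n = len(ordered)
    result = {}
    for frac in fractions:
        tail = (1 - frac) / 2  # probability mass cut from each end
        low = ordered[int(tail * (n - 1))]
        high = ordered[int((1 - tail) * (n - 1))]
        result[frac] = (low, high)
    return result

samples = list(range(1, 101))  # stand-in for the ~1,800 test data points
print(central_bands(samples))  # {0.5: (25, 75), 0.9: (5, 95)}
```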


“Live” Monitoring Of Your Worker Roles in Azure

Thu, 03 Sep 2009 18:08:21 GMT

[Cross-posted from here] I've been working for a bit on some larger-scale jobs targeting the Windows Azure platform, and early last week I had assembled a collection of worker roles that were supposed to be processing my datasets for a number of days going forward. Unfortunately, they wouldn't stay running. As always, they "worked on my machine," so I naturally assumed that the problem was with the Azure platform :). I then proceeded to do what I thought was the correct action: go to the Azure portal and request that the logs be transferred to my storage account so I could review them and fix the problem. What I learned is that there were two problems with this solution:

  • The time delay between requesting the logs and actually being able to review them is prohibitive for productive use. In my experience, the minimum turnaround was 20 minutes and was most often 30 or longer. I'm not sure why this was happening – by design, a temporary bug, an artifact of the actual problem with my code, or something else – but I know it was too long.
  • Logs appear to get "lost." In my scenario, my worker roles were throwing an exception that was un-caught by my code. Near as I can tell, when this happens, the Azure health-monitoring platform assumes that the world has come to an end, shuts down that instance, and spins up a new instance. While this (health monitoring and auto-recovery) is a good thing, one side effect (caveat: this is my experience and may not be reality) is that the logs were stored locally and, when the instance was shut down/recycled, those log files went to the great bit-bucket in the sky.

I was stuck in a failure mode with no visibility as to what was going wrong or how to fix it. After pounding my head for a bit, I came up with the following solution: trap every exception possible, and use queues. The first aspect allowed my worker roles to stay running.
This may not always be the right answer, but for my use case I adapted my code to handle the error cases, and trapping/suppressing all exceptions proved to be a good answer. Further, doing so allowed me to grab the error message and do something interesting with it. The second step (using queues) solved my impatience problem. I created a local method called WriteToLog that did two things: write to the regular Azure log, and write to a queue I created called status (or something similarly brilliant). I replaced all of my RoleManager.WriteToLog() calls with calls to the local method, and I then wrote a console app that would periodically (every few seconds) pop as many messages as it could (API-limited to 32) off of the status queue, dump the data to a local CSV for logging, and write the data to the screen. This allowed me to drastically reduce the feedback loop between my app and me, enabling me to fix the problems quickly. There are certainly some downsides to this approach (do queues hit a max? what is the overhead introduced by logging to a queue? once a message is dequeued, it is not available for other clients to read; etc.), but it was a nice spot fix. A better implementation would have a flag in the config file or something similar to control the queue logging. As you can see from the image above, I also wrote a little WinForms app to display the approximate queue length so I'd have an idea of the progress and how much work remained.[...]
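Stripped of the Azure specifics, the WriteToLog/monitor pattern is a producer plus a polling consumer capped at 32 messages per pop. A minimal Python sketch with an in-process queue standing in for the Azure status queue (all names here are mine):

```python
import queue

status_queue = queue.Queue()  # stand-in for the Azure "status" queue

def write_to_log(message: str) -> None:
    """Write to the regular log (elided here) and mirror onto the queue."""
    status_queue.put(message)

def pop_status_batch(limit: int = 32) -> list[str]:
    """Pop up to `limit` messages, mirroring the 32-message API cap."""
    batch = []
    while len(batch) < limit and not status_queue.empty():
        batch.append(status_queue.get())
    return batch

# A worker role would call write_to_log as it processes items...
for i in range(40):
    write_to_log(f"processed item {i}")

# ...while the console monitor polls in batches of up to 32.
first = pop_status_batch()
second = pop_status_batch()
print(len(first), len(second))  # 32 8
```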

Silverlight and Paging with Azure Data

Thu, 20 Aug 2009 19:59:52 GMT

[Cross-posted from here]

If you've been watching my blog at all lately, you know that I've been playing with some larger data sets and Azure storage, specifically Azure table storage. Last week I found myself working on a Silverlight application to visualize the resulting data and display it to the user; however, I did not want to use the ADO.NET Data Services client (ATOM) due to the size of the data in transmission. Consequently, I set up a web role that proxied the data calls and fed them back to the caller as JSON. Due to the limitation of Azure table storage only returning 1,000 rows at a time, I needed to access the response headers in my Silverlight client to determine, after each request, whether there were more rows waiting… and that was the rub: every time I tried to access the response headers collection (I tried both WebClient and HttpWebRequest), I received a System.NotImplementedException.


I pounded my head on this for a few days with no success until a helpful twitterer (@silverfighter) provided me a link that got me rolling. The root of the problem was my ignorance of how SilverLight’s networking stack functioned. As I (now) understand it, by default any networking calls (WebClient or HttpWebRequest) are actually handled by the browser and not .NET. This results in you getting access to only what the browser object hands you, which in my case, did not include the response headers.


The key here is that Silverlight 3 provides you the ability to tell the browser that you'd rather handle those requests yourself. By simply registering the http protocol (you can actually do it as granularly as the site level) as handled by the Silverlight client, "magic" happens and you suddenly have access to the properties of the WebClient (ResponseHeaders) and HttpWebRequest (Response.Headers) objects that you would expect. The magic line you need to add prior to issuing any calls is as follows:


bool httpResult = WebRequest.RegisterPrefix("http://", WebRequestCreator.ClientHttp);


(yes… that’s it…)


The links to the appropriate articles are as follows:

AtomPub, JSON, Azure and Large Datasets, Part 2

Thu, 20 Aug 2009 19:27:36 GMT

[Cross-posted from here] Last Friday I posted some initial results from some simplistic testing comparing pulling data from Azure via ATOM (the ADO.NET Data Services client) and JSON. I was surprised at the significant difference in payload and time to completion. A little later, Steve Marx questioned my methodology based on the fact that Azure storage doesn't support JSON. Steve wasn't being contrary, but rather pushing for clarification of my testing methodology, as well as wanting to keep people from attempting to exploit a JSON interface of Azure storage when none exists. This post is a follow-up and attempts to clarify things a bit and highlight some expanded findings.

The platform I'm working against is an Azure account with a storage account hosting the data (Azure Tables) and a web role providing multiple interaction points to the data, as well as making the interaction point anonymous. Essentially, this web role serves as a "proxy" to the data and reformats it as necessary. After Steve's question last week, I got to wondering particularly about the overhead (if any) the web role/proxy was introducing and whether, especially in the case of the ATOM data, it was drastically affecting the results. I also got to wondering whether the delays I was experiencing in data transmission were, in some part, caused by having to issue 9 serial requests in order to retrieve the entire 8,100 rows that satisfied my query.

To address these issues, I made the following adjustments: I tweaked my test harness for ATOM to optionally hit the storage platform directly (bypassing the proxy data service), and I tweaked the data service to allow an extra query string parameter indicating that the proxy service should make as many calls to the data service as necessary to gather the complete result set and then return the results as a single batch to the caller.
This allowed me to eliminate the 1,000-row limit as well as to issue only a single HTTP request from the client. I increased the test runs from 10 to 20 – still not scientifically rigorous by any means, but a bit longer, to provide a little better sense of the average lengths for each request batch. The results I received are as follows, and are not altogether different from what one might expect.

As you can see from the charts above, the JSON FULL option was the fastest, with an average time to completion of 14.4 seconds. When compared to the regular JSON approach (18.55 seconds average time to completion), you can infer that the overhead introduced by multiple calls is roughly 4 seconds.

In the ATOM category, I find it interesting that ATOM Direct (directly against the storage service) was only marginally faster (0.2 of a second on average) than the ATOM FULL approach. This would indicate that the network calls between the web role and the storage role are almost a non-factor (hinting at rather good network speeds). Remember, in the case of ATOM FULL, the web role is doing the exact same thing as the test client does in ATOM Direct, but additionally bundling the XML responses into a single blob (rather than 9) and then sending it back to the client.

The following chart shows the average payload per request between the test harness and Azure. ATOM FULL is different from ATOM and ATOM Direct in that the former is all 8,100 rows, whereas the latter two represent a single batch of 1,000. It is interesting to note that the JSON representation of all 8,100 records is only marginally larger than the ATOM representation of 1,000 records (1,326,298 bytes compared to 1,118,892 bytes). At the end of the day, none of this is too surprising: JSON is less verbose in markup than ATOM and would logically be smaller on the wire and therefore complete sooner (although I wouldn't have imagined it was a factor [...]
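Normalizing the payload figures above to bytes per row makes the verbosity gap concrete:

```python
atom_batch_bytes = 1_118_892  # ATOM payload for one 1,000-row batch
json_full_bytes = 1_326_298   # JSON payload for all 8,100 rows

atom_per_row = atom_batch_bytes / 1_000   # ~1,119 bytes per row
json_per_row = json_full_bytes / 8_100    # ~164 bytes per row

print(round(atom_per_row / json_per_row, 1))  # ATOM is ~6.8x larger per row
```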

AtomPub, JSON, Azure, and Large Datasets

Fri, 14 Aug 2009 19:27:33 GMT

UPDATE 8/20/2009, 15:29 EST: There is some confusing content in this post (i.e., Azure storage doesn't support JSON). A follow-up to this post with further explanation/detail is available here --

UPDATE 8/14/2009, 17:16 EST: @smarx pointed out that this post is a bit misleading (my word) in that Azure storage doesn't support JSON. I have a web role in place that serves the data, which, upon reflection, could be introducing some time delays into the Atom feed. I will test further and update this post.

----

[Cross-posted from here] I'm just beginning to scratch the surface of my work on cloud computing and scientific computing, but it seems that nearly every day I'm able to spend time on this, I come away with something at least moderately novel. Today's observation is, on reflection, a bit of a no-brainer, but it wasn't immediately obvious to me.

I'm kicking around some scientific data: a single collection with somewhere north of 40,000 subsets, each subset containing roughly 8,100 rows. I've had an interesting time getting this data into Azure tables, but that's not the point of this post. Once the data resided in Azure, I built a little ADO.NET client to pull the data down based on various queries (in my case, a single "subset" at a time, or 8,100 rows). In case you are wondering, the data is partitioned in Azure based on the key representing each subset, so I know that each query is only hitting a single partition. I proceeded to follow the examples for paging and data calls (checking for continuation tokens, etc.) and it wasn't long before I had a client that would query for a particular slice of data and then make however many individual data calls were necessary until the complete result set was downloaded and ready for processing. I was, however, disappointed in the time it took to pull down a single slice of data… averages were around 55 seconds.
Pulling down a number of slices of the dataset, at nearly a minute each, was a bit slow. I spent some time poking around with Fiddler and some other tools and discovered that I was suffering from XML bloat. The "convenience" factor of having a common, easy-to-consume format was killing me: each response coming back from the server (1,000 rows) was averaging over 1 MB of XML. After kicking around my options for a while (frankly, too long), I decided to try pulling the data as JSON. I hadn't used JSON previously, but had heard it touted as very lightweight. I also found some nice libraries on CodePlex for de-serializing the response so I could use it as I had the results of the Atom feed. Once I made this change, I was shocked at the amount of improvement (I expected some, but what I saw was much more than anticipated). My average time dropped to around 14 seconds for the entire batch, and the average size of the response body dropped to about 163 KB.

I've included some charts below showing the results of my tests. I ran the tests 10 times for each approach. The code base for each test harness is identical with the exception of the protocol-specific text handling. Time measurements run from the start of the query through the point at which each response has been de-serialized into an identical .NET object (actually, a List of objects).

The first two charts show the time required to retrieve the entire slice of data; the unit of measure is seconds. The remaining charts show the average payload returned per request for the two methods; in both cases, the unit of measurement is bytes. [...]
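The bloat is easy to reproduce with any tabular data. A toy comparison (synthetic rows, not the actual Azure payloads) serializing the same 1,000 small records as Atom-style XML and as JSON:

```python
import json
import xml.etree.ElementTree as ET

rows = [{"id": str(i), "value": str(i * 1.5)} for i in range(1000)]

# Atom-style XML: one <entry> element per row, one child per field.
feed = ET.Element("feed")
for row in rows:
    entry = ET.SubElement(feed, "entry")
    for key, val in row.items():
        ET.SubElement(entry, key).text = val
xml_bytes = len(ET.tostring(feed))

json_bytes = len(json.dumps(rows).encode())

# The element markup dominates for small rows, so XML is much larger.
print(xml_bytes > json_bytes)  # True
```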

Silverlight and Azure Table Data Paging

Wed, 05 Aug 2009 21:07:25 GMT

[Cross posted from here] I'm playing around with a data visualization app using Silverlight and data hosted in Azure Tables, and have been learning quite a bit in the process. First, Azure tables only allow you to return 1,000 records in a given query. If you issue a query with a larger matching result set, Azure returns extra headers indicating as much (x-ms-continuation-NextPartitionKey and x-ms-continuation-NextRowKey). It wasn't hard to find an example of data paging using Azure table data; however, it used the Execute() method of the DataServiceQuery object. Unfortunately, this isn't available in Silverlight, where you have to use the asynchronous methods (BeginExecute and EndExecute). I'm a bit slow, and for whatever reason translating the MSDN sample from the synchronous to the asynchronous model took me longer than it should have. I'm posting this so that maybe the next person will find it, get the answer they need, and move on without wasting the same amount of time I did.

My button event handler looks quite a bit different from the MSDN sample but is pretty easy to figure out: I instantiate the context, create a query based on that context, cast that query as a DataServiceQuery, and then call the BeginExecute method, passing my callback method and the query as my state object. (Note: in case you are wondering about the Where clause in the query, I know that all of the data matching the first conditional is located in a given partition within the Azure table, and have found that specifying the partition greatly increases performance.) My callback method (ProcessDataRequest) does a bit of recursion to support the unknown number of subsequent calls needed to retrieve all of the matching records.
The contents of the ProcessDataRequest method are listed below. Note that, unlike some samples that focus simply on async data calls and don't handle paging via headers, I cast the output of EndExecute as a QueryOperationResponse object, which allows me to subsequently access the headers and interrogate them for the continuation keys. If I find the continuation keys, I create a new query object, set the additional query options, and execute it in the same fashion as the original call. The AddPointsToScreen method simply processes the values and renders them as polygons to the screen. I've included it here not because there is anything special in it, but rather for completeness. [...]
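Setting aside the Silverlight async plumbing, the recursion in ProcessDataRequest reduces to: issue a query, harvest the rows, and repeat while the response carries continuation keys. A Python sketch of that loop, with fake_query standing in for the service and its x-ms-continuation headers (the names are mine):

```python
def fetch_all(query_fn):
    """Follow continuation keys until the service stops returning them."""
    rows, continuation = [], None
    while True:
        batch, continuation = query_fn(continuation)
        rows.extend(batch)
        if continuation is None:  # no continuation header: we're done
            return rows

# Fake service: 2,500 rows served 1,000 at a time, like Azure tables.
DATA = list(range(2500))

def fake_query(continuation):
    start = continuation or 0
    batch = DATA[start:start + 1000]
    nxt = start + 1000 if start + 1000 < len(DATA) else None
    return batch, nxt

print(len(fetch_all(fake_query)))  # 2500
```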

Azure, Visualization, and Large Datasets

Wed, 05 Aug 2009 20:33:43 GMT

[This is cross-posted from here]

I've been working on kicking the tires of Azure's data functions and in the process was able to get my hands on a large set of climate data for testing purposes. I'm fighting with some size issues with Azure, but thought I'd start by loading some experimental temperature runs into Azure tables and then building a visualization tool to help the viewer wrap his/her mind around the numbers. This is my first real Silverlight app and, while it has a long way to go, it's an interesting first stab.




Data slowness: The first big issue I encountered (and am still fighting) is the time it takes to pull the 8,100 data points represented above from Azure. My current start-to-finish time is just under 60 seconds, which is about 50 seconds too long for my liking. I'm still kicking around ideas for how to speed it up and what I might be able to do (server side) to improve on this, but considering I have 40,000+ sets of 8,100 data points (I'd like to do client-side animation of the data at some point), a minute per set is prohibitively long. Even if you only took 100 representative sets, you are still looking at an unacceptable data transfer time.


I’m also struggling with the way in which the data is being rendered to the control. I’m currently using the Bing Maps CTP Silverlight control and, while it is certainly better than the AJAX version, once you’ve placed 8100 polygons on it, performance degrades. Further, since the polygons are not interactive and the set of them is rather static, I’m wondering if generating a raster layer or WMS of some sort is a better approach for the display/visualization.


The next steps in the process are to get the app cleaned up, allow the user to select the time window for which data should be displayed, and improve the performance.

Azure Blob Storage Blob IDs and “+”

Thu, 30 Jul 2009 13:14:34 GMT

[This is cross-posted from here]

I've been kicking the tires on Azure's blob storage and am working on uploading a 1.2 GB+ NetCDF file. I stumbled across a couple of samples online that were very helpful in avoiding the de facto client library that ships with the SDK; however, I found myself bitten by something (likely due to my own error somehow) that I thought I'd pass along.


When processing a larger file, my upload process would always fail at block #248. At first I assumed it was a network transience issue and simply re-ran the upload; however, after it failed on the exact same block three times, I decided it wasn't the network. Digging into things a bit, I found that the problem had to do with the encoding of the block IDs. The offending piece of code is here:




where i is an integer representing the index of the current block within the file and blockIds is an array of IDs used to build the block ID list as part of a putBlockList operation.

The Azure SDK would indicate that this code snippet is perfectly valid: block IDs need to be a Base64-encoded string uniquely identifying the block within the blob. Further, each ID (within a blob) must be of the same length prior to encoding (the same number of bytes). In this scenario, BitConverter.GetBytes returns a 4-byte array for every number in the range (in my case, 0–314). The following is an example of the resulting strings for some numbers:

  • 246: 9gAAAA==
  • 247: 9wAAAA==
  • 248: +AAAAA==

The sequence continues with about four IDs that begin with a '+' sign, and a similar number that begin with '/'. Every other index in my collection began with a normal alphanumeric character. After poking around, I found indications that others were having similar problems, and went down the path of encoding the value differently (e.g., HttpServerUtility.UrlTokenEncode) to no avail. What I ended up with is simply prefixing my values with a standard "safe" string ("BlockId").
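This failure is reproducible outside .NET. In the Python sketch below, struct.pack('<i', i) plays the role of BitConverter.GetBytes, and the fix mirrors the prefix-plus-zero-padded-index approach described above (the exact padding width is my assumption):

```python
import base64
import struct

def naive_block_id(i: int) -> str:
    """Base64 of the raw 4-byte integer, as BitConverter.GetBytes yields."""
    return base64.b64encode(struct.pack("<i", i)).decode()

print(naive_block_id(246))  # 9gAAAA==
print(naive_block_id(247))  # 9wAAAA==
print(naive_block_id(248))  # +AAAAA==  <- leading '+', not URL-safe

def safe_block_id(i: int) -> str:
    """Prefix with a safe constant and zero-pad for constant length."""
    return base64.b64encode(f"BlockId{i:07d}".encode()).decode()

# Every safe ID starts with the encoding of 'B', a plain alphanumeric.
print(safe_block_id(248)[0].isalnum())  # True
```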




This yielded a blockId that was unique, consistent length (notice the formatting of the indexer in the ToString() method), and “safe” in that it always began with a URL-safe character.


I’m sure there is a better way to solve this problem, but this did the trick for me, and maybe it will be helpful to someone else.

Codestock Session

Sat, 27 Jun 2009 01:04:39 GMT

[This is cross posted from here]

I’d like to thank all of you who attended my session today at CodeStock. I had a great time talking with you and sharing my experiences with SharePoint and TFS.


Downloads from today’s session:

Also, I promised a collection of links for the tools I had installed.

Customizations to STSDev 1.3

Tue, 23 Jun 2009 15:08:34 GMT

[This is cross-posted from here] As part of my session on deployment and build using TFS and SharePoint for CodeStock 09, I took the source code from the STSDev project on CodePlex and made a number of modifications. Some of these I would classify as clearly bug fixes, but most are simply adjustments to the core to fit my needs/desires. I’m documenting them here and providing a zip of the source for the benefit of those attending my session. These changes and source code are completely unsupported, and you use them at your own risk. That said, I hope they are helpful and speed your integration between SharePoint and TFS. NOTE: unless specified otherwise, all of these changes are to the “Core” project.

  • First, a minor change: I moved the solution file up one level to the parent folder. This is truly nothing but a nit-pick, but it seems to make source control trees happier and is therefore something I almost always do.
  • Upgraded the projects/solutions to Visual Studio 2008.
  • Added an app.config file with an assembly binding redirect pushing old references to Microsoft.Build.Framework to the newer version. This is one of the changes needed to get support for .NET 3.5 working properly.
  • Changed the target framework property of stsdev.csproj to .NET Framework 3.5. This change, in concert with the previous one, allows the 3.5 selection in the UI to work properly.
  • Major change: support for alternate bin paths. The 1.3 version as published on CodePlex always uses the compiler output in projectdir\bin\debug when assembling the *.wsp file, regardless of which build configuration you have selected (yes, even Release). This doesn’t work for TFS builds, since the output is, by default, in a different location on the build server. To support this use case, the following changes were made:
      • Program.cs: changes were made prior to calling SolutionBuilder.RefreshDeploymentFiles to support passing in an additional parameter indicating the alternate bin path.
      • Changed the Create method of Builders\DeploymentFiles\CabDdfBuilder.cs to accept alternateBinPath as an additional parameter.
      • Changed the Create method of Builders\DeploymentFiles\CabDdfBuilder.cs such that, if an alternate bin path is provided, it uses that value when adding references to assemblies rather than the otherwise hard-coded /bin/debug/.
      • Updated the first overload of the RefreshDeploymentFiles method in SolutionBuilder.cs to pass the alternate bin path to the second overload.
      • Updated the second overload of RefreshDeploymentFiles in SolutionBuilder.cs to build the TargetPath from alternateBinPath if provided, and otherwise to use the default.
      • Updated the second overload of RefreshDeploymentFiles in SolutionBuilder.cs to pass the alternateBinPath parameter to CabDdfBuilder.Create().
      • Updated the CompleteSolution method in SolutionBuilder.cs to pass an empty value for alternateBinPath to RefreshDeploymentFiles since, during initial project creation, it is OK to use the default /bin/debug path.
  • Major change: when creating a project, always create a parent solution directory in the same way Visual Studio does by default. This is helpful when working in a source-controlled environment with multiple projects in the same solution. To support this feature, the following changes were made:
      • Changed the property in Resources\Common\Microsoft.SharePoint.targets.xml to use the $(ProjectDir) variable rather than $(SolutionDir).
      • Changed the Create method of SolutionFiles\Solu[...]
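The alternate-bin-path change boils down to a simple fallback; the following sketch is illustrative only, not the actual STSDev source (the method name mirrors the post, the body is assumed):

```csharp
// Illustrative sketch, not the actual STSDev source. STSDev 1.3 hard-coded
// projectdir\bin\debug; the modification falls back to that default only
// when no alternate bin path (e.g. a TFS/Team Build drop) is supplied.
using System;
using System.IO;

class AlternateBinPathSketch
{
    public static string ResolveTargetPath(string projectDir, string alternateBinPath)
    {
        return string.IsNullOrEmpty(alternateBinPath)
            ? Path.Combine(projectDir, @"bin\debug")  // local VS build default
            : alternateBinPath;                       // build-server output
    }

    static void Main()
    {
        Console.WriteLine(ResolveTargetPath(@"C:\src\WebPart", null));
        Console.WriteLine(ResolveTargetPath(@"C:\src\WebPart", @"C:\build\Binaries"));
    }
}
```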

Speaking at CodeStock

Tue, 23 Jun 2009 15:06:27 GMT


[This is cross-posted from here]

I’m privileged to have been given the opportunity to speak at CodeStock (details below) this coming Friday. I’ll be speaking on the topic of Deploying and Packaging SharePoint solutions using TFS. The abstract for my session is:

Have you been using the VS Extensions for SharePoint to create SharePoint packages and found yourself wondering how best to integrate with your source control platform and build system? Consistent packaging of SharePoint solutions can be a challenge and is not for the faint of heart. Come to this session and learn how our team utilizes TFS, Team Build, SandCastle, SharePoint Installer, and STSDev in concert to produce consistent installation packages for our SharePoint/MOSS environment.

CodeStock is about Community. For Developers, by Developers (with love for SysAdmins and DBAs too!). Last year an idea started at CodeStock to mix Open Spaces within a traditional conference. This year we're going to crank things up to 11 and rip off the knob - and you're being drafted to help!

  • Keynote by Microsoft RIA Architect Evangelist Josh Holmes
  • From Developer to Business Owner roundtable with guest Nick Bradbury creator of HomeSite, TopStyle, and FeedDemon
  • 50+ break out sessions + Open Spaces (self-organizing sessions)
  • Grand Prize: VSTS 2008 Team Suite with MSDN Premium
  • Virtual sessions with Jeffery Richter and John Robbins

Space is limited, so register today!

Common API Set for Cloud Storage?

Thu, 09 Apr 2009 19:12:13 GMT

I’m working on a project which has as one of its goals the “publishing” of some very large datasets (on the order of 1PB) to the “cloud” for consumption by the general populace for use in scientific research. Rather than designing/inventing our own API, our decision has been to provide an interface consistent with the APIs produced by some of the leading cloud storage providers. Our goal is that if someone is already used to (or has tooling for) working with cloud data sources such as Amazon’s S3 service or Microsoft’s Azure blob storage, those same tools and experiences should apply directly by simply changing the HTTP endpoint.

Unfortunately, neither Amazon nor Microsoft seems poised to provide an open-source implementation of their interfaces (from a service-host perspective), so we are faced with the challenge of reverse-engineering their server-side interfaces. Neither is overly complex, so this certainly isn’t rocket science, but as I’ve been digging and poking at them over the last few days, I’ve found myself wishing there were more commonality between them. There are certainly many similarities, and the “lift” of moving between them is not heavy, but that’s actually the rub: they are so similar in many ways that I wonder why they aren’t identical. Are the deltas actually adding any value, or are they simply the repercussions of different groups of people solving the same problem while trying to convince the world they are not copying one another? I should be clear that I’ve been focused on the REST APIs and haven’t spent any time with the SOAP interfaces, so I can’t speak to the commonality (or lack thereof) at that level.

From a client perspective, they are similar enough that many tools (both OSS and commercial) can front-end either platform, so the differences aren’t *that* significant. John Spurlock has done some good work providing C# libraries that wrap the REST APIs, along with a client application, and Alin Irimie has assembled a simple chart illustrating some of the similarities.
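To make the “change the endpoint” point concrete: for simple anonymous reads, the two REST addressing schemes differ mostly in the host name. A minimal sketch (the account, bucket, container, and object names below are made up; authenticated requests differ more, since each service signs headers in its own way):

```csharp
// Sketch only: shows how close the two REST addressing schemes are for a
// simple GET. All account/bucket/object names are invented for illustration.
using System;

class EndpointSwapSketch
{
    public static Uri BuildGet(string baseUri, string container, string key)
    {
        return new Uri(string.Format("{0}/{1}/{2}", baseUri, container, key));
    }

    static void Main()
    {
        // Amazon S3 (path-style) and Azure blob storage: same shape, different host.
        Console.WriteLine(BuildGet("https://s3.amazonaws.com", "mybucket", "dataset.nc"));
        Console.WriteLine(BuildGet("https://myaccount.blob.core.windows.net", "mycontainer", "dataset.nc"));
    }
}
```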

Clouds and Traditional Hosting Companies

Wed, 25 Mar 2009 21:14:04 GMT

NOTE: this blog has moved. Please update your links and feed readers appropriately.


I sat on a conference call yesterday in which we discussed our thoughts on where the cloud is going and how it will impact the “traditional” hoster. Our company has been working with hosters for many years (I started in 2000, and they had been doing it before that), and we have seen cycles of service come and go. It used to be that providing personal web + email was all you had to do to be successful. Then the “Application Service Provider” (ASP) model started, morphed into providing business-related services, and eventually led to companies outsourcing entire portions of their infrastructure (think Hosted Exchange, Hosted CRM, Hosted OCS). Throughout much of this, however, locality seemed important – not in reality, but in perception. The vast majority of people would choose to host their services with a company that was close to them, or one they had learned of by reference from a friend or colleague.

As we walked through the current market directions, and in later conversations on the same topic, I’ve become more and more convinced that unless significant changes are made, the traditional “hoster” is destined for failure. Web hosting, data hosting, email, etc. are all being commoditized by the big boys (read: Microsoft, Amazon, Google), making it almost impossible for the small hoster to compete. I expect to see a rise in the number of white-label services, and also in the number of mixed-metaphor hosters – by which I mean hosters who present a unified stack of services to their customers where, while much of the stack may simply be white-labeled reselling of someone else’s services, some key aspects are provided directly by the hoster. The key is specialization, or niche services. Maybe a hoster offers a specific hosted financial or ERP system and then marries it with services from other vendors.

Another aspect of the call that was interesting to me was the postulation that these hosters were looking for ways to “get into the cloud” – I chuckled a little because they, of all people, understand the benefits of the cloud, having been cornerstones of it for years. They’ve built their businesses on providing “cloud”-hosted services. They understand where the margins are (and are not) – and know better than most that it’s all about scale. This is precisely why I think the existing model is in trouble: a moderate-size hoster with 1,000–5,000 servers simply cannot compete with the big three, who are buying datacenter blocks in shipping-container units – each of which holds between 2,000 and 4,000 servers.

The game is changing… those who innovate and adapt will be here tomorrow to talk about it. Those who don’t, won’t.

2009 BJU Programming Contest

Wed, 25 Mar 2009 00:27:35 GMT


I had the privilege of being one of the alumni judges at the annual Bob Jones University Computer Science department’s programming contest. This was the first time I’d participated in this type of contest, and I found it very interesting. The CS department had a fairly slick harness for executing the contest and supporting the judging in multiple languages and on multiple platforms. As with anything of this nature, there were a few bumps in the road, but nothing of any consequence.

The contest turned out wonderfully… we had around 35 contestants (I lost count because we overflowed the one room and had to use a second). There were 10 problems of varying difficulty to be solved in a 3-hour window. The contestants could solve the problems in any order, and could choose both their platform (Windows/Linux) and their language (C#, Java, Python, C++, Ruby). To my surprise, many of the contestants switched between languages rather than using just one, as I would have expected. Every contestant solved at least one problem properly, and every problem was solved by at least one person. The distribution of problems solved was pretty balanced as well.

As a judge, my job was to monitor the queue of submitted answers, run the submissions through the test harness, and relay the results back to the contestant. I was a bit amazed (though I shouldn’t have been) at the wide variety of coding styles and levels of verbosity used to solve the same problems. The contestants could also submit questions to the judges, and the favorite for the day was “can I leave and not come back?”.

I’d like to congratulate the winners and all of the contestants for a fine job and look forward to participating in next year’s event.

Misplaced Modifier?

Fri, 13 Mar 2009 10:11:27 GMT

I was on a few websites this morning (go figure) and noticed a handful of visual missteps (mostly minor). One of them was on the Mix 09 conference page and made me chuckle a little.

On the page about being able to watch Mix online, they are, I think, trying to encourage you to leave a comment.



Or, they really don’t like me, and are simply telling me to leave.