Subscribe: A Moment of Zen
http://jeffreystedfast.blogspot.com/feeds/posts/default
Added By: Feedage Forager Feedage Grade B rated
Language: English
Tags:
api  buffer  byte  email  gmime  mailkit  memory  mime parser  mime parsers  mime  mimekit  net  parser  support  time 
Preview: A Moment of Zen

A Moment of Zen



The Ramblings of Jeffrey Stedfast



Updated: 2017-11-07T15:55:26.655-05:00

 



EXCLUSIVE: Texas Massacre Hero, Stephen Willeford, Describes Stopping Gunman

2017-11-07T15:55:26.677-05:00


To donate to the Sutherland Springs Baptist Church to help them recover from this tragedy, check out this GoFundMe campaign.





CNN Asks its viewers about Trump's response to Charlottesville

2017-08-24T10:22:05.943-04:00





CNN's Jake Tapper Exposed As Alt-Right After Attacking Icon Who Never Di...

2017-07-21T15:54:58.152-04:00







GMime 2.99.0 released

2017-04-10T03:10:29.637-04:00

After a long hiatus, I am pleased to announce the release of GMime 2.99.0!

See below for a list of new features and bug fixes.


About GMime

GMime is a C library which may be used for the creation and parsing of messages using the Multipurpose Internet Mail Extensions (MIME), as defined by numerous IETF specifications.

GMime features an extremely robust high-performance parser designed to be able to preserve byte-for-byte information allowing developers to re-serialize the parsed messages back to a stream exactly as the parser found them. It also features integrated GnuPG and S/MIME v3.2 support.

Built on top of GObject (the object system used by the GNOME desktop), many developers should find its API design and memory management very familiar.


Noteworthy changes in version 2.99.0

  • Overhauled the GnuPG support to use GPGME under the hood rather than a custom wrapper.
  • Added S/MIME support, also thanks to GPGME.
  • Added International Domain Name support via GNU's libidn.
  • Improved the GMimeMessage APIs for accessing the common address headers. They now all return an InternetAddressList.
  • g_mime_init() no longer takes any flag arguments and the g_mime_set_user_charsets() API has also been dropped. Instead, GMimeParserOptions and GMimeFormatOptions have taken the place of these APIs to allow customization of various parser and formatting options in a much cleaner way. To facilitate this, many parsing functions and formatting functions have changed to now take these options arguments.
  • InternetAddress now has a 'charset' property that can be set to override GMime's auto-detection of the best charset to use when encoding names.
  • GMimeHeaderIter has been dropped in favor of a much simpler index-based API on GMimeHeaderList.
  • GMimeHeaderList no longer caches the raw message/mime headers in a stream. Instead, each GMimeHeader now has its own cache. This means that changing the GMimeHeaderList or any of its GMimeHeaders no longer invalidates the entire cache.
  • GMimeParser has been fixed to preserve (munged or otherwise) From-lines that sometimes appear at the start of the content of message/rfc822 parts.
  • GMimeParser now also scans for encapsulated PGP blocks within MIME parts as it is parsing them and sets a flag on each GMimePart that contains one of these blocks. GMimePart now has APIs for dealing with said encapsulated PGP blocks.

Developers interested in migrating to the upcoming GMime 3.0 API (of which GMime 2.99.0 is a preview) should take a look at the PORTING document included with the source code as it contains a fairly comprehensive list of the API changes that they will need to be aware of.

Getting the Source Code

You can download official public release tarballs of GMime at https://download.gnome.org/sources/gmime/ or ftp://ftp.gnome.org/pub/GNOME/sources/gmime/.

If you would like to contribute to the GMime project, it is recommended that you grab the source code from the official GitHub repository at https://github.com/jstedfast/gmime. Cloning this repository can be done using the following command:

git clone https://github.com/jstedfast/gmime.git

Documentation

API reference documentation can be found at https://developer.gnome.org/gmime/2.99/.

Documentation for getting started can be found in the README.md. [...]



MailKit 1.14 released

2017-04-09T19:07:44.473-04:00

I am pleased to announce the release of MailKit 1.14!

See below for a list of new features and bug fixes.


About MailKit

MailKit is a C# library which is built on top of MimeKit and is intended to be used for interfacing with IMAP, POP3 and SMTP servers.

MailKit features incredibly robust IMAP, POP3 and SMTP clients with network APIs that are all capable of being canceled. APIs that might transfer significant amounts of data between the client and server also include the ability to report progress. Asynchronous APIs are also available.

Built on top of .NET, MailKit can be used with any of the .NET languages including C#, VB.NET, F#, and more. It will also run on any platform that Mono or the new .NET Core runtime have been ported to including Windows, Linux, Mac OS, Windows Phone, Apple TV, Apple Watch, iPhone/iPad, Xbox, PlayStation, and Android devices.


Noteworthy changes in version 1.14

  • Improved IMAP's BODYSTRUCTURE parser to sanitize the Content-Disposition values. (issue #486)
  • Improved robustness of IMAP's BODYSTRUCTURE parser in cases where qstring tokens have unescaped quotes. (issue #485)
  • Fixed IMAP to properly handle NIL as a folder name in LIST, LSUB and STATUS responses. (issue #482)
  • Added ImapFolder.GetHeaders() to allow developers to download the entire set of message headers.
  • Added SMTP support for International Domain Names in email addresses used in the MAIL FROM and RCPT TO commands.
  • Modified SmtpClient to no longer throw a NotSupportedException when trying to send messages to a recipient with a unicode local-part in the email address when the SMTP server does not support the SMTPUTF8 extension. Instead, the local-part is passed through as UTF-8, leaving it up to the server to reject either the command or the message. This seems to provide the best interoperability.

Installing via NuGet

The easiest way to install MailKit is via NuGet.

In Visual Studio's Package Manager Console, simply enter the following command:

Install-Package MailKit

Getting the Source Code

First, you'll need to clone MailKit from my GitHub repository. To do this using the command-line version of Git, you'll need to issue the following command in your terminal:

git clone --recursive https://github.com/jstedfast/MailKit.git

Documentation

API documentation can be found at http://mimekit.net/docs.

A copy of the XML-formatted API documentation is also included in the NuGet and/or Xamarin Component package.




MimeKit 1.14 released

2017-04-09T18:37:25.002-04:00

I am pleased to announce the release of MimeKit 1.14!

See below for a list of new features and bug fixes.


About MimeKit

MimeKit is a C# library which may be used for the creation and parsing of messages using the Multipurpose Internet Mail Extensions (MIME), as defined by numerous IETF specifications.

MimeKit features an extremely robust high-performance parser designed to be able to preserve byte-for-byte information allowing developers to re-serialize the parsed messages back to a stream exactly as the parser found them. It also features integrated DKIM-Signature, S/MIME v3.2, OpenPGP and MS-TNEF support.

Built on top of .NET, MimeKit can be used with any of the .NET languages including C#, VB.NET, F#, and more. It will also run on any platform that Mono or the new .NET Core runtime have been ported to including Windows, Linux, Mac OS, Windows Phone, Apple TV, Apple Watch, iPhone/iPad, Xbox, PlayStation, and Android devices.


Noteworthy changes in version 1.14

  • Added International Domain Name support for email addresses.
  • Added a work-around for mailers that didn't provide a disposition value in a Content-Disposition header.
  • Added a work-around for mailers that quote the disposition value in a Content-Disposition header.
  • Added automatic key retrieval functionality for the GnuPG crypto context.
  • Added a virtual DigestSigner property to DkimSigner so that consumers can hook into services such as Azure. (issue #296)
  • Fixed a bug in the MimeFilterBase.SaveRemainingInput() logic.
  • Preserve munged From-lines at the start of message/rfc822 parts.
  • Map code page 50220 to iso-2022-jp.
  • Format Reply-To and Sender headers as address headers when using Header.SetValue().
  • Fixed MimeMessage.CreateFromMailMessage() to set the MIME-Version header. (issue #290)

Installing via NuGet

The easiest way to install MimeKit is via NuGet.

In Visual Studio's Package Manager Console, simply enter the following command:

Install-Package MimeKit

Getting the Source Code

First, you'll need to clone MimeKit from my GitHub repository. To do this using the command-line version of Git, you'll need to issue the following command in your terminal:

git clone --recursive https://github.com/jstedfast/MimeKit.git

Documentation

API documentation can be found at http://mimekit.net/docs.

A copy of the XML-formatted API documentation is also included in the NuGet and/or Xamarin Component package.




Code Review: Microsoft's System.Net.Mail Implementation

2015-03-20T07:29:11.675-04:00

For those reading my blog for the first time who don't know who I am, allow myself to introduce... myself.

I'm a self-proclaimed expert on the topic of email, specifically MIME, IMAP, SMTP, and POP3. I don't proclaim myself to be an expert on much, but email is something that maybe 1 or 2 dozen people in the world could probably get away with saying they know more than I do and actually back it up. I've got a lot of experience writing email software over the past 15 years and rarely do I come across mail software that does things better than I've done them. I'm also a critic of mail software design and implementation.

My latest endeavors in the email space are MimeKit and MailKit, both of which are open source and available on GitHub for your perusal should you doubt my expertise.

My point is: I think my review carries some weight, or I wouldn't be writing this. Is that egotistical of me? Maybe a little.

I was actually just fixing a bug in MimeKit earlier and when I went to go examine Mono's System.Net.Mail.MailMessage implementation in order to figure out what the problem was with my System.Net.Mail.MailMessage to MimeKit.MimeMessage conversion, I thought, "hey, wait a minute... didn't Microsoft just recently release their BCL source code?" So I ended up taking a look and pretty quickly confirmed my suspicions and was able to fix the bug.

When I begin looking at the source code for another mail library, I can't help but critique what I find.

MailAddress and MailAddressCollection

Parsing email addresses is probably the hardest thing to get right. It's what I would say makes or breaks a library (literally). To a casual onlooker, parsing email addresses probably seems like a trivial problem. "Just String.Split() on comma and then look for those angle bracket thingies and you're done, right?" Oh God, oh God, make the hurting stop. I need to stop here before I go into a long rant about this...

Okay, I'm back. Blood pressure has subsided.

Looking at MailAddressParser.cs (the internal parser used by MailAddressCollection), I'm actually pleasantly surprised. It looks pretty decent and I can tell that a lot of thought and care went into it. They actually use a tokenizer approach. Interestingly, they parse the string in reverse, which is a pretty good idea, I must say. This approach probably helps simplify the parser logic a bit because parsing forward makes it difficult to know what the tokens belong to (is it the name token? or is it the local-part of an addr-spec? hard to know until I consume a few more tokens...).

For example, consider the following BNF grammar:

    address        = mailbox / group
    mailbox        = name-addr / addr-spec
    name-addr      = [display-name] angle-addr
    angle-addr     = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr
    group          = display-name ":" [mailbox-list / CFWS] ";" [CFWS]
    display-name   = phrase
    word           = atom / quoted-string
    phrase         = 1*word / obs-phrase
    addr-spec      = local-part "@" domain
    local-part     = dot-atom / quoted-string / obs-local-part
    domain         = dot-atom / domain-literal / obs-domain
    obs-local-part = word *("." word)

Now consider the following email address:

    "Joe Example"

The first token you read will be "Joe Example" and you might think that that token indicates that it is the display name, but it doesn't. All you know is that you've got a 'quoted-string' token. A 'quoted-string' can be part of a 'phrase' or it can be (a part of) the 'local-part' of the address itself. You must read at least 1 more token before you'll be able to figure out what it actually is ('obs-local-part' makes things slightly more difficult). In this case, you'll get a '<' which indicates the start of an 'angle-addr', allowing you to assum[...]
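The one-token lookahead described above can be sketched in a few lines of C. This is only an illustration of the ambiguity, not Microsoft's parser or MimeKit's: the names `qstring_role`, `skip_quoted_string` and `classify_leading_qstring` are hypothetical, and the sketch assumes plain spaces and tabs between tokens rather than full CFWS (comments and folding whitespace).

```c
#include <assert.h>
#include <stddef.h>

/* What a leading quoted-string turned out to be (hypothetical names). */
typedef enum {
    ROLE_DISPLAY_NAME,  /* followed by '<' (angle-addr) or ':' (group) */
    ROLE_LOCAL_PART,    /* followed by '@' (addr-spec) */
    ROLE_UNKNOWN        /* not decidable from one token of lookahead */
} qstring_role;

/* Skips a quoted-string that starts at s[0] == '"'.
 * Returns a pointer just past the closing quote, or NULL if unterminated. */
static const char *skip_quoted_string(const char *s)
{
    assert(*s == '"');
    s++;
    while (*s != '\0' && *s != '"') {
        if (*s == '\\' && s[1] != '\0')
            s++;                /* quoted-pair: skip the escaped character */
        s++;
    }
    return *s == '"' ? s + 1 : NULL;
}

/* Decides the role of a leading quoted-string by peeking at the next token. */
static qstring_role classify_leading_qstring(const char *addr)
{
    const char *p;

    if (addr == NULL || *addr != '"')
        return ROLE_UNKNOWN;

    p = skip_quoted_string(addr);
    if (p == NULL)
        return ROLE_UNKNOWN;

    while (*p == ' ' || *p == '\t')   /* simplified: no comments or folding */
        p++;

    switch (*p) {
    case '<': return ROLE_DISPLAY_NAME;  /* start of an angle-addr */
    case ':': return ROLE_DISPLAY_NAME;  /* display-name of a group */
    case '@': return ROLE_LOCAL_PART;    /* local-part "@" domain */
    default:  return ROLE_UNKNOWN;       /* another word or a '.': keep reading */
    }
}
```

With made-up sample input, `classify_leading_qstring("\"Joe Example\" <joe@example.com>")` yields `ROLE_DISPLAY_NAME`, while `"\"joe example\"@example.com"` yields `ROLE_LOCAL_PART`. A parser that reads the string in reverse sees the '>' or the domain first, which is one way to sidestep this guessing.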



The Wait Is Over: MimeKit and MailKit Reach 1.0

2014-10-16T14:32:36.801-04:00

After about a year in the making for MimeKit and nearly 8 months for MailKit, they've finally reached 1.0 status.

I started really working on MimeKit about a year ago wanting to give the .NET community a top-notch MIME parser that could handle anything the real world could throw at it. I wanted it to run on any platform that can run .NET (including mobile) and do it with remarkable speed and grace. I wanted to make it such that re-serializing the message would be a byte-for-byte copy of the original so that no data would ever be lost. This was also very important for my last goal, which was to support S/MIME and PGP out of the box.

All of these goals for MimeKit have been reached (partly thanks to the BouncyCastle project for the crypto support).

At the start of December last year, I began working on MailKit to aid in the adoption of MimeKit. It became clear that without a way to inter-operate with the various types of mail servers, .NET developers would be unlikely to adopt it.

I started off implementing an SmtpClient with support for SASL authentication, STARTTLS, and PIPELINING.

Soon after, I began working on a Pop3Client that was designed such that I could use MimeKit to parse messages on the fly, directly from the socket, without needing to read the message data line-by-line looking for a ".\r\n" sequence, concatenating the lines into a massive memory buffer before I could start to parse the message. This fact, combined with the fact that MimeKit's message parser is orders of magnitude faster than any other .NET parser I could find, makes MailKit the fastest POP3 library the world has ever seen.
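To make the "parse directly from the socket" idea concrete, here is a minimal C sketch of detecting the end of a POP3 multi-line response as chunks arrive, without first gathering lines into one large buffer. It is not MailKit's code: `pop3_state` and `pop3_scan` are hypothetical names, and the sketch only finds the terminating line (a lone "." followed by CRLF); it does not undo the dot-stuffing of message lines.

```c
#include <assert.h>
#include <stddef.h>

/* Scanner state carried across chunks (hypothetical names). */
typedef enum {
    POP3_LINE_START,  /* at the first byte of a line */
    POP3_MID_LINE,    /* somewhere inside a line */
    POP3_DOT,         /* line so far is "." */
    POP3_DOT_CR,      /* line so far is ".\r" */
    POP3_DONE         /* terminating ".\r\n" line has been seen */
} pop3_state;

/* Feeds one chunk to the scanner. Returns how many bytes of the chunk belong
 * to the response, including the terminator if it was found in this chunk.
 * Start with *state == POP3_LINE_START right after the "+OK" status line. */
static size_t pop3_scan(pop3_state *state, const char *chunk, size_t len)
{
    size_t i;

    assert(state != NULL);

    for (i = 0; i < len && *state != POP3_DONE; i++) {
        char c = chunk[i];

        switch (*state) {
        case POP3_LINE_START:
            *state = c == '.' ? POP3_DOT
                   : c == '\n' ? POP3_LINE_START
                   : POP3_MID_LINE;
            break;
        case POP3_DOT:
            *state = c == '\r' ? POP3_DOT_CR
                   : c == '\n' ? POP3_LINE_START
                   : POP3_MID_LINE;      /* e.g. a dot-stuffed ".." line */
            break;
        case POP3_DOT_CR:
            *state = c == '\n' ? POP3_DONE : POP3_MID_LINE;
            break;
        case POP3_MID_LINE:
            if (c == '\n')
                *state = POP3_LINE_START;
            break;
        case POP3_DONE:
            break;
        }
    }

    return i;
}
```

Because the state lives outside the function, the terminator is still recognized when it is split across two reads. The bytes before it can be handed to the MIME parser as they come in.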

After a month or so of avoiding the inevitable, I finally began working on an ImapClient which took me roughly two weeks to produce the initial prototype (compared to a single weekend for each of the other protocols). After many months of implementing dozens of the more widely used IMAP4 extensions (including the GMail extensions) and tweaking the APIs (along with bug fixing) thanks to feedback from some of the early adopters, I believe that it is finally complete enough to call 1.0.

In July, at the request of someone involved with a number of the IETF email-related specifications, I also implemented support for the new Internationalized Email standards, making MimeKit and MailKit the first - and only - .NET email libraries to support these standards.

If you want to do anything at all related to email in .NET, take a look at MimeKit and MailKit. I guarantee that you will not be disappointed.




The Future of Debugging in Xamarin Studio

2014-04-28T17:19:44.124-04:00

There comes a time in every man's life when he says to himself, "there has got to be a better way..."

Set Next Statement

Have you ever been stepping through a method or hit a breakpoint and discovered that variables did not have the expected values? Don't you wish you could go back in time and start stepping through that method from an earlier point to see how things went so horribly wrong? Of course you do.

Last week, with the help of Zoltan Varga (who added the necessary runtime support), I implemented support in Xamarin Studio to set the next statement to execute when you resume execution of your program in the debugger. You can set the next statement to be any statement in the current method; any statement at all. This essentially allows you to jump back in time or completely step over execution of statements after the current position.

Don't worry. As long as you hit Run To Cursor at precisely the moment the lightning strikes the tower, everything will be fine!

Run to Cursor

If you're like me, you've probably found yourself stepping through some code in the debugger and you get to a loop or something that you know is fine and you just don't feel like hitting Step Over the 5 bajillion times necessary to get past it, so what do you do? Hopefully you don't hit Step Over those 5 bajillion times. Hopefully you just set a breakpoint somewhere after that loop and then hit Continue.

The problem with this solution is that it's tedious.

Soon, however, you'll be able to simply right-click and select Run To Cursor (or just set/use a keybinding) and the debugger will resume execution until it reaches your cursor!

Client-Side Evaluation of Simple Properties

Assuming that you haven't disabled "Allow implicit property evaluation and method invocation" in your Xamarin Studio debugger preferences, whenever class properties are evaluated in the debugger (in the Watch pad, the Locals pad, or when you hover the mouse cursor over a property), the debugger has to spin up a thread in the program being debugged in order to have it evaluate the property (or other expression) because, unlike fields, properties are really just methods that have to run arbitrary code.

For at least a year or so, now, we've mitigated this somewhat by cheating if the property has the CompilerGeneratedAttribute (signifying that it is an auto-property). When evaluating these properties, we would instead do a lookup of the backing field and get its value since we could do that locally without any need to round-trip to the program being debugged. While this helped a lot, there are a lot of properties out there that effectively just return a field and have no other logic (maybe an auto-property wasn't used because the setter does more than just set the field value?).

To improve performance of this, I started looking into what it would take to interpret the IL locally in the IDE. Obviously this could only really work if the property getter was "simple" enough and didn't have to take locks, etc. I started asking Zoltan some questions on the feasibility of this and he wrote a simple IL interpreter (included in Mono.Debugger.Soft) that I ended up using in Xamarin Studio to try and evaluate properties locally before falling back to having the debuggee's runtime spin up a thread to evaluate the property for us.

Great Scott! When Can I Start Using These Awesome Features?

To use these new features, you will need Mono 3.4.1 (or later) and an upcoming release of Xamarin Studio (5.0.1? It won't quite make it into 5.0).

Well, it's time I get back... to the future! ... And implementing more new features![...]



GMime gets a Speed Boost

2014-03-10T14:01:55.042-04:00

With all of the performance improvements I've been putting into MimeKit recently, it was about time to port some of these optimizations back to GMime.

In addition to other fixes that were in the queue, GMime 2.6.20 includes the "SIMD" optimization hack that I blogged about doing for MimeKit and I wanted to share the results. Below is a comparison of GMime 2.6.19 and 2.6.20 parsing the same 2GB mbox file on my 2011 Core-i5 iMac with the "persistent stream" option enabled on the GMimeParser:

[fejj@localhost gmime-2.6.19]$ ./gmime-mbox-parser really-big.mbox
Parsed 29792 messages in 5.15 seconds.


[fejj@localhost gmime-2.6.20]$ ./gmime-mbox-parser really-big.mbox
Parsed 29792 messages in 4.70 seconds.


That's a pretty respectable improvement. Interestingly, though, it's still not as fast as MimeKit utilizing Mono's LLVM backend:

[fejj@localhost MimeKit]$ mono --llvm ./mbox-parser.exe really-big.mbox
Parsed 29792 messages in 4.52 seconds.


Of course, to be fair, without the --llvm option, MimeKit doesn't fare quite so well:

[fejj@localhost MimeKit]$ mono ./mbox-parser.exe really-big.mbox
Parsed 29792 messages in 5.54 seconds.


I'm not sure what kind of optimizations LLVM utilizes when used from Mono vs clang (used to compile GMime via homebrew, which I suspect uses -O2), but nevertheless, it's still very impressive.

After talking with Rodrigo Kumpera from the Mono runtime team, it sounds like the --llvm option is essentially the -O2 optimizations minus a few of the options that cause problems with the Mono runtime, so effectively somewhere between -O1 and -O2.

I'd love to find out why MimeKit with the LLVM optimizer is faster than GMime compiled with clang (which also makes use of LLVM) with the same optimizations, but I think it'll be pretty hard to narrow down exactly because MimeKit isn't really a straight port of GMime (they are similar, but a lot of MimeKit is all-new in design and implementation).



Introducing MailKit, a cross-platform .NET mail-client library

2014-02-03T10:15:11.720-05:00

Once I announced MimeKit, I knew it would only be a matter of time before I started getting asked about SMTP, IMAP, and/or POP3 support. Let's just say, Challenge... ACCEPTED!

I started off back in early December writing an SmtpClient so that developers using MimeKit wouldn't have to convert a MimeMessage to a System.Net.Mail.MailMessage in order to send it using System.Net.Mail.SmtpClient. This went pretty quickly because I've implemented several SMTP clients in the past. Implementing the various SASL authentication mechanisms probably took as much or more time than implementing the SMTP protocol.

The following weekend, I ended up implementing a Pop3Client. Originally, I had planned on more-or-less cloning the API we had used in Evolution, but I decided that I would take a different approach. I designed a simple IMessageSpool interface which more closely follows the limited functionality of POP3 and mbox spools instead of trying to map the Pop3Client to a Store/Folder paradigm like JavaMail and Evolution do (Evolution's mail library was loosely based on JavaMail). Mapping mbox and POP3 spools to Stores and Folders in Evolution was, to my recollection, rather awkward and I wanted to avoid that with MailKit.

At first I was loath to do it, but over the past 2 weeks I ended up writing an ImapClient as well. I'm sure Philip van Hoof will be pleased to note that I have a very nice BODYSTRUCTURE parser, although that API is not publicly exported.

Unlike the SmtpClient and Pop3Client, the ImapClient does not have all of its functionality on a single public class. Instead, ImapClient implements an IMessageStore which has a limited API, mostly meant for getting IFolders. I imagine that those who are familiar with the JavaMail and/or Evolution (Camel) APIs will recognize this design.

The IFolder interface isn't designed to be exactly like the JavaMail Folder API, though. I've been designing the interface incrementally as I implement the various IMAP extensions (I've found at least 37 of them at the time of this blog post, although I don't think I'll bother with ACL, MAILBOX-REFERRAL, or LOGIN-REFERRAL), so the API may continue to evolve as I go, but I think what I've got now will likely remain - I'll probably just be including additional APIs for the new stuff.

So far, I've implemented the following IMAP extensions: LITERAL+, NAMESPACE, CHILDREN, LOGIN-DISABLED, STARTTLS, MULTIAPPEND, UNSELECT, UIDPLUS, CONDSTORE, ESEARCH, SASL-IR, SORT, THREAD, SPECIAL-USE, MOVE, XLIST, and X-GM-EXT1. Phew, that was exhausting listing all of those!

Also news-worthy is that MimeKit is now as fast as GMime, which is pretty impressive considering that it is fully managed C# code.

Download MailKit 0.2 now and let the hacking begin![...]



Optimization Tips & Tricks used by MimeKit: Part 2

2013-10-07T07:08:26.948-04:00

In my previous blog post, I talked about optimizing the most critical loop in MimeKit's MimeParser by:

  • Extending our read buffer by an extra byte (which later became 4 extra bytes) that I could set to '\n', allowing me to do the bounds check after the loop as opposed to in the loop, saving us roughly half the instructions.
  • Unrolling the loop in order to check for 4 bytes at a time for that '\n' by using some bit twiddling hacks (for 64-bit systems, we might gain a little more performance by checking 8 bytes at a time).

After implementing both of those optimizations, the time taken for MimeKit's parser to parse nearly 15,000 messages in a ~1.2 gigabyte mbox file dropped from around 10s to about 6s on my iMac with Mono 3.2.3 (32-bit). That is a massive increase in performance.

Even after both of those optimizations, that loop is still the most critical loop in the parser and the MimeParser.ScanContent() method, which contains it, is still the most critical method of the parser.

While the loop itself was a huge chunk of the time spent in that method, the next largest offender was writing the content of the MIME part into a System.IO.MemoryStream.

MemoryStream, for those that aren't familiar with C#, is just what it sounds like it is: a stream backed by a memory buffer (in C#, this happens to be a byte array). By default, a new MemoryStream starts with a buffer of about 256 bytes. As you write more to the MemoryStream, it resizes its internal memory buffer to either the minimum size needed to hold its existing content plus whatever number of bytes your latest Write() was called with or double the current internal buffer size, whichever is larger.

The performance problem here is that for MIME parts with large amounts of content, that buffer will be resized numerous times. Each time that buffer is resized, due to the way C# works, it will allocate a new buffer, zero the memory, and then copy the old content over to the new buffer. That's a lot of copying and creates a situation where each resize becomes progressively more expensive as the internal buffer gets larger. Since MemoryStream contains a GetBuffer() method, its internal buffer really has to be a single contiguous block of memory. This means that there's little we could do to reduce overhead of zeroing the new buffer every time it resizes beyond trying to come up with a different formula for calculating the next optimal buffer size.

At first I decided to try the simple approach of using the MemoryStream constructor that allows specifying an initial capacity. By bumping up the initial capacity to 2048 bytes, things did improve, but only by a very disappointing amount. Larger initial capacities such as 4096 and 8192 bytes also made very little difference.

After brainstorming with my coworker and Mono runtime hacker, Rodrigo Kumpera, we decided that one way to solve this performance problem would be to write a custom memory-backed stream that didn't use a single contiguous block of memory, but instead used a list of non-contiguous memory blocks. When this stream needed to grow its internal memory storage, all it would need to do is allocate a new block of memory and append it to its internal list of blocks. This would allow for minimal overhead because only the new block would need to be zeroed and no data would need to be re-copied, ever. As it turns out, this approach would also allow me to limit the amount of unused memory used by the stream.

I dubbed this new memory-backed stream MimeKit.IO.MemoryBlockStream. As you can see, the implementation is pretty trivial (doesn't even require scary looking bit twiddling hacks like my previous optimization), but it made quite a difference in performance. By using this new memory stream, I was able to shave a [...]
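The block-list idea can be sketched in C as follows. This is not the MimeKit.IO.MemoryBlockStream implementation (which is C#); it is a minimal illustration in which `block_stream`, its functions and the 2048-byte `BLOCK_SIZE` are all assumptions chosen for the example. It shows the key property described above: growing the buffer appends a block and never copies bytes that were already written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 2048   /* assumed block size; any fixed size works here */

typedef struct block {
    struct block *next;
    unsigned char data[BLOCK_SIZE];
} block;

typedef struct {
    block *head;     /* first block, or NULL when empty */
    block *tail;     /* block currently being filled */
    size_t length;   /* total number of bytes written */
} block_stream;

static void block_stream_init(block_stream *s)
{
    s->head = s->tail = NULL;
    s->length = 0;
}

/* Appends len bytes. Returns 0 on success, -1 if an allocation fails
 * (bytes copied before the failure stay in the stream). */
static int block_stream_write(block_stream *s, const void *buf, size_t len)
{
    const unsigned char *src = buf;

    assert(s != NULL);

    while (len > 0) {
        size_t used = s->length % BLOCK_SIZE;  /* bytes used in the tail block */
        size_t n;

        if (used == 0) {
            /* Empty stream or full tail: grow by linking one new block.
             * Existing blocks are left untouched. */
            block *b = malloc(sizeof *b);
            if (b == NULL)
                return -1;
            b->next = NULL;
            if (s->tail != NULL)
                s->tail->next = b;
            else
                s->head = b;
            s->tail = b;
        }

        n = BLOCK_SIZE - used;
        if (n > len)
            n = len;
        memcpy(s->tail->data + used, src, n);
        s->length += n;
        src += n;
        len -= n;
    }

    return 0;
}

/* Copies the whole content into out, which must hold s->length bytes. */
static void block_stream_read_all(const block_stream *s, unsigned char *out)
{
    size_t remaining = s->length;
    const block *b;

    for (b = s->head; b != NULL; b = b->next) {
        size_t n = remaining < BLOCK_SIZE ? remaining : BLOCK_SIZE;
        memcpy(out, b->data, n);
        out += n;
        remaining -= n;
    }
}

static void block_stream_free(block_stream *s)
{
    block *b = s->head;

    while (b != NULL) {
        block *next = b->next;
        free(b);
        b = next;
    }
    block_stream_init(s);
}
```

Writing 5000 bytes into this sketch allocates three blocks and wastes at most one partially filled block, whereas a doubling contiguous buffer would have reallocated and copied several times along the way. The trade-off is that there is no single contiguous buffer to hand out, so readers have to walk the list.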



Optimization Tips & Tricks used by MimeKit: Part 1

2013-10-11T17:31:37.254-04:00

One of the goals of MimeKit, other than being the most robust MIME parser, is to be the fastest C# MIME parser this side of the Mississippi. Scratch that, fastest C# MIME parser in the World.

Seriously, though, I want to get MimeKit to be as fast and efficient as my C parser, GMime, which is one of the fastest (if not the fastest) MIME parsers out there right now, and I don't expect that any parser is likely to smoke GMime anytime soon, so using it as a baseline to compare against means that I have a realistic goal to set for MimeKit.

Now that you know the why, let's examine the how.

First, I'm using one of those rarely used features of C#: unsafe pointers. While that alone is not all that interesting, it's a cornerstone for one of the main techniques I've used. In C#, the fixed statement (which is how you get a pointer to a managed object) pins the object to a fixed location in memory to prevent the GC from moving that memory around while you operate on that buffer. Keep in mind, though, that telling the GC to pin a block of memory is not free, so you should not use this feature without careful consideration. If you're not careful, using pointers could actually make your code slower. Now that we've got that out of the way...

MIME is line-based, so a large part of every MIME parser is going to be searching for the next line of input. One of the reasons most MIME parsers (especially C# MIME parsers) are so slow is because they use a ReadLine() approach and most TextReaders likely use a naive algorithm for finding the end of the current line (as well as all of the extra allocating and copying into a string buffer):

    // scan for the end of the line
    while (inptr < inend && *inptr != (byte) '\n')
        inptr++;

The trick I used in GMime was to make sure that my read buffer was 1 byte larger than the max number of bytes I'd ever read from the underlying stream at a given time. This allowed me to set the first byte in the buffer beyond the bytes I just read from the stream to '\n', thus allowing for the ability to remove the inptr < inend check, opting to do the bounds check after the loop has completed instead. This nearly halves the number of instructions used per loop, making it much, much faster. So, now we have:

    // scan for the end of the line
    while (*inptr != (byte) '\n')
        inptr++;

But is that the best we can do?

Even after using this trick, it was still the hottest loop in my parser.

We've got no choice but to use a linear scan, but that doesn't mean that we can't do it faster. If we could somehow reduce the number of loops and likewise reduce the number of pointer increments, we could eliminate a bunch of the overhead of the loop. This technique is referred to as loop unrolling. Here's what brianonymous (from the ##csharp irc channel on freenode) and I came up with (with a little help from Sean Eron Anderson's bit twiddling hacks):

    uint* dword = (uint*) inptr;
    uint mask;

    do {
        mask = *dword++ ^ 0x0A0A0A0A;
        mask = ((mask - 0x01010101) & (~mask & 0x80808080));
    } while (mask == 0);

And here are the results of that optimization: [figure]

Now, keep in mind that on many architectures other than x86, in order to employ the trick above, inptr must first be 4-byte aligned (uint is 32-bit) or it could cause a SIGBUS or worse, a crash. This is fairly easy to solve, though. All you need to do is increment inptr until you know that it is 4-byte aligned and then you can switch over to reading 4 bytes at a time as in the above loop.

We'll also need to figure out which of those 4 bytes contained the '\n'. An easy way to solve that problem is to just linearly scan those 4 bytes using our previous single-byte-per-loop implementation starting at dw[...]
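Putting the pieces above together (sentinel byte, alignment prologue, 4-bytes-at-a-time test, then a short byte scan to pin down the exact position), a C rendering might look like the sketch below. It is not the GMime or MimeKit source: `find_newline`, `word_has_newline` and `SCAN_PADDING` are hypothetical names, and the sketch uses memcpy for the 4-byte load instead of a pointer cast, which is an assumption made here to keep the C well-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writable bytes required past the data: room for the sentinel plus the rest
 * of the 4-byte word that contains it. */
#define SCAN_PADDING 4

/* Returns nonzero if any of the four bytes in word equals '\n'. */
static int word_has_newline(uint32_t word)
{
    uint32_t mask = word ^ 0x0A0A0A0AU;   /* '\n' bytes become 0x00 */

    return ((mask - 0x01010101U) & ~mask & 0x80808080U) != 0;
}

/* Finds the first '\n' in buf[0..len). Returns its index, or len if the data
 * contains none. The caller must own len + SCAN_PADDING writable bytes. */
static size_t find_newline(unsigned char *buf, size_t len)
{
    unsigned char *inptr = buf;
    uint32_t word;

    assert(buf != NULL);

    buf[len] = '\n';   /* sentinel: the loops below need no bounds check */

    /* One byte at a time until inptr is 4-byte aligned. */
    while (((uintptr_t) inptr & 3) != 0 && *inptr != '\n')
        inptr++;

    if (*inptr != '\n') {
        /* Aligned now: skip whole words that contain no '\n'. */
        for (;;) {
            memcpy(&word, inptr, sizeof word);
            if (word_has_newline(word))
                break;
            inptr += 4;
        }

        /* One of these 4 bytes is '\n'; find out which. */
        while (*inptr != '\n')
            inptr++;
    }

    /* The bounds check happens here, after the loop: a result equal to len
     * means only the sentinel was found. */
    return (size_t) (inptr - buf);
}
```

A caller would read at most `capacity - SCAN_PADDING` bytes into its buffer, call `find_newline(buf, nread)`, and treat a return value equal to `nread` as "no complete line yet, read more".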



MimeKit: Coming to a NuGet near you.

2013-10-07T07:07:17.138-04:00

If, like me, you've been trapped in the invisible box of despair, bemoaning the woeful inadequacies of every .NET MIME library you've ever found on the internets, cry no more: MimeKit is here.

I've just released MimeKit v0.5 as a NuGet Package. There's still plenty of work left to do, mostly involving writing more API documentation, but I don't expect to change the API much between now and v1.0. For all the mobile MIME lovers out there, you'll be pleased to note that in addition to the .NET Framework 4.0 assembly, the NuGet package also includes assemblies built for Xamarin.Android and Xamarin.iOS. It's completely open source and licensed under the MIT/X11 license, so you can use it in any project you want - no restrictions. Once MimeKit goes v1.0, I plan on adding it to Xamarin's Component Store as well for even easier mobile development. If that doesn't turn that frown upside down, I don't know what will.

For those that don't already know, MimeKit is a really fast MIME parser that uses a real tokenizer instead of regular expressions and string.Split() to parse and decode headers. Among numerous other things, it can properly handle rfc2047 encoded-word tokens that contain quoted-printable and base64 payloads which have been improperly broken apart (i.e. a quoted-printable triplet or a base64 quartet is split between 2 or more encoded-word tokens). It also handles cases where multibyte character sequences are split between words, thanks to the state machine nature of MimeKit's rfc2047 text and phrase decoders (yes, there are 2 types of encoded-word tokens - something most other MIME parsers have failed to take notice of). With the use of MimeKit.ParserOptions, the user can specify his or her own fallback charset (in addition to UTF-8 and ISO-8859-1 that MimeKit has built in), allowing MimeKit to gracefully handle undeclared 8bit text in headers.

When constructing MIME messages, MimeKit provides the user with the ability to specify any character encoding available on the system for encoding each individual header (or, in the case of address headers: each individual email address). If none is specified, UTF-8 is used unless the characters will fit nicely into ISO-8859-1. MimeKit's rfc2047 and rfc2231 encoders do proper breaking of text (i.e. they avoid breaking between surrogate pairs) before the actual encoding step, thus ensuring that each encoded-word token (or parameter value) is correctly self-contained.

S/MIME support is also available in the .NET Framework 4.0 assembly (not yet supported in the Android or iOS assemblies due to the System.Security assembly being unavailable on those platforms). MimeKit supports signing, encrypting, decrypting, and verifying S/MIME message parts. For signing, you can either use the preferred multipart/signed approach or the application/[x-]pkcs7-signature mime-type, whichever you prefer.

I'd love to support PGP/MIME as well, but this is a bit more complicated as I would likely need to depend on external native libraries and programs (such as GPGME and GnuPG), which means that MimeKit would likely have to become 32bit-only (currently, libgpgme is only available for 32bit Windows).

I hope you enjoy using MimeKit as much as I have enjoyed implementing it!

Note: For those using my GMime library, fear not! I have not forgotten about you! I plan to bring many of the API and parser improvements that I've made to MimeKit back to GMime in the near future.

For those using the C# bindings, I'd highly recommend that you consider switching to MimeKit instead. I've based MimeKit's API on my GMime API, so porting to MimeKit should be fairly straightforward.[...]



Time for a rant on mime parsers...

2013-10-07T07:06:16.402-04:00

Warning: Viewer discretion is advised.

Where should I begin?

I guess I should start by saying that I am obsessed with MIME and, in particular, MIME parsers. No, really. I am obsessed. Don't believe me? I've written and/or worked on several MIME parsers at this point. It started off in my college days working on Spruce which had a horrendously bad MIME parser, and so as you read further along in my rant about shitty MIME parsers, keep in mind: I've been there, I've written a shitty MIME parser.

As a handful of people are aware, I've recently started implementing a C# MIME parser called MimeKit. As I work on this, I've been searching around on GitHub and Google to see what other MIME parsers exist out there to find out what sort of APIs they provide. I thought perhaps I'd find one that offers a well-designed API that would inspire me. Perhaps, by some miracle, I'd find one that was actually pretty good that I could just contribute to instead of writing my own from scratch (yea, wishful thinking). Instead, all I have found are poorly designed and implemented MIME parsers, many of which probably belong on the front page of the Daily WTF.

I guess I'll start with some softballs.

First, there's the fact that every single one of them was written as a System.String parser. Don't be fooled by the ones claiming to be "stream parsers", because all any of those did was to slap a TextReader on top of the byte stream and start using reader.ReadLine(). What's so bad about that, you ask? For those not familiar with MIME, I'd like for you to take a look at the raw email sources in your inboxes, particularly if you have correspondence with anyone outside of the US. Hopefully most of your friends and colleagues are using more-or-less MIME compliant email clients, but I guarantee you'll find at least a few emails with raw 8bit text.

Now, if the language they were using was C or C++, they might be able to get away with doing this because they'd technically be operating on byte arrays, but with Java and C#, a 'string' is a Unicode string. Tell me: how does one get a Unicode string from a raw byte array?

Bingo. You need to know the charset before you can convert those bytes into Unicode characters.

To be fair, there's really no good way of handling raw 8bit text in message headers, but by using a TextReader approach, you are really limiting the possibilities.

Next up is the ReadLine() approach. One of the 2 early parsers in GMime (pan-mime-parser.c back in the version 0.7 days) used a ReadLine() approach, so I understand the thinking behind this. And really, there's nothing wrong with this approach as far as correctness goes, it's more of a "this can never be fast" complaint. Of the two early parsers in GMime, the pan-mime-parser.c backend was horribly slow compared to the in-memory parser. Of course, that's not very surprising. More surprising to me at the time was that when I wrote GMime's current generation of parser (sometime between v0.7 and v1.0), it was just as fast as the in-memory parser ever was and only ever had up to 4k in a read buffer at any given time. My point is, there are far better approaches than ReadLine() if you want your parser to be reasonably performant... and why wouldn't you want that? Your users definitely want that.

Okay, now come the more serious problems that I encountered in nearly all of the mime parser libraries I found.

I think that every single mime parser I've found so far uses the "String.Split()" approach for parsing address headers and/or for parsing parameter lists on headers such as Content-Type and Content-Disposition.

Here's an example from one C# MIME parser:

    string[] emails = addressHeader.Split(',');

Here's how this sa[...]
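One way to see why a plain split on ',' is fragile: in an address header, a display name may be a quoted string and an address may be followed by a parenthesized comment, and either of those is allowed to contain commas. The sketch below, in C, compares a naive comma count with a scan that tracks quotes and comments. Both function names are made up for this illustration and neither comes from GMime, MimeKit, or the parser quoted above. It is deliberately simplified (no group syntax, no validation, and it assumes a non-empty header) and is nowhere near a real address parser.

```c
#include <stddef.h>

/* Hypothetical: what a Split(',') approach effectively computes. */
size_t count_addresses_naive(const char *header)
{
    size_t count = 1;

    for (; *header != '\0'; header++) {
        if (*header == ',')
            count++;
    }

    return count;
}

/* Hypothetical: only commas outside of quoted strings and (possibly
 * nested) comments separate one address from the next. */
size_t count_addresses_tokenized(const char *header)
{
    size_t count = 1;
    int in_quotes = 0;
    int comment_depth = 0;

    for (; *header != '\0'; header++) {
        if (*header == '\\' && header[1] != '\0' &&
            (in_quotes || comment_depth > 0)) {
            header++;                  /* skip the escaped character */
        } else if (in_quotes) {
            if (*header == '"')
                in_quotes = 0;
        } else if (comment_depth > 0) {
            if (*header == '(')
                comment_depth++;
            else if (*header == ')')
                comment_depth--;
        } else if (*header == '"') {
            in_quotes = 1;
        } else if (*header == '(') {
            comment_depth = 1;
        } else if (*header == ',') {
            count++;
        }
    }

    return count;
}
```

For a header such as "Doe, Jane" <jane@example.com>, bob@example.com (Smith, Bob), the naive count reports four pieces while the quote-aware scan reports the two addresses that are actually there.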



Why decoding rfc2047-encoded headers is hard

2013-09-15T10:06:18.280-04:00

Somewhat inspired by a recent thread on the notmuch mailing-list, I thought I'd explain why decoding headers is so hard to get right. I'm sure just about every developer who has ever worked on an email client could tell you this, but I guess I'm going to be the one to do it.

Here's just a short list of the problems every developer faces when they go to implement a decoder for headers which have been (theoretically) encoded according to the rfc2047 specification:

1. First off, there are technically two variations of header encoding formats specified by rfc2047 - one for phrases and one for unstructured text fields. They are very similar but you can't use the same rules for tokenizing them. I mention this because it seems that most MIME parsers miss this very subtle distinction and so, as you might imagine, do most MIME generators. Hell, it seems most MIME generators have never even heard of specifications to begin with. This brings us to:

2. There are so many variations of how MIME headers fail to be tokenizable according to the rules of rfc2822 and rfc2047. You'll encounter fun stuff such as:

   a. encoded-word tokens illegally being embedded in other word tokens
   b. encoded-word tokens containing illegal characters in them (such as spaces, line breaks, and more) effectively making it so that a tokenizer can no longer, well, tokenize them (at least not easily)
   c. multi-byte character sequences being split between multiple encoded-word tokens which means that it's not possible to decode said encoded-word tokens individually
   d. the payloads of encoded-word tokens being split up into multiple encoded-word tokens, often splitting in a location which makes it impossible to decode the payload in isolation

   You can see some examples here.

3. Something that many developers seem to miss is the fact that each encoded-word token is allowed to be in a different character encoding (you might have one token in UTF-8, another in ISO-8859-1 and yet another in koi8-r). Normally, this would be no big deal because you'd just decode each payload, then convert from the specified charset into UTF-8 via iconv() or something. However, due to the fun brokenness that I mentioned above in (2c) and (2d), this becomes more complicated. If that isn't enough to make you want to throw your hands up in the air and mutter some profanities, there's more...

4. Undeclared 8bit text in headers. Yep. Some mailers just didn't get the memo that they are supposed to encode non-ASCII text. So now you get to have the fun experience of mixing and matching undeclared 8bit text of God-only-knows what charset along with the content of (probably broken) encoded-words.

That said, I was able to help the notmuch developers solve this problem by letting them know about the GMIME_ENABLE_RFC2047_WORKAROUNDS flag that they could pass to g_mime_init(guint32 flags).

Any developer reading this blog post and thinking that they want to see how this is done in GMime, the source code for the rfc2047 decoder is located here. If the line numbers change in the future, just grep around for "rfc2047_token" and you should find it.

In other news... I cranked out a ton more code for MimeKit (my C# MIME parser library) yesterday. Yes, I know... I've got a serious problem with masochism having already written 2 MIME parsers and now I'm working on a third. When will the hurting stop? Never!

Oh, I guess I could point people at MimeKit's rfc2047 decoders. What you'll want to look at is MimeKit.Rfc2047.DecodePhrase(byte[] phrase) and MimeKit.Rfc2047.DecodeText(byte[] text).[...]
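To illustrate the split multi-byte sequence problem described in this post, here is a small sketch in C. It is not the GMime or MimeKit decoder; the function names are hypothetical and the UTF-8 check is a simplified structural test (it does not reject overlong forms or surrogates). The idea it demonstrates is one plausible workaround: decode the payloads of adjacent encoded-words that share a charset into a single byte buffer first, and only then interpret the bytes in that charset.

```c
#include <stddef.h>

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decodes the payload of one "Q"-encoded
 * encoded-word and appends the raw bytes to out, starting at outlen.
 * Returns the new number of bytes in out (never more than outcap). */
size_t q_decode_append(const char *payload, unsigned char *out,
                       size_t outlen, size_t outcap)
{
    for (; *payload != '\0' && outlen < outcap; payload++) {
        if (*payload == '_') {
            out[outlen++] = ' ';       /* '_' stands for a space */
        } else if (*payload == '=' && hex_value(payload[1]) >= 0 &&
                   hex_value(payload[2]) >= 0) {
            out[outlen++] = (unsigned char)
                (hex_value(payload[1]) * 16 + hex_value(payload[2]));
            payload += 2;
        } else {
            out[outlen++] = (unsigned char) *payload;
        }
    }

    return outlen;
}

/* Simplified check: every lead byte is followed by the right number of
 * continuation bytes. Returns 1 if so, 0 otherwise. */
int utf8_is_complete(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        size_t need;

        if (s[i] < 0x80) need = 0;
        else if ((s[i] & 0xE0) == 0xC0) need = 1;
        else if ((s[i] & 0xF0) == 0xE0) need = 2;
        else if ((s[i] & 0xF8) == 0xF0) need = 3;
        else return 0;                 /* stray continuation byte */

        i++;
        for (; need > 0; need--, i++) {
            if (i >= len || (s[i] & 0xC0) != 0x80)
                return 0;              /* sequence cut short */
        }
    }

    return 1;
}
```

Given the two tokens =?UTF-8?Q?=C3?= and =?UTF-8?Q?=A9?=, decoding either payload alone yields a single byte that is not valid UTF-8, while appending both payloads to the same buffer yields the two bytes 0xC3 0xA9, which can then be converted as one character.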



HOWTO: MonoTouch Enterprise Deployment

2012-12-11T17:11:02.791-05:00

Okay, so you've gotten the "Engage" hand-gesture from your Captain to deploy your MonoTouch app to the rest of the crew of the Enterprise. Now all you need to know is which buttons to press on your helm...

Step 1. First, you'll need to make sure that you've created and installed your "In-House" Distribution Certificate via Apple's iOS Provisioning Portal.

Step 2. Open your Project Options in MonoDevelop and navigate to the iPhone Bundle Signing section. If you've got MonoDevelop 3.1.0 or later, you'll be able to set your configuration to: [image]

Otherwise you'll simply have to select your Provisioning Profile manually. Once you've selected your signing certificate and provisioning profile, click the OK button to dismiss the Project Options dialog.

Step 3. In MonoDevelop, click on the Build menu and select Archive. This will build your project and archive it in a location where Xcode will be able to see it in its Organizer window.

Step 4. Launch Xcode and then click on the Window menu and select Organizer. At the top of Xcode's Organizer window, you will see an array of icons. Click on the one labeled Archives. Find your application in the list of archives and select it.

Step 5. Now click on the Distribute... button in the top-right area of the window and select Save for Enterprise or Ad-Hoc Deployment. The next screen will prompt you for your code-signing certificate, providing you with a drop-down menu listing your available options. Clicking Next will cause an action sheet to slide into view, prompting you for the location to save the AppName.ipa package and the AppName.plist file.

Important: Make sure to toggle the Save for Enterprise Distribution checkbox. Once you've finished filling out all of the fields, click on the Save button.

Step 6. You'll need to upload the saved AppName.ipa and AppName.plist files to your corporate web server in the location that you specified in the previous step. You'll also need a web page that will link to your app using a hyperlink similar to the one below:

Install AppName!

That's it! You're done! [...]



Let's Ban Profits!

2012-09-05T20:41:39.602-04:00




A Better Alternative to the TSA?

2012-09-02T09:35:21.624-04:00

Most everyone agrees that going through airport security and being groped by the TSA is not only offensive, but also a major nuisance.

How about replacing the TSA with privately run airport security?

It sounds like San Francisco travelers much prefer their privately run airport security to the TSA screening at other US airports.

Video: http://www.youtube.com/embed/h6BbowVpcFo



Grading Romney's Green Energy Stimulus "Investments": Epic Fail

2012-08-14T10:59:48.521-04:00

Yesterday (well, technically late Friday night), I criticized Obama's Green Energy failures.

Today I want to point out that in my research for my previous blog post, I discovered that Mitt Romney has a Green Energy stimulus failure of his own: Konarka.

While Governor of Massachusetts, Mitt Romney gave a state loan of $1.5 million to Konarka, which recently filed for bankruptcy and fired all of its workers.

According to the Boston Globe,
In January 2003, shortly after taking office in Massachusetts, Romney held another press conference -- at Konarka, where he announced a plan to loan $24 million from the state’s renewable energy trust fund to startups with the potential to create jobs.

The Bush Administration, too, granted this company $3.6 million in taxpayer money, and the Obama Administration later gave tax credits to the company.
Under the Bush administration, Konarka received a $1.6 million US Army contract in 2005 and a $3.6 million award from the Department of Energy in 2007. Under the Obama administration, Konarka was one of 183 clean energy companies nationwide that got a total of $2.3 billion in tax credits as part of the 2009 stimulus.

Aye yei yei, my head hurts (probably related to my flu, but this news isn't helping).

Where did the rest of that $24 million in planned loans go? Were any of those companies successful? Were those loans paid back?

Update:

According to this Politico article, it appears that Konarka paid back the $1.5 million loan. It also appears that the decision to loan that money to Konarka was made in December of 2002, before Romney became Governor.

The article also suggests that Deval Patrick's Administration loaned another $5 million to Konarka. It's not mentioned in the article when they were given this loan or if this loan was ever paid back.



Grading Obama's Green Energy Stimulus "Investments": Epic Fail

2012-08-12T16:11:35.866-04:00

Earlier this year (in January), President Obama declared in his State of the Union speech that he would not "cede the wind or solar or battery industry to China or Germany". He wanted to "double down on a clean energy industry that has never been more promising".

What brought this up, you ask? Well, earlier this week, I noticed an article in the Google News widget on my iGoogle page reporting that the Waltham, Massachusetts-based company A123 Systems was being bought out by a Chinese conglomerate (Wanxiang Group).

In 2009, when the Obama Administration provided financial relief to this company to the tune of a $250 million grant on top of the $135 million in grants and tax breaks that it got from the state of Michigan (for building a factory in 2010, where it recently laid off 17% of its workforce), a $30 million grant for a federal wind energy storage project, and another $380 million that they made from their IPO that same year, their stock value peaked at around $25.77/share. Today? $0.45/share.

The Chinese are literally buying this clean energy company that the Obama Administration wasted our tax dollars on for pennies on the dollar - but in a way, thank God they are, or A123 Systems would be filing for bankruptcy and every employee would be losing their job.

This news got me thinking: what other green energy companies has the Obama Administration used taxpayer money to invest in? How are they doing? We all know about Solyndra because it was a very public and embarrassing bust for the Obama Administration, but what about the others? I decided to do a little research and find out...

Tesla Motor Company

Tesla Motors is actually one of the more promising of the green energy stimulus recipients (having actually met their June 2012 deadline of delivering the first of their Model S electric cars - 29 delivered as of the end of July!), but even their future isn't bright enough to force me to wear shades.

In June 2009, Tesla Motors received a $465 million loan from the Department of Energy. Of this $465 million loan, Tesla owner and Obama campaign donor, Elon Musk, reportedly pocketed $15 million. I have to wonder if there was any political back-scratching going on here...

Their entry-level Model S with a reduced-range battery (160-mile range, as opposed to the 300-mile range battery that has gotten them a fair bit of press) still costs $49,900 after a whopping $7,500 federal tax credit for buyers, which still makes it more expensive than most of its competitors, which are charging under $40,000 even without any federal tax credits factored in (except Fisker Automotive, another recipient of stimulus, which is charging $95,000 for their hybrid).

Supposedly 10,000 potential buyers have made deposits of $5,000 (fully refundable), which, at first glance, seems pretty impressive (especially compared to the Chevy Volt, another recipient of Obama's stimulus, which sold about 20,000 units since it first went on the market 2 years ago in 2010). Of course, GM isn't exactly the golden yardstick by which success should be measured. For comparison, most automakers sell ~200,000 cars/year for most of their models, some as many as 300,000/year.

The company reportedly lost $90 million in the first quarter of 2012 (up from $50 million in losses for the same quarter in 2011) and $106 million in the second quarter of 2012. So far, they've only produced around 40 Model S's (delivering 29 of those so far) but expect to produce and deliver the rest of their initial 5,000 car pro[...]



The Chevy Volt, brought to you by you ... and Obama

2012-07-18T21:52:13.076-04:00

Video: http://www.youtube.com/embed/ZosJ_je04w0?fs=1






Beware Inflation

2012-06-17T19:12:50.200-04:00

By a continuing process of inflation, governments can confiscate, secretly and unobserved, an important part of the wealth of their citizens.

-- John Maynard Keynes, The Economic Consequences of the Peace