Storage
Updated: 2017-11-21T09:18:41.799-06:00


Blog Relocation



This is my last post on this blog as NetApp has graciously offered several of us the opportunity to use Typepad as the hosting service.

So, starting today I will be blogging on the new hosting service. The new blog is titled Storage Nuts N' Bolts, and I hope to see you there.

Solaris 10 iSCSI configured with Dynamic Discovery


Recently we went through re-IPing all of our servers and storage arrays in our office. For the most part everything went fine, with the exception of a Solaris 10 U3 server I was running iSCSI on.

After I got through the steps of changing the server's IP address, gateway, and DNS entries, I rebooted the server. Upon reboot, I noticed a flurry of non-stop error messages on the server's console:

Sep 30 18:37:37 longhorn iscsi: [ID 286457 kern.notice] NOTICE: iscsi connection(8) unable to connect to target SENDTARGETS_DISCOVERY (errno:128)
Sep 30 18:37:37 longhorn iscsi: [ID 114404 kern.notice] NOTICE: iscsi discovery failure - SendTargets (0xx.0xx.0xx.0xx)

As a result, I was never able to get a login prompt, either at the console or via telnet, even though I could successfully ping the server's new IP address. What the message above indicates is that the initiator issues a SendTargets request and waits for the target to respond with its targets. To my surprise, there is NO timeout, and the initiator will retry this process indefinitely. In fact, just for kicks, I left it trying for an hour and 45 minutes.

That also means you will be locked out of the server, as attempting to boot into single-user mode results in the exact same behavior.

To get around this problem you have two options, although option #2, for some, may not be viable.

Option 1
a) Boot from a Solaris CD-ROM
b) mount /dev/dsk/c#t#d#s0 /a
c) cd /a/etc/iscsi
d) Remove or rename the *.dbc and *.dbp files (iSCSI is no longer configured)
e) Reboot the server
f) Use iscsiadm to configure the Solaris server with static discovery (static-config) so you don't get into this situation again
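Step (f) can be sketched as follows. The IQN, IP address, and port below are placeholders (assumptions), and the Solaris iscsiadm commands are echoed rather than executed so the sketch is safe to run anywhere:

```shell
# Placeholder target values -- substitute your own (these are assumptions)
IQN="iqn.1992-08.com.netapp:sn.12345678"
TARGET_IP="10.10.10.50"
PORT=3260

# iscsiadm's static-config takes a <target-name>,<ip>:<port> triplet
STATIC_CFG="${IQN},${TARGET_IP}:${PORT}"

# Commands you would run on the Solaris host (echoed here, not executed):
echo "iscsiadm modify discovery --sendtargets disable"
echo "iscsiadm add static-config ${STATIC_CFG}"
echo "iscsiadm modify discovery --static enable"
```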

Option 2
a) Change back to the old target IP address
b) That will enable you to reboot the server
c) Reconfigure the server to use static-config by specifying the target name, new target IP address, and port number
d) Change the target IP address to the new one

I followed option #1 because #2 was not really an option for us. So the moral of the story is that you may want to consider static discovery with iSCSI on Solaris.

VMware over NFS: Backup tricks...continued


There have been a couple of questions on how to do file-level backups of a Linux VMDK over NFS. I described the process for a Windows VMDK in a previous article here.

To do this for a Linux VMDK, you need to do the following:

  1. Create a FlexClone of the NFS volume from a snapshot
  2. Mount the FlexClone on your Linux server
  3. Do not use the read-only mount option, as Linux requires read-write access
  4. Specify -t ext3 as the mount option (you can get the filesystem type per partition with "df -T")
  5. Remember to use fdisk -lu to get the starting sector for each partition. Multiply the starting sector by 512 bytes and specify the result in the "offset" field of the mount command.
Here's an example of mounting and exploring a copy of the /boot partition of a Red Hat 4 U4 VMDK using a FlexClone:
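Steps 4-5 above can be sketched like this. The starting sector, clone path, and mount point are assumptions for illustration, and the actual mount command is echoed rather than executed:

```shell
# Starting sector as reported by `fdisk -lu` for the /boot partition
# (63 is a typical value for older MBR layouts -- an assumption here)
START_SECTOR=63

# Offset in bytes = starting sector x 512
OFFSET=$((START_SECTOR * 512))
echo "$OFFSET"   # -> 32256

# The mount you would then run (hypothetical clone path and mount point):
echo "mount -t ext3 -o loop,offset=${OFFSET} /mnt/clone/rhel4-flat.vmdk /mnt/vmdk-boot"
```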


One reader asked a good question regarding Windows: how do you do file-level backups of partitioned Windows VMDKs? The answer lies in the offset parameter of the mount command.

What you need to do in a scenario like this is:

  1. Run msinfo32.exe in your Windows VM
  2. Go to Components -> Storage -> Disks
  3. Note the partition starting offsets and specify them as part of the mount options.
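For a VMDK with multiple Windows partitions, the idea might be sketched like this. Both offsets are made-up example values (read the real ones from msinfo32, which reports them already in bytes), and the mounts are echoed rather than executed:

```shell
# "Partition Starting Offset" values from msinfo32, in bytes (assumed values)
PART1_OFFSET=32256
PART2_OFFSET=10487232000

# One loopback mount per partition (hypothetical vmdk path and mount points):
echo "mount -t ntfs -o ro,loop,offset=${PART1_OFFSET} /mnt/vmnfs/win-flat.vmdk /mnt/vmdk-p1"
echo "mount -t ntfs -o ro,loop,offset=${PART2_OFFSET} /mnt/vmnfs/win-flat.vmdk /mnt/vmdk-p2"
```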

Demos from VMworld


I promised last week to post links to some of the demos we ran after VMworld was over. So, for those who have not seen them, here they are. There's audio as well, so plug in your headsets.

1) VDI on NetApp over NFS

2) Eliminate duplicate data with A-SIS in a VMware environment

There are also several presentations and technical whitepapers on the TechONTAP site which you may find very useful.

VMware on NFS: Backup Tricks


OK, so you've decided to use VMware over NFS. There's always some guy who'll find something to nitpick about, so he'll say, "Well, you can't run VCB on NFS." He's right, but I don't see this as an issue. Sometimes it takes imagination to find a solution to a challenge.

Using NFS as a protocol with VMware, you have similar choices and flexibility as with VCB: you can mount the NFS volume, or a snapshot of the volume, on a server other than ESX... Other = Linux in this case.

So if you are deploying VMware on NFS, here's a way to back up whole VMDK images, or files within VMDKs, using NetApp Snapshots, given that the Snapshots are accessible to the NFS client.

Mind you, with this approach you can do all kinds of cool things, not just backups, without impacting the ESX host. You can also restore, or you could also provision...

So here's the process:

1) Install the Linux NTFS driver if it's not already in your Linux build.

Note: For RHEL and Fedora installs click on the About RedHat/FC RPMs

2) Mount the export on your Linux server
# mount xx.xx.xx.xx:/vol/nfstest /mnt/vmnfs

So now you can back up VMDK images, or you can drill into the .snapshot directory and back them up from there.
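A minimal sketch of backing up a whole VMDK image straight out of a snapshot directory. The snapshot path and vmdk name follow the example in this post, /backup is an assumed destination, and the tar command is echoed rather than executed:

```shell
# Snapshot directory and backing file from the example above
SNAPDIR="/mnt/vmnfs/.snapshot/hourly.3"
VMDK="nfs-flat.vmdk"

# Snapshots are read-only, so this is safe to run against a live VM's backing file:
echo "tar -czf /backup/${VMDK}.tar.gz -C ${SNAPDIR} ${VMDK}"
```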

The next step is to back up files within VMDKs by accessing a snapshot... and you get to pick which one. For this test, I select the snapshot named testsnap from hourly.3.

3) Mount the VMDK as a loopback mount specifying the starting offset (32256) and NTFS file system type

# mount /mnt/nfstest/.snapshot/hourly.3/testsnap/nfs-flat.vmdk /mnt/vmdk -o ro,loop=/dev/loop2,offset=32256 -t ntfs

Here's your NTFS disk as seen from Linux:

# cd /mnt/vmdk
# ls -l

total 786844
dr-x------ 1 root root 0 Dec 19 03:03 013067c550e7cf93cc24
-r-------- 1 root root 0 Sep 11 2006 AUTOEXEC.BAT
-r-------- 1 root root 210 Dec 18 21:00 boot.ini
-r-------- 1 root root 0 Sep 11 2006 CONFIG.SYS
dr-x------ 1 root root 4096 Dec 18 21:10 Documents and Settings
-r-------- 1 root root 0 Sep 11 2006 IO.SYS
-r-------- 1 root root 0 Sep 11 2006 MSDOS.SYS
-r-------- 1 root root 47772 Mar 25 2005 NTDETECT.COM
-r-------- 1 root root 295536 Mar 25 2005 ntldr
-r-------- 1 root root 805306368 Mar 13 16:42 pagefile.sys
dr-x------ 1 root root 4096 Sep 11 2006 Program Files
dr-x------ 1 root root 0 Sep 11 2006 RECYCLER
dr-x------ 1 root root 0 Sep 11 2006 System Volume Information
dr-x------ 1 root root 0 Dec 19 00:35 tempd
dr-x------ 1 root root 65536 Mar 13 17:41 WINDOWS
dr-x------ 1 root root 0 Sep 11 2006 wmpub

The nice thing about the loopback mount is that Linux will see a VMDK's contents for any filesystem it supports, so now you can back up both Windows and Linux VMs.

Here's a more in-depth presentation on VMware over NFS, including the backup trick, from Peter Learmonth, as well as a customer presentation from the VMworld breakout sessions. The login and password are provided below:

user name: cbv_rep
password: cbvfor9v9r


VMware over NFS


My background is Fibre Channel, and since 2003 I've followed iSCSI very closely. In fact, for years I never paid much attention to other protocols until recently. For a long time I felt that FC was good for everything, which sounds weird if you consider who my employer is, but then again, NetApp didn't hire me for my CIFS or NFS prowess. I was hired to drive adoption of NetApp's Fibre Channel and iSCSI offerings, as well as to help prospects realize the virtues of a Unified Storage Architecture.

And speaking of unified architectures leads me to VMware, which represents to servers exactly what NetApp represents to storage: a unified architecture with choices, flexibility, and centralized management, without shoving a specific protocol down someone's throat.

Close to 90% of the VI3 environments today are deployed over FC, and of that percentage, based on experience, I'd say that 90% are using VMFS, VMware's clustered filesystem. If you are dealing with 2-3 clustered ESX servers, these deployments are not very complex. However, the complexity starts to increase exponentially as the number of servers in a VMware Datacenter multiplies: RAID groups, LUNs, LUN IDs, zones, zone management, HBAs, queue depths, VMFS datastores, RDMs, multipathing settings, etc. Then the question comes up... VMFS LUNs or RDMs? How is my performance going to be with 8-10 VMs on a VMFS LUN and a single disk I/O queue? What if I take the RDM route and later on I run out of LUNs? Way too many touch points, way too many things to pay attention to, way too many questions.

Well, there's help... NFS. I've recently had the opportunity to play with NFS with VMware in my environment, and I can tell you, you are missing out if you do not at least consider it and test it for your environment.

Here's what I have found out with NFS, and I'm not the only one:

1) Provisioning is a breeze
2) You get the advantage of VMDK thin provisioning, since it's the default setting over NFS
3) You can expand/shrink the NFS volume on the fly and realize the effect of the operation on the ESX server with a click of the datastore "refresh" button
4) You don't have to deal with VMFS or RDMs, so you have no dilemma here
5) No single disk I/O queue, so your performance is strictly dependent upon the size of the pipe and the disk array
6) You don't have to deal with FC switches, zones, HBAs, and identical LUN IDs across ESX servers
7) You can restore (at least with NetApp you can) multiple VMs, individual VMs, or files within VMs
8) You can instantaneously clone (NetApp FlexClone) a single VM or multiple VMs
9) You can also back up whole VMs, or files within VMs

People may find this hard to believe, but performance over NFS is actually better than FC or iSCSI, not only in terms of throughput but also in terms of latency. How can this be, people ask? FC is 4Gb and Ethernet is 1Gb. I would say that this is a rather simplistic approach to performance. What folks don't realize is that:

1) ESX server I/O is small-block and extremely random, which means that bandwidth matters little. IOs and response time matter a lot.
2) You are not dealing with VMFS and a single managed disk I/O queue.
3) You can have a single mount point across multiple IP addresses
4) You can use link aggregation, IEEE 802.3ad (NetApp multimode VIF with IP aliases)

Given that server virtualization has incredible ramifications on storage in terms of storage capacity requirements, storage utilization, and thus storage costs, I believe the time when folks will warm up to NFS is closer than we think. With NFS you are thin provisioning by default, and the VMDKs are thin as well. Plus, any modification of the size of the NFS volume in terms of capacity is easily and immediately realized on the host side.

Additionally, if you consider the fact that on average a VMFS volume is around 70-80% utilized (actually, that may be high) and the VMDK is around 70%, you can easily conclude that your storage utilization is anywhere from 49-56% exc[...]
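The effective-utilization arithmetic above works out like this, using the low-end figures from the text:

```shell
# Effective utilization = VMFS volume utilization x in-VMDK utilization
VMFS_UTIL=70   # percent (low end of the 70-80% quoted above)
VMDK_UTIL=70   # percent
EFFECTIVE=$(( VMFS_UTIL * VMDK_UTIL / 100 ))
echo "${EFFECTIVE}%"   # -> 49%
```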

SnapDrive for Windows 5.0 - Thin Provisioning and Space Reclamation


Back on July 11th, 2006, I posted an article on thin provisioning. Today a reader made some very timely and appropriate comments around application support for thin provisioning and alerting and monitoring: "I guess eventually OS and Apps have to start supporting thin provisioning, in terms of how they access the disk, and also in terms of instrumentation for monitoring and alerting."

Back in that article I had written that I would not deploy thin provisioning for new applications for which I had no usage metrics, and for applications which would write, delete, or quickly re-write data in a LUN. Here's why, up until now, I would avoid the latter scenario. The example below attempts to illustrate the point.

Let's assume I have thinly provisioned a 100GB LUN to a Windows server. I now fill 50% of the LUN with data. Upon doing this, capacity utilization from a filesystem standpoint is 50%, and from an array perspective it is also 50%. I then proceed to completely fill the LUN. Now the filesystem and array capacity utilization are both at 100%. Then I decide to delete 50% of the data in the LUN. What's the filesystem and array capacity utilization? Folks are quick to reply that it's at 50%, but that is only partially correct. The correct answer is that filesystem utilization is at 50% but array utilization is still at 100%. The reason is that even though NTFS has freed some blocks upon deleting half of the data in the LUN, from an array perspective these blocks still reside on disk, as there is no way for the array to know that the data is no longer needed.

So now, if more data is written to the LUN, there is no guarantee that the filesystem will use the exact same blocks it freed previously to write the new data. That means that in a thin provisioning scenario, this behavior may trigger a storage allocation on the array when in fact such an allocation may not be needed.

So now we're back to square one in attempting to solve the exact same storage over-allocation challenge.

SnapDrive for Windows 5.0

With the introduction of SnapDrive for Windows 5.0, Network Appliance introduced a feature called Space Reclamation. The idea is to provide integration between NTFS and WAFL via a mechanism that notifies WAFL when NTFS has freed blocks, so that WAFL, in turn, can reclaim these blocks and mark them as free.

Within SnapDrive 5.0, the space reclamation process can be initiated either via the GUI or the CLI. Upon initiating the space reclamation process, a pop-up window for the given LUN will inform the administrator whether or not a space reclamation operation is needed and, if so, how much space will be reclaimed. Additionally, the space reclamation process can be timed, and the time window can be specified in minutes (1-10080 minutes, or 7 days) for the process to execute. Furthermore, there is no licensing requirement in order to utilize the Space Reclamation feature, as it is bundled with the recently released version of SnapDrive 5.0. However, the requirement is that the version of DataONTAP must be 7.2.1 or later.

Performance is strictly dependent upon the number and the size of LUNs that are under the space reclamation process. As a general rule, we recommend that the process be run during low I/O activity periods and when Snapshot operations, such as snapshot create and snap restore, are not in use.

While other competitors offer thin provisioning, Network Appliance, once again, has been the first to provide yet another innovative and important tool that helps our customers not only safely deploy thin provisioning but also realize the benefits that derive from it.[...]

SnapDrive for Unix: Self Service Storage Management


I was reading an article recently regarding storage provisioning, titled "The Right Way to Provision Storage". What I took away from the article, as a reader, is that storage provisioning is a painful, time-consuming process involving several people representing different groups. The process, according to the article, pretty much goes like this:

Step 1: The DBA determines the performance requirements and number of LUNs and defers to the Storage person
Step 2: The Storage person creates the LUN(s) and defers to the Operations person
Step 3: The Operations person maps the LUN(s) to some initiator(s) and defers to the Server Admin
Step 4: The Server Admin discovers the LUN(s) and creates the filesystem(s). Then he/she informs the DBA, probably 3-4 days later, that his/her LUN(s) are ready to host the application

I wonder how many requests per week these folks get for storage provisioning and how much of their time it consumes. I would guess much more than they would like. An IT director of a very large and well-known financial institution told me a couple of years ago: "We get over 400 storage provisioning requests a week, and it has become very difficult to satisfy them all in a timely manner."

Why does storage provisioning have to be so painful? It seems to me that one would get more joy out of a root canal than out of asking for storage to be provisioned. Storage provisioning should be a straightforward process, and the folks who own the data (application admins) should be directly involved in the process. In fact, they should be the ones doing the provisioning directly from the host, under the watchful eye of the Storage group, who control the process by putting the necessary controls in place at the storage layer, restricting the amount of storage application admins can provision and the operations they are allowed to perform.

This would be self-service storage provisioning and data management. Dave Hitz, on his blog a few months back, described the above process and used the ATM analogy as an example.

NetApp's SnapDrive for Unix (Solaris/AIX/HP-UX/Linux) is similar to an ATM. It lets application admins manage and provision storage for the data they own. Through deep integration with various Logical Volume Managers and filesystem-specific calls, SnapDrive for Unix allows administrators to do the following with a single host command:

1) Create LUNs on the array
2) Map the LUNs to host initiators
3) Discover the LUNs on the host
4) Create Disk Groups/Volume Groups
5) Create Logical Volumes
6) Create Filesystems
7) Add LUNs to a Disk Group
8) Resize Storage
9) Create and Manage Snapshots
10) Recover from Snapshots
11) Connect to filesystems in Snapshots and mount them onto the same host, or a different host from the one where the original filesystem was or still is mounted

The whole process is fast and, more importantly, very efficient. Furthermore, it masks the complexity of the various UNIX Logical Volume Managers and allows folks who are not intimately familiar with them to successfully perform various storage-related tasks. Additionally, SnapDrive for Unix provides snapshot consistency by making calls to filesystem-specific freeze/thaw mechanisms, providing image consistency and the ability to successfully recover from a Snapshot.

Taking this a step further, SnapDrive for Unix provides the necessary controls at the storage layer and allows Storage administrators to specify who has access to what. For example, an administrator can specify any one, or a combination, of the following access methods:

◆ NONE − The host has no access to the storage system.
◆ CREATE SNAP − The host can create snapshots.
◆ SNAP USE − The host can delete and rename snapshots.
◆ SNAP ALL − The host can create, restore, delete, and rename snapshots.
◆ STORAGE CREATE DELETE − The host can create, resize, and delete storage.
◆ STORAGE USE − The host can connect and disconnect storage.
◆ STORAGE ALL [...]
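The single-command provisioning described above might look something like the sketch below. The mount point and size are hypothetical, the flags are from memory and may vary by SnapDrive release (check the usage output on your host), and the command is echoed rather than executed:

```shell
# Hypothetical filesystem mount point and LUN size (assumptions)
FS="/mnt/oradata"
SIZE="10g"

# One host-side command that creates the LUN, maps it, discovers it,
# builds the volume group/logical volume, and mounts the filesystem:
echo "snapdrive storage create -fs ${FS} -lunsize ${SIZE}"
```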

The Emergence of OS Native Multipathing Solutions


In today's business environment, high availability is not an option. It is a business necessity and is essential in providing business continuity. Data is the lifeblood of a business. Can you imagine a financial firm losing connectivity to its business-critical database in the middle of the day?

This is where multipathing, or path failover, solutions address high availability and business continuity: not only do they eliminate single points of failure between the server and the storage, but they also help in achieving better performance by balancing the load (I/O load or LUN load) across multiple paths.

Most new servers bought today by customers connect into SANs. Furthermore, most of these servers have high availability and redundancy requirements and thus connect to highly available, redundant fabrics and disk arrays. When any component in the data path fails, failover to the surviving data path occurs non-disruptively and automatically. So the premise of multipathing, or path failover, is to provide redundant server connections to storage and to:

1) Provide path failover in the event of a path failure
2) Monitor I/O paths and provide alerts on critical events

Over the years, administrators have recognized this need, and so after the purchase of a server they would also purchase a 3rd-party multipathing solution, typically from their storage vendor. Apart from the fact that these 3rd-party solutions were not designed as part of the operating system and some did not integrate particularly well, they also did not interoperate well with multipathing solutions from other storage vendors that needed to be installed on the same server. In essence, storage-vendor-specific multipathing solutions solved one problem while creating another. This problem lasted for years and was addressed only recently. Over the past 2-3 years, a flurry of OS native multipathing solutions have emerged. Thus, today's multipathing solution distribution model has changed drastically.

Multipathing solutions can be distributed as:

1) 3rd-party software (Symantec/Veritas DMP, PowerPath, HDLM, SDD, RDAC, SANPath, etc.)
2) Embedded in the operating system (Solaris MPxIO, AIX MPIO, Windows MPIO, Linux Device Mapper-Multipath, HP-UX PVLinks, VMware ESX Server, NetWare via NSS)
3) As an HBA vendor device driver that works with most, if not all, storage arrays (i.e., QLogic's Linux/NetWare failover driver, Windows QLDirect)
4) As an HBA vendor device driver (Emulex MultiPulse) available to OEMs only, who in turn incorporate the technology into their own products via calls made to the HBA APIs provided by the HBA vendor

Increasingly, the trend is toward the deployment of OS native multipathing solutions. In fact, with the exception of one operating system, a substantial server/storage vendor has all but abandoned support of their traditional multipathing solution for their newer storage arrays in favor of the OS native ones.

There are two drivers behind this trend. Cost is one reason customers elect to deploy OS native multipathing solutions; after all, you can't beat "free". A secondary, but equally important, driver is to achieve better interoperability among the various vendors' storage devices that happen to provision the same server(s). One driver stack and one set of HBAs talks to everybody.

From a Windows standpoint, it is important to note that Microsoft is strongly encouraging all storage vendors to support its MPIO specification. Network Appliance supports this specification with a Device Specific Module (DSM) for our disk subsystems. It's equally important to note that Windows MPIO enables the co-existence of multiple storage vendor DSMs within the same server. In fact, the current approach is similar to what Symantec/Veritas has done over the years with the Array Support Library (ASL) that provides vendor disk subsystem attribute[...]

Installing RHEL on SATA using an Adaptec 1210SA Controller


I have a Supermicro server in my lab with an Adaptec 1210SA controller connecting to a couple of SATA drives I use for testing. Given that Adaptec does not provide an RHEL driver, I had a hard time installing the OS until I had an epiphany a week ago: Adaptec may not provide an RHEL driver for the 1210SA card, but they do provide one for the 2020SA card. Here's how I got around this little problem:

1) Go to the Adaptec site and download the RHEL driver for the 2020SA card.
2) Download and install the RAWWRITE binary for Windows

3) After downloading the RHEL package, unzip it, select the driver image based on the server's architecture, and use RAWWRITE to copy it to a floppy.

4) Power on the server, insert the RHEL CD #1 into the CDROM, and at the boot prompt type: linux dd

5) During the install you will be asked if you want to install additional drivers. Insert the Floppy and select "Yes".

At this point the driver will be loaded and then you can proceed with the OS installation.
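As an aside, if you have a Linux box handy instead of Windows, dd can write the driver image in place of RAWWRITE. The image filename below is an assumption, and the command is echoed rather than executed:

```shell
# The driver image unpacked from Adaptec's RHEL package (filename is hypothetical)
IMG="2020sa-rhel4-i386.img"

# /dev/fd0 is the first floppy device on Linux; 1440k matches a 1.44MB floppy
echo "dd if=${IMG} of=/dev/fd0 bs=1440k"
```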

I need to stress that this is not the recommended way of doing things but rather a workaround I use for Lab purposes only. I don't even use this system for demos. If you are considering placing such a server in production, I would highly recommend that you purchase a controller with support for the OS version you need to install.

VMware ESX 3.0.0 SAN Booting


One of the ways enterprises with large numbers of servers are reducing costs and enabling greater storage consolidation today is by deploying diskless servers that boot from the SAN (FC or IP). While this technique is not new, the introduction of the blade server, which provides greater manageability, reduced HW costs, and simpler cable management, as well as power, cooling, and real-estate savings, has further accelerated the adoption of SAN booting. Booting from the SAN provides several advantages:

1) Disaster recovery - Boot images stored on disk arrays can be easily replicated to remote sites, where standby servers of the same HW type can boot quickly, minimizing the negative effect a disaster can have on the business.
2) Snapshots - Boot images in snapshots can be quickly reverted back to a point in time, saving the time and money of rebuilding a server from scratch.
3) Quick deployment of servers - Master boot images stored on disk arrays can be easily cloned using NetApp's FlexClone capabilities, providing rapid deployment of additional physical servers.
4) Centralized management - Because the master image is located in the SAN, upgrades and patches are managed centrally and are installed only on the master boot image, which can then be cloned and mapped to the various servers. No more multiple upgrades or patch installs.
5) Greater storage consolidation - Because the boot image resides in the SAN, there is no need to purchase internal drives.
6) Greater protection - Disk arrays provide greater data protection, availability, and resiliency features than servers. For example, NetApp's RAID-DP functionality provides additional protection in the event of a dual drive failure. RAID-DP with SyncMirror also protects against disk drive enclosure failure, loop failure, cable failure, back-end HBA failure, or any 4 concurrent drive failures.

Having mentioned the advantages, it's only fair that we also mention the disadvantages, which, even though they are outnumbered, still exist:

1) Complexity - SAN booting is a more complex process than booting from an internal drive. In certain cases the troubleshooting process may be a bit more difficult, especially if a coredump file cannot be obtained.
2) Variable requirements - The requirements and support will vary from array vendor to array vendor, and specific configurations may not even be supported. The requirements will also vary based on the type of OS that is being loaded. Always consult with the disk array vendor before you decide to boot from the fabric.

One of the most popular platforms that lends itself to booting from the SAN is VMware ESX Server 3.0.0. One reason is that VMware does not support booting from internal IDE or SATA drives. The second reason is that more and more enterprises have started to deploy ESX 3.0.0 on diskless blade servers, consolidating hundreds of physical servers into a few blades in a single blade chassis with the deployment of VMware's server virtualization capabilities. The new ESX 3.0.0 release has made significant advancements in supporting boot from the SAN, as the multiple and annoying requirements of the previous release have been addressed. Here are some differences between the 2.5.x and 3.0.0 versions with regard to the SAN booting requirements:

If you are going to be booting ESX Server from the SAN, I highly recommend that, prior to making any HBA purchasing decisions, you contact your storage vendor and carefully review VMware's SAN Compatibility Guide for ESX Server 3.0. What you will find is that certain model Emulex and QLogic HBAs are not supported for SAN booting, as well as certain OEM'd/rebranded versions of QLogic HBAs. The setup process is rather trivial; however, there are some things you will need to be aware of in order to achieve higher performance and non-disruptive failovers should HW failu[...]

IBM Bladecenter iSCSI Boot Support


There has been a lot of demand lately to boot blade servers using the integrated NICs without the use of iSCSI HBAs.

IBM has partnered with Microsoft to enable this capability for the IBM HS20 (Type 8843) Blades and Netapp has recently announced support for it.

Here are the requirements:

Blade type: HS20 MT8843
BIOS: 1.08
HS Blade Baseboard/Management Controller: 1.16
Windows 2003 SP1 w/ KB902113 Hot Fix
Microsoft iSCSI initiator with integrated boot support: 2.02
Netapp DataONTAP: >= 7.1.1
Netapp iSCSI Windows Initiator Support Kit 2.2 (available for download from the Netapp NOW site)

One thing to be aware of is that the Microsoft iSCSI initiator version 2.02 with Integrated Boot support is a different binary from the standard Microsoft iSCSI initiator 2.02.

To obtain the MS iSCSI initiator 2.02 with Boot support binary follow the link and provide the following invitation code: ms-8RR8-6k43

The IBM BIOS and BMC updates can be downloaded from here or here

You can find instructions for the process here:

Linux Native Multipathing (Device Mapper-Multipath)


Over the past couple of years, a flurry of OS native multipathing solutions have become available. As a result, we are seeing a trend toward these solutions and away from vendor-specific multipathing software. The latest OS native multipathing solution is Device Mapper-Multipath (DM-Multipath), available with Red Hat Enterprise Linux 4.0 U2 and SuSE SLES 9.0 SP2.

I had the opportunity to configure it in my lab a couple of days ago, and I was pleasantly surprised at how easy it was to configure. Before I show how it's done, let me talk a little about how it works. The multipathing layer sits above the protocols (FCP or iSCSI) and determines whether the devices discovered on the target represent separate devices or are just separate paths to the same device. In this case, Device Mapper (DM) is the multipathing layer for Linux.

To determine which SCSI devices/paths correspond to the same LUN, DM initiates a SCSI Inquiry. The inquiry response, among other things, carries the LUN serial number. Regardless of the number of paths a LUN is associated with, the serial number for the LUN will always be the same. This is how multipathing SW determines which and how many paths are associated with each LUN.

Before you get started, you want to make sure the following are loaded:

1) device-mapper-1.01-1.6 RPM
2) multipath-tools-0.4.5-0.11
3) NetApp FCP Linux Host Utilities 3.0

Make a copy of the /etc/multipath.conf file. Edit the original file and make sure you have only the following entries uncommented.
If you don't have the NETAPP device section, then add it:

defaults {
    user_friendly_names yes
}
devnode_blacklist {
    devnode "sd[a-b]$"
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    devnode "^cciss!c[0-9]d[0-9]*"
}
devices {
    device {
        vendor "NETAPP "
        product "LUN"
        path_grouping_policy group_by_prio
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout "/opt/netapp/santools/mpath_prio_ontap /dev/%n"
        features "1 queue_if_no_path"
        path_checker readsector0
        failback immediate
    }
}

The devnode_blacklist includes devices for which you do not want multipathing enabled. So if you have a couple of local SCSI drives (i.e., sda and sdb), the first entry in the blacklist will exclude them. Same for IDE drives (hd).

Add the multipath service to the boot sequence by entering the following:

chkconfig --add multipathd
chkconfig multipathd on

Multipathing on Linux is Active/Active with a round-robin algorithm. The path_grouping_policy is group_by_prio: it assigns paths into path groups based on path priority values. Each path is given a priority (high value = high priority) based on a callout program written by NetApp Engineering (part of the FCP Linux Host Utilities 3.0). The priority values for each path in a path group are summed, and you obtain a group priority value. The paths belonging to the path group with the higher priority value are used for I/O. If a path fails, the value of the failed path is subtracted from the path group priority value. If the path group priority value is still higher than the values of the other path groups, I/O will continue within that path group. If not, I/O will switch to the path group with the highest priority.

Create and map some LUNs to the host.
If you are using the latest QLogic or Emulex drivers, then run the respective utilities they provide to discover the LUNs:

  qla2xxx_lun_rescan all    (QLogic)
  lun_scan_all              (Emulex)

To view a list of multipathed devices:

  # multipath -d -l

  [root@rhel-a ~]# multipath -l
  360a9800043346461436f373279574b53
  [size=5 GB][features="1 queue_if_no_path"][hwhandler="0"]
  \_ round-robin 0 [active]
   \_ 2:0:0:0 sdc 8:32 [active]
   \_ 3:0:0:0 sde 8:64 [active]
  \_ round-robin 0 [enabled]
   \_ 2:0:1:0 sdd 8:48 [active]
   \_ 3:0:1:0 sdf 8:80 [active]

The above shows 1 LUN with 4 pat[...]
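The serial-number grouping described earlier can be illustrated with a toy sketch. This is not DM-Multipath code, just a model of the idea: devices returning the same LUN serial number from a SCSI Inquiry are paths to the same LUN. The device names and serials below are hypothetical.

```python
# Toy sketch: group discovered SCSI devices into multipath sets by LUN
# serial number, the way the multipathing layer does after a SCSI Inquiry.
from collections import defaultdict

def group_paths(devices):
    """Map each unique LUN serial number to the device paths exposing it."""
    luns = defaultdict(list)
    for dev, serial in devices:
        luns[serial].append(dev)
    return dict(luns)

# Four devices discovered across two HBA ports; each pair shares a
# serial number, so each pair is two paths to one LUN (made-up data).
discovered = [
    ("sdc", "360a98000...4b53"),
    ("sde", "360a98000...4b53"),
    ("sdd", "360a98000...4b54"),
    ("sdf", "360a98000...4b54"),
]
print(group_paths(discovered))
# {'360a98000...4b53': ['sdc', 'sde'], '360a98000...4b54': ['sdd', 'sdf']}
```

Two serials, four devices: the multipathing layer builds two multipath devices with two paths each, matching the `multipath -l` output above.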

VMware ESX 3.0.0


Recently I've started playing with the newly released version (3.0.0) of ESX Server. I've been running ESX 2.5.3 in my lab for a while now, so I decided to upgrade to 3.0.0 to get a feel for the new changes. More importantly, I wanted to see the iSCSI implementation.

I've been booting ESX 2.5.3 over an FC SAN in my lab and I have a few Windows 2003 virtual machines as well as a RHEL 4.0U2 virtual machine. The upgrade process took me about 30 minutes and was flawless.

Setting up the ESX iSCSI SW initiator was a breeze, and after I was done I connected my existing VMs via iSCSI thru the ESX layer. Because the 3.0.0 release has no multipathing for iSCSI as there is for Fibre Channel, I used NIC Teaming to accomplish virtually the same thing. The whole process didn't take more than 10-15 minutes.

With the 3.0.0 version of ESX, VMware does not support booting the ESX server over iSCSI; however, they do support VMs residing on iSCSI LUNs. Even though you could connect an iSCSI HBA (i.e. QLogic 4010/4050/4052) and boot the ESX server, the status of the iSCSI HBA for this release is deemed "experimental only". Support for the iSCSI HBA should arrive in the 3.0.1 release. I also hear that iSCSI multipathing support will be available in that release as well.

So if you have a whole bunch of diskless blades you want to boot over iSCSI with VMware ESX, you'll be able to get it done in the 3.0.1 release.

I also noticed that some of the restrictions in terms of supported FC HBAs for SAN booting have been lifted with the 3.0.0 release. For example, you can now use Emulex & QLogic HBAs, whereas before only QLogic 23xx was supported. Additionally, RDMs (Raw Device Mappings) are now supported in conjunction with SAN booting, whereas before they were not.

Further restrictions with regard to SAN booting that have been lifted include booting from the lowest-numbered WWN and the lowest-numbered LUN. The restriction that remains is that you cannot boot ESX without a Fabric, meaning you can't boot ESX via a direct connection to a disk array. Well, I believe you can; it's just that VMware won't support it.

One thing I have yet to figure out, though, is why VMware would allow and support an ESX install on internal IDE/ATA drives but not on internal SATA drives. I tried to install ESX on a server with an Adaptec 1210SA controller, and during setup it couldn't find my disk, so it looks like a driver issue. Poking around on the net I found someone who used an LSI MegaRAID 150-2 controller and was successful in installing ESX on a SATA RAID 5 partition.

That made me curious, so I spent $20 on eBay, got an LSI MegaRAID 150-2 controller, and was successful in installing ESX. Like I said before, this is not supported by VMware, which is bizarre, but for testing purposes it works just fine.

One thing to watch out for:
  • VMware does not currently support MSCS with Windows 2003 SP1. SP1 has incorporated changes that will not allow MSCS to function properly with any ESX version at this time. VMware has been working with Microsoft on a resolution but has no ETA for a fix.

Back from vacation


I haven't written for a while since my family and I went on vacation to Greece which is where I'm originally from. Always love to head on over there this time of the year and spend time with family and friends. My kids thoroughly enjoy the beaches and every summer they make new friends plus they get to learn the language.

The trip over was a breeze; however, the return coincided with the London events, and even though we didn't travel thru London but rather thru Zurich, we felt the pain.

For those of you that travel with small kids you know what I'm talking about, especially when you have to wait for over an hour to go thru security screening. It got even worse in NY where we had to sit for 3 1/2 hours on the tarmac. By the time we got to Dallas we needed another vacation.

At least we made it back safely, and that's what matters.

Thin Provisioning


I've recently read several articles on Thin Provisioning, and one thing that immediately jumped out at me was that each article describes Thin Provisioning as over-provisioning of existing physical storage capacity. While this can be accomplished with Thin Provisioning, it's not necessarily the point. Thin Provisioning is also about intelligent allocation of existing physical capacity rather than over-allocation. If I have purchased 1TB of storage, and I know only a portion of it will be used initially, then I could thin provision LUNs totaling 1TB while on the back-end I do the physical allocation on application writes. There's no over-allocation in this scheme and, furthermore, I have the ability to re-purpose unallocated capacity if need be.

The big problem with storage allocation is that it's directly related to forecasting, which is risky at best. In carving up storage capacity, too much may be given to one group and not enough to another. The issue here is that storage re-allocation is difficult; it takes time and resources and, in most cases, requires application downtime. That's why most users request more capacity than they would typically need on day one. Thus capacity utilization becomes an issue.

Back to the over-allocation scheme. In order to do over-allocation you have to have 2 things in place to address the inherent risk associated with such a practice and avoid getting calls at 3am:

1) A robust monitoring and alerting mechanism
2) Automated Policy Space Management

Without these, thin provisioning represents a serious risk and requires constant monitoring. That's why with DataONTAP 7.1 we have extended monitoring and alerting within the Operations Manager to include thinly provisioned volumes and also introduced automated Policy Space Management (vol autosize and snap autodelete).

Another thing I've just read is that when thin provisioning a Windows LUN, format will trigger physical allocation equal to the size of the LUN.
That's not accurate, and to prove the point I created a 200MB Netapp Volume. Inside that Volume I created a thinly provisioned LUN (100MB), mapped it to a Windows server and formatted it. It's worth noting that the "Used" column of the Volume that hosts this particular LUN is 3MB, depicting overhead after the format; however, the LUN itself (/vol/test/mylun), as shown in the picture, is 100MB. Below is the LUN view from the server's perspective and further proof that the LUN is indeed formatted (Drive E:\).

Personally, I would not implement Thin Provisioning for new apps for which I have no usage patterns at all. I would also not implement it for applications that quickly delete chunks of data within the LUN(s) and write new data. Whenever you delete data on the host from a LUN, the disk array doesn't know the data has been deleted. The host basically doesn't tell - or rather, SCSI doesn't have a way to tell. Furthermore, when I delete x MB of data from a LUN and write new data into it, NTFS can write this data anywhere. That means that some previously freed blocks may be re-used, but it also means that blocks never used before can also be touched. The latter will trigger a physical allocation on the array.[...]
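The allocate-on-write behavior and the delete problem described above can be captured in a toy model. This is an illustration only, not array code: physical blocks are allocated on first write, and a host-side delete changes nothing on the array because SCSI gives the array no notification.

```python
# Toy model of a thin-provisioned LUN (illustration, not a real array).
class ThinLun:
    def __init__(self, size_blocks):
        self.size = size_blocks      # advertised (virtual) size
        self.allocated = set()       # blocks physically allocated so far

    def write(self, block):
        if block >= self.size:
            raise ValueError("write past end of LUN")
        self.allocated.add(block)    # allocation happens on first write

    def host_delete(self, block):
        # The filesystem frees the block, but the array never hears
        # about it, so physical allocation is unchanged.
        pass

    def used(self):
        return len(self.allocated)

lun = ThinLun(100)                   # 100-block LUN, 0 blocks allocated
for b in range(10):
    lun.write(b)                     # first writes trigger allocation
lun.host_delete(5)                   # delete on the host...
print(lun.used())                    # ...array-side usage is still 10
```

Note that a subsequent write to a never-touched block grows `used()` even if the filesystem considers itself no fuller, which is exactly why apps that churn data quickly are poor thin-provisioning candidates.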

The State of Virtualization


Storage Virtualization is the logical abstraction of physical storage devices, enabling the management and presentation of multiple and disparate devices as a single storage pool, regardless of the devices' physical layout and complexity.

As surprising as it may seem, Storage Virtualization is not a new concept and has existed for years within disk subsystems as well as on hosts. For example, RAID represents virtualization achieved within RAID arrays, in that it reduces disk management and administration of multiple physical disks into a few virtual ones. Host-based Logical Volume Managers (LVMs) represent another example of a virtualization engine that's been around for years and accomplishes similar tasks.

The promise of storage virtualization is to cut costs by reducing complexity, enabling better and more efficient capacity utilization, masking the inherent interoperability issues caused by the loose interpretation of the existing standards, and finally by providing an efficient way to manage large quantities of storage from disparate storage vendors.

The logical abstraction layer can reside in servers, intelligent FC switches, appliances or in the disk subsystem itself. These methods are commonly referred to as host-based, array-based, controller-based, appliance-based and switch-based virtualization. Additionally, each one of these methods is implemented differently by the various storage vendors and is sub-divided into two categories: in-band and out-of-band virtualization. Just to make things even more confusing, yet more terminology has surfaced over the past year or so: split-path vs shared-path architectures.
It is no surprise that customers are confused and have been reluctant to adopt virtualization despite the promise of the technology. So let's look at the different virtualization approaches and how they compare and contrast.

Host Based - Logical Volume Manager

LVMs have been around for years via 3rd-party SW (i.e. Symantec) or as part of the Operating System (i.e. HP-UX, AIX, Solaris, Linux). They provide tasks such as disk partitioning, RAID protection, and striping. Some of them also provide Dynamic Multipathing drivers (i.e. Symantec Volume Manager). As is typical with any software implementation, the burden of processing falls squarely on the shoulders of the CPU; however, these days the impact is much less pronounced due to the powerful CPUs available in the market. The overall performance of an LVM is very dependent on how efficient the Operating System is, or how well 3rd-party volume managers have been integrated with the OS. While LVMs are relatively simple to install, configure and use, they are server-resident software, meaning that for large environments multiple installation and configuration instances, as well as multiple repetitive management tasks, will need to be performed. An advantage of a host-based LVM is that it is independent of the physical characteristics of external disk subsystems; even though these may have various performance characteristics and complexities, the LVM can still handle and partition LUNs from all of them.

Disk Array Based

Similar to LVMs, disk arrays have been providing virtualization for years by implementing various RAID techniques, such as creating Logical Units (LUNs) that span multiple disks in RAID Groups, or across RAID Groups, by partitioning the array disks into chunks and then re-assembling them into LUs. All this work is done by the disk array controller, which is tightly integrated with the rest of the array components and provides cache memory, cache mirroring as well as interfaces that s[...]

VTL Part 2


It's evident that VTLs are becoming popular backup and recovery targets. Among others, Netapp has also jumped onto the bandwagon, so I figured I'd talk a little bit about the NearStore VTL offering.

A year ago Netapp announced the acquisition of Alacritus. At the time Alacritus was a privately held company out of Pleasanton, CA, and Netapp first partnered with Alacritus around December 2004. Together they offered a solution comprised of a Netapp Nearline storage array and the Alacritus VTL package. Less than 6 months later Netapp decided to own the technology, so it acquired Alacritus.

Alacritus Background

As mentioned above, Alacritus was a privately held company. Alacritus had been in the VTL business, and in the general backup business, a lot longer than people think. The principals at Alacritus have been together for 15 years and are responsible for several backup innovations. They are the ones who, with Netapp, co-developed the Network Data Management Protocol (NDMP). They developed BudTool, which was the first open systems backup application. They developed Celestra, the first server-less backup product. They pioneered XCOPY, the extended copy SCSI command. In 2001, Alacritus developed the 1st VTL and have been delivering it since then, before other VTL competitors were even incorporated. Alacritus' strategy at the time was to sell the solution thru OEMs and resellers in Japan, most notably Hitachi.

Technology

There are several technological innovations within the Netapp NearStore VTL delivering key benefits to customers, but I'll only address 3-4 of them as I don't want to write an essay.

Continuous Self-tuning - The NearStore VTL continuously and dynamically load balances backup streams across *all* available resources (disks), thus maintaining optimal system performance without developing hot spots.
That means that backup streams are load balanced across all the disk drives across all RAID Groups for a Virtual Library, which in turn means that Virtual Tapes do not reside at fixed locations. That provides the ability to load balance traffic based on the most available drives. Ultimately, what this means is that customers do not have to take any steps to manually tune the VTL.

Smart Sizing - Smart Sizing is based on the fact that all data compresses differently. Since data compresses at different rates, the amount of data that will fit onto a tape changes from backup to backup. If you take into account that a Virtual Tape will eventually be written to a Physical Tape, you want to make absolutely sure that the amount of data on the Virtual Tape will fit onto the Physical Tape. To address this, most VTL vendors make the capacity of the Virtual Tape equal to the native capacity of the Physical Tape. The NearStore VTL offers a unique approach. By using high-speed statistical sampling of the backup stream, and by having knowledge of the Tape Drive's compression algorithm, it determines how well the data will compress when it gets to the Tape drive, and adjusts the size of the Virtual Tape accordingly to closely match the compressed capacity of the Physical Tape drive. As a result, customers obtain significantly higher physical media utilization rates compared to other VTLs.

As an example, consider a backup of 400GB and a tape cartridge with a native capacity of 200GB. A typical VTL will need 2 Virtual Tapes, each with a 200GB native capacity. If the Physical Drive compresses at a 2:1 ratio, that means you'll write 200GB, thus filling 1/2 of the tapes, plus you'll need 2 Physical Tapes to export to. With Smart Sizing, the Virtual Tape size will be adjusted to 1 Virtual Tape [...]
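The Smart Sizing arithmetic in the example above can be checked with a few lines. The numbers (400GB backup, 200GB native cartridge, 2:1 compression) come from the post; the sizing rule itself is a simplification of what the VTL's statistical sampling does.

```python
# Worked arithmetic for the Smart Sizing example (simplified model).
backup_gb = 400
native_capacity_gb = 200        # physical cartridge, native capacity
compression_ratio = 2.0         # ratio the drive achieves on this data

# Conventional VTL: virtual tapes sized at the cartridge's native capacity.
conventional_tapes = backup_gb / native_capacity_gb            # 2 virtual tapes
gb_per_physical_tape = (backup_gb / compression_ratio) / conventional_tapes
# -> only 100GB lands on each 200GB cartridge: both exported tapes half full.

# Smart Sizing: the virtual tape is sized to the *compressed* capacity
# of the physical cartridge, based on sampled compressibility.
smart_virtual_tape_gb = native_capacity_gb * compression_ratio  # 400GB
smart_tapes = backup_gb / smart_virtual_tape_gb                 # 1 tape, exported full
print(conventional_tapes, gb_per_physical_tape, smart_tapes)
```

Same 400GB backup: two half-full cartridges the conventional way, one full cartridge with Smart Sizing.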

VTL & Tape: A Symbiotic Relationship


A lot has been written over the past year about the advantages and disadvantages of tape. One thing is for sure though: Tape's not going anywhere anytime soon, for various reasons, some of which are included below:

  • Tape is deeply entrenched in the Enterprise
  • Tape's a cost-effective long-term storage medium
  • Backup applications understand Tape and perform their best when streaming to a Tape drive rather than a filesystem
  • Tape can be easily moved offsite for vaulting purposes

But Tape has some distinct disadvantages, some of which include:

  • Tapes are unreliable and susceptible to environmental conditions (i.e. heat/humidity etc.)
  • You won't know of a bad tape until you attempt to recover from it
  • Sharing Tape drives requires additional software and adds cost and complexity
  • Streaming to a tape drive is not simple, especially with incremental backups. And while it can be done via multiplexing, the latter has a significant effect on recovery, since all interleaved streams must be read by the backup server
  • In order to share Tape libraries between servers, additional software must be purchased, adding cost as well as complexity

One approach customers have been using to address the above issues is to back up to a conventional disk array using D2D backup. However, what they find is that this approach adds additional configuration steps: they still have to provision storage to the backup application using the disk vendor's provisioning tools, still have to create RAID Groups, still have to create LUNs, still have to make decisions regarding cache allocations, and finally they still have to manage it. Then reality sets in... Disk is not easily shared between servers and Operating Systems without a shared SAN filesystem, or without carving and managing multiple LUNs to multiple servers/apps. All this means additional cost, complexity and management overhead. Addressing a challenge by making it more challenging is not what people are looking for.
This is where the VTL comes into play: an integrated appliance with single or dual controllers and disk behind them, that looks like, feels like tape but is... Disk. Disk that emulates Tape Libraries, with Tape drives, slots, Entry/Exit ports and Tape cartridges. Backup SW, since its inception, was designed with Tape in mind, not disk. It knows Tape; it performs very well with tape. It knows little about disk, in some cases does not integrate with disk at all, nor does it perform optimally with disk. The VTL, on the other hand, appears to the Backup SW as one or more Tape Libraries of different types and characteristics (drive type, number of slots, capacities). It also eliminates the streaming concern, regardless of the backup you are taking (full/incremental), since disk is inherently faster than tape. This also means that you don't have to multiplex, thus making your recovery fast. You can also easily share a single VTL among multiple servers, providing each server with its own dedicated Tape library, drives, slots and robot. Essentially, what you end up with is a centrally located and managed Virtual Library that looks, feels and behaves as a dedicated physical library to each of your servers.

Another benefit of the VTL is that it is easily integrated with a real Physical Tape library. In fact, the majority of the implementations require it, by positioning the VTL in front of a Physical Tape library. The VTL will then emulate the specific tape library with its associated characteristics such as number of drives, slots, barcodes, robot etc. After a backup has completed you then have 2 choices with regards to Physical [...]

FlexVols: Flexible Data Management


If you're managing Storage you've most likely experienced some of these issues: too much storage is allocated and not used by some applications, while other apps are getting starved. Because application reconfiguration is not a trivial process and is time- and resource-consuming, let alone that it requires application downtime, most folks end up buying more disk.

The root of the problem with data management is that it relies heavily on forecasting, and getting the forecast right all the time is an impossible task. Another issue with Data Management is that there are too many hidden costs associated with it: costs that can include configuration changes, training, backup/restore, data protection etc.

In addition, there's risk. Reconfigurations are risky in that they can potentially impact reliability. DataONTAP 7G with FlexVols addresses all of the above issues plus some more.

DataONTAP 7G virtualizes volumes in Netapp and non-Netapp storage systems (V-Series) by creating an abstraction layer that separates the physical relationship between volumes and disks. A good analogy I read in a Clipper Group report compared capacity allocated to FlexVols, versus other traditional approaches, to a wireless phone versus a landline: while every phone has a unique number, the wireless phone can be used anywhere, whereas the landline resides in a fixed location and cannot be moved easily.

FlexVols are created on top of a large pool of disks called an Aggregate. You can have more than one aggregate if you want. FlexVols are striped across every disk in the aggregate and have their own attributes, which are independent of each other. For example, they can have their own snapshot schedule or their own replication schedule. They can also be increased or decreased in size on the fly. They have another very important attribute as well: space that is allocated to a FlexVol but not used can be taken away, on the fly, and re-allocated to another FlexVol that needs it. The Aggregate(s) can also be increased in size on the fly.
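The space shuffling described above can be sketched as a toy model. This is an illustration of the pooling idea only, not ONTAP code; the sizes and volume names are made up.

```python
# Toy model of an aggregate whose volumes can grow and shrink on the
# fly, with unused space returned to the shared pool (not ONTAP code).
class Aggregate:
    def __init__(self, size_gb):
        self.size = size_gb
        self.vols = {}                  # volume name -> allocated GB

    def free(self):
        return self.size - sum(self.vols.values())

    def create_vol(self, name, gb):
        if gb > self.free():
            raise ValueError("aggregate has insufficient free space")
        self.vols[name] = gb

    def resize_vol(self, name, delta_gb):
        new = self.vols[name] + delta_gb
        if delta_gb > self.free() or new < 0:
            raise ValueError("invalid resize")
        self.vols[name] = new           # no data migration, no downtime

aggr = Aggregate(1000)                  # 1TB pool of disks
aggr.create_vol("db", 600)
aggr.create_vol("logs", 300)
aggr.resize_vol("db", -200)             # shrink: return unused space
aggr.resize_vol("logs", 250)            # grow another volume with it
print(aggr.free())                      # 50GB still unallocated
```

The point of the sketch is the contrast with traditional volumes: reallocating the 200GB from "db" to "logs" is just bookkeeping against the shared pool, not a reconfiguration of physical disks.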

FlexVols can also be cloned using our FlexClone technology, which I'll address another day. But just so everyone understands, a FlexClone represents a space-efficient point-in-time copy (read/write) of the parent FlexVol, but it can also be turned into a fully independent FlexVol itself.

Another important aspect of FlexVols is size granularity. Ranging from 20MB up to 16TB, it gives users the ability to manage data sets according to their size while at the same time obtaining the performance of hundreds of disks. Couple that with DataONTAP's FlexShare Class of Service and we have a very elegant solution for application consolidation within the same aggregate. By deploying 7G, the days of wasting drive capacity in order to obtain performance are gone.

Another very useful feature of 7G is the ability to do Thin Provisioning, as well as provide Automated Policy Space Management, in order to address unforeseen events that can be caused by sudden spikes in used capacity.

I'll be writing more on the last two subjects pretty soon, so stay tuned.

The Kilo-Client Project: iSCSI for the Masses...


A little over a year ago Netapp Engineering was challenged to build a large-scale test bed in order to exploit and test various configurations and extreme conditions under which our products are deployed by our customers. Thus, the Kilo-Client project was born.

Completed in early 2006, the Kilo-Client is, most likely, the world's largest iSCSI SAN, with 1,120 diskless blades booting off the SAN and providing support for various Operating Systems (Windows, Linux, Solaris) and multiple applications (Oracle, SAS, SAP etc). In addition, Kilo-Client incorporates various Netapp technological innovations such as:

SnapShot - A disk-based point-in-time copy
LUNClone - A space-optimized read/write LUN
FlexClone - A space-optimized read/write Volume
SnapMirror - Replication of Volumes/qtrees/LUNs
Q-Tree - A logical container within a volume used to group files or LUNs
SnapRestore - Near-instantaneous recovery of a Volume or a LUN to a previous PIT version

Today the Kilo-Client serves not only as an Engineering test bed but also as a facility where our customers can test their applications under a variety of scenarios and conditions. For more information on the Kilo-Client project, click the link.

You may also want to consider registering for the Tech ONTAP Newsletter, since there's a ton of valuable information posted to it on a monthly basis, from Best Practices to new technology demos, tips/tricks and Engineering interviews.

iSCSI: Multipathing Options Menu


A question that I get asked frequently revolves around iSCSI multipathing options and how folks can provide redundancy and route I/O around failed components residing in the data path.

Contrary to what has been available for Fibre Channel, iSCSI offers multiple choices to select from, each of which has different characteristics. So here are your options, most of which are available across all Operating Systems that provide iSCSI support today:

1) Link Aggregation - IEEE 802.3ad

Link Aggregation, also known as Teaming or Trunking, is a well-known and well-understood networking technique deployed to provide redundancy and high-availability access for NFS, CIFS as well as other types of traffic. The premise is the ability to logically link multiple physical interfaces into a single interface, thus providing redundancy and higher availability. Link aggregation is not dependent on storage, but rather on a capable Gigabit Ethernet driver.

4Gb FC Gains Momentum


Various next-generation 4Gb Fibre Channel components began rolling out around mid-2005 with moderate success, primarily because vendors were ahead of the adoption curve. A year later, 4Gb FC has gained considerable momentum, with almost every vendor having a 4Gb offering. With the tools and infrastructure in place, backward compatibility, as well as component availability near or at the same price points as 2Gb, 4Gb is a very well positioned technology.

The initial intention with 4Gb was for deployment inside the rack, connecting enclosures to controllers inside the array. However, initial deployments utilized 4Gb FC as Interswitch Links (ISLs) in Edge-to-Core Fabrics, or in topologies with considerably low traffic locality. For these types of environments 4Gb FC greatly increased performance, while at the same time decreasing ISL oversubscription ratios. Additionally, it decreased the number of trunks deployed, which translates to lower switch port burn rates, thus lowering the cost per port.

As mentioned above, backward compatibility is one of its advantages, since 4Gb FC leverages the same 8B/10B encoding scheme as 1Gb/2Gb, the same speed negotiation, and the same cabling and SFPs. The incremental performance of 4Gb over 2Gb also allows for higher QoS for demanding applications and lower latency. Preserving existing investments in disk subsystems by being able to upgrade them to 4Gb, thus avoiding fork-lift upgrades, is an added bonus, even though with some vendor offerings fork-lift upgrades and subsequent data migrations will be necessary.

Even though most vendors have 4Gb disk array offerings, no disk subsystem vendor that I know of ships 4Gb drives thus far; however, I expect this to change. Inevitably, the question becomes: "What good is a 4Gb FC front-end without 4Gb drives?" With a 4Gb front-end you can still take advantage of cache (medical imaging, video rendering, data mining applications) and RAID parallelism to provide excellent performance.
There are some other benefits though, like higher fan-in ratios per Target Port, thus lowering the number of switch ports needed. For servers and applications that deploy more than 2 HBAs, you have the ability to reduce the number of HBAs on the server, free server slots, and still get the same performance at a lower cost, since the cost per 4Gb HBA is nearly identical to that of a 2Gb.

But what about disk drives? To date, there's one disk drive manufacturer with 4Gb drives on the market: Hitachi. Looking at the specs of a Hitachi Ultrastar 15K147 4Gb drive versus a Seagate ST3146854FC 2Gb drive, the interface speed is the major difference. Disk drive performance is primarily controlled by the Head Disk Assembly (HDA) via metrics such as avg. seek time, RPMs, and transfer rate from media. Interface speed has little relevancy if there are no improvements in the above metrics. The bottom line is that characterizing a disk drive as high performance strictly based on its interface speed can lead to the wrong conclusion.

Another thing to take into consideration with regard to 4Gb drive adoption is that most disk subsystem vendors source drives from multiple drive manufacturers in order to be able to provide the market with supply continuity. Mitigating the risk of drive quality issues that could potentially occur with a particular drive manufacturer is another reason. I suspect that until we see 4Gb drive offerings from multiple disk drive vendors the current trend will continue[...]
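A back-of-the-envelope calculation shows why the HDA, not the interface, bounds drive performance for small random I/O. The figures below are typical 15K-RPM FC drive numbers assumed for illustration, not the specs of either drive named above.

```python
# Why interface speed barely matters for small random I/O: estimate the
# IOPS a 15K drive can service from seek + rotational latency alone.
# (Assumed, typical figures; not vendor specifications.)
rpm = 15000
avg_seek_ms = 3.6                     # assumed average seek time
rot_latency_ms = 60000 / rpm / 2      # half a revolution on average: 2ms
service_ms = avg_seek_ms + rot_latency_ms

iops = 1000 / service_ms              # roughly 179 IOPS per drive
mb_s = iops * 4 / 1000                # bandwidth at a 4KB block size

print(round(iops), round(mb_s, 2))
```

At well under 1MB/s per drive of small-block random traffic, even a 1Gb interface is nowhere near saturated, so moving that same HDA from a 2Gb to a 4Gb interface changes essentially nothing.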

iSCSI Performance and Deployment


With the popularity and proliferation of iSCSI, a lot of questions are being asked regarding iSCSI performance and when to consider deployment.

iSCSI performance is one of the most misunderstood aspects of the protocol. Looking at it purely from a bandwidth perspective, Fibre Channel at 2/4Gbit certainly appears much faster than iSCSI at 1Gbit. However, before we proceed further, let's define two important terms: Bandwidth and Throughput.

Bandwidth: The amount of data transferred over a specific time period. This is measured in KB/s, MB/s, GB/s.

Throughput: The amount of work accomplished by the system over a specific time period. This is measured in IOPS (I/Os per second) or TPS (transactions per second).

There is a significant difference between the two, in that Throughput has varying I/O sizes which have a direct effect on Bandwidth. Consider an application that requires 5000 IOPS at a 4k block size. That translates to a bandwidth of 20MB/s. Now consider the same application at a 64k size. That's a bandwidth of 320MB/s.

Is there any doubt as to whether iSCSI is capable of supporting a 5000 IOPS, 20MB/s application? How about 5000 IOPS and 40MB/s using a SQL Server 8k page size?

Naturally, as the I/O size increases, the interconnect with the smaller bandwidth will become a bottleneck sooner than the interconnect with the larger one. So I/O size and application requirements make a big difference as to when to consider an iSCSI deployment. If you are dealing with bandwidth-intensive applications such as backup, video/audio streaming, or large-block sequential I/O Data Warehouse databases, iSCSI is probably not the right fit at this time.

Tests that we have performed internally, as well as tests performed by 3rd-party independent organizations such as the Enterprise Storage Group, confirm that the performance difference between FC and iSCSI is negligible when deployed with small-block OLTP-type applications.
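The IOPS-to-bandwidth conversion used above is just multiplication, and it's worth seeing all three cases side by side:

```python
# Bandwidth required by a fixed IOPS load at different I/O sizes
# (decimal MB, matching the rounding used in the text).
def bandwidth_mb_s(iops, io_size_kb):
    return iops * io_size_kb / 1000.0

print(bandwidth_mb_s(5000, 4))    # 20.0  MB/s - easily within 1Gb iSCSI
print(bandwidth_mb_s(5000, 8))    # 40.0  MB/s - SQL Server 8k page size
print(bandwidth_mb_s(5000, 64))   # 320.0 MB/s - exceeds a 1Gb link
```

The same 5000 IOPS load spans a 16x range of bandwidth depending on I/O size, which is exactly why "is iSCSI fast enough?" has no answer until you know the application's I/O profile.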
Having said that, there are also documented tests, conducted by a 3rd-party independent organization, Veritest, where iSCSI outperformed an equivalent array identically configured with FC, with both vendors following their Best Practices documentation, in conjunction with an OLTP-type workload. At the end of the day, always remember that the application requirements dictate protocol deployment.

Another question that gets asked frequently is whether iSCSI is ready for mission-critical applications. iSCSI has come a long way since 2003. The introduction of host-side clustering, multipathing support and SAN booting capabilities from various OS and storage vendors provides a vote of confidence that iSCSI can certainly be considered for mission-critical applications. Additionally, based on deployments, Netapp has proven over the past 3 years that a scalable, simple-to-use array with Enterprise-class reliability, when coupled with the above-mentioned features, can safely be the iSCSI platform for mission-critical applications. Exchange is a perfect example of a mission-critical application (it is considered as such by lots of Enterprises) that is routinely deployed over iSCSI these days.[...]

Dynamic Queue Management


When we (Netapp) rolled out Fibre Channel support almost 4 years ago, one of our goals was to simplify installation, configuration, data and protocol management, as well as provide deep application integration. In short, we wanted to make sure the burden does not fall squarely on the shoulders of the Administrator to accomplish routine day-to-day tasks.

One of the things we paid particular attention to was host-side and target-side Queue Depth management. Setting host queue depths is a much more complicated process than the various disk subsystem vendors' documentation makes it out to be, and it requires specific knowledge of application throughput and response times in order to decide what the appropriate Host Queue Depth should be.

All SAN devices suffer from Queue Depth related issues. The issue is that everybody parcels out finite resources (Queues) from a common source (the Array Target Port) to a set of Initiators (HBAs) that consider these resources to be independent. As a result, on occasion, initiators can easily monopolize I/O to a Target Port, thus starving other initiators in the Fabric.

Every vendor document I've seen explicitly specifies what the Host Queue Depth setting should be. How is that possible, when in order to do this you need to have knowledge of the application's specific I/O requirements and response time? Isn't that what Little's Law is all about (N = X * R)? It's simply a "shot in the dark" approach, hoping that the assigned queue depth will provide adequate application performance. But what if it doesn't? Well, then, a lot of vendors will give it another go... another "shot in the dark".
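Little's Law, as invoked above, ties the three quantities together directly: the number of outstanding commands (N) equals throughput (X) times response time (R). The workload numbers below are hypothetical, purely to make the point concrete.

```python
# Little's Law (N = X * R) applied to queue depth: the outstanding I/O
# count an application sustains is its throughput times its response time.
def queue_depth(iops, response_time_ms):
    """N = X * R, with R converted from milliseconds to seconds."""
    return iops * (response_time_ms / 1000.0)

# A hypothetical app doing 8000 IOPS at a 4ms response time keeps 32
# commands outstanding; a static host queue depth below 32 throttles it.
print(queue_depth(8000, 4))   # 32.0
```

This is why a vendor-documented one-size-fits-all host queue depth is a guess: without measuring X and R for the actual application, the number on the page has no relationship to N.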
In the process of setting the appropriate Host Queue Depth, and depending on the OS, they will edit the appropriate configuration file, make the change, and ask the application admin to take an outage and reboot the host. The above procedure comes down to two things: a) poor planning, without knowing what the application requirements are, and b) inadequate protocol management features. To address this challenge we decided to implement Dynamic Queue Management and move Queue Depth management from the Host to the Array's Target Port.

So what is Dynamic Queue Management? Simply put, Dynamic Queue Management manages queue depths from the Array side. By monitoring application response times on a per-LUN basis, as well as QFULL conditions, it dynamically adjusts the Queue Depth based on the application requirements. In addition, it can be configured to:

  • Limit the number of I/O requests a certain Initiator sends to a Target Port
  • Prevent initiators from flooding Target Ports while starving other initiators of LUN access
  • Ensure that initiators have guaranteed access to Queue resources

With Dynamic Queue Management, Data ONTAP calculates the total number of command blocks available and allocates the appropriate number to reserve for an initiator or a group of initiators, based on the percentage you specify (0-99%). You can also specify a reserve Queue Pool where an initiator can borrow Queues when these are needed by the application. On the host side, we set the Queue Depth to its maximum value. The benefit of this practice is that it takes the guessing game out of the picture and guarantees that the application will perform at its maximum level without unnecessary host-side reconfigurations, application shutdowns or host reboots. Look Ma', no hands!!!

Several of our competitor[...]