Why StorNext? A brief version of my investigations into heterogeneous file systems and SANs.

Thursday, March 25th, 2010 at 15:29


Our criteria for choosing a SAN solution reflected some very specific needs, but these needs don't necessarily apply only to our situation:
1) Multiple Avid NLEs connected to a single storage pool
2) Non-proprietary disk hardware
3) Very scalable in both connections and storage size
4) A pay as you expand pricing model
5) Sufficient bandwidth for multicam HD editing
6) Would work over an existing fibre-channel infrastructure

At the time we had an Avid Unity LANShare LP which had done its job well enough but could not be expanded beyond 8TB and 6 clients. The jumping-off point was the CEO (Garion Hall) tasking me with investigating alternative ways to exceed those 6 clients, with the intention of employing more editors, or at least more editing machines, to increase post production output. So with that in mind I went about outlining the kind of system we would want. We also wanted to move away from closed-box, proprietary systems that lock you into their pricing and support contracts, and to have the flexibility to expand the system as we needed it, ourselves, with third-party hardware that we could support ourselves.
Naturally we had specific bandwidth requirements that needed to be sustained (which is simple maths) and in our case we already had a 4Gb fibre-channel infrastructure that we wanted to continue using, or at the very least, not have to replace (at great expense).
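To make the "simple maths" concrete, here's a rough sketch of the kind of back-of-envelope calculation involved. The codec bitrate, client count, overhead and headroom figures below are illustrative assumptions rather than our exact numbers (DVCPRO HD is roughly 100 Mbit/s per stream, and a 4Gb FC link gives roughly 400MB/s of usable throughput):

    # Back-of-envelope bandwidth check (illustrative figures, not our exact numbers).

    MBIT_PER_STREAM = 100                       # approx. DVCPRO HD video bitrate
    MB_PER_STREAM = MBIT_PER_STREAM / 8 * 1.15  # ~15% extra for audio/container/filesystem overhead (assumption)

    def required_bandwidth(clients, streams_per_client, headroom=1.3):
        """Sustained MB/s the storage must deliver, with headroom for scrubbing and renders."""
        return clients * streams_per_client * MB_PER_STREAM * headroom

    # Example: six editors, each playing a 4-camera multicam sequence.
    need = required_bandwidth(clients=6, streams_per_client=4)
    link_4gb_fc = 400  # approx. usable MB/s on a single 4Gb fibre-channel link

    print(f"Sustained requirement: ~{need:.0f} MB/s")
    print(f"That is ~{need / link_4gb_fc:.1f}x one 4Gb FC link")

The exact numbers matter less than the exercise: work out your sustained figure first, then judge any storage system against it.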

There are hundreds of SAN solutions on the market for enterprise and HPC-scientific computing. By comparison there are very few designed specifically for post production (though that is changing at an ever-increasing rate). Nearly all of the non-post-specific solutions have two things in common that I believe make them poor choices for post production use. The first is the set of assumptions the traditional SAN model makes:

1) Low number of high speed clients (i.e. a number of servers connected to a number of SAN controllers, with those servers then serving files out to a network)
2) Very large number of low speed clients
3) A very high volume of read/write requests (per user per file per second)
4) Generally very small file sizes (anywhere from about 1KB up to 50MB)
5) Read/writes are done non-sequentially, all over the platters/arrays/volumes

Post production requirements are the opposite on almost every count:
1) Small-medium number of high speed clients (each with a user)
2) Little or no low speed clients
3) Low volume of read/write requests from workstations (often only 2-4 per user and only 1 file per user.)
4) Generally extremely large file sizes (anywhere up to 10GB or more, around 10,000 times as big)
5) Read/writes are done sequentially (long, linear r/w from consistent platter/array/volume locations)

There is also an assumption that a single SAN controller (basically an intelligent array controller) will have the bandwidth to control many “slave” arrays. Put simply, there may be, say, ten 16-drive arrays behind it, but only a pair of 4Gb connections into it. So if you're working in film, that huge tower of who-knows-how-many petabytes is really only usable by two people. To further frustrate the situation, once you've reached the limit of a controller you need to buy a second one (or maybe you want to buy lots of controllers so more people can get bandwidth). Then you have not one contiguous storage pool but multiple SANs, all needing individual configuration, management, fault tolerance, HA considerations and DR strategies. And to top it off, now each of your clients needs to connect to multiple SANs. Why did we want a SAN again?
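To put rough numbers on that bottleneck, here's the same kind of back-of-envelope sketch as before. The per-stream figure is an illustrative assumption (uncompressed 2K 10-bit DPX at 24fps is commonly quoted in the region of 300MB/s), but it shows why a controller with only two 4Gb ports runs out of puff after a couple of film-resolution clients:

    # How many uncompressed film-resolution clients can one controller with two 4Gb FC ports feed?
    # Illustrative figures only.

    controller_ports = 2
    usable_mb_per_port = 400      # approx. usable MB/s per 4Gb FC port
    stream_mb_s = 300             # approx. 2K 10-bit DPX at 24fps (assumption)

    controller_bandwidth = controller_ports * usable_mb_per_port
    clients_supported = controller_bandwidth // stream_mb_s

    print(f"Controller bandwidth: ~{controller_bandwidth} MB/s")
    print(f"Uncompressed 2K clients it can actually feed: {clients_supported}")  # -> 2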

The second major drawback to enterprise systems is their pricing structure. Because they assume a large number of slow clients and a large volume of internally controlled data, it gets prohibitively expensive to configure a system for the needs of post production. For instance, to replace our old Avid Unity with a solution from NetApp I would need six SAN controllers plus the management modules and back-end switching to support them. Connecting six high speed clients running non-server OSes, with 8TB of storage, would have cost more than five times the cost of the Unity. About $250K, and that's to only MATCH the Unity, not scale up from it! (And to completely knock this on the head, none of them supports OS X, which makes up about 75% of the post production market for workstations.)

So then you turn to post production friendly SAN solutions. These have drawbacks too. Primarily, they're nearly all just as proprietary and closed-box as Avid. And most of them are about the same price. So, as we are running Media Composer only, I'm back to just buying a new Unity!
The exceptions to this are systems such as Autodesk Stone and Apple XSan. I will get back to them later.

FEAR NOT, there is another way! Well, two actually, but they stem from the same idea: clustered file systems (basically a fancy high speed NAS) and heterogeneous file systems (a controllerless or headless SAN with a special filesystem on it). Both share some core functionality:

1) Single, managed storage pool
2) Scalable numbers of high and low speed clients
3) Infinitely scalable storage size (well, effectively infinite)
4) Client platform agnostic (any OS can connect to it, within reason; an Amiga 500, for instance, won't have a client available!)

Sounds good, right? Well, there are some drawbacks along with the good parts. Let's start with the clustered systems.

Clustered storage is basically an evolution of the basic SAN model with the “head” taken away, and of the basic NAS model but with storage and connections shared across the whole cluster. Clustered systems are made up of “nodes”: blocks of storage that are intelligent and can operate individually, but when connected to each other share their storage and connections out as a single entity. On paper this is brilliant. In real life there are some hurdles, but we'll get to that.

Most of these systems rely on having a back-end interconnect between each node and a front-end interface for the clients to attach to. Systems such as Isilon and BlueArc use InfiniBand to interconnect the nodes and allow 1Gb and 10Gb Ethernet connections for the clients. This way any client connected to any node can access data from all the nodes at high speed. It's a good model and it's very effective. But again we're pushed back to a very proprietary system, because the internal file management (and indeed the file system itself on many of these products) is very complex and completely tied to the vendor's own hardware and software. On the up side, they're nearly indestructible and child's play to expand, so for post production environments that don't have on-site IT staff they are a really good solution.

Isilon (which I've used) and BlueArc (which I haven't used, but which gets rave reviews from studios around the world) are good examples for ease of use and have very good FT/HA. In an Isilon cluster, adding a node (9TB, ~6 usable) is a task a receptionist could do with a one-page instruction sheet. No, really, it is that easy. Plus, you can literally walk up to a cluster and yank out a drive or even a whole node and NOTHING WILL HAPPEN (apart from someone like me getting a lot of urgent emails from it). This is because the filesystem and the data are spanned across the whole cluster; it just goes about re-striping and shunting stuff around until it's happy. The only thing your editors will notice is that the usable space of the volume will have suddenly gone down.

For us here at G Media, though, it came down to cost. Not just the initial outlay but the cost of expansion. Because each node is intelligent and includes all the hardware, each node is, by comparison to a straight array (non-SAN), hugely expensive:
• Non-managed array $/TB ~$1,000
• Avid Unity $/TB ~$5,000
• Isilon $/TB ~$10,000
You see with things like Isilon you’re paying for the hardware, the software license, the filesystem license and the support PER NODE.

Also note that while these systems behave like a SAN, they are in fact a super-high-performance NAS solution, and thus use protocols such as NFS and even Samba for the clients. This has its own issues attached. For instance, we couldn't get older Avid Media Composer systems to sustain anything over 50MB/s over 1Gb Ethernet, and the HP workstations (xw8200 & xw8400) simply would not let a 10G card run jumbo frames. Ultimately, we couldn't play back more than two streams of DVCProHD 1080i50. Not enough for us with 2, 3 or even 4 camera multi-cam shoots. I won't go into the other issues with running media storage on NAS, but I will say that even if we had newer workstations (some of them actually are) and could use 10Gb Ethernet with jumbo frames, it would still mean installing 10G infrastructure on CX4 copper (as Isilon didn't support fibre): CX4 10G Ethernet switches and CX4 high speed connection heads for the cluster. AHHHGG! Very expensive.

So, heterogeneous file systems:
Heterogeneous file systems, such as SGI's CXFS and Quantum StorNext, work on the principle of removing the “head” of a traditional SAN and allowing it to control any storage you choose to present to it (in the form of LUNs), over whatever connection you choose to use. So the “head” becomes transparent (or virtual, really) in the sense that while it controls the filesystem traffic between client and storage (like a traditional controller), it is not the conduit (and bottleneck) for that traffic; the clients are attached to the storage directly. Clever, right? This is achieved by splitting control requests and data transmission. File requests and OS updates (metadata) are sent over standard Ethernet (either in band on an existing LAN or across its own private LAN), while data reads/writes go directly over a storage connection (FC, IB, 10G, whatever). The control requests are all handled by one (or several) servers called metadata servers, which talk to all the clients and to the storage and point everyone in the right direction.

Here at G-Media, we finally decided on Quantum StorNext, so that's the system I will describe. All of the NLEs, the storage arrays, the tape library and the MDCs (metadata controllers) are connected via fibre channel through FC switching (the fabric for this switching is zoned specifically for this task, a topic I will discuss some other time). At the same time, all the NLEs and the MDCs are also connected to a LAN. When an NLE (client) requests a file, it makes that request via the LAN to the MDC. The MDC holds the metadata for where on the arrays the files live and instructs the client (and thus the client's HBA) where to go; the client's HBA sends SCSI commands over FC to the array controllers, which send back the data; and finally the client returns any updates to the MDC.
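That round trip is easier to see laid out as code. The sketch below is purely a conceptual model of the metadata-over-LAN / data-over-FC split; the class and method names are made up for illustration and are not StorNext's actual API:

    # Conceptual model of the StorNext-style split between metadata traffic (LAN)
    # and bulk data traffic (FC). Names are illustrative, not a real API.

    class Fabric:
        """Stands in for the array controllers behind the FC switch."""
        def __init__(self, blocks):
            self.blocks = blocks                      # (lun, offset) -> bytes

        def read_blocks(self, lun, offset, length):
            return self.blocks[(lun, offset)][:length]


    class MetadataController:
        """Stands in for the MDC: knows where file extents live, never carries the data itself."""
        def __init__(self):
            self.extent_map = {}                      # path -> list of (lun, offset, length)

        def locate(self, path):                       # travels over the LAN
            return self.extent_map[path]

        def commit(self, path, extents):              # client reports changes back over the LAN
            self.extent_map[path] = extents


    class Client:
        """Stands in for an NLE workstation with an FC HBA."""
        def __init__(self, mdc, fabric):
            self.mdc, self.fabric = mdc, fabric

        def read(self, path):
            extents = self.mdc.locate(path)           # small metadata request over Ethernet
            return b"".join(self.fabric.read_blocks(lun, off, ln)   # bulk reads straight over FC
                            for lun, off, ln in extents)


    # Tiny usage example: one extent on LUN 0, read by the client without the MDC ever touching the data.
    fabric = Fabric({(0, 0): b"frame data..."})
    mdc = MetadataController()
    mdc.commit("/media/clip01.mxf", [(0, 0, 8)])
    print(Client(mdc, fabric).read("/media/clip01.mxf"))

The point of the toy is simply that the MDC's replies are tiny compared with the media itself, so the Ethernet side never becomes the bottleneck.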

Additionally, the MDCs are responsible for creating and maintaining the filesystem on the storage. The major advantage is that you can use any storage hardware you can imagine, provided it can be mounted by the MDCs. Even a USB stick can be added to the storage pool (not that I suggest you do this!!!). Expanding storage is as simple as adding arrays, presenting the LUNs to the MDC, and getting the MDC to stripe them with the file system and add them to an existing volume. All the user will see is the available space go up.
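Continuing the made-up toy model above, expansion amounts to the MDC folding newly presented LUNs into the live volume as extra stripe groups. This is not the real StorNext admin tooling, just an illustration of why clients simply see the free space go up:

    # Toy continuation: a volume grows by appending stripe groups built from newly presented LUNs.
    # Illustrative only; real expansion is done through StorNext's own administration tools.

    class Volume:
        def __init__(self, name):
            self.name = name
            self.stripe_groups = []                    # each entry: list of LUN capacities in TB

        def add_stripe_group(self, lun_capacities_tb):
            """New array presented as LUNs -> striped -> appended to the live volume."""
            self.stripe_groups.append(list(lun_capacities_tb))

        @property
        def capacity_tb(self):
            return sum(sum(group) for group in self.stripe_groups)


    vol = Volume("MediaSAN")
    vol.add_stripe_group([2, 2, 2, 2])                 # the original LUNs
    print(vol.capacity_tb)                             # 8
    vol.add_stripe_group([4, 4, 4, 4])                 # a new array presented to the MDC
    print(vol.capacity_tb)                             # 16 -- clients just see more free space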

The MDCs themselves are just normal server hardware running RHEL, nothing proprietary or fancy at all, and a pair of them can run in an HA fail-over configuration. Adding more clients is a per-seat license fee with no additional costs for file system licensing, server-based connection licenses or additional hardware (assuming you have a spare port on your FC switch). You can also run a virtual SAN client over Ethernet, called a Distributed LAN Client or DLC, that will actually run at around 100MB/s (not the 30MB/s of normal CIFS). And if you're really keen you can run 10G Ethernet or IB for these DLCs.

Clients are available for just about any imaginable platform, including OS X. In fact, the Mac license is an XSan license, because XSan IS StorNext under the hood, just pre-packaged by Apple for Apple. You can even connect PCs to an Xserve RAID XSan by buying a StorNext license (that's another story though). The same is true for Autodesk Stone: all StorNext under the hood. So really, when you think about it, if you buy XSan you're paying a premium for shittier hardware that makes it hard to connect anything but Macs. Whereas you could be running an enterprise-wide solution for every system on better, more redundant, faster hardware with better HA and FT credentials if you just go to StorNext. I should quickly point out that Autodesk Stone is NOT shitty hardware or a bad implementation. With them you're getting good stuff, and paying for the Autodesk badge.

While going down the path of a self-configured and self-managed heterogeneous file system may not be the right choice for every post production operation, if you have capable IT people on staff it could save you time and money, especially in the long run. On the other hand, it's definitely NOT the kind of system you want to run with only off-site phone support and no one technical around to maintain it, especially because you will end up with multiple vendors, manufacturers and suppliers, all of whom will doubtless blame each other when something goes wrong.
