Retooling the Datacenter

Grid-based Storage

2012-08-18T12:19:00.001-05:00

When I introduced the Storage Evolution, I shared some topics I talk about when meeting with a customer: Virtualization, Deduplication, Grid and Encryption. That list has grown to include Information Lifecycle Management (ILM), Convergence, Business Continuity, Clouds and Big Data. It’s about time I start writing about more of these.

The world of storage is changing – fast. When I started consulting, I used to install clusters and supercomputers. My specialty was IBM’s SP supercomputer (like Deep Blue, the one that played the Russian chess champion Garry Kasparov). My wife asked if they wore capes.

The supercomputer market fell apart over a decade ago, giving way to grid-based systems. What used to cost millions of dollars was swept away by inexpensive commodity Intel-based servers, usually running Linux and grid software. Supercomputers started to become extinct. What used to be a scale-up model became a modular massively-parallel model, which became a highly distributed model. The market changed.

When you go to deploy an Internet-facing app today, you are not building one large web server, but many web servers, backed by application servers, backed by database servers. A load balancer distributes the incoming traffic amongst the various servers to maximize throughput and minimize response time. This again is a scale-out, not scale-up, model.

This is the basis of the grid (or farm if you prefer). Many distributed nodes are balanced, distributed and protected so there is no single point of failure. Server and desktop virtualization is protected N+1 and can easily scale. We repeat this model throughout the enterprise.

Storage is no different.

EMC has Avamar, Isilon, Atmos and Centera; HP has LeftHand and 3PAR; IBM has XIV and SONAS; NetApp has Cluster-Mode. There are others, like Hadoop, which distributes the data across different function nodes. There are products that go halfway, such as EMC VMAX and IBM SVC; these effectively partition the data amongst redundant controllers, but the data and throughput are siloed without being completely distributed.

Why Grids

Why grids: mainly because we need them. Imagine storage that scales with you. Instead of head swapping to a larger controller, you simply add more nodes. You now have a scalable system not just at the spindle-level, but also at the controller level.

As solid-state/flash disk becomes more prevalent in the data center, the bottleneck moves from the spindles to the controllers. As storage efficiencies such as compression, deduplication, thin provisioning, WAFL, etc. continue to grow in use, they put more strain on the controller and its processing power. Snapshots, mirroring, replication and NAS all add overhead. The result is that the controller, which has long been over-powered, is starting to strain.

We’ve been lucky. For most of my customers, as throughput has grown, so has capacity and spindle count. We’ve mostly been keeping pace. FlashCache, FAST Cache, auto-tiering and SSD have helped control this, but we’re getting close to the breaking point. Storage efficiency software combined with low-latency solid state is pushing controllers to the brink.

We’re growing beyond a dual-controller solution. When IBM came out with real-time compression, they stated you need a lot of free CPU on the v7000/SVC. If EMC or IBM adds dedupe, we’ll see what they do to controller loads. NetApp cluster mode arrived just in time for two of my larger customers. Whatever the cause, we’re running out of bandwidth within the controllers.

This isn’t a terrible worry. We’ve been moving toward grids for some time. We have to. Without them, we’ll eventually run out of gas. Today, I can architect any solution, but I may have to partition the data: multiple controllers, multiple SVC nodes, or multiple VMAX engines. It’s not the best solution; we end up with silos of data. Where do we want to go? A single-managed intelligent grid: one set of data that self-balances, self-tiers and distributes itself elegantly, without external add-on tools.

This is where we’re heading. It will be a better world. We will be able to start small and grow performance and capacity like we are used to in the virtualization cluster, the web farm, the VDI farm. We’ll grow our storage the same way. We’ll be much better for it.

Hot Cloud Apps Miss a Valuable Opportunity with the Enterprise

2012-06-02T18:20:00.001-05:00

There are a lot of hot startups creating exciting new cloud applications every day. These apps have made our lives easier; some appear necessary. Dropbox lets us seamlessly move data between devices and networks. Yammer lets us collaborate Twitter-style within a company. Evernote and OneNote make note taking during meetings revolutionary. Even photo, video and document networks are becoming harder to ignore and block within the enterprise.

While all of these applications have proven their worth, they often violate IT security policies, and companies try to police them. Security is a tricky dance: you try to block things that must not be used, especially the harmful ones, but tighten the screws too much and people will find a way around them. When some of the apps are blocked in a company, workers will break out cellular cards or, worse, jump in their cars and head to the closest coffee shop.

These applications are increasingly useful – even critical to keep pace with today’s business. Often they are blocked to keep company secrets and intellectual property where they belong, secure within the company’s control. Companies are investing in data loss prevention to keep the information within their borders. But borders are becoming increasingly permeable in today’s BYOD (bring your own device) world.

While companies try to circumvent this by investing in virtual desktops, virtual apps, sandboxes and containers, the usefulness is somewhat lacking. Collaboration is amazing when a company can share co-development of a presentation, a product or another type of development via a Dropbox-type app. Effective collaboration is key to my own success in developing leading cutting-edge solutions like Workload Mobility between network, server, storage and virtualization teams.

While competitors are trying to step up to the plate and deliver enterprise-strength tools that mirror and mimic those found commercially, they often don’t deliver as well as their consumer counterparts.

What I would like to see is the hot new startups delivering enterprise-grade solutions. Why can’t we have an app that sits inside the datacenter, keeping the secrets where they belong, but giving all the ease of use of its consumer brethren? With the commercial availability of SOAP and REST solutions that could run inside companies’ borders, these startups, struggling for revenue and commercial viability, could easily expand and extend their market presence to a whole new audience.

It shouldn’t be hard to retrofit the server beyond a cloud service provider’s borders and into the enterprise itself. Companies could control the access, keeping the secrets easily shared, but also easily cut off when an employee leaves. The clients on your phone, laptop or tablet could add additional networks to their visibility: the commercial cloud-centric network, the enterprise private-cloud network and those of collaborative partners. Companies would own their servers and their data. The Dropboxes of the world would sell the server software – at enterprise prices – and reap a valuable new revenue stream.

Venture capital (VC) firms should drive this adoption and the road to quick productization and capitalization. Their investments could go from red to black much more quickly than they do today. The markets would respond with more generous IPOs than the grey question marks they place on valuations today. This would meet an increasing demand from companies that see the value in these tools and are looking for them. Perhaps the only thing holding the startups back is the desire to not have their potential market revenue known, as the New York Times recently reported in Disruptions: With No Revenue, an Illusion of Value.

The BYOD market is driving tremendous productivity gains. It’s what keeps IT governance walking this tightrope, allowing loose rein on adoption. When everyone sees the value in these hot new startups, it’s hard to hold back when your competition is running full steam ahead. If we could still control the intellectual property, the business’ assets, then I think adoption would go even further and the revenue streams for these companies would climb even higher.

EMC World 2012

2012-05-29T11:37:00.001-05:00

I had the privilege of attending EMC World last week. While I saw a lot of NDA material I cannot present, I can say the cool new stuff is on the way soon, or here today: VPLEX, VNX, VMAX and more.

I have a new post ready, but I’m waiting to hear back on where it will land. In the meantime, I’ll leave you with a few short videos from EMC World that I participated in.

EMC World 2012 – Hot New Tech

EMC World 2012 – Automated Tiering Strategies

EMC World 2012 – Cloud in the Enterprise

Proof of Concepts and Bakeoffs

2012-04-24T00:31:00.001-05:00

If you’ve got the time, we’ve got the gear.™

Storage is the slowest resource in the datacenter. We measure things in milliseconds, while other components are in micro or nanoseconds. We’re increasingly asked to push more and more data through pipes faster and faster. When storage fails: screens go blue, kernels panic, things get ugly. Some call this a resume generating event.

For this reason, storage professionals as a general rule are a conservative bunch. We resist change. We want things to be safe and mature with low risk and no bugs. We avoid the bleeding edge. Some wait months or years after new products are released before they consider deploying them.

This approach doesn’t always serve us well.

A few years back, I had the privilege of attending the initial launch of Cisco’s Nexus product line, traveling out to Palo Alto for a partner summit. The technology looked revolutionary. Taking FibreChannel concepts and applying them to the Ethernet world yielded amazing features: non-blocking Ethernet with guaranteed delivery, virtual output queues, FCoE, OTV and many others on the horizon that we weren’t privy to at product launch. The distributed line cards were interesting, and the 1000v confused us before we realized how we could take the fright out of mixing ESX and VLANs (a network administrator’s nightmare), not to mention QoS, configuration consistency across the cluster and statistics. Vendors were developing Converged Network Adaptors (CNAs) and Unified Target Adaptors (UTAs), marrying a NIC with an HBA. But the thing that really mattered: the TCO was less than SAN and IP network gear sold separately.

What else really mattered? The economy sucked, budgets were tight. Anyplace you could save money made people interested. However, not everyone was sold on it.

Other SAN and IP switch vendors balked at the technology, saying it wasn’t ready for prime time. “Where’s multihop?” they cried while rushing to release products of their own. Others exclaimed it would never take off. I have to seriously ask them, using my best Vizzini voice, “You fell victim to one of the classic blunders, the most famous of which is never get involved in a protocol war against Ethernet – but only slightly less well-known is this: never go against Cisco when network convergence is on the line!” In years gone by, I deployed Token-Ring, FDDI and ATM. Ethernet took them all down. I’ve seen TCP/IP take over telephony and video conferencing. FibreChannel giving ground to FCoE is only a matter of time. (By the way, multihop has arrived.)

So three years ago we architected and sold FCoE with Nexus, and our competition didn’t get it.

Then NetApp came out with FCoE UTAs, followed by EMC. Now we no longer needed dedicated FC ports in a Nexus switch. Then Cisco surprised us all by coming out with Intel servers with the launch of UCS. Now we had this distributed blade system with the brains in a UCS fabric module: a souped-up Nexus 5000 switch. Suddenly Cisco’s master plan to take over the datacenter became apparent. The TCO was amazing; others balked again.

So we sold UCS while our competition didn’t get it.

Then EMC, Cisco and VMware formed Acadia, which begot VCE and vBlocks. NetApp followed suit with FlexPods. Suddenly we could create or collapse a datacenter into one rack or just a few, depending on the size of deployment. Again people didn’t get it. It was new. It was different. It was fully qualified without a ton of interoperability testing. Meanwhile HP and IBM cobbled together so many products it looked like an integration nightmare.

So we sold vBlocks and FlexPods, while our competition didn’t get it.

But we didn’t stop there. There was a different set of requirements growing, taking business continuity to the next level. Instead of a 5-20 minute recovery time, our customers were asking for zero downtime. So we architected Long-Distance VMotion (LDVM) with EMC VPLEX and NetApp MetroClusters, Cisco Nexus, load balancers and firewalls. This took advantage of a different usage of Nexus gear, that of Overlay Transport Virtualization (OTV), routing Layer 2 over Layer 3, much the same way that FibreChannel was routed between datacenters using Transit VSANs and Inter-VSAN Routing. This is what happens when the SAN and IP development teams get together. Later they would give us FabricPath.

We tried to do this with IBM SVC, but IBM wouldn’t go there. They stopped at 10 km, the campus boundary. So with EMC and NetApp, we now offered a different solution, one that could move VMs between metro distanced datacenters without shutting anything down. (IBM recently introduced an enhanced stretched metro cluster in December.)

We sold LDVM while our competition didn’t get it. Like I said, storage professionals are a conservative bunch.

The Proof of Concept

New technology on the bleeding edge doesn’t have to leave you bloody. I would never recommend taking something new straight to production. I avoid risk like the best of them. But new technology can be successfully deployed by use of the Proof of Concept (PoC). This lets us put it in, kick the tires, work out the kinks and see that it works as advertised.

We have been successfully deploying all of these solutions, safely, by use of proof of concepts. When the technology is at its newest, the early adopter phase, we plug a PoC into a customer’s development gear. It has a chance to shine or fail, without impacting production workloads. By the time development has moved on to testing, QA, staging and finally production, it has all the kinks out. And it works!

The PoC doesn’t always have to be used in the traditional sense: do I want to buy this? It can also be used to fully test new technologies in your datacenter in a pre-production sense. When that new TSM with DB2 version came out, we used PoCs to test the migration at each customer to make sure the migration didn’t break anything. The PoC model works with new technology making deployments more successful. You can test it then deploy it and reap the money saving benefits without getting blood on your hands.

These technologies are still a bit radical for some people, and the traditional use of the PoC is still requested. Do vBlocks/FlexPods really work? In it goes, the VMs get loaded and voilà testing commences. Most roll into production and never have a chance to leave. They are purchased at the conclusion of the trial.

We’ve been deploying these for years now. Most of these technologies are now mature, no longer in the early adopter stage. As new features or technologies roll out, we will safely test them, first in our lab, then in a PoC.

Bakeoffs

As a general rule, I’m not a fan of bakeoffs. Bakeoffs are when you pit two vendors against each other in a grudge match. A lot of time and effort go into setting everything up before the test. I am however confident that they will perform well. I architect everything on the pre-sales side to be right-sized – balancing head size, spindle count, capacity and projected growth over the life of the system.

Each vendor optimizes for different results and philosophies. Some offer rigidity in favor of being more fully qualified and tested. Others offer flexibility, with less thorough testing of all combinations and software releases. Both are valid and map quite well to different IT organizations’ philosophies. One size does not fit all.

The issue I have with most bakeoffs is: most of the time everyone passes the test. The architectural choices often determine the speed of the solution, not the vendor. The buying decision usually falls to the bottom line.

Of the vendors I sell, each solution can be architected to work well. Whether it’s EMC, IBM or NetApp, I can make it work for you. It’s really not that complicated if you know what you’re doing. Head size is a matter of performance, sized for today plus growth over 3-7 years. Spindle count will be dictated by throughput, with a cushion (overhead). Once the desired capacity is known, drive size follows: you know how many spindles you need to hit throughput, so you pick the drive size that meets the capacity target across that many spindles.
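The sizing order described above (throughput first, then capacity) can be sketched in a few lines of Python. This is a minimal sketch, not a vendor sizing tool: the function name size_array, the per-spindle IOPS figure, the 25% cushion and the list of drive sizes are all illustrative assumptions, and it ignores RAID overhead, cache and tiering.

```python
import math

def size_array(required_iops, iops_per_spindle, cushion,
               target_capacity_tb, drive_sizes_tb):
    """Return (spindle_count, drive_size_tb) for a throughput-first sizing.

    All inputs are hypothetical planning numbers supplied by the caller.
    """
    # Spindle count is dictated by throughput, plus a cushion for overhead.
    spindles = math.ceil(required_iops * (1 + cushion) / iops_per_spindle)

    # With the spindle count fixed, pick the smallest drive that still
    # reaches the capacity target across that many spindles.
    per_drive_needed_tb = target_capacity_tb / spindles
    for size in sorted(drive_sizes_tb):
        if size >= per_drive_needed_tb:
            return spindles, size

    # No drive is big enough: capacity, not throughput, now sets the count.
    largest = max(drive_sizes_tb)
    return math.ceil(target_capacity_tb / largest), largest

# Example with made-up numbers: 9,000 IOPS, 180 IOPS per spindle,
# 25% cushion, 30 TB raw target, drives offered at 0.3, 0.6 and 0.9 TB.
print(size_array(9000, 180, 0.25, 30, [0.3, 0.6, 0.9]))  # -> (63, 0.6)
```

In the example, throughput asks for 63 spindles, so 0.6 TB drives are enough to reach 30 TB; buying larger drives would add capacity nobody asked for.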

So when I help someone decide what solution will work best for them, optimizing the value comes into play quickly. If I’m not offering the best value, my competition is. Always offering the best value is a secret to my success.

Doing Amazing Things

Where I’m going with all of this is that some of these newest and coolest technologies are actually money-saving. Business as usual often isn’t. Convergence reaps rewards for your CapEx and OpEx. Taking your datacenter to the active-active model isn’t as out of reach as you might expect, and not that different in management.

Some of our smallest customers are reaping the rewards. Some of our largest see the benefits. This isn’t a Fortune 100 sized solution; it’s often a money-saving one. We’re saving money for customers large and small. We’re improving availability for all types. It’s an amazing time to be in this field.

So next time you think these solutions are too new, know that people have been putting them in for years with us. Next time you think availability paradigms haven’t changed, understand that people are reaping the benefits today. It’s a rapidly changing landscape, and picking the right partner often can be key. Some of us have been deploying these solutions for years now, while others are just jumping into the pool and offering them.

Is your technology partner seasoned? If you have new technology you want to deploy, is there a proven safe way to get to where you want to go? Can your partner deliver in all the areas they need to: network, storage and compute?

By the end of this year, we will have deployed many more vBlock, FlexPod and Long-Distance VMotion solutions: safely and successfully. Can your partner do that?

If you’re not sure or you don’t trust them, drop a line and we will put in a PoC. If you’ve got the time, we’ve got the gear.™

Long-Distance vMotion: Updates

2011-10-11T10:00:00.000-05:00

This is an update to the Long-Distance vMotion series I did earlier this year. If you wish to read it all, start with Long-Distance vMotion: Part 1.

The problem with blogging about technology, techniques and architectures is they change. Sometimes that change is rapid, sometimes it takes time over major releases. In a more converged world where multiple components play, they can change quite rapidly.

Since writing my Long-Distance vMotion (LDVM) series, there have been some changes. Herein lies the dilemma: do I go back and change the old articles, or do I post an update like this new entry? I could add a section to the blog with the latest analysis, called Long-Distance vMotion. Part of me feels I should leave old posts unchanged (except for correcting typos and erroneous information). The other approach would be changing the old articles, preserving the search engine entry points that are currently sending people into the articles – they wouldn’t have to go to another place in the blog for the latest updates. I can post-date new entries, but I can’t post-date new information. Which is the best approach? Let me know what you think.

So what’s new? Really two things. First, vSphere 5 came out with some new enhancements on the LDVM front that can be taken advantage of. Second, IBM has decided to enhance the SVC to support distance vMotion across the metro.

For those of you who know me personally, these LDVM blog entries are a subset of a presentation I give in my day job. The presentation goes further in-depth with pictures and animations of how each part interrelates to fully explain the topic. It also takes into account other pieces of the network stack needed to make it work. The blog is stripped down, simplified to make it suitable for easy reading. If you’re interested in this topic, you should engage me or any of my qualified peers for the full-blown presentation. To engage us, send a message to me on LinkedIn. You can do this from the bottom of the right-hand column or the Biography tab. (I limit spam this way.)

Why is it stripped down? First, it’s not easy to take pictures and animations and give all of their meaning in a blog post. Second and more importantly, we are competitive! I don’t want competitors gaining all the knowledge necessary to pull this off, not that many can. It combines different disciplines, many of which we are market leaders in. Others may be able to sell the pieces if they had the full list, but few would have the engineering staff to make it all work. We can.

vSphere 5

The first change that affects LDVM is with vSphere 5. It was released a few months ago, and after the initial shock and awe of the licensing settled, people started digging into the details. If you recall from Long-Distance vMotion: Part 2, vSphere 4 had a latency limit of 5 ms round trip time (RTT). In vSphere 5, with the Enterprise Plus version (and only that version), we now get a feature called Metro vMotion. vSphere 5 Enterprise Plus takes us from 5 ms RTT to 10 ms RTT. With good clean switch-free links, instead of ~400 km at 5 ms RTT latency, we should get double that distance, roughly 800 km, with 10 ms RTT latency. The latency is what dictates the distance.
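To see why doubling the RTT budget doubles the reach, here is a rough back-of-the-envelope sketch in Python. It assumes light in fiber covers about 200 km per millisecond one way, and it applies a derating factor of 0.8 that I picked only so the output lines up with the ~400 km figure above; the function name max_circuit_km and that factor are illustrative assumptions, not anything published by VMware.

```python
# Light in optical fiber travels at roughly two thirds of its vacuum speed,
# which works out to about 200 km per millisecond in one direction.
FIBER_KM_PER_MS = 200.0

def max_circuit_km(rtt_budget_ms, derate=0.8):
    """Estimate the longest fiber circuit that fits inside an RTT budget.

    derate is a hypothetical allowance for everything that is not pure
    propagation delay (optics, amplification, indirect fiber routing).
    """
    one_way_ms = rtt_budget_ms / 2  # RTT covers the trip out and back
    return one_way_ms * FIBER_KM_PER_MS * derate

for rtt in (5, 10):
    print(f"{rtt} ms RTT -> about {max_circuit_km(rtt):.0f} km of circuit")
```

The result is circuit distance, and any switching in the path eats into the budget, so treat it as an upper bound for planning rather than a guarantee.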

Now 800 km might not be available with every LDVM storage solution today, but their limits tend to update faster than major releases of vSphere. It’s a decent distance that may just take us beyond the metro from ~62.5 mi to ~125 mi. Remember that’s circuit distance, not as the crow flies. Additional switching will add additional latency.

IBM SVC Extended Distance Stretched Clusters

For those of you thinking the IBM v7000 killed IBM’s plans for the SVC, it remains the only IBM device to do LDVM – and it just got better. IBM has taken care of many of their LDVM limitations with the latest v6.3 code, due out next month. I’ve been sitting on this for a few weeks now, waiting for it to be announced. Well, today is that day.

I have had two local, otherwise happy IBM SVC customers looking to defect to another vendor because the SVC couldn’t do LDVM. Today’s announcement and the pending v6.3 code change that. I had previously stated that they had a campus solution, limited to 10 km or 6.25 mi. IBM will now allow a new type, the “extended distance stretched cluster,” up to 300 km (100 km will yield better performance). They also take care of the unwieldy amount of dark fiber by now allowing SAN switch ISLs between sites. Previously you had to go from the switch directly into the storage at the remote site with long-wave single mode fiber. With a Cisco MDS solution using VSANs, trying to keep best practices with split-brain protection, I can cut my fiber down to 6 links. It was projected to be 12 links at one customer.

IBM has made a lot of updates, tested and qualified solutions to get to 300 km. There were code changes to allow the greater distance. There was testing and qualification to support using the SAN to facilitate ISL traffic. You should also be able to use FCIP as long as you’re within limits.

IBM has a large install base of SVCs, and many customers will want to take a serious look at retooling their existing solutions. Now that the SVC has enterprise licensing, you can easily test this out or migrate your existing infrastructure by just adding a new pair of nodes and stealing some licensing from some existing clusters.

They still require the most fiber between sites, but I can live with 6. If they offered a software agent like EMC’s VPLEX witness or NetApp’s Microsoft SCOM plugin for split-brain protection instead of the enhanced quorum volume, I could lower my OpEx further by using only 2 links to be on par with EMC’s VPLEX.

Expect to see this area, workload mobility and active-active datacenters, heat up. It lets us move VMs around and move into and out of clouds transparently, without downtime, and it will be an evolving technology for years to come. Whether you’re talking about vSphere vMotion, Hyper-V Live Migration or PowerVM Live Partition Mobility, these technologies are evolving.

IBM has entered the game.

vBlocks and FlexPods: is this Coke v Pepsi?

2011-08-29T00:28:00.001-05:00

When Cisco came out with their UCS servers, I was impressed. They took the Nexus FCoE switches and modified them into a whole new thing, the UCS: FCoE, service profiles and an expandable distributed blade server model. What really makes sense is the bottom line: you can save real money by deploying them over traditional blade or standalone servers. They save money on price per port and by not having to buy additional switches for every 14-16 servers in a traditional blade enclosure. They simplify rapid deployment of servers. They allow moving a workload to new blades without having to rebuild them.

Converged networks suddenly start to make sense with the Cisco UCS. You begin to see Cisco’s master plan in action. It’s not just FCoE in a switch, but a whole system built around best practices: FCoE, boot from SAN, etc. The most immediately obvious gain is the service profiles: VMware abstracts servers, and service profiles sit a layer below, virtualizing the hardware that VMware is built upon. Firmware, UUIDs, WWPNs, MAC addresses: everything is abstracted. It took things one step farther than HP Virtual Connect.

So What Is This vBlock and FlexPod Business?

Now that you understand the idea of converged networks (Ethernet and FibreChannel) and you add UCS, where convergence starts to really work, how do you make it bigger and better? The problem with a new system, a problem many of us face, is that of qualification. We need to match HBAs and NICs (or CNAs in this case), servers, storage and network gear (SAN and IP), qualifying interoperability for each component. As a storage architect, this is a part of my job: making sure each component works with every other component. We also need to take into account existing legacy equipment, the switches that we connect to, making sure each and every part works. That’s where the vBlocks and FlexPods make sense.

For the vBlock, we have EMC storage coupled with Cisco Nexus (5500 and 1000v), UCS and MDS gear, all pre-integrated into one working system, end to end, including VMware vSphere. It rolls in, you connect a few network cables and start loading VMs. All of the equipment is pre-qualified to work together. vBlocks are sold by VCE, which offers a variety of models to meet different sized workloads utilizing VNX (300 series) and VMAX (700 series) storage. You can combine them with RecoverPoint or VPLEX for replication between sites and data protection.

For the FlexPod, NetApp, Cisco and VMware provide a pre-validated reference architecture. It couples NetApp with Cisco Nexus (5500 and 1000v), UCS and VMware vSphere (skipping MDS). There are no fixed models but instead the ability to change a component within its class: a Nexus 5596 instead of a 5548, or a FAS3270 instead of a FAS3210. The idea behind a reference architecture is a little more flexibility in the design.

Both attempt to tackle the same goal: provide a fully tested, pre-qualified, validated design. EMC offers a tightly integrated container that you connect to, with controlled software releases; NetApp has you roll your own within guidelines. Both are good solutions and I’ve designed and installed many of them. They work elegantly.

New Technology

They take the risk out of new technology. You’re often left wondering if the new hardware will work with your system. By integrating the network, compute, storage and virtualization layers into a validated supported design, you can load anything that supports VMware. All of that interoperability testing is done for you. All is supported by all three vendors. Finger pointing should be eliminated. That’s a real step forward.

Of course, the environment runs more than just VMware, with other supported applications, some of which aren’t yet supported to be virtualized. And there are other pre-packaged solutions, such as leading ERP software and Hyper-V.

Do You Like Coke or Pepsi?

NetApp and EMC are almost at feature parity these days, and the gaps are closing between them. If you’re considering a storage purchase, a virtualization project, or a greenfield datacenter, these solutions are worth taking a look at. Interoperability testing takes time. Purchasing all these components separately requires a lot of integration work, and extra dollars. Calling for support can involve questions around where the issue lies: VMware, network or storage. The advantage is that all three vendors in either solution work together because it’s tested by each of them.

Of course, you’ll also want to save money. That’s what convergence is all about. If you want to take a lot of the integration headache and finger pointing out of your solution then these may just be for you.

You’ll be left with one final question, the Coke or Pepsi one: would you like EMC or NetApp? Perhaps I could ask if you’d like fries with that.

Cloud #fail

2011-08-09T23:16:00.001-05:00

This is a very brief post on the cloud computing failure of today. I hope to have a guest writer post something better, more lengthy in the future.

I’ve been partaking in some discussions among peers on today’s Amazon EC2 cloud outage – again. I’ve been listening to people say the cloud isn’t ready or is a bad idea. The cloud is the cloud, and continues to be a great decision for a lot of people where it makes sense. The failure people make is in abandoning IT best practices when going to the cloud and going with a single system or provider.

When we design for disaster recovery or business continuity, we usually design in redundant, diverse data paths to the secondary data center with carrier diversity (meaning more than one carrier). When going to the cloud, if you’ve decided to outsource everything, you should continue that diversity with multiple cloud providers and the resiliency to be able to use either. Failure to provide cloud diversity is the same as having one datacenter: you’ve got all your eggs (IT) in one basket.

When going to the cloud, you should either have a hybrid private/public cloud with redundancy, or two public cloud providers with diversity. Those that stray from IT best practices will pay the price – on Twitter you’ll get the dreaded #fail associated with your name.

Long-Distance vMotion: Part 3

2011-07-12T00:11:00.001-05:00

This is the third part in a multi-part series on the topic of Long-Distance vMotion. Part 1 introduced us to disaster recovery as practiced today and laid the foundation to build upon. Part 2 built out the Long-Distance vMotion architecture with a few different approaches.

There are some limitations and challenges that we must consider when designing LDVM or other workload mobility technologies. If it were too easy, everyone would be doing it. The first area we’ll address is commonly referred to as the traffic trombone.

Traffic Trombone

Understanding what I mean by a traffic trombone requires a bit of visualization. When we have one site, everything is local. My storage is connected within my datacenter, my other subnets are routed by a local router. The path to these devices is very short, measured in meters, so latency is very small. As we migrate VMs to another datacenter, the VMs have moved, but the network traffic and storage traffic continue to go back to their original router and storage controllers, if we don’t add a little extra prevention. When we send a packet or a read/write, it goes back to the original datacenter, gets serviced, then returns back to our new datacenter where we’re now running. That backing and forthing is what we refer to as tromboning, hence the traffic trombone. (My PowerPoint presentation drives this home.) I’ll address this in two parts: network and storage.

When addressing the IP network, the first thing I’ll say about the network trombone is this is a desirable effect for existing network stateful connections (TCP). We want those connections to stay alive without disconnecting. For all new connections, we’d like to optimize the path to go through the local site. When we optimize the path, I’ll further break this down between coming to and leaving my subnet. For the remote subnets, Cisco GSS and ACE play a role with vCenter awareness to point to the correct site where this service is running. GSS points to the external ACE vIP where the workload currently lives. For leaving our subnet, we use HSRP default gateway localization forwarding traffic to the local Cisco ACE device for processing. This helps preserve symmetrical routing so our firewalls don’t drop our packets thinking something has gone wrong.

An alternative emerging technology is the Locator/ID Separation Protocol (LISP). This protocol runs on both the outside of our subnet pointing to the correct site, as well as within pointing to the correct default gateway out. Think of LISP in terms of cell phones. We used to have phone numbers that were tied to a specific cell-phone provider. When we switched carriers, we needed to get a new phone number. Phone number portability untied our phone numbers from a particular provider; the number now goes with us and points to the new company we switched to. LISP does a very similar thing for IP addresses: it lets us take an address with us to a new site and points to where we now live. LISP is available on certain blades in the Nexus 7000, and is also being ported to other products.

When addressing the storage area network, some products are already tackling this problem. EMC VPLEX and NetApp FlexCache each open their volumes locally without MPIO drivers extending across the datacenters, eliminating any traffic trombones. When dealing with IBM SVC and NetApp MetroCluster varieties, since they are split-controller designs the MPIO paths will have active paths to one datacenter and passive paths to the other. When the VMs move to the other site and back, one of those two sites will trombone traffic back to the primary controller for that volume. This will add latency into the IOs. In the case of SVC, we can only go campus today anyway (< 10 km) so the distance and latency is pretty short. In the case of NetApp, we need to stay with synchronous distances, but my MetroCluster customers haven’t had adverse impact to their IO. Of course, your mileage will vary depending on how much you stress your storage.
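To put a rough number on the storage trombone, here is a minimal sketch that estimates the extra round trip added to each IO that has to cross to the other site and back. It assumes light moves through fiber at roughly 200,000 km/s (about 5 microseconds per kilometer each way) and counts propagation delay only; the function name and the constant are my own, not anything from a vendor tool, and real links add switch and controller overhead on top.

```python
# Assumption: roughly 5 microseconds per kilometer of fiber, one way.
FIBER_US_PER_KM = 5.0


def trombone_penalty_ms(circuit_km: float) -> float:
    """Extra round-trip latency (ms) per IO that crosses to the other
    site and back, counting fiber propagation delay only."""
    return 2 * circuit_km * FIBER_US_PER_KM / 1000.0


if __name__ == "__main__":
    # Campus (SVC) and synchronous (MetroCluster) distances from the text.
    for km in (10, 100):
        print(f"{km} km between sites: +{trombone_penalty_ms(km):.1f} ms per IO")
```

At campus distances the penalty is a small fraction of a millisecond, which fits the observation that it is hard to notice; at the full synchronous distance it grows tenfold, so how much it hurts depends on how latency-sensitive the workload is.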

It is always important to have competent network, storage and virtualization architects take all latency, routing and cluster impacts into account to have a successful implementation. Excessive network or storage traffic needs to be understood, and QoS should always be applied. A vSphere architect can help design how the cluster will be laid out and take DRS and fault-tolerant VMs and their associated traffic into account.

Future Directions and My Wish List

While EMC VPLEX and NetApp FlexCache have node redundancy at each site, IBM SVC and NetApp MetroCluster do not. This has scared one or two customers away from these solutions. It also can make code upgrades require a bit more planning. I’d like to see NetApp and IBM come up with solutions with node redundancy at each site. NetApp’s evolving cluster mode for block storage (FC/FCoE, iSCSI) may provide some of this. IBM has other technologies in the DS8000 base, such as Open-Systems Hyperswap, that could possibly hold promise here.

When it comes to the storage trombone, both IBM and NetApp need to eliminate all storage trombones and IBM needs to go beyond 10 km. This could possibly be with more intelligent multipath drivers that plug into vCenter and automatically swap active/passive paths as the VMs move.

VMware vSphere itself needs some work to go beyond its current limit of 5 ms RTT, or 400 km under perfect conditions. This is one of the biggest limitations of the technology today. At the recent EMC World, EMC announced VPLEX Geo for regional protection, sending Microsoft Hyper-V VMs about 3000 km. EMC has a product roadmap with VPLEX to go around the world (NetApp FlexCache already does this) with VPLEX Global.

Conclusions

While I suddenly have a number of customers investigating and deploying Long-Distance vMotion, I do understand it is not an inexpensive solution. First we need a Metro Area Network (MAN) capable of doing 622 Mb/s or greater speed with under 5 ms round trip latency. We’ll need to transport the storage traffic as well. This doesn’t come cheap, but depending on your options in your metro, the prices are more attainable every year.

I currently favor EMC VPLEX Metro and NetApp MetroCluster because both are validated, tested and referenced in KB articles by VMware (see below). I also have customers deploying these solutions. IBM just doesn’t go far enough, which is too bad, since I have a large SVC install base wanting this technology, but the EMC solution can front end an IBM SVC. I’ve been hard-pressed to find many NetApp FlexCache solutions outside of Hollywood. The technology has promise, but again is all NFS.

A lot of these same principles also apply to IBM PowerVM Live Partition Mobility, Microsoft Hyper-V Live Migration and Oracle VM for x86 Live Migration. Each hypervisor will come with its own limitations and peculiarities, so make sure you fully understand them before deploying them. Most of these solutions can be deployed within the same infrastructure as LDVM.

This technology seems to have caught some of the storage vendors off guard. When I first presented this in March of this year, I had spent six months preparing for it. I already had customers ready to deploy. Since March, I’ve run into around five more locally, and more nationally looking at these solutions. As prices in hardware and bandwidth fall, it will become as common as storage mirroring has become in the last decade.

References

Cisco/NetApp Whitepaper: Workload Mobility Across Data Centers
http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9402/white_paper_c11-591960.pdf
Cisco Overlay Transport Virtualization (OTV)
http://www.cisco.com/en/US/prod/switches/ps9441/nexus7000_promo.html
Enabling Long Distance Live Migration with F5 and VMware vMotion
http://www.f5.com/pdf/white-papers/cloud-vmotion-f5-wp.pdf
EMC VPLEX Architecture and Deployment: Enabling the Journey to the Private Cloud
http://www.emc.com/collateral/hardware/technical-documentation/h7113-vplex-architecture-deployment.pdf
EMC VPLEX 5.0 Architecture Guide
http://www.emc.com/collateral/hardware/white-papers/h8232-vplex-architecture-wp.pdf
F5 VMware vSphere Solutions
http://www.f5.com/solutions/applications/vmware/vsphere/
IBM Redbook: Implementing the IBM System Storage SAN Volume Controller V6.1 SG24-7933
http://www.redbooks.ibm.com/redpieces/abstracts/sg247933.html?Open
IBM: Split cluster configuration
http://publib.boulder.ibm.com/infocenter/svcic/v3r1m0/index.jsp?topic=/com.ibm.storage.svc.console.doc/svc_hasplitclusters_4ru96h.html
Implementation and Planning Best Practices for EMC® VPLEX™ Technical Notes
http://www.emc.com/collateral/hardware/technical-documentation/h7139-implementation-planning-vplex-tn.pdf
NetApp: A Continuous-Availability Solution for VMware vSphere and NetApp TR-3788
http://www.netapp.com/us/library/technical-reports/tr-3788.html
NetApp: MetroCluster Compatibility Matrix Dec 01, 2010
http://now.netapp.com/NOW/products/interoperability/MetroCluster_Compatibility_Matrix.pdf
VMware KB: Using VPLEX Metro with VMware HA
Mar 7, 2011 KB Article: 1026692
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1026692
VMware KB: vMotion over Distance support with EMC VPLEX Metro
Jan 21, 2011 KB Article: 1021215
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1021215
VMware KB: VMware support with NetApp MetroCluster
Jan 21, 2011 KB Article: 1001783
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1001783
VMworld 2009 TA3105 – Long Distance VMotion
ftp://ftp.documentum.com/vmwarechampion/Events/VMworld/VMworld_2009/US/VMworld_Presentations_Final/TA3105/TA3105.pdf

Long-Distance vMotion: Part 2

2011-06-05T20:12:00.001-05:00

This is a rapidly changing field and there have been new updates. Please see Long-Distance vMotion: Updates for the latest changes.

This is the second part in a multi-part series on the topic of Long-Distance vMotion. Part 1 introduced us to disaster recovery as practiced today and laid the foundation to build upon.

When building out Long Distance vMotion (LDVM), we still need to focus on the same components we focused upon building out disaster recovery. We will take the leap from our recovery time taking 5 minutes to a continuous non-disruptive operation. We’ll need to change our network from two different subnets in two different datacenters, to one stretched subnet. We’ll need to take our mirrored storage and create what I call a single shared storage image. Last, we’ll get rid of Site Recovery Manager (SRM) and replace it with a VMware vSphere split cluster.

There will be some rules we’ll need to adhere to – limits – to make this all work. For the Hypervisor, we’ll use vSphere 4 or later. VMware vSphere requires a minimum bandwidth of 622 Mbps (OC12), and additionally asks us to maintain a 5 ms round-trip time (RTT) latency. Our storage volumes or LUNs will need to be able to be open for read-write access in both datacenters simultaneously, using FC, iSCSI or NFS. We’ll also need to have enough spare bandwidth to achieve continuous mirroring with low latency.
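The two link rules above are easy to turn into a quick sanity check before going any further. This is only a sketch: the `Link` class and `ldvm_link_problems` function are hypothetical names of mine, and the bandwidth and RTT figures are numbers you would measure on your own circuit; only the 622 Mbps and 5 ms thresholds come from the requirements just described.

```python
from dataclasses import dataclass
from typing import List

MIN_BANDWIDTH_MBPS = 622.0  # OC12 minimum, from the text
MAX_RTT_MS = 5.0            # round-trip latency ceiling, from the text


@dataclass
class Link:
    """Measured characteristics of the inter-site link (hypothetical type)."""
    bandwidth_mbps: float
    rtt_ms: float


def ldvm_link_problems(link: Link) -> List[str]:
    """Return human-readable reasons the link misses the stated limits.
    An empty list means both limits are met."""
    problems = []
    if link.bandwidth_mbps < MIN_BANDWIDTH_MBPS:
        problems.append(
            f"bandwidth {link.bandwidth_mbps} Mbps is below {MIN_BANDWIDTH_MBPS} Mbps"
        )
    if link.rtt_ms > MAX_RTT_MS:
        problems.append(f"RTT {link.rtt_ms} ms exceeds {MAX_RTT_MS} ms")
    return problems


if __name__ == "__main__":
    print(ldvm_link_problems(Link(bandwidth_mbps=1000, rtt_ms=3.2)))
    print(ldvm_link_problems(Link(bandwidth_mbps=155, rtt_ms=8.0)))
```

Passing this check says nothing about whether there is enough spare bandwidth left over for continuous mirroring; that still has to be sized separately.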

First we’ll address the network.

Building The Stretched Subnet

When Cisco came out with the Nexus 7000, one of the most compelling features they introduced was Overlay Transport Virtualization (OTV). With OTV, we can easily extend a subnet between two datacenters while achieving spanning tree isolation, network loop protection and broadcast storm avoidance. There are other technologies that can also stretch a subnet between datacenters, but with OTV there aren’t many complex tunnels to setup and maintain.

Cisco’s OTV routes layer 2 over layer 3 without having to change your network design. IP is encapsulated over the link. You can have unlimited distance, no bandwidth requirements and you can dynamically add additional sites without having to change the network design of the existing ones. Without going too deep into the technical details of the technology, it makes stretching the subnet between the two sites not take a rocket scientist to achieve, and its simplicity makes ongoing administration simpler to maintain.

Today, OTV is only available on the Nexus 7000 and has to run in its own Virtual Device Context (VDC), but Cisco plans to release it on other platforms in the future.

What About Storage vMotion?

Once you have the subnet stretched between the two sites, you can use traditional vSphere vMotion coupled with storage vMotion. We can go the maximum distance: 5 ms RTT is a vSphere limit, which equates to roughly 400 km (250 mi) circuit distance. With traditional storage vMotion, we can go the full 400 km distance and go between dissimilar storage hardware at each site. We can use any protocol vSphere supports: FC/FCoE, iSCSI and NFS.
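For anyone wondering how 5 ms turns into roughly 400 km, the arithmetic can be sketched as follows. It assumes about 5 microseconds per kilometer of fiber in each direction, and the 1 ms allowance for equipment along the path is purely my illustrative assumption, not a VMware figure; the function name is hypothetical.

```python
# Assumption: roughly 5 microseconds per kilometer of fiber, one way.
FIBER_US_PER_KM = 5.0


def max_circuit_km(rtt_budget_ms: float, equipment_overhead_ms: float = 0.0) -> float:
    """Longest circuit distance (km) whose round-trip propagation delay
    fits in the RTT budget after subtracting a fixed equipment overhead."""
    usable_us = max(rtt_budget_ms - equipment_overhead_ms, 0.0) * 1000.0
    return usable_us / (2 * FIBER_US_PER_KM)


if __name__ == "__main__":
    print(max_circuit_km(5.0))        # propagation only
    print(max_circuit_km(5.0, 1.0))   # with an assumed 1 ms of overhead
```

Propagation alone would allow about 500 km, so a 400 km figure leaves about a millisecond for everything else in the path. Note this is circuit distance, which is usually longer than the straight-line distance between the two buildings.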

There are some limitations.

First, instead of copying just the memory, we’re now also copying all the storage drives. Each VM could take over 15 minutes to move just 100 GB of storage alone. Multiply that by how many VMs you plan to move. If your link is being used for anything else like replication or Voice over IP (VoIP), you’ll get less than the whole link. The biggest limitation: you cannot migrate Raw Device Mappings (RDMs), a common type of vSphere volume for databases, Microsoft Exchange and large (over 2 TB) storage volumes.
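The copy-time estimate is simple enough to rework for your own link. The sketch below assumes decimal gigabytes, ignores protocol overhead, and uses a hypothetical `usable_fraction` parameter to model sharing the link with replication or VoIP; none of these names come from vSphere.

```python
def transfer_minutes(size_gb: float, link_mbps: float, usable_fraction: float = 1.0) -> float:
    """Minutes to copy size_gb (decimal GB) over a link of link_mbps,
    when only usable_fraction of the link is available to the copy."""
    megabits = size_gb * 8 * 1000           # GB -> megabits
    seconds = megabits / (link_mbps * usable_fraction)
    return seconds / 60.0


if __name__ == "__main__":
    # 100 GB over the 622 Mbps minimum, with the whole link and with half of it.
    print(f"{transfer_minutes(100, 622):.1f} minutes")
    print(f"{transfer_minutes(100, 622, usable_fraction=0.5):.1f} minutes")
```

Even with the whole OC12 to itself, a 100 GB VM comes out at a little over 21 minutes, and the time scales linearly with both the amount of storage and the share of the link you actually get.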

While many of these limitations can be accepted, it's not an optimal solution. Next we’ll look at storage solutions to solve the problem in vendor alphabetical order.

Shared Storage Image with EMC VPLEX Metro

VMware KB: vMotion over Distance support with EMC VPLEX Metro
Jan 21, 2011 KB Article: 1021215

EMC was one of the first to show us LDVM can be done with their virtualization appliance, the VPLEX Metro. VPLEX Metro allows us to create a shared storage image between two different synchronous-distance locations, up to 100 km (62 mi) apart. VPLEX virtualizes the storage behind it, aggregating it, consolidating it and providing volumes to servers. The Metro version allows the connection of two different sites, providing cache and locking coherency allowing a volume to be opened locally and simultaneously at both locations.

It is a pure Fibre Channel (FC) solution with full fault-tolerant node redundancy at each site. You will need to add FC extension, such as dark fiber: Long-Wave, CWDM, DWDM or FCIP. You only need 2 FC extension links (ISLs) to make the solution work, the smallest number in all the designs and often the most expensive part of the solution. You can use all the FC best-practice designs, such as transit VSANs with Inter-VSAN Routing (IVR) or LSANs for fault domain isolation. You can utilize VMware HA Clusters and DRS.

A third site can be utilized as well with a witness agent running in a VM or server to provide split-brain protection. Perhaps the most important feature that EMC has going is it doesn’t suffer from the storage traffic trombone, as many of the other solutions do, since the volumes are opened locally at each site. We’ll talk more about the traffic trombones and how to overcome them in Long-Distance vMotion: Part 3.

We are deploying this solution with customers today.

Shared Storage Image with IBM SVC

Implementing the IBM System Storage SAN Volume Controller V6.1
November 2010 SG24-7933

While I am not a fan of this solution, I have to include it because it can be done, with some serious limitations. The first issue with IBM’s SVC for LDVM is you are limited to 10 km (6.2 mi) between sites, hardly enough distance, providing only cross-campus, not cross-metro, protection. The SVC takes two nodes of an IO group and splits them between two different sites. IBM invented the storage virtualization category: the SVC virtualizes the storage behind it, aggregating it, consolidating it and providing volumes to servers. Splitting the two nodes (called a split-node SVC cluster) means it still keeps each node’s cache up to date in case of failure, but you’ll only be using one node at a time. Regardless of where your workload is running, that could be the node at the remote site. (Although at 10 km it would be hard to notice.) You also no longer have fully fault-tolerant redundant nodes at each site.

It can be FC or iSCSI on the front end to the servers, but only FC on the back end. You’ll need FC extension on the front-end, OTV will work for iSCSI, and at least for the front-end you can use all the best practice designs in regards to SANs. On the back-end however, you’re stuck. There can be no ISLs between sites and you’ll need traditional Long Wave FC SFPs to the storage, which is one reason it’s limited to 10 km. The other is beyond 10 km, the cache updates start to bog the nodes down. You are also required to have a third site, with storage there that’s capable of having an enhanced quorum disk for split-brain protection, again with no ISLs, LW SFPs and less than 10 km between all three sites. You will need 2 dark fiber connections for the front end and a minimum of 6 dark fiber connections for the back end. For each additional piece of storage per site in that IO group add an additional 2 pairs of dark fiber per system.

Unlike the EMC VPLEX solution, traffic will trombone to the active node at some point, either at the source or when moved to the destination, due to MPIO (multipath input/output) active/passive drivers sending all traffic to the preferred node.

While I wouldn’t recommend this solution, I do know of at least one customer using it, even with all the limitations mentioned. If fiber is free to you across campus, this might be for you.

Shared Storage with NetApp MetroCluster or V-Series MetroCluster

A Continuous-Availability Solution for VMware vSphere and NetApp
June 2010 TR-3788

NetApp also allows you to take two redundant nodes of a storage system and split them across two different sites with two flavors: a MetroCluster or a V-Series MetroCluster. This solution, like the EMC VPLEX Metro, allows extension up to a synchronous distance of 100 km (62 mi). You are splitting the cluster apart, so like the IBM solution you don’t have node redundancy at each site. The NetApp MetroCluster provides shared storage with its own disk trays, while the V-Series can virtualize others’ storage similar to EMC and IBM.

NetApp offers the most protocol flexibility with FC/FCoE, NFS or iSCSI on the front end, which can use all the FC best-practice designs, such as transit VSANs with IVR or LSANs for fault domain isolation. On the native non-V-Series solution, the back end requires FC today and you have to use the Brocade switches which are included with and hard coded into the solution. You will use 4 FC connections between sites on this solution. You can utilize VMware HA Clusters and DRS.

There is no automatic split-brain protection: in the case of total network failure (if you skimp on redundant routing diversity) you will have to enter commands at the site you want to survive to bring those volumes back online. Any single component failure, however, is handled completely automatically. Like the IBM SVC, traffic will trombone to the active node at some point, either at the source or when moved to the destination, due to using MPIO ALUA (asymmetric logical unit access) drivers.

We have designed and deployed NetApp MetroClusters at quite a few customers; people are quite happy with them and they perform well.

Shared Storage with NetApp FlexCache

Whitepaper: Workload Mobility Across Data Centers

There is one last solution, also by NetApp, that is a little different from the rest: FlexCache. FlexCache is a pure NFS caching solution. It allows distances across continents, but due to vSphere limitations, we can only go 400 km (250 mi) with this solution. With NetApp FlexCache, we can have redundant controllers at each site giving us a fault-tolerant design. More than two sites are also supported with FlexCache.

FlexCache has a primary/secondary relationship with volumes. All writes pass through a secondary back to their primary volume. If the primary site is down, writes will cache up at the secondary. With a split-brain you can run at the secondary sites. Like the EMC VPLEX there is no traffic trombone, volumes are open locally without having to cross to the second site. The controllers handle all the coordination, locking and coherency.

I find this an intriguing design, but I still don’t think NFS is the protocol that fits all needs. People are deploying NFS more with vSphere and I’ve seen Oracle systems running SAP over NFS with tremendous IO throughput, but not every application supports NFS. It might be good for vSphere alone, but these solutions often need to support more than just vSphere. I think this could mature nicely as NetApp merges the cluster and traditional block (7-mode) versions of Data ONTAP going forward.

VMware vSphere Split Clusters

VMware currently restricts us to a minimum of 622 Mbps bandwidth and a maximum of 5 ms RTT latency. Hopefully later versions will relax these limitations enough to go further.

VMware also doesn’t provide the intelligence for designating different sites in a split-cluster configuration, so care has to go into how nodes are deployed. If you decide to implement load balancing or fault-tolerance, additional considerations need to be taken to provide adequate bandwidth between sites, and ensure that workloads don’t spend a lot of time going between sites unnecessarily.

These considerations can be accomplished with proper planning with a vSphere architect.

Geographic Disaster Recovery

Building out Long-Distance vMotion we need to stay today within synchronous distances of 100 km (or 400 km with FlexCache), but what about geographic protection beyond that for disaster recovery?

We can add back our familiar WAN (or OTV over the WAN), traditional asynchronous storage mirroring and SRM between the LDVM datacenters and our remote disaster recovery site. The LDVM datacenters will appear as a single site with the DR site as its target. Now we have non-disruptive business continuity locally, with disaster recovery geographically.

Part 3 will look at the technical problems you still need to overcome once the architecture is built out, where I’d like to see vendors go in the future and my own conclusions.

For updates and addendums to this post, please see Long-Distance vMotion: Updates.

Long-Distance vMotion: Part 1

2011-05-03T00:43:00.001-05:00

This is the first part in a multi-part series on the topic of Long-Distance vMotion. I am currently architecting and building this out for a few of my customers.

Long-Distance vMotion (LDVM) is the holy grail of business continuity – the ability to migrate workloads across data centers or in and out of clouds with no disruption of service and zero downtime. When I started consulting in the 1990s, after my several years as a software developer, I was a high availability clustering consultant, among other things. Later I architected geographic clusters, but one thing was certain, it was very expensive, complex in architecture and difficult to manage.

Long-Distance vMotion attempts to tackle one issue, that of business continuity. Let’s face it, disasters are rare. I know there are earthquakes, tornados, hurricanes, floods and other bad things that happen. In my many years of consulting, these have rarely happened to my customers. I have two customers that have had their storage ruined, each by their own fire suppression system failing and pouring water onto their equipment. These disasters, although rare, do happen. They must be planned for. It’s risk mitigation; a business decision that doesn’t come for free.

What is much more common, what happens all the time, is maintenance. More and more frequently, that maintenance needs to occur without downtime, or at the very least with a minimum of downtime. The most common disasters my customers have experienced in the last decade are brownouts and blackouts. These are the more common problems we face as data centers and power grids are stretched to capacity.

One of our challenges in building for disaster recovery is some kind of geographic protection. Equipment needs to be far enough away to avoid one disaster also affecting our recovery site. Flying in the face of this is a latency issue. To achieve business continuity, we need to have all transactions mirrored with guaranteed delivery. So in addition, we need low storage latency to achieve this, synchronous distances (less than 100 km or 62 mi).

As we start to strive for business continuity, or absolute zero downtime, we can proactively move workloads for impending disaster, also known as disaster avoidance. With that storm coming in we can have our workload moved before the power goes out or the pipes go down. We can also proactively move workloads for planned maintenance. When we get good and comfortable with the technology, we can start to migrate workloads between datacenters to balance workloads: for servers, for storage and for bandwidth.

We would like to have the best of both worlds: business continuity and disaster recovery. Business continuity asks for short distances today, like across the metro. Disaster recovery screams for longer geographic distances for protection. We can combine the two of them.

Before I tackle the topic of LDVM, I think it’s important to understand how we got here. So as I’m prone to do, first a little history ….

The Evolution of Recovery Point and Recovery Time Objectives

When I entered the industry in the 1980s, we protected our data with tape copies. These took the form of copy to tape or dump to tape. Commands varied by operating system but the result was the same: a tape copy or tape archive of our data sets. Our recovery points were daily for our critical data sets and weekly for less critical, less changing data, such as the application code and operating system itself. Our recovery time also went from days to weeks. It wasn’t a perfect world, but the concept of a service level agreement was more of a target. A computer that went down didn’t kill a business, but slowed it down for a time. We reverted to more manual processes.

In the 1990s backup applications entered the scene. These took the form of IBM’s TSM (ADSM in those days), Legato Networker and others. The main strength was in the automation. Now backups were automated scheduled jobs, either over a network or to a directly attached tape drive, tape library or silo. Because of the automation, our recovery points became daily and our recovery time objectives shrank to mere days.

The last decade brought us to the era of snapshots and mirrors. BCVs, FlashCopies and Snapshots created very fast hot backups. Mirroring extended data protection to the recovery site without the need to lug tapes across country. We now could restore from a mere number of seconds to a handful of hours. We were protected to the recovery site and could be back up and running anywhere from 5 minutes or less to a few hours. Life was certainly improving.

This decade we enter the cloud and grid era. We want our recovery points to be continuous: we should be able to roll back or forward to any point in time like the DVR at home. We want recovery time to be zero, a non-disruptive recovery. While some of this is available today, it’s not quite there yet (non-disruptive recovery?). However, I’ll also point out, we’re just into this decade.

To address and achieve business continuity, we look at the same familiar components we address when we deal with virtualization: namely servers, networks and storage.

Virtualization

I’ve already covered virtualization previously; it’s an over-used word, much like cloud computing in today’s marketing brochures. But virtualization is nothing new.

Everyone’s heard of server virtualization. Blade technology made an initial splash at being the datacenter’s savior by offering a smaller footprint, centralized administration and some limited physical resource abstraction. Server virtualization really made a splash with hypervisor technology like IBM’s PowerVM (for UNIX), VMware’s vSphere ESX, Microsoft’s Hyper-V and Oracle’s VM for x86. It’s been proven over and over again to provide real physical server consolidation (hundreds of servers to a handful), as well as a serious enabling technology for disaster recovery and business continuity (mirroring and fault-tolerant VMs). Of course this is old hat on the mainframe, but that’s not the topic of this post.

Network virtualization is something we don’t think too much about; we take it for granted. But do you recall the period before VLANs, when networks were physically separate? How about VPNs? DNS is a way of virtualizing IP addresses, as is NAT. Interfaces are virtualized with link aggregation like 802.3ad and TRILL. Even the switches themselves are virtualized with a vSwitch, IBM’s IVE/HEA, Cisco’s Palo and Nexus 7000 Virtual Device Contexts (VDCs), where the switch is split up and virtualized just like a server running ESX is today.

Storage virtualization as well has evolved throughout the years. There was a time where a file could not be larger than a physical disk, which came in MBs and not many of them. Then RAID and LUNs allowed us to span the disks in the RAID set. Logical Volume Managers allowed us to create even greater amounts of storage aggregated together. SANs were virtualized with VSANs similar to VLANs, as well as NPIV (N-port ID Virtualization). Storage subsystems were virtualized with IBM’s SVC, NetApp’s V-series, EMC’s VPLEX and HDS’ USP-V. Secondary storage was virtualized between tape cartridges with VTS, or a whole tape library with VTLs.

Needless to say virtualization has been around a long time and continues to evolve and permeate throughout the data center.

As I said above, when we virtualize we look at servers, networks and storage. Servers contain the CPU and memory components, the network connects them to the outside world and the storage is shared between the nodes in a grid (ESX cluster, PowerVM servers). When we virtualize a server, we take that physical server and create virtual machines (or LPARs), which contain virtualized CPU, memory, network interfaces and storage interfaces. When we do vMotion, we copy the memory of the VM from one physical server to another, suspend it on the source and resume it on the target – all transparently and non-disruptively to that server’s clients. The same can be said of moving LPARs with Live Partition Mobility.

This is our basic building block we’ll use.

Disaster Recovery Today

So taking that basic building block, we build out the architecture of disaster recovery used today. We can connect two sites together with a wide-area network (WAN) or metro-area network (MAN). We can mirror the storage between those two sites once they’re connected. Add in Site Recovery Manager (SRM) and we are now ready for disaster recovery. Sure there may be some other pieces; places where NAT and DDNS and load balancers play, but these are the guts of it. This basic architecture is the playbook most of us use. This is our starting point, the architecture we’ll extend in the next part of this article.

Part 2 will build out the Long-Distance vMotion architecture with a few different approaches.

The Empire Strikes Back

2011-04-06T20:24:00.001-05:00

I was getting ready to write EMC off, at least in the mid-tier. The Clariion was old-tech, and an old way of doing things. They screamed unified, but it didn’t feel that way. Celerra in the NS/NX felt like a bolt-on. They were expensive, fragmented and difficult to work with.

EMC had been making a number of good buys over the past couple of years. RSA, VMware, Kashya and Data Domain come to mind. Avamar and YottaYotta were lesser-known pieces. When it came to primary storage, however, it seemed stale. Then they started showing the cards they were holding.

First came VMAX. They refreshed the Symmetrix line with a modular, scalable architecture. It could grow from something small to something big. But the real changes started coming with FAST and FAST VP.

You see, we were still doing things old school. While competitors like IBM had SVC, a way of aggregating performance across an entire storage platform, let alone multiple storage platforms, EMC remained a rock. While NetApp copied IBM’s SVC with aggregates, EMC remained a rock. While Compellent showed the world what tiering could be with automatic movement, EMC was a rock.

EMC remained quiet. They bought good companies. They made strategic acquisitions. I was partially impressed, but mostly with their M&A. Then came FAST.

Fully Automated Storage Tiering

OK, the name is kind of boring. The marketing name, FAST, has a bit more to it. FAST brought us LUN mobility between storage tiers. It was a nice try, but it was 1.0. Not long after FAST came FAST VP (FAST 2.0 to some), and the paradigm shifted.

With FAST VP, we got an array of technologies. We were given sub-LUN tiering (what we really wanted) and virtual provisioning. FAST VP brings EMC from a legacy Web 1.0 feel into the modern Web 2.0 world. Storage with EMC became easy.

Enter the VNX and VNXe

So instead of coming out with the CX5, NX5, NS5, AX5, we get the VNX and VNXe. It is a break in small and large ways. Although one can see hardware similarities between the two generations, there are stark differences. I’ll try and point these out.

First, the hardware looks familiar. We have SPs and X-Blades. In the VNX line we have similar componentry as CX/NX/NS. The backend has been updated from FC-AL to SAS, as all storage companies are doing. It’s fully SAS v2.0, so no worrying about legacy SAS gear. We have different cards we can stick in the SP or X-blade unit, including FCoE in the SP, but I still need a separate 10 GbE card in the X-blade for NAS (VNXe is truly unified in hardware for block/NAS, but FC/FCoE doesn’t exist in this line). While this may have felt like a significant limitation in the NX/NS4 space, it feels better this way in VNX.

Why? Because with FAST VP, we have virtual storage pools. Block and NAS draw from the same auto-tiering space. Since it’s one pool of storage, the separate hardware doesn’t feel separate. (I do need more switch ports in a Nexus, however.) FAST VP allows SSD, SAS and NL-SAS (i.e., SSD, FC and SATA in the previous technology/terminology) to share the same pool – it uses policies to tune itself.
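To make the idea of policy-driven tiering concrete, here is a rough Python sketch. It is not EMC’s FAST VP algorithm, just the general shape of sub-LUN tiering: rank extents by how busy they are and fill the fastest tier first. The tier names come from the paragraph above; the capacities, the extent IDs and the `place_extents` function are all made up for illustration.

```python
from typing import Dict, List, Tuple

# Tiers ordered fastest to slowest; capacities (in extents) are made-up numbers.
TIERS: List[Tuple[str, int]] = [("SSD", 2), ("SAS", 3), ("NL-SAS", 100)]


def place_extents(io_counts: Dict[str, int]) -> Dict[str, str]:
    """Map each extent to a tier so the hottest extents land on the fastest tier."""
    # Busiest extents first; ties are broken by name so the result is deterministic.
    ranked = sorted(io_counts, key=lambda ext: (-io_counts[ext], ext))
    placement: Dict[str, str] = {}
    index = 0
    for tier_name, capacity in TIERS:
        for ext in ranked[index:index + capacity]:
            placement[ext] = tier_name
        index += capacity
    return placement


if __name__ == "__main__":
    # Hypothetical I/O counts per extent over some sampling window.
    heat = {"e1": 900, "e2": 15, "e3": 450, "e4": 0, "e5": 120, "e6": 3}
    for ext, tier in sorted(place_extents(heat).items()):
        print(ext, "->", tier)
```

A real array would also weigh how recent the I/O was and throttle the data movement, but the principle is the same: the pool decides where blocks live, not the administrator.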

In addition to FAST VP, there’s FAST Cache. If you’re familiar with NetApp’s FlashCache (i.e., PAM II), it works similarly. You can take SSD (instead of a PCI card) and use it as an extension of cache. You’re not provisioning on these SSD drives, but instead using them to read from as a cache and write to as a cache, an extension of the existing cache. I kind of make an analogy to a level 3 cache in the compute world. You can combine FAST Cache with FAST VP – the SSD in the virtual storage pool is provisioned along with the SAS and NL-SAS, while the FAST Cache holds only transient data. If you don’t quite feel ready to trust your storage on SSD, then FAST Cache is for you. Nothing is permanently stored there.
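If it helps to picture SSD as an extension of cache, here is a minimal Python sketch of the read side only. It uses a plain least-recently-used policy, which is my assumption and not necessarily what FAST Cache or FlashCache do internally; the class name and block numbers are hypothetical.

```python
from collections import OrderedDict
from typing import Dict


class FlashCacheSketch:
    """Tiny LRU read cache sitting in front of a slower backing store."""

    def __init__(self, backing: Dict[int, bytes], capacity: int) -> None:
        self.backing = backing          # stands in for the SAS / NL-SAS disks
        self.capacity = capacity        # number of blocks the SSD cache can hold
        self.cache: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block: int) -> bytes:
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)   # mark as most recently used
            return self.cache[block]
        self.misses += 1
        data = self.backing[block]          # slow path: go to disk
        self.cache[block] = data            # promote a copy into the cache
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data


if __name__ == "__main__":
    disks = {0: b"a", 1: b"b", 2: b"c"}
    cache = FlashCacheSketch(disks, capacity=2)
    for block in (0, 1, 0, 2, 1):
        cache.read(block)
    print("hits:", cache.hits, "misses:", cache.misses)
```

Notice that the cache only ever holds copies. Throw it away and the data is still on the disks behind it, which is the point of the last sentence above.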

From a management perspective, gone is the ugly, hard-to-use Navisphere. It’s replaced with an Adobe Flash-based GUI called Unisphere that performs unified configuration with wizards ready to deploy all the best practices for Exchange, SQL, ESX, Hyper-V, CIFS and NFS, to name a few. It’s incredibly easy to use and competes well with v7000’s XIV-like interface.

EMC had a very a la carte menu of software options and features. Many things were too expensive for customers to use (RecoverPoint comes to mind – the Kashya purchase). They cleaned up the options into suites and packs, much as NetApp did years ago with its packs and bundles. They even make sense. I’m betting this lowering of price will cause further adoption, and more software sales.

First there’s the base software that comes free. This is file-based deduplication and file-based compression, block compression, virtual provisioning, SAN Copy (think migrate LUNs into me) and Unisphere for Web 2.0-based management. You get all protocols for free: NFS/pNFS, CIFS/MPFS, iSCSI and FC/FCoE. They also add Web 2.0 protocols REST and SOAP for object-based storage. If you’re migrating in NAS, they also give a limited term Replicator license to import your file-based data.

FAST Suite comes next, which includes FAST VP, FAST Cache, Unisphere Analyzer and Unisphere QoS Manager. I would never sell one of these without this suite.

There’s the Security and Compliance Suite for Events (AV, quotas, auditing), file-level retention and host-based encryption.

Local Protection Suite adds local snapshots with SnapView and SnapSure. The real bang though comes with a free fully licensed RecoverPoint/SE for continuous data protection. Just add RecoverPoint Appliance (RPA) hardware and you’re set for DVR-like recovery of your applications.

Remote Protection Suite adds Replicator and MirrorView. It again adds RecoverPoint/SE remote replication, just add RPAs.

Finally they add an Application Protection Suite to give Replication Manager, Data Protection Advisor and agents for Exchange, SQL, Sharepoint, Oracle and SAP.

By adding RecoverPoint Appliance hardware (little cost), EMC VNX moves to state-of-the-art snapshot and mirroring recovery. With little cost, they are best of breed. Add in the agents – and there is a Total Protection Pack covering Local, Remote and Application Protection Suites – and you have one hell of a system for recovery. Or, take the Total Efficiency Pack and get it all, each suite listed above, often cheaper than pulling one or two out.

It’s the Price

Adding in new hardware, new software and fixing the software stack is a well-met move on its own, but they didn’t stop there. When IBM came out with the v7000, they showed us a new level of value for the price. EMC not only stopped being expensive, old and kludgy, but became the innovator, new and the best value. They didn’t price VNX just to go head-to-head with NetApp or IBM; they’re taking on the smaller players too. They are easily competitive with the Dells, HPs and other second-tier storage vendors out there.

When I say they strike back, I mean it.

EMC was easily a yawn with previous gen gear. It was a safe bet. It worked. It was supported by everyone. Now, EMC is a real leader, not just in market share but in functionality, in features as well as in price. They are hitting everyone back hard with a solid product, a well thought out stack and it’s priced to win. It also integrates well with other products, such as Atmos for archiving.

Other vendors need to wake up, or be left in their wake. NetApp used to be the innovator, the leader. They were my favorite, then IBM came out with the v7000 – it was feature rich and easy to use (although it still has that IP-based replication hole). That was 4Q10, now it’s 1Q11 and it’s a different world. The decade’s off to an interesting start. EMC has shown us that they were a sleeping giant, and they’ve woken up and shown us a new world. Their tech portfolio is starting to make sense. You can see the years of cross-integration work. What’s next? Well, they have other IP to cross-pollinate into their product lineup. Let’s wait and see.

I’m expecting a lot from EMC this year. They have a lot going on with deduplication and VPLEX for Long-Distance vMotion. I expect great things with Vblock once VNX gets baked into that line. And by the way, I’m not reneging on the Long-Distance vMotion article. The presentation is done, was given and well received. It’s my next article up.

IBM, NetApp, HDS take heed. EMC is back.

I’ve Got The Remote Replication, Single-Storage Image MPIO Blues

2011-01-17T22:43:00.001-06:00

There are not a lot of customers I meet that don’t want some form of replication to a disaster recovery/colocation facility. What used to be financially unreachable has come down over 10-15 years to be affordable for most businesses. Remote replication, coupled with VMware or one of the other hypervisors providing server virtualization has made recovery quick, easy and within budget.

So as I look at some of the new storage systems being released lately, I’m scratching my head. Why would an affordable small to medium business mid-tier storage system provide only FibreChannel-based replication – today?

When I started doing SANs, they were SCSI, SSA and ESCON. None of these were that scalable in terms of connectivity. FibreChannel really opened up SANs to consolidate and centralize storage. And it was necessary. Without FC, we had many storage islands. We had clusters that needed terminators pulled. We would not have the VMware farms of today. It started expensive enough by itself, but iSCSI was a low cost alternative and really brought the price of SANs and FC down to the street prices we see today.

So when I look at the latest crop of storage from vendors (and you know who you are), I am floored when replication is still FC based, expecting dark fiber, DWDM, CWDM or FCIP to extend from one data center to another (and I’ve done all of these). To put it another way, everybody has TCP/IP connectivity between datacenters. Most of these same storage subsystems have iSCSI interfaces. So how hard can it be to send that same replication traffic over TCP/IP? The interface is already there! Let me do the IP QoS engineering. I’m good at it.

So imagine you are the Storage Solution Architect at a large and influential $9B reseller and you walk into a customer and are asked what each vendor has to offer. They want to replicate storage from one site to another and they have a budget under $200K. Guess what, those FC-only replication solutions are knocked out. As much as I love and believe in that solution, as much as roadmaps say they might someday offer … they’re outta there.

So I may be head over heels about some of these new storage solutions. I may really believe in their functionality, their ease of use, their pizzazz, but when the rubber hits the road, they just don’t cut the mustard.

There are places they fit and there are customers that either have the infrastructure or can afford it. For those that don’t, I have the remote replication blues.

Single-Storage Image

Vendors, if you are still reading I have another message for you. Customers crave long-distance VMotion. Don’t wait for Gartner or IDC to tell you on a quadrant or chart. They are screaming for it.

I am currently evaluating hardware reference architectures from several vendors to accomplish this goal. Many fall short in one way or another, but wouldn’t take a lot of effort to fix (at least from my seat).

What I really want is a single-storage image. A stretched storage subsystem between two different geographic locations within metropolitan mirroring distance, that is 200 km circuit distance (although 400 km would be nice). I want a single WWNN between the storage cluster or grid. I’d prefer to have two nodes at location A and two nodes at location B, sharing one WWNN, appearing as one storage subsystem. I want volumes writable on both sides, kind of like EMC vPlex, NetApp MetroCluster or IBM split-node SVC. Each of these has some limitation I run into.

I want to vMotion from A to B. I want to change MPIO active/passive paths to be primary to the local controller at site A, passive to site B and flip them when I vMotion. I want to rely on VMFS to take care of my locking and coherency, which it does already. I want at least 100 km of distance. I want mirroring to be internal and transparent between the nodes of each site. I want Cisco IVR with an isolated fault-domain (transit VSAN) in-between sites. I really don’t have all of this yet.

I can get vMotion working, but there’s still a tromboning of reads or writes (due to the MPIO not being flippable). I don’t get enough node-pairs on each side. I need a quorum tie breaker. I need over 10 km. I need IVR and ISLs. I’m close, but I’m not there yet.

XIV seems close with its grid, but can I split the grid between locations? I think the latency (with XIV only) would kill the performance.

All that being said, the technology is doable with what I’ve got and I am designing it and building it. I may not have everything, the distance isn’t what I want here, I have one node at each location in that solution there, yadda yadda. I can do NFS (but that comes with its own issues). With FC on ESX, I likely need to do this all with ALUA. (I can’t load custom MPIO drivers, there goes Open System Hyper Swap.)

So now you see why I’ve got the Single-Storage Image MPIO blues.

Long Distance vMotion – The Holy Grail

Forget all the clouds. It’s winter here in Minnesota and it gets cloudy for weeks at a time with cold and snow. CIOs may be speaking of clouds, but IT managers, CTOs and Enterprise Architects want long-distance vMotion. Two local sites, one shared subnet, one shared single-storage image and a VMware grid with nodes at each of the two locations. Add SRDF/A, GlobalMirror, SnapMirror and I have 3 site geographic replication.

Give the people what they want: VM mobility.

Everyone I know is asking for it. People are scratching their heads. Until they get it, they will all have the Remote Replication, Single-Storage Image MPIO Blues.

The Storage Evolution Part 2: Deduplication

2011-01-10T23:05:00.001-06:00

This is part 2 in the Storage Evolution series.

When we created SANs we stored more and more data retrieved at higher and higher speeds, and it was good. Then we added advanced functionality like creating copies (clones, snapshots, etc.) quickly, and it was good. Then development wanted to have 6 copies of that production database, one for each developer, also test, QA and staging. We needed a daily clone for the data warehouse and business intelligence. We virtualized our servers, we booted from SAN, some of us ditched the desktops and workstations opting for Citrix and VDI. We were making copies of the same data, over and over and over again. And at the 11th hour of the second half of the day, we backed it all up.

And it just got worse.

The Tale of Tape

We had a love/hate relationship with tape. We loved its density and its streaming speed; we hated its bulk, daily off-site management, library management and load and seek time. But it was cheap, and the alternatives were expensive.

VTLs wanted to conquer the world (and disk manufacturers that didn’t make tape solutions marketed them heavily). They came out with expensive, but speedy boxes. For those where money was no object it was a breath of relief; the rest of us looked on with envy. Then VTLs got compression to match what tape drives did, lowering costs. Then, a few years later, they got deduplication and VTLs became an affordable reality.

At first deduplication came as many new technologies do, the universal retrofit, the trick out my VTL option, the appliance. But appliances are a stop-gap measure and they morphed in two ways, one is the dedupe features went straight to the VTL itself. For others, the VTL was built around the appliance. The feature is now in the device making for a single solution.

The Tale of Disk

When NetApp introduced deduplication in the form of Advanced Single-Instance Storage (A-SIS for short), I thought it was cool. I also thought everyone would have it in 2-3 years. I was wrong.

NetApp’s deduplication remains one of the most compelling features of their storage. It gets its biggest bang in vSphere, but works well in other places as well. The reason NetApp remains king of the primary storage deduplication hill today is embedded in its allocation unit, the 4K block. Others have bigger chunks, which don’t deduplicate as well. (It’s harder to match a 1 MB or 1 GB chunk.) This is why, years later, NetApp has one of the only elegant deduplication solutions.
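Here is a small Python sketch of why chunk size matters. The data is contrived on purpose: a 4K block that repeats (think of a common OS file) is interleaved with 4K blocks that are each unique. The `stored_bytes` helper is hypothetical and simply counts what would remain on disk after folding identical fixed-size chunks; it is not any vendor’s implementation.

```python
import hashlib
from typing import Dict

KB = 1024


def stored_bytes(data: bytes, chunk_size: int) -> int:
    """Bytes left on disk after folding identical fixed-size chunks together."""
    unique: Dict[bytes, int] = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        unique[hashlib.sha1(chunk).digest()] = len(chunk)
    return sum(unique.values())


# One shared 4K block, repeated eight times, each followed by a unique 4K block.
common = b"A" * (4 * KB)
data = b"".join(common + bytes([i]) * (4 * KB) for i in range(1, 9))

small = stored_bytes(data, 4 * KB)   # the shared block folds down to one copy
large = stored_bytes(data, 8 * KB)   # every 8K chunk contains something unique

print("logical size:", len(data))
print("stored with 4K chunks:", small)
print("stored with 8K chunks:", large)
```

With 4K chunks the eight copies of the shared block collapse into one. With 8K chunks each chunk straddles a shared block and a unique one, so nothing matches and nothing is saved. Real data is messier than this, but the bigger the chunk, the more often that happens.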

There are some other smaller players trying to jump into this water, as well as an open source effort. But for a major league, market proven solution, they still remain king.

What Does It Buy Me?

Reread my first paragraph. Understand that each VM has Windows 2008, 2003, 2000, Linux or other commonality. Understand all those copies can be folded, either in backups, in variations on versions or in instances. We store the same patterns, binaries, images and data over and over again. We back up night after night, week after week, month after month the same data, with little actual change. For those of us who are storage architects, we may see a 20% change rate on backups a night, but a 4% daily change rate on replication. We don’t need the whole file copied over and over again; there is a lot of waste.

Deduplication aims to change that: reduce that footprint, control the growth, see if we can stretch things farther. We try to do more with less.

How Does It Work?

Most deduplication works by identifying duplicate data and removing it (very simplistic). It fingerprints data using fast hashing algorithms, the same ones used in IPsec, namely MD5 and SHA-1, or other more proprietary methods, to quickly get a hash. That hash is then stored in a database. After this point the implementations diverge.

Most secondary storage (VTLs) will then deduplicate in real time, removing the matches while ingesting the data (backing up). Some vendors take the extra step of doing a bit-level verification, while others deem hash collisions (false matches) too statistically remote to worry over. The duplicate is removed and a pointer to the original occurrence (copy) is inserted in its place.

The primary storage example does the folding offline (a post-process) and bit-level verifies the data before throwing it out.
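The steps above (fingerprint, look up the hash, optionally verify bit for bit, then store a pointer instead of the duplicate) can be sketched in a few lines of Python. This is a toy of my own, not how any particular product does it: it deduplicates inline as data is written, uses SHA-1 over fixed 4K chunks, and keeps the hash database in memory. The `DedupStore` class and the file names are hypothetical.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Fixed-size-chunk deduplication with an in-memory fingerprint index."""

    def __init__(self, chunk_size: int = 4096, verify: bool = True) -> None:
        self.chunk_size = chunk_size
        self.verify = verify                      # bit-level check on a hash match
        self.chunks: Dict[bytes, bytes] = {}      # fingerprint -> stored chunk
        self.files: Dict[str, List[bytes]] = {}   # name -> ordered list of pointers

    def write(self, name: str, data: bytes) -> None:
        pointers: List[bytes] = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            fp = hashlib.sha1(chunk).digest()
            stored = self.chunks.get(fp)
            if stored is None:
                self.chunks[fp] = chunk           # first occurrence: keep the data
            elif self.verify and stored != chunk:
                raise ValueError("hash collision: fingerprints match, data differs")
            pointers.append(fp)                   # a duplicate costs only a pointer
        self.files[name] = pointers

    def read(self, name: str) -> bytes:
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore()
    monday = b"x" * 8192 + b"y" * 4096
    tuesday = b"x" * 8192 + b"z" * 4096     # mostly the same as Monday
    store.write("monday.bak", monday)
    store.write("tuesday.bak", tuesday)
    print("logical bytes:", len(monday) + len(tuesday))
    print("stored bytes: ", store.stored_bytes())
```

A post-process design like the primary storage example would write the chunks first and run the same fingerprint-and-fold pass later, off the write path.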

There are some other approaches, content aware deduplication and other proprietary mathematical schemes that offer alternatives and may offer more protection, may yield smaller datasets. Some approaches have variable size chunks, while others are more fixed. There are also backup software-based approaches: EMC Avamar and IBM TSM to name two of the many.

Where do I Deduplicate?

So you’re deduplicating in your Data Domain, you’re good, right? Deduplication is one of those technologies that you’re going to be using everywhere: primary storage (disk), backup software, VTL, email, data archive, replication software, wherever and as often as you can to store and transmit less. It’s a technology that is a good fit at every point in your environment. As it proliferates, and it will spread throughout IT, we will gain greater storage efficiencies than we have today. We will store less, even though we know we’re going to be asked to store more and more (retention laws, anyone?). We’ll have less duplicate data stored over and over and over again all over the place.

There are datasets we don’t recommend deduplicating today. I still don’t recommend high-performance databases, largely sequential workloads and other things that I might just want to leave alone. It’s not for everything, and some unfriendly datasets can actually grow with deduplication.

Compression is complementary, but it’s not the same thing. It offers a different approach to reducing your storage and fits well in the file serving space. EMC (Celerra/Unified Compression), IBM ([Storwize] Compression Appliance) and NetApp all have compression offerings, some built in.

Each and every technology is aimed at using less space. To help stymie the uncontrollable growth these technologies will continue to evolve and offer better utilization than we had before. We’re going to need it all.

The Storage Evolution Part 1: Virtualization

2010-12-30T00:52:00.001-06:00

In order for me to talk about storage virtualization, I feel it’s important to describe what I mean by it. Storage virtualization is introducing a layer of virtualization, or abstraction, when dealing with storage. What I’m not talking about is storage for server virtualization, or storage of VMware, Hyper-V, etc. While server virtualization and storage virtualization are very complementary, they are not the same thing.

Storage virtualization adds some kind of pointer-based approach abstracting the physical blocks from the logical blocks of a disk, LUN or unit of storage. This approach adds power and benefit to give us more flexibility on how we allocate, move, make recovery points (snapshots, copies, etc.) and replication (mirroring) for business continuity or disaster recovery.
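A minimal Python sketch of that pointer-based approach, with names I made up for illustration (`VirtualVolume`, the array names and the block numbers are all hypothetical): the host only ever sees a logical block address, and a mapping table decides where the data physically lives.

```python
from typing import Dict, Tuple

# A physical location is (array name, physical block number).
Physical = Tuple[str, int]


class VirtualVolume:
    """Logical block addresses resolved through a pointer (mapping) table."""

    def __init__(self) -> None:
        self.block_map: Dict[int, Physical] = {}
        self.arrays: Dict[str, Dict[int, bytes]] = {}

    def write(self, lba: int, data: bytes, array: str, pba: int) -> None:
        self.arrays.setdefault(array, {})[pba] = data
        self.block_map[lba] = (array, pba)

    def read(self, lba: int) -> bytes:
        array, pba = self.block_map[lba]
        return self.arrays[array][pba]

    def migrate(self, lba: int, new_array: str, new_pba: int) -> None:
        """Move a block to different physical storage; the LBA never changes."""
        old_array, old_pba = self.block_map[lba]
        data = self.arrays[old_array].pop(old_pba)
        self.arrays.setdefault(new_array, {})[new_pba] = data
        self.block_map[lba] = (new_array, new_pba)


if __name__ == "__main__":
    vol = VirtualVolume()
    vol.write(0, b"payroll", array="old_array", pba=17)
    vol.migrate(0, new_array="new_array", new_pba=3)
    print(vol.read(0), vol.block_map[0])
```

The server keeps reading logical block 0 before, during and after the move. Snapshots, mirroring and tiering are all variations on rewriting that table.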

This can be accomplished in a couple of different ways. The first occurs in what I generally call the appliance based, or out-of-the-box virtualization. This is some type of engine, box or controller that sits in the middle as an appliance and adds a layer of virtualization to heterogeneous storage: virtualization with vendor independence. The second I like to define as virtualization of a single vendor’s storage; I call it in-box and it’s commonly a homogeneous approach.

Nothing New

Virtualization is an overused word. Storage abstraction has been around for a long time. Logical volume managers are available from many vendors where physical disks are aggregated or concatenated into volume groups and chopped up into logical volumes. Some vendors have offered this technology for over 20 years (even longer in the Mainframe world).

With the prevalence of logical volume managers and storage vMotion in ESX and vSphere, one could make an argument that storage virtualization isn’t needed. But in my opinion there are too many benefits to ignore it.

Primary v. Secondary Storage and the SAN

Primary storage is where your data lives and is directly accessed on a daily basis. It’s where you primarily and continuously access things. This typically takes the form of spinning disk today, although it is already being replaced in high-end storage tiers by solid-state disk. Secondary storage is where your data is archived to or recovered from. This can take the form of automated-tape libraries (ATLs), virtual-tape libraries (VTLs) or high-density low-cost disks (without all that VTL business). (I call this rolling your own VTL.) Sometimes that low-cost disk is in large archives, such as content-addressable (or object-based) storage.

You may or may not have a Storage Area Network (SAN). If your drives are all in your servers, we storage guys call this direct-attached storage (DAS), where you may still reap some of those benefits with a logical volume manager. However, most of us storage guys focus on the SAN. It allows shared storage for clustering: server farms (or grids) like Citrix and vSphere (ESX), storage consolidation (less wasted space), snapshots/recovery points for rapid recovery and mirroring for business continuity. Some applications like vSphere’s Site Recovery Manager or high availability clusters wouldn’t work without a SAN.

The SAN: A Little History

When SANs came onto the scene in the late 90s, they were making all kinds of promises of saving space by consolidating all that direct-attached storage in one place. They did save space, but not as well as we liked. They were also expensive. But that expense was offset by performance. High performance computing easily choked the disks in servers. SANs, with their large caches and many RAID arrays, were the only thing to make those systems hum.

Snapshots and mirroring were also introduced. We could now take a frozen point-in-time copy to make our backups from. We could mirror that data to another set of disks (local or remote) as a crash-consistent state in case of a disaster. This did the same thing for storage that clusters did for servers. In the case that something failed, we could recover quickly without having to spin a bunch of tape to get back up and running.

Performance was often limited to a specific RAID array and moving data from one place or another was a time-consuming chore. You either needed an outage, or you had to have mirroring licenses to get everything up and synced, then take an outage to use your newly mirrored copy. The field was ripe for storage virtualization.

Appliance-based Solutions

IBM practically invented the appliance-based primary storage virtualization field in the early 2000s. Their SAN Volume Controller (SVC) was a great leap forward in flexibility, allowing better performance by aggregating many RAID arrays together in a disk group, making movement between storage and tiers of storage non-disruptive, enabling easy snapshots with their FlashCopy and then mirroring locally and globally. I’ve used it personally to enable business continuity between sites. It’s an elegant and pretty simple to administer solution.

Hitachi came out a few years later with USP-V. It provides many of the same physical to virtual abstraction with snapshots and mirroring and movement between storage tiers that SVC provides.

EMC tried and failed to make a splash with a different out-of-band (or switch-based) approach with Invista. After many technical problems and poor acceptance, it died on the vine. I was ready to write them off, when they came late in the game with vPlex (an acquisition of Yotta Yotta). VPlex brings in-band heterogeneous support and allows multi-site federation, meaning that both sites appear as one storage system. This allows things like long-distance vMotion enablement. It’s a promising technology that offers a bright spot for them.

NetApp V-series allows you to get all the nifty whiz-bang features of Data ONTAP (their storage OS) in front of your existing other-vendor storage. They allow attachment of native NetApp storage shelves as well. I think NetApp offers amazing features and flexibility to small and mid-size customers, the bulk of the companies out there. They are finally getting the ability to non-disruptively move storage between tiers, something that was a weak spot in their functionality. If they ever get their multi-controller grid-based software running with FibreChannel, it may truly change things.

I’m not a 3PAR expert and I’m sure there are other solutions I’m missing, but these are the market leaders so I’m focusing on them for now.

In-Box Virtualization

There are a number of solutions that take a somewhat different approach. Compellent, NetApp, IBM’s XIV, EMC’s VMax and others offer virtualization in the box, meaning the firmware offers the abstraction, along with many of the features that go along with it, in a homogeneous, single-vendor solution.

Compellent’s claim to fame has been automatic storage tiering. They were the first to offer auto-migration between tiers. That once-unique feature is being widely adopted by IBM’s Easy Tier (DS8000, v7000, SVC) and EMC’s FAST (VMax, Clariion). It’s becoming a space filled by many. (Compellent is currently being purchased by Dell.)

NetApp has an elegant snapshot with robust integrated application support. Their claim to fame is data deduplication; they are the only primary storage vendor that has it at the block level. When they came out with dedupe, I thought everyone would have it in 2-3 years, but they remain the only game in town. And there’s a reason: they use an internal 4K block size (owing to their UNIX underpinnings). The other vendors have a larger internal block, making dedupe problematic. NetApp sells a lot of storage given this unique offering, and has a space savings guarantee to back it up.

IBM’s XIV is an interesting grid-based offering. Coupled with IBM’s FlashCopy Manager (that also works with their other storage products), it offers application integration similar to NetApp’s. XIV’s claim to fame is one storage tier, as in don’t worry about it, our caching algorithms take care of all the work for you. It works well for most, but not all, workloads. It’s another promising technology that I expect will mature nicely.

EMC’s VMAX offers virtualization with a strong emphasis on virtual [thin] provisioning and automated tiering. VMAX is the Intel-based DMX replacement that scales out (as opposed to scaling up) as you build it out. EMC’s FAST (Fully Automated Storage Tiering) is available in both VMAX and Clariion and is very robust, with the ability to create tiering policies from SSD/EFD to FibreChannel to SATA drives.

Some of the solutions, such as HDS USP-V, IBM v7000 and NetApp V-series, are kind of hybrids that do both appliance-based and in-box simultaneously. They can virtualize your existing storage while also virtualizing their own native trays of disks.

The Secondary Storage Market

Most early adoption occurs in secondary storage, the backup and restoration products, virtual tape libraries (VTLs) and similar products like archives and content addressable storage (CAS). I would venture to guess that more people are testing the waters with a VTL before a primary storage virtualization technology.

Tape libraries are being replaced with virtual tape libraries: disk controllers with intelligence to emulate a traditional automated tape library. Virtualizing tape libraries, while preferable to jockeying tapes, was an expensive proposition at first. Then VTL compression matched the compression of tape drives. When deduplication came, they could reduce the amount of disk storage to reasonable amounts, making them both high performing and affordable.

Secondary storage seems to get the leading-edge product introduction. It’s less of a risk proposition. Deduplication, encryption and other technologies were often introduced first in the secondary storage realm. These products start out as appliances – reaching a wide audience and retrofitting existing equipment, then migrate into the storage devices themselves. A VTL gateway becomes a VTL appliance with everything built in. Encryption goes from gateways to being built into drives and VTLs, media servers and clients. Deduplication goes into VTLs, media servers and a global deduplication cache among backup agents sending less data over the wire, less on disk.

But primary storage products are now mature. These technologies, dedupe and encryption, are no longer bleeding edge; they are generally accepted. Soon they will be at every point in the storage ecosystem.

An Enabling Technology

Storage virtualization is an enabling technology. It allows migration of virtual volumes without worrying about their physical disks. That allows us to migrate from old storage to new; one vendor to another. You can do all of this without shutting down a single server, migrating it all non-disruptively.

I said earlier, SANs promised to give better storage utilization. They did, but not enough. Storage virtualization technologies like thin provisioning and storage pooling of multiple RAID groups, then carving out of that pool take the promises of better utilization and start to deliver. You can now achieve better utilization of storage resources with less wasted space. This is a mixed blessing. While gaining better utilization of your storage, you have to balance that with faster just-in-time storage purchases. You never want to run out of space (bad things happen). If you’re the kind of shop that can’t acquire additional storage easily, better utilization might not be for you. You’ll need to have extra capacity at hand.
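Here is a rough Python sketch of that trade-off. It is an illustration under my own assumptions, not any vendor’s implementation: the `ThinPool` class, the volume names and the block counts are hypothetical. Volumes advertise more space than the pool physically has, physical blocks are consumed only on first write, and running the pool dry is exactly the bad thing you want to avoid.

```python
from typing import Any, Dict


class ThinPool:
    """Pool of physical blocks; volumes consume them only when written."""

    def __init__(self, physical_blocks: int) -> None:
        self.free = physical_blocks
        self.volumes: Dict[str, Dict[str, Any]] = {}

    def create_volume(self, name: str, size_blocks: int) -> None:
        # No physical space is reserved; the size is only a promise to the host.
        self.volumes[name] = {"size": size_blocks, "written": set()}

    def write(self, name: str, lba: int) -> None:
        vol = self.volumes[name]
        if not 0 <= lba < vol["size"]:
            raise IndexError("LBA outside the volume")
        if lba in vol["written"]:
            return                      # overwrite: block is already allocated
        if self.free == 0:
            raise RuntimeError("pool exhausted")
        self.free -= 1                  # allocate on first write
        vol["written"].add(lba)

    def provisioned(self) -> int:
        return sum(v["size"] for v in self.volumes.values())


if __name__ == "__main__":
    pool = ThinPool(physical_blocks=10)
    pool.create_volume("exchange", 8)
    pool.create_volume("sql", 8)
    for lba in range(6):
        pool.write("exchange", lba)
    print("provisioned:", pool.provisioned(), "free:", pool.free)
```

Sixteen blocks are promised against ten real ones, which is fine right up until the hosts actually write more than ten. That is why thin provisioning goes hand in hand with capacity monitoring and a quick path to buying more disk.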

The technology allows us to make snapshots without copying all the data. There is debate between redirect on write v. copy on write. Some vendors will just copy a bitmap when making a snapshot, then copy the actual blocks when a block changes, updating the snapshot area with the old frozen block and then allowing the write to occur: this is copy on write. Redirect on write says just write the newly changed block to a different area. The copy on write crowd says you gain a second physical copy on disks in some instances. It’s a valid argument. With today’s RAID-6 and RAID-DP, it’s also an old argument that’s losing its worth.
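The difference between the two is easier to see side by side. Below is a simplified Python sketch of each, written for illustration only: one snapshot at a time, no space reclamation, and a dictionary standing in for the bitmap and snapshot area. The class names and the `disk_writes` counter are my own.

```python
from typing import Dict, List


class CopyOnWriteVolume:
    """Snapshot keeps old data in a separate area; live data stays in place."""

    def __init__(self, blocks: List[bytes]) -> None:
        self.live = list(blocks)
        self.snap_area: Dict[int, bytes] = {}   # filled only as blocks change
        self.snapshot_taken = False
        self.disk_writes = 0

    def snapshot(self) -> None:
        self.snapshot_taken = True              # nothing is copied up front
        self.snap_area.clear()

    def write(self, lba: int, data: bytes) -> None:
        if self.snapshot_taken and lba not in self.snap_area:
            self.snap_area[lba] = self.live[lba]   # copy the old block out first
            self.disk_writes += 1
        self.live[lba] = data                      # then overwrite in place
        self.disk_writes += 1

    def read_snapshot(self, lba: int) -> bytes:
        return self.snap_area.get(lba, self.live[lba])


class RedirectOnWriteVolume:
    """New writes go to fresh locations; the snapshot is a frozen pointer map."""

    def __init__(self, blocks: List[bytes]) -> None:
        self.store: List[bytes] = list(blocks)          # append-only block store
        self.live_map: List[int] = list(range(len(blocks)))
        self.snap_map: List[int] = []
        self.disk_writes = 0

    def snapshot(self) -> None:
        self.snap_map = list(self.live_map)             # copy pointers, not data

    def write(self, lba: int, data: bytes) -> None:
        self.store.append(data)                         # redirect to a new location
        self.live_map[lba] = len(self.store) - 1
        self.disk_writes += 1

    def read_live(self, lba: int) -> bytes:
        return self.store[self.live_map[lba]]

    def read_snapshot(self, lba: int) -> bytes:
        return self.store[self.snap_map[lba]]


if __name__ == "__main__":
    cow = CopyOnWriteVolume([b"a", b"b"])
    row = RedirectOnWriteVolume([b"a", b"b"])
    for vol in (cow, row):
        vol.snapshot()
        vol.write(0, b"A")
        print(type(vol).__name__, vol.read_snapshot(0), vol.disk_writes)
```

Both preserve the frozen block. Copy on write pays an extra write the first time each block changes after the snapshot, while redirect on write pays only the new write and leaves the old block where it was.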

Deduplication couldn’t occur without virtualization. It’s finding like blocks and deleting one, updating the pointers to both point to one copy of that identical data. This same technique works for primary storage as well as that backup and archive data.

A lot of these add up to yield less space being consumed. I love being in the storage business; people never have enough of it. Their needs constantly grow, often way faster than people want them to. These virtualization technologies save real dollars and help control the growth that people endure year after year.

Do I Need This Stuff?

You don’t need storage virtualization – just like you don’t need server virtualization. These things make our lives easier. Whether that’s automatic migration from one piece of storage to another (like moving a VM from one server to another), or a simplification of business continuity through mirroring, storage virtualization is an enabler. You can obviously manually migrate storage from one box to another (or maybe have VMware Storage vMotion move some of your storage but likely not the whole environment), just like you can manually load a new physical server. It’s time consuming in either case.

Storage virtualization takes us farther down the road of storage consolidation, just like server virtualization enables server consolidation. Storage virtualization allows us to do things like thin provisioning, deduplication and real tiering.

Do you need the full-featured products? No. Do you need vSphere when the free software or ESXi might fit? No, they are choices. They can give greater performance, recoverability, migration and flexibility. They often save you real money.

It’s hard to do these without storage virtualization; you can live without all forms of virtualization. Once you’ve virtualized your storage, you’ll never want to go back.

IBM V7000 and SVC 6.1 – What IBM is Getting Right

2010-10-08T01:57:00.000-05:00

On Thursday, IBM made some storage announcements: Storwize V7000, SVC 6.1 and DS8800. I’m going to ignore the DS8800 for now; it may be a needed upgrade, but in my opinion its thunder was stolen by the other two.

Storwize V7000

IBM has made a big decision introducing the V7000. IBM’s mid-tier lineup has consisted of two main OEM’d players over the last decade, LSI’s IBM DS3/4/5000 lineup, and NetApp’s IBM N-Series. The V7000 is not an OEM product, but an IBM product.

This is a big departure.

While I’m not expecting IBM to immediately ditch LSI or NetApp, IBM is clearly out making their own hardware. They had some well-known pieces (DS8000, XIV and SVC); the V7000 joins this homegrown family.

So why do we care? By making its own product, IBM gets to dump its quite sizable intellectual property (IP) into the gear. The V7000 seems to combine some of their best products with a mid-tier price tag.

The hardware itself tries to be minimal. The controllers are built into the first shelf to conserve space with 12 and 24 drive varieties. The RAID is ported from the DS8000 line offering 0,1,5,6 and 10. You can grow to 10 enclosures (nine expansions) at the end of 1Q2011 (five today). You can fit a lot of capacity in 10 shelves, 240 TB supported in 20U. There’s SAS, NL-SAS (SATA that is SAS attached, to make it redundantly clear) and solid state SAS. It’s fully non-disruptive upgradable (NDU) from the drives to the software, according to IBM.

I have to admit, I was pretty unexcited at this point in the announcement. Then the bomb dropped.

IBM then integrated the SVC into the product and, somewhat like the entry edition, you get to use it for free when virtualizing the V7000’s internal storage. What? The SVC, along with FlashCopy, Metro/GlobalMirror, internal mirroring, thin provisioning – all of it is built in, free of charge, to use for this one box. You can also use it as a regular old SVC by virtualizing other storage through it, if you add the familiar capacity-based charges of a traditional SVC. Think NetApp V-Series, or HDS USP-V.

OK, that’s cool.

Then they retrofitted the V7000 and SVC functions to use XIV’s GUI. This is where I started to think – IBM, what happened? This is sexy. You see, IBM has not exactly been known for making a decent GUI on their hardware platforms. When IBM bought XIV, people got nervous that IBM would ruin XIV’s wonderfully simple and easy-to-use GUI. Did IBM listen? You bet. They recognized a great piece of IP and ported it to the V7000 (and the SVC).

Oh, and EasyTier is built-in too. Add Tivoli FlashCopy Manager (the NetApp SnapManager equivalent) and you’ve got one hell of an easy-to-use product.

So now we’ve taken some pretty boring storage (after all, I was ho hum at that point in the call) and jazzed it up with the market-leading storage virtualization product, an easy-to-use GUI and EasyTier. All of this is free/built-in without all of that annoying IBM per-TB pricing. I have to say, it’s pretty sexy. I like this box. (Sales to follow.)

Where does it fall short?

After the initial euphoria died down, I had to take a look at it – hard. My biggest dislike is that it maintains FC-based mirroring. In plain English, I still need FibreChannel routers to translate those FC frames into FCIP over a WAN. In plainer C-level English: more expensive. These days I expect my mid-tier disk to have a GigE or 10GbE interface, saving me from buying expensive SAN gear. Now I have to admit – I sell expensive SAN gear and it is rock-solid when your reputation is on the line. But I see the industry, at least in mid-tier, going to Ethernet-based replication, at the very least as an option.

Honestly, I understand IBM put the SVC into it. To maintain maximum compatibility, they left the replication alone. But change is possible, as SVC 5.1 showed us introducing iSCSI. The IP stack is now in there. It would have been nice to see this option introduced. (SVC 7.1 anyone? Can I get a yeah?)

I’m willing to overlook this shortcoming in terms of the overall package. And mirroring between an SVC and V7000 (not sure if it’s officially supported but don’t know why it wouldn’t be) would be a very nice expansion of the product line.

SVC 6.1

So why would I ever buy an SVC again? My biggest workhorse of the SVC customer base was DS4/5000 behind a pair o’ SVCs. Now with the V7000, like the spaghetti sauce commercial used to say, it’s in there. How long until it makes its way into the DS8000 base?

Obviously, you can virtualize other storage (IBM, EMC, NetApp, HDS, Xiotech, Compellent, etc. more etc. and yet more etc.) behind an SVC, making a nice neat package. And while I never really had a large problem with the SVC GUI (one of the best in my opinion), it also gets the XIV-like facelift – coming to a cluster near you this November, in time for Christmas.

To confuse you, they’ve renamed some things to use more industry-standard terms that really do make more sense once you get used to them. Errors are now events. VDisk-to-host mappings are shortened to just host mappings. Managed disk (MDisk) groups become storage pools. Space efficient becomes thin provisioning, and VDisks become volumes. These really do make sense.

The SVC also gets EasyTier, increased maximums for scalability (up to 8 GB extent sizes, 1 PB volumes, 256 WWNNs), new OSes (vSphere 4.1) and new storage (IBM V7000, DS8800, EMC VMAX, Compellent and Fujitsu models). They also made some easier pricing models for XIV behind SVC.

So if you’re still asking why would I ever buy an SVC again, stay tuned for the active-active datacenter article (still in progress). The teaser – SVCs with the right RPQs can be split across different datacenters where the V7000, as far as I can tell, can’t. And the big nodes with the large cache are still in the SVC.

What’s not in SVC 6.1

I’m still waiting for larger MDisks (or are they now storage pool members? hmm). I have larger drive arrays made of 450/600 GB drives squeezed into 4+P arrays and struggle to keep best practice 1 LUN/Array. Help a brother out, IBM. Please don’t let this slip on the roadmap again.
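To put rough numbers on that complaint, here is a back-of-the-envelope calculation in Python. It assumes the 2 TB per-MDisk ceiling that applied before SVC 6.1, decimal units and no formatting overhead; the function names are made up for illustration, and the 8+P rows are only there for comparison.

```python
import math

MDISK_LIMIT_TB = 2.0  # assumed per-MDisk ceiling before SVC 6.1


def usable_tb(drive_gb, data_drives):
    """Usable capacity of a RAID-5 array, ignoring the parity drive and overhead."""
    return drive_gb * data_drives / 1000


def luns_needed(array_tb, limit_tb=MDISK_LIMIT_TB):
    """How many LUNs an array must be split into to stay under the limit."""
    return math.ceil(array_tb / limit_tb)


for data_drives in (4, 8):          # 4+P and 8+P arrays
    for drive_gb in (450, 600):
        size = usable_tb(drive_gb, data_drives)
        print(f"{data_drives}+P of {drive_gb} GB: {size} TB, "
              f"{luns_needed(size)} LUN(s)")
```

Under these assumptions a 4+P array of 450 GB drives just squeezes under the limit at 1.8 TB. The same array built from 600 GB drives comes to 2.4 TB and has to be split, and anything wider breaks the one-LUN-per-array practice outright.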

Big News

So while the V7000 has some really neat features bundled at a great price point, the SVC gets some real pizazz. The GUIs are so cool, I actually feel good about paying for SWMA.

Make no mistake, GUIs can and do sell storage. I’ve had deals undone after a requested demo, others made by it. Customers want easy-to-configure, easy-to-use storage. Not just carving disks either; NetApp’s understood for years that SnapManagers make daily life better and using snapshots practical. IBM has some solid IP going for them with the XIV GUI, and in a very un-IBM-like way, they’re capitalizing on it big. Add in Tivoli FlashCopy Manager and I’m getting some really cool storage management.

IBM just might be the storage company to watch out for. When I saw their announcement, I have to admit, I was really surprised.

After I published this, IBM’s SVC 6.1 limits were posted and the MDisk limits have been increased from 2 TB to 1 PB. IBM did help us out. Thanks!

XIV Is Cool, But Not Cool Enough

2010-08-24T01:33:00.001-05:00

Now that I’ve gotten your attention, I have to say I love this box. It takes fairly inexpensive components and makes them sing. The real magic comes in two forms, grid computing and the algorithms that power it.

Back in the 90s, I used to install supercomputers across the USA. These were the Deep Blue variety that played Garry Kasparov in chess and eventually beat him. Supercomputers were being used to do many high performance computing tasks from analyzing road wear and tear to chemistry, data mining, nuclear simulation and weather forecasting. My wife would ask me if they wore capes.

The industry was going along fine, until some people decided to take inexpensive commodity Intel servers, network them together with a high-speed network and distribute the workload amongst the different processing nodes. This is when grid computing took down the supercomputing heavyweights.

Grids can be powerful, but the real power comes in their scale-out architecture. Instead of building a really big, massively parallel box, with increasing demands on busses and interconnects – a scale-up system – they add performance and capacity by adding additional nodes to the grid – a scale-out design. XIV is not alone in grid architecture; LeftHand is another, smaller competitor. In secondary storage, we have ExaGrid and Avamar as well as others. While the grid does wonderful things for scaling performance and capacity, the other real ingredient is the software powering it.
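For a feel of what that software has to do, here is a small Python sketch of spreading data chunks evenly across grid nodes and then scaling out by one node. It uses rendezvous (highest-random-weight) hashing as a stand-in; this is not XIV's actual distribution algorithm, and the node names, chunk counts and function names are invented for the example.

```python
import hashlib
from collections import Counter


def owner(chunk_id, nodes):
    """Pick the node with the highest hash score for this chunk."""
    def score(node):
        digest = hashlib.sha256(f"{node}:{chunk_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


def placement(chunks, nodes):
    """Map every chunk to the node that owns it."""
    return {chunk: owner(chunk, nodes) for chunk in chunks}


chunks = range(6000)
six_nodes = [f"node{i}" for i in range(6)]

before = placement(chunks, six_nodes)
after = placement(chunks, six_nodes + ["node6"])  # scale out by one node

moved = [c for c in chunks if before[c] != after[c]]
print(Counter(before.values()))   # roughly 1000 chunks per node
print(len(moved) / len(chunks))   # roughly 1/7 of the data relocates
print({after[c] for c in moved})  # and all of it lands on the new node
```

Every node carries a similar share of the load, so every node contributes to every workload. Adding a node only moves the data that belongs on the newcomer, which is what makes growing a grid so much less painful than a forklift upgrade.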

Eventually I expect the entire industry to go to grid-based computing. We’re already starting to see this. If clusters are an active-passive pair in a relationship, grids are the active-active distributed variety. Think Web servers front-ended with a load balancer, Citrix, vSphere farms – ready or not, grids are taking over the world. They just make sense. After taking over your datacenter, like the cluster, they’ll go geo[graphic] and span more than one DC creating a meshed fault-tolerant interconnected computing center.

So Why Isn’t XIV Cool Enough?

If XIV can do this much with inexpensive commodity-based components, what would happen if we took the same approach, but decided to make one monster piece of storage?

If I set out to make such a beast, I’d take the overall design and tweak the components. First, I’d ditch the 1 TB SATA drives and put in solid state disk (SSD). SSD would have amazing throughput, scaled even further by distributing the IO throughout the grid architecture.

Next I’d replace the interconnects. I’d ditch the 1 Gb Ethernet backend for InfiniBand. IBM has extensive experience architecting InfiniBand as component interconnects, and I’d think they could do this fairly easily with little development effort.

For the front end, I’d offer a little variety. It would come with either 8 or 16 Gb FC (FibreChannel), 10 and 40 Gb FCoE (FibreChannel over Ethernet) and for the I need it now, not when 16 Gb FC or 40 Gb FCoE comes out, InfiniBand. The last might take some writing of code, but the FC and FCoE would be pretty easy to plug in right away. I would leave in 1/10 Gb Ethernet for replication and those pesky legacy iSCSI people. (Here’s a nickel kid, go buy yourself a real SAN.)

All of this SSD, InfiniBand and high-speed interconnects and interfaces might require a little more oomph in the processing. Today, the nodes run Linux. IBM has extensive experience porting Linux to AIX, so why not just uplift those nodes to inexpensive p7 710 or 720 nodes? A quick change of platform, to one that already supports InfiniBand interconnects and high-speed FC, FCoE and 10 GbE interfaces, would add the additional processing power that an all-SSD system might require. Or it might not. It’s worth looking at.

Lastly, I’d allow the grid to be stretched between two metro-distanced data centers, allowing servers connecting to it to see it as one interconnected grid. (More in my next post on active-active datacenters.)

Off The Chart Performance

If XIV is producing tier 1 performance with tier 2 disks and inexpensive components today, imagine what it could do with all of these things upgraded. It’s already cool. But what if … what if you wanted to bury everything out there – with scalable performance and capacity to go even further?

The magic of XIV isn’t the boxes making up the grid, it’s the software making the grid perform. The components today are cheap (or inexpensive as people prefer to say), what if they weren’t? What if it was truly built for performance – performance beyond anything we have today?

It’s already cool – but it could be cooler. It could be the most impressive, most scalable system out there. It might cost a bit more, but there are people out there who would buy it. People that still buy supercomputers today.

ILM Grows Up

2010-07-27T10:04:00.001-05:00

Information lifecycle management (ILM) has long been the holy grail for organizations trying to put their treasure troves of information and data where they make the most sense. That includes putting the vast amounts of stuff we keep around and never look at anymore in as inexpensive a place as possible, while still keeping it online. The automated tools have usually been lacking.

Historically, this has been a manual, labor-intensive process. We’ve had tools to help us identify what’s what and where it is: storage resource management (SRM) tools help us identify aged, duplicate and no longer used junk that’s been packed away on our expensive spinning hard disks. We’ve had add-ons in the form of hierarchical storage management (HSM), which tiers data into gold, silver or bronze, or FC, SATA and tape. We’ve also had appliances, such as Rainfinity, to help automate the movement. Most of these tools have focused on the unstructured file data, the network shares, moving that old stuff to cheaper pastures.

Compellent has long been a player in this field, singing the praises of automated tiering, where you put a mixture of disk behind it, and it moves that data up and down the tiers, based on usage patterns. Other vendors have been slow to adopt the same strategy and have lambasted them by saying you’re spending all your precious time moving data to and fro.

However, a new crop of automated ILM tools is hitting the major vendors, and new announcements seem to be coming all the time. EMC has announced FAST – a technology that will move sub-LUN data up and down the tiers. IBM has sub-LUN tiering in their new DS8700 product, with more storage getting the feature soon (the SVC, according to IBM’s public blogs). I expect something from NetApp someday.

Strangely enough, SATA hasn’t been the driving factor behind this new crop of ILM. The goal isn’t necessarily putting old data on SATA, but optimizing the solid state disk (SSD) purchases people are making. It remains cost prohibitive for companies to purchase all solid state at the moment (although some certainly do). This automatic tiering helps move the most frequently accessed data into the super-fast SSD tier, with the side benefit of also tiering between FibreChannel (FC)/Serial-Attached SCSI (SAS) and SATA disks.

SSD is changing the way we do things. I can see a point where disks are no longer the bottleneck, but controllers and network will be. We will finally be able to justify that 8 Gb FC/10 Gb FCoE/16 Gb FC and 40 Gb FCoE network. Once SSD becomes mainstream, likely in hybrid approaches with multiple tiers, and snapshotting becomes standard practice, look for new workloads to be added to controllers and networks. I can see a point where my traditional bottleneck, spindles, moves away – for a bit.
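The core idea behind usage-based tiering fits in a few lines. This Python sketch ranks extents by how often they were accessed and fills the fastest tier first. It is a toy policy, not how EMC, IBM or Compellent actually decide placement; the `retier` function, the tier names, the slot counts and the access counts are all invented for illustration.

```python
def retier(access_counts, ssd_slots, fc_slots):
    """Place the hottest extents on SSD, the next hottest on FC/SAS, the rest on SATA.

    access_counts: dict of extent name -> I/O count over the last sample window
    ssd_slots, fc_slots: how many extents each of the faster tiers can hold
    """
    # Hottest extents first.
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    placement = {}
    for rank, extent in enumerate(ranked):
        if rank < ssd_slots:
            placement[extent] = "ssd"
        elif rank < ssd_slots + fc_slots:
            placement[extent] = "fc_sas"
        else:
            placement[extent] = "sata"
    return placement


# Hypothetical I/O counts for six extents of one LUN.
counts = {"e0": 900, "e1": 5, "e2": 120, "e3": 0, "e4": 450, "e5": 2}
tiers = retier(counts, ssd_slots=1, fc_slots=2)
print(tiers)
```

With room for only one extent on SSD, the single hottest extent gets it, the next two go to FC/SAS and the cold remainder drops to SATA. Because the ranking is per extent rather than per LUN, a small amount of SSD can soak up a large share of the I/O.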

I’ve updated my evolutionary topics:

Virtualization
Deduplication
Grid
Encryption
ILM

Storage is becoming so exciting.

Introducing The Storage Evolution

2010-03-13T19:16:00.001-06:00

What comes to mind when you think about storage? There’s so much to consider. Do you think of primary storage, the world where servers directly access their information via methods such as direct attached storage (DAS), storage area networks (SANs) and network attached storage (NAS)? Do you think about secondary storage, the realm of backups and archives, virtual tape libraries (VTLs), automated tape libraries (ATLs) and long-term repositories (disks, optical platters, tapes or whatever makes sense at the time)?

For me, I have four evolutionary topics mapped out. When I sit down to discuss where the action is taking place in storage, I can usually distill it down to four key areas:

Virtualization
Deduplication
Grid
Encryption

Now there are a lot of other topics to talk about, such as solid-state disk (SSD), but I consider that just another media type, such as a 600 GB FibreChannel (FC) or a serial attached SCSI (SAS) disk. Information lifecycle management (ILM), automated policy-based migration and hierarchical storage management are worthy topics, and I’ll discuss them in some future post.

A discussion can be had on infrastructure or plumbing and the evolution of FibreChannel. People are faced with options: FC, iSCSI and the newest kid on the block, FibreChannel over Ethernet (FCoE). Other TCP/IP-based protocols such as NFS and CIFS are gaining ground. Who needs block access anyway? I can run VMware and Oracle directly over NFS, and get amazing performance over what you might have on the floor today.

Professionally, I spend a lot of time enabling point-in-time copies (snapshots), as well as mirroring for replication to another data center. These are certainly areas I hit upon in every initial storage discussion I have, but they’re not exactly evolutionary topics. They’re assumed to be standard these days – they should just be in there. Data protection deserves its own topic, especially when comparing crash consistent and application consistent states. I’ll certainly get to that.

Other types of protection should be inherent to the box itself with no single points of failure. Whether that protection is RAID, or some other reliable economical method, I expect it. I demand dual controllers and shelf modules to be redundant. I want hot firmware upgrades on all components from controllers, to shelves to drives.

There are endless discussion points; I’ll try and get to many of them but there are still my killer four.

When I talk about virtualization, deduplication, grid and encryption, no single vendor has it all. Each has strengths in different areas. In my next few posts, I’m going to attempt to delve into what I mean by each topic, one by one.

Let’s Get Started

2010-02-26T09:18:00.001-06:00

I work as a server and storage solution architect. I’ve installed supercomputers, large UNIX clusters, massive amounts of storage, huge tape libraries, complex FibreChannel SANs and network gear. I’ve connected all of these together, and I’ve connected geographically remote data centers with mirroring and clustered servers for the ultimate in protection and resiliency. I’ve worked as a developer writing applications and middleware. I’ve worked for small companies and large vendors, and ran my own consulting company for a while. I have something to say.

What I’m hoping to do is share some ideas and some wisdom. Servers used to consume all of my time. For the past decade, my focus has been storage. Lots of storage. Storage seems to be the ultimate consumable resource. Is yours out of control?

I intend this blog to talk at both high level abstract concepts and occasionally get deep into the weeds. Maybe you’ll find something useful.
