Too Big to Fail / Too Big to Learn?
There's an interesting trend that I've been noticing in datacenters over the last few years. The pendulum has swung towards infrastructure that is getting too expensive to replicate in a test environment.
In years past, there may have been a chassis switch and a number of routers. Essentially, these would run the same operating system, with very similar features, as the smaller, less expensive units from the same vendor. The servers would run Windows, Linux or some other OS, on physical or virtual platforms. Even with virtualization, this was all easy to set up in a lab.
These days, though, on the server side we're seeing more 10Gbps networking, FCoE (Fibre Channel over Ethernet) and more blade-type servers. These all run into larger dollars - not insurmountable for a business, as last year's blade chassis can often be used for testing and staging. However, all of this is generally out of reach for someone putting together their own lab.
On the networking side, things are much more skewed. In many organizations, today's core networks are nothing like last year's. We're seeing way more 10Gbps switches, both in the core and at top of rack in datacenters. In most cases, these switches run completely different operating systems than we've seen in the past (though the CLI often looks similar).
As mentioned previously, Fibre Channel over Ethernet is being seen more often - and as the name implies, FCoE shares more with Fibre Channel than with Ethernet. Routers are still doing the core routing services on the same OS we've seen in the past, but we're also seeing lots more private MPLS implementations than before.
Storage, as always, is a one-off in the datacenter. Almost nobody has a spare enterprise SAN to play with, though it's becoming more common to have Fibre Channel switches in a corporate lab. Not to mention the proliferation of Load Balancers, Web Application Firewalls and other specialized one-off infrastructure gear, all of which are far more common these days than in the past.
So why is this on the ISC page today? Because in combination, this adds up to a few negative things:
- Especially on the networking and storage side, the costs involved mean that it's becoming very difficult to truly test changes to the production environment before implementation. So changes are planned based on the product documentation, and perhaps input from the vendor technical support group. In years past, the change would have been tested in advance and likely would have gone live the first time. What we're seeing more frequently now is testing during the change window, and often it will take several change windows to "get it right".
- From a security point of view, this means that organizations are becoming much more likely to NOT keep their most critical infrastructure up to date. From a manager's point of view, change equals risk, and any change to the core components can now affect EVERYTHING - from traditional server and workstation apps to storage to voice systems.
- At the other end of the spectrum, while you can still cruise eBay and put together a killer lab for yourself, it's just not possible to put some of these more expensive but common datacenter components into a personal lab.
What really comes out of this is that without a test environment, it becomes incredibly difficult to become a true expert in these new technologies. As we make our infrastructure too big to fail, it effectively becomes too big to learn. To become an expert, you either need to work for the vendor, or you need to be a consultant with large clients and a good lab. All of this makes troubleshooting more difficult (and makes managers even more change-averse).
What do you think? Have I missed any important points, or am I off base? Please use our comment form for feedback!
===============
Rob VandenBrink
Metafore
Comments
Google's data centers already largely operate under this model (in a clandestine fashion). Sooner or later it will 'pop', become a commodity and drive costs through the floor.
Imagine replicating all the infrastructure you have described on a single Linux PC integrated with a 10Gbps switch.
In this environment, testing/replication and change management become trivial. Everything is software-based, so rolling revisions forward/back is not an issue.
DrKewp
May 30th 2012
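As an illustrative aside on the comment above - a rough sketch only, not something DrKewp spelled out - one low-rent version of an "everything is software" lab is Linux network namespaces and veth pairs, which let a single box stand in for a couple of switches. The "core" and "tor" names and the 10.0.0.0/30 addressing below are invented for the example, and it needs root to run.

```python
#!/usr/bin/env python3
# Rough sketch of a "software only" lab on one Linux host, using network
# namespaces and a veth pair to stand in for two switches. Names and
# addresses are invented for the example; run as root.

import subprocess

def sh(cmd):
    """Run a shell command and fail loudly if it errors."""
    subprocess.run(cmd, shell=True, check=True)

# Two namespaces play the roles of a core switch and a top-of-rack switch.
for ns in ("core", "tor"):
    sh(f"ip netns add {ns}")

# A veth pair is the virtual cable between them.
sh("ip link add veth-core type veth peer name veth-tor")
sh("ip link set veth-core netns core")
sh("ip link set veth-tor netns tor")

# Address each end, bring the links up, and verify reachability.
sh("ip netns exec core ip addr add 10.0.0.1/30 dev veth-core")
sh("ip netns exec tor ip addr add 10.0.0.2/30 dev veth-tor")
sh("ip netns exec core ip link set veth-core up")
sh("ip netns exec tor ip link set veth-tor up")
sh("ip netns exec core ping -c 1 10.0.0.2")

# Tearing the whole "lab" down is one command per namespace - the
# roll-revisions-back-is-trivial part of the comment:
# for ns in ("core", "tor"):
#     sh(f"ip netns del {ns}")
```

Of course, namespaces only emulate topology and addressing - they don't behave like a vendor's switch OS, an FCoE fabric or a load balancer, which is really the diary's point.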