Okay, I've decided that the time has come to move properly and stop updating here at all!!
Please go to the new blog....it's just like this one but better!!!
As we seek to constrain and control the explosion in data growth, is the deletion of data and the reclamation of storage an economically viable approach?
I’ve seen a few articles over the past 18 months which calculate that this does not really make sense: if the cost of the work required to reclaim the storage exceeds the value of the storage reclaimed, then it does not make sense to do so. I remember looking at the cost of SanScreen when it was an independent company; their big sell was that it paid for itself by identifying orphaned storage and reclaiming it; unfortunately, it didn’t.
But does that mean that carrying out this sort of exercise is not worth doing? My answer to that is no! The benefits of good data management stretch beyond the economic benefits of reclaiming storage and making more effective use of your storage estate.
If you never carry out this sort of exercise, you have resigned yourself to uncontrolled data growth; you have given up. Giving up is never a good idea even in the face of what feels like an unstemmable tide; you do not need to sit like Cnut and try to stop the tide coming in but you can slow it and take a greater degree of control.
This sort of exercise can be important in understanding the data that you are storing and understanding its value. And interestingly enough, you might actually want to delete valuable data for a whole variety of reasons.
You need to understand the legal status and value of the data; email in a legal discovery situation is the classic answer, if you have the data, you can be asked to produce it. This can be extremely costly and can be even more costly if you discover that you can produce data at a later date when you have said you can’t.
Those orphaned LUNs in your SAN: do you know whether or not they contain legally sensitive data? Those home directories of ex-employees: is there sensitive data stored there? The unmounted file-system on a server which has never been destroyed?
It is also important to understand the impact across the entire estate of keeping everything forever: what is the impact on your back-up/recovery strategies? What is the impact on system refresh and data migration in five years' time? Do you only carry out this exercise when you are refreshing? If so, you are probably going to put your migration back a number of months and you could end up paying additional maintenance for longer.
There are many other consequences to a laissez-faire approach to data management; don’t just accept that data grows forever without bounds. Don’t listen to storage vendors who claim that it is cheaper to simply grow the estate but understand it is more than a short-term cost issue.
No, good data management including storage reclamation needs to become part of the day-to-day workload of the Data Management team.
As we continue to create more and more data, it is somehow ironic and fitting that the technology we use to store that data is becoming less and less robust. It does seem to be the way that, as civilisation progresses, the more we have to say, the less chance there is that in a millennium's time it will still be around to be enjoyed and discovered.
The oldest European cave paintings date to 32,000 years ago, with the more well known and sophisticated paintings from Lascaux estimated to be 17,300 years old; there are various schools of thought as to what they mean, but we can still enjoy them as artwork and get some kind of message from them. Yes, many have deteriorated and many could continue to deteriorate unless access to them is controlled, but they still exist.
The first writing emerged some 5,000+ years ago in the form of cuneiform; we know this because we have discovered clay and stone tablets. Hieroglyphs arrived possibly a little later, with papyrus appearing around the same time, followed by parchment. Both papyrus and parchment are much more fragile than stone and clay, yet we have examples going back millennia B.C.E.
Then along came paper, first made from pulped rags and then from pulped wood; mass-produced in paper mills, this and printing allowed the first mass explosion in information storage and dissemination. Yet paper is generally far less stable than parchment and papyrus, and certainly than stone and clay tablets.
Still paper is incredibly versatile and indeed was the storage medium for the earliest computers in the form of punch cards and paper-tape. And it is at this point that life becomes interesting; the representation of information on the storage medium is no longer human readable and needs a machine to decode it.
So we have moved to an information storage medium which is less permanent than its predecessors and needs a tool to read and decode it.
And still progress continues, to magnetic media and optical media. Who can forget the earliest demonstrations of CDs on programmes such as Tomorrow's World in the UK which implied that these were somehow indestructible and everlasting? And the subsequent disclosures that they are neither.
Will any of the media developed today have anything like the longevity of the media from our history? And will any of them be understandable and usable in a millennium's time? It seems that the half-life of media, as both a useful and a usable store, is ever decreasing. So perhaps the industry needs to think about more than the sheer amount of data we can store, and more about how we preserve the records of the future.
Inspired by Preston De Guise's blog entry on the perils of deduplication, I began hypothesising whether there is a constant for the maximum physical utilisation of the capacity in a storage array that can be safely utilised; I have decided to call this figure 'Storagebod's Precipice'. If you haven't read Preston's blog entry, can I humbly suggest that you go read it and then come back.
The decoupling of logical storage utilisation from that of the physical utilisation which allows a logical capacity/utilisation which is far in excess of the physical capacity is one that is both awfully attractive but also terribly dangerous.
It is tempting to rest upon one's laurels and exclaim 'What a clever boy am I!' and in one's exuberance forget that one still has to manage physical capacity. The removal of the 1:1 mapping between physical and logical capacity needs careful management and arguably reduces the maximum physical capacity that one can safely allocate.
Many of the storage management best practices are no more than rules of thumb and should be treated with extreme caution; these rules may no longer apply in the future.
1) It is assumed that, on average, data has a known volatility; this affects any calculation of the amount of space that needs to be reserved for snapshots. If the data is more volatile than expected, snapshot capacity can be consumed far faster than planned. In fact, one can imagine an upgrade scenario which changes almost every block of data, completely blowing the snapshot capacity and destroying your ability to quickly and easily return to a known state, let alone your ability to maintain the number of snapshots agreed in the business SLA.
2) Deduplication ratios when dealing with virtual machines can be huge. As Preston points out, reclaiming space may not be immediate or simple. For example, a common reaction to capacity issues is to move a server from one array to another, something VMware makes relatively simple, but this might not buy you anything: if the blocks are deduplicated, moving hundreds of machines might not free much physical space at all. Understand your data and identify the data which can be moved with the maximum impact on capacity. Deduplicated data is not always your friend!
3) Automated tiering, active archives etc. all potentially allow a small amount of fast storage medium to act as a much larger logical space, but certain behaviours could cause this to be depleted very quickly, leading to an array thrashing as it tries to manage the space and move data about.
4) Thin provisioning and over-commitment ratios work on the assumption that users ask for more storage than they really need and that average file-system utilisations are much lower than what is provisioned. Be prepared for the day that assumption makes an 'ass out of u & me'.
All of these technologies mean that one has to be vigilant and rely greatly on good storage management tools; they also rely on processes that are agile enough to cope with an environment that could ebb & flow. To be honest, I suspect that the maximum safe physical utilisation of capacity is at most 80% and these technologies may actually reduce this figure. It is ironic that logical efficiencies may well impact the physical efficiency that we have so long strived for!
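To make the sums above concrete, here is a minimal sketch of the kind of check a storage management tool (or a nervous admin's script) might run; every figure in it is an invented assumption for illustration, not a measurement, and the 80% threshold is simply the 'precipice' figure suggested above:

```python
# Illustrative sketch: sizing snapshot reserve from an assumed change
# rate, and flagging a thin-provisioned pool that nears the 'precipice'.
# All numbers below are hypothetical assumptions for demonstration.

def snapshot_reserve_tb(data_tb, daily_change_rate, retention_days):
    """Space needed for snapshots if `daily_change_rate` of the data
    changes each day and snapshots are retained `retention_days` days."""
    return data_tb * daily_change_rate * retention_days

def pool_at_risk(physical_used_tb, physical_total_tb, threshold=0.80):
    """True when physical utilisation exceeds the safety threshold."""
    return physical_used_tb / physical_total_tb > threshold

# A 100 TB pool with 5% daily change and 14-day retention needs 70 TB
# of snapshot reserve -- volatility dominates the calculation.
reserve = snapshot_reserve_tb(100, 0.05, 14)
print(f"snapshot reserve: {reserve:.0f} TB")  # 70 TB

# Thin provisioning: 300 TB allocated logically against 100 TB physical
# looks clever, but it is the physical side that must be watched.
print(pool_at_risk(physical_used_tb=85, physical_total_tb=100))  # True
```

Doubling the change rate or the retention doubles the reserve, which is exactly why an unexpected burst of volatility blows through snapshot capacity so quickly.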
One of the books that I am currently reading is 'Reality is Broken' by Jane McGonigal; in it there is a startling fact; by the age of 21, the average young person in the UK will have spent 10,000 hours gaming. That's a boggling figure and yet one which doesn't really surprise me; the question is how do we draw on this wealth of experience and how do we draw on the power of games for more than just entertainment. This led me to musing on what a games oriented Storage Management system would look like and how the various gaming cultures may manifest themselves.
Storage Management does actually lend itself well to a gaming paradigm; it is often a case of learning a task and repeating it ad infinitum; as you get better, you can move on to more complex tasks and indeed, you may even find shortcuts and hidden tricks to enable you to skip through the more tedious tasks. Storage Management often relies on the ability to plan, recognise patterns both simple and complex but most importantly, it requires the ability to convince oneself that a repetitive, tedious task is indeed fun!
I imagine that Hitachi could partner with Nintendo in the production of their new games oriented Storage Management system; a variety of power-ups would be available to you as you zone, mask and carve up LUNs. The successful completion of a task would result in the graphical representation of the disk turning into a giant fruit of some sort with your avatar doing a little dance perhaps after the collection of ten of these.
Perhaps a variation of Pac-man could represent the act of de-allocation and returning the disk to a main-pool; the ghosts representing the avaricious users chasing the poor little storage admin around the maze trying to prevent him reclaiming the disk? Or perhaps, Pac-man could represent the act of deleting unnecessary files and the power-pills in the corner could represent some illicit file that the users should not be storing and the consumption of this temporarily causes the users to scurry and deny knowledge of the file allowing the admin to delete files at will?
I could see IBM's Storage Management tool being text-based and along the lines of Crowther and Wood's Colossal Cave Adventure; 'you are in a maze of twisty little passages, all alike'! Obscure commands such as Plover, XYZZY and Plugh will do magical things and make your life a lot easier. Very old-school, full of in-jokes and only really comprehensible to those of a certain age! Or perhaps a version of Space War?
EMC's Storage Management tool would be in the form of an MMORPG; it would need a huge server farm to run and it would take ages to do anything until you had progressed to a certain level. At that point, you could purchase items which would enable you to do your job more efficiently; there would of course be no end to this and whenever you believed the game was beaten, they would announce a new feature which would cost you yet more money and time to master. There would also be regular outages to upgrade the required hardware and data-centre to run the tool.
NetApp's Storage Management tool would be very similar to EMC's; there would be an online religious war as to whose was best. The main difference would be that NetApp's tool would be free initially but would require in-tool purchases to do anything at all useful. But it would be very quick and easy to master; probably suited to the more casual storage admin whereas EMC's would appeal to the hardcore gamer.
Both EMC and NetApp would have unlockable achievements; 'Master of the Zones', 'Lover of LUNs', 'NASty Boy/Girl' etc; all entitling the Admin to different badges etc to be tweeted Four-Square fashion and irritate everyone else!
Of course, we would all be waiting for the combined IBM/EMC tool; this would be called 'Super Barryo World'!
I went to the London Turing Lecture given by a very cold-ridden and croaky Donald Knuth; you always know things are going to be interesting when the speaker opens with comments along the lines of 'I got the idea for the format of this lecture from my colleague and friend at Caltech in the 60s; Richard Feynman'. See even the great Donald Knuth can name-drop with the best of them. The format was of a question and answer session where Donald took questions on any subject from the floor and I believe that it will be available to watch as a web-cast; please note that he was very cold-ridden and it's probably not his best 'performance'.
Don has long talked about Literate Programming and the idea that programs should be written for human beings to read and not just for computers to process, arguing that 'They tend to work more correctly, are easier to change, easier to maintain, and just a pleasure to write and to read'. He is passionate that code can be beautiful and art; funnily enough I feel very similarly about IT infrastructure and I think that is what Cloud can potentially bring to the world of infrastructure.
I'm not sure we can have a 'Literate Infrastructure' but I wonder if we can get to 'Elegant Infrastructure'; I come across infrastructures all the time which make me question the byzantine perversity of infrastructure architects and designers. At times it is like an artist who has decided to throw all the colours in his palette at a canvas with little understanding of aesthetics and form; yes, you can do this but you really have to understand what you are doing and unless you are very good, you will simply produce a mess.
This is why the various block-based infrastructures are potentially so appealing (this is not a discussion as to the merits of vBlock versus Flex-pod versus another-Lego-block) as they restrict the tendencies of techies to throw everything but the kitchen sink at a problem. Yet the most stringent advocate of these infrastructures has to acknowledge that they will not solve every problem and at times, a little subtle complexity is more elegant than adding more and more blocks.
The infrastructures of the future will be simple, understandable but not necessarily devoid of colour and subtlety. Otherwise we'll fall into another trap that Don hates; 'the 90% good enough' trap. Infrastructure needs to be 100% good enough; 90% won't do because 90% will not be easy to manage or understand. I think this is the challenge that the vendors will face as they try to understand what they are selling and creating.
2011 will see acquisitions continue to come as all of the major players try to position themselves as owners of the vertical stack; we will see acquisitions to enable the general compute stack but also acquisitions which enable specialised appliance based stacks. Obviously, we will also see the major players continue to announce home-grown stack products as well.
All of these stacks will blow up smog which might well look something akin to a Cloud; the question is whether this will be a toxic cloud or something more beneficent bringing the necessary components for business growth?
One of the biggest challenges will be that we will have different types of stacks; there will be the odd case where a data centre is homogeneous but it will be more usual for a data centre to have at least two vendors. This might be for either technical reasons or for purely commercial reasons; two vendors competing may well be more honest than a single dominant vendor. And of course, there is the ever present public Cloud option to be considered.
The question is going to be one of how to manage these stacks; both from a data centre point of view but also from a rack/server position. How do we manage both the data centre and from application to spindle?
To this day we struggle to manage storage arrays from different vendors with a single pane of glass; we even struggle to get a common view let alone a common configuration tool. How much more complex will this become when we have several vertical stacks from a variety of vendors to manage?
Yes, we could struggle on and manage at the component level but how is this going to bring the reduced complexity and business benefit that these stacks could bring? We could even find ourselves using shell interfaces/APIs to produce our own consolidated tools; assuming that these shell interfaces/APIs are available.
Storage Resource Management has been a big #FAIL in general; Stack Resource Management could have a similar future unless we see vendors and other interested parties begin to get a handle on this. I certainly expect to see acquisition in this space as vendors try to steal a march on each other but really, this might be a place best served by co-opetition and standards.
Formal standards? Possibly not, but two or three of the bigger players could do the market a huge service by beginning to think along the lines of an open standard. And if one emerges, they could do an even bigger service by not taking the approach of 'embrace and extend'.
At the core of EMC's mega-announcements yesterday was the long awaited coming together of the Clariion and Celerra lines in the form of the VNX and VNXe. These are firmly targeted at NetApp's core market: dual controller, unified storage; probably still the fastest growing part of the storage market, well, certainly within the external shared storage market.
The VNX can simply be seen as a direct convergence of the Clariion and Celerra, but the VNXe takes this convergence and pushes it down into the SMB market; the VNXe could be competing with Drobo, Synology, Iomega (yes, I know, an EMC company) and a whole host of other smaller players. NetApp have also pushed into this space, so we can look for more heated and noisy competition here as well.
The VNX range (and I do include the VNXe) further progresses the idea that Storage is Software; the heart of every storage appliance is software and the core differentiator is no longer hardware; with almost no exceptions within this market sector, the hardware is Intel-based and pretty much commodity.
EMC, like IBM, has massively invested in simplifying its management interface, and UniSphere is a massive step forward in simplicity. Much hated by experts, configuration wizards enable the new breed of IT generalists to set up their devices without becoming experts in storage. Yes, you can still get access to the underlying configuration but hopefully in most cases, it will be good enough.
I think the VNX is good enough for a great deal of the market; NetApp have ploughed this furrow very well, they often pitch their Filers as good enough and point out that you don't need expensive Symmetrix hardware to service most requirements.
And it is here that EMC has a problem: the VNX is good enough to replace a massive percentage of their Symmetrix footprint, but within some territories the account teams still lead with Symmetrix and come up with a lot of 'good reasons' why a customer should retain this footprint; it is only when the customer makes serious moves towards NetApp that the Clariion and Celerra come into play. This adversely impacts both the credibility of EMC and that of their products.
The VNX is good enough to consign the Symmetrix range to a mainframe-like plateau and eventual decline; there will still be workloads and environments (mainframe for one) which are served best by the Symmetrix but the so-called mid-range will be the future for many of us.
Now we have the VNX, I wonder if the next big thing for EMC will be how they federate the VNX range and enable clusters of VNX; federated VNX would yet further extend its market coverage.
And as EMC continue to evolve the VNX architecture; I think we can expect further performance improvements akin to the doubling of VMAX performance. There is probably a lot of scope for improvements as EMC work to converge the Celerra and Clariion code-bases but Unisphere should allow for much of this to be hidden from the customer.
Yet there is still a big question about the VNX in general: is it actually the right product for today and, more importantly, for tomorrow? Is it what we actually need and does it progress the storage landscape? It is certainly the right product for EMC; it builds on what they have and is better than what they have. But with the Isilon purchase and the still-cloudy future of Atmos, time will tell.
Over the next few weeks, I thought might share some of my annoyances with life in Corporate IT; some of them are vendor driven and some are the result of wrong-headed thinking in the end-users! I'm sure everyone has their own annoyances and some of my readers could probably write several volumes on theirs.
So first up is 'Enterprise License Agreements' or what they mean and how end-users tend to look at them; this is often a case of the end-user walking into a nicely vendor-dug trap and smiling as they fall into it.
For a single payment; you can consume as much of a vendor's product as you possibly can over a period of time; so what's not to like?
Well, after the contracted period, you face a license true-up; you've merrily gone on consuming all you can eat, your servers now have an elasticated waistband, and you must pay for all those licenses.
Of course that payment will probably take the form of another 'Enterprise License Agreement' and so it goes on but you've got so many installs of the software; you have effectively become an addict and there's no way out.
Time after time, I hear project managers tell me 'But it's free and it doesn't cost my project anything!'. You try to explain that it's not free and there might be a better fit for their problem but it all falls on deaf ears. After all, the project manager gets it for free and there are no annoying budgetary considerations for them to take on.
And then there is the situation where an inordinate amount is spent on an ELA and yet at the end of the period when you do the calculation, your cost per license is no better than if you'd gone for a PAYG-type model.
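The calculation itself is trivial, which makes it all the stranger how rarely it is done up front. A sketch, with every price and count invented purely for illustration:

```python
# Hypothetical ELA vs pay-as-you-go comparison. All prices and counts
# below are invented for illustration -- substitute your own figures.

def ela_cost_per_license(ela_fee, licenses_consumed):
    """Effective unit cost once the true-up counts what you deployed."""
    return ela_fee / licenses_consumed

ela_fee = 500_000          # up-front 'all you can eat' payment
licenses_consumed = 1_000  # installs counted at true-up
payg_unit_price = 450      # what each license would have cost PAYG

effective = ela_cost_per_license(ela_fee, licenses_consumed)
print(f"ELA effective unit cost: {effective:.0f}")  # 500
print(f"PAYG unit cost:          {payg_unit_price}")  # 450
# In this invented case the ELA works out worse per license -- and you
# are now locked in to the next agreement on top.
```

The point is not the specific numbers but that 'free to the project' and 'free to the organisation' are very different things.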
ELAs, one of the industry's great cons or at least, one of Bod's annoyances!
2011 is the year of 'Big Data', or so I hear; organisations are collecting ever more data which may hold some value. The problem for many of these organisations is unlocking that value and turning the data into usable information. But I am not going to talk about how this is done; that is a topic for another blog and perhaps for someone with more expertise in that area. I am more interested in how we access the information and the tools that we use.
I wonder if we could see a change in how corporate IT is consumed; the explosion in mobile, non-traditional computing devices has led many to posit a future where much of IT is consumed in the form of apps; small specialised applications which do one or two things very well and this might very well be true but before these apps become truly useful to the corporate 'Knowledge Worker', there are other changes which need to happen.
The IT department needs to enable the access to the information and at times to the raw 'Big Data' to allow these workers to move beyond the superficial; it is the area of curation and publication which could well be the growth area for IT departments. Building information stores with standard APIs to allow the publication of information which these apps can access; the IT department may not necessarily control the apps and presentation but they will control the access to the data. They may publish reference clients in much the same way that Twitter publishes a reference client but does not mandate its use.
Technologies such as Object Storage for example will come to the fore as enabling technologies as will the already more established interfaces for exposing data and information but instead of trying to restrict the use of these to IT applications; more innovative uses will be encouraged.
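As a toy illustration of 'publish an interface, not a client': the sketch below stands up a minimal HTTP endpoint serving JSON that any app could consume. The endpoint name and the data are entirely hypothetical; a real information store would sit behind proper authentication and an object-storage or similar API.

```python
# Toy sketch of an IT department publishing information through a
# standard API rather than a device-specific client. The endpoint
# and data are hypothetical examples.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical curated information store.
REPORTS = {"q1-sales": {"region": "EMEA", "total": 1_200_000}}

class InfoAPI(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /q1-sales returns the curated record as JSON.
        key = self.path.strip("/")
        found = key in REPORTS
        body = json.dumps(REPORTS.get(key, {})).encode()
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve (blocks forever), one would run:
#   HTTPServer(("localhost", 8080), InfoAPI).serve_forever()
# Mobile apps, dashboards or a published reference client can all
# consume the same interface; IT controls the data, not the client.
```

The design point is that the department's responsibility ends at the interface; presentation becomes the consumer's problem, just as with Twitter's reference client.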
IT moves away from supporting individual devices and simply provides interfaces to Corporate Information; at that point it truly becomes the 'Information Technology' department and not the 'PC Support' department.
I think we are some years away from this happening and there will always be a place for traditional IT as it does support many essential functions but it's probably a more interesting future for many of us than the current status quo.