End of quarter has come and end of quarter will go. I'd like to say I get to exhale, but my dance card is already filling up for the next two weeks, so I'll be as busy as I have been to date.
Moving on, there appears to be a fundamental misunderstanding about what data de-duplication is. How much of this is down to people just not getting the proper information, and how much to people with specific interests whipping up the FUD Of War, is not quantifiable at the moment.
That Old Time Evil..
Let's start at the beginning. Data De-Duplication isn't a single technology, it's not a product and it most certainly isn't a market.
De-dup is a bunch of different technologies, some of which have been rattling around the place for the best part of twenty years. It's a feature of products, as can be seen by its application in so many different places. De-dup itself falls into the same market space as other capacity optimisation techniques such as compression and thin provisioning.
When you think about it like that, when it's no longer mystical and magical, things start getting easy.
One of the de-duplication issues raised on a frequent basis these days is the validity of de-duplicated data if challenged in a court of law. Well, why might we question only the validity of de-duplicated data?
Let me introduce you to the greatest arch enemy of assured authenticity in the modern age. Evil has a name, and its name is compression.
Yes my friends, compression. A technology so ubiquitous it's invisible, and which by its very design re-encodes data into sequences of fewer bits, thereby modifying the very structure of the data itself.
Somebody call a lawyer, I'll see you all in court!
That WORM tape drive you might have been using for archiving? Was compression enabled? If it was, you can't assure the authenticity of the data since it's been changed, and it was compression which came along and changed it.
Without something which will assure authenticity, such as the technology found in Optical Media and EMC Centera, you can't be certain that the data has not been modified.
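To make the compression point concrete, here is a minimal sketch, not a description of how any tape drive or archive product actually does it. It assumes zlib as a stand-in for whatever compression a device applies and SHA-256 as a stand-in for an authenticity check; the function names are made up for illustration. It shows that the bits sitting on the media are not the bits you wrote, and that it takes a separately kept fingerprint to show what comes back out is what went in.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Return a fingerprint of the data (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def round_trip(original: bytes) -> dict:
    """Compress, decompress, and report what changed along the way."""
    stored = zlib.compress(original)      # what would land on the media
    restored = zlib.decompress(stored)    # what a later read hands back
    return {
        "stored_differs": stored != original,
        "stored_is_smaller": len(stored) < len(original),
        "restored_matches": sha256_hex(restored) == sha256_hex(original),
    }


if __name__ == "__main__":
    sample = b"quarterly report " * 1000   # highly repetitive sample data
    print(round_trip(sample))
```

The stored form differs from the original, yet the restored copy has the same fingerprint. Whether a fingerprint like this would satisfy a court is a different question, and not one a code sample can answer.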
So, what else do we have in our capacity optimisation bag of tricks?
How about Single Instance Storage? It's a de-duplication technology which operates at the object level, not the sub-file or block level. You don't modify any data, you just remove additional copies of the data and put pointers in their place. And you can assure the authenticity of that original.
But what about those pointers? Keep that lawyer on speed dial...
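For anyone who wants to see the moving parts, here is a rough sketch of object-level single instancing. It is an illustration only: the class and method names are hypothetical, and using a SHA-256 digest as the object identifier is my assumption, not a statement about how any particular product works.

```python
import hashlib


class SingleInstanceStore:
    """Toy object-level de-dup: one stored copy per unique object."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}   # digest -> the single stored copy
        self.pointers: dict[str, str] = {}    # object name -> digest

    def put(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Keep the data only if this content has not been seen before.
        self.objects.setdefault(digest, data)
        # Every name, duplicate or not, just gets a pointer.
        self.pointers[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        digest = self.pointers[name]
        data = self.objects[digest]
        # Re-check the stored copy against the digest the pointer holds.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"stored object for {name!r} failed verification")
        return data


if __name__ == "__main__":
    store = SingleInstanceStore()
    store.put("alice/report.doc", b"same attachment")
    store.put("bob/report.doc", b"same attachment")
    print(len(store.pointers), "names,", len(store.objects), "stored copy")
```

The original bytes are never touched, and the digest lets you re-check the one stored copy. The pointer table is the new thing you have to trust.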
I suppose the point I'm making here is that you can knock any process or technology down in court if you have a capable lawyer. Assured authenticity isn't a new requirement, it's always been with us and we already have technologies available to us which can help mitigate, but not eliminate, the risk that the authenticity of your data will be called into question.
Of course someone has to actually call it into question first. When was the last time information recovered from backups and entered into evidence was contested because the backup app had processed it before spitting it out onto media? Let's not forget that backup apps also do interesting things with your data.
Should we even consider data which at any time has been encrypted?
Let's not.
Not being a lawyer, I have no idea where the burden of proof resides should someone claim a capacity optimisation, backup, archiving or encryption technology has compromised the authenticity of a piece of evidence, but I'm sure the counter argument involves the words "Prove it".
Again, De-Dup isn't a single technology, product or a market. It's a bunch of different technologies, it can be found as a feature in many different products and is part of a suite of capacity optimisation techniques available to people.
Fun With De-Dup Ratios
One final thing: target-based de-dup ratios are not voodoo, and you don't need to commune with any dark gods to figure out what's going on.
A de-dup ratio of 2:1 is a 50% storage reduction.
A de-dup ratio of 10:1 is a 90% storage reduction.
A de-dup ratio of 20:1 is a 95% storage reduction.
See the law of diminishing returns in action?
Eventually you just start tacking 9's on to the wrong side of the decimal point since a 100% storage reduction isn't de-duplication and has a different name.
Deletion.
So if someone shows up at your door and claims they can do 25:1 while the other guy can only do 20:1, the question you should be asking yourself is whether they have a local support bod who can be onsite quickly, not whether that single digit percentage difference in storage reduction capability is a deal breaker.
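If you'd rather check the arithmetic than take my word for it, the conversion is simply reduction = 1 - 1/ratio. A small sketch follows; the function name is mine, not anything a vendor ships.

```python
def storage_reduction(ratio: float) -> float:
    """Convert an N:1 de-dup ratio into the fraction of storage saved."""
    if ratio < 1:
        raise ValueError("a de-dup ratio below 1:1 makes no sense here")
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    for ratio in (2, 10, 20, 25):
        print(f"{ratio}:1 -> {storage_reduction(ratio):.0%} storage reduction")
    # prints:
    # 2:1 -> 50% storage reduction
    # 10:1 -> 90% storage reduction
    # 20:1 -> 95% storage reduction
    # 25:1 -> 96% storage reduction
```

Going from 20:1 to 25:1 buys you one extra percentage point.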
