The Avamar Client explained. (Quickly)
As the question was asked, it's probably worthwhile to dig deeper into how Avamar does what it does.
Since Avamar is both a standalone backup product and a component of NetWorker, on each host to be backed up you'd install either the Avamar client or the NetWorker client.
The two important components in this discussion are the host-based Avamar Client and the Avamar Data Store. The Client does all the de-dup work. The Data Store acts as the orchestration/policy engine if you're using Avamar on its own (NetWorker takes over those functions if used with NetWorker) and ensures the protection and integrity of the de-duped data after it receives it from a client across the network.
For those of you not up on NetWorker speak, feel free to skip to the section beyond the screenshot while I spend a moment speaking to NW admins.
Still here? Fine. We'll drop talk of NetWorker after this section, but for those who are interested, the Avamar de-dup technology exists as a NetWorker ASM (Application Specific Module) in the NetWorker client. You get full indexing, browse, and client config recovery through the NW CLI and GUI, plus all the usual stuff, while the Data Store shows up in the NetWorker Management Console as a De-Duplication Node.
Here's an oversimplified run-through of what the Avamar Client does when it backs up a host.
-It first walks the file system to identify modified files.
-The modified files are chunked using the Avamar sub-file variable length de-dup algorithm.
-The resulting chunks are compressed using a high speed compression algorithm.
-The compressed chunks are hashed to generate unique values.
-Those hash values are checked against a local hash cache. The cache is one or two small files containing the hash values of data which has previously been backed up by that client. Hash values, mind you, not the data itself.
-If values aren't in the local cache, the Data Store is queried to see if data with that unique value has already been stored there. (Queries, like all transmissions, are done in bursts, so before anyone asks, there's no "slow drip" across the network.)
-If data with that hash value is present in the Data Store (it was already backed up by another backup client), the data is not transmitted across the LAN/WAN.
-If data with that value is not present in the Data Store, it is transmitted across the LAN/WAN.
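For the code-minded, the steps above can be sketched in a few lines of Python. This is a toy under loud assumptions, not Avamar's implementation: the real chunking algorithm is proprietary, so `chunk` uses a simple shift-and-add rolling value as a stand-in, `zlib` and SHA-1 are my picks for the compression and hash steps, and `DataStore`, `has`, `put` and `backup` are hypothetical names invented for illustration.

```python
import hashlib
import zlib


def chunk(data: bytes, mask: int = 0x3FF, min_size: int = 256, max_size: int = 8192):
    """Split data into variable-length chunks at content-defined boundaries.

    Toy stand-in for the real (proprietary) sub-file algorithm.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        size = i - start + 1
        # Cut when the low bits of the rolling value match, or the chunk is too big.
        if (size >= min_size and (rolling & mask) == mask) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class DataStore:
    """Hypothetical server side: maps hash value -> compressed chunk."""

    def __init__(self):
        self.chunks = {}

    def has(self, digests):
        # Answers a whole batch of lookups at once (no "slow drip").
        return {d for d in digests if d in self.chunks}

    def put(self, digest, payload):
        self.chunks[digest] = payload


def backup(data: bytes, local_cache: set, store: DataStore) -> int:
    """Back up data; return the number of bytes actually sent to the store."""
    unknown = {}
    for piece in chunk(data):
        compressed = zlib.compress(piece)                # compress first...
        digest = hashlib.sha1(compressed).hexdigest()    # ...then hash
        if digest not in local_cache:                    # local hash cache check
            unknown[digest] = compressed
    already_stored = store.has(unknown.keys())           # one burst query
    sent = 0
    for digest, compressed in unknown.items():
        if digest not in already_stored:
            store.put(digest, compressed)                # only new data crosses the wire
            sent += len(compressed)
        local_cache.add(digest)                          # remember hashes, not data
    return sent
```

Call `backup` twice with the same data and the same cache and the second run sends nothing. Call it from a second client with an empty cache and it still sends nothing, because the Data Store already holds those chunks.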
And that's why at EMC World last year in Orlando I wasn't worried when, after a week on the road, Avamar backed up my laptop to a Data Store located in a lab in Ireland via the incredibly crummy 802.11b WiFi connection in the hotel lobby.
And did so in record time.
Factor in that at the time I still had renegade PST files (EMC archives email with EmailXtender to Centera, but I did have PSTs from years ago that predate email archiving). Typical backup software would attempt to drag the entire 2GB PST files back just because I had opened them up and looked at some attachments, thereby modifying the PST file itself.
Avamar, on the other hand, sent back only a fraction of that data across the Atlantic, as only what had changed in the file was shot out of my wireless card.
And that's your 10-second look at how the Avamar Client does what it does.
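To put a rough number on the PST story, here's a small Python sketch of why a tiny change to a big file means only a tiny transfer. Same caveats as any toy: the chunker is a simple stand-in I made up, not Avamar's algorithm, it skips the compression step to stay short, and `chunks` and `bytes_to_send` are hypothetical names.

```python
import hashlib


def chunks(data: bytes, mask: int = 0xFF, min_size: int = 64, max_size: int = 4096):
    """Toy content-defined chunker: boundaries depend on the bytes, not on offsets."""
    out, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (rolling & mask) == mask) or size >= max_size:
            out.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        out.append(data[start:])
    return out


def bytes_to_send(old: bytes, new: bytes) -> int:
    """Bytes of `new` whose chunk hashes were not seen when `old` was backed up."""
    known = {hashlib.sha1(c).digest() for c in chunks(old)}
    return sum(len(c) for c in chunks(new)
               if hashlib.sha1(c).digest() not in known)
```

Feed it a large buffer as `old` and the same buffer with a few bytes inserted in the middle as `new`. Because the boundaries follow the content, the chunks after the insertion line up again with ones already backed up, so only the chunks around the change count as new. A whole-file backup would resend everything.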
Have you worked with the AVE client at all? Going to roll it out shortly and I have been very impressed.
Posted by:John | May 13, 2008 at 06:32 PM
The great thing being that Avamar Virtual Edition is running the same code as a physical Avamar Data Store so everything works in the exact same way. ;)
Posted by:Storagezilla | May 14, 2008 at 02:57 PM