Author: Joseph Correia, Principal Consultant
In early 2009 it was announced that Exchange 2010 now had "built-in" archiving. This generated a lot of interest and excitement. Based on the information from Microsoft, it would appear that your archiving needs will be addressed by Exchange 2010, so why not put your plans on hold until you can upgrade or migrate to 2010?
Some of the benefits expected to be provided by Exchange 2010 archiving are elimination of PSTs across the organization, integrated user search of both active mailbox and archive mailbox, simple archival and deletion policies from Microsoft Records Management, multi-mailbox search for e-discovery, roles-based access control, and drag & drop access to your personal archive.
At first pass, it appears that most of the basic features you need in an archiving solution have been covered in Microsoft's first attempt with 2010, and I expect that they will improve the offering going forward. However, with only these features available, Exchange 2010 is probably only a good fit for small to mid-size businesses - those primarily concerned with archiving to eliminate PSTs and enable some form of search without implementing any additional software.
A deeper look at the native Exchange 2010 archiving functionality shows some significant issues that you should think about before proceeding. With Exchange 2010 archiving, mailbox database sizes are dramatically increased due to archived data being stored in the same database as the mailbox itself and the elimination of single instance storage (SIS). Other shortcomings include:
- Outlook 2010 is required to enable archiving
- eDiscovery searches are limited to the Exchange Organization
- There is no legal hold for Public Folders
- Archive access is not extended to cache mode
- There is no stubbing of messages.
Expanding on these points a little more, storing archive data in the same mailbox database as the user's mailbox means that your Exchange Server storage is not being reduced. The elimination of SIS further increases the likelihood that database sizes will increase going forward.
In addition, moving to Outlook 2010 is no simple task, as anyone who has been through an application rollout knows. Furthermore, eDiscovery searches are limited to a single Exchange organization and cannot be performed across multiple organizations, which leaves the search incomplete and somewhat indefensible in a courtroom.
So IMHO, Exchange 2010 archiving in its current iteration will likely not fit the needs and requirements of many companies that have even moderate amounts of messaging data. Mid-size to large customers will want to archive other data types (file system, Instant Messages, SharePoint) along with e-mail and require strong eDiscovery capabilities across those realms, let alone require a reduction of storage use at the archive.
Some questions you should be asking yourself before implementing archiving:
1. Why are you going to implement archiving? Is it for storage management, eDiscovery, compliance, all of the above?
2. How long will you be required to retain data within the archive?
3. How does the archiving solution scale?
4. If you currently have a 3rd party archiving solution how will it integrate or coexist?
5. What does the proposed solution give me? (For instance, storage reduction, eDiscovery capabilities, enhanced mobility, improved backups, improved DR, etc.)
Author: Brenden Doyle, Senior Consultant
As of March 1st, 2010, all companies that hold electronic information classified as personal information about a Massachusetts resident must protect that information from possible data loss, per 201 CMR 17.00: STANDARDS FOR THE PROTECTION OF PERSONAL INFORMATION OF RESIDENTS OF THE COMMONWEALTH. What does this mean for corporations?
Unlike Sarbanes-Oxley, which forces corporate entities to take specific actions to ensure compliance with stated regulations, the Massachusetts Data Protection Regulation (MA 201 CMR 17) requires a corporation's "best effort" to ensure that certain types of data are protected to the best of its ability.
This subtle change in wording places the burden squarely on the corporate entity for protecting personal information, but leaves much up to interpretation. While non-compliance with Sarbanes Oxley is potentially defensible in court by corporations who say the requirements are financially burdensome, the new 201 CMR-17 law centers on the answer to the question "Did you do everything within your power to protect this information?" This can lead to uncomfortable questions about cost per technical feature, such as "Is $50,000 too much of a financial burden for a company that had XX amount of profits last year to protect a customer's personal information?" This is not a line of questioning any corporate attorney wants to face, and certainly not following a public data breach.
Many incidents of lost backup tapes are well documented. The size of a data breach from the loss of a set of backups can be astronomical. With the high-capacity tape media available today, a single LTO-4 cartridge can realistically hold over a terabyte of compressed data. Just one tape could contain the entire HR database or sales and customer information for a whole quarter. With so much data on a single piece of media, the loss of a box of tapes could mean the loss of corporate records for an entire week, month, quarter, or year, depending on the backups lost.
This is why everyone is scrambling to ensure that any backup tape stored offsite is encrypted. The burden of proof will be squarely on the holder of the personal data to ensure everything reasonably possible was done to prevent that data from being compromised.
Backup tapes are routinely shipped offsite with a third-party vendor to provide a level of protection from potential disasters. But without some form of encryption, there is no way to ensure that the backups cannot be compromised once they are no longer in your custody. There are a couple of options for accomplishing this today. NetBackup, for example, has both a client-side and a media server encryption option, which allows the IT administrator to choose where and when to encrypt data. If all of the personal information that would require encryption is local to a single server, then encrypting at the client may be sufficient. If there are multiple servers containing personal information, the media server encryption may be more efficient.
Another popular method of encrypting backup data is to use an appliance or an LTO-4 encryption-capable tape drive. Both client-side and media server encryption methods have a direct performance impact on the server doing the encryption. The appliance model removes the performance impact from the servers and maintains the proper compression ratios, offering the best of both worlds for a premium price.
With all of the encryption solutions available today, key management is the biggest concern. The encryption keys used to encrypt the data need to be protected even more securely than the data once encrypted. Maintaining the keys is a specific challenge requiring both protection of the keys and a secure method of recovering the keys in the case of a disaster. Most of the key management software solutions available provide a method of regenerating the encryption keys through the use of a passphrase. This allows the exact same set of encryption keys to be regenerated by entering the passphrase into the utility.
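To illustrate why a passphrase can regenerate "the exact same set" of keys, here is a minimal sketch using a standard key derivation function (PBKDF2 from Python's standard library). It is not how NetBackup or any particular key management product works internally; the function name `derive_key`, the salt value and the iteration count are assumptions chosen for the example.

```python
import hashlib

# Hypothetical fixed salt for the example. The same salt and iteration
# count must be used again at recovery time to get the same key back.
SITE_SALT = b"example-site-salt"

def derive_key(passphrase: str, salt: bytes = SITE_SALT,
               iterations: int = 200_000, length: int = 32) -> bytes:
    """Deterministically derive a key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=length
    )

# Key generated when backups were first encrypted.
original_key = derive_key("example passphrase kept in the DR container")

# After a disaster, entering the same passphrase yields the same key.
recovered_key = derive_key("example passphrase kept in the DR container")
assert recovered_key == original_key

# A different passphrase yields a different, useless key.
assert derive_key("wrong passphrase") != original_key
```

The point of the sketch is that the derivation is deterministic: anyone holding the passphrase (and the other derivation inputs) can rebuild the keys, which is exactly why the passphrase needs the same level of protection as the keys themselves.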
Daymark recommends a dual method to protect the encryption keys. As keys are not changed very often, we recommend that the actual database pieces containing the keys in the key management software be burned to a CD and stored separately from the encrypted backups. A disaster recovery container stored with the offsite host vendor is highly recommended. This container should hold operating system CDs, the DR plan, the emergency contact list and the encryption key CD, as well as the passphrase in case the keys need to be regenerated. This provides a way to protect the data in an encrypted format without storing the keys with the backup tapes. And, importantly, it allows you to successfully answer questions about Mass. 201 CMR 17 compliance!
You'll find more useful information here:
201 CMR - 17
http://www.mass.gov/Eoca/docs/idtheft/201CMR1700reg.pdf
201 CMR - 17 Frequently Asked Questions
http://www.mass.gov/Eoca/docs/idtheft/201CMR17faqs.pdf
Reminder notification
http://www.mass.gov/?pageID=ocamodulechunk&L=4&L0=Home&L1=Government&L2=Our+Agencies+and+Divisions&L3=Division+of+Insurance&sid=Eoca&b=terminalcontent&f=doi_Bulletins_bulletins_10_02&csid=Eoca
Exchange, MSSQL, File Services, SharePoint and AD... Microsoft applications play major roles in most IT environments these days. It's no accident that the major storage vendors are making a great effort to integrate easily with these applications. The VSS framework seems to be the prevailing wind.
VSS (Volume Shadow Copy Service) provides the system infrastructure for running VSS applications on Windows-based systems. VSS enables the creation of point-in-time, cache-consistent snapshots of primary data volumes and provides the low-level driver functionality required for a VSS application to manage those consistent snapshots. VSS API integration is now commonplace and available with backup applications such as EMC NetWorker, Symantec NetBackup and CommVault Simpana, and from storage manufacturers like EMC, NetApp and HDS, to name a few.
It makes a lot of sense to be able to tightly integrate your mission critical applications like messaging, database and document management with your data management infrastructure. For Windows systems, VSS looks like a solid enabling solution.
VSS is a service and device-level component available in the Windows kernel. Microsoft provides a set of communication APIs that create a consistent interface between the writers and the requesters during snapshot creation activity. The writer is an application that coordinates its I/O operations with the VSS operation so that data on the shadow copy or snapshot is in a consistent state. The requester is an application, such as backup, archive or storage array snapshot software, that requests that a snapshot be created. The detail in this process has likely already been taken care of through a VSS agent available from your favorite storage management application provider.
If you have not already tried using this technology in your environment, I suggest you pick one of your Microsoft applications and put it through its paces - I think you'll like the results.
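The writer/requester handshake can be pictured with a toy model. The sketch below is not the Windows VSS API; the class and method names (`Writer`, `Requester`, `freeze`, `thaw`, `create_snapshot`) are hypothetical and only illustrate the idea that writers flush and pause their I/O so that the requester captures a consistent point-in-time copy.

```python
from dataclasses import dataclass, field

@dataclass
class Writer:
    """Toy application (e.g. a mail or database server) that buffers writes."""
    name: str
    pending: list = field(default_factory=list)  # buffered, not yet on disk
    disk: list = field(default_factory=list)     # data committed to disk
    frozen: bool = False

    def write(self, record: str) -> None:
        if self.frozen:
            raise RuntimeError(f"{self.name} is frozen for a snapshot")
        self.pending.append(record)

    def freeze(self) -> None:
        # Flush buffers so the on-disk state is consistent, then hold I/O.
        self.disk.extend(self.pending)
        self.pending.clear()
        self.frozen = True

    def thaw(self) -> None:
        self.frozen = False

class Requester:
    """Toy backup application that asks for a snapshot."""
    def __init__(self, writers):
        self.writers = writers

    def create_snapshot(self) -> dict:
        for w in self.writers:
            w.freeze()
        try:
            # Point-in-time copy taken while every writer is quiesced.
            return {w.name: list(w.disk) for w in self.writers}
        finally:
            for w in self.writers:
                w.thaw()

mail = Writer("mail")
mail.write("message-1")
snapshot = Requester([mail]).create_snapshot()
mail.write("message-2")  # later writes do not alter the snapshot
```

In the real service this coordination is handled for you by the VSS agent supplied with your backup or storage product; the model only shows why the snapshot ends up application-consistent.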
As with any new target device you might be incorporating into your backup environment, you are usually doing it for one of the following reasons: to increase capacity and/or throughput, or to improve manageability. The basic connectivity considerations that you would apply to any backup device such as tape, disk or optical still hold true for deduplication targets. Because of some of the unique benefits deduplication solutions offer, it is very important that you don't overlook critical architectural components in the quest to best leverage this technology.
For those who still question the maturity of the technology, I would remind you that its underpinnings are loosely based on journal file system concepts, which I was first exposed to in the open systems world in the early 1990s through the Digital UNIX AdvFS file system. (Remember those guys?) This type of file system abstracts the address level of the file system from the data level, creating pointers from the address level to common data sets at the data level. First used for pointer-based snapshots, this concept has since been leveraged for deduplication, storing only unique data blocks at the data level and using the address level as a reference.
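The address level / data level split can be sketched in a few lines. This is a simplified illustration, not any vendor's implementation: it assumes fixed-size blocks and SHA-256 fingerprints, and the names `DedupStore`, `put` and `get` are hypothetical.

```python
import hashlib

class DedupStore:
    """Toy block store: unique blocks at the data level, pointers at the address level."""

    def __init__(self, block_size: int = 4):  # tiny block size, for illustration only
        self.block_size = block_size
        self.blocks = {}  # data level: fingerprint -> unique block
        self.files = {}   # address level: file name -> ordered fingerprints

    def put(self, name: str, data: bytes) -> None:
        refs = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)  # store a block only if it is new
            refs.append(fp)
        self.files[name] = refs

    def get(self, name: str) -> bytes:
        # Rebuild the file by following the pointers to the shared blocks.
        return b"".join(self.blocks[fp] for fp in self.files[name])

store = DedupStore()
store.put("monday.bak", b"AAAABBBBCCCC")
store.put("tuesday.bak", b"AAAABBBBDDDD")

logical_blocks = sum(len(refs) for refs in store.files.values())  # 6 blocks written
unique_blocks = len(store.blocks)                                 # 4 blocks stored
```

Two backups totalling six blocks occupy only four blocks at the data level, because the unchanged `AAAA` and `BBBB` blocks are referenced twice rather than stored twice.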
There are two predominant deployment configurations commonly used across the industry today: source based and target based. Source-based deduplication occurs on the client side and deduplicates data on the host before it sends the information across the TCP/IP layer. This can make it a good candidate for servers at a remote office that might not have an optimal WAN connection to the final data storage location. Most source-based deduplication products started off as standalone backup applications with proprietary agent code, which makes them difficult, if not impossible, to integrate with heterogeneous backup environments. Where backup application integration exists, it is usually through an acquisition of the code by one of the big storage manufacturers and only with that manufacturer's applications.
Target-based deduplication, on the other hand, is typically appliance based and designed to plug into the backend of the existing backup infrastructure, similar to a classic backup device such as an automated tape library. Target-based appliances are designed to fit into the existing backup solution paradigm and can usually be presented as either a VTL (virtual tape library) or a disk backup target. Unlike source-based solutions, target deduplication occurs at the device level after normal backup data is sent to the backup server; this provides no reduction in data at the TCP/IP level, making it a better choice for data center deduplication where LAN bandwidth is not an issue.
Both source- and target-based solutions can reduce or eliminate physical tape from a backup environment by leveraging the same deduplication mechanism at the replication layer. If you can replicate data offsite to a second appliance, then you can eliminate the need to create a physical tape for offsite storage. If only the deduplication delta has to be replicated to make a complete offsite copy, you will contain your site-to-site network connectivity costs.
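As a rough illustration of why replicating only the deduplication delta keeps site-to-site traffic down, the sketch below compares block fingerprints between a primary and an offsite store and ships only the blocks the offsite copy is missing. The function names `fingerprints` and `replicate`, the four-byte block size and the sample data are all assumptions made for the example; real appliances use their own protocols.

```python
import hashlib

def fingerprints(data: bytes, block_size: int = 4) -> dict:
    """Split data into fixed-size blocks and index them by SHA-256 fingerprint."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest(): data[i:i + block_size]
        for i in range(0, len(data), block_size)
    }

def replicate(local_blocks: dict, remote_blocks: dict) -> int:
    """Copy only blocks the remote site lacks; return the number of bytes sent."""
    missing = local_blocks.keys() - remote_blocks.keys()
    for fp in missing:
        remote_blocks[fp] = local_blocks[fp]
    return sum(len(local_blocks[fp]) for fp in missing)

primary = fingerprints(b"AAAABBBBCCCC")   # first night's backup
offsite = {}
first_transfer = replicate(primary, offsite)    # everything is new: 12 bytes

primary.update(fingerprints(b"AAAABBBBDDDD"))   # second night: one changed block
second_transfer = replicate(primary, offsite)   # only the new block: 4 bytes
```

After the initial seeding, each replication cycle moves only the blocks that changed, yet the offsite store still holds everything needed to rebuild both backups.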
Remember, the amount of data you manage in a deduplicated environment is greatly affected by the retention period of the backup data. Long-term retention can add significant cost to the overall solution. Managing daily, weekly and even monthly data on deduplicated storage is common; for longer retention periods, hybrid tape/de-dupe solutions may be the best answer. Always consider any additional licensing cost for integration into an existing backup infrastructure and support for your client base (e.g. unique operating systems or database agent support).
You are tasked with solving your company's latest IT challenge; you decide to set up a meeting with your favorite and not-so-favorite technology manufacturers. Oh yeah, you also remind yourself to include a meeting with your local VAR who always comes through in a pinch when you actually have to deploy something. In your meetings you hear about solutions that will move, manipulate, protect, and organize your data while getting 35 mpg, running on alternative fuel and going from 0-60 in five seconds. After many hours of technology overload and resetting your focus on the initial task at hand, you run into your boss in the hallway. Your boss asks how the project is moving along, and you reply, "Great. We will be able to get the entire IT staff to and from the office for a month while carrying a cord of wood, towing the company bulldozer, all on a single tank of gas. Wait a minute, do we even have a company bulldozer?"
It's not difficult to get sidetracked when selecting technology solutions to solve your data management problems. You want to make sure you do your due diligence and understand all the potential options for solving your IT challenge, and you would not want to accidentally leave the best solution out of your investigation. This can easily create an opportunity for the manufacturers to talk about all the cool stuff their companies have been developing recently, whether or not it has anything to do with your current project. Perhaps they can even uncover another sales opportunity, even though they may have been removed from the details of your environment for over a year.
This is the perfect time to engage your local VAR, who has probably been at your facility supporting you 10 times over the past year, and may even have occupied a guest cube at times. They have likely had their hands in the mix at your site and deployed similar technology at other clients. They will likely know the caveats to successful interoperability with your infrastructure. At this point you have probably reviewed a ton of marketing material and made some directional choices; now it's time to talk to the folks in the field, review an admin guide or two and flip through some release notes.
Quick introduction: I am writing this blog, "Storage Navigator", to hopefully distill practical solution architectures from all the technology "BUZZ" so common in this industry. I have been involved in deploying storage solutions and managing infrastructures since the early 1990s, with companies such as Invincible Technology (ITC), Berkshire Computer, Glasshouse Technology and now Daymark Solutions.
Though most topics in this industry could easily fill multiple books each, I will try to cover common technology interoperability, leveraging field experience and proven, repeatable design elements. The intent is to keep concepts at a high level while still exposing critical implementation caveats. Hopefully some of the topics covered in this blog will prove useful for those currently navigating through piles of technology information in the quest for common sense solutions. I'd like to hear your thoughts on these topics as well, so please feel free to comment.