Saturday, March 16, 2013

Controlling Content Database Size in SharePoint


A SharePoint content database can hold up to 4 TB of data, although a maximum of 200 GB is recommended. Storage size is not the real problem, however; the availability problem is the time it takes to restore all that data. The recovery time determines how long your business-critical solution will be down. Because SharePoint can spread its content across multiple databases, your architecture should segment content into separate databases based on the information architecture (IA) and other user experience aspects, plus the business requirements for availability and recovery time. Plan the structure of your solutions with a strong focus on your IA.
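
To make the recovery-time argument concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes that restore time is simply data volume divided by a sustained restore throughput; the 100 MB/s figure and the function name are illustrative assumptions, not measured values, and real restores also depend on log replay, network, and verification steps.

```python
def estimate_restore_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Rough restore time in hours: data volume / sustained throughput.

    Both the linear model and the throughput value are assumptions;
    measure your own restore throughput before relying on the numbers.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    size_mb = size_gb * 1024  # 1 GB = 1024 MB
    seconds = size_mb / throughput_mb_per_s
    return seconds / 3600


# Compare the recommended 200 GB limit with 1 TB and the 4 TB maximum,
# assuming a hypothetical sustained restore rate of 100 MB/s.
for size_gb in (200, 1024, 4096):
    hours = estimate_restore_hours(size_gb, 100)
    print(f"{size_gb:>5} GB -> {hours:.1f} h")
```

Under these assumptions a 200 GB database restores in well under an hour, while a 4 TB database takes the better part of a working day, which is why the segmentation of content matters more than raw capacity.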

Here are some options for controlling the size of the content databases without disposing of or deleting content:

A) Use an out-of-the-box Records Center (RC) as an archive for old content: users must manually send each document to the RC using, e.g., Move and Leave a Link; note that only the latest major version with its metadata is kept, and all version history is lost. The information management policies that SharePoint supports for retention and disposition can be used to automate the cleanup.
As the RC has its own content databases, the live collaboration databases will grow more slowly, or even shrink, as outdated information is moved to the archive. Keeping the live databases small ensures a shorter recovery time; the recovery time for the archived content can be considerable, but that content is not business critical.
Search must be configured appropriately to cover both live and archived content.
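
The kind of rule an automated retention policy needs can be sketched as follows. This is plain Python that models the decision only; it does not call any SharePoint API, and the `Document` type, the `archive_candidates` function, and the three-year threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Document:
    """Hypothetical stand-in for a document and its last-modified date."""
    name: str
    last_modified: date


def archive_candidates(docs: List[Document], today: date,
                       retention_days: int) -> List[Document]:
    """Return documents not modified within the retention period."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in docs if d.last_modified < cutoff]


docs = [
    Document("contract-2009.docx", date(2009, 1, 1)),
    Document("minutes-2012.docx", date(2012, 6, 1)),
]
# Assumed rule: anything untouched for about three years goes to the archive.
for doc in archive_candidates(docs, date(2013, 3, 16), retention_days=3 * 365):
    print(doc.name)
```

The code is trivial; agreeing on the value of `retention_days` for each content type is the hard part.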

B) Use a third-party archiving solution for SharePoint from, e.g., Metalogix or AvePoint. This has the same pros and cons as option A, but the functionality is probably better when it comes to keeping version history and batch management of outdated content.
Search must be configured appropriately to cover both live and archived content.

C) Use a third-party Remote BLOB Storage (RBS) solution for SharePoint, such as Metalogix StoragePoint, so that documents are registered in the database but not stored there. This gives smaller content databases, but more complicated backup and recovery, as the content now resides both in databases and on disk. Provided that you don’t lose both at the same time, the recovery time should be shorter.
Search will work as before, as all content is still logically in the “database”.

D) Use PowerShell scripts or other code to implement the disposition of outdated content. The script can, e.g., copy old documents to disk and delete old versions from the content database; the drawback is that all metadata will be lost and no link is left in SharePoint.
The database size will shrink as data is actually deleted, but backup and recovery become more complicated as content is now both in the database and on disk (same as for option C).
Search can be configured to also crawl and index the files on disk, but content ranking will suffer as the valuable metadata is lost.
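
The selection step of such a script can be sketched as below. This is a minimal illustration in Python of the logic only, assuming a hypothetical rule of keeping the latest N versions; the function name and the version labels are made up, and a real script would have to read the versions through the SharePoint object model and export them to disk before deleting anything.

```python
from typing import List, Tuple


def split_versions(versions: List[str],
                   keep_latest: int) -> Tuple[List[str], List[str]]:
    """Split version labels (ordered oldest to newest) into (delete, keep).

    The newest `keep_latest` versions are kept; everything older is a
    candidate for export to disk followed by deletion.
    """
    if keep_latest < 1:
        raise ValueError("keep at least the current version")
    cut = max(len(versions) - keep_latest, 0)
    return versions[:cut], versions[cut:]


to_delete, to_keep = split_versions(["1.0", "2.0", "3.0", "4.0"], keep_latest=2)
print("export, then delete:", to_delete)
print("keep in SharePoint:", to_keep)
```

Returning both lists makes it easy to log exactly what was removed, which matters because the deleted versions leave no trace in SharePoint.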

My recommendation is to consider option A first, especially if you are able to define automated rules and exploit the built-in information management policies in SharePoint. The keyword is *able*: in my experience, everyone is positive about having automated retention and disposition, but no one, even at large banks and law firms, is able to come up with the policies.

Always consider using RBS for databases larger than 200GB, and note that RBS also helps you meet the disk IOPS requirements of SharePoint.

