Analysis Services data directory maintenance
By default, all of the files
used by Analysis Services to store MOLAP data are stored in the
Analysis Services data directory. The location of this directory is set
during installation, and can be changed by changing the value of the DataDir server property.
As with any other type of
file handled by Windows, the files in the Analysis Services data
directory are prone to fragmentation. During processing a lot of files
are created, written, and deleted, and if the disk is very fragmented
these files may be split across different portions of the disk, slowing
down access to them.
A regular defragmentation of the
drive on which the data directory exists will improve the overall speed
of the disk subsystem, which will of course have benefits for Analysis
Services query and processing performance. As a result we suggest
regular disk defragmentation of the Analysis Services data directory's
drive as part of the server's maintenance plan, making sure of course
that it only takes place at a time when we are not processing or when
users are not querying the cube.
Backup
It may be stating the obvious,
but after processing has finished we should always back up our Analysis
Services database. This can be done very easily from Integration
Services using an Execute Analysis Services DDL task; once again, we can
generate the XMLA command needed to back up a database by
right-clicking on it in SQL Management Studio, selecting Back Up, and then pressing the Script button on the Back Up Database dialog.
Copying databases between servers
There are several scenarios
where we may need to copy an Analysis Services database from one server
with one on a different server. Here are a couple of examples:
We have
several frontend servers directly accessible by users in the DMZ area of
the network, and one backend server inside the internal network. When
the cube is processed on the internal server, it needs to be duplicated
to the frontend servers so that it can be queried by our users.
We
have two powerful Analysis Services servers and several cubes. Half of
the cubes are processed on one server and half are processed on the
other. When both servers have finished processing, the databases are
synchronized between the two servers, so they both have the same data
and can share the query load.
Using Analysis Services 2008 we have three options to synchronize data between servers:
Use the Synchronize XMLA command: This is the preferred method in most situations. When executed, the Synchronize
command tells one Analysis Services instance to synchronize a database
with a database on another instance. During the synchronization process
Analysis Services checks for any differences between the version of the
database on the destination database and the version on the source
server. Updated objects are copied over the network from the source
server to the destination server until the destination is identical to
the source.
The XMLA needed for a Synchronize command can be generated in SQL Management Studio by right-clicking on Databases node of an instance in the Object Explorer
pane, running the Synchronize Database wizard and at the end choosing
to generate script rather than running the synchronization immediately.
This XMLA command can be run from an Execute Analysis Services DDL Task
in Integration Services.
The
synchronization operation might be slow in situations where there is a
lot of data in a cube or where there are a lot of differences between
the source and the destination. However, the major advantage of using
this approach is that users can continue to query the destination server
while synchronization takes place, although when synchronization has
completed there will be a (usually short) period where new users cannot
connect and new queries cannot be run, while the old version of the
database is replaced by the new version.
Note that synchronization
cannot be used at the same time as lazy aggregation building is taking
place because there is a risk of database corruption.
Use backup/restore:
Even if this approach seems crude when compared with the more elegant
Synchronize XMLA command, it has some advantages. For a start, we're
going to be backing up our database anyway after it has been processed
so using the backup to copy the database to another server will require
little extra development. Secondly, we can take the same backup file and
copy it to multiple destination servers, where they can be restored in
parallel.
A problem with this
approach is that if we try to restore over our existing database on the
destination server then we cannot query that database while the restore
is taking place. One way of working around this would be to restore to a
different database to the one that users are currently querying, and
then to redirect all new queries to this second database after the
restore has completed. This would be feasible if the only client tool
used was Reporting Services, where connection strings can be
parameterized, but not if a client tool like Excel was used. Another
issue is that we have to copy the entire database over the network even
if there is only a small amount of new data, and if the backup file is
very large then this might take a relatively long time and require extra
disk space.
Use Attach/Detach:
This method simply involves detaching the database from the source
server, copying the database files over to the destination server,
detaching the old version of the database on the destination server and
then reattaching the new version of the database. Since attaching and
detaching a database is a very fast operation this method involves the
minimum of downtime; it also has the advantage that if a database is
attached to one instance with its ReadModeMode property set to ReadOnly, then the same database files can be attached to multiple other instances in the same way.
The main problem with this
approach is that when we detach a database it cannot of course be
queried at all, and any open connections to it are closed. This may lead
to errors in some client tools and, at best, all users will have to
reconnect to the server once the attach operation has completed.
Our recommendation is to use the Synchronize command if it runs fast enough and doesn't result in queries being blocked for too long. If we can't use Synchronize because it takes too long, or we don't mind that connections will be dropped, then attach/detach is the best option.