Creating Partitions Using Dynamic SQL

Recently I wrote a post on partition switching, however, one of the big pieces I left out was how to go about properly partitioning your database tables. Unfortunately, partitioning strategy can vary greatly, but generally your partitions will be aligned with your clustered index (quite frequently date). I recommend reading the Fast Track reference guide for guidance on how to layout file groups efficiently.

So now that we have that clear, the next question is – how is the data actually partitioned? Well there is three pieces to be aware of, the first is the partition function. A partition function is used to map the rows of a table into specific partitions (it’s worth noting that you can now have up to 15,000 partitions per table in SQL Server 2012, versus 1,000 per table in SQL 2005 & 2008). In its simplest form you can define your partition function with a rather simple SQL statement.

--MSDN Example
CREATE PARTITION FUNCTION myRangePF1 (int) AS RANGE LEFT FOR VALUES (1, 100, 1000);

This example creates 4 partitions (yes four, not three as it would appear to create) for an integer value with each number in the range representing the left side of the boundary:

Partition # Column Values
 1  Column <= 1
 2  1 < Column <= 100
 3  100 < Column <= 1000
 4 Column > 1000

However, depending the size of our tables and complexity of our data sometimes a model this simplistic doesn’t always work. In these situations we can use dynamic SQL to create our partition function. In this next example we are creating a partition for every month in a fact table which has an integer key for date. Our date key format is YYYYMMDD.

--Create date partition function with increment by month.
DECLARE @DatePartitionFunction nvarchar(max) = N'CREATE PARTITION FUNCTION FactSalesPartitionFunction (int) AS RANGE RIGHT FOR VALUES (';
DECLARE @i datetime2 = '19900101';
WHILE @i < '20500101'
BEGIN
                SET @DatePartitionFunction += '''' + CAST(DATEPART(year, @i)*10000 + DATEPART(month, @i)*100 + DATEPART(day, @i) AS varchar(8)) + '''' + N', ';
                SET @i = DATEADD(MM, 1, @i);
END
SET @DatePartitionFunction += '''' + CAST(DATEPART(year, @i)*10000 + DATEPART(month, @i)*100 + DATEPART(day, @i) AS varchar(8)) + '''' + N');';
EXEC sp_executesql @DatePartitionFunction;
GO

Once we have our partition function created we then need to create a partition scheme which allows us to map our database partitions created by the partition function to file groups. Again this can be as simple or complex as you would like it to. You can map all your partitions to one file group:

CREATE PARTITION SCHEME myRangePS1 AS PARTITION myRangePF1 ALL TO ( testfg );

You can also map each partition to a different file group (note: you have to have one file group specified for each partition, however, you are allowed to reuse file groups):

--MSDN Example
CREATE PARTITION SCHEME myRangePS1 AS PARTITION myRangePF1 TO ( test1fg, test2fg, test3fg, test1fg );

Or, you can once again write dynamic SQL to lay out your partitions into file groups. In this example we take the  720 partitions created in our previous partition function and evenly spread them across 12 file groups (which would ideally be spread across 12 drives):

DECLARE @DatePartitionScheme nvarchar(max) = N'CREATE PARTITION SCHEME FactSalesPartitionScheme AS PARTITION FactSalesPartitionFunction TO (';
DECLARE @i datetime2 = '19900101';
DECLARE @m int = 12
WHILE @i <= '20500101'
BEGIN
                SET @DatePartitionScheme += '''' + 'Sales_' + CAST((@m % 12) + 1  AS varchar(2)) + '''' + N', ';
                SET @i = DATEADD(MM, 1, @i);
                SET @m = @m + 1           
END
SET @DatePartitionScheme += '''' + 'Sales_' + CAST((@m % 12) + 1  AS varchar(2)) + '''' + N');';
EXEC sp_executesql @DatePartitionScheme;
GO

Once the partition function and the partition scheme is created the next task is to either recreate your clustered index on partition scheme or create a new table on the partition scheme -

--Creating a clustered index using a partition scheme
CREATE CLUSTERED INDEX [ixTimeKeyClustered] ON [Sales].[FactSales] (  [TimeKey] ASC )
WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS  = OFF, ALLOW_PAGE_LOCKS  = OFF) 
ON [FactSalesPartitionScheme]([TimeKey]) 
GO 

--Creating a table (heap) using a partition scheme (MSDN Example)
CREATE TABLE PartitionedTable (col1 int, col2 char(10)) ON myRangePS1 (col1);
GO

I’m Presenting at SQL Saturday #160!

Great news! I found out this week that I will be presenting at SQL Saturday #160. During my presentation I will be covering how to load balance your SSAS databases using a unique approach called Analysis Services Load Balancing (ASLB). Microsoft uses a variation of this method internally, and I’m excited to discuss the topic with all of you.

Introduction to Partition Switching

Quite often when building a data warehouse we are given the challenge of having to delete existing data before importing the latest and greatest. In an ideal world we would just insert new records and be on our way, but quite often that is not the case. Deleting data isn’t the best option either as by default a DELETE statement takes an exclusive lock on the table it is being executed against. For smaller warehouses you can often work around this by use of the NOLOCK query hint, however, there is a better way to remove data from a table. Partition switching is that its an almost instantaneous schema change (behind the scenes there is actually a pointer change happening inside the filegroup) that equates to selectively truncating data from your table.

There are a few important requirements to be aware of when exporing partition switching:

  • The receiving table must already exist in the table, be empty, and have the same column structure, indexes, constraints, and partition scheme the source table.
  • Both the source and destination tables or partitions must be in the same file group, the pointer switch which happens behind the scenes is not possible without this.
  • There cannot be any XML or Full Text indexes on the source table
  • There can be no foreign key relationships between the source and destination tables, nor can there be any foreign key relationships which reference the source table.
  • Tables involved in the partition switch can not be sources of replication.
  • Triggers cannot be fired during the switch.
  • Full details on the rules of partition switching can be found on MSDN

So now that we understand at a high level what partition switching is and its limitation, lets look at how we eliminate data using partition switching. First we are going to make two identical copy of our partitioned table. This can easily be automated using a stored procedure, or if your table structure never changes these tables can be created one time and a truncate can be used instead of a drop at the end of the process.

Once the destinations to be used for partition switching are created, the next step is to load new and existing data you wish to keep into one of your duplicates. In my example I am taking 2010 data from a source system using an ETL process and also doing an INSERT INTO … SELECT from FactSales to FactSales#Keep with the 2010 data I do not want to eliminate.

Once all the data we would like to keep is staged into FactSales#Keep the next step is to switch the partition from the orginal table, FactSales, into the elimination table, FactSales#Elim by executing ALTER TABLE FactSales SWITCH PARTITION <source_partition_number_expression> TO FactSales#Elim <target_partition_number_expression>. Since these partitions are in the same file groups SQL Server simply does a pointer switch on the filesystem to achieve this.

Upon swapping out the partition we wish to eliminate we now have an empty partition in the original FactSales table, this allows us to now swap new data in. So we will execute the same switch only this time swapping data in from FactSales#Keep into FactSales.

Once we have successfully loaded FactSales, we can drop the two temporary tables we created at the beginning of the excercise, thus allowing us to replace existing data in our fact table with minimal impact to performance in the warehouse.

 

Indexing Data Warehouses in SQL Server 2012

This week I was at the Detroit MTC for a SQL Server 2012 release event where Rick Brewis (Microsoft) and my friend Josh Fennessy (BlueGranite) presented on the new functionality of SQL Server 2012. One of the highlights Rick’s presentation was the new Columnstore Index, and the huge performance increases it yields over existing models. After seeing the performance increases I started wondering how the Columnstore Index compared to other common indexing strategies available in the SQL Server engine in a data warehouse.
Continue reading →

Checking IP Against CIDR Netmask

I’ve been in the process of improving the functionality in ASLB, one of the improvements I have been working on adding is functionality for ASLB to be smart enough to understand geo-location in a large network.

However in doing so  I went through many iterations on how to identify which network a user is in for picking the server priority. The result was to create a network table and a scalar function which returned a bit if the provided IP address was in the supplied sub network. Below is the function I used to accomplish this. With a little more work I may modify the function to accept the netmask in CIDR notation instead of two separate columns.

-- ==================================================================================
-- Author:		Anthony Mattas
-- Create date: 1-28-2012
-- Description:	This function takes an IP address, CIDR network, and CIDR netmask
--		(i.e. 192.168.0.0/16) and checks to see if the IP address is in the 
--              network
-- ==================================================================================
CREATE FUNCTION [dbo].[udf_CheckNetmask]
(
	@Address VARCHAR(16),
	@Network VARCHAR(16),
	@Netmask INT
)
RETURNS BIT
AS
BEGIN
	DECLARE @Octet1 INT
	DECLARE @Octet2 INT
	DECLARE @Octet3 INT
	DECLARE @Octet4 INT

	DECLARE @BAddress BINARY(4)
	DECLARE @BNetwork BINARY(4)

	DECLARE @Return BIT

    SELECT @BAddress = CAST( CAST( PARSENAME( @Address, 4 ) AS INTEGER) AS BINARY(1))
            + CAST( CAST( PARSENAME( @Address, 3 ) AS INTEGER) AS BINARY(1))
            + CAST( CAST( PARSENAME( @Address, 2 ) AS INTEGER) AS BINARY(1))
            + CAST( CAST( PARSENAME( @Address, 1 ) AS INTEGER) AS BINARY(1)) 

    SELECT @BNetwork = CAST( CAST( PARSENAME( @Network, 4 ) AS INTEGER) AS BINARY(1))
        + CAST( CAST( PARSENAME( @Network, 3 ) AS INTEGER) AS BINARY(1))
        + CAST( CAST( PARSENAME( @Network, 2 ) AS INTEGER) AS BINARY(1))
        + CAST( CAST( PARSENAME( @Network, 1 ) AS INTEGER) AS BINARY(1)) 

	SELECT @Return =
		CASE
			WHEN
			0 = (cast(@BAddress as bigint) ^ cast(@BNetwork as bigint))
				&
				~(power(CAST(2 AS bigint), 32 - @Netmask) - 1)
			THEN 1
			ELSE 0
		END

	RETURN @Return

END

GO