TSQL code to list all columns in a DB containing only NULLs

A colleague of mine recently asked whether I knew a quick way to get a list of all the columns in a database that contain only NULLs. Well, here’s a script I came up with. I am pretty sure there are other (even better) options out there, but I liked this one, so I decided to share the code here.

By the way, the blog post on posting MSSQL logger values into Elasticsearch using Python is coming soon, so stay tuned 🙂

DECLARE @ID int;
DECLARE @query1 nvarchar(MAX);
DECLARE @TableName nvarchar(255);
DECLARE @ColumnName nvarchar(255);
DECLARE @SchemaName nvarchar(255);

USE AdventureWorksDW2014; --USE THE DB YOU ARE INTERESTED IN
SET NOCOUNT ON;

IF OBJECT_ID('tempdb..#tmp_counts') IS NOT NULL
DROP TABLE #tmp_counts;

IF OBJECT_ID('tempdb..#RESULTSET') IS NOT NULL
DROP TABLE #RESULTSET;

CREATE TABLE #tmp_counts (
ID int IDENTITY (1,1) PRIMARY KEY,
SchemaName nvarchar(255),
TableName nvarchar(255),
ColumnName nvarchar(255));

CREATE TABLE #RESULTSET (
SchemaName nvarchar(255),
TableName NVARCHAR(255),
ColumnName NVARCHAR(255),
[CountOfRows] int,
[CountOfRows_NULL] int);

INSERT INTO #tmp_counts
SELECT
s.name [Schema],
t.name [TableName],
c.name [ColumnName]
FROM sys.tables t
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
INNER JOIN sys.columns c ON c.object_id = t.object_id
WHERE c.Is_Nullable = 1
ORDER BY 1,2;
WHILE EXISTS (SELECT * FROM #tmp_counts)

BEGIN

SELECT
TOP(1) @ID = ID,
@SchemaName = Schemaname,
@TableName = TableName,
@ColumnName = ColumnName
FROM #tmp_counts
ORDER BY ID;

SELECT @query1 =
'INSERT INTO #RESULTSET  SELECT '''+@SchemaName+''','''+ @TableName+ ''' AS TableName,'''+@ColumnName+ ''' AS ColumnName,
(SELECT COUNT(*) AS CountOfRows FROM ['+@SchemaName+'].['+@TableName+'] WITH (NOLOCK)),
(SELECT COUNT(*) AS CountOfRows_NULL FROM ['+@SchemaName+'].['+@TableName+'] WITH (NOLOCK) WHERE ['+@ColumnName+'] IS NULL);';

EXEC (@query1);

DELETE FROM #tmp_counts WHERE ID = @ID;
END

--YOUR FINAL QUERY IS HERE:
SELECT 'Column with all rows NULL' [Check Description], *
FROM #RESULTSET
WHERE CountOfRows > 0 AND
CountOfRows = CountOfRows_NULL;
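
Just for illustration, one iteration of the loop builds a statement like the one below (dbo.DimCustomer and its nullable MiddleName column are used here only as an example combination that could land in the temp table):

INSERT INTO #RESULTSET  SELECT 'dbo','DimCustomer' AS TableName,'MiddleName' AS ColumnName,
(SELECT COUNT(*) AS CountOfRows FROM [dbo].[DimCustomer] WITH (NOLOCK)),
(SELECT COUNT(*) AS CountOfRows_NULL FROM [dbo].[DimCustomer] WITH (NOLOCK) WHERE [MiddleName] IS NULL);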


Posting SQL Server ETL Logging and Monitoring table data into ElasticSearch using Python part 1 of 3

Lately I’ve been working on a very specific project: the idea is to pass our ETL logging and monitoring MSSQL table data into ElasticSearch, let the TPM build Kibana dashboards on top of it, and raise events that go into PagerDuty. In this series, I will focus on the first part, the push of the ETL logging and monitoring table data into the ElasticSearch document DB using a Python script. The kudos here goes to a colleague of mine who helped me out with the Python code, as I am still a Python student. By the way, if you’re interested in learning Python, here is a link to a great book online. (Personally, I like playing around with Python data science projects the most, and I hope to write a post or two on that topic as soon as possible.)

This first blog post covers the installation of the MSSQL table that can be used for logging.

Part 2 of this series will handle the Python code, and the last part will wrap the project up.

So here is an installation script for inspiration, with some dummy data inserts:


USE [master]
GO

CREATE DATABASE [Logger]
GO

USE [Logger]
GO

CREATE SCHEMA [LogMon]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [LogMon].[LogMonitor](
[LogMonitor_ID] [int] IDENTITY(1,1) NOT NULL,
[Execution_UTC_Datetime] [datetime] NULL,
[Source_File_Processed_Datetime] [datetime] NULL,
[Source_FileName_Timestamp_Date] [date] NULL,
[Source_FileName] [nvarchar](255) NULL,
[Customer_Name] [nvarchar](255) NULL,
[Data_Feed_Type] [nvarchar](255) NULL,
[Server_Name] [nvarchar](255) NULL,
[Stage_Name] [nvarchar](255) NULL,
[Target_Name] [nvarchar](255) NULL,
[Check_Name] [nvarchar](255) NULL,
[Check_Value] [int] NULL,
[Moved_To_ES] [bit] NULL,
PRIMARY KEY CLUSTERED ([LogMonitor_ID] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [LogMon].[LogMonitor] ADD DEFAULT ((0)) FOR [Moved_To_ES]
GO

INSERT INTO [LogMon].[LogMonitor]
([Execution_UTC_Datetime],
 [Source_File_Processed_Datetime],
 [Source_FileName_Timestamp_Date],
 [Source_FileName],
 [Customer_Name],
 [Data_Feed_Type],
 [Server_Name],
 [Stage_Name],
 [Target_Name],
 [Check_Name],
 [Check_Value])
SELECT i.*
FROM (
   SELECT
     GETDATE() [Execution_UTC_Datetime],
     DATEADD(DAY,-1,GETDATE()) [Source_File_Processed_Datetime],
     CAST(DATEADD(DAY,-1,GETDATE()) AS DATE) [Source_FileName_Timestamp_Date],
     'Entities_'+CAST(CAST(DATEADD(DAY,-1,GETDATE()) AS DATE) AS VARCHAR(10))+'.txt' [Source_FileName],
     'Customer_Name_A' [Customer_Name],
     'Entities' [Data_Feed_Type],
     @@SERVERNAME [Server_Name],
     'Stage1' [Stage_Name],
     'dbo.Entity' [Target_Name],
     'Created Rows' [Check_Name],
     ABS(CHECKSUM(NewId())) % 1000 [Check_Value]
   UNION ALL
   SELECT
     GETDATE(),
     DATEADD(DAY,-1,GETDATE()),
     CAST(DATEADD(DAY,-1,GETDATE()) AS DATE),
     'Entities_'+CAST(CAST(DATEADD(DAY,-1,GETDATE()) AS DATE) AS VARCHAR(10))+'.txt',
     'Customer_Name_A',
     'Entities',
     @@SERVERNAME,
     'Stage1',
     'dbo.Entity',
     'Modified Rows',
     ABS(CHECKSUM(NewId())) % 1000
   UNION ALL
   SELECT
     GETDATE(),
     DATEADD(DAY,-2,GETDATE()),
     CAST(DATEADD(DAY,-2,GETDATE()) AS DATE),
     'Contacts_'+CAST(CAST(DATEADD(DAY,-2,GETDATE()) AS DATE) AS VARCHAR(10))+'.csv',
     'Customer_Name_B',
     'Contacts',
     @@SERVERNAME,
     'Stage2',
     'dbo.Contact',
     'Created Rows',
     ABS(CHECKSUM(NewId())) % 1000
   UNION ALL
   SELECT
     GETDATE(),
     DATEADD(DAY,-2,GETDATE()),
     CAST(DATEADD(DAY,-2,GETDATE()) AS DATE),
     'Contacts_'+CAST(CAST(DATEADD(DAY,-2,GETDATE()) AS DATE) AS VARCHAR(10))+'.csv',
     'Customer_Name_B',
     'Contacts',
     @@SERVERNAME,
     'Stage2',
     'dbo.Contact',
     'Modified Rows',
     ABS(CHECKSUM(NewId())) % 1000
) i
GO

And the dummy data that will be passed to ElasticSearch looks like this:

[Image: sampletable]
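
The Moved_To_ES flag (defaulted to 0 in the script above) is what the loader can use to pick up only the rows that have not been pushed yet. Just as a hedged sketch of that hand-off, and not the actual logic of the Python script coming in part 2, the T-SQL side could be as simple as:

--rows still waiting to be pushed to ElasticSearch
SELECT *
FROM [LogMon].[LogMonitor]
WHERE [Moved_To_ES] = 0;

--once the push is confirmed, flag the rows so they are not sent twice
UPDATE [LogMon].[LogMonitor]
SET [Moved_To_ES] = 1
WHERE [Moved_To_ES] = 0;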

 

SSRS cascading report parameters using MDX queries

Cascading SSRS report parameters are a common usability requirement. In this tutorial, I will demonstrate how to implement them using MDX queries. The background is that the default queries generated by the SSRS wizards are far below the standard we wish to deliver. Let’s dive in using the famous multidimensional AdventureWorks DW OLAP project.

Let’s start by creating a Dataset for the first parameter in the cascade. Start in the Dataset Query Designer, as shown below.

[Image: QueryDesigner1]

The MDX query I used looks like this:

WITH
MEMBER [Measures].[ParCaption] AS [Product].[Category].CURRENTMEMBER.MEMBER_CAPTION
MEMBER [Measures].[ParValue] AS [Product].[Category].CURRENTMEMBER.UNIQUENAME

SELECT
{[Measures].[ParCaption], [Measures].[ParValue]} ON COLUMNS,
[Product].[Category].ALLMEMBERS ON ROWS
FROM [Adventure Works]

The next step is actually creating the report parameters. The first parameter, P_ProductCategory, should be set like this:

[Images: Parameter_1, Parameter_2]

The second parameter needs to be created exactly the same way once you prepare its Dataset as described in the next step.

Continue by creating another SSRS Dataset for the second parameter in the cascade, P_ProductSubcategory. Its values get calculated on the fly as you pick the first parameter value.

WITH
MEMBER [Measures].[ParCaption] AS [Product].[Subcategory].CURRENTMEMBER.NAME
MEMBER [Measures].[ParValue] AS [Product].[Subcategory].CURRENTMEMBER.UNIQUENAME

SELECT
{[Measures].[ParCaption], [Measures].[ParValue]} ON COLUMNS,
[Product].[Subcategory].[Subcategory] ON ROWS
FROM [Adventure Works]
WHERE STRTOSET(@P_ProductCategory)

Notice the STRTOSET function. If the parameter held a single member value (for example a boolean attribute), we could use STRTOMEMBER instead. If we had multiple parameters, we would write WHERE ( STRTOSET(@P_ProductCategory), STRTOMEMBER(@ProductBooleanParameter) )

To make this work, we need to set the parameters of the second Dataset like this:

[Image: Dataset_Report]

Notice that you might run into an error (actually a VS bug) when writing the MDX query for this Dataset in the query editor, saying “The query contains the XXXXXName parameter, which is not declared.” In that case, review the forum thread here, but the solution is rather quick. Spoiler: look for the Query Parameters icon in the top menu (highlighted with an orange box in the Query Designer screenshot above) and set your parameters manually for the first time, with some default values as well; that should make things work.

The next step is creating the result dataset for the SSRS report matrix. The query I used is trivial and looks like this:

SELECT (
[Product].[Category].[Category],
[Product].[Subcategory].[Subcategory],
[Product].[Product].[Product]) ON ROWS,
[Measures].[Order Count] ON COLUMNS
FROM (
SELECT (STRTOSET(@P_Product_Category), STRTOSET(@P_Product_Subcategory)) ON COLUMNS
FROM [Adventure Works]
)

Notice that in MDX you cannot use the same dimension hierarchy more than once, so you cannot use it in the SELECT and in the WHERE clause at the same time. This is the reason I decided to go for a sub-select, but there are many other options you can easily find on the internet. And here you go: after choosing Bikes and Clothing in the Product Category parameter, you get only the relevant product subcategories. Below are a few screenshots of the simple SSRS report:

[Images: Parameter_3, Report]

TSQL histogram

Sometimes I like to fiddle around with TSQL. I am not sure how useful this trick might be, but here is some code I came up with that delivers histograms based on your data and a few variables you define. As you can see, the dataset for this script is the well-known AdventureWorks DW dbo.FactInternetSales. You can adjust the bucket count and bar chart width variables to fine-tune the output. This histogram splits the dataset into the declared number of buckets based on the ProductKey FK column values.

SET NOCOUNT ON;

USE AdventureWorksDW2014;

DECLARE @BucketCount DECIMAL(8,3) = 10;
DECLARE @BarChartWidth INT = 100;
DECLARE @iKeyCount DECIMAL(8,3) = (SELECT COUNT(DISTINCT ProductKey) FROM FactInternetSales);
DECLARE @iBucketSize DECIMAL(8,3) = @iKeyCount / @BucketCount;

SELECT
i.*,
REPLICATE('=', @BarChartWidth * i.SumSalesAmount / (SELECT SUM(SalesAmount) FROM factInternetSales)) AS [BarChart]
FROM (
     SELECT
     ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) Bucket_ID,
     MIN(ii.ProductKey) Bucket_Range_From,
     MAX(ii.ProductKey) Bucket_Range_To,
     COUNT(ii.Bucket_ProductKeys_Count) Bucket_ProductKeys_Count,
     SUM(ii.SumSalesAmount) SumSalesAmount
     FROM (
          SELECT
          ProductKey,
          CEILING(CAST((ROW_NUMBER() OVER (ORDER BY ProductKey) )/@iBucketSize AS DECIMAL(8,3))) iBucket_ID,
          COUNT(*) Bucket_ProductKeys_Count,
          SUM(SalesAmount) SumSalesAmount
          FROM FactInternetSales
          GROUP BY ProductKey
          ) ii
      GROUP BY ii.iBucket_ID
      ) i;

And the result may look like this:

[Image: histogram1]

But I would also like to see a more accurate solution. Digging deeper, I came up with code that splits the dataset into the declared buckets based on the composite PK SalesOrderNumber, SalesOrderLineNumber. (I also added precision to the decimal data type.) This code still feels quite straightforward to me, but it gets a little more complex.

SET NOCOUNT ON;

USE AdventureWorksDW2014;

DECLARE @BarChartWidth INT = 100;
DECLARE @BucketCount DECIMAL(38,18) = 10;
DECLARE @iCount DECIMAL(38,18) = (SELECT COUNT(*) FROM FactInternetSales);
DECLARE @iBucketSize DECIMAL(38,18) = @iCount / @BucketCount;

;WITH CTE AS
(
     SELECT
     i.iKey,
     i.iID,
     i.iBucket_ID,
     i.SumSalesAmount
     FROM (
          SELECT
          TOP 100 PERCENT
          CAST([SalesOrderNumber] AS VARCHAR) + '-' + CAST([SalesOrderLineNumber] AS VARCHAR) iKey,
          ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) iID,
          CEILING(CAST((ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) ) / @iBucketSize AS DECIMAL(38,18))) iBucket_ID,
          [SalesOrderNumber],
          [SalesOrderLineNumber],
          SUM(SalesAmount) SumSalesAmount
          FROM FactInternetSales
          GROUP BY [SalesOrderNumber],[SalesOrderLineNumber]
          ORDER BY CAST(REPLACE([SalesOrderNumber],'SO','') AS INT),[SalesOrderLineNumber]
          ) i
)

SELECT
i.iBucket_ID,
i.Bucket_Range_From,
i.Bucket_Range_To,
(SELECT Cfrom.iKey FROM CTE Cfrom WHERE Cfrom.iID = i.Bucket_Range_From) Bucket_Range_SalesOrderKey_From,
(SELECT Cto.iKey FROM CTE Cto WHERE Cto.iID = i.Bucket_Range_To) Bucket_Range_SalesOrderKey_To,
i.Bucket_Row_Count,
i.Bucket_Sales_Amount,
REPLICATE('=', @BarChartWidth * i.Bucket_Sales_Amount / (SELECT SUM(SalesAmount) FROM factInternetSales)) AS [BarChart]
FROM (
     SELECT
     TOP 100 PERCENT
     ii.iBucket_ID,
     MIN(ii.iID) Bucket_Range_From,
     MAX(ii.iID) Bucket_Range_To,
     COUNT(ii.iID) Bucket_Row_Count,
     SUM(ii.SumSalesAmount) Bucket_Sales_Amount
     FROM
     CTE ii
     GROUP BY ii.iBucket_ID
     ORDER BY ii.iBucket_ID
     ) i
ORDER BY iBucket_ID;

The final result is as expected ( check the Bucket_Row_Count column ) and can look like this:

[Image: histogram2]

 

Compare multiple DB schemas

Sometimes you may need to quickly examine how databases that are supposed to share the same schema actually differ. This quick query lets you find the outlying and the intersecting columns across multiple databases. If you are looking only for the outlying columns and the DBs they appear in, use the code below as it is. If you are looking for the intersection of multiple DBs, just switch from HAVING COUNT(0) <> to HAVING COUNT(0) = and remove the list of databases the columns are in (a sketch of that variant follows after the main query).

In my example, I created three DBs, Test1 through Test3, with some differences between their schemas. The key identifying a unique column is built in the Compound_Key column.

[Image: schema_compare]

IF OBJECT_ID('tempdb..#DB_Schema_Compare') IS NOT NULL
DROP TABLE #DB_Schema_Compare;
CREATE TABLE #DB_Schema_Compare
( [DB_Name] NVARCHAR(100),
[Compound_Key] NVARCHAR(Max)
)
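
--run the USE + INSERT block below once for every database you want to compare
--(in this example Test1, Test2 and Test3), changing the USE statement and the
--hard-coded database name in the SELECT each time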

USE Test3;

INSERT INTO #DB_Schema_Compare
SELECT
'Test3',
--s.name SchemaName,
--t.name TableName,
--c.name ColumnName,
--c.is_computed ColumnIsComputed,
--c.is_identity ColumnIsIdentity,
--c.is_nullable ColumnIsNullable,
--c.collation_name ColumnCollationName,
--c.max_length ColumnMaxLength,
CAST(
CAST(s.name AS VARCHAR(200)) + '^' +
CAST(t.name AS VARCHAR(200)) + '^' +
CAST(c.name AS VARCHAR(200)) + '^' +
CAST(c.is_computed AS VARCHAR(1)) + '^' +
CAST(c.is_identity AS VARCHAR(1)) + '^' +
CAST(c.is_nullable AS VARCHAR(1)) + '^' +
ISNULL(CAST(c.collation_name AS VARCHAR(200)) + '^','^') +
CAST(c.max_length AS VARCHAR(10)) + '^' +
CAST(ty.name AS VARCHAR(200))
AS VARCHAR(MAX)) CompoundKey
FROM sys.columns c
inner join sys.tables t on c.object_id = t.object_id
inner join sys.schemas s on s.schema_id = t.schema_id
inner join sys.types ty on ty.user_type_id = c.user_type_id; --join on user_type_id rather than system_type_id to avoid duplicate rows for types such as sysname

SELECT
i.*,
STUFF(( SELECT ',' + SUB.[DB_Name] AS [text()]
FROM
#DB_Schema_Compare SUB
WHERE
SUB.Compound_Key = i.Compound_Key
ORDER BY SUB.[DB_Name]
FOR XML PATH('')
), 1, 1, '' )
AS [DBs_we_HAVE_this_compound_key_in]
FROM (
     SELECT
     COUNT(0) CNT,
     Compound_Key
     FROM
     #DB_Schema_Compare
     GROUP BY [Compound_Key]
     HAVING COUNT(0) <> (SELECT COUNT(DISTINCT [DB_Name]) FROM #DB_Schema_Compare)
) i;
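
And as promised above, the intersection variant is just the flipped HAVING condition without the DB list, roughly like this:

--columns that exist with the same definition in every compared database
SELECT
COUNT(0) CNT,
Compound_Key
FROM
#DB_Schema_Compare
GROUP BY [Compound_Key]
HAVING COUNT(0) = (SELECT COUNT(DISTINCT [DB_Name]) FROM #DB_Schema_Compare);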

Waterfall chart in Excel 2013 and older

Excel 2016 delivers a waterfall chart type, but what about older versions? Here is a neat workaround I actually learned from the business folks. The yellow values come from a data model or any other data source, except for the Measure Bridge End value, which is a SUM of cell C11 and the range D12:D16. The rest of the values are calculated as shown in the comments. Hopefully all the steps taken are understandable from the full-size image located here.

[Image: capture]

TSQL Large data loads split by a declared batch size

A couple of days back, I was asked how I would use SQL grouping functions to split a huge data load into separate batches. Below is the code I came up with. The next logical step would be to load the statements into a temp table, iterate through it and execute the statements with sp_executesql (a sketch of that step follows after the query). It needs to be said that if you have big gaps of missing IDs in the PK you are scanning, this might not be the best and most accurate solution.


USE [AdventureWorks2012];
GO

DECLARE @BATCHSIZE INT = 10000;

/*CHECK THE MIN AND MAX IDs IN THE SOURCE TABLE*/
SELECT MIN(SalesOrderID) MinID,MAX(SalesOrderID) MaxID FROM [Sales].[SalesOrderHeader];
/**/

SELECT
--i.PartitionID,
--MIN(i.SalesOrderID) MinID,
--MAX(i.SalesOrderID) MaxID,
'INSERT INTO TARGET(..) SELECT COL1,COL2,.. FROM SOURCE WITH (NOLOCK) WHERE SalesOrderID BETWEEN '+
CAST(MIN(i.SalesOrderID) AS VARCHAR(10)) + ' AND ' +
CAST(MAX(i.SalesOrderID) AS VARCHAR(10)) + '; '
AS STMT
FROM
(
	SELECT
	SalesOrderID,
	SalesOrderID / @BATCHSIZE PartitionID
	FROM [Sales].[SalesOrderHeader] WITH (NOLOCK)
) i
GROUP BY i.PartitionID
--ORDER BY i.PartitionID;
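
And here is a minimal sketch of that next logical step, loading the generated statements into a temp table and running them one by one with sp_executesql. Treat it as an assumption about how the loop could look rather than production code; the generated INSERT statements are still the placeholder ones from above, so they are only printed here:

DECLARE @BATCHSIZE INT = 10000; --re-declared so this sketch runs on its own
DECLARE @ID INT;
DECLARE @STMT NVARCHAR(MAX);

IF OBJECT_ID('tempdb..#BatchStatements') IS NOT NULL
DROP TABLE #BatchStatements;

CREATE TABLE #BatchStatements (
ID INT IDENTITY(1,1) PRIMARY KEY,
STMT NVARCHAR(MAX));

INSERT INTO #BatchStatements (STMT)
SELECT
'INSERT INTO TARGET(..) SELECT COL1,COL2,.. FROM SOURCE WITH (NOLOCK) WHERE SalesOrderID BETWEEN '+
CAST(MIN(i.SalesOrderID) AS VARCHAR(10)) + ' AND ' +
CAST(MAX(i.SalesOrderID) AS VARCHAR(10)) + '; '
FROM
(
	SELECT
	SalesOrderID,
	SalesOrderID / @BATCHSIZE PartitionID
	FROM [Sales].[SalesOrderHeader] WITH (NOLOCK)
) i
GROUP BY i.PartitionID;

WHILE EXISTS (SELECT * FROM #BatchStatements)
BEGIN
	SELECT TOP(1) @ID = ID, @STMT = STMT FROM #BatchStatements ORDER BY ID;

	--PRINT instead of EXEC while TARGET and SOURCE are still placeholders
	PRINT @STMT;
	--EXEC sp_executesql @STMT;

	DELETE FROM #BatchStatements WHERE ID = @ID;
END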

Safe and predictable MDX queries for PowerPivot data loads

I have spent most of the last month digging deep and developing in PowerQuery and PowerPivot. I must say that I am impressed, and there is still a lot more for me to learn. However, I am not that impressed with the documentation available online, as there have been some issues I could not find a useful answer to.

Today I would like to share one quick tip related to querying an OLAP database from PowerQuery or directly from PowerPivot. Imagine placing a really basic MDX query like this in the data loader section of PowerQuery or PowerPivot:

Query 1:

SELECT
[Measures].[Internet Order Count] ON 0,
NONEMPTY([Customer].[Customer].[Customer],[Measures].[Customer Count]) ON 1
FROM
(SELECT
[Sales Territory].[Sales Territory].[Group].&[Europe] ON 0
FROM
(SELECT [Reseller].[Business Type].&[Specialty Bike Shop] ON 0
FROM
[Adventure Works])
)
WHERE [Product].[Category].&[40]

The results are:

[Image: Capture2]

It could easily happen that if you choose a WHERE condition returning no rows (in this case caused by the non-existent Product Category, or if your user does not have sufficient security rights under the OLAP role), the PowerPivot model could become corrupted and lose a relationship related to one of the (not) returned columns. You can rewrite Query 1 in a much better way to ensure that you always get the needed columns, no matter how many rows are retrieved.

Query 2:

WITH
MEMBER [iCustomerName] AS [Customer].[Customer].CURRENTMEMBER.NAME
MEMBER [iCustomerKey] AS [Customer].[Customer].CURRENTMEMBER.Properties( "Key(0)" )

SELECT
{[iCustomerKey],[iCustomerName],[Measures].[Internet Order Count]} ON 0,
NONEMPTY([Customer].[Customer].[Customer],[Measures].[Customer Count]) ON 1
FROM
(SELECT
[Sales Territory].[Sales Territory].[Group].&[Europe] ON 0
FROM
(SELECT [Reseller].[Business Type].&[Specialty Bike Shop] ON 0
FROM
[Adventure Works])
)
WHERE [Product].[Category].&[40]
The results are:

[Image: Capture1]

As you can see from the next picture, for Query 1 we have no Customer Name column in the model, while for Query 2 we are just fine. I must admit that Customer Name is not a great example, since we have the Customer Key returned anyway, but I guess you can see the point I am heading at here. To stay on the safe side, write bullet-proof MDX queries!

[Image: tabular1]

Some thoughts on running PowerQuery in 32bit Excel 2013/2016 while connecting to XLS files

I really do like PowerQuery for its ease of use, its power and the self-explanatory ETL code in M. However, I must say that although Microsoft claims it is a Self Service BI tool, things can get quite complex, especially if your clients have 32-bit Excel installed on their machines. Quite often issues come up when the clients try to connect via PowerQuery to huge local Excel data source files and at some point hit the out-of-memory error message. In 32-bit Excel, you need to watch the virtual allocated memory in the Process Explorer application. This useful application can be downloaded from here. You need to catch the Excel process, and in its properties you can easily find this value. There is no sense in watching Excel memory claims from the Windows built-in Task Manager app.

Quite often you can see that the Excel committed memory charge is around 500 MB, yet you are already receiving the out-of-memory message, because the virtual memory allocated at that point can be around 1.95 GB, which I consider the point where you can be certain that Excel will crash (the point of no return). You might be lucky and the virtual memory allocation might rise up to around 2.1 GB, but you are really surfing the threshold here. Very likely you will not be able to save the file at this point either. So the recommendations based upon my personal experience are:

  • Whenever possible, try to load data from .csv instead of .xls files when you need to load local data
  • Load only the columns you are sure you will need in your data model; there is no sense in loading, for example, DWH system columns like GUIDs, PKs, system dates etc.
  • Set the PowerQuery current workbook settings in a lightweight manner, so that it does not automatically detect column data types, does not automatically create relationships between tables, does not update relationships when refreshing queries loaded to the data model and, when possible, ignores the privacy levels
  • Track the virtual allocated memory in the Process Explorer application while adding each query. You might find a query loading just a few rows that causes high virtual memory allocation for no obvious reason, and that is the perfect time to start tuning the query steps one after another. (Unfortunately I have not found a way to do more accurate query performance tracking when the queries load data from a local Excel file source; however, when loading data from other data sources, you have a pretty decent option of switching PowerQuery tracing on, loading the log text file in another PowerQuery and applying some basic transformations to get the duration of a specific query etc.)
  • When not needed, disable loading query results to a worksheet
  • Try to avoid chaotic and memory-consuming steps like adding columns, changing their data types and then removing them at the end of the query flow
  • Disable COM Excel add-ins because of their memory consumption
  • When nothing else seems to work, you can try downloading and installing the Large Address Aware capability change for Excel from here; this KB released in May 2016 raises the 32-bit Excel memory limit from 2 to 3 GB

And that’s all I can think of at the moment. Don’t forget that you should reserve at least 500 MB for PowerPivot if you plan to load the data into the data model. If I come across any more recommendations, I will share them here; however, combining these steps has served me pretty well.

sp_testlinkedserver in a try…catch block

Sometimes you definitely need to go with quick workarounds. I am pretty sure I am not the only BI developer working from time to time with legacy and somewhat wacky old code used for production purposes. This time I came across a legacy scheduled stored procedure filling a dataset for SSRS reporting purposes, calling OPENROWSET to run an MDX query against an OLAP cube, but the linked server was failing from time to time because of weak connections. Whenever the linked server call failed, there would simply be no reporting, as the MDX results were later used in an INNER JOIN 🙂 . I kind of wonder what the people writing this code were thinking back then. Anyway, I needed this SP to stop failing, to return results even if the linked server connection failed, and to be informed that the linked server call failed so I could react and persist the results from the last successful run.

The solution is quite easy and so far seems bulletproof. Let’s use this sample MDX code wrapped in an OPENROWSET as an example:

SELECT a.*
FROM OpenRowset( 'MSOLAP','DATASOURCE=myOlapServer; Initial Catalog=FoodMart;',
'SELECT Measures.members ON ROWS,
[Product Category].members ON COLUMNS
FROM [Sales]')
as a
GO

So the trick is to call sp_testlinkedserver, which tests whether we are able to connect to the specified linked server, and to run it together with the OPENROWSET query inside the TRY block. We also want to set the variable @err so we know an error happened. The code could look something like this:


DECLARE @err BIT = 0;

BEGIN TRY
EXEC sp_testlinkedserver N'MSOLAP';

SELECT a.*
INTO #results
FROM OpenRowset( 'MSOLAP','DATASOURCE=myOlapServer; Initial Catalog=FoodMart;',
'SELECT Measures.members ON ROWS,
[Product Category].members ON COLUMNS
FROM [Sales]' )
as a;

END TRY

BEGIN CATCH
SET @err = 1;
END CATCH

IF @err = 1
BEGIN
--FOR EXAMPLE USE AND PERSIST RESULTS FROM THE LAST TIME THE SP WAS SUCCESSFULLY EXECUTED..
END
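
To make the commented branch a bit more concrete, one way of persisting and reusing the last good results could look like the sketch below. The dbo.LastGood_OlapResults table is purely hypothetical, so adapt it to whatever your SSRS dataset expects:

IF @err = 0
BEGIN
	--fresh data arrived: replace the persisted copy of the results
	IF OBJECT_ID('dbo.LastGood_OlapResults') IS NOT NULL
	DROP TABLE dbo.LastGood_OlapResults;

	SELECT * INTO dbo.LastGood_OlapResults FROM #results;
	SELECT * FROM #results;
END
ELSE
BEGIN
	--the linked server call failed: serve the results from the last successful run
	SELECT * FROM dbo.LastGood_OlapResults;
END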

Disclaimer: this is just a quick-fix tutorial, and I definitely agree it is not the best example of using the try…catch block. Details on sp_testlinkedserver can easily be found here.