Excel VBA for safe pivot table report delivery

As mentioned in the previous Excel VBA post, I would like to share my thoughts on safe delivery of Excel pivot table reports. The best approach I can recommend right now is an Excel file with a macro that, on the file-open event, hides all the rows in the pivot tables, so the user cannot see the data that was pulled and saved into the sheet by someone else last time. The user has to refresh the data, and that is where the provided credentials come into play. The next thing I always do is put a password on the VBA code in the file ( you can find a tutorial for this step here ). That way you ensure the user cannot see the VBA macro code and cannot do anything other than execute it.


Private Sub Auto_Open()

    Dim pf As PivotField
    Dim PT As PivotTable
    Dim WS As Worksheet

    ' Some pivot table / field properties raise errors depending on the
    ' field type, so skip over those instead of stopping
    On Error Resume Next

    ' Lock down every pivot table in the workbook so the user cannot
    ' browse the data cached from the last refresh
    For Each WS In ActiveWorkbook.Worksheets
        For Each PT In WS.PivotTables
            With PT
                .ManualUpdate = True
                .EnableDrilldown = False
                .EnableFieldList = False
                .EnableFieldDialog = False
                .EnableWriteback = False

                ' Prevent dragging fields around the pivot table layout
                For Each pf In .PivotFields
                    With pf
                        .DragToPage = False
                        .DragToRow = False
                        .DragToColumn = False
                        .DragToData = False
                        .DragToHide = False
                    End With
                Next pf
            End With
        Next PT
    Next WS

End Sub


Excel VBA macro for report sheet injection

Today I would like to talk a bit about Excel VBA macros. Recently I needed to come up with a macro for an Excel 2016 report sheet that would inject this report sheet into every Excel file in a directory. The report sheet obviously contains data connections, which are also distributed to the end users. For distribution purposes, the macro replaces the report sheet injected last time with the current version ( this way, you can easily develop and distribute new versions of the report sheet to your users ). Sounds like some pretty useful stuff, doesn't it?

I need to mention that I am not a dedicated VBA developer, so you might find the attached code a little rough, but it works perfectly. The code was also checked by the Office Code Compatibility Checker tool ( available here ). This is a pretty cool tool from Microsoft ensuring your VBA code is compatible between MS Office versions and handles the classic 32/64-bit issue. You just launch the tool and it automatically rewrites and comments your code.

Now let's get to the data connections, as there are definitely more options. I experimented with these two: you can build the .odc connection in the workbook from the .odc connection on the target machine, or you can build the connection string yourself in VBA code. I chose the second option in the final version, but I leave this up to your consideration and whatever better suits your environment. You can also see that the Auto_Open VBA macro is executed for every file injection. I will focus on that next time; you can just skip this step for now.

I think I will do more posts on Excel VBA macros in the future. I would also like to share my experience with the ways you can deliver and secure the data delivered to your Excel report users using VBA.

Sub Distribute_Sheet()

    Dim MyFolder As String
    Dim SrcWks As Worksheet
    Dim Wkb As Workbook
    Dim file As Variant
    Dim delfile As Object
    Dim strConnection As String
    Dim WBC As WorkbookConnection

    Application.DisplayAlerts = False

    Set SrcWks = ThisWorkbook.Worksheets("_yourFileName_")
    MyFolder = ThisWorkbook.Path & "\"

    ' Iterate over all .xlsx files in this workbook's folder,
    ' skipping the source workbook itself
    file = Dir(MyFolder)
    While (file <> "")
        If InStr(file, "_yourFileName_") = 0 And InStr(file, ".xlsx") > 0 Then

            Set Wkb = Workbooks.Open(Filename:=MyFolder & file, ReadOnly:=False)

            ' Run the Auto_Open macro against the freshly opened workbook.
            ' The macro name must be passed to Application.Run as a string;
            ' it is assumed to live in this workbook, since the opened
            ' .xlsx files cannot hold macros
            Application.Run "'" & ThisWorkbook.Name & "'!Auto_Open"

            MsgBox "Injecting _yourWorksheetName_ sheet with ext. data connection into: " & MyFolder & file

            ' Remove the previously injected version of the report sheet, if present
            For Each delfile In Wkb.Sheets
                If delfile.Name = "_yourWorksheetName_" Then delfile.Delete
            Next delfile

            ' Rebuild the OLEDB connection string for every connection
            ' not yet pointing to the OLAP source
            For Each WBC In Wkb.Connections
                strConnection = WBC.Name
                If InStr(WBC.OLEDBConnection.Connection, "_yourConnOLAPName_") = 0 Then
                    With Wkb.Connections(strConnection).OLEDBConnection
                        .Connection = .Connection & ";CustomData=""_yourConnOLAPName_"""
                        .RefreshOnFileOpen = True
                        .SavePassword = False
                        .SourceConnectionFile = ""
                        .MaxDrillthroughRecords = 1
                        .AlwaysUseConnectionFile = False
                        .RetrieveInOfficeUILang = False
                    End With
                    Wkb.Connections(strConnection).Description = ""
                    Wkb.Connections(strConnection).Refresh
                End If
            Next WBC

            ' Inject the current version of the report sheet as the first sheet
            SrcWks.Copy Before:=Wkb.Sheets(1)

            Wkb.ShowPivotTableFieldList = False

            Wkb.Close SaveChanges:=True

        End If
        file = Dir
    Wend

    Application.DisplayAlerts = True

End Sub

SSRS reports personalisation

I would like to talk here about SSRS report personalisation. The point is that there are situations when you feel you could do a little more to ease the use of some specific reports.

For example, you have 20 additional columns, and you can show or hide these columns based on what the user chose to see last time. Obviously this solution must also contain a parameter listing all the possible columns, called ShowFields. Now, we know that SSRS 2008 lets you receive parameterized reports based on a subscription to reports that run on demand. When you subscribe to a report that runs on a snapshot basis, your subscription must use the parameter values defined in the snapshot. But what if you cannot use subscriptions?

I came across two scenarios where report subscription was not possible. One was a browser-based CRM / ERP type application which used embedded SSRS reports running from an iframe in the app, and the second was a custom-built container-browser for an SSRS reports project. Personally, I don't feel these options are that rare, as both were used at my last employer, and the second one, the custom-built container for SSRS reports, was a front-end to a pretty robust financial Data Warehouse solution.

So what will be the steps in this process?

1. If the user runs the report for the first time, the ShowFields multi-valued report parameter shows all column names for this report. The user may change the values chosen in the ShowFields parameter, and these are sent, together with the SSRS global variables holding the Windows login of the user and the report name, to the stored procedure ReportColumnParameterCache_DeleteInsert called at the end of the stored procedure returning this report's dataset.

2. This SP deletes the cache from last time and inserts the current values.

3. The next time the user runs the report, the default values of the ShowFields multi-valued parameter are filled from a UDF retrieving the cached results for the current user and the current report.

4. The process repeats, except that now the parameter contains the fields chosen by the user last time.

Let me guide you through the steps needed to accomplish this report personalisation. Let's start with some DB back-end stuff. When I was developing this solution, I prepared a report-level configuration table rpt.Report holding Report_ID INT set as an identity column, ReportName, ReportDescription, an IsActive flag and an IsVisible flag ( the latter for report listing purposes in the application: this way you can really quickly state that a report is under construction, is obsolete etc. ). The sky is the limit here. The second configuration table, rpt.ReportColumn, is used for report column configuration and holds ReportColumn_ID INT as identity, Report_ID from the rpt.Report table, ReportColumnName, an IsActive flag, an IsVisible flag for country-specific purposes ( when a column might never be needed by users from some country in a multi-country environment ) and an IsAdditional flag determining whether the column is an additional or a non-additional column. The next step might be a little time consuming, as you must fill these tables with values, but you can come up with some clever string concatenation generating the INSERT statements in Excel.
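To make this concrete, here is a minimal sketch of the two configuration tables described above ( the column types and lengths are my assumption, adjust them to your environment ):

CREATE TABLE rpt.Report (
    Report_ID INT IDENTITY(1,1) PRIMARY KEY,
    ReportName NVARCHAR(200) NOT NULL,
    ReportDescription NVARCHAR(1000) NULL,
    IsActive BIT NOT NULL DEFAULT (1),
    IsVisible BIT NOT NULL DEFAULT (1) -- hide reports under construction / obsolete ones from the app listing
);

CREATE TABLE rpt.ReportColumn (
    ReportColumn_ID INT IDENTITY(1,1) PRIMARY KEY,
    Report_ID INT NOT NULL REFERENCES rpt.Report (Report_ID),
    ReportColumnName NVARCHAR(200) NOT NULL,
    IsActive BIT NOT NULL DEFAULT (1),
    IsVisible BIT NOT NULL DEFAULT (1),   -- country-specific visibility
    IsAdditional BIT NOT NULL DEFAULT (0) -- additional vs. non-additional column
);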

Now you need to create a table holding the report parameter cache. It must be a quick and narrow one for some solid performance. I called mine rpt.ReportColumnParameterCache and it had just ReportColumnParameterCache_ID as an identity column, ReportColumn_ID, and a ReportUser NVARCHAR(100) column holding the Windows user login ( the SSRS UserID global variable ). MSDN describes UserID as "The ID of the user running the report. If you are using Windows Authentication, this value is the domain account of the current user. The value is determined by the Reporting Services security extension, which can use Windows Authentication or custom authentication."
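Again a minimal sketch, under the same assumptions as above:

CREATE TABLE rpt.ReportColumnParameterCache (
    ReportColumnParameterCache_ID INT IDENTITY(1,1) PRIMARY KEY,
    ReportColumn_ID INT NOT NULL REFERENCES rpt.ReportColumn (ReportColumn_ID),
    ReportUser NVARCHAR(100) NOT NULL -- Windows login from the SSRS UserID global variable
);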


And what are the next steps? We have to create a stored procedure ( let's name it rpt.ReportColumnParameterCache_DeleteInsert ) which takes its input parameters from the SSRS report parameters: UserID and ReportName ( global variables ) and ColumnNames ( a value like Column1,Column2,Column3 coming from the ShowFields multi-valued report parameter explained in the next paragraph ). First, this SP deletes your login-related cache for this report from the last time. I remember doing some basic measurements using time statistics, and the scenario with deleting and inserting the cache rows was a little faster in my environment than any other option available. In the next step, this SP transforms the ReportName into the Report_ID from rpt.Report and the ColumnNames into ReportColumn_IDs, using a table-valued UDF and an inner join to rpt.ReportColumn based on ReportColumnName and ReportName. Here are some examples of how to implement a custom split-string function, or in case you are lucky and already using MSSQL 2016, you can go with the built-in STRING_SPLIT function. In the last step, this SP inserts these values into the rpt.ReportColumnParameterCache table. This SP has to be called at the end of the stored procedure used for retrieving the data for the report's main dataset.
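A simplified sketch of how this procedure could look ( it assumes STRING_SPLIT from SQL Server 2016; swap in your custom split function on older versions ):

CREATE PROCEDURE rpt.ReportColumnParameterCache_DeleteInsert
    @UserID      NVARCHAR(100),
    @ReportName  NVARCHAR(200),
    @ColumnNames NVARCHAR(MAX) -- e.g. 'Column1,Column2,Column3'
AS
BEGIN
    SET NOCOUNT ON;

    -- delete the user's cached selection for this report from the last run
    DELETE pc
    FROM rpt.ReportColumnParameterCache AS pc
    INNER JOIN rpt.ReportColumn AS rc ON rc.ReportColumn_ID = pc.ReportColumn_ID
    INNER JOIN rpt.Report AS r ON r.Report_ID = rc.Report_ID
    WHERE pc.ReportUser = @UserID
      AND r.ReportName = @ReportName;

    -- cache the columns chosen this time
    INSERT INTO rpt.ReportColumnParameterCache (ReportColumn_ID, ReportUser)
    SELECT rc.ReportColumn_ID, @UserID
    FROM STRING_SPLIT(@ColumnNames, ',') AS s
    INNER JOIN rpt.ReportColumn AS rc ON rc.ReportColumnName = LTRIM(RTRIM(s.value))
    INNER JOIN rpt.Report AS r ON r.Report_ID = rc.Report_ID
    WHERE r.ReportName = @ReportName;
END;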

On the SSRS side, we need to create a report parameter holding multiple string values, called ShowFields. This parameter should have its available values filled from a query listing the available columns for this report. The default values are the values you have stored in the rpt.ReportColumnParameterCache table for your Windows login and this report, so in this case it is the result of a function containing a select where the ReportUser value equals your Windows login ( Globals!UserID ) and the report name ( Globals!ReportName ) equals the value in the rpt.Report table.
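The function feeding the default values could be sketched like this ( the function name is just my example ):

CREATE FUNCTION rpt.ReportColumnParameterCache_Get
(
    @UserID     NVARCHAR(100),
    @ReportName NVARCHAR(200)
)
RETURNS TABLE
AS
RETURN
(
    SELECT rc.ReportColumnName
    FROM rpt.ReportColumnParameterCache AS pc
    INNER JOIN rpt.ReportColumn AS rc ON rc.ReportColumn_ID = pc.ReportColumn_ID
    INNER JOIN rpt.Report AS r ON r.Report_ID = rc.Report_ID
    WHERE pc.ReportUser = @UserID
      AND r.ReportName = @ReportName
);

The dataset behind the parameter's default values then simply selects ReportColumnName from this function, with Globals!UserID and Globals!ReportName mapped to its parameters.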

The last step is to set the visibility of the affected columns in the SSRS report table / matrix based on an expression saying something like:

=iif(instr(Join(Parameters!ShowFields.Value, ", "), "Column1") <> 0 and instr(Join(Parameters!ShowFields.Value, ", "), "None") = 0, false, true)

Rebuild / reorganize those indexes

Lately I brought back to life a script that I would like to briefly talk about. It is made of dynamic T-SQL code which I find pretty neat and really useful when it precedes OLAP processing tasks ( ideally together in an SSIS package with a consecutive statistics update task using sp_updatestats ).

I am not going to share the whole thing, but I will give some hints how to build this code for yourself.

In the declaration part, you set the variables with the fragmentation thresholds, which determine whether an index is to be rebuilt or just reorganized, and the minimal index page count. Just a hint here: we currently have this set up as a 14 percent threshold to reorganize, 35 to rebuild and 500 as the minimal page count.

The next step is to create a temp table which will hold the index names and the dynamically generated rebuild / reorganize statements. After this table is filled, you iterate through it and execute the statement for each row.
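A hedged skeleton of that part might look like this ( the variable and temp table names are my own; the SELECT from the next paragraph is what fills the temp table ):

DECLARE @reorg_pct   FLOAT = 14;  -- reorganize above this fragmentation percentage
DECLARE @rebuild_pct FLOAT = 35;  -- rebuild above this fragmentation percentage
DECLARE @pagecnt     INT   = 500; -- ignore indexes smaller than this page count
DECLARE @sql         NVARCHAR(MAX);

CREATE TABLE #IndexMaintenance
(
    key_name          SYSNAME NULL,
    index_maintenance NVARCHAR(MAX) NOT NULL
);

-- fill the temp table here: INSERT INTO #IndexMaintenance (key_name, index_maintenance)
-- followed by the key_name and index_maintenance columns of the query shown below

DECLARE maint_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT index_maintenance FROM #IndexMaintenance;
OPEN maint_cur;
FETCH NEXT FROM maint_cur INTO @sql;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sys.sp_executesql @sql; -- run each generated ALTER INDEX statement
    FETCH NEXT FROM maint_cur INTO @sql;
END;
CLOSE maint_cur;
DEALLOCATE maint_cur;
DROP TABLE #IndexMaintenance;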

The main magic, however, happens in this query:

SELECT
b.name as key_name
,a.avg_fragmentation_in_percent
,a.page_count, c.name as Tab_name
,d.name as Schema_name
,a.partition_number as Partition_Number
,'ALTER INDEX [' + b.name + '] ON [MyDB].[' + d.name + '].[' + c.name + '] '
+ CASE
WHEN a.avg_fragmentation_in_percent BETWEEN @reorg_pct AND @rebuild_pct
THEN 'REORGANIZE'
WHEN a.avg_fragmentation_in_percent > @rebuild_pct
THEN 'REBUILD'
END
+ CASE
WHEN a.partition_number > 1
THEN ' PARTITION = ' + CAST(a.partition_number AS NVARCHAR)
ELSE ''
END
AS index_maintenance
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) a
JOIN sys.indexes b ON b.object_id = a.object_id AND b.index_id = a.index_id
JOIN sys.objects c ON c.object_id = b.object_id
JOIN sys.schemas d ON d.schema_id = c.schema_id
WHERE a.index_id > 0 AND c.name NOT LIKE 'sys%' AND a.page_count > @pagecnt
AND a.avg_fragmentation_in_percent > @reorg_pct

Excel Fuzzy matching Add-in

Fuzzy is an adjective and it means “having shapes that do not have clear edges”.

Some time ago, I was asked by my boss to look up our company clients in the Panama Papers database extract, recently published on the internet. The challenge seemed pretty clear: hundreds of thousands of company clients from the region sitting in a CRM MSSQL database table on one side, and about 1000+ clients on the Panama Papers side in a .xls file. The problem was that there were quite a few company-type abbreviations, i.e. Company XXX AG, Company XXX GMBH, and there was only very limited time to come up with some results.

The first thing that came across my mind was an SSIS fuzzy lookup data flow task, but this was a one-off, in-a-hurry task, so there was no time for setting up a full-size SSIS fuzzy matching project.

The second thought pointed me to the Excel add-in I am going to write about today.

Here is the link to the Microsoft download site where you can download this add-in.

I am pretty sure this is no rocket science and you will find using this tool pretty straightforward, but let me describe my personal experience.

So first, you need to run the setup.exe installer and restart Excel, then go to File -> Options -> Add-Ins -> Manage Excel Add-Ins and enable this add-in. Once you have the add-in in the tabs section, you are ready for some fuzzy matching magic.

At this phase you need to copy the two datasets into two separate sheets, and you also need to create table objects on top of the datasets ( Insert -> Table 🙂 ).

The next step is to choose the columns to be matched and the number of matched results per row from the left table.

What I suggest is to start with a small dataset and try to find the perfect Similarity Threshold between this small dataset and the set containing the results you are looking for, in this case the Panama Papers dataset. Once you are happy with the results, you can apply this setup to the whole dataset of your clients. As you will notice while the matches are being processed, your Excel might freeze or stop responding. No worries, just let it run and keep waiting for the results!

In my case, threshold 85 was the perfect choice. I also got some false-positive results, but at least I was sure I was not missing any clients, and deleting those results is quite quick anyway.

Please note that you have to set your Windows locale settings ( language, formats and region ) to US, otherwise this add-in will not deliver any results and Excel will very likely crash.