I really do like PowerQuery for its ease of use, its power and the self-explanatory ETL code in M. However, I must say that although Microsoft claims it's a self-service BI tool, things can get quite complex, especially if your clients have 32-bit Excel installed on their machines. Issues typically come up when clients use PowerQuery to connect to huge local Excel source files and at some point hit the out-of-memory error message.

In 32-bit Excel, the number to watch is the virtual memory allocated to the Excel process, which you can see in the Process Explorer application. This useful application is a free download from Microsoft's Sysinternals site. Find the Excel process, open its properties, and you can easily find this value. There is no sense in watching Excel's memory figures in the built-in Windows Task Manager. Quite often Excel's committed memory is only around 500 MB, yet you still receive the out-of-memory message, because the allocated virtual memory at that point is around 1.95 GB. I consider that the point of no return, where you can be certain that Excel will crash. You might be lucky and see the virtual memory allocation rise to around 2.1 GB, but you are really riding the edge of the threshold here. Very likely you will not be able to save the file at this point either.

So, based on my personal experience, the recommendations are:
- If you need to load local data, load it from .csv rather than .xls files whenever possible
- Load only the columns you are sure you will need in your data model. There is no sense in loading, for example, DWH system columns such as GUIDs, PKs, system dates, etc.
- Keep the PowerQuery settings for the current workbook lightweight: do not automatically detect column data types, do not automatically create relationships between tables, and do not update relationships when refreshing queries loaded to the data model. When possible, ignore the privacy levels
- Track the allocated virtual memory in Process Explorer as you add each query. You might find that a query loading just a few rows causes high virtual memory allocation for no obvious reason, and that’s the perfect time to start tuning the query steps one by one. (Unfortunately, I have not yet found a way to track query performance more accurately when the queries load data from a local Excel file. When loading from other data sources, however, you have a pretty decent option: turn PowerQuery tracing on, then load the resulting log text file into another PowerQuery query and apply some basic transformations to get the duration of a specific query, etc.)
- When not needed, disable loading query results to a Worksheet
- Try to avoid chaotic, memory-consuming steps such as adding columns, changing their data types and then removing them at the end of the query flow
- Disable Excel COM add-ins, because they consume memory too
- When nothing else seems to work, you can try installing Microsoft's Large Address Aware capability change for Excel. This update, released in May 2016, raises the 32-bit Excel memory limit from 2 GB to 3 GB on 32-bit Windows (Microsoft's documentation gives 4 GB for 32-bit Excel running on 64-bit Windows)
And that’s all I can think of at the moment. Don’t forget that you should reserve at least 500 MB for PowerPivot if you plan to load the data into the data model. If I come across any more recommendations, I will share them here, but combining these steps has served me pretty well so far.
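
To make the numbers in this post a bit more concrete, here is a rough Python sketch of the memory budget arithmetic. It is only an illustration under my own assumptions: the function and constant names are made up, the 50 MB safety margin is simply the gap between the 2 GB limit and the roughly 1.95 GB "point of no return" mentioned above, and I am assuming the same margin and the 500 MB PowerPivot reserve can be subtracted from the 3 GB limit as well. You would read the virtual memory value from Process Explorer by hand and plug it in.

```python
# Rough memory budget for 32-bit Excel, using the figures quoted in the post.
# All names are hypothetical; values are in MB and deliberately approximate.

DEFAULT_LIMIT_MB = 2048        # 32-bit Excel without the Large Address Aware update
LAA_LIMIT_MB = 3072            # the 3 GB limit cited for the Large Address Aware update
SAFETY_MARGIN_MB = 50          # assumption: 2 GB limit minus the ~1.95 GB danger point
POWERPIVOT_RESERVE_MB = 500    # reserve suggested when loading into the data model


def usable_budget_mb(limit_mb=DEFAULT_LIMIT_MB, uses_data_model=False):
    """Virtual memory (MB) you can allocate before entering the danger zone."""
    budget = limit_mb - SAFETY_MARGIN_MB
    if uses_data_model:
        # Keep room for PowerPivot, as recommended in the post.
        budget -= POWERPIVOT_RESERVE_MB
    return budget


def headroom_mb(virtual_mb, limit_mb=DEFAULT_LIMIT_MB, uses_data_model=False):
    """Remaining headroom (MB); a negative value means you are already over budget."""
    return usable_budget_mb(limit_mb, uses_data_model) - virtual_mb


if __name__ == "__main__":
    # Example: Process Explorer shows about 1700 MB of virtual memory for Excel
    # and the workbook loads queries into the data model.
    print(usable_budget_mb(uses_data_model=True))   # 1498
    print(headroom_mb(1700, uses_data_model=True))  # -202, so start tuning queries
```

A negative headroom does not mean Excel will crash right away. It only says that, by the rules of thumb above, it is time to tune the queries or drop some columns before adding anything else.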