Thursday, July 3, 2014

Slow performance in PowerShell Get-ChildItem over UNC paths

I got a ticket submitted to me that proved to be quite interesting. The basics where that a specific maintenance script were running for ages on a server causing some issues.

The script itself was quite simple, it did a recursive gci on a path and performed an action on each item matching a where clause that identified empty folders. Nothing wrong there. However, even if the specified path contained a massive amount of folders and files, it still shouldn't have been running for days (literally) causing the ticket to be created.

When looking into the matter, I found this blog post on the Windows PowerShell blog that goes into detail why Get-ChildItem has slow performance at times. The main culprit is the .NET APIs that is too chatty and causes a lot of overhead traffic over the network when trying to query for files. This was fixed in PowerShell 3.0 that uses new API:s.

A quick test using a folder with 700 subfolders and a total of 40 000 files where I execute the script line

(gci d:\temp -Recurse | where-object {$_.PSIsContainer}) | Where-object {($_.GetFiles().Count + $_.GetDirectories().Count) -eq 0} | out-file d:\dir.txt

reveals the following execution time numbers:

Using PowerShell 2.0 and accessing the path as an UNC: 100s
Using PowerShell 3.0 and accessing the path as an UNC: 33s
Using PowerShell 2.0 and accessing the path locally: 6s
Using PowerShell 3.0 and accessing the path locally: 5s

In my case, the server was both running PowerShell 2.0 and accessing the path as an UNC causing a major performance hit. The solution is simply to both run the script locally on the server where the task has to be performed as well as upgrading to at least PowerShell version 3.0. As can be seen from my quick test, the best performance gain is clearly to run the script locally giving about 17 times better performance.

While no rocket science that accessing tons of files locally has to be faster then doing it over the network, it is still fairly common to see scripts executed on application servers and performing tasks on file shares.

This also proves that it is important in our business to stay on top of new versions of tools and frameworks in order to catch these types of improvements. I can be certain that there is quite a number of people out there bashing gci heavily for it's slow performance when in fact it has improved a lot between PowerShell 2.0 and 3.0.