Friday, October 9, 2015

Faster filtering in PowerShell than using Where-Object

In one project I use a PowerShell script to read a lot of .csv files and then do lookups between them to produce the output I need. This all works well, but it has become slow lately because the files now contain a lot more data.

I narrowed the speed issue down to a few lines where I iterate over some files in order to find a few rows that I need. Basically, the Where-Object cmdlet is the culprit.

A simplified example is below:

$mycsvfile = Import-Csv .\mydata.csv
$dataiwant = $mycsvfile | Where-Object {$_.idno -eq 5}

I had a few similar lines in the script, which made the time to select the data add up to 9 seconds, far too long in this case.

I found this post that shows a few variants of filtering collections, and by simply using the PS v4 .Where() notation to do the same thing, I could bring this down to a single second.

$mycsvfile = Import-Csv .\mydata.csv
$dataiwant = $mycsvfile.Where({$_.idno -eq 5})
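To check whether the same difference shows up on your own data, the two variants can be timed with Measure-Command. This is only a minimal sketch, not the script from my project: the rows are generated in memory instead of being read from mydata.csv, and the row count and the name column are made up for illustration. The actual timings will vary with the machine and the PowerShell version.

```powershell
# Build synthetic rows in memory (hypothetical stand-in for the Import-Csv output).
$rows = foreach ($i in 1..100000) {
    [pscustomobject]@{ idno = $i % 100; name = "row$i" }
}

# Time the pipeline-based filter.
$pipelineTime = Measure-Command {
    $null = $rows | Where-Object { $_.idno -eq 5 }
}

# Time the .Where() method (requires PowerShell 4.0 or later).
$methodTime = Measure-Command {
    $null = $rows.Where({ $_.idno -eq 5 })
}

# Report both timings in milliseconds.
"Where-Object: $([math]::Round($pipelineTime.TotalMilliseconds)) ms"
".Where():     $([math]::Round($methodTime.TotalMilliseconds)) ms"
```

One thing to watch for when swapping one for the other: .Where() always returns a collection, even when nothing or only one row matches, while Where-Object can return nothing, a single object, or an array.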

So, lesson learned: PowerShell is evolving quickly, and what I thought was a nice way to do something might very well be just fine, but there might also be a quicker way just around the corner. In this example I'm sacrificing the streaming capabilities of the pipeline, but I gain a lot of performance just by changing a few characters.
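The streaming trade-off can be sketched like this. The file name, the columns, and the values are hypothetical and only there to make the example self-contained. With Where-Object the rows can be filtered as Import-Csv emits them, while .Where() is a method on a collection, so the whole file has to be loaded before filtering starts.

```powershell
# Write a small temporary CSV (hypothetical data) so the sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'where-demo.csv'
1..10 | ForEach-Object { [pscustomobject]@{ idno = $_ % 5; name = "row$_" } } |
    Export-Csv -Path $path -NoTypeInformation

# Streaming: each row flows from Import-Csv through the filter one at a time.
$streamed = Import-Csv $path | Where-Object { $_.idno -eq 3 }

# Non-streaming: the whole file is loaded into memory first, then filtered.
$loaded = (Import-Csv $path).Where({ $_.idno -eq 3 })

# Clean up the temporary file.
Remove-Item $path
```

For files that fit comfortably in memory, as mine do, loading everything first is a fair price for the speed. For very large files the pipeline version may still be the safer choice.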