PowerShell Data Analysis

Scott Ooi
4 min readJun 4, 2022

--

Screenshot of the PowerShell script being demonstrated in this tutorial from Visual Studio Code.
Escape Excel hell and analyse your CSVs with PowerShell

When was the last time you got something useful out of a 100 000-line CSV file in Excel? Probably never, since it would crash. I am sure there are Excel tools that handle this, but here is an example of how I use PowerShell to get information from large CSV outputs.

Table of Contents

Introduction

Worked Example

Extra Credit

Introduction

Dataset

Download the CSV to follow along:
CISA Known Exploited Vulnerabilities Catalog **
** as of June 2022 the column in the CSV we are interested in is named “vendorProject”

Problem Statement

Our end goal:
How many entries does each vendor have in the CISA Known Exploited Vulnerabilities catalog?

PowerShell Commands Used

How we get there:

Count the number of entries for each vendor in the CISA Known Exploited Vulnerabilities catalog

LINE 1 imports the CSV file as $csv

LINE 2 gets each unique vendor as a string and stores it as $vendor_all

LINE 3 counts the number of times the string is found in the vendorProject column of $csv for each of the entries in $vendor_all

Worked Example

LINE 1 is self-explanatory.

In LINE 2, $csv | select -expand "vendorProject" fetches all the entries in the vendorProject column of the CSV file. However, it selects every entry — we are only interested in each unique entry i.e. only the first occurrence of each vendor, so we pipe it to sort -u .

Note that select is the alias of Select-Object as opposed to Select-String . You can check this with alias select — you can do the same with alias sort to determine its “long-form”. Because PowerShell makes it easy to check aliases, I fall in the camp that prefers it to make scripts less unwieldy.

Finally, we store the output as $vendor_all so we can use it in LINE 3.

LINE 3 is where the fun is.

Because we want to count the number of entries in the catalog for each vendor, let’s begin with a ForEach-Object block — convince yourself that “%” is an alias with alias -Definition ForEach-Object :

%{}

The vendors we are looking for are in $vendor_all so we pipe them in. Within the %{} script block, the $_ variable represents the current object of the loop. Convince yourself by running:

$vendor_all | %{$_}

We will store this current object as the separate variable $vendor as we will be using a second script block within this script block, so $_ will represent a new “current object” instead… Hopefully this will make sense when we reach that bit.

$vendor_all | %{$vendor = $_}

Now that we have got $vendor_all in our script block, we want get $csv in so we have something to match $vendor against. We begin on a new line within the script block:

$vendor_all | %{
$vendor = $_
$csv
}

The column we want to match against is vendorProject, so let’s fetch it. Note

  • that “?” is an alias for Where-Object, and
  • how $_ now represents the current object from $csv instead!
$vendor_all | %{
$vendor = $_
$csv | ?{$_."vendorProject"}
}

And for each new piping of $csv, look for each occurrence of vendorProject that matches the current $vendor string from $vendor_all .

$vendor_all | %{
$vendor = $_
$csv | ?{$_."vendorProject" -eq $vendor}
}

Then count it:

$vendor_all | %{
$vendor = $_
$csv | ?{$_."vendorProject" -eq $vendor} | measure | select Count
}

This outputs just the numbers. So we can actually see which vendor it is in the first place, let’s place that entire expression in (brackets) so it executes first, then print $vendor and a tab character in front of it:

$vendor_all | %{
$vendor = $_
$vendor + "`t" + ( $csv | ?{$_."vendorProject" -eq $vendor} | measure | select Count )
}

Finally, we can turn it into a one liner by using a “;” in place of the new line:

$vendor_all | %{ $vendor = $_ ; $vendor + "`t" + ( $csv | ?{$_."vendorProject" -eq $vendor} | measure | select Count ) }

Hopefully, you get an output similar to below:

Screenshot of PowerShell showing the output of the commands with Microsoft having a Count of 223.
Microsoft… What did you expect?

Extra Credit

To be honest, that LINE 3 command is just the beginning — I have found real value in querying a second column if the first column meets a certain criteria. Take the following Problem Statement for example:

How many products does each vendor have in the CISA Known Exploited Vulnerabilities catalog?

The data of interest is in a different product column depending on what is found in the vendorProject column.

Take a look at the following, the difference being in the LINE 3 script which actually do not use any commands not already covered:

Count the number of products for each vendor in the CISA Known Exploited Vulnerabilities catalog

--

--

Scott Ooi
Scott Ooi

Written by Scott Ooi

Ex-petroleum geologist. Fell in love with Linux and the CLI. Became a sysadmin. Fell in love with information security. Tech keeps me curious, humble, learning.

Responses (1)