Merging and Filtering CSV files using Linux
Posted on Wed 03 February 2021 in Linux
I recently ran into a problem after downloading multiple CSV files. I wanted to firstly merge all those files and then extract/filter data from that merged file. Doing this is possible using code, but Linux has a bunch of powerful tools that can also handle such tasks and can do them very quickly(in comparison to Python).
Let us say we have a bunch of CSV files labelled like so: data_1.csv, data_2.csv and so on
Merging
In order to merge[1] all these CSV files, we can use sed
:
sed 1d data-*.csv > merged.csv
The command collects all the data- files and after merging them, saves them to merged.csv
. From the reference article I learned that if you used *.csv
it may also merge the merged.csv
file and create duplicates and other issues.
Filtering - exact match
In order to find an exact entry in the CSV file, we can use awk
[2]:
awk -F"," '$1==somestring' merged.csv
The ","
tells awk
to view the comma as a separator. The $1
says to look in the first column. somestring
is the exact match we are looking for in column 1. merged.csv
is the file we want to look into for the value.
Filtering - partial strings
My main filtering goal was to use partial strings to find certain values that match the partial string. Again we will use awk
[3]:
awk 'index($1, "partialstring")' merged.csv
Similar to the above, $1
looks in column 1 for the "partialstring"
string. In both the awk
commands you can also save the output to a file like so:
awk 'index($1, "partialstring")' merged.csv > output.txt
I hope these 3 commands help you save a lot of time if you are attempting to do anything with CSV files.
Sources:
1 - https://blog.softhints.com/how-to-merge-multiple-csv-files-with-python/
2 - https://stackoverflow.com/questions/2373885/searching-a-csv-file-using-grep
If you don't know how to use RSS and want email updates on my new content, consider Joining my Newsletter
The original content of this blog is a Waqf solely for the Pleasure of Allah. You are hereby granted full permission to copy, download, distribute, publish and share this content without modification under condition that full attribution is given to this author by creating a link either above or below the content that links back to the original source of the content. For any questions or ambiguity, you are requested to contact me via email for clarification.