Split a large CSV file into smaller CSVs
Mon 29 December 2008 by Thejaswi Puthraya
At work, we sometimes have to deal with huge CSV files, and it is difficult to load them into memory because of limits on the buffer size.
The best approach is to split the file into smaller chunks that can be read easily, repeating the header row in each chunk.
Here is a bash script that does the job (split_csv):
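#!/bin/bash
# Usage: split_csv NUM_LINES NUM_DIGITS INPUT_FILE OUTPUT_PATTERN
num_lines=$1
num_digits=$2
input_file=$3
output_pattern=$4

# Split the input into chunks with numeric suffixes (e.g. test-000, test-001).
split -d -l "$num_lines" -a "$num_digits" "$input_file" "$output_pattern"

# Build the suffix of the first chunk: num_digits zeros.
idx=""
i=0
while [ "$i" -lt "$num_digits" ]
do
    idx="0$idx"
    i=$(( i+1 ))
done

first_file="$output_pattern$idx"
header=$(head -n 1 "$first_file")

# Prepend the header to every chunk except the first, which already has it.
for f in "$output_pattern"*
do
    if [ "$f" != "$first_file" ]
    then
        printf '%s\n' "$header" | cat - "$f" > temp_file
        mv temp_file "$f"
    fi
done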
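The loop in the script only exists to build the all-zeros suffix of the first chunk. As a shorter sketch of the same step, bash's printf can pad a zero to a given width. The helper name first_suffix is hypothetical and not part of the original script, and the sketch assumes the number of digits is at least 1.

```shell
#!/bin/bash
# first_suffix is a hypothetical helper, not part of the original script.
# It prints a run of zeros as wide as the digit count passed in $1.
first_suffix() {
    # %0*d takes the width from the first argument and zero-pads the number 0.
    printf '%0*d' "$1" 0
}

# With 3 digits and the prefix "test-", the first chunk is test-000.
first_file="test-$(first_suffix 3)"
echo "$first_file"
```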
A simple usage would be:
split_csv 10000 3 input.csv test-
The usage is similar to that of the split utility.
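After a split, it can be worth checking that nothing was lost. The following is a minimal sketch of such a check, not part of the original script: the function name check_chunks is hypothetical, and it assumes that only the generated chunks match the output prefix (so the input file itself must not start with that prefix).

```shell
#!/bin/bash
# check_chunks is a hypothetical helper, not part of split_csv.
# It succeeds only if every chunk starts with the input's header line
# and the data rows across all chunks add up to the data rows of the input.
check_chunks() {
    local input_file=$1
    local output_pattern=$2
    local header total rows expected f

    header=$(head -n 1 "$input_file")
    total=0
    for f in "$output_pattern"*
    do
        # Fail if a chunk does not begin with the original header.
        if [ "$(head -n 1 "$f")" != "$header" ]
        then
            echo "missing header in $f" >&2
            return 1
        fi
        # Count data rows: all lines in the chunk except the header.
        rows=$(( $(wc -l < "$f") - 1 ))
        total=$(( total + rows ))
    done

    expected=$(( $(wc -l < "$input_file") - 1 ))
    [ "$total" -eq "$expected" ]
}
```

For the usage shown above, this would be called as check_chunks input.csv test- and returns a non-zero status if a header is missing or the row counts disagree.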