Split a large CSV into smaller CSVs
Mon 29 December 2008 by Thejaswi Puthraya
At work, we sometimes have to deal with huge CSV files that are difficult to load into memory because the buffer size is limited.
The simplest fix is to split the file into smaller chunks that can be read easily, repeating the header row at the top of every chunk.
Here is a bash script that does the job (split_csv):
#!/bin/bash
num_lines=$1
num_digits=$2
input_file=$3
output_pattern=$4
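# Split the input into chunks of num_lines lines each.
# -d (GNU split) requests numeric suffixes, -a sets their length.
split -d -l "$num_lines" -a "$num_digits" "$input_file" "$output_pattern"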
# Suffix of the first chunk: num_digits zeros (e.g. 000 for 3 digits).
idx=$(printf "%0${num_digits}d" 0)
first_file="$output_pattern$idx"
# The first chunk already begins with the header row of the input.
header=$(head -n 1 "$first_file")
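# Prepend the header to every chunk except the first.
# A glob is used instead of parsing ls so that names with spaces survive.
for f in "$output_pattern"*
do
if [ "$f" != "$first_file" ]
then
# printf with a quoted variable keeps the header byte-for-byte intact.
printf '%s\n' "$header" | cat - "$f" > temp_file
mv temp_file "$f"
fi
done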
A simple usage would be:
split_csv 10000 3 input.csv test-
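The arguments mirror those of the split utility: lines per chunk, number of suffix digits, input file, and output prefix. The command above writes test-000, test-001 and so on, each starting with the header row.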
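After splitting, it can be reassuring to confirm that nothing was lost. The following is a minimal sketch of such a check, not part of split_csv itself: check_chunks is a hypothetical helper name, and it assumes that the output prefix matches only the chunk files and that the first line of the input is the header.

```shell
#!/bin/bash
# check_chunks is a hypothetical helper, not part of split_csv.
# Usage: check_chunks INPUT_FILE OUTPUT_PATTERN
check_chunks() {
    local input_file output_pattern header total rows f
    input_file=$1
    output_pattern=$2
    total=0
    header=$(head -n 1 "$input_file")
    for f in "$output_pattern"*
    do
        # Every chunk must start with the original header line.
        if [ "$(head -n 1 "$f")" != "$header" ]
        then
            echo "missing header: $f"
            return 1
        fi
        # Count the data rows in this chunk (all lines except the header).
        rows=$(( $(wc -l < "$f") - 1 ))
        total=$(( total + rows ))
    done
    # The data rows of all chunks must add up to those of the input.
    if [ "$total" -ne $(( $(wc -l < "$input_file") - 1 )) ]
    then
        echo "row count mismatch"
        return 1
    fi
    echo "ok: $total data rows"
}
```

For the example above, it would be called as check_chunks input.csv test- once the function has been sourced into the shell.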