Split a large CSV into smaller CSVs

Mon 29 December 2008 by Thejaswi Puthraya

At work, we sometimes have to deal with huge CSV files, and it is difficult to open them in memory because the buffer size is limited.

The best approach is to split the file into smaller chunks that can be read easily.

Here is the bash script that does the job (split_csv):

#!/bin/bash

num_lines=$1
num_digits=$2
input_file=$3
output_pattern=$4

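# Cut the input every num_lines lines; -d gives numeric suffixes and
# -a sets how many digits each suffix has. Quoting keeps paths with spaces intact.
split -d -l "$num_lines" -a "$num_digits" "$input_file" "$output_pattern"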

# Suffix of the first chunk: num_digits zeros (e.g. 000 for 3 digits).
idx=$(printf '%0*d' "$num_digits" 0)

first_file="$output_pattern$idx"

# Only the first chunk contains the header row, so read it from there.
header=$(head -n 1 "$first_file")

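# Prepend the header to every chunk except the first, which already has it.
for f in "$output_pattern"*
do
  if [ "$f" != "$first_file" ]
  then
    # Write to a temporary file next to the chunk, then replace the chunk.
    printf '%s\n' "$header" | cat - "$f" > "$f.tmp"
    mv "$f.tmp" "$f"
  fi
done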
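
To see why the loop at the end of the script is needed: split cuts purely by line count and knows nothing about CSV, so only the first chunk ends up with the header row. Here is a minimal sketch on a throwaway file, assuming a split that supports -d (GNU coreutils does). The sample data and the part- prefix are made up for illustration.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: one header row plus four data rows.
printf '%s\n' 'id,name' '1,a' '2,b' '3,c' '4,d' > sample.csv

# Cut every 3 lines; -d gives numeric suffixes, -a 2 makes them 2 digits wide.
split -d -l 3 -a 2 sample.csv part-

head -n 1 part-00   # prints the header row: id,name
head -n 1 part-01   # prints a data row:     3,c
```

The second chunk starts with a data row, which is exactly what the script fixes by copying the header into every chunk after the first.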

A simple usage would be:

split_csv 10000 3 input.csv test-

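This cuts input.csv every 10000 lines into test-000, test-001, and so on, and copies the header row to the top of each file. The usage is similar to that of the split utility.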
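
One thing to watch is the number of digits: if there are more chunks than the suffix can count, split stops with an error. A rough way to pick a safe value is to count the lines first. The helper below is only a sketch and not part of the original script; the name suffix_digits is made up here.

```shell
# Hypothetical helper: print how many suffix digits are needed to
# number every chunk, given the total line count and lines per chunk.
suffix_digits() {
  local total_lines=$1 lines_per_chunk=$2
  # Number of chunks, rounded up.
  local chunks=$(( (total_lines + lines_per_chunk - 1) / lines_per_chunk ))
  # An empty input still needs one digit.
  if [ "$chunks" -lt 1 ]; then
    chunks=1
  fi
  # Suffixes start at 0, so the largest one is chunks - 1.
  local last=$(( chunks - 1 ))
  # The answer is the number of characters in that largest suffix.
  echo "${#last}"
}
```

For example, suffix_digits "$(wc -l < input.csv)" 10000 prints the value to pass as the second argument of split_csv.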