Wednesday 17 December 2014

How to split large files into a number of smaller files

The split command is a handy tool for administrators. Large text or log files can be difficult to open: they take a long time to load and consume a lot of memory. Administrators also sometimes need to share only part of a file, and instead of sharing a file that is several GB in size, it is better to share just what is required. The split command has various options for customizing how a file is split.

split reads your file and writes it out in pieces. The first output file is named with the prefix followed by aa, and the following files continue lexicographically up to zz (only ASCII letters are used, so the default two-letter suffix allows a maximum of 676 files). Recent GNU versions of split lengthen the suffix automatically when that limit is reached. If no output prefix is given, x is the default. The source file is not deleted after it has been split into segments.

SYNTAX :

   split [ option ] [ file name ] [ prefix name ]   



By default, if no option is given, split breaks the file into pieces of 1000 lines each.

      -l line_count  The input file is split into pieces line_count lines in size.

      -a suffix_length suffix_length letters are used to form the suffix of the output filenames.  This option allows creation of more than 676 output files.  The output file names created cannot exceed the  maximum file name length allowed in the directory containing the files.

      -b n           The input file is split into pieces n bytes in  size.

      -b nk          The input file is split into pieces n x 1024 bytes in size.  No space separates the n from the k.

      -b nm          The input file is split into pieces n x 1048576 bytes in size.  No space separates the n from the  m.
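The -a option described above can be tried out with a small sketch like the following. The file name many.txt and the prefix piece_ are made up for illustration, and the commands assume that seq and mktemp are available, as they are on most Linux systems. Asking for three-letter suffixes explicitly keeps the naming predictable when more than 676 pieces are needed.

```shell
# Work in a throwaway directory so the demo does not touch real files.
workdir=$(mktemp -d)

# Build a 1000-line input file (hypothetical name: many.txt).
seq 1 1000 > "$workdir/many.txt"

# One line per piece needs 1000 output files, more than the 676 that
# two-letter suffixes allow, so request three-letter suffixes with -a 3.
split -l 1 -a 3 "$workdir/many.txt" "$workdir/piece_"

# Count the pieces and pick out the first and last names in sorted order.
piece_count=$(ls "$workdir" | grep -c '^piece_')
first_piece=$(ls "$workdir" | grep '^piece_' | head -n 1)
last_piece=$(ls "$workdir" | grep '^piece_' | tail -n 1)
echo "$piece_count pieces, from $first_piece to $last_piece"
```

The first piece is piece_aaa, and the names keep counting upwards through the three-letter suffixes from there.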
 


  • To split a file into pieces of 1000 lines each (the default)
    split myfile.txt   

      In this simple example, assume the file has 5000 lines. It will be split into 5 pieces of 1000 lines each. The new, smaller files are written to the current directory with the names xaa, xab, xac, xad, and xae.

  • To split the file into pieces of 500 lines each
    split -l 500 myfile.txt part        

If myfile.txt has 2000 lines, the output files will be partaa, partab, partac, and partad.

  • To split the file into 100MB pieces
     split -b 100m myfile.txt part     
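The three examples above can be tried on scaled-down sample data with a sketch like this one. The file names five_thousand.txt and two_thousand.txt and the prefix bytes_ are invented for the demo, and a 4 KB piece size stands in for 100m so that the run stays small. The final step joins the pieces back together with cat and compares them to the input, which is just one possible way to check the result.

```shell
# Work in a throwaway directory so the demo does not touch real files.
workdir=$(mktemp -d)

# Sample inputs (numbered lines stand in for real log data).
seq 1 5000 > "$workdir/five_thousand.txt"
seq 1 2000 > "$workdir/two_thousand.txt"

# 1) No options: 1000 lines per piece, default prefix "x".
#    With no prefix given, the pieces go to the current directory,
#    so run split from inside workdir in a subshell.
( cd "$workdir" && split five_thousand.txt )
default_pieces=$(ls "$workdir" | grep -c '^x[a-z][a-z]$')

# 2) 500 lines per piece with the prefix "part".
split -l 500 "$workdir/two_thousand.txt" "$workdir/part"
part_pieces=$(ls "$workdir" | grep -c '^part[a-z][a-z]$')
lines_in_last=$(wc -l < "$workdir/partad" | tr -d ' ')

# 3) Split by size: 4 KB pieces here instead of 100m to keep the demo small.
split -b 4k "$workdir/two_thousand.txt" "$workdir/bytes_"

# Joining the pieces in name order should reproduce the input exactly.
if cat "$workdir"/bytes_* | cmp -s - "$workdir/two_thousand.txt"; then
    roundtrip="identical"
else
    roundtrip="different"
fi

echo "default: $default_pieces pieces; -l 500: $part_pieces pieces, $lines_in_last lines in partad; -b round trip: $roundtrip"
```

The source files are still present in the directory afterwards, since split only reads them.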


PRACTICAL:

  • To show how it works in practice, I created a file named bigFile.log with 2000 lines. Here I split it into pieces of 250 lines each.
   split -l 250 bigFile.log part   
joe@poc:/applic/oracle/split_blog> ls -s
total 84              82 bigFile.log       2 createFile.sh
joe@poc:/applic/oracle/product/10.1.2/xjjh1/split_blog> ls
bigFile.log    createFile.sh
joe@poc:/applic/oracle/split_blog> split -l 250 bigFile.log part
joe@poc:/applic/oracle/split_blog> ls
bigFile.log    createFile.sh  partaa         partab         partac         partad         partae         partaf         partag         partah
joe@poc:/applic/oracle/split_blog> tail -5 partaa
LINE_NUMBER_246
LINE_NUMBER_247
LINE_NUMBER_248
LINE_NUMBER_249
LINE_NUMBER_250
joe@poc:/applic/oracle/split_blog> tail -5 partad
LINE_NUMBER_996
LINE_NUMBER_997
LINE_NUMBER_998
LINE_NUMBER_999
LINE_NUMBER_1000
oracle@chorst4:/applic/oracle/split_blog> tail -5 partah
LINE_NUMBER_1996
LINE_NUMBER_1997
LINE_NUMBER_1998
LINE_NUMBER_1999
LINE_NUMBER_2000                                                   
  • This example shows how I split bigFile.log (168894 bytes, which in this run contains 10000 lines, as the tail output shows) into 15 KB segments.
   split -b 15k bigFile.log segment   
joe@poc:/applic/oracle/split_blog> ll
total 348
168894 Dec 18 14:54 bigFile.log
   267 Dec 18 14:53 createFile.sh
joe@poc:/applic/oracle/split_blog> split -b 15k bigFile.log segment 
joe@poc:/applic/oracle/split_blog> ll
total 678
168894 Dec 18 14:54 bigFile.log
   267 Dec 18 14:53 createFile.sh
 15360 Dec 18 15:02 segmentaa
 15360 Dec 18 15:02 segmentab
 15360 Dec 18 15:02 segmentac
 15360 Dec 18 15:02 segmentad
 15360 Dec 18 15:02 segmentae
 15360 Dec 18 15:02 segmentaf
 15360 Dec 18 15:02 segmentag
 15360 Dec 18 15:02 segmentah
 15360 Dec 18 15:02 segmentai
 15360 Dec 18 15:02 segmentaj
 15294 Dec 18 15:02 segmentak
joe@poc:/applic/oracle/split_blog> tail -5 segmentaa
LINE_NUMBER_962
LINE_NUMBER_963
LINE_NUMBER_964
LINE_NUMBER_965
LINE_NUMBER_966
joe@poc:/applic/oracle/split_blog> tail -5 segmentag
LINE_NUMBER_6385
LINE_NUMBER_6386
LINE_NUMBER_6387
LINE_NUMBER_6388
LINE_NUMBER_6389
joe@poc:/applic/oracle/split_blog> tail -5 segmentak
LINE_NUMBER_9996
LINE_NUMBER_9997
LINE_NUMBER_9998
LINE_NUMBER_9999
LINE_NUMBER_10000
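The sizes in the listing above can be checked with a little arithmetic: 15k means 15 x 1024 = 15360 bytes per piece, so a 168894-byte file needs 11 pieces, and the last one holds the leftover 15294 bytes. The sketch below reproduces this under one assumption: the contents of createFile.sh are not shown in this post, so the seq and sed line is only a guess at how a file of 10000 LINE_NUMBER_n lines could be generated.

```shell
# Work in a throwaway directory so the demo does not touch real files.
workdir=$(mktemp -d)

# Assumed stand-in for createFile.sh (not shown in the post):
# 10000 lines of the form LINE_NUMBER_<n>.
seq 1 10000 | sed 's/^/LINE_NUMBER_/' > "$workdir/bigFile.log"

file_bytes=$(wc -c < "$workdir/bigFile.log" | tr -d ' ')
piece_bytes=$((15 * 1024))

# Ceiling division gives the number of pieces; whatever is left over
# after the full pieces is the size of the last one.
expected_pieces=$(( (file_bytes + piece_bytes - 1) / piece_bytes ))
last_piece_bytes=$(( file_bytes - (expected_pieces - 1) * piece_bytes ))

# Do the actual split and count the segments it produced.
split -b 15k "$workdir/bigFile.log" "$workdir/segment"
actual_pieces=$(ls "$workdir" | grep -c '^segment')
actual_last_bytes=$(wc -c < "$workdir/segmentak" | tr -d ' ')

echo "$file_bytes bytes: $expected_pieces pieces expected, $actual_pieces created, last piece $actual_last_bytes bytes"
```

Because -b counts bytes and not lines, a piece boundary can fall in the middle of a line. Use -l when every piece has to end on a complete line.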
