Saturday, March 21, 2009

Counting lines

Situation: a process is creating a large number of ASCII files, each with a large number of lines. You need to check how many lines have been created in total (across all files) at a given time, so you can track the progress of the process. Fields in the files are fixed length, so every line has the same length.

Obvious solution: do something like wc -l *

Problem: wc has to read every byte of every file to count the newlines, so with many really big files the count takes a long time and uses a lot of I/O. A real-time solution cannot be implemented this way.

Smarter solution: find out how many characters make up a line in a file (including the newline at the end), divide the file size by this number, and you get the number of lines in the file. To find the total number of lines in all files, just sum the number of lines for each file.

Implementation: tail -1 [file] | wc -c gives the number of characters in a line, newline included; ls -l gives the size of a file in bytes, which for ASCII is the same as the number of characters. So,
ls -l * | awk '{cnt=cnt+$5;}END{printf("%d\n",cnt/[chars_per_line]);}'
gives the answer.
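
The same idea can be wrapped in a small shell function. This is only a sketch, under a few assumptions: the name estimate_lines is made up for this example, every line in every file has the same byte length, and the first file passed in already contains at least one complete line. It measures the line length with head rather than tail, because the last line of a file that is still being written may be incomplete. It also asks wc -c for each file's size instead of parsing ls -l; as far as I know GNU wc takes the size of a regular file from its metadata without reading the contents, but other implementations may differ.

```shell
# Estimate the total number of lines in fixed-width files from their sizes.
# Assumptions: all lines have the same byte length (newline included),
# and the first file already holds at least one complete line.
estimate_lines() {
    # Bytes per line, measured on the first line of the first file.
    bytes_per_line=$(head -n 1 "$1" | wc -c)

    # Add up the sizes of all files given on the command line.
    total_bytes=0
    for f in "$@"; do
        size=$(wc -c < "$f")
        total_bytes=$((total_bytes + size))
    done

    # Integer division: a partially written last line is not counted.
    echo $((total_bytes / bytes_per_line))
}
```

Call it as estimate_lines * in the directory where the files are being created.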
