What is the fastest way to count lines in a text file?

I want to very quickly count the number of lines in text files without having
to read each line and increment a counter. I am working in VB.NET and C#.
Does anyone have a very fast example on how to do this?

Thanks,

Matt
Dec 26 '05 #1
Mesterak,

Various tests in these newsgroups have shown that simply looping through
the file contents as a Char array (comparing Chars, not Strings) and
testing for the line-break character is usually the fastest method.

I hope this helps,

Cor
Dec 26 '05 #2
Can you provide a code example, or else point me to the relevant posts?
Dec 26 '05 #3
Maybe using a regular expression can be a fast solution (for large text
files).
You would count matches for \r\n or \n

--
Vadym Stetsyak aka Vadmyst
http://vadmyst.blogspot.com
Dec 26 '05 #4
Here is one message thread

http://groups.google.com/group/micro...5c33cc87237dbf

Be aware that in this case the character-based samples provided by Jay are
the fastest, not the VB Find, which is the fastest only when you are
searching for strings.

I hope this helps,

Cor
Dec 26 '05 #5
I tried the following which did not seem to work:

    strContents = Regex.Replace(strContents, "\r{0,}\n+", vbCrLf)
    myArrayList.AddRange(strContents.Split(CType(vbCrLf, Char)))
Dec 26 '05 #6
Vadym Stetsyak <va*****@ukr.net> wrote:
> Maybe using regular expression can be fast solution ( for large text
> files ).

That's very unlikely, IMO.

> You will count matches for \r\n or \n

And how will you provide the text for the regular expression to match?
As far as I'm aware, you can't provide regular expressions with
TextReaders - you have to provide them with strings.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Dec 26 '05 #7
Mesterak,

The messages I pointed you to show that using Split and Regex is by far
the slowest way to count lines.

Cor
Dec 26 '05 #8
So how can I count the lines of the file without loading the whole file into
memory as a string and counting lines?
Dec 26 '05 #9
mesterak <me******@discussions.microsoft.com> wrote:
> So how can I count the lines of the file without loading the whole file into
> memory as a string and counting lines?

By reading chunks at a time (using StreamReader) and counting '\n'
occurrences.

Here's some sample code:

using System;
using System.IO;

class Test
{
    static int CountLines (TextReader reader)
    {
        char[] buffer = new char[32*1024]; // Read 32K chars at a time

        int total=1; // All files have at least one line!

        int read;
        while ( (read=reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i=0; i < read; i++)
            {
                if (buffer[i]=='\n')
                {
                    total++;
                }
            }
        }
        return total;
    }

    static void Main(string[] args)
    {
        foreach (string file in args)
        {
            using (StreamReader reader = new StreamReader(file))
            {
                Console.WriteLine ("{0}: {1} lines", file,
                                   CountLines(reader));
            }
        }
    }
}

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Dec 26 '05 #10
Thanks, that works perfectly!!!

I wrote the following which apparently works but does require that the
entire file be read into memory (your code is better):

' Requires: Imports System.IO and Imports System.Text.RegularExpressions
Public Function GetLineCount(ByVal FileName As String) As Integer

    If Not File.Exists(FileName) Then
        Return 0 ' No file, no lines
    End If

    Dim LogReader As New StreamReader(New FileStream(FileName, _
        FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    Dim strContents As String = LogReader.ReadToEnd() ' Loads the entire file into memory
    LogReader.Close()

    ' Count LF characters; a final line without a trailing LF is not counted
    Dim r As New Regex(Chr(10))
    Return r.Matches(strContents).Count

End Function

Dec 26 '05 #11
OK, I used your baseline code to rewrite my VB.NET function. It is very fast
and efficient. The only thing I needed to add was a check for whether the
last character is an LF, incrementing the total if it is not; I get the
correct number of lines every time! Processing ~200MB of log files (209
files) is extremely fast (it only added 2 seconds overall to the date/time
indexing functions I was already performing).

Thanks a million!!!

Here's my new VB.NET function to benefit anyone else needing to count lines
in a file in VB.NET:

' Requires: Imports System.IO
Public Function GetLineCount(ByVal FileName As String) As Integer
    Dim total As Integer = 0

    If File.Exists(FileName) Then
        Dim buffer(32 * 1024 - 1) As Char ' VB upper bounds are inclusive: 32K chars
        Dim lastChar As Char = Chr(10)    ' So an empty file counts as zero lines
        Dim read As Integer

        Dim reader As TextReader = File.OpenText(FileName)
        Try
            read = reader.Read(buffer, 0, buffer.Length)

            While read > 0
                For i As Integer = 0 To read - 1
                    If buffer(i) = Chr(10) Then
                        total += 1
                    End If
                Next

                lastChar = buffer(read - 1) ' Remember the last char actually read
                read = reader.Read(buffer, 0, buffer.Length)
            End While
        Finally
            reader.Close()
        End Try

        ' Count a final line that has no trailing LF
        If lastChar <> Chr(10) Then
            total += 1
        End If
    End If

    Return total
End Function
Dec 26 '05 #12
Jon,

You have often talked about multithreading in the past, and in my opinion
this is a perfect situation for it.

An I/O operation always involves waits, so it is well suited to running in
parallel with the counting thread.

Just my opinion.

Cor
Dec 27 '05 #13
Cor Ligthert [MVP] <no************@planet.nl> wrote:
> You have often talked about multithreading in the past, and in my
> opinion this is a perfect situation for it.
>
> An I/O operation always involves waits, so it is well suited to
> running in parallel with the counting thread.

It's certainly *possible* that it would speed things up. I wouldn't
suggest that it's worth doing unless the performance of doing it in a
single thread is a problem though. Assuming the IO performance
dominates the time taken, you'd only be able to shave off the time
taken for the scanning, which I suspect would be absolutely minute.
Compare this with the development cost/risk of turning a simple bit of
single-threaded code into multi-threaded code, and I'd certainly need
to see concrete figures before taking that risk.
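
One rough way to get such figures is to time the read and the scan
separately. The following is only a sketch, assuming .NET 2.0 or later
for System.Diagnostics.Stopwatch; the class name ScanTiming is made up
for illustration, the "read" figure includes character decoding, and OS
file caching will skew repeated runs:

```csharp
using System;
using System.Diagnostics;
using System.IO;

class ScanTiming
{
    static void Main(string[] args)
    {
        foreach (string file in args)
        {
            char[] buffer = new char[32 * 1024];
            Stopwatch io = new Stopwatch();   // time spent inside Read
            Stopwatch scan = new Stopwatch(); // time spent looking for '\n'
            int newlines = 0;

            using (StreamReader reader = new StreamReader(file))
            {
                while (true)
                {
                    io.Start();
                    int read = reader.Read(buffer, 0, buffer.Length);
                    io.Stop();
                    if (read <= 0) break;

                    scan.Start();
                    for (int i = 0; i < read; i++)
                    {
                        if (buffer[i] == '\n') newlines++;
                    }
                    scan.Stop();
                }
            }

            Console.WriteLine("{0}: {1} newlines, read {2} ms, scan {3} ms",
                file, newlines, io.ElapsedMilliseconds, scan.ElapsedMilliseconds);
        }
    }
}
```

If the scan time is a small fraction of the read time, moving the scan to
another thread has little to gain.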

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Dec 27 '05 #14
The 2 VB.NET functions I created based on your code example are pretty darn
fast. I counted a total of several million lines across about 200+ files in
a matter of a few seconds. If someone has issues with this speed to require
multi-threading, then something's just wrong!

However, one of my new line counting functions is used in a separate thread
after my app initially counts the lines and partially indexes the files'
entries by date/time (to get a time reference per file so I only parse parts
of files applicable to the date/time window of interest.) The line counter
that runs in a separate thread goes back over all of the files and determines
the actual byte position per chr(10) detected. This enables the user of my
log viewer to quickly jump to a particular line and also speeds content
paging (for viewability performance.) So to answer Cor, yes it is good to
use in a separate thread when there are extended purposes at play which you
may not want your app (or user) to wait on to complete.
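
For anyone wanting to try the byte-position idea, here is a minimal C#
sketch of such an index. It is not the code from my log viewer: the names
LineIndex and BuildLineOffsets are invented for the example, and it
assumes an ASCII or UTF-8 file without a byte order mark, where byte 10
is always an LF (it would not work for UTF-16):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class LineIndex
{
    // Returns the byte offset at which each line starts.
    static List<long> BuildLineOffsets(string path)
    {
        List<long> offsets = new List<long>();
        byte[] buffer = new byte[32 * 1024];
        long position = 0;       // file offset of buffer[0]
        bool atLineStart = true; // the next byte read begins a new line

        using (FileStream stream = new FileStream(path, FileMode.Open,
            FileAccess.Read, FileShare.ReadWrite))
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (atLineStart)
                    {
                        offsets.Add(position + i);
                        atLineStart = false;
                    }
                    if (buffer[i] == 10) // LF ends the current line
                    {
                        atLineStart = true;
                    }
                }
                position += read;
            }
        }
        return offsets;
    }

    static void Main(string[] args)
    {
        List<long> offsets = BuildLineOffsets(args[0]);
        Console.WriteLine("{0} lines", offsets.Count);
        if (offsets.Count > 0)
        {
            Console.WriteLine("last line starts at byte {0}",
                offsets[offsets.Count - 1]);
        }
    }
}
```

To jump to line n (zero-based), seek the stream to offsets[n] with
Seek(offsets[n], SeekOrigin.Begin) and read from there. The list's Count
also gives the line count, including a final line with no trailing LF.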

-Matt
Dec 29 '05 #15
