I want to very quickly count the number of lines in text files without having
to read each line and increment a counter. I am working in VB.NET and C#.
Does anyone have a very fast example on how to do this?
Thanks,
Matt
Mesterak,
Various tests in these newsgroups have shown that simply looping through
the file contents as a Char array (comparing each element against a Char,
not against a String) and testing for the line-break character is usually
the fastest method.
I hope this helps,
Cor
Can you provide a code example, or else point me to the relevant posts?
Maybe using a regular expression could be a fast solution (for large text
files).
You would count the matches for \r\n or \n
--
Vadym Stetsyak aka Vadmyst http://vadmyst.blogspot.com
I tried the following, which did not seem to work:
    strContents = Regex.Replace(strContents, "\r{0,}\n+", vbCrLf)
    myArrayList.AddRange(strContents.Split(CType(vbCrLf, Char)))
Vadym Stetsyak <va*****@ukr.net> wrote:
> Maybe using regular expression can be fast solution ( for large text files ).
That's very unlikely, IMO.
> You will count matches for \r\n or \n
And how will you provide the text for the regular expression to match?
As far as I'm aware, you can't provide regular expressions with
TextReaders - you have to provide them with strings.
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mesterak,
The messages I mentioned show that using Split and Regex is by far the
slowest way to count lines.
Cor
So how can I count the lines of the file without loading the whole file into
memory as a string and counting lines?
mesterak <me******@discussions.microsoft.com> wrote:
> So how can I count the lines of the file without loading the whole file into memory as a string and counting lines?
By reading chunks at a time (using StreamReader) and counting '\n'
occurrences.
Here's some sample code:
using System;
using System.IO;

class Test
{
    static int CountLines (TextReader reader)
    {
        char[] buffer = new char[32*1024]; // Read 32K chars at a time
        int total=1; // All files have at least one line!
        int read;
        while ( (read=reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i=0; i < read; i++)
            {
                if (buffer[i]=='\n')
                {
                    total++;
                }
            }
        }
        return total;
    }

    static void Main(string[] args)
    {
        foreach (string file in args)
        {
            using (StreamReader reader = new StreamReader(file))
            {
                Console.WriteLine ("{0}: {1} lines", file, CountLines(reader));
            }
        }
    }
}
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Thanks, that works perfectly!!!
I wrote the following which apparently works but does require that the
entire file be read into memory (your code is better):
' Requires: Imports System.IO and Imports System.Text.RegularExpressions
Public Function GetLineCount(ByVal FileName As String) As Integer
    ' Return 0 explicitly when the file does not exist.
    If Not File.Exists(FileName) Then Return 0
    Dim strContents As String
    ' FileShare.ReadWrite allows reading a log that another process still has open.
    Using LogReader As New StreamReader(New FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        ' The whole file is held in memory as one string.
        strContents = LogReader.ReadToEnd()
    End Using
    ' Count the line-feed characters.
    Return Regex.Matches(strContents, vbLf).Count
End Function
Ok, I used your baseline code to rewrite my VB.NET function. It is very fast
and efficient. The only thing I needed to add was a check to see whether the
last character was a LF, incrementing the total if not; I get the correct
number of lines every time! Processing ~200MB of log files (209 files)
is extremely fast (it only added 2 seconds overall to the date/time indexing
functions I was already performing).
Thanks a million!!!
Here's my new VB.NET function to benefit anyone else needing to count lines
in a file in VB.NET:
Public Function GetLineCount(ByVal FileName As String) As Integer
    Dim total As Integer = 0
    If File.Exists(FileName) Then
        ' VB array bounds are inclusive, so this is exactly 32K chars.
        Dim buffer(32 * 1024 - 1) As Char
        ' Starts as LF so that an empty file counts as 0 lines.
        Dim lastChar As Char = ControlChars.Lf
        Using reader As TextReader = File.OpenText(FileName)
            Dim read As Integer = reader.Read(buffer, 0, buffer.Length)
            While read > 0
                For i As Integer = 0 To read - 1
                    If buffer(i) = ControlChars.Lf Then
                        total += 1
                    End If
                Next
                ' Remember the last character of this chunk for the final check.
                lastChar = buffer(read - 1)
                read = reader.Read(buffer, 0, buffer.Length)
            End While
        End Using
        ' Count a final line that has no terminating LF.
        If lastChar <> ControlChars.Lf Then
            total += 1
        End If
    End If
    Return total
End Function
Jon,
You have often written about multithreading in the past, and in my opinion
this is a perfect situation for it.
An IO operation always has waits (for IO) in it, so it is well suited to
running in parallel with the counting thread.
Just my opinion.
Cor
Cor Ligthert [MVP] <no************@planet.nl> wrote:
> You have often written about multithreading in the past, and in my opinion this is a perfect situation for it.
> An IO operation always has waits (for IO) in it, so it is well suited to running in parallel with the counting thread.
It's certainly *possible* that it would speed things up. I wouldn't
suggest that it's worth doing unless the performance of doing it in a
single thread is a problem though. Assuming the IO performance
dominates the time taken, you'd only be able to shave off the time
taken for the scanning, which I suspect would be absolutely minute.
Compare this with the development cost/risk of turning a simple bit of
single-threaded code into multi-threaded code, and I'd certainly need
to see concrete figures before taking that risk.
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
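One way to get the kind of concrete figures asked for above is to time the reads and the scanning separately. The following is only a rough sketch, not code from anyone in this thread: the class name ScanTiming and the method CountNewlines are made up for illustration, and it assumes the same 32K-char chunked loop as the sample code earlier. Note that the "read" figure also includes character decoding, and that the OS file cache can make repeated runs look much faster than a cold read.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, for illustration only.
static class ScanTiming
{
    // Counts '\n' characters and reports how long was spent
    // inside Read() versus inside the scanning loop.
    public static int CountNewlines(TextReader reader,
                                    out TimeSpan readTime,
                                    out TimeSpan scanTime)
    {
        char[] buffer = new char[32 * 1024];
        Stopwatch readWatch = new Stopwatch();
        Stopwatch scanWatch = new Stopwatch();
        int newlines = 0;

        while (true)
        {
            // Time the chunk read (IO plus decoding).
            readWatch.Start();
            int read = reader.Read(buffer, 0, buffer.Length);
            readWatch.Stop();
            if (read <= 0)
            {
                break;
            }

            // Time the scan of the chunk just read.
            scanWatch.Start();
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == '\n')
                {
                    newlines++;
                }
            }
            scanWatch.Stop();
        }

        readTime = readWatch.Elapsed;
        scanTime = scanWatch.Elapsed;
        return newlines;
    }
}
```

Called with a StreamReader over a real log file, the two TimeSpan values give an upper bound on what overlapping the scan with the IO could save: at best the scan time disappears, and the read time stays.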
The 2 VB.NET functions I created based on your code example are pretty darn
fast. I counted a total of several million lines across about 200+ files in
a matter of a few seconds. If someone has issues with this speed to require
multi-threading, then something's just wrong!
However, one of my new line counting functions is used in a separate thread
after my app initially counts the lines and partially indexes the files'
entries by date/time (to get a time reference per file so I only parse parts
of files applicable to the date/time window of interest.) The line counter
that runs in a separate thread goes back over all of the files and determines
the actual byte position per chr(10) detected. This enables the user of my
log viewer to quickly jump to a particular line and also speeds content
paging (for viewability performance.) So to answer Cor, yes it is good to
use in a separate thread when there are extended purposes at play which you
may not want your app (or user) to wait on to complete.
-Matt
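The byte-position index described above could look roughly like the following. This is a sketch under assumptions, not the actual log viewer code: the names LineIndex, BuildLineOffsets and ReadLineAt are invented here, and it reads raw bytes, so it assumes an encoding such as ASCII or UTF-8 in which the byte 10 only ever means LF (it would not work on UTF-16 files).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only.
static class LineIndex
{
    // Returns the byte offset at which each line starts.
    // The number of entries equals the number of lines,
    // including a final line with no terminating LF.
    public static List<long> BuildLineOffsets(Stream stream)
    {
        List<long> offsets = new List<long>();
        byte[] buffer = new byte[32 * 1024];
        long position = 0;        // byte offset of buffer[0] in the stream
        bool atLineStart = true;  // true when the next byte begins a new line
        int read;

        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (atLineStart)
                {
                    offsets.Add(position + i);
                    atLineStart = false;
                }
                if (buffer[i] == (byte)'\n')
                {
                    atLineStart = true;
                }
            }
            position += read;
        }
        return offsets;
    }

    // Jumps straight to a zero-based line number using the index.
    // Requires a seekable stream; the stream is left open.
    public static string ReadLineAt(Stream stream, List<long> offsets, int lineNumber)
    {
        stream.Seek(offsets[lineNumber], SeekOrigin.Begin);
        StreamReader reader = new StreamReader(stream);
        return reader.ReadLine();
    }
}
```

For the bytes of "ab\ncd\n\nef" the index is 0, 3, 6, 7, and ReadLineAt with line number 1 returns "cd". Building the index costs one pass over the file, which is why it suits a background thread: the viewer can show the first page while the offsets for the rest are still being collected.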