I am taking a course to help me learn Perl for fun. I have never been good at programming languages (as you will see), but I really want to learn this as best I can.
Anyway, I had an assignment in which I was supposed to write a Perl/CGI program that counts the lines and words in an uploaded file. Well, I wrote it, and it works! Very happy with myself. Here's the code:
Code:
#!/usr/bin/perl
use CGI ':standard';
use CGI::Carp 'fatalsToBrowser';
print_form() unless param;
print_results() if param;
print end_html;
sub print_form {
    print header;
    print start_html(-title=>'Word Counter PERL program', -BGCOLOR=>'#f0f8ff');
    print hr;
    print h1("Word Counter Program");
    print hr;
    print start_multipart_form();
    print "Please click the Browse button to upload a file of your choice or click in the checkbox below to use the default file, The Raven by Edgar Allan Poe. ";
    print br;
    print filefield(-name=>'upload', -size=>40);
    print br;
    print checkbox(-name=>'check here to use The Raven', -value=>'off');
    print br;
    print br;
    print submit(-label=>'Count it!');
    print reset;
    print endform;
}
sub print_results {
    my $file = param('upload');
    print header;
    print start_html;
    print h1("The file you selected has:");
    print br;
    if (!$file) {
        $file="raven.txt";
        open(FILE, "$file") || die "Cannot open $file: $!\n";
        while (<FILE>) {
            $lines{$ARGV}++;
            @words = split(/\W+/);
            $count{$ARGV} += @words;
        }
        foreach $linecount (sort keys %lines) {
            print h2('File Lines'), $lines{$linecount};
        }
        foreach $wordcount (sort keys %count) {
            print h2('File Words'), $count{$wordcount};
        }
    }
    else {
        while (<$file>) {
            $lines{$ARGV}++;
            @words = split(/\W+/);
            $count{$ARGV} += @words;
        }
        foreach $linecount2 (sort keys %lines) {
            print h2('File Lines'), $lines{$linecount2};
        }
        foreach $wordcount2 (sort keys %count) {
            print h2('File Words'), $count{$wordcount2};
        }
    }
    print h2('File Name'), $file;
}
You can see it in action at http://www.bartlett-family.net/cgi-bin/counter.cgi if you like. Anyway, now I have to add some functionality to it, and I am thoroughly confused and in over my head. Basically, I need to rewrite this same script so that it not only counts words but also does the following:
Read the given text file one line at a time and build an associative array where the keys are the words of the file and the values are the number of times each corresponding word occurs in the file.
Print a report which shows each word, how many times it appeared, and the percentage of times the word appeared in the file. This report should be sorted alphabetically by word.
At the end of the above report, print out a summary report listing the number of lines in the file, the number of words in the file, the number of unique words in the file, and the word which occurred most often along with its attributes as described above. One way to help ensure accurate counts is to make each word lower case (using the lc() function) when comparing, so that the words "The" and "the" will not count as two words.
Now, I must say that at this point, I am in way over my head. I have been reading out the wazoo and cannot figure out how to do this at all. Could someone please give me some help? I certainly don't want anyone to DO it for me, but I would really appreciate any help. This is maddening, and it is supposed to be fun for me! Thanks!
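For anyone following along, the three requirements above boil down to one hash keyed by lower-cased word plus a couple of running totals. Here is a minimal sketch of that idea, not a finished assignment: it works on a hard-coded list of sample lines rather than an uploaded file, and the helper name tally_words and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): takes a list of
# text lines and returns the tallies the assignment asks for.
sub tally_words {
    my @lines = @_;
    my %count;              # word => number of occurrences
    my $line_total = 0;
    my $word_total = 0;
    for my $line (@lines) {
        $line_total++;
        # Lower-case first so "The" and "the" share one key. grep drops the
        # empty string that split returns when a line starts with punctuation.
        for my $word (grep { length } split /\W+/, lc $line) {
            $count{$word}++;
            $word_total++;
        }
    }
    return (\%count, $line_total, $word_total);
}

# Made-up sample input standing in for the lines of the uploaded file.
my @sample = ("The raven and the lamp", "The bust");
my ($count, $line_total, $word_total) = tally_words(@sample);

# Report sorted alphabetically by word, with each word's share of the total.
for my $word (sort keys %$count) {
    printf "%-10s %3d %7.3f%%\n", $word, $count->{$word},
        100 * $count->{$word} / $word_total;
}

# Most frequent word: sort by count descending, ties broken alphabetically.
my ($top) = sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;

print "Lines: $line_total\n";
print "Words: $word_total\n";
print "Unique words: ", scalar(keys %$count), "\n";
printf "Most common: %s (%d times, %.3f%%)\n", $top, $count->{$top},
    100 * $count->{$top} / $word_total;
```

In a real CGI script the loop would read from the uploaded filehandle instead of @sample, and an empty file would need a guard before dividing by $word_total.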
Now, I have completely rewritten it based on what I need to get done for this assignment. Could anyone help further? I'm kind of stuck and it's driving me nuts!
The page comes up fine when you open it in a browser, but if you click on "Check it out!", the browser goes to a server error screen (regardless of whether you select the default file, upload one, or do nothing at all). Looking at my Apache logs, I see that the error is "Premature end of script headers: wordcount2.cgi". Can anyone please take a look at my code and tell me what they think might be causing this? I think I've been staring at it for too long, lol.
Code:
#!/usr/bin/perl
use CGI ':standard';
use CGI::Carp 'fatalsToBrowser';
if (!param) {
    print_form(); #Call Upload form if no file has been submitted
}
else {
    word_count(); #Call Word count routine if file has been submitted
}
#Routine displays HTML form that allows user to upload a text file or select default file to count
sub print_form {
    print header;
    print start_html(-title=>'Word Counter PERL program', -BGCOLOR=>'#f0f8ff');
    print hr;
    print h1("Word Counter Program");
    print hr;
    print start_multipart_form();
    print "Please click the Browse button to upload a file of your choice or click in the box below to use the default file, The Raven by Edgar Allan Poe";
    print br;
    print filefield(-name=>'upload', -size=>40);
    print br;
    print checkbox(-name=>'check here to use The Raven', -value=>'off');
    print br;
    print br;
    print submit(-label=>'Check it out!');
    print reset;
    print endform;
}
#Routine calculates the number of words, lines and characters in the user selected file
#Displays HTML table with results of the calculations
sub word_count {
    my $char_count = 0; #Declare sub specific variables
    my $line_count = 0;
    my $word_count = 0;
    my @words;
    my $file = param('upload'); #Moves uploaded file into memory if entered
    if(param('Default')){ #Check is user requested default file
        open ($file, 'raven.txt') or die; #Open default text file
    }
    else {
        if (!$file) { #If no file was found return message to user
            print "No file uploaded. Click the Back button on your browser to try again!",
            return;
        }
    }
    while (<$file>) { #Loop steps through text file one line at a time
        $line_count++; #increments for each line in file
        $char_count += length($_); #Counts number of characters in each line and adds to previous
        @words = split; #Splits each line on whitespace and assigns to array
        #$word_count += scalar(@words); #Counts elements in array and adds to previous total
        foreach (split /\W+/) {
            next if ($_ eq /\n/);
            $word_count++;
            $_ = lc($_);
            $count{$_}++;
        }
    }
    $indword_count = scalar(keys(%count));
    print "idv words: $indword_count";
    print #Displays results of counts
        "<h2>Here are your results</h2>",
        "<table border = 1 ><tr>",
        "<th>File Name</th>",
        "<th># of Lines</th>",
        "<th># of Words</th>",
        "<th># of Characters</th>",
        "</tr>",
        "<td>$file</td>",
        "<td>$line_count</td>",
        "<td>$word_count</td>",
        "<td>$char_count</td>",
        "</tr></table><br>",
        "<p><a href='wordcount.cgi'>Return to Main Screen</a><br>";
    print #Displays results of counts
        "<h2>Here are your results</h2>",
        "<table border = 1 ><tr>",
        "<th>Word</th>",
        "<th># of Occurences</th>",
        "<th>% of Occurences</th>",
        "</tr>";
    foreach $word (sort keys %count) {
        $percent = (($count{$word}/$word_count) * 100);
        print
            "<tr><td>$word</td>",
            "<td>$count{$word}</td>",
            "<td>";
        printf "%2.3f", $percent;
        print
            "%</td></tr>";
        # was seen $count{$word} times.<br>";
    }
    print "</tr></table></br>";
    print "<br><br>";
    print #Displays results of counts
        "<h2>Here are your results</h2>",
        "<table border = 1 ><tr>",
        "<th>Word</th>",
        "<th># of Occurences</th>",
        "<th>% of Occurences</th>",
        "</tr>";
    foreach $word (sort {$count{$b} <=> $count{$a}} keys %count) {
        $percent = (($count{$word}/$word_count) * 100);
        print
            "<tr><td>$word</td>",
            "<td>$count{$word}</td>",
            "<td>";
        printf "%2.3f", $percent;
        print
            "%</td></tr>";
        # was seen $count{$word} times.<br>";
        #print "$word was seen $count{$word} times.<br>";
    }
}
Syntax-wise, the script itself looks fine. Your problem is that when there is a param, the script calls the word_count subroutine, but that subroutine never prints the header, which you do correctly in print_form when there is no param. Since no header is output, the script fails to run in a CGI environment. That is the cause of that particular error, anyhow. I only looked at the code with regard to that error (I don't have the time or desire to go further, even if it is simple and short), so I can't comment on other aspects or say whether anything else is done well or incorrectly. I get the impression this is an assignment meant to teach you anyway, so I'll leave the rest to your instructor. From what I saw while looking for the cause of this error, this doesn't seem bad for someone learning. Good job.
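To make the header point concrete: a CGI response has to begin with a header line followed by a blank line before any page content, otherwise Apache reports "Premature end of script headers". The sketch below builds such a response by hand so it runs without CGI.pm installed; the helper name cgi_response is made up for illustration, and the header string is only roughly what CGI.pm's "print header;" emits. In the posted script, the equivalent would presumably be calling "print header;" at the top of word_count, the same way print_form already does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script or of CGI.pm):
# builds a complete CGI response as one string. The header line plus the
# blank line after it must come before any HTML.
sub cgi_response {
    my ($body) = @_;
    return "Content-Type: text/html\r\n\r\n" . $body;
}

# Stand-in for the results page that word_count prints.
my $page = cgi_response("<h2>Here are your results</h2>\n");
print $page;
```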
I ended up fixing this yesterday. Although syntactically correct, I had some commas where there should have been semicolons, and I wasn't properly calling a variable. Plus, my table was screwed up. All better now. Thanks anyway, man!
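For later readers, here is a small sketch of why a comma in place of a semicolon can be syntactically valid and still wrong. It assumes the commas mentioned above include the one after the print in the "No file uploaded" branch of word_count; the helper names are made up for illustration. With a comma, return becomes part of print's argument list and runs first, so the sub exits before anything is printed.

```perl
use strict;
use warnings;

# Hypothetical helper: runs a sub and returns whatever it printed, by
# temporarily selecting a filehandle that writes to an in-memory string.
sub captured {
    my ($code) = @_;
    my $out = '';
    open(my $fh, '>', \$out) or die "Cannot open in-memory handle: $!";
    my $old = select($fh);
    $code->();
    select($old);
    close($fh);
    return $out;
}

# Comma: "return" is evaluated as one of print's arguments, so the sub
# returns before print ever gets to run.
sub with_comma {
    print "No file uploaded.",
        return;
}

# Semicolon: print runs, then the sub returns.
sub with_semicolon {
    print "No file uploaded.";
    return;
}

my $comma_output     = captured(\&with_comma);
my $semicolon_output = captured(\&with_semicolon);

print "comma version printed:     '$comma_output'\n";
print "semicolon version printed: '$semicolon_output'\n";
```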
Sorry I didn't have the time to help review more of the code. However, it sounds like you got that all figured out before I saw your post anyway. Cheers.