View of /Sprout/DupCheck.pl

Revision 1.3
Tue Feb 5 05:47:32 2008 UTC by parrello
Branch: MAIN
CVS Tags: mgrast_dev_08112011, mgrast_dev_08022011, rast_rel_2014_0912, rast_rel_2008_06_18, rast_rel_2008_06_16, rast_rel_2008_12_18, mgrast_dev_04082011, rast_rel_2008_07_21, rast_rel_2010_0928, rast_2008_0924, mgrast_version_3_2, mgrast_dev_12152011, rast_rel_2008_04_23, mgrast_dev_06072011, rast_rel_2008_09_30, rast_rel_2009_0925, rast_rel_2010_0526, rast_rel_2014_0729, rast_rel_2009_05_18, rast_rel_2010_1206, mgrast_release_3_0, mgrast_dev_03252011, rast_rel_2010_0118, mgrast_rel_2008_0924, mgrast_rel_2008_1110_v2, rast_rel_2009_02_05, rast_rel_2011_0119, mgrast_rel_2008_0625, mgrast_release_3_0_4, mgrast_release_3_0_2, mgrast_release_3_0_3, mgrast_release_3_0_1, mgrast_dev_03312011, mgrast_release_3_1_2, mgrast_release_3_1_1, mgrast_release_3_1_0, mgrast_dev_04132011, rast_rel_2008_10_09, mgrast_dev_04012011, rast_release_2008_09_29, mgrast_rel_2008_0806, mgrast_rel_2008_0923, mgrast_rel_2008_0919, rast_rel_2009_07_09, rast_rel_2010_0827, mgrast_rel_2008_1110, myrast_33, rast_rel_2011_0928, rast_rel_2008_09_29, mgrast_rel_2008_0917, rast_rel_2008_10_29, mgrast_dev_04052011, rast_rel_2009_03_26, mgrast_dev_10262011, rast_rel_2008_11_24, rast_rel_2008_08_07, HEAD
Changes since 1.2: +0 -2 lines
Removed obsolete use clauses.

#!/usr/bin/perl -w

=head1 Duplicate Key Check

C<DupCheck> [I<options>] I<keySize> I<fileName>

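Find duplicate keys in a sorted, tab-delimited file. The first parameter is the number
of leading fields that make up the key; the second is the name of the file to examine.
Each line's key is compared with the key of the preceding line, and every exact or
case-insensitive match is reported with its line number, so that the duplicates can be
located precisely in the file.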

The currently-supported command-line options are as follows.

=over 4

=item trace

Numeric trace level. A higher trace level causes more messages to appear. The
default trace level is 2.

=back
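
=head2 Sketch of the Comparison

The following is a minimal, self-contained sketch of the adjacent-key comparison
described above. It is an illustration only, not this script's own code: it works on
a list of lines instead of a file, and the name C<FindDuplicates> and the sample
records are hypothetical.

    use strict;
    use warnings;

    # Hypothetical helper: report adjacent duplicate keys in a list of
    # tab-delimited lines that are already sorted by key.
    sub FindDuplicates {
        my ($fldCount, @lines) = @_;
        my @messages;
        my $oldKey;
        my $lineCount = 0;
        for my $line (@lines) {
            $lineCount++;
            chomp(my $text = $line);
            my @fields = split /\s*\t\s*/, $text, -1;
            # Pad short records so the slice never contains undef.
            push @fields, '' while @fields < $fldCount;
            # The key is the first $fldCount fields, rejoined with tabs.
            my $key = join("\t", @fields[0 .. $fldCount - 1]);
            if (defined $oldKey) {
                if ($key eq $oldKey) {
                    push @messages, "Duplicate key at line $lineCount: $key";
                } elsif (lc $key eq lc $oldKey) {
                    push @messages, "Case-duplicate key at line $lineCount: $key";
                }
            }
            $oldKey = $key;
        }
        return @messages;
    }

    # Assumed sample data: four records with a two-field key.
    my @report = FindDuplicates(2, "a\t1\tx\n", "a\t1\ty\n", "A\t1\tz\n", "b\t2\tw\n");
    print "$_\n" for @report;

For these sample records the sketch reports a duplicate key at line 2 and a
case-duplicate key at line 3.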

=cut

use strict;
use Tracer;
use Cwd;
use File::Copy;
use File::Path;

# Get the command-line options.
my ($options, @parameters) = Tracer::ParseCommand({ trace => 2 }, @ARGV);
# Set up tracing.
my $traceLevel = $options->{trace};
TSetup("$traceLevel errors Tracer DocUtils ERDB", "TEXT");
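# Get the parameters.
my ($fldCount, $fileName) = @parameters;
# Both positional parameters are required, and the key size must be a positive integer.
if (! defined $fldCount || $fldCount !~ /^[1-9]\d*$/ || ! defined $fileName) {
    die "Usage: DupCheck [options] keySize fileName\n";
}
Open(\*INFILE, "<$fileName");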
# Get the first line of the file.
my $oldKey = GetKey($fldCount);
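# Loop through the rest of the file. GetKey reads the next line itself, so the
# loop condition must not consume a line; otherwise every other line is skipped.
my $lineCount = 1;
while (! eof(INFILE)) {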
    # Count this line.
    $lineCount++;
    # Get the current line's key.
    my $key = GetKey($fldCount);
    # Compare it to the old key.
    if ($key eq $oldKey) {
        print "Duplicate key at line $lineCount: $key\n";
    } elsif (lc $key eq lc $oldKey) {
        print "Case-duplicate key at line $lineCount: $key\n";
    }
    $oldKey = $key;
}
print "$lineCount lines read.\n";

# Get the key field of the next record.
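sub GetKey {
    my ($fldCount) = @_;
    my $line = <INFILE>;
    my $retVal;
    # Test for definedness so that a line consisting of "0" is not taken for end-of-file.
    if (defined $line) {
        chomp $line;
        my @fields = split /\s*\t\s*/, $line;
        # Pad short records so the slice below never yields undef.
        push @fields, '' while @fields < $fldCount;
        # Join the key fields into one string. Assigning the slice directly to a
        # scalar would keep only its last element.
        $retVal = join("\t", @fields[0 .. $fldCount - 1]);
    }
    return $retVal;
}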

1;
