Mon, 26 Jan 2009
My CPAN Tidy - Jan 2009
It's been a while since I gave any attention to my CPAN modules but as an
incentive to get more hands on with git I added them to my own gitweb, fixed the
two that were failing tests and tidied up some of the complaints from
CPANTS.
I'm sure I've missed something (or got it flat out wrong) but it's nice to have at least a local copy of my modules without any issues remaining. Until someone finds more of course... The first two updates have been sent to CPAN and I'll do the others later in the week if the new ones are declared fine.
Posted: 2009/01/26 20:52 | /perl
Sun, 04 Jan 2009
Simple Stemming with Perl
Stemming is the process for reducing inflected (or sometimes derived)
words to their stem, base or root form.
-- Wikipedia article on Stemming
Ever used a website that allowed you to tag content? Ever ended up accidentally using slightly different tags? Something like graphs and graphing or blog and blogs? (I hope so, otherwise it's just me...) To spot some of the more obvious overlaps you can stem each of the words and look for a common base. Where one's found there is the possibility of mistaken duplication. For example if you passed hunts, hunted and hunting through a stemmer each would return 'hunt'. If you want to try for yourself there are online stemmers available.
As a more concrete example let's look at the wonderful service del.icio.us. You upload your own bookmarks, tag them with a number of keywords and can then group, sort and search them by your own defined terms. Except I have a habit of tagging articles about similar topics with nearly, but not quite the same tag.
The perl code below shows how easy it is (using Lingua::Stem from CPAN) to run your own data through a stemmer and look for overlaps. There are implementations in most languages (PyStemmer is also very nice) and the Wikipedia article is actually a very easy to follow introduction.
#!/usr/bin/perl -w
use strict;
use warnings;
use Lingua::Stem;
use Net::Delicious;

my $del = Net::Delicious->new(
    {
        user => "username",
        pswd => "password"
    }
);

my $stemmer = Lingua::Stem->new( -locale => 'EN-UK' );
my %stems;

for my $tag ( $del->tags() ) {
    my $stemmed = $stemmer->stem( $tag->tag );
    push( @{ $stems{$stemmed->[0]} }, $tag->tag );
}

for my $stemmed (sort keys %stems ) {
    # we only care about base words with more than one tag associated
    next unless ( scalar @{ $stems{$stemmed} } > 1);
    print "Possible duplicates -\n";
    print " -- ";
    print join(" : ", @{ $stems{$stemmed} }), "\n";
}
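If you don't have a del.icio.us account (or Lingua::Stem installed) the same grouping idea can be tried offline. The following is only a rough sketch: crude_stem is a made-up, deliberately naive suffix stripper standing in for a real stemmer, group_by_stem is a hypothetical helper, and the tag list is invented sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Naive stand-in for a real stemmer (NOT the algorithm Lingua::Stem uses):
# strip one common English suffix, keeping at least three letters of stem.
sub crude_stem {
    my ($word) = @_;
    my $stem = lc $word;
    for my $suffix (qw(ing ed s)) {
        last if $stem =~ s/(?<=\w{3})\Q$suffix\E$//;
    }
    return $stem;
}

# Group tags by stem and return only the stems shared by two or more tags.
sub group_by_stem {
    my (@tags) = @_;
    my %stems;
    push @{ $stems{ crude_stem($_) } }, $_ for @tags;

    my %dupes;
    for my $stem ( keys %stems ) {
        $dupes{$stem} = $stems{$stem} if @{ $stems{$stem} } > 1;
    }
    return %dupes;
}

# Invented sample tags
my %dupes = group_by_stem(qw(graphs graphing blog blogs perl hunts hunted hunting));

for my $stem ( sort keys %dupes ) {
    print "Possible duplicates -\n";
    print " -- ", join( " : ", @{ $dupes{$stem} } ), "\n";
}
```

A real stemmer copes with far more than three suffixes, so treat this as a way to play with the grouping logic rather than a replacement for Lingua::Stem.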
Posted: 2009/01/04 19:32 | /perl
Tue, 01 Jan 2008
Perl 5.10 - My Favourite Three Features
Since the release of Perl 5.10 (back on 2007/12/18) there have been a fair
few articles discussing all the shiny new features - including smart matching, a
built-in switch and state variables but my favourite three haven't really
received much coverage. So I'll add to the pile of blog posts.
First up is a tiny (from the outside anyway) change that may have the biggest impact of all the new features on my day to day perl - the display of the actual name of uninitialized variables.
# older perls
$ perl584 -we 'print $foo, "\n";'
Use of uninitialized value in print at -e line 1.
# perl 5.10
$ perl510 -we 'print $foo, "\n";'
Use of uninitialized value $foo in print at -e line 1.
From the perspective of someone who has to spend the occasional afternoon reading Apache errorlogs I really like this one.
Now we move on to stackable file tests; something I was surprised perl couldn't do when I first noticed it was missing years ago -
# older perls
...
if (-s $file && -r _ && -x _) {
    print "$file isn't zero length and is +rx\n";
}
...
# perl 5.10
...
if (-s -r -x $file) {
    print "$file isn't zero length and is +rx\n";
}
...
Lastly on my little list is named captures - instead of referencing
$1 and $2 etc. you can now assign them names at
the point of capture and then pull the values out of a hash at a later time
-
# requires 5.10 or above. But not 6.
use 5.010;    # also enables say
my %date;
my $sample_date = '20071225';
if ( $sample_date =~ /(?<year>\d{4})(?<month>\d{2})(?<day>\d{2})/ ) {
    %date = %+;
}
say "The year is $date{'year'}";
While none of these are massive attention grabbing additions like the powerful smart matching, switch statement or say (one of those is not like the others ;)) they help make the day-to-day stuff a little more pleasant.
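Bonus feature - the defined-or operator (//), which only falls back to the default when the left hand side is undefined rather than merely false -
use 5.010;    # enables say and //
my $x;
my $default = 'foo';
$x = 0;
$x ||= $default;    # 0 is false so it gets replaced
say "\$x is $x";    # $x is foo
$x = 0;
$x //= $default;    # 0 is defined so it's left alone
say "\$x is $x";    # $x is 0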
Posted: 2008/01/01 11:59 | /perl
Tue, 14 Aug 2007
Adding a fact to Pfacter
While dabbling with Puppet
I've spent a fair amount of time investigating facter, one of the tools
(although puppet uses it as a library) it's built on. While I quite like
the format it uses to define a fact I'm hampered by my lack of ruby
experience; simple things take me longer than they should. So when I
noticed Pfacter while looking
for a module on CPAN recently I thought I'd have a look at how it could be
done in perl.
Firstly I have to mention the install, or rather lack of one. The module doesn't install via CPAN on Linux (and after looking at its CPAN Testers page it seems I'm not the only one with the problem) which makes it a bit of a pain. Once I'd hand wrangled an install I decided to see what it picked up about the system and how to add my own fact.
I can't complain about the number of built in facts, it's pretty
comparable to the original facter (which I've added my own custom facts to
now.) Adding a fact was a little more complex. Once I'd written one (I
cribbed from the existing facts - but with strict and warnings added)
I dropped it in to the correct directory (which I found with
find / -name cfclasses.pm) and ran pfacter; and nothing
happened.
After re-checking the code of the ipv6.pm fact
I found the first major difference between the perl and ruby versions. Ruby
facter loads all the facts in a directory and attempts them all, pfacter
requires you to manually add them to the @modules array before
it'll run them. Once I'd done that and reran pfacter it showed that my
machine is ipv6 enabled.
So the verdict? It could be useful to pull out certain information about the system, I found the perl version easier to extend as it doesn't add its own little language to the mix, but for most people the biggest selling point of facter is its use within puppet - which this version lacks. Still, it was interesting to see another (very) different approach to collecting system information.
Posted: 2007/08/14 19:46 | /perl
Fri, 01 Jun 2007
Daemon Percentages - Perl 6 Version
After heading to the Nordic Perl Workshop and watching sessions by Jonathan
Worthington and brian d
foy I decided to have a little play with Perl 6 and see if I could port
my Daemon Percentages script (Perl 5 and
Ruby
versions already exist) to Perl 6.
Thanks to material in the slides from the above sessions and asking
a couple of questions in #perl6 I got a basically working Daemon
percentages Perl 6 script running on my Windows desktop under pugs in a
couple of hours (I had problems finding an example of the substitution). I
managed to kill pugs twice, find some non-implemented parts of the language
(I think) and eventually get output values that matched the other
versions.
While I enjoyed fiddling with the language (and the roles and traits look very interesting) it's still too early for me to invest any real time in the project or the language. It's come a long way in the last couple of years but it's nowhere near ready for casual users yet. I never really got past wondering if it was me or the implementation that was causing the problems. In most cases it was me.
Notes: while looking around for examples I found that some of Jonathan's other Perl 6 slides are well worth a read. The people in the IRC channel were also very helpful.
Posted: 2007/06/01 21:21 | /perl
Wed, 28 Mar 2007
Simulating Typing in Perl - Take Two
In my Simulating Typing in Perl post
I included a small chunk of perl for varying the typing speed of a fake
user. While it works it did have some oddities that were noticeable by a
sharp eyed viewer.
Thanks to a pointer from Mark Fowler I've now revised the script slightly and included String::KeyboardDistance. This nifty module knows how far away keys on the keyboard are from each other and so helps to smooth the delays out a little; for example the string 'aaaaa' is now typed much faster than before (because there is no travel involved) whereas 'qpqpqpq' will be slower due to the finger movement - although I'm not bothered enough to make repeated sequences faster.
I've also uploaded the revised automatic typing script to UnixDaemon.net
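To give a feel for the idea without quoting the uploaded script, here is a minimal sketch of distance-based delays. It does not use String::KeyboardDistance; the key grid, key_distance, delay_for, type_out and all the timing constants are assumptions made up for illustration, and the grid ignores the stagger of a real keyboard.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(usleep);

$|++;    # unbuffered so characters appear one at a time

# Hypothetical QWERTY grid: each key gets a [row, column] position.
my @rows = ( 'qwertyuiop', 'asdfghjkl', 'zxcvbnm' );
my %pos;
for my $r ( 0 .. $#rows ) {
    my @keys = split //, $rows[$r];
    $pos{ $keys[$_] } = [ $r, $_ ] for 0 .. $#keys;
}

# Straight line distance between two keys, measured in key widths.
sub key_distance {
    my ( $from, $to ) = ( lc( $_[0] ), lc( $_[1] ) );
    return 1 unless exists $pos{$from} && exists $pos{$to};    # unknown key
    my ( $r1, $c1 ) = @{ $pos{$from} };
    my ( $r2, $c2 ) = @{ $pos{$to} };
    return sqrt( ( $r1 - $r2 )**2 + ( $c1 - $c2 )**2 );
}

# Delay in microseconds: a base pause, plus travel time, plus some jitter.
# All three constants are arbitrary guesses.
sub delay_for {
    my ( $prev, $next ) = @_;
    return int( 30_000 + 20_000 * key_distance( $prev, $next ) + rand(20_000) );
}

sub type_out {
    my ($text) = @_;
    my $prev;
    for my $char ( split //, $text ) {
        print $char;
        usleep( delay_for( defined $prev ? $prev : $char, $char ) );
        $prev = $char;
    }
}

type_out("aaaaa qpqpq\n");
```

With these numbers a repeated key costs at most 50ms while a jump from q to p costs at least 210ms, which is the smoothing effect described above.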
Posted: 2007/03/28 23:02 | /perl
Tue, 27 Mar 2007
Simulating Typing in Perl
You'd think it would be easy - have a program type a previously written
program at a human speed (minus the typos). Vim has record and replay
functionality but it's done with typical vim efficiency: yes, instantly.
At EuroOSCON a couple of years ago Damian Conway handed out a
presentation tidbit, he uses the hand_print function from IO::Prompt to make
himself look like a master typist. Well, he could just have been saying
that to make us feel better, maybe he can type that fast... Anyway, I tried
a simple example using the module:
#!/usr/bin/perl
use strict;
use warnings;
use IO::Prompt qw/hand_print/;
hand_print("I am not really typing this...");
It works but the typing speed is so uniform it becomes obvious after a handful of lines. So I wrote my own that adds a little randomness to the typing speed. It's not pretty, but it does what I want and its output is "Out on the big bad web."
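#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(usleep);

$|++;    # unbuffered output so each character shows up as it's "typed"

my $input;
{
    local $/ = undef;    # slurp mode: read all the input in one go
    $input = <ARGV>;
}

# call sleep_and_show on every character (/s lets . match newlines too)
$input =~ s/(.)/sleep_and_show($1)/esg;

sub sleep_and_show {
    print $_[0];
    usleep int rand(200_000);    # pause for up to 0.2 seconds
}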
It's a little more jittery, which is more like my typing, and has the
nice side effect of a pretty looking invocation - ./seditor
file_to_type - which could be a valid command.
Posted: 2007/03/27 19:11 | /perl
Wed, 21 Mar 2007
A bigger boy made me do it - Log::Dispatch::Twitter
For reasons that are too dull to post about (yes, even on THIS blog!) I
spent some time today looking at Log::Dispatch. Bob (the aforementioned bigger
boy) then made^Wsuggested I integrate it with the shining example of wasted
time that is Twitter. So I (not very)
proudly present: Log::Dispatch::Twitter!
Now, where's the build system source code...
Posted: 2007/03/21 17:08 | /perl
Sat, 10 Mar 2007
Daemon Logging Percentages and Playing with Ruby Idioms
While digging in to some large log files recently I needed to work out
which daemons were causing the most noise, so I wrote a little perl
script called
daemon_percentages.pl. It was short, ran quickly and did what I wanted.
And then my lunch plans were cancelled due to rain.
With nothing but boredom, a newly compiled version of ruby and the google homepage at my side I decided to write a version in ruby. And then I realised how long it's been since I last looked at ruby. After slightly longer than the perl version took, and with a couple of false starts, I ended up with daemon_percentages.rb.
I had forgotten how much I disliked ri. It feels slower
than perldoc and I find it awkward to use. Then I hit the
lack of a post-increment operator; while I understand the reasons for its
omission I've got used to having it, so that took a couple of minutes to
debug. And then the biggie for me, a lack of hash key autovivification.
I'd forgotten how much of a perlism it is and so I spent a little while looking at different ways to do it (and got some good pointers from Will Jessop). In the end I tried the following:
# option 1
if tally.has_key? daemon
  tally[daemon] += 1
else
  tally[daemon] = 1
end

# option 2
tally[daemon] = 0 unless tally.has_key? daemon
tally[daemon] += 1

# option 3
tally[daemon] = (tally.has_key? daemon) ? tally[daemon] + 1 : 1
Option 1 felt too long, I didn't like option 2 when I reread it as the code seemed to imply I'd decided something and immediately then changed my mind so I settled on option 3. Although it's a little more complex (and denser) it's such a common thing for me to use I'd rather have it on a single line and gloss over the syntax as it becomes more familiar.
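For comparison, this is roughly what the perlism being missed looks like: incrementing a hash key that doesn't exist yet just creates it, so no has_key style check is needed. This is a sketch and not the contents of daemon_percentages.pl; the log lines, the regex and the tally_daemons name are all invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented syslog-style sample lines
my @lines = (
    'Mar 10 12:00:01 host sshd[1234]: Accepted publickey for user',
    'Mar 10 12:00:02 host cron[2345]: (root) CMD (run-parts /etc/cron.hourly)',
    'Mar 10 12:00:03 host sshd[1240]: Connection closed',
    'Mar 10 12:00:04 host postfix/smtpd[99]: connect from unknown',
);

sub tally_daemons {
    my (@log) = @_;
    my %tally;
    for my $line (@log) {
        # the daemon name is the token after the hostname, up to '[' or ':'
        next unless $line =~ /^\S+\s+\d+\s+[\d:]+\s+\S+\s+([^\[:\s]+)/;
        $tally{$1}++;    # a missing key springs into existence and becomes 1
    }
    return %tally;
}

my %tally = tally_daemons(@lines);

my $total = 0;
$total += $_ for values %tally;

# noisiest daemons first, ties broken alphabetically
for my $daemon ( sort { $tally{$b} <=> $tally{$a} || $a cmp $b } keys %tally ) {
    printf "%-15s %5.1f%%\n", $daemon, 100 * $tally{$daemon} / $total;
}
```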
While I had some small teething problems I do like the look of the ruby code and apart from the missing perlisms it felt quite natural to write. I'm not willing to jump ship just yet (CPAN is still too useful) but I think I'll be writing more of my personal tools in ruby.
Posted: 2007/03/10 23:19 | /perl
Thu, 23 Nov 2006
Should I Release WebService::Yahoo::SpellingSuggestion?
I wrote the WebService::Yahoo::SpellingSuggestion
perl module for one of my little side projects. It was easy to wrap,
seemed to work fine when I did a few tests by hand and didn't take
very long to be CPANised; I'm trying to stay in good habits and treat
all my internal modules as if they'd be released - I just skimp on the
tests a bit. Which I know is bad.
Unfortunately, while it was fine for the light testing, I wasn't very happy with it once I started to use it for heavier loads. The webservice it calls doesn't seem to handle more than a couple of words at a time and makes it impossible to differentiate between a word that's spelt correctly and one that it doesn't know about (it returns undef on either). So now I have a small dilemma. Do I release it to CPAN, add a note listing what I'm not happy with about the service, mention I'm not using it anymore and let anyone else who wants to use it download a copy, or should I just leave it, unreleased, in a dark corner of my version control system?
Posted: 2006/11/23 00:59 | /perl
Sun, 03 Sep 2006
WWW::Shorten::Smallr on CPAN - Initial Release
I've just uploaded the initial release of WWW::Shorten::Smallr
to CPAN and it should be making its way through the mirrors right about
now.
The module itself is simple, it shrinks the given URL using the http://smallr.com/ web site. I wrote this for two reasons, firstly smallr is the official link shortener of one of the mailing lists I frequent and I wanted it available from the Vim Shortener I wrote. Secondly I wanted to have another play around with Module::Build. Which I understand slightly better now.
Posted: 2006/09/03 23:16 | /perl
Wed, 16 Aug 2006
CPAN META.yml to DOAP Converter - Can't Be Bothered
Last year I was quite interested in the Description of a Project (DOAP)
project. I added DOAP files to all my Sourceforge projects, wrote some
little util scripts, contributed DOAP files to a couple of the Free
software projects I use that had asked for them... and then promptly
forgot all about it.
A couple of recent posts about the Python Package index and DOAP interested me enough to dig out one of my half-finished scripts, it's the (very messy) first pass of a CPAN META.yml to DOAP converter for the automatic creation of DOAP files for perl modules. And after a little play I'm giving up. The sheer number of META.yml files that are badly formed, missing certain fields, based on previous versions of the file contents or even just empty files has surprised even the cynic in me. If I was bothered enough to have another go I'd seriously think about a two pass attempt, first the META.yml and then just mine the POD for the missing data. And hope they don't disagree.
The point of this post? CPAN is THE perl killer app but it has so much cruft it's scary. And to remind me to work on things I can actually survive...
Posted: 2006/08/16 19:33 | /perl
Wed, 28 Jun 2006
File::Find::Rule::VCS and RCS Directories
The File::Find::Rule::VCS
module excludes certain directories, artifacts from version control
systems, from your File::Find::Rule queries. While it's aware of the big
two (subversion and CVS) today I needed a version that was aware - and can
ignore - RCS directories. So I hacked the module and tada, we now have a
File::Find::Rule::VCS RCS support patch.
I've sent a copy to the module author but I'm putting it here as well in case it gets rejected.
Posted: 2006/06/28 17:04 | /perl
Thu, 20 Jan 2005
Threat Warning One Liner
Any attempt at explaining why I wanted to do this will sound odd so for now
I'll just post the one liner...
perl -MLWP::Simple -e 'get("http://www.dhs.gov/") =~ /dhs-advisory-(\w+)\.gif/;print "Threat level is $1!\n";'
This gets the current threat level for the US and prints it to standard out.
Posted: 2005/01/20 21:30 | /perl
Mon, 22 Nov 2004
Test::URI -- Running out of things I can't test!
If you are not already subscribed then it may well be worth subscribing to
the CPAN
RSS feed. It's very easy to let little gems like Test::URI slip through.
The downside of course is that I am slowly running out of things I can't test!
Posted: 2004/11/22 22:53 | /perl
Wed, 08 Sep 2004
I've added a new entry to my miniprojects page. The getpageranks script (written in Perl) allows you to pass in a file containing URLs, one to a line, with whitespace and comments allowed. Each entry in the file will then be checked with Google and the PageRank will be displayed.
Note: If an invalid URL is given the PageRank will be returned as zero, this makes it very difficult to determine which sites are invalid and which are just unpopular. Although you may consider them equivalent ;)
The getpageranks script itself is very short and should be easy to follow. For full usage instructions run the command with either -h or -u.
Posted: 2004/09/08 22:50 | /perl
Google::PageRank - About time.
Unless you run IE on Windows
with the Google toolbar installed it's always been difficult to determine
the PageRank of any given URL. While a Firefox/Mozilla extension was created it
was, from my experiences, very flaky. It also required manual use.
I was pleasantly surprised today to see a module called Google::PageRank hit my local CPAN mirror. I've had a quick play and it worked on all my test cases. Tool writers, start your engines!
Posted: 2004/09/08 21:10 | /perl

