UNIXDAEMON Small Mosaic


Mon, 26 Jan 2009

My CPAN Tidy - Jan 2009

It's been a while since I gave any attention to my CPAN modules but, as an incentive to get more hands-on with git, I added them to my own gitweb, fixed the two that were failing tests and tidied up some of the complaints from CPANTS.

I'm sure I've missed something (or got it flat out wrong) but it's nice to have at least a local copy of my modules without any issues remaining. Until someone finds more of course... The first two updates have been sent to CPAN and I'll do the others later in the week if the new ones are declared fine.

Posted: 2009/01/26 20:52 | /perl | Permanent link to this entry


Sun, 04 Jan 2009

Simple Stemming with Perl

Stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form.
-- Wikipedia article on Stemming

Ever used a website that allowed you to tag content? Ever ended up accidentally using slightly different tags? Something like graphs and graphing or blog and blogs? (I hope so, otherwise it's just me...) To spot some of the more obvious overlaps you can stem each of the words and look for a common base. Where one is found there is the possibility of mistaken duplication. For example, if you passed hunts, hunted and hunting through a stemmer each would return 'hunt'. If you want to try for yourself there are online stemmers available.
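To make the idea concrete without installing anything, here is a minimal sketch of grouping tags by a shared stem. The naive_stem function is a toy suffix stripper invented for this illustration; it is not Porter's algorithm or anything a real stemming library does, and it will get plenty of English words wrong.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy suffix stripper (hypothetical helper, NOT a real stemming algorithm).
# It removes one common suffix and keeps at least three letters of stem.
sub naive_stem {
    my ($word) = @_;
    my $stem = lc $word;
    for my $suffix (qw(ing ed s)) {
        return $1 if $stem =~ /^(.{3,})\Q$suffix\E$/;
    }
    return $stem;
}

# Group each word under its stem.
my %stems;
for my $word (qw(hunts hunted hunting graphs graphing blog blogs perl)) {
    push @{ $stems{ naive_stem($word) } }, $word;
}

# Only stems shared by more than one word are possible duplicates.
for my $stem ( sort keys %stems ) {
    next unless @{ $stems{$stem} } > 1;
    print "$stem: @{ $stems{$stem} }\n";
}
```

A proper stemmer handles far more cases (doubled consonants, irregular plurals and so on), which is why the real thing is worth pulling from CPAN.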

As a more concrete example let's look at the wonderful service del.icio.us. You upload your own bookmarks, tag them with a number of keywords and can then group, sort and search them by your own defined terms. Except I have a habit of tagging articles about similar topics with nearly, but not quite the same tag.

The perl code below shows how easy it is (using Lingua::Stem from CPAN) to run your own data through a stemmer and look for overlaps. There are implementations in most languages (PyStemmer is also very nice) and the Wikipedia article is actually a very easy-to-follow introduction.


#!/usr/bin/perl -w
use strict;
use warnings;
use Lingua::Stem;
use Net::Delicious;

my $del = Net::Delicious->new(
                               {
                                 user => "username",
                                 pswd => "password"
                               }
                             );

my $stemmer = Lingua::Stem->new( -locale => 'EN-UK' );

my %stems;
for my $tag ( $del->tags() ) {
  my $stemmed = $stemmer->stem( $tag->tag );

  push( @{ $stems{$stemmed->[0]} },  $tag->tag );
}

for my $stemmed (sort keys %stems ) {
  # we only care about base words with more than one tag associated
  next unless ( scalar @{ $stems{$stemmed} } > 1);

  print "Possible duplicates -\n";
  print "  --  ";
  print join(" : ", @{ $stems{$stemmed} }), "\n";
}


Posted: 2009/01/04 19:32 | /perl | Permanent link to this entry


Tue, 01 Jan 2008

Perl 5.10 - My Favourite Three Features

Since the release of Perl 5.10 (back on 2007/12/18) there have been a fair few articles discussing all the shiny new features - including smart matching, a built-in switch and state variables - but my favourite three haven't really received much coverage. So I'll add to the pile of blog posts.

First up is a tiny (from the outside anyway) change that may have the biggest impact of all the new features on my day to day perl - the display of the actual name of uninitialized variables.


# older perls
$ perl584 -we 'print $foo, "\n";'
Use of uninitialized value in print at -e line 1.

# perl 5.10
$ perl510 -we 'print $foo, "\n";'
Use of uninitialized value $foo in print at -e line 1.

From the perspective of someone who has to spend the occasional afternoon reading Apache error logs I really like this one.

Now we move on to stackable file tests; something I was surprised perl couldn't do when I first noticed it was missing years ago -


# older perls
...
if (-s $file && -r _ && -x _) {
  print "$file isn't zero length and is +rx\n";
}
...

# perl 5.10
...
if (-s -r -x $file) {
  print "$file isn't zero length and is +rx\n";
}
...

Lastly on my little list is named captures - instead of referencing $1 and $2 etc. you can now assign them names at the point of capture and then pull the values out of a hash at a later time -


# requires 5.10 or above. But not 6.
use 5.010;    # enables say

my %date;
my $sample_date = '20071225';

if ( $sample_date =~ /(?<year>\d{4})(?<month>\d{2})(?<day>\d{2})/ ) {
  %date = %+;
}

say "The year is $date{'year'}";

While none of these are massive attention grabbing additions like the powerful smart matching, switch statement or say (one of those is not like the others ;)) they help make the day-to-day stuff a little more pleasant.

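Bonus feature - the defined-or operator (//), which only applies the default when the value is undefined rather than merely false -


use 5.010;    # enables say and //

my $x;
my $default = 'foo';

$x = 0;
$x ||= $default;    # 0 is false, so the default replaces it

say "\$x is $x";    # $x is foo

$x = 0;
$x //= $default;    # 0 is defined, so it is kept

say "\$x is $x";    # $x is 0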

Posted: 2008/01/01 11:59 | /perl | Permanent link to this entry


Tue, 14 Aug 2007

Adding a fact to Pfacter

While dabbling with Puppet I've spent a fair amount of time investigating facter, one of the tools (although puppet uses it as a library) it's built on. While I quite like the format it uses to define a fact I'm hampered by my lack of ruby experience; simple things take me longer than they should. So when I noticed Pfacter while looking for a module on CPAN recently I thought I'd have a look at how it could be done in perl.

Firstly I have to mention the install, or rather lack of one. The module doesn't install via CPAN on Linux (and after looking at its CPAN Testers page it seems I'm not the only one with the problem) which makes it a bit of a pain. Once I'd hand wrangled an install I decided to see what it picked up about the system and how to add my own fact.

I can't complain about the number of built in facts, it's pretty comparable to the original facter (which I've added my own custom facts to now.) Adding a fact was a little more complex. Once I'd written one (I cribbed from the existing facts - but with strict and warnings added) I dropped it in to the correct directory (which I found with find / -name cfclasses.pm) and ran pfacter; and nothing happened.

After re-checking the code of the ipv6.pm fact I found the first major difference between the perl and ruby versions. Ruby facter loads all the facts in a directory and attempts them all; pfacter requires you to manually add them to the @modules array before it'll run them. Once I'd done that and rerun pfacter it showed that my machine is ipv6 enabled.
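For comparison, the "load everything in a directory" behaviour is not much code in perl either. The following is only a sketch of the general technique under my own assumptions: load_facts, the facts.d layout and the convention that each file returns a code ref are all hypothetical, and none of it is Pfacter's or facter's actual API.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical loader: every *.pl file in $dir is expected to return a
# code ref as its last value. The file name (minus .pl) is the fact name.
sub load_facts {
    my ($dir) = @_;

    opendir( my $dh, $dir ) or die "Cannot open $dir: $!";
    my @files = sort grep { /\.pl\z/ } readdir $dh;
    closedir $dh;

    my %facts;
    for my $file (@files) {
        # do() needs an absolute path so it does not search @INC.
        my $path = File::Spec->rel2abs( File::Spec->catfile( $dir, $file ) );
        my $code = do $path;

        unless ( ref($code) eq 'CODE' ) {
            warn "Skipping $file: it did not return a code ref\n";
            next;
        }

        ( my $name = $file ) =~ s/\.pl\z//;
        $facts{$name} = $code->();    # attempt the fact and keep its value
    }
    return %facts;
}

# Only runs when a directory is given on the command line.
if (@ARGV) {
    my %facts = load_facts( $ARGV[0] );
    print "$_ => $facts{$_}\n" for sort keys %facts;
}
```

With this layout, dropping a new file such as facts.d/ipv6.pl containing a single anonymous sub would be enough for it to be picked up; the trade-off is that anything in the directory gets executed, which an explicit list like @modules avoids.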

So the verdict? It could be useful for pulling certain information out of the system, and I found the perl version easier to extend as it doesn't add its own little language to the mix, but for most people the biggest selling point of facter is its use within puppet - which this version lacks. Still, it was interesting to see another (very) different approach to collecting system information.

Posted: 2007/08/14 19:46 | /perl | Permanent link to this entry


Fri, 01 Jun 2007

Daemon Percentages - Perl 6 Version

After heading to the Nordic Perl Workshop and watching sessions by Jonathan Worthington and brian d foy I decided to have a little play with Perl 6 and see if I could port my Daemon Percentages script (Perl 5 and Ruby versions already exist) to Perl 6.

Thanks to material in the slides from the above sessions and asking a couple of questions in #perl6 I got a basically working Daemon percentages Perl 6 script running on my Windows desktop under pugs in a couple of hours (I had problems finding an example of the substitution). I managed to kill pugs twice, find some non-implemented parts of the language (I think) and eventually get output values that matched the other versions.

While I enjoyed fiddling with the language (and the roles and traits look very interesting) it's still too early for me to invest any real time in the project or the language. It's come a long way in the last couple of years but it's nowhere near ready for casual users yet. I never really got past wondering if it was me or the implementation that was causing the problems. In most cases it was me.

Notes: while looking around for examples I found that some of Jonathan's other Perl 6 slides are well worth a read. The people in the IRC channel were also very helpful.

Posted: 2007/06/01 21:21 | /perl | Permanent link to this entry


Wed, 28 Mar 2007

Simulating Typing in Perl - Take Two

In my Simulating Typing in Perl post I included a small chunk of perl for varying the typing speed of a fake user. While it worked, it had some oddities that a sharp-eyed viewer would notice.

Thanks to a pointer from Mark Fowler I've now revised the script slightly and included String::KeyboardDistance. This nifty module knows how far away keys on the keyboard are from each other and so helps to smooth the delays out a little; for example the string 'aaaaa' is now typed much faster than before (because there is no travel involved) whereas 'qpqpqpq' will be slower due to the finger movement - although I'm not bothered enough to make repeated sequences faster.
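As a rough illustration of the idea (and not the revised script itself), the delay can be made proportional to how far the finger has to travel. This sketch deliberately avoids String::KeyboardDistance and uses a hand-rolled QWERTY grid instead; key_distance, delay_for, type_out and all the timing constants are assumptions made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(usleep);

# Rough QWERTY layout: record a (row, column) position for each letter.
my @rows = ( 'qwertyuiop', 'asdfghjkl', 'zxcvbnm' );
my %pos;
for my $r ( 0 .. $#rows ) {
    my @keys = split //, $rows[$r];
    $pos{ $keys[$_] } = [ $r, $_ ] for 0 .. $#keys;
}

# Straight-line distance between two keys; unknown keys count as 1.
sub key_distance {
    my ( $from, $to ) = map { $pos{ lc $_ } } @_;
    return 1 unless $from && $to;
    return sqrt( ( $from->[0] - $to->[0] )**2 + ( $from->[1] - $to->[1] )**2 );
}

# Delay in microseconds: a base pause plus extra time per unit of travel.
sub delay_for {
    my ( $prev, $char ) = @_;
    return 40_000 + int( 20_000 * key_distance( $prev, $char ) );
}

# Print the text one character at a time with distance-based pauses
# and a little random jitter on top.
sub type_out {
    my ($text) = @_;
    local $| = 1;
    my $prev = '';
    for my $char ( split //, $text ) {
        print $char;
        usleep( delay_for( $prev, $char ) + int rand 30_000 );
        $prev = $char;
    }
}

# Only types when given a file to read.
type_out( join '', <> ) if @ARGV;
```

With these numbers a repeated 'a' gets only the base pause, while a jump from 'q' to 'p' crosses nine columns and so waits considerably longer.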

I've also uploaded the revised automatic typing script to UnixDaemon.net

Posted: 2007/03/28 23:02 | /perl | Permanent link to this entry


Tue, 27 Mar 2007

Simulating Typing in Perl

You'd think it would be easy - have a program type a previously written program at a human speed (minus the typos). Vim has record and replay functionality but it's done with typical vim efficiency: yes, instantly.

At EuroOSCON a couple of years ago Damian Conway handed out a presentation tidbit: he uses the hand_print function from IO::Prompt to make himself look like a master typist. Well, he could just have been saying that to make us feel better, maybe he can type that fast... Anyway, I tried a simple example using the module:

  
  #!/usr/bin/perl
  use strict;
  use warnings;
  use IO::Prompt qw/hand_print/;

  hand_print("I am not really typing this...");
  

It works but the typing speed is so uniform that it becomes obvious after a handful of lines. So I wrote my own that adds a little randomness to the typing speed. It's not pretty, but it does what I want and its output is "Out on the big bad web."

  
  #!/usr/bin/perl
  use strict;
  use warnings;
  use Time::HiRes qw(usleep);
  $|++;

  my $input;

  {
    local $/ = undef;
    $input = <ARGV>;
  }

  $input =~ s/(.)/sleep_and_show($1)/esg;

  sub sleep_and_show {
    print $_[0];
    usleep int rand(200_000);
  }
  

It's a little more jittery, which is more like my typing, and has the nice side effect of a pretty looking invocation - ./seditor file_to_type - which could be a valid command.

Posted: 2007/03/27 19:11 | /perl | Permanent link to this entry


Wed, 21 Mar 2007

A bigger boy made me do it - Log::Dispatch::Twitter

For reasons that are too dull to post about (yes, even on THIS blog!) I spent some time today looking at Log::Dispatch. Bob (the aforementioned bigger boy) then made^Wsuggested I integrate it with the shining example of wasted time that is Twitter. So I (not very) proudly present: Log::Dispatch::Twitter!

Now, where's the build system source code...

Posted: 2007/03/21 17:08 | /perl | Permanent link to this entry


Sat, 10 Mar 2007

Daemon Logging Percentages and Playing with Ruby Idioms

While digging in to some large log files recently I needed to work out which daemons were causing the most noise, so I wrote a little perl script called daemon_percentages.pl. It was short, ran quickly and did what I wanted. And then my lunch plans were cancelled due to rain.

With nothing but boredom, a newly compiled version of ruby and the google homepage at my side I decided to write a version in ruby. And then I realised how long it's been since I last looked at ruby. After slightly longer than the perl version took, and with a couple of false starts, I ended up with daemon_percentages.rb.

I had forgotten how much I disliked ri. It feels slower than perldoc and I find it awkward to use. Then I hit the lack of a post-increment operator; while I understand the reasons for its omission I've got used to having it, so that took a couple of minutes to debug. And then the biggie for me, a lack of hash key autovivification.

I'd forgotten how much of a perlism it is and so I spent a little while looking at different ways to do it (and got some good pointers from Will Jessop). In the end I tried the following:

  
  # option 1
  if tally.has_key? daemon
    tally[daemon] += 1
  else
    tally[daemon] = 1
  end

  # option 2
  tally[daemon] = 0 unless tally.has_key? daemon
  tally[daemon] += 1

  # option 3
  tally[daemon] = (tally.has_key? daemon) ? tally[daemon] + 1 : 1
  

Option 1 felt too long. I didn't like option 2 when I reread it, as the code seemed to imply I'd decided something and then immediately changed my mind, so I settled on option 3. Although it's a little more complex (and denser) it's such a common thing for me to use that I'd rather have it on a single line and gloss over the syntax as it becomes more familiar.
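For contrast, this is roughly what the perlism looks like. It is a minimal sketch rather than the daemon_percentages.pl script itself: the sample log lines, the regex and the tally_daemons name are assumptions made up for the illustration. The point is that incrementing a hash key that doesn't exist yet just works, because perl creates it and treats the undefined value as zero.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample syslog-style lines; the exact format is an assumption.
my @lines = (
    'Mar 10 12:00:01 host sshd[123]: Accepted publickey',
    'Mar 10 12:00:02 host cron[456]: (root) CMD (run-parts)',
    'Mar 10 12:00:03 host sshd[789]: Connection closed',
    'Mar 10 12:00:04 host postfix/smtpd[42]: connect from unknown',
);

sub tally_daemons {
    my %tally;
    for my $line (@_) {
        # Daemon name: the token after the hostname, up to '[' or ':'.
        next unless $line =~ /^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+([^\[:\s]+)/;
        $tally{$1}++;    # no has_key check: the key is created on first use
    }
    return %tally;
}

my %tally = tally_daemons(@lines);

my $total = 0;
$total += $_ for values %tally;

# Noisiest daemons first, ties broken alphabetically.
for my $daemon ( sort { $tally{$b} <=> $tally{$a} || $a cmp $b } keys %tally ) {
    printf "%-15s %5.1f%%\n", $daemon, 100 * $tally{$daemon} / $total;
}
```

The single `$tally{$1}++` line does the job of all three ruby options above.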

While I had some small teething problems I do like the look of the ruby code and apart from the missing perlisms it felt quite natural to write. I'm not willing to jump ship just yet (CPAN is still too useful) but I think I'll be writing more of my personal tools in ruby.

Posted: 2007/03/10 23:19 | /perl | Permanent link to this entry


Thu, 23 Nov 2006

Should I Release WebService::Yahoo::SpellingSuggestion?

I wrote the WebService::Yahoo::SpellingSuggestion perl module for one of my little side projects. It was easy to wrap, seemed to work fine when I did a few tests by hand and didn't take very long to be CPANised; I'm trying to stay in good habits and treat all my internal modules as if they'd be released - I just skimp on the tests a bit. Which I know is bad.

Unfortunately, while it was fine for the light testing, I wasn't very happy with it once I started to use it for heavier loads. The webservice it calls doesn't seem to handle more than a couple of words at a time and makes it impossible to differentiate between a word that's spelt correctly and one that it doesn't know about (it returns undef on either). So now I have a small dilemma. Do I release it to CPAN, add a note listing what I'm not happy with about the service, mention I'm not using it anymore and let anyone else who wants to use it download a copy? Or should I just leave it, unreleased, in a dark corner of my version control system?

Posted: 2006/11/23 00:59 | /perl | Permanent link to this entry


Sun, 03 Sep 2006

WWW::Shorten::Smallr on CPAN - Initial Release

I've just uploaded the initial release of WWW::Shorten::Smallr to CPAN and it should be making its way through the mirrors right about now.

The module itself is simple: it shrinks the given URL using the http://smallr.com/ web site. I wrote this for two reasons. Firstly, smallr is the official link shortener of one of the mailing lists I frequent and I wanted it available from the Vim Shortener I wrote. Secondly, I wanted to have another play around with Module::Build, which I understand slightly better now.

Posted: 2006/09/03 23:16 | /perl | Permanent link to this entry


Wed, 16 Aug 2006

CPAN META.yml to DOAP Converter - Can't Be Bothered

Last year I was quite interested in the Description of a Project (DOAP) project. I added DOAP files to all my Sourceforge projects, wrote some little util scripts, contributed DOAP files to a couple of the Free software projects I use that had asked for them... and then promptly forgot all about it.

A couple of recent posts about the Python Package Index and DOAP interested me enough to dig out one of my half-finished scripts. It's the (very messy) first pass of a CPAN META.yml to DOAP converter for the automatic creation of DOAP files for perl modules. And after a little play I'm giving up. The sheer number of META.yml files that are badly formed, missing certain fields, based on previous versions of the file contents or even just empty files has surprised even the cynic in me. If I was bothered enough to have another go I'd seriously think about a two pass attempt: first the META.yml and then just mine the POD for the missing data. And hope they don't disagree.

The point of this post? CPAN is THE perl killer app but it has so much cruft it's scary. And to remind me to work on things I can actually survive...

Posted: 2006/08/16 19:33 | /perl | Permanent link to this entry



Copyright © 2000-2013 Dean Wilson :: RSS Feed