Entrepreneurship, Linux, and Ruby
Posts tagged tutorial
Bayesian Classification on Rails
Jan 26th
A project I've been working on watches Twitter for some search keywords, with the goal of finding new customers, jobs, items for sale, etc. For example, a computer repair shop might want to watch for the keywords "laptop" and "broken", and then reply to tweets where they think they can help.
But as anyone who uses Twitter can attest, even with some very specific search terms, language filtering and geocoding, there is going to be a lot of white noise. I decided to take this one step further.
Bayesian classification (your garden-variety spam filter) in Ruby is quite easy, thanks to ruby-stemmer and the excellent classifier gem. The canonical example:
b = Classifier::Bayes.new :categories => ['Interesting', 'Uninteresting']
b.train_interesting "here are some good words. I hope you love them"
b.train_uninteresting "here are some bad words, I hate you"
b.classify "I hate bad words and you" # returns 'Uninteresting'
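If you're curious what happens under the hood, here is a minimal naive Bayes sketch in plain Ruby. It is not the classifier gem's code: TinyBayes and its methods are made-up names, there is no stemming, and the add-one smoothing is my assumption about a reasonable default.

```ruby
# Illustrative only: a tiny naive Bayes text classifier (hypothetical names).
class TinyBayes
  def initialize(*categories)
    @counts = {}                       # word counts per category
    categories.each { |c| @counts[c] = Hash.new(0) }
    @docs = Hash.new(0)                # number of training texts per category
  end

  def train(category, text)
    @docs[category] += 1
    tokenize(text).each { |word| @counts[category][word] += 1 }
  end

  def classify(text)
    words = tokenize(text)
    vocab = @counts.values.flat_map(&:keys).uniq.size
    total_docs = @docs.values.sum.to_f
    # Pick the category with the highest log-probability score.
    @counts.keys.max_by do |category|
      total_words = @counts[category].values.sum
      # log prior, then log likelihood of each word with add-one smoothing
      score = Math.log((@docs[category] + 1) / (total_docs + @counts.size))
      words.each do |word|
        score += Math.log((@counts[category][word] + 1.0) / (total_words + vocab + 1))
      end
      score
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

b = TinyBayes.new('Interesting', 'Uninteresting')
b.train('Interesting', "here are some good words. I hope you love them")
b.train('Uninteresting', "here are some bad words, I hate you")
puts b.classify("I hate bad words and you") # prints "Uninteresting"
```

The gem does more than this (stemming, for one), but the idea is the same: count words per category, then score new text against those counts.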
Of course, if you're implementing this in a Rails application, chances are you want the classifier to learn from real data over time. In my case, I want it to learn that a tweet is uninteresting when I delete it, and I want it to learn that a tweet is interesting when I visit the Tweet#show action.
It seems the usual method is to marshal the classifier object with madeleine, which creates a new snapshot file each time you train it. This is both easy and fast, but we're going to end up with thousands or millions of snapshot files in no time flat. Additionally, all bets are off if we have a few users who are really into cheap viagra. We need to give each User his own classifier and let him train it over time.
First, let's set up our environment. Grab the latest ruby-stemmer and classifier gems from GitHub, and build them from source. I recommend this because the gem versions I got on my first try were way out of date and quite broken, and because you'll need a classifier fork with my remove_stemmer method to marshal your classifiers using ActiveRecord.
$ cd ruby-stemmer
$ rake compile
$ sudo rake install
$ cd ..
$ git clone git://github.com/logankoester/classifier.git
$ cd classifier
$ sudo rake install
$ sudo gem install twitter
Generate a fresh rails app if you want to follow along.
$ script/generate resource user id:integer classifier:text
$ script/generate resource keyword id:integer user_id:integer text:string
$ script/generate resource tweet id:integer keyword_id:integer user_id:integer text:string read:boolean interesting:boolean
$ script/generate migration ChangeClassifierDefaults
Alternatively, you can clone the code from this tutorial with:
Now edit the migration you just created to look like this:
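class ChangeClassifierDefaults < ActiveRecord::Migration
  def self.up
    change_column :tweets, :interesting, :boolean, :default => false
    change_column :tweets, :read, :boolean, :default => false
    change_column :keywords, :text, :string, :default => ""
  end

  # change_column needs the column type, so restate it and drop the default
  def self.down
    change_column :keywords, :text, :string, :default => nil
    change_column :tweets, :read, :boolean, :default => nil
    change_column :tweets, :interesting, :boolean, :default => nil
  end
end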
…and run it
Open your config/environment.rb file, and add the following gems to the Initializer block.
config.gem 'luisparravicini-classifier', :lib => 'classifier'
config.gem 'twitter'
Now we can use ActiveRecord's built-in YAML serialization to store the classifier.
class User < ActiveRecord::Base
  has_many :tweets
  has_many :keywords
  serialize :classifier, Classifier::Bayes
  before_create :initialize_classifier
  before_update :remove_stemmer

  private

  # Every new user starts with an empty two-category classifier
  def initialize_classifier
    self.classifier = Classifier::Bayes.new(
      :categories => ['Interesting', 'Uninteresting']
    )
    remove_stemmer
  end

  # Drop the stemmer so the classifier can be serialized safely
  def remove_stemmer
    self.classifier.remove_stemmer
  end
end
The remove_stemmer method requires a little explanation. When a Classifier is initialized, it also creates a Stemmer object to use, which ordinarily gets marshalled along with its Classifier. But when demarshalled later, the Stemmer object (which is really just a C extension) will get caught with its shorts down, and either throw an error like "Stemmer is not initialized", or in older versions, simply segfault your rails environment!
The solution is simple; my fork implements a remove_stemmer method on Classifier::Base, which will force the stemmer to be reinitialized the next time it is needed. Call this method before you marshal your classifier, and your troubles will melt away.
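The same pattern can be sketched in plain Ruby without the gem. Everything here is a stand-in: TinyClassifier is a made-up class, the lambda plays the role of the C-backed stemmer (Ruby cannot marshal a Proc, so it fails loudly, much like the real thing), and Marshal is used instead of ActiveRecord's YAML serialization.

```ruby
# Hypothetical stand-in for a classifier holding a helper that cannot be marshalled.
class TinyClassifier
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
    @stemmer = nil
  end

  # Lazily (re)build the helper the first time it is needed.
  def stemmer
    @stemmer ||= lambda { |word| word.downcase.sub(/(ing|ed|s)\z/, '') }
  end

  def train(text)
    text.scan(/\w+/).each { |w| @counts[stemmer.call(w)] += 1 }
  end

  # Forget the helper so the object can be serialized.
  def remove_stemmer
    @stemmer = nil
  end
end

c = TinyClassifier.new
c.train("robots walking")

# With the helper attached, marshalling fails.
dump_failed = false
begin
  Marshal.dump(c)
rescue TypeError
  dump_failed = true
end

# Without it, the round trip works and the helper comes back on demand.
c.remove_stemmer
restored = Marshal.load(Marshal.dump(c))
restored.train("Robot walked")
puts restored.counts.inspect
```

The trained counts survive the round trip, and the helper is rebuilt the first time train is called on the restored object.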
Moving on to the Tweet model, we want to classify each tweet when it is created.
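class Tweet < ActiveRecord::Base
  belongs_to :user
  belongs_to :keyword
  before_save :classify

  def classify
    # Strip the search term; escape it so regex metacharacters are taken literally
    text = self.text.gsub(/#{Regexp.escape(self.keyword.text)}/, '')
    if self.user.classifier.classify(text) == 'Interesting'
      self.interesting = true
    end
  end
end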
Of course, we don't want to throw off the results by including a word which is going to occur in every tweet, so we remove the search term from the text prior to classification.
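Here is that keyword-stripping step on its own, as a small sketch. strip_keyword is a hypothetical helper, not something from the app above, and the case-insensitive match and whitespace cleanup are extras I'm assuming you'd want.

```ruby
# Hypothetical helper: remove the search term from a tweet before classifying it.
def strip_keyword(text, keyword)
  # Regexp.escape keeps keywords like "c++" from being read as regex syntax.
  text.gsub(/#{Regexp.escape(keyword)}/i, '').squeeze(' ').strip
end

puts strip_keyword("My Robots love other robots", "robots") # prints "My love other"
puts strip_keyword("I love c++ a lot", "c++")               # prints "I love a lot"
```

Without the escaping, a keyword such as "c++" would raise a RegexpError instead of matching.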
Add a little method to your Keyword model to grab new tweets from the Twitter Search API
class Keyword < ActiveRecord::Base
  belongs_to :user
  has_many :tweets, :dependent => :destroy
  after_save :search

  def search
    search = Twitter::Search.new(self.text).fetch
    search.results.each do |r|
      # Tweet.create saves the record, so no separate save call is needed
      Tweet.create(
        :keyword => self,
        :user => self.user,
        :text => r.text
      )
    end
  end
end
Almost done! Now we need to train our sweet new classifier. I've opted to do this entirely from the controller, so that messing around in the console won't inadvertently have an impact on the machine's learning. We also want to mark the tweet in question as already read, so that the lesson is only learned once.
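class TweetsController < ApplicationController
  # Viewing a tweet teaches the classifier that it was interesting
  def show
    @tweet = Tweet.find(params[:id])
    unless @tweet.read?
      current_user.classifier.train_interesting(
        @tweet.text.gsub(/#{Regexp.escape(@tweet.keyword.text)}/, '')
      )
      current_user.save
      @tweet.read = true
      @tweet.save
    end
  end

  # Deleting a tweet teaches the classifier that it was uninteresting
  def destroy
    if @tweet = Tweet.find(params[:id])
      if @tweet.destroy
        current_user.classifier.train_uninteresting(
          @tweet.text.gsub(/#{Regexp.escape(@tweet.keyword.text)}/, '')
        )
        current_user.save
      end
    end
  end
end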
And there you have it… a simple machine learning solution for extracting awesome tweets. Let's try it out!
Fire up a script/console session.
>> u = User.create
=> #<User id: 1, classifier: #<Classifier::Bayes:0xb64f2354 @categories={:Uninteresting=>{}, :Interesting=>{}}, @total_words=0, @stemmer=nil, @options={:encoding=>"UTF_8", :categories=>["Interesting", "Uninteresting"], :language=>"en"}>, created_at: "2010-01-26 22:17:19", updated_at: "2010-01-26 22:17:19">
As you can see, our new user has a Bayesian Classifier waiting around to learn what kind of tweets he likes.
=> #<Keyword id: 1, user_id: 1, text: "robots", created_at: "2010-01-26 22:20:55", updated_at: "2010-01-26 22:20:55">
>> Tweet.all.size
=> 15
You can use the following oneliners from script/console to play around with the training:
Tweet.all.each {|t| u.classifier.train_interesting(t.text) if t.text.downcase.include? "cyborgs" } # Any tweet with the word "cyborgs" is interesting
Tweet.all.each {|t| u.classifier.train_uninteresting(t.text) if t.text.downcase.include? "discount" } # Any tweet with the word "discount" is uninteresting
Tweet.find_all_by_interesting(true).each { |t| pp t.text }.size # Print the interesting tweets and count them
Tweet.all.each {|t| t.classify } # Rerun the classification on every tweet
Of course, this technique can be applied to sorting pretty much any kind of text. Interesting/uninteresting tweets are just one example from my life. Start hacking!
Fun with ion3
Oct 7th
Ion™ is a tiling tabbed window manager designed with keyboard users in mind.
In recent years I've been a GNOME / Compiz guy, but while I've enjoyed its tight integration with Ubuntu and flashy effects, I've always missed the simplicity of so-called minimalist window managers, mainly fvwm. These days, however, practically everything I do happens inside Firefox, gvim, or gnome-terminal.
I want keyboard-driven. I want scriptable. And I don't want windows hiding behind other windows. Ever.
Enter ion3. I've only been using it for the last 24 hours, and even though I haven't memorized all of the keymaps, or learned how to code in Lua (yet!), I already love it. So far the only problem I've not been able to overcome is a bug in the latest Adobe Flash that breaks fullscreen video. This isn't specific to ion3 – it's a problem with any focus-follows-mouse system. I hear there is a workaround, but it didn't seem to work for me. I consider it a microscopic trade-off for such an efficient window manager. Many of my previously sluggish applications now run incredibly fast, and with a couple days of practice I'll be working faster too.
Installation
Now just log out, choose ion3 and start a new session. The first time you log in you'll be greeted with the man page, which I highly suggest reading. If you try not to "cheat" by using the mouse, you'll pick up almost everything in a couple of hours, and from there you'll find yourself navigating faster and faster until you don't have to think about it at all. Just like vim.
Ion is both simple and well-documented, so it would be pointless for me to write an introductory tutorial. Instead, here are a couple of tricks I've discovered.
Modifying your configuration
One of the first things you're going to want to do when you're done messing around is change a few settings. For the most part, this is done in a file called cfg_ion.lua. Copy the system-wide file (I found mine at /etc/X11/ion3/cfg_ion.lua) to ~/.ion3/cfg_ion.lua and open it with a text editor.
$ cp `locate cfg_ion.lua | head -1` ~/.ion3/cfg_ion.lua
$ gvim ~/.ion3/cfg_ion.lua
You'll need to restart Ion for your changes to take effect. Don't worry, all your applications will stay open; only the window manager needs to be restarted. Hit F12 and type session/restart.
I messed this file up a few times experimenting, and I'll probably mess it up a few more. If you screw up this file like I did, your F12 shortcut can disappear, and you'll need another way to restart Ion after you've fixed it. Keep a terminal open whenever you're editing, because you may not be able to launch one. The trick to restart Ion from the console is simple: send the USR1 signal to the running ion3 process, substituting its process ID for the one shown here.
$ kill -USR1 21108
Remapping Mod1
The Mod1 key is used to initiate most interactions with Ion. On most systems, this is Alt. This is usually a very bad choice, because a lot of other applications need the Alt key for other things. I tried the Flying Window key, but it turns out it's in a very uncomfortable place on the keyboard. The number keys are used a lot. Try reaching Win+6, and you'll see what I mean. CapsLock has been working great for me, and as an added bonus, makes it much more work to shout on IRC.
Check your keymaps with xmodmap -pm. On my system, Mod3 was unused, so I remapped CapsLock to that.
Edit (or create) ~/.Xmodmaprc and insert these lines at the bottom…
add Mod3 = Caps_Lock
Then run it…
Also add this line to ~/.Xsession so it is run automatically whenever you start X.
If your xmodmap -pm now reads…
then you're in luck! Now you just need to edit the META variable near the top of your cfg_ion.lua to reflect the change
and restart.
All done! I hope you enjoy learning and using Ion3 as much as I have. I don't think I'll be switching again any time soon.
Setting up rTorrent with Firefox
Mar 14th
Deluge is the closest thing we have to a native uTorrent on Linux, and I really like it. But, at least for me, it uses a seemingly impossible amount of system resources. Since a BitTorrent client is the kind of thing I want to leave running in the background, I needed a lighter alternative.
I don't see any real need for a graphical interface when ultimately all it's doing is moving bits around on a network, so I went with rTorrent. One of the benefits of using command-line software is that you can use SSH and screen to control it over the network… we don't need no fancypants AJAX interface for this!
Part 1 – rTorrent
If you're using Ubuntu, you can get rTorrent from the repositories, like so…
Now that you've got the software, you're going to need to configure it. rTorrent looks for a configuration file called .rtorrent.rc in your home directory. Don't panic. Just save the sample as ~/.rtorrent.rc and open it up in your favorite text editor.
You don't need to worry about most of the stuff in this file, but you can if you want to. Here's how I have it set up:
# I like to limit this because I'm often connected through cheap
# wireless routers that have trouble with lots of connections.
min_peers = 40
max_peers = 450
# Same as above but for seeding completed torrents (-1 = same as downloading)
#min_peers_seed = 10
max_peers_seed = 50
# Maximum number of simultaneous uploads per torrent.
max_uploads = 30
# Where do you want your downloads to go?
directory = ~/downloads
# You can put this anywhere you like, but I put it here.
# Remember that you'll have to create this directory
session = ~/.rtorrent/session
# Watch a directory for new torrents, and stop those that have been
# deleted.
# This will be important when we're setting up Firefox.
schedule = watch_directory,5,5,load_start=~/downloads/torrents/*.torrent
schedule = untied_directory,5,5,stop_untied=
# Port range to use for listening.
# Remember if you're connected through a NAT router, you'll
# need to forward these ports.
port_range = 50471-50479
# Enable peer exchange (for torrents not marked private)
peer_exchange = yes
Part 2 – Save Link In Folder
Okay, so you've got rTorrent all set up now, and configured to watch for new *.torrent files in a directory (mine is ~/downloads/torrents/*.torrent). Now let's configure Firefox. There's an extension by Achim Seufert called Save Link In Folder. You'll want to install this.
After your browser restarts, go to Tools > Add-ons > Save Link In Folder > Preferences and add a new folder, like this…

Remember – the download directory must be the one you told rTorrent to watch!
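If you want to double-check that the two settings agree, a few lines of Ruby can read the watched pattern straight out of your config. This is only a sketch: watched_glob and in_watched_dir? are hypothetical helpers, and they assume the schedule = watch_directory,...,load_start=... syntax used in my config above.

```ruby
# Hypothetical helper: find the glob that rTorrent watches, given the config text.
def watched_glob(rc_text)
  rc_text.each_line do |line|
    next if line.strip.start_with?('#')   # skip commented-out settings
    match = line.match(/load_start=(\S+)/)
    return match[1] if match
  end
  nil
end

# Hypothetical helper: would a file saved at torrent_path be picked up?
def in_watched_dir?(rc_text, torrent_path)
  glob = watched_glob(rc_text)
  return false unless glob
  # FNM_PATHNAME keeps "*" from matching across directory separators.
  File.fnmatch(File.expand_path(glob), File.expand_path(torrent_path), File::FNM_PATHNAME)
end

rc = "schedule = watch_directory,5,5,load_start=~/downloads/torrents/*.torrent\n"
puts watched_glob(rc) # prints "~/downloads/torrents/*.torrent"
```

Pass the contents of your own ~/.rtorrent.rc and the path Firefox saves to, and in_watched_dir? tells you whether rTorrent will see the file.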

Now when you click a torrent link, just save it instead of opening it with Deluge. If rTorrent is running it will notice the new torrent, and get to work! You can even queue up torrents while rTorrent is off, for downloading later.
I'm definitely an rTorrent noob, having just set this up tonight, but so far I like it a lot, and no longer have the performance issues I had using Deluge. This configuration would also be ideal for setting up a seedbox / media center machine, if you set up all your Firefoxes to save torrent files to a network mount on the server.
Installing Linux Mint 6 on your Asus EeePC 901
Jan 8th
I was a fan of ubuntu-eee (now known as EasyPeasy) for a long time, but after upgrading to EasyPeasy 1.0 on my 20gb EEE 901 ($379.99) tonight, I've decided it's time to move on, and I'm happy I chose Linux Mint. Here's what I've done so far to get it running great:
Install Linux Mint
Since the EeePC has no optical media drive, you will need a USB flash drive to install Mint. Download the Linux Mint .iso file and use UNetbootin to burn it to your USB disk, then plug it into your Eee.
Turn on the machine and hit F2 to enter the BIOS setup. Set the boot priority to try the USB disk first. You may also want to make sure the webcam/bluetooth is turned on, while you're here. Save your changes and reboot, and Mint will guide you through the rest of the installation.
When it asks you how you want to partition your disk, choose Guided – use entire disk and let it use the larger of the two SSDs.
Install the EeePC kernel
As usual with Linux installations, most of your hardware will work from the get-go, but not everything. The first thing you'll want to do is get the wireless card working. Plug in an ethernet cable, and then follow these instructions. I recommend the lean kernel, and uninstalling the generic one since it will just be wasting precious disk space.
Enable Desktop Effects
Mint makes this easy for you by taking care of installing the correct drivers for your video card. All you should need to do is turn Compiz on in Preferences > Appearance > Desktop Effects.

Allow tall windows to move past the top of the screen
Sooner or later you're going to run into a window that is too tall to display on the 9" screen, and cannot be resized. The solution is to open up a terminal and run this command:
$ gconftool-2 --set /apps/compiz/plugins/move/allscreens/options/constrain_y --type bool 0
This will allow you to move these windows past the top of the screen (use ALT+Drag anywhere in the window). There are a number of other useful gconftool hacks on Ubuntu's EeePC page.
Create $HOME/bin directory
You're going to want a place to store little scripts and tools where they can be executed on the command line.
$ mkdir ~/bin
Now add it to your PATH so bash can find it. Open up your ~/.bashrc file and append
if [ -d ~/bin ] ; then
PATH=~/bin:"${PATH}"
fi
Make the LCD ultra bright!
This hack is really cool. I found it on the EasyPeasy wiki. Create a new file called ultra-bright in $HOME/bin and paste in this line, then save.
sudo setpci -s 00:02.1 f4.b=ff
You will need to make it executable, so
$ chmod +x ~/bin/ultra-bright
Now you can run
$ ultra-bright
to turn on the extra brightness, and use the bright/dim function keys to reset it. If you're like me, you'll want the extra brightness turned on all the time, so go ahead and create an entry for it in Preferences > Sessions.

Enable the WiFi / Bluetooth / webcam toggle and performance tuner
For this you'll need a package called eee-control. You've already installed the Eee kernel, so this package should be available to you from the repository you added to your Software Sources.
$ sudo apt-get install eee-control
Alternatively, you can download the Ubuntu package from the website.
Now you can find this nifty utility in your Administration menu. Unless you need the extra battery life, I recommend setting performance to "super". To make use of your webcam, install the cheese and/or skype packages.
Make sure text is being rendered crystal clear
Open Preferences > Appearance > Fonts and select "Subpixel smoothing (LCDs)". Then click Details… and set Hinting to "Full". If you're like me you absolutely hate Ubuntu's default monospace font. I prefer Terminus. To switch, install the xfonts-terminus package and make it the default Fixed Width font.
Boost GNOME Performance with /etc/hosts
Following this guide will help improve your system performance, and it takes about 2 seconds.
Installing Avant Window Navigator
AWN is similar to the Dock on Mac OS X.
$ sudo apt-get install avant-window-navigator
Right-click the panel at the bottom of your screen and check "Allow Panel to be Moved". Drag it to the top of your screen, right-click and lock it again. Now launch Accessories > Avant Window Navigator. It's kind of ugly and huge by default, but we can fix that.
Right-click on AWN and click Preferences. Turn on "Auto hide bar when not in use", then switch to the Bar Appearance tab and change Bar Height to something more reasonable, like 32.
Now you'll want to get rid of the Window List at the top of the screen. Right-click on it and select "Remove from Panel". Gone! Now there is lots of room for program shortcuts and silly panel applets.
That's it!
You now have a usable OS on your Asus EeePC, which means you are both cooler and more attractive than every other clown with a clunky Xandros-based netbook. Thanks for following my guide. Let me know in the comments section how your installation experience went and if you have any other EEE tricks worth sharing.


