Saturday, March 14, 2009

Railscasts crawler (Download all screencasts easily)

I wrote this script because I found it really annoying to download each screencast one by one every time I needed them. So I ended up writing a crawler that downloads all the screencasts automatically, without the hassle.

It's in Ruby, of course :-) and requires only the Hpricot gem.
If you don't have it, just type this command in your terminal --

$ gem install hpricot

--Then put this script in your /lib folder
--Change the path in the script to the folder where you want to download all the screencasts
--Go to your project's development environment (script/console) and run the script with these commands--

video = Railscasts.new #new Railscasts object
video.start #will start downloading all screencasts from Railscasts

Note:
  1. If you stop the script manually partway through and run it again, it will skip the screencasts that are already on your hard disk.
  2. All logs are written to Railsproject/log/railscasts.log.
  3. All raised exceptions are handled and logged.
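
Note 1 boils down to checking for a file before fetching it. Here is a minimal sketch of that check in plain Ruby, assuming the file name is the last segment of the download URL. already_downloaded? is a hypothetical helper name, not something the script defines:

```ruby
require 'uri'

# Hypothetical helper (not part of the script): returns true when the file
# named by the last path segment of url already exists inside dir.
def already_downloaded?(dir, url)
  file = File.basename(URI.parse(url).path)
  File.exist?(File.join(dir, file))
end
```

Called as already_downloaded?($path, mov), something like this could serve as the check inside download_screencasts.
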
Improvements and suggestions are appreciated.

Thanks

And yes, the script:


# Author : Akshay Gupta
# file: railscasts.rb
# First check that you have all the gems installed. Place the script in the /lib folder and run it.
# I don't have much expertise in Ruby, so please tell me how it can be improved.
# Change the path below to the folder where you want to save the screencasts.
# My working environment is Mac OS; you will need to make some changes if running on Windows.

require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'logger'
$log = Logger.new('log/railscasts.log')
$path = "/Users/akshaygupta/railsvideo/railscasts/"
$stop = false


class Railscasts
  attr_accessor :url

  def initialize
    @@page = 1
    @@url = "http://railscasts.com/episodes?page="
    # Nothing is downloaded here; call #start to begin
  end

  # Builds the listing URL for the current page
  def url
    @url = @@url+@@page.to_s
  end

  # Processes one listing page per pass until next_page sets $stop
  def start
    loop do
      url
      build_doc
      screencasts_links
      download_screencasts
      next_page
      break if $stop
    end
    puts "Successfully done :) Enjoy all the screencasts"
  end

  def build_doc
    $log.info("*********Fetching #{@url}***********")
    @doc = Hpricot(open(@url))
  rescue StandardError => e
    # Clear @doc so a page fetched earlier is not parsed a second time
    $log.error("Problem fetching #{@url}: #{e}")
    @doc = nil
  end


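  def screencasts_links
    @download_links =
      (@doc/".download/a[1]").collect {|a| (a.search("[@href]").first[:href])}
    $log.info(" All Download links on this page :\n #{@download_links.join("\n ")}")
  rescue StandardError => e
    # Use an empty list so no links from an earlier page are processed again
    $log.error("Problem in download links: #{e}")
    @download_links = []
  end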

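  def download_screencasts
    (@download_links || []).each do |mov|
      begin
        file = mov.split('/').last
        # Skip files that are already in the download folder
        if File.exist?(File.join($path, file))
          $log.info("Already downloaded #{file}")
        else
          $log.info("Now downloading file #{file}")
          # wget -P saves into $path; system returns true only on a zero exit status
          if system("wget", "-P", $path, mov)
            $log.info("Successfully Downloaded #{file}")
          else
            $log.error("wget failed for #{file}")
          end
        end
      rescue StandardError => e
        # Rescuing StandardError rather than Exception lets Ctrl-C stop the script
        $log.error("Problem downloading file #{e}")
      end
    end
  end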


  # 17 is hard-coded as the last listing page
  def next_page
    if @@page < 17
      @@page += 1
    else
      $log.info("All screencasts downloaded :-), Mission accomplished!!")
      $stop = true
    end
  end
end
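
Since the listing pages differ only in their page number, the URLs could also be built up front instead of being tracked with a counter. Here is a rough sketch, assuming the same base URL as the script. page_urls is a hypothetical helper, and 17 is simply the page count the script hard-codes:

```ruby
# Hypothetical helper, not part of the script above: builds the listing URL
# for every page from 1 up to last_page.
def page_urls(last_page, base = "http://railscasts.com/episodes?page=")
  (1..last_page).map { |page| "#{base}#{page}" }
end

# Print the listing URLs for the 17 pages the script assumes.
page_urls(17).each { |url| puts url }
```

Iterating over such a list would keep the page limit in one place, so only the argument needs to change when the site adds pages.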