Saturday, March 14, 2009

Railscasts crawler (Download all screencasts easily)

I wrote this script because I found it really annoying to download each screencast by hand every time I needed one. So I ended up writing a script, i.e. a crawler, that automatically downloads all the screencasts without any hassle.

It's in Ruby of course :-) and only requires the Hpricot gem.
If you don't have it, just type this command in your terminal --

$ gem install hpricot

-- Then include this script in your /lib folder
-- Change the path in the script to wherever you want the screencasts downloaded
-- Go to your project's development environment (script/console) and run the script with these commands --

video = Railscasts.new # new Railscasts object
video.start # will start downloading all the screencasts from Railscasts
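
If the download directory doesn't exist yet, create it first from the same console -- a minimal sketch, assuming the same hardcoded path used in the script below:

require 'fileutils'
FileUtils.mkdir_p("/Users/akshaygupta/railsvideo/railscasts/") # create the target directory and any missing parents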

Note:
  1. If you stop the script midway, on the next run it will skip the screencasts that are already on your hard disk.
  2. All logs are maintained in Railsproject/log/railscasts.log.
  3. All raised exceptions are handled.
Improvements/suggestions are appreciated.

Thanks

And yes, here is the script:


# Author: Akshay Gupta
# File: railscasts.rb
# First check that you have all the gems installed. Place the script in the /lib folder and run it.
# I'm no Ruby expert, so please tell me how it can be improved.
# Change the path below to wherever you want the screencasts saved.
# My working environment is Mac OS; you may need to make some changes to run this on Windows.

require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'logger'
$log = Logger.new('log/railscasts.log')
$path = "/Users/akshaygupta/railsvideo/railscasts/"
$stop = false


class Railscasts
  attr_accessor :url

  def initialize
    @@page = 1
    @@url = "http://railscasts.com/episodes?page="
  end

  # Build the URL for the current page of episodes
  def url
    @url = @@url + @@page.to_s
  end

  # Fetch a page, collect its download links, grab the files,
  # then recurse onto the next page until the last one is done
  def start
    url
    build_doc
    screencasts_links
    download_screencasts
    next_page
    if !$stop
      start
    else
      puts "Successfully done :) Enjoy all the screencasts"
    end
  end

  def build_doc
    begin
      $log.info("*********Fetching #{@url}***********")
      @doc = Hpricot(open(@url))
    rescue Exception => e
      $log.debug("Problem fetching #{e}")
    end
  end

  def screencasts_links
    begin
      @download_links = (@doc/".download/a[1]").collect { |a| a.search("[@href]").first[:href] }
      $log.info("All download links on this page:\n#{@download_links}")
    rescue
      $log.info("Problem in download links")
    end
  end

  def download_screencasts
    @download_links.each do |mov|
      begin
        file = mov.split('/').last
        res = `cd #{$path}; ls | grep "#{file}"`
        # Backticks return a string ("" when grep finds nothing), and an empty
        # string is still truthy in Ruby, so check empty? rather than !res
        if res.empty?
          $log.info("Now downloading file #{file}")
          `cd #{$path}; wget "#{mov}"`
          if $?.success?
            $log.info("Successfully downloaded #{file}")
          end
        else
          $log.info("Already downloaded #{file}")
        end
      rescue Exception => e
        $log.info("Problem downloading file #{e}")
      end
    end
  end


  # Railscasts had 17 pages of episodes when this was written; bump the
  # number if more pages have been added since
  def next_page
    if @@page < 17
      @@page += 1
    else
      $log.info("All screencasts downloaded :-), Mission accomplished!!")
      $stop = true
    end
  end
end
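
Once a run finishes, a quick sanity check from script/console is to count what actually landed in the download directory -- just a sketch, using the same $path global defined at the top of the script:

Dir.glob(File.join($path, "*")).size # number of screencast files downloaded so far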






4 comments:

Waseem said...

Great script. Why don't you create a gist of it? http://gist.github.com

Akshay Gupta said...

Thanks buddy for the appreciation and the suggestion.

OK, will do it soon, as I'm working on some other scripts at the moment.

Filipe Grillo said...

Simple and effective.
I just changed the "if !res" line to "if res.empty?" because the script was telling me that I already had all the casts!

thank you for the script, and keep up the good work!

Anonymous said...

http://www.anleger.blog