crawley-project/crawley-ruby
This is a Ruby implementation of the Crawley framework.


Coming Soon Features!

  • High-speed web crawler built on EventMachine.
  • Support for database engines such as PostgreSQL, MySQL, Oracle, and SQLite.
  • Command-line tools.
  • Data extraction using XPath.
  • Cookie handling.
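The XPath-based extraction can be illustrated with Ruby's bundled REXML library; the scraper API itself is not shown in this README, so this is only a minimal standalone sketch of pulling values out of a page with an XPath query.

```ruby
require 'rexml/document'

# A small, well-formed HTML fragment to extract from (illustrative only).
html = <<-XML
<html>
  <body>
    <a href="http://pypi.python.org/pypi/requests">requests</a>
    <a href="http://pypi.python.org/pypi/flask">flask</a>
  </body>
</html>
XML

doc = REXML::Document.new(html)

# Collect every href attribute via an XPath query.
links = REXML::XPath.match(doc, '//a/@href').map(&:value)
# => ["http://pypi.python.org/pypi/requests", "http://pypi.python.org/pypi/flask"]
```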

Write your Models

""" models.rb """

require 'rubygems'
require 'data_mapper'
require 'dm-migrations'

class Package
    include DataMapper::Resource

    # Every DataMapper model needs a key; Serial provides an
    # auto-incrementing integer primary key.
    property :id,           Serial
    property :updated,      String
    property :package,      String
    property :description,  String
end
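Each `property` above maps to a database column, and every scraped row becomes one `Package` record. As a rough, database-free illustration (a plain `Struct` stand-in, not the DataMapper API), a record carrying those three fields behaves like this:

```ruby
# Stand-in for the Package model above: same three fields,
# backed by a Struct instead of a database (illustration only).
PackageRow = Struct.new(:updated, :package, :description)

row = PackageRow.new('2013-01-15', 'requests', 'HTTP for Humans')
row.package   # => "requests"
row.to_a      # => ["2013-01-15", "requests", "HTTP for Humans"]
```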

Write your Scrapers and Crawlers

""" crawlers.rb """

require 'crawlers'
require 'scrapers'

class PypiScraper < BaseScraper

    # URL patterns (SQL LIKE style) that this scraper handles
    @@matching_urls = ["%pypi.python.org/pypi%"]

    # Extract data from the response; super runs the default handling
    def scrape(response)
        super(response)
    end
end

class PypiCrawler < BaseCrawler

    #add your starting urls here
    @@start_urls = ["http://pypi.python.org/pypi"]

    #add your scraper classes here
    @@scrapers = [PypiScraper.new]

    #specify your maximum crawling depth level
    @@max_depth = 1

end
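Since `BaseCrawler` and `BaseScraper` are not shown in this README, the intended behavior can be sketched as a depth-limited crawl that routes matching pages to a scraper. The sketch below uses hypothetical names and an in-memory link graph in place of real HTTP fetches.

```ruby
# Hypothetical link graph standing in for real HTTP fetches (illustration only).
LINKS = {
  'http://pypi.python.org/pypi'          => ['http://pypi.python.org/pypi/requests',
                                             'http://example.com/other'],
  'http://pypi.python.org/pypi/requests' => [],
  'http://example.com/other'             => []
}

# SQL-LIKE-style pattern ("%...%") reduced to a substring test,
# mirroring the @@matching_urls patterns above.
def matches?(url, patterns)
  patterns.any? { |p| url.include?(p.delete('%')) }
end

# Breadth-first crawl: visit pages up to max_depth links away from the
# start URLs, collecting the ones a scraper's patterns would match.
def crawl(start_urls, matching_urls, max_depth)
  scraped  = []
  frontier = start_urls.map { |u| [u, 0] }
  seen     = {}

  until frontier.empty?
    url, depth = frontier.shift
    next if seen[url]
    seen[url] = true

    # "Scrape" pages whose URL matches one of the patterns.
    scraped << url if matches?(url, matching_urls)

    # Enqueue outgoing links until the depth limit is reached.
    if depth < max_depth
      (LINKS[url] || []).each { |link| frontier << [link, depth + 1] }
    end
  end
  scraped
end

crawl(['http://pypi.python.org/pypi'], ['%pypi.python.org/pypi%'], 1)
# => ["http://pypi.python.org/pypi", "http://pypi.python.org/pypi/requests"]
```

Note how `@@max_depth = 1` stops the crawl one hop from the start URL, and the non-matching `example.com` page is visited but never scraped.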
