A command-line SEO web scraping / analysis tool
- Run `git clone git://github.com/wheresmyjetpack/scrapeo.git`
- `cd` into the scrapeo directory and run `make deploy` to install the required packages into a virtualenv
- Optional (with superuser privileges): `ln -s $HOME/.virtualenvs/venv/bin/scrapeo /usr/local/bin/scrapeo` (or link it from somewhere else in your PATH); the full sequence is sketched below
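Putting these steps together, a minimal end-to-end install might look like the following sketch. The virtualenv path `$HOME/.virtualenvs/venv` is taken from the symlink step above and may differ depending on how `make deploy` is configured on your system.

```sh
# Clone the repository and enter it
git clone git://github.com/wheresmyjetpack/scrapeo.git
cd scrapeo

# Create a virtualenv and install the required packages into it
make deploy

# Optional: link the installed entry point onto your PATH
# (assumes the virtualenv was created at $HOME/.virtualenvs/venv)
sudo ln -s $HOME/.virtualenvs/venv/bin/scrapeo /usr/local/bin/scrapeo
```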
Alternative -- Install via pip
- Simply run `pip install scrapeo` to install from the Python Package Index
- OR clone the repo, `cd` into the newly created directory, and run `pip install .`
- It's recommended that you install scrapeo in a virtualenv instead of in your global site-packages directory; a sketch of that approach follows
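For instance, assuming Python 3's built-in `venv` module is available, an isolated install might look like this (the environment path is arbitrary):

```sh
# Create and activate an isolated environment
python3 -m venv ~/.virtualenvs/scrapeo
source ~/.virtualenvs/scrapeo/bin/activate

# Install from the Python Package Index...
pip install scrapeo

# ...or from a local clone of the repository
# pip install .
```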
- Scrape and analyze elements like meta data and content from web pages
- Provide a quick and easy-to-use tool for those who prefer command-line interfaces
- Provide useful analytical and assessment data
- Installation via `pip` or `make`
- Scrape pages from the command line for meta tags by attribute-value pairs or by a single attribute's value
- Useful shortcuts like `-d` to get a page's meta description, or `-c` to retrieve a canonical URL (example invocations below)
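The exact command syntax may vary between versions, so the following invocations of the `-d` and `-c` shortcuts are an illustrative sketch that assumes the target URL is passed as a positional argument:

```sh
# Print the page's meta description
scrapeo -d https://example.com

# Print the page's canonical URL
scrapeo -c https://example.com
```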
- Makefile for common development tasks, like building wheel, source, and deb packages
  - `make test` - run all tests
  - `make deb` - build a Debian package (requires the system packages listed in requirements-dev.txt)
  - `make source` - build a source tarball
  - `make wheel` - build a Python wheel
  - `make daily` - make a daily snapshot
  - `make deploy` - create a virtual environment and install
  - `make install` - install the program
  - `make init` - install all requirements
  - `make clean` - clean the project, removing .pyc and other temporary files
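A typical development loop built from these targets might look like the sketch below; each target's exact behavior is defined in the project's Makefile.

```sh
make init    # install all requirements
make test    # run the test suite
make wheel   # build a Python wheel
make clean   # remove .pyc and other temporary files
```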
```
|-- docs
| |-- build
| | |-- doctrees
| | `-- text
| |   `-- index.txt
| |-- Makefile
| `-- source
|   |-- conf.py
|   `-- index.rst
|-- scrapeo
| |-- __init__.py
| |-- utils
| | |-- __init__.py
| | `-- web_scraper.py
| |-- core.py
| |-- exceptions.py
| |-- main.py
| `-- helpers.py
|-- tests
| |-- data
| | `-- document.html
| |-- __init__.py
| |-- test_helpers.py
| `-- test_Scrapeo.py
|-- CHANGES.txt
|-- LICENSE.txt
|-- MANIFEST.in
|-- Makefile
|-- README.md
|-- README.rst
|-- requirements-dev.txt
|-- requirements.txt
`-- setup.py
```
- Move from Python's html.parser to the external `html5lib` package to help deal with different forms of empty tags, e.g. `<meta>` and `<meta />`
- Docs (generated using Sphinx and autodoc)
- Python 2 compatibility
- `-c` canonical link option added
- `-s` option for specifying what element attribute to scrape a value from
- `-r` flag for scraping the content attribute of a robots meta tag
- `-H` option for scraping the text from the first heading by type (h1, h2, h3, etc.)
- Numerous bug fixes
- Test coverage
- Initial development release