How to install Scrapy on Debian using virtualenv

Posted February 15th, 2012 in Blog, Data and Metrics by admin

Scrapy, although it’s written in Python, is the best Swiss Army knife around for scraping. Forget about Nokogiri :) or nasty Perl regexp crawlers; Scrapy is serious business.

In this guide I will show you how to set up Python virtualenvs and install the latest version of Scrapy into one. If you are not really familiar with virtualenvs, I suggest you read http://pypi.python.org/pypi/virtualenv.

Before starting, you need to install the required packages.
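A minimal sketch of this step, assuming a Debian release from the time of writing. The package set below is a guess, not the author's list: Python headers, virtualenv, a compiler, the libxml2/libxslt headers that lxml builds against, OpenSSL headers, and git. Newer Debian releases ship python3-dev and python3-virtualenv instead.

```shell
# Assumed package set, not the author's original list: Python headers,
# virtualenv, a compiler, XML/XSLT and OpenSSL headers, and git.
PACKAGES="python-dev python-virtualenv build-essential libxml2-dev libxslt1-dev libssl-dev git"

# Print the command for review; run it as root once the list looks right.
echo "apt-get install -y $PACKAGES"
```

Printing the command first makes it easy to adjust the list for your release before anything gets installed.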

Once all packages are installed, let’s set up our first virtualenv.
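Roughly, creating and activating an environment could look like this. The directory name is hypothetical; any writable location works.

```shell
# Hypothetical location for the environment; pick any writable directory.
ENV_DIR="$HOME/virtualenvs/scrapy"

if command -v virtualenv >/dev/null 2>&1; then
    # Create an isolated Python environment, then activate it in this shell.
    virtualenv "$ENV_DIR" && . "$ENV_DIR/bin/activate"
else
    echo "virtualenv not found; install it first" >&2
fi
```

Once the environment is active, the shell prompt is prefixed with its name, and running deactivate returns you to the system Python.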

Now let’s install Scrapy from the nightly build version (if you want to install a stable version, just check out a branch instead).
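One way to do this, sketched under assumptions: the source is taken to be the Scrapy repository on GitHub, SRC_DIR is a hypothetical checkout path, and the install only runs when a virtualenv is active so nothing lands in the system Python.

```shell
# Assumed source: the Scrapy repository on GitHub. SRC_DIR is a hypothetical path.
REPO_URL="https://github.com/scrapy/scrapy.git"
SRC_DIR="$HOME/src/scrapy"

if [ -n "${VIRTUAL_ENV:-}" ]; then
    # Clone the development head and install it into the active environment.
    # For a stable release, run `git checkout <branch>` inside "$SRC_DIR"
    # before the install step.
    git clone "$REPO_URL" "$SRC_DIR" && pip install "$SRC_DIR"
else
    echo "No virtualenv is active; activate one before installing" >&2
fi
```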

If you follow every step and the almighty gods of Linux grant you their grace, everything should work, and you can test it by just running:
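A quick check, assuming the virtualenv is still active so that the scrapy command is on your PATH:

```shell
# Locate the scrapy executable; empty if it is not on PATH.
SCRAPY_BIN="$(command -v scrapy || true)"

if [ -n "$SCRAPY_BIN" ]; then
    "$SCRAPY_BIN" version    # prints the installed Scrapy version
else
    echo "scrapy is not on PATH; is the virtualenv active?" >&2
fi
```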

Some helpful links you may find cool:
* http://www.stereoplex.com/blog/understanding-imports-and-pythonpath | Everything you should know about Python paths
* http://doc.scrapy.org/en/0.12/topics/architecture.html | Scrapy architecture overview

Why won’t Graphite show data after a 24-hour period?

Posted February 14th, 2012 in Blog, Data and Metrics by admin

Problem: You changed storage-schemas.conf to keep data for longer than 24 hours, but all graphs still show only the last 24 hours of data.

Short Answer: RTFM.

Long Answer:
The manual clearly says that if you change storage-schemas.conf, carbon won’t resize your existing whisper databases. Your change applies only to newly created databases; all the ones that existed before the change keep the old format.

However, there’s a command-line utility called whisper-resize.py, provided by Graphite, which is very handy for these situations. For example, let’s say you have changed the format for databases that match the pattern *.daily.* and that the new format you want is “60s:90d,1h:180d”. Whisper stores all its data inside /opt/graphite/storage/whisper, so you need to go there and find the precise path to your whisper (.wsp) database file. Then run:
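A sketch of the single-file resize, assuming whisper-resize.py is on your PATH and that your Whisper version accepts the same unit suffixes on the command line as in storage-schemas.conf. The metric path is hypothetical. Note that on the command line each retention is a separate argument rather than one comma-separated string.

```shell
# Hypothetical metric file; substitute the path you found under the storage directory.
WSP_FILE="/opt/graphite/storage/whisper/myapp/daily/requests.wsp"

if [ -f "$WSP_FILE" ]; then
    # Retentions are passed as separate arguments, not comma-separated.
    whisper-resize.py "$WSP_FILE" 60s:90d 1h:180d
else
    echo "No such whisper file: $WSP_FILE" >&2
fi
```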

That will update your file and create a backup file in the original format with the extension .bak. But if you have been creating metrics dynamically, it’s very likely you have tens or hundreds of metrics you want to update. That’s what xargs is for:
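The bulk update might be sketched like this. Metric names map to directories on disk, so the pattern *.daily.* is assumed to correspond to paths matching */daily/*. The function name resize_daily and its optional second argument (the command to run, useful for a dry run) are illustrative, not part of Graphite.

```shell
# Resize every .wsp file under a "daily" directory.
#   $1: whisper storage directory
#   $2: command to run per file (defaults to whisper-resize.py)
resize_daily() {
    whisper_dir="$1"
    resize_cmd="${2:-whisper-resize.py}"
    # -print0 / -0 keep paths with spaces intact; {} is replaced by each path.
    find "$whisper_dir" -type f -path '*/daily/*' -name '*.wsp' -print0 |
        xargs -0 -I{} "$resize_cmd" {} 60s:90d 1h:180d
}
```

Calling resize_daily /opt/graphite/storage/whisper echo prints the commands without running them; drop the trailing echo to perform the resize.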

And you are done. The only thing left, after checking that everything went OK, is to remove all the backups:
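A possible cleanup, assuming the backups sit next to the originals and are named by appending .bak to the .wsp file name, which is how whisper-resize.py names them. The function name remove_backups is illustrative, and the suffix is a parameter in case your version differs.

```shell
# Delete whisper backup files, printing each one as it is removed.
#   $1: whisper storage directory
#   $2: backup suffix (defaults to .bak)
remove_backups() {
    whisper_dir="$1"
    suffix="${2:-.bak}"
    find "$whisper_dir" -type f -name "*.wsp$suffix" -print -delete
}
```

For example, remove_backups /opt/graphite/storage/whisper clears every backup under the default storage directory, so check your graphs before running it.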

Good luck with graphite!