{ Scrapit v0.2 }

Scrape webpages for keywords

Try it out

Scrapit is an API for scraping webpages for keywords. With Scrapit you can extract the important keywords that are most relevant to the page being scraped. Scrapit is built on Python, since Python has some great libraries for HTML and text parsing.

Scrapit uses lxml along with BeautifulSoup for processing and parsing HTML.

Using lxml gives a significant increase in parsing speed.

It also makes use of Topia.termextract to extract keywords from the heaps of text on a webpage and to filter out stopwords.
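The idea described above (strip the HTML, then pick out non-stopword terms) can be sketched roughly as follows. This is not Scrapit's code: it swaps lxml, BeautifulSoup and Topia.termextract for the Python standard library and a plain word count, and the names `TextCollector`, `extract_keywords`, `STOPWORDS` and `min_count` are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# A tiny illustrative stopword list (an assumption; a real list is much longer).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "that", "it", "with"}


class TextCollector(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def extract_keywords(html, min_count=1):
    """Return {word: count} for non-stopwords seen at least min_count times."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    # Lowercase the text and keep alphabetic runs only.
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return {w: n for w, n in counts.items() if n >= min_count}
```

A real term extractor such as Topia.termextract also looks at parts of speech and multi-word terms, so its output will differ from this simple frequency count.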

Using the API:

You need to make calls to
http://scrapit.herokuapp.com/q/?q={url}

Parameters:

  • q : (required) url to be fetched
  • occurs : (optional) Return only the words that are repeated more than once on the webpage. Set to '1' when you want to enable it
  • pretty : (optional) Used for pretty printing the response. Set to '1' when you want to enable it
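
As a minimal sketch, a request URL with these parameters could be built in Python like this. The helper name `build_scrapit_url` and its keyword arguments are hypothetical; only the endpoint and the `q`, `occurs` and `pretty` parameters come from the description above. The response format is not documented here, so the sketch stops at building the URL.

```python
from urllib.parse import urlencode

# Endpoint as given above.
BASE_URL = "http://scrapit.herokuapp.com/q/"


def build_scrapit_url(page_url, occurs=False, pretty=False):
    """Build a Scrapit request URL (hypothetical helper, for illustration)."""
    params = {"q": page_url}          # required: url to be fetched
    if occurs:
        params["occurs"] = "1"        # optional: only words repeated more than once
    if pretty:
        params["pretty"] = "1"        # optional: pretty-print the response
    # urlencode percent-encodes the target url so it survives as a query value.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_scrapit_url("http://example.com/", occurs=True, pretty=True))
```

The resulting URL can then be fetched with any HTTP client, for example `urllib.request.urlopen`.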

Examples:

(Please note that the API is still under development, so the results might not be as expected.)

Created By: Virendra Rajput