Recently, I have been working on my own API for making requests and parsing the HTML DOM.
Day by day, I use Python for web crawling in my side projects. My basic project setup relies on:
- requests for (yeah) making HTTP requests,
- lxml for DOM parsing, and/or
- scrapy for a more structured way of crawling.
But that isn’t all: I also have to use proxies, and that can be frustrating if you don’t have a system to manage a proxy list, or at least a few working free proxies to grab for a quick project.
So I built UrlWorker, a simple API that uses free proxies and parses the HTML DOM for you. Just configure it through a request and it will parse whatever you want.
The free proxies are tested and rotated on each request, and the DOM is parsed blazingly fast. You just have to import your favorite HTTP library and call the API whenever you need it.
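To make that concrete, here is a minimal sketch of what a call might look like. It assumes a JSON POST endpoint; the URL, the field names (`url`, `selector`), and the response shape are all placeholder assumptions, not UrlWorker’s documented schema. I use the standard library’s urllib here to keep it dependency-free, but requests works just as well.

```python
import json
from urllib import request

# Hypothetical endpoint -- replace with the real UrlWorker URL.
API_URL = "https://urlworker.example.com/parse"

def build_payload(target_url, selector):
    """Build the JSON body telling the API what to fetch and parse.

    The field names are illustrative assumptions, not the documented schema.
    """
    return {"url": target_url, "selector": selector}

def parse_page(target_url, selector):
    """POST the payload and return the parsed DOM data as a dict."""
    body = json.dumps(build_payload(target_url, selector)).encode("utf-8")
    req = request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # The service handles proxy rotation and DOM parsing server-side;
    # the client only sends the target URL and what to extract.
    with request.urlopen(req) as resp:
        return json.load(resp)
```

A call would then look like `parse_page("https://example.com", "h1")`, returning whatever structure the API sends back for the matched elements.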
P.S. There is a limit on the maximum number of requests per day. The project is in beta, so I had to limit it somehow.