BatchURLScraper - Extracting data using XPath, CSSPath, XQuery and Regex

majento (Topic starter)

Hello!

We would like to present BatchURLScraper, a free program for extracting data from web pages using XPath, CSSPath, XQuery and Regex.

BatchURLScraper features:
  • data parsing and extraction from a list of URLs
  • flexible configuration of parsing using XPath, CSSPath, XQuery and Regex extraction methods
  • export reports to Excel (CSV format)
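
For readers who want to see what this kind of workflow looks like in code, here is a minimal Python sketch. It is not how BatchURLScraper itself is implemented; it only illustrates the general idea of applying a Regex rule and an XPath rule to a list of pages and writing the results as CSV. The sample pages, the `extract_row` and `scrape_to_csv` names, and the chosen columns are all assumptions made for illustration. The pages are hard-coded well-formed markup so the example runs without network access; real-world HTML usually needs a more tolerant parser.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical sample pages keyed by URL; a real run would download each URL.
PAGES = {
    "https://example.com/a": "<html><head><title>Page A</title></head>"
                             "<body><h1>Alpha</h1></body></html>",
    "https://example.com/b": "<html><head><title>Page B</title></head>"
                             "<body><h1>Beta</h1></body></html>",
}

# Regex rule: capture the text between <title> and </title>.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_row(url, html):
    """Apply one Regex rule and one XPath rule to a single page."""
    match = TITLE_RE.search(html)
    title = match.group(1).strip() if match else ""

    # XPath rule: first <h1> anywhere in the document.
    # ElementTree supports only a limited XPath subset and needs well-formed markup.
    root = ET.fromstring(html)
    h1 = root.find(".//h1")
    heading = (h1.text or "").strip() if h1 is not None else ""

    return [url, title, heading]


def scrape_to_csv(pages):
    """Build a CSV report (as a string) with one row per URL."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "title", "h1"])
    for url, html in pages.items():
        writer.writerow(extract_row(url, html))
    return buffer.getvalue()


if __name__ == "__main__":
    print(scrape_to_csv(PAGES))
```

The resulting CSV text can be saved to a file and opened in Excel, which mirrors the export feature listed above.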

Download page (5 MB): https://site-analyzer.pro/soft/batch-url-scraper/

We welcome any feedback and feature requests for the program.


majento (Topic starter)

New version BatchURLScraper 1.3

What's new:
  • raised the limit on pages for parsing from 1000 to 5000 URLs
  • added the ability to scrape through HTML templates
  • added the ability to extract data through CSSPath attributes
  • added the ability to scrape through External and Internal HTML
  • added the ability to use lists of proxy servers
  • fixed a bug where the User-Agent was saved incorrectly
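
As a rough illustration of what extracting data through element attributes means, the following Python sketch collects the value of one attribute from every matching tag. It is only an assumption about the general technique, not BatchURLScraper's own code or selector syntax; the `AttributeCollector` class and `extract_attribute` function are hypothetical names, and the sketch matches by tag name only rather than by a full CSS path.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect the values of one attribute from every occurrence of one tag."""

    def __init__(self, tag, attribute):
        super().__init__()
        self.tag = tag
        self.attribute = attribute
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag != self.tag:
            return
        for name, value in attrs:
            if name == self.attribute and value is not None:
                self.values.append(value)


def extract_attribute(html, tag, attribute):
    """Return all values of `attribute` found on `tag` elements, in page order."""
    collector = AttributeCollector(tag, attribute)
    collector.feed(html)
    collector.close()
    return collector.values


if __name__ == "__main__":
    sample = '<p><a href="/one">One</a> <a class="x" href="/two">Two</a> <a>none</a></p>'
    print(extract_attribute(sample, "a", "href"))
```

Elements without the requested attribute are skipped, so the result contains only values that were actually present on the page.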

Homepage: https://site-analyzer.pro/soft/batch-url-scraper/

majento (Topic starter)

New version BatchURLScraper 1.4

What's new:
  • fixed an error in the validation of HTML templates
  • optimized the handling of regular expressions
  • added the ability to ignore duplicates in scraping results
  • fixed a problem where pauses between requests to web pages were applied incorrectly
  • extended the range of pauses between requests to one and a half minutes
  • finalized and improved the translation
  • fixed memory leaks
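
Two of the items above, ignoring duplicates and pausing between requests, are easy to picture in code. The Python sketch below is only an illustration under stated assumptions: how BatchURLScraper handles either one internally is not described in this thread. The function names are hypothetical, and picking a random pause inside a configured range is an assumption; the only detail taken from the changelog is the 90-second upper bound.

```python
import random

MAX_PAUSE_SECONDS = 90  # "one and a half minutes", per the changelog


def unique_results(results):
    """Drop repeated values while keeping the order in which they first appeared."""
    return list(dict.fromkeys(results))


def pick_pause(min_seconds, max_seconds, rng=random):
    """Choose a pause length inside the configured range, capped at 90 seconds.

    Assumption: the pause is drawn at random from the range; a fixed value
    would work the same way with min_seconds == max_seconds.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("invalid pause range")
    upper = min(max_seconds, MAX_PAUSE_SECONDS)
    lower = min(min_seconds, upper)
    return rng.uniform(lower, upper)


if __name__ == "__main__":
    print(unique_results(["a", "b", "a", "c", "b"]))
    print(pick_pause(1, 5))  # a scraper would pass this value to time.sleep()
```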


 
