PHPScraper is a general-purpose web-scraping utility for PHP, built with simplicity in mind. The goal is to make XPath selectors optional and to avoid the boilerplate code that scraping usually requires. Just create an instance of PHPScraper, go to a website, and start collecting data.

All scraping functionality can be accessed either as a method call or as a property. For example, the page title can be read as the title property or by calling title().

Many common use cases are already covered. There are prepared extractors for various HTML tags, including their most useful attributes, and you can filter and combine the results to suit your needs. Some extractors come in both a simple and a detailed variant.

PHPScraper can also collect feeds such as RSS feeds, sitemap.xml entries, and static search indexes. This is useful when deciding which page to crawl next or when building a list of the pages on a website.
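
The two access styles mentioned above can be sketched as follows. This is a minimal example that assumes the package was installed through Composer as spekulatius/phpscraper and that the class lives in the \Spekulatius\PHPScraper namespace; the URL is only a placeholder.

```php
<?php
// Assumption: installed via `composer require spekulatius/phpscraper`.
require __DIR__ . '/vendor/autoload.php';

// Create a scraper instance and load a page (placeholder URL).
$web = new \Spekulatius\PHPScraper\PHPScraper();
$web->go('https://example.com');

// Read the page title as a property...
echo $web->title, PHP_EOL;

// ...or through the equivalent method call.
echo $web->title(), PHP_EOL;
```

Both lines should print the same value; which style to use is a matter of preference.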
Features
- Process RSS feeds, sitemap.xml, and static search indexes
- Process CSV, XML, and JSON files and URLs
- Batteries included: metadata, links, images, headings, content, keywords
- Flexible calling: access data as a property or as a method
- Plenty of examples on the PHPScraper website and in the tests
- Configurable proxy support
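
As an illustration of the built-in extractors and their simple and detailed variants, the sketch below lists the links on a page. It assumes the property names links and linksWithDetails and the Composer package spekulatius/phpscraper; the URL is a placeholder, and the exact fields of the detailed entries are not assumed here, so they are simply dumped.

```php
<?php
// Assumption: installed via `composer require spekulatius/phpscraper`.
require __DIR__ . '/vendor/autoload.php';

$web = new \Spekulatius\PHPScraper\PHPScraper();
$web->go('https://example.com'); // placeholder URL

// Simple variant: assumed to return a flat list of link URLs.
foreach ($web->links as $url) {
    echo $url, PHP_EOL;
}

// Detailed variant: assumed to return one entry per link with extra
// attributes. The fields are printed as-is rather than guessed.
foreach ($web->linksWithDetails as $link) {
    print_r($link);
}
```

The simple list is usually enough for deciding what to crawl next; the detailed variant is more useful when link attributes need to be filtered.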