HtmlUnit
HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API for the things a standard web browser does: loading pages, filling out forms, and clicking links. Its JavaScript support is fairly good and steadily improving, handles complex AJAX libraries, and can simulate Chrome, Firefox, or Edge depending on the configuration. HtmlUnit is typically used for testing or for retrieving information from websites. It is not a generic unit testing framework; it is meant to simulate a browser inside another testing framework such as JUnit or TestNG. Several open source tools, including WebDriver, Arquillian Drone, and Serenity BDD, use it as their underlying "browser", and many projects, including Apache Shiro, Apache Struts, and Quarkus, use it for automated web testing.
Bright Data
Bright Data describes itself as the world's #1 platform for web data, proxies, and data scraping solutions. Fortune 500 companies, academic institutions, and small businesses rely on its products, network, and solutions to retrieve public web data efficiently, reliably, and flexibly, so they can research, monitor, and analyze data and make better-informed decisions.
Bright Data is used worldwide by 20,000+ customers in nearly every industry. Its products range from no-code data solutions for business owners to a robust proxy and scraping infrastructure for developers and IT professionals.
Bright Data products stand out for cost-effective, fast, and stable collection of public web data at scale, effortless conversion of unstructured data into structured data, and a superior customer experience, while remaining fully transparent and compliant.
TrifleJS
TrifleJS is a headless browser for test automation. It uses the .NET WebBrowser class and the V8 JavaScript engine to emulate Internet Explorer environments, and its API is modeled after PhantomJS, so it will be familiar to PhantomJS users. It can emulate IE7, IE8, or IE9, depending on the version of Internet Explorer installed. Scripts are run from the command line, where the IE version to emulate can be specified, and an interactive mode (REPL) is available for debugging and testing JavaScript code.
Jaunt
Jaunt is a Java library for web scraping, web automation, and JSON querying. It provides a fast, ultra-light headless browser that lets Java programs parse and extract data from web pages, fill out and submit forms, handle HTTP requests and responses, and interface with REST APIs. Jaunt parses HTML, XHTML, XML, and JSON, and supports HTTP header and cookie manipulation, proxies, and customizable caching. It does not execute JavaScript; for automating JavaScript-enabled browsers, Jauntium is recommended instead. Jaunt is available under the Apache License as a monthly edition that expires periodically, so users must download the latest version when theirs expires. Comprehensive tutorials and documentation are available to help users get started.