Browse free open source Java Web Scrapers for Windows and projects below. Use the toggles on the left to filter open source Java Web Scrapers for Windows by OS, license, language, programming language, and project status.

  • Get Avast Free Antivirus with 24/7 AI-powered online scam detection Icon
    Get Avast Free Antivirus with 24/7 AI-powered online scam detection

    Get protection for today’s online threats. Free.

    Award-winning antivirus protection, as well as protection against online scams, dangerous Wi-Fi connections, hacked accounts, and ransomware. It includes Avast Assistant, your built-in AI partner, which gives you help with suspicious online messages, offers, and more.
    Free Download
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 1
    WebMagic

    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a scalable crawler framework. It covers the whole lifecycle of crawler, downloading, url management, content extraction and persistent. It can simplify the development of a specific crawler. WebMagic is a simple but scalable crawler framework. You can develop a crawler easily based on it. WebMagic has a simple core with high flexibility, a simple API for html extracting. It also provides annotation with POJO to customize a crawler, and no configuration is needed. Some other features include the fact that it is multi-thread and has distribution support. WebMagic is very easy to integrate. Add dependencies to your pom.xml. WebMagic use slf4j with slf4j-log4j12 implementation. If you customized your slf4j implementation, please exclude slf4j-log4j12. You can write a class implementation of PageProcessor.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    jsoup

    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make every attempt to create a clean parse from the HTML you provide, regardless of whether the HTML is well-formed or not. You have HTML in a Java String, and you want to parse that HTML to get at its contents, or to make sure it's well formed, or to modify it. The String may have come from user input, a file, or from the web.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    Arachnid is a Java-based web spider framework. It includes a simple HTML parser object that parses an input stream containing HTML content. Simple Web spiders can be created by sub-classing Arachnid and adding a few lines of code called after each page
    Downloads: 0 This Week
    Last Update:
    See Project
  • Picsart Enterprise Background Removal API for Stunning eCommerce Visuals Icon
    Picsart Enterprise Background Removal API for Stunning eCommerce Visuals

    Instantly remove the background from your images in just one click.

    With our Remove Background API tool, you can access the transformative capabilities of automation , which will allow you to turn any photo asset into compelling product imagery. With elevated visuals quality on your digital platforms, you can captivate your audience, and therefore achieve higher engagement and sales.
    Learn More
  • 5
    Aracnis is a Java based framework for building distributed web spiders. These spiders can be used to accomplish a variety of tasks, for example, screen-scraping and link integrity checking.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    GitGet

    GitGet

    Ever wanted to download only a part of a Git repository.

    Ever wanted to download only a part of a Git repository. Just paste the URL of the repo you want to download and sit back and enjoy. This simple java application makes use of Web Scraping and downloads only those files you need, thus helping you save your precious bandwidth and space.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    phoneutria
    A Java Web crawler: multi-threaded, scalable, with high performance, extensible and polite. It can be used to crawl and index any web or enterprise domain and is configurable through a XML configuration file.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.