webcrawler

This is a simple, recursive web crawler written in Java. Starting from a given website, it reports the internal links, external links (excluding Facebook and Twitter), and images it finds on that site or domain, and writes a simple XML file listing the pages found together with the status code each one returned. It tries to discover new links on any website, but it will not crawl the same page more than once or try to crawl a downloadable file.
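The filtering rules described above could look roughly like the following. This is only a minimal sketch, not the project's actual code: the class name `LinkFilterSketch`, the `Kind` enum, and the host and file-extension lists are all hypothetical and chosen for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LinkFilterSketch {
    enum Kind { INTERNAL, EXTERNAL, IMAGE, SKIPPED }

    // Hypothetical lists; the real crawler's exclusions may differ.
    private static final List<String> EXCLUDED_HOSTS = Arrays.asList("facebook.com", "twitter.com");
    private static final List<String> IMAGE_EXTENSIONS = Arrays.asList(".png", ".jpg", ".jpeg", ".gif");
    private static final List<String> DOWNLOAD_EXTENSIONS = Arrays.asList(".pdf", ".zip", ".exe");

    private final String startHost;
    private final Set<String> seen = new HashSet<>();

    LinkFilterSketch(String startUrl) {
        this.startHost = normalizeHost(URI.create(startUrl).getHost());
    }

    // Lower-cases the host and drops a leading "www." so both forms compare equal.
    static String normalizeHost(String host) {
        if (host == null) {
            return "";
        }
        String h = host.toLowerCase(Locale.ROOT);
        return h.startsWith("www.") ? h.substring(4) : h;
    }

    private static boolean endsWithAny(String path, List<String> suffixes) {
        for (String suffix : suffixes) {
            if (path.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    // Decides what to do with a discovered URL; each URL is classified only once.
    Kind classify(String url) {
        if (!seen.add(url)) {
            return Kind.SKIPPED; // already seen, do not crawl again
        }
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return Kind.SKIPPED; // not a parseable URL
        }
        String host = normalizeHost(uri.getHost());
        if (host.isEmpty()) {
            return Kind.SKIPPED; // e.g. mailto: links have no host
        }
        for (String excluded : EXCLUDED_HOSTS) {
            if (host.equals(excluded) || host.endsWith("." + excluded)) {
                return Kind.SKIPPED;
            }
        }
        String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);
        if (endsWithAny(path, IMAGE_EXTENSIONS)) {
            return Kind.IMAGE; // reported, but not crawled for further links
        }
        if (endsWithAny(path, DOWNLOAD_EXTENSIONS)) {
            return Kind.SKIPPED; // downloadable file
        }
        return host.equals(startHost) ? Kind.INTERNAL : Kind.EXTERNAL;
    }
}
```

In this sketch a recursive crawler would follow only links classified as `INTERNAL`, while `EXTERNAL` and `IMAGE` links would be checked for their status code and reported without being followed.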

Download

Run

Build the JAR and copy its dependencies with Maven:

mvn clean dependency:copy-dependencies package

GUI

Double-click the downloaded file or use the console:

java -jar WebCrawler-1.0.jar

Console

java -jar WebCrawler-1.0.jar http://wiprodigital.com

Example Output

GUI

![GUI]

Console

INTERNAL LINKS:
[1] [200] http://wiprodigital.com
[2] [200] http://wiprodigital.com/who-we-are
[3] [200] http://wiprodigital.com/what-we-do
[4] [200] http://wiprodigital.com/what-we-think
[5] [XXX] ...

EXTERNAL LINKS:

[200] https://designit.com/happening/news/create-the-future-together

[200] http://www.un.org/sustainabledevelopment/sustainable-development-goals/

INTERNAL / EXTERNAL IMAGES:

[200] http://17776-presscdn-0-6.pagely.netdna-cdn.com/wp-content/themes/wiprodigital/images/wdlogo.png

[200] http://17776-presscdn-0-6.pagely.netdna-cdn.com/wp-content/themes/wiprodigital/images/designit_logo.png

[200] http://17776-presscdn-0-6.pagely.netdna-cdn.com/wp-content/uploads/2016/05/designit-logo.jpeg
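Each line of the console output pairs a URL with the status code it returned. As a rough illustration of how such a line could be produced, the sketch below requests only the response headers and formats the result like the internal-links list. The class and method names are hypothetical, and using a `HEAD` request and `-1` for a failed request are assumptions, not necessarily what the crawler does.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class StatusLineSketch {
    // Returns the HTTP status code for a URL, or -1 if the request cannot be made.
    static int fetchStatus(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD"); // headers only, no body download
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            // Malformed URL, network failure, or a non-HTTP scheme.
            return -1;
        }
    }

    // Formats one output line in the "[index] [status] url" shape shown above.
    static String formatLine(int index, int status, String url) {
        return String.format("[%d] [%d] %s", index, status, url);
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "http://wiprodigital.com";
        System.out.println(formatLine(1, fetchStatus(url), url));
    }
}
```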

### XML-File


## License

Copyright (C) 2016 Suresh Inuguru

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the [GNU General Public License](https://github.com/sureshinuguru/webcrawler/blob/master/LICENSE) for more details.
