Complete books are located in the book branch.
Books will be updated regularly to keep up with Geeksforgeeks.org.
Geeksforgeeks.org has (finally) updated its user interface and optimized it for mobile devices. This project started back when the old geeksforgeeks site was clumsy to read.
Note: Books in the master branch are no longer actively updated or maintained. Get the books from the book branch instead. The book branch will be merged into the master branch in the future. The code in the master branch was hacked together when I had just gotten to know Scrapy and lxml. So fair warning: it's not pretty if you decide to take a look at the code.
Old: To get the latest version of the books, look under the directory called goodies. Each book under geeksforgeeks-books is generated from the articles under a tag/category on geeksforgeeks.org. The book under leetcode-book is generated from the articles on leetcode.com.
There is an App now! Geeksforgeeks Reader makes it easier to read on your iOS devices. This app just got started. Feature requests are welcome.
For people who use leetcode, there is an app as well: Leetcoder.
If you want to generate the books yourself, here is an incomplete guide.
- Install Scrapy. It is used to download webpages from `geeksforgeeks` and `leetcode`. It follows the next-page link and downloads the webpages. Install it with `pip install scrapy`. I created two separate Scrapy projects called `geeksforgeeks` and `leetcode` to download webpages from the sites.
- lxml and Boilerpipy (or BeautifulSoup). After downloading the html files, you need to extract the articles from them. I'm using `Boilerpipy` because it can handle webpages with different layouts. But if you are only interested in the `geeksforgeeks` site, you can just use `lxml` to extract the articles. It will probably be faster too. `Boilerpipy` also sometimes removes the title of an article, so I had to do some post-processing with `lxml` afterwards to add the title back.
- Pandoc. It's used to convert html or markdown files to epub, pdf and docx files. The LaTeX engine used by Pandoc can't handle gif images, so only a few pdf books have been generated so far.
- kindlegen is needed to generate `mobi` files for reading on a Kindle or in the Kindle app.
- WordCloud. The book covers are generated with `wordcloud` with a bit of meta in mind.
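The title post-processing mentioned in the lxml and Boilerpipy item could look roughly like the sketch below. This is not the project's actual code: it uses Python's standard `html.parser` instead of `lxml` so that it runs without extra dependencies, and the names `TitleParser`, `page_title` and `ensure_title` are made up for illustration. It assumes the article title can be recovered from the page's `<title>` element.

```python
from html import escape
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(page_html):
    """Return the stripped <title> text, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(page_html)
    parser.close()
    return parser.title.strip()


def ensure_title(page_html, article_html):
    """Prepend the page title as an <h1> if the extracted article lost it.

    Assumption: a plain substring check is good enough to decide whether
    the title survived extraction.
    """
    title = page_title(page_html)
    if not title or title in article_html:
        return article_html
    return "<h1>{}</h1>\n{}".format(escape(title), article_html)
```

With `lxml` the same idea would use its HTML parser and an XPath lookup for the title; the standard-library version is only meant to show the shape of the step.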
- Crawling with Scrapy. Go to the `geeksforgeeks` subdirectory and run commands like `scrapy crawl geeksforgeeks -a category=category -a name=name`. For example, running `scrapy crawl geeksforgeeks -a category=tag -a name=pattern-searching` will crawl from the page `http://www.geeksforgeeks.org/tag/pattern-searching/`. `category` and `name` are the two arguments the spider takes. On geeksforgeeks, things can be organized by `tag` or `category`. Specify the category/tag and the name, and Scrapy will do the rest for you.
- Generate a book. Now go into the `geeksforgeeks-books` subdirectory, where you should find a directory called `pattern-searching`. Now run `python generate_book.py pattern-searching 1.0`. It will clean the html files, concatenate the cleaned files into one html file, then use `pandoc` to create epub and pdf files from it. Finally, a mobi file is created using `kindlegen`.
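To make the two steps above more concrete, here is a minimal sketch of how the spider arguments map to a start URL and of the kind of commands the book step ends up running. It is not taken from `generate_book.py`. The function names, the output file names, the default stylesheet path, and the reading of `1.0` as a version number are all assumptions; the URL shape for `category` is inferred from the `tag` example. The `--css` option applies to epub output in pandoc 2.0 and later.

```python
def start_url(category, name):
    """Build the listing URL the crawl starts from.

    The tag form matches the example in this README; the category form
    is assumed to follow the same pattern.
    """
    if category not in ("tag", "category"):
        raise ValueError("category must be 'tag' or 'category'")
    return "http://www.geeksforgeeks.org/{}/{}/".format(category, name)


def build_commands(name, version, stylesheet="styles/epub.css"):
    """Return the external commands for one book as argument lists.

    File names and the stylesheet path are hypothetical. The commands
    are only built here, not executed.
    """
    base = "{}-{}".format(name, version)
    html = base + ".html"
    epub = base + ".epub"
    return [
        # html -> epub, styled with a css file
        ["pandoc", html, "-o", epub, "--css", stylesheet],
        # html -> pdf (needs a LaTeX engine; gif images will fail)
        ["pandoc", html, "-o", base + ".pdf"],
        # epub -> mobi
        ["kindlegen", epub, "-o", base + ".mobi"],
    ]


if __name__ == "__main__":
    print(start_url("tag", "pattern-searching"))
    for command in build_commands("pattern-searching", "1.0"):
        print(" ".join(command))
```

Each argument list could be passed to `subprocess.run` once the concatenated html file exists.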
Style the books better. The books are essentially styled via CSS, so styling <pre> and <code>, for instance, changes how code looks in the epub books.
Convert gif images to png and use them instead so pandoc can handle them.
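One possible way to approach the gif item is to rewrite the image references in the cleaned html before handing it to pandoc. The sketch below only covers that rewriting step. It assumes the gif files have already been converted to png files with the same base name by some external tool, and the function name `use_png_images` is hypothetical.

```python
import re

# Matches the src attribute of an <img> tag that points at a .gif file.
GIF_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc=["\'][^"\']+?)\.gif(["\'])',
    re.IGNORECASE,
)


def use_png_images(html):
    """Point every <img> that references a .gif at the matching .png.

    Only the reference is rewritten; converting the image files
    themselves is assumed to happen elsewhere.
    """
    return GIF_SRC.sub(r"\1.png\2", html)


if __name__ == "__main__":
    print(use_png_images('<img src="images/kmp.gif" alt="KMP">'))
```

Links and other attributes that mention gif files are left alone, since only `<img>` sources matter to the pdf conversion.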
Every tag or category on geeksforgeeks.org can be turned into a book. So you are welcome to add/suggest more books.
The stylesheet used for generating epub books is under the styles subdirectory. Epub books are styled via CSS, and you are welcome to submit your own stylesheets.
The content in the books doesn't belong to me. I created the books so that I can read them offline on an iPad or Kindle, and (hopefully) for a better reading experience.
The content on geeksforgeeks.org is licensed under Creative Commons Attribution-NonCommercial-NoDerivs 2.5 India. See the license here.
The copyright of the content on leetcode belongs to the site and its owner.
The code in this project is licensed under Apache License, version 2.0. See the license here.
Jing Zhou, gnijuohz at gmail.com.
LinkedIn
Twitter
@lebshah
You can report issues right here.