Booking.com Serializer

Web Extraction

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contact
  5. Acknowledgments

About The Project

This is a data extractor that reads the content of "extraction.booking.html" and extracts the information listed below.

Input

HTML content taken from extraction.booking.html

Output

Output format: JSON string

File name: Hotel.json

Model

Hotel Model:

  • Hotel name
  • Address
  • Classification
  • Review points
  • Number of reviews
  • Description
  • Room categories
  • Alternative hotels
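
As a rough illustration of how the model above could map to the JSON output, here is a minimal sketch using System.Text.Json. The `Hotel` record, its property names and types, and the sample values are all assumptions made for this example; the actual classes in the repository may look different.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical sample values; the real extractor would fill these from the HTML feed.
var hotel = new Hotel(
    Name: "Example Hotel",
    Address: "1 Example Street, 10115 Berlin, Germany",
    Classification: 4,
    ReviewPoints: 8.5,
    NumberOfReviews: 1200,
    Description: "Sample description text.",
    RoomCategories: new List<string> { "Double Room", "Suite" },
    AlternativeHotels: new List<string> { "Another Example Hotel" });

// Serialize to an indented JSON string and write it to Hotel.json
// in the current working directory.
var options = new JsonSerializerOptions { WriteIndented = true };
string json = JsonSerializer.Serialize(hotel, options);
File.WriteAllText("Hotel.json", json);
Console.WriteLine(json);

// Assumed shape of the hotel model: one property per field listed in this README.
public record Hotel(
    string Name,
    string Address,
    int Classification,
    double ReviewPoints,
    int NumberOfReviews,
    string Description,
    List<string> RoomCategories,
    List<string> AlternativeHotels);
```

Room categories and alternative hotels are kept as plain string lists here for brevity; richer nested types would serialize the same way.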

Built With

Getting Started

Prerequisites

Installation

  1. Clone the repo
    git clone https://github.com/CodingBrushUp/Booking_com_Serializer.git
  2. Copy the Resources folder to your C:\ drive (the input feed is in this folder)

Go to the project root folder and type cmd in the File Explorer address bar to open a command prompt.

To build and run the project, run these commands:

dotnet restore
dotnet build
dotnet run --project ./Task1/Task1.csproj

To test the project:

dotnet test

Note:

  1. A variety of libraries are available for web crawling, such as HtmlAgilityPack and Selenium. Selenium is generally the better choice for real projects.
  2. ...
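
For orientation, here is a minimal sketch of reading values out of HTML with the HtmlAgilityPack NuGet package. It assumes that package is referenced by the project. The inline HTML, the element ids, and the `ReadText` helper are invented for this example and do not reflect the real structure of extraction.booking.html.

```csharp
using System;
using HtmlAgilityPack; // NuGet package, assumed to be referenced by the project

// Tiny inline stand-in for the real feed; the ids used here are invented.
const string html = @"<html><body>
  <h2 id='hotel_name'>Example Hotel</h2>
  <span id='hotel_address'>1 Example Street, Berlin</span>
</body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

string name = ReadText(doc, "//h2[@id='hotel_name']");
string address = ReadText(doc, "//span[@id='hotel_address']");
Console.WriteLine($"{name} | {address}");

// Hypothetical helper: returns the trimmed, entity-decoded text of the first
// node matching the XPath, or an empty string when nothing matches.
static string ReadText(HtmlDocument doc, string xpath)
{
    HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
    return node == null ? string.Empty : HtmlEntity.DeEntitize(node.InnerText).Trim();
}
```

Returning an empty string for a missing node keeps the extraction from throwing when the page layout changes, at the cost of silently producing blank fields.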

Contact

contact[at]alihaghighi[dot]pro

https://www.linkedin.com/in/ali-s-haghighi/

About

An early code sample for crawling the booking.com hotel details page with the help of the HtmlAgilityPack library.
