Document .Net is a 100% managed C# library that gives you an API to create, parse, load, modify, convert, edit, merge, mail-merge and digitally sign documents in PDF, DOCX, RTF, HTML and Text formats, and to generate pie charts. It can also rasterize documents to images and render them to a WPF FrameworkElement. + Completely created in managed C#. No Microsoft Office automation. + Has its own DOCX parser and writer according to the ECMA-376 specification. + Has its own RTF parser and writer according to the RTF 1.8 specification. + Has its own PDF parser and writer according to PDF Reference 1.7. + Has its own HTML writer according to the HTML5 reference. + Creates PDF/A-compliant documents. + Digitally signs PDF documents. + Multi-platform: Windows / macOS / Linux.
Deprecated, as there is a new maintainer for the original HAP project. Please check the new repo at https://github.com/zzzprojects/html-agility-pack. This is a port of the HtmlAgilityPack library, created by Simon Mourrier and Jeff Klawiter, to the .NET Core platform. This NuGet package can be used with Universal Windows Platform, ASP.NET 5 (using .NET Core) and full .NET Framework 4.6. Original description: This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually HAVE to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml provides, but for HTML documents (or streams).
Fizzler is a W3C Selectors parser and generic selector framework for document hierarchies. This package enables Fizzler over HtmlAgilityPack, adding QuerySelector and QuerySelectorAll (from Selectors API Level 1) for HtmlNode objects.
NFX CORE Package. NFX UNISTACK includes: Application Container + Dependency Injection facilities; Configuration engine; BigMemory: local/distributed piles/heaps, with the ability to keep hundreds of millions of objects resident in memory for long periods without killing the GC; BigMemory Cache: store hundreds of millions of objects in RAM without GC pressure (full GC scan <20 ms @ 300M+ objects, 64 GB taken); Logging with 8+ destinations/sinks (text, email, flood filter etc.); Distributed contract-oriented communication framework NFX.Glue (replaces WCF); Security with users, credentials, roles, permissions; JSON parsing, ser/deser support; Ultra-efficient binary serialization support; Erlang CLR support with native types: tuples, lists, pattern matching; Text lexing/parsing and processing pipeline: C# lexer, JSON lexer/parser; RelationalSchema language compiler: generate DDL for different targets; Templatization engine (for web, emails and not only textual content); NFX.WAVE: web server with hybrid injectable threading model (replaces IIS + ASP.NET); NFX.WAVE.Mvc: MVC framework for web pages; WV.js: a web component library auto-bindable to server MVC/MVVM; Database access layer with virtual commands/queries/transactions; ID generation: Global Distributed IDs (GDID), FID (fast process-wide ID); Virtual Social Network: Twitter/Facebook/Google+ et al.; Virtual Payment Processing: Stripe, PayPal providers; Virtual File Systems: AmazonS3, SVN, Local; QR Code creation; In progress: virtual document model with rendering to PDF, HTML and other formats; In progress: PDF DOM model + rendering.
CsQuery is an HTML parser, CSS selector engine and jQuery port for .NET 4 and C#. It implements all CSS2 and CSS3 selectors, all the DOM manipulation methods of jQuery, and some of the utility methods.
Turn unstructured HTML pages into structured data. The OpenScraping library extracts information from HTML pages using a JSON config file with XPath rules. It can scrape even complex multi-level objects such as tables and forum posts.
==DEPRECATED== Source moved/split into several repositories/packages: https://www.nuget.org/packages/CSharpTest.Net.Collections/ https://www.nuget.org/packages/CSharpTest.Net.Commands/ https://www.nuget.org/packages/CSharpTest.Net.RpcLibrary/ https://www.nuget.org/packages/CSharpTest.Net.Tools/ Repositories are located under the following account: https://github.com/csharptest
dcsoup is a .NET library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. This library is essentially a port of jsoup, a Java HTML parser library. See also: http://jsoup.org/ The API reference is available at: https://raw.githubusercontent.com/matarillo/dcsoup/master/sandcastle/Help/dcsoup.chm
NSoup is a .NET port of the jsoup (http://jsoup.org) HTML parser and sanitizer originally written in Java.
SgmlReader for Portable Library. SgmlReader is an SGML markup language parser derived from System.Xml.XmlReader in the .NET CLR, but its most popular use is as an HTML parser (a scraper). /* Use SgmlReader in HTML parse mode. */ XDocument document = SgmlReader.Parse(stream); Done!
HTML, JSON, XML, SQL, and text parser for Komodo. Please either install Komodo.Daemon to integrate search within your application, or Komodo.Server to run a standalone server. Komodo is an information search, metadata, storage, and retrieval platform.
HTML to RTF .Net is a 100% C# assembly that converts HTML documents into RTF, DOCX and Text formats. It can also merge RTF documents and replace text in them. It is a completely standalone solution that doesn't require MS Office or any other software. It requires only .NET Framework 4.6.2 and up or .NET 6.0 and up. The component can read and parse all types of HTML: HTML 3.2, HTML 4.01, HTML 5 with CSS and XHTML 1.01. The component has its own HTML parser and its own DOCX and RTF writers.
This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually HAVE to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml provides, but for HTML documents (or streams).
A C# port of the standalone version of the Readability library.
Markdown support for classic ASP.NET MVC and WebForms applications using the popular Markdig Markdown parser. Provides the following features: * Markdown.Parse(), Markdown.ParseHtml() methods * Markdown Islands Markdown Control * A Markdown WebForms Control for static Markdown text or dynamic assignment * Load static text * Load content from files * Databind text * Loose Markdown file support - drop Markdown files into your site * Supports HTML sanitization * Supports whitespace handling for Markdown text * Customization of the Markdown parser * Supports many Markdown features courtesy of Markdig. For .NET Core support with similar features see `Westwind.AspNetCore.Markdown`.
Note: this package is still in development. It works, but is far from stable. H2o (HtmlToObject) is a parser which parses HTML into objects using XPath, CSS or Regex selectors. Selectors can easily be assigned to classes using attributes.
NFX.Web Package. NFX UNISTACK includes: Application Container + Dependency Injection facilities; Configuration engine; BigMemory: local/distributed piles/heaps, with the ability to keep hundreds of millions of objects resident in memory for long periods without killing the GC; BigMemory Cache: store hundreds of millions of objects in RAM without GC pressure (full GC scan <20 ms @ 300M+ objects, 64 GB taken); Logging with 8+ destinations/sinks (text, email, flood filter etc.); Distributed contract-oriented communication framework NFX.Glue (replaces WCF); Security with users, credentials, roles, permissions; JSON parsing, ser/deser support; Ultra-efficient binary serialization support; Erlang CLR support with native types: tuples, lists, pattern matching; Text lexing/parsing and processing pipeline: C# lexer, JSON lexer/parser; RelationalSchema language compiler: generate DDL for different targets; Templatization engine (for web, emails and not only textual content); NFX.WAVE: web server with hybrid injectable threading model (replaces IIS + ASP.NET); NFX.WAVE.Mvc: MVC framework for web pages; WV.js: a web component library auto-bindable to server MVC/MVVM; Database access layer with virtual commands/queries/transactions; ID generation: Global Distributed IDs (GDID), FID (fast process-wide ID); Virtual Social Network: Twitter/Facebook/Google+ et al.; Virtual Payment Processing: Stripe, PayPal providers; Virtual File Systems: AmazonS3, SVN, Local; QR Code creation; In progress: virtual document model with rendering to PDF, HTML and other formats; In progress: PDF DOM model + rendering.
HtmlMonkey is a lightweight HTML/XML parser written in C#. It allows you to parse an HTML or XML string into a hierarchy of node objects, which can then be traversed or queried using jQuery-like syntax. In addition, the node objects can be modified or even built from scratch using code. Finally, the classes can generate the HTML or XML from the data.
HtmlMonkey is a lightweight HTML/XML parser written in C#. It allows you to parse an HTML or XML string into a hierarchy of node objects, which can then be traversed or queried using jQuery-like selectors. In addition, the node objects can be modified or even built from scratch using code. Finally, the classes can generate the HTML or XML from the data.
NFX.MongoDB Provider Package. NFX UNISTACK includes: Application Container + Dependency Injection facilities; Configuration engine; BigMemory: local/distributed piles/heaps, with the ability to keep hundreds of millions of objects resident in memory for long periods without killing the GC; BigMemory Cache: store hundreds of millions of objects in RAM without GC pressure (full GC scan <20 ms @ 300M+ objects, 64 GB taken); Logging with 8+ destinations/sinks (text, email, flood filter etc.); Distributed contract-oriented communication framework NFX.Glue (replaces WCF); Security with users, credentials, roles, permissions; JSON parsing, ser/deser support; Ultra-efficient binary serialization support; Erlang CLR support with native types: tuples, lists, pattern matching; Text lexing/parsing and processing pipeline: C# lexer, JSON lexer/parser; RelationalSchema language compiler: generate DDL for different targets; Templatization engine (for web, emails and not only textual content); NFX.WAVE: web server with hybrid injectable threading model (replaces IIS + ASP.NET); NFX.WAVE.Mvc: MVC framework for web pages; WV.js: a web component library auto-bindable to server MVC/MVVM; Database access layer with virtual commands/queries/transactions; ID generation: Global Distributed IDs (GDID), FID (fast process-wide ID); Virtual Social Network: Twitter/Facebook/Google+ et al.; Virtual Payment Processing: Stripe, PayPal providers; Virtual File Systems: AmazonS3, SVN, Local; QR Code creation; In progress: virtual document model with rendering to PDF, HTML and other formats; In progress: PDF DOM model + rendering.
A package, which includes helpful auxiliary methods. With this package, software development is faster. Features: - Extensions - Helpers - Filters * AjaxOnlyAttribute * ContentTypeFilterAttribute * GlobalizationFilterAttribute * PasswordValidatorAttribute * PreventDuplicateRequestAttribute * RequiredIfAttribute * RequiredIfNotNullAttribute * SessionEndLifeTimeFilterAttribute - Managers * ManagerBase * CacheManager * CertificatesManager * HtmlManager * HttpManager * InstanceManager * JsonManager * NotificationManager * NullManager * ResultManager * SecurityManager * SessionManager * RazorManager * WebConfigManager * UrlManager * PdfManager * Authorization Policy Based Manager - ModelBinders * DecimalModelBinder * HtmlValidationModelBinder - Providers * XmlSerializerProvider (abstract)
HTML Parsing and Sanitizing utility. Convert HTML to XHTML
A set of e-mail components which implement IMAP, SMTP, POP3, Exchange Web Services, SSL/TLS support, parsing and building MIME and S/MIME messages, Outlook .MSG and .PST conversions, mail merge over database, OAuth 2.0, DNS MX lookup, e-mail address validation, parsing winmail.dat, bounced messages processing, HTML messages cleanup, messages with embedded pictures, async/await methods, and much more. Supports .NET Framework, .NET Core, .NET Standard, UWP, Xamarin.
Web/HTML helper class library. Markdown parser / HTML sanitizer. Example: using StackExchange.DataExplorer.Helpers; Convert Markdown to HTML: var html = HtmlUtilities.RawToCooked(form["markdown"]); Sanitize / normalize HTML: @Html.Raw(HtmlUtilities.Safe(html))
GroupDocs.Parser for .NET is a parsing class library that lets you extract data from documents in a variety of formats. The data extraction API supports PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX and many more formats.
A simple .NET assembly for parsing Open Graph information from either a URL or an HTML snippet. You can read more about the Open Graph protocol at http://ogp.me.
Fast, lightweight HTML parser with accurate source information.
A utility that converts an HTML document into a tree of HtmlNode elements. It can also parse CSS styles and apply them to the HTML elements.
Microsoft Research Multi-Device Experiences (MDX) Utilities Package that contains the shared caching, logging, and HTML microdata parser implementations.
This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually HAVE to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml provides, but for HTML documents (or streams). This library is sponsored by ZZZ Projects: https://entityframework-extensions.net/ https://eval-expression.net/ https://dapper-plus.net/ HAP is trusted by companies worldwide, with over 150 million downloads.
Wojdav Bootstrap Mvc is a set of ASP.NET MVC UI components and a toolset used to create them. The components are based on the well-known Bootstrap 4 and are dedicated to the ASP.NET Core MVC and ASP.NET MVC 5 frameworks. The toolset consists of a fast HTML parser and classes that allow manipulating HTML code in a way similar to the jQuery library. WARNING: This is the first stable release, with potential breaking changes. Read more in the changelog on the website.
WQuery enables parsing and then editing HTML code through a fluent interface, much like the jQuery library. WQuery is part of the Wojdav Bootstrap Mvc package. HTML parsing is based on the WHtmlParser library. For now, WHtmlParser has some limitations in its ability to parse HTML code, and these directly affect what WQuery can parse. Before you start using WQuery, please review the limitations at https://wojdavbootstrapmvc.com/WQuery. WARNING: This is the first stable release, with potential breaking changes. Read more in the changelog on the website.
WHtmlParser is a fast HTML code parser that does not have the limitations imposed by parsers that follow the W3C specification. If you want to parse and modify HTML code in an object-oriented way, I recommend the WQuery library, which wraps the parser in an object that exposes user-friendly methods similar to those in jQuery. WARNING: This is the first stable release, with potential breaking changes. Read more in the changelog on the website.
A headless browser supporting web navigation, HTML parsing, CSS style parsing, and JavaScript parsing/execution.
Provides a simple HTML document parser that can be used to extract information from web pages. Social metadata can be easily extracted from a page. Information is taken from Open Graph metadata or Twitter Card metadata, as well as standard HTML metadata.
Free .NET HTML parser (C#) is an open source high-performance .NET C# module that was created to parse HTML for links, indexing and other purposes.