Nanonets-OCR-s is an image-to-markdown OCR model that converts document images into structured, semantically tagged markdown. It does more than extract plain text. It recognizes each type of content and marks it with an appropriate tag, which makes the output well suited to Large Language Models (LLMs) and automated workflows. Mathematical equations are converted into LaTeX, and the distinction between inline and display equations is preserved. Images such as logos, charts, and graphs are replaced with `<img>` tags that contain a text description, so downstream systems can interpret them without the original image. Signatures and watermarks are detected and placed in their own dedicated tags, separate from the body text, which matters for legal and business documents. Form elements such as checkboxes and radio buttons are converted into standardized Unicode symbols so they can be handled consistently. Complex tables are extracted and formatted as both markdown and HTML to support different document-processing pipelines.
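Because the output is meant to be consumed by downstream code, the sketch below shows how such a consumer might pull the tagged elements out of a page. It assumes concrete formats that the description above does not spell out: `$...$` for inline math, `$$...$$` for display math, `<img>description</img>` for images, and ☐/☑/☒ for unchecked/checked/crossed boxes. `parse_ocr_markdown` and `ParsedPage` are hypothetical names for illustration, not part of any Nanonets API.

```python
import re
from dataclasses import dataclass, field

# ASSUMED output conventions (not confirmed by the model description):
DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)  # $$ ... $$ display math
INLINE_MATH = re.compile(r"\$([^$\n]+)\$")              # $ ... $ inline math, single line
IMG_TAG = re.compile(r"<img>(.*?)</img>", re.DOTALL)     # <img>description</img>
CHECKBOX_STATES = {"☐": "unchecked", "☑": "checked", "☒": "crossed"}
# A checkbox symbol followed by its label, up to the next symbol or end of line.
CHECKBOX = re.compile(r"([☐☑☒])\s*([^☐☑☒\n]*)")


@dataclass
class ParsedPage:
    display_math: list[str] = field(default_factory=list)
    inline_math: list[str] = field(default_factory=list)
    image_descriptions: list[str] = field(default_factory=list)
    checkboxes: list[tuple[str, str]] = field(default_factory=list)  # (label, state)


def parse_ocr_markdown(text: str) -> ParsedPage:
    """Extract math, image descriptions, and checkbox states from one OCR'd page."""
    page = ParsedPage()
    page.display_math = [m.strip() for m in DISPLAY_MATH.findall(text)]
    # Drop display math first so its $$ delimiters are not re-read as inline math.
    # Limitation: literal dollar signs (e.g. prices) would be misread as inline math.
    without_display = DISPLAY_MATH.sub("", text)
    page.inline_math = [m.strip() for m in INLINE_MATH.findall(without_display)]
    page.image_descriptions = [d.strip() for d in IMG_TAG.findall(text)]
    page.checkboxes = [
        (label.strip(), CHECKBOX_STATES[symbol])
        for symbol, label in CHECKBOX.findall(text)
    ]
    return page
```

For a page containing `Married: ☑ Yes ☐ No`, the sketch yields `[("Yes", "checked"), ("No", "unchecked")]`, so a radio group on a single line is split into one entry per option. Regular expressions are used here only to keep the example short; a production consumer would more likely run the markdown through a proper markdown/HTML parser.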