Search Results for "metadata extraction tool"

Showing 426 open source projects for "metadata extraction tool"

  • 1
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. Its design, inspired by Flask, aims to make it easy to build a web scraper in just a few lines of code, with an easy-to-learn syntax. Dude is currently in Pre-Alpha, so expect breaking changes. You can run a scraper from the command line by passing URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.
    Downloads: 0 This Week
  • 2
    Amazon EC2 Metadata Mock

    A tool to simulate Amazon EC2 instance metadata

    Instance metadata is data about your instance that you can use to configure or manage the running instance. Instance metadata is divided into categories, for example, hostname, events, and security groups. You can also use instance metadata to access user data that you specified when launching your instance. For example, you can specify parameters for configuring your instance, or include a simple script. You can build generic AMIs and use user data to modify the configuration files supplied...
    Downloads: 0 This Week
  • 3
    Pandoc

    The universal markup converter

    Pandoc is a universal document converter that converts files from one markup format into another. With Pandoc, you have a Swiss-army knife of a converter, able to convert practically any markup format into any other. Pandoc includes a Haskell library for conversions as well as a command-line tool that uses this library. It can convert to and from just about anything: lightweight markup formats, HTML formats, documentation formats, ebooks, TeX formats, word processor formats...
    Downloads: 144 This Week
  • 4
    DBeaver

    Free universal database tool

    DBeaver is a free, multi-platform database tool that supports any database having a JDBC driver. It is useful for developers, SQL programmers, database administrators and analysts. DBeaver comes with plenty of great features such as metadata and SQL editors, ERD, data export/import/migration and more. Plugins are available for certain databases, and there are also several database management utilities. DBeaver’s Enterprise Edition provides even more features and supports non-JDBC...
    Downloads: 119 This Week
  • 5
    Video-subtitle-extractor

    A GUI tool for extracting hard-coded subtitles (hardsub) from videos

    A deep learning-based framework for extracting hard-coded subtitles (hardsub) from videos and generating srt files, covering both subtitle region detection and subtitle content extraction. Text recognition runs locally using OCR, so there is no need to apply for, set up, or call any third-party API, and no need to access online OCR services such as Baidu...
    Downloads: 51 This Week
  • 6
    yt-dlp

    A youtube-dl fork with additional features and fixes

    yt-dlp is a youtube-dl fork based on the now inactive youtube-dlc. The main focus of this project is adding new features and patches while also keeping up to date with the original project.
    Downloads: 89 This Week
  • 7
    Dungbeetle

    A distributed job server

    Dungbeetle is a distributed job server developed by Zerodha. It queues and asynchronously executes large numbers of heavy SQL read jobs, such as reports, against one or more databases, and writes each job's results to a separate results table that the requesting application can read later.
    Downloads: 8 This Week
  • 8
    GROBID

    A machine learning software for extracting information

    GROBID is a machine learning library for extracting, parsing, and restructuring raw documents such as PDFs into structured XML/TEI-encoded documents, with a particular focus on technical and scientific publications. Development started in 2008 as a hobby, and the tool was released as open source in 2011. Work on GROBID has been steady as a side project since the beginning and is expected to continue as such. Features include header extraction and parsing from articles in PDF format. The extraction...
    Downloads: 9 This Week
  • 9
    NetBox

    The premier source of truth powering network automation

    NetBox is a web-based application for managing IP addresses and the devices and cables connected to them, and it also serves as a data center infrastructure management (DCIM) tool. It supports virtualization, inventory management, and cable management. It offers a web-based user interface and a RESTful API for integrating with other tools and automating tasks.
    Downloads: 49 This Week
  • 10
    Lantern

    Tool to access videos, messaging, and other popular apps

    Can't access your favorite apps? Download Lantern to easily access videos, messaging, and other popular apps while at school or work. Lantern is an application that allows you to bypass firewalls to use your favorite applications and access your favorite websites. Lantern does not cooperate with any law enforcement in any country. Lantern encrypts all of your traffic to blocked sites and services to protect your data and privacy. Lantern passed multiple third party white box security audits...
    Downloads: 48 This Week
  • 11
    CSV Lint

    CSV Lint plug-in for Notepad++ for syntax highlighting

    CSV Lint is a plug-in for Notepad++ that provides syntax highlighting, CSV validation, automatic column and datatype detection, support for fixed-width datasets, changing the datetime format or decimal separator, sorting data, counting unique values, and converting to XML, JSON, SQL, etc. It is a plugin for data cleaning and working with messy data files. Use CSV Lint for metadata discovery, technical data validation, and reformatting of tabular data files. It is not meant to be a replacement for spreadsheet programs like Excel or SPSS, but rather...
    Downloads: 23 This Week
  • 12
    deepdoctection

    A Repo For Document AI

    deepdoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. It is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models itself but enables you to build pipelines using widely recognized libraries for object detection, OCR, and selected NLP tasks, and provides an integrated framework for fine-tuning...
    Downloads: 15 This Week
  • 13
    ripgrep

    Regex pattern directory search tool that respects your .gitignore

    ripgrep is a line-oriented search tool that recursively searches the current directory for a regex pattern. By default, ripgrep respects your .gitignore rules and automatically skips hidden files, hidden directories, and binary files. ripgrep has first class support on Windows, macOS and Linux, with binary downloads available for every release. ripgrep is similar to other popular search tools like The Silver Searcher, ack and grep. ripgrep supports arbitrary input preprocessing filters which...
    Downloads: 17 This Week
  • 14
    MinerU

    A high-quality tool for converting PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It uses OCR and layout analysis to preserve semantic structure and metadata, which makes it well suited to research and data science workflows.
    Downloads: 1 This Week
  • 15
    PDFMathTranslate

    PDF scientific paper translation with preserved formats

    PDFMathTranslate is a Python-based tool that uses AI translation to convert academic PDFs into bilingual (e.g. Chinese-English) documents while preserving formatting, including math notation. It supports OCR-enhanced content and offers CLI, GUI, Docker, and Zotero integration under AGPL v3.
    Downloads: 14 This Week
  • 16
    ContextGem

    ContextGem: Effortless LLM extraction from documents

    ContextGem is an open-source framework designed to simplify the extraction of structured data and insights from documents using large language models (LLMs). It provides a flexible, intuitive API that minimizes boilerplate code, enabling developers to build complex extraction workflows efficiently. ContextGem supports various document formats and integrates with multiple LLM providers, making it a versatile tool for tasks like contract analysis, anomaly detection, and information retrieval.
    Downloads: 1 This Week
  • 17
    Freyr.js

    A tool for downloading songs from music streaming services

    A Node.js tool for searching and downloading music from multiple online sources, including streaming platforms.
    Downloads: 7 This Week
  • 18
    Docspell

    Assist in organizing your piles of documents

    Docspell is a personal document organizer, sometimes called a "Document Management System" (DMS). You'll need a scanner to convert your papers into files. Docspell can then assist in organizing the resulting mess. It can unify your files from scanners, emails, and other sources. It is targeted at home use, i.e. families and households, and also at smaller groups and companies. You can associate tags, set correspondents, and add lots of other predefined and custom metadata. If your documents...
    Downloads: 5 This Week
  • 19
    Datacap

    DataCap is integrated software for data transformation

    Datacap is an open-source data catalog and governance tool that helps organizations manage and document their data assets. It provides metadata management, lineage tracking, and collaboration features to ensure data transparency and quality. Datacap is designed for teams that need a lightweight, self-hosted solution to organize and govern their data ecosystems.
    Downloads: 7 This Week
  • 20
    S3cmd

    Command line tool for managing Amazon S3 and CloudFront services

    S3cmd (s3cmd) is a free command line tool and client for uploading, retrieving and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol, such as Google Cloud Storage or DreamHost DreamObjects. It is best suited for power users who are familiar with command-line programs. It is also ideal for batch scripts and automated backup to S3, triggered from cron, etc. S3cmd is written in Python. It's an open-source project available under GNU Public License v2...
    Downloads: 11 This Week
  • 21
    MyDumper

    MyDumper project

    MyDumper is a MySQL logical backup tool. It consists of two tools: mydumper, which exports a consistent backup of MySQL databases, and myloader, which reads the backup produced by mydumper, connects to the destination database, and imports the backup. Both tools use multithreading. MyDumper is open source and maintained by the community; it is not a Percona, MariaDB, or MySQL product. Parallelism (hence, speed) and performance (avoids expensive character set conversion routines, efficient code...
    Downloads: 11 This Week
  • 22
    Metarank

    A low code Machine Learning service that personalizes articles

    Metarank is a service that can personalize any type of content: product listings, articles, recommendations and search results in 3 easy steps with a few lines of code. It’s often considered "too risky" to spend 6+ months on an in-house moonshot project to reinvent the wheel without an experienced team and without existing open-source tools. Metarank makes personalization easy not only for Amazon but for everyone else. Ingest historical item listings, clicks and item metadata so Metarank...
    Downloads: 7 This Week
  • 23
    SchemaCrawler

    Free database schema discovery and comprehension tool

    SchemaCrawler is a free database schema discovery and comprehension tool. SchemaCrawler has a good mix of useful features for data governance. You can search for database schema objects using regular expressions, and output the schema and data in a readable text format. The output serves for database documentation, and is designed to be diff-ed against other database schemas. SchemaCrawler also generates schema diagrams. You can execute scripts in any standard scripting language against your...
    Downloads: 9 This Week
  • 24
    EasyTier

    A simple, decentralized mesh VPN with WireGuard support

    EasyTier is a simple, decentralized mesh VPN written in Rust. Nodes act as equal peers and connect to one another without depending on a central service, and WireGuard clients can join the network.
    Downloads: 8 This Week
  • 25
    SVGO

    Node.js tool for optimizing SVG files

    SVGO (SVG Optimizer) is a Node.js-based tool for optimizing SVG vector graphics files. SVG files, in particular those exported from various editors, usually contain a lot of redundant and useless information. This can include editor metadata, comments, hidden elements, default or non-optimal values, and other content that can be safely removed or converted without affecting the SVG rendering result. Some options can be configured via the CLI, though it may be easier to keep the configuration in a separate...
    Downloads: 7 This Week
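
The kind of cleanup described for SVGO above (dropping editor metadata and comments without changing how the SVG renders) can be sketched in a few lines. The following is a minimal Python illustration using only the standard library; it is not SVGO itself and does not use its API. The function name strip_svg_metadata and the EDITOR_NAMESPACES set are hypothetical, and the set covers only Inkscape and Sodipodi as examples.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Assumption: a small, illustrative set of editor-specific namespaces.
EDITOR_NAMESPACES = {
    "http://www.inkscape.org/namespaces/inkscape",
    "http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd",
}


def _namespace(name: str) -> str:
    """Return the namespace URI of an ElementTree '{uri}local' name, or ''."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def strip_svg_metadata(svg_text: str) -> str:
    """Remove <metadata> elements, editor-namespaced nodes, and comments."""
    # Serialize SVG elements without a prefix (note: this is a global setting).
    ET.register_namespace("", SVG_NS)
    # The default parser discards XML comments, so they never enter the tree.
    root = ET.fromstring(svg_text)
    for parent in list(root.iter()):
        for child in list(parent):
            is_metadata = child.tag == f"{{{SVG_NS}}}metadata"
            if is_metadata or _namespace(child.tag) in EDITOR_NAMESPACES:
                parent.remove(child)
        for key in list(parent.attrib):
            if _namespace(key) in EDITOR_NAMESPACES:
                del parent.attrib[key]
    return ET.tostring(root, encoding="unicode")
```

A dedicated optimizer goes much further than this; the sketch only covers the metadata-removal idea, and a real tool would also need to confirm that nothing it removes is referenced elsewhere in the file.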