Tarsier

At Reworkd, we iterated on web perception problems across tens of thousands of real web tasks to build a perception system for web agents: Tarsier. For example, Tarsier can provide webpage perception for a minimalistic GPT-4 LangChain web agent.

Tarsier visually tags interactable elements on a page with a bracketed ID, e.g. [23]. This gives an LLM a mapping between elements and IDs that it can act on (e.g. CLICK [23]). We define interactable elements as buttons, links, or input fields that are visible on the page; Tarsier can also tag all textual elements if you pass tag_text_elements=True.

We have also developed an OCR algorithm that converts a page screenshot into a whitespace-structured string (almost like ASCII art) that even an LLM without vision can understand. This matters because current vision-language models still lack the fine-grained representations that web interaction tasks require.
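The tagging and mapping idea described above can be illustrated with a small, self-contained sketch. It does not use Tarsier, a browser, or an OCR service: the names Element, render_page, and resolve_action are hypothetical and exist only for this illustration, and the row/column positions are supplied by hand rather than derived from a screenshot. It shows how bracketed IDs can be laid out as whitespace-structured text and how an action string such as CLICK [23] can be resolved back to an element locator.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a tagged page element (not Tarsier's API)."""
    tag_id: int   # numeric ID shown to the LLM, e.g. 23
    text: str     # visible label of the element
    xpath: str    # locator used to act on the element later
    row: int      # approximate text row on the page (assumed, hand-picked)
    col: int      # approximate text column on the page (assumed, hand-picked)


def render_page(elements):
    """Lay tagged labels out on a character grid, roughly preserving position."""
    rows = {}
    for el in sorted(elements, key=lambda e: (e.row, e.col)):
        line = rows.get(el.row, "")
        label = f"[{el.tag_id}] {el.text}"
        if len(line) < el.col:
            line = line.ljust(el.col)   # pad with spaces up to the target column
        elif line:
            line += " "                 # labels would overlap: separate with one space
        rows[el.row] = line + label
    if not rows:
        return ""
    # Emit empty lines for rows with no elements so vertical spacing survives.
    return "\n".join(rows.get(r, "") for r in range(max(rows) + 1))


# Matches actions such as "CLICK [23]" or "TYPE [1] some text".
ACTION_RE = re.compile(r"^(CLICK|TYPE)\s+\[(\d+)\](?:\s+(.*))?$")


def resolve_action(action, tag_to_xpath):
    """Turn an LLM action string into (verb, xpath, optional argument)."""
    match = ACTION_RE.match(action.strip())
    if match is None:
        raise ValueError(f"unrecognized action: {action!r}")
    verb, tag_id, arg = match.group(1), int(match.group(2)), match.group(3)
    if tag_id not in tag_to_xpath:
        raise KeyError(f"no element tagged [{tag_id}]")
    return verb, tag_to_xpath[tag_id], arg


elements = [
    Element(0, "Home", "//a[1]", row=0, col=0),
    Element(1, "Search", "//input[1]", row=0, col=20),
    Element(23, "Submit", "//button[1]", row=2, col=4),
]
page_text = render_page(elements)
tag_to_xpath = {el.tag_id: el.xpath for el in elements}

print(page_text)
print(resolve_action("CLICK [23]", tag_to_xpath))
```

In a real agent, the text would be sent to the LLM as its view of the page, and the resolved locator would be handed to a browser automation layer to perform the click or keystrokes.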

Features

Vision utilities for web interaction agents
OCR support for Google Vision and Microsoft Azure
Documentation available
Effortlessly extract web data at scale
Reworkd automates your entire web data pipeline, end-to-end
It scans websites, generates code, runs extractors, validates results, and outputs data

License

MIT License

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Web Services Software

Registered

2024-09-20
