DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • How to Restore a Transaction Log Backup in SQL Server
  • How to Attach SQL Database Without a Transaction Log File
  • A Deep Dive into Apache Doris Indexes
  • Spring Boot Sample Application Part 1: Introduction and Configuration

Trending

  • How to Practice TDD With Kotlin
  • Why High-Performance AI/ML Is Essential in Modern Cybersecurity
  • How to Configure and Customize the Go SDK for Azure Cosmos DB
  • Transforming AI-Driven Data Analytics with DeepSeek: A New Era of Intelligent Insights
  1. DZone
  2. Data Engineering
  3. Databases
  4. Using SingleStore DB for Full-Text Index and Search

Using SingleStore DB for Full-Text Index and Search

This post continues our exploration of the multi-model capabilities of SingleStore DB. Here, discuss SingleStore DB's support for Full-Text Index and Search.

By 
Akmal Chaudhri user avatar
Akmal Chaudhri
DZone Core CORE ·
Updated Mar. 21, 22 · Tutorial
Likes (3)
Comment
Save
Tweet
Share
5.9K Views

Join the DZone community and get the full member experience.

Join For Free

Abstract

Continuing our exploration of the multi-model capabilities of SingleStore DB, we'll discuss SingleStore DB's support for Full-Text Index and Search in this article.

Using the example of medical journal articles from the SingleStore self-paced training course on Full-Text Index and Search, we'll store the text from journal articles and then perform a variety of queries using the full-text capabilities of SingleStore DB.

The SQL scripts used in this article are available on GitHub.

Introduction

There is a wide range of use cases where we may want to perform keyword searches on text. Examples include newspaper articles, journal articles, restaurant reviews, lodging reviews, etc. The requirements for these use cases would consist of the ability to:

  • Store and search a large body of text
  • Return query results based upon relevancy. Relevancy could be determined by frequency, for example

SingleStore DB can support these requirements by:

  • CHAR, VARCHAR, TEXT, or LONGTEXT data types.
  • Matched documents return a relevancy score between 0 and 1. Results can be ordered by relevancy.

To begin with, we need to create a free Cloud account on the SingleStore website. At the time of writing, the Cloud account from SingleStore comes with $500 of Credits. This is more than adequate for the case study described in this article.

Create Database Table

In our SingleStore Cloud account, let's use the SQL Editor to create a new database. Call this fulltext_db, as follows:

SQL
 
CREATE DATABASE IF NOT EXISTS fulltext_db;


We'll also create the journals table, as follows:

SQL
 
USE fulltext_db;

CREATE TABLE journals (
     volume VARCHAR(1000),
     name VARCHAR(1000),
     journal VARCHAR(1000),
     body LONGTEXT,
     KEY(volume),
     FULLTEXT(body)
);


Each row consists of 4 columns. The journal article contents are stored in the body column using LONGTEXT. We also create an inverted index on the body column using FULLTEXT. Stopwords are ignored as they occur very frequently. SingleStore DB's default list of stopwords is as follows:

Plain Text
 
a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with


Populate Database Table

We'll create a Pipeline to load journal data into our SingleStore DB table. In a previous article, we've used a Pipeline with Kafka to load data into SingleStore DB, and now we'll use this same technique with Amazon S3.

SQL
 
CREATE PIPELINE IF NOT EXISTS journal_pipeline AS
LOAD DATA S3 'zhou-fts/*json'
CONFIG '{
   "region" : "us-west-1"}'
INTO TABLE journals
FORMAT JSON
( volume <- volume,
  name <- name,
  journal <- journal,
  body <- body
);


We'll now start the pipeline, as follows:

SQL
 
START PIPELINE journal_pipeline;


After a few minutes, we can check the table using the following query:

SQL
 
SELECT COUNT(*) FROM journals;


There should be 31000 rows, as shown in Figure 1.

COUNT

Figure 1: COUNT

Example Queries

Now that we have built our system, we can run some queries. SingleStore DB supports two main functions for use with full-text:

  1. MATCH: The result of using this function is a relevancy score between 0 and 1. A score closer to 0 indicates a lower-quality match, and a score closer to 1 indicates a higher-quality match. The documentation contains additional details and examples.
  2. HIGHLIGHT: This is similar to MATCH but returns a JSON document. The JSON document contains an offset, a unique term count, and a small quantity of text. The documentation contains additional details and examples.

Let's see some examples of these functions.

First, let's find all journal articles that contain the word optometry.

SQL
 
SELECT *
FROM journals
WHERE MATCH(body) AGAINST ('optometry');


This query should return 40 rows. The partial results are shown in Figure 2.

Partial query results - Optometry

Figure 2: Optometry

Next, let's find articles that contain the words dentistry and cavities anywhere in the text, as follows:

SQL
 
SELECT *
FROM journals
WHERE MATCH(body) AGAINST ('dentistry AND cavities');


Instead of using AND, we can also use &&. An alternative way to express this query is as follows:

SQL
 
-- Alternative query
SELECT *
FROM journals
WHERE MATCH(body) AGAINST ('+dentistry +cavities');


By using the + operator, we specify that both words must be present anywhere in the text.

This query should return three rows, as shown in Figure 3.

Query results - Dentistry and Cavities

Figure 3: Dentistry and Cavities

In the following query, we'll look for any articles that have the word optometry but not the word contacts, as follows:

SQL
 
SELECT *
FROM journals
WHERE MATCH(body) AGAINST ('optometry -contacts');


This query should return 28 rows, partially shown in Figure 4.

Query Results - Optometry: Contacts

Figure 4: Optometry - Contacts

The ~ character can also be used for fuzzy searches. The documentation contains a complete list of operators.

Wildcard support is also available for single and multiple characters using ? and *, respectively. In the following example, we'll use the * character to match dentists, dentistry, etc.

SQL
 
SELECT *
FROM journals
WHERE MATCH(body) AGAINST ('dentist*');


This query should return 91 rows, partially shown in Figure 5.

Query Results: Dentist

Figure 5: Dentist*

Let's write a slight variation of an earlier query. Let's search for the word pediatrician, as shown below:

SQL
 
SELECT *
FROM journals
WHERE MATCH(body) AGAINST ('pediatrician');


This query should return six rows, as shown in Figure 6.

Query Result: Pediatrician

Figure 6: Pediatrician

If we use the main query as part of the SELECT, as follows:

SQL
 
SELECT MATCH(body) AGAINST ('pediatrician') AS score, *
FROM journals
WHERE MATCH(body) AGAINST ('pediatrician')
ORDER BY score DESC;


We can see the relevancy score, as shown in Figure 7.

Relevancy Score

Figure 7: Score

Finally, we can also use HIGHLIGHT to return the offset, a unique term count, and a small quantity of text, as follows:

SQL
 
SELECT HIGHLIGHT(body) AGAINST ('pediatrician')
FROM journals
WHERE MATCH(body) AGAINST ('pediatrician');


We can see the output in Figure 8 and, if we examine the result closely, we can see the <B>pediatrician</B> pattern in the fragments of text.

Output HIGHLIGHT

Figure 8: HIGHLIGHT


Summary

In this article, we have seen how we can use the built-in full-text index and search capabilities of SingleStore DB. SingleStore DB supports two main functions, MATCH and HIGHLIGHT, a series of operators, and both single and multiple character wildcard searches.

Database sql

Published at DZone with permission of Akmal Chaudhri. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • How to Restore a Transaction Log Backup in SQL Server
  • How to Attach SQL Database Without a Transaction Log File
  • A Deep Dive into Apache Doris Indexes
  • Spring Boot Sample Application Part 1: Introduction and Configuration

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • [email protected]

Let's be friends: