Skip to the content

Database Tagging

Perform Free-Text Search on multiple columns and tables

Requires: SQL Server

A Document Management System (DMS) developed for a recent client included a requirement to enable searching for records across multiple fields in different tables stored within the database. As far as end users were concerned they should be able to use a single search field to search across many fields and for many objects simultaneously. Given that the database could potentially contain thousands of documents, and other objects, a complex query with lots of joins and where clauses on each searchable column would have been extremely slow, inefficient and possibly over time unusable. In this post I'm going to run through and demonstrate a basic version of the database tagging schema I implemented and how I ended up getting there. 

Database Setup

Firstly I will create a Document table to store document records. In a real world application there will probably be some more information stored against the document and versions would be stored in a separate table to allow for multiple versions per documents, however for simplicity I'm going to put everything in one table. 

Create Table Documents (
DOC_ID int identity(1,1)
-------------------------------------------
, Name varchar(50) not null
, Number varchar(50) not null
, Version varchar(50) not null
, Author varchar(50) not null
-------------------------------------------
, constraint PK_DOC_ID primary key (DOC_ID)
)

Add some documents to the table:

Insert Into Documents (Name, Number, Version, Author) Values ('Document 1', '001', '1', 'Paul Williams')
Insert Into Documents (Name, Number, Version, Author) Values ('Document 2', '002', '1', 'Paul Williams')

Now I'll create a table to store the tags. I've added type and description to store some further information on the tag, this may come in useful later when I may need to determine what sort of tag I'm working with. 

Create Table Tags (
TAG_ID int identity(1,1)
-------------------------------------------
, Name varchar(50) not null
, Type varchar(50) not null
, Description varchar(100) not null
-------------------------------------------
, constraint PK_TAG_ID primary key (TAG_ID)
)

Add some tag data to the tags table:

Insert Into Tags (Name, Type, Description) Values ('Document 1', 'Document', 'Document Name') 
Insert Into Tags (Name, Type, Description) Values ('001', 'Document', 'Document Number')
Insert Into Tags (Name, Type, Description) Values ('1', 'Document', 'Document Version')
Insert Into Tags (Name, Type, Description) Values ('Paul Williams', 'Document', 'Document Author')
Insert Into Tags (Name, Type, Description) Values ('Document 2', 'Document', 'Document Name')
Insert Into Tags (Name, Type, Description) Values ('002', 'Document', 'Document Number')
Insert Into Tags (Name, Type, Description) Values ('1', 'Document', 'Document Version')

Note that the author is the same for both documents so only one tag is needed.

Now to map the tags to an object I created a tag mapping table. Essentially you can map an unlimited number of objects but for purpose of this blog I'm just going to be using a document.

Create Table TagMapper (
TGM_ID int identity(1,1)
--------------------------------------------
, TAG_ID int not null
, DOC_ID int not null
--------------------------------------------
, constraint PK_TGM_ID primary key (TGM_ID)
, constraint FK_TGM_TAG_ID foreign key (TAG_ID) references Tags (TAG_ID)
, constraint FK_TGM_DOC_ID foreign key (DOC_ID) references Documents (DOC_ID)
)

Lets add some data to the mapping table:

Insert Into TagMapper (TAG_ID, DOC_ID) Values (1, 1) 
Insert Into TagMapper (TAG_ID, DOC_ID) Values (2, 1)
Insert Into TagMapper (TAG_ID, DOC_ID) Values (3, 1)
Insert Into TagMapper (TAG_ID, DOC_ID) Values (4, 1)
Insert Into TagMapper (TAG_ID, DOC_ID) Values (5, 2)
Insert Into TagMapper (TAG_ID, DOC_ID) Values (6, 2)
Insert Into TagMapper (TAG_ID, DOC_ID) Values (7, 2)
Insert Into TagMapper (TAG_ID, DOC_ID) Values (4, 2)

Database Queries

Now that I have some data I can write a query to combine tags, tag mapper and documents.  

Select doc.DOC_ID
, doc.Name
, doc.Number
, doc.Version
, doc.Author
From Tags tag
Inner Join TagMapper tgm
On tgm.TAG_ID = tag.TAG_ID
Inner Join Documents doc
On tgm.DOC_ID = doc.DOC_ID
Group By doc.DOC_ID, doc.Name, doc.Number, doc.Version, doc.Author

Group By is needed because we are joining all tags to documents and we have up to 4 tags per document. The reason for this approach is because we want to search all the tag names and then find documents with those tags, and some words or character strings may appear in multiple tags.

This is great but I need a way to search through documents based on the tag name. The first idea I had was to use a LIKE operator in the WHERE clause to search for a specific pattern within the tag name. This works well as I can just use the tag name and not query each individual column in the document table. However using the LIKE operator on a large number of records could have performance issues. 

declare @SearchQuery varchar(100) = ''
Select doc.DOC_ID
, doc.Name
, doc.Number
, doc.Version
, doc.Author
From Tags tag
Inner Join TagMapper tgm
On tgm.TAG_ID = tag.TAG_ID
Inner Join Documents doc
On tgm.DOC_ID = doc.DOC_ID
Where tag.Name LIKE '%' + @SearchQuery + '%'
Group By doc.DOC_ID, doc.Name, doc.Number, doc.Version, doc.Author

After some research a better option would be to use SQL Server Full-Text Search on the tag name. I won't go through on how to setup Full-Text Search in this article but here is a tutorial. A Full-Text Search will perform much better than the LIKE operator against a large number of records because the tag name column is indexed, and because of our design we are only searching a single column, the 'tag name'. 

declare @SearchQuery varchar(100) = ''
Select doc.DOC_ID
, doc.Name
, doc.Number
, doc.Version
, doc.Author
From Tags tag
Inner Join TagMapper tgm
On tgm.TAG_ID = tag.TAG_ID
Inner Join Documents doc
On tgm.DOC_ID = doc.DOC_ID
Where FREETEXT(tag.Name, @SearchQuery)
Group By doc.DOC_ID, doc.Name, doc.Number, doc.Version, doc.Author

Conclusion 

Hopefully I've demonstrated how implementing a database tagging schema can improve usability, performance and extensibility when searching database records. Full-Text Search is an extremely powerful feature of SQL Server and has many more options than I have shown in the article.

About the author

Paul Williams

Paul Williams

Senior Developer at Peninsula Data Solutions.

Peninsula Data Solutions

S10
Plymouth Science Park
Plymouth
PL6 8BX

info@peninsuladatasolutions.co.uk