Introduction
For
decades, databases have excelled at storing and retrieving structured data, but
they struggled when it came to understanding similarity, context, or meaning.
Traditional SQL queries rely on exact matches and predefined relationships,
which makes searching unstructured or semantically rich data difficult. Oracle
23ai changes this model by introducing native AI Vector Search directly inside
the database. With vector embeddings stored and indexed alongside traditional
data, Oracle enables similarity-based searches that go beyond keywords and
exact values. This allows applications to find results based on meaning,
relevance, and proximity in vector space, making it possible to search text,
documents, images, and other complex data types in a far more intuitive and
powerful way, all using SQL.
AI Vector Search
Instead of searching for exact words, Vector Search allows you to
search for concepts. By using the new VECTOR data type, you store
"embeddings"—mathematical representations of data. This allows you to
perform a "similarity search." For example, if you search for
"staffing issues," the database can find documents about "low
headcount" or "recruitment delays" because it understands they
are conceptually related.
Example:
Assume you have a table called docs that stores documents along
with their vector embeddings.
To see the power of Vector Search, I need a larger dataset. With only one row, the "closest" result is always the same one! Here is a script to quickly populate my docs table with 100 rows of synthetic data. I’ve used a CONNECT BY loop and DBMS_RANDOM to generate unique titles and random vector coordinates so that the similarity search actually has something to compare.
INSERT INTO docs (id, title, embedding)
SELECT
level + 1,
'Tech Manual Part ' || (level + 1),
-- Generating a random 3-dimensional vector string like '[0.12, -0.45, 0.88]'
'[' ||
ROUND(DBMS_RANDOM.VALUE(-1, 1), 2) || ',' ||
ROUND(DBMS_RANDOM.VALUE(-1, 1), 2) || ',' ||
ROUND(DBMS_RANDOM.VALUE(-1, 1), 2) || ']'
FROM dual
CONNECT BY level <= 100;
COMMIT;
Let’s take a look at the table structure with this query:
set line 200;
col ID for 9999;
col TITLE for a20;
col EMBEDDING for a50;
set pagesize 5000;
select * from docs;
Instead of writing a complex text search, you can ask the database
to find documents that are conceptually similar to a given idea.
What is the following query doing
- The vector [0.1, 0.5, -0.2] represents the
concept you are searching for (for example, something related to staffing
or workforce challenges).
- VECTOR_DISTANCE calculates how close each
document’s embedding is to that concept using cosine similarity.
- Documents with the smallest distance are
the most closely related in meaning.
- The query returns the top 3 documents that best match the idea, not based on keywords, but on similarity.
-- Searching for documents similar to a specific concept
SELECT title
FROM docs
ORDER BY VECTOR_DISTANCE(embedding, '[0.1, 0.5, -0.2]', COSINE)
FETCH FIRST 3 ROWS ONLY;
In practice, this means a search for “staffing issues” could return
documents talking about low headcount, hiring delays, or resource shortages,
even if the exact phrase never appears. The database understands the
relationship between these ideas.
Conclusion
AI
Vector Search in Oracle 23ai represents a major shift in how databases handle
modern data workloads. By embedding vector storage, indexing, and similarity
search directly into the database engine, Oracle eliminates the need for
external vector stores or separate AI infrastructure. This keeps data secure,
reduces architectural complexity, and improves performance by allowing vector
queries to run where the data already lives. As a result, Oracle 23ai enables
developers and data teams to build smarter, AI-driven applications using
familiar SQL tools while unlocking semantic search capabilities that were
previously difficult or costly to implement. Vector search is no longer an
add-on; it is now a core database capability.
No comments:
Post a Comment