20 October, 2025

The "Retrieval" for RAG using Semantic Search

 Reusing the sample code from my previous demo, I build a table with 130 sentences describing AI Tools and then use SQL to run the sort of queries that you'd expect RAG to use against an "internal knowledge source".

Here's a preview of the sentences loaded :


SQL> select count(*) from my_data_source;

  COUNT(*)
----------
       130

SQL> select my_sentence from my_data_source fetch first 5 rows only;

MY_SENTENCE
----------------------------------------------------------------------------------------------------------------------------------------------------
        Adobe Firefly: A family of generative AI models integrated into Adobe's Creative Cloud suite. It is used for professional image editing and graphic
 design

        AIVA (Artificial Intelligence Virtual Artist): An AI composer that generates music in various styles for content creators and brands
        Amazon CodeWhisperer: An AI coding assistant from Amazon Web Services that provides contextual code recommendations
        Canva Magic Media: An AI image generator integrated into the user friendly design platform. It is used for creating visuals for social media and pr
esentations

        ChatGPT: An AI chatbot from OpenAI. It can engage in human like conversations, write code, summarize text, and create content such as stories and e
ssays


SQL> select my_sentence from my_data_source where id > 126 ;

MY_SENTENCE
----------------------------------------------------------------------------------------------------------------------------------------------------
        Zubtitle.com   captions for video
        Cohere.ai   large language models
        Grok.com   personal AI assistant
        Claude.ai   advanced AI chatbot

SQL>


The embeddings are then computed and loaded into the target table using the same query (the HuggingFace model ALL_MINILM_L12_V2_AUGMENTED has already been loaded  with the ONNX RunTime engine with the same code as in the previous blog post) :


insert into my_data_vectors
select ds.id as id, my_sentence, to_vector(et.embed_vector) sentence_vector
from
    my_data_source ds,
    dbms_vector_chain.utl_to_embeddings(
       dbms_vector_chain.utl_to_chunks(dbms_vector_chain.utl_to_text(ds.my_sentence), json('{"normalize":"all"}')),
       json('{"provider":"database", "model":"ALL_MINILM_L12_V2_AUGMENTED"}')) t,
    JSON_TABLE(t.column_value, '$[*]' COLUMNS (embed_id NUMBER PATH '$.embed_id', embed_data VARCHAR2(4000) PATH '$.embed_data', embed_vector CLOB PATH '$.embed_vector')) et
/


I now test a few "Retrieval" queries thus :  (Note how the queries return different results with smaller VECTORE_DISTANCE as I refine them) 

SQL> ACCEPT text_input CHAR PROMPT 'Enter your query : '
Enter your query : Video
SQL> VARIABLE text_variable VARCHAR2(1000)
SQL> VARIABLE query_vector VECTOR
SQL> BEGIN
  2    :text_variable := '&text_input';
  3    SELECT vector_embedding(ALL_MINILM_L12_V2_AUGMENTED USING  :text_variable as data) into :query_vector;
  4  END;
  5  /
old   2:   :text_variable := '&text_input';
new   2:   :text_variable := 'Video';

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> SELECT my_sentence, vector_distance(sentence_vector , :query_vector, COSINE) as Calc_Vector_Distance
  2  FROM my_data_vectors
  3  ORDER BY 2
  4  FETCH FIRST 3 ROWS ONLY;

MY_SENTENCE
----------------------------------------------------------------------------------------------------------------------------------------------------
CALC_VECTOR_DISTANCE
--------------------
        Clipchamp.com   quick video creation
          4.635E-001

        Lumen5.com   AI powered social video
          5.277E-001

        Synths.video   convert blogs into videos
           5.39E-001


SQL>
----------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------



SQL> ACCEPT text_input CHAR PROMPT 'Enter your query : '
Enter your query : What AI Tool can I use to edit Videos ?
SQL> VARIABLE text_variable VARCHAR2(1000)
SQL> VARIABLE query_vector VECTOR
SQL> BEGIN
  2    :text_variable := '&text_input';
  3    SELECT vector_embedding(ALL_MINILM_L12_V2_AUGMENTED USING  :text_variable as data) into :query_vector;
  4  END;
  5  /
old   2:   :text_variable := '&text_input';
new   2:   :text_variable := 'What AI Tool can I use to edit Videos ?';

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> SELECT my_sentence, vector_distance(sentence_vector , :query_vector, COSINE) as Calc_Vector_Distance
  2  FROM my_data_vectors
  3  ORDER BY 2
  4  FETCH FIRST 3 ROWS ONLY;

MY_SENTENCE
----------------------------------------------------------------------------------------------------------------------------------------------------
CALC_VECTOR_DISTANCE
--------------------
        Wisecut.video   auto edit with AI
           2.62E-001

        Runwayml.com   AI video editing
          2.777E-001

        Gling.ai   YouTube video editor
          3.413E-001


SQL>
SQL>
----------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------



SQL> ACCEPT text_input CHAR PROMPT 'Enter your query : '
Enter your query : image
SQL> VARIABLE text_variable VARCHAR2(1000)
SQL> VARIABLE query_vector VECTOR
SQL> BEGIN
  2    :text_variable := '&text_input';
  3    SELECT vector_embedding(ALL_MINILM_L12_V2_AUGMENTED USING  :text_variable as data) into :query_vector;
  4  END;
  5  /
old   2:   :text_variable := '&text_input';
new   2:   :text_variable := 'image';

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> SELECT my_sentence, vector_distance(sentence_vector , :query_vector, COSINE) as Calc_Vector_Distance
  2  FROM my_data_vectors
  3  ORDER BY 2
  4  FETCH FIRST 3 ROWS ONLY;

MY_SENTENCE
----------------------------------------------------------------------------------------------------------------------------------------------------
CALC_VECTOR_DISTANCE
--------------------
        VanceAI.com   image enhancement
          5.268E-001

        Canva Magic Media: An AI image generator integrated into the user friendly design platform. It is used for creating visuals for social media and pr
esentations
          5.459E-001

        PicWish.com   photo editing
          5.696E-001


SQL>
SQL>
----------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------



SQL> ACCEPT text_input CHAR PROMPT 'Enter your query : '
Enter your query : How can I use AI to edit Images ?
SQL> VARIABLE text_variable VARCHAR2(1000)
SQL> VARIABLE query_vector VECTOR
SQL> BEGIN
  2    :text_variable := '&text_input';
  3    SELECT vector_embedding(ALL_MINILM_L12_V2_AUGMENTED USING  :text_variable as data) into :query_vector;
  4  END;
  5  /
old   2:   :text_variable := '&text_input';
new   2:   :text_variable := 'How can I use AI to edit Images ?';

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> SELECT my_sentence, vector_distance(sentence_vector , :query_vector, COSINE) as Calc_Vector_Distance
  2  FROM my_data_vectors
  3  ORDER BY 2
  4  FETCH FIRST 3 ROWS ONLY;

MY_SENTENCE
----------------------------------------------------------------------------------------------------------------------------------------------------
CALC_VECTOR_DISTANCE
--------------------
        Hotpot.ai   AI image editing
          2.805E-001

        Runwayml.com   AI video editing
          3.711E-001

        Wisecut.video   auto edit with AI
          3.921E-001


SQL>
SQL>
----------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------



SQL> ACCEPT text_input CHAR PROMPT 'Enter your query : '
Enter your query : Can I chat with an AI ?
SQL> VARIABLE text_variable VARCHAR2(1000)
SQL> VARIABLE query_vector VECTOR
SQL> BEGIN
  2    :text_variable := '&text_input';
  3    SELECT vector_embedding(ALL_MINILM_L12_V2_AUGMENTED USING  :text_variable as data) into :query_vector;
  4  END;
  5  /
old   2:   :text_variable := '&text_input';
new   2:   :text_variable := 'Can I chat with an AI ?';

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> SELECT my_sentence, vector_distance(sentence_vector , :query_vector, COSINE) as Calc_Vector_Distance
  2  FROM my_data_vectors
  3  ORDER BY 2
  4  FETCH FIRST 3 ROWS ONLY;

MY_SENTENCE
----------------------------------------------------------------------------------------------------------------------------------------------------
CALC_VECTOR_DISTANCE
--------------------
        Claude.ai   advanced AI chatbot
          2.989E-001

        Claude.ai   conversational assistant
          3.828E-001

        Grok.com   personal AI assistant
          4.025E-001


SQL>
SQL>
----------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------



SQL> ACCEPT text_input CHAR PROMPT 'Enter your query : '
Enter your query : coding
SQL> VARIABLE text_variable VARCHAR2(1000)
SQL> VARIABLE query_vector VECTOR
SQL> BEGIN
  2    :text_variable := '&text_input';
  3    SELECT vector_embedding(ALL_MINILM_L12_V2_AUGMENTED USING  :text_variable as data) into :query_vector;
  4  END;
  5  /
old   2:   :text_variable := '&text_input';
new   2:   :text_variable := 'coding';

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> SELECT my_sentence, vector_distance(sentence_vector , :query_vector, COSINE) as Calc_Vector_Distance
  2  FROM my_data_vectors
  3  ORDER BY 2
  4  FETCH FIRST 3 ROWS ONLY;

MY_SENTENCE
----------------------------------------------------------------------------------------------------------------------------------------------------
CALC_VECTOR_DISTANCE
--------------------
        Amazon CodeWhisperer: An AI coding assistant from Amazon Web Services that provides contextual code recommendations
          5.543E-001

        Replit.com   write and run code
          5.548E-001

        GitHub Copilot: An AI coding assistant that provides code suggestions and autocompletion within an IDE
          5.677E-001


SQL>
SQL>
----------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------


SQL> ACCEPT text_input CHAR PROMPT 'Enter your query : '
Enter your query : Tell me of a Coding Assistant
SQL> VARIABLE text_variable VARCHAR2(1000)
SQL> VARIABLE query_vector VECTOR
SQL> BEGIN
  2    :text_variable := '&text_input';
  3    SELECT vector_embedding(ALL_MINILM_L12_V2_AUGMENTED USING  :text_variable as data) into :query_vector;
  4  END;
  5  /
old   2:   :text_variable := '&text_input';
new   2:   :text_variable := 'Tell me of a Coding Assistant';

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> SELECT my_sentence, vector_distance(sentence_vector , :query_vector, COSINE) as Calc_Vector_Distance
  2  FROM my_data_vectors
  3  ORDER BY 2
  4  FETCH FIRST 3 ROWS ONLY;

MY_SENTENCE
----------------------------------------------------------------------------------------------------------------------------------------------------
CALC_VECTOR_DISTANCE
--------------------
        GitHub Copilot: An AI coding assistant that provides code suggestions and autocompletion within an IDE
          4.306E-001

        Amazon CodeWhisperer: An AI coding assistant from Amazon Web Services that provides contextual code recommendations
          4.702E-001

        Replit: An online IDE with an integrated AI assistant that helps with code generation, debugging, and project creation
          4.734E-001


SQL>
SQL>

Thus, in the first pair of examples, if I refine (aka "prompt engineering") my query from "Video" to "What AI Tool can I use to edit Videos ?", I get a better set of responses with  a smaller VECTOR_DISTANCE from my query.

In the second pair, I change my query from "image" to "How can I use AI to edit Images ?"  I get an improved set of responses (again with a smaller VECTOR_DISTANCE).

In the last pair, when I change  my query from "coding" to "Tell me of a Coding Assistant", I get responses with a better match to my query.

As noted in my previous blog post, the "LLM" is actually a "MinLM" called "all-MiniLM-L12-v2" from HuggingFace.

21 September, 2025

AI / LLM : Semantic Similarity Search using Hugging Face with Vectors in an Oracle 23ai Database

 As I have recently gone through a course on LLMs for Chatbots and RAG on OCI (Oracle Cloud Infrastructure), I thought about putting together a quick demo that uses 

A The Hugging Face all-MiniLM-L12-v2 model (with some augmentations done by Oracle for download)

B  The ONNX (Open Neural Network Exchange) Runtime Engine that runs in Oracle 23ai (and, yes, in the 23ai Free downloadable edition, not requiring the Oracle Cloud)

C  Vectors stored in an Oracle Database using the Vector Datatype

D  Use the Hugging Face model to generate Vectors for Similarity / Semantic Searches



but DOES NOT USE Oracle Cloud Infrastructure.  (To use OCI for Embedding -- generating "Vectors" from "Tokens", I have to book a  Dedicated AI Cluster to "Fine Tune" an existing model, so I am currently staying away from this expensive method)

[Similarly, I cannot currently build a Generative AI as I have to setup a model and server resources]


The sequence is :

1. Setup an Oracle 23ai database (either in the Oracle Cloud OR on an Exadata machine  OR the Free edition for On-Premises testing)

2. Create a User to hold the source data and vectors

3. Import the augmented Hugging Face model

4. Load the Source Data as words or sentences or chunks -- here I use a few sentences

5. Compute Vectors for the target column

6.  Optional but recommended : Build Vector Index -- for this demo with very few rows I haven't built a Vector Index but if and/or when I do build a demo with a few hundred or thousand sentences or chunks of data, I will create an Index on the Vectors

7. Use Oracle's VECTOR_DISTANCE  Function for a Semantic Similarity Search


So here goes :


User and Grants :
Using the 23ai FREEPDB  downloadable VM, I created the user with these grants :

grant connect to vector_demo identified by password ;

alter user vector_demo default tablespace users ;

alter user vector_demo quota unlimited on users ;

grant DB_DEVELOPER_ROLE to vector_demo ;

grant CREATE MINING MODEL to vector_demo;

CREATE OR REPLACE DIRECTORY DM_DUMP as '/tmp/Load_Model';  -- on the Linux VM

GRANT READ ON DIRECTORY dm_dump TO vector_demo;

GRANT WRITE ON DIRECTORY dm_dump TO vector_demo;

Download and Import the Model :

Download the ZIP file from https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip It also contains a README Place the extracted ONNX file on the Linux Server at /tmp/Load_Model SQL> !ls -l /tmp/Load_Model total 249932 -rw-r--r--. 1 oracle oinstall 122597346 Jul 15 2024 all_MiniLM_L12_v2_augmented.zip -rw-r--r--. 1 oracle oinstall 133322334 Jul 15 2024 all_MiniLM_L12_v2.onnx -rw-r--r--. 1 oracle oinstall 4232 Jul 15 2024 README-ALL_MINILM_L12_V2-augmented.txt SQL> If necessary, delete the previously imported model from the database (logged in to the SQL command line as vector_demo) : exec DBMS_VECTOR.DROP_ONNX_MODEL(model_name => 'ALL_MINILM_L12_V2_AUGMENTED', force => true); Import the model file : SQL> EXECUTE DBMS_VECTOR.LOAD_ONNX_MODEL(- directory=>'DM_DUMP',- file_name=>'all_MiniLM_L12_v2.onnx',- model_name=>'ALL_MINILM_L12_V2_AUGMENTED'- metadata => JSON('{"function" : "embedding", "embeddingOutput" : "embedding", "input": {"input": ["DATA"]}}')- ); The "model_name" is the User Defined Name (i.e. defined by the DBA). In many online demos and the online demo, I see the model name specified as "doc_model". I preferred to use the name "ALL_MINILM_L12_V2_AUGMENTED". You could use any meaningful name Verify the imported model with these queries : SQL> l 1 SELECT model_name, attribute_name, attribute_type, data_type, vector_info 2 FROM user_mining_model_attributes 3 WHERE model_name = 'ALL_MINILM_L12_V2_AUGMENTED' 4* ORDER BY ATTRIBUTE_NAME SQL> / MODEL_NAME ATTRIBUTE_NAME ATTRIBUTE_TY DATA_TYPE -------------------------------- ---------------- ------------ ---------------- VECTOR_INFO -------------------------------------------------------- ALL_MINILM_L12_V2_AUGMENTED DATA TEXT VARCHAR2 ALL_MINILM_L12_V2_AUGMENTED ORA$ONNXTARGET VECTOR VECTOR VECTOR(384,FLOAT32) SQL> SQL> l 1 SELECT MODEL_NAME, MINING_FUNCTION, ALGORITHM, 2 ALGORITHM_TYPE, MODEL_SIZE 3 FROM user_mining_models 4 WHERE model_name = 'ALL_MINILM_L12_V2_AUGMENTED' 5* ORDER BY MODEL_NAME SQL> / MODEL_NAME MINING_FUNCTION ALGORITHM ALGORITHM_ MODEL_SIZE -------------------------------- ------------------------------ ------------------------------ ---------- ---------- ALL_MINILM_L12_V2_AUGMENTED EMBEDDING ONNX NATIVE 133322334 SQL> SQL>SELECT * FROM DM$VMALL_MINILM_L12_V2_AUGMENTED ORDER BY NAME ; NAME VALUE ------------------ ------------------------------------------------ Graph Description Graph combining tokenizer and main_graph tokenizer main_graph Graph Name tokenizer_main_graph Input[0] input:string[?] Output[0] embedding:float32[?,384] Producer Name onnx.compose.merge_models Version 1 6 rows selected. SQL> SQL> SELECT * FROM DM$VPALL_MINILM_L12_V2_AUGMENTED ORDER BY NAME; NAME VALUE ------------------ ------------------------------------------------ embeddingOutput embedding function embedding SQL> SQL> SELECT * FROM DM$VJALL_MINILM_L12_V2_AUGMENTED; METADATA -------------------------------------------------------------------------------- {"function":"embedding","embeddingOutput":"embedding","input":{"input":["DATA"]} SQL>

Test generation of Embeddings with the Model :


SQL> SELECT VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'The quick brown fox jumped' as DATA) AS embedding;

EMBEDDING
------------------------------------------------------------------------------------------------------------------------------------
[1.65517051E-002,3.19098569E-002,-1.96293015E-002,-3.56926955E-002,

SQL>
SQL> SELECT VECTOR_DIMS(VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'The quick brown fox jumped' as DATA)) as Dimensions_Count;

DIMENSIONS_COUNT
----------------
             384

SQL>

note : The output is actually a list of 384 dimensions, I copied the whole output from SQL Developer put it into a Python List and 
then did a LEN(list) to count the number of elements

Create the Table for the source data (sentences of text in my demo)  and load some data (sentences)


SQL> drop table my_data_source purge;

Table dropped.

SQL> create table my_data_source(id number primary key, my_sentence varchar2(4000));

Table created.

SQL> insert into my_data_source values (1,'The quick brown fox jumped over the lazy dog');

1 row created.

SQL> insert into my_data_source values (2,'she sells sea-shells at the seashore');

1 row created.

SQL> insert into my_data_source values (3,'the fox and dog are brown friends');

1 row created.

SQL> insert into my_data_source values (4,'the elephant knows the fox and dog');

1 row created.

SQL> insert into my_data_source values (5,'the fox, dog and elephant live together');

1 row created.

SQL> insert into my_data_source values (6,'aeroplanes fly in the sky');

1 row created.

SQL> commit;

Commit complete.

SQL>
SQL> select * from my_data_source;

        ID
----------
MY_SENTENCE
------------------------------------------------------------------------------------------------------------------------------------
         1
The quick brown fox jumped over the lazy dog

         2
she sells sea-shells at the seashore

         3
the fox and dog are brown friends

         4
the elephant knows the fox and dog

         5
the fox, dog and elephant live together

         6
aeroplanes fly in the sky


6 rows selected.

SQL>



Create the Table to hold the computed Vectors, using the model that has been loaded


SQL> drop table my_data_vectors;

Table dropped.

SQL> create table my_data_vectors (id number primary key, my_sentence varchar2(4000), sentence_vector vector);

Table created.

SQL> 
SQL> insert into my_data_vectors
select ds.id as id, my_sentence, to_vector(et.embed_vector) sentence_vector
from
    my_data_source ds,
    dbms_vector_chain.utl_to_embeddings(
       dbms_vector_chain.utl_to_chunks(dbms_vector_chain.utl_to_text(ds.my_sentence), json('{"normalize":"all"}')),
       json('{"provider":"database", "model":"ALL_MINILM_L12_V2_AUGMENTED"}')) t,
    JSON_TABLE(t.column_value, '$[*]' COLUMNS (embed_id NUMBER PATH '$.embed_id', embed_data VARCHAR2(4000) PATH '$.embed_data', embed_vector CLOB PATH '$.embed_vector')) et;
  2    3    4    5    6    7    8
6 rows created.

SQL>
SQL> commit
  2  /

Commit complete.

SQL> select * from my_data_vectors;

        ID
----------
MY_SENTENCE
------------------------------------------------------------------------------------------------------------------------------------
SENTENCE_VECTOR
------------------------------------------------------------------------------------------------------------------------------------
         1
The quick brown fox jumped over the lazy dog
[1.70537992E-003,5.95331714E-002,-2.32173726E-002,3.02353837E-002,

         2
she sells sea-shells at the seashore
[1.2790652E-002,5.97022101E-002,8.24511051E-002,3.93822305E-002,

         3
the fox and dog are brown friends
[7.67209902E-002,-7.82399923E-002,-2.40087509E-002,-2.37390138E-002,

         4
the elephant knows the fox and dog
[4.04452085E-002,-4.15055361E-003,-1.71641614E-002,2.63017584E-002,

         5
the fox, dog and elephant live together
[9.44276601E-002,-4.74944711E-002,2.31287945E-002,2.9055763E-002,

         6
aeroplanes fly in the sky
[1.1147093E-001,2.18752325E-002,-4.58196662E-002,-2.64751501E-002,


6 rows selected.

SQL>


Do a Semantic Similarity Search for certain phrases :

SQL> l
  1  SELECT VECTOR_DISTANCE(
  2  VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'The quick brown fox jumped' as DATA),
  3  VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'The quicker yellow fox jumped' as DATA),
  4* COSINE) as Vector_Distance
SQL> /

VECTOR_DISTANCE
---------------
   1.76127E-001

SQL>
SQL> l
  1  SELECT VECTOR_DISTANCE(
  2  VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'The quick brown fox jumped' as DATA),
  3  VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'seashells' as DATA),
  4* COSINE) as Vector_Distance
SQL> /

VECTOR_DISTANCE
---------------
   9.81778E-001

SQL>
SQL> l
  1  SELECT VECTOR_DISTANCE(
  2  VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'The quick brown fox jumped' as DATA),
  3  VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'moon elephants' as DATA),
  4* COSINE)as Vector_Distance
SQL> /

VECTOR_DISTANCE
---------------
   7.51297E-001

SQL>

--- "the quicker yellow fox jumped" is not very far away from "the quick brown fox jumped"
--- "moon elephants" is far away from the "the quick brown fox jumped".
--- "seashells" is much further away.
--- likely because the prebuilt model recognises tnat "fox" and "elephant" have some similarity being animals

SQL> l
  1  SELECT VECTOR_DISTANCE(
  2  VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'The quick brown fox jumped' as DATA),
  3  VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'houshold electricals' as DATA),
  4* COSINE)as Vector_Distance
SQL> /

VECTOR_DISTANCE
---------------
   9.97313E-001

SQL>
-- "household" electricals is further away


SQL> l
  1  SELECT id , my_sentence, vector_distance(sentence_vector , VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'sea-shells' as DATA), COSINE) as Vector_Distance
  2  FROM my_data_vectors
  3  ORDER BY vector_distance(sentence_vector , VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'sea-shells' as DATA), COSINE)
  4* FETCH FIRST 2 ROWS ONLY
SQL> /

          ID MY_SENTENCE                                      VECTOR_DISTANCE
------------ ------------------------------------------------ ---------------
           2 she sells sea-shells at the seashore                3.65937E-001
           6 aeroplanes fly in the sky                           8.43676E-001

SQL>
SQL> l
  1  SELECT id , my_sentence, vector_distance(sentence_vector , VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'she sells seashells' as DATA), COSINE) as Vector_Distance
  2  FROM my_data_vectors
  3  ORDER BY vector_distance(sentence_vector , VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'she sells seashells' as DATA), COSINE)
  4* FETCH FIRST 2 ROWS ONLY
SQL> /

          ID MY_SENTENCE                                      VECTOR_DISTANCE
------------ ------------------------------------------------ ---------------
           2 she sells sea-shells at the seashore                1.46406E-001
           6 aeroplanes fly in the sky                           9.26212E-001

SQL>
--- If I refine my query from "sea-shells" to "she sells seashells", I can get a better match, with the Vector Distance dropping from 3.6*10^-1  to 1.4*10^-1

SQL> l
  1  SELECT id , my_sentence, vector_distance(sentence_vector , VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'quick fox' as DATA), COSINE) as Vector_Distance
  2  FROM my_data_vectors
  3  ORDER BY vector_distance(sentence_vector , VECTOR_EMBEDDING(ALL_MINILM_L12_V2_AUGMENTED USING 'quick fox' as DATA), COSINE)
  4* FETCH FIRST 2 ROWS ONLY
SQL> /

          ID MY_SENTENCE                                      VECTOR_DISTANCE
------------ ------------------------------------------------ ---------------
           1 The quick brown fox jumped over the lazy dog        3.22912E-001
           4 the elephant knows the fox and dog                  5.21092E-001

SQL>
--- "quick fox" has a better similarity with sentence 1 than sentence 4


What LLMs do when "generating" the next word (known as "token") is to find the "token" that has a higher probability -- a lower COSINE Distance

COSINE Distance measures the Angle between the Vectors
There are other Methods such as DOT PRODUCT, EUCLIDEAN (L1/L2) etc


Some Useful References :

Oracle Docs and Examples :




Oracle Blog :



External Blog :



Hugging Face :



ONNX :


Oracle SQL Reference :


21 August, 2025

Cross-Instance Deadlock in RAC

 I've created a video demonstration of a Cross-Instance Deadlock in a 2-node RAC environment and how to read the trace file.

Note : Only the SQL that has been returned the "deadlock" error is failed, the session and transaction are still valid.