A CONTEXT Index is one of a class of Domain Indexes. It is used to build text-retrieval application.
Instead of a regular B-Tree index that captures the entire text of a VARCHAR2 column as a key, you can build an index that consists of "Tokens", words extracted from a long-ish text in the database column. Searching the index is based on the CONTAINS operator.
Here is a quick demo in 11.2.0.4 :
First I setup a new user for this demo. I could have used an existing user but my "HEMANT" user has DBA privileges and I want to demonstrate CONTEXT without such privileges.
Here the key grant is "CTXAPP" that is granted to the user.
Next, this user creates the demonstration table and index :
Note the call to CTX_DDL.SYNC_INDEX. Unlike a regular B-Tree index, the CONTEXT (Domain) Index is, by default, *not* updated in real-time when DML occurs against the base table. (Hopefully, in a future post, I may demonstrate real-time updates to a CTXCAT index, different from a CONTEXT Index).
Only remember that this CONTEXT index is not automatically updated. If new rows are added to the table, they are not visible through the index until a SYNC_INDEX operation is performed. The SYNC_INDEX can be scheduled as a job with another call to DBMS_JOB or DBMS_SCHEDULER or itself as part of the CREATE INDEX statement.
Now, let me demonstrate usage of the Index with the CONTAINS operator.
Yes, the CONTAINS operator is a bit awkward. It will be some time before you (or your developers) get used to the syntax !
Besides CTX_DDL, there are a number of other packages in the prebuilt CTXSYS schema that are available :
CTX_CLS
CTX_DDL
CTX_DOC
CTX_OUTPUT
CTX_QUERY
CTX_REPORT
CTX_THES
CTX_ULEXER
Since I have created a separate database user, I can also demonstrate the additional objects that are created when the INDEXTYPE IS CTXSYS.CONTEXT.
Yes, that is a large number of database objects besides MY_TEXT_TABLE and MY_TEXT_INDEX. SYS_C0017472 is, of course, the Primary Key Index on MY_TEXT_TABLE (on the ID_COLUMN column). The others are interesting.
The "Tokens" I mentioned in the first paragraph are, for the purpose of this table MY_TEXT_TABLE, in the DR$MY_TEXT_INDEX$I.
Update : Here is a demonstration of a STOPLIST for a CONTEXT Index.
I have been busy in the last few months and have not published much this quarter. Hopefully, I will get more time in the coming weeks to explore CONTEXT, CTXCAT and Domain Indexes.
.
.
.
Instead of a regular B-Tree index that captures the entire text of a VARCHAR2 column as a key, you can build an index that consists of "Tokens", words extracted from a long-ish text in the database column. Searching the index is based on the CONTAINS operator.
Here is a quick demo in 11.2.0.4 :
First I setup a new user for this demo. I could have used an existing user but my "HEMANT" user has DBA privileges and I want to demonstrate CONTEXT without such privileges.
Here the key grant is "CTXAPP" that is granted to the user.
SQL> create user ctxuser identified by ctxuser default tablespace users; User created. SQL> grant create session, create table to ctxuser; Grant succeeded. SQL> grant ctxapp to ctxuser; Grant succeeded. SQL> alter user ctxuser quota unlimited on users; User altered. SQL>
Next, this user creates the demonstration table and index :
SQL> connect ctxuser/ctxuser Connected. SQL> create table my_text_table 2 (id_column number primary key, 3 my_text varchar2(2000)); Table created. SQL> create index my_text_index 2 on my_text_table(my_text) 3 indextype is ctxsys.context; Index created. SQL> SQL> insert into my_text_table 2 values (1,'This is a long piece of text written by Hemant'); 1 row created. SQL> insert into my_text_table 2 values (2,'Another long text to be captured by the index'); 1 row created. SQL> commit; Commit complete. SQL> SQL> -- this update to the index would be a regular background job SQL> exec ctx_ddl.sync_index('MY_TEXT_INDEX'); PL/SQL procedure successfully completed. SQL>
Note the call to CTX_DDL.SYNC_INDEX. Unlike a regular B-Tree index, the CONTEXT (Domain) Index is, by default, *not* updated in real-time when DML occurs against the base table. (Hopefully, in a future post, I may demonstrate real-time updates to a CTXCAT index, different from a CONTEXT Index).
Only remember that this CONTEXT index is not automatically updated. If new rows are added to the table, they are not visible through the index until a SYNC_INDEX operation is performed. The SYNC_INDEX can be scheduled as a job with another call to DBMS_JOB or DBMS_SCHEDULER or itself as part of the CREATE INDEX statement.
Now, let me demonstrate usage of the Index with the CONTAINS operator.
SQL> select id_column as id, 2 my_text 3 from my_text_table 4 where contains (my_text, 'written by Hemant') > 0 5 / ID ---------- MY_TEXT -------------------------------------------------------------------------------- 1 This is a long piece of text written by Hemant SQL> select my_text 2 from my_text_table 3 where contains (my_text, 'Another long') > 0 4 / MY_TEXT -------------------------------------------------------------------------------- Another long text to be captured by the index SQL>
Yes, the CONTAINS operator is a bit awkward. It will be some time before you (or your developers) get used to the syntax !
Besides CTX_DDL, there are a number of other packages in the prebuilt CTXSYS schema that are available :
CTX_CLS
CTX_DDL
CTX_DOC
CTX_OUTPUT
CTX_QUERY
CTX_REPORT
CTX_THES
CTX_ULEXER
Since I have created a separate database user, I can also demonstrate the additional objects that are created when the INDEXTYPE IS CTXSYS.CONTEXT.
SQL> select object_type, object_name, to_char(created,'DD-MON-RR HH24:MI') Crtd 2 from user_objects 3 order by object_type, object_name 4 / OBJECT_TYPE OBJECT_NAME CRTD ------------------- ------------------------------ ------------------------ INDEX DR$MY_TEXT_INDEX$X 10-DEC-17 16:48 INDEX DRC$MY_TEXT_INDEX$R 10-DEC-17 16:48 INDEX MY_TEXT_INDEX 10-DEC-17 16:48 INDEX SYS_C0017472 10-DEC-17 16:48 INDEX SYS_IL0000045133C00006$$ 10-DEC-17 16:48 INDEX SYS_IL0000045138C00002$$ 10-DEC-17 16:48 INDEX SYS_IOT_TOP_45136 10-DEC-17 16:48 INDEX SYS_IOT_TOP_45142 10-DEC-17 16:48 LOB SYS_LOB0000045133C00006$$ 10-DEC-17 16:48 LOB SYS_LOB0000045138C00002$$ 10-DEC-17 16:48 TABLE DR$MY_TEXT_INDEX$I 10-DEC-17 16:48 TABLE DR$MY_TEXT_INDEX$K 10-DEC-17 16:48 TABLE DR$MY_TEXT_INDEX$N 10-DEC-17 16:48 TABLE DR$MY_TEXT_INDEX$R 10-DEC-17 16:48 TABLE MY_TEXT_TABLE 10-DEC-17 16:48 15 rows selected. SQL> SQL> select table_name, constraint_name, index_name 2 from user_constraints 3 where constraint_type = 'P' 4 order by table_name, constraint_name 5 / TABLE_NAME CONSTRAINT_NAME ------------------------------ ------------------------------ INDEX_NAME ------------------------------ DR$MY_TEXT_INDEX$K SYS_IOT_TOP_45136 SYS_IOT_TOP_45136 DR$MY_TEXT_INDEX$N SYS_IOT_TOP_45142 SYS_IOT_TOP_45142 DR$MY_TEXT_INDEX$R DRC$MY_TEXT_INDEX$R DRC$MY_TEXT_INDEX$R MY_TEXT_TABLE SYS_C0017472 SYS_C0017472 SQL>
Yes, that is a large number of database objects besides MY_TEXT_TABLE and MY_TEXT_INDEX. SYS_C0017472 is, of course, the Primary Key Index on MY_TEXT_TABLE (on the ID_COLUMN column). The others are interesting.
The "Tokens" I mentioned in the first paragraph are, for the purpose of this table MY_TEXT_TABLE, in the DR$MY_TEXT_INDEX$I.
SQL> desc DR$MY_TEXT_INDEX$I Name Null? Type ----------------------------------------- -------- ---------------------------- TOKEN_TEXT NOT NULL VARCHAR2(64) TOKEN_TYPE NOT NULL NUMBER(10) TOKEN_FIRST NOT NULL NUMBER(10) TOKEN_LAST NOT NULL NUMBER(10) TOKEN_COUNT NOT NULL NUMBER(10) TOKEN_INFO BLOB SQL> select token_text, token_count 2 from dr$my_text_index$i 3 / TOKEN_TEXT TOKEN_COUNT ---------------------------------------------------------------- ----------- ANOTHER 1 CAPTURED 1 HEMANT 1 INDEX 1 LONG 2 PIECE 1 TEXT 2 WRITTEN 1 8 rows selected. SQL>
Update : Here is a demonstration of a STOPLIST for a CONTEXT Index.
I have been busy in the last few months and have not published much this quarter. Hopefully, I will get more time in the coming weeks to explore CONTEXT, CTXCAT and Domain Indexes.
.
.
.
No comments:
Post a Comment