Search Indexes

<< ---------------------------------------------------------------- >>

Search Indexes (text searching)

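We could put a regular index on the DB fields, but a sorted index only helps when the search term is a prefix of the indexed value. A term that appears in the middle of the field still needs a full scan, so a regular index is not very valuable for text search.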

Instead we use search indexes: take a document and tokenize it → build an inverted index, which is basically a set of k,v pairs mapping each word token to the IDs of the documents it appears in.

Prefix searching: the inverted index is sorted by token, so you can binary search it for the first token that starts with the prefix and scan forward from there.
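A minimal sketch of the idea above, assuming whitespace tokenization and a small in-memory index. The names `build_inverted_index` and `prefix_search` and the sample documents are made up for illustration; a real search index tokenizes far more carefully.

```python
from bisect import bisect_left
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase word token to the sorted IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            postings[token].add(doc_id)
    # Keep the tokens sorted so they can be binary searched.
    terms = sorted(postings)
    return terms, {t: sorted(ids) for t, ids in postings.items()}

def prefix_search(terms, postings, prefix):
    """Return IDs of documents containing a token that starts with `prefix`."""
    prefix = prefix.lower()
    i = bisect_left(terms, prefix)  # first token >= prefix
    result = set()
    # All tokens sharing the prefix are adjacent in sorted order.
    while i < len(terms) and terms[i].startswith(prefix):
        result.update(postings[terms[i]])
        i += 1
    return sorted(result)

# Hypothetical sample documents.
docs = {
    10: "apple pie recipe",
    11: "green apple",
    12: "application form",
    13: "banana bread",
}
terms, postings = build_inverted_index(docs)
print(prefix_search(terms, postings, "app"))
```

The binary search finds the start of the matching range in O(log n), and the forward scan only touches tokens that actually match.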

Suffix searching: keep a second copy of the inverted index whose keys are reversed (apple:10 → elppa:10). This copy is sorted by reversed token, so tokens that share a suffix sit next to each other. To search, reverse the search string and binary search the suffix index.
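The reversed-key trick can be sketched like this, assuming a small token → document ID mapping already exists. `build_suffix_index`, `suffix_search`, and the sample postings are hypothetical names and data for illustration.

```python
from bisect import bisect_left

def build_suffix_index(postings):
    """Build a second index keyed by the reversed tokens, with the keys sorted."""
    reversed_postings = {token[::-1]: ids for token, ids in postings.items()}
    return sorted(reversed_postings), reversed_postings

def suffix_search(reversed_terms, reversed_postings, suffix):
    """Return IDs of documents containing a token that ends with `suffix`."""
    key = suffix[::-1]  # a suffix of a token is a prefix of the reversed token
    i = bisect_left(reversed_terms, key)
    result = set()
    while i < len(reversed_terms) and reversed_terms[i].startswith(key):
        result.update(reversed_postings[reversed_terms[i]])
        i += 1
    return sorted(result)

# Hypothetical token -> document ID postings.
postings = {"apple": [10], "maple": [11], "simple": [12], "banana": [13]}
reversed_terms, reversed_postings = build_suffix_index(postings)
print(suffix_search(reversed_terms, reversed_postings, "ple"))
```

The cost is a second copy of the index in exchange for the same binary search on suffixes.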

Apache Lucene

A pretty popular open source search index library. It supports many types of indexes for more complicated kinds of search: text, numbers, coordinates, etc.

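Uses an LSM tree variant to support fast document ingestion: writes first go to an in-memory buffer, which is later flushed to immutable on-disk segments that get merged in the background.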

Lucene can also store the entire document, not just the document ID, for faster access and fewer network trips.
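To make the LSM-style write path noted above concrete, here is a toy sketch, not Lucene's actual implementation. The class `TinySegmentedIndex` and its `flush_threshold` parameter are assumptions for illustration, segments live in memory instead of on disk, and the buffer is searchable immediately, which is a simplification.

```python
class TinySegmentedIndex:
    """Toy LSM-style index: buffer writes in memory, flush to immutable segments."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # docs buffered before a flush
        self.buffer = {}         # token -> set of doc IDs (mutable, in memory)
        self.buffered_docs = 0
        self.segments = []       # flushed segments: token -> tuple of doc IDs

    def add(self, doc_id, text):
        # Writes only touch the in-memory buffer, so ingestion stays cheap.
        for token in text.lower().split():
            self.buffer.setdefault(token, set()).add(doc_id)
        self.buffered_docs += 1
        if self.buffered_docs >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the buffer into a sorted segment that is never modified again.
        if self.buffer:
            segment = {t: tuple(sorted(ids)) for t, ids in sorted(self.buffer.items())}
            self.segments.append(segment)
        self.buffer = {}
        self.buffered_docs = 0

    def merge(self):
        # Combine all segments into one so reads have fewer places to look.
        merged = {}
        for segment in self.segments:
            for token, ids in segment.items():
                merged.setdefault(token, set()).update(ids)
        if merged:
            self.segments = [{t: tuple(sorted(ids)) for t, ids in sorted(merged.items())}]

    def search(self, token):
        # A read has to consult the buffer and every segment.
        token = token.lower()
        ids = set(self.buffer.get(token, ()))
        for segment in self.segments:
            ids.update(segment.get(token, ()))
        return sorted(ids)
```

The trade-off is the usual LSM one: writes are fast because they never rewrite existing data, while reads check several segments until a merge compacts them.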

ElasticSearch

Convenience wrapper around Apache Lucene to allow for fast searching in a distributed system.

Supports a REST API, its own query language, managed replication and partitioning, and visualization.
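As a rough illustration of the REST API and query language, the sketch below only builds the path and JSON body for a simple full-text `match` query and does not send anything. The index name `products` and the field `description` are hypothetical.

```python
import json

# Hypothetical index and field names, chosen only for illustration.
index_name = "products"
path = f"/{index_name}/_search"

# Request body in the JSON query language: full-text match on one field.
body = {
    "query": {
        "match": {"description": "apple"}
    }
}
payload = json.dumps(body)
print("GET", path)
print(payload)
```

Any HTTP client could send this payload to a cluster; the host, port, and authentication depend on the deployment and are left out here.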

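Each node maintains a local index: the index on a partition covers only the documents stored on that partition, so for a given key it lists every matching document on that partition and nothing from other partitions.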

Because indexing is local, a query that is not tied to one partition has to go to multiple nodes, and an aggregator node then has to merge the results, which adds latency.

So try to keep each search limited to one partition.
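A toy model of the two query paths, assuming documents are routed to partitions by an integer tenant ID. The `Partition` and `Cluster` classes and the modulo routing rule are made up for illustration and are not ElasticSearch's actual routing logic.

```python
class Partition:
    """One node's shard: a local inverted index over only its own documents."""

    def __init__(self):
        self.local_index = {}  # token -> set of doc IDs stored on this partition

    def add(self, doc_id, text):
        for token in text.lower().split():
            self.local_index.setdefault(token, set()).add(doc_id)

    def search(self, token):
        return self.local_index.get(token.lower(), set())


class Cluster:
    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def _route(self, tenant_id):
        # Assumed routing rule: all documents of a tenant land on one partition.
        return self.partitions[tenant_id % len(self.partitions)]

    def add(self, tenant_id, doc_id, text):
        self._route(tenant_id).add(doc_id, text)

    def search_all(self, token):
        """Scatter-gather: ask every partition, then merge in the aggregator."""
        merged = set()
        for partition in self.partitions:
            merged |= partition.search(token)
        return sorted(merged), len(self.partitions)

    def search_tenant(self, tenant_id, token):
        """Routed search: only the one partition that holds this tenant's data."""
        return sorted(self._route(tenant_id).search(token)), 1


cluster = Cluster(num_partitions=3)
cluster.add(tenant_id=0, doc_id=1, text="red apple")
cluster.add(tenant_id=1, doc_id=2, text="green apple")
cluster.add(tenant_id=0, doc_id=3, text="apple pie")
print(cluster.search_all("apple"))        # touches every partition
print(cluster.search_tenant(0, "apple"))  # touches a single partition
```

Each call returns the matching IDs and the number of partitions contacted, which is what drives the extra latency in the scatter-gather case.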

Caching

Normally: cache a piece of the index or the full query result.

ElasticSearch: can also cache part of a query.

For example, cache the set of all products that are on sale, since the sale filter is going to appear in a lot of popular queries.
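The on-sale example can be sketched as a cache keyed by the filter rather than by the whole query. `CachingSearcher` and its fields are hypothetical names, the sketch ignores cache invalidation when products change, and it is not how ElasticSearch implements its caches internally.

```python
class CachingSearcher:
    """Caches the result of a reusable filter, not the full query result."""

    def __init__(self, products):
        self.products = products     # product ID -> {"name": str, "on_sale": bool}
        self.filter_cache = {}       # filter name -> frozenset of product IDs
        self.filter_evaluations = 0  # counts how often a filter was recomputed

    def _ids_matching_filter(self, name, predicate):
        if name not in self.filter_cache:
            self.filter_evaluations += 1
            self.filter_cache[name] = frozenset(
                pid for pid, product in self.products.items() if predicate(product)
            )
        return self.filter_cache[name]

    def search_on_sale(self, word):
        # The on-sale set is shared by every query that uses this filter.
        on_sale = self._ids_matching_filter("on_sale", lambda p: p["on_sale"])
        word = word.lower()
        return sorted(
            pid for pid in on_sale
            if word in self.products[pid]["name"].lower().split()
        )


# Hypothetical catalogue.
products = {
    1: {"name": "Apple pie", "on_sale": True},
    2: {"name": "Green apple", "on_sale": False},
    3: {"name": "Pumpkin pie", "on_sale": True},
}
searcher = CachingSearcher(products)
print(searcher.search_on_sale("apple"))
print(searcher.search_on_sale("pie"))
```

Two different queries reuse the same cached on-sale set, so the filter is evaluated once even though neither full query result was cached.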

Nikan's Notebook
