Cosine Similarity

Intermediate · Core Concepts · Last updated: 2025-01-15

What is Cosine Similarity?


Cosine similarity is a metric that measures the similarity between two vectors by calculating the cosine of the angle between them in vector space. It ranges from -1 to 1, where 1 indicates identical direction, 0 indicates orthogonality (no similarity), and -1 indicates opposite directions. In the context of embeddings, cosine similarity is widely used to quantify semantic similarity between pieces of text, with higher values indicating more similar meanings.
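The definition above can be written as cos(θ) = (A · B) / (‖A‖ ‖B‖). A minimal sketch in Python with NumPy (the function name `cosine_similarity` is illustrative, not from any particular library):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: (a . b) / (|a| |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical direction -> 1, orthogonal -> 0, opposite direction -> -1
print(cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0])))    # ~= 1.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))    # ~= 0.0
print(cosine_similarity(np.array([1.0, 2.0]), np.array([-1.0, -2.0])))  # ~= -1.0
```

Note that the result is undefined for a zero vector, since its norm is zero; production code typically guards against that case.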


The metric is particularly well suited to comparing embeddings because it is independent of vector magnitude, depending only on direction in vector space. Even if two embeddings have different magnitudes (lengths), they are considered maximally similar when they point in the same direction. This property matters when embedding magnitudes vary, since relative orientation often captures semantic relationships better than absolute distances; and when embeddings are unit-normalized, cosine similarity coincides with the dot product, which is why many systems normalize vectors once and then use the cheaper dot product at query time.
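Both properties are easy to check numerically. A small sketch, using random vectors as stand-ins for embeddings:

```python
import numpy as np

def cos_sim(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
a = rng.normal(size=8)  # stand-in for an embedding vector
b = rng.normal(size=8)

# Rescaling either vector leaves the cosine unchanged: only direction matters.
assert np.isclose(cos_sim(a, b), cos_sim(5.0 * a, 0.1 * b))

# For unit-normalized vectors, cosine similarity reduces to a plain dot product.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)
assert np.isclose(cos_sim(a, b), np.dot(a_hat, b_hat))
```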


Cosine similarity has become the de facto standard for semantic search and retrieval in AI applications. When a user query is embedded and compared against a database of document embeddings, cosine similarity scores determine which documents are most semantically relevant. Most vector databases support cosine similarity as a primary distance metric alongside alternatives like Euclidean distance and dot product, with the choice of metric affecting both retrieval quality and computational efficiency.
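The retrieval loop described above can be sketched as a ranking over document embeddings. The vectors and document names here are made-up toy values; in a real system they would come from an embedding model and a vector database:

```python
import numpy as np

def cos_sim(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
docs = {
    "doc_a": np.array([0.9, 0.1, 0.0]),
    "doc_b": np.array([0.1, 0.9, 0.2]),
    "doc_c": np.array([0.7, 0.3, 0.1]),
}
query = np.array([1.0, 0.0, 0.1])  # the embedded user query

# Rank documents by cosine similarity to the query, most relevant first.
ranked = sorted(docs, key=lambda name: cos_sim(query, docs[name]), reverse=True)
print(ranked)  # doc_a ranks above doc_c, which ranks above doc_b
```

In practice a vector database performs this scoring with approximate nearest-neighbor indexes rather than an exhaustive scan, which is where the choice of distance metric affects efficiency.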


Related Terms