OpenAI's text and code embeddings

DECEMBER 16, 2022

According to OpenAI, "embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts."

They introduced a new text and code embeddings API endpoint on January 25, 2022¹ that can measure the relatedness of text strings.
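Once two strings have been turned into embedding vectors, their "relatedness" can be scored by comparing the vectors. Cosine similarity is a common choice for this, and the sketch below assumes it as the relatedness measure. The three-element vectors are hypothetical toy values standing in for real API output, which is much longer. No API call is made here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; real ones would come from the embeddings endpoint.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
invoice = [0.0, 0.2, 0.95]

print(round(cosine_similarity(cat, kitten), 3))   # related strings score close to 1
print(round(cosine_similarity(cat, invoice), 3))  # unrelated strings score much lower
```

The normalisation by vector length means only the direction of each embedding matters, not its magnitude.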

OpenAI's documentation lists these common uses of text embeddings:

  • Search (where results are ranked by relevance to a query string)
  • Clustering (where text strings are grouped by similarity)
  • Recommendations (where items with related text strings are recommended)
  • Anomaly detection (where outliers with little relatedness are identified)
  • Diversity measurement (where similarity distributions are analyzed)
  • Classification (where text strings are classified by their most similar label)
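Two of these uses, search and classification, reduce to the same operation: compare one vector against many and pick the closest. The following is a minimal sketch assuming cosine similarity as the comparison. The helpers `search` and `classify`, the document titles, and all vectors are hypothetical toy values, not OpenAI's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query_vec, docs):
    """Rank (title, vector) pairs by similarity to the query, most relevant first."""
    return sorted(docs, key=lambda d: cosine_similarity(query_vec, d[1]), reverse=True)


def classify(text_vec, labels):
    """Return the label name whose embedding is most similar to text_vec."""
    return max(labels, key=lambda name: cosine_similarity(text_vec, labels[name]))


# Hypothetical toy vectors standing in for embeddings returned by the API.
docs = [
    ("On writing daily", [0.8, 0.2, 0.1]),
    ("Tax season notes", [0.1, 0.1, 0.9]),
    ("Why I keep a journal", [0.7, 0.3, 0.2]),
]
labels = {"writing": [1.0, 0.0, 0.0], "finance": [0.0, 0.0, 1.0]}

query = [0.9, 0.1, 0.0]
for title, _ in search(query, docs):
    print(title)
print(classify([0.2, 0.1, 0.8], labels))
```

In a real setup each label would be represented by the embedding of its name or description, and the query and documents would be embedded with the same model so their vectors are comparable.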

I look forward to testing this API on my writing to see how well it recommends, classifies, and clusters my mini-essays.