Embedding

General Note

Rely on external embedding storage and vector stores. The OpenAI APIs do not scale well for retrieval workloads.

Generate embeddings using the embeddings create endpoint, but store them externally with vector indices.
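A minimal sketch of that pattern, using FAISS as the external vector index (FAISS, the sample documents, and the embed helper below are illustrative choices, not part of the OpenAI API):

from openai import OpenAI
import faiss
import numpy as np

client = OpenAI()

def embed(texts):
    # call the embeddings endpoint and return float32 vectors
    res = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in res.data], dtype="float32")

# store the vectors in an external FAISS index
# (inner product over L2-normalized vectors = cosine similarity)
docs = ["refund policy text", "shipping times text", "contact details text"]
vecs = embed(docs)
faiss.normalize_L2(vecs)
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)

# retrieval happens against the external index, not OpenAI
query = embed(["how do I get a refund?"])
faiss.normalize_L2(query)
scores, ids = index.search(query, 2)
print([docs[i] for i in ids[0]])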

Embeddings API

Create Embedding

from openai import OpenAI

client = OpenAI()

# using input text
res = client.embeddings.create(
    model="text-embedding-ada-002",
    input="my name is Sam",
    encoding_format="float",
)

# the endpoint takes raw text (or token arrays), not file IDs;
# to embed a file, read it locally and pass its contents as input
with open("customer_policies.txt") as f:
    res = client.embeddings.create(
        model="text-embedding-ada-002",
        input=f.read(),
        encoding_format="float",
    )

Response from calling embeddings.create:

# response shape (JSON):
{
    "object": "list",
    "data": [
        {
            "object": "embedding",
            "embedding": [
                0.0023064255,
                -0.009327292,
                .... (1536 floats total for ada-002)
                -0.0028842222,
            ],
            "index": 0
        }
    ],
    "model": "text-embedding-ada-002",
    "usage": {
        "prompt_tokens": 8,
        "total_tokens": 8
    }
}

# the v1 Python SDK wraps this in a response object, so use attribute access:
data_vector = res.data[0].embedding
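input can also take a list of strings to embed a whole batch in one call; each returned item carries an index matching its position in the input. A small sketch, continuing with the same client (the example strings are placeholders):

# batch several inputs in one request; one embedding comes back per string
res = client.embeddings.create(
    model="text-embedding-ada-002",
    input=["my name is Sam", "my dog is called Rex"],
    encoding_format="float",
)
vectors = {item.index: item.embedding for item in res.data}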

Available Models (2025)

text-embedding-3-small
text-embedding-3-large
text-embedding-ada-002   # deprecated
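The 3-series models also accept a dimensions parameter to return shorter vectors (defaults: 1536 for -small, 3072 for -large). A quick sketch, reusing the client from above:

# request a truncated 256-dim vector (not supported by ada-002)
res = client.embeddings.create(
    model="text-embedding-3-large",
    input="my name is Sam",
    dimensions=256,
)
print(len(res.data[0].embedding))  # 256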

Vector Stores

OpenAI also has its own vector stores.

  • Generally no need to use these.
  • Mostly used with the Assistants API.

# create a vector store
vector_store = client.vector_stores.create(
    name="Support FAQ",
)

# upload a file and wait until it is processed
client.vector_stores.files.upload_and_poll(
    vector_store_id=vector_store.id,
    file=open("customer_policies.txt", "rb"),
)
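If you do use them, newer SDK versions also expose a query call against a hosted vector store. A rough sketch, assuming the vector_stores.search endpoint and the result fields shown (filename, score) are available in your SDK version:

# semantic search over the files uploaded to the vector store
results = client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="What is the refund policy?",
)
for hit in results.data:
    print(hit.filename, hit.score)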