
Eval

Creating Evals

First, upload a file to be used for evaluation.

from openai import OpenAI

client = OpenAI()

# Upload the JSONL dataset; purpose="evals" marks it for use in eval runs.
file = client.files.create(
    file=open("tickets.jsonl", "rb"),
    purpose="evals"
)

print(file)

Response after file upload:

{
  "object": "file",
  "id": "file-CwHg45Fo7YXwkWRPUkLNHW",
  "purpose": "evals",
  "filename": "tickets.jsonl",
  "bytes": 208,
  "created_at": 1742834798,
  "expires_at": null,
  "status": "processed",
  "status_details": null
}
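The upload step assumes tickets.jsonl already exists. As a minimal sketch of how such a file might be produced, the snippet below writes one JSON object per line. The nesting of each row under an "item" key is an assumption, chosen so that a template such as {{ item.ticket_text }} has something to resolve against. The sample tickets and the correct_label field are hypothetical and not taken from the original dataset.

```python
import json
from pathlib import Path

# Hypothetical sample rows; "correct_label" is an illustrative extra field.
SAMPLE_TICKETS = [
    {"ticket_text": "My monitor won't turn on.", "correct_label": "Hardware"},
    {"ticket_text": "I'm stuck in vim and can't quit.", "correct_label": "Software"},
]


def write_eval_file(path, tickets):
    """Write one JSON object per line, nesting each ticket under "item".

    The {"item": {...}} layout is an assumption made so that a template
    like {{ item.ticket_text }} can refer to the ticket_text field.
    """
    lines = [json.dumps({"item": ticket}) for ticket in tickets]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    count = write_eval_file("tickets.jsonl", SAMPLE_TICKETS)
    print(f"wrote {count} rows to tickets.jsonl")
```

Building each row with json.dumps rather than by hand keeps quotes and apostrophes in the ticket text correctly escaped.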

The id field from this response is referenced later, as the data source, when you run an eval:

curl https://api.openai.com/v1/evals/YOUR_EVAL_ID/runs \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Categorization text run",
    "data_source": {
      "type": "completions",
      "model": "gpt-4.1",
      "input": [
        {
          "role": "developer",
          "content": "You are an expert in categorizing IT support tickets. Given the support ticket below, categorize the request into one of \"Hardware\", \"Software\", or \"Other\". Respond with only one of those words."
        },
        {
          "role": "user",
          "content": "{{ item.ticket_text }}"
        }
      ],
      "source": {
        "type": "file_id",
        "id": "YOUR_FILE_ID"
      }
    }
  }'

Replace YOUR_FILE_ID with the id returned by the file upload. JSON does not allow comments, so the request body must not contain any.
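Hand-writing this request body inside a shell string is easy to get wrong, since the quotes need escaping and JSON allows no comments. As a rough sketch, the same body can be assembled as a Python dict and serialized with json.dumps. The function name build_run_payload and its default arguments are hypothetical. The field names are copied from the curl request and have not been checked against the API reference.

```python
import json

SYSTEM_PROMPT = (
    "You are an expert in categorizing IT support tickets. Given the support "
    'ticket below, categorize the request into one of "Hardware", "Software", '
    'or "Other". Respond with only one of those words.'
)


def build_run_payload(file_id, model="gpt-4.1", name="Categorization text run"):
    """Return the eval run request body as a dict (hypothetical helper).

    Field names mirror the curl request; file_id is the id returned
    by the file upload.
    """
    return {
        "name": name,
        "data_source": {
            "type": "completions",
            "model": model,
            "input": [
                {"role": "developer", "content": SYSTEM_PROMPT},
                # Template placeholder, filled in per dataset row by the service.
                {"role": "user", "content": "{{ item.ticket_text }}"},
            ],
            "source": {"type": "file_id", "id": file_id},
        },
    }


if __name__ == "__main__":
    # Print the body; it can be passed to curl with -d @file or sent by any HTTP client.
    print(json.dumps(build_run_payload("YOUR_FILE_ID"), indent=2))
```

Because the file id is an ordinary function argument, no placeholder or annotation needs to be embedded in the JSON itself.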

Response after creating the eval run:

{
  "object": "eval.run",
  "id": "evalrun_67e44c73eb6481909f79a457749222c7",
  "eval_id": "eval_67e44c5becec81909704be0318146157",
  "report_url": "https://platform.openai.com/evaluations/abc123",
  "status": "queued",
  "model": "gpt-4.1",
  "name": "Categorization text run",
  "created_at": 1743015028,
  "result_counts": { ... },
  "per_model_usage": null,
  "per_testing_criteria_results": null,
  "data_source": {
    "type": "completions",
    "source": {
      "type": "file_id",
      "id": "file-J7MoX9ToHXp2TutMEeYnwj"
    },
    "input_messages": {
      "type": "template",
      "template": [
        {
          "type": "message",
          "role": "developer",
          "content": {
            "type": "input_text",
            "text": "You are an expert in...."
          }
        },
        {
          "type": "message",
          "role": "user",
          "content": {
            "type": "input_text",
            "text": "{{item.ticket_text}}"
          }
        }
      ]
    },
    "model": "gpt-4.1",
    "sampling_params": null
  },
  "error": null,
  "metadata": {}
}

Some evals may take a long time to run. Check a run's progress by requesting it from this endpoint with its eval id and run id:

curl https://api.openai.com/v1/evals/eval_abc123/runs/evalrun_abc123 \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json"
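For long runs, the status check can be repeated in a loop. The sketch below polls the same endpoint using only the standard library. The helper names (fetch_run, wait_for_run), the polling interval, and the timeout are hypothetical. The set of "still running" statuses is also an assumption: only "queued" appears in the example response, and "in_progress" is a guess at the other non-final state.

```python
import json
import os
import time
import urllib.request

# Assumed non-final states; only "queued" is confirmed by the example response.
PENDING_STATUSES = {"queued", "in_progress"}


def fetch_run(eval_id, run_id):
    """GET the run object from the endpoint used in the curl command."""
    request = urllib.request.Request(
        f"https://api.openai.com/v1/evals/{eval_id}/runs/{run_id}",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_run(eval_id, run_id, fetch=fetch_run, poll_seconds=10.0, timeout=1800.0):
    """Poll until the run leaves the pending states or the timeout expires.

    fetch is injectable so the loop can be exercised without network access.
    """
    deadline = time.monotonic() + timeout
    while True:
        run = fetch(eval_id, run_id)
        if run["status"] not in PENDING_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {run['status']} after {timeout}s")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    finished = wait_for_run("eval_abc123", "evalrun_abc123")
    print(finished["status"], finished.get("report_url"))
```

Any status outside the pending set is treated as final, so the caller should inspect the returned status and error fields rather than assume success.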