Soft-Evals

Model    Dev    Test
TBD      TBD    TBD

About CORGI SQL

Welcome to the CORGI SQL benchmark, hosted by Cornell University and Gena. CORGI was built to push the boundaries of txt2sql in the generative-AI era. There are a few notable differences between CORGI and previous txt2sql benchmarks.

Soft Evaluation System

A significant portion of the questions are recommendation- or prediction-based natural-language queries. These queries are "soft evaluated" with human input.
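The document does not specify how human ratings are aggregated, so the helper below is purely a hypothetical sketch: it assumes annotators rate a model's answer on a 0–5 scale and the scores are averaged and normalized, which is one common way to turn "soft" human judgments into a number.

```python
def soft_eval_score(ratings, scale_max=5):
    """Aggregate human ratings for one model answer.

    Hypothetical helper: CORGI's actual aggregation scheme is not
    described here. Assumes each rating is on a 0..scale_max scale
    and returns the mean rating normalized to [0, 1].
    """
    if not ratings:
        raise ValueError("need at least one human rating")
    return sum(ratings) / (len(ratings) * scale_max)

# Example: three annotators rate the answer to a recommendation query.
print(soft_eval_score([4, 5, 3]))  # 0.8
```

The point of normalizing is only that soft-evaluated questions can contribute a graded score alongside any exact-match metrics, rather than a binary pass/fail.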

Business Domain Focus

CORGI v1.0 contains business-domain databases and queries, designed to test understanding of domain-specific lingo.

Complex Schema Design

CORGI has many more tables and relations per database than previous benchmarks, and many of its schemas are based on real industry schema designs.
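To make the schema-complexity point concrete, here is a minimal sketch using an invented three-table business schema (not an actual CORGI database): even a simple business question forces a two-hop join across foreign keys, and real CORGI schemas have many more tables than this.

```python
import sqlite3

# Hypothetical mini business schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     total REAL);
CREATE TABLE order_items (order_id INTEGER REFERENCES orders(id),
                          sku TEXT, qty INTEGER);
""")
conn.execute("INSERT INTO customers VALUES (1, 'Acme', 'NA')")
conn.execute("INSERT INTO orders VALUES (10, 1, 250.0)")
conn.execute("INSERT INTO order_items VALUES (10, 'WIDGET-1', 5)")

# A question like "how many widgets did NA customers order?" already
# requires joining all three tables.
row = conn.execute("""
    SELECT SUM(oi.qty)
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    JOIN order_items oi ON oi.order_id = o.id
    WHERE c.region = 'NA'
""").fetchone()
print(row[0])  # 5
```

Benchmarks with few tables let models pattern-match column names; once the answer spans several foreign-key hops, the model has to actually reason over the schema graph.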

Flexible Evaluation

There is no train/test split: groups are free to experiment with zero-shot or template-based methods, or to generate their own training data.