vn.train

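vn.train is a wrapper function that lets you train the system (that is, the retrieval-augmentation layer that sits on top of the LLM). You can call it in the following ways: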

DDL statements

These statements tell the system which tables, columns, and data types are available.

vn.train(ddl="CREATE TABLE my_table (id INT, name TEXT)")
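
If the tables already exist, the DDL can often be read back from the database instead of being typed by hand. The following is a minimal sketch that assumes a SQLite database, where `sqlite_master` stores each table's CREATE statement; the helper name `collect_table_ddl` and the second table are hypothetical, and other databases need a different query. The `vn.train` call is left as a comment because it requires a configured Vanna instance.

```python
import sqlite3


def collect_table_ddl(conn: sqlite3.Connection) -> list[str]:
    """Return the stored CREATE TABLE statement of every table, sorted by table name."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL "
        "ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


# Build a throwaway in-memory database so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INT, name TEXT)")
conn.execute("CREATE TABLE customers (id INT, age INT)")  # hypothetical second table

ddl_statements = collect_table_ddl(conn)
for ddl in ddl_statements:
    print(ddl)
    # With a configured Vanna instance, each statement could be passed on:
    # vn.train(ddl=ddl)
```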

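Documentation strings

These can be any documentation about your database, business, or industry that the LLM may need in order to understand the context of a user's question.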

vn.train(documentation="Our business defines XYZ as ABC")

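SQL statements

One of the most helpful things for the system is to understand the SQL queries commonly used in your organization. This helps the system understand the context of the questions being asked.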

vn.train(sql="SELECT col1, col2, col3 FROM my_table")

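Question-SQL pairs

You can also train the system with question-SQL pairs. This is the most direct way to train the system, and the most helpful for understanding the context of the questions being asked.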

vn.train(
    question="What is the average age of our customers?",
    sql="SELECT AVG(age) FROM customers"
)

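Question-SQL pairs contain a wealth of embedded information that the system can use to understand the context of a question. This is especially important when your users tend to ask highly ambiguous questions.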
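
To give a feel for how stored pairs can help with a loosely worded question, here is a toy sketch of similarity-based retrieval. It is not how the system is implemented: a bag-of-words cosine similarity stands in for real embeddings, and the second pair, `TRAINED_PAIRS`, and `most_similar_pair` are hypothetical names introduced only for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical stored pairs; the first one is the pair used in the text above.
TRAINED_PAIRS = [
    ("What is the average age of our customers?", "SELECT AVG(age) FROM customers"),
    ("How many orders were placed last month?", "SELECT COUNT(*) FROM orders"),
]


def most_similar_pair(question: str, pairs: list[tuple[str, str]]) -> tuple[str, str]:
    """Return the stored (question, sql) pair whose question is closest to the new one."""
    query = bag_of_words(question)
    return max(pairs, key=lambda pair: cosine(query, bag_of_words(pair[0])))


matched_question, matched_sql = most_similar_pair(
    "How old are our customers on average?", TRAINED_PAIRS
)
print(matched_question)
print(matched_sql)
```

The differently worded question still lands on the stored age query because it shares the words "our", "customers", and "average" with it, which is the kind of context a retrieved pair can hand to the LLM.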

Training plan

# The information schema query may need some tweaking depending on your database. This is a good starting point.
df_information_schema = vn.run_sql("SELECT * FROM INFORMATION_SCHEMA.COLUMNS")

# This will break up the information schema into bite-sized chunks that can be referenced by the LLM
plan = vn.get_training_plan_generic(df_information_schema)
plan

# If you like the plan, then uncomment this and run it to train
# vn.train(plan=plan)

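A training plan is essentially your database's information schema broken up into bite-sized chunks that the LLM can reference. It is a good way to train the system quickly with a lot of data.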
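
The idea of breaking an information schema into chunks can be sketched as follows. This is only an illustration of the concept, not what `vn.get_training_plan_generic` actually does: it assumes the rows carry the standard `INFORMATION_SCHEMA.COLUMNS` fields `TABLE_SCHEMA`, `TABLE_NAME`, `COLUMN_NAME`, and `DATA_TYPE`, and the function name `chunk_information_schema` and the sample rows are hypothetical.

```python
from collections import defaultdict


def chunk_information_schema(rows: list[dict]) -> list[str]:
    """Group column rows by (schema, table) and emit one short text chunk per table."""
    tables: dict[tuple[str, str], list[str]] = defaultdict(list)
    for row in rows:
        key = (row["TABLE_SCHEMA"], row["TABLE_NAME"])
        tables[key].append(f'{row["COLUMN_NAME"]} ({row["DATA_TYPE"]})')
    # Dicts keep insertion order, so chunks follow the order tables first appear in.
    return [
        f"Table {schema}.{table} has columns: " + ", ".join(columns)
        for (schema, table), columns in tables.items()
    ]


# Hypothetical rows shaped like the result of SELECT * FROM INFORMATION_SCHEMA.COLUMNS.
sample_rows = [
    {"TABLE_SCHEMA": "public", "TABLE_NAME": "customers", "COLUMN_NAME": "id", "DATA_TYPE": "INT"},
    {"TABLE_SCHEMA": "public", "TABLE_NAME": "customers", "COLUMN_NAME": "age", "DATA_TYPE": "INT"},
    {"TABLE_SCHEMA": "public", "TABLE_NAME": "my_table", "COLUMN_NAME": "id", "DATA_TYPE": "INT"},
    {"TABLE_SCHEMA": "public", "TABLE_NAME": "my_table", "COLUMN_NAME": "name", "DATA_TYPE": "TEXT"},
]

chunks = chunk_information_schema(sample_rows)
for chunk in chunks:
    print(chunk)
```

Each chunk is small enough to be retrieved on its own, so a question about customers only needs to pull in the customers chunk rather than the whole schema.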