Integrates OpenAI and Hugging Face models so you can ask questions of a pandas DataFrame directly, with no extra data processing.

pandas-ai

pandasai integrates OpenAI and Hugging Face models, so you can ask questions of a pandas DataFrame directly, with no extra data processing.

Usage

pip install pandasai


import pandas as pd
from pandasai import PandasAI

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

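# Instantiate an LLM
from pandasai.llm.openai import OpenAI

# With no argument, the key is expected to come from the environment (OPENAI_API_KEY)
llm = OpenAI()
# Or pass the OpenAI key explicitly:
# llm = OpenAI(api_token="YOUR_OPENAI_API_KEY")
# Or use the Hugging Face Starcoder model instead:
# from pandasai.llm.starcoder import Starcoder
# llm = Starcoder(api_token="YOUR_HF_API_KEY")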

pandas_ai = PandasAI(llm)
pandas_ai.run(df, prompt='Which are the 5 happiest countries?')

Source code analysis

pandas-ai can use models from OpenAI and Hugging Face. This analysis follows the OpenAI path, which requires setting the OpenAI api_token.

1. Build the prompt:

There is a dataframe in pandas (python).
The name of the dataframe is `df`.
This is the result of `print(df.head({rows_to_display}))`:
{df_head}.

Return the python code (do not import anything) and make sure to prefix the python code with {START_CODE_TAG} exactly and suffix the code with {END_CODE_TAG} exactly 
to get the answer to the following question :

The user's question is appended to the end of the prompt above.
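The prompt assembly can be sketched roughly as follows. This is a minimal illustration, not the library's own code: `build_prompt` is a hypothetical helper, and the two tag values are placeholders standing in for whatever strings the library actually defines.

```python
# Placeholder tag values (assumption); the real strings are defined by the library.
START_CODE_TAG = "<startCode>"
END_CODE_TAG = "<endCode>"

# Template text taken from the analysis above.
PROMPT_TEMPLATE = """There is a dataframe in pandas (python).
The name of the dataframe is `df`.
This is the result of `print(df.head({rows_to_display}))`:
{df_head}.

Return the python code (do not import anything) and make sure to prefix the python code with {START_CODE_TAG} exactly and suffix the code with {END_CODE_TAG} exactly
to get the answer to the following question :
"""


def build_prompt(df, question, rows_to_display=5):
    """Fill the template with a preview of `df`, then append the user's question.

    `df` only needs a `.head(n)` method, as a pandas DataFrame has.
    """
    instruction = PROMPT_TEMPLATE.format(
        rows_to_display=rows_to_display,
        df_head=str(df.head(rows_to_display)),
        START_CODE_TAG=START_CODE_TAG,
        END_CODE_TAG=END_CODE_TAG,
    )
    return instruction + question
```

With the sample DataFrame from the Usage section, `build_prompt(df, "Which are the 5 happiest countries?")` would produce the full text sent to the model. Only the first few rows are included, so the model sees the column names and value formats rather than the whole dataset.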

2. Call the OpenAI API. The prompt above instructs the model to generate Python code wrapped in the start and end tags.
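Once the model replies, the code has to be pulled out from between the two tags. A minimal sketch of that step, assuming placeholder tag values and a hypothetical `extract_code` helper (the library's own extraction may differ):

```python
import re

# Placeholder tag values (assumption); the real strings are defined by the library.
START_CODE_TAG = "<startCode>"
END_CODE_TAG = "<endCode>"


def extract_code(response):
    """Return the text between the first start tag and the next end tag."""
    pattern = re.escape(START_CODE_TAG) + r"(.*?)" + re.escape(END_CODE_TAG)
    # DOTALL lets the code span several lines
    match = re.search(pattern, response, re.DOTALL)
    if match is None:
        raise ValueError("no tagged code found in the model response")
    return match.group(1).strip()
```

Asking for explicit tags makes the reply easier to parse, because the model may surround the code with explanatory text.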

3. Call the run_code() method, which runs exec(code_to_run) on the generated code and returns the result.
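The execution step might look roughly like the sketch below. The source only states that run_code() calls exec(code_to_run); evaluating the final expression to obtain a return value is an assumption of this sketch, not necessarily how the library does it.

```python
import ast


def run_code(code, df):
    """Execute generated code with `df` in scope.

    Returns the value of the final expression, or None if the code
    does not end with an expression.
    """
    namespace = {"df": df}
    tree = ast.parse(code)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the last expression so its value can be returned
        last = ast.Expression(tree.body.pop().value)
        exec(compile(tree, "<generated>", "exec"), namespace)
        return eval(compile(last, "<generated>", "eval"), namespace)
    exec(compile(tree, "<generated>", "exec"), namespace)
    return None
```

Because exec runs whatever the model produced, generated code has the same privileges as the calling process, so it is safest to run it only on data and in environments you trust.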

Note:

To visualize results, run the code in a notebook.

Dependencies: pandas, openai, requests, dotenv, plus the basic plotting packages used in notebooks.