# pandas-ai

pandasai integrates OpenAI and Hugging Face models so that you can ask questions about a pandas DataFrame directly, with no extra data processing.

## Usage

```
pip install pandasai
```

```
import pandas as pd
from pandasai import PandasAI

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Instantiate an LLM
from pandasai.llm.openai import OpenAI
llm = OpenAI()  # OpenAI
# llm = OpenAI(api_token="YOUR_OPENAI_API_KEY")

# Alternative: a Hugging Face model
# from pandasai.llm.starcoder import Starcoder
# llm = Starcoder(api_token="YOUR_HF_API_KEY")

pandas_ai = PandasAI(llm)
pandas_ai.run(df, prompt='Which are the 5 happiest countries?')
```

## Source Code Analysis

pandas-ai can use models from OpenAI and Hugging Face. This analysis follows the OpenAI path, which requires an OpenAI `api_token`.

1. Build the prompt:

```
There is a dataframe in pandas (python).
The name of the dataframe is `df`.
This is the result of `print(df.head({rows_to_display}))`:
{df_head}.

Return the python code (do not import anything) and make sure to prefix the python code with {START_CODE_TAG} exactly and suffix the code with {END_CODE_TAG} exactly to get the answer to the following question :
```

The user's question is appended to the text above.

2. Call the OpenAI API. The prompt above asks the model to generate Python code wrapped in the prefix and suffix tags.

3. Call the `run_code()` method, which runs `exec(code_to_run)` and returns the result.

**Note:** To visualize results, run the code in a notebook.

Dependencies: pandas, openai, requests, and dotenv. The remaining dependencies are the basic plotting packages for notebooks.
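
The three steps from the source analysis (build the prompt, get tagged code back from the model, execute it) can be sketched offline as follows. This is a simplified illustration, not the library's actual implementation: the tag values, the helper names `build_prompt`, `fake_llm`, and `extract_code`, and the convention that the generated code stores its answer in a variable called `result` are all assumptions made for the example. `fake_llm` stands in for the OpenAI call so the sketch runs without an API key.

```python
import pandas as pd

# Assumed tag values; the analysis only names the placeholders.
START_CODE_TAG = "<startCode>"
END_CODE_TAG = "<endCode>"

PROMPT_TEMPLATE = (
    "There is a dataframe in pandas (python).\n"
    "The name of the dataframe is `df`.\n"
    "This is the result of `print(df.head({rows_to_display}))`:\n"
    "{df_head}.\n\n"
    "Return the python code (do not import anything) and make sure to prefix "
    "the python code with {START_CODE_TAG} exactly and suffix the code with "
    "{END_CODE_TAG} exactly to get the answer to the following question :\n"
)


def build_prompt(df, question, rows_to_display=5):
    """Step 1: fill in the template and append the user's question."""
    header = PROMPT_TEMPLATE.format(
        rows_to_display=rows_to_display,
        df_head=str(df.head(rows_to_display)),
        START_CODE_TAG=START_CODE_TAG,
        END_CODE_TAG=END_CODE_TAG,
    )
    return header + question


def fake_llm(prompt):
    """Step 2 (hypothetical stand-in for the OpenAI call): canned tagged reply."""
    return (
        "Here is the code:\n"
        f"{START_CODE_TAG}\n"
        "result = df.nlargest(5, 'happiness_index')['country'].tolist()\n"
        f"{END_CODE_TAG}"
    )


def extract_code(response):
    """Return the text between the start and end tags."""
    start = response.index(START_CODE_TAG) + len(START_CODE_TAG)
    end = response.index(END_CODE_TAG, start)
    return response[start:end].strip()


def run_code(code_to_run, df):
    """Step 3: exec the generated code with `df` in scope, return `result`."""
    namespace = {"df": df, "pd": pd}
    exec(code_to_run, namespace)
    return namespace.get("result")


df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

prompt = build_prompt(df, "Which are the 5 happiest countries?")
answer = run_code(extract_code(fake_llm(prompt)), df)
print(answer)
# ['Canada', 'Australia', 'United Kingdom', 'Germany', 'United States']
```

Because step 3 passes model output straight to `exec`, the generated code runs with the same privileges as the calling process, so it is worth running this kind of pipeline only in an environment you are comfortable exposing to arbitrary code.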