@@ -1,2 +1,240 @@
-# thinkgpt
+# ThinkGPT 🧠🤖
+ThinkGPT is a Python library aimed at implementing Chain of Thought for Large Language Models (LLMs), prompting the model to think, reason, and create generative agents.
+The library aims to help with the following:
+* overcome limited context windows with long memory and compressed knowledge
+* enhance LLMs' one-shot reasoning with higher-order reasoning primitives
+* add intelligent decisions to your code base
+
+
+## Key Features ✨
+* Thinking building blocks 🧱:
+    * Memory 🧠: GPTs that can remember experience
+    * Self-refinement 🔧: Improve model-generated content by addressing critiques
+    * Compress knowledge 🌐: Compress knowledge so it fits into the LLM's context, either by abstracting rules out of observations or by summarizing large content
+    * Inference 💡️: Make educated guesses based on available information
+    * Natural Language Conditions 📝: Easily express choices and conditions in natural language
+* Efficient and measurable use of the GPT context length 📐
+* Extremely easy setup and Pythonic API 🎯 thanks to [DocArray](https://github.com/docarray/docarray)
+
+## Installation 💻
+You can install ThinkGPT using pip:
+
+```shell
+pip install git+https://github.com/alaeddine-13/thinkgpt.git
+```
+
+## API Documentation 📚
+### Basic usage
+```python
+from thinkgpt.llm import ThinkGPT
+llm = ThinkGPT(model_name="gpt-3.5-turbo")
+# Make the llm object learn new concepts
+llm.memorize(['DocArray is a library for representing, sending and storing multi-modal data.'])
+llm.predict('what is DocArray?', remember=llm.remember('DocArray definition'))
+```
+
+### Memorizing and remembering information
+```python
+llm.memorize([
+    'DocArray allows you to send your data, in an ML-native way.',
+    'This means there is native support for Protobuf and gRPC, on top of HTTP and serialization to JSON, JSONSchema, Base64, and Bytes.',
+])
+
+print(llm.remember('Sending data with DocArray', limit=1))
+```
+```text
+['DocArray allows you to send your data, in an ML-native way.']
+```
+
+Use the `limit` parameter to specify the maximum number of documents to retrieve.
+If you want to fit documents into a certain context size, you can also use the `max_tokens` parameter to specify the maximum number of tokens to retrieve.
+For instance:
+```python
+from examples.knowledge_base import knowledge
+from thinkgpt.helper import get_n_tokens
+
+llm.memorize(knowledge)
+results = llm.remember('hello', max_tokens=1000, limit=1000)
+print(get_n_tokens(''.join(results)))
+```
+```text
+1000
+```
+However, keep in mind that concatenating documents with a separator will add more tokens to the final result.
+The `remember` method does not account for those tokens.
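+
+If you need the joined text to respect a strict budget, one option is to reserve headroom for the separators yourself. Below is a minimal sketch of that idea; the newline separator and the 50-token headroom are assumptions, not part of the library:
+```python
+from thinkgpt.helper import get_n_tokens
+
+BUDGET = 1000
+HEADROOM = 50  # assumption: rough allowance for separator tokens
+
+results = llm.remember('hello', max_tokens=BUDGET - HEADROOM, limit=1000)
+joined = '\n'.join(results)
+print(get_n_tokens(joined))  # should stay close to, and ideally under, BUDGET
+```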
+
+### Predicting with context from long memory
+```python
+from examples.knowledge_base import knowledge
+llm.memorize(knowledge)
+llm.predict('Implement a DocArray schema with 2 fields: image and TorchTensor', remember=llm.remember('DocArray schemas and types'))
+```
+
+### Self-refinement
+
+```python
+print(llm.refine(
+    content="""
+import re
+    print('hello world')
+    """,
+    critics=[
+        'File "/Users/user/PyCharm2022.3/scratches/scratch_166.py", line 2',
+        "    print('hello world')",
+        'IndentationError: unexpected indent'
+    ],
+    instruction_hint="Fix the code snippet based on the error provided. Only provide the fixed code snippet between `` and nothing else."))
+```
+
+```text
+import re
+print('hello world')
+```
+
+One application of this is self-healing code generation, implemented by projects like [gptdeploy](https://github.com/jina-ai/gptdeploy) and [wolverine](https://github.com/biobootloader/wolverine).
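+
+As a rough illustration, a self-healing loop can execute generated code, capture the traceback and feed it back as critics. The sketch below shows the idea under simple assumptions (a fixed retry budget, `exec` for running the snippet); it is not how those projects are implemented:
+```python
+import traceback
+
+code = llm.predict('Write python code that prints the first 5 squares. '
+                   'Only provide the code between `` and nothing else')
+for _ in range(3):  # assumption: retry at most 3 times
+    try:
+        exec(code)  # caution: runs model-generated code
+        break
+    except Exception:
+        code = llm.refine(
+            content=code,
+            critics=[traceback.format_exc()],
+            instruction_hint='Fix the code snippet based on the error provided. '
+                             'Only provide the fixed code snippet between `` and nothing else.',
+        )
+```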
+
+### Compressing knowledge
+In case you want your knowledge to fit into the LLM's context, you can use the following techniques to compress it:
+#### Summarize content
+Summarize content using the LLM itself.
+We offer two methods:
+1. One-shot summarization using the LLM
+```python
+llm.summarize(
+    large_content,
+    max_tokens=1000,
+    instruction_hint='Pay attention to code snippets, links and scientific terms.'
+)
+```
+Since this technique relies on summarizing using a single LLM call, you can only pass content that does not exceed the LLM's context length.
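+
+A simple guard, as a sketch, is to measure the content first and fall back to the chunked summarization described next when it is too long; the 4096-token budget below is an assumption and depends on your model:
+```python
+from thinkgpt.helper import get_n_tokens
+
+CONTEXT_BUDGET = 4096  # assumption: model-dependent context budget
+
+if get_n_tokens(large_content) < CONTEXT_BUDGET:
+    summary = llm.summarize(large_content, max_tokens=1000)
+else:
+    summary = llm.chunked_summarize(large_content, max_tokens=1000)
+```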
+
+2. Chunked summarization
+```python
+llm.chunked_summarize(
+    very_large_content,
+    max_tokens=4096,
+    instruction_hint='Pay attention to code snippets, links and scientific terms.'
+)
+```
+This technique splits the content into chunks, summarizes each chunk, and then combines the summaries using an LLM.
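+
+Conceptually, the process looks something like the following sketch; the character-based chunking and the chosen sizes are assumptions for illustration, not the library's actual implementation:
+```python
+CHUNK_SIZE = 2000  # assumption: chunk size in characters
+
+chunks = [very_large_content[i:i + CHUNK_SIZE]
+          for i in range(0, len(very_large_content), CHUNK_SIZE)]
+# Summarize each chunk independently, then combine the partial summaries
+partial_summaries = [llm.summarize(chunk, max_tokens=500) for chunk in chunks]
+final_summary = llm.summarize('\n'.join(partial_summaries), max_tokens=1000)
+```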
+
+#### Induce rules from observations
+Induce higher-level, more general rules from current observations:
+```python
+llm.abstract(observations=[
+    "in tunisian, I did not eat is \"ma khditech\"",
+    "I did not work is \"ma khdemtech\"",
+    "I did not go is \"ma mchitech\"",
+])
+```
+
+```text
+['Negation in Tunisian Arabic uses "ma" + verb + "tech" where "ma" means "not" and "tech" at the end indicates the negation in the past tense.']
+```
+
+This can help you end up with compressed knowledge that better fits the limited context length of LLMs.
+For instance, instead of trying to fit code examples into the LLM's context, use this to have it learn the high-level rules and fit those in the context.
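+
+As a sketch, instead of memorizing many raw examples, you can memorize the induced rules; the observations below reuse the example above:
+```python
+rules = llm.abstract(observations=[
+    "in tunisian, I did not eat is \"ma khditech\"",
+    "I did not work is \"ma khdemtech\"",
+])
+llm.memorize(rules)  # store the compact rules instead of the raw examples
+```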
+
+### Natural language condition
+Introduce intelligent conditions to your code and let the LLM make decisions:
+```python
+llm.condition('Does this represent an error message? "IndentationError: unexpected indent"')
+```
+```text
+True
+```
+### Natural language select
+Alternatively, let the LLM choose among a list of options:
+```python
+llm.select(
+    question="Which animal is the king of the jungle?",
+    options=["Lion", "Elephant", "Tiger", "Giraffe"]
+)
+```
+```text
+['Lion']
+```
+
+You can also prompt the LLM to choose an exact number of answers using `num_choices`. By default, it is set to `None`, which means the LLM selects as many answers as it thinks are correct.
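+
+For instance, to request exactly two answers (a sketch; the question and options are illustrative):
+```python
+llm.select(
+    question="Which of these are colors?",
+    options=["Lion", "Blue", "Red", "Giraffe"],
+    num_choices=2,
+)
+```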
+
+## Use Cases 🚀
+Below are example demos you can build with `thinkgpt`.
+### Teaching ThinkGPT a new language
+```python
+from thinkgpt.llm import ThinkGPT
+
+llm = ThinkGPT(model_name="gpt-3.5-turbo")
+
+rules = llm.abstract(observations=[
+    "in tunisian, I did not eat is \"ma khditech\"",
+    "I did not work is \"ma khdemtech\"",
+    "I did not go is \"ma mchitech\"",
+], instruction_hint="output the rule in french")
+llm.memorize(rules)
+
+llm.memorize("in tunisian, I studied is \"9rit\"")
+
+task = "translate to Tunisian: I didn't study"
+llm.predict(task, remember=llm.remember(task))
+```
+```text
+The translation of "I didn't study" to Tunisian language would be "ma 9ritech".
+```
+
+### Teaching ThinkGPT how to code with the `thinkgpt` library
+```python
+from thinkgpt.llm import ThinkGPT
+from examples.knowledge_base import knowledge
+
+llm = ThinkGPT(model_name="gpt-3.5-turbo")
+
+llm.memorize(knowledge)
+
+task = 'Implement python code that uses thinkgpt to learn about docarray v2 code and then predict with remembered information about docarray v2. Only give the code between `` and nothing else'
+print(llm.predict(task, remember=llm.remember(task, limit=10, sort_by_order=True)))
+```
+
+Code generated by the LLM:
+```text
+from thinkgpt.llm import ThinkGPT
+from docarray import BaseDoc
+from docarray.typing import TorchTensor, ImageUrl
+
+llm = ThinkGPT(model_name="gpt-3.5-turbo")
+
+# Memorize information
+llm.memorize('DocArray V2 allows you to represent your data, in an ML-native way')
+
+# Predict with the memory
+memory = llm.remember('DocArray V2')
+llm.predict('write python code about DocArray v2', remember=memory)
+```
+### Replay Agent memory and infer new observations
+Refer to the following script for an example of an Agent that replays its memory and induces new observations.
+This concept was introduced in the [Generative Agents: Interactive Simulacra of Human Behavior](https://arxiv.org/abs/2304.03442) paper.
+
+```shell
+python -m examples.replay_expand_memory
+```
+```text
+new thoughts:
+Klaus Mueller is interested in multiple topics
+Klaus Mueller may have a diverse range of interests and hobbies
+```
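+
+As a sketch of the idea, the replay step can take facts from memory and ask the LLM to infer new observations from them; the `infer` call and its `facts` parameter below are assumptions based on the Inference building block, so check the example script for the exact API:
+```python
+from thinkgpt.llm import ThinkGPT
+
+llm = ThinkGPT(model_name="gpt-3.5-turbo")
+llm.memorize([
+    'Klaus Mueller is writing a research paper',
+    'Klaus Mueller enjoys reading a book on gentrification',
+    'Klaus Mueller is conversing with Ayesha Khan about exercising',
+])
+# Replay memorized facts and infer higher-level observations
+new_thoughts = llm.infer(facts=llm.remember('Klaus Mueller', limit=10))
+llm.memorize(new_thoughts)
+```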
+
+### Replay Agent memory, criticize and refine the knowledge in memory
+Refer to the following script for an example of an Agent that replays its memory, performs self-criticism and adjusts its memory knowledge based on the criticism.
+```shell
+python -m examples.replay_criticize_refine
+```
+```text
+refined "the second number in Fibonacci sequence is 2" into "Observation: The second number in the Fibonacci sequence is actually 1, not 2, and the sequence starts with 0, 1."
+...
+```
+This technique was introduced in the [Self-Refine: Iterative Refinement with Self-Feedback](https://arxiv.org/abs/2303.17651) paper.
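+
+As a sketch of the idea, each remembered belief can be criticized by the LLM and then refined based on that criticism; the prompts below are illustrative, so check the example script for the exact approach:
+```python
+beliefs = llm.remember('Fibonacci sequence', limit=5)
+for belief in beliefs:
+    # Ask the model to criticize the belief, then refine it accordingly
+    criticism = llm.predict(f'Criticize the following statement if it is wrong: "{belief}"')
+    refined = llm.refine(content=belief, critics=[criticism],
+                         instruction_hint='Correct the statement if needed.')
+    print(f'refined "{belief}" into "{refined}"')
+```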
+
+For more detailed usage and code examples, check the `./examples` folder.
+