# ThinkGPT 🧠🤖
ThinkGPT is a Python library that implements Chain of Thought for Large Language Models (LLMs), prompting the model to think, reason, and create generative agents.
The library aims to help with the following:
* overcome limited context with long-term memory and compressed knowledge
* enhance LLMs' one-shot reasoning with higher-order reasoning primitives
* add intelligent decisions to your code base


## Key Features ✨
* Thinking building blocks 🧱:
    * Memory 🧠: GPTs that can remember experience
    * Self-refinement 🔧: Improve model-generated content by addressing critics
    * Compress knowledge 🌐: Compress knowledge and fit it into the LLM's context, either by abstracting rules out of observations or by summarizing large content
    * Inference 💡️: Make educated guesses based on available information
    * Natural Language Conditions 📝: Easily express choices and conditions in natural language
* Efficient and measurable GPT context length 📐
* Extremely easy setup and Pythonic API 🎯 thanks to [DocArray](https://github.com/docarray/docarray)

## Installation 💻
You can install ThinkGPT using pip:

```shell
pip install git+https://github.com/alaeddine-13/thinkgpt.git
```

## API Documentation 📚
### Basic usage
```python
from thinkgpt.llm import ThinkGPT

llm = ThinkGPT(model_name="gpt-3.5-turbo")
# Make the llm object learn new concepts
llm.memorize(['DocArray is a library for representing, sending and storing multi-modal data.'])
llm.predict('what is DocArray?', remember=llm.remember('DocArray definition'))
```

### Memorizing and remembering information
```python
llm.memorize([
    'DocArray allows you to send your data, in an ML-native way.',
    'This means there is native support for Protobuf and gRPC, on top of HTTP and serialization to JSON, JSONSchema, Base64, and Bytes.',
])

print(llm.remember('Sending data with DocArray', limit=1))
```
```text
['DocArray allows you to send your data, in an ML-native way.']
```

Use the `limit` parameter to specify the maximum number of documents to retrieve.
In case you want to fit documents into a certain context size, you can also use the `max_tokens` parameter to specify the maximum number of tokens to retrieve.
For instance:
```python
from examples.knowledge_base import knowledge
from thinkgpt.helper import get_n_tokens

llm.memorize(knowledge)
results = llm.remember('hello', max_tokens=1000, limit=1000)
print(get_n_tokens(''.join(results)))
```
```text
1000
```
However, keep in mind that concatenating documents with a separator adds more tokens to the final result.
The `remember` method does not account for those tokens.
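For instance, a rough sketch of budgeting for the separator yourself, assuming you join the retrieved documents with newlines (the separator choice here is illustrative):
```python
from thinkgpt.helper import get_n_tokens

separator = '\n'  # illustrative separator choice
results = llm.remember('hello', max_tokens=1000, limit=1000)
# the retrieved documents fit the budget, but joining adds separator tokens
budget_used = get_n_tokens(separator.join(results))
print(budget_used)  # may exceed 1000 once separators are counted
```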

### Predicting with context from long memory
```python
from examples.knowledge_base import knowledge

llm.memorize(knowledge)
llm.predict('Implement a DocArray schema with 2 fields: image and TorchTensor', remember=llm.remember('DocArray schemas and types'))
```

### Self-refinement

```python
print(llm.refine(
    content="""
import re
    print('hello world')
        """,
    critics=[
        'File "/Users/user/PyCharm2022.3/scratches/scratch_166.py", line 2',
        "  print('hello world')",
        'IndentationError: unexpected indent'
    ],
    instruction_hint="Fix the code snippet based on the error provided. Only provide the fixed code snippet between `` and nothing else."))
```

```text
import re
print('hello world')
```

One of the applications is self-healing code generation, as implemented by projects like [gptdeploy](https://github.com/jina-ai/gptdeploy) and [wolverine](https://github.com/biobootloader/wolverine).
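As a hedged sketch of the idea (not the exact approach those projects take), a self-healing loop can execute generated code, capture the traceback, and feed it back to `refine` as critics:
```python
import traceback

code = "import re\n    print('hello world')"
for _ in range(3):  # give the model a few attempts at fixing its output
    try:
        exec(code)
        break  # the snippet ran without raising, stop refining
    except Exception:
        code = llm.refine(
            content=code,
            critics=[traceback.format_exc()],
            instruction_hint="Fix the code snippet based on the error provided. "
                             "Only provide the fixed code snippet between `` and nothing else.",
        )
```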

### Compressing knowledge
In case you want your knowledge to fit into the LLM's context, you can use the following techniques to compress it:
#### Summarize content
Summarize content using the LLM itself.
We offer two methods:
1. One-shot summarization using the LLM:
```python
llm.summarize(
    large_content,
    max_tokens=1000,
    instruction_hint='Pay attention to code snippets, links and scientific terms.',
)
```
Since this technique relies on a single LLM call, you can only pass content that does not exceed the LLM's context length.

2. Chunked summarization:
```python
llm.chunked_summarize(
    very_large_content,
    max_tokens=4096,
    instruction_hint='Pay attention to code snippets, links and scientific terms.',
)
```
This technique splits the content into chunks, summarizes each chunk, and then combines the summaries using an LLM.
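A small sketch of picking between the two methods based on token count (the 4096-token limit is an assumed model context length, not a value exposed by the library):
```python
from thinkgpt.helper import get_n_tokens

CONTEXT_LENGTH = 4096  # assumed context length of the underlying model

def summarize_any(llm, content: str) -> str:
    # fall back to chunked summarization only when the content is too large
    if get_n_tokens(content) < CONTEXT_LENGTH:
        return llm.summarize(content, max_tokens=1000)
    return llm.chunked_summarize(content, max_tokens=1000)
```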

#### Induce rules from observations
Abstract higher-level, more general rules from current observations:
```python
llm.abstract(observations=[
    "in tunisian, I did not eat is \"ma khditech\"",
    "I did not work is \"ma khdemtech\"",
    "I did not go is \"ma mchitech\"",
])
```

```text
['Negation in Tunisian Arabic uses "ma" + verb + "tech" where "ma" means "not" and "tech" at the end indicates the negation in the past tense.']
```

This can help you end up with compressed knowledge that better fits the limited context length of LLMs.
For instance, instead of trying to fit code examples into the LLM's context, use this to prompt it to understand high-level rules and fit those in the context, as sketched below.
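A sketch of compressing observations into a rule and storing only the rule (the observations are made up for illustration):
```python
rules = llm.abstract(observations=[
    'the square of 2 is 4',
    'the square of 3 is 9',
    'the square of 4 is 16',
])
llm.memorize(rules)  # store the compact rule instead of the raw examples
```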

### Natural language condition
Introduce intelligent conditions to your code and let the LLM make decisions:
```python
llm.condition('Does this represent an error message? "IndentationError: unexpected indent"')
```
```text
True
```
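Since `condition` returns a boolean, it can drive ordinary control flow. A sketch, with a hypothetical `handle_error` function:
```python
log_line = 'IndentationError: unexpected indent'
if llm.condition(f'Does this represent an error message? "{log_line}"'):
    handle_error(log_line)  # hypothetical error handler
```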
### Natural language select
Alternatively, let the LLM choose among a list of options:
```python
llm.select(
    question="Which animal is the king of the jungle?",
    options=["Lion", "Elephant", "Tiger", "Giraffe"]
)
```
```text
['Lion']
```

You can also prompt the LLM to choose an exact number of answers using `num_choices`. By default, it is set to `None`, which means the LLM selects as many options as it deems correct.
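For example, to request exactly two answers (the question and options here are illustrative):
```python
llm.select(
    question="Which of the following are prime numbers?",
    options=["2", "3", "4", "6"],
    num_choices=2,
)
```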

## Use Cases 🚀
Below are example demos you can build with `thinkgpt`:
### Teaching ThinkGPT a new language
```python
from thinkgpt.llm import ThinkGPT

llm = ThinkGPT(model_name="gpt-3.5-turbo")

rules = llm.abstract(observations=[
    "in tunisian, I did not eat is \"ma khditech\"",
    "I did not work is \"ma khdemtech\"",
    "I did not go is \"ma mchitech\"",
], instruction_hint="output the rule in french")
llm.memorize(rules)

llm.memorize("in tunisian, I studied is \"9rit\"")

task = "translate to Tunisian: I didn't study"
llm.predict(task, remember=llm.remember(task))
```
```text
The translation of "I didn't study" to Tunisian language would be "ma 9ritech".
```

### Teaching ThinkGPT how to code with the `thinkgpt` library
```python
from thinkgpt.llm import ThinkGPT
from examples.knowledge_base import knowledge

llm = ThinkGPT(model_name="gpt-3.5-turbo")

llm.memorize(knowledge)

task = 'Implement python code that uses thinkgpt to learn about docarray v2 code and then predict with remembered information about docarray v2. Only give the code between `` and nothing else'
print(llm.predict(task, remember=llm.remember(task, limit=10, sort_by_order=True)))
```

Code generated by the LLM:
```python
from thinkgpt.llm import ThinkGPT
from docarray import BaseDoc
from docarray.typing import TorchTensor, ImageUrl

llm = ThinkGPT(model_name="gpt-3.5-turbo")

# Memorize information
llm.memorize('DocArray V2 allows you to represent your data, in an ML-native way')

# Predict with the memory
memory = llm.remember('DocArray V2')
llm.predict('write python code about DocArray v2', remember=memory)
```

### Replay Agent memory and infer new observations
Refer to the following script for an example of an agent that replays its memory and induces new observations.
This concept was introduced in the [Generative Agents: Interactive Simulacra of Human Behavior](https://arxiv.org/abs/2304.03442) paper.

```shell
python -m examples.replay_expand_memory
```
```text
new thoughts:
Klaus Mueller is interested in multiple topics
Klaus Mueller may have a diverse range of interests and hobbies
```
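In essence, the script replays recent memories and asks the LLM to derive new observations from them. A hedged sketch of the idea, assuming an `infer` building block that mirrors `abstract`'s interface (check the script for the actual calls):
```python
# replay recent memories about an agent and derive new observations
memories = llm.remember('Klaus Mueller', limit=10)
new_thoughts = llm.infer(facts=memories)  # assumed interface, see the script
llm.memorize(new_thoughts)
```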

### Replay Agent memory, criticize and refine the knowledge in memory
Refer to the following script for an example of an agent that replays its memory, performs self-criticism and adjusts its memory knowledge based on the criticism.
```shell
python -m examples.replay_criticize_refine
```
```text
refined "the second number in Fibonacci sequence is 2" into "Observation: The second number in the Fibonacci sequence is actually 1, not 2, and the sequence starts with 0, 1."
...
```
This technique was introduced in the [Self-Refine: Iterative Refinement with Self-Feedback](https://arxiv.org/abs/2303.17651) paper.
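A hedged sketch of that loop using the primitives shown earlier (the critique prompt is illustrative; the bundled script is the reference implementation):
```python
for fact in llm.remember('Fibonacci sequence', limit=5):
    # ask the LLM to criticize the remembered fact ...
    critique = llm.predict(f'Criticize this statement if it is wrong: "{fact}"')
    # ... then refine the fact against that criticism and store it back
    refined = llm.refine(content=fact, critics=[critique])
    llm.memorize([refined])
```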

For more detailed usage and code examples, check `./examples`.