GPT API + Vector Database + LangChain Basics

GPT API

(* based on openai==0.27)

Basics

Pricing: billed by the number of input and output tokens; rates vary by model
Temperature: 0 to 2 (default 1). 0 is nearly deterministic, and higher values make the output more random and creative; values between 0.5 and 1.0 are typical
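Both knobs can be illustrated with plain arithmetic. This sketch assumes the standard temperature-scaled softmax used in sampling (OpenAI does not document its exact internals) and per-1K-token pricing; the rates in the usage example are placeholders, not current prices.

```python
import math
from typing import List

def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw token scores into sampling probabilities at a given temperature.

    Dividing logits by a small temperature sharpens the distribution toward the
    top token; a large temperature flattens it. 0 is treated as greedy (argmax).
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Input and output tokens are billed separately: tokens / 1000 * rate."""
    return (input_tokens / 1000) * input_rate_per_1k + (output_tokens / 1000) * output_rate_per_1k
```

For example, estimate_cost(1000, 500, 0.0015, 0.002) with those placeholder rates gives about 0.0025; always check the current pricing page for real rates.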

Chain of Thought

Understanding and applying the concept of Chain of Thought (CoT) well seems to be the key to getting the most out of GPT.

The empirical foundation for this can be found in a Google blog post, which showed that applying CoT prompting when using an LLM improves performance on complex reasoning.

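Ultimately, the core of using GPT seems to be how efficiently a large problem can be broken into smaller pieces. Because the input/output size a language model can handle is fundamentally limited, various engineering workarounds have been proposed, and LangChain's chain design (described later) solves this in a very interesting way. As LLM research continues, I expect more theoretical methods to appear as well.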
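A minimal sketch of CoT prompting in the Chat Completions message format: either a few-shot exemplar whose answer spells out its intermediate steps, or the widely used zero-shot trigger "Let's think step by step." The function name and the exemplar are illustrative; the resulting list would be passed as messages= to openai.ChatCompletion.create.

```python
from typing import Dict, List

# Illustrative worked example whose answer shows each reasoning step.
COT_EXEMPLAR_Q = ("The cafeteria had 23 apples. It used 20 to make lunch and bought 6 more. "
                  "How many apples does it have?")
COT_EXEMPLAR_A = ("It started with 23 apples. After using 20 it had 23 - 20 = 3. "
                  "After buying 6 more it had 3 + 6 = 9. The answer is 9.")

def build_cot_messages(question: str, zero_shot: bool = False) -> List[Dict[str, str]]:
    """Build chat messages that elicit step-by-step reasoning before the final answer."""
    messages = [{"role": "system",
                 "content": "You are a careful assistant. Reason step by step, then give the final answer."}]
    if zero_shot:
        # Zero-shot CoT: append a reasoning trigger to the question itself.
        messages.append({"role": "user", "content": f"{question}\nLet's think step by step."})
    else:
        # Few-shot CoT: one worked example as a user/assistant pair, then the real question.
        messages.append({"role": "user", "content": COT_EXEMPLAR_Q})
        messages.append({"role": "assistant", "content": COT_EXEMPLAR_A})
        messages.append({"role": "user", "content": question})
    return messages
```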

Models

GPT-4: multimodal, optimized for chat (gpt-4, gpt-4-32k)
GPT-3.5: gpt-3.5-turbo; code-davinci-002 for code completion
GPT-3: legacy; as of July 2023, the only models that can be fine-tuned (text-X-001, where X is ada, babbage, curie, or davinci)
DALL-E: image generation (CV)
Whisper: speech recognition

Chat Completions API: A Core API

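import os
import openai

# openai==0.27 style: module-level API key and the ChatCompletion class
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"},
    ],
)
# The reply text is in the message of the first choice
print(response["choices"][0]["message"]["content"])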

For the full API documentation, see https://platform.openai.com/docs/api-reference/chat

List of Core Parameters:

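model: ID of the model to use
messages: the conversation so far, as a list of messages
- role: One of system, user, assistant, or function
- content: The contents of the message. content is required for all messages, and may be null for assistant messages with function calls.
- name: optional, but required when role is function (the name of the function whose result is in content)
functions: list of function definitions the model may call (for function calling)
temperature: 0 to 2 (default 1)
max_tokens: maximum number of tokens to generate
n: number of completion choices to generate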
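A sketch of the client side of function calling, assuming the openai==0.27 response shape in which the assistant message carries function_call with a name and JSON-encoded arguments. get_current_weather and its canned data are hypothetical; the returned role "function" message would be appended to messages before calling the API again.

```python
import json
from typing import Any, Callable, Dict, Optional

def get_current_weather(location: str, unit: str = "celsius") -> Dict[str, Any]:
    # Hypothetical local function; returns canned data for illustration only.
    return {"location": location, "temperature": 22, "unit": unit}

# JSON Schema description passed as functions=[WEATHER_FUNCTION] to ChatCompletion.create.
WEATHER_FUNCTION = {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name, e.g. Seoul"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}

REGISTRY: Dict[str, Callable[..., Any]] = {"get_current_weather": get_current_weather}

def handle_function_call(message: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Run the function the assistant asked for and build the follow-up message.

    Returns None when the message is a normal reply. Unknown function names raise
    KeyError, and malformed arguments raise json.JSONDecodeError.
    """
    call = message.get("function_call")
    if not call:
        return None
    func = REGISTRY[call["name"]]
    kwargs = json.loads(call["arguments"])  # arguments arrive as a JSON string
    result = func(**kwargs)
    return {"role": "function", "name": call["name"], "content": json.dumps(result)}
```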

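Broadly, the following approaches to personalizing GPT seem possible:

Inject additional prompt-response pairs and fine-tune in a supervised manner
Convert the actual data into embeddings, build a vector database, and supply the relevant entries as input alongside the prompt; building this pipeline is where LangChain becomes necessary (examples are provided via few-shot prompting)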

Fine Tuning

https://platform.openai.com/docs/guides/fine-tuning (fine tuning on GPT3 is soon to be deprecated)

Fine-tuning is currently only available for the following base models: davinci, curie, babbage, and ada
Training data: a JSONL file of (prompt, completion) pairs is required
Cases
- Case study: Write an engaging ad based on a Wikipedia article - since this is a generative case, put a variety of real ads in the completions
- Case study: Entity extraction
- Case study: Product description based on a technical list of properties
Fine-tuned models can be used directly for classification; https://platform.openai.com/docs/guides/fine-tuning/advanced-usage
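A sketch of preparing the training file, assuming the legacy guide's JSONL format (one {"prompt": ..., "completion": ...} object per line) and its formatting advice: a fixed separator at the end of every prompt, a leading space in every completion, and a fixed stop sequence at the end. The separator and stop strings below are example choices, not required values.

```python
import json
from typing import Iterable, Tuple

SEPARATOR = "\n\n###\n\n"  # example marker appended to every prompt
STOP = " END"              # example stop sequence appended to every completion

def write_finetune_jsonl(pairs: Iterable[Tuple[str, str]], path: str) -> int:
    """Write (prompt, completion) pairs as JSONL and return the number of records."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            record = {
                "prompt": prompt + SEPARATOR,
                "completion": " " + completion.strip() + STOP,  # leading space, fixed ending
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

The same stop string would then be passed as stop= at inference time so the fine-tuned model knows where to end.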

Embeddings

https://platform.openai.com/docs/guides/embeddings/what-are-embeddings

Given a text, the API returns the corresponding embedding, which can then be stored in a vector DB and used later.
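A sketch of comparing two embeddings. With openai==0.27 an embedding is obtained with openai.Embedding.create(model="text-embedding-ada-002", input=text) and read from response["data"][0]["embedding"]; cosine similarity is the usual comparison. OpenAI's embeddings are normalized to length 1, so a plain dot product gives the same ranking.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|): 1.0 means same direction, 0.0 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)
```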

Tactics for better output

Write clear instructions

Include details in your query to get more relevant answers
Ask the model to adopt a persona ("You are a …"): write the system message carefully
Use delimiters to clearly indicate distinct parts of the input
Specify the steps required to complete a task
Provide examples
Specify the desired length of the output
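A sketch that applies several of these tactics at once when assembling chat messages: a persona in the system message, numbered steps, triple-quote delimiters around the input, and an explicit length limit. build_clear_prompt and its wording are hypothetical.

```python
from typing import Dict, List

def build_clear_prompt(persona: str, steps: List[str], text: str,
                       max_words: int) -> List[Dict[str, str]]:
    """Assemble chat messages: persona + numbered steps + length limit in the
    system message, and the user input wrapped in triple-quote delimiters."""
    numbered = "\n".join(f"Step {i}: {s}" for i, s in enumerate(steps, start=1))
    system = (f"{persona}\n"
              "Follow these steps to respond to the user input, which is delimited by triple quotes.\n"
              f"{numbered}\n"
              f"Keep the final answer under {max_words} words.")
    return [{"role": "system", "content": system},
            {"role": "user", "content": f'"""{text}"""'}]
```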

Provide reference text

Instruct the model to answer using a reference text
Instruct the model to answer with citations from a reference text
- This is how to demand accurate information!

System: 
You will be provided with a document delimited by triple quotes and a question. Your task is to answer the question using only the provided document and to cite the passage(s) of the document used to answer the question. If the document does not contain the information needed to answer this question then simply write: "Insufficient information." If an answer to the question is provided, it must be annotated with a citation. Use the following format to cite relevant passages ({"citation": …}).
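Because the model can still misquote, a cheap post-check is to verify that every cited passage actually occurs in the document. The prompt's citation format is elided above, so this sketch assumes citations come back as {"citation": "..."} with a JSON string inside; both function names are hypothetical.

```python
import json
import re
from typing import List

# Matches {"citation": "..."} where the inner text may contain escaped quotes.
CITATION_RE = re.compile(r'\{"citation":\s*"((?:[^"\\]|\\.)*)"\}')

def extract_citations(answer: str) -> List[str]:
    """Pull every assumed {"citation": "..."} annotation out of the model's answer."""
    return [json.loads(f'"{m}"', strict=False) for m in CITATION_RE.findall(answer)]

def unsupported_citations(answer: str, document: str) -> List[str]:
    """Return citations that do not appear verbatim in the source document.

    An empty list means every quoted passage was found. An "Insufficient
    information." answer has no citations, so it also returns [].
    """
    return [c for c in extract_citations(answer) if c not in document]
```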

Example Prompt Engineering

Korean best practices
- A service that stores menus in a knowledge-based system and uses Gradio so that ChatGPT recommends menus that fit a given budget https://aifactory.space/competition/2374/discussion/368
Text2SQL
- https://blog.langchain.dev/llms-and-sql/
To be updated…

Vector Databases

Motivation

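Chat-based GPT remembers only about the last 20 exchanges at most, so past chat data is highly volatile; the Chat Completions API itself is stateless, and any history must be resent with every request and fit within the model's context window.
We therefore need to index, search, and update the embedding vectors of user data.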
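A toy in-memory store showing the three operations this calls for (indexing and updating via upsert, deletion, and search via a top-k query), using exact brute-force cosine search. The class is hypothetical and only mirrors the general upsert/query shape of services such as Pinecone; it is not their client API.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

class InMemoryVectorStore:
    """Exact-search store: every query scores every stored vector (O(n))."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}
        self._metadata: Dict[str, dict] = {}

    @staticmethod
    def _normalize(v: Sequence[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            raise ValueError("cannot index a zero vector")
        return [x / norm for x in v]

    def upsert(self, item_id: str, vector: Sequence[float], metadata: Optional[dict] = None) -> None:
        """Insert a new item, or overwrite an existing one with the same id."""
        self._vectors[item_id] = self._normalize(vector)  # unit length: dot product == cosine
        self._metadata[item_id] = metadata or {}

    def delete(self, item_id: str) -> None:
        self._vectors.pop(item_id, None)
        self._metadata.pop(item_id, None)

    def query(self, vector: Sequence[float], top_k: int = 3) -> List[Tuple[str, float, dict]]:
        """Return (id, cosine score, metadata) for the top_k most similar items, best first."""
        q = self._normalize(vector)
        scored = [(item_id, sum(a * b for a, b in zip(q, v))) for item_id, v in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return [(item_id, score, self._metadata[item_id]) for item_id, score in scored[:top_k]]
```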

Indexing

There are several approaches to vector indexing, such as exact indexing, ANN (approximate nearest neighbor), and group-based indexing.
The basic goal is to retrieve the vectors most similar to a given input vector.
For optimization, techniques such as low-dimensional projection and MM quantization are used internally.
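One classic ANN technique is random-hyperplane LSH, sketched here as an illustration rather than what any particular vector DB uses internally: each vector is projected onto a few random directions (a low-dimensional projection), each projection is quantized to a single bit, and a query only scores the items that share its bit string.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class RandomHyperplaneLSH:
    """Approximate nearest-neighbor index for cosine similarity (single hash table).

    Bit i of a vector's key is 1 if the vector lies on the positive side of random
    hyperplane i. Vectors separated by a small angle tend to get the same key, so a
    query scores only its own bucket instead of every stored vector.
    """

    def __init__(self, dim: int, num_planes: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # num_planes random Gaussian normals define the projection directions
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]
        self.buckets: Dict[str, List[Tuple[str, List[float]]]] = defaultdict(list)

    def _key(self, v: Sequence[float]) -> str:
        # Project onto each normal and keep only the sign: 1-bit quantization per plane
        return "".join("1" if sum(p * x for p, x in zip(plane, v)) >= 0 else "0"
                       for plane in self.planes)

    def add(self, item_id: str, vector: Sequence[float]) -> None:
        self.buckets[self._key(vector)].append((item_id, list(vector)))

    def query(self, vector: Sequence[float], top_k: int = 3) -> List[Tuple[str, float]]:
        """Exact cosine ranking, but only among candidates in the query's bucket."""
        candidates = self.buckets.get(self._key(vector), [])
        scored = [(item_id, _cosine(vector, v)) for item_id, v in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

Real systems typically use several hash tables or probe neighboring buckets to recover true neighbors that land in a different bucket; this single-table version trades recall for speed.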

Pinecone DB

Use cases

Document retrieval
Real-time AI decisions

LangChain

(*limited to using the OpenAI API) (*based on langchain==0.0.237)

LangChain connects the GPT API and various tools into a processing pipeline.

Components

Chains: sequence of calls to components, which can include other chains
Agents: let the LLM decide, based on the given prompt, which role or tool to use (e.g. Google search, Python code generation)
Model IO
- Prompts: Templated LLM prompts
- LLMs: Connector to the LLM models (langchain.llms)
- Output Parsers
Data Connections
- Retrievers: querying from data: returns documents given an unstructured query
- Document Loaders: load document given filepath
- Vector Stores: store and search from databases
- Document transformers (e.g. text splitters)
Memory: makes LangChain stateful by remembering previous chats; e.g. ChatMessageHistory, ConversationBufferMemory
Embeddings:
Toolkits:
Tools:
Utilities:
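A sketch of what a text splitter does: fixed-size character chunks in which neighbors share chunk_overlap characters, so the context around each boundary appears in both chunks. This is a plain sliding window, not LangChain's RecursiveCharacterTextSplitter, which additionally tries to break on larger separators (paragraphs, lines, words) before falling back to single characters; the defaults 200/20 follow the values used in the pipeline later in these notes.

```python
from typing import List

def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 20) -> List[str]:
    """Split text into chunks of at most chunk_size characters; each chunk starts
    chunk_size - chunk_overlap characters after the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this chunk reached the end of the text
            break
        start += step
    return chunks
```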

Chains

For methods of passing large documents into a prompt, see https://python.langchain.com/docs/modules/chains/document/

When that still does not scale, use similarity search in a vector DB!

Documents

Several methods are used to pass documents as input. At heart, they appear to be practical answers to the question of how to actually implement chain of thought.

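stuff: put all the documents into a single prompt. Naturally, this has scalability issues.
refine: iterate over the documents, providing them one at a time together with the intermediate answer so far. This resembles how an RNN works. Likely drawbacks: dependencies between documents (e.g. ranking) are hard to capture, or the answer may be swayed by documents that appear in particular positions.
map reduce: each document is processed independently (map), then the results are combined (reduce). Order dependency should be weaker than with refine.
map re-rank: like the map step of map reduce, each document is processed separately, but each answer also comes with a certainty score; the highest-scoring answer is returned.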
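The four strategies can be sketched with any prompt-to-text function standing in for the LLM. The prompt wording and the "<answer>|<score>" reply format are hypothetical simplifications, not LangChain's actual prompts; the point is the call pattern: stuff makes one call, refine makes one sequential call per document, map reduce makes one independent call per document plus a combine call, and map re-rank keeps only the highest-scoring answer.

```python
from typing import Callable, List, Optional

# Any function that sends a prompt to a model and returns its text reply.
LLM = Callable[[str], str]

def _qa_prompt(context: str, question: str) -> str:
    return f"Context:\n{context}\n\nQuestion: {question}"

def stuff(docs: List[str], llm: LLM, question: str) -> str:
    # One call with every document concatenated; breaks once the context window overflows.
    return llm(_qa_prompt("\n\n".join(docs), question))

def refine(docs: List[str], llm: LLM, question: str) -> str:
    # Sequential, RNN-like: carry the intermediate answer forward, one document per call.
    if not docs:
        raise ValueError("refine needs at least one document")
    answer = llm(_qa_prompt(docs[0], question))
    for doc in docs[1:]:
        answer = llm(f"Existing answer: {answer}\n" + _qa_prompt(doc, question)
                     + "\nRefine the existing answer using the new context.")
    return answer

def map_reduce(docs: List[str], llm: LLM, question: str) -> str:
    # Map: answer per document independently (order-insensitive, parallelizable).
    partials = [llm(_qa_prompt(doc, question)) for doc in docs]
    # Reduce: one extra call that merges the partial answers.
    return llm(_qa_prompt("\n".join(partials), question) + "\nCombine the partial answers above.")

def map_rerank(docs: List[str], llm: LLM, question: str) -> Optional[str]:
    # Map with a self-reported score per answer; keep only the best one (no combine step).
    best_answer: Optional[str] = None
    best_score = float("-inf")
    for doc in docs:
        reply = llm(_qa_prompt(doc, question) + "\nReply as: <answer>|<score 0-100>")
        answer, sep, score_text = reply.rpartition("|")
        if not sep:
            continue  # reply ignored the requested format
        try:
            score = float(score_text)
        except ValueError:
            continue
        if score > best_score:
            best_answer, best_score = answer.strip(), score
    return best_answer
```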

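I think these approaches are possible because LLMs can now follow a specified user instruction almost unconditionally. If that were unreliable, chains could not be programmed at all; the fact that they can means that, even when natural-language input and output are generated directly by the LLM, we can now count on the output having the structure the user asked for. Judging from this, the boundary between traditional structured-data processing and unstructured-data processing is likely to break down considerably. Building systems that move between both worlds and exploit the strengths of each should create enormous industrial value.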

Example: SQL Generation

https://python.langchain.com/docs/modules/chains/popular/sqlite
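Roughly, the chain on the linked page gives the LLM the table schema and the question, has it write SQL, runs the query, and turns the rows into an answer. This sketch covers only the two non-LLM steps with Python's built-in sqlite3; the prompt wording is hypothetical, and the SELECT-only check is a naive safeguard (a read-only connection would be safer).

```python
import sqlite3
from typing import List, Tuple

def schema_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Build a text-to-SQL prompt from the live schema (stored CREATE statements)."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    schema = "\n".join(r[0] for r in rows)
    return ("Given the SQLite schema below, write one SQLite SELECT query that answers the question.\n"
            f"{schema}\n\nQuestion: {question}\nSQL:")

def run_generated_sql(conn: sqlite3.Connection, sql: str) -> List[Tuple]:
    """Execute model-written SQL, refusing anything that does not look like a single SELECT."""
    statement = sql.strip().rstrip(";")
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only a single SELECT statement is allowed")
    return conn.execute(statement).fetchall()
```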

Few Shot Prompting

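When writing a prompt, past data that is similar or related to the prompt is included in the input along with it. This requires finding natural-language data related to the prompt, in whatever form, and passing it along, and a vector DB can be put to heavy use in this step.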
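A sketch of picking the k most similar past examples and placing them before the new input. To stay self-contained, toy_embed uses bag-of-words counts as a hypothetical stand-in for a real embedding model; in practice the similarity search would run against a vector DB.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # missing keys in a Counter count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_few_shot_prompt(query: str, examples: List[Tuple[str, str]], k: int = 2) -> str:
    """Put the k past (input, output) pairs most similar to the query before the query."""
    q = toy_embed(query)
    ranked = sorted(examples, key=lambda ex: _cosine(q, toy_embed(ex[0])), reverse=True)
    shots = "\n\n".join(f"Input: {inp}\nOutput: {out}" for inp, out in ranked[:k])
    return f"{shots}\n\nInput: {query}\nOutput:"
```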

Example Pipelines

Imitating speaking style
To be updated…

How I Created an AI Clone of Myself With ChatGPT

(summary of the following blog post)

Goal: build a bot that answers questions in my own tone and writing style
Pipeline
- 1. User Prompt
- 2. Semantic search with OpenAI embeddings over the messaging logs stored in a Pinecone vector DB
  - Split the full text document with RecursiveCharacterTextSplitter (chunk size 200, overlap 20)
  - Index it into Pinecone; both the embedding model (OpenAIEmbeddings) and the texts must be supplied
  - The indexed DB can then be queried in natural language (OpenAIEmbeddings is needed here as well); queries use similarity_search()
- 3. LLM + Prompt Template
  - Uses few-shot prompting (assign a role, then provide examples, history, and the query)
  - Prompt structure
    - You are going to immerse yourself into the role of Tariq. Tariq is …. Human will give you an input and examples of a conversation… Your answer should be believable, in a casual tone and in Tariq’s style. Answer how Tariq would answer.
    - Examples: {examples}
    - Examples END
    - {history}
    - Human: {human_input}
    - Tariq:
  - examples are retrieved with search_wrapper.similarity_search(); the k value controls how many
  - history is passed in on every turn so the conversation remembers past exchanges (a LangChain feature)
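A sketch of filling the prompt structure above while keeping only the last few exchanges as history, similar in spirit to LangChain's ConversationBufferWindowMemory. The class name, window size, and description string are hypothetical; the role description is left to the caller because the original is elided. In the original pipeline, examples would be the page_content of the documents returned by search_wrapper.similarity_search(human_input, k=...).

```python
from collections import deque
from typing import Deque, List, Tuple

TEMPLATE = (
    "You are going to immerse yourself into the role of {name}. {description}\n"
    "Your answer should be believable, in a casual tone and in {name}'s style. "
    "Answer how {name} would answer.\n"
    "Examples: {examples}\n"
    "Examples END\n"
    "{history}\n"
    "Human: {human_input}\n"
    "{name}:"
)

class StyleCloneChat:
    """Fills TEMPLATE and remembers only the last `window` exchanges."""

    def __init__(self, name: str, description: str, window: int = 5) -> None:
        self.name = name
        self.description = description  # caller-supplied persona text (hypothetical)
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=window)  # oldest turns drop off

    def build_prompt(self, human_input: str, examples: List[str]) -> str:
        history = "\n".join(f"Human: {h}\n{self.name}: {a}" for h, a in self.turns)
        return TEMPLATE.format(name=self.name, description=self.description,
                               examples="\n".join(examples), history=history,
                               human_input=human_input)

    def remember(self, human_input: str, reply: str) -> None:
        self.turns.append((human_input, reply))
```

Usage: call build_prompt, send it to the model, then remember(human_input, reply) so the next prompt carries the exchange.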

Langflow: Web UI for LangChain

https://github.com/logspace-ai/langflow

Design a flow in the UI, export the pipeline file, then load it from Python and deploy it directly with LangChain.

last modified June 2, 2024

Guideline on what/how to use with GPT(OpenAI) API and LangChain on top of it.

By Jinho Ko