LangChain Quick Start

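This article uses Jupyter to give a very simple introduction to LangChain, so that readers get a first impression of it. For details, refer to the official documentation.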

Environment

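This section briefly describes how to manage the Jupyter environment with conda. You normally do not work in the base environment. Instead, create a new environment (named langchain in this article) and switch to it with activate.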

conda create -n langchain python=3.8
conda activate langchain

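PS: do not run activate while another environment is already active (that is, when an environment name is shown before the user name in your terminal prompt); otherwise the environments get nested and may break.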

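To use a non-base environment in Jupyter Notebook, make sure ipykernel is installed in that environment, then register a kernel for it. This adds an option named "Python (langchain)" to Jupyter Notebook's kernel list, which you can select when opening or creating a notebook.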

conda install ipykernel
python -m ipykernel install --user --name=langchain --display-name="Python (langchain)"

After switching to the new kernel, run !which python to check:

/home/eleven/anaconda3/envs/langchain/bin/python

The kernel is running in langchain, the new environment we created.
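As a cross-check from inside the notebook, the interpreter path can also be read from Python itself. This is a minimal sketch: the helper name kernel_env_name is hypothetical, and it assumes the Linux/macOS conda layout shown above, where the environment directory sits two levels above the interpreter (.../envs/langchain/bin/python).

```python
import sys
from pathlib import Path

def kernel_env_name(executable: str = sys.executable) -> str:
    """Return the name of the directory two levels above the interpreter."""
    # .../envs/langchain/bin/python -> "langchain" (assumed layout)
    return Path(executable).parent.parent.name

# Path of the interpreter that the current kernel is actually running.
print(sys.executable)
print(kernel_env_name())
```

If the second line printed is not langchain, the notebook is probably still attached to a different kernel.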

Start

Three libraries need to be installed before starting:

langchain (the core framework)
beautifulsoup4 (for scraping web page content)
faiss-cpu (the vector store)

Install them in Jupyter with !pip install [NAME], or switch to the conda environment in a terminal and use pip there.
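To confirm that the installation landed in the kernel you are actually using, one option is to ask Python whether the modules can be found. This is a small sketch; the helper name missing_modules is hypothetical. Note that the import names differ from the pip names for two of the packages (beautifulsoup4 is imported as bs4, faiss-cpu as faiss).

```python
from importlib.util import find_spec

def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]

# An empty list means all three libraries are importable from this kernel.
print(missing_modules(["langchain", "bs4", "faiss"]))
```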

This article demonstrates everything with a local model, so it uses Ollama.

Ollama is available from its GitHub repository. After installing it, run ollama pull llama3 to download the model locally, then continue with the steps below.

from langchain_community.llms import Ollama
llm = Ollama(model="llama3")
llm.invoke("who are you ?")

Output:

I am LLaMA, a large language model trained by a team of researcher at Meta AI. My primary function is to understand and generate human-like text based on the input I receive.\n\nI'm not a human, but rather a computer program designed to simulate conversation and answer questions to the best of my ability. I was trained on a massive dataset of text from various sources, which enables me to recognize patterns, learn relationships between words, and even understand nuances of language.\n\nMy capabilities include:\n\n1. **Conversing**: Engaging in natural-sounding conversations, using context and understanding to respond to questions or statements.\n2. **Answering**: Providing accurate answers to a wide range of questions, from general knowledge to specific topics like science, history, or entertainment.\n3. **Generating text**: Creating original text based on prompts, styles, or topics, which can be useful for writing articles, creating stories, or composing emails.\n\nI'm constantly learning and improving my abilities through ongoing training and updates. I may not have personal experiences or emotions like humans do, but I'm designed to provide helpful, informative, and engaging responses to your questions!

Adding a prompt lets you steer the model's output:

from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages(
    [
        ("system","You are a world class technical documentation writer."),
        # The curly braces are required; without them, input is not recognised as a variable when the chain is invoked below
        ("user","{input}")
    ]
)
chain = prompt | llm
chain.invoke({"input":"how can langsmith help with testing?"})

Output:

"Excellent question!\n\nAs a technical documentation writer, I've had the pleasure of working closely with developers and testers to create comprehensive guides that facilitate seamless testing processes.\n\nLangsmith can significantly enhance testing efforts in several ways:\n\n1. **Automated Testing**: Langsmith's AI-powered natural language processing capabilities enable automated testing of documentation, reducing manual testing time by up to 90%. This allows you to focus on more complex tasks, like debugging and code reviews.\n2. **Contextual Understanding**: Langsmith can analyze and comprehend the context of your documentation, identifying potential ambiguities, inconsistencies, and unclear sections that might lead to test failures or errors. By highlighting these areas, you can refine your testing strategy and improve overall test coverage.\n3. **Test Data Generation**: Langsmith's AI engine can generate relevant test data, such as input values, based on the documentation's content. This accelerates the testing process by reducing manual data creation time and minimizing the risk of human error.\n4. **Error Detection and Reporting**: By analyzing the documentation against expected behavior and logic, Langsmith can identify potential errors or inconsistencies that might not be immediately apparent through manual testing. It provides detailed reports on the findings, facilitating quicker issue resolution.\n5. 
**Regression Testing**: Langsmith's AI-driven approach allows for efficient regression testing, ensuring that changes to the codebase or documentation don't introduce new bugs or break existing functionality.\n\nBy leveraging Langsmith in your testing workflow, you can:\n\n* Reduce manual testing time and increase efficiency\n* Improve test coverage by identifying potential issues early on\n* Enhance collaboration between developers, testers, and technical writers\n* Ensure high-quality documentation that accurately reflects the codebase\n\nIn summary, Langsmith is an invaluable tool for any testing process, providing automated testing capabilities, contextual understanding, test data generation, error detection and reporting, and regression testing."
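To make the role of the {input} placeholder and the prompt | llm composition more concrete, here is a toy analogue in plain Python. It is not how LangChain is implemented: format_messages, fake_llm and chain are hypothetical stand-ins, and the fake model simply reports what it received.

```python
def format_messages(template, values):
    """Fill {placeholders} in each (role, text) pair with the given values."""
    return [(role, text.format(**values)) for role, text in template]

def fake_llm(messages):
    """Stand-in for a model: report what it was asked instead of generating text."""
    return f"received {len(messages)} messages; last user message: {messages[-1][1]}"

template = [
    ("system", "You are a world class technical documentation writer."),
    ("user", "{input}"),
]

def chain(values):
    # Same order as prompt | llm: format the prompt first, then call the model.
    return fake_llm(format_messages(template, values))

print(chain({"input": "how can langsmith help with testing?"}))
```

The dictionary key passed to chain must match the placeholder name, which is why the braces around input matter.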

Retrieval

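This section uses retrieval to inject extra knowledge into the model's answers.

You can think of the docs given to the model as a Chinese-English dictionary: the task is to translate Chinese into English with the dictionary's help. Without the dictionary (that is, when the model has no knowledge of English), translation is impossible. Retrieval provides exactly this kind of support.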

from langchain_community.document_loaders import WebBaseLoader
# Use the official quickstart page as our docs
loader = WebBaseLoader("https://python.langchain.com/docs/get_started/quickstart/")
docs = loader.load()

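At its core, retrieval matches the similarity between two vectors: the embedding of the query and the embedding of each document chunk. The higher the score, the more similar they are, and the more that chunk deserves to be retrieved.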
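The idea of ranking by similarity can be sketched with cosine similarity on hand-made vectors. This is an illustration only: the three-dimensional vectors and chunk names are invented, real embeddings have far more dimensions, and the vector store used in this article may rank by a different measure (for example a distance, where lower means closer).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, chunks, k=1):
    """Return the names of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda name: cosine_similarity(query, chunks[name]),
        reverse=True,
    )
    return ranked[:k]

# Made-up embeddings for three document chunks.
chunks = {
    "retrieval chain docs": [0.9, 0.1, 0.0],
    "installation docs": [0.1, 0.9, 0.0],
    "unrelated page": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]
print(top_k(query, chunks, k=2))
```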

from langchain_community.embeddings import OllamaEmbeddings
embeddings = OllamaEmbeddings(model="llama3")

Use faiss-cpu to build the vector index:

from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents,embeddings)
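The splitting step exists because each chunk is embedded and retrieved on its own. A rough way to picture it is a fixed-width splitter with overlap, sketched below. This is a simplification under stated assumptions: split_with_overlap is a hypothetical helper, whereas RecursiveCharacterTextSplitter prefers to cut at separators such as paragraph and line breaks before falling back to smaller units.

```python
def split_with_overlap(text, chunk_size, overlap):
    """Cut text into chunks of at most chunk_size characters.

    Each chunk repeats the last `overlap` characters of the previous one,
    so a sentence cut at a boundary still appears whole in some chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks

print(split_with_overlap("abcdefghij", chunk_size=4, overlap=1))
```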

Next, define the prompt format:

from langchain.chains.combine_documents import create_stuff_documents_chain

prompt = ChatPromptTemplate.from_template("""
Answer the following question based only on the provided context:
<context>
{context}
</context>

Question: {input}
""")
document_chain = create_stuff_documents_chain(llm,prompt)
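The word "stuff" in create_stuff_documents_chain refers to placing all retrieved documents into the {context} slot of a single prompt. The sketch below mimics that idea in plain Python. stuff_context is a hypothetical helper, and joining documents with a blank line is an assumption made for illustration.

```python
def stuff_context(page_contents, question, separator="\n\n"):
    """Build one prompt string with every document placed inside <context>."""
    context = separator.join(page_contents)
    return (
        "Answer the following question based only on the provided context:\n"
        f"<context>\n{context}\n</context>\n\n"
        f"Question: {question}"
    )

print(stuff_context(
    ["langsmith can let you visualize test results"],
    "how can langsmith help with testing?",
))
```

Because everything goes into one prompt, this approach only works while the combined documents fit in the model's context window.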

The following shows the output with retrieval. When asked how to use a retrieval chain, the model gives a more complete answer.

from langchain.chains import create_retrieval_chain

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever,document_chain)
response = retrieval_chain.invoke({"input":"How to use Retrieval Chain"})
print(response["answer"])

Output:

Based on the provided context, I'll answer your question.

To use a Retrieval Chain with LangChain:

1. First, you need to create a prompt template that will serve as the foundation for your chain. This template should contain placeholders for the information you want to retrieve.
2. Next, you need to define a model that will be used to generate responses based on the input prompts. You can use LangChain's built-in models or integrate with external LLMs (Large Language Models) like OpenAI.
3. Then, you'll create an output parser that will take the response generated by the model and format it according to your needs.

Here's a step-by-step example:

```
from langchain.chains import RetrievalChain
from langchain.models import (
    BaseModel,
    OpenAIApiModel,
)

# Define the prompt template
prompt_template = "What is the definition of {term}?"

# Create the model (e.g., using OpenAI's API)
model = OpenAIApiModel(api_key="YOUR_API_KEY")

# Define the output parser
output_parser = lambda response: {"definition": response}

# Create the retrieval chain
chain = RetrievalChain(
    prompt_template=prompt_template,
    model=model,
    output_parser=output_parser,
)

# Now you can use the chain to retrieve information
input_prompt = "What is the definition of AI?"
response = chain.run(input_prompt)
print(response)  # Output: {"definition": "AI refers to artificial intelligence"}
```

In this example, we've created a Retrieval Chain that uses OpenAI's API model to generate responses based on user input. The output parser formats the response into a dictionary containing the definition.

Note that you'll need to install LangChain and set up your environment variables (e.g., for OpenAI) before using this code.

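This is the output without retrieval. The model answers from its training knowledge alone, and the answer is not very accurate: it describes an unrelated note-taking system.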

llm.invoke("How to use Retrieval Chain?")

Output:

"Retrieval Chain is a popular note-taking system developed by Tiago Martins, a productivity enthusiast. It's designed to help you efficiently organize and retrieve information from your notes. Here's a step-by-step guide on how to use the Retrieval Chain:\n\n**Step 1: Create a Root Note**\nStart by creating a root note that represents the main topic or category for your notes. This will be the top-level node in your retrieval chain.\n\n**Step 2: Break Down Your Topic into Subtopics**\nDivide your root note into smaller subtopics, which are usually represented as separate nodes on your chain. These subtopics should be specific and descriptive enough to capture key ideas or concepts related to your main topic.\n\n**Step 3: Add Keywords and Tags**\nAssign relevant keywords and tags to each node (subtopic) in your retrieval chain. This will help you quickly find and connect related information across multiple nodes.\n\n**Step 4: Connect Related Notes with Arrows**\nDraw arrows between nodes that are connected by a common theme or relationship. These arrows represent the relationships between ideas and concepts, allowing you to visualize how they fit together.\n\n**Step 5: Organize Your Nodes into Categories**\nGroup your nodes (subtopics) into categories or folders based on their content or themes. This will help you navigate and retrieve information more efficiently.\n\n**Step 6: Use Colors and Symbols**\nUse different colors, symbols, or icons to highlight important notes, indicate completed tasks, or mark key ideas. 
This visual cues can help you quickly identify the importance of each note.\n\n**Step 7: Review and Refine Your Chain**\nRegularly review your retrieval chain to:\n\n* Ensure that it remains organized and easy to navigate.\n* Remove any redundant or unnecessary information.\n* Add new connections between nodes as needed.\n\nBy following these steps, you'll be able to create a Retrieval Chain that helps you efficiently organize, retrieve, and connect the information in your notes. Happy note-taking!"

Conversation

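A question-answering model needs to remember earlier questions; otherwise its answers drift off topic. In the example below, chat_history is supplied by hand to mimic a real conversation. Once the model knows the chat history, it can work out how to answer the follow-up input Tell me how.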

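from langchain.chains import create_history_aware_retriever
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import MessagesPlaceholder

# Prompt that turns the conversation so far into a search query.
# Assumption: retriever_chain was not defined in the original snippet; search_prompt and this definition are a reconstruction.
search_prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    ("user", "Given the above conversation, generate a search query to look up information relevant to the conversation"),
])
retriever_chain = create_history_aware_retriever(llm, retriever, search_prompt)

# Prompt that answers from the retrieved context plus the chat history.
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the user's questions based on the below context:\n\n{context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
])
document_chain = create_stuff_documents_chain(llm, prompt)
retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)

chat_history = [HumanMessage(content="Can LangChain help test my LLM applications?"), AIMessage(content="Yes!")]
retrieval_chain.invoke({
    "chat_history": chat_history,
    "input": "Tell me how"
})['answer']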

Output:

'I\'m happy to help answer your questions based on the provided context!\n\nYou\'ve created a LangChain server and are looking for guidance on using it. You\'re wondering how you can use LangSmith with testing, specifically.\n\nTo use LangSmith with testing, you can leverage its ability to visualize test results. Here\'s an example of how you might do this:\n\n```\ndocument_chain.invoke({\n  "input": "how can langsmith help with testing?",\n  "context": [Document(page_content="langsmith can let you visualize test results")]\n})\n```\n\nIn this example, you\'re passing in a question ("how can langsmith help with testing?") and some context about LangSmith\'s capabilities. The `document_chain` will then use this information to generate an answer based on the provided context.\n\nAdditionally, you\'ve also installed the required packages for using FAISS, a local vectorstore, and have set up your index. You\'re wondering what kind of chain you can create with this setup.\n\nYou can create a retrieval chain that takes an incoming question, looks up relevant documents, and then passes those documents along with the original question into an LLM and asks it to answer the original question. This is done using the `create_stuff_documents_chain` function from LangChain\'s `combine_documents` module.\n\nIf you want to run this yourself, you can pass in some documents directly:\n\n```\ndocument_chain.invoke({\n  "input": "how can langsmith help with testing?",\n  "context": [Document(page_content="langsmith can let you visualize test results")]\n})\n```\n\nHowever, you\'re more interested in using the retriever you\'ve set up to retrieve relevant documents before generating an answer.\n\nI hope this helps! Let me know if you have any further questions.'
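The effect of the chat_history placeholder in the prompt can be pictured as splicing the earlier turns between the system message and the new user input. The sketch below is a plain-Python analogue, not LangChain's implementation; build_messages and the (role, text) tuples are hypothetical.

```python
def build_messages(system_text, chat_history, user_input):
    """Place earlier turns between the system message and the new question."""
    return [("system", system_text), *chat_history, ("user", user_input)]

history = [
    ("human", "Can LangChain help test my LLM applications?"),
    ("ai", "Yes!"),
]
messages = build_messages("Answer based on the context.", history, "Tell me how")
for role, text in messages:
    print(f"{role}: {text}")
```

With an empty history, the model would see only Tell me how, which on its own does not say what the user wants to know.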