lakeFS Community
Idan Novogroder, Author

Idan has an extensive background in software and DevOps engineering....

Last updated on April 26, 2024

LangChain is one of the most useful frameworks for developers looking to create LLM-powered applications. It lets LLMs generate replies based on up-to-date data available online and simplifies organizing large volumes of data so that LLMs can access it quickly.

This is how LangChain enables developers to build dynamic, data-responsive applications. The open-source framework has so far enabled developers to create some pretty advanced AI chatbots, generative question-answering (GQA) systems, and language summarization tools.

In this article, we dive into LangChain in detail to show you how it works, what developers can build with it, and more.

What is LangChain?

LangChain is an open-source framework that gives developers the tools they need to create applications using large language models (LLMs). In its essence, LangChain is a prompt orchestration tool that makes it easier for teams to connect various prompts interactively.

LangChain began as an open source project, but as the GitHub stars piled up, it was quickly turned into a company led by Harrison Chase.

LLMs (such as GPT-3 or GPT-4) give a completion for a single prompt, which is more or less like receiving a complete result for a single request. For example, you could tell the LLM to “create a sculpture,” and it would do it. You may also provide more sophisticated requests, such as “create a sculpture of an axolotl at the bottom of a lake.” The LLM will likely return what you wanted.

But what if you asked this instead:

“Give me step-by-step instructions for carving an axolotl sculpture out of wood”?

To avoid having the user explicitly provide every single step and choose the sequence of execution, you can use LLMs to produce the next step at each point, utilizing the prior step results as its context.

The LangChain framework can do that for you. It arranges a succession of prompts to reach a desired result. It provides a simple interface for developers to interact with LLMs. That way, you could say that LangChain works like a reductionist wrapper for leveraging LLMs.
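The idea of producing one step at a time, with earlier results fed back in as context, can be sketched in plain Python. This is a minimal illustration and not LangChain's API: `fake_llm`, `plan`, and the canned steps are hypothetical names invented here so the example runs offline without a real model.

```python
from typing import Callable, List

# Canned answers standing in for real model output (made up for illustration).
CANNED_STEPS = [
    "Sketch the axolotl on the wood block.",
    "Rough out the shape with a saw.",
    "Carve the details with a knife.",
    "Sand and finish the sculpture.",
]


def fake_llm(prompt: str) -> str:
    """Hypothetical model stub: picks the next step from how many are in the prompt."""
    done = prompt.count("\n- ")
    return CANNED_STEPS[done] if done < len(CANNED_STEPS) else "DONE"


def plan(goal: str, llm: Callable[[str], str], max_steps: int = 10) -> List[str]:
    """Ask the model for one step at a time, passing all prior steps as context."""
    steps: List[str] = []
    for _ in range(max_steps):
        context = "".join(f"\n- {s}" for s in steps)
        prompt = f"Goal: {goal}\nSteps so far:{context}\nNext step?"
        answer = llm(prompt)
        if answer == "DONE":
            break
        steps.append(answer)
    return steps


steps = plan("Carve an axolotl sculpture out of wood", fake_llm)
for number, step in enumerate(steps, start=1):
    print(number, step)
```

Swapping `fake_llm` for a function that calls a real model would keep the loop unchanged, which is the part a framework like LangChain takes off your hands.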

What is LangChain Expression Language?

LangChain Expression Language (LCEL) is a declarative way to compose chains out of smaller components. It was designed from the start so that prototypes can go to production without code changes.

Here are a few benefits of LCEL:

  • When you use LCEL to create your chains, you get the best potential time-to-first-token (the amount of time it takes for the first piece of output to appear). For some chains, this means that we stream tokens directly from an LLM to a streaming output parser, and you get back parsed, incremental chunks of output at the same pace as the LLM provider.
  • Any chain generated with LCEL may be called using both the synchronous API (for example, in a Jupyter notebook when experimenting) and the asynchronous API (like a LangServe server). This allows for the use of the same code for prototypes and production, with excellent speed and the flexibility to handle several concurrent requests on the same server.
  • A data scientist or practitioner can have steps in LCEL chains executed in parallel.
  • LangServe can quickly deploy any chain generated using LCEL.
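In LangChain, LCEL chains are composed with the `|` operator and called through `invoke` (synchronous) or `ainvoke` (asynchronous). The sketch below mimics that shape in plain Python to show why one definition can serve both styles. The `Step` class and the fake model are hypothetical stand-ins written for this example, not LangChain code.

```python
import asyncio
from typing import Any, Callable


class Step:
    """Toy stand-in for a composable chain component (hypothetical)."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        # Synchronous call, for example from a notebook.
        return self.fn(x)

    async def ainvoke(self, x: Any) -> Any:
        # Asynchronous call, for example from a server; runs fn in a worker thread.
        return await asyncio.to_thread(self.fn, x)


prompt = Step(lambda topic: f"Write one line about {topic}.")
model = Step(lambda text: f"MODEL OUTPUT for: {text}")  # fake model
parser = Step(lambda text: text.removeprefix("MODEL OUTPUT for: "))

chain = prompt | model | parser

sync_result = chain.invoke("axolotls")
async_result = asyncio.run(chain.ainvoke("axolotls"))
print(sync_result)
```

The same `chain` object answers both calls, so nothing has to be rewritten when moving from an experiment to a concurrent server.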

Why consider using LangChain?

When employed with a single prompt, LLMs are already quite strong. However, what they do is essentially perform completions by predicting the most likely next word. They don’t think and reason like people do before they say something or respond. At least that’s what we like to believe.

Reasoning is the process of using information acquired prior to the communication act in order to reach new conclusions. We don’t consider creating an axolotl sculpture as a single continuous operation but rather as a succession of smaller actions that impact the next steps.

LangChain is a framework that allows developers to create agents capable of reasoning about issues and breaking them down into smaller sub-tasks. By building intermediary stages and chaining complex commands together, you can add context and memory to completions using LangChain.
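Adding memory to completions comes down to storing earlier turns and placing them in the next prompt. Here is a minimal sketch of that idea in plain Python. `ConversationMemory`, `chat`, and `echo_llm` are hypothetical names for this example and do not reflect LangChain's own memory classes.

```python
from typing import Callable, List, Tuple


class ConversationMemory:
    """Keeps earlier turns so they can be replayed as context."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def as_context(self) -> str:
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)

    def save(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))


def chat(user_input: str, memory: ConversationMemory, llm: Callable[[str], str]) -> str:
    """Build a prompt from the stored history plus the new input, then record the reply."""
    prompt = f"{memory.as_context()}\nUser: {user_input}\nAssistant:".lstrip()
    reply = llm(prompt)
    memory.save(user_input, reply)
    return reply


def echo_llm(prompt: str) -> str:
    # Fake model: reports how many earlier user turns are visible in the prompt.
    return f"I can see {prompt.count('User:') - 1} earlier message(s)."


memory = ConversationMemory()
first = chat("Hi, I'm carving an axolotl.", memory, echo_llm)
second = chat("What was I carving?", memory, echo_llm)
print(first)
print(second)
```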

Here’s an example of LangChain usage with Large Language Models

If you ask an LLM which branches were top performers in your chain of art supply stores, here’s what’s going to happen:

The model will build a logical SQL query to retrieve the results and serve you a bunch of fictitious but entirely plausible column names.

What would that look like if you used LangChain?

In this scenario, you could provide the LLM with a bunch of functions to use and then ask it to create a process for you. Then, after going through that process, you might get a single answer: 

Art supply store #1516 in Dallas is your top-performing store.

Note that some work needs to go into the formulation of the SQL query.

You can start by writing some functions like getTables() or getSchema(table). But how do you get the table schema if you don’t know the table names? Which of the table schemas includes data about sales per store anyway? 

Using LangChain, developers can rely on LLMs to produce each step and ask each of these questions. So, you no longer need to spend time providing input and manually organizing these phases.
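To make the helper functions mentioned above concrete, here is a small sketch using Python's built-in `sqlite3` module. The Python-style names `get_tables` and `get_schema` mirror getTables() and getSchema(table) from the text. The database layout and the sample rows are made up for illustration.

```python
import sqlite3
from typing import List


def get_tables(conn: sqlite3.Connection) -> List[str]:
    """List the table names in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]


def get_schema(conn: sqlite3.Connection, table: str) -> List[str]:
    """Return the column names of one table."""
    # PRAGMA cannot take bound parameters, so only pass trusted table names.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [r[1] for r in rows]


# Made-up sample data for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales (store_id INTEGER, amount REAL);
    INSERT INTO stores VALUES (1516, 'Dallas'), (1720, 'Austin');
    INSERT INTO sales VALUES (1516, 900.0), (1516, 450.0), (1720, 700.0);
""")

tables = get_tables(conn)
sales_columns = get_schema(conn, "sales")

# With real table and column names in hand, the query no longer relies on guesses.
top_store = conn.execute("""
    SELECT s.store_id, s.city, SUM(x.amount) AS total
    FROM sales AS x JOIN stores AS s ON s.store_id = x.store_id
    GROUP BY s.store_id, s.city
    ORDER BY total DESC
    LIMIT 1
""").fetchone()
print(tables, sales_columns, top_store)
```

In an LLM-driven workflow, the model would call the first function to discover table names, the second to inspect the promising ones, and only then write the final query.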

Why is LangChain captivating the industry?

LangChain is fascinating because it lets teams augment existing LLMs with memory and context. They can artificially add “reasoning” and complete more complex tasks with greater precision and accuracy.

Developers are excited about LangChain because it offers a new approach to creating user interfaces – where users can just ask for what they want rather than dragging and dropping elements or using code.

Consider a tool we’ve all used at some point: Microsoft PowerPoint. Take a look at the sheer number of buttons, each of which performs a specific job. Nobody would mind using natural language to describe exactly what they need and get a neat presentation style in a matter of seconds.

This explains the massive success of ChatGPT. It’s way more than a basic implementation of GPT. Its output comes from constant learning via a feedback loop. When a coding request is made, ChatGPT formalizes the request, presents two implementations, gives the reasoning for each, and explains the code.

Because code cannot be explained before it exists, the completion that explains the code must have been generated after the code itself, in a separate step.

How LangChain works

LangChain is available for Python and JavaScript, and it supports a wide range of language models, such as OpenAI's GPT-3, models hosted on Hugging Face, AI21's Jurassic-1 Jumbo, and others.

To start using LangChain, you first need access to a language model. This means either taking advantage of a publicly available language model, such as GPT-3, or training your own model.

Once completed, you can start developing applications with LangChain. LangChain offers a number of tools and APIs that make it simple to link language models to external data sources, interact with their surroundings, and develop complicated applications.

It creates a workflow by chaining together a sequence of components called links. Each link in the chain does something specific, such as:

  • Formatting of user input
  • Using a data source
  • Calling a language model
  • Processing the language model’s output

The links in a chain are connected in a sequential manner, with the output of one link serving as the input to the next. By chaining together small operations, the chain is able to do more complicated tasks.
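The sequential hand-off between links can be sketched in a few lines of plain Python. The four functions below correspond to the four kinds of links listed above. Their names, the fake model, and the sample fact are assumptions made for this example rather than LangChain components.

```python
from functools import reduce
from typing import Any, Callable, List

Link = Callable[[Any], Any]


def format_input(question: str) -> str:
    # Link 1: normalize the user input.
    return question.strip().rstrip("?") + "?"


def add_data(question: str) -> dict:
    # Link 2: stand-in for a data source lookup (made-up fact for illustration).
    return {"question": question, "context": "Axolotls are salamanders."}


def call_model(inputs: dict) -> str:
    # Link 3: stand-in for a language model call.
    return f"  ANSWER: Based on '{inputs['context']}', yes.  "


def parse_output(raw: str) -> str:
    # Link 4: clean up the raw model output.
    return raw.strip().removeprefix("ANSWER: ")


def run_chain(links: List[Link], value: Any) -> Any:
    """Feed the output of each link into the next one, in order."""
    return reduce(lambda acc, link: link(acc), links, value)


result = run_chain(
    [format_input, add_data, call_model, parse_output],
    "  Is an axolotl an amphibian  ",
)
print(result)
```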

What are the fundamental components of LangChain?

LLMs

Naturally, LangChain calls for LLMs – large language models that are trained on vast text and code datasets. You can use them to generate text, translate languages, and answer queries, among other things.

Figure: LangChain calls for LLMs (source: LangChain documentation)

Prompt templates 

Prompt templates are used to format user input so that the language model can understand it. You can use them to provide context for the user’s input or to describe the job that the language model should complete. A prompt template for a chatbot, for example, can include the user’s name and question.
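As a rough sketch of the chatbot example, the template below fills in a user's name and question using only Python's standard library. `SimplePromptTemplate` is a hypothetical class written for this illustration. LangChain ships its own prompt template classes, which this does not reproduce.

```python
from string import Formatter
from typing import List


class SimplePromptTemplate:
    """Minimal prompt template: a string with named {placeholders}."""

    def __init__(self, template: str):
        self.template = template
        # Collect the placeholder names, e.g. {name} and {question}.
        self.variables: List[str] = sorted(
            {field for _, field, _, _ in Formatter().parse(template) if field}
        )

    def format(self, **values: str) -> str:
        missing = [v for v in self.variables if v not in values]
        if missing:
            raise KeyError(f"Missing template variables: {missing}")
        return self.template.format(**values)


template = SimplePromptTemplate(
    "You are a helpful support bot.\n"
    "The user's name is {name}.\n"
    "Answer their question: {question}"
)
prompt = template.format(name="Dana", question="How do I reset my password?")
print(prompt)
```

Checking for missing variables up front means a half-filled prompt never reaches the model.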

Indexes

Indexes organize the documents you want the LLM to work with so that they can be searched efficiently. This data can comprise the text of the documents, their metadata, and their connections.

Retrievers

Retrievers are algorithms that look for specific information in an index. You can use them to locate documents that are relevant to a user’s query or documents that are most similar to a particular file. Retrievers are critical for increasing the LLM’s response speed and accuracy.
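A retriever can be as simple as a function that scores indexed documents against a query. The sketch below uses plain word overlap as the scoring rule, which is an assumption chosen for brevity. Production retrievers typically rely on embeddings or a search engine instead. The index contents and function names are made up for this example.

```python
from typing import Dict, List, Set

# A tiny "index": document ids mapped to text and metadata (made-up data).
INDEX: Dict[str, Dict[str, str]] = {
    "doc1": {"text": "Axolotls live in lakes near Mexico City.", "source": "wiki"},
    "doc2": {"text": "Wood carving needs sharp chisels and patience.", "source": "blog"},
    "doc3": {"text": "Lakes store fresh water for cities.", "source": "report"},
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words without punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(query: str, k: int = 2) -> List[str]:
    """Return ids of the k documents sharing the most words with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(d["text"])), doc_id) for doc_id, d in INDEX.items()]
    # Drop documents with no overlap, then sort by score (ties broken by id).
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:k]]


hits = retrieve("Where do axolotls live in lakes?")
print(hits)
```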

Output parsers 

Output parsers are in charge of formatting the replies that an LLM generates. They can adjust the structure of the response, remove unwanted content, or add extra information. Output parsers are key to ensuring that the LLM’s replies are simple to interpret and apply.
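For instance, a model may wrap the data you asked for in conversational filler. The following sketch pulls a JSON object out of such a reply using only the standard library. `parse_json_reply` and the sample reply are hypothetical and written for this illustration.

```python
import json
import re
from typing import Any, Dict


def parse_json_reply(raw: str) -> Dict[str, Any]:
    """Extract the JSON object embedded in a chatty model reply."""
    # Match from the first "{" to the last "}" in the reply.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in model output")
    return json.loads(match.group(0))


raw_reply = 'Sure! Here is the result:\n{"store": 1516, "city": "Dallas"}\nAnything else?'
parsed = parse_json_reply(raw_reply)
print(parsed["city"])
```

Downstream code can then work with a dictionary instead of free text.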

Vector store

Figure: Vector stores (source: LangChain documentation)

A vector store houses embeddings, which are numerical vector representations of text such as words, phrases, or document chunks. It comes in handy for tasks like answering questions and summarizing. A vector database, for example, can be used to locate all entries whose meaning is close to the word “cat.”
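The "cat" example can be sketched with cosine similarity over a handful of vectors. The three-dimensional vectors below are hand-written toy values, not real embeddings, and `most_similar` is a hypothetical helper. A real system would obtain vectors from an embedding model and store them in a dedicated vector database.

```python
import math
from typing import Dict, List, Tuple

# Hand-written toy vectors; a real system would get these from an embedding model.
VECTORS: Dict[str, List[float]] = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "dog": [0.7, 0.3, 0.2],
    "car": [0.1, 0.05, 0.95],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word: str, k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored words whose vectors are closest to the given word."""
    query = VECTORS[word]
    scores = [(other, cosine(query, vec)) for other, vec in VECTORS.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


neighbors = most_similar("cat")
print([word for word, _ in neighbors])
```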

Agents

Agents are programs that can reason about issues and divide them into smaller subtasks. You can use an agent to direct the flow of a chain and decide which jobs to do – for example, assess whether a language model or a human expert is best suited to answer a user’s inquiry.
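The routing decision described above can be sketched as a function that picks a tool and then runs it. In a real agent the choice would come from an LLM. Here `choose_tool` is a hard-coded stand-in, and all names are hypothetical, so the example stays runnable offline.

```python
from typing import Callable, Dict


def calculator(question: str) -> str:
    # Handles questions of the form "a + b" (illustrative only).
    left, right = question.split("+")
    return str(int(left) + int(right))


def ask_human(question: str) -> str:
    return f"Escalated to a human expert: {question}"


TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator, "human": ask_human}


def choose_tool(question: str) -> str:
    """Stand-in for the model's decision; a real agent would ask an LLM."""
    parts = question.split("+")
    if len(parts) == 2 and all(p.strip().isdigit() for p in parts):
        return "calculator"
    return "human"


def run_agent(question: str) -> str:
    """Pick a tool for the question, then run it."""
    tool_name = choose_tool(question)
    return TOOLS[tool_name](question)


math_answer = run_agent("12 + 30")
other_answer = run_agent("Is my contract still valid?")
print(math_answer)
print(other_answer)
```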

8 benefits of using LangChain

  1. Scalability – LangChain may be used to create applications capable of handling massive volumes of data.
  2. Adaptability – The framework’s adaptability allows it to be used to develop a wide range of applications, from chatbots to question-answering systems.
  3. Extensibility – Developers may add their own features and functionality to the framework because it is expandable.
  4. Ease of use – LangChain offers a high-level API for connecting language models to various data sources and building complicated applications.
  5. Open source –  LangChain is an open-source framework that is free to use and modify.
  6. Vibrant community – There is a huge and active community of LangChain users and developers that can assist and support you.
  7. Great documentation – The documentation is thorough and simple to understand.
  8. Integrations – LangChain may be integrated with various frameworks and libraries, such as Flask and TensorFlow.

How to get started with LangChain

LangChain’s source code is accessible on GitHub. You can download and install the source code on your machine. 

LangChain is also available as a Docker image, making it simple to install on cloud platforms.

You can also install it with a simple pip command in Python: pip install langchain

If you want to install all of LangChain’s integration requirements, use the following command: pip install langchain[all]

Now you’re ready to start a new project!

  1. Create a new directory for your project and install LangChain there, ideally inside a virtual environment.
  2. Next, you need to import the required modules and make a chain, which is a series of links where each performs a certain function. For the example in the next step, that means PromptTemplate from langchain_core.prompts, StrOutputParser from langchain_core.output_parsers, and OpenAI from the separately installed langchain-openai package (which also needs an OpenAI API key).
  3. To make a chain, compose a prompt, a model, and an output parser with the | operator. Here’s a snippet that builds a chain that calls a language model and returns its response as text: chain = PromptTemplate.from_template("Create an axolotl sculpture") | OpenAI() | StrOutputParser()
  4. To execute a chain, call the invoke() method on the chain object, for example chain.invoke({}).
  5. The output of a chain is the output of the chain’s last link, and invoke() returns it directly.
  6. Finally, you can personalize the chain by changing the properties of links or adding/removing them.

What kind of apps can you build with LangChain?

Content generation and summarization

LangChain comes in handy for creating summarizing systems capable of producing summaries of news articles, blog entries, and other sorts of text. Another common use case is content generators that generate writing that is both helpful and interesting.

Chatbots

Naturally, chatbots or any other system that responds to questions is a great use case for LangChain. Such systems will be able to access and process data from a range of sources, such as databases, APIs, and the internet. Chatbots can respond to queries, provide customer support, or even generate unique text formats such as poetry, code, screenplays, musical pieces, email, letters, and so on.

Data analysis software

LangChain can also be used to create data analysis tools that assist users in understanding the links between various data pieces.

Is LangChain open-source?

Yes, LangChain is an open-source project that is entirely free to use. You can get the source code from GitHub and use it to create your own apps. Also, through LangChain’s integrations, you can use pre-trained models from providers such as OpenAI and Hugging Face.

Wrap up: the future of LangChain

The primary use case for LangChain at the moment is chat-based apps on top of LLMs (particularly ChatGPT), also called “chat interfaces.” In a recent interview, the company’s CEO, Harrison Chase, said the ideal use case right now is “chat over your documents.” LangChain also provides additional features to improve the conversation experience for applications, such as streaming, which means delivering the output of the LLM token by token rather than all at once.

He also hinted at the future evolution of such interfaces:

“Long term, there’s probably better UX’s than chat. But I think at the moment that’s the immediate thing that you can stand up super-easily, without a lot of extra work. In six months, do I expect chat to be the best UX? Probably not. But I think right now, what’s the thing that you can build at the moment to deliver value, it’s probably that [i.e. chat].”

In the future, we might see teams developing applications powered by LangChain for other areas. Given the novelty of designing apps with LLMs, frameworks like LangChain are indispensable for providing tools to help address some of the challenges with LLMs in the data science world. Install LangChain and see what it can do for yourself.
