Why the Fury? Building a new flow engine from scratch

May 2, 2023

Introduction to ChainFury

ChainFury started as a weekend hackathon project but has since grown into something much bigger. The core idea behind it is rapid development (with chains), deployment (with an embeddable chatbot UI), and gathering feedback on performance. Initially, it took its inspiration from LangFlow, which is in turn built on top of LangChain.

Chandrani wrote a great introductory blog post on ChainFury.

Successes and Challenges with LangChain and LangFlow

LangFlow had a brilliant specification of what the UI should look like, with a Template for each node from which the forms users fill in could be generated. This standard let us build the front end against a clean spec and focus on the back end during the 48-hour hackathon. There we focused solely on CRUDL (create, read, update, delete, list) for chats, chatbots, and other resources, industry-standard authentication, and a JS-embeddable chatbox.
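To make the template idea concrete, here is a small sketch of turning a node template into form fields that a front end could render. The template shape, the field names (model_name, temperature, openai_api_key) and the template_to_form helper are all hypothetical and deliberately simplified; they are not LangFlow's actual schema or API.

```python
from typing import Any, Dict, List

# Hypothetical, simplified node template (NOT LangFlow's actual schema):
# each field declares its type, whether it is required, and an optional default.
LLM_NODE_TEMPLATE: Dict[str, Dict[str, Any]] = {
    "model_name": {"type": "str", "required": True, "default": "gpt-3.5-turbo"},
    "temperature": {"type": "float", "required": False, "default": 0.7},
    "openai_api_key": {"type": "str", "required": True, "password": True},
}

# map template value types to HTML input types for the generated form
INPUT_TYPES = {"str": "text", "float": "number", "int": "number", "bool": "checkbox"}


def template_to_form(template: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn a node template into a list of form-field specs, one per template field."""
    fields = []
    for name, spec in template.items():
        # secrets get a masked input; everything else is looked up by type
        input_type = "password" if spec.get("password") else INPUT_TYPES[spec["type"]]
        fields.append({
            "name": name,
            "input": input_type,
            "required": spec.get("required", False),
            "value": spec.get("default"),  # None when the template has no default
        })
    return fields


for form_field in template_to_form(LLM_NODE_TEMPLATE):
    print(form_field)
```

Because the form is derived entirely from the template, the front end never needs node-specific code, which is what let us treat the UI as a fixed spec during the hackathon.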

I want to talk a little more about the challenges we faced. First and foremost, LangChain is very hard to use:

There is no straightforward syntax
Interactivity with chats is not built-in
Support for multiple modalities is hard

These are hard problems, and LangChain does a great job of managing the immense complexity of hundreds of different APIs while abstracting them away from users. LangFlow has other concerns as well. For example, it determines the execution steps using the langflow.utils.payload.get_root_node() function, which looks like this:

def get_root_node(graph):
    """
    Returns the root node of the template.
    """
    incoming_edges = {edge.source for edge in graph.edges}
    return next((node for node in graph.nodes if node not in incoming_edges), None)

This might look like a working solution, but there is a bug hiding in plain sight: what if the DAG (graph) was constructed incorrectly? The function assumes that whoever built the DAG in the front end did it correctly. If more than one node is not the source of any edge, it silently returns whichever one it finds first, and if every node is the source of some edge (as in a pure cycle), it returns None. You cannot guarantee correct execution in this form. The right algorithm is a topological sort, which guarantees that the DAG is executed in a valid order, and detects when the graph is not a DAG at all, at the cost of a small runtime overhead.
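A topological sort can be sketched with Kahn's algorithm. This is a minimal illustration, assuming the graph is given as a list of node ids plus (source, target) edge pairs rather than LangFlow's actual graph objects; the function name topological_sort is our own.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Tuple


def topological_sort(
    nodes: Iterable[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]
) -> List[Hashable]:
    """Kahn's algorithm: return an order in which, for every edge (src, dst),
    src comes before dst. Raise ValueError if the graph is not a valid DAG."""
    nodes = list(nodes)
    in_degree: Dict[Hashable, int] = {n: 0 for n in nodes}
    children: Dict[Hashable, List[Hashable]] = {n: [] for n in nodes}
    for src, dst in edges:
        if src not in in_degree or dst not in in_degree:
            raise ValueError(f"edge ({src!r}, {dst!r}) references an unknown node")
        children[src].append(dst)
        in_degree[dst] += 1

    # start from every node that has no dependencies
    ready = deque(n for n in nodes if in_degree[n] == 0)
    order: List[Hashable] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # "remove" this node's outgoing edges; children with no remaining
        # dependencies become ready to run
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)

    # nodes never emitted lie on, or downstream of, a cycle
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle; it is not a DAG")
    return order


print(topological_sort(["prompt", "llm", "chain"], [("prompt", "chain"), ("llm", "chain")]))
# ['prompt', 'llm', 'chain']
```

The whole sort runs in O(V + E) time, which is the "small overhead" mentioned above, and a malformed graph fails loudly instead of silently picking a node.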

This is just a small critique, and we stand on the shoulders of giants.

Introducing Fury

We therefore decided to rebuild the processing engine from the ground up, with abstractions that are fairly future-proof and scalable. A lot of our production code is written in Golang, which has helped our team of self-taught engineers design systems with clearly separated responsibilities. Python, despite all its greatness, is a limited language for building complex applications that must interact with other systems whose behavior is unpredictable, but more on this later.

Fury, which is available at chainfury.fury, keeps the Agent (chatbot) as its centerpiece and is inspired by the von Neumann architecture, the backbone of practically all modern computing.

Each Agent will:

Be interactable via chat: this is the new standard interface of the 2020s
Have its own memory: the Agent can remember whatever it wants and store it in whatever patterns it wants
Use multiple source models: models can produce outputs in any modality
Run chains: developers can build their own flows and make them unique

The Design of Fury: Pseudocode and Features

Here is the pseudocode I have in mind for this:

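from __future__ import annotations
from typing import Any, List, Optional
class Model:
    # user can subclass this and override __call__
    def __call__(self, *args, **kwargs) -> Any:
        raise NotImplementedError
class Memory:
    # user can subclass this and override get/put
    def get(self, key: str) -> Any:
        raise NotImplementedError
    def put(self, key: str, value: Any) -> None:
        raise NotImplementedError
class Chain:
    def __init__(self, agent: Optional[Agent] = None):
        # so the chain can access all the underlying elements of the Agent, including:
        # - models
        # - memories
        # the Agent binds itself here when it is constructed (see Agent.__init__),
        # which avoids needing an Agent before its Chain exists
        self.agent = agent
    # user can subclass this and override __call__ to implement any flow
    def __call__(self, user_input: Any) -> Any:
        raise NotImplementedError
# the main class; user can either subclass this or provide the chain
class Agent:
    def __init__(self, models: List[Model], memories: List[Memory], chain: Chain):
        self.models = models
        self.memories = memories
        self.chain = chain
        self.chain.agent = self  # give the chain access to models and memories
    def __call__(self, user_input: Any) -> Any:
        return self.chain(user_input)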

Model allows any kind of model to be plugged in, whether it is OpenAI GPT, Stable Diffusion, or a locally running endpoint.
Memory lets users choose where to store things: a DB, a file, etc. I am not yet sure what the final APIs will look like, but starting with a key/value store never hurt anyone.
Agent is the simplest: its primary job is to act as a namespace and a standard interface for calling the chain.
Chain lets the user implement any kind of flow they want.
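Here is a sketch of how these pieces could fit together in a tiny working agent. The classes are written standalone (duck-typed against the interfaces described above) so the snippet runs on its own; EchoModel, DictMemory and RememberingChain are hypothetical stand-ins, not part of chainfury.fury.

```python
from typing import Any, Dict, List


class EchoModel:
    # hypothetical stand-in for a real model (e.g. a GPT wrapper): it just
    # upper-cases its input so the example runs without any API keys
    def __call__(self, prompt: str) -> str:
        return prompt.upper()


class DictMemory:
    # the simplest possible key/value memory: a Python dict held in RAM
    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        return self._store.get(key)

    def put(self, key: str, value: Any) -> None:
        self._store[key] = value


class RememberingChain:
    # hypothetical flow: call the first model, then record the turn in the
    # first memory so the next turn can refer back to it
    def __init__(self) -> None:
        self.agent = None  # bound by the Agent below

    def __call__(self, user_input: str) -> str:
        model = self.agent.models[0]
        memory = self.agent.memories[0]
        previous = memory.get("last_input")
        reply = model(user_input)
        memory.put("last_input", user_input)
        if previous is not None:
            reply += f" (last time you said: {previous})"
        return reply


class Agent:
    # mirrors the Agent class above: a namespace plus a standard call interface,
    # and it binds the chain back to itself
    def __init__(self, models: List[Any], memories: List[Any], chain: Any):
        self.models = models
        self.memories = memories
        self.chain = chain
        chain.agent = self

    def __call__(self, user_input: Any) -> Any:
        return self.chain(user_input)


agent = Agent(models=[EchoModel()], memories=[DictMemory()], chain=RememberingChain())
print(agent("hello"))  # HELLO
print(agent("again"))  # AGAIN (last time you said: hello)
```

Swapping DictMemory for a DB- or file-backed class only requires implementing the same get/put pair; the chain does not change.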

Sharing Responsibilities

It is still not clear how all the different outputs will be standardized; for example, a Stable Diffusion output is an image while a ChatGPT output is text. However, we will provide enough abstractions and guarantees that flow I/O stays consistent, and developers can refer to the docs or the chat to find out more.
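One possible way to keep flow I/O consistent across modalities is a single output envelope that every model returns. This is purely illustrative: ModelOutput, text_output, image_output and render are hypothetical names, not Fury's API, and the final design may look quite different.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ModelOutput:
    # hypothetical envelope: every model returns one of these, whatever its modality
    modality: str  # e.g. "text" or "image"
    data: Any      # str for text, raw bytes for an image, ...
    metadata: Dict[str, Any] = field(default_factory=dict)


def text_output(text: str, **metadata: Any) -> ModelOutput:
    return ModelOutput(modality="text", data=text, metadata=metadata)


def image_output(png_bytes: bytes, **metadata: Any) -> ModelOutput:
    return ModelOutput(modality="image", data=png_bytes,
                       metadata={"mime_type": "image/png", **metadata})


def render(output: ModelOutput) -> str:
    # downstream steps (e.g. the chat UI) branch on modality, not on which model ran
    if output.modality == "text":
        return output.data
    if output.modality == "image":
        return f"<image: {len(output.data)} bytes, {output.metadata['mime_type']}>"
    raise ValueError(f"unsupported modality: {output.modality}")


print(render(text_output("Hello!", source="chatgpt")))
print(render(image_output(b"\x89PNG", source="stable-diffusion")))
```

The point of the envelope is that a chain can pass any model's result to the next step without knowing in advance whether it is text or an image.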

Future

I hinted above that ChainFury might be one of the last projects. The reason is simple: if chains are the new form of development, and memory lets an agent store abstracted concepts effectively, then all software development can eventually be abstracted, stored in a DB, and applied as and when needed.

If you have any thoughts on this, you can raise an issue or start a discussion.

Written By

Yash Bonde

Head of Research