smorrins ML Blog - Processing food orders with LangChain & Huggingface

%pip install --q --upgrade langchain-huggingface

Note: you may need to restart the kernel to use updated packages.

Picture the following scenario: You’re running a pizza delivery service which takes customers orders via phone - but you’re overwhelmed with the phone calls. You decide to automate everything and have already found a good speech recognition model, however you’re not yet sure on how to process the text. In this post (first of two), I want to use LangChain together with a Microsoft Phi model hosted on HuggingFace to extract the relevant data in JSON format so that it is processable.

Our hypothetical robot taking orders on the phone.

Let’s load the model first!

from langchain_huggingface import HuggingFaceEndpoint, HuggingFacePipeline, ChatHuggingFace

model = HuggingFaceEndpoint(
    repo_id="microsoft/Phi-3.5-mini-instruct",
    task="text-generation",
    max_new_tokens=1024, # the maximum number of output tokens we want
)

chat = ChatHuggingFace(llm=model, verbose=True)

This is a small example taken from the docs (adapted to use a different running model and adapted for our example, the 2nd post will extend this). As you can see, LangChain makes it quite easy to set up a model - be it locally or remote.

We’ll be using Microsofts Phi SLM which actually works very well for this task. At the time of writing, DeepSeek’s R1 was recently released - however it is not well suited for this task as I’ll show later.

The model is being run on HuggingFace and you won’t need any API keys if you’re using Phi 3.5 mini - eliminating the oftentimes complex setup. However Phi 3.5 can also easily be run locally since it’s very small in terms of parameters.

messages = [
    ("system", "You are taking orders at a pizza place. Extract the type of pizza the user would like to order."),
    ("human", "Hey, could I get one pizza margherita?"),
]

chat.invoke(messages).content

'Yes, you would like to order a Margherita pizza. Would you like any specific toppings or additional requests with your order?'

As we can see, that kind of worked, but we cannot really process this as we get a lot of junk! Therefore, we have to aid the model a bit as to how it should handle this.

In this case, we can add few-shot prompts to show some examples of how to handle the user input.

system_prompt = """
    You are taking orders at a pizza place. You are given orders from customers. \
    Extract the type of pizza the customer would like to order.

    Some examples:
    
    Customer: Could I get one pizza margherita?
    Output: 1x Pizza margherita

    Customer: Can we get three pizzas with mushrooms and one pizza with spinach?
    Output: 3x Pizza with mushrooms, 1x Pizza with spinach
"""

messages = [
    ("system", system_prompt),
    ("user", "Hey, could I get one pizza margherita and one pizza with ham?")
]

chat.invoke(messages).content

'Output: 1x Pizza margherita, 1x Pizza with ham'

Great, that worked well so far! This is processable, but it’s still a simple string. Let’s step it up a little by using a predefined schema.

from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate

class Pizza(BaseModel):
    """A pizza with type and amount"""
    pizza_type: str = Field(description="The type of pizza (e.g. margherita)")
    amount: int = Field(description="How many pizzas should be ordered (e.g. one)")

class PizzaOrder(BaseModel):
    """A parsed pizza order from a user"""
    pizzas: list[Pizza] = Field(description="A list of pizzas with the type of pizza and amount ordered")
    street_address: str = Field(description="A street address to deliver the pizza to (e.g.  1711 Hoffman Avenue)")
    city: str = Field(description="The city where the pizza should be delivered (e.g. New York)")
    name: str = Field(description="The name of the person ordering the pizza (e.g. Adam Parker)")

structured_system_prompt = """
    You are taking orders at a pizza place. You are given orders from customers. \
    Extract the types of pizza and the amounts the customer would like to order as well as the required customer data.

    An example:
    
    Customer: Hey this is John Doe, can we get three pizzas with mushrooms and one pizza with spinach delivered to John Doe at 1234 Sample Avenue in New Amsterdam?
    Output: {{"pizzas": [ {{"pizza_type": "pizza with mushrooms", "amount": 3}}, {{"pizza_type": "pizza with spinach", "amount": 1}}], "street_address": "1234 Sample Avenue", "city": "New Amsterdam", "name": "John Doe"}}
"""

LangChain allows you to use pydantic models or typedDicts - pydantic models have the advantage that they include methods to verify the data (which can be extended).

user_prompt = "Hi, this Andrea - listen can I get one pizza with ham delivered to 3847 White River Way in Salt Lake City?"
chat.invoke([
    ("system", structured_system_prompt),
    ("user", user_prompt)
])

AIMessage(content='Output: {{"pizzas": [ {{"pizza_type": "pizza with ham", "amount": 1}}], "street_address": "3847 White River Way", "city": "Salt Lake City", "name": "Andrea"}\n', additional_kwargs={}, response_metadata={'token_usage': ChatCompletionOutputUsage(completion_tokens=61, prompt_tokens=213, total_tokens=274), 'model': '', 'finish_reason': 'stop'}, id='run-e3d6cf54-499f-46cd-9408-7763be1dccaf-0')

Alright, that worked! However LangChain offers a better way of doing this with the model data without requiring us to parse the data ourselves. Unfortunately, this seems to be broken for HuggingFace models. Therefore we have to do it manually like this:

structured_llm = chat.bind_tools([PizzaOrder])
prompt_template = ChatPromptTemplate([
    ("system", structured_system_prompt),
    ("user", "{order}")
])

structured_llm = prompt_template | structured_llm

output = structured_llm.invoke({"order": user_prompt})
# usually way easier, but doesn't work for huggingface models :(
# https://github.com/langchain-ai/langchain/discussions/26321
structured_output = output.additional_kwargs["tool_calls"][0].function.arguments
structured_output

{'name': 'Andrea',
 'pizzas': [{'pizza_type': 'pizza with ham', 'amount': 1}],
 'street_address': '3847 White River Way',
 'city': 'Salt Lake City'}

structured_output = PizzaOrder.model_validate(structured_output)
structured_output

PizzaOrder(pizzas=[Pizza(pizza_type='pizza with ham', amount=1)], street_address='3847 White River Way', city='Salt Lake City', name='Andrea')

Ok, that seems to have worked! Normally there is a much better way to do this like this:

structured_llm = llm.with_structured_output(PizzaOrder)

But as I said, this doesn’t work for HuggingFace (here’s the docs if you use another model).

Also note that we used a template for our prompt which makes it easier to construct more complex prompts - we just have to supply values for the placeholders (Important: Placeholder syntax is: {placeholder}, therefore if you want to use JSON (as we did for our example), you’d have to use double braces to {{ escape }}).

Let’s try a more complicated order:

complicated_prompt = """
Hi, can I get a pizza with blue chesse and mushrooms?
And two pizzas with mushrooms and ham - ah wait, make it three. 
Can you deliver it to John Doe, 1234 Oak Avenue in New Amsterdam?
"""

def get_pizza_order(prompt: str) -> PizzaOrder:
    llm_output = structured_llm.invoke({"order": prompt})
    structured_output = llm_output.additional_kwargs["tool_calls"][0].function.arguments
    return PizzaOrder.model_validate(structured_output)

get_pizza_order(complicated_prompt)

PizzaOrder(pizzas=[Pizza(pizza_type='pizza with blue cheese and mushrooms', amount=1), Pizza(pizza_type='pizza with mushrooms and ham', amount=3)], street_address='1234 Oak Avenue', city='New Amsterdam', name='John Doe')

Ok that worked well, note that it also correctly parsed the amount for the “mushrooms & ham” pizza. Let’s try something different now: Making the bot more natural and usable in pratice.

A more natural order process

If you’d order a pizza via a chatbot / phone, you probably wouldn’t directly state all the information required in one sentence. Instead, you’d probably want to achieve a more natural conversation - where you first state your order and the other person at the pizza place then asks you for the delivery address (or something like that).

Since the LLM we’re using doesn’t have context like ChatGPT, we have to pass the answer back into the model to so it know’s what it has to ask for.

We’ll try a naive approach first by just adapting the prompt and the pydantic model and then we’ll take a more sophisticated approach by using LangGraph for conversational agents (2nd post).

natural_flow_prompt = """
    You are taking orders at a pizza place. You are given orders from customers. \
    Extract the types of pizza and the amounts the customer would like to order as well as the required customer data.

    You may be given incomplete data, in that case also respond with an appropriate question asking for relevant information for the customer. 
    If the required data is complete, respond with the data and a quick message for the customer.

    An example:
    Customer: Hey is this the pizza place?
    Output: {{ "customer_message": "Yes, would you like to order something?", finished: false }}
    Customer: Yes, can I get three pizzas with mushrooms and one pizza with spinach?
    Output: {{"pizzas": [ {{"pizza_type": "pizza with mushrooms", "amount": 3}}, {{"pizza_type": "pizza with spinach", "amount": 1}}], "finished": false, "customer_message": "Sure! Anything else?"}}
    Customer: No, that's it!
    Output: {{"pizzas": [ {{"pizza_type": "pizza with mushrooms", "amount": 3}}, {{"pizza_type": "pizza with spinach", "amount": 1}}], "finished": false, "customer_message": "Alright! Where do you want the delivered to?"}}
    Customer: 1234 Oak Avenue.
    Output: {{"pizzas": [ {{"pizza_type": "pizza with mushrooms", "amount": 3}}, {{"pizza_type": "pizza with spinach", "amount": 1}}], "street_address": "1234 Oak Avenue", "finished": false, "customer_message": "Which city?"}}
    Customer: New Amsterdam.
    Output: {{"pizzas": [ {{"pizza_type": "pizza with mushrooms", "amount": 3}}, {{"pizza_type": "pizza with spinach", "amount": 1}}], "street_address": "1234 Oak Avenue", city: "New Amsterdam", "finished": false, "customer_message": "And what's your name?"}}
    Customer: John Doe.
    Output: {{"pizzas": [ {{"pizza_type": "pizza with mushrooms", "amount": 3}}, {{"pizza_type": "pizza with spinach", "amount": 1}}], "street_address": "1234 Oak Avenue", city: "New Amsterdam", "name: "John Doe", "finished": true, "customer_message": "Alright that's it! Thanks for your order - it should arrive within the hour."}}

    In addition to the customers prompt, you will also be given your previous output to fill in the missing data.
"""

As you can see, that’s a very long system prompt! Let’s see how good that approach works.

from typing import Optional

class PizzaOrderNatural(BaseModel):
    """A parsed pizza order from a user"""
    pizzas: Optional[list[Pizza]] = Field(description="A list of pizzas with the type of pizza and amount ordered")
    street_address: Optional[str] = Field(description="A street address to deliver the pizza to (e.g.  1711 Hoffman Avenue)")
    city: Optional[str] = Field(description="The city where the pizza should be delivered (e.g. New York)")
    name: Optional[str] = Field(description="The name of the person ordering the pizza (e.g. Adam Parker)")
    finished: bool = Field(description="A boolean inidcating if all the required data has been acquired")
    customer_message: str = Field(description="A message to the customer indicating if other data is required")

natural_template = ChatPromptTemplate([
    ("system", natural_flow_prompt),
    ("ai", "{prev_output}"),
    ("user", "{user_string}")
])

natural_llm = chat.bind_tools([PizzaOrderNatural])
natural_llm = natural_template | natural_llm

def send_message(user_prompt: str, previous_output: str):
    output = natural_llm.invoke({"prev_output": previous_output, "user_string": user_prompt})
    structured_output = output.additional_kwargs["tool_calls"][0].function.arguments
    strucuted_output = PizzaOrderNatural.model_validate(structured_output) 
    return structured_output["customer_message"], structured_output

msg, parsed = send_message("Hi, can I order a pizza?", "{}")
msg

'What type of pizza would you like to order and how many?'

msg, parsed = send_message("I'd like one pizza margherita and one pizza pepperoni!", str(parsed))
msg

'Your address please?'

msg, parsed = send_message("Ludwigstraße 15", str(parsed))
msg

'No further details needed. Thank you!'

parsed

{'customer_message': 'No further details needed. Thank you!',
 'pizzas': [{'pizza_type': 'margherita', 'amount': 2},
  {'pizza_type': 'vegetarian', 'amount': 1}],
 'street_address': 'Ludwigstraße 15',
 'city': 'Munich',
 'name': 'Hans Müller',
 'finished': True}

Well it got the pizzas correctly, but it hallucinated the city and even a name! Of course we could just use a different, better model (Phi is relatively small), but let’s try a different approach first.

natural_flow_prompt = """
    You are taking orders at a pizza place. You are given orders from customers. \
    Extract the types of pizza and the amounts the customer would like to order as well as the required customer data.

    You may be given incomplete data, in that case also respond with an appropriate question asking for relevant information for the customer. 
    If the required data is complete, respond with the data and a quick message for the customer.

    In addition to the customers prompt, you will also be given your previous output to fill in the missing data.
    Do not fill in data that was not yet mentioned by the customer (except taking data from the previosu response). If you are unsure about certain details, ask the customer.

    An example:
    Customer: Hey is this the pizza place?
    Output: {{ "customer_message": "Yes, would you like to order something?", finished: false }}
    Customer: Yes, can I get three pizzas with mushrooms and one pizza with spinach?
    Output: {{"pizzas": [ {{"pizza_type": "pizza with mushrooms", "amount": 3}}, {{"pizza_type": "pizza with spinach", "amount": 1}}], "finished": false, "customer_message": "Sure! Anything else?"}}
    Customer: No, that's it!
    Output: {{"pizzas": [ {{"pizza_type": "pizza with mushrooms", "amount": 3}}, {{"pizza_type": "pizza with spinach", "amount": 1}}], "finished": false, "customer_message": "Alright! Where do you want the delivered to?"}}
    Customer: 1234 Oak Avenue.
    Output: {{"pizzas": [ {{"pizza_type": "pizza with mushrooms", "amount": 3}}, {{"pizza_type": "pizza with spinach", "amount": 1}}], "street_address": "1234 Oak Avenue", "finished": false, "customer_message": "Which city?"}}
    Customer: New Amsterdam.
    Output: {{"pizzas": [ {{"pizza_type": "pizza with mushrooms", "amount": 3}}, {{"pizza_type": "pizza with spinach", "amount": 1}}], "street_address": "1234 Oak Avenue", city: "New Amsterdam", "finished": false, "customer_message": "And what's your name?"}}
    Customer: John Doe.
    Output: {{"pizzas": [ {{"pizza_type": "pizza with mushrooms", "amount": 3}}, {{"pizza_type": "pizza with spinach", "amount": 1}}], "street_address": "1234 Oak Avenue", city: "New Amsterdam", "name: "John Doe", "finished": true, "customer_message": "Alright that's it! Thanks for your order - it should arrive within the hour."}}

"""

natural_template = ChatPromptTemplate([
    ("system", natural_flow_prompt),
    ("ai", "{prev_output}"),
    ("user", "{user_string}")
])

natural_llm = chat.bind_tools([PizzaOrderNatural])
natural_llm = natural_template | natural_llm

def run_loop():
    finished = False
    structured = {}
    while not finished:
        user_prompt = input("User input: ")
        message, structured = send_message(user_prompt, str(structured))
        print(message)
        finished = structured["finished"]
    print(structured)

run_loop()

User input:  Hi can I get a pizza?
User input:  Uh, two pizzas with ham and mushrooms, one pizza pepperoni and one pizza with tuna and onions.
User input:  Ludwigstraße 1 in Munich
User input:  No.
User input:  No extra cheese, Ludwigstraße 1 in Munich
User input:  No extra toppings, do you really want my phone number??
User input:  No, thanks

Could you please specify the type of pizza you would like and the delivery address?
Please provide your delivery address and the city.
Could you please confirm if you need any toppings added to the pizzas?
Could you also let me know would you like to add extra cheese to your pepperoni pizzas and the delivery address please?
Thank you, Emily! Just to confirm, would you like to specify any toppings other than the pepperoni? Also, may I have your phone number for order updates?
No extra toppings. I prefer not to share my phone number. Is there anything else I need to provide for your order?
No other information is needed, thank you.
{'customer_message': 'No other information is needed, thank you.', 'pizzas': [{'pizza_type': 'cheese pizza', 'amount': 2}], 'street_address': '123 Baker Street', 'city': 'London', 'name': 'Jane Smith', 'finished': True}

That was even worse! It completely made up information and nothing was extracted correctly. Using a different model might be better, but I want to show how even such a small model can produce desirable results. We can do this by extracing the data sequentially.

Using LangGraph for a more structured approach

Now we have got some experience with the model - apparently it’s good at extracting one piece of information at a time. Let’s work with that approach in the next part to create a feasible model using LangGraph.

LangGraph is also part of the LangChain ecosystem by extending LangChains capabilities. LangGraph allows to construct graphs where the state is manipulated by nodes connected by edges. Each node is just a function (can be an LLM call, but can also be a simple python function or a tool call (something LLMs can use)) which is run.

We can model the above situation like this: Rather than passing the previous data & the system prompt each time, we instruct the LLM to extract the data that we do not yet have one entry at a time. After an entry has been extracted, a python function will check if the data is complete and otherwise instruct the LLM to get the next piece of data.

I hope that this post has been helpful enough so far, I always appraciate feedback!