When running SmolAgents CodeAct for tool calling, we often observe that smaller open-source models struggle with complex tool-use tasks — and sometimes even fail at simple ones. While careful prompt engineering can mitigate this problem, it’s not a sustainable solution, especially in dynamic agentic systems where any workflow change can disrupt tool-calling accuracy.
To address this issue at its core, the ideal approach is to train/fine-tune models to use tools effectively. However, this is a non-trivial task that requires setting up complex machine learning pipelines tightly integrated with the agentic system — something that can be challenging for most developers.
To make this process easier, we've developed ToolBrain, a lightweight, MIT-licensed open-source library that removes the need to build these pipelines from scratch. For more information, see https://github.com/ToolBrain/ToolBrain
✨ Key Features
🤖 Learning algorithms: Supports GRPO, DPO, and supervised learning.
🎯 Flexible rewards: Define your own reward functions or use LLM-as-judge.
🔧 Tool management: Scalable retrieval for managing large tool collections.
📊 Knowledge distillation: Distill large teacher models into smaller student models for efficiency.
🚀 Zero-learn: Automatically generate training tasks.
⚡ Efficient training: Supports FP16 finetuning, LoRA, Unsloth, and BitsAndBytes for resource-efficient training.
🧠 Multiple agent frameworks: Supports SmolAgent and LangChain, with more coming soon.
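Below is a minimal end-to-end example: define a tool, prepare a small training dataset, wrap a SmolAgents CodeAgent in a Brain, and train it with GRPO.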
from smolagents import tool, TransformersModel, CodeAgent
from toolbrain import Brain
from toolbrain.rewards import reward_exact_match

# --- 1. Define tools and reward function (user-defined) ---
@tool
def add(a: int, b: int) -> int:
    """Add two integers.

    Args:
        a (int): First addend.
        b (int): Second addend.

    Returns:
        int: Sum of a and b.
    """
    return a + b

# --- 2. Prepare training data ---
training_dataset = [
    {
        "query": "Use the add tool to calculate 5 + 7",
        "gold_answer": "12"
    }
]

# --- 3. Create the agent ---
model = TransformersModel(
    model_id="Qwen/Qwen2.5-0.5B-Instruct",  # use a bigger model for better results
    max_new_tokens=128
)
agent = CodeAgent(
    model=model,
    tools=[add],
    max_steps=1
)

# --- 4. Create the Brain ---
brain = Brain(
    agent,                          # agent instance
    algorithm="GRPO",               # algorithm choice
    reward_func=reward_exact_match  # any Python function can be used as the reward
)

# --- 5. Train the agent with GRPO ---
brain.train(training_dataset, num_iterations=10)
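The reward function is an ordinary Python callable. As a minimal sketch (the exact signature ToolBrain passes to a custom reward is assumed here, and the function name is hypothetical), a custom reward could score numeric closeness instead of an exact string match:

# Hypothetical custom reward: assumes the reward receives the agent's final
# answer and the gold answer; the real interface may pass different arguments.
def reward_numeric_match(prediction: str, gold_answer: str) -> float:
    """Return 1.0 if the predicted value equals the gold answer numerically, else 0.0."""
    try:
        return float(abs(float(prediction) - float(gold_answer)) < 1e-6)
    except (TypeError, ValueError):
        return 0.0

# Plug it in the same way as the built-in reward:
# brain = Brain(agent, algorithm="GRPO", reward_func=reward_numeric_match)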