
AI Workflows Get New Open Source Tools to Advance Document Intelligence, Data Quality, and Decentralized AI with IBM's Contribution of 3 projects to Linux Foundation AI and Data


  • June 19, 2025

New projects strengthen the open source AI and data ecosystem and expand the Foundation's technical portfolio

LF AI & Data Foundation, an umbrella foundation of the Linux Foundation supporting open source innovation in artificial intelligence and data, today announced the induction of three new open source projects contributed by IBM: Docling, Data Prep Kit, and BeeAI. All three projects have officially been inducted by the LF AI & Data Technical Advisory Committee.

These contributions significantly enhance LF AI & Data's technical landscape in three rapidly growing domains (semantic document understanding, enterprise-grade data preparation, and interoperable multi-agent workflows), reinforcing the foundation's mission to build a sustainable and open AI ecosystem.

The New Projects:

  • BeeAI is the first open-source agent-to-agent platform for developers to build, discover, run, and compose agents and create multi-agent workflows. Powered by the open Agent Communication Protocol (ACP), BeeAI makes it easy to discover and connect AI agents from any framework or tech stack.

  • Docling is an open-source, state-of-the-art ecosystem of tools (Python packages) for document conversion, generation, and manipulation. It enables users to easily build pipelines that extract structured information from complex documents. With over 27K stars on GitHub, Docling is already well on its way to becoming the de facto standard for this task.

  • Data Prep Kit is a modular suite of tools designed to clean, transform, and trace unstructured data for LLMs with a focus on quality, transparency, and scalability. It supports both batch and streaming data scenarios and integrates easily with modern AI workflows.

"We are excited to welcome Docling, Data Prep Kit, and BeeAI into the LF AI & Data family," said Todd Moore, SVP, Community Operations at the Linux Foundation and interim Executive Director, LF AI & Data. "These contributions from IBM reflect a strong commitment to open collaboration and responsible AI. I love BeeAI's commitment to both JavaScript and Python for aggregated learning."

"Docling, Data Prep Kit, and BeeAI were born from a need to fill critical gaps in AI development tooling and accelerate innovation in the Generative AI space. We're proud to see them as a catalyst enabling the broader open-source community to build AI applications and agentic workflows," said Brad Topol, Distinguished Engineer and Director of Open Source, IBM. "We're excited to collaborate with the open-source community to evolve these technologies and solve real-world challenges together."

Governance & Community Collaboration

The projects will benefit from the governance, technical support, and ecosystem engagement that LF AI & Data provides to its hosted projects. Having been inducted by the LF AI & Data Technical Advisory Committee (TAC), each project will establish a neutral, community-driven technical steering committee.

The projects are now publicly available for exploration and contribution. Developers, data scientists, and researchers are encouraged to get involved and shape the future of these impactful technologies.

For more information and to get involved, visit: https://lfaidata.foundation

About the Linux Foundation

The Linux Foundation is the world's leading home for collaboration on open source software, hardware, standards, and data. Linux Foundation projects are critical to the world's infrastructure, including Linux, Kubernetes, LF Decentralized Trust, Node.js, ONAP, OpenChain, OpenSSF, PyTorch, RISC-V, SPDX, Zephyr, and more. The Linux Foundation focuses on leveraging best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org.

The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see its trademark usage page: www.linuxfoundation.org/trademark-usage. Linux is a registered trademark of Linus Torvalds.

Media Contact
Jill Lovato
The Linux Foundation
jlovato@linuxfoundation.org
