Hancom said on Wednesday it is unveiling OpenDataLoader PDF v2.0, an open-source PDF data extraction tool that it said ranked first in benchmarks among open-source PDF data extraction tools.
The main feature of this version is a hybrid engine that combines AI-based extraction with direct extraction. Companies and developers can extract PDF data for free in an isolated local environment, without concerns about data leaking to external servers.
OpenDataLoader PDF v2.0 comes with four free AI add-ons installed by default to extract complex elements in documents, the company said. Optical character recognition, or OCR, improves text recognition rates for image-based PDFs and scanned documents. Table extraction analyses complex table structures such as merged cells using an ultra-lightweight AI model. Formula extraction recognises formulas in science and mathematics papers in a local environment, while chart analysis describes in sentence form what a chart means.
These add-ons were built to be compatible with other open-source AI models such as Docling. The company said it has no official partnership or sponsorship arrangement with any specific entity, but stressed it has secured objective technical compatibility so that users can integrate the add-ons into their existing technology environments.
Hancom published all benchmark test data and detailed code for reproducing the results in its official GitHub repository to demonstrate transparency, a core value of open source.
With this release, Hancom changed the tool's open-source licence from Mozilla Public License 2.0, or MPL 2.0, to Apache License 2.0. It said the switch to a licence that allows the most freedom for commercial use lowers barriers to entry for external developers and global IT companies.
Hancom is also expanding its ecosystem for the era of autonomous AI agents. It completed integration with LangChain in 2025 and will extend integration in 2026 to other AI frameworks such as Langflow, LlamaIndex and Gemini CLI. It is also preparing Model Context Protocol, or MCP, functions to support AI agents.
Hancom plans to launch a commercial AI add-on in the second half of 2026 that consolidates its proprietary document AI technologies. The add-on will also include technology in which AI analyses document structure and automatically generates accessibility tags.
Hancom Chief Technology Officer Ji-hwan Jeong (정지환) said the AI hybrid engine and the switch to the Apache 2.0 licence have turned OpenDataLoader PDF v2.0 into an open PDF data platform that anyone can freely use and extend. He said that, through future commercial AI add-ons and accessibility solutions, the company will lead the global ecosystem so that PDF documents worldwide can be used by AI and become open documents for everyone.