Ontology. [Photo: ChatGPT]

In the AI industry, the elusive term "ontology" is frequently discussed as a technology that could overcome the limitations of generative AI. Interest appears to be rising because Palantir is known to have succeeded with ontology. More South Korean tech companies are also promoting ontology as a specialty.

Ontology can be summed up as explicitly writing down the way humans think in a form that machines can understand.

That definition alone does little to explain what ontology actually is. This reporter met Kim Hak-rae (김학래), a professor in the Department of Library and Information Science at Chung-Ang University, to ask about the concept of ontology and why it is drawing growing attention.

Kim is one of the few scholars in South Korea who has studied ontology for a long time and in depth. When he was younger, he took part in work defining social media tags and other elements as ontology vocabulary. He studied large-scale knowledge graphs at Ireland’s DERI institute and at Samsung Electronics, and now leads Chung-Ang University’s HIKE lab.

◆ What it means to "eat a pear"

A frequently cited example in discussions of ontology is the phrase "eat a pear." In the original Korean, the word is "bae" (배), which can mean a pear, a belly or a boat. When people read this sentence, they immediately think of the fruit. They do not think of eating a belly, or eating a boat. That is because the brain uses context to select the concept that naturally fits the predicate "eat."

Computers cannot do that. For a computer to understand the sentence, someone has to write down in advance the relationship "a pear is a fruit, fruit can be eaten, so it fits the predicate 'eat.'"

That is what ontology does. The key word is "explicit." Much as a person may know how to swim without being able to put it into words, people carry tacit knowledge. Ontology draws that knowledge out into documents or data so that anyone can verify it.
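The "eat a pear" relationship can be written down as a minimal Python sketch. The dictionaries and the `resolve` function are hypothetical names chosen for illustration, and the word "bae" stands in for the Korean word with the three senses pear, belly and boat. This is not how any particular ontology tool works, only a way to show what "writing the relationship in advance" looks like.

```python
# Hypothetical mini-lexicon: one written word, several possible senses.
SENSES = {
    "bae": ["pear", "belly", "boat"],
}

# Explicit "is a" relations, written down in advance by a person.
IS_A = {
    "pear": "fruit",
    "belly": "body part",
    "boat": "vehicle",
}

# Which categories each predicate accepts as its object (illustrative).
PREDICATE_ACCEPTS = {
    "eat": {"fruit"},
    "ride": {"vehicle"},
}


def resolve(word: str, predicate: str) -> list[str]:
    """Return the senses of `word` whose category fits `predicate`."""
    accepted = PREDICATE_ACCEPTS.get(predicate, set())
    return [s for s in SENSES.get(word, []) if IS_A.get(s) in accepted]


print(resolve("bae", "eat"))   # ['pear']
print(resolve("bae", "ride"))  # ['boat']
```

Nothing here is learned or guessed by the program. The answer changes only when a person edits the written relations, which is the sense in which the knowledge is explicit.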

◆ Neither a relational database nor an LLM

It becomes clearer when compared with other data systems.

Relational databases (RDB), used since the 1970s, store facts. These include statements such as "a MacBook costs 2 million won" and "a MacBook comes in silver and space gray." But they cannot capture conceptual meaning or hierarchies such as "a MacBook is a type of laptop."

Vector databases, widely used in recent AI services, convert vast amounts of text into numbers and calculate the distance between concepts as a score. If a MacBook and an iPhone sit close together in the vector space, the inference is that both are close to the concept of "Apple."

Large language models (LLMs) process text based on probability. They do not separately define objects or the relationships between objects. They are trained on vast amounts of text to predict the next word and generate the most plausible answer in context.

Ontology is different. It explicitly structures a knowledge system such as "Apple is a company (concept)," "a MacBook is a type of laptop (relationship)," and "a laptop has a keyboard and a screen (attributes)."
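A minimal Python sketch of such a structure follows, assuming a toy set of (subject, relation, object) triples. The relation names and the extra "Laptop is a type of Computer" link are illustrative additions, not part of any standard vocabulary.

```python
# Toy triple store: (subject, relation, object). All names are illustrative.
TRIPLES = {
    ("Apple", "is_a", "Company"),
    ("MacBook", "subclass_of", "Laptop"),
    ("Laptop", "subclass_of", "Computer"),  # added for illustration
    ("Laptop", "has_part", "Keyboard"),
    ("Laptop", "has_part", "Screen"),
    ("MacBook", "made_by", "Apple"),
}


def ancestors(concept: str) -> list[str]:
    """Walk subclass_of links upward, nearest parent first."""
    found: list[str] = []
    frontier = [concept]
    while frontier:
        current = frontier.pop(0)
        for s, r, o in sorted(TRIPLES):
            if s == current and r == "subclass_of" and o not in found:
                found.append(o)
                frontier.append(o)
    return found


def parts(concept: str) -> set[str]:
    """Collect has_part values from the concept and every ancestor."""
    lineage = [concept] + ancestors(concept)
    return {o for s, r, o in TRIPLES if r == "has_part" and s in lineage}


print(ancestors("MacBook"))      # ['Laptop', 'Computer']
print(sorted(parts("MacBook")))  # ['Keyboard', 'Screen']
```

Because "a MacBook is a type of laptop" is stated explicitly, the program can conclude that a MacBook has a keyboard and a screen even though no triple says so directly.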

◆ A bread mold and bread

Ontology consists of two layers.

One is the class area, a conceptual mold. For example, if you define the concept of a "student," you set the mold by stating that a student is a person, has a student ID number, has a name, and has gender and a place of origin. If you make this mold universal, it can be applied to students worldwide. If you make a resident registration number a required item, it becomes a mold that fits only South Korean students.

The other is the data that fills the completed mold, called instances. If "DigitalToday reporter" is the mold, then "Son Seul-gi" and "Hwang Chi-gyu" are the contents. "Lee Jae-myung," "Trump" and "IU," which do not fit the mold, cannot go in.
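The mold-and-contents idea can be sketched in Python as follows. `ClassDef`, `fits`, the property names and the sample student record are all hypothetical and made up for illustration. Real ontology languages express the same idea with their own constructs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassDef:
    """A 'mold': a named concept with a parent and required properties."""
    name: str
    parent: str
    required: frozenset = frozenset()


# Universal mold for students (property names are illustrative).
STUDENT = ClassDef(
    "Student",
    "Person",
    frozenset({"student_id", "name", "gender", "place_of_origin"}),
)

# Narrower mold: also requiring a resident registration number
# makes it fit only South Korean students.
KOREAN_STUDENT = ClassDef(
    "KoreanStudent",
    "Student",
    STUDENT.required | {"resident_registration_number"},
)


def fits(mold: ClassDef, instance: dict) -> bool:
    """An instance fits when it supplies every required property."""
    return mold.required.issubset(instance.keys())


# A made-up instance used only to exercise the molds.
sample = {
    "student_id": "S-001",
    "name": "Alice",
    "gender": "F",
    "place_of_origin": "Dublin",
}

print(fits(STUDENT, sample))         # True
print(fits(KOREAN_STUDENT, sample))  # False
```

The same record goes into the universal mold but is rejected by the narrower one, which mirrors how a required item decides which contents a mold can hold.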

The bread analogy also works for understanding knowledge graphs, a concept Google introduced in 2012 and popularised. Whereas ontology strictly defines both the mold and the data, a knowledge graph may simply connect concepts and the relationships between them without a mold. Ontology needs both a bread mold and bread, but a knowledge graph can work with bread alone.

◆ How ontology is made

The concept of ontology, which began in philosophy, moved into engineering in the late 1990s when Tim Berners-Lee proposed the semantic web. It was a plan to give meaning to web data so that machines can understand it on their own. Ontology is the methodology for implementing that.

The build sequence works like this. Stakeholders agree on concepts and relationships. They design these into a structural diagram and then express them in ontology-specific languages such as OWL and RDF.
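As a rough illustration of the last step, the Python sketch below writes two agreed statements in N-Triples, one of the plain-text syntaxes for RDF. The `rdf:type` and `rdfs:subClassOf` identifiers are the standard ones. The `http://example.org/` namespace, the resource names and the `to_ntriples` helper are placeholders assumed for this example. Real projects would typically rely on a dedicated RDF library rather than a hand-written serialiser.

```python
# Standard RDF and RDFS identifiers.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Placeholder namespace for this sketch, not a real vocabulary.
EX = "http://example.org/"

# Statements the stakeholders are assumed to have agreed on.
agreed = [
    (EX + "MacBook", RDFS_SUBCLASS_OF, EX + "Laptop"),  # class-level
    (EX + "myMacBook", RDF_TYPE, EX + "MacBook"),       # instance-level
]


def to_ntriples(triples) -> str:
    """Serialise IRI-only triples, one '<s> <p> <o> .' statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(agreed))
```

This helper handles only statements whose three parts are all identifiers. Text or number values need additional syntax that is omitted here.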

Much of this is already standardised. Domain-specific vocabularies are widely used, such as vCard for expressing business card information and Schema.org for covering web content in general. With about 70 to 80 percent of the vocabulary already built, many projects reuse existing vocabulary rather than creating everything from scratch.
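Reusing an existing vocabulary can look like the following sketch, which describes a person with Schema.org terms in JSON-LD form using only Python's standard `json` module. `Person`, `CollegeOrUniversity`, `name`, `jobTitle` and `worksFor` are Schema.org terms. The choice of which facts to include is an assumption made for this example.

```python
import json

# A person described with Schema.org terms instead of a custom vocabulary.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Kim Hak-rae",
    "jobTitle": "Professor",
    "worksFor": {
        "@type": "CollegeOrUniversity",
        "name": "Chung-Ang University",
    },
}

document = json.dumps(person, indent=2, ensure_ascii=False)
print(document)
```

No new concept had to be defined here. Every key and type comes from a vocabulary that other systems already recognise.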

◆ It can be an AI guardrail

The biggest reason ontology has recently drawn attention is the expectation that it can compensate for the limits of LLMs.

Because LLMs run on probability, they cannot be fully controlled no matter how strong the instructions are. Nor can they block, at the source, content that is prohibited by policy, such as pornography and violence. If someone phrases a question to get around the restrictions, the model may still produce an answer, depending on the context.

Ontology, by contrast, constrains the structure itself rather than adding rules. A system built on it cannot be led to an answer that has not been defined.

For example, there was an incident involving Replit in which an AI agent deleted an entire customer database while carrying out its tasks. Had the system been defined, in an ontology approach, so that "the customer database is not deleted regardless of any request," the accident could have been prevented. Ontology can be a guardrail that confines the scope of what an AI can execute.
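The guardrail idea can be sketched in Python as follows. `ALLOWED_ACTIONS`, `ActionNotDefined` and `execute` are hypothetical names, and the resource types are invented for illustration. This is not a description of how Replit, Palantir or any other product works, only one way to show structure doing the work instead of instructions.

```python
# Hypothetical action ontology: the only operations defined per resource
# type. "delete" is simply not defined for the customer database.
ALLOWED_ACTIONS = {
    "CustomerDatabase": {"read", "insert", "update"},
    "ScratchTable": {"read", "insert", "update", "delete"},
}


class ActionNotDefined(Exception):
    """Raised when a requested action is not defined for a resource type."""


def execute(resource_type: str, action: str) -> str:
    """Run an action only if the ontology defines it for this resource."""
    allowed = ALLOWED_ACTIONS.get(resource_type, set())
    if action not in allowed:
        raise ActionNotDefined(f"{action!r} is not defined for {resource_type}")
    # A real system would dispatch to the actual operation here.
    return f"{action} on {resource_type}: ok"


print(execute("ScratchTable", "delete"))

try:
    execute("CustomerDatabase", "delete")
except ActionNotDefined as err:
    print("blocked:", err)
```

The check does not depend on how the request is worded. However the agent is prompted, an action missing from the definition has no path to execution.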

◆ Without data, ontology is not very useful either

In South Korea, talk of ontology has been heard far more often thanks to Palantir. Palantir is an AI software company known for an ontology-based data integration and decision-making platform. Some argue Palantir’s ontology will become an alternative to LLMs, while others believe ontology will become unnecessary if LLMs advance.

In fact, Palantir is one of the companies that uses LLMs best. Its real strength also lies more in its data processing platform than in ontology technology. The platform turns any incoming data into graphs with a single click, creating strong lock-in that makes it hard to leave once adopted. Ontology is only one of the ways data is processed within that platform. That means framing the issue as ontology versus LLMs is itself wrong.

The ontology boom has also heated up in South Korea over the past year. Many companies and startups have rushed to declare adoption, but few have actually started. Critics say that once projects begin, most hit a wall at data cleansing, and the project period runs out before anyone even touches ontology.

Another chronic issue is uneven investment in building and managing data systems. In typical corporate environments, where different standards and rules are mixed across organisations, reaching agreement on business logic is difficult from the start. The right order is to lay the data foundation first, before turning to ontology technology.

Copyright © DigitalToday. All rights reserved. Unauthorized reproduction and redistribution are prohibited.