Research & Development

AI-based natural language processing (NLP)

Among language models, two foundational models are worth highlighting. Both their training methods and their capabilities are significant and can be useful for applications in other domains or for specific language processing tasks. Both models address general, context-dependent language processing, but one of them produces context-dependent text embeddings (BERT), while the other is strong in generative settings (GPT) and, with few-shot learning, performs well on complex tasks such as question answering, summary generation, or text categorization.

NATURAL LANGUAGE PROCESSING (NLP) ALGORITHMS

BERT and NSP algorithms

The BERT model (Bidirectional Encoder Representations from Transformers), as its name implies, uses the encoder part of the Transformer to produce compact, easily reusable representations suited to the domain at hand. An important cornerstone of using the encoder is that BERT also inherits its attention scheme: there is no causality-constraining attention mask, so information can flow freely between all positions in the sequence. This approach is useful well beyond language processing, so its training methods are worth understanding.

BERT is trained by solving two tasks together. The first is MLM (masked language modeling), a general language-modeling task: a small fraction of the words in a given text (typically a single sentence) are masked out and replaced by a [MASK] token. The task of BERT, and of the simple classification layer attached to its output, is to recover the original word from the encoder representations. This task drives the model to produce context-dependent word embeddings.
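
As an illustration, masked-word recovery can be tried directly with a pretrained BERT checkpoint. The snippet below is a minimal sketch assuming the Hugging Face transformers library and the bert-base-uncased checkpoint; it is not part of BERT itself, only a convenient way to see MLM in action.

    from transformers import pipeline

    # Fill-mask pipeline: BERT plus its masked-language-modeling head.
    unmasker = pipeline("fill-mask", model="bert-base-uncased")

    # BERT predicts the hidden word from the surrounding context.
    for candidate in unmasker("Budapest is the [MASK] of Hungary."):
        print(candidate["token_str"], round(candidate["score"], 3))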

The other task, which requires sentence-level, meaning-based representations, is NSP (next sentence prediction). Here an extra token used for classifying the sentence (or sentence pair), with the value [CLS], is added at the beginning of the input. The [CLS] token is followed by the tokens of the first sentence, then a [SEP] separator token, and finally the tokens of the second sentence, again ending with a separator.

The essence of the method is that, based on the encoder representation of the [CLS] token, a classifier network, again with a single layer, has to decide whether the two sentences are consecutive in a text or not. As a result, information about the meaning of the sentences as a whole is stored in the representation associated with the [CLS] token.
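
The [CLS] … [SEP] … [SEP] input layout can be inspected with a BERT tokenizer. The snippet below is a small sketch assuming the Hugging Face transformers library and the bert-base-uncased checkpoint.

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    # Encoding a sentence pair inserts [CLS] at the start and [SEP] after each sentence.
    encoded = tokenizer("The weather was cold.", "We stayed inside.")
    print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
    # e.g. ['[CLS]', 'the', 'weather', ..., '[SEP]', 'we', 'stayed', ..., '[SEP]']

    # token_type_ids mark which tokens belong to the first and to the second sentence.
    print(encoded["token_type_ids"])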

One of the main advantages of BERT is that, with transfer learning, it can serve as the basis for almost any language processing task that requires context (a fine-tuning sketch follows the examples below).
Another major advantage is that several language-optimised versions exist today, which can be extended easily and efficiently. Examples include:

(1) huBERT – Hungarian language-optimized algorithm

(2) mBERT – Multilingual BERT – an algorithm for multilingual applications
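
As mentioned above, transfer learning means attaching a small task-specific head to the pretrained encoder and fine-tuning it on labelled data. The sketch below assumes the Hugging Face transformers library; the checkpoint name and the three sentiment classes are illustrative placeholders.

    from transformers import BertForSequenceClassification, BertTokenizer

    # Pretrained encoder plus a freshly initialised classification head
    # (here: three hypothetical sentiment classes).
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased", num_labels=3
    )
    tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

    # During fine-tuning only labelled examples of the target task are needed;
    # the encoder weights are adjusted, not learned from scratch.
    batch = tokenizer(["Great service!", "Terrible experience."],
                      padding=True, return_tensors="pt")
    outputs = model(**batch)
    print(outputs.logits.shape)  # (2, 3): one score per class for each sentence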

Suitable for language processing in any context

Multiple variations for multilingual applications

Can be used effectively in the design of AI assistants

NATURAL LANGUAGE PROCESSING (NLP) ALGORITHMS

GPT algorithms

The third iteration of the GPT (Generative Pretrained Transformer) model family is currently on the market and is available online as a tool from OpenAI. The model is primarily optimised for English, but it also works in Hungarian and a number of other languages.

The GPT models use the Transformer’s decoder. Here a causality mask is applied, so the attention weights for future elements are set to zero (no information can flow from the future to the past). The model has a simple task: it has to predict the next token for a given input. As the name of the model implies, this is a generative task, and, just like BERT, the model can be trained on an unlabelled data set.
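
The causality mask itself is a simple lower-triangular matrix. The sketch below is a minimal, framework-level illustration assuming PyTorch; it only shows how future positions end up with zero attention weight, not the full GPT architecture.

    import torch

    seq_len = 5
    # Lower-triangular mask: position i may only attend to positions j <= i.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

    scores = torch.randn(seq_len, seq_len)              # raw attention scores
    scores = scores.masked_fill(~mask, float("-inf"))   # hide future positions
    weights = torch.softmax(scores, dim=-1)             # future weights become exactly zero

    print(weights)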

GPT models can then be fine-tuned to achieve promising performance on various tasks. Such fine-tuning is very energy- and data-intensive, but OpenAI’s fine-tuned model called “davinci” (OpenAI also developed the now-famous ChatGPT model) can understand and use abstract concepts even in Hungarian when generating text.

The most typical use cases for GPT are so-called zero-shot situations, where the network receives no examples of the expected output and has to make its predictions from the instruction alone.

However, one of the main advantages of the GPT-3 algorithm, a variant of GPT, is that it is also suitable for few-shot learning. In few-shot learning, some extra information, a handful of samples, is placed in front of the text to be generated, from which the network can infer the rules of generation. This allows it to solve problems more complex than simple zero-shot tasks, such as recommending a meal from a predefined menu or calculating an invoice amount.
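
In practice, few-shot prompting simply means prepending worked examples to the prompt. The sketch below only builds such a prompt string for the menu-recommendation example; the commented-out complete() call stands in for whichever GPT completion endpoint or client library is actually used, so its name and signature are placeholders.

    # Hypothetical few-shot prompt: two worked examples, then the real query.
    few_shot_prompt = """Menu: goulash, fried cheese, pancakes
    Guest: I would like something sweet.
    Recommendation: pancakes

    Menu: goulash, fried cheese, pancakes
    Guest: I want a hearty soup.
    Recommendation: goulash

    Menu: goulash, fried cheese, pancakes
    Guest: I am vegetarian but very hungry.
    Recommendation:"""

    # completion = complete(model="davinci", prompt=few_shot_prompt)  # placeholder call
    print(few_shot_prompt)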

Training models such as GPT is straightforward in the generative case, but using untuned models is difficult. Untuned models should typically be used for basic tasks only; more complex use requires subject-specific tuning and retraining.

Suitability for generative use

Can be fine-tuned according to the specific use

Research and Development with Trilobita

Phases of our R&D projects

R&D Project planning

In the R&D project planning phase, we help our clients find the optimal use of resources. We prepare the financial and technical design of the project and prepare the proposal for the selected funding scheme.

Applied research

In the applied research phase, we prepare the necessary research plans. We carry out and document the series of experiments based on our research methodology.

Evaluation of research results

We evaluate the results of the series of experiments using various data analysis methods and prepare the research summary document.

System Planning

Based on the research results, we design the systems for our customers. We use our own system design methodology and tools for the planning.

Development and testing

Our development methodology combines elements of classic waterfall and agile methodologies, flexibly adapting to the needs of the given client and project. The efficiency of our development and testing work is further enhanced by a number of our already tested, ready-to-use system modules.

Support

After the completion of our R&D projects, we always provide follow-up and support services to our customers for the solutions we have delivered. Our goal is to establish successful, long-term partnerships with our clients.

We believe that every hour spent on design pays off many times over in the implementation and roll-out of our systems.  Our ergonomically designed user interfaces provide our customers with a new user experience and ease of use.

Contact

info@trilobita.hu

(+36) 1 220 6458

Nagy Lajos király útja 117.
H-1149 Budapest, Hungary