Transformers

If you have interacted with ChatGPT in any way, you’ve interacted with AI that uses transformers (the "GPT" in ChatGPT stands for "Generative Pretrained Transformer"). But what is a transformer? Transformers are neural networks built around attention mechanisms (popularized by the influential 2017 paper "Attention Is All You Need") and the encoder-decoder framework. They build on the previous modeling leader (the LSTM, a type of RNN) and outperform it by a wide margin, especially when combined with transfer learning (the "Pretrained" in GPT). To summarize, the novelty of transformers lies in three key areas:

  • Attention Mechanisms

  • Transfer Learning

  • The Encoder-Decoder Framework

If you will be working with LLMs (and by extension, transformers), you should learn about those three areas in particular.
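Of the three, attention is the most mechanical, and a toy implementation helps ground it. Below is a minimal sketch of scaled dot-product attention in plain Python; the function names and tiny two-dimensional vectors are illustrative only, and real systems use batched tensor libraries (e.g. PyTorch) rather than Python lists.

```python
# Minimal sketch of scaled dot-product attention, the core operation
# inside a transformer. Pure Python for clarity, not efficiency.
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention.

    Q, K, V are lists of vectors (lists of floats); K and V have the
    same length. Returns one output vector per query vector in Q.
    """
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        # Turn scores into weights that sum to 1.
        weights = softmax(scores)
        # Output is the weighted sum of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, V))
               for i in range(len(V[0]))]
        outputs.append(out)
    return outputs
```

Each output is a blend of the value vectors, weighted by how similar the query is to each key; that ability to mix information from every position at once is what the paper's title refers to.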

Large Language Models (LLMs)

Today, large language models are almost all built on the transformer architecture, at least in part if not entirely. They are trained on large corpora with the intent of creating a general-purpose model that can both understand and generate language. They use billions of parameters and demand massive computational resources, but they can perform remarkably well.

LLMs (at least the most successful ones, including OpenAI’s ChatGPT, Google’s Bard, and Meta’s LLaMA) require so much development effort and compute that their design is more or less impossible for a small team to reproduce (hence why large corporate investment, and not a few clever data scientists at a startup, has driven these models). That said, if you wish to build transformers or LLMs, you should either join one of the organizations already making them or focus on small niche models (though at that point it may no longer be a large language model).

If you seek to incorporate an LLM into your project, start with the API for whichever LLM you think is best. Many providers offer hosted APIs, and some models release open weights, but licensing varies (research-only versus commercial use), so check each model’s terms before building on it. Here are the links to some of the most well known ones:
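Most hosted LLM APIs follow a similar pattern: an authenticated JSON POST containing a model name and a list of chat messages. The sketch below shows how such a request might be assembled; the endpoint URL, model name, and field names are hypothetical placeholders, not any specific vendor's schema, so consult your chosen provider's API documentation for the real one.

```python
# Hypothetical sketch of assembling a chat-style LLM API request.
# The endpoint, credential, model name, and payload fields below are
# placeholders; check your provider's docs for the actual schema.
import json

API_URL = "https://api.example.com/v1/chat"  # placeholder endpoint

def build_chat_request(prompt, model="example-model", temperature=0.7):
    """Return (url, headers, body) for a JSON POST to a chat endpoint."""
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    })
    return API_URL, headers, body

# Actually sending the request is typically one call with an HTTP
# library, e.g.:
#   response = requests.post(url, headers=headers, data=body)
```

Separating request construction from transport like this also makes it easy to swap providers later, since only the URL and payload shape change.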

How Can I Utilize Transformers In My Project?

The transformer architecture has seen use primarily in NLP, but also in computer vision and audio. If you plan to use LLMs in any way, you will almost certainly be interacting with a transformer, even if indirectly. Since transformers are still relatively new, their application space is somewhat unclear, and much research is being done to understand their strengths, weaknesses, and applications.

Resources

The content here is hand-selected by Data Mine staff, and all of it is free for Purdue students (including the book links); most of it should be free for National Data Mine students as well (check your school’s digital library resources for the books).