Protein Sequence Embedding

In the Word Embeddings Starter Guide page you learned how embeddings can be used to capture the relationships between words. But how do we use embeddings to capture the relationships between… proteins? One way to think of this is that biology speaks a language too: a protein is a sequence of amino acids, and these sequences change through genetic mutation over long periods of time. Protein sequence embedding uses machine learning to represent protein sequences as vectors, so that their biological properties can be inferred and potential new protein sequences can be predicted while optimising for specific biological properties.
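To make the idea of "a protein as a vector" concrete, here is a minimal sketch in Python. It is not a learned embedding model: as a simplifying assumption it counts overlapping k-mers (short runs of amino acids), which is a hand-built stand-in for what a trained model would produce. The function names `kmer_embedding` and `cosine_similarity` and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_embedding(sequence, k=2):
    """Map a protein sequence to a fixed-length vector of k-mer frequencies.

    This is a simple stand-in for a learned embedding (an assumption made
    for illustration); a real model would learn the vector from data.
    """
    # Fixed vocabulary: every possible k-mer, so all vectors share a length.
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # Count each overlapping window of length k in the sequence.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Normalise by the number of recognised k-mers (avoid dividing by zero).
    total = sum(counts[kmer] for kmer in vocab) or 1
    return [counts[kmer] / total for kmer in vocab]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy sequences, invented for this example (not real proteins).
seq_a = "MKTAYIAKQRQISFVKSHFSRQ"
seq_b = "MKTAYIAKQRQLSFVKSHFSRQ"  # seq_a with a single substitution
seq_c = "GGGGSGGGGSGGGGSGGGGS"    # an unrelated repetitive sequence

emb_a, emb_b, emb_c = (kmer_embedding(s) for s in (seq_a, seq_b, seq_c))
print("a vs b:", cosine_similarity(emb_a, emb_b))
print("a vs c:", cosine_similarity(emb_a, emb_c))
```

In this sketch the near-identical pair scores higher than the unrelated pair, which is the same intuition behind word embeddings: similar inputs land close together in vector space. Learned protein embeddings aim to capture much richer biological signal than raw k-mer counts can.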

Common Applications

Protein sequence embedding is particularly useful in healthcare and pharmaceutical drug discovery, and in any industry that works with biological data. The field sits at the intersection of biology (bioinformatics and biochemistry), data science, and healthcare.