You can find the reference code for a custom implementation using PyTorch, along with a model-training notebook and a small natural-text-generation project, here: https://github.com/agasheaditya/handson-transformers

Machines cannot understand text directly, so an encoding technique is used to represent words/sentences in an embedding space. In transformers, positional encoders generate vectors that add context based on a word's position in the sentence. After passing a sentence through the embedding layer we get word embeddings, and passing those through the positional encoding gives each word a position-aware representation, i.e. context.
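A minimal sketch of this step in PyTorch (the toy vocabulary size, model dimension, and token ids below are illustrative, not from the linked repo): the sinusoidal positional encoding from the original paper is added to the token embeddings to produce position-aware inputs.

```python
import math
import torch

def positional_encoding(seq_len, d_model):
    # Sinusoidal positional encoding:
    # PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
    pe = torch.zeros(seq_len, d_model)
    position = torch.arange(0, seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
    )
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

# Token embeddings plus positional encodings give position-aware inputs
emb = torch.nn.Embedding(1000, 64)    # toy vocab of 1000 tokens, d_model = 64
tokens = torch.tensor([[5, 17, 42]])  # one sentence of 3 token ids
x = emb(tokens) + positional_encoding(3, 64)
print(x.shape)  # torch.Size([1, 3, 64])
```

Because the encoding is a fixed function of position, the same vector is added to a word wherever that word's position recurs, which is how the model distinguishes "dog bites man" from "man bites dog".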

The encoding component is nothing but a stack of encoders that processes the input sequence in parallel. N identical encoder blocks are stacked on top of each other, and they do not share weights with one another. All encoders are identical in structure, and each is divided into two sub-layers.

The paper stacks 6 of them on top of each other; this value is a hyperparameter you can change and experiment with.
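The stack described above can be sketched as follows, using PyTorch's built-in `nn.TransformerEncoderLayer` as a stand-in for one encoder block (the dimensions are illustrative; N = 6 matches the paper). Each instance in the list has its own parameters, reflecting that the blocks do not share weights.

```python
import torch
import torch.nn as nn

d_model, n_heads, N = 512, 8, 6  # the paper uses N = 6 encoder blocks
encoder_stack = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(N)]
)

x = torch.randn(2, 10, d_model)  # (batch, seq_len, d_model)
for layer in encoder_stack:      # each block has its own, unshared weights
    x = layer(x)                 # output shape matches input shape
print(x.shape)  # torch.Size([2, 10, 512])
```

Because every block maps (batch, seq_len, d_model) to the same shape, the depth N can be changed freely without touching the rest of the model.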
Self-attention :- Also referred to as the Scaled Dot-Product Attention mechanism, it allows the model to compute the significance of different words in a sequence relative to each other while considering their relationships. The encoder's input is passed to this layer, which helps the encoder look at other words in the input sentence as it encodes a specific word.
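A minimal sketch of scaled dot-product attention (tensor shapes are illustrative): scores are the dot products of queries and keys, scaled by the square root of the key dimension, then softmaxed into weights over the value vectors. In self-attention, Q, K, and V all come from the same input sequence, which is what lets each word attend to every other word.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v, weights

x = torch.randn(1, 3, 8)  # (batch, seq_len, d_k): 3 words, head size 8
out, w = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # torch.Size([1, 3, 8])
```

Each row of `w` tells you how much the corresponding word attends to every word in the sentence, including itself, when building its encoded representation.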