AI (Artificial intelligence) is a subfield of computer science that was created in the 1950s, and it is concerned with solving tasks that are easy for humans but hard for computers. In particular, a so-called Strong AI would be a system that can do anything a human can (perhaps excluding purely physical tasks). This is fairly generic and includes all kinds of tasks such as planning, moving around in the world, recognising objects and sounds, speaking, translating, performing social or business transactions, creative work (making art or poetry), etc.
NLP (Natural language processing) is simply the part of AI that has to do with language (usually written).
Machine learning is concerned with one aspect of this: given some AI problem that can be described in discrete terms (e.g. out of a particular set of actions, which one is the right one), and given a lot of information about the world, figure out what is the “correct” action, without having the programmer program it in. Typically some outside process is needed to judge whether the action was correct or not. In mathematical terms, it’s a function: you feed in some input, and you want it to produce the right output, so the whole problem is simply to build a model of this mathematical function in some automatic way. To draw a distinction with AI, if I can write a very clever program that has human-like behavior, it can be AI, but unless its parameters are automatically learned from data, it’s not machine learning.
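To make the distinction concrete, here is a minimal sketch in Python. The task, the data and the names (`hand_written_rule`, `learn_threshold`, `examples`) are all made up for illustration. The first function is a rule the programmer typed in; the second finds the same kind of rule automatically from labelled examples, which is the part that makes it machine learning.

```python
def hand_written_rule(x):
    """A rule chosen by the programmer: the threshold 5.0 is hard-coded."""
    return 1 if x > 5.0 else 0


def learn_threshold(examples):
    """Pick the threshold that makes the fewest mistakes on labelled examples.

    `examples` is a list of (input, correct_output) pairs; the labels play
    the role of the outside process that judges whether an action was correct.
    """
    xs = sorted(x for x, _ in examples)
    # Candidate thresholds: midpoints between neighbouring inputs.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum((1 if x > t else 0) != y for x, y in examples)

    return min(candidates, key=errors)


# Hypothetical labelled data: small inputs are class 0, large ones class 1.
examples = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
threshold = learn_threshold(examples)


def learned_rule(x):
    """Same shape as the hand-written rule, but its parameter came from data."""
    return 1 if x > threshold else 0
```

Both rules are functions from input to output; only `learned_rule` had its parameter set from data rather than by the programmer.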
Deep learning is one kind of machine learning that’s very popular now. It involves a kind of mathematical model that can be thought of as a composition of simple blocks (function composition) of a certain type, and where some of these blocks can be adjusted to better predict the outcome.
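As a rough sketch of what "composition of simple blocks" means, here is a toy version in plain Python. The block types (`linear`, `relu`) and the specific numbers are assumptions chosen for illustration, not something a real model would use as-is. Some blocks carry adjustable parameters, others are fixed.

```python
def linear(w, b):
    """Adjustable block: w and b are the parameters that training would tune."""
    return lambda x: w * x + b


def relu(x):
    """Fixed block: it has no parameters, it just clips negatives to zero."""
    return max(0.0, x)


def compose(*blocks):
    """Chain blocks so that the output of each one feeds the next."""
    def model(x):
        for block in blocks:
            x = block(x)
        return x
    return model


# A three-block model: adjustable, fixed, adjustable.
model = compose(linear(2.0, -1.0), relu, linear(0.5, 3.0))
result = model(2.0)  # 2*2 - 1 = 3.0, relu keeps 3.0, 0.5*3 + 3 = 4.5
```

Changing the numbers passed to `linear` changes what the whole model computes, and that is all "adjusting the blocks" amounts to.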
The word “deep” means that the composition has many of these blocks stacked on top of each other, and the tricky bit is how to adjust the blocks that are far from the output, since a small change there can have very indirect effects on the output. This is done via something called backpropagation, which works out how much each block's parameters contribute to the error, inside of a larger process called gradient descent, which lets you change the parameters in a way that improves your model.
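Here is a minimal sketch of both ideas on a deliberately tiny model with two stacked blocks, `h = w1 * x` and `y_hat = w2 * h`. The training data, starting values and learning rate are assumptions picked so the example behaves nicely; real networks have many more parameters and usually rely on a library to do the backward pass.

```python
# Made-up training data that follows y = 3 * x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]


def gradients(w1, w2):
    """Backpropagation: mean gradient of the squared error w.r.t. w1 and w2."""
    g1 = g2 = 0.0
    for x, y in data:
        # Forward pass, keeping the intermediate value h.
        h = w1 * x
        y_hat = w2 * h
        # Backward pass: chain rule, from the output back towards the input.
        d_y_hat = 2.0 * (y_hat - y)  # d loss / d y_hat
        g2 += d_y_hat * h            # d loss / d w2 (block next to the output)
        d_h = d_y_hat * w2           # d loss / d h, passed back one block
        g1 += d_h * x                # d loss / d w1 (block far from the output)
    n = len(data)
    return g1 / n, g2 / n


def train(w1=0.5, w2=0.5, learning_rate=0.01, steps=500):
    """Gradient descent: repeatedly nudge each parameter against its gradient."""
    for _ in range(steps):
        g1, g2 = gradients(w1, w2)
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
    return w1, w2


w1, w2 = train()
prediction = w2 * (w1 * 4.0)  # the model's output for an unseen input x = 4
```

Note how the gradient for `w1` is built from `d_h`, which itself depends on `w2`: the effect of the far-away block on the output is computed by passing the error backwards through the block in between.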