Our world is not merely random; it has structure. Every day the sun rises and sets, quadrupeds with eyes in the front of their heads want to eat me, fire is hot and will damage my body, and so on. Our primary task, and that of all living things, is to discover as much of this structure as possible so that we can leverage it effectively and manipulate our world. In fact, I want to argue that the primary goal of learning is prediction. If I can predict the world, I can find food, defend myself, survive and ultimately bear offspring. Thus, "learning = prediction".
But learning is not just remembering what we see. Why? Because we will never encounter exactly the same situation again. Instead, we need to derive rules that can be applied (generalized) to new situations as well. A nice example is the "eyes pointing forward" rule. If I have noticed it for tigers and lions, I can now apply it to panthers, wolves and bears. So we need to connect the dots: interpolate between the things we have experienced. Another example: when a child sees her mommy appear and disappear, she learns that out of sight does not mean nonexistent. She can now apply this rule to cars that are momentarily occluded, which may save her life. Thus, "learning = generalization".
When we have learned to predict something well, we have a feeling that we have "understood" something, perhaps the underlying causes. For instance, we might learn that all animals with bright colors are poisonous. We have then learned a concept or an abstraction that aptly explains a class of events with an elegant rule. Most of modern science explains phenomena through powerful, often mathematically driven abstractions. Thus, "learning = abstraction".
When you zip the files on your computer, you reduce the number of bytes necessary to describe their contents but not the information (which can easily be reconstructed perfectly). So the original format must have been redundantly encoded. Indeed, language is redundant in the sense that you can often predict the second half of a word given the first half. Similarly, photos are highly redundant: if I paid you 1 dollar every time you correctly predicted the color of a pixel given the pixels surrounding it, you would earn a lot of money. Hence, the better we can predict, the more structure is present, the more we can compress, and in fact the more there is for us to learn. Thus, "learning = compression".
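To make the zip example concrete, here is a small sketch using Python's standard zlib module. The sample sentence and the helper name compression_ratio are my own choices for illustration. It compares how well a highly predictable text compresses against random bytes of the same length, and checks that nothing is lost when decompressing.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


# Structured data: a repeated sentence is highly predictable.
structured = b"Every day the sun rises and sets. " * 300

# Unstructured data: random bytes of the same length offer no pattern to exploit.
random_bytes = os.urandom(len(structured))

structured_ratio = compression_ratio(structured)
random_ratio = compression_ratio(random_bytes)

# Compression is lossless: decompressing recovers the original exactly.
restored = zlib.decompress(zlib.compress(structured))

print(f"structured ratio: {structured_ratio:.3f}")
print(f"random ratio:     {random_ratio:.3f}")
print("lossless:", restored == structured)
```

The structured text should shrink to a small fraction of its size, while the random bytes should barely shrink at all (they may even grow slightly because of format overhead). Where there is nothing to predict, there is nothing to compress.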
Evolution has heavily selected for creatures that predict well. Humans are the pinnacle of this process, the ultimate "prediction machines". We can now predict far beyond what seems evolutionarily necessary, such as the mass of the electron or the speed of light. We have run away with our ability to predict and as a result completely dominate the world (I guess in a certain sense only, because one could argue that ants rule the world in a different sense). But being able to predict well does seem to pay off.