My Blog

The idea of my blog

Posted on March 28, 2023

Just a bit of housekeeping before my brain starts to dribble all over the internet. Firstly, ChatGPT taught me how to make a website, so if anything is broken, the blame lies squarely with OpenAI and the people whose HTML I took. The learning process of making a website has been somewhat upside down. I have been classically trained as a theoretical physicist and therefore have a slight obsession with a first-principles approach that eventually leads into complexity. In the case of making a website, I know next to nothing, and I do not feel that it is worth investing too much time in this venture. That said, I also cannot commit myself to a cookie-cutter mass-production framework with limited flexibility. So now I start with complexity from my frame of reference and then learn how things are connected by watching what breaks when I change the code. This is effectively the best way to understand black boxes like the brain. Please do not take my blog too seriously: I sometimes like embellishing sentences far too much, I sometimes suffer from tastelessly framing myself as the protagonist, and I also sometimes make statements that are too bold for my knowledge of the subject. This is a blog and not a scientific article, and I am perfectly capable of switching between mad ramblings and accurate, careful research when the occasion requires it. You have been warned ;-)

How to predict AI's next step

Posted on March 28, 2023

To be honest, I am not sure, big shock, but I have a suspicion that it is intimately linked to, and powered by, the limits of our own nervous system. Against the fast and ever-changing technological landscape, our human mental hardware appears almost static, as it follows the slow timescales dictated by evolution. This fact implies that any transformative breakthroughs in technology will always be defined by how well they map to our oldest instincts and intuitions. Survival, at least in the distant past, was defined as the art of proper energy management, and there is nothing we love more than relieving our most energy-demanding organ of its tasks. Wrapping technology tightly along the contours of our craniums, bypassing the wasteful higher cognitive portions of the prefrontal cortex, and injecting information directly into our limbic system is the fastest way to be cheaply productive. So my guess is that technology will always be driven in the direction of intuition, and shaped by our oldest faculties. As of 2023, ChatGPT is the fastest-growing consumer application in history. Why? Because humans have been talking to one another far longer than they have been operating mice, tapping keyboards, and reading off lists from an internet search query. ChatGPT has allowed us to have a personal relationship with no less than the accumulated knowledge of humanity, and at a shockingly cheap energy expense. If you like to hold yourself suspended in the superposition of the religious and the sacrilegious: the internet has become flesh and dwelt among us.

One reason for the unreasonable effectiveness of neural networks

Posted on March 28, 2023

One reason is the composite structure of the fully connected feed-forward neural network. It is composed of a layered set of simple functions $$ \mathcal{F}=f_{\boldsymbol{\theta}^{(L)}}^{(L)} \circ f_{\boldsymbol{\theta}^{(L-1)}}^{(L-1)} \circ \cdots \circ f_{\boldsymbol{\theta}^{(2)}}^{(2)} \circ f_{\boldsymbol{\theta}^{(1)}}^{(1)}. $$ Such a structure lends itself to fast optimization via gradient descent, as the derivative of the loss function with respect to the network's parameters reduces to a series of Jacobian matrix multiplications. This means that, in general, neural networks represent a class of non-linear functions that is faster to optimize than the general set of all non-linear functions. This picture of simple components combining to be greater than the sum of their parts is also reminiscent of the boosted-tree paradigm, whose weak learners have proven to be a serious competitor to NNs.
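To spell out why this composition is cheap to differentiate, here is a minimal sketch assuming the standard layer form, an affine map followed by a nonlinearity, which the definition above does not itself fix: writing $f_{\boldsymbol{\theta}^{(l)}}^{(l)}(\mathbf{x})=\sigma\left(W^{(l)}\mathbf{x}+\mathbf{b}^{(l)}\right)$, the chain rule factorizes the gradient of a loss $\mathcal{L}$ into a product of per-layer Jacobians, $$ \frac{\partial \mathcal{L}}{\partial \boldsymbol{\theta}^{(l)}}=\frac{\partial \mathcal{L}}{\partial f^{(L)}}\, J^{(L)} J^{(L-1)} \cdots J^{(l+1)}\, \frac{\partial f^{(l)}}{\partial \boldsymbol{\theta}^{(l)}}, \qquad J^{(k)}=\frac{\partial f^{(k)}}{\partial f^{(k-1)}}, $$ so a single backward pass can reuse the same running product of Jacobians to assemble every layer's gradient.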

Man on the rim and the equivalence of attention and creativity

Posted on March 30, 2023

The whole of human history has been a continual and relentless assault on our egos. We used to think that we were the center of the universe, and then a few people got burnt at the stake after discovering the heliocentric model. This was our first uncomfortable displacement from the center. So what did we do? We hid and took refuge in our special positions ordained by god until Darwin dismantled our souls and removed the dramatic invisible backdrop of angels and demons. A long-predicted nihilistic Nietzschean crisis that each individual must confront. After Deep Blue beat Garry Kasparov, the mantle of logical supremacy was removed from our monkey-shaped heads. Another uncomfortable and painful renormalization for the human species. Finally, our last refuge of creativity has been obliterated by AI models such as Stable Diffusion...

One could actually argue that the secret sauce of the transformer mechanism powering large language models is the stuff of creativity itself. A satisfactory first approximation to a definition of creativity might go something like this: creativity is the finding of unexpected connections between things that appear, on the surface, to be distant. Under this definition, the attention mechanism found in transformers might be viewed as hardcoding creativity directly into our models. This self- and cross-attention, powered by no more than three linear projections of an embedding and a scaled dot product with a few skip connections, might prove to be the flame of creativity. Attention literally treats items that are proximal to one another on an equal footing with items that are separated by thousands of characters, thus obliterating the idea of distance and providing fertile ground for creative connections.
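To make that ingredient list concrete, here is a minimal sketch of scaled dot-product self-attention in plain NumPy; the function name, shapes, and toy data are illustrative assumptions rather than any particular library's API:

```python
# A minimal sketch of scaled dot-product self-attention in plain NumPy.
# Names, shapes, and the random toy data are illustrative assumptions.
import numpy as np

def self_attention(x, W_q, W_k, W_v):
    """x: (seq_len, d_model); W_q/W_k/W_v: (d_model, d_k) projection matrices."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v        # three linear views of one embedding
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # every pair of tokens compared at once;
                                               # token distance never enters the formula
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V                         # mix values by learned relevance

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                    # 5 tokens, embedding width 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, W_q, W_k, W_v).shape)  # -> (5, 8)
```

Note that a token's position never enters the score: the first and the thousandth token are compared by exactly the same dot product, which is the distance-obliterating property described above.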

The problem of explainability, degenerate minima, and frozen-in effects

Posted on April 07, 2023

Whenever something is mathematically ill-defined, it cannot be subjected to optimization via gradient descent. We find ill-defined objectives all the time in machine learning, and indeed such slippery problems are even the focus of entire branches of research, such as explainable AI, which is charged with the task of finding optimal explanations behind model decisions. Whenever this is the case, the standard tools of optimization have to be dropped, and one has to appeal to the law of large numbers as well as a few unjustified claims to make any further progress. For instance, when attempting to find the "best" explanation for a binary-classification task, i.e., a highlighting of the network's discriminant region, one often encounters multiple explanations resulting from the same model trained with different random initializations, all of which achieve the same performance on the classification task at hand. In other words, the need for explanations appears to break the problem into two optimizations: the first is easily parameterized via a metric derived from a confusion matrix or the BCE loss, while the second cannot be wrapped into a differentiable function and subjected to gradient analysis. The results seem to indicate the existence of a broad central minimum in the loss space. We can approach this minimum in the standard way while improving the loss; however, when we hit it, the surface turns to ice and all of our tools lose traction. The unfounded assumption we then make is that the best solution amongst the myriad of equivalent solutions is the average of the highest-scoring set, i.e., the solution at the center of gravity. To obtain this solution, one can use a Monte Carlo method: retrain the model many times and average the resulting explanations. Additionally, if the problem is easy to solve, classifiers might only need to be aware of a portion of the discriminant region, resulting in a saturation of model performance and a freezing-in of irrelevant features. Incidentally, this is what I think is also happening with the multi-headed paradigm in transformer models. The multiple heads represent a type of semi-smart Monte Carlo method for gaining traction on an otherwise intractable loss surface called "contextual meaning".
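As a concrete illustration of that Monte Carlo averaging, here is a minimal sketch assuming PyTorch; the toy task, network size, and the use of input-gradient saliency as the "explanation" are all illustrative assumptions, not a prescribed method:

```python
# A minimal sketch (assuming PyTorch) of Monte Carlo averaging over explanations:
# train the same small classifier from several random seeds and average each
# run's input-gradient saliency to approximate the center-of-gravity solution.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 8)                        # toy data: 8 features
y = ((X[:, 0] + X[:, 1]) > 0).float()          # two redundant discriminant features

def train_and_explain(seed):
    torch.manual_seed(seed)
    net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(300):                       # the easy, differentiable optimization
        opt.zero_grad()
        loss_fn(net(X).squeeze(-1), y).backward()
        opt.step()
    Xg = X.clone().requires_grad_(True)        # input-gradient saliency as a
    net(Xg).sum().backward()                   # stand-in "explanation" of the model
    return Xg.grad.abs().mean(dim=0)

# each seed lands in a different degenerate minimum; the ensemble mean plays
# the role of the center-of-gravity explanation among equivalent solutions
explanations = torch.stack([train_and_explain(s) for s in range(10)])
print(explanations.mean(dim=0))
```

On a task with redundant discriminant features, individual seeds often latch onto different, equally valid features, while the ensemble mean spreads credit across them, which is exactly the center-of-gravity solution described above.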

Modern gods

Posted on April 16, 2023

I believe that Darwin’s theory of natural selection is at a greater level of sophistication than most theories in physics. This outlandish statement requires a rather substantial defense. The theory of evolution ripped the hand of god out of our creation story and completely abolished all traces of mysticism that shrouded our origins. God was replaced with a set of simple axioms, and our evolution into complexity came from a game of pure statistics and mathematics. In physics, we hide our gods behind the fundamental forces of nature; in fact, a force is a modern Western version of mysticism wrapped in a palatable 21st-century sensibility. The queen of physics and the beacon for all our theories is statistical physics. Here, time has been compressed and wrapped up into a formula called entropy, $S=k_B\log \Omega$, the log of the number of microstates multiplied by a constant called the Boltzmann constant. A macrostate is something defined by a tuple: for instance, if we have a gas in a particular macrostate, its physical properties are completely determined by its pressure, temperature, and density; however, one can rearrange the atoms (the microcomponents) in the gas and still achieve the same macrostate. It is then a matter of counting to determine that the most likely state is the one that can be realized in the greatest number of ways, i.e., the macrostate with the greatest number of degenerate microstates. This idea is vacant of any excess assumptions and treats all things equally. The second law written upon the stone tablet of statistical physics is that this thing called entropy always increases. This law, which is a matter of counting under equal treatment, winds up the clockwork universe and unrolls the arrow of time, setting everything into motion. To cement the mystical sterility of this framework, think of four playing cards, each with two sides, face up and face down. If I throw them all into the air, what configuration should I expect to find them in when they hit the ground? I should expect two up and two down, since there is only one realization, i.e., one microstate, where all cards are face up, but six microstates where two cards are up and two down. Now the cards could all face up by chance; however, if I average the outcomes over many throws, it would appear that a force was guiding the average configuration into a two-up two-down outcome. Although we can easily identify this as a “pseudo force”, it becomes more convincing as a fully fledged force when the number of components increases from four to forty trillion and the shuffling rate ticks by at the scale and fine granularity of virtual time. There is no law against cool objects transferring heat to warm objects, or the particles in a room suddenly occupying a single corner. All of these miracles are simply so unlikely that they appear to be written in stone. A theory can be judged and ranked based on how minimalistic its assumptions are and how few axiomatic pillars support its body. Newton’s theory of gravity, $F=Gm_1m_2/r^2$, was unsatisfactory not only because it lacked precision when applied to Mercury's orbit, but because it hides a secret pantheon of gods behind its formula.
On the other hand, the theory of general relativity proved far more satisfactory, not only because it resolved the mismatch between theory and observation, but because it provided an emergent force carried on the back of something far more basic: that the universe will always select the least energy expenditure, that objects will always travel in straight lines, and that when spacetime is bent into a non-Euclidean geometry, this laziness is spelled out along the trajectories of geodesics. A theory is successful if, along with a god, it gives us that god's creation story.
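To make the card example fully explicit, the counting involves nothing beyond a binomial coefficient: the number of microstates with $k$ of the four cards face up is $$ \Omega(k)=\binom{4}{k}, \qquad \Omega(0)=1,\quad \Omega(1)=4,\quad \Omega(2)=6,\quad \Omega(3)=4,\quad \Omega(4)=1, $$ so the two-up two-down macrostate is six times more likely than the all-up one, and the entropy gap between them is $\Delta S = k_B \log 6$.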

The rush for irrelevance / How to build a stable system / Robin Hood's shadow

Posted on July 01, 2023

Computer programmers are obsessed with automation. They like to take complex tasks embroidered with sophisticated mathematics and deliver the thunderbolt of Zeus into the hands of the Athenians. Their prime directive is, therefore, the dissemination of power from the ivory towers of academia into the simple API of your grandmother's iPhone. Different brands of governance and society are defined by the density of power distribution. If a society is ruled by a single dictator, then the system as a whole can move at the speed of a single individual. However, its fate is governed by the stability of a single human mind. This system parades itself as an inherently volatile and egocentrically driven animal with fast reflexes, coincidentally what every military system requires: the fast mobilization of a hierarchy where successive lower ranks have no mind of their own but are like puppets connected to the information flow from above, filtering efficiently down from the head. On the other side of the spectrum, power can be diffused among all its members equally. This is the programmer's obsession, whether they know it or not. The byproduct of their efforts, technology, seeks to flatten the power distribution. In the best case, laws become sluggish but fair and stable. Unfortunately, technology also comes with a greater capacity for destruction. And although one would imagine that the mass delocalization of power would be inherently good, we end up in a far worse position than that of the tyrant. The tyrant is there because they demonstrated a particularly high aptitude that distinguished them from the pack. In some sense, they were society's fittest individual (granted the social ladder might promote sharks). On the other hand, when the programmer opens up the power channels, when information and technology flow from institutions into the pockets of men, the safety of society then rests upon its weakest and most fragile of minds. The problem is not that power is delocalized; it is that at the same time, power is massively increased for all. When the internet makes it possible for nonprofessionals to dirty the information highways in fields they are unfamiliar with, when deep fakes make it impossible to see the path of truth, when thermonuclear devices can fit into a pocket, when blueprints of world-ravaging viruses can be forged like cocktails by schoolchildren, then the programmer, the technologist, the scientist, like a greedy algorithm maximizing the monetization of ideas, would have achieved their secret purpose.

Solar satellite images / The great blender

Posted on July 02, 2023

The light that we receive from the universe is due in large part to electrons jumping between atomic energy levels. One of the rules is that (mostly) only the electrons in the outer energy shells can participate in this energetic dance. The emission's wavelength, and hence its energy, is set by the gap between the two levels jumped. The furious rise of temperature with height in the solar atmosphere strips away the outer electrons of specific ions, revealing different possible sets of jumps and therefore different wavelengths. If we restrict the light that we receive with some filter, then effectively we will access different temperatures and, consequently, different heights. This provides the hope that we could study solar phenomena at different heights in the atmosphere. One would naively guess that as we strip electrons from the outer shell, the magnitude of the energy jumps behaves linearly, so that height becomes a tightly bound function of temperature, but this is not true at all: it is dictated by the particular atomic rules governing the particular collection of ions. The situation becomes further complicated since our filters normally let through a small band of wavelengths, meaning that hot emission can be captured simultaneously with cool emission. The only refuge is solar abundance, i.e., the hope that at a particular temperature, emission from one particular process and ion dominates the scene. And what if a massive energy dump occurs in the form of a solar flare? Then all the height structures captured by your instrument amalgamate into a hot, poorly height-coupled mess; good luck interpreting the photons.
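For reference, the jump-to-wavelength relation this paragraph leans on is just the Planck relation: a photon emitted in a jump between levels of energies $E_2 > E_1$ carries $$ \Delta E = E_2 - E_1 = h\nu = \frac{hc}{\lambda}, $$ so a larger energy gap means a shorter, more energetic wavelength, while which gaps are available is set by the atomic structure of each particular ion.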

Protecting language's perimeter

Posted on February 27, 2024

Society charges certain words so that the most intense human emotions can safely be shuttled across minds without dissipation of meaning or intent. It is my position, then, that social norms around language are meant to enhance communication. Whether or not this is the intent of this system of norms, I argue that its principal effect is in the realm of proper information trade. Telling your child never to swear is an act of smoothing the child’s edges so that they become a harmless component of a society that can exert inordinate pressure on them, pressure that is received without bark or bite but only with a flurry of apologetic whispers and an exposed underbelly. On the other hand, folding such potent words into the common lexicon via the kneading fingers of frequency sets a twilight on the kingdom of communication, and with it the expanse and ability of language to properly express and cover one's entire inner emotional landscape. Under this twilight our most important internal reflections diminish into the encroaching shadow and dark. Any robbery of communication in this way is bound to erode the efficacy of society and clog up its processes. Telling your child to swear not carelessly but at the correct moments is, on the other hand, an act of preserving the territory and proper perimeter of language. When one uses these words only sparingly, their meaning is solidified and their desired effect crystallized. Therefore, I might agree that children should not swear, but not for the pathetic reason that it is not nice, or the slightly more advanced reason that they do not yet deserve to wield such effective weapons. No, my reasons could not stand more juxtaposed to the common position: I want them not to swear so that when they do, society will know they mean fucking business!

On Tolkien's Silmarillion

Posted on March 22, 2024

Tolkien’s prehistory of Middle-earth, laid down in the Silmarillion, faces a serious narrative problem that arises from the eternal nature of many of its inhabitants: the Elves, the Maiar, and the Valar. In light of their everlasting existence, all events lose their importance. If the lamps of Middle-earth and the trees of Valinor are destroyed, or the Silmarils stolen, or the white ships of the Teleri burned by Fëanor’s flame, indeed even if the entirety of Arda comes to ruin, in the fullness of time all can be remade anew. To keep meaning from gushing out of the narrative, Tolkien creates a mechanism to sew importance and weight back into these events by stating that some great works can only be performed once; in this way, there is some resource, an energy of sorts, that is finite, and from whose substance and finitude the cycles of time are broken, allowing one to place importance on the unfolding events of his mythology once again. Here, the well of meaning clearly needs to come from the dissipation of a resource; at the heart of meaning is an end. Without an end, rocks transform into feathers and oceans into ponds. The clearest watershed between the meaningless undying and the meaningful death lies between the First Age and the coming of Men. This age is symbolized by the greater and lesser lights (the Sun and the Moon) coming from the last fruit and flower of the great dead trees of Valinor. Cut off from their eternal source of nourishment, they epitomise and come to symbolise the nature of the temporary. Whether Tolkien meant this purposefully or not I do not know, but the fact that the dying children, the second and lesser children of Ilúvatar, are called the followers of the Sun, and that the Sun indeed has no roots but is a withering fruit, seems to corroborate the point. More evidence that this is Tolkien's thesis can be found in the following: “From this time forth were reckoned the Years of the Sun. Swifter and briefer are they than the long Years of the Trees Telperion and Laurelin of Valinor. In that time, the air of Middle-earth became heavy with the breath of growth and mortality, and the changing and ageing of all things was hastened exceedingly; life teemed upon the soil and in the waters in the Second Spring of Arda, and the Eldar increased, and beneath the new sun, Beleriand grew green and fair.” If anything, this paragraph speaks of the joined fate of death and beauty, what Carl Jung would call life’s contrasexual elements and nature. Nietzsche's happiest and deepest thought, his rescuing thought from the pit of nihilism, empties life of its meaning; and lo, we have stared too long and become the abyss, for the shape of the vessel that contains meaning is only made from ends.

Aphorisms

Posted when they come

Things written clearly in bright daylight seldom find rarefied air. Great works of literature are unsolvable, purposefully ambiguous such that they reflect one’s personal nature, in this sense they have a psychological action. True greatness is always assisted by obscurity and ascends on a bed of mist.