Causal Entropic Forces II: A Piecemeal Definition

What if there were an algorithm which procedurally generated responses to any set of circumstances, one so robust it could be applied to any system? Enter causal entropic forces.

But first, what does “causal entropic forces” mean? Let’s take the words in reverse order.

In physics, we talk about force as mass times acceleration. Force, in our sense, is abstracted as we aren’t talking about objects through space, but systems through states. The “space” is the set of all system states, and the “mass” is the resistance to change in movement through those states. Imagine a business which orders 100 widgets every month. The set of states of this system is the number of widgets the business buys, and the resistance arises from several factors including contracts and scalability. If the business is locked into a contract, it may take some time before it can lower orders to 90 or 80 per month. If the business is processing the widgets before selling them, there is some speed at which it can increase its production capacity – it doesn’t make sense to order more widgets than may be processed. In general, the business will not go from ordering 100 widgets to 50 or 200 overnight. We term this resistance to change inertia.

Entropy is oft called the disorder of a system, but the more mathematical definition is a measure of the number of accessible states. Think of the entropy of a system as the amount of information required to represent the system. Read this post for a primer.

An entropic force is a force which increases entropy. Entropic forces don’t exist in our universe – they are an approximation of a statistical phenomenon. An entropic force would model how food dye spreads through still water. At the microscopic level, the dye particles move randomly and ignore the movement of the other dye particles. Modeling random movements of billions of billions of dye particles is infeasible (and unnecessary for us to understand the system), so we instead represent the system as a concentration of particles diffusing deterministically with an entropic force.

Causal refers to the fact that these are not entropic forces, but the causal generalization of them. Causal generalizations are statements such as “public education decreases poverty” or “radiation causes cancer.” They aren’t true in every instance (you don’t get cancer every time a photon interacts with a cell in your body), but they are causally linked (every photon of high enough energy which interacts with a cell in your body has a chance of causing the cell to become cancerous). So a causal entropic force is a force which increases path entropy, rather than just entropy.

Recall that entropy measures the number of states in which a system may be, counted in bits (the base-2 logarithm of the number of states). Similarly, path entropy measures the number of future paths the system may take. So while a single flip of a fair coin has one bit of entropy, if we are going to flip the coin ten times the path entropy is ten bits.
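As a rough check of the coin example, here is a minimal Python sketch. It assumes every path is equally likely, so that measuring in bits means taking the base-2 logarithm of the number of paths. The function name is my own illustration and does not come from the paper.

```python
import math

def path_entropy_bits(states_per_step: int, steps: int) -> float:
    """Bits needed to single out one path when all paths are equally likely."""
    paths = states_per_step ** steps  # number of distinct future paths
    return math.log2(paths)

one_flip = path_entropy_bits(2, 1)    # a single fair coin flip
ten_flips = path_entropy_bits(2, 10)  # ten flips: 2**10 = 1024 possible paths
print(one_flip, ten_flips)
```

The number of paths grows exponentially with the number of flips, but the entropy in bits grows only linearly.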

Now that I’ve got these terms defined, I can finally get to discussing the paper.


This post series was inspired by the paper Causal Entropic Forces written last year by Dr. Wissner-Gross and Dr. Freer.

Causal Entropic Forces I: Procedural Programming

Brute forcing a system into existence is neither pretty nor efficient. This can be masked, but it requires a massive time commitment. The creators of recent role-playing games devote thousands of hours to creating believable worlds that feel alive. Each character has a set of dialogue choices programmed in, actions to take given situations, and sometimes even a schedule to follow. My go-to example is Morrowind (from The Elder Scrolls series), where most of its over two thousand named characters have a weekly schedule consisting of locations, times, animations and activities. Such feats are impressive works of programming and human-created content.

This works well within the intended context of the game. The world seems to breathe upon initial inspection, the main quest line, and the many (many!) side quests. Veer outside what the game makers intended, or keep playing after the quests are complete, and the world feels empty and static. The game world was created to represent a specific place and moment in the series’s timeline; little in how the world works changes with time or the player’s actions.

All programming falls on a spectrum from fully procedural to fully manual. I am simplifying for the sake of example when I say that the characters are manually generated. A human created the names of characters, set routines for them to follow, created and animated models, and wrote dialogue. This falls into the manual programming paradigm – the game’s nearly 1 GB of hard disk space is a testament to the gigantic databases of characters, models, terrains, and building layouts hand-crafted by humans. Procedural programming takes a different approach.

In contrast to manual content, procedural content does not exist until the program generates it. Humans develop algorithms and test to make sure generated content fits within desired parameters. If written robustly, the algorithms are capable of creating much more than any human could conceivably witness. The canonical example is Minecraft. Entire worlds are generated on-the-fly using an algorithm, with continuous playable areas roughly double the surface area of Earth. Using standard settings, there are eighteen billion billion possible worlds. If you could explore six billion worlds every second, it’d take your entire lifespan to see them all. If you considered a sphere with the Sun at its center and just encompassing our nearest stellar neighbor, Proxima Centauri, the surface of that sphere could barely hold Minecraft. This has made Minecraft infinitely replayable – there is effectively limitless content, and there are only a few reasonably accessible limits to gameplay.
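The lifespan claim is easy to check with a few lines of Python. This is a back-of-the-envelope sketch. Reading "eighteen billion billion" as 2**64 possible world seeds is my assumption, and the constant names are illustrative.

```python
SEEDS = 2 ** 64                 # assumption: "eighteen billion billion" read as 2**64 seeds
WORLDS_PER_SECOND = 6e9         # six billion worlds explored every second
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

seconds_needed = SEEDS / WORLDS_PER_SECOND
years = seconds_needed / SECONDS_PER_YEAR
print(f"{SEEDS:.3e} worlds would take about {years:.0f} years")
```

Under these assumptions the answer comes out a little under a century, which is indeed about one long human lifespan.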

The same relationship holds for artificial intelligence. For a manual intelligence, the AI is specifically programmed for a set of reasonable situations. The AIs for turn-based strategy games, first-person shooters, and RPGs look completely different. What if there were an algorithm which procedurally generated responses to any set of circumstances, one so robust it could be applied to any system?

Enter causal entropic forces. Roll credits for Part 1.



Measuring Information: A Primer

Information theory is one of my biggest interests. It has shaped a lot of what I do in my spare time, but unfortunately I haven’t had the chance to get any formal education. (I hope to rectify this with graduate school.) Information is an incredibly useful subject, with applications in fields from telecommunications to physics to evolution. I hope I can communicate a small piece of what I have learned from my ever-growing library on the subject.

A computer bit could be considered information in one of its purest forms. A bit is a location which can take the value 0 or 1. It is important to note that the bit is not the value which it takes; the number is a quality of the bit. Bits are easy to manipulate, so we use them for storing data on computers. They are also a standard unit of measuring information capacity.

Information capacity is a curious thing. Consider a sequence of 8 bits: 11011110. Given 8 bits there are 256 (2^8) different combinations of 8 bits. A single bit, remember, can hold one of two states. Since 256/2=128, one might imagine that 8 bits contains 128 times the information capacity of a single bit! Not so fast. Imagine each bit as the result of a coin flip. For a fair coin, the result of a flip provides one of two outcomes. Let’s say we want to store the results of 8 coin flips in order. We flip a coin 8 times, and get tails, tails, heads, tails, tails, tails, tails, heads. Abbreviated, we can write this as TTHTTTTH. If we represent T and H with 1 and 0 respectively, we obtain a sequence of 8 numbers, 11011110.

It turns out that we cannot compress the information any further. Given 8 bits we can only record the outcome of 8 coin flips, not 128. It is more useful, then, to talk of information capacity as the number of bits it takes to represent the state of a system, not the number of states the system can take. This makes combining the information of systems very easy. If we have a system of 4 coin flips, which has 4 bits of information, and combine it with a system of 8 coin flips, with 8 bits, the new system has an information capacity of 12 bits. We can use this to see the relationship between states and information. In general, a system with x states has log2(x) bits of information.
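The relationship between states and bits can be sketched in a few lines of Python. This is only an illustration of the coin-flip example, and the helper name `bits` is my own. When two systems are combined their state counts multiply, which is why their information in bits simply adds.

```python
import math

def bits(states: int) -> float:
    """Information capacity, in bits, of a system with the given number of states."""
    return math.log2(states)

four_flips = 2 ** 4                   # 16 possible outcomes
eight_flips = 2 ** 8                  # 256 possible outcomes
combined = four_flips * eight_flips   # states multiply when systems combine

print(bits(four_flips), bits(eight_flips), bits(combined))
```

The logarithm turns the multiplication of states into the addition of bits, giving 4 + 8 = 12.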

We can apply this concept to any system which can take a finite number of states, even if it does not have exactly a power of two. The information content of a system in bits is defined as log2 of the number of states. But since many systems do not have exactly a power of two states, we end up with a fractional number of bits. Whenever this is the case, we round up to the next whole number. So although a system with 2187 states technically only has 11.0947 bits of information, it still requires 12 bits to represent it. If we combine two of these systems, however, since the new system has 22.1894… bits of information, we only need 23 to represent it rather than 24. No crazy multiplication required, just addition. But bits are not the only way to store information.
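Here is a small Python sketch of the rounding argument, using the 2187-state system from the paragraph above. The function names are hypothetical. For the whole number of bits needed I use integer arithmetic via `int.bit_length` rather than rounding a floating-point logarithm, which avoids edge cases at exact powers of two.

```python
import math

def info_bits(states: int) -> float:
    """Information content in bits (may be fractional)."""
    return math.log2(states)

def storage_bits(states: int) -> int:
    """Whole bits needed to give every state its own label."""
    return (states - 1).bit_length()

single = 2187
pair = single * single   # two such systems combined

print(info_bits(single), storage_bits(single))
print(info_bits(pair), storage_bits(pair))
```

Stored separately, the two systems would need 12 + 12 = 24 bits. Stored as one combined system, they need one bit fewer.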

A trit is a bit like a bit, but it can take three states (0, 1, 2) rather than two. If we take the system of 2187 states, we can represent it with log3(2187)=7 trits. Converting between bits and trits is easy: just like you multiply meters by a constant to get a measurement in feet, you multiply trits by log(3)/log(2) (about 1.585) to get bits. Information can be stored in similar structures which can take any number of states. Some of the more common include decimal notation, the algebraic way of writing numbers like 4096, and written language, like this entire blog post. Decimal numbers are saturated with information since each digit gives the same amount of information. Since there are 10 states for a given digit, each digit in a decimal number communicates log2(10)=3.321… bits. The English language is actually very inefficient at storing and communicating information despite having many more states for each character. That, however, is a subject for a different blog post.
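The unit conversion can be sketched in Python as follows. The names are my own, and this is just the arithmetic from the paragraph above written out.

```python
import math

BITS_PER_TRIT = math.log(3) / math.log(2)   # about 1.585

def trits_to_bits(trits: float) -> float:
    """Convert an amount of information from trits to bits."""
    return trits * BITS_PER_TRIT

trits = 7
assert 3 ** trits == 2187                   # 7 trits cover exactly 2187 states

print(trits_to_bits(trits))                 # the same system measured in bits
print(math.log2(10))                        # bits carried by one decimal digit
```

Seven trits convert to the same 11.09 or so bits found earlier for the 2187-state system, as they should: the unit changes, the information does not.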

Game Design and Cybernetics

I’ve recently taken to studying game design. Over the weekend I participated in Startup Weekend Austin, a three day workshop where teams come together to work on ideas that have the potential to become businesses. I joined the project that most closely resonated with me: Code Arcade.

The Code Arcade project is the brainchild of Adam Lupu, a learning scientist who has done research on the effectiveness of games as educational tools. The project aims to teach computer programming through the medium of games. I find this a very noble venture – coding has the unique position of teaching logical, organized thinking without the baggage people have about science. Programming does not have much to say on the origins of the universe, nor does it unweave the rainbow. I would argue the previous sentence is false, but what matters is that popular sentiment holds it to be true. As a result, programming does not meet the same resistance as science education, and can act as a Trojan Horse for critical thinking.

Back to the story. Over the next couple of days, I learned C# and used it to help write an interactive educational program in Unity. I avoid calling it a game because I have too much respect for game design. The team has held together after the workshop, and it appears the project may become an open source project. Nothing is finalized, and we are still hashing out how this will work. I have agreed to contribute some of my spare time to aid in game development and programming. I have a knack for procedural programming, and figure I could be of some use. Unfortunately, I know very little about game design, and need to study it on my own before feeling confident in calling something I create a “game.”

So I did the first thing I do when I don’t know anything about a subject I am interested in: I found a book recommended by the Internet and bought it. I chose Rules of Play, mainly because of this passage from the first chapter (I like using the “read the first chapter free” feature).

But the book is not just for game designers. In writing Rules of Play, we quickly realized that it has direct application to fields outside game design. The concepts and models, case studies, exercises, and bibliographies can be useful to interactive designers, architects, product designers, and other creators of interactive systems. Similarly, our focus on understanding games in and of themselves can benefit the emerging academic study of games in fields as diverse as sociology, media studies, and cultural policy. Engagement with ideas, like engagement with a game, is all about the play the ideas make possible. Even if you are not a game designer, we think you will find something here that lets you play with your own line of work in a new way.

Salen, Katie; Zimmerman, Eric (2003-09-25). Rules of Play: Game Design Fundamentals (Kindle Locations 231-236). The MIT Press. Kindle Edition.

I am not sure of a way to read the above passage without hearing it scream “Game design is cybernetics!” Game design is not just about creating systems, but also about finding ways to recreate some aspect of the real world that makes people want to interact with the abstraction. I think studying game design will provide a lot of insight into how systems work. Maybe someday I’ll learn enough to start being able to express parts of cybernetics through games. But before that, I need to digest and think about what I have read for a while.

The Catalyst Analogy

I’ve covered a small part of the basics of cybernetics, and figured it was time to show an application of concepts. We can verify a model based on the predictions it makes. A model which makes wrong predictions is either wrong or incomplete, and a model which makes no predictions is effectively useless. Models that make good predictions can be used to study the behavior of a system, and even be applied to analogous systems.

The thing is, there are ofttimes multiple models which predict the same behavior and don’t make verifiable predictions which are different from the other models. Currently, M-theory would fall in this category. It unifies the phenomena we observe, but the predictions it makes would require a particle accelerator the size of the Solar System to verify. The reason it is so popular is that it explains all of our observations (unifying gravity and the Standard Model) with as few assumptions as possible. That does not make M-theory correct, just … useful.

Rather than explain the entirety of modern physics, I figured I’d go with a simple everyday example: jaywalking. Specifically, how I’ve seen it happen at the University of Texas at Austin across intersections with very low traffic. Here’s what I have observed:

  • People want to get from one side of the road to the other.
  • People rarely jaywalk alone.
  • The more people currently jaywalking, the more likely a given person is to jaywalk.

This is enough to build a system. I chose to create a simulation in a computer program written in R. It randomly creates people and has them approach a road. When they get to the curb, they stop. Sometimes, they jaywalk. If other people are already jaywalking, a given person standing at the curb is more likely to jaywalk (proportional to the current number of jaywalkers).
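For readers who don't use R, here is a minimal Python sketch of the same idea. It is not my original program: every parameter name and default value below is a hypothetical stand-in, and it leaves out the drawing and the red/green light. It only keeps the rules above. People arrive, wait at the curb, and start crossing with a probability that grows in proportion to the number of people currently jaywalking.

```python
import random

def simulate(steps=200, arrive_p=0.3, base_p=0.01, boost=0.15,
             crossing_time=5, seed=0):
    """Return a list of (waiting, crossing) counts, one pair per time step."""
    rng = random.Random(seed)
    waiting = 0      # people standing at the curb
    crossing = []    # remaining time steps for each person on the road
    history = []
    for _ in range(steps):
        # A new person may walk up to the curb and stop.
        if rng.random() < arrive_p:
            waiting += 1
        # Chance to start grows in proportion to the current number of jaywalkers.
        p_go = min(1.0, base_p + boost * len(crossing))
        starters = sum(rng.random() < p_go for _ in range(waiting))
        waiting -= starters
        # Move everyone on the road one step along, dropping those who finish,
        # then add the people who just stepped off the curb.
        crossing = [t - 1 for t in crossing if t > 1]
        crossing += [crossing_time] * starters
        history.append((waiting, len(crossing)))
    return history

for waiting, crossing in simulate()[:20]:
    print(waiting, crossing)
```

Setting `base_p` to zero means nobody ever goes first, so nobody ever crosses. The crowd only helps once someone has started.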

A jaywalking simulation.

I took the liberty of adding a red/green light that tells the people when they are “supposed” to go. Notice that the addition does not change the behavior we were attempting to observe. As we add more rules, we can create closer approximations to what we observe. For instance, some people are more likely to jaywalk than others.

In this simulation, there are a few behaviors that emerged – I did not have to create rules specifically for them. Once there are many people, while the probability of any one person beginning to jaywalk is small, the probability that at least one person jaywalks is large. But they don’t all go at once. Usually, one starts going alone, then others take a small amount of time to react.
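The "at least one person" effect is simple probability. As a sketch, assume each waiting person decides independently with the same small chance (an idealization, and the function name is my own):

```python
def p_at_least_one(p_single: float, n_people: int) -> float:
    """Chance that at least one of n independent people starts jaywalking."""
    return 1 - (1 - p_single) ** n_people

for n in (1, 10, 50, 100):
    print(n, round(p_at_least_one(0.02, n), 3))
```

With a 2% individual chance, a crowd of fifty already makes it more likely than not that somebody goes first.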

Admittedly I did handpick the example to the right, and there is a behavior that occurs sometimes, but just didn’t happen on this run. If the light turns red while people are walking, people who have just gotten to the curb will often jaywalk.

This is where the catalyst analogy comes in. In chemistry, a catalyst is a substance that speeds up a reaction by lowering something called the activation energy. The catalyst makes it more likely for each particle to undergo the reaction in a given span of time. For our case, jaywalkers make it less likely that people will be deterred by the sign telling them not to cross.

The plot to the right shows what each person “sees.” I modeled walking as people feeling a force towards their destination, causing them to walk across the road. Imagine a ball rolling down a gently sloped hill. The ball accelerates to a maximum speed, then (because of friction) holds that velocity until something changes. If the hill slopes upwards, the ball comes to a rest. So the graph on the right should be tilted clockwise slightly, to keep the people rolling toward their destination.

The potentials above show why a non-jaywalker stops but a jaywalker keeps going. A non-jaywalker sees a hill – a deterrent, and a jaywalker sees a downward slope. Both see a downward slope after the left dotted line – once they are on the road, it makes sense to move faster than normal and finish crossing to minimize the likelihood of being in the way. Once back on the other side, both continue normally. Also, as more people jaywalk, the deterrent to start jaywalking decreases. Once several people are jaywalking it does not take long for everyone to cross, and the cycle begins again.

This brings our discussion to a close. We used three rules and built a system that accurately models a real world behavior. The model is not “real” in any sense (people are not balls rolling down slopes), but it does help us understand a phenomenon through abstraction. Now that we understand jaywalking, we can apply this model to other similar structures. Bullying, looting, littering, and conversation topics are all examples of systems that work in analogous ways, and so can be analyzed using models close to the one described above.

I made the simulation using the computer language R. If you want to see how I made this, I’ve posted my code on pastebin. To run it, you’ll have to install R first – it is a scripting language so this isn’t an executable. Note that since it is a simulation with randomness, every run will be different. It is very unlikely you will ever see the exact same thing happen twice. The program is a bit memory heavy, so be prepared for that.

Weakening Belief Foundations

Understanding how people come to believe things can be used to figure out how to change their beliefs. Beliefs in people’s minds are systems, and the same principles which apply to systems in general can be used here. A notable case is religion-inspired sexism.

The argument that we should believe true things appears self-evident at first glance. If we believe true things then we can take informed actions, which are more likely to produce good results. It is unarguable that believing true things will in general improve the state of humanity. This leads to a question: If we really care so much about believing true things, why are there so many contradicting beliefs? If we did care, one of the first things we would teach schoolchildren is how to determine the truth of a statement. Alas, we do not, and all too often we choose to believe something because it is comforting or desirable. If two people find two opposite viewpoints to be desirable, then at least one of them will be wrong.

Before continuing, there is an important distinction to be made between labels and viewpoints. Many people with widely varying beliefs all use the collective label “Christian,” even though their beliefs are widely disparate, in fact believing only a small number of similar statements. This is even more pronounced in Hinduism, where those we label as “Hindu” usually do not even label themselves as such. In fact, the label “Hinduism” is applied to a wider variety of beliefs than those of every Abrahamic religion. I find it meaningless to speak of “the Hindu,” “the Christian,” “the conservative,” or even “the Trekkie” view. It is much more productive to discuss individual positions and beliefs.

Consider the viewpoint that women are inferior to men. This position is espoused by major texts in Christianity, Islam, and Hinduism.


“I do not permit a woman to teach or to assume authority over a man; she must be quiet.” – 1 Timothy 2:12

“Allah instructs you concerning your children: for the male, what is equal to the share of two females.” – Quran 4:11

“In childhood a female must be subject to her father, in youth to her husband, when her lord is dead to her sons; a woman must never be independent.” – Manusmriti 5.148


It is here that another idea on beliefs becomes apparent. Beliefs are not held in a vacuum. Beliefs have consequences (and ofttimes causes). If a Christian, Muslim, or Hindu believes everything in their texts is fully endorsed by God, then they must also hold the belief that women are inferior to men or be inconsistent. Fortunately, the belief in the inerrancy of holy texts is waning. However, any significant effort to establish gender equality will encounter resistance from those who believe their texts are literal and infallible.

Sexism is a wrong belief, and confronting a literalist on this point will usually not convince them. This is because sexism is a direct result of their beliefs:

(1) The text is literally true. (call this literalism)

(2) The text says sexism is correct.

Sexism is tricky since it is not self-evidently wrong or inconsistent. Note, however, that there are many other incorrect beliefs which result from literalism. While we cannot rely on people to believe true things, we can rely on cognitive dissonance. Cognitive dissonance – the discomfort caused by being aware of inconsistent beliefs – encourages people to believe things which seem to be consistent. Disregarding believers with remarkable mental flexibility, someone who believes (1) and (2) is sexist.  We can take advantage of this by getting the person to reevaluate the validity of (1).

The religious texts above each include some discussion of cosmology – an explanation of how we came to exist. These descriptions of the universe are verifiably false. This leads me to think that the best way to combat religion-inspired sexism is education. The cosmology presented by the literalist interpretation of religions is laughable, and easily disprovable with a basic education in science and logic. With the overwhelming evidence provided by science, we see an increase in the proportion of believers who see their texts as metaphorical and a product of the time in which they were written. Once a given believer accepts this, sexism is no longer a necessary product of their beliefs, and logical and evidence-based arguments will have more sway on the topic of sexism.

There is a case where this will not work. Namely, if the person believes (1) axiomatically rather than through a line of reasoning. Given this, there is no discussion to be had. Under this belief system there is no method to evaluate the truth or literalness of the text, so even if the text obviously disagrees with reality the text’s validity will not be questioned. It is a sign that truth is not a priority.

To summarize:

Beliefs can cause and be caused by other beliefs. People are uncomfortable being aware that their beliefs are inconsistent. So if a belief is caused by other beliefs, it is usually more effective to target the cause.

Emergence and Unintended Results

The concept of emergence is central to understanding cybernetics. Complex behaviors can arise from very simple rules. In particular, emergence deals with behaviors not described by the rules of the system. The token example is snowflakes. There is no physical law which says when water freezes around a small speck of dust or pollen at below -35 degrees Celsius, it must form a planar pattern of sixfold symmetry about the center. An interesting – and difficult – topic in emergence is trying to predict what behaviors will arise from adding a rule to a system.

Too often, we only look at an incomplete picture. Not every rule in a system is written down, but we usually limit ourselves to only considering the rules which are. This can be problematic when legislators create rules without looking for the unwritten rules of a system. Consider a policy currently used in the United States to determine whether funding for a project should be decreased.

(1) If a project does not use all of the funds allocated to it in a given year, reduce the funding to the amount used.

This rule seems innocuous enough. It should reduce spending by making sure a program is only allocated the funds which it needs. It doesn’t make sense to allocate more than the minimum necessary amount to a program. The money will be put to use rather than sitting idle. Over time, spending should decrease on items which do not have a need for as much money. Instead, this policy has the unintended side effect of artificially increasing waste because it fails to account for the following rule:

(2) In general, programs do not require exactly the same amount of funds every year.

Eventually, a given program will probably need less money in the current year than it did in the previous year. Given (1), the program managers have two main paths.

They could only spend the money required to complete their assigned tasks. In this case, the funding for their program decreases, and they have less to work with when they need more money.

They could spend all of the allocated funds. In this case, they will continue to receive the same amount of money and will not encounter problems when costs increase back to their normal levels.

Rather than decreasing spending, this rule has the effect of causing unnecessary waste. Programs which do not spend up to the last dollar every year will have funding reduced, and be forced to work with less over time. Eventually, they will be unable to perform their assigned tasks, and the program will be dissolved. Programs which do spend everything will continue to thrive. As time goes on, thrifty programs will be weeded out and deliberately wasteful ones will remain.
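The ratchet described above can be sketched as a toy simulation in Python. Everything here is an illustrative assumption rather than a model of any real budget process: the starting budget, the range of yearly needs, and the function name are all made up. The only things it encodes are rule (1), rule (2), and the two choices open to a program manager.

```python
import random

def run_program(wasteful: bool, years=20, budget=100.0, seed=0):
    """Return (final_budget, still_running) for one program."""
    rng = random.Random(seed)
    for _ in range(years):
        # Rule (2): what the program actually needs varies from year to year.
        need = rng.uniform(60, 100)
        if need > budget:
            return budget, False      # it can no longer perform its tasks
        # A wasteful manager spends everything; a thrifty one spends only the need.
        spent = budget if wasteful else need
        # Rule (1): next year's funding is cut down to whatever was spent.
        budget = min(budget, spent)
    return budget, True

print("thrifty: ", run_program(wasteful=False))
print("wasteful:", run_program(wasteful=True))
```

The thrifty program's budget can only ever fall to its lowest need so far, so the first more expensive year finishes it off, while the wasteful program keeps its full funding indefinitely.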

Neither of the two rules specifies this behavior, yet the behavior results from the two rules. It emerges. Unfortunately for legislators, the result of changing a system cannot be predicted without first understanding the system. They cannot mandate a complex behavior into existence; they must figure out how to make the behavior emerge.

How We Know Things

A question people too rarely ask themselves is “How do I know things?” The term epistemology refers to this most fundamental of questions. The American school system gives a pitiful introduction to the scientific method, but never explains why our society has come to accept it as the way to learn about the universe.

Answering this question is the first step in developing a strong, consistent worldview.  Most people have not seriously confronted this question, and usually believe something because it feels right or because they wish it to be true. It does not take too much thought to see why these ways of knowing things can lead to errors of judgment and incorrect beliefs about the world.

If we are to have any useful definition of truth or reality, then the followers of two incompatible religions cannot both be correct. If it were enough for someone to merely believe something to be true, we would not live in a consistent universe. Besides, the universe is oft more beautiful than anything our minds create. So how should we answer this question? Incontrovertibly, the best answer ever given to this question is science. Cell phones and laptops and airplanes do not work merely because we wish them to, and the technologies were not developed by people doing what felt right to them. They discovered a good enough approximation to physical laws, and figured out how to exploit them for humanity’s benefit.

Science doesn’t give a perfect answer to every question. All it promises is an ever-improving approximation to the right answer. If the current answer fits all of the available data, then it is good enough until we find something which contradicts our model. Once we have a good model, we test it. If a model predicts a behavior we haven’t seen before, we try to create the situation and see if the model is still correct. If not, we alter our existing model or find a new one that fits the data. If we observe what we expect, the model lasts one more cycle.

Say we have two competing models. Both fit all of the available data, but contradict each other on phenomena we haven’t observed yet. If this happens, and we are not yet able to test to see which one describes reality better, this is not a problem. Both models are close enough to the truth that if we ask them the same questions, we will arrive at the same result as reality. In the future one may turn out to be a better model, or both may be proven wrong. Until the truth of one over the other becomes verifiable, it doesn’t really matter. We just go with the one that makes finding the right answer easier. Claiming that one is closer to the truth than the other until such a matter becomes provable is pointless.

So, how do we know things? Through reason, evidence, and experimentation. With science, we go with the ideas that work, and discard the ones that don’t. Truth is something that can be reasonably verified. It doesn’t ask you to feel anything, to just believe anything; truth is something that anyone can test on their own and still arrive at the same conclusion.