Building a 2D Physics Engine - Part 1

August 18, 2023 - physics-engine

In this blog series, I'll talk about the implementation of a simple 2D physics engine I made with some theoretical bits here and there.

Prerequisities

high-school level knowledge of vector/matrix operations, dot/cross products, vector magnitude, normals, maybe matrix determinants.
high-school level basic physics knowledge, Newton's laws, friction between rigid bodies.
some object-oriented programming, preferably C++, if you're confident you can try implementing your engine in a different paradigm, I'll stick to OO here.

What we will implement in the end

We will implement a 2D physics engine that is able to simulate convex rigid bodies. This engine will often make visually satisfying but not realistic approximations, appropriate for games.

One of the most critical approximations in our engine is how collisions are handled. In Figure 1, a rigid body is in rest above the ground. The reason it can stay that way and not go through the ground is the reaction force the ground applies on the body. Due to this force, the total force on the body is zeroed out and it stays in rest in accordance with Newton's Laws.

**Figure 1 - Impulse based collision resolution.**

Our engine will not incorporate/calculate the reaction forces however. Instead, it will detect if the body has gone through the ground (it definitely will since the gravity acts on it, giving it downward velocity). If it has, it will first "teleport" the body and the ground away from each other, and then will modify their velocities so that their velocity vectors are not pointing towards each other. This will happen every frame (with ad-hoc modifications, we will reduce the number of some of these microcollisions later for performance). As the collisions are resolved before you see them, you'll see a body resting on the ground in the screen.

**Figure 2 - Sequential collision resolution is not ideal.**

Another thing the engine does not do is handling the microcollisions simultaneously. Especially with non-convex objects, or more than 2 objects touching, resolving a microcollision might aggreviate the other. Unfortunately we will not have a good solution for this sort of problem in our engine, we will simply sequentially resolve each collision, starting from the worst cases. We repeat going through our list of collisions until no more collisions are paradoxically generated from our resolution routine.

Starting with Rudimentaries - Transforms of Entities

The first thing we need to concern ourselves with is how we represent the entities within our simulation. If you think of a particularly non-symmetric shape, you'll notice we'll also have to store their orientation. The combination of position and the orientation of an entity is called transform, or pose. I dislike pose as a name since it is easy to confuse it with position. In the blog series, I will only use the name transform for this reason.

**Figure 3 - Relative transforms of spaceship C.**

Another thing about transforms is that they are generally defined with respect to frames. If you look at Figure 3, you'll see $C$ on two different grids (grids implied from the frames $A$ and $B$) in different orientations with respect to the grids. Another thing you'll notice is the local frame of shapeship $C$ itself.

There are many ways to mathematically represent transforms. I like homogeneous matrices the best, it is also very easy to see some generic properties of transforms through them. A homogeneous matrix representing a transform with position $(x_0,y_0)$ and orientation $\theta$ is of the form: $$ \begin{bmatrix} R(\theta) & p_0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) & x_0 \\ \sin(\theta) & \cos(\theta) & y_0 \\ 0 & 0 & 1 \end{bmatrix} $$

$R(\theta)$ is called a right handed rotation matrix. It has a cute property of being orthogonal, $R(\theta)^{-1} = R(\theta)^\top$, which makes it easy for us to calculate inverses and derive interesting mathematical properties in matrix multiplication. Other than this, by itself, it rotates a point $(x0,y0)$ around origin counter-clockwise (CCW) when this point is multiplied from the left.

A point with position $(x0, y0)$ is represented as follows in homogeneous coordinates: $$\begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix}$$

The 1's and 0's in the matrices stay the same for all operations closed on the homogeneous matrices and points (If you use matrix block multiplication rule this is very easy to see). We do our operations by simply multiplying points or matrices from left or right depending on what we want to do.

Examples:

Transform of C with respect to A is $^{A}_{C}T = \begin{bmatrix} \cos(\pi/3) & -\sin(\pi/3) & 5 \\ \sin(\pi/3) & \cos(\pi/3) & 4 \\ 0 & 0 & 1 \end{bmatrix}$. Expectedly, the last column of the matrix is the position of C with respect to A.
Transform of C with respect to B is $^{B}_{C}T = \begin{bmatrix} -1 & 0 & 3 \\ 0 & -1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$. Notice that the first two columns of this matrix are the unit vectors of the local x and y axes of C with respect to B. This also expands to 3D homogeneous matrices, where x,y,z axes of the local frame correspond to first three columns of the matrix, again, with respect to the reference frame.

Notations for Transforms

When transforms are relative to various frames, the notations become very important. There are many notation methods, but I like the one I laid out here, it allows space for other subscripts on the right. When representing a transform or a point, we do the following:

The superscript is the reference frame: Write the identifier of the reference entity on the top left.
The subscript is the identifier: Write the identifier/name of this transform/point on the bottom left.

Find yourself an acronym of some sort to remember this, it will not be good if you confuse the superscript with the subscript, for the application of the following awesome rules:

Awesome rules

$^{A}_{A} T = I$ for all frames.
$^A_C T = ^A_BT ^B_CT$ for all frames $A,\ B,\ C$. This is a very important rule. You see:
- it immediately implies $^A_B T = ^B_AT^{-1}$.
- CS-wise, we can have a graph of transforms defined w.r.t. to each other that we can use to infer other relative transforms. Generally this graph happens to be a tree, and on the top we have the global frame $G$.
- it helps us understand the second important nature of transforms:

Transforms as Transformations

Transforms actually double duty as the following two:

They represent the position and orientation of a frame $F_1$ with respect to another frame $F_2$.
They represent a transformation, a change on the position and orientation of an entity $F_1$. This change might be itself be respect to $F_2$, or more enigmatically, $F_1$.

I generally do not put subscripts or superscripts at all to my transformation notation if I'm not referring to it as a frame (the first case above).

Multiplying from Left

As you might have guessed from the previous examples, A transformation matrix $T = \begin{bmatrix} R(\theta) & p_0 \\ 0 & 1 \end{bmatrix}$ applies these transformations with respect to the reference frame if our original transformation is multiplied from the left: $$ ^{F_2} _ {F_1} T' = T{^{F_2}_{F_1}T} $$ Here the transformed version of frame $F_1$ is found by:

rotating it CCW by $\theta$ with respect to the position (center) of $F_2$, then
translating it along the x and y axes of $F_2$ by $x_0$ and $y_0$ respectively.

This sequence is easy to see if you write the other matrices in a form like $\begin{bmatrix} R(\theta_{F}) & p_f \\ 0 & 1 \end{bmatrix}$ and use the block multiplication rule.

We generally do multiplications from left when we have the global frame as the reference frame. For example, when we want to rotate around a certain point $p$:

We first translate the frame so that p is at the origin: $T =\cdots Trans(-p)$
Rotate around the origin: $T = \cdots Rot(\theta)Trans(-p)$
Translate back to our original position: $T = Trans(p)Rot(\theta)Trans(-p)$.

This sequence is also called a similarity transform since it's in the form $T=(T_{easy})^{-1}UT_{easy}$ and comes up even in cases where $U$ is not a homogeneous matrix. Essentially $T_{easy}$ is a transform that makes our current frame some frame easy to come up with the values within $U$ with respect to the frame we transformed into (this frame is usually the global frame). In the case above, Coming up with the values of the transform for rotating around $p$ is not easy immediately, but rotating around origin is simple to come up with: $\begin{bmatrix} R(\theta) & 0 \\ 0 & 1 \end{bmatrix}$.

Multiplying from Right

If we multiply $T$ from the right, we get the transformation with respect to $F_1$ (transformation w.r.t. local frame). It does the following operations:

rotate the frame $F_1$ CCW by $\theta$ with respect to its own center. Essentially, just rotate the frame.
translate it along the x and y axes of $F_1$ by $x_0$ and $y_0$ respectively.

Multiplying from right is awkward in terms of transformation operation perspective of operations, but makes sense if you regard them as frames themselves. Name the final result of our operation as $^{F_2} _ {F_3}T$, a new frame defined w.r.t $F_2$. It is also apparent that $^{F_1} _ {F_3}T = T$ if we use the left multiplication rule as:

$$^{F_1} _ {F_3}T = T^{F_1}_{F_1}T = TI = T$$

Since right multiplication and left multiplication rule steps 1 and 2 are equivalent if we treat $F_1$ as the reference frame. Then, magically: $$ ^{F_2 } _ {F_1}T' = ^{F_2} _ {F_3}T = ^{F_2} _ {F_1}T^{F_1} _ {F_3}T $$ In practical cases, we would use this rule to find the global transforms of certain points on a rigid body. Let's say we would like to have the topmost point $P$ of a unit circle $C$ that is rotating about in the world. We would have the following: $$ \begin{align*} ^{G} _ {P}T &= {^{G} _ {C}T} {^{C} _ {P}T} \\ ^{G} _ {P}T &= ^{G} _ {C}T \cdot Trans([0,1]) \end{align*} $$

The Code

It is quite long with all of the operation overloads etc. but the main points are as follows:

real is an alias for float, future users might change it to double if they want.
Vec2 simply holds two reals, x and y, and defines dot and cross products, magnitude, normalization, rotation, etc. It is a very simple class, so I did not include it here.
I hold 4 reals as two vectors, translation and the direction for transforms (I obviously do not store 3x3 matrices, we don't need the computer to compute 0 times 0 is 0).
Holding the unit vector pointing towards the local x axis of the frame as direction instead of the angle is a curious decision however. I thought it would be faster to hold the cosine and sine of the angle without calculating at every transformation multiplication, but I'm not sure if it is worth it. It makes it easy form me to extract the edge normals of a polygon though during collision detection, so that is a plus.
Even though we don't calculate sines and cosines at every step, we will need to watch out for numerical drift and call Tform2::normalize() every once in a while.
Everything is inline, since this is the kind of stuff that would be called a lot and is quite short.

class Tform2 {
private:
    Vec2 _direction;
public:
    Vec2 translation;
    Tform2() : translation({}), _direction({1.0f, 0.0f}) {}
    Tform2(const Vec2 &_translation, real orientation) : translation(_translation),
                                                            _direction({std::cos(orientation),
                                                                        std::sin(orientation)}) {}
    Tform2(const Vec2 &_translation, const Vec2 &direction, bool normalize = false) : translation(_translation),
                                                                                        _direction(direction) {
        if(normalize) _direction.normalize();
    }

    real getOrientation() const {
        return _direction.orientation();
    }

    Vec2 getDirection() const {
        return _direction;
    }

    void setOrientation(real orientation) {
        _direction = {std::cos(orientation), std::sin(orientation)};
    }

    void setDirection(const Vec2 &direction, bool normalize = false) {
        _direction = direction;
        if(normalize) _direction.normalize();
    }

    Vec2 getOrthogonalDirection() const {
        return {-_direction.y, _direction.x};
    }

    void setOrthogonalDirection(const Vec2 &direction, bool normalize = false) {
        _direction = {-direction.y, direction.x};
        if(normalize) _direction.normalize();
    }

    Vec2 operator*(const Vec2& point) const {
        return {_direction.x * point.x - _direction.y * point.y + translation.x,
                _direction.y * point.x + _direction.x * point.y + translation.y};
    }

    Tform2 operator*(const Tform2 &other) const {
        return {(*this) * other.translation,
                {_direction.x * other._direction.x -_direction.y * other._direction.y,
                    _direction.y * other._direction.x + _direction.x * other._direction.y}};
    }

    Tform2 inverse() const {
        return {{-_direction.dot(translation), -_direction.cross(translation)},
                {_direction.x, -_direction.y}};
    }

    void invert() {
        translation = {-_direction.dot(translation), -_direction.cross(translation)};
        _direction.y = -_direction.y;
    }

    void rotateLocal(real angle) {
        _direction.rotate(angle);
    }

    void normalize() {
        _direction.normalize();
    }

};

inline std::ostream& operator<<(std::ostream& os, const Tform2& tform) {
    os << "Tform2{{" << tform.translation.x << ", " << tform.translation.y << "}, {"
        << tform.getDirection().x << ", " << tform.getDirection().y << ", " << tform.getOrientation() / gust_PI * 180.0f << " degrees}}";
    return os;
}

The takeaway

Now you know what homogeneous transforms are. Most of the stuff here readily extends to 3D and transform implementations of established libraries (only right-handed/left-handedness might be different). You can very well use ready Vec2 and Tform2 implementations from the libraries but I highly suggest implementing them on your own, it would be a good high school analytical geometry review. Definitely write some unit tests, it will be hard to track bugs if you let the incorrect implementations ferment this deep at your physics engine.