We derive the basic equations of quantum mechanics and field theory using only symmetry and redundancy. We elucidate the concept of gauge symmetry.
Introduction
Symmetry is everything in modern physics. From the assumption of symmetry all else follows. In particular, in the realm of elementary particles, one idea that is at the heart of our understanding is that of gauge symmetry. But what is gauge symmetry and why is it so important?
In what follows we will use Lorentz symmetry and gauge symmetry in order to derive the most basic equations of elementary quantum mechanics almost effortlessly. This is based on A. Zee’s book “Quantum Field Theory in a Nutshell”.
Relativity and Lorentz Invariance
Our starting point will be Lorentz symmetry. Everything that happens in our world takes place on a spacetime parameterised by 4 coordinates \((t,x,y,z)\). Our laws of physics should not depend on the coordinate system we use, and so the objects we use should transform from system to system following definite “well-behaving” rules. Restricting ourselves to special relativity1 we will only be concerned with inertial coordinate systems (or reference frames). A quick and dirty definition of an inertial frame is that if an object’s velocity is constant in one inertial frame, it should be constant in all inertial frames. This of course leaves us with the problem of finding the “first” inertial frame, but that is beside the point.
If we want to go from one inertial frame to another there are two different types of transformations we can enact. The first is rotations and the second is boosts. A boost is a change in the velocity of the inertial frame by a constant amount in any direction. The complete set of these operations is called the Lorentz group (also denoted by \(SO(3,1)\)). In group theoretical language what we need from a physical object is for it to transform under \(SO(3,1)\) in a specific way, i.e. for it to belong to a representation of the group (for more on representations look at the Appendix at the end).
The way we generally describe system transformations in physics is through the group generators. The generators are a set of matrices that describe infinitesimal transformations. Out of those we can build any finite transformation that is continuously connected to the identity.
For example, a very small rotation by \(d\phi\) around the z-axis would be equivalent to \(x\rightarrow x-d\phi y,y\rightarrow y+d\phi x\). If we represent the space-time coordinates as a column vector
$$ \begin{pmatrix} t \\ x\\y\\z \end{pmatrix}$$ then the rotation corresponds to acting on this column vector with the matrix
$$ \begin{pmatrix} 1&0&0&0 \\ 0&1&-d\phi&0 \\ 0&d\phi&1&0 \\0&0&0&1 \end{pmatrix} = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\0&0&0&1 \end{pmatrix} + d\phi \begin{pmatrix} 0&0&0&0 \\ 0&0&-1&0 \\ 0&1&0&0 \\0&0&0&0 \end{pmatrix} = I + d\phi J_z $$
The matrix \(J_z\) is called a generator of rotations because we can generate any finite rotation around the z-axis by acting with it multiple times. To see this assume that we want to rotate around the z-axis by an angle \(\phi\). We can separate \(\phi\) into N small angles \(\phi=N\cdot d\phi\) and then we can just rotate N times by that smaller angle \(d\phi\). Taking the limit \(N\rightarrow \infty\) we’ve already seen what such a rotation looks like and so
$$R_z(\phi)=\lim_{N\rightarrow \infty}(1+d\phi\, J_z)^N = \lim_{N\rightarrow \infty}\left(1+\frac{\phi}{N} J_z\right)^N = e^{\phi J_z} $$
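If you want to see this limit in action, here is a small numerical check (just a sketch using NumPy/SciPy; the variable names are mine): it builds the matrix \(J_z\) above, exponentiates it, and compares the result both to the familiar finite rotation matrix and to the \(N\rightarrow\infty\) product.

```python
import numpy as np
from scipy.linalg import expm

# The generator J_z from the text, acting on the column (t, x, y, z).
J_z = np.array([[0, 0,  0, 0],
                [0, 0, -1, 0],
                [0, 1,  0, 0],
                [0, 0,  0, 0]], dtype=float)

phi = 0.3                                   # a finite rotation angle (radians)
R_exp = expm(phi * J_z)                     # e^{phi J_z}

# The finite rotation we expect: it mixes only x and y.
R_expected = np.array([[1, 0, 0, 0],
                       [0, np.cos(phi), -np.sin(phi), 0],
                       [0, np.sin(phi),  np.cos(phi), 0],
                       [0, 0, 0, 1]])
assert np.allclose(R_exp, R_expected)

# The limit definition (1 + (phi/N) J_z)^N converges to the same matrix.
N = 100_000
R_limit = np.linalg.matrix_power(np.eye(4) + (phi / N) * J_z, N)
assert np.allclose(R_limit, R_expected, atol=1e-4)
```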
We can conclude that all of the information we need to study a continuous group (in this case) can be found in its generators2 and so for the rest of this article we will focus on those.
As we would expect, the full set of rotation generators is \(J_x,J_y,J_z\), and they satisfy the commutation relations
$$ [J_i,J_j]= J_i J_j-J_j J_i = \epsilon_{ijk}J_k \quad (1.1) $$ where \(\epsilon\) is the fully anti-symmetric symbol and a sum over the repeated index \(k\) is implied. The commutation relations (1.1) define what is called a Lie algebra: essentially, any three objects satisfying these relations behave in exactly the same way, with the same possible eigenvalues and the same representations.
The other part of the Lorentz group is the boosts. In the unified picture of space and time that special relativity operates in, we can think of boosts as “rotations” whose plane of rotation contains the time axis, with just a small difference (thanks to the minus sign in the Minkowski metric). The corresponding infinitesimal boost along the z-axis is \(t\rightarrow t+d\phi z,z\rightarrow z+d\phi t\). Notice that now both transformations have a plus sign. The three boost generators are called \(M_x,M_y,M_z\) and I am sure you can figure out their matrix form. To complete the commutation relations we have
$$ [M_i,M_j] = -\epsilon_{ijk}J_k,\quad [J_i,M_j] = \epsilon_{ijk}M_k\quad (1.2) $$
Now that we have the whole set of generators we can make a simple change of variables. We can define six new operators \(J_{\pm i}=\frac{1}{2}(J_i\pm iM_i)\). These now satisfy the following relations
$$[J_{+i},J_{+j}] = \epsilon_{ijk}J_{+k},\quad[J_{-i},J_{-j}] = \epsilon_{ijk}J_{-k} \quad (1.3)$$ and $$[J_{+i},J_{-j}]=0\quad (1.4)$$ Well, hold on: we just got two copies of the same algebra (1.1), and (1.4) tells us that the two copies are completely independent of each other.
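None of this needs to be taken on faith: the relations (1.1)-(1.4) can be checked by brute force with the explicit \(4\times 4\) matrices. The sketch below (NumPy; the \(\epsilon\)-symbol bookkeeping and function names are my own) builds the six generators and verifies every commutator.

```python
import numpy as np
from itertools import product

def rotation_generator(i):
    """4x4 rotation generator J_i acting on (t, x, y, z), as in the text."""
    J = np.zeros((4, 4), dtype=complex)
    a, b = [(2, 3), (3, 1), (1, 2)][i]      # (y,z), (z,x), (x,y) planes for J_x, J_y, J_z
    J[a, b], J[b, a] = -1.0, 1.0
    return J

def boost_generator(i):
    """4x4 boost generator M_i: mixes t with the i-th spatial axis, both with a + sign."""
    M = np.zeros((4, 4), dtype=complex)
    M[0, i + 1] = M[i + 1, 0] = 1.0
    return M

J = [rotation_generator(i) for i in range(3)]
M = [boost_generator(i) for i in range(3)]

# The new combinations that split the algebra into two commuting copies of (1.1).
J_plus  = [0.5 * (J[i] + 1j * M[i]) for i in range(3)]
J_minus = [0.5 * (J[i] - 1j * M[i]) for i in range(3)]

eps = np.zeros((3, 3, 3))                   # the fully anti-symmetric symbol
for a, b, c in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[a, b, c], eps[a, c, b] = 1.0, -1.0

def comm(A, B):
    return A @ B - B @ A

for i, j in product(range(3), repeat=2):
    # (1.1) and (1.2)
    assert np.allclose(comm(J[i], J[j]),  sum(eps[i, j, k] * J[k] for k in range(3)))
    assert np.allclose(comm(M[i], M[j]), -sum(eps[i, j, k] * J[k] for k in range(3)))
    assert np.allclose(comm(J[i], M[j]),  sum(eps[i, j, k] * M[k] for k in range(3)))
    # (1.3) and (1.4)
    assert np.allclose(comm(J_plus[i],  J_plus[j]),  sum(eps[i, j, k] * J_plus[k]  for k in range(3)))
    assert np.allclose(comm(J_minus[i], J_minus[j]), sum(eps[i, j, k] * J_minus[k] for k in range(3)))
    assert np.allclose(comm(J_plus[i], J_minus[j]), 0)

print("Lorentz algebra (1.1)-(1.4) verified numerically.")
```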
Representations of the Lorentz Group
If you are familiar with quantum mechanics3, you will know that the eigenvectors of the angular momentum, or in this case \(J_z\), come in separate groups called representations that are characterised by the eigenvalue of the total angular momentum \(J^2=J^2_x+J^2_y+J^2_z\). If we symbolise that eigenvalue as \(j\), we know that \(j\) is either an integer or a half-integer and that there are \((2j+1)\) states with total angular momentum \(j\). Any rotation of the system can only turn vectors into other vectors that belong to the same representation; you cannot, by rotation, change the value of \(j\).

Back to the Lorentz group: as we’ve seen, we get two copies of the rotation algebra and so we can characterise each representation by a pair of numbers \(j_{+}\) and \(j_{-}\). Each representation will now have \((2j_{+}+1)\cdot(2j_{-}+1)\) elements that transform into each other under rotations and boosts. In other words, that is the number of degrees of freedom in our representation. The first representations, ordered by number of degrees of freedom, are \((j_{+},j_{-})=(0,0),(1/2,0),(0,1/2),(1,0),…\)
But why do we care about this? Well, as we’ve said at the start, we want all physical objects to “behave well” under Lorentz transformations. In our new language this means exactly that any physical object must belong to a representation of the Lorentz group. Otherwise, one object would turn into a completely different object when we changed our inertial frame. That of course would be contradictory to our physical intuition and observation.
Equations of motion
The scalar
Let’s put all of those abstract concepts to work and start with the simplest case. The first representation is \((0,0)\) and it has precisely one degree of freedom. Following the above discussion this means that an object in this representation does not change at all under Lorentz transformations, since it has nothing to change into. We call such an object a scalar, denoted as \(\phi(x)\).
We now require two properties from a candidate equation of motion for \(\phi(x)\). The first comes to us from Newton himself: an equation of motion should have no more than two time derivatives, since we have never observed a system that needs more than the initial position and initial velocity to define its time evolution. The second is that if our physical object satisfies the equation, then it should satisfy the same equation in every inertial frame of reference. This is sometimes called Lorentz covariance, since the equation changes to fit the change in the variable. It follows from the principle of (special) relativity: the same equation should apply in every inertial frame. In the case of the scalar \(\phi(x)\), which does not change, we would also ask the equation itself to stay the same under Lorentz transformations. Following the two rules above we are left with a single choice of equation, namely the Klein-Gordon equation: $$(\partial^2+m^2)\phi=0\quad (1.5)$$
where \(m^2\) is a constant (it obviously doesn’t change under \(SO(3,1)\)) that can be interpreted as the mass of \(\phi\), and \(\partial^2 = \partial^2_t -\partial^2_x-\partial^2_y-\partial^2_z\) is sometimes called the d’Alembertian, in analogy with the Laplacian \(\nabla^2\).
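A quick way to see why \(m\) deserves the name “mass” is to feed the equation a plane wave. The symbolic check below (a sketch using SymPy; the symbol names are mine) shows that \(\phi=e^{-i(Et-\vec{p}\cdot\vec{x})}\) solves (1.5) exactly when \(E^2=\vec{p}^2+m^2\), the relativistic energy-momentum relation.

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z', real=True)
E, px, py, pz, m = sp.symbols('E p_x p_y p_z m', real=True)

# A plane wave with energy E and momentum (p_x, p_y, p_z).
phi = sp.exp(-sp.I * (E * t - px * x - py * y - pz * z))

# Apply the d'Alembertian plus m^2, then divide out the plane wave itself.
residue = sp.simplify(
    (sp.diff(phi, t, 2) - sp.diff(phi, x, 2) - sp.diff(phi, y, 2) - sp.diff(phi, z, 2)
     + m**2 * phi) / phi
)

# The Klein-Gordon equation holds exactly when E^2 = p^2 + m^2.
assert sp.simplify(residue - (m**2 - E**2 + px**2 + py**2 + pz**2)) == 0
```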
That wasn’t too hard. Based solely on the way that the scalar transforms under Lorentz transformations (it doesn’t), we found that there is only one equation of motion that it could possibly satisfy. There are no other Lorentz invariant terms we could add.
The fermion
The next representation on our list is (1/2,0). Following our formulas, an object in this representation should have 2 degrees of freedom. Sounds simple enough; we denote this object by a two-component column vector
$$ \psi(x)=\begin{pmatrix} \psi_1(x)\\ \psi_2(x) \end{pmatrix} $$
How does this object transform? Well, we need a set of \(2\times 2\) matrices for \(J_i\) and \(M_i\) that satisfy the algebra (1.1) and (1.2) and for which the combinations \(J_{\pm}\) come out with \(j_{+}=\frac{1}{2}\) and \(j_{-}=0\). That’s simple enough: we use the Pauli matrices
$$\sigma_1 = \begin{pmatrix} 0&1\\1&0 \end{pmatrix},\quad \sigma_2 = \begin{pmatrix} 0&-i\\i&0 \end{pmatrix},\quad \sigma_3 = \begin{pmatrix} 1&0\\0&-1 \end{pmatrix} $$
and so we define
$$J_i = -\frac{i}{2}\sigma_i,\quad M_i=-\frac{1}{2}\sigma_i\quad (1.6)$$
In the same way, the (0,1/2) representation acts on objects that look like:
$$\chi(x)=\begin{pmatrix} \chi_1(x)\\ \chi_2(x) \end{pmatrix} $$ with generators that look like: $$J_i = -\frac{i}{2}\sigma_i,\quad M_i=\frac{1}{2}\sigma_i$$ The only difference between the two representations is the sign of the boost generators.
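As a sanity check (a NumPy sketch; the sign bookkeeping follows the matrices just given), the snippet below confirms that both sets of \(2\times 2\) generators satisfy the algebra (1.1) and (1.2), and that after forming \(J_{\pm}\) one of the two copies vanishes identically, which is exactly what the labels \((1/2,0)\) and \((0,1/2)\) record.

```python
import numpy as np
from itertools import product

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

eps = np.zeros((3, 3, 3))
for a, b, c in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[a, b, c], eps[a, c, b] = 1.0, -1.0

def comm(A, B):
    return A @ B - B @ A

for sign, label in [(-1, "(1/2,0)"), (+1, "(0,1/2)")]:
    J = [-0.5j * s for s in sigma]          # rotations: the same for both representations
    M = [sign * 0.5 * s for s in sigma]     # boosts: opposite signs for the two representations

    # The 2x2 generators satisfy the same commutation relations (1.1)-(1.2) as the 4x4 ones.
    for i, j in product(range(3), repeat=2):
        assert np.allclose(comm(J[i], J[j]),  sum(eps[i, j, k] * J[k] for k in range(3)))
        assert np.allclose(comm(M[i], M[j]), -sum(eps[i, j, k] * J[k] for k in range(3)))
        assert np.allclose(comm(J[i], M[j]),  sum(eps[i, j, k] * M[k] for k in range(3)))

    # One of J_+ or J_- vanishes, which is what the representation labels record.
    J_p = [0.5 * (J[i] + 1j * M[i]) for i in range(3)]
    J_m = [0.5 * (J[i] - 1j * M[i]) for i in range(3)]
    norm = lambda mats: max(np.abs(A).max() for A in mats)
    print(label, "largest entry of J_+ =", norm(J_p), ", of J_- =", norm(J_m))
```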
Now come the problems. There is one more symmetry that we would like our theory to have. This doesn’t fit into the Lorentz group because it’s discrete. This extra symmetry is parity (or \(\vec{x}\rightarrow -\vec{x}\)). Again, the reason we want this symmetry is because our world seems to obey it4. What does this mean for us? Under a parity transformation a boost is reversed; this means that \(M_i\rightarrow -M_i\) and so \(J_{+}\rightarrow J_{-}\) and vice versa. But now we have exactly what we were trying to avoid: an object jumping between representations when we change coordinates. To solve this we need to describe the physical object with a combination of the two representations, namely a 4-component column vector
$$\Psi(x) = \begin{pmatrix}\psi(x)\\ \chi(x) \end{pmatrix} \quad (1.7) $$ We call this object a Dirac spinor. The top two components transform with (1.6) and the bottom two with the \((0,1/2)\) generators above. We say that a Dirac spinor belongs in the sum representation \((1/2,0)\oplus(0,1/2)\) of \(SO(3,1)\). The elements of this representation are just block-diagonal matrices, with one representation in the upper block and the other in the lower block. Schematically $$ (1/2,0)\oplus(0,1/2) = \begin{pmatrix}(1/2,0)&0\\ 0&(0,1/2) \end{pmatrix} \quad (1.8)$$
Well, now we have a different problem: we started with an object with 2 degrees of freedom but our Dirac spinor has 4. We have too much freedom! To solve this we need to project \(\Psi\) onto a space with 2 degrees of freedom. To do this we work with the four-momentum as a variable, so \(\Psi(p)\), and we go to the fermion’s rest frame so that \(\vec{p}=0\).
We now define a projection matrix that we denote as \(\mathcal{P}=\frac{1}{2}(1+\gamma^0)\) for future convenience. \(\gamma^0\) is, for now, just some \(4\times 4\) matrix. Acting once with \(\mathcal{P}\) projects \(\Psi\) onto a specific subspace, and if we act with \(\mathcal{P}\) again we would expect nothing to happen since we are already inside that subspace. This translates into \(\mathcal{P}^2=\mathcal{P}\), which also means that \((\gamma^0)^2= I\) and so its eigenvalues are \(\pm1\).
Of course this projection cannot just take us into the 2d space of \(\psi\) or \(\chi\), since we would run into exactly the problem we would like to avoid: we would again not satisfy parity. We need a projection that treats both representations in the same way, and so we choose
$$\gamma^0 = \begin{pmatrix} 0&0&1&0\\0&0&0&1\\1&0&0&0\\0&1&0&0 \end{pmatrix} $$ There are many matrices we could have chosen; this is just a matter of convention. If we demand that \(\Psi\) is left untouched by \(\mathcal{P}\), i.e. \(\mathcal{P}\Psi=\Psi\), we land in the subspace where \(\psi_i-\chi_i=0\). This projection can be written as
$$(\gamma^0-1)\Psi(p_r)=0\quad (1.9)$$ where \(p_r\) is the four-momentum in the rest frame. To reiterate, what we accomplish by enforcing (1.9) is to discard 2 of the 4 degrees of freedom of \(\Psi\) to get back to the 2 we want.
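The bookkeeping around \(\mathcal{P}\) is easy to verify directly. Here is a minimal sketch (NumPy; variable names are mine) checking that \(\mathcal{P}\) is a projector that keeps exactly two of the four degrees of freedom, and that a spinor with \(\psi=\chi\) indeed satisfies (1.9).

```python
import numpy as np

# gamma^0 in the basis chosen in the text: identity blocks off the diagonal.
gamma0 = np.block([[np.zeros((2, 2)), np.eye(2)],
                   [np.eye(2), np.zeros((2, 2))]])

P = 0.5 * (np.eye(4) + gamma0)

assert np.allclose(gamma0 @ gamma0, np.eye(4))   # (gamma^0)^2 = 1, eigenvalues +-1
assert np.allclose(P @ P, P)                     # P is a projector
assert np.isclose(np.trace(P), 2)                # it keeps exactly 2 of the 4 dof

# A spinor whose upper and lower halves agree is left alone by P and satisfies (1.9).
psi = np.array([1.0, 2.0])
Psi = np.concatenate([psi, psi])                 # psi_i - chi_i = 0
assert np.allclose(P @ Psi, Psi)
assert np.allclose((gamma0 - np.eye(4)) @ Psi, 0)
```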
Equation (1.9) looks a lot like an equation of motion. To see that, all we have to do is boost the system into an arbitrary frame with an arbitrary momentum. We have the generators, so we know that a finite boost is just
$$\Psi(p) = e^{\vec{\phi}\cdot\vec{M}}\Psi(p_r)\quad (1.10)$$ where \(\vec{\phi}\) is the “angle” of the boost5 and
$$\vec{M}=(M_x,M_y,M_z)$$
From linear algebra we know that to transform a matrix into a different basis we need to multiply it from the left and the right with the transformation and its inverse so $$\gamma^0\rightarrow e^{\vec{\phi}\cdot\vec{M}}\gamma^0 e^{-\vec{\phi}\cdot\vec{M}} $$
Thus, we get Dirac’s equation
$$(e^{\vec{\phi}\cdot\vec{M}}\gamma^0 e^{-\vec{\phi}\cdot\vec{M}}-1)\Psi(p)=0$$
To bring this into a more physical form, we define
$$\frac{\gamma^\mu p_\mu}{m} \equiv e^{\vec{\phi}\cdot\vec{M}}\gamma^0 e^{-\vec{\phi}\cdot\vec{M}} $$ where we sum over repeated indices (Einstein’s summation convention) and so we finally get
$$(\gamma^\mu p_\mu - m)\Psi(p)=0$$
If we wanted to go back to position space instead of momentum space, all we would have to do is Fourier transform this expression to get
$$(i\gamma^\mu\partial_\mu-m)\Psi(x)=0\quad (1.11)$$
And there we have it, the equation of motion for an object that belongs in the \((1/2,0)\oplus(0,1/2)\) representation of the Lorentz group. By the way, the non-zero label of the representation (in this case 1/2) corresponds to the spin of the object. That is, spin is an inherent property of an object that corresponds to how it transforms under Lorentz transformations.
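The whole chain from (1.9) to (1.11) can also be checked numerically for a boost along \(z\). The sketch below (NumPy/SciPy) uses the block-diagonal spinor boost generators built from (1.6) and the matrices below it, together with a guess for \(\gamma^3\) in the same chiral basis as the \(\gamma^0\) above (the spatial gamma matrices are not written out in the text, so that choice is my assumption); it confirms that boosting \(\gamma^0\) really does produce \(\gamma^\mu p_\mu/m\) for the correspondingly boosted momentum.

```python
import numpy as np
from scipy.linalg import expm

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

# Spinor-representation boost generator along z: block-diagonal, one sign per Weyl block.
M_spinor = np.block([[-0.5 * sigma_z, Z2],
                     [Z2, 0.5 * sigma_z]])

# Vector-representation boost generator along z (mixes t and z, both with a + sign).
M_vector = np.zeros((4, 4))
M_vector[0, 3] = M_vector[3, 0] = 1.0

# gamma^0 as in the text; gamma^3 is my assumed chiral-basis companion.
gamma0 = np.block([[Z2, I2], [I2, Z2]])
gamma3 = np.block([[Z2, sigma_z], [-sigma_z, Z2]])

m, phi = 1.0, 0.7                                # rest mass and boost "angle" (rapidity)

# Boost the rest-frame momentum p_r = (m, 0, 0, 0) with the vector representation.
p_up = expm(phi * M_vector) @ np.array([m, 0.0, 0.0, 0.0])
p_dn = np.diag([1.0, -1.0, -1.0, -1.0]) @ p_up   # lower the index

# Boosting gamma^0 in the spinor representation gives gamma^mu p_mu / m, as in the text.
lhs = expm(phi * M_spinor) @ gamma0 @ expm(-phi * M_spinor)
rhs = (p_dn[0] * gamma0 + p_dn[3] * gamma3) / m
assert np.allclose(lhs, rhs)
```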
What did we learn from this? Well, we learned what gauge symmetry is! In order to fit all of the symmetry we need into our physical objects we had to add more degrees of freedom than the object actually has. These are sometimes called unphysical degrees of freedom since they should not affect any property of the system. Then, to get back to the physical degrees of freedom we put a constraint (here (1.11)) on our higher-dimensional object (here \(\Psi\)), which ends up being its equation of motion. This concept of unphysical degrees of freedom is called gauge symmetry since we can change them without changing the physical system6.
One last thing to note: it turns out that the matrices \(\gamma^\mu\) satisfy the anticommutation relations
$$ \{\gamma^\mu,\gamma^\nu \} = \gamma^\mu\gamma^\nu+\gamma^\nu\gamma^\mu = 2g^{\mu\nu} $$
where \(g=\mathrm{diag}(1,-1,-1,-1)\) is the Minkowski metric. Therefore, if we multiply (1.11) with \((-i\gamma^\nu\partial_\nu - m)\) we get
$$(\partial^2 + m^2)\Psi(x)=0\quad (1.12)$$
Well well, if it isn’t the Klein-Gordon equation (1.5) for the scalar object. What does this mean? One way to look at it is to say that everything in relativity, regardless of representation, satisfies the K-G equation, and we need to supplement it with certain constraints so that the number of degrees of freedom comes out right.
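Both claims, the anticommutation relation and the fact that “squaring” the Dirac operator lands us back on Klein-Gordon, are easy to confirm numerically. The sketch below (NumPy; the spatial gamma matrices are my assumed chiral-basis choice, consistent with the \(\gamma^0\) used earlier) checks the Clifford algebra and the momentum-space identity \((\gamma\cdot p+m)(\gamma\cdot p-m)=(p^2-m^2)\cdot\mathbb{1}\).

```python
import numpy as np
from itertools import product

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

# gamma^0 as chosen in the text plus an assumed chiral-basis choice for the spatial gammas.
gamma = [np.block([[Z2, I2], [I2, Z2]])]
gamma += [np.block([[Z2, s], [-s, Z2]]) for s in sigma]

g = np.diag([1.0, -1.0, -1.0, -1.0])              # Minkowski metric

# Clifford algebra: {gamma^mu, gamma^nu} = 2 g^{mu nu}
for mu, nu in product(range(4), repeat=2):
    anti = gamma[mu] @ gamma[nu] + gamma[nu] @ gamma[mu]
    assert np.allclose(anti, 2 * g[mu, nu] * np.eye(4))

# In momentum space, multiplying the Dirac operator by its partner gives Klein-Gordon:
# (gamma.p + m)(gamma.p - m) = (p^2 - m^2) * identity.
m = 1.0
p_up = np.array([1.7, 0.3, -0.5, 0.9])            # an arbitrary four-momentum p^mu
slash_p = sum(g[mu, mu] * p_up[mu] * gamma[mu] for mu in range(4))   # gamma^mu p_mu
p2 = p_up @ g @ p_up
assert np.allclose((slash_p + m * np.eye(4)) @ (slash_p - m * np.eye(4)),
                   (p2 - m**2) * np.eye(4))
```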
The spin-1 field
The last object we will look at is the massive spin-1 field. A spin-1 particle has \((2\cdot 1+1)=3\) degrees of freedom, exactly like a \(j=1\) representation of the rotation group. We already know of a Lorentz object that can hold a 3-vector, namely a 4-vector \(A_\mu = (A_0,\vec{A})\); it belongs to the \((1/2,1/2)\) representation and so carries \((2\cdot\frac{1}{2}+1)\cdot(2\cdot\frac{1}{2}+1)=4\) components, one more than we need. As before, we expect this extended object to satisfy the Klein-Gordon equation
$$(\partial^2+m^2)A_\mu=0\quad (1.13)$$
We now need to cut one degree of freedom. It turns out that, with our limited tools, the only constraint we can impose that is both Lorentz covariant and projects out precisely one degree of freedom is
$$\partial^\mu A_\mu=0\quad (1.14)$$
We can in fact combine both conditions (1.13) and (1.14) into one equation, namely $$(g^{\mu\nu}\partial^2-\partial^\mu\partial^\nu)A_\nu+m^2A^\mu=0\quad (1.15)$$
Notice that if we act with \(\partial_\mu\) on (1.15) the first two terms cancel each other out and we are left with $$m^2\partial_\mu A^\mu=0$$
which is just (1.14). We can now use (1.14) on (1.15) to get rid of the second term and get (1.13).
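In momentum space this little manipulation is a one-liner to verify. The sketch below (NumPy; names are mine) writes (1.15) as \(\mathcal{O}^{\mu\nu}(p)A_\nu=0\) with \(\mathcal{O}^{\mu\nu}=(m^2-p^2)g^{\mu\nu}+p^\mu p^\nu\) and checks that contracting with \(p_\mu\) leaves only the mass term, the momentum-space version of \(m^2\partial_\mu A^\mu=0\).

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])
m = 1.0
p_up = np.array([1.3, 0.2, -0.7, 0.4])     # an arbitrary four-momentum p^mu
p_dn = g @ p_up                            # p_mu
p2 = p_up @ p_dn

# Momentum-space version of (1.15): O^{mu nu} A_nu = 0 with
# O^{mu nu} = (m^2 - p^2) g^{mu nu} + p^mu p^nu
O = (m**2 - p2) * g + np.outer(p_up, p_up)

# Contracting with p_mu kills the kinetic terms and leaves m^2 p^nu:
# exactly the statement m^2 * (p.A) = 0, i.e. (1.14) when m != 0.
assert np.allclose(p_dn @ O, m**2 * p_up)
```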
With just a little more formalism we can get to an even cooler result. Remembering our classical mechanics, we can easily see that equation (1.15) can be derived from the following action (with Lagrangian density \(\mathcal{L}\))
$$S = \int d^4x \mathcal{L} = \int d^4x\left[ \frac{1}{2}A_\mu [(\partial^2+m^2)g^{\mu\nu}-\partial^\mu\partial^\nu]A_\nu \right] \quad (1.16)$$ where the integral runs over all of spacetime.
It is well known that Electromagnetism has a gauge symmetry all of its own: we can change \(A_\mu\) by the derivative of a function without affecting the physics (i.e. the electric and magnetic fields). This acts in exactly the same way as the gauge symmetry in the fermion case: it tells us that \(A_\mu\) does not have 3 degrees of freedom after all, it only has 2. The transformation is $$A_\mu(x)\rightarrow A_\mu(x)+\partial_\mu a(x)$$ and we would like our action (1.16) to stay the same under such a transformation. If we go through the calculation and discard total derivative terms such as \(\int d^4x\, \partial^\mu\left(a(x)\,\partial^2 A_\mu\right)\), because they vanish at infinity, we can see that the only problematic term is the mass term, which gives contributions of the form $$m^2\int d^4x A_\mu \partial^\mu a(x) \neq 0 $$
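The same statement is transparent in momentum space, where a pure-gauge configuration is \(A_\mu\propto p_\mu\,a\). Here is a short check (a NumPy sketch; names are mine): the massless kinetic operator annihilates such a mode, while the mass term does not.

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])
p_up = np.array([0.9, -0.4, 1.1, 0.3])     # an arbitrary four-momentum p^mu
p_dn = g @ p_up
p2 = p_up @ p_dn

# A pure-gauge configuration A_mu = d_mu a(x) is proportional to p_mu in momentum space.
A_dn = p_dn

# The massless kinetic operator appearing in (1.16)/(1.17),
# K^{mu nu} = -p^2 g^{mu nu} + p^mu p^nu, annihilates the pure-gauge mode...
K = -p2 * g + np.outer(p_up, p_up)
assert np.allclose(K @ A_dn, 0)

# ...but a mass term does not: m^2 g^{mu nu} A_nu = m^2 p^mu is nonzero,
# which is why gauge invariance forbids the mass.
m = 1.0
assert not np.allclose(m**2 * g @ A_dn, 0)
```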
Demanding gauge invariance therefore forces us to drop the mass term, i.e. to set \(m=0\), and we are left with the action
$$S_{em}= \int d^4x \mathcal{L} = \int d^4x\left[ \frac{1}{2}A_\mu (\partial^2g^{\mu\nu}-\partial^\mu\partial^\nu)A_\nu \right] \quad (1.17)$$ or, if we integrate by parts and again assume that \(A_\mu\) vanishes at infinity, we get our familiar expression
$$S_{em} = \frac{-1}{4}\int d^4x (\partial^\mu A^\nu - \partial^\nu A^\mu) (\partial_\mu A_\nu - \partial_\nu A_\mu)\equiv \frac{-1}{4}\int d^4x F^{\mu\nu}F_{\mu\nu} $$ where \(F^{\mu\nu}\) is the electromagnetic tensor, sometimes called Faraday’s tensor.
So there it is: after applying all of the symmetries we would like our theory to have, we have reached the only possible equations our objects could satisfy. To do that we started again with a “freer” object and then constrained it down to the actual degrees of freedom we want.
Conclusion
Well there it is, the foundation of modern physics: symmetry and redundancy. Following the same methodology as above we can construct the equations of motion for any object in any representation of the Lorentz group and, if relativity is true, that should include just about everything in the universe. Maybe in the future someone will find a way to describe physics without the need for unphysical degrees of freedom, but maybe that’s just the way things are, and in my opinion they’re not all that bad.
Sources
[1] A vast amount of this article is taken from, or inspired by, A. Zee’s fantastic book “Quantum Field Theory in a Nutshell”. Any potential reader should know that this is not meant to be an introduction to QFT but rather a supplement to a more rigorous textbook.
[2] Any quantum mechanics textbook should contain a section on the representations of SU(2) usually going under the name “angular momentum algebra” or “spin representations”.
[3] Likewise, information about Lie algebras, generators and all of that can be found in books about group theory and continuous groups. A. Zee also has a group theory book with the innovative name “Group Theory in a Nutshell”.
Appendix: Representations of groups
A group is defined as an abstract set of objects \(G=\{e,g_1,\dots,g_n\} \) together with a form of multiplication such that
i) There exists an identity element \(e\): \(e\cdot g_i = g_i\cdot e=g_i,\quad \forall g_i \in G\)
ii) There exists an inverse element \(g_i^{-1}\): \(g_i^{-1}\cdot g_i = g_i\cdot g_i^{-1} = e, \quad \forall g_i\in G \)
iii) The group is closed under multiplication: \(g_i\cdot g_j\in G,\quad \forall g_i,g_j\in G\)
An example of a group is the set of all rotations in two dimensions, also called \(SO(2)\) or the special orthogonal group. To understand that name we need to talk about representations.
A representation is a set of matrices \(D(g)\) that satisfy the group multiplication. What that means is that
$$D(g_i)\times D(g_j) = D(g_i\cdot g_j) $$ where \(\times\) denotes the usual matrix multiplication. Following our example, a possible representation of \(SO(2)\) is the set of matrices $$D(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
This is none other than the rotation matrix in two dimensions. If we rotate our system by an angle \(\theta\) then our new coordinates will be
$$\begin{pmatrix} x’ \\ y’ \end{pmatrix} = D(\theta)\begin{pmatrix} x \\ y \end{pmatrix} $$ We call this representation the fundamental or defining representation of the group. Now we can translate the name of the group. Special means that the fundamental representation of the group has determinant one, and orthogonal means that \(D^T(\theta)=D(\theta)^{-1}\), where T denotes the transpose.
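For the concretely minded, here is a tiny numerical illustration (a NumPy sketch; the function name D mirrors the notation above) of the representation property and of what “special” and “orthogonal” mean.

```python
import numpy as np

def D(theta):
    """The fundamental (defining) representation of SO(2): 2x2 rotation matrices."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

t1, t2 = 0.4, 1.1

# Representation property: matrix multiplication realises the group multiplication.
assert np.allclose(D(t1) @ D(t2), D(t1 + t2))

# "Special": determinant one.  "Orthogonal": the transpose is the inverse.
assert np.isclose(np.linalg.det(D(t1)), 1.0)
assert np.allclose(D(t1).T @ D(t1), np.eye(2))
```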
The next step is to see how rotations affect functions of \((x,y)\). We notice that rotating a function by an angle, let’s say \(\theta\), has the exact same effect as rotating the coordinate system by the opposite angle \(-\theta\). To see this, think of a clock as the coordinate system and its hand as the function. If we rotate the hand from 1 to 2, we get the exact same result as if we had rotated the numbers underneath the hand the other way (so that 1 moves to where 12 was): the hand ends up pointing at 2 either way. Therefore we define a rotation \(R_\theta\) on a function \(f(x,y)\) as
$$R_\theta f(\vec{x}) \equiv f(D^{-1}(\theta) \vec{x})$$
There is no general rule for \(R_\theta\), but there are specific functions that “act nicely” under rotation. We define the basis functions \(\{\phi_n\}\) of the group \(SO(2)\) (or any group for that matter) as the functions that obey the following transformation rule
$$R_\theta\phi_n = \sum^d_{m=1} R_{mn}(\theta) \phi_m$$ where the \(R(\theta)\) are \(d\times d\) matrices that form a representation of the group. Every representation has a set of \(d\) basis functions, where \(d\) is the dimension of the representation.
Each representation comes with its own generators of the Lie algebra of the group. All you have to do is expand \(R(\theta)\) around \(\theta=0\) and keep only the terms of order \(\theta\) to get \(d\)-dimensional matrices satisfying the commutation relations of the group. We saw an example of these in (1.1).
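As a concrete illustration of that recipe (a NumPy/SciPy sketch; the small step \(\epsilon\) is mine), expanding \(D(\theta)\) of the \(SO(2)\) example around \(\theta=0\) picks out the single generator of the group, and exponentiating it brings back the finite rotation, just as we did for \(J_z\) in the main text.

```python
import numpy as np
from scipy.linalg import expm

def D(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# Keep only the term of order theta in D(theta) around theta = 0: that is the generator.
eps = 1e-6
T_numeric = (D(eps) - np.eye(2)) / eps
T_exact = np.array([[0.0, -1.0],
                    [1.0,  0.0]])
assert np.allclose(T_numeric, T_exact, atol=1e-5)

# Exponentiating the generator recovers the finite rotation, as in the main text.
theta = 0.8
assert np.allclose(expm(theta * T_exact), D(theta))
```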
One way to characterise each representation is by finding a matrix, called the Casimir matrix \(\mathcal{C}\), that commutes with all generators of the group. This means that if \(\{T_i\}\) are the generators, we have $$ [T_i,\mathcal{C}]=0,\quad \forall i$$
This has an interesting consequence, \(\mathcal{C}\) ends up having a single eigenvalue since, assuming that \(\vec{v}_\lambda\) is one of \(\mathcal{C}\)’s eigenvectors with eigenvalue \(\lambda\), we have
$$ \mathcal{C}(T_i \vec{v}_\lambda) = T_i(\mathcal{C}\vec{v}_\lambda) = \lambda (T_i \vec{v}_\lambda)$$
which means that \(T_i \vec{v}_\lambda\) is also an eigenvector with the exact same eigenvalue. But (as long as the representation cannot be split into smaller ones) we can reach any vector we want this way, and so \(\mathcal{C}\) has a single eigenvalue. This is essentially Schur’s lemma. Therefore, we can characterise a representation of the group by the eigenvalue of the Casimir matrix. In the main text we used the fact that the matrix \(J^2\) is the Casimir matrix of \(SO(3)\), the group of rotations in 3d space, and so we characterised the representations by its eigenvalue \(j\). More precisely the eigenvalue of \(J^2\) is \(j(j+1)\), but that’s beside the point.
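For the spin-1/2 representation this is a two-line check. The sketch below (NumPy; written in the physicist’s Hermitian convention of footnote 2, so that the eigenvalue comes out as \(j(j+1)\)) verifies that \(J^2\) commutes with every generator and is proportional to the identity with eigenvalue \(\frac{1}{2}(\frac{1}{2}+1)=\frac{3}{4}\).

```python
import numpy as np

# Spin-1/2 generators in the physicist's (Hermitian) convention: J_i = sigma_i / 2.
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
J = [0.5 * s for s in sigma]

# The Casimir J^2 = Jx^2 + Jy^2 + Jz^2 commutes with every generator...
C = sum(Ji @ Ji for Ji in J)
for Ji in J:
    assert np.allclose(C @ Ji - Ji @ C, 0)

# ...and is a multiple of the identity, with eigenvalue j(j+1) = 3/4 for j = 1/2.
assert np.allclose(C, 0.75 * np.eye(2))
```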
1. General Relativity is not all that different, but the formalism would be too dense for this article.
2. As a matter of convention, relating to quantum mechanics, physicists usually define the generators as Hermitian matrices, so in this case \(J_{z,physicist}=iJ_{z,here}\).
3. If not, look up ladder operators.
4. There are exceptions to this, but they are beyond the scope of this article.
5. In special relativity we define this angle (the rapidity) as \(\tanh\phi_i=v_i/c\), where \(\vec{v}\) is our desired velocity and c the speed of light.
6. This explains the symmetry part of the name. The gauge part is a little bit harder, since it comes from a century-old theory of Weyl’s and has stuck around for no particular reason.