Opinion: Rationalizing Latency Competition in High-Frequency Trading

Introduction

There is a common misunderstanding, even among practitioners, that low-latency trading is a waste of human talent and resources that could instead go to advancing physics or curing cancer. It’s been attacked by books like Flash Boys, governments trying to pass transaction taxes, and exchanges bending to pressure by implementing speed bumps or periodic batch auctions. This essay argues the positive case for HFT and latency competition on four grounds: (1) low-latency trading lowers spreads, (2) economically significant things do happen on sub-millisecond time scales, (3) HFT is the optimization layer for capitalism, and (4) markets are not a zero-sum game.

Latency plays a significant role in how humans interact with the world. When we perceive and respond to stimuli, there is an inherent delay between the occurrence of an event and our reaction to it. This delay, or latency, is typically around 200 milliseconds, which is the time it takes for light to enter the eye, be converted into an electrical signal, traverse the brain’s neurons to decide on a response, and finally travel to the muscles to trigger an action.

The implications of this latency extend beyond simple reactions. When we make decisions based on information from various sources, the age of that information can significantly impact the accuracy and effectiveness of our choices. For instance, reading news from a day-old newspaper means basing decisions on information that is 24 hours old. In the past, communication latency was even more pronounced. In 1776, Americans made decisions based on information about Europeans that was around 3 weeks old, due to the time it took for ships to cross the Atlantic.

Latency also plays a role in more mundane interactions, such as negotiating the price of a used car. Both the buyer and seller enter the negotiation with pre-existing knowledge and expectations, but the actual negotiation process involves a rapid exchange of information through verbal and non-verbal cues. This high-frequency interaction helps both parties uncover additional information about the other’s willingness to pay or sell, ultimately leading to a mutually acceptable price.

In essence, latency is present in all human interactions, from simple reactions to complex decision-making processes. Understanding the impact of latency on these interactions is crucial for developing effective strategies to manage and mitigate its effects, both in personal and professional contexts. The field of high-frequency trading has emerged as a means to address latency in financial markets, aiming to facilitate more efficient price discovery and reduce the potential for market distortions.

Latency Increases Spreads

One of the most important ways that HFT benefits markets is by reducing bid-ask spreads. The bid-ask spread is the difference between the highest price a buyer is willing to pay for an asset (the bid) and the lowest price a seller is willing to accept (the ask). Spreads represent the cost of transacting in a market – the wider the spread, the more expensive it is to trade. As a reminder, spreads matter so much because of the famous Coase Theorem from economics. Under the Coase Theorem, in a world with zero transaction costs the allocation of resources will be efficient regardless of the initial distribution of property rights, government tariffs, or organization of firms.

But in the real world, transaction costs are never zero. And the biggest source of transaction costs in electronic markets is the bid-ask spread. The wider the spread, the more friction there is in the market, and the harder it is for prices to efficiently reflect all available information.

Let’s see why spreads get tighter when latency is decreased. Market makers set the spread by posting bids and offers. Take a bid, for example. If the market maker is filled on the bid, they will want to hedge their risk in a similar security. Imagine you’re making markets in S&P 500 futures in Chicago. If your bid is filled, you are long the S&P 500. You need to hedge by selling short S&P 500 stocks in New York. If the latency to New York is high, you may not be able to fill your hedge orders at a good price. In fact, when market makers quote S&P 500 futures they use pricing logic to estimate their potential hedge prices, and only quote if reliable, liquid hedges are available in another market. High latency reduces the reliability of hedges.

Let’s plug in some numbers, ignoring contract multipliers and the futures basis for simplicity. Say there’s a bid for the ETF SPY at $100 on Nasdaq in New York. Then market makers can bid $99.99 for CME ES futures in Chicago, since if they’re filled, they can hedge by selling SPY in New York to lock in a $0.01 profit. However, if latency is high, then by the time their sell order crosses the network from Chicago to New York, the Nasdaq bid might have dropped below $100, so the market maker either makes no profit or takes a loss. Prices are volatile, so the longer the order is in flight, the greater the risk that the price drops. If latency is very high, market makers might only bid $99.98 for ES. The offer side widens in the same way, so the overall spread is wider.

We can quantify this effect using some concepts from options pricing theory. A market maker’s quote is equivalent to writing an option that the market can exercise against them. The width of their quotes corresponds to the premium they charge for this option. One of the key drivers of option premiums is time to expiry: the longer the option lasts, the more risk the option writer takes on, and thus the more expensive the option. In this context, latency is the time to expiry for a market maker. The longer it takes them to hedge an executed quote, the more risk they are exposed to, and the wider the spreads they need to charge. Cutting latency is like reducing the lifespan of the options that market makers are implicitly short: it makes their job less risky and enables them to charge lower premiums, i.e., quote tighter spreads. Under Black-Scholes, the price of an at-the-money option is roughly proportional to the square root of time to expiry, so if time to expiry is cut in half, the option price falls by about 29% (from sqrt(1) to sqrt(0.5) ≈ 0.71). Note that other frictions also contribute to wider spreads, like monopolist exchanges charging high fees, so this reduction applies only to the liquidity-option-selling aspect of market making.
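The square-root scaling above can be checked numerically. The following is a minimal sketch, not a production pricing model: it assumes an at-the-money option, zero interest rates and dividends, and a constant volatility measured in wall-clock time. The function name, the $100 spot price, and the 20% annualized volatility are hypothetical values chosen only for illustration.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def atm_option_premium(spot: float, annual_vol: float, seconds: float) -> float:
    """Black-Scholes value of an at-the-money option with zero rates and dividends.

    `seconds` plays the role of time to expiry: the latency until the hedge fills.
    """
    t_years = seconds / SECONDS_PER_YEAR
    d1 = 0.5 * annual_vol * math.sqrt(t_years)
    # With strike == spot and r == 0: price = S * (N(d1) - N(-d1)) = S * erf(d1 / sqrt(2)).
    return spot * math.erf(d1 / math.sqrt(2.0))


# Hypothetical inputs: a $100 asset with 20% annualized volatility.
base = atm_option_premium(100.0, 0.20, 100e-6)   # 100 microseconds to hedge
half = atm_option_premium(100.0, 0.20, 50e-6)    # latency cut in half
tenth = atm_option_premium(100.0, 0.20, 10e-6)   # latency cut to 10 microseconds

halved_ratio = half / base
tenth_ratio = tenth / base
print(f"half the latency:  premium x {halved_ratio:.3f}")
print(f"tenth the latency: premium x {tenth_ratio:.3f}")
```

Under these assumptions, halving the latency multiplies the implied premium by about 0.71 (a drop of roughly 30%), and a tenfold latency cut multiplies it by about 0.32. The ratios do not depend on the spot price or volatility chosen, only on the ratio of the two latencies.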

Even bringing latency down from 100 microseconds to 10 microseconds significantly shrinks the option-selling cost of market making, because volatility is spiky over short intervals. At these time scales, volatility scales less with wall-clock time than with the number of market data packets that arrive. Many market updates can occur within a few tens of microseconds because market activity is bursty and discontinuous.

Studies from around the world empirically confirm that HFT decreases spreads. The following quotes summarize conclusions from the SEC, a European Securities and Markets Authority (ESMA) regulator, and the Bank of Canada:
SEC: “Algorithmic trading in general, and HFT specifically, increases the accuracy of prices and lowers transaction costs”
European Securities and Markets Authority regulator: “investments in high-frequency trading technology provide positive economic spillovers to the overall market since they reduce transaction costs not only for those who invest in this technology but for all market participants by enhancing the quality of securities markets.”
Bank of Canada: “Passive HFT entry leads to a tightening of the best incumbent bid-ask spread. … incumbents tighten spreads by approximately 0.8 basis points on average.”
Sources:
Gerig, A. “High-Frequency Trading Synchronizes Prices in Financial Markets” (2012), https://www.sec.gov/files/dera-wp-hft-synchronizes.pdf
Clapham, B., Haferkorn, M. & Zimmermann, K. “The Impact of High-Frequency Trading on Modern Securities Markets” (9/2022), https://link.springer.com/article/10.1007/s12599-022-00768-6
Brogaard, J., Garriott, C. & Pomeranets, A. “High-Frequency Trading Competition”, Working Paper 2014-19, https://www.bankofcanada.ca/wp-content/uploads/2014/05/wp2014-19.pdf

Feedback Loops in Global Supply and Demand Negotiation

HFT plays a crucial role in handling supply and demand negotiations to uncover true prices. The faster these negotiations are resolved, the fewer opportunities there are for feedback loops to form. Feedback loops can result in incorrect prices and even bubbles, but HFT helps prevent these issues by anticipating and rapidly propagating price updates.

Consider a prototypical example of a feedback loop:
1) The price of gold increases in London.
2) 100ms later, people in America see the price increase and raise their prices.
3) 100ms after that, people in London see the price increase in America and consider raising their prices.
4) The feedback loop continues.
HFT breaks these feedback loops by propagating information more quickly and using short-term predictive signals to untangle causality. While the example above may seem trivially solvable, reality is far more complex. There are thousands of interacting assets, news being released in different locations, and exogenous forces of supply and demand from anonymous participants buying and selling.
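The London and America loop above can be illustrated with a toy model. This is only a sketch under strong assumptions that are not in the original example: each venue raises its price by a fixed fraction (`pass_through`) of whatever price change it last saw from the other venue, and an "echo-aware" participant is one that recognizes later messages as echoes of the original shock and ignores them. All names and parameter values are hypothetical.

```python
def simulate_echo(shock: float, pass_through: float, hops: int,
                  echo_aware: bool = False) -> dict:
    """Toy model of a price shock bouncing between two venues.

    Each hop is one one-way message (e.g. 100 ms in the example above).
    """
    prices = {"London": shock, "America": 0.0}
    in_flight = shock        # price change travelling to the other venue
    receiver = "America"
    for hop in range(hops):
        if echo_aware and hop >= 1:
            in_flight = 0.0  # echoes of the original shock are recognized and ignored
        move = pass_through * in_flight
        prices[receiver] += move
        in_flight = move
        receiver = "London" if receiver == "America" else "America"
    return prices


naive = simulate_echo(shock=1.0, pass_through=0.5, hops=60)
aware = simulate_echo(shock=1.0, pass_through=0.5, hops=60, echo_aware=True)
print("naive:     ", naive)
print("echo-aware:", aware)
```

In the naive run, London ends up about a third above the original shock because it keeps reacting to reflections of its own move, and a higher `pass_through` makes the overshoot worse. In the echo-aware run, London stays at the original shock. The wall-clock duration of the loop is the number of hops times the one-way latency, which is one way to see why lower latency leaves less time for mispricing to persist.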

Bad, undamped positive feedback loops manifest as bank runs, flash crashes, and bubbles. When these persist, they cause inefficient resource allocations for longer periods of time. Most assets’ intrinsic values don’t impose tight price constraints. For example, gold may be partly upper bounded by the cost of mining deeper deposits or lower bounded by its value in industrial applications. Prices can remain incorrect for a long time if they’re allowed to get out of whack initially, especially once people lose track of how the prices got there.

With that background, let’s see why speed and latency matter so much. Suppose that each person has about one financial need per hour. This could be buying a meal, leaving a tip, turning down the AC to save money, flipping a light switch, paying recurring subscriptions or taxes, etc. With 6 billion people, one transaction per person per hour (3600 seconds) works out to about 1.7 million transactions per second, or roughly one every microsecond. HFT systems can react in about 1 microsecond. In reality, most transactions an individual makes don’t carry enough supply and demand information to move the price.
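The back-of-the-envelope rate above is easy to reproduce. The sketch below uses only the figures from the paragraph (6 billion people, one financial need per person per hour); the function name is hypothetical, and the 8 billion case is included only as a sensitivity check.

```python
SECONDS_PER_HOUR = 3600


def mean_gap_microseconds(population: int, needs_per_person_per_hour: float) -> float:
    """Average time between transactions, in microseconds, if they were evenly spread."""
    transactions_per_second = population * needs_per_person_per_hour / SECONDS_PER_HOUR
    return 1e6 / transactions_per_second


gap_6b = mean_gap_microseconds(6_000_000_000, 1.0)
gap_8b = mean_gap_microseconds(8_000_000_000, 1.0)
print(f"6 billion people: one transaction every {gap_6b:.2f} microseconds")
print(f"8 billion people: one transaction every {gap_8b:.2f} microseconds")
```

Either way the average gap is a bit under one microsecond, the same order of magnitude as the reaction time quoted for HFT systems. Real activity is bursty rather than evenly spread, so this is an average, not a bound.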

But we’ve undercounted the number of transactions by a few orders of magnitude. First, there are many other entities producing financial transactions: bonds, contracts, businesses, etc. Second, a large move in one asset implies price changes in others. Imagine that a major company announces positive earnings, causing a hedge fund to buy its stock so the price jumps. This price change contains information that is relevant to many other assets (the company’s suppliers, its competitors, maybe even broad market indices), triggering a cascade of trades. HFTs need to be fast enough to handle every single transaction. Since every transaction echoes information back to its source, ideally the entire network-wide update resolves before any new exogenous information arrives, to avoid confusion.

To appreciate how hard it is to identify the trigger of a trade, it helps to know a little about market structure. Most exchanges are electronic, after Covid shuttered the remaining pits. With humans out of the loop, the biggest source of latency is information traveling long distances. Economies scattered around the planet are interlinked. It takes information tens of milliseconds to travel from NY to London on Hibernia’s fiber optic cable; with Raft shortwave this is a few milliseconds shorter. So when gold prices are moving it can be hard to identify the root cause, because echoes bounce back and forth between continents, creating interfering waves and sometimes resonating into feedback loops. Luckily, when the sun shines on one continent, producing energy and activity from people being awake, most price formation happens there. So during the European trading session, Asia is relaxing after work and America is still sleeping. Feedback loops are always present, but some HFT firms reduce their computation by simply assuming that the exchange where it’s daytime has the correct price and the only relevant supply and demand information. This doesn’t always work, since many exchanges are open 23 hours a day.
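The NY to London latency figures mentioned above follow from simple physics. The sketch below estimates one-way propagation delay from distance and signal speed. The distance (an approximate great-circle figure) and the refractive index of fiber are assumptions for illustration, not measurements of any particular cable or radio link.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.468      # assumed typical value for silica fiber
NY_LONDON_KM = 5_570.0              # assumed approximate great-circle distance


def one_way_ms(distance_km: float, refractive_index: float = 1.0) -> float:
    """One-way propagation delay in milliseconds for a signal in the given medium."""
    speed_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return distance_km / speed_km_s * 1000.0


fiber_ms = one_way_ms(NY_LONDON_KM, FIBER_REFRACTIVE_INDEX)
radio_ms = one_way_ms(NY_LONDON_KM)  # radio travels at close to vacuum speed
print(f"fiber: {fiber_ms:.1f} ms one way")
print(f"radio: {radio_ms:.1f} ms one way")
```

These are lower bounds: real cables do not follow the great circle, and shortwave signals bounce off the ionosphere, so actual latencies are higher. Even so, the estimate shows why light in glass takes tens of milliseconds to cross the Atlantic and why a radio path can be several milliseconds faster.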

Alpha signals can help unravel the causality behind sequences of events, to avoid exacerbating feedback loops. Alphas can also simultaneously produce accurate multi-symbol first-order price updates. Since this article is about latency and alphas have unique proprietary variations, we will not elaborate further on them.

With each transaction causing ripple effects that need to be processed within microseconds to avoid destabilizing feedback loops, even nanosecond-level improvements benefit market efficiency. By continually mediating supply and demand at high speeds, HFT serves as a stabilizing force in the market. In an increasingly fast-paced and interconnected global economy, this is a necessary service.

Low Level Optimization Layer of Capitalism

Electronic markets are information processing systems with a layered architecture, similar to other systems like the internet and CPUs. Each layer relies on the layers below to abstract away higher frequency tasks. HFT is the second-lowest layer of the market information processing system, comparable to the branch predictor in a CPU.

Branch prediction is not essential for running software, but it makes everything more efficient. Similarly, HFT is not the most important component of the financial system, but it’s worth having a few specialists focusing on it to make everyone else more efficient. This optimization role would likely exist even in alternative economic systems, such as a planned cybernetic socialist economy.

Let’s see the parallels between the layers around CPUs and electronic markets:

CPU:
Application: Internet, email, gaming (milliseconds)
C++: Types, higher-order functions, data structures (microseconds)
Assembly: Simple operations (nanoseconds)
Branch prediction (nanoseconds)
Transistors: Silicon (picoseconds)

Electronic markets:
Investors: fundamental valuation (years)
Hedge funds: earnings prediction (1 quarter)
Stat arb: first-order news and relative value (hours – days)
HFT: supply and demand (microseconds – minutes)
Exchange: transaction database (microseconds)

In the context of capitalism, HFT serves as a low-level optimization layer in the global price discovery process. Capitalism can be viewed as a distributed and decentralized system for resource allocation, in contrast to centrally planned economies. Computing the Nash equilibrium of an economy is PPAD-complete, and thus believed to be computationally intractable in general (Christos Papadimitriou, 2008). Central planning often leads to accumulated pricing errors and societal collapse, as seen in the Soviet Union and Chile. Committees of experts struggle to determine the costs and supply chains for even basic goods (Leonard Read, 1958). Price discovery is a non-trivial problem.

The HFT industry is relatively small, with total revenue below $30 billion in most years, costs lower still, and headcount in the single-digit thousands across all firms. Just as there are few Intel CPU engineers or GCC compiler developers compared with the total number of web developers, the HFT industry supports a much larger ecosystem. McDonald’s, Exxon, JP Morgan, etc. each have workforces of around 100k, individually dwarfing the entire HFT industry, while benefiting from more accurate agriculture, oil, bond, and FX pricing from the thin HFT layer below.

Some people might say that flip phones were powerful enough, or that calculators used to be fast enough, and yet with each new speedup we enable new capabilities, such as smartphones, video streaming, and AI. In 1943, Thomas Watson, president of IBM, supposedly remarked, “I think there is a world market for maybe five computers”. As HFT improves efficiency, new applications also become feasible, like personal retirement portfolios and placing trades throughout the day without needing to worry about careful timing. We will speculate on more possibilities in the conclusion.

Criticism of HFT as a resource sink or brain drain from other fields may be misplaced, as it plays a vital role in the operating system of capitalism. It functions as the optimization layer, facilitating the efficient operation of this complex, decentralized economic computer.

Not “Winner-takes-all”

Even some people who work in the industry are disillusioned by latency competition. They generally fall into one of two groups: people at smaller, less profitable firms who worry that their lack of traction is due to latency being an “all or nothing game”; and people at big firms who feel their micro-optimization work maintains a moat for another year or two but will soon be replaced and have no lasting value to humanity. Both groups ask: isn’t it wasteful to invest so much in ASICs in a race to the bottom to shave off 5 nanoseconds to capture a winner-takes-all profit in a zero-sum game? Isn’t it a misallocation of resources for so many companies to build something, for only one to win, and the rest to have squandered resources and time? Even if I’m getting paid well, should I work on something more lasting instead? In this section I’ll explain the nuances of the industry to show how we are in fact all cumulatively building up technology stacks that collectively improve the market.

In reality, companies are mostly rational. Most decided that ASICs aren’t a good investment. Among the few that did invest, one may focus on futures while another focuses on options, so each can use its ASICs effectively in its own market. As the previous sections showed, the market has numerous layers and niches, with room for many trading bots with different specialties.

There was a time around 2007 when multiple firms tried to build direct fiber optic lines from NY to Chicago to compete on SPY vs ES. This was parodied in the Hollywood box-office flop The Hummingbird Project and mischaracterized in Michael Lewis’s Flash Boys. Investments in expensive telecommunications lines like Spread Networks resulted in cost overruns for companies including Getco and Chopper. However, those business decisions can be excused because there were numerous complicating strategic factors. The firms may not have known how many others were racing against them; they couldn’t foresee the complexity of licensing a straight path from the FCC; and few knew that breakthroughs in microwave would make the payback period only a few years rather than 20.

On the other hand, competition pushes people to keep improving even when they aren’t the bottleneck. Who asked to store 10,000 photos on their phone instead of 9,000, or for a microwave that costs 1 cent less? No one. Particularly since exchanges are the bottleneck now, taking tens of microseconds, we may struggle to see how shaving off nanoseconds helps. However, over time, surplus technological capabilities have a way of paving the way for new applications. Exchanges could adopt the technological advances pioneered by HFT firms. With more efficient systems, exchanges could develop new features like implied calculations or new order types. Lower latency means higher throughput, so exchanges could list more products with less hardware. Trading firms already have excess capacity, so they could handle more products. Exchanges could pass along cost savings from more efficient technical operations by lowering fees, which would lower spreads further. Unfortunately, because exchanges are natural monopolies, they need pressure from regulators to adopt modern systems that lower latency and fees.

Finally, almost all competition looks winner-takes-all when you define the space of competition narrowly enough, but in the big picture it’s never that simple. HFTs are diverse. A one-nanosecond speedup in one component doesn’t allow anyone to take the whole market. Trading strategies’ reaction latencies vary by trigger type: some may be highly tuned, others may not. Strategies have different signals and risk appetites throughout the day. Exchanges want to commoditize their complements by ensuring multiple traders can make markets profitably, so they add artificial latency variance to order entry network connections. So there are very few situations where markets are truly winner-takes-all.

Conclusion

In this essay, we examined sub-millisecond latency optimization in modern electronic trading, with a focus on three key benefits: reducing spreads, disseminating fair prices, and mediating supply and demand negotiation. First, we saw how latency reductions allow market makers to manage risk more effectively, leading to tighter spreads that benefit all market participants. Second, we explored HFT’s role in rapidly propagating information across related assets and preventing harmful feedback loops, thereby keeping prices an accurate reflection of supply and demand. Third, we drew analogies to low-level computer optimizations that ultimately enabled new software applications as they chased Moore’s Law. Finally, we challenged the simplistic narrative that HFT is winner-takes-all with most effort going to waste. Latency improvements expand what is possible for markets, enabling innovations like smaller tick sizes and more product listings. Since the industry is profitable even after accounting for small wasted investments and opportunity costs, and has technological side benefits, the effort spent on HFT serves a good purpose. It has no negative externalities like pollution or employee injury, so it doesn’t need taxation or banning.

So what comes next? Will simulation-based models of Earth’s 8 billion human agents, each represented by an AI, be feasible with a few orders of magnitude more compute? Will microeconomics be solved by agent-based simulation rather than by finding the intersection of supply and demand curves? Could prediction markets add a layer below the news, Twitter comment wars, and podcast appearances as a more efficient clearinghouse for uncertainties about the present or future? Could debates be settled in bets, not words? Will AI agent fiduciaries that know and communicate our entire personal vector of wants, needs, goods, and services instantly cross with others’ for a perfect barter system, without the lossy projection down to a single number, the price? Will electronic trading concepts become so well known that we can expand realtime central limit order books to more markets, as some tech companies have begun to do, such as Amazon for retail, Google for ads, and Uber for rides? We don’t know what improvements technology may enable, but hopefully this essay has at least helped explain what currently exists, and why even nanosecond improvements are in fact valuable to society and should be cheered and encouraged rather than criticized.