OpenAI and Microsoft are apparently planning to build a $100 billion data center codenamed Stargate. We discuss how this compares to existing data centers and other planned investments in AI-centric infrastructure. OpenAI and Microsoft don’t seem to have enough time to design special-purpose networking and other hardware, but it is a massive investment compared to other plans.
We discuss the design problems you have to solve when creating a data center. You need to power it, keep it cool, provide networking, and build in resilience and redundancy for everything. We also discuss how Google data centers are a little different from the norm.
Finally, we discuss the AI chips that could be present in a data center. Primarily, this means Nvidia GPUs or Google TPUs. There are implications for the software stack and ultimate usability of the system.