I review the physical processes involved in massive star and star cluster formation. I describe how these are combined in Core Accretion, a theoretical model of massive star formation that assumes the process is a scaled-up version of low-mass star formation. The model's assumption of initially massive starless cores that are near virial equilibrium can be tested by studies of Infrared Dark Clouds. I show some of our latest observations of these clouds, including results from ALMA. At later stages, when the protostar is forming and becoming infrared-bright, its morphology is determined by bipolar outflow cavities. The appearance of these cavities from ~10 to 40 microns tests the properties of the core immediately surrounding the massive protostar. The predictions of the model appear to be validated in at least several nearby examples. The case of the massive protostar in Orion KL is more complex, but I discuss how it can also be understood in the context of Core Accretion theory. Larger samples of massive stars and star clusters are needed for more stringent tests of theoretical models, and I describe first results from the Galactic Census of High and Medium-mass Protostars (CHaMP), which is based on a complete mapping of dense molecular gas in a region of the southern Milky Way. Finally, I discuss the application of massive star formation theory to the early universe: how massive were the first stars, and could they have been the progenitors of supermassive black holes?
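Whether a starless core is "near virial equilibrium" is commonly judged with a virial parameter that compares kinetic and gravitational energy. The sketch below is illustrative only. It assumes the simple uniform-density-sphere form alpha_vir = 5 sigma^2 R / (G M), with sigma the one-dimensional velocity dispersion, and it ignores magnetic fields, surface pressure, and density gradients. The function name `virial_parameter` and the example core values are hypothetical and are not taken from the observations discussed here.

```python
# Physical constants in SI units
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
PC = 3.086e16      # parsec, m


def virial_parameter(sigma_kms: float, radius_pc: float, mass_msun: float) -> float:
    """Hypothetical helper: alpha_vir = 5 sigma^2 R / (G M) for a uniform sphere.

    alpha_vir ~ 1 corresponds to virial equilibrium (2T + W = 0) in this
    simplified picture, which neglects magnetic and surface-pressure terms.
    """
    sigma = sigma_kms * 1.0e3    # convert km/s to m/s
    radius = radius_pc * PC      # convert pc to m
    mass = mass_msun * M_SUN     # convert solar masses to kg
    return 5.0 * sigma**2 * radius / (G * mass)


if __name__ == "__main__":
    # Illustrative (made-up) core: sigma = 1 km/s, R = 0.1 pc, M = 100 M_sun
    alpha = virial_parameter(1.0, 0.1, 100.0)
    print(f"alpha_vir = {alpha:.2f}")
```

For the made-up core above the result is close to unity. Because alpha_vir scales as sigma^2 R / M, doubling the mass at fixed sigma and R halves it, which is why mass estimates dominate the uncertainty in such tests.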