A significant number of drugs fail during the clinical testing stage. To understand the attrition of drugs through the regulatory process, here we review and advance machine-learning (ML) and natural language-processing algorithms to investigate the importance of factors in clinical trials that are linked with failure in Phases II and III. We find that clinical trial phase transitions can be predicted with an average accuracy of 80%. Identifying these trials provides information to sponsors facing difficult decisions about whether these higher risk trials should be modified or halted. We also find common protocol characteristics across therapeutic areas that are linked to phase success, including the number of endpoints and the complexity of the eligibility criteria.