A new study from the University of Wisconsin investigates how small transformers trained from random initialization can efficiently learn arithmetic operations using the next-token prediction objective

https://arxiv.org/abs/2307.03381

Large language models such as GPT-3/4, PaLM, and LaMDA have exhibited general-purpose capabilities, some of them emergent, across various downstream tasks, including language and code translation, compositional reasoning, and basic arithmetic operations. Perhaps surprisingly, the model’s training objective, which is often an autoregressive loss based on predicting the next token, does not directly encode these tasks. These …

AMD Announces $135 Million Investment Plan to Expand Adaptive Computing Research, Development and Engineering Operations in Ireland

The investment aims to add up to 290 new jobs and fund research and development projects for next-generation AI, data centers, networks and 6G communication infrastructure

DUBLIN, Ireland, June 21, 2023 (GLOBE NEWSWIRE) -- AMD (NASDAQ: AMD) today announced plans for continued growth in Ireland through an investment …