Summaries
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
By Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Kewal Dhariwal, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei in the book Language Models are Few-Shot Learners (2020)
This book mentions ...
People: Aidan N. Gomez, Geoffrey Hinton, Llion Jones, Lukasz Kaiser, Niki Parmar, Illia Polosukhin, Noam Shazeer, Jakob Uszkoreit, Ashish Vaswani
Terms: Generative Pretrained Transformer 3 (GPT-3), Retrieval Augmented Generation (RAG)
40 mentions
- Original oder Plagiat? - Der schnelle Weg zur wissenschaftlichen Arbeit im Zeitalter künstlicher Intelligenz (Doris Weßels, Eike Meyer) (2021)
- On the Dangers of Stochastic Parrots - Can Language Models Be Too Big? (Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell) (2021)
- Should you believe Wikipedia? - Online Communities and the Construction of Knowledge (Amy Bruckman) (2022)
- The Robots Are Coming - Exploring the Implications of OpenAI Codex on Introductory Programming (James Finnie-Ansley, Paul Denny, Brett A. Becker, Andrew Luxton-Reilly, James Prather) (2022)
- Large Language Models are Zero-Shot Reasoners (Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa) (2022)
- Story Machines - How Computers Have Become Creative Writers (Mike Sharples, Rafael Pérez y Pérez) (2022)
- What do NLP researchers believe? (Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman) (2022)
- Competition-level code generation with AlphaCode (Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, Thomas Hubert, Peter Choy, Cyprien de Masson d’Autume, Igor Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes Welbl, Sven Gowal, Alexey Cherepanov, James Molloy, Daniel J. Mankowitz, Esme Sutherland Robson, Pushmeet Kohli, Nando de Freitas, Koray Kavukcuoglu, Oriol Vinyals) (2022)
- The End of Programming (Matt Welsh) (2023)
- ChatGPT for Good? - On Opportunities and Challenges of Large Language Models for Education (Enkelejda Kasneci, Kathrin Sessler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, Stephan Krusche, Gitta Kutyniok, Tilman Michaeli, Claudia Nerdel, Jürgen Pfeffer, Oleksandra Poquet, Michael Sailer, Albrecht Schmidt, Tina Seidel, Matthias Stadler, Jochen Weller, Jochen Kuhn, Gjergji Kasneci) (2023)
- Theory of Mind May Have Spontaneously Emerged in Large Language Models (Michal Kosinski) (2023)
- Unlocking the Power of Generative AI Models and Systems such as GPT-4 and ChatGPT for Higher Education - A Guide for Students and Lecturers (Henner Gimpel, Kristina Hall, Stefan Decker, Torsten Eymann, Luis Lämmermann, Alexander Mädche, Maximilian Röglinger, Caroline Ruiner, Manfred Schoch, Mareike Schoop, Nils Urbach, Steffen Vandirk) (2023)
- Is Education Losing the Race with Technology? - AI's Progress in Maths and Reading (OECD Organisation for Economic Co-operation and Development) (2023)
- Modern language models refute Chomsky’s approach to language (Steven T. Piantadosi) (2023)
- Generative AI at Work (Erik Brynjolfsson, Danielle Li, Lindsey R. Raymond) (2023)
- Generative Agents: Interactive Simulacra of Human Behavior (Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein) (2023)
- Sparks of Artificial General Intelligence - Early experiments with GPT-4 (Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang) (2023)
- ChatGPT und andere Computermodelle zur Sprachverarbeitung - Grundlagen, Anwendungspotenziale und mögliche Auswirkungen (Steffen Albrecht) (2023)
- Tree of Thoughts - Deliberate Problem Solving with Large Language Models (Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan) (2023)
- The Curse of Recursion - Training on Generated Data Makes Models Forget (Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, Ross Anderson) (2023)
- Testing of Detection Tools for AI-Generated Text (Debora Weber-Wulff, Alla Anohina-Naumeca, Sonja Bjelobaba, Tomáš Foltýnek, Jean Guerrero-Dib, Olumide Popoola, Petr Šigut, Lorna Waddington) (2023)
- AI model GPT-3 (dis)informs us better than humans (Giovanni Spitale, Nikola Biller-Andorno, Federico Germani) (2023)
- The Future of AI in Education - 13 Things We Can Do to Minimize the Damage (Arran Hamilton, Dylan Wiliam, John Hattie) (2023)
- Künstliche Intelligenz - Dem Menschen überlegen - wie KI uns rettet und bedroht (Manfred Spitzer) (2023)
- Does GPT-4 pass the Turing test? (Cameron R. Jones, Benjamin K. Bergen) (2023)
- Large Language Models und ihre Potenziale im Bildungssystem - Impulspapier der Ständigen Wissenschaftlichen Kommission der Kultusministerkonferenz (SWK Ständige Wissenschaftliche Kommission der KMK) (2024)
- Talking about Large Language Models (Murray Shanahan) (2024)
- Dialect prejudice predicts AI decisions about people's character, employability, and criminality (Valentin Hofmann, Pratyusha Ria Kalluri, Dan Jurafsky, Sharese King) (2024)
- Pädagogik 3/2024 - KI in der Schule (2024)
- Alles überall auf einmal - Wie Künstliche Intelligenz unsere Welt verändert und was wir dabei gewinnen können (Miriam Meckel, Léa Steinacker) (2024)
- schule verantworten 1/2024 - Künstliche Intelligenz (2024)
- Resisting Dehumanization in the Age of «AI» (Emily M. Bender) (2024)
- Writing at a Distance - Notes on Authorship and Artificial Intelligence (Hannes Bajohr) (2024)
- Kompetenzen kommunikativen Handelns im Kontext mediatisierter Digitalität (Ann-Kathrin Watolla) (2024)
- The Singularity is nearer (Ray Kurzweil) (2024)
  - 2. Reinventing Intelligence
- Deepfakes und manipulierte Realitäten - Technologiefolgenabschätzung und Handlungsempfehlungen für die Schweiz (Murat Karaboga, Nula Frei, Manuel Puppis, Daniel Vogler, Patric Raemy, Frank Ebbers, Greta Runge, Adrian Rauchfleisch, Gabriele de Seta, Gwendolyn Gurr, Michael Friedewald, Sophia Rovelli) (2024)
- Generative KI und betriebliche Bildung/Personalentwicklung - Orientierung – Befähigung – Weiterentwicklung (Christoph Meier) (2024)
- Generative KI-Systeme in der Lehre systematisch anleiten (Timon Rimensberger) (2024)
- DELFI 2024 (Sandra Schulz, Natalie Kiesler) (2024)
- ChatGPT erzähl mir eine Geschichte - Die Verwandlung von Lernwelten durch KI-gestützte Erzählungen (Rebecca Finster, Linda Grogorick, Susanne Robra-Bissantz) (2024)
- KI-basierte Unterstützung zum Abbau sprachlicher Barrieren für Kinder mit nichtdeutscher Herkunftssprache (Kensuke Akao) (2024)
Co-cited books
- Unlocking the Power of Generative AI Models and Systems such as GPT-4 and ChatGPT for Higher Education - A Guide for Students and Lecturers (Henner Gimpel, Kristina Hall, Stefan Decker, Torsten Eymann, Luis Lämmermann, Alexander Mädche, Maximilian Röglinger, Caroline Ruiner, Manfred Schoch, Mareike Schoop, Nils Urbach, Steffen Vandirk) (2023)
- Sparks of Artificial General Intelligence - Early experiments with GPT-4 (Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang) (2023)
Full text of this document
Language Models are Few-Shot Learners: entire book as full text (6609 kByte)
Beat and this book
Beat added this book to the Biblionetz during his time at the Institut für Medien und Schule (IMS). Beat does not own a physical copy, but he does have a digital one. A digital version is available on the internet (see above). Given the few entries in the Biblionetz, he does not appear to have actually read it.