Ensuring generalization of LLMs in response to distribution shifts is especially important for medical and health-related models. Here we describe AfriMed-QA, an open-source benchmark question–answer dataset sourced from countries across Africa. More at https://goo.gle/4mEeYZT
About us
From conducting fundamental research to influencing product development, our research teams have the opportunity to impact technology used by billions of people every day. We aspire to make discoveries that impact everyone, and sharing our research and tools to fuel progress in the field is fundamental to our approach.
- Website: https://research.google/
- Industry: Technology, Information and Internet
- Company size: 1,001-5,000 employees
Updates
- We present a new approach to time-series forecasting that uses continued pre-training to teach a model to adapt to in-context examples at inference time, matching the performance of supervised fine-tuning without additional complex training. Learn more at https://goo.gle/42HGU7Z
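As a rough illustration of the inference-time side of this idea, here is a minimal Python sketch of assembling in-context examples into a single forecasting input. The separator token, `forecast_with_context`, and `model.predict` are hypothetical stand-ins, not the actual API from the post:

```python
# Minimal sketch: in-context forecasting, assuming a forecaster whose
# input is a single 1-D sequence. All names here are illustrative.
import numpy as np

SEPARATOR = np.nan  # hypothetical separator token between series

def forecast_with_context(model, history, related_series, horizon):
    """Concatenate related series as in-context examples ahead of the
    target history, then ask the model for `horizon` future steps.

    The continued pre-training described in the post is what teaches
    the model to exploit these in-context examples; at inference time
    the only change is how the input sequence is assembled."""
    parts = []
    for series in related_series:           # in-context examples
        parts.extend([series, [SEPARATOR]])
    parts.append(history)                   # the series to forecast
    prompt = np.concatenate([np.asarray(p, dtype=float) for p in parts])
    return model.predict(prompt, horizon)   # hypothetical model call
```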
- TTD-DR is a Deep Researcher agent that models research report writing as a diffusion process, where a first draft is gradually polished into a high-quality final version. Read about its impressive results in long-form report writing and complex reasoning tasks: goo.gle/41XNhnp
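A minimal sketch of the draft-then-denoise loop this describes, under the assumption that "diffusion" here means iteratively revising a noisy first draft with retrieved evidence; `call_llm` and `search` are hypothetical stubs, not TTD-DR's actual components:

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def search(query: str) -> str:
    raise NotImplementedError("plug in your retrieval backend here")

def deep_research(question: str, steps: int = 5) -> str:
    # Step 0: a quick, possibly wrong draft plays the role of the noisy sample.
    draft = call_llm(f"Write a rough first-draft report answering: {question}")
    for _ in range(steps):
        # Each "denoising" step asks what the draft is missing, retrieves
        # evidence, and revises the draft toward a higher-quality report.
        gap = call_llm(f"List the single biggest gap or error in this draft:\n{draft}")
        evidence = search(gap)
        draft = call_llm(
            f"Revise the draft to fix this gap.\nGap: {gap}\n"
            f"Evidence: {evidence}\nDraft:\n{draft}"
        )
    return draft
```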
- Sensible Agent is a research prototype for proactive, unobtrusive AR agents that use real-time context (gaze, ambient noise, hand availability) to offer seamless, socially aware assistance while minimizing disruption. Learn more and check out the paper: goo.gle/4nfL3IA
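To make the context-to-interaction mapping concrete, here is a small illustrative policy of the kind the post describes: real-time signals in, choice of output and input modality out. The dataclass, thresholds, and rules are assumptions for illustration, not the prototype's actual logic:

```python
from dataclasses import dataclass

@dataclass
class Context:
    gaze_on_agent: bool     # is the user looking at the agent UI?
    noise_db: float         # ambient noise level in dB
    hands_free: bool        # are the user's hands available?
    in_conversation: bool   # is the user talking to someone?

def choose_interaction(ctx: Context) -> tuple[str, str]:
    """Return (output_modality, input_modality) for a proactive suggestion."""
    if ctx.in_conversation:
        return ("defer", "none")            # stay unobtrusive
    output = "audio" if ctx.noise_db < 60 else "visual"
    if ctx.hands_free:
        user_input = "gesture" if ctx.noise_db >= 60 else "voice"
    else:
        user_input = "gaze" if ctx.gaze_on_agent else "voice"
    return (output, user_input)
```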
- Controlling a quantum system comes down to how well you can shape and tune waveforms. Using microwave and DC signals, we run calibration sequences that prepare the processor to carry out targeted, repeatable quantum operations. Learn more → https://goo.gle/3KdexrQ
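As a small illustration of what "shaping a waveform" means in practice, here is a sketch of a Gaussian microwave envelope modulating a carrier at a qubit drive frequency; the numbers are illustrative, not from the post:

```python
import numpy as np

def gaussian_pulse(t, amp, t0, sigma, drive_freq):
    """Gaussian envelope times a microwave carrier (single quadrature)."""
    envelope = amp * np.exp(-((t - t0) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * drive_freq * t)

t = np.linspace(0, 40e-9, 2001)  # 40 ns control window
pulse = gaussian_pulse(t, amp=1.0, t0=20e-9, sigma=5e-9, drive_freq=5e9)

# Calibration sequences sweep parameters like `amp` and `sigma` until the
# pulse implements a targeted, repeatable operation (e.g., a pi rotation).
```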
- Google Research reposted this:
An advanced version of our Gemini Deep Think model achieved a 🏅 gold-medal-level performance at the International Collegiate Programming Contest (ICPC) World Finals 2025. This milestone in competitive programming demonstrates progress in several areas of AI:
🏅 Advanced Reasoning: The model's success is based on breakthroughs in multi-step reasoning, as well as parallel and iterative thinking. It was trained using novel reinforcement learning techniques to explore multiple approaches to a problem and learn from feedback.
🏅 Generalizing Across Domains: This achievement, following a gold-medal-level performance at the International Math Olympiad (IMO), shows Gemini's ability to apply complex reasoning to new challenges, from mathematics to coding.
🏅 Solving Complex Problems: The model solved 10 out of 12 problems, including one that no university team could solve during the competition. This highlights the value of competitive programming as a rigorous, objective benchmark for evaluating a model's problem-solving and code-generation abilities.
Congratulations to all involved in this exciting milestone! Read more about how Gemini performed at the ICPC World Finals: https://lnkd.in/dGz6i2AK
- SLED is a decoding strategy that uses all of an LLM's layers, instead of just the last one, to better align the output with the model's intrinsic knowledge, enhancing model accuracy without the need for external data or additional fine-tuning. Learn more at https://goo.gle/4pounAh
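A minimal sketch of the core idea: read next-token distributions out of every layer (via the model's final norm and shared unembedding, the "logit lens") and fuse them. The simple averaging below is a deliberate simplification of the paper's actual mixing rule:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

layer_probs = []
for h in out.hidden_states[1:]:              # one hidden state per layer
    h = model.transformer.ln_f(h[:, -1])     # final norm, then ...
    logits = model.lm_head(h)                # ... the shared unembedding
    layer_probs.append(torch.softmax(logits, dim=-1))

# Fuse evidence from all layers instead of trusting only the last one.
mixed = torch.stack(layer_probs).mean(dim=0)
print(tok.decode(mixed.argmax(dim=-1)))
```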
- Today we're excited to introduce Learn Your Way, now on Google Labs. Learn Your Way is a research experiment that explores how GenAI can transform educational materials to create a more effective, engaging, learner-driven experience for every student. The static textbook becomes an interactive artifact with immersive text, quizzes, narration, and more. It adapts to the learner and gives students greater agency over how they learn. The results? In our efficacy study, students scored 9% to 11% higher on assessments.
Check out our blog post and accompanying tech report → http://goo.gle/3KqM8i0
Experience Learn Your Way for yourself on Google Labs → https://lnkd.in/gE6eciwT
-
We just released the weights of TimesFM-2.5 on Hugging Face (this upgrade will soon be available in GCP BigQuery and Model Garden). This checkpoint is better than TimesFM 2.0 by up to 25% on leading benchmarks, while having half the number of parameters (200M). It also has a longer 16K maximum context length. TimesFM-2.5 takes the top spot on the GIFT-Eval (https://goo.gle/4aeiA0f) leaderboard in terms of point forecasting accuracy measured by MASE as well as probabilistic forecasting accuracy measured by CRPS, in zero-shot mode (i.e., without seeing any train splits of the GIFT-Eval dataset ). Instructions for using this model are in our repository. We thank all the customers of TimesFM who have provided feedback and deployed the model in production. We would love to hear from you about how you are using TimesFM in production. Check it out: GiFT-Eval →https://goo.gle/4aeiA0f GitHub →https://goo.gle/3K4EjPe Hugging Face →https://goo.gle/4gmT5N3
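For orientation, here is a hedged sketch of zero-shot forecasting with the `timesfm` Python package, following the 2.x API documented in the GitHub repo linked above. The exact class names, hyperparameters, and checkpoint id for the 2.5 release may differ, so treat this as illustrative and follow the repository instructions:

```python
import timesfm

# Hyperparameters and repo id below match the documented 2.0 checkpoint;
# swap in the TimesFM-2.5 repo id and settings per the repo's README.
model = timesfm.TimesFm(
    hparams=timesfm.TimesFmHparams(
        backend="cpu",
        per_core_batch_size=32,
        horizon_len=128,
    ),
    checkpoint=timesfm.TimesFmCheckpoint(
        huggingface_repo_id="google/timesfm-2.0-500m-pytorch"
    ),
)

history = [[float(i % 7) for i in range(256)]]   # toy weekly-ish series
point, quantiles = model.forecast(history, freq=[0])  # 0 = high frequency
print(point.shape)  # (1, horizon_len)
```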
- Introducing VaultGemma, the largest open model trained from scratch with differential privacy. Read about our new research on scaling laws for differentially private language models, download the weights, and check out the technical report on the blog → https://goo.gle/46fUSiq
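For readers new to the topic, here is a minimal numpy sketch of DP-SGD, the standard recipe for training models with differential privacy: clip each example's gradient to bound its influence, then add calibrated Gaussian noise. This illustrates the general mechanism, not VaultGemma's actual training pipeline:

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=None):
    """One DP-SGD update on a batch of per-example gradients."""
    if rng is None:
        rng = np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Clip so no single example moves the model too much.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    grad = np.mean(clipped, axis=0)
    # Gaussian noise scaled to the clipping bound hides any one example.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=grad.shape)
    return params - lr * (grad + noise)
```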