README.md
18 additions & 2 deletions
@@ -17,8 +17,8 @@ performances of GPT-4 and GPT-3.5 can vary substantially over time, and for the
 </p>
 
 
-What are the main findings? In a nutshell, there are many interesting performance shifts over time. For example, GPT-4 (March 2023) was very good at identifying prime numbers (accuracy 97.6%)
-but GPT-4 (June 2023) was very poor on these same questions (accuracy 2.4%). Interestingly
+What are the main findings? In a nutshell, there are many interesting performance shifts over time. For example, GPT-4 (March 2023) was very good at identifying prime numbers (accuracy 84.0%)
+but GPT-4 (June 2023) was very poor on these same questions (accuracy 51.1%). Interestingly
 GPT-3.5 (June 2023) was much better than GPT-3.5 (March 2023) in this task. We hope releasing the datasets and generations can help the community to understand how LLM services drift better. The above figure gives a quantatitive summary.
 
 ## 🚀 Reproducing the Results
@@ -41,10 +41,26 @@ The above figure shows the first few rows in the ```generation/PRIME_EVAL.csv```