It seems like you don’t understand the rules of the competition. Entries don’t have access to the internet. The OP acknowledges in their post that this is not eligible for the prize. The HN comment from the prize co-founder specifically says the OP’s claims haven’t been scrutinized. (implicit: they won’t be for the prize set unless the OP submits with an open LLM implementation)
There is a plan for a “public” leaderboard, but it currently has no entries, so we don’t actually know what the SOTA for the unrestrained version is. [1]
The general idea - test time augmentation - is what the current private set SOTA uses. [2] Generating more examples via transforming the samples is not a new idea.
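For anyone unfamiliar with the term, here is a minimal sketch of what "transforming the samples" can look like for ARC-style tasks. It assumes grids are lists of lists of ints and uses the eight rotations/reflections of the square, applied identically to each training input and output. The function names are hypothetical, and this is an illustration of the general idea, not a description of the method used by the current private set SOTA [2].

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]

def rotate90(g: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise (works for non-square grids too)."""
    return [list(row) for row in zip(*g[::-1])]

def flip_h(g: Grid) -> Grid:
    """Mirror a grid left-to-right."""
    return [row[::-1] for row in g]

def dihedral_transforms() -> List[Callable[[Grid], Grid]]:
    """The 8 symmetries of the square: 4 rotations, each with and without a flip.
    The first transform (0 rotations, no flip) is the identity."""
    transforms = []
    for k in range(4):
        for flip in (False, True):
            # Default arguments bind the current k and flip to this closure.
            def t(g: Grid, k: int = k, flip: bool = flip) -> Grid:
                for _ in range(k):
                    g = rotate90(g)
                return flip_h(g) if flip else g
            transforms.append(t)
    return transforms

def augment(pairs: List[Pair]) -> List[Pair]:
    """Apply every transform identically to the input and output of each pair,
    so each original pair yields 8 (possibly duplicate) training pairs."""
    out = []
    for t in dihedral_transforms():
        for inp, outp in pairs:
            out.append((t(inp), t(outp)))
    return out

if __name__ == "__main__":
    task = [([[1, 2], [3, 4]], [[4, 3], [2, 1]])]
    print(len(augment(task)))  # 8
```

Applying the same transform to both sides of a pair keeps the input/output relationship consistent, which is what makes the extra pairs usable as additional examples.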
Really, it seems like all the publicity has just gotten a bunch of armchair software architects coming up with 1-4 year-old ideas thinking they are geniuses.
> It seems like you don’t understand the rules of the competition.
I don't think you "don't understand" anything :) I'd ask you, politely, to consider that when you're replying to other people in the future.
It's better to bring to interactions the prior that your interlocutor is a presumably intelligent individual who may interpret the same facts differently than to decide they just don't get it. The latter is quite a lonely path.
> Entries don’t have access to the internet.
Correct. Per TFA, the co-founder, Chollet, and then me: this is an offline solution. The solution is the Python program found by an LLM.
> The HN comment from the prize co-founder specifically says the OP’s claims haven’t been scrutinized.
Objection: relevance? Is your claim here that it might be false, so we shouldn't be discussing it at all?
> (implicit: they won’t be for the prize set unless the OP submits with an open LLM implementation)
I don't know what this means. "Open LLM implementation" is either a term of art I don't recognize or a misunderstanding of the situation.
I do assume you read the article, so I'm not trying to talk down to you, but to clarify:
The solution is the Python program, not the LLM prompts that iterated on it. The common thread behind the confusing experience of reading your comment, which is phrased aggressively and disputes everything right up until you agree with me, is that your observations assume I think the solution requires a cloud-based LLM to run. As noted above, it doesn't, and that is the thrust of my comment: they found a way to skirt what I thought the rules were, and the co-founder and Chollet have publicly embraced it.
> There is a plan for a “public” leaderboard, but it currently has no entries, so we don’t actually know what the SOTA for the unrestrained version is. [1]
This was false before you posted: it was false when I checked this morning, and it was false as early as 4 days ago (June 14th), as we can confirm via archive.is (prefix the URL you provided with archive.is/ to check for yourself).
> The general idea - test time augmentation - is what the current private set SOTA uses. [2] Generating more examples via transforming the samples is not a new idea.
Did anyone claim it was?
> Really, it seems like all the publicity has just gotten a bunch of armchair software architects coming up with 1-4 year-old ideas thinking they are geniuses.
I don't know what this means other than that you're upset, but yes, it sounds like both you and I agree that having an LLM generate Python programs isn't quite what we'd thought would be an AGI solution in the eyes of Chollet.
>> (implicit: they won’t be for the prize set unless the OP submits with an open LLM implementation)
> The solution is the Python program, not the LLM prompts that iterated on it. The common thread behind the confusing experience of reading your comment, which is phrased aggressively and disputes everything right up until you agree with me, is that your observations assume I think the solution requires a cloud-based LLM to run. As noted above, it doesn't, and that is the thrust of my comment: they found a way to skirt what I thought the rules were, and the co-founder and Chollet have publicly embraced it.
I think the implication is that solutions that use an LLM via an API won't be eligible (the "no internet" rule).
This seems easy to work around: one could use GPT-4 to generate catalogs in advance and a smaller, local LLM with good coding abilities to select from them.
I don't see how this skirts any rules you think were implied, and I'm puzzled why you think it does.
> sounds like both you and I agree that having an LLM generate Python programs isn't quite what we'd thought would be an AGI solution in the eyes of Chollet.
> Alas, here we are.
Chollet noted that program synthesis was a promising approach, so it's not surprising to me that a program synthesis approach that also uses an LLM is effective.
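To make "the solution is the Python program" concrete: one common shape for program synthesis on ARC-style tasks is to check candidate programs against a task's training pairs and apply only a program that reproduces every training output to the test input. The sketch below assumes a tiny, hypothetical hand-written candidate list standing in for programs an LLM might propose ahead of time; nothing in it calls a model, and it is not the OP's actual pipeline.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]

# Hypothetical candidate programs. In an LLM-based pipeline these would be
# generated in advance rather than written by hand.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}

def passes_all(program: Callable[[Grid], Grid], pairs: List[Pair]) -> bool:
    """True if the program reproduces every training output exactly."""
    try:
        return all(program(inp) == out for inp, out in pairs)
    except Exception:
        # A candidate that crashes on any training input is rejected.
        return False

def solve(train: List[Pair], test_input: Grid) -> Optional[Tuple[str, Grid]]:
    """Return the first candidate consistent with all training pairs,
    together with its output on the test input, or None if none fit."""
    for name, program in CANDIDATES.items():
        if passes_all(program, train):
            return name, program(test_input)
    return None

if __name__ == "__main__":
    train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
    print(solve(train, [[5, 6, 7]]))  # ('flip_horizontal', [[7, 6, 5]])
```

Once a consistent program is selected, running it needs no model or network access, which is the sense in which such a solution is offline.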
From the leaderboard link (and on the archive version):
>ARC-AGI-Pub is a secondary leaderboard (in beta) measuring the public evaluation set. … The public evaluation set imposes no limitations on internet access or compute. At this time, ARC-AGI-Pub is not part of ARC Prize 2024 (e.g. no prizes are associated with this leaderboard).
And all the entries, at the time of writing and in the archive link, say “You?…”. The “ARC-AGI 2024 HIGH SCORES” board, which does have entries, is on the private test set.
>I don't think you "don't understand" anything :)
I genuinely don’t understand if we are viewing the same websites.
> I genuinely don’t understand if we are viewing the same websites.
We are! I missed the nuance that you're looking for a public leaderboard on the private test set. I do see it now, but I'm still confused as to how that's relevant here.
[1] https://arcprize.org/leaderboard
[2] https://lab42.global/community-interview-jack-cole/