the protagonist is interviewed as a one-man "focus group" in lieu of a national election and one of the questions he is asked is "What do you think about the price of eggs?" and he said roughly "I have no idea, my wife does the shopping."
cheschire 3 hours ago [-]
Absolutely loved the article, the process, and the results. Hated the price.
You could pay a human to read receipts, 1 every 30 seconds (that’s slow!), $15/hr (twice the US federal minimum wage!), plus tax and overhead ($15x1.35) comes out to $20.25/hr over 5 hours. $101 all in.
Sure, sure, a human solution doesn’t scale. But this sort of project makes me feel like we haven’t quite hit the industrialization moment that I thought we had.
knollimar 36 minutes ago [-]
There's no way there wasn't a more efficient way of doing this. Way too many tokens per receipt.
I'd wager Gemini Flash could get decent results. I'd be willing to try it on 100 receipts and report the cost.
stavros 2 hours ago [-]
One issue is that the human was less accurate than the LLM. The other is that the author probably didn't pay $1,500 for this, they probably paid $20 on a subscription.
moron4hire 52 minutes ago [-]
You're counting just the egg-having receipts, but there were over 11 thousand receipts they had to go through to get to that 500-ish subset. I'm assuming OP wanted to process all of the receipts and then selected just eggs for a simple analytics job. With your rates, the human would cost almost $2000.
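A quick check of the two labor estimates in this subthread, as a rough sketch. The 30 seconds per receipt, $15/hr wage, and 1.35 overhead multiplier come from the parent comment, and the receipt counts (589 confirmed egg receipts, 11,345 total) come from the article's stats. The function name `human_cost` is made up for illustration.

```python
def human_cost(receipts: int, seconds_each: float = 30.0,
               wage: float = 15.0, overhead: float = 1.35) -> float:
    """Loaded labor cost in dollars for reading `receipts` receipts by hand."""
    hours = receipts * seconds_each / 3600
    return hours * wage * overhead

egg_only = human_cost(589)        # just the confirmed egg receipts
everything = human_cost(11_345)   # every receipt in the archive

print(f"egg receipts only: ${egg_only:,.2f}")
print(f"all receipts:      ${everything:,.2f}")
```

Under these assumptions the egg-only subset comes to about $99 (the parent comment rounds up to 5 hours, hence $101), and the full archive comes to about $1,914, which matches "almost $2000".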
N_Lens 1 hours ago [-]
AI has some weird unexpected uses that haven’t been fully uncovered yet, while it fails to scale or match the needed accuracy on expected use cases.
qoez 55 minutes ago [-]
Imagine how many 2001 era eggs he could have bought with that $101
wolfram74 3 hours ago [-]
I mean, at over 1000% the cost, the machine solution doesn't scale either?
ProllyInfamous 53 minutes ago [-]
Not yet.
>>So I told Codex “we have unlimited tokens, let’s use them all,” and we pivoted to sending every receipt through Codex for structured extraction. From that one sentence, Codex came back with a parallel worker architecture - sharding, health management, checkpointing, retry logic. The whole thing. When I ran out of tokens on Codex mid-run, it auto-switched to Claude and kept going. I didn’t ask it to do that. I didn’t know it had happened until I read the logs.
----
For anybody still thinking "my goodness, how wasteful is this SINGLE EXAMPLE": remember that all of the receipts from the article have helped better train whichever GPT is deciphering all this thermal printing.
For a small business owner (like my former self), paying $1500 to have an AI decipher all my receipts is still a heck of a lot cheaper than my accountant's rate. It would also motivate me to actually keep receipts (instead of throwing them away and guessing), simply to undaunt the monumental task of recordkeeping.
----
>>But the runs kept crashing. Long CLI jobs died when sessions timed out. The script committed results at end-of-run, so early deaths lost everything. I watched it happen three times. On the fourth attempt I said “I would have expected we start a new process per batch.” That was the fix ... Codex patched it, launched it in a tmux session, and the ETA dropped from 12 hours to 3. Not a hard fix. Just the kind of thing you know after you’ve watched enough overnight jobs die at 3 AM.
>>11,345 receipts processed. The thing that was supposed to take all night finished before I went to bed.
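The "new process per batch" fix quoted above can be sketched roughly as follows. This is not the author's script: the worker is a hypothetical stand-in for the real extraction step, and the names `WORKER`, `run_in_batches`, and the `batch_*.json` checkpoint layout are assumptions for illustration. The idea is only that each batch runs in its own child process and is saved as soon as it finishes, so a dead session loses at most one batch.

```python
import json
import subprocess
import sys
from pathlib import Path

# Hypothetical worker: reads a JSON list of receipt names on stdin
# and prints a JSON result. The real extraction call would go here.
WORKER = """
import json, sys
names = json.load(sys.stdin)
print(json.dumps({name: {"status": "ok"} for name in names}))
"""

def run_in_batches(receipts, out_dir, batch_size=50, timeout=600):
    """Process receipts one batch per child process, saving each batch as it finishes."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for start in range(0, len(receipts), batch_size):
        checkpoint = out_dir / f"batch_{start:06d}.json"
        if checkpoint.exists():
            continue  # finished in an earlier run; skip on restart
        batch = receipts[start:start + batch_size]
        try:
            proc = subprocess.run(
                [sys.executable, "-c", WORKER],
                input=json.dumps(batch), capture_output=True,
                text=True, timeout=timeout, check=True,
            )
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
            continue  # no checkpoint is written, so a rerun retries this batch
        # Write to a temp file, then rename, so a crash never leaves
        # a half-written checkpoint behind.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(proc.stdout)
        tmp.replace(checkpoint)
    return sorted(out_dir.glob("batch_*.json"))
```

Rerunning `run_in_batches` with the same output directory only processes batches that have no checkpoint file yet, which is what makes an overnight job safe to restart.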
cheschire 2 hours ago [-]
I think at a certain scale we're talking about switching to locally trained models, which don't have the same operating costs as running a frontier model for OCR. That would reduce the ongoing costs significantly. It might take longer than 30 seconds to read each receipt if you run multiple passes to ensure accuracy, but it could run 24/7/365 without the same tax and administration overhead as humans.
Spherical cows aside though, I do agree with you that I should not consider scalability as a given.
wolfram74 2 hours ago [-]
I suppose if we had access to a public data set like this receipt bank, programmers could time themselves setting up a solution with off the shelf OCR algos. If they could clock in at under 10 hours they could advertise themselves as being "just as good as an LLM, but significantly cheaper." Downside for the managerial class that wants generative algos for the complete lack of legal protections.
MarceliusK 35 minutes ago [-]
[dead]
ProllyInfamous 2 hours ago [-]
>Everyone needs a rewarding hobby. I’ve been scanning all of my receipts since 2001. I never typed in a single price - just kept the images. I figured someday the technology to read them would catch up, and the data would be interesting.
This is perhaps among the best openers I've ever read.
[spoiler: the tech caught up, the data is interesting]
I read a lot. This article, entirely.
MarceliusK 31 minutes ago [-]
Technically interesting and genuinely well-written end to end
EdNutting 50 minutes ago [-]
The AI writing of the article made me give up halfway through. It’s a neat idea but the writing style of these AI models is brain-grating, especially when it’s the wrong style choice for this kind of technical report.
ismailmaj 1 hours ago [-]
I don't know why people mess with Tesseract in 2026; attention-based OCRs (and more recently VLMs) have outperformed any LSTM-based approach since at least 2020.
My guess is that it's the entry-point to OCR and the internet is flooded by that, just like pandas for data processing.
mettamage 12 minutes ago [-]
Painful comparison haha
Leaving a comment so I can more easily find this
And for the people wondering about Pandas, use Polars instead
egeozcan 4 hours ago [-]
I usually avoid shallow comments but I feel like this time it has to be said as a conversation starter: That's a lot of eggs!
Also, ignoring the benefits of subscriptions, an estimate on the order of thousands of dollars for extracting egg prices still makes me feel like we aren't "there" yet. This should have been a problem with a much more efficient solution given the advancements in the AI, data analysis and OCR space. I am sort of disillusioned.
sgbeal 3 hours ago [-]
> This should have been a problem with a much more efficient solution given the advancements in the AI, data analysis and OCR space.
There's got to be an "it's a chicken/egg problem" joke in there somewhere, but I'm not seeing it.
egeozcan 2 hours ago [-]
I actually was going to go for the "why did the chicken not cross the road?". Then I wanted to say "because it was in a price negotiation with the author to sell its eggs", but it was too wordy. Then I thought, "because the author had it as an egg before it could hatch", but it was too dark... Then I gave up.
Well, I guess you cannot make a chicken joke without breaking some eggs (I'll stop now. I'm really sorry, but come on, it's Sunday).
bombcar 1 hours ago [-]
You’ve got two weeks to work on this before Eggster.
sgbeal 2 hours ago [-]
> (I'll stop now. I'm really sorry, but come on, it's Sunday).
FWIW, you made an eggceptional attempt :).
wiether 2 hours ago [-]
> That's a lot of eggs!
Less than one per day, assuming they're doing groceries only for themselves
MarceliusK 29 minutes ago [-]
I wouldn't read this as "AI can't do this efficiently yet" but more like "we're still figuring out the playbook"
f0cus10 3 hours ago [-]
[dead]
MarceliusK 37 minutes ago [-]
Overall this feels less like a quirky egg project and more like a blueprint for how messy real-world data pipelines are going to look going forward
PowerElectronix 3 hours ago [-]
The inflation-adjusted data just tells us that either eggs have been outdoing the CPI for 25 years or that actual CPI is way higher than what the BLS calculates.
MarceliusK 27 minutes ago [-]
Or a third option: eggs are just a terrible proxy for CPI
vitus 2 hours ago [-]
It depends what dates you're looking at, but energy (gas prices and more) and food (including eggs) are generally recognized as way more volatile than the rest of the CPI.
Eggs were actually quite stable for the 20 years prior to 2001, so maybe don't put your life savings into egg futures...
That is very curious, yes. Eggs seem to just start to increase dramatically after 2000 and indeed outdo the CPI, disregarding the peaks and valleys of the different shocks to egg production like covid and the avian flu.
I read that the price includes free-range, eco, etc. varieties, which are more expensive and in more demand nowadays; that alone probably explains a good chunk of the price increase.
bix6 34 minutes ago [-]
This is a good read if you haven’t seen it. Spoiler alert: it’s private equity. Shocker, I know.
CPI tracks a weighted average of a large basket of different goods, of which eggs are only a small part. It would be extremely surprising if the change in egg prices over time closely matched CPI.
eeixlk 2 hours ago [-]
Apart from the comical cost of extracting this data from paper receipts, is it more likely that stores will publish their product costs over time so trends can be observed, or that they will be more like gas stations where no prices are listed? I have no idea why a box of Cheerios costs $7 for processed oats, but I see millions of reasons to obscure that data.
flurb 2 hours ago [-]
Great article through and through. The total number of places you've bought eggs at made me feel a tad depressed though: 4 places where you lived at or spent a longer time, 5 you traveled to *.
I tend to grow bored of a location after a year or two, though I'm certainly in the minority.
* Of course you didn't buy eggs every time you traveled somewhere, so probably not the entire truth.
tkgally 3 hours ago [-]
I haven't tried it with receipts, but I've gotten excellent OCR results with Gemini 3.0 and now 3.1 on some challenging texts: handwritten letters I couldn't fully decipher myself, vertically printed Japanese texts with tiny furigana readings next to the kanji, a 19th century book in English with extensive use of italics and small caps. Gemini is good at extracting text and formatting from complex layouts, and it might work with egg receipts, too.
gib444 3 hours ago [-]
> Estimated token cost $1,591
I can assume this person does in fact NOT need to worry about the price of eggs?
OJFord 3 hours ago [-]
I think they worked that back from tokens used, hence the estimation, but their actual billing was Claude Code & Codex subscriptions. (Which probably was also the main contributor to it taking 14 days.)
sgbeal 3 hours ago [-]
> Estimated token cost $1,591
> Confirmed egg receipts 589
> Total egg spend captured $1,972
> Total eggs 8,604
...
> I can’t wait to see what 30 years of eggs looks like.
At $2.70 per receipt, I'd be in no hurry to find out!
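For anyone wanting to check the per-receipt figure, here is a small sketch using only the numbers quoted in this thread (the four article stats above plus the 11,345 total receipts mentioned elsewhere in the discussion). The variable names are just for illustration, and the token cost is the article's own estimate, not an actual bill.

```python
token_cost = 1_591       # estimated token cost, USD
egg_receipts = 589       # confirmed egg receipts
egg_spend = 1_972        # total egg spend captured, USD
eggs = 8_604             # total eggs
all_receipts = 11_345    # every receipt processed

per_egg_receipt = token_cost / egg_receipts
per_any_receipt = token_cost / all_receipts
price_per_egg = egg_spend / eggs
cost_vs_spend = token_cost / egg_spend

print(f"token cost per egg receipt: ${per_egg_receipt:.2f}")   # $2.70
print(f"token cost per any receipt: ${per_any_receipt:.2f}")   # $0.14
print(f"average price per egg:      ${price_per_egg:.3f}")     # $0.229
print(f"token cost / egg spend:     {cost_vs_spend:.0%}")      # 81%
```

The $2.70 figure divides the whole token estimate by the egg receipts alone. Spread over all 11,345 receipts it is closer to 14 cents each.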
BoredPositron 2 hours ago [-]
There is a reason why receipt transcription is still the task with the highest demand on Mechanical Turk.
DeathArrow 2 hours ago [-]
Without 25 years of photographing receipts, weeks of agents coding and billions of tokens spent, I can predict that egg prices increased, and that the graph of my egg consumption over time is concave, partly because my income has risen and partly because, while all prices get inflated, eggs are still cheaper than other sources of protein. And I did it in less than 1 microsecond.
I will use them tokens to be able to afford more eggs.
https://en.wikipedia.org/wiki/Franchise_(short_story)
Egg prices: https://fred.stlouisfed.org/series/APU0000708111
CPI: https://fred.stlouisfed.org/series/CPIAUCSL
Core CPI (without food + energy prices): https://fred.stlouisfed.org/series/CPILFESL
https://www.thebignewsletter.com/p/hatching-a-conspiracy-a-b...