AI Applications
GPT vs. Handwriting Recognition: The New Champion Emerges
Mrigesh
Feb 5, 2024
GPT-4V's Advanced Language Understanding Transforms the Way we do handwritten recognition. Its ability to tolerate errors and inconsistencies also enhances its effectiveness, enabling accurate interpretation even of poorly written text.
Introduction
Tiago Forte recently posted a very interesting thread highlighting GPT-4 Vision's superior ability in interpreting handwritten texts. This piqued my interest, as the underlying principle seemed logical: Large Language Models (LLMs) excel in language comprehension, including contextual understanding. Therefore, even with poorly written text, they can predict and interpret more accurately. Intrigued by this concept, I decided to put it to the test. and in the process created Digitize Handwritten Notes Assistant which easily transcribe your handwritten notes.
Challenges in handwriting recognition:
Handwriting recognition, is the process of identifying and deciphering handwritten content, transforming it into machine-readable output. Handwritten text demand a higher level of sophistication compared to standard Optical Character Recognition because of several challenges
Individual Variability: There's significant stroke variability in handwriting from one person to another.
Cursive Handwriting Complexity: Cursive writing complicates character separation and recognition.
Inconsistency in Style: Even an individual's handwriting style can fluctuate over time, lacking consistency.
Non-linear Text Layout: Unlike printed text, which aligns straight, handwritten text may not follow a straight line.
Degraded Source Quality: The quality of handwritten documents or images often deteriorates with age or could be of poor quality
My journal entry sample
Initially, I began my experiment with my own journal entries. My handwriting is neither exceptionally clear nor particularly poor. This balance between clarity and ambiguity seemed like an ideal candidate to represent the average person's script.
It got everything right whereas Iphone had challenges.
Beatles “Yesterday” original manuscript
To escalate the challenge progressively, I moved on to a sample with distinct handwriting from mine, yet still easily legible. I chose a Beatles original manuscript of the famous song “Yesterday” excerpt for this test.
Impressively, the transcription was nearly flawless, missing only the word 'middle.' In contrast, the iPhone's recognition failed to detect even all the text, rendering its transcription completely unusable.
Interestingly, occasionally the assistant flagged a copyright violation error and stopped transcribing. To circumvent this, I clarified that the text was my own handwritten version of the lyrics, and then it gave me the transcribed text. But this trick does not work all the time.
Surely You’re Joking Mr. Feynman!
Following that, I aimed to evaluate the complexity of cursive handwriting, so I analyzed a sample of Dr. Richard Feynman's handwriting. This example was an inscription in an early printed edition of his renowned book 'Surely You're Joking, Mr. Feynman!', addressed to his cousin. In the sample, I found Dr. Feynman's handwriting to be fairly complex and not very clear, at least from my perspective
But Once again, the transcription was impeccable. It altered 'Franky' to 'Frances', a detail I'm uncertain any human could have accurately discerned. In contrast, the transcription produced by the iPhone was notably inferior and comically inaccurate.
Ramanujan second letter to Prof. Hardy
Next, I decided to assess the volume of text, so I chose an examples from renowned letters written by Ramanujan to Professor Hardy. This letter highlights Ramanujan's achievements in the theory of divergent series, notably the counterintuitive conclusion that the sum of all positive integers (1+2+3+4+...) equals -1/12.
Considering the extensive text and the quality of the photograph, I anticipated that the assistant might produce errors, but, to my surprise, the results were exceptionally good. On the other hand, the performance of the Apple iPhone was again quite poor.
Nikola tesla’s letter on humanity greatest possible achievement over next century
In my next test, I aimed to evaluate the handling of degraded source quality, so I used an image of Nikola Tesla's letter to the American Society, where he was asked to predict humanity's most significant achievement in the next century. To provide some context for this letter:
Back in 1899, while working solo in his Colorado Springs lab, Tesla made an extraordinary discovery using his magnifying transmitter. He picked up signals he termed “counting codes,” which he believed were cosmic radio signals and potentially the first communication from intelligent beings on Venus or Mars. Tesla was deeply moved by the thought that he might have been the first person to experience a greeting between planets.
So when Tesla was asked by the American Red Cross to predict man’s greatest possible achievement over the next century. He wrote this letter.
The letter image quality is really bad, i was hardly able to read anything and i was pleasantly surprised how it was able to predict the content so accurately
It hallucinated little bit and corrected some part on it’s own for example
Instead of “one ...two ...three” it predicted “one ..one..three”
“they have given” was replaced by “they may have been”.
Apart from these two instances rest of text were predicted correctly. Conversely, I had no expectations of receiving any output from the iPhone, and it performed as anticipated. In fact, it did not produce any textual output whatsoever.
The notorious Doctor’s Prescription
Feeling a bit daring, I decided to tackle the ultimate test of handwriting deciphering - the notorious and often undecipherable script known as "doctor's handwriting." This kind of writing is famous for being really hard to read, often looking more like a bunch of squiggles than actual words. So, with a lot of curiosity and a bit of excitement, I got ready to see if I could make sense of these tricky scribbles.
When I conducted a test on one of the prescriptions, I found that a few medicine names were correct, but there were occasional errors. This underscores the importance of having cautionary instructions when handling prescriptions.
Observations & scope for improvement:
Not able to detect any other language script .. I tried few hindi (devnagri script) letters and the result was all gibberish and completely wrong.
Occasionally, it may generate hallucinations or insert/remove a word to render a sentence grammatically accurate, which might not be desirable. I used custom instructions to reduce such cases but still it is not full proof
Although it accurately recognized the mathematical equations, but it might produce errors at a time. This cautious approach is due to the fact that large language models (LLMs) generally have a restricted understanding of mathematical expressions, and thus, it would be prudent to double-check the results for accuracy.
The .HEIC image format is incompatible with web usage, but formats like .png, .jpg, and .webp are supported. There are various free online resources available to convert .HEIC images into these compatible formats
Summary:
GPT-4V possesses an impressive ability to decipher modified, unclear, cursive, or highly stylized English text, achieving near-perfect accuracy. This proficiency is remarkable, and I believe GPT-4V performs at least 10 times better than the current system on our phones. This is due to its superior understanding of language and the context of nearby words, allowing it to decipher unclear or ambiguous handwriting. Its ability to tolerate errors and inconsistencies also enhances its effectiveness, enabling accurate interpretation even of poorly written text.
Special thanks to Tiago Forte for his insightful thread, which sparked my curiosity and led me to conduct my own verification and creation of this assistant.
You can try out the assistant here : https://chat.openai.com/g/g-qzObvp5l5-digitize-handwritten-notes
Do let me know if you have any feedback or if you want me to add some custom actions in the workflow.