Saturday, February 28, 2026

Using AI to Clean Up OCR Output

OCR tools such as ABBYY FineReader PDF, Adobe Acrobat (Scan & OCR), Readiris 17, and OmniPage are widely used to convert scanned pages into editable text. They handle the heavy lifting of recognition, but their output almost always contains familiar OCR artefacts: split words at line breaks, strange spacing, and character substitutions like “l” for “1” or “rn” for “m.”


AI tools, like Perplexity Pro, ChatGPT, and Claude, can step in as a targeted post‑processing layer. Instead of the time-consuming process of checking each suspicious character in your OCR tool or word processor, you let the AI work at sentence and paragraph level, using context to decide what the text was meant to say while keeping the content intact.

Suggested workflow

  1. Run OCR and export text
    Use your preferred OCR tool to recognize the document and export the result as plain text, Markdown, or Word. Avoid exporting to formats that hide extra formatting (e.g. heavily styled PDFs), because you want clean, editable text for the AI.

  2. Send chunks to the AI with a focused prompt
    Work in sections (for example, 5–10 pages at a time). Prompt example:
    “This text comes from OCR and contains typical OCR artefacts. Correct broken words, spacing, punctuation, and capitalization, but preserve all content, line breaks, and headings as much as possible. Do not summarize or omit anything.”

  3. Review against the scan in a side‑by‑side view
    Open the scanned PDF/image on one side of your screen and the AI‑corrected text on the other. Check critical areas: headings, numbers, tables, and domain‑specific terms (e.g. drug names, legal references, codes). Correct any terminology the AI “normalized” incorrectly.

  4. Use a diff tool between raw OCR and AI output
    Run a text diff between the raw OCR file and the AI‑corrected version. This helps you see exactly what changed, confirm that nothing was dropped, and quickly scan for any over‑confident “corrections” you don’t want.

  5. Automate for larger volumes
    For recurring projects, you can script the process: batch‑export from your OCR tool, segment the text, send each chunk to the AI via API, then reassemble the cleaned text. Human effort can then be reserved for spot‑checking and final QA instead of first‑pass cleanup.
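
Steps 4 and 5 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a finished tool: the function names, the 6,000-character chunk size, and the `placeholder_cleanup` stand-in are all hypothetical. In a real script, `placeholder_cleanup` would be replaced by a call to the API of whichever AI service you use, sending the prompt from step 2 together with each chunk.

```python
import difflib
import re
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 6000) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole in its own chunk.
    Joining the chunks with a blank line gives back the original text.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for para in text.split("\n\n"):
        if current and size + len(para) + 2 > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para) + 2
    chunks.append("\n\n".join(current))
    return chunks


def placeholder_cleanup(chunk: str) -> str:
    """Stand-in for the AI call (an assumption for this sketch).

    It only rejoins words hyphenated across a line break, so the example
    runs without any network access or API key.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", chunk)


def clean_document(text: str, clean_chunk: Callable[[str], str],
                   max_chars: int = 6000) -> str:
    """Segment the text, clean each chunk, and reassemble the result."""
    chunks = split_into_chunks(text, max_chars)
    return "\n\n".join(clean_chunk(c) for c in chunks)


def diff_report(raw: str, cleaned: str) -> List[str]:
    """Return a unified diff between raw OCR text and the cleaned text."""
    return list(difflib.unified_diff(
        raw.splitlines(), cleaned.splitlines(),
        fromfile="raw_ocr", tofile="ai_cleaned", lineterm=""))


if __name__ == "__main__":
    raw = "The docu-\nment was scanned.\n\nSecond para-\ngraph here."
    cleaned = clean_document(raw, placeholder_cleanup, max_chars=20)
    print("\n".join(diff_report(raw, cleaned)))
```

Splitting on blank lines keeps paragraphs intact, so the AI never sees a sentence cut in half, and the diff report gives you the list of changes to review against the scan.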

By combining a robust OCR engine with an AI cleanup step, you move from “barely readable extraction” to text that is reliable enough for translation, further editing, or long‑term archiving, without spending hours fixing the same types of errors by hand.

Tuesday, January 27, 2026

Translation Notes: “Director”

Translating “Director” into Italian in legal, business, and financial contexts is less straightforward than it looks.

Meanings of “Director”

In English corporate practice, a director is usually a member of the board, i.e., part of the company’s governing organ. In job titles like “Finance Director” or “Marketing Director”, however, “director” often labels a senior manager heading a function, not necessarily a board member.

Italian translations

For Italian joint stock companies (“Società per Azioni”, or S.p.A.) and limited liability companies (“Società a responsabilità limitata”, or S.r.l.), the functional equivalent of a board “director” is an amministratore (or membro del consiglio di amministrazione). In this sense, director and amministratore both indicate a person who takes part in management and decision‑making at organ level.

Within the board, titles such as “managing director” or “executive director” are commonly rendered as amministratore delegato (or “AD”), who is both an amministratore (board member) and the top executive charged with day‑to‑day management. In current Italian practice, the amministratore delegato is often referred to by the English acronym “CEO”.

By contrast, direttore in today’s corporate Italian typically indicates a high‑ranking employee: direttore generale, direttore finanziario, direttore commerciale, etc. Modern legal drafting tends to reserve amministratore for members of the board and direttore for management roles within the organizational chart.

False friends and traps

The main trap is translating every “director” as direttore. When “director” refers to a board member, the correct Italian is amministratore or membro del consiglio di amministrazione, not direttore. A direttore generale is usually a top manager reporting to the board, not one of its members.

Rule of thumb

If the person sits on the board → amministratore / membro del consiglio di amministrazione.
If it’s a functional job title → direttore (+ area: finanziario, marketing, etc.).

Saturday, March 29, 2025

Italian Citizenship Law Update: Stricter Rules for Descendants Abroad

The new rules will impact individuals applying or planning to apply for Italian citizenship through their Italian ancestry (known as "ius sanguinis"), as well as professionals (like translators) who assist with these applications.
The decree-law approved today stipulates that Italian descendants born abroad will automatically be citizens for only two generations: those with at least one parent or grandparent born in Italy will be citizens from birth. In the second phase, a bill also approved today introduces further and more substantial changes to the citizenship law. Notably, citizens born and residing abroad must maintain real ties with our country over time, exercising their citizenship rights and duties at least once every twenty-five years.

The reform is completed by a second bill that revises the procedures for recognizing citizenship. Going forward, residents abroad will no longer apply through consulates but will instead use a special centralized office at the Ministry of Foreign Affairs. A transition period of about one year is planned for organizing this office. The goal is to streamline procedures, achieving clear economies of scale. Consulates will focus on serving existing citizens rather than processing new citizenship applications. Additionally, the provision includes measures to enhance and modernize service delivery: legalizations, civil registry services, passports, and travel identity cards. Organizational measures are also planned to ensure the Ministry of Foreign Affairs increasingly serves citizens and businesses.

From a press release published on March 28, 2025 by the Italian Ministry of Foreign Affairs.

Wednesday, November 27, 2024

Try Perplexity Pro free for one month

I have a couple of discount codes to try Perplexity AI free for one month. I’ll give them to the first two people who ask for them by sending me an email from this blog (see on the right) or a message on LinkedIn.

Saturday, November 23, 2024

Perplexity AI, Your Translation Research, Terminology and Review Assistant

I’ve just added the presentation I recently gave on Perplexity AI at the AI in Translation Summit to the “Other Presentations” page of this blog.

What is Perplexity AI, and how can it help us?

A Perplexity query

Perplexity AI is an innovative search tool combining web searching with language models for concise, contextual answers.

Unlike traditional search engines, it gives conversational answers with citations. Unlike AI tools such as ChatGPT, it searches the web in real time. For translators, this has several advantages: it gives access to current information on specialized topics, which helps us understand the context of our projects, and it helps with terminology research for domain-specific terms.

Perplexity can verify short translated segments by cross-referencing our translations against its search results to identify potential errors. The paid version also has enhanced privacy features, allowing secure upload of confidential documents.

Perplexity helps gather contextual information and find references for our translations.
 
It provides us with detailed overviews of complex topics.
 
For example, if we are translating a legal document about international child custody laws, we can ask Perplexity for a summary with a query like "Summarize the differences between child custody laws in Italy and the US", and the system compiles a concise summary from various sources. 

Perplexity incorporates contextual information in its responses; this means that we can ask follow-up questions to dig deeper without repeating the background. For example, we might inquire what weight is given to children's wishes in custody decisions.
 
Perplexity provides citations, which allow us to verify the sources it finds for us; but remember that we should always cross-check crucial information. The best use for this system is as a starting point to guide our research, not as the sole source of information.
 
By leveraging Perplexity's real-time search and summarization functions, we can find better information on complex subjects, speeding up our background research. 

While Perplexity is a powerful tool, it's important to remember that, like all AI models, it may occasionally produce errors and hallucinations: it’s a helpful assistant, not a replacement for our expertise.