Processing and Translation

This site began as an effort to take the English-dominant research literature on PDA and make it accessible in other languages (first in French, as this is where I live). That means I need to do two broad things. The first is to take the initial document and ensure it is machine-readable and understandable (more on this in a minute). The second is to faithfully translate that file into the target language. The following steps explain how this happens.

Step 1: Extracting the text

Most research papers exist as PDFs, but the quality and usefulness of this digital format vary quite a bit. I have PDF files that are nicely formatted, semantically clear, text-native documents that are quite easy to work with. I also have PDFs that are nothing more than digitized scans of 40-year-old typewritten documents. And then there are cases where the article is text-native but comes from a printed journal, with bits of publishing cruft such as page numbers and contact information.

The point is that text extraction needs to be unbelievably robust if it is to recognize titles, paragraphs, headings, references; to separate substantively-relevant text from decorative material; to process figures and tables; to join paragraphs that break over multiple pages (and sometimes across tables) without requiring me to become a full-time editor. I use Marker as my primary text processing tool because it is great at this.

Step 2: Cleaning up

Once the text is extracted, I use Claude Opus 4.6 to review the result against the original PDF. This is where I fix the mistakes the extraction tool made: rejoining paragraphs that were wrongly split, removing page headers and footers that ended up in the body, identifying the abstract when it wasn’t labelled, and making sure the references were captured properly.

The goal is a clean, faithful representation of the original article’s structure.

Step 3: Human review

Before any translation begins, the cleaned article goes into a review queue. I compare it side-by-side with the original PDF to make sure nothing was lost, garbled, or misattributed. If something isn’t right, the article goes back for rework. Only approved articles move forward.

Step 4: Translation

You are reading translations produced by a human-supervised AI system. Translation is done by AI (Claude, made by Anthropic), but not in the way you might expect. I do not paste an article into a chatbot and accept what comes out.

The system feeds the article to the translator in chunks — a few paragraphs at a time. For each chunk, it also provides a glossary of technical terms and their established equivalents in the target language. The translator cannot see the rest of the article while working on a chunk. This prevents the most common AI failure mode: skimming the whole document and producing a summary that sounds right but drops nuance.
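As a rough illustration of the chunking idea, and not the site's actual code, the sketch below splits an article into groups of a few paragraphs and attaches glossary entries to each group. The names `Chunk` and `build_chunks`, the chunk size of three paragraphs, and the choice to pass along only the glossary terms that occur in the chunk are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    index: int
    text: str
    glossary: dict[str, str]  # assumption: only terms found in this chunk


def build_chunks(article: str, glossary: dict[str, str], size: int = 3) -> list[Chunk]:
    """Split an article on blank lines and group paragraphs into chunks.

    `glossary` maps a source-language term to its target-language equivalent.
    Each chunk carries only its own text, so whatever translates it never
    sees the rest of the article.
    """
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    chunks = []
    for index, start in enumerate(range(0, len(paragraphs), size)):
        text = "\n\n".join(paragraphs[start:start + size])
        lowered = text.lower()
        # Naive substring match; a real system would need word boundaries
        # and inflected forms.
        relevant = {
            source_term: target_term
            for source_term, target_term in glossary.items()
            if source_term.lower() in lowered
        }
        chunks.append(Chunk(index=index, text=text, glossary=relevant))
    return chunks
```

Each `Chunk` would then be sent to the translator on its own, together with the style instructions described next.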

The translator is instructed to preserve the style of the original author. A clinical case study should read like a clinical case study. A parent’s essay should read like a parent’s essay. The register is in the source — the translator’s job is to match it, not to impose one.

The glossary itself is built from authoritative sources in each target language. For French, that means the existing peer-reviewed literature and official clinical guidelines — not guesswork, not machine-generated equivalents.

After each chunk is translated, the system runs automatic checks:

Does the translation have roughly the same number of sentences as the original? If the translator condensed three sentences into one, something was lost.
Is the word count in a plausible range? Most languages produce somewhat longer text than English. If the translation is significantly shorter, content was probably dropped.
Were the glossary terms used? Technical vocabulary must be consistent across every article on this site. If the glossary says "demande" and the translator wrote "exigence", the system flags it.

These checks do not guarantee a perfect translation. They catch the most common and most damaging errors: omission, condensation, and terminology drift.
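The three checks could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system described here: the function names, the thresholds (a 25% tolerance on sentence count, a word-count ratio between 0.9 and 1.6), and the naive sentence splitter are all invented for the example.

```python
from __future__ import annotations

import re


def count_sentences(text: str) -> int:
    """Naive count: runs of ., ! or ? followed by whitespace or end of text."""
    return len(re.findall(r"[.!?]+(?:\s|$)", text))


def check_chunk(
    source: str,
    translation: str,
    glossary: dict[str, str],
    sentence_tolerance: float = 0.25,  # hypothetical threshold
    min_word_ratio: float = 0.9,       # hypothetical threshold
    max_word_ratio: float = 1.6,       # hypothetical threshold
) -> list[str]:
    """Return a list of human-readable flags; an empty list means no flag."""
    flags = []

    # 1. Roughly the same number of sentences?
    n_source = count_sentences(source)
    n_target = count_sentences(translation)
    if n_source and abs(n_target - n_source) / n_source > sentence_tolerance:
        flags.append(f"sentence count: {n_source} in source, {n_target} in translation")

    # 2. Word count in a plausible range?
    w_source = len(source.split())
    w_target = len(translation.split())
    if w_source:
        ratio = w_target / w_source
        if not min_word_ratio <= ratio <= max_word_ratio:
            flags.append(f"word ratio {ratio:.2f} outside expected range")

    # 3. Glossary terms used? (substring match, so inflected or embedded
    # forms can produce false hits or misses)
    for source_term, target_term in glossary.items():
        if source_term.lower() in source.lower() and target_term.lower() not in translation.lower():
            flags.append(f"glossary: expected '{target_term}' for '{source_term}'")

    return flags
```

A flagged chunk is not necessarily wrong; the flags only point a human reviewer at the places most likely to hide an omission or a terminology slip.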

Step 5: Human review (again)

After translation, I review the result. I read the translation, I compare it to the original, and I check that the meaning survived.

I have a doctorate in the social sciences. I know what academic literature should sound like, how it is structured, and what level of precision it demands. The AI writes fluently in the target language; I bring the judgment about whether what it wrote is faithful and whether the terminology is correct. For terminology, I prioritise the highest-authority sources available: government health agencies, medical faculties, and peer-reviewed clinical literature in the target language.

Articles that pass review are published. Articles with problems go back for correction.
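The two review gates can be pictured as a small state machine. The sketch below is only an illustration: the stage names, the `advance` function, and the choice of which stage a rejected article returns to are assumptions, not a description of the actual review queue.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    EXTRACTED = "extracted"
    CLEANED = "cleaned, awaiting first review"
    APPROVED = "approved for translation"
    TRANSLATED = "translated, awaiting second review"
    PUBLISHED = "published"


# Key: (current stage, review outcome). None means no human decision is
# involved in that step. Failed reviews send the article one step back.
TRANSITIONS = {
    (Stage.EXTRACTED, None): Stage.CLEANED,
    (Stage.CLEANED, True): Stage.APPROVED,
    (Stage.CLEANED, False): Stage.EXTRACTED,    # back for rework
    (Stage.APPROVED, None): Stage.TRANSLATED,
    (Stage.TRANSLATED, True): Stage.PUBLISHED,
    (Stage.TRANSLATED, False): Stage.APPROVED,  # back for correction
}


def advance(stage: Stage, review_passed: Optional[bool] = None) -> Stage:
    """Return the next stage, or raise if the move is not allowed."""
    try:
        return TRANSITIONS[(stage, review_passed)]
    except KeyError:
        raise ValueError(
            f"no transition from {stage.name} with review_passed={review_passed}"
        ) from None
```

The point of modelling it this way is that no path reaches `PUBLISHED` without passing through both review decisions.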

What this means for you

Every article on this site has been:

Extracted from its original format
Structurally cleaned and verified against the source
Reviewed by a human before translation
Translated in controlled pieces with terminology enforcement
Checked automatically for omissions and inconsistencies
Reviewed by a human after translation

This is not a perfect process. No translation process is. But it is systematic, transparent, and designed to catch the errors that matter most: missing content, inconsistent terminology, and meaning that shifted in transit.

If you find an error, please contact me. I will fix it.