Cognitive inference as the main predictor of AI reliability in automated behavioral coding of parent–child interactions

Amorocho, José; Balsa, Ana; Giraldo-Huertas, Juan José; Bloomfield, Juanita; Patrone, Paula; Cid, Alejandro

dc.rights.license	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.contributor.author	Amorocho, José
dc.contributor.author	Balsa, Ana
dc.contributor.author	Giraldo-Huertas, Juan José
dc.contributor.author	Bloomfield, Juanita
dc.contributor.author	Patrone, Paula
dc.contributor.author	Cid, Alejandro
dc.date.accessioned	2026-02-27T14:00:24Z
dc.date.available	2026-02-27T14:00:24Z
dc.date.issued	2026	es
dc.identifier.uri	https://hdl.handle.net/20.500.12806/2795
dc.format.extent	31 p.	es
dc.format.mimetype	text/plain	es
dc.language	eng	es
dc.rights	Open	es
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.title	Cognitive inference as the main predictor of AI reliability in automated behavioral coding of parent–child interactions	es
dc.type	Article	es
dc.type.version	Accepted	es
dc.description.abstractenglish	Observational coding of parent–child interactions is a gold standard in developmental science but remains unscalable. Multimodal generative AI could help, yet its reliability and failure modes are not well characterized. We benchmarked a multi-agent pipeline (GABRIEL) against a multi-rater expert consensus when scoring 22 PICCOLO items on 156 ten-minute free-play interactions from Uruguay. Agreement was summarized with Percent Agreement (PA) and Cohen’s κ, and disagreement with a unit-free normalized mean squared error (nMSE = MSE/Var(Y_item)). Items were assigned a priori to classes indexing the level of cognitive inference required (Low/Medium/High). Final calibration yielded modest agreement (PA = 50.7%, κ = .216). Disagreement was structured chiefly by inference level (Kruskal–Wallis H = 308.70, p < .001), a pattern that persisted in late calibration iterations. The model also overused the middle score (1) and underused the top score (2). No systematic differences in nMSE emerged by sex, age quartile, or maternal education. We conclude that generative AI is promising for scalable detection of concrete, low-inference behaviors, whereas high-inference judgments still require expert adjudication. A human-in-the-loop, co-intelligence workflow aligns current strengths with ethical oversight and supports equitable deployment at scale.	es
dc.subject.keyword	Generative AI	es
dc.subject.keyword	Parent-Child Interaction	es
dc.subject.keyword	Observational Coding	es
dc.subject.keyword	Inter-Rater Reliability	es
dc.subject.keyword	PICCOLO	es
dc.subject.keyword	Multimodal AI	es
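The agreement and disagreement statistics named in the abstract (PA, Cohen's κ, nMSE = MSE/Var(Y_item), and a Kruskal–Wallis H comparing disagreement across inference classes) can be sketched in Python as below. This is a minimal, hypothetical illustration, not the authors' code: the function names, the use of unweighted κ, the use of population variance of the consensus scores for Var(Y_item), and the tie-corrected H applied to groups of nMSE values are all assumptions, since the record does not specify these details.

```python
from collections import Counter
from statistics import mean, pvariance


def percent_agreement(ai, consensus):
    """Percent of units where the AI code exactly matches the consensus code."""
    if len(ai) != len(consensus) or not ai:
        raise ValueError("need two non-empty code lists of equal length")
    return 100.0 * sum(a == c for a, c in zip(ai, consensus)) / len(ai)


def cohen_kappa(ai, consensus):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(ai)
    p_o = percent_agreement(ai, consensus) / 100.0
    count_ai, count_cons = Counter(ai), Counter(consensus)
    # Chance agreement from each rater's marginal category frequencies.
    p_e = sum(count_ai[k] * count_cons[k] for k in count_ai) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1.0 - p_e)


def nmse(ai, consensus):
    """MSE between AI and consensus scores divided by Var(Y_item) of the consensus."""
    if len(ai) != len(consensus) or not ai:
        raise ValueError("need two non-empty score lists of equal length")
    mse = mean((a - c) ** 2 for a, c in zip(ai, consensus))
    var = pvariance(consensus)  # assumption: population variance of consensus scores
    if var == 0:
        raise ValueError("Var(Y_item) is zero, so nMSE is undefined")
    return mse / var


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H, e.g. for nMSE values grouped by inference class."""
    values = [v for g in groups for v in g]
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values, then give the whole run its average rank.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(values).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are tied, so H is undefined")
    return h / correction


# Invented toy scores on the 0-2 PICCOLO scale (not study data).
ai = [1, 1, 2, 0, 1, 1]
consensus = [1, 2, 2, 0, 1, 2]
print(round(percent_agreement(ai, consensus), 1))  # 66.7
print(round(cohen_kappa(ai, consensus), 3))        # 0.5
print(round(nmse(ai, consensus), 3))               # 0.6
print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]), 3))  # 7.2
```

Dividing MSE by the item's variance makes disagreement comparable across items with different score spreads, which is what "unit-free" refers to in the abstract. For a p-value alongside H, `scipy.stats.kruskal` returns both the statistic and the p-value.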

Files in this item

Amorocho, Balsa, Giraldo-Huertas, ... (1.217 MB)


This item appears in the following Collection(s)

Articles [22]


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International.