A group for any and all who are interested in using online crowdsourcing for research, or researching the practice of crowdsourcing for research in the humanities. Practitioners, participants, enthusiasts and skeptics welcome. This is a group for information, discussion, and sharing resources (projects, toolkits, analysis methods, publications, etc.).

OCR-based tasks on Zooniverse?

1 voice, 0 replies
Viewing 0 reply threads
  • Author

      Mia Ridge

      Hey all,

      The data scientists I’m working with are thinking about how Zooniverse tasks might fit into their NLP (natural language processing) research.

      Until now, our crowdsourcing tasks for this corpus / research area have worked with images from digitised newspapers.

      E.g. we run queries on the metadata/OCR to find potentially relevant articles, then process and upload the images to Zooniverse for classification.

      However, they then have to work back from the processed images (which might be cropped and resized from the originals) to the underlying text in order to work with the annotations. This isn’t computationally impossible (e.g. the response to https://www.zooniverse.org/talk/1322/1421619?comment=2427365), but it hasn’t been done so far as it kinda falls between work packages.
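      For what it’s worth, working back from a processed subject to the original page is mostly a coordinate transform. Here’s a minimal sketch, assuming the image-processing step recorded a crop offset and a uniform resize scale per subject (both hypothetical names — in practice they’d come from whatever manifest the pipeline keeps):

```python
def to_page_coords(box, crop_offset, scale):
    """Map an annotation box drawn on a processed (cropped + resized)
    Zooniverse subject back to original page coordinates, so it can be
    matched against OCR word positions.

    box         -- (x, y, w, h) on the processed image
    crop_offset -- (left, top) of the crop within the original page
    scale       -- processed size / original size (uniform resize)
    """
    x, y, w, h = box
    left, top = crop_offset
    # Undo the resize first, then add the crop offset back in.
    return (left + x / scale, top + y / scale, w / scale, h / scale)
```

      The fiddly part isn’t the maths but keeping those offsets/scales attached to each subject so the mapping is possible at all.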

      There’s also a slight mis-match between the units of analysis, as they usually work at the sentence or paragraph level, and the images are based on columns of text.
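      One crude way to bridge that mismatch is to re-segment the column-level OCR into sentence units before (or after) annotation. A rough sketch — a real pipeline would likely use spaCy or NLTK rather than a regex splitter:

```python
import re

def column_to_sentences(ocr_text):
    """Turn a column of OCR text into rough sentence units.

    Rejoins words hyphenated across line breaks, collapses the
    column's hard line breaks into spaces, then splits on
    sentence-final punctuation. Deliberately naive: poor OCR will
    still produce broken units.
    """
    text = re.sub(r"-\n", "", ocr_text)          # de-hyphenate line breaks
    text = re.sub(r"\s+", " ", text).strip()     # flatten column layout
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
```
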

      So – they’d definitely prefer to have Zooites annotate the OCR text rather than the image (setting aside, for the moment, the impact of poor OCR on the user experience).

      I know there’s been some work on OCR in Zooniverse, and I’d love to hear from anyone who’s used it or found another way to design tasks for NLP work. What are our options?
