A group for any and all who are interested in using online crowdsourcing for research, or researching the practice of crowdsourcing for research in the humanities. Practitioners, participants, enthusiasts and skeptics welcome. This is a group for information, discussion, and sharing resources (projects, toolkits, analysis methods, publications, etc.).

Getting started with crowdsourcing in GLAMs and academia: your questions sought

8 replies, 3 voices. Last updated by Ben Brumfield 2 months ago
  • #30868

    Mia Ridge
    Participant
    @mia

    With much of the world being asked to stay in to prevent the spread of the coronavirus, and many people unable to do their usual activities, there’s a surge of interest in crowdsourcing from folk in museums, libraries, archives and galleries, from academics, and from others.

    This is the moment many of us already working in crowdsourcing have dreamed of, but it comes with challenges. We want to help projects avoid common errors, anticipate and manage issues they might face, and ease fears about what can go wrong.

    If you’re thinking about setting up a crowdsourcing project, how can we help? What questions, hopes, fears do you have?

  • #30869

    Mia Ridge
    Participant
    @mia

    Thinking back over previous conversations and unpicking some of the assumptions people bring to them, I’ve kicked things off with some questions I’ve heard a few times. I’d love to know which ones resonate, and more importantly, what questions you’d add:

    • How do I manage data quality?
    • Is the overhead of picking and figuring out a platform worth it? Can I just use manual methods like email or comments instead?
    • Which platform do I choose?
    • Is it better to put everything into one task or have a few different tasks for different outputs?
    • What about vandalism or bad data?
    • How do I find people who’ll want to take part?
    • How do I manage if people in the organisation get nervous about it?
    • How much time will I need to get a project going? What steps are involved?
    • How much time will I need while a project is going? What tasks are involved?
    • How much time will I need to wrap up a project? What steps are involved?
    • How do we direct people to work that needs doing?
    • What about audio files? Video?
  • #30870

    Mia Ridge
    Participant
    @mia

    The following is a bit of a brain dump of things I tend to say in conversations about crowdsourcing projects, based on my academic research and practical experience. I should really just dig out my teaching slides as they’re designed to anticipate common questions, but in the spirit of ‘the perfect being the enemy of the good’ I’m going to start here:

    • Think of crowdsourcing as a form of very structured volunteering that takes place online.
    • Crowdsourcing relies on technology but it’s actually about people. You’re entering into a relationship with people who are giving you their time and attention – please honour that.
    • Some volunteers might want a space to chat with others; others might only want to chat with you, or not at all.
    • Platforms often have a form of data validation built in, but they come with assumptions about what ‘quality’ means in your context. For example, any transcription might be better than holding out for a perfect one, or you might want a few people to submit exactly matching transcriptions of snippets of text (there’s a small sketch of that idea after this list). You might need keyword tags to come from a controlled vocabulary, or to be added by more than one tagger. Or those things might not really matter to you.
    • Lots of factors come into platform choices: e.g. do you have any technical support? what kinds of source material do you have? what kind of data do you want out of it? what kind of experience do you want for your volunteers? is random access to items ok or should people choose items to work on?
    • Platforms make assumptions about the world. Those assumptions might include: it’s more valid if you show people random items from a queue; items only have one part or image; items are or aren’t part of a larger narrative; transcriptions are better when someone else can help review them or chip in.
    • You can use manual methods (e.g. emailing things around) but you might be creating a rod for your own back if you later want to merge different transcriptions to create one good copy.
    • Platforms don’t have to be high-end: maybe an editable doc will do for simple transcriptions.
    • It’s important to have quite detailed conversations internally about where the data created will go. Will it be backed up and accessible across the organisation? If it’s going into a collections management system, which fields will it go into? How will the data be labelled?
    • Designing a task is a balancing act between the results you need and what people are willing to do. The more invested people are in your task, the more complicated the request you can make.
    • Different volunteers will have different preferences. The more specialist your task, the more work you’ll need to put into finding, recruiting and retaining them.
    • Think about copyright now, both for your source materials and for the data that volunteers create.
    • Some systems are all crowdsourcing, all the time, so it’s relatively easy for volunteers to find items to work on. Others are more ‘you can contribute if you can find items to work on’.
    • Volunteers appreciate upfront information about how their contributions will be checked for errors. They especially like knowing how they can fix it if they make a mistake.
    • Volunteers often make a mistake or two in their first tasks. We’re all human. Anticipate and address that fact, or just live with it.
    • Writing good tutorials, introductions and help pages takes time, and (IMHO) is best done with enough time to get some distance from it, double check it and get feedback from others.
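
    To make the ‘exactly matching transcriptions’ point above a little more concrete, here is a minimal sketch, not tied to any particular platform: it assumes each snippet has been transcribed by a few volunteers, lightly normalises the text, and only accepts a transcription once at least two submissions agree. The two-out-of-three threshold and the normalisation rules are assumptions you’d adjust to your own definition of ‘quality’.

    from collections import Counter

    def normalise(text):
        # Decide which differences you're willing to ignore before comparing:
        # here, runs of whitespace and letter case.
        return " ".join(text.split()).lower()

    def consensus(transcriptions, min_agreement=2):
        # Return the most common normalised transcription if enough volunteers
        # agree on it; otherwise return None so the snippet can go for review.
        counts = Counter(normalise(t) for t in transcriptions)
        text, votes = counts.most_common(1)[0]
        return text if votes >= min_agreement else None

    # Three volunteers transcribe the same snippet of a letter.
    print(consensus(["Dear  Mr Smith,", "dear mr smith,", "Dear Mr Smith"]))
    # -> 'dear mr smith,' (two of the three agree once whitespace and case are ignored)
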
  • #30871

    Mia Ridge
    Participant
    @mia

    Common platforms include:

    • The Zooniverse Project Builder
    • FromThePage
    • Scripto + Omeka
    • Pybossa

    This is only a starting point and doesn’t begin to address the strengths and affordances of each platform, or consider the other systems you’ll need around the platform to manage data going in and out.

  • #31328

    Mia Ridge
    Participant
    @mia

    Thinking about it, one of the challenges for people thinking about crowdsourcing ideas for the first time is understanding whether their idea is similar to established patterns, or if it’s novel.

    Platforms tend to cater to projects that match common patterns of tasks, though each has variations in how it approaches them. Entirely new or novel tasks can be harder to get off the ground, as they might need bespoke development work or have to work against the grain of available platforms.

    But how do you know whether your idea is novel or similar to existing tasks? Is working out how to describe it one of the challenges for people just starting out? Would better explanations of the various common, well-supported tasks help?

    I’m curious to know what you think!

  • #31370

    Samantha Blickhan
    Participant
    @snblickhan

    This question of commonly-seen tasks is one of the hardest ones for me, as a practitioner, to wrap my head around in terms of how to convey information in a useful way. For example, it’s *so* helpful to be able to point to a project using a similar type of data, with a similar goal, and say, “Here’s how this team did it, here’s what their output looked like, here’s how they processed their results,” so that teams can quickly get a sense of the resources they might need to run a project (resources here including staff time, data analysis skills, long term data management plan, etc.).

    On the other hand, I don’t want to be overly prescriptive, in that by giving specific examples we run the risk of insinuating that this is how a project should be set up — we want there to be space for creativity, too. Ideally (in my head, anyway) these real-world examples would function as ‘templates’ to give people a starting point for further exploration and iteration. Perhaps best to put the information out there and let research teams decide whether that type of resource is useful.

    Mia makes a good point here, too, about novel approaches which may require development — I wonder whether it’s clear from the perspective of project builders what the stakes are for novel vs out of the box options, and how it affects project outcome. For me, it’s really important to weigh the necessity of novel tech against 1) the cost; 2) the timeline; and 3) the potential for re-use (number 3 here I think being the most important).

    Anyway, I would love to hear more thoughts on this!

  • #40964

    Mia Ridge
    Participant
    @mia

    I noticed this question from Nina Janz some time ago, and I’ve (finally) shared it as I think it’s reasonably common in some fields:

    ‘I am looking for any standardisations or guidelines for transcriptions (online) in e.g. #crowdsourcing projects – I would use ISAD(G) – but it includes more titles, other than full-text transcripts’

    My initial response was: ‘It depends what you want to do with the transcriptions. If you have a catalogue that you’ll ingest to in mind, you might want to work out the absolutely compulsory fields and any ‘nice to have’ fields and explain how the data will be used. There’s a balance between interesting[,] enjoyable tasks and the extra miles required for cataloguing. If you need cataloguing and that makes the work less enjoyable, the lines between volunteering and asking for professional work for free become blurred. Retired and furloughed staff also complicate the picture for now.’

    And Sam said, ‘Just to second what Mia & Ben have said, it’s really dependent on the project & type of data being transcribed and the use case for the results. E.g. letters and tabular records would need different guidelines; common abbreviations are often specific to the content, etc. It’s a bit of a cycle, as Mia notes — restrictive/rigorous standards might make for a less enjoyable experience, and not all volunteers will read lengthy instructions, but too few guidelines and you’ll wind up with results that aren’t always useful.’

    Additional replies to the original tweet also contain links to sample transcription guidelines and approaches.
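
    Picking up the point above about compulsory versus ‘nice to have’ fields, here is a very small sketch of what checking a submission before ingest can look like. The field names are invented for illustration, not drawn from ISAD(G) or any particular catalogue, and you’d replace the checks with whatever your collections management system actually requires.

    # Hypothetical field lists - swap in the fields your catalogue actually needs.
    REQUIRED_FIELDS = ["reference", "title", "transcript"]
    OPTIONAL_FIELDS = ["date", "people_mentioned", "places_mentioned"]

    def check_submission(record):
        # Return a list of problems with one volunteer submission (a dict),
        # so reviewers can see what needs fixing before the data is ingested.
        problems = [f"missing required field: {field}"
                    for field in REQUIRED_FIELDS if not record.get(field)]
        problems += [f"unexpected field: {field}"
                     for field in record
                     if field not in REQUIRED_FIELDS + OPTIONAL_FIELDS]
        return problems

    example = {"reference": "MS 1234/56", "transcript": "Dear Mr Smith...", "weather": "rainy"}
    print(check_submission(example))
    # -> ['missing required field: title', 'unexpected field: weather']
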

  • #41019

    Mia Ridge
    Participant
    @mia

    A question that was close to my heart this month – what advice would you give to someone in the lead up to launching an online project? What might I have forgotten to do or set up?

    And what’s different when you’re launching a new phase of a project versus launching an entirely new project?

    • #41020

      Ben Brumfield
      Participant
      @benwbrum

      We try to address the first question in our monthly webinars, talking about selecting materials, finding volunteers, creating task instructions, and keeping people motivated.

      A rough recording of our December webinar (the plumber interrupts partway through) is at https://youtu.be/xdy64yZbPHs?t=469, and the first 22 minutes from that timestamp attempt to be platform-agnostic.

      Any suggestions would be welcome.
