27: Artificial Intelligence-assisted peer review

This is adapted from our “recent” paper in F1000 Research, entitled “A multi-disciplinary perspective on emergent and future innovations in peer review.” Due to its rather monstrous length, I’ll be posting chunks of the text here in sequence over the next few weeks/months/years now apparently, to help disseminate it in more easily digestible bites. Enjoy!

This section outlines what a model of AI-assisted peer review could look like. It’s probably a little outdated now, but hopefully still useful.

Previous parts in this series:

  1. An Introduction
  2. An Early History
  3. The Modern Revolution
  4. Recent Studies
  5. Modern Role and Purpose
  6. Criticisms of the Conventional System
  7. Modern Trends and Traits
  8. Development of Open Peer Review
  9. Giving Credit to Referees
  10. Publishing Review Reports
  11. Anonymity Versus Identification
  12. Anonymity Versus Identification (II)
  13. Anonymity Versus Identification (III)
  14. Decoupling Peer Review from Publishing
  15. Preprints and Overlay Journals
  16. Two-stage peer review and Registered Reports
  17. Peer review by endorsement
  18. Limitations of decoupled Peer Review
  19. Potential future models of Peer Review
  20. A Reddit-based model
  21. An Amazon-based model
  22. A Stack Exchange/Overflow-style model
  23. A GitHub-style model
  24. A Wikipedia-style model
  25. A Hypothesis-style model
  26. A blockchain-based model


Another frontier is the advent and growth of natural language processing, machine learning (ML), and neural network tools that may potentially assist with the peer review process. ML, as a technique, is rapidly becoming a service that can be utilized at a low cost by an increasing number of individuals. For example, Amazon now provides ML as a service through their Amazon Web Services platform (aws.amazon.com/amazon-ai/), Google released their open source ML framework, TensorFlow (tensorflow.org/), and Facebook have similarly contributed code of their Torch scientific learning framework (torch.ch/). ML has been very widely adopted in tackling various challenges, including image recognition, content recommendation, fraud detection, and energy optimization. In higher education, adoption has been limited to automated evaluation of teaching and assessment, and in particular for plagiarism detection. The primary benefits of Web-based peer assessment are limiting peer pressure, reducing management workload, increasing student collaboration and engagement, and improving the understanding of peers as to what critical assessment procedures involve (Li et al., 2009).

The same is approximately true for using computer-based automation for peer review, for which there are three main practical applications. The first is determining whether a piece of work under consideration meets the minimal requirements of the process to which it has been submitted (i.e., for recommendation). For example, does a clinical trial contain the appropriate registration information, are the appropriate consent statements in place, have new taxonomic names been registered, and does the research fit in with the existing body of published literature (Sobkowicz, 2008)? The computer might also check for consistency throughout the paper, for example by searching for statistical errors or incomplete method descriptions, such as whether a p-value correction method is reported when multiple groups are compared. This might be performed using a simpler text mining approach, as is done by statcheck (Singh Chawla, 2016). Under normal technical review, these criteria need to be (or should be) checked manually at either the editorial submission stage or the review stage. ML techniques can automatically scan documents to determine whether the required elements are in place, and can generate an automated report to assist review and editorial panels, facilitating the work of the human reviewers. Moreover, any relevant papers can be automatically added to the editorial request to review, giving referees a greater awareness of the wider context of the research. This could also aid in preprint publication before manual peer review occurs.
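
To make this first application a little more concrete, here is a minimal Python sketch of what such automated screening might look like. It is only an illustration in the spirit of a text-mining check, not the actual statcheck implementation: the function names, the two required-element patterns (a ClinicalTrials.gov-style "NCT" identifier and the phrase "informed consent"), the restriction to z-tests, and the tolerance value are all assumptions made for the example.

```python
import math
import re

# Hypothetical required-element patterns; a real journal would define its own.
REQUIRED_PATTERNS = {
    "trial_registration": re.compile(r"\bNCT\d{8}\b"),
    "consent_statement": re.compile(r"\binformed consent\b", re.IGNORECASE),
}

# Matches reported z-tests such as "z = 2.20, p = .028".
Z_TEST = re.compile(
    r"\bz\s*=\s*(-?\d+(?:\.\d+)?)\s*,\s*p\s*=\s*(0?\.\d+)", re.IGNORECASE
)


def missing_elements(text):
    """Return the names of required elements not found in the text."""
    return [
        name
        for name, pattern in REQUIRED_PATTERNS.items()
        if not pattern.search(text)
    ]


def inconsistent_p_values(text, tolerance=0.005):
    """Recompute two-tailed p-values for reported z-tests and flag mismatches."""
    flagged = []
    for match in Z_TEST.finditer(text):
        z = float(match.group(1))
        reported = float(match.group(2))
        # Two-tailed p-value under the standard normal distribution.
        recomputed = math.erfc(abs(z) / math.sqrt(2))
        if abs(recomputed - reported) > tolerance:
            flagged.append((match.group(0), round(recomputed, 3)))
    return flagged
```

Running both functions over a manuscript's text would give an editor a short list of missing elements and suspicious statistics to look at by hand. The output is a prompt for a human check rather than a verdict.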

The second approach is to automatically determine the most appropriate reviewers for a submitted manuscript by using a co-authorship network data structure (Rodriguez & Bollen, 2008). The advantage of this is that it opens up the potential pool of referees beyond those simply known to an editor or editorial board, or recommended by authors. Removing human intervention from this part of the process reduces potential biases (e.g., author-recommended exclusion or preference) and can automatically identify potential conflicts of interest (Khan, 2012). Dall’Aglio (2006) suggested ways this algorithm could be improved, for example through cognitive filtering to automatically analyze text and compare it to editor profiles as the basis for assignment. This could be built upon for referee selection by using an algorithm based on social networks, which can also be weighted according to the influence and quality of participant evaluations (Rodriguez et al., 2006). Referees can be further weighted based on their previous experience of and contributions to peer review and their relevant expertise, thereby providing a way to train and develop the identification algorithm.
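
As a rough illustration of the idea, the sketch below builds a co-authorship graph from a list of papers and suggests referees from the authors a manuscript cites, skipping anyone who has co-authored directly with the submitting authors. This is a deliberately simplified stand-in, not the algorithm published by Rodriguez & Bollen (2008): the function names, the "direct co-author equals conflict of interest" rule, and ranking by citation count are all assumptions for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_coauthor_graph(papers):
    """Map each author to the set of people they have co-authored with."""
    graph = defaultdict(set)
    for authors in papers:
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def suggest_reviewers(submitting_authors, cited_author_lists, graph, top_n=3):
    """Rank cited authors by citation count, skipping likely conflicts."""
    # Assumption: submitting authors and their direct co-authors are conflicted.
    conflicted = set(submitting_authors)
    for author in submitting_authors:
        conflicted |= graph.get(author, set())

    counts = Counter(
        author
        for authors in cited_author_lists
        for author in authors
        if author not in conflicted
    )
    # Sort by descending count, then by name, so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [author for author, _ in ranked[:top_n]]
```

A fuller version could weight candidates by their past reviewing record or expertise, as the paragraph above suggests, but even this toy version shows how the referee pool can extend beyond the people an editor happens to know.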

Thirdly, given that machine-driven research has been used to generate substantial and significant novel results based on ML and neural networks, we should not be surprised if, in the future, these tools have some form of predictive utility in the identification of novel results during peer review. In such a case, machine learning would be used to predict the future impact of a given work (e.g., future citation counts), and in effect to do the job of impact analysis and decision making instead of or alongside a human reviewer. We have to keep a close watch on this potential shift in practice, as it comes with obvious potential pitfalls by encouraging even more editorial selectivity, especially when network analysis is involved. For example, research for which a low citation future is predicted would be more susceptible to rejection, irrespective of the inherent value of that research. Conversely, submissions with a high predicted citation impact would be given preferential treatment by editors and reviewers. Caution should therefore always be exercised in any pre-publication judgement of research, and such judgements should not be used as a surrogate for assessing the real-world impact of research through time. Machine learning is not about providing a total replacement for human input to peer review, but more about how different tasks could be delegated or refined through automation.

Some platforms already incorporate such AI-assisted methods for a variety of purposes. Scholastica (scholasticahq.com) includes real-time journal performance analytics that can be used to assess and improve the peer review process. Elsevier uses a system called Evise (elsevier.com/editors/evise) to check for plagiarism, recommend reviewers, and verify author profile information by linking to Scopus. The Journal of High Energy Physics uses automatic assignment to editors based on a keyword-driven algorithm (Dall’Aglio, 2006). This process has the potential to be entirely independent from journals and can be easily implemented as an overlay function for repositories, including preprint servers. As such, it can be leveraged for a decoupled peer review process by combining certification with distribution and communication. It is entirely feasible for this to be implemented on a system-wide scale, with researcher databases such as ORCID becoming increasingly widely adopted. However, as the scale of such an initiative increases, the risk of over-fitting also increases due to the inherent complexity in modelling the diversity of research communities, although there are established techniques to avoid this. Questions have been raised about the impact of such systems on the practice of scholarly writing, such as how authors may change their approach when they know their manuscript is being evaluated by a machine (Hukkinen, 2017), or how machine assessment could discover unfounded authority in statements by authors through analysis of citation networks (Greenberg, 2009). One additional potential drawback of automation of this sort is the possibility of false positives that might discourage authors from submitting.
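
For a sense of how simple a keyword-driven assignment can be, here is a small Python sketch that matches a manuscript to the editor whose keyword profile overlaps with it most. It is not the Journal of High Energy Physics system described by Dall’Aglio (2006): the function names, the editor profiles, and the choice of Jaccard similarity as the overlap score are assumptions made purely for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_editor(manuscript_keywords, editor_profiles):
    """Return the editor whose keyword profile best overlaps the manuscript."""
    keywords = {k.lower() for k in manuscript_keywords}
    scores = {
        editor: jaccard(keywords, {k.lower() for k in profile})
        for editor, profile in editor_profiles.items()
    }
    # Iterating over sorted names makes ties resolve alphabetically.
    best = max(sorted(scores), key=scores.get)
    return best, scores[best]
```

Because nothing here depends on a particular journal, the same matching step could in principle run as an overlay on a repository or preprint server, with the editor profiles swapped for reviewer or community profiles.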

Finally, it is important to note that ML and neural networks are largely considered to be conformist, so they have to be used with care (Szegedy et al., 2014), and perhaps only for recommendations rather than decision making. The question is not about whether automation produces error, but whether it produces less error than a system solely governed by human interaction. And if it does, how does this factor in relation to the benefits of efficiency and potential overhead cost reduction? Nevertheless, automation can potentially resolve many of the technical issues associated with peer review and there is great scope for increasing the breadth of automation in the future. Initiatives such as Meta, an AI tool that searches scientific papers to predict the trajectory of research (meta.com), highlight the great promise of artificial intelligence in research and for application to peer review.


Tennant JP, Dugan JM, Graziotin D et al. A multi-disciplinary perspective on emergent and future innovations in peer review [version 3; peer review: 2 approved]. F1000Research 2017, 6:1151 (https://doi.org/10.12688/f1000research.12037.3)
