Illegal file hosting site, ResearchGate, acquires massive financial investment

ResearchGate is a platform where its users, primarily researchers, routinely engage in massive-scale copyright infringement of published works.

It was announced this week, and covered in a series of high-profile venues including the New York Times, Business Insider, TechCrunch, and Research Information, that the platform had acquired $52.6 million in funding from a range of sources including the Wellcome Trust, Bill Gates, and, for some bizarre reason, Ashton Kutcher.

Financing massive-scale copyright infringement 

The platform boasts that 2.5 million published outputs are uploaded by its users every month, equivalent to around the total number of published scholarly research articles each year. The site claims to have around 100 million published articles, which is very impressive seeing as only around 20-25 million have ever been published Open Access.

Now on the face of it, this might seem awesome, as it is vastly increasing access to published research. But it is actually hugely problematic, as recent research reveals. Based on a random sample of English language articles drawn from ResearchGate, the study showed that:

The key finding was that 201 (51.3%) out of 392 non-OA articles infringed the copyright and were non-compliant with publishers’ policy.

While this sample size was small, there is no reason to think that the same cannot be said if we scale up to consider the entire corpus of articles shared on RG. This means that around half, or approximately 50 million, research papers on RG are most likely illegally hosted.
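The back-of-the-envelope scaling behind that figure can be sketched as below. This is only an illustration: the function name is made up here, the 100 million corpus size is the site's own claim, and the calculation assumes that the infringing share found among the sampled non-OA articles holds across everything on the platform, which the sample alone cannot confirm.

```python
def estimate_infringing(sample_infringing, sample_size, corpus_size):
    """Scale a sample proportion up to a whole corpus (naive extrapolation)."""
    share = sample_infringing / sample_size
    return share, share * corpus_size


# 201 of 392 sampled non-OA articles were non-compliant (from the study).
# 100 million is the corpus size claimed by the site (assumption).
share, estimate = estimate_infringing(201, 392, 100_000_000)

print(f"Infringing share in sample: {share:.1%}")
print(f"Naive estimate for the corpus: {estimate / 1e6:.0f} million articles")
```

The result is only as good as the assumed corpus size: if a smaller number of records actually carry full text, the estimate shrinks in proportion.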

Every researcher is aware of RG spam, often in the form of emails directly requesting that authors upload a version of one of their published papers. So while RG is itself not committing the copyright infringement, it is certainly enabling and often encouraging it, therefore being directly complicit in this on an enormous scale.

The problem here is that the vast majority of published research papers cannot be uploaded online legally, even by the original authors. This is because researchers are often forced, albeit often without their knowledge or awareness or understanding of the implications, to transfer full copyright over to publishers in exchange for having their research article published.

The ethics or questionable practice of this aside, this means that for around 75% of all research articles, the published versions cannot be legally posted online. The manuscript versions prior to peer review (preprints) can often be shared freely and without restriction, and the often unformatted but peer-reviewed versions (postprints) can be shared too, often with embargo restrictions.

With RG though, it is often not these versions which are shared, as the above research demonstrates: “..the majority of non-compliant cases (97.5%) occurred when authors self-archived publishers’ PDF files (final published version).”

None of this, of course, is a complication or an issue with Open Access publishing, in which free, unlimited and unrestricted sharing is completely acceptable and encouraged.

Why is ResearchGate so popular?

Posting to RG is no more difficult than freely posting to an institutional repository, yet with allegedly more than 12 million members on the platform, publishing there is clearly appealing. Academics precariously use it as a professional advertising tool, and in an academic environment where egotism and self-marketing are rewarded more than sharing, it is perhaps easy to see why using RG is more popular than doing things the legal, and often bureaucratic and expensive, way.

When questioned about this massive scale illegal file hosting, RG can simply wave their hands and say it has nothing to do with them; instead, fault lies with the choices of their members.

Search engine of choice, Google Scholar, also harvests content from RG. This means that researchers’ most popular way of research discovery is also enabling massive scale illegal access to research. I doubt GS would be half as popular if it didn’t consistently facilitate access to illegal content, as it’s a pretty rubbish search and discovery platform on the face of it.

In the meantime, RG can keep using this illegal content to enhance their data analytics, which is perhaps more of an issue than what they then choose to do with such data as a for-profit company. They can, and do, provide a simple statement upon uploading articles, to make sure that they are the legal versions. But this is not monitored or enforced in any way, and if it was in any way effective as a preventative measure then we would not see such massive scale illegality on the platform. This issue is so much greater than the question of whether or not a for-profit entity should be engaging with academia.

RG boasts that its user base comprises more than half of the global research community: “According to Mr. Madisch, the social network has signed up 12 million scientists, or roughly 60 percent of all such potential users worldwide.”

A system-wide access/copyright/education disaster

This means the problem comprises three major parts:

  1. The vast majority of researchers engage with a platform in which about half of the core article database is illegal.
  2. Those same researchers don’t seem to care or be aware of this.
  3. ResearchGate also does not seem to care or be able to be held accountable for this, as the infringements are from their users, rather than the company itself.

This means that active engagement in illegal sharing of scholarly works is now so commonplace that even companies that facilitate it can gather millions in venture capital, with no one even raising the issue. It also demonstrates that the vast majority of researchers either do not care or are not aware of the issues of copyright and scholarly publishing, as Richard Poynder pointed out recently.

It looks like the future of ResearchGate will be in using this vast and illegal corpus of data to focus on advertising to its user base, according to comments from its CEO.

The nature of these investments is also presently unclear. Does the Wellcome Trust now own a stake in ResearchGate, and how does this align with their charitable aims and other investments, such as Wellcome Open Research, a fully legal initiative?

Why has legal action not been taken? 

Seeing as publishing companies like Elsevier are actively pursuing other illegal article sharing platforms like SciHub, and often with an enormous media campaign following each step, it seems quite surprising that they have not taken additional stronger action against ResearchGate, other than the few thousand takedown notices they issued several years ago.

SciHub claims that it is “the first pirate website in the world to provide mass and public access to tens of millions of research papers”, providing access to 58 million articles at the present, all provided by a global user base. Well, this isn’t really true, is it. ResearchGate has been around longer (2008), and is in effect a pirate website of almost twice the size, now with the backing of major investors.

The difference is that SciHub is run by one frustrated student, whereas RG emphasises the massive-scale problem with access to knowledge through 12 million researchers as members. SciHub is the same as RG though in that it is the vast number of users who commit the copyright infringement, with both platforms simply acting as the hosts for this activity.

Furthermore, this also means that there are 12 million researchers who probably think they are ‘doing Open Access’ by sharing their work on RG, whereas actually the opposite is true. By doing this in an illegal manner, they fail to commit to OA in a legitimate, integrated, or sustainable way.

What would happen if ResearchGate shuts down tomorrow?

Edit: While it does explicitly state on my site that content on here is personal and not those of my employers, I should declare that I work for ScienceOpen, a competitor of ResearchGate. Not that it makes any difference at all to the content here, but just for the sake of transparency. 

30 thoughts on “Illegal file hosting site, ResearchGate, acquires massive financial investment”

    1. I can tell you easily because this is what I was told 5 years ago:
      1. target the market to become the no. 1 (almost done because they reach out for 12m researchers out of 20m active researchers worldwide);
      2. sell advertisements (just established: you will find ads everywhere next to the PDFs on the RG site, apparently no matter if it is NC content or toll-access/copyrighted stuff, as far as I saw);
      3. sell job offers,
      and (these are my personal assumptions only)
      4. further investment, boosting the site and further target the market (as any 3rd party-owned New Economy venture)
      and finally
      5. possibly replace traditional scholarly publishing industry to capture a 10 billion USD market – an easy step from now (2017), but apparently publishers seem to be not aware of this threat or they believe what they claimed: we are not interested in publishing or collaborating with publishers… 😉

      1. I think point 5 answers Graham’s question.

        What would be worse — publically traded companies like RELX, JW.A, JW.B, TRI — owning the bulk share of the market or, a single, privately held company?

    1. Why is it difficult to redirect users to the current version? Is it a technical problem? Cannot see your point here, sorry.

  1. Not sure the comparison between RG and Scihub is quite accurate. Scihub are not just hosting the content they are actively acquiring it. Recent reports suggest Scihub is utilising unethical and maybe even illegal methods to acquire the content.

    1. Hey Andy,

      Of course they’re not identical. Both are hosts of illegal files on an enormous scale. In terms of acquisition, RG does request authors upload their manuscripts, performing only the most cursory of checks as to their legality, and doing nothing to stop the VOR being uploaded. So they do actively acquire illegal content, just they can blame the researchers for not using the ‘proper’ versions. And SciHub is fairly similar, in that content is added to the database based on requests using SciHub. As I understand it, if a request is made and an article exists in the database already (via LibGen), it’ll direct you to that. If not, it’ll cycle through institutional logins until it can acquire a version, and archive that too. I don’t think it has been verified yet how those institutional account details have been acquired, has it? I know there has been a lot of dramatic speculation, but has it been confirmed yet?


      1. SciHub actively “hacks” weak passwords of students and faculty at hundreds of universities in order to acquire access to their licensed subscription content and download it to their own servers in massive enough volume that they often trigger “blocks” where the publisher actually cuts the hacked institution’s access to the content until the hacked account’s password is changed. They also, on the fly, used hacked accounts to download individual articles the same way as SciHub users request them. That is a very far cry from RG actively soliciting users who have legitimate access to violate their own institution’s license agreements one article at a time. There is a huge legal difference here. What SciHub does is blatantly and directly illegal and causes loss of access to legitimate users. What RG does is to trick well-meaning individuals to make illegal copies for them on a very small scale each, and does not disrupt the legitimate users in the process. RG is much more like YouTube from the perspective of the RIAA (music industry), where it is the individual uploaders who infringe copyrights one work at a time. I am a librarian at a small public university who has seen directly our paid access get blocked by SciHub hacking abuse of our subscriptions, and also seen in our usage reports fairly strong indirect evidence of large scale attempts continuing on a monthly basis to break into our accounts. RG is not anywhere in that ballpark.

        1. Hi Melissa,

          Thanks for your comments. Has it been proved yet that this is how SciHub acquires passwords and its content? I was unaware that this has been confirmed. In terms of this triggering publisher blocks, I was unaware of this, but note that publishers like Wiley and Elsevier also do this in response to access through perfectly legitimate means (eg via fake URLs, or blocking TDM). Do you know how much has purportedly been downloaded directly in these massive volumes (something I was unaware of too) compared to user uploads (the way I thought it obtained the vast majority of its content).

          And I think your assessment of RG is also incorrect: “..far cry from RG actively soliciting users who have legitimate access to violate their own institution’s license agreements one article at a time”. Simply because a user might have legitimate access through their institution, this does not permit them to share those articles, individually or otherwise, in a forum such as ResearchGate. The same goes even for their own works, for which they will often have traded their copyright in exchange for publication. It’s interesting to hear about your personal experiences with this though; are they written up in more detail anywhere?

          In terms of scale, I think you’re mistaken here too. In the article, I show that RG currently hosts around 50 million illegal articles based on best estimates, and SciHub claims to have around 60 million. So they’re both pretty similar, except with RG more users perhaps are complicit, as are the company itself for directly facilitating and encouraging this.

          Note that I don’t support either platform in any way for their activities, I just think it strange that much legal action is taken against one while the other is acquiring millions in funding despite being fairly similar.

          1. No, of course it hasn’t been proven how SciHub works. The only way to do that is if they published the information.

            At some point they were actually asking people on the sly to provide their accounts. How would we detect the difference between 10 users who provided their credentials willingly but who would *never* admit to it and 10 users who were hacked? (We can’t. Until SciHub tell us they were hacking or they were acquiring credentials voluntarily, no dice.)

            It’s also in SciHub’s best interests not to trigger such blocks as described by Melissa so that they can keep siphoning content longer-term. The picture painted is unsustainable and IMO does not hold up to scrutiny. The one possible reason is that they don’t have the resources to manage the whole illegal downloading process well enough, but I would absolutely assume that they are trying to improve it and remain undetected.

  2. Nice piece Jon. I was also a bit surprised to see the large investment from philanthropically-minded entities such as the Wellcome Trust and Bill Gates. My reservations stem from the many aspects of ResearchGate that are at odds with a healthy scientific environment and an open future. Three issues that trouble me are:

    1. ResearchGate’s email practices. As I’ve documented, it’s impossible to unsubscribe from ResearchGate’s emails. They have an incredible number of subscription categories. And whenever new categories are added, users are auto-subscribed. In addition, it’s no secret that most ResearchGate emails border on spam and deceit, with emails often impersonating users.

    2. The Terms & Conditions state “The software running the Service, the site design, the logos and other graphics, articles and other texts as well as the database are protected by copyright and property of the Provider.” The Provider is defined as “ResearchGate GmbH”. So this seems to imply that ResearchGate claims ownership of user generated content. Whether this is just a poorly worded Terms & Conditions, I’m not sure.

    3. Unlike StackOverflow where content is CC BY, ResearchGate Q&A content is all rights reserved, which places the longterm preservation of this content in peril and limits its contribution to science.

    Fortuitously, I was at a Gates Foundation Open Access event at the AAAS annual meeting. The panelists at the event were George Church of Harvard, Josh Freeman of AAAS (which produces Science magazine), Ijad Madisch of ResearchGate, Leigh Morgan of Gates, and Trevor Mundel of Gates.

    The reception gave me the opportunity to convey my concerns to ResearchGate co-founder Ijad Madisch, who said he will “let people work on it.” So hopefully these issues are resolved soon.

    As a closing note, these investments in ResearchGate reflect the involvement of major players in academic communication. As open scientists, it’s important for us to remain vigilant as these rising alliances introduce new conflicts of interest into the organizations responsible for scientific infrastructure and funding.

    1. Funny, though active on the platform I rarely receive emails from RG, and as far as I know you can disable all or most of them if you wish. Can you share examples of those impersonating messages? Because that should trigger a real outcry.

  3. What about Google Scholar is rubbish? It is insanely useful for its seamless integration with my library’s remote publication access tool. Fast and gets right to the content. I also really like their metrics, author pages and the way they build networks of citations. I don’t click RG links from it, mostly because I don’t like RG’s cover pages on PDFs.

  4. Such a nice piece, Jon, it made me think a lot! Let me share some of my comments.
    1. It would be great to learn more on the motivation of the organizations who supported RG financially.
    2. Second, I think conventionality might be one of the key issues behind the success of RG. They are literally a common ground for sharing stuff (illegally), i.e. researchers can be (and are) pretty sure that their colleagues, research mates, rivals and grad students will be available on RG, so their whole audience is guaranteed to be present there (in contrast to smaller academic networks like Academia.edu). I’m quite sure this is an even more powerful aspect of RG than functioning as a paradise of dark sharing. What’s more, this is exactly what makes RG the FB of academia. Conventionality, and the guaranteed presence of the target audience that follows from it, is very dangerous because it might function as pressure on researchers not to delete their accounts, even though they are aware of the situation and in principle disagree with their data policy.
    3. On the other hand, there is also a “don’t hate the player, hate the game” aspect here. In the current state of academic publishing, a need for such a sharing-networking platform arose, and they simply covered this market niche and gave people what they wanted. This of course doesn’t mean at all that we should get over what RG does. What I’m saying instead is that maybe it’s worth shifting the focus of attention to the ones who make RG profitable and asking (OK, very general or maybe even rhetorical, but crucial) questions like why this is the case, where such massive ignorance, unawareness, and uncaring attitudes come from, and in which ways an academic society should raise awareness.

  5. I see no problem. The problem is with the journals who own the copyright. They have all but destroyed libraries and they are SCAM ARTISTS. The only ethical thing to do is to encourage the distribution of papers as broadly as possible until every one of these publishers is put out of business.

  6. Hi Jon,

    I like the piece. But just to get this clear, what would be your problem with ResearchGate in a 100% OA (CC-BY) world? Because I do like the mentality that researchers share their own work (and in principle that of others as well) as much as they want, without thinking twice. Researchers should be allowed to do that, it’s only that they are not yet, lawfully. But are they morally?

    1. Hi Jeroen,

      Thanks for this interesting comment. I certainly need to think about this more from different perspectives, but here’s a simple answer for now.

      I would have no problem with RG in a 100% Open world, as we probably wouldn’t even need it then anymore. RG’s popularity stems from its huge corpus of freely, but often not legally, available articles (and the fact that Google Scholar links to them). My problem is that because it is simple, but illegal, it detracts from researchers ‘doing OA right’, and being lawful in the process. I can’t support that sort of behaviour, no matter what I think of some scholarly publishers. I do however think there is a mismatch between the morality and legality of openly sharing research articles, but don’t have a viable solution to that yet. I don’t think breaking the law is it, though.

      Another thing is that this shows the scale of the issue of copyright. If researchers sharing their own articles is illegal, then what the hell is copyright protecting? The profits of publishers? Because that is not what it was designed for. And it highlights the lack of understanding of the contractual obligations researchers have when signing away their rights to publishers through CTAs. Of course, in a 100% OA world where copyright is retained by authors, which I imagine is what we all want, none of this would be an issue.



  7. “This means that around half, or approximately 50 million, research papers on RG are most likely illegally hosted.”

    Shouldn’t it be around half of the non-OA fulltext available? The study says that only 19M of the 80M (circa 2015) were full-text. Assuming 20% of the 19M are gold-OA leaves about 15M non-OA fulltext, so about 7.75M illegally hosted. Presumably this has increased since 2015 if the latest figure for overall publications has increased to 100M, but I assume the % of fulltext is roughly the same. Any recent figure from RG on the number of full-text articles? Cheers, Eoghan

    1. Hey Eoghan,

      Nice detective work. As far as I can see, according to RG they still claim to have 100 million publications (fact sheet here: https://www.researchgate.net/press). So I would guess ResearchGate are (deliberately?) conflating article record with “publication”, as typically researchers regard publication as the full text record, and not just article metadata. You’re probably right with your assessment though, and if indeed there are only ~20 million full text publications, not 100 million as I misconstrued (not deliberately), then only between 7-10% of article records would be illegally hosted, but still around half the full text corpus. ResearchGate have been quite quiet about this whole thing though.



      1. The lack of transparency on all things RG is its biggest downside! Still a lot of full-text in one place though, regardless of legality. Also interesting that the study found some 14% of full-text was externally hosted, with arxiv & nih the largest sources. Cheers

