{"id":141958,"date":"2013-03-15T18:29:25","date_gmt":"2013-03-15T18:29:25","guid":{"rendered":"https:\/\/staging2.simonw59.sg-host.com\/why-may-google-textmine-but-scientists-may-not\/"},"modified":"2022-05-26T13:44:12","modified_gmt":"2022-05-26T13:44:12","slug":"why-may-google-textmine-but-scientists-may-not","status":"publish","type":"post","link":"https:\/\/force11.org\/post\/why-may-google-textmine-but-scientists-may-not\/","title":{"rendered":"Why may Google textmine but Scientists may not?"},"content":{"rendered":"<div class=\"ct_body\">\n<p><strong>Author: <\/strong>Piwowar, Heather<\/p>\n<p>&quot;Check out the <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/robots.txt\">robot.txt files for PMC <\/a>&nbsp;for&nbsp;<strong>\/pmc\/articles\/<\/strong> &nbsp;and notice that GoogleBot is allowed, Bing and a few others are allowed, but User-Agent:* (the rest of us) are not. &nbsp;The same is true for <a href=\"http:\/\/www.sciencedirect.com\/robots.txt\">ScienceDirect robots.txt<\/a>: &nbsp;Google may textmine everything, experimenting scientists, nothing. &nbsp;(hat tip to Alf Eaton <a href=\"https:\/\/twitter.com\/invisiblecomma\/status\/305310341074669569\">on twitter<\/a>)<\/p>\n<p>Is this&nbsp;defensible&nbsp;on the grounds that Google knows what it is doing but The Rest Of Us Can Not Be Trusted? &nbsp;I sure hope not. &nbsp;Scientists are routinely trusted with a lot more than writing a script that won&rsquo;t bring down a server. &nbsp;There are other ways to ensure someone won&rsquo;t bring down a server than a global robots.txt ban.&quot;<\/p>\n<p><a href=\"http:\/\/researchremix.wordpress.com\/2013\/03\/13\/why-google\/\">http:\/\/researchremix.wordpress.com\/2013\/03\/13\/why-google\/<\/a><\/p>\n<\/div>\n<p class=\"ct_meta\"><span class=\"ct_label\">Archive:<\/span>&nbsp;<a class=\"ct_archive\" target=\"_blank\" href=\"https:\/\/archive.force11.net\/node\/6435\" rel=\"noopener\">https:\/\/archive.force11.net\/node\/6435<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Piwowar, Heather &quot;Check out the robot.txt files for PMC &nbsp;for&nbsp;\/pmc\/articles\/ &nbsp;and notice that GoogleBot is allowed, Bing and a few others are allowed, but User-Agent:* (the rest of us) are not. &nbsp;The same is true for ScienceDirect robots.txt: &nbsp;Google may textmine everything, experimenting scientists, nothing. &nbsp;(hat tip to Alf Eaton on twitter) Is this&nbsp;defensible&nbsp;on &#8230; <a title=\"Why may Google textmine but Scientists may not?\" class=\"read-more\" href=\"https:\/\/force11.org\/post\/why-may-google-textmine-but-scientists-may-not\/\" aria-label=\"More on Why may Google textmine but Scientists may not?\">Read more<\/a><\/p>\n","protected":false},"author":205957,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"advgb_blocks_editor_width":"","advgb_blocks_columns_visual_guide":"","footnotes":""},"categories":[151],"tags":[],"force11":[],"blog_series":[],"working_group":[],"class_list":["post-141958","post","type-post","status-publish","format-standard","hentry","category-news"],"acf":[],"author_meta":{"display_name":"Maryann Martone","author_link":"\/members\/maryann-martone"},"featured_img":null,"coauthors":[],"tax_additional":{"categories":{"linked":["<a href=\"https:\/\/force11.org\/category\/news\/\" class=\"advgb-post-tax-term\">News<\/a>"],"unlinked":["<span class=\"advgb-post-tax-term\">News<\/span>"]}},"comment_count":"0","relative_dates":{"created":"Posted 13 years ago","modified":"Updated 4 years ago"},"absolute_dates":{"created":"Posted on 15 Mar 2013","modified":"Updated on 26 May 2022"},"absolute_dates_time":{"created":"Posted on 15 Mar 2013 18:29","modified":"Updated on 26 May 2022 13:44"},"featured_img_caption":"","series_order":"","_links":{"self":[{"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/posts\/141958","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/users\/205957"}],"replies":[{"embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/comments?post=141958"}],"version-history":[{"count":0,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/posts\/141958\/revisions"}],"wp:attachment":[{"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/media?parent=141958"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/categories?post=141958"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/tags?post=141958"},{"taxonomy":"force11","embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/force11?post=141958"},{"taxonomy":"blog_series","embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/blog_series?post=141958"},{"taxonomy":"working_group","embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/working_group?post=141958"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}