{"id":141581,"date":"2013-04-15T10:43:50","date_gmt":"2013-04-15T10:43:50","guid":{"rendered":"https:\/\/staging2.simonw59.sg-host.com\/hackathon-extracting-meaningful-machine-interpretable-data-from-scholarly-publications\/"},"modified":"2022-05-26T13:41:13","modified_gmt":"2022-05-26T13:41:13","slug":"hackathon-extracting-meaningful-machine-interpretable-data-from-scholarly-publications","status":"publish","type":"post","link":"https:\/\/force11.org\/post\/hackathon-extracting-meaningful-machine-interpretable-data-from-scholarly-publications\/","title":{"rendered":"Hackathon, extracting meaningful, machine-interpretable data from scholarly publications"},"content":{"rendered":"<div class=\"ct_body\">\n<p><span style=\"font-family: arial, sans-serif;font-size: 13px\">Hi all, we would like to organize a Hackathon while at the ESWC.<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">We want to do it on Monday, May 27 in Montpellier, France.<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">We are probably going to use&nbsp;<\/span><a href=\"https:\/\/www.hackerleague.org\/\" style=\"font-family: arial, sans-serif;font-size: 13px\" target=\"_blank\" rel=\"noopener\">https:\/\/www.hackerleague.org\/<\/a><span style=\"font-family: arial, sans-serif;font-size: 13px\">&nbsp;to manage signups<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">Theme:<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, 
sans-serif;font-size: 13px\">The ability to extract meaningful, machine-interpretable data from<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">scholarly publications in PDF form is a big challenge. &nbsp;Several open<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">source libraries exist that attempt to automate this process, but work<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">needs to be done on them to improve accuracy and reliability. &nbsp;Some<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">specific and relevant &nbsp;challenges include:<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">Ability to automatically identify and tokenize citations from the PDF<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">(or more accurately, from a string of text)<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">Ability to automatically identify those blocks of text that represent<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">the narrative in a PDF.<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">Ability to identify references within the 
narrative, extract their<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">scope, and associate them with citation information in the PDF.<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">Anybody interested is welcome to join us; we will announce more<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">precise details later this week. Also, <\/span><span style=\"font-family: arial, sans-serif;font-size: 13px\">we are looking for someone to co-organize the meeting, ideally<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">someone who is local to Montpellier or to France.<\/span><br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<br style=\"font-family: arial, sans-serif;font-size: 13px\" \/><br \/>\n\t<span style=\"font-family: arial, sans-serif;font-size: 13px\">Best.<\/span><\/p>\n<\/div>\n<p class=\"ct_meta\"><span class=\"ct_label\">Archive:<\/span>&nbsp;<a class=\"ct_archive\" target=\"_blank\" href=\"https:\/\/archive.force11.net\/node\/4359\" rel=\"noopener\">https:\/\/archive.force11.net\/node\/4359<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hi all, we would like to organize a Hackathon while at the ESWC. We want to do it on Monday, May 27 in Montpellier, France. We are probably going to use&nbsp;https:\/\/www.hackerleague.org\/&nbsp;to manage signups Theme: The ability to extract meaningful, machine-interpretable data from scholarly publications in PDF form is a big challenge. 
&nbsp;Several open source libraries &#8230; <a title=\"Hackathon, extracting meaningful, machine-interpretable data from scholarly publications\" class=\"read-more\" href=\"https:\/\/force11.org\/post\/hackathon-extracting-meaningful-machine-interpretable-data-from-scholarly-publications\/\" aria-label=\"More on Hackathon, extracting meaningful, machine-interpretable data from scholarly publications\">Read more<\/a><\/p>\n","protected":false},"author":206032,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"advgb_blocks_editor_width":"","advgb_blocks_columns_visual_guide":"","footnotes":""},"categories":[182],"tags":[],"force11":[],"blog_series":[],"working_group":[],"class_list":["post-141581","post","type-post","status-publish","format-standard","hentry","category-blog"],"acf":[],"author_meta":{"display_name":"Alex Garcia-castro","author_link":"\/members\/alex-garcia-castro"},"featured_img":null,"coauthors":[],"tax_additional":{"categories":{"linked":["<a href=\"https:\/\/force11.org\/category\/blog\/\" class=\"advgb-post-tax-term\">Blogs<\/a>"],"unlinked":["<span class=\"advgb-post-tax-term\">Blogs<\/span>"]}},"comment_count":"0","relative_dates":{"created":"Posted 13 years ago","modified":"Updated 4 years ago"},"absolute_dates":{"created":"Posted on 15 Apr 2013","modified":"Updated on 26 May 2022"},"absolute_dates_time":{"created":"Posted on 15 Apr 2013 10:43","modified":"Updated on 26 May 2022 
13:41"},"featured_img_caption":"","series_order":"","_links":{"self":[{"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/posts\/141581","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/users\/206032"}],"replies":[{"embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/comments?post=141581"}],"version-history":[{"count":0,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/posts\/141581\/revisions"}],"wp:attachment":[{"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/media?parent=141581"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/categories?post=141581"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/tags?post=141581"},{"taxonomy":"force11","embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/force11?post=141581"},{"taxonomy":"blog_series","embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/blog_series?post=141581"},{"taxonomy":"working_group","embeddable":true,"href":"https:\/\/force11.org\/wp-json\/wp\/v2\/working_group?post=141581"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}