escience2019 has ended
Social Sciences & Humanities Track
Wednesday, September 25
 

10:30am PDT

defoe: A Spark-Based Toolbox for Analysing Digital Historical Textual Data
Rosa Filgueira (University of Edinburgh), Michael Jackson (University of Edinburgh), Anna Roubickova (University of Edinburgh), Amrey Krause (University of Edinburgh), Ruth Ahnert (Queen Mary University of London), Tessa Hauswedell (University College London), Julianne Nyhan (University College London), David Beavan (The Alan Turing Institute), Timothy Hobson (The Alan Turing Institute), Mariona Coll Ardanuy (The Alan Turing Institute), Giovanni Colavizza (The Alan Turing Institute), James Hetherington (The Alan Turing Institute), and Melissa Terras (University of Edinburgh)

This work presents defoe, a new scalable and portable digital eScience toolbox that enables historical research. It runs text mining queries across large datasets, such as historical newspapers and books, in parallel via Apache Spark. It handles queries against collections that comprise several XML schemas and physical representations. The tool has been successfully evaluated using five large-scale historical text datasets in two computing environments, Cray Urika-GX and Eddie, as well as on desktops. Results show that defoe allows researchers to query multiple datasets in parallel from a single command-line interface and in a consistent way, without any HPC environment-specific requirements.
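
As a rough illustration of the kind of query described above, the sketch below counts keyword occurrences per year across a few in-memory documents using a map step and a reduce step. It is not defoe's code or API: the sample documents and the functions `count_keywords`, `merge_counts`, and `run_query` are hypothetical, and plain Python `map` and `reduce` stand in for the corresponding Spark operations so that the example runs without a cluster.

```python
from collections import Counter
from functools import reduce
import re

# Hypothetical sample data: (year, text) pairs standing in for parsed pages.
DOCUMENTS = [
    (1850, "The railway opened today. The railway company celebrated."),
    (1850, "A new steam engine was shown at the exhibition."),
    (1851, "Railway shares fell sharply after the steam trials."),
]

# Hypothetical query terms.
KEYWORDS = {"railway", "steam"}


def count_keywords(doc):
    """Map step: return a Counter keyed by (year, keyword) for one document."""
    year, text = doc
    words = re.findall(r"[a-z]+", text.lower())
    return Counter((year, w) for w in words if w in KEYWORDS)


def merge_counts(left, right):
    """Reduce step: combine two partial counts."""
    return left + right


def run_query(documents):
    """Apply the map step to every document, then fold the partial results."""
    return reduce(merge_counts, map(count_keywords, documents), Counter())


result = run_query(DOCUMENTS)
for (year, keyword), n in sorted(result.items()):
    print(year, keyword, n)
```

Because the map step treats each document independently and the reduce step is associative, the same two functions could in principle be handed to a parallel engine, which is the property a Spark-based toolbox relies on.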

Speakers

Rosa Filgueira

University of Edinburgh


Wednesday September 25, 2019 10:30am - 11:00am PDT
Cockatoo Room

11:00am PDT

Understanding a Rapidly Expanding Refugee Camp Using Convolutional Neural Networks and Satellite Imagery
Susanne Benz (UC San Diego), Hogeun Park (UC San Diego), Jiaxin Li (UC San Diego), Daniel Crawl (UC San Diego), Jessica Block (UC San Diego), Mai Nguyen (UC San Diego), and Ilkay Altintas (UC San Diego)

In summer 2017, close to one million Rohingya, an ethnic minority group in Myanmar, fled to Bangladesh due to the persecution of Muslims. This large influx of refugees settled around existing refugee camps. Because of this dramatic expansion, the newly established Kutupalong-Balukhali expansion site lacked basic infrastructure and public services. While Non-Governmental Organizations (NGOs) such as Refugee Relief and Repatriation Commissioner (RRCC) conducted a series of counting exercises to understand the demographics of refugees, our understanding of camp formation is still limited. Since the household-type survey is time-consuming and does not include geo-information, we propose to use a combination of high-resolution satellite imagery and machine learning (ML) techniques to assess the spatiotemporal dynamics of the refugee camp. Four Very-High Resolution (VHR) images (i.e., WorldView-2) are analyzed to compare the camp pre- and post-influx. Using deep learning and unsupervised learning, we organized the satellite image tiles of a given region into geographically relevant categories. Specifically, we used a pre-trained convolutional neural network (CNN) to extract features from the image tiles, followed by cluster analysis to segment the extracted features into similar groups. Our results show that the size of the built-up area increased significantly, from 0.4 km² in January 2016 and 1.5 km² in May 2017 to 8.9 km² in December 2017 and 9.5 km² in February 2018. Through the benefits of unsupervised machine learning, we further detected the densification of the refugee camp over time and were able to display its heterogeneous structure. The developed method is scalable and applicable to rapidly expanding settlements across various regions. It is thus a useful tool to enhance our understanding of the structure of refugee camps, which enables us to allocate resources for humanitarian needs to the most vulnerable populations.
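
The two-stage approach described above (feature extraction followed by clustering) can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the abstract does not name the CNN or the clustering algorithm, so k-means is assumed here, and the short made-up vectors in `TILE_FEATURES` stand in for the features a pre-trained CNN would produce for each image tile. All function names are hypothetical.

```python
import math

# Hypothetical 2-D feature vectors, one per image tile. In practice these would
# be high-dimensional outputs of a pre-trained CNN; the values here are made up.
TILE_FEATURES = [
    (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),  # tiles with similar features
    (0.9, 0.8), (0.8, 0.9), (0.85, 0.85),  # a second group of similar tiles
]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, centroids, max_iter=100):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids."""
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Keep the old centroid if no point was assigned to it.
            updated.append(mean(members) if members else old)
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return labels, centroids


# Seed with the first and last tile so the run is deterministic.
labels, centroids = kmeans(TILE_FEATURES, [TILE_FEATURES[0], TILE_FEATURES[-1]])
print(labels)
```

Each tile ends up with a cluster label; mapping those labels back to tile positions is what would turn the clusters into a categorized map of the region.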

Speakers

Susanne Benz

UC San Diego



Wednesday September 25, 2019 11:00am - 11:30am PDT
Cockatoo Room

11:30am PDT

Social Media Intelligence and Learning Environment: an Open Source Framework for Social Media Data Collection, Analysis and Curation
Chen Wang (University of Illinois at Urbana-Champaign), Luigi Marini (University of Illinois at Urbana-Champaign), Chieh-Li Chin (University of Illinois at Urbana-Champaign), Nickolas Vance (University of Illinois at Urbana-Champaign), Curtis Donelson (University of Illinois at Urbana-Champaign), Pascal Meunier (Purdue University), and Joseph T. Yun (University of Illinois at Urbana-Champaign)


Social Media Intelligence and Learning Environment (SMILE) is an open source framework bringing cutting-edge computational models on social media data to social science researchers and students with any level of programming and computational expertise. Many existing social media analysis tools require programming knowledge, charge a fee, or are closed source, making it challenging for social science researchers to apply existing and new methods to social media data. SMILE provides a user-friendly web interface through which researchers can perform a wide spectrum of research tasks, ranging from social media data collection, natural language processing, text classification, and social network analysis to generating human-readable outputs and visualizations. SMILE has adopted several technologies to support its needs. The data service of SMILE leverages the GraphQL language to provide an efficient and succinct API for clients to communicate with a heterogeneous collection of social media APIs, including Twitter and Reddit. SMILE implements a microservices design and utilizes Amazon AWS services, such as Lambda and Batch for computation, S3 for data storage, and Elasticsearch for a Twitter streaming database, which makes it more portable, economical, and resilient. Analysis outputs can be shared with the larger community using Clowder, an open source data management system to support data curation of long tail data and metadata. SMILE is one of the main applications deployed as a standalone tool within the Social Media Macroscope (SMM), a science gateway based on the HUBzero platform.

Speakers

Chen Wang

University of Illinois at Urbana-Champaign

Nickolas Vance

University of Illinois at Urbana-Champaign

Curtis Donelson

University of Illinois at Urbana-Champaign


Wednesday September 25, 2019 11:30am - 12:00pm PDT
Cockatoo Room