Twitter as a sensor, crowdsourced data collection, NLP, satellites, and more.
While many countries provide census, survey, or other politically sensitive information, a majority either do not collect such information or lock it behind paywalls and license restrictions. This data is critical for a functioning society - including elementary building blocks such as the geographic delineation of boundaries, household income levels in those areas, or healthcare and education accessibility. The techniques, methods and processes we develop enable us to collect critical data in an open source, replicable, and efficient way.
A basic building block of many analyses is an understanding of the geographic regions that policy decisions influence - e.g., U.S. state or county boundaries, or Chinese provinces. However, this type of data is not made public - or otherwise accessible - by many countries around the world. Produced and maintained by the geoLab since 2017, the geoBoundaries Global Database of Political Administrative Boundaries is an online, open license resource of boundaries (e.g., state, county) that overcomes this challenge. We currently track approximately 300,000 boundaries across 199 entities, including all 195 UN member states, Greenland, Taiwan, Niue, and Kosovo. All boundaries are available to view or download in common file formats, including shapefiles; the only requirement for use is acknowledgement.
Twitter and similar sites are well-known for their ability to teach us about social networks - but significantly less research is devoted to how to take advantage of the location information contained in tweets, Instagram images, or other types of posts. We explore this in depth - using 'Twitter as a Sensor' to detect where geographic events (such as floods, earthquakes, disease outbreaks or protests) are occurring, or identifying where an image was taken based on what is in the image. We also work to develop techniques to translate this information into useful formats to improve decision-makers' situational awareness.
Satellite data is a largely untapped source of information about socioeconomic factors. The increased interest in deep learning has brought new capabilities: estimating household income from satellite imagery, measuring the roughness of a road and the resulting travel times, or predicting how well a student in a school will score on a math exam based on pictures of the school taken from a vehicle. These new capabilities exist alongside more traditional sources such as nighttime lights data. We conduct both applied and basic research into the use of satellite information to better understand socioeconomic conditions across the globe.
Local populations are nearly universally the most knowledgeable about their environment - and can provide insights impossible to extract from satellite imagery. This knowledge can aid in validation and discovery, and can even open new avenues of research. However, collecting such data in a sustainable, replicable way is an immense challenge. We explore the challenges of collecting and using volunteered location data.