UrbanNet 2016: Smart Cities, Complexity and Urban Networks (U2SC) Session 1


Time and Date: 10:00 - 12:30 on 21st Sep 2016

Room: B - Berlage zaal

Chair: Oliva Garcia Cantu / Fabio Lamanna

14000 Challenges for integrating urban transportation networks (invited talk)
Abstract: Urban mobility can be modeled as a multilayer network in which each layer represents a different modality: pedestrians, cyclists, buses, trams, metros, taxis, transportation network companies, and private, logistics, freight, and emergency vehicles. The technology required to integrate different transportation modes already exists. However, there are several challenges beyond technology for achieving truly integrated transportation networks. Even within a single transportation mode, different actors may fail to coordinate for economic, social, or political reasons. These reasons carry over to, and limit, the coordination between layers. After reviewing the limitations, I suggest possible scenarios which could overcome them. As an incentive for doing so, I outline a way of quantifying the benefits of integrated transportation networks.
Carlos Gershenson
14001 Socio-Spatial Complexity and Neighborhood Structure in Cities
Abstract: The problem of identifying natural, socio-economically defined neighborhoods arises in applied contexts including Census reporting, measurement of segregation, and dimension reduction in urban computing. This problem is also of interest for urban theory, since the difficulty of identifying neighborhoods may be viewed as a measure of socio-spatial complexity. We develop a rigorous, information-theoretic approach to this topic, using open data on race in American cities as a case study. First, we formulate the mean local information J(X, Y) as a localization of the mutual information between spatial and socio-economic variables. The measure J(X, Y) is closely related to the Fisher information of the underlying joint distribution, and is therefore a measure of the intrinsic spatial complexity of an urban phenomenon. Unlike standard global information measures, the mean local information clearly distinguishes between cities like Detroit, which is dominated by a few huge, monoracial superclusters, and cities like Philadelphia, which is an intricate patchwork of small, racially distinct neighborhoods. Second, we provide a practical algorithm for identifying natural neighborhoods through greedy information maximization, and relate this algorithm's behavior to the mean local information. Questions raised by this work include the social, economic, and policy determinants of socio-spatial complexity in cities, and the potential use of spatial information measures in quantifying temporal changes in socio-economic structure, on time scales ranging from days to decades.
Philip Chodrow
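Abstract 14001 above starts from the mutual information between a spatial variable and a socio-economic variable. The abstract does not give the definition of the mean local information J(X, Y), so the sketch below only computes the standard global mutual information I(X; Y) from a table of counts (rows as spatial units, columns as groups). The function name and the two toy tables are assumptions made for illustration and are not taken from the talk.

```python
import math


def mutual_information(counts):
    """Global mutual information I(X; Y) in bits.

    counts[x][y] is the number of residents of spatial unit x who belong
    to group y (hypothetical input layout).
    """
    total = sum(sum(row) for row in counts)
    n_groups = len(counts[0])
    # Marginal distributions of the spatial variable X and the group variable Y.
    p_x = [sum(row) / total for row in counts]
    p_y = [sum(row[j] for row in counts) / total for j in range(n_groups)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:  # empty cells contribute nothing
                p_xy = c / total
                mi += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
    return mi


# Toy tables: two spatial units, two groups.
segregated = [[100, 0], [0, 100]]  # each unit holds a single group
mixed = [[50, 50], [50, 50]]       # groups are spread evenly
print(mutual_information(segregated), mutual_information(mixed))
```

A global measure like this one depends only on the table of counts, not on where the units sit relative to each other, which is the limitation the abstract says a localized measure is meant to address.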
14002 Spatial uncertainty propagation in ICT data analysis
Abstract: Every day, massive amounts of geolocated data are passively generated by individuals' devices, such as smart phones, credit cards, GPSs, RFIDs or remote sensing devices. This deluge of information, growing at an astonishing rate, represents an unprecedented opportunity for researchers to solve challenging problems and unveil fundamental insights on our society. Many disciplines are concerned, ranging from mathematics, physics and computer science for the analysis and management of research data, to applications in astronomy, medicine, geography and social science. Although data passively generated by the use of Information and Communication Technologies (ICT) have the advantage of large sample sizes (millions of observations) with a high spatio-temporal resolution, they raise many new challenging issues related to their storage, transport, management and processing. In particular, they may suffer from hidden biases, so observing the world through the lens of big datasets can introduce distortions that lead to erroneous conclusions. It is thus crucial to develop statistical tools and methods to assess the uncertainty in ICT data, notably by comparing the results obtained with different data sources. We present two examples of such uncertainty analysis on results obtained with mobile phone data recorded in Senegal in 2013. We concentrate on two information-retrieval tasks: first, we evaluate the uncertainty when inferring land use from the rhythms of human activity, and second, we study the uncertainty when identifying individuals' most frequented locations. We conclude by mentioning possible future steps to clearly assess the relevance of various ICT data sources for studying different phenomena.
Maxime Lenormand
14003 Hierarchies and regions from infrastructure to interactions (invited talk)
Abstract: As social beings, we create structures that ensure that interactions between and within communities take place. These structures have been changing over time, but they have left footprints that can be identified as patterns. These patterns translate into hierarchies of regions and social divisions that are the outcome of a historical process. In this work we use different clustering methodologies and percolation theory to uncover the communities that can be identified as the outcome of this process, and as the emergence of new kinds of interactions.
Elsa Arcaute
14004 Explaining the variations in urban population using the regional hierarchy
Abstract: The distribution of population in urban settlements has been extensively characterised in the literature using Zipf's law, but there exist well-known deviations from this power-law distribution in the upper and lower tails of the spectrum. In this work we use the definition of cities proposed in a previous paper, based on a percolation approach on the road network, and show that the same type of power-law distribution also exists for the number of intersections of a city, and that this distribution is fitted with greater precision, presenting fewer deviations from the theoretical power law. We also show that it is possible to derive the population of a settlement from this number of intersections, and that the variability within this approximation can be partially explained by quantifying the position of each settlement in the regional hierarchy. This result gives us another insight into why some cities are over- or under-populated with respect to their expected position in Zipf's law, and at the same time makes it possible to extract an approximate population for each settlement using the road network of the system as the sole source of data. Furthermore, we show that by combining both distributions we can find a clear cut between large and small settlements that can be used to quantitatively define a threshold between urban and rural settlements.
Carlos Molinero
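Abstract 14004 above relies on rank-size (Zipf) power laws, for population and for the number of intersections. As a rough illustration only, the sketch below estimates a rank-size exponent by ordinary least squares on log-log values. This simple estimator is an assumption of the example, not the fitting procedure used in the talk, and the synthetic sizes are invented.

```python
import math


def rank_size_fit(sizes):
    """Fit size ~ scale * rank**(-exponent) by least squares in log-log space.

    Returns (exponent, scale). Works for any positive size measure, e.g.
    population or number of intersections per settlement.
    """
    ordered = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(s) for s in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic settlements that follow Zipf's law exactly (exponent 1).
synthetic_sizes = [1000.0 / rank for rank in range(1, 51)]
exponent, scale = rank_size_fit(synthetic_sizes)
print(exponent, scale)
```

Residuals of individual settlements from the fitted line are one way to express how over- or under-populated a settlement is relative to its rank.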
14005 Reconstructing Activity diaries from mobile phone data: feeding MATSim model
Abstract: The integration of Information and Communications Technology (ICT) data sources to generate activity-travel information opens interesting opportunities for feeding agent-based transportation models, whose practical implementation is often hindered by the lack of sufficient data. In this talk we present a module developed to generate the activity-travel diaries needed to feed the MATSim simulation framework. Activity diaries are generated by merging data from mobile phone records and census data. We discuss the process followed for the creation of synthetic agents, from the extraction of mobility patterns to the expansion of the sample data to the total population: the dataset provided by the mobile network operator provides the age and gender of the users, while residence location and daily trips are obtained by analysing the mobile phone records; the information obtained from mobile phone records is then expanded to cover the whole population by using census data at census tract level. The resulting synthetic population is validated against the EMEF survey results for Barcelona. The comparison shows a good fit in the number of trips by gender and age, and major discrepancies in trip purpose assignment. The resulting activity diaries are used to feed and calibrate the MATSim simulation model. Finally, some lines of future research for the improvement of the current methodology are discussed.
Oliva G Cantu
14006 When GIS meets LUTI: Enhanced version of the MARS simulation model through local accessibility coefficients
Abstract: Residential location choices are influenced by a series of factors whose influence varies across space. We aim to improve MARS (Metropolitan Activity Relocation Simulator), a Land Use and Transport Integrated (LUTI) model, by computing local coefficients that capture the varying impact of those factors on residential choice. In particular, this research explores a methodology for integrating the public choice model into MARS using a general accessibility indicator, thus creating a new approach to estimate the coefficients of each public service section with Geographically Weighted Regression (GWR). The MARS model includes a transport model, which simulates the travel behaviour of the population in relation to their housing and workplace location, a housing development model, a household location choice model, a workplace development model, and a workplace location choice model. The public services location model is embedded into MARS by redeveloping the accessibility indicator, which is the key connection between the transport submodel and the housing and workplace location submodels, so that it considers not only the capability to reach workplaces but also the ability to access certain public services. Accessibility plays a major role in influencing where people live and work (Wang et al., 2015). It is one of the outputs of the transport sub-model in year n as well as an input to the land use sub-models in year n+1. The new accessibility indicator is calculated by integrating a series of travel motives and then weighting each of them using the results from a GWR. In this manner, accessibility is evaluated as the key location factor expressing the level of public services in land use, which in turn attracts travel demand. We applied GWR (Fotheringham et al., 2002) to generate local models in which specific coefficients are computed for each observation (i.e. spatial unit) and for each significant variable.
The calculation of these local coefficients is based on the values of the corresponding variable in nearby locations, giving more weight to close locations, thus establishing an inverse distance relationship. An origin-destination matrix through the road network was computed with the ArcGIS Network Analyst extension. This matrix was used as input to the GWmodel R package in order to consider network distance in local correlations and GWR. The model update and extension of MARS are all based on the Region of Madrid, Spain. The external scenario update is based on the zoning of MARS, which aggregates the 199 municipalities of the Madrid Region into 90 modelling zones. The zoning was carried out following homogeneity parameters of socioeconomic characteristics and mobility, plus correspondence with transport zones and regional rings. Data collected at different levels were aggregated to the MARS zoning. The GWR was based on the most disaggregated spatial units that were available (i.e. census tracts). Data sources include INE, Nomecalles, DUAE and TomTom. We computed local correlation statistics between population data and the most highly correlated variables for each topic. Based on previous work, we applied a 5 km bandwidth to incorporate the values of nearby locations using a Gaussian function. We then performed a model selection process based on the spatial relationships of the observations within 5 km, after which we estimated the best bandwidth in terms of distance (fixed) and number of neighbors (adaptive), both for distance along the road network and for private motor vehicle travel time. In all cases, a 5-neighbor adaptive bandwidth provided the best-fitted models, with network distance performing better than travel time. The best-fitted model is the one considering the number of workplaces, education centres and retail. Figure 1 shows the standardized local coefficients once aggregated at the MARS zoning.
The intensity and sign of the relationship between each factor and population location vary greatly across space, especially in the case of education centres. Population figures predicted with our model are consistent with real figures, with discrepancies below ±0.5 standard deviations in the most populated areas.
María Henar Salas-Olmedo
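Abstract 14006 above describes GWR as fitting a separate local model at each location, with nearby observations weighted more heavily through a Gaussian function of distance. The authors used the GWmodel R package with network distances. The sketch below is only a stand-alone Python illustration of the idea for a single predictor, assuming the common Gaussian kernel form exp(-0.5 (d / b)^2). The function names and sample values are hypothetical.

```python
import math


def gaussian_weight(distance_km, bandwidth_km):
    """Gaussian kernel weight; equals 1 at distance 0 and decays with distance."""
    return math.exp(-0.5 * (distance_km / bandwidth_km) ** 2)


def local_fit(distances_km, x, y, bandwidth_km):
    """Weighted least squares fit of y = intercept + slope * x at one focal location.

    distances_km[k] is the distance (for example along the road network)
    from the focal location to observation k.
    """
    w = [gaussian_weight(d, bandwidth_km) for d in distances_km]
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Two observations close to the focal location follow y = x; two distant
# ones follow a different trend and receive almost no weight.
distances = [0.0, 1.0, 50.0, 51.0]
predictor = [0.0, 1.0, 2.0, 3.0]
response = [0.0, 1.0, 0.0, -1.0]
intercept, slope = local_fit(distances, predictor, response, bandwidth_km=5.0)
print(intercept, slope)
```

Repeating the fit with each observation in turn as the focal location yields one coefficient per spatial unit, which is what allows coefficients to change sign and intensity across space.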
14007 The irruption of Airbnb in tourist cities: comparing spatial patterns of hotels and peer-to-peer accommodation
Abstract: The last few years have seen the emergence of the so-called sharing economy (also known as collaborative consumption), driven by the development of Internet platforms that facilitate peer-to-peer relations. One of the fields in which collaborative consumption has burst onto the scene with greatest intensity is tourism, both in the travel sector (car-sharing) and in accommodation. Airbnb is the most successful P2P platform in the field of accommodation, offering more than 2,000,000 listings in 190 countries. The potential impacts of Airbnb on local economies are complex and difficult to measure. The results of the study by Fang et al. (2006) suggest that the entry of the sharing economy benefits the entire tourism industry by generating new jobs, as more tourists would come due to the lower accommodation cost. From the perspective of the spatial distribution of the Airbnb impacts within cities, it has been argued that Airbnb listings are more scattered than hotels, so Airbnb guests may be especially likely to disperse their spending in neighbourhoods that do not typically receive many tourists (see Guttentag, 2014). Nevertheless, this possible dispersion may be compatible with a particular concentration of listings in the central areas of cities, including areas not covered by hotels. This could aggravate the problems of crowding and tourism gentrification that some of these areas have to support in certain heritage cities (Russo, 2002; Neuts and Nijkamp, 2012). This article analyses the spatial patterns of Airbnb in Barcelona and compares them with hotels and sightseeing spots.
Gustavo Romanillos

UrbanNet 2016: Smart Cities, Complexity and Urban Networks (U2SC) Session 2


Time and Date: 14:15 - 18:00 on 21st Sep 2016

Room: B - Berlage zaal

Chair: Oliva Garcia Cantu / Fabio Lamanna

14008 Electric vehicle charging as complex adaptive system - information geometric approach
Abstract: In all major cities in the Netherlands, charging points for electric vehicles seem to spring up like mushrooms. In the city of Amsterdam alone, for example, there were 231 charging points by the end of 2012 in comparison with 1,185 today, with roughly two new charging stations added every week. Over the same period of time, the average number of charging sessions per week went up from 550 to 8,000. All charging sessions in the Netherlands are recorded by the service providers, and those from Amsterdam, Rotterdam, Utrecht, The Hague and the provinces of Northern Holland, Flevoland and Utrecht are made available for research through the respective municipalities to the Urban Technology research program at the University of Applied Sciences Amsterdam (see footnote 1). The dataset of charging sessions, which is the largest of its kind in the world, currently holds more than 3.3 million records, containing information about duration, location and a unique identifier of the users [1]. The tremendous growth in electric vehicle adoption, in combination with the existence of this large and rich dataset, creates a unique opportunity to study many aspects of electric mobility and infrastructure in the context of complex social systems. The question we focus on is the following: if we consider the e-mobility system as complex and adaptive, what is its phase structure? Are there regime changes in the system? And could we define distinct states of the dynamics of the system at hand? The framework in which we study these questions is that of information geometry [2]. To construct the framework we first define observables of interest from the data. We then estimate the probability distributions of these observables as a function of time or other parameters of the system. As the system evolves, the shape of the probability distributions might change. We say that a regime shift has occurred when a large and persistent change in the probability distributions has happened.
To define a large change in the probability distribution we use Fisher information [3]. Our approach is based on an analogy with the theory of phase transitions in statistical physics, especially second order or "critical" transitions. In statistical physics one can study the information geometry of the Gibbs distribution and show that at second order phase transitions and on the spinodal curve the curvature of the statistical manifold diverges [4]. Taking it a step further, Prokopenko et al. showed that one can use the Fisher information matrix directly to serve as an order parameter [5]. Following these results, a maximum of the Fisher information matrix is used as a definition of criticality in complex systems, e.g. in [6]. The application of our approach is particularly challenging in the charging infrastructure system since 1) it is an open system (the number of users and charging points changes over time), and 2) it is an irreversible system (the municipalities gain experience in deploying charging points, the users of the system optimize their usage of the charging point infrastructure, and policies and user support systems change). All this indicates that there is no straightforward notion of phase space for this system which would allow a Gibbs-like distribution to be defined. These challenges, however, are typical of complex adaptive social systems, and therefore finding a satisfactory solution to them might allow for a generalization of the method to different social systems. Our previous work, which applied this framework to a non-linear reaction-diffusion system (the Gray-Scott model), is encouraging, since there we were also able to detect regime changes based on a macroscopic distribution of observables, independent of the microscopic dynamics of the system [7]. In the talk we will present the results of pursuing this line of investigation.
We will discuss the different observables we tried and the insights we gained into the system from our work. Understanding the phase structure of electric vehicle charging, and hence the dynamics of charging, can have large implications for our understanding of the dynamics of neighborhoods, for planning and policy implementation, and for the study of urban science in general.
Footnote 1: http://www.idolaad.nl
Omri Har-Shemesh
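Abstract 14008 above uses Fisher information to decide when the probability distribution of an observable has changed by a large amount. For a discrete distribution p(x; theta), the Fisher information is F(theta) = sum over x of (dp/dtheta)^2 / p. The sketch below approximates it with a finite difference between two histograms estimated at neighbouring parameter values (for instance consecutive weeks). The estimator, the function names and the toy series are assumptions for illustration, not the authors' procedure.

```python
def fisher_information(p_before, p_after, step):
    """Finite-difference estimate of F = sum_x (dp/dtheta)^2 / p.

    p_before and p_after are normalized histograms over the same bins,
    estimated at two parameter values separated by `step`.
    """
    total = 0.0
    for a, b in zip(p_before, p_after):
        midpoint = 0.5 * (a + b)  # p evaluated halfway between the two estimates
        if midpoint > 0:
            total += ((b - a) / step) ** 2 / midpoint
    return total


def fisher_series(distributions, step=1.0):
    """Fisher information between each pair of consecutive distributions."""
    return [
        fisher_information(distributions[k], distributions[k + 1], step)
        for k in range(len(distributions) - 1)
    ]


# Toy time series of a two-bin observable with one abrupt change.
toy_series = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.9, 0.1]]
values = fisher_series(toy_series)
peak = values.index(max(values))
print(values, peak)
```

The peak marks where the distribution changes most between consecutive steps. The abstract additionally requires the change to be persistent, which this sketch does not check.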
14009 Residential Flows and the Stagnation of Social Mobility with the Economic Recession of 2008
Abstract: The movement of people within a city is a driver for the growth, development, and culture of the city. Understanding such movements in more detail is important for a range of diverse issues, including the spread of diseases, city planning, traffic engineering and now-casting economic well-being [1, 3]. Residential environment characteristics have been shown to be strongly associated with changes in individual socioeconomic status, making residential relocation a potential determinant of social mobility [2]. Examining residential mobility flows therefore offers an opportunity to better understand the determinants of social mobility. Using a novel dataset recording the movement of people within the city of Madrid (Spain) over a period of 10 years (2004-2014), we studied how residential flows changed during the economic recession of 2008. Here we present preliminary results from these investigations. In particular, we found that the crisis had a profound impact on social change, reducing social mobility within the city as a whole and thus leading to a "social stagnation" phenomenon. Methods: We used data from a continuous administrative census of the entire Spanish population (the "Padrón") that includes universal information on all residential relocations. Using this data, we can assess mobility within, into and out of the city of Madrid, stratified by age, education and country of origin. For analyses involving property value and unemployment, the granularity of our analysis is the neighborhood (approximately 20,000 people each, n=128 in Madrid). For all other analyses, our granularity is the census section (approximately 1,500 people each, n approximately 2400 in Madrid), providing a very fine-grained perspective on the residential flows within the city.
To examine changes in residential mobility flows, we categorized them as follows: any mobility (any change of residential location), mobility within the city of Madrid, and mobility within the city but to a different area. We further divided this last type of flow into upward (from poorer to richer) and downward (from richer to poorer) mobility. Figure 1 (left) shows an example of the geographical delineations and the associated residential mobility flows.
Figure 1: (Left) A data overlay of a section of Madrid. Red outlines correspond to neighborhoods, colored by quintile of property value for 2004 (red areas indicate the highest property value quintile). Black outlines correspond to census sections, and arrows represent residential mobility flows. In particular, white arrows indicate movement to areas of higher property value, black to lower, and blue to areas of equal value. (Right) The total movement (in-flow + out-flow) within each census section for the year 2004. Red areas indicate high residential flows.
Figure 2: (Left) "Social and Residential Mobility in Madrid": time series of social mobility (average change in quintile of property value of all movers in the neighborhood, plotted by year as quintile of destination minus origin neighborhood; a positive number represents upward mobility and 0 represents no social mobility) in the six neighborhoods with the highest change in social mobility from 2005 to 2014. (Right) "Unemployment Rate per Neighborhood in Madrid": unemployment rate time series for all neighborhoods of Madrid, with thicker lines for the six neighborhoods pictured on the left.
Results: We find that residential mobility peaked in 2007-2008, especially due to the contribution of incoming flows to Northern and Southeastern Madrid.
A centrality-based analysis of the residential mobility network reveals the intensity of change in the downtown area (Centro) of Madrid (Figure 1, Right). We further assessed the effect of the 2008 financial crisis on residential mobility flows, showing that neighborhoods at the lower end of the socioeconomic spectrum and those that had changed the most during the housing boom of the 2000s were the most affected by the recession (Figure 2, Right). In particular, these neighborhoods showed a decrease in social mobility associated with residential relocation, with a decreasing proportion of people in poorer areas relocating to neighborhoods with a higher property value (Figure 2, Left). Moreover, there was also a decreasing proportion of people in richer areas relocating to neighborhoods with a lower property value. This lack of upward mobility (from poorer areas) and downward mobility (from richer areas) led to a stagnation of residential mobility in the aftermath of the recession. Discussion: A combination of fine-grained relocation, socioeconomic and property value data has allowed us to detect communities with increased mobility flows, as well as areas of relative residential stability or stagnation. It has further allowed us to explore changes with the economic recession. Our finding that social mobility at the neighborhood level has stagnated is consistent with previous findings of increased economic segregation concurrent with the economic recession of 2008 [4].
Usama Bilal
14010 title to be confirmed (invited talk) Filippo Simini
14011 Smart Street Sensor
Abstract: Urban street structures are a snapshot of human mobility and resources, and are an important medium for facilitating human interaction. Previous studies have analyzed the topology and morphology of street structures in various ways: as fractal patterns [1], as complex spatial networks [2] and so on. From a functional perspective, it is important to discuss how street networks are used by people. There are also studies analyzing the efficiency [3], accessibility [4] and road usage [5] of street networks. In those studies, the researchers investigated either empirical or theoretical travel routes to understand the functionality of the street network. A travel route is a path within the network selected by people or selected under a given condition. Since the determination of a travel route is directly influenced by travel demand and the spatial pattern of the city, including street network and land-use formation, a selected route is a good way to capture complex interactions among factors which are often hidden. For instance, fastest routes estimate the possible distribution of traffic as well as the street structure in a city. In this study, we analyze the geometric properties of routes to understand the street network, considering its hierarchical properties and traffic conditions. Although many studies discuss the efficiency of a route or a street network, few investigate the geometry of a route [6] or study how individual routes are intrinsic to the city structure. Two cities with similar efficiency can have different geometries of congestion and traffic patterns [7]. Therefore, understanding the geometric features of routes can link the existing knowledge of routes and the structure of the urban street network. We especially focus on how much a route is skewed toward the city center by measuring a new metric, inness.
The inness I of a route is defined as the difference between the inner travel area P_inner and the outer travel area P_outer: I = P_inner - P_outer. The areas are defined after the route is divided into an inner part and an outer part based on the straight line connecting the origin and destination, as described in Fig. 1. We measured the inness of the collected optimal routes within a 30 km radius from the center for 100 global cities, including NYC, London and Delhi. In these cities, we identified two competing forces. Due to the agglomeration of businesses and people, street networks grow denser around the center to meet the demand, and attract traffic toward the interior of the city. On the other hand, many cities deploy arterial roads outside the city to help disperse congestion at the urban core. The arterial roads act as the opposing force, pushing traffic toward the exterior of the city. This tendency is well captured by our suggested metric. We analyze two types of optimal routes, obtained by minimizing travel time and distance. While the shortest routes reveal the bare geometric structure of the roads, the fastest routes show the geometry in which the road hierarchy is reflected. We systematically select origins and destinations having different bearings and different radii from the center. Then, we collect the optimal routes of the O-D pairs via the OpenStreetMap API. Our results consist of two parts. We first compare the general average inness of both the shortest and fastest routes of the 100 global cities in order to point out their fundamental differences. Later, we analyze the inness patterns of individual cities and discuss street layout and the effects of street hierarchy in each city.
Balamurugan Soundararaj
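The inness definition in abstract 14011 above, I = P_inner - P_outer, can be illustrated with a signed-area computation. The sketch below closes the route with the straight origin-destination line, takes the signed (shoelace) area of the resulting polygon, and counts the side of the line on which the city center lies as "inner". It assumes planar coordinates and a route that does not self-intersect. The function name and the toy routes are hypothetical and not taken from the talk.

```python
def inness(route, center):
    """Inness I = P_inner - P_outer for a route given as (x, y) points.

    route[0] is the origin and route[-1] the destination. Area between the
    route and the straight origin-destination line counts as inner when it
    lies on the same side of that line as `center`, and as outer otherwise.
    """
    (ox, oy), (dx, dy) = route[0], route[-1]
    # Which side of the origin-destination line the center lies on.
    side = (dx - ox) * (center[1] - oy) - (dy - oy) * (center[0] - ox)
    if side == 0:
        raise ValueError("center lies on the origin-destination line")
    # Shoelace formula over the polygon origin -> route -> destination -> origin.
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(route, route[1:] + route[:1]):
        twice_area += x1 * y2 - x2 * y1
    left_minus_right = -0.5 * twice_area  # area left of the line minus area right of it
    return left_minus_right if side > 0 else -left_minus_right


detour = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]  # bends to one side only
zigzag = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, -1.0), (4.0, 0.0)]
print(inness(detour, (1.0, 5.0)), inness(detour, (1.0, -5.0)), inness(zigzag, (2.0, 5.0)))
```

A route that bends toward the center gets a positive value, one that bends away gets a negative value, and equal inner and outer lobes cancel to zero.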
14012 A Retail Location Choice Model: Measuring the Role of Agglomeration in Retail Activity [abstract]
Abstract: The objective of our work is to build a consumer choice model in which consumers choose their retail destinations based only on a retailer's floorspace and its agglomeration with other retailers. In other words, at a highly aggregated level, the goal is to describe a retailer's success with a model that takes into account only its position and its floorspace. We define the attractiveness of a retailer $r$ as

$$A_r = f_r^{\alpha} + \sum_{r'} f_{r'}^{\alpha} e^{-\beta d_{rr'}} \qquad (1)$$

where $f_r$ is the retailer's floorspace and $d_{rr'}$ is the distance between $r$ and another retail unit $r'$. Eq. (1) states that the composite perceived utility $A_r$ that a consumer attaches to a particular retailer $r$ is equal to its individual utility, quantified as choice and therefore floorspace $f_r^{\alpha}$, plus the utility of the shops in its vicinity. In Eq. (1), $\alpha$ controls the extent of the internal economies of scale and $\beta$ that of the external ones. If $\alpha > 1$, the relationship between the consumer-perceived utility of a shop and its size is super-linear and the economies of scale are positive, meaning that a retailer benefits from larger floorspace. Similarly, low values of $\beta$, which translate into a slow decay, imply a strong dependence on the vicinity of other attractive neighbours, and vice versa. Exploiting Eq. (1), we define the probability of consumer $i$ shopping at $r$ as

$$p_{i \to r} = \frac{A_r e^{-\gamma C(d_{ir}, \delta)}}{\sum_{r'} A_{r'} e^{-\gamma C(d_{ir'}, \delta)}} \qquad (2)$$

where $C(d_{ir}, \delta)$ is the cost function of travelling from $i$ to $r$, and $\gamma$ and $\delta$ are two parameters. Eq. (2) is formulated using random utility theory; as one can see, in the proposed cross-nested logit model consumers prefer to shop at larger shops (internal economies of scale) and at locations with a higher concentration of retail activity (external economies of scale). In this work we consider two types of trips, namely work-to-retail and home-to-retail. The model is therefore defined by six parameters: two describing the attractiveness of retailers through their internal and external economies of scale, $(\alpha, \beta)$, and two for each kind of trip describing the cost function, $(\delta_h, \gamma_h)$ and $(\delta_w, \gamma_w)$. The total modelled turnover is therefore of the form

$$Y_r = Y_r^w + Y_r^h = \sum_l \left( n_l^w p^w_{l \to r} + n_l^h p^h_{l \to r} \right) \qquad (3)$$

The parameters $\gamma$ and $\delta$ have been calibrated using the LTDS dataset, a survey that includes 5004 home-to-retail and 2242 work-to-retail trips. Having completed the calibration of the distance profiles, we can calculate the modelled turnover estimate of Eq. (3) for each retailer $r$ and for a given set of $(\alpha, \beta)$ parameters. This tells us the modelled fraction of the population that ends up shopping at each retailer given its attractiveness and distance. Following this, we calculate the correlation between the modelled turnovers and the observed floorspace rents. For each retailer $r$, we use the VOA rateable value as an indicator of the willingness to pay for floorspace $f_r$. The rateable value is considered a very good indicator of the property value of the respective hereditament. In Fig. 1 we compare the results of the model with rent data from the VOA. In Fig. 1(a) we can see that the maximum correlation between the modelled and real rents per square metre is given by the parameter set $(\alpha_{\max} = 1.3, \beta_{\max} = 0.008)$. The $\alpha$ value is in line with super-linear scaling of floorspace and expected earnings, and appears realistic, while the $\beta$ value indicates a benefit in the agglomeration of retail activities (the sign is positive) and shows that the vicinity of other retail activity plays a non-negligible role in defining a retailer's attractiveness.

[Figure 1: (a) Correlations; (b) Scatter plot. The model yields high correlations with the rents in the VOA dataset. The left panel shows the correlation between the expected turnover $Y_r(\alpha, \beta)/f_r$ and the Rateable Value / Size found in the dataset, with $C_{\max} = C(\alpha = 1.3, \beta = 0.008)$. These values are in agreement with a superlinear scaling in floorspace and with the observed retail agglomeration. The right panel shows a scatter plot of the two quantities.]
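The attractiveness, choice probability and turnover described in the abstract above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation. The names `alpha` and `beta` stand for the internal and external economies-of-scale parameters and `gamma` for the cost sensitivity. The function names, the toy inputs, the value of `gamma`, and the choice to pass a precomputed travel-cost matrix (the abstract does not give the form of the cost function) are all assumptions, as is excluding the retailer itself from the neighbour sum.

```python
import numpy as np


def attractiveness(floorspace, dist, alpha, beta):
    """A_r = f_r**alpha + sum over other retailers r' of f_r'**alpha * exp(-beta * d_rr')."""
    f_alpha = np.asarray(floorspace, dtype=float) ** alpha
    decay = np.exp(-beta * np.asarray(dist, dtype=float))
    # Assumption: the sum runs over r' != r, so the self term is dropped.
    np.fill_diagonal(decay, 0.0)
    return f_alpha + decay @ f_alpha


def choice_probabilities(attract, cost, gamma):
    """p[i, r] = A_r * exp(-gamma * cost[i, r]), normalised over retailers r."""
    weights = attract[None, :] * np.exp(-gamma * np.asarray(cost, dtype=float))
    return weights / weights.sum(axis=1, keepdims=True)


def modelled_turnover(n_work, p_work, n_home, p_home):
    """Y_r = sum_l (n_l^w * p^w[l, r] + n_l^h * p^h[l, r])."""
    return n_work @ p_work + n_home @ p_home


# Toy inputs, made up for illustration: 3 retailers, 2 origin zones.
floorspace = np.array([100.0, 400.0, 250.0])
dist = np.array([[0.0, 50.0, 300.0],
                 [50.0, 0.0, 280.0],
                 [300.0, 280.0, 0.0]])      # retailer-to-retailer distances
cost = np.array([[200.0, 220.0, 90.0],
                 [400.0, 380.0, 500.0]])    # hypothetical travel cost, zone i to retailer r

A = attractiveness(floorspace, dist, alpha=1.3, beta=0.008)
p = choice_probabilities(A, cost, gamma=0.01)
# For brevity the same probabilities are reused for work and home trips;
# the abstract calibrates separate cost parameters for each trip type.
Y = modelled_turnover(np.array([30.0, 10.0]), p, np.array([50.0, 80.0]), p)
print(A, p, Y, sep="\n")
```

Because each row of `p` sums to one, the modelled turnovers add up to the total number of trips fed into the model, which is a quick sanity check on any implementation.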
Duccio Piovani
14013 Revealing patterns in human spending behavior [abstract]
Abstract: In the last decade, big data originating from human activities has given us the opportunity to analyze individual and collective behavior in unprecedented detail. These approaches are radically changing the way in which we conceive social studies via complex-systems methods. Large datasets, passively collected from mobile phones or social media, have informed us about social interactions in space and time [1], helping us to understand the laws that govern human mobility [2–4] or to predict wealth in geographic areas [5]. More recently, data from Credit Card Shopping Records (CCSRs) have also been explored, providing new insights into human economic activities. Ref. [6] has shown that a fingerprint exists in the sequence of individual payment activities, which allows users to be identified from only a few of their records. Shoppers' spending behaviors and visitation patterns are closely related to urban mobility [7]. Both mobility decisions and expenditure behavior are subject to urban and geographical constraints [8] and to economic and demographic conditions [9, 10]. A deeper understanding of consumer behavior is valuable for modelling market dynamics and for depicting the differences between income groups [11]. In particular, CCSRs have the potential to transform how we conceive the study of social inequality and human behavior within the geographic and socio-economic constraints of cities. Here we present a novel method that exploits CCSRs to provide new insights into the characterization of human spending patterns and how these are related to sociodemographic attributes. We analyze the CCSRs of approximately 150,000 users over a period of 10 weeks. The dataset is anonymized, and for each user the following demographic information is provided: age, gender, and zip code. For all users we have the chronological sequence of their transaction history with the associated shop typology according to the Merchant Category Codes (MCC) [12]. 
Our analysis of the aggregated CCSR data reveals that the majority of shoppers use credit card payment for twelve types of transactions among the hundreds of possible MCCs. These are: grocery stores, eating places, toll roads, information services, food stores, gas stations, department stores, telecommunication services, ATM use, taxis, fast food restaurants, and computer software stores. These transaction activities are depicted as icons in Fig. 1. Interestingly, the temporal order in which these transactions occur differs among individuals. First, we identify the dominant sequences of transactions for each user using the SEQUITUR algorithm [13]. Then we evaluate the significance level of each sequence by calculating its z-score with respect to the sequences computed from 100 randomized sequences that preserve the number of transactions per type. Each sequence of transactions defines a path in the space of the transaction codes. We define the User Transaction Network (UTN) by connecting the codes of the most statistically significant sequences (z-score > 2), preserving their order. We compute the matrix of user similarity (Fig. 1, lower left) by calculating the Jaccard index between all users with at least 3 links in their UTN. Applying the Louvain method [14] for community detection, we are able to group users according to their most significant sequences of payments. Fig. 1 shows our results for the six behavioral groups detected, with the clusters numbered from 1 to 6 in order of appearance in the user similarity matrix. The upper part of the figure describes the most common sequences of transactions for each group: the link value with its error represents the probability that a user of the group follows that particular transaction order, and the value in parentheses gives the fraction of users in the group that perform that transaction sequence. 
The bottom part shows the demographic attributes of each group with respect to the average population, shown in red. In summary, we have uncovered lifestyle groups in the transaction history of the CCSR data that relate to non-trivial demographic groups. We will discuss future applications of these lifestyle clusters in the context of the adoption of innovations in the city.
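Two steps of the pipeline described above, the z-score of a transaction sequence against shuffled histories and the Jaccard index between two users' transaction networks, can be sketched with the Python standard library. This is a simplified illustration, not the authors' code. It counts a fixed contiguous pattern instead of running SEQUITUR, and all function names and the toy history are hypothetical.

```python
import random
from statistics import mean, pstdev


def count_pattern(sequence, pattern):
    """Count occurrences of `pattern` as a contiguous run inside `sequence`."""
    n = len(pattern)
    return sum(1 for i in range(len(sequence) - n + 1)
               if tuple(sequence[i:i + n]) == tuple(pattern))


def pattern_zscore(sequence, pattern, n_shuffles=100, seed=0):
    """z-score of the observed pattern count against shuffled copies of the history."""
    rng = random.Random(seed)
    observed = count_pattern(sequence, pattern)
    null_counts = []
    for _ in range(n_shuffles):
        shuffled = list(sequence)
        rng.shuffle(shuffled)  # shuffling preserves the number of transactions per type
        null_counts.append(count_pattern(shuffled, pattern))
    spread = pstdev(null_counts)
    if spread == 0:
        return float("nan")  # the null model never varies, so no z-score is defined
    return (observed - mean(null_counts)) / spread


def jaccard(edges_a, edges_b):
    """Jaccard index between two sets of directed links (code_from, code_to)."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up transaction history for one user.
history = ["grocery", "gas", "taxi"] * 10 + ["eating", "grocery"] * 5
print(pattern_zscore(history, ("grocery", "gas")))
print(jaccard({("grocery", "gas"), ("gas", "taxi")}, {("grocery", "gas")}))
```

A pattern whose z-score exceeds 2 would contribute its links to the user's network, and the pairwise Jaccard values would fill the similarity matrix that is then passed to a community-detection method.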
Riccardo Di Clemente
14014 The universal dynamics of urbanization (invited talk)
Marc Barthelemy
14015 Identifying and tackling Water Leaks in Mexico through Twitter [abstract]
Abstract: As cities become smarter, the data generated daily has become increasingly granular. Sensors, cameras, crowdsourcing, social media sharing, etc., can monitor different aspects of our cities, such as commuter flows, air quality over different time periods, or public transport performance. The rise of the "smart city" therefore has the potential to shed some light on many fundamental urban problems and to pave the way towards making cities more livable and efficient places. In particular, Twitter has attracted a lot of attention in recent years (Ausserhofer & Maireder, 2013) for the richness of its content. People are not only sharing personal information with their closest contacts, but are also using Twitter as a social and political platform to inform and to disseminate all sorts of statements and ideas (Weng & Menczer, 2015; Lu & Brelsford, 2014; Piña-García, Gershenson, & Siqueiros-García, 2016). Exploring this type of data is gradually becoming more and more important for data collection. In addition, mining urban social signals can provide quick knowledge of a real-world situation (Roy & Zeng, 2014). It should be noted that the enormous volume of Twitter data has given rise to major computational challenges that sometimes result in the loss of useful information embedded in tweets. More and more people are relying on Twitter for information, and Twitter has been described as a strong medium for opinion expression and information dissemination on diverse issues (Adedoyin-Olowe, Gaber, Stahl, & Gomes, 2015). Leveraging large-scale public data from Twitter, we are able to analyze and map the spread of information related to water leaks in the street and under pavements and roads in Mexico (see Fig. 1). We gathered an initial sample of 2000 geolocated tweets, posted by 1599 users, that contain the Spanish keywords "fuga de agua" (water leak).
Carlos Adolfo Piña García
14016 Estimating nonlinearity in cities' scaling laws [abstract]
Abstract: The study of statistical and dynamical properties of cities from a complex-systems perspective is increasingly popular [1]. A celebrated result is the scaling between a city-specific observation $y$ (e.g., the number of patents filed in the city) and the population $x$ of the city as [2]

$$y = \alpha x^{\beta}, \qquad (1)$$

with a non-trivial ($\beta \neq 1$) exponent. Super-linear scaling ($\beta > 1$) was observed when $y$ quantifies creative or economic outputs and indicates that the concentration of people in large cities leads to an increase in the per-capita production ($y/x$). Sub-linear scaling ($\beta < 1$) was observed when $y$ quantifies resource use and suggests that large cities are more efficient in their per-capita ($y/x$) consumption. Since its proposal, non-linear scaling has been reported in an impressive variety of different aspects of cities. It has also inspired the proposal of different generative processes to explain its ubiquitous occurrence. Scalings similar to the one in Eq. (1) appear in physical (e.g., phase transitions) and biological (e.g., allometric scaling) systems, suggesting that cities share similarities with these and other complex systems (e.g., fractals). More recent results cast doubt on the significance of the $\beta \neq 1$ observations [3, 4, 5]. These results call for a more careful statistical analysis that rigorously quantifies the evidence for $\beta \neq 1$ in different datasets. We propose a statistical framework based on a probabilistic formulation of the scaling law (1) that allows us to perform hypothesis testing and model comparison. In particular, we quantify the evidence in favor of $\beta \neq 1$ by comparing (through the Bayesian Information Criterion, BIC) models with $\beta \neq 1$ to models with $\beta = 1$. The scaling relation in Eq. (1) describes a relation between two quantities $y$ and $x$. However, the empirical data indicate that this relation can only be fulfilled on average. The statistical analysis we propose is based on the likelihood $\mathcal{L}$ of the data being generated by different models. 
Following Ref. [6], we assume that the index $y$ (e.g., the number of patents) of a city of size $x$ is a random variable with probability density $P(y \mid x)$. We interpret Eq. (1) as the scaling of the expectation of $y$ with $x$,

$$E(y \mid x) = \alpha x^{\beta}. \qquad (2)$$

This relation does not specify the shape of $P(y \mid x)$; e.g., it does not specify how the fluctuations $V(y \mid x) \equiv E(y^2 \mid x) - E(y \mid x)^2$ of $y$ around $E(y \mid x)$ scale with $x$. Here we are interested in models $P(y \mid x)$ satisfying

$$V(y \mid x) = \gamma E(y \mid x)^{\delta}. \qquad (3)$$

This choice corresponds to Taylor's law. It is motivated by its ubiquitous appearance in complex systems, where typically $\delta \in [1, 2]$, and by previous analyses of city data which reported non-trivial fluctuations. The fluctuations in our models aim to effectively describe the combination of different effects, such as the variability in human activity and imprecision in data gathering. In principle, these effects can be explicitly included in our framework by considering distinct models for each of them. We specify different models $P(y \mid x)$ compatible with Eqs. (2) and (3). City models are the ones where we assume that each data point $y_i$ is an independent realization from the conditional distribution $P(y \mid x_i)$, effectively giving each city the same weight when computing the BIC of the model. For this model, we considered two different types of fluctuations, one Gaussian and the other log-normally distributed, thus choosing a priori a parametric form for $P(y \mid x)$. Person models are based on the natural interpretation of Eq. (1) that people's efficiency (or consumption) scales with the size of the city they are living in. This motivates us to consider a generative process in which tokens (e.g., a patent, a dollar of GDP, a mile of road) are produced or consumed by (assigned to) individual persons, which leads to a $P(y \mid x)$ that effectively weights the observations in terms of people. 
[Figure 1: Comparison of the City and Person models. (A) Reported deaths by AIDS in Brazil ($y$) with respect to the cities' population ($x$) (dots), together with a running mean. The lines represent the estimated scaling law giving the same weight to each city (City model, $\beta = 0.61$) and giving the same weight to each person (Person model). (B) Cumulative distribution of the heavy-tailed city sizes in terms of cities and persons, i.e. the fraction of (i) cities of size $\leq x$ (City model) and (ii) the population in cities of size $\leq x$. Panel annotations: 80% of the cities; 75% of the population.]

We apply this approach to 15 datasets of cities from 5 regions and find that the conclusions regarding $\beta$ vary dramatically, depending not only on the dataset but also on model assumptions that go beyond Eq. (1). We argue that the estimation of $\beta$ is challenging and depends sensitively on the model because of the following two statistical properties of cities: (i) the distribution of city population has heavy tails (Zipf's law); (ii) there are large and heterogeneous fluctuations of $y$ as a function of $x$ (heteroscedasticity). We found that in most cases the models are rejected by the data, and therefore conclusions can only be based on a comparison of the descriptive power of the different models considered here. Moreover, we found that models which differ only in their assumptions on the fluctuations can lead to different estimates of the scaling exponent $\beta$. In extreme cases, even the conclusion on whether a city index scales linearly ($\beta = 1$) or non-linearly ($\beta \neq 1$) with city population depends on the assumptions made about the fluctuations. A further factor contributing to the large variability of $\beta$ is the broad city-size distribution, which causes models to be dominated either by small or by large cities. 
In particular, these results show that the usual approach based on least-squares fitting is not sufficient to establish the existence of non-linear scaling. Recent works have focused on developing generative models of urban formation that explain non-linear scaling. Our finding that most models are rejected by the data confirms the need for such improved models. The significance of our results on models with different fluctuations is that they show that the estimation of $\beta$ and the development of generative models cannot be done as separate steps. Instead, it is essential to consider the predicted fluctuations not only in the validation of the model but also in the estimation of $\beta$.
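The BIC comparison between a free exponent and a fixed linear exponent can be illustrated with a deliberately simple model. The sketch below writes the scaling law as `y = alpha * x**beta` and assumes log-normal fluctuations with constant variance in log space, which reduces to least squares on the logarithms. That is only one of the fluctuation models the abstract discusses, and the abstract itself cautions that least-squares fitting alone is not sufficient. The function names and the synthetic data are hypothetical, and this is not the authors' framework.

```python
import numpy as np


def bic_gaussian(residuals, n_params):
    """BIC = k ln(n) - 2 ln(L) for Gaussian residuals at the maximum-likelihood variance."""
    n = residuals.size
    sigma2 = np.mean(residuals ** 2)
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return n_params * np.log(n) - 2.0 * log_lik


def compare_scaling(x, y):
    """Fit ln y = ln(alpha) + beta ln x with beta free and with beta fixed to 1."""
    lx, ly = np.log(x), np.log(y)
    beta, log_alpha = np.polyfit(lx, ly, 1)        # slope first, then intercept
    res_free = ly - (log_alpha + beta * lx)
    log_alpha_lin = np.mean(ly - lx)               # best intercept when beta = 1
    res_lin = ly - (log_alpha_lin + lx)
    # Parameter counts include the noise variance: 3 (free) versus 2 (linear).
    # The Jacobian term of the log-normal likelihood is the same for both
    # models, so it cancels in the BIC difference and is omitted here.
    return {"beta": beta,
            "bic_free": bic_gaussian(res_free, 3),
            "bic_linear": bic_gaussian(res_lin, 2)}


# Synthetic cities, made up for illustration, generated with exponent 1.2.
rng = np.random.default_rng(0)
population = 10.0 ** rng.uniform(3, 7, size=500)
index = 0.01 * population ** 1.2 * np.exp(rng.normal(0.0, 0.3, size=500))
print(compare_scaling(population, index))
```

The model with the lower BIC is preferred. Changing the assumed fluctuation model changes both the likelihood and the estimated exponent, which is the sensitivity the abstract highlights.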
José M. Miotto
14017 Estimating Railway Travel Demand Through Social Media Geo-localised Data [abstract]
Abstract: The fundamental four-stage modelling framework for railway planning focuses heavily both on modal choice models and on the assignment of passenger flows over networks. These last steps aim to realise the full potential of new transport policies, which are constantly moving towards more efficient and ecological modes. In Europe we are witnessing the emergence of several projects that aim to interconnect urban areas within and among countries, both with new or better-performing links and through the development of rolling stock able to interoperate among national networks characterized by different power-supply infrastructures and signalling/security systems and protocols. Linking demand and supply is therefore a challenge in designing, providing and validating better international services that are both reliable and of high quality. Here we develop a new framework to estimate railway traffic demand by detecting a set of geo-localised tweets, posted in the last three years, that overlap railway lines in Europe. We scale the number of potential passengers on a line through the so-called "penetration rate", which estimates the size of our sample relative to the total tweeting population. We compare our data per line with the frequency of services on several railway branches in order to calibrate our flow estimates. Our findings provide information about passenger flows across regions, going beyond current methodologies, which generally confine data within single countries or administrations. The potential of the methodology therefore lies in the interoperability of data across countries, giving planners not only a new source of cross-country demand estimates but also a new tool and dataset for the calibration and validation of transportation demand models.
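The scaling step mentioned in the abstract can be illustrated with a short sketch. The abstract does not give a formula, so the reading used here is an assumption: the penetration rate is taken to be the fraction of actual travellers who appear in the geo-localised tweet sample, and the estimated demand is the observed count divided by that rate. The function name and the numbers are hypothetical.

```python
def estimate_passengers(observed_users_on_line, penetration_rate):
    """Scale the users seen tweeting along a line up to an estimated passenger count."""
    if not 0.0 < penetration_rate <= 1.0:
        raise ValueError("penetration_rate must lie in (0, 1]")
    return observed_users_on_line / penetration_rate


# Made-up example: 50 distinct users detected along a line, assumed rate of 2%.
print(estimate_passengers(50, 0.02))
```

Under this reading, the estimates per line would still need to be calibrated against an independent quantity, such as the service frequencies the abstract mentions.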
Fabio Lamanna