Crime in Chicago

Crimes in the communities of Chicago

Link: map of the neighborhoods of Chicago and the geography of crime in the city


CTA Bus transportation

The Bus Network:

The Bus Tracker API

The Bus Tracker API allows developers to request and retrieve data directly from BusTime (the system that produces estimated arrival times and provides location and route information in real time).

What data is available through the API?

Data available through the API includes:

  •  Vehicle locations
  •  Route data (route lists, stop lists, geo-positional route definitions, etc.)
  •  Prediction data
  •  Service bulletins
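A minimal Python sketch of requesting vehicle locations follows. It assumes the version 2 endpoint layout `http://www.ctabustracker.com/bustime/api/v2/getvehicles` with `key`, `rt` and `format=json` query parameters, and a JSON reply wrapped in a `bustime-response` object; these names are recalled from the CTA developer documentation rather than taken from this text, so check them before use. The helper names `build_url`, `parse_vehicles` and `fetch_vehicles` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL for version 2 of the Bus Tracker API; verify against the
# CTA developer documentation before relying on it.
BASE_URL = "http://www.ctabustracker.com/bustime/api/v2"


def build_url(endpoint, api_key, **params):
    """Build a request URL such as .../getvehicles?key=...&format=json&rt=22."""
    query = {"key": api_key, "format": "json", **params}
    return f"{BASE_URL}/{endpoint}?{urlencode(query)}"


def parse_vehicles(payload):
    """Extract (vehicle id, latitude, longitude) tuples from a getvehicles reply.

    The 'bustime-response' / 'vehicle' / 'vid' / 'lat' / 'lon' keys are
    assumptions about the response layout. A reply without vehicles
    (for example an error reply) yields an empty list.
    """
    response = json.loads(payload)["bustime-response"]
    return [(v["vid"], float(v["lat"]), float(v["lon"]))
            for v in response.get("vehicle", [])]


def fetch_vehicles(api_key, route):
    """Request current vehicle locations for one route (needs network and a key)."""
    with urlopen(build_url("getvehicles", api_key, rt=route)) as resp:
        return parse_vehicles(resp.read().decode("utf-8"))
```

For example, parsing the made-up payload `{"bustime-response": {"vehicle": [{"vid": "1234", "lat": "41.88", "lon": "-87.63"}]}}` returns `[("1234", 41.88, -87.63)]`. Keeping the parsing separate from the network call makes it possible to test against saved responses.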

How does the Developer API work?

  • The developer API uses the same data from the BusTime system that powers CTA Bus Tracker. Each bus reports its location, direction and status to the BusTime system, which can then show where buses are and estimate arrival times at the stops ahead of each bus.
    Data is updated about once per minute, and arrival estimates are based on how long it normally takes a bus to get from one place to the next. Because traffic conditions and other unexpected delays occur, we can't predict precisely when a bus will arrive, only estimate it from normal travel times for the time of day at which the estimate is made.
    To use the API, the user must sign in to their CTA Bus Tracker account and request an API key. Only one key is available per account. Once the request has been approved, an e-mail containing the API key is sent to the user.
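The estimation idea described above (summing normal travel times for the current time of day) can be sketched as follows. This is not BusTime's actual algorithm: the stop names, the per-hour travel times in `TYPICAL_MINUTES` and the fallback `DEFAULT_MINUTES` are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical typical travel times (minutes) between consecutive stops,
# keyed by hour of day. Real values would come from historical data.
TYPICAL_MINUTES = {
    ("A", "B"): {8: 4.0, 14: 3.0},
    ("B", "C"): {8: 6.0, 14: 4.5},
}
DEFAULT_MINUTES = 5.0  # hypothetical fallback when no value exists for that hour


def estimate_arrival(stops, current_index, target_index, now):
    """Estimate when a bus at stops[current_index] reaches stops[target_index].

    Sums the typical time of each segment for the hour in which the estimate
    is made. Live traffic is ignored, so the result is only an estimate.
    """
    total = 0.0
    for a, b in zip(stops[current_index:target_index],
                    stops[current_index + 1:target_index + 1]):
        by_hour = TYPICAL_MINUTES.get((a, b), {})
        total += by_hour.get(now.hour, DEFAULT_MINUTES)
    return now + timedelta(minutes=total)
```

With `stops = ["A", "B", "C"]`, a bus at stop A at 8:00 gets an estimated arrival at C of 8:10 (4 + 6 minutes), while at 14:00 the same trip is estimated at 7.5 minutes.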

The terms used by the API

  • Delayed Vehicle – The state entered by a vehicle when it has been determined to be stationary for more than a pre-defined time period.
  • Direction – Common direction of travel of a route.
  • Off-route Vehicle – State entered by a transit vehicle when it has strayed from its scheduled pattern.
  • Pattern – A unique sequence of geo-positional points (waypoints and stops) that combine to form the path that a transit vehicle will repetitively travel. A route often has more than one possible pattern.
  • Route – A set of one or more patterns that together form a single service.
  • Service Bulletin – Text-based announcements affecting a set of one or more services (route, stops, etc.).
  • Stop – Location where a transit vehicle can pick up or drop off passengers. Predictions are only generated at stops.
  • Waypoint – A geo-positional point in a pattern used to define the travel path of a transit vehicle.
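The terms above map naturally onto a small data model. The Python sketch below is an illustration under assumptions, not the API's own schema: the class and field names are hypothetical, a point counts as a stop when it carries a `stop_id`, and "stationary" is taken to mean an identical reported position, whereas a real system would likely allow some GPS tolerance. The delay threshold is a parameter because the text only calls it pre-defined.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Point:
    """A geo-positional point in a pattern: a waypoint, or a stop if stop_id is set."""
    lat: float
    lon: float
    stop_id: str = ""  # empty string means the point is only a waypoint

    @property
    def is_stop(self):
        return bool(self.stop_id)


@dataclass
class Pattern:
    """An ordered sequence of waypoints and stops that a vehicle repeatedly travels."""
    pattern_id: str
    direction: str
    points: List[Point]

    def stops(self):
        # Predictions are only generated at stops, so these are the points riders see.
        return [p for p in self.points if p.is_stop]


@dataclass
class Route:
    """A single service made up of one or more patterns."""
    route_id: str
    patterns: List[Pattern] = field(default_factory=list)


def is_delayed(position_log, threshold_seconds):
    """Return True if the vehicle has been stationary longer than the threshold.

    position_log is a chronological list of (timestamp_seconds, lat, lon).
    """
    if not position_log:
        return False
    last_t, last_lat, last_lon = position_log[-1]
    still_since = last_t
    # Walk backwards while the reported position has not changed.
    for t, lat, lon in reversed(position_log[:-1]):
        if (lat, lon) != (last_lat, last_lon):
            break
        still_since = t
    return last_t - still_since > threshold_seconds
```

A bus that last moved at t = 60 s and is still at the same spot at t = 600 s counts as delayed under a 300-second threshold but not under a 600-second one.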

The CTA states:

The Chicago Transit Authority (CTA) operates the nation’s second-largest public transportation system, a regional transit system that serves the City of Chicago and 40 neighboring communities. CTA provides 1.64 million rides on an average weekday, accounting for over 80% of all transit trips taken in the six-county Chicago metropolitan region. Presently, CTA service is provided by two modes: bus and rail. Most rides on CTA are taken by bus. Our bus system consists of 140 routes. Buses make over 25,000 trips daily, and serve nearly 12,000 bus stops throughout the region.

Bus routes in Chicago:

Collaborative filtering

Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B’s opinion on a different issue x than to have the opinion on x of a person chosen randomly. For example, a collaborative filtering recommendation system for television tastes could make predictions about which television show a user should like given a partial list of that user’s tastes (likes or dislikes).[2] Note that these predictions are specific to the user, but use information gleaned from many users. This differs from the simpler approach of giving an average (non-specific) score for each item of interest, for example based on its number of votes.


The growth of the Internet has made it much more difficult to effectively extract useful information from all the available online information. The overwhelming amount of data necessitates mechanisms for efficient information filtering. One of the techniques used for dealing with this problem is called collaborative filtering.

The motivation for collaborative filtering comes from the idea that people often get the best recommendations from someone with similar tastes to themselves. Collaborative filtering explores techniques for matching people with similar interests and making recommendations on this basis.

Collaborative filtering algorithms often require (1) users’ active participation, (2) an easy way to represent users’ interests to the system, and (3) algorithms that are able to match people with similar interests.

Typically, the workflow of a collaborative filtering system is:

  1. A user expresses his or her preferences by rating items of the system (e.g. books, movies or CDs). These ratings can be viewed as an approximate representation of the user’s interest in the corresponding domain.
  2. The system matches this user’s ratings against other users’ and finds the people with the most “similar” tastes.
  3. Using those similar users, the system recommends items that they have rated highly but that this user has not yet rated (the absence of a rating is usually taken to mean the user is unfamiliar with the item).
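The three-step workflow above can be sketched in a few lines of Python. The ratings are invented, similarity is plain cosine similarity over co-rated items (one common choice among several), and the neighbor count `k` is an arbitrary assumption.

```python
from math import sqrt

# Step 1 (hypothetical data): ratings on a 1-5 scale, user -> {item: rating}.
RATINGS = {
    "alice": {"book1": 5, "book2": 3, "book3": 4},
    "bob":   {"book1": 5, "book2": 3, "book4": 5},
    "carol": {"book1": 1, "book2": 5, "book5": 4},
}


def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, k=2):
    """Steps 2 and 3: find the k most similar users, then rank the items they
    rated that `user` has not rated, by similarity-weighted rating."""
    mine = ratings[user]
    neighbors = sorted(
        ((cosine(mine, theirs), other) for other, theirs in ratings.items()
         if other != user),
        reverse=True)[:k]
    scores = {}
    for sim, other in neighbors:
        for item, r in ratings[other].items():
            if item not in mine:  # unrated items are treated as unfamiliar
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)
```

Here `recommend("alice", RATINGS)` returns `["book4", "book5"]`: bob rates the shared books exactly as alice does, so his `book4` outranks carol's `book5`. With only one co-rated item the cosine is always 1, so real systems usually require a minimum overlap.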

A key problem of collaborative filtering is how to combine and weight the preferences of a user’s neighbors. Users can sometimes rate the recommended items immediately, so the system gains an increasingly accurate representation of their preferences over time.
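One widely used way to combine and weight neighbors' preferences (an assumption here, not something this text specifies) is a mean-centered weighted average. In it, the bar denotes a user's average rating, sim(u, v) is a similarity such as the Pearson correlation, and N_i(u) is the set of u's nearest neighbors who have rated item i:

```latex
\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in N_i(u)} \operatorname{sim}(u,v)\,\bigl(r_{v,i} - \bar{r}_v\bigr)}{\sum_{v \in N_i(u)} \bigl|\operatorname{sim}(u,v)\bigr|}
```

Subtracting each neighbor's mean compensates for users who rate everything high or everything low, and dividing by the summed absolute similarities keeps the correction on the rating scale of the deviations.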


Collaborative filtering systems have many forms, but many common systems can be reduced to two steps:

  1. Look for users who share the same rating patterns with the active user (the user whom the prediction is for).
  2. Use the ratings from those like-minded users found in step 1 to calculate a prediction for the active user.

This falls under the category of user-based collaborative filtering. A specific application of this is the user-based nearest-neighbor algorithm.
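The two steps can be made concrete with a small user-based nearest-neighbor predictor. It is a sketch under assumptions: similarity is the Pearson correlation over co-rated items, the prediction is a mean-centered average weighted by that similarity, and the function names and the neighbor count `k` are hypothetical.

```python
from math import sqrt


def mean(ratings):
    """Average of one user's ratings ({item: rating})."""
    return sum(ratings.values()) / len(ratings)


def pearson(u, v):
    """Pearson correlation over co-rated items (0.0 when it is undefined)."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mu_u = sum(u[i] for i in common) / len(common)
    mu_v = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu_u) * (v[i] - mu_v) for i in common)
    den = (sqrt(sum((u[i] - mu_u) ** 2 for i in common))
           * sqrt(sum((v[i] - mu_v) ** 2 for i in common)))
    return num / den if den else 0.0


def predict(active, item, ratings, k=2):
    """Predict the active user's rating of `item` from its k most similar raters."""
    mine = ratings[active]
    # Step 1: users who rated the item, ranked by similarity to the active user.
    neighbors = sorted(
        ((pearson(mine, theirs), other) for other, theirs in ratings.items()
         if other != active and item in theirs),
        reverse=True)[:k]
    # Step 2: mean-centered, similarity-weighted average of their ratings.
    num = sum(s * (ratings[o][item] - mean(ratings[o])) for s, o in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    return mean(mine) + (num / den if den else 0.0)
```

For example, if ann rates a, b, c as 5, 3, 4, ben rates a, b, c, d as 4, 2, 3, 4 and cat rates them 1, 5, 2, 1, then `predict("ann", "d", ratings)` comes out just under 5. Ben's ratings move in step with ann's and he rates d above his average, while cat's move opposite to ann's and she rates d below hers, so both push the prediction up.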


Alternatively, item-based collaborative filtering (users who bought x also bought y) proceeds in an item-centric manner:

  1. Build an item-item matrix determining relationships between pairs of items.
  2. Infer the tastes of the current user by examining the matrix and matching that user’s data.
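The item-centric approach can be sketched similarly. Assumptions: plain cosine similarity between items over the users who rated both (adjusted cosine, which first subtracts each user's mean, is a common alternative), and a prediction that averages the user's own ratings with item-similarity weights. The function names are hypothetical.

```python
from math import sqrt


def item_similarity(ratings):
    """Step 1: cosine similarity between every ordered pair of items,
    computed over the users who rated both."""
    items = sorted({i for r in ratings.values() for i in r})
    sim = {}
    for a in items:
        for b in items:
            if a == b:
                continue
            users = [u for u, r in ratings.items() if a in r and b in r]
            dot = sum(ratings[u][a] * ratings[u][b] for u in users)
            na = sqrt(sum(ratings[u][a] ** 2 for u in users))
            nb = sqrt(sum(ratings[u][b] ** 2 for u in users))
            sim[a, b] = dot / (na * nb) if na and nb else 0.0
    return sim


def predict_item_based(user, item, ratings, sim):
    """Step 2: average the user's own ratings, weighting each rated item by
    its similarity to the target item."""
    mine = ratings[user]
    num = sum(sim[item, j] * r for j, r in mine.items() if j != item)
    den = sum(abs(sim[item, j]) for j in mine if j != item)
    return num / den if den else 0.0
```

If u1 rates x, y, z as 5, 4, 1, u2 rates them 4, 5, 2, and u3 has rated only x = 5 and y = 3, the prediction for u3 on z is about 3.96, a weighted average of u3's own ratings in which y (slightly more similar to z) counts a little more.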

See, for example, the Slope One item-based collaborative filtering family.
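Slope One predicts a rating from the average rating differences between pairs of items. Below is a sketch of the weighted variant, assuming ratings stored as `{user: {item: rating}}`; the function names are hypothetical.

```python
from collections import defaultdict


def slope_one_train(ratings):
    """Average difference dev[j][i] = mean(r_j - r_i) over users who rated both,
    plus the number of such users count[j][i]."""
    diff = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, rj in user_ratings.items():
            for i, ri in user_ratings.items():
                if i != j:
                    diff[j][i] += rj - ri
                    count[j][i] += 1
    dev = {j: {i: diff[j][i] / count[j][i] for i in diff[j]} for j in diff}
    return dev, count


def slope_one_predict(user_ratings, target, dev, count):
    """Weighted Slope One: average of (r_i + dev[target][i]) over the user's
    rated items i, weighted by how many users rated both target and i.
    Returns None when there is no overlap to predict from."""
    num = den = 0.0
    for i, ri in user_ratings.items():
        if i != target and i in dev.get(target, {}):
            c = count[target][i]
            num += (ri + dev[target][i]) * c
            den += c
    return num / den if den else None
```

If john rates A, B, C as 5, 3, 2, mark rates A and B as 3 and 4, and lucy rates B and C as 2 and 5, lucy's predicted rating for A is ((2 + 0.5) × 2 + (5 + 3) × 1) / 3 ≈ 4.33: item A averages 0.5 above B over two users and 3 above C over one.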

Another form of collaborative filtering can be based on implicit observations of normal user behavior (as opposed to the artificial behavior imposed by a rating task). These systems observe what a user has done together with what all users have done (what music they have listened to, what items they have bought) and use that data to predict the user’s behavior in the future, or to predict how a user might like to behave given the chance. These predictions then have to be filtered through business logic to determine how they might affect the actions of a business system. For example, it is not useful to offer to sell somebody a particular album of music if they already have demonstrated that they own that music. Similarly, it is not necessarily useful to suggest travel guides for Paris to someone who already bought a travel guide for this city.
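An implicit-feedback recommender can work from purchase or listening histories instead of ratings. The sketch below assumes simple co-occurrence counting (items appearing in the same user's history) and then applies the kind of business-logic filter described above: it never suggests something the user already owns, and an optional `exclude` predicate (a hypothetical hook) can reject further items, such as a second guide to a city the user already has a guide for.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(histories):
    """Count how often each ordered pair of items appears in the same history."""
    counts = Counter()
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a, b] += 1
            counts[b, a] += 1
    return counts


def suggest(user, histories, counts, exclude=lambda item: False, n=3):
    """Score items by co-occurrence with the user's history, then filter:
    drop items the user already has and anything `exclude` rejects."""
    owned = set(histories[user])
    scores = Counter()
    for (a, b), c in counts.items():
        if a in owned and b not in owned and not exclude(b):
            scores[b] += c
    return [item for item, _ in scores.most_common(n)]
```

With histories in which u1 has album_x and album_y, u2 has all three albums, u3 has album_x and album_z, and "me" has only album_x, suggesting for "me" yields album_y and album_z (each co-occurs twice with album_x) but never album_x itself.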

Relying on a scoring or rating system which is averaged across all users ignores specific demands of a user, and is particularly poor in tasks where there is large variation in interest (as in the recommendation of music). However, there are other methods to combat information explosion, such as web search and data clustering.

Neighborhood boundaries based on social media activity

Researchers at the School of Computer Science at Carnegie Mellon University investigate the structure of cities in their Livehoods project, using foursquare check-ins.

The hypothesis underlying our work is that the character of an urban area is defined not just by the types of places found there, but also by the people who make the area part of their daily routine. To explore this hypothesis, given data from over 18 million foursquare check-ins, we introduce a model that groups nearby venues into areas based on patterns in the set of people who check in to them. By examining patterns in these check-ins, we can learn about the different areas that comprise the city, allowing us to study the social dynamics, structure, and character of cities on a large scale.
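The idea of grouping nearby venues by who checks in to them can be illustrated with a much simpler stand-in than the model in the Livehoods paper. Each venue is linked to those of its m geographically nearest venues whose visitors overlap enough (cosine similarity of per-user check-in counts), and the connected groups are returned as areas. The parameters `m` and `threshold`, the data layout and the function names are assumptions for illustration only.

```python
from math import dist, sqrt


def cosine(a, b):
    """Cosine similarity of two {user: check-in count} dicts."""
    dot = sum(c * b.get(u, 0) for u, c in a.items())
    na = sqrt(sum(c * c for c in a.values()))
    nb = sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def group_venues(venues, m=3, threshold=0.5):
    """venues: {name: ((x, y), {user: count})}. Link each venue to those of its
    m nearest neighbors whose visitors overlap enough, then return the
    connected components (via union-find) as areas."""
    parent = {v: v for v in venues}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for v, (pos, users) in venues.items():
        nearest = sorted((w for w in venues if w != v),
                         key=lambda w: dist(pos, venues[w][0]))[:m]
        for w in nearest:
            if cosine(users, venues[w][1]) >= threshold:
                parent[find(v)] = find(w)

    areas = {}
    for v in venues:
        areas.setdefault(find(v), set()).add(v)
    return list(areas.values())
```

For instance, two nearby venues visited mostly by the same two people end up in one area, while a distant pair with different regulars forms another; raising `threshold` splits them into single venues.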

It’s most interesting when you click on the location dots. A Livehood is highlighted and a panel on the top right tells you what the neighborhood is like, lists related neighborhoods, and provides stats like hourly and daily pulse and a breakdown of location categories (for example, food and nightlife). Does foursquare have anything like this tied into their system? They should if they don’t.

There are only maps for San Francisco, New York City, and Pittsburgh right now, but I’m sure there are more to come.

Want more on the clustering behind the maps? Here’s the paper [pdf].