= MacPorts Statistics - Google Summer of Code 2011 =

=== Derek Ingrouville - derek ''AT'' macports.org ===

This page describes the implementation of the MacPorts Statistics project as part of Google Summer of Code 2011. It is based on the documentation in the svn repository available at [source:branches/gsoc11-statistics/docs/implementation/impl.tex branches/gsoc11-statistics/docs/implementation/impl.tex]

The code shown here was written during GSoC 2011 and may no longer be current. The latest version of the code is available at [source:branches/gsoc11-statistics branches/gsoc11-statistics]

= Client Side - MacPorts Base =

== Install ==

In order to submit data automatically at regular intervals, some small changes had to be made to the installation process. These changes include installing a script that handles data submissions (`submitstats.sh`), configuring `launchd` to run `submitstats.sh` regularly, and generating a unique identifier for the user submitting data.

=== Makefile.in ===

 * Install `submitstats.sh` to `$(DESTDIR)${datadir}/macports/`
 * Run `setupstats.sh`

=== configure.ac ===

Generate a universally unique identifier (UUID) to identify this MacPorts installation. The UUID is generated by `uuidgen` and stored in the variable `STATS_UUID`.

== Scripts ==

=== `setupstats.sh` ===

This script is responsible for generating and installing the file `org.macports.stats.plist`. This plist is used by `launchd` to run `submitstats.sh` regularly.

The script takes two arguments:
 1. The path to the script that `launchd` should execute
 2. The path to the MacPorts configuration file `macports.conf`

It schedules that script to run once a week. The day of the week, hour and minute are determined as follows:

'''Weekday:''' The machine's hardware UUID modulo 7. This helps ensure that submissions are distributed roughly evenly throughout the week.

'''Hour:''' The hour at which `setupstats.sh` was executed.
'''Minute:''' The minute at which `setupstats.sh` was executed.

The plist is installed to `/Library/LaunchAgents/org.macports.stats.plist` and then loaded by `launchctl`.

=== `submitstats.sh` ===

This script has two responsibilities:
 1. Check whether the user is participating.
 2. Submit data only if the user is participating.

It takes one parameter, the path to `macports.conf`. To determine whether the user is participating, it checks whether the variable `stats_participate` is set to `yes`. If it is, `port stats submit` is executed; otherwise the script exits.

This script exists to provide a lightweight participation check before running `port`. It is executed once a week for every user, regardless of whether or not they are participating.

== Configuration ==

Added several variables to `macports.conf.in` and corresponding descriptions to `macports.conf.5`.

=== `stats_participate` ===

Indicates whether the user has chosen to opt in and share their data. Its value is either `yes` or `no`.

=== `stats_url` ===

The URL to which data should be submitted.

=== `stats_id` ===

The UUID used for submissions. It is initially set to the value of the autoconf variable `@STATS_UUID@`.

== Changes to macports1.0/macports.tcl ==

=== New Globals ===

Added the globals `stats_participate`, `stats_url` and `stats_id`, which correspond to the configuration options above. Added the deferred global `gccversion`.

=== `gcc` version check ===

Added the proc `setgccinfo`, which is called the first time `gccversion` is read.

== Changes to pextlib1.0/curl.c - CurlPostCmd() ==

Added the `CurlPostCmd` function. It takes two Tcl parameters: the POST data and the URL. Example usage: `curl post "project=macports" $url`

== The `port stats` action ==

`port stats` gathers lists of all active and inactive ports as well as relevant system information. If no subaction is given, `port stats` prints the system information to `stdout`.
If the `submit` subaction is given, it encodes all the collected data as a `JSON` object and submits it via HTTP POST to the server specified by `stats_url` in `macports.conf`. `JSON` encoding is done through sub-procedures contained inside the procedure for the `port stats` action.

== Changes to port/port-help.tcl ==

Added a help entry for the `port stats` action describing its proper usage.

= Data Format =

Transmitted data is encoded as a JSON object with four fields.
{{{
{
    "id": "...",
    "os": { ... },
    "active_ports": [ {...}, ... {...} ],
    "inactive_ports": [ {...}, ... {...} ]
}
}}}

 1. `id`: a string containing the user's UUID.
 2. `os`: a JSON object containing information about the user's system.
{{{
"os": {
    "macports_version": "1.9.99",
    "osx_version": "10.6",
    "os_arch": "i386",
    "os_platform": "darwin",
    "build_arch": "x86_64",
    "gcc_version": "4.2.1",
    "xcode_version": "4.0"
}
}}}
 3. `active_ports`: an array of JSON objects, each representing a single installed, active port.
{{{
"active_ports": [
    {
        "name": "aalib",
        "version": "1.4rc5_4"
    },
    {
        "variants": "nonls +",
        "name": "aspell",
        "version": "0.60.6_4"
    }
]
}}}
 4. `inactive_ports`: the same as `active_ports`, except that the port objects represent installed but inactive ports.
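To make the four-field format concrete, here is a minimal Ruby sketch (Ruby because the server side is a Rails application) that assembles a payload in this shape and performs the kind of basic shape check a receiver might do before storing it. The helpers `build_submission` and `parse_submission` are hypothetical: the real client encodes the payload in Tcl inside `port stats`, and the real server may validate differently.

```ruby
require 'json'

# Hypothetical helper: assemble a submission in the four-field format above.
# Ports are Hashes with "name", "version" and, optionally, "variants".
def build_submission(uuid, os_info, active_ports, inactive_ports)
  {
    "id" => uuid,
    "os" => os_info,
    "active_ports" => active_ports,
    "inactive_ports" => inactive_ports
  }.to_json
end

REQUIRED_FIELDS = %w[id os active_ports inactive_ports].freeze

# Hypothetical helper: decode a submission and check its basic shape.
# Raises ArgumentError if a field is missing or has the wrong type.
def parse_submission(json)
  data = JSON.parse(json)
  raise ArgumentError, "submission must be a JSON object" unless data.is_a?(Hash)

  missing = REQUIRED_FIELDS - data.keys
  raise ArgumentError, "missing fields: #{missing.join(', ')}" unless missing.empty?
  raise ArgumentError, "id must be a string" unless data["id"].is_a?(String)
  raise ArgumentError, "os must be an object" unless data["os"].is_a?(Hash)

  %w[active_ports inactive_ports].each do |field|
    raise ArgumentError, "#{field} must be an array" unless data[field].is_a?(Array)
    data[field].each do |port|
      unless port.is_a?(Hash) && port["name"].is_a?(String) && port["version"].is_a?(String)
        raise ArgumentError, "every port in #{field} needs a name and a version"
      end
    end
  end
  data
end

# Usage: round-trip a small payload (placeholder UUID, values from the examples above).
os = { "macports_version" => "1.9.99", "osx_version" => "10.6" }
payload = build_submission("00000000-0000-0000-0000-000000000000", os,
                           [{ "name" => "aalib", "version" => "1.4rc5_4" }], [])
parse_submission(payload)["active_ports"].first["name"] # => "aalib"
```

Rejecting malformed payloads up front keeps the storage steps from having to handle missing keys one by one.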
= Server Side - Ruby on Rails =

== Database Schema ==

=== Categories table ===

Imported from MPWA.
{{{
create_table "categories", :force => true do |t|
  t.string   "name"
  t.datetime "created_at"
  t.datetime "updated_at"
end
}}}

'''Relationships and Validations'''
{{{
has_many :ports
validates_presence_of :name
}}}

'''Changes from MPWA'''
 * Validate presence of name

=== Ports table ===

Imported from MPWA.
{{{
create_table "ports", :force => true do |t|
  t.string   "name"
  t.string   "path"
  t.string   "version"
  t.text     "description"
  t.string   "licenses"
  t.integer  "category_id"
  t.text     "variants"
  t.string   "maintainers"
  t.string   "platforms"
  t.string   "categories"
  t.datetime "created_at"
  t.datetime "updated_at"
end

add_index "ports", ["name"], :name => "index_ports_on_name"
}}}

'''Relationships and Validations'''
{{{
has_one :category
belongs_to :category
has_many :installed_ports
validates_presence_of :name, :version
}}}

'''Changes from MPWA'''
 * Changed the `variants` column type from string to text
 * Added an index on the `name` column
 * Validate presence of name and version
 * Added the `has_many :installed_ports` relationship

=== installed_ports table ===

The installed_ports table holds submitted port installation data. It keeps track of an installed port's version and variants as well as the id of the submitting user.
{{{
create_table "installed_ports", :force => true do |t|
  t.integer  "port_id"
  t.string   "version"
  t.text     "variants"
  t.datetime "created_at"
  t.datetime "updated_at"
  t.integer  "user_id"
end

add_index "installed_ports", ["port_id"], :name => "index_installed_ports_on_port_id"
add_index "installed_ports", ["user_id"], :name => "index_installed_ports_on_user_id"
}}}

'''Relationships and Validations'''
{{{
belongs_to :port
has_one :user
validates_presence_of :user_id, :port_id, :version
}}}

=== os_statistics table ===

The os_statistics table holds information about a user's system.
{{{
create_table "os_statistics", :force => true do |t|
  t.datetime "created_at"
  t.datetime "updated_at"
  t.string   "macports_version"
  t.string   "osx_version"
  t.string   "os_arch"
  t.string   "os_platform"
  t.string   "build_arch"
  t.string   "xcode_version"
  t.string   "gcc_version"
  t.integer  "user_id"
end

add_index "os_statistics", ["user_id"], :name => "index_os_statistics_on_user_id"
}}}

'''Relationships and Validations'''
{{{
belongs_to :port
has_one :user
validates_presence_of :user_id, :port_id, :version
}}}

=== users table ===

The users table holds the UUID of each user.
{{{
create_table "users", :force => true do |t|
  t.string   "uuid"
  t.datetime "created_at"
  t.datetime "updated_at"
end
}}}

'''Relationships and Validations'''
{{{
has_one :os_statistic
has_many :installed_ports
}}}

== Submissions ==

JSON-encoded submissions are sent via HTTPS POST to the `/submissions` page. All data is stored in the `data` POST variable.

Submissions are stored on a month-by-month basis: a resubmission within a given month updates that month's data rather than adding new entries.

Storing data happens as follows:
 1. Attempt to find a user with the given UUID in the database. If no user is found, add a new entry.
 2. Attempt to find an entry in the `os_statistics` table for this user that was created this month. If no such entry is found, add an entry for this month; otherwise update the existing one.
 3. For each submitted port, verify that it is a valid port by checking whether it exists in the `ports` table. If it does not exist, skip it. If it does exist, attempt to find an `installed_ports` entry for this user and port that was created this month. If an entry is found, update it; otherwise create a new entry.

== OS Statistics Page ==

The OS statistics page provides visualizations of the data in the `os_statistics` table.
It shows pie charts for each of:
 * MacPorts versions
 * OSX versions
 * OS arch
 * OS platform
 * Build arch
 * gcc versions
 * Xcode versions

These pie charts show the percentage of the user population running each version (or arch / platform) in each category.

== Port Page ==

Every port in the MacPorts repository has an associated port page. This page displays basic information about the port, such as:
 * Name
 * Current version
 * Licenses
 * Categories
 * Variants

This page also shows visualizations of the data in the `installed_ports` table for this particular port:
 * A line chart of installation counts over the past 12 months.
 * Top versions over the past 12 months. This finds the 5 most popular versions in use right now and tracks how their popularity has changed over the past 12 months. Popularity is measured by the number of installations of each version per month.
 * A pie chart of all versions. This shows the distribution of all versions currently in use, e.g. that `x%` of users of this port are using version `y`.
 * Similarly, a pie chart of all variants in use.

== Installed Ports Page ==

This page shows summary information for installed ports, such as:
 * The number of participating users
 * The number of ports in the MacPorts repository
 * The average number of ports installed per user
 * The most popular port this month and its number of installs
 * The most popular port this year and its number of installs
 * A bar chart of the 25 most installed ports along with their install counts
 * A table of the 25 most installed ports along with their install counts

== Home Page ==

The home page has links to all other pages as well as a search area for ports. It also displays a line chart of the number of participating users over the past 12 months.
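The home page chart needs one data point per month. Below is a minimal sketch of that bucketing, assuming a user counts as participating in a month if they have an `os_statistics` row created in that month (the submission procedure keeps at most one such row per user per month). `Row` and `users_per_month` are hypothetical names; in the Rails application the rows would come from the `os_statistics` table rather than an in-memory array.

```ruby
require 'date'
require 'set'

# Hypothetical stand-in for an os_statistics row: only the fields needed here.
Row = Struct.new(:user_id, :created_at)

# Count distinct participating users per month for the `months` months ending
# with the month containing `today`. Returns [[first_day_of_month, count], ...],
# oldest month first, with a zero count for months that have no rows.
def users_per_month(rows, today, months = 12)
  current = Date.new(today.year, today.month, 1)
  buckets = (0...months).map { |i| current << (months - 1 - i) } # Date#<< steps back n months
  users = buckets.each_with_object({}) { |month, h| h[month] = Set.new }

  rows.each do |row|
    month = Date.new(row.created_at.year, row.created_at.month, 1)
    users[month] << row.user_id if users.key?(month) # ignore rows outside the window
  end

  buckets.map { |month| [month, users[month].size] }
end

rows = [
  Row.new(1, Date.new(2011, 8, 1)),
  Row.new(2, Date.new(2011, 8, 3)),
  Row.new(3, Date.new(2011, 7, 9))
]
users_per_month(rows, Date.new(2011, 8, 15)).last(2).map { |m, n| [m.to_s, n] }
# => [["2011-07-01", 1], ["2011-08-01", 2]]
```

Counting distinct user ids with a `Set` keeps the result correct even if duplicate rows for one user and month were ever stored.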