MacPorts Statistics - Google Summer of Code 2011

Derek Ingrouville - derek AT macports.org

This page describes the implementation of the MacPorts Statistics project as part of Google Summer of Code 2011. It is based on the documentation in the svn repository, available at branches/gsoc11-statistics/docs/implementation/impl.tex.

The code shown here was written during GSoC 2011 and may no longer be current. The latest version of the code is available at branches/gsoc11-statistics.

Client Side - MacPorts Base

Install

To submit data automatically at regular intervals, some small changes had to be made to the installation process. These changes are: installing submitstats.sh, a script that handles data submissions; configuring launchd to run submitstats.sh regularly; and generating a unique identifier for the user submitting data.

Makefile.in

  • Install submitstats.sh to $(DESTDIR)${datadir}/macports/
  • Run setupstats.sh

configure.ac

Generate a universally unique identifier (UUID) to identify this MacPorts installation. The UUID is generated by uuidgen and stored in the variable STATS_UUID.

Scripts

setupstats.sh

This script is responsible for generating and installing the file org.macports.stats.plist. This plist is used by launchd to regularly run submitstats.sh.

The script takes two arguments:

  1. The path to the script that launchd should execute
  2. The path to the MacPorts configuration file macports.conf

It will execute the script once a week. The day of the week, hour and minute are determined as follows:

Weekday: The day of the week is determined by the machine's hardware UUID modulo 7. This is to help ensure that submissions are roughly evenly distributed throughout the week.

Hour: The hour that submitstats.sh was executed.

Minute: The minute that submitstats.sh was executed.

The plist is installed to /Library/LaunchAgents/org.macports.stats.plist and then loaded by launchctl.
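The weekday rule can be illustrated with a short Ruby sketch. This is not the code in setupstats.sh, which is a shell script. It assumes that the hardware UUID is read as one large hexadecimal number before taking it modulo 7, and the helper names submission_weekday and calendar_interval are made up for this example. Weekday, Hour and Minute are the keys launchd accepts in a StartCalendarInterval dictionary.

```ruby
# Illustrative sketch only; the real logic lives in the shell script setupstats.sh.
# Assumption: the hardware UUID is interpreted as a single hexadecimal number.
def submission_weekday(hardware_uuid)
  hex = hardware_uuid.delete('-')
  raise ArgumentError, 'not a hexadecimal UUID' unless hex.match?(/\A\h{32}\z/)
  hex.to_i(16) % 7 # 0..6; launchd treats 0 (and 7) as Sunday
end

# Builds the values for a launchd StartCalendarInterval dictionary.
# Assumption: hour and minute are taken from the moment this sketch runs.
def calendar_interval(hardware_uuid, now = Time.now)
  {
    'Weekday' => submission_weekday(hardware_uuid),
    'Hour'    => now.hour,
    'Minute'  => now.min
  }
end

# Made-up UUID, used only to exercise the helper.
p calendar_interval('00000000-0000-0000-0000-000000000008')
```

Because the weekday depends only on the UUID, a given machine always submits on the same day of the week.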

submitstats.sh

This script has two responsibilities:

  1. Check if a user is participating.
  2. Submit data only if the user is participating.

It takes one parameter, the path to macports.conf.

To determine if a user is participating it checks if the variable stats_participate is set to yes. If it is, then port stats submit is executed. If the user is not participating then the script exits.

The reason this script exists is to have a lightweight tool to check if a user is participating before running port. This script will be executed once a week for every user, regardless of whether or not they are participating.
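The participation check can be sketched as follows. This is a Ruby illustration, not the shell code of submitstats.sh. It assumes that macports.conf holds whitespace-separated key and value pairs with # starting a comment, and that the first stats_participate line decides the result. The method name participating? is made up for this example.

```ruby
# Illustrative sketch only; submitstats.sh itself is a shell script.
# Returns true only when the configuration sets stats_participate to "yes".
# Assumption: the first stats_participate line found decides the result.
def participating?(conf_text)
  conf_text.each_line do |line|
    line = line.sub(/#.*/, '').strip # drop comments and surrounding whitespace
    key, value = line.split(/\s+/, 2)
    return value == 'yes' if key == 'stats_participate'
  end
  false # variable not set: treat the user as not participating
end

# Possible use, mirroring the behaviour described above:
#   conf = File.read(ARGV[0])
#   exec('port', 'stats', 'submit') if participating?(conf)
```

Keeping the check separate from port means a non-participating user only pays the cost of reading one small file each week.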

Configuration

Added several variables to macports.conf.in and appropriate descriptions to macports.conf.5.

stats_participate

This indicates whether or not a user has chosen to opt in and share their data. Its value is either yes or no.

stats_url

This is the URL to which data should be submitted.

stats_id

This is the UUID used for submissions. It is initially set to the value of the autoconf variable @STATS_UUID@.

Changes to macports1.0/macports.tcl

New Globals

Added globals stats_participate, stats_url and stats_id, which correspond to the configuration options. Added deferred global gccversion.

gcc version check

Added proc setgccinfo that is called the first time gccversion is read.

Changes to pextlib1.0/curl.c - CurlPostCmd()

Added CurlPostCmd function. This takes two Tcl parameters: the POST data and the URL.

Example usage:

curl post "project=macports" $url

The port stats action

port stats gathers lists of all active and inactive ports as well as relevant system information. If no subaction is given, port stats prints the system information to stdout.

If the submit subaction is given then it will encode all the collected data as a JSON object. It then submits this via HTTP POST to a server specified in macports.conf.

JSON encoding is done through sub-procedures contained inside the procedure for the port stats action.

Changes to port/port-help.tcl

Added help entry for the port stats action describing proper usage.

Data Format

Transmitted data is encoded as a JSON object with four fields.

{
      "id": "...",
      "os": { ...
      },
      "active_ports": [
          {...},
          ...
          {...}
      ],
      "inactive_ports": [
          {...},
          ...
          {...}
      ] 
}
  1. id

This is a string containing the user’s UUID.

  2. os

This is a JSON object containing information about the user’s system.

 "os": {
             "macports_version": "1.9.99",
             "osx_version": "10.6",
             "os_arch": "i386",
             "os_platform": "darwin",
             "build_arch": "x86_64",
             "gcc_version": "4.2.1",
             "xcode_version": "4.0"
}
  3. active_ports

This is an array of JSON objects. Each object represents a single port.

"active_ports": [
             {
                 "name": "aalib",
                 "version": "1.4rc5_4"
             },
             {
                 "variants": "nonls +",
                 "name": "aspell",
                 "version": "0.60.6_4"
             } 
]
  4. inactive_ports

This is the same as active ports except that port objects represent installed inactive ports.
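As a rough illustration of the format, the following Ruby sketch builds an object with the four fields and form-encodes it under a data field, the POST variable named in the Submissions section. The real client does this in Tcl inside the port stats action. The helper names build_submission and form_body and the all-zero UUID are made up for this example.

```ruby
require 'json'
require 'uri'

# Illustrative sketch only; the real encoding is done in Tcl by port stats.
# Builds the four-field JSON object described above.
def build_submission(id, os, active_ports, inactive_ports)
  JSON.generate(
    'id'             => id,
    'os'             => os,
    'active_ports'   => active_ports,
    'inactive_ports' => inactive_ports
  )
end

# Form-encodes the JSON text under the "data" POST variable.
def form_body(json)
  URI.encode_www_form('data' => json)
end

json = build_submission(
  '00000000-0000-0000-0000-000000000000', # placeholder UUID, not a real id
  { 'macports_version' => '1.9.99', 'osx_version' => '10.6' },
  [{ 'name' => 'aalib', 'version' => '1.4rc5_4' }],
  []
)
puts form_body(json)
```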

Server Side - Ruby on Rails

Database Schema

Categories table

Imported from MPWA

create_table "categories", :force => true do |t|
    t.string   "name"
    t.datetime "created_at"
    t.datetime "updated_at"
end

Relationships and Validations

has_many :ports
validates_presence_of :name

Changes from MPWA

  • Validate presence of name

Ports table

Imported from MPWA

create_table "ports", :force => true do |t|
    t.string   "name"
    t.string   "path"
    t.string   "version"
    t.text     "description"
    t.string   "licenses"
    t.integer  "category_id"
    t.text     "variants"
    t.string   "maintainers"
    t.string   "platforms"
    t.string   "categories"
    t.datetime "created_at"
    t.datetime "updated_at"
end

add_index "ports", ["name"], :name => "index_ports_on_name"

Relationships and Validations

  has_one :category
  belongs_to :category
  has_many :installed_ports
  validates_presence_of :name, :version

Changes from MPWA

  • Changed variants column to text type from string
  • Added index on name column
  • Validate presence of name and version
  • has_many installed ports

installed_ports table

The installed_ports table holds submitted port installation data. It keeps track of an installed port's version and variants as well as the id of the submitting user.

create_table "installed_ports", :force => true do |t|
    t.integer  "port_id"
    t.string   "version"
    t.text     "variants"
    t.datetime "created_at"
    t.datetime "updated_at"
    t.integer  "user_id"
end

add_index "installed_ports", ["port_id"], :name => "index_installed_ports_on_port_id"
add_index "installed_ports", ["user_id"], :name => "index_installed_ports_on_user_id"

Relationships and Validations

belongs_to :port
has_one    :user
  
validates_presence_of :user_id, :port_id, :version

os_statistics table

The os_statistics table holds information about a user's system.

create_table "os_statistics", :force => true do |t|
    t.datetime "created_at"
    t.datetime "updated_at"
    t.string   "macports_version"
    t.string   "osx_version"
    t.string   "os_arch"
    t.string   "os_platform"
    t.string   "build_arch"
    t.string   "xcode_version"
    t.string   "gcc_version"
    t.integer  "user_id"
end

add_index "os_statistics", ["user_id"], :name => "index_os_statistics_on_user_id"

Relationships and Validations

belongs_to :port
has_one    :user

validates_presence_of :user_id, :port_id, :version

users table

The users table holds UUIDs for each user.

create_table "users", :force => true do |t|
    t.string   "uuid"
    t.datetime "created_at"
    t.datetime "updated_at"
end

Relationships and Validations

has_one  :os_statistic
has_many :installed_ports

Submissions

JSON encoded submissions are sent via HTTPS POST to the /submissions page. All data is stored in the data POST variable.

Submissions are stored on a month by month basis. Resubmissions in a given month cause that month's data to be updated.

Storing data happens as follows

  1. Attempt to find a user with the given UUID in the database. If no user is found then add a new entry
  2. Attempt to find an entry in the os_statistics table for this user that was created this month. If no such entry is found then add an entry for this month. If an entry is found then update it.
  3. For each submitted port verify that it is a valid port by checking to see if it exists in the ports table. If it does not exist then skip it. If it does exist then attempt to find an entry for the given user that was created this month. If an entry was found then update it, otherwise create a new entry.
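The three steps above can be sketched with plain Ruby hashes standing in for the database tables. This is not the Rails controller code. The class name StatsStore is made up, and for brevity the sketch assumes that only the active_ports array is stored.

```ruby
require 'date'

# Illustrative sketch only: hashes stand in for the users, os_statistics,
# installed_ports and ports tables. StatsStore is a hypothetical name.
class StatsStore
  attr_reader :users, :os_statistics, :installed_ports

  def initialize(known_ports)
    @known_ports = known_ports # port names present in the ports table
    @users = {}                # uuid => user id
    @os_statistics = {}        # [user_id, year, month] => os hash
    @installed_ports = {}      # [user_id, port name, year, month] => port hash
  end

  def store(submission, today = Date.today)
    month = [today.year, today.month]

    # Step 1: find the user by UUID, or add a new entry.
    user_id = (@users[submission['id']] ||= @users.size + 1)

    # Step 2: create this month's os_statistics entry, or update it.
    @os_statistics[[user_id, *month]] = submission['os']

    # Step 3: skip unknown ports; create or update this month's entry otherwise.
    # Assumption: only active_ports are stored in this sketch.
    submission['active_ports'].each do |port|
      next unless @known_ports.include?(port['name'])
      @installed_ports[[user_id, port['name'], *month]] = port
    end

    user_id
  end
end
```

Keying each entry by user and month is what makes a resubmission within the same month an update rather than a second record.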

OS Statistics Page

The OS statistics page provides visualizations of the data in the os_statistics table. It shows a pie chart for each of the following:

  • MacPorts Version
  • OS X Versions
  • OS Arch
  • OS Platform
  • Build Arch
  • gcc Versions
  • Xcode Versions

These pie charts show the percentage of the user population running different versions (or arch / platform) in each category.
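The percentages behind such a pie chart can be computed as in the following Ruby sketch. It is not the application's charting code. The method name percentages is made up, and the input is assumed to be one value per user, taken from a single os_statistics column.

```ruby
# Illustrative sketch only. Takes one value per user (for example every
# osx_version in os_statistics) and returns each value's share in percent.
def percentages(values)
  counts = values.each_with_object(Hash.new(0)) { |v, acc| acc[v] += 1 }
  total = values.size.to_f
  counts.map { |value, n| [value, (100.0 * n / total).round(1)] }.to_h
end

p percentages(%w[10.6 10.6 10.7 10.5])
```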

Port Page

Every port in the MacPorts repository has an associated port page. This page displays basic information about the port such as

  • Name
  • Current version
  • Licenses
  • Categories
  • Variants

This page also shows visualizations of the data in the installed_ports table for this particular port.

It has the following:

  • Line chart of installation counts over the past 12 months.
  • Top versions over the past 12 months. This finds the top 5 most popular versions in use right now and tracks how their popularity has changed over the past 12 months. Popularity is measured by the number of installations of each version per month.
  • Pie chart of all versions. This shows the distribution of all different versions in use right now. It will show you that x% of users of this port are using version y.
  • Similarly to all versions, there is a pie chart of all variants in use.
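The "top versions" chart can be approximated by the Ruby sketch below. It is not the application's query code. The method name top_version_trends is made up, each record is assumed to be a hash with :version and :month keys, and ties are broken by version string.

```ruby
# Illustrative sketch only. records: hashes with :version and :month keys,
# one per installed_ports row for a single port.
# Returns { version => [install count for each month in months] } for the
# versions that are most popular in current_month.
def top_version_trends(records, current_month, months, limit = 5)
  counts = Hash.new(0)
  records.each { |r| counts[[r[:version], r[:month]]] += 1 }

  # Rank versions by installs in the current month (ties broken by name).
  current = records.select { |r| r[:month] == current_month }
                   .group_by { |r| r[:version] }
  top = current.sort_by { |version, rows| [-rows.size, version] }
               .first(limit)
               .map(&:first)

  # Track each top version across all requested months.
  top.map { |v| [v, months.map { |m| counts[[v, m]] }] }.to_h
end
```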

Installed Ports Page

This page shows summary information for installed ports such as:

  • The number of participating users
  • The number of ports in the MacPorts repository
  • Average number of ports installed per user
  • Most popular port this month and the number of installs
  • Most popular port this year and the number of installs
  • A bar chart of the top 25 most installed ports along with their install counts
  • A table of the top 25 most installed ports along with their install counts
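Some of these figures can be derived as in the following Ruby sketch. It is not the application's code. The method name installed_ports_summary is made up, and each row is assumed to be a hash with :user_id and :port keys.

```ruby
# Illustrative sketch only. rows: hashes with :user_id and :port keys,
# one per installed_ports entry in the period of interest.
def installed_ports_summary(rows, top_n = 25)
  by_user = rows.group_by { |r| r[:user_id] }
  by_port = rows.group_by { |r| r[:port] }

  # Most installed ports first; ties broken by port name.
  top = by_port.map { |port, rs| [port, rs.size] }
               .sort_by { |port, count| [-count, port] }
               .first(top_n)

  {
    users: by_user.size,
    average_ports_per_user: by_user.empty? ? 0.0 : rows.size.to_f / by_user.size,
    top_ports: top
  }
end
```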

Home Page

The home page has links to all other pages as well as a search area for ports. It also displays a line chart of the number of participating users over the past 12 months.