Changes between Initial Version and Version 1 of MacPortsStatisticsGSoC2011


Timestamp:
Sep 24, 2011, 3:39:51 PM (13 years ago)
Author:
derek@…

= MacPorts Statistics - Google Summer of Code 2011 =

=== Derek Ingrouville - derek@macports.org ===

This page describes the implementation of the MacPorts Statistics project as part of Google Summer of Code 2011. It is based on the documentation in the svn repository available at
https://trac.macports.org/browser/branches/gsoc11-statistics/docs/implementation/impl.tex

The code shown here was written during GSoC 2011 and may no longer be current.
The latest version of the code is available at https://trac.macports.org/browser/branches/gsoc11-statistics/

= Client Side - MacPorts Base =

== Install ==

To submit data automatically at regular intervals, some small changes had to be made to the installation process. These changes include installing `submitstats.sh`, a script that handles data submissions; configuring `launchd` to run `submitstats.sh` regularly; and generating a unique identifier for the user submitting data.

=== Makefile.in ===

 * Install `submitstats.sh` to `$(DESTDIR)${datadir}/macports/`
 * Run `setupstats.sh`

=== configure.ac ===

Generate a universally unique identifier to identify this MacPorts installation. The UUID is generated by `uuidgen` and stored in the variable `STATS_UUID`.

=== Scripts ===

=== `setupstats.sh` ===

This script is responsible for generating and installing the file `org.macports.stats.plist`. This plist is used by `launchd` to run `submitstats.sh` regularly.

The script takes two arguments:

 1. The path to the script that `launchd` should execute
 2. The path to the MacPorts configuration file `macports.conf`

`launchd` will execute `submitstats.sh` once a week. The day of the week, hour, and minute are determined as follows:

'''Weekday:''' The day of the week is determined by the machine's hardware UUID modulo 7. This helps ensure that submissions are distributed roughly evenly throughout the week.

'''Hour:''' The hour at which `setupstats.sh` was executed.

'''Minute:''' The minute at which `setupstats.sh` was executed.

The plist is installed to `/Library/LaunchAgents/org.macports.stats.plist` and then loaded by `launchctl`.
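
The scheduling described above can be illustrated with a short Ruby sketch. It is not the shell code of `setupstats.sh`: the way the UUID is reduced to an integer, the `Label` value, and the example paths and UUID are all assumptions made for illustration. Only the `launchd` keys (`Label`, `ProgramArguments`, `StartCalendarInterval`) are standard.

```ruby
# Map a hardware UUID to a launchd weekday (0-6) by reading its hex digits
# as one integer and taking that modulo 7. How the real script turns the
# UUID into a number is an assumption.
def weekday_for(hardware_uuid)
  hardware_uuid.delete("-").to_i(16) % 7
end

# Render a launchd property list that runs the script once a week.
# The Label value is assumed from the plist file name.
def stats_plist(script_path, conf_path, weekday, hour, minute)
  <<~PLIST
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
        <key>Label</key>
        <string>org.macports.stats</string>
        <key>ProgramArguments</key>
        <array>
            <string>#{script_path}</string>
            <string>#{conf_path}</string>
        </array>
        <key>StartCalendarInterval</key>
        <dict>
            <key>Weekday</key>
            <integer>#{weekday}</integer>
            <key>Hour</key>
            <integer>#{hour}</integer>
            <key>Minute</key>
            <integer>#{minute}</integer>
        </dict>
    </dict>
    </plist>
  PLIST
end

# Example values only: a placeholder UUID and typical MacPorts paths.
now = Time.now
puts stats_plist("/opt/local/share/macports/submitstats.sh",
                 "/opt/local/etc/macports/macports.conf",
                 weekday_for("01234567-89AB-CDEF-0123-456789ABCDEF"),
                 now.hour, now.min)
```

Because the weekday depends only on the UUID, reinstalling on the same machine keeps the same submission day.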

=== `submitstats.sh` ===

This script has two responsibilities:

 1. Check whether a user is participating
 2. Submit data only if the user is participating

It takes one parameter, the path to `macports.conf`.

To determine whether a user is participating, it checks whether the variable `stats_participate` is set to `yes`. If it is, then `port stats submit` is executed. If the user is not participating, the script exits.

This script exists to provide a lightweight way to check whether a user is participating before running `port`. It will be executed once a week for every user, regardless of whether they are participating.
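
The participation check can be sketched in Ruby as follows. This is an illustration rather than the shell code of `submitstats.sh`: it assumes the whitespace-separated `key value` line format of `macports.conf` with `#` comments, and it assumes that the first `stats_participate` line decides the result. The method name is hypothetical.

```ruby
# Return true when the configuration text sets stats_participate to "yes".
# Assumes "key value" lines and "#" comments; the first matching line wins.
def participating?(conf_text)
  conf_text.each_line do |line|
    key, value = line.sub(/#.*/, "").split
    return value == "yes" if key == "stats_participate"
  end
  false
end

sample_conf = <<~CONF
  # Example configuration text, not a complete macports.conf
  prefix /opt/local
  stats_participate yes
CONF

puts participating?(sample_conf) ? "would run: port stats submit" : "exit without submitting"
```

Keeping the check this small is the point of the script: the comparatively heavy `port` command is started only for users who opted in.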

== Configuration ==

Several variables were added to `macports.conf.in`, with corresponding descriptions in `macports.conf.5`.

=== `stats_participate` ===

This indicates whether a user has chosen to opt in and share their data. Its value is either `yes` or `no`.

=== `stats_url` ===

This is the URL to which data should be submitted.

=== `stats_id` ===

This is the UUID used for submissions. It is initially set to the value of the autoconf variable `@STATS_UUID@`.

== Changes to macports1.0/macports.tcl ==

=== New Globals ===

Added globals `stats_participate`, `stats_url`, and `stats_id`, which correspond to the configuration options. Added deferred global `gccversion`.

=== `gcc` version check ===

Added proc `setgccinfo`, which is called the first time `gccversion` is read.

== Changes to pextlib1.0/curl.c - CurlPostCmd() ==

Added the `CurlPostCmd` function. It takes two Tcl parameters: the POST data and the URL.

Example usage:

`curl post "project=macports" $url`

== The `port stats` action ==

`port stats` gathers lists of all active and inactive ports as well as relevant system information. If no subaction is given, `port stats` prints the system information to `stdout`.

If the `submit` subaction is given, it encodes all the collected data as a JSON object and submits it via HTTP POST to the server specified in `macports.conf`.

JSON encoding is done through sub-procedures contained inside the procedure for the `port stats` action.
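
The real client is written in Tcl and posts through `curl post`. Purely as an illustration of the request shape, the Ruby sketch below builds, without sending, a form-encoded POST that carries the JSON document in the `data` variable read by the server. The endpoint URL, the placeholder UUID, and the method name are hypothetical.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical endpoint; the real value comes from stats_url in macports.conf.
STATS_URL = "https://stats.example.org/submissions"

# Build (but do not send) a form-encoded POST request that carries the
# JSON-encoded payload in a single "data" variable.
def build_submission_request(url, payload)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri)
  request.set_form_data("data" => JSON.generate(payload))
  request
end

# Placeholder payload with the four top-level fields of a submission.
payload = {
  "id" => "00000000-0000-0000-0000-000000000000",
  "os" => { "os_platform" => "darwin" },
  "active_ports" => [],
  "inactive_ports" => []
}

request = build_submission_request(STATS_URL, payload)
puts request.body
```

Sending the request would additionally require an `Net::HTTP` connection to the host; that step is omitted so the sketch runs without network access.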

== Changes to port/port-help.tcl ==

Added a help entry for the `port stats` action describing proper usage.

= Data Format =

Transmitted data is encoded as a JSON object with four fields.

{{{
{
    "id": "...",
    "os": { ...
    },
    "active_ports": [
        {...},
        ...
        {...}
    ],
    "inactive_ports": [
        {...},
        ...
        {...}
    ]
}
}}}

 1. id
This is a string containing the user’s UUID.

 2. os
This is a JSON object containing information about the user’s system.

{{{
"os": {
    "macports_version": "1.9.99",
    "osx_version": "10.6",
    "os_arch": "i386",
    "os_platform": "darwin",
    "build_arch": "x86_64",
    "gcc_version": "4.2.1",
    "xcode_version": "4.0"
}
}}}

 3. active_ports
This is an array of JSON objects. Each object represents a single port.

{{{
"active_ports": [
    {
        "name": "aalib",
        "version": "1.4rc5_4"
    },
    {
        "variants": "nonls +",
        "name": "aspell",
        "version": "0.60.6_4"
    }
]
}}}

 4. inactive_ports
This is the same as active_ports, except that the port objects represent installed but inactive ports.
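
As a rough illustration of the format above, the following Ruby sketch assembles a submission object from plain lists. The helper names are hypothetical, and leaving out the `variants` key for ports without variants is an assumption based on the `aalib` entry in the sample; the real encoding is done in Tcl inside `port stats`.

```ruby
require "json"

# Convert one port record into the object shape shown above. The "variants"
# key is omitted when a port has none (an assumption drawn from the sample).
def port_entry(name, version, variants = nil)
  entry = { "name" => name, "version" => version }
  entry["variants"] = variants unless variants.nil? || variants.empty?
  entry
end

# Assemble the four top-level fields of a submission.
def build_payload(uuid, os_info, active, inactive)
  {
    "id" => uuid,
    "os" => os_info,
    "active_ports" => active.map { |port| port_entry(*port) },
    "inactive_ports" => inactive.map { |port| port_entry(*port) }
  }
end

# Values copied from the samples above; the UUID is a placeholder.
os_info = {
  "macports_version" => "1.9.99",
  "osx_version" => "10.6",
  "os_arch" => "i386",
  "os_platform" => "darwin",
  "build_arch" => "x86_64",
  "gcc_version" => "4.2.1",
  "xcode_version" => "4.0"
}
active = [["aalib", "1.4rc5_4"], ["aspell", "0.60.6_4", "nonls +"]]

payload = build_payload("00000000-0000-0000-0000-000000000000", os_info, active, [])
puts JSON.pretty_generate(payload)
```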

= Server Side - Ruby on Rails =

== Database Schema ==

=== Categories table ===

Imported from MPWA

{{{
create_table "categories", :force => true do |t|
    t.string   "name"
    t.datetime "created_at"
    t.datetime "updated_at"
end
}}}

'''Relationships and Validations'''

{{{
has_many :ports
validates_presence_of :name
}}}

'''Changes from MPWA'''

 * Validate presence of name

=== Ports table ===

Imported from MPWA

{{{
create_table "ports", :force => true do |t|
    t.string   "name"
    t.string   "path"
    t.string   "version"
    t.text     "description"
    t.string   "licenses"
    t.integer  "category_id"
    t.text     "variants"
    t.string   "maintainers"
    t.string   "platforms"
    t.string   "categories"
    t.datetime "created_at"
    t.datetime "updated_at"
end

add_index "ports", ["name"], :name => "index_ports_on_name"
}}}

'''Relationships and Validations'''

{{{
has_one :category
belongs_to :category
has_many :installed_ports
validates_presence_of :name, :version
}}}

'''Changes from MPWA'''

 * Changed the `variants` column from string type to text type
 * Added an index on the `name` column
 * Validate presence of `name` and `version`
 * `has_many :installed_ports`

=== installed_ports table ===

The `installed_ports` table holds submitted port installation data. It keeps track of an installed port's version and variants as well as the id of the submitting user.

{{{
create_table "installed_ports", :force => true do |t|
    t.integer  "port_id"
    t.string   "version"
    t.text     "variants"
    t.datetime "created_at"
    t.datetime "updated_at"
    t.integer  "user_id"
end

add_index "installed_ports", ["port_id"], :name => "index_installed_ports_on_port_id"
add_index "installed_ports", ["user_id"], :name => "index_installed_ports_on_user_id"
}}}

'''Relationships and Validations'''

{{{
belongs_to :port
has_one    :user

validates_presence_of :user_id, :port_id, :version
}}}

=== os_statistics table ===

The `os_statistics` table holds information about a user's system.

{{{
create_table "os_statistics", :force => true do |t|
    t.datetime "created_at"
    t.datetime "updated_at"
    t.string   "macports_version"
    t.string   "osx_version"
    t.string   "os_arch"
    t.string   "os_platform"
    t.string   "build_arch"
    t.string   "xcode_version"
    t.string   "gcc_version"
    t.integer  "user_id"
end

add_index "os_statistics", ["user_id"], :name => "index_os_statistics_on_user_id"
}}}

'''Relationships and Validations'''

{{{
belongs_to :port
has_one    :user

validates_presence_of :user_id, :port_id, :version
}}}

=== users table ===

The `users` table holds the UUID of each user.

{{{
create_table "users", :force => true do |t|
    t.string   "uuid"
    t.datetime "created_at"
    t.datetime "updated_at"
end
}}}

'''Relationships and Validations'''

{{{
has_one  :os_statistic
has_many :installed_ports
}}}

== Submissions ==

JSON-encoded submissions are sent via HTTPS POST to the `/submissions` page. All data is stored in the `data` POST variable.

Submissions are stored on a month-by-month basis. Resubmissions in a given month cause that month's data to be updated.

Data is stored as follows:

 1. Attempt to find a user with the given UUID in the database. If no user is found, add a new entry.
 2. Attempt to find an entry in the `os_statistics` table for this user that was created this month. If no such entry is found, add an entry for this month. If an entry is found, update it.
 3. For each submitted port, verify that it is a valid port by checking whether it exists in the `ports` table. If it does not exist, skip it. If it does exist, attempt to find an entry for the given user that was created this month. If an entry is found, update it; otherwise create a new entry.
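
The three steps can be sketched with plain Ruby hashes standing in for the database tables. This is a simplified illustration, not the Rails controller: the class name, method name, and key layout are hypothetical, and for brevity only `active_ports` is walked.

```ruby
require "date"

# In-memory stand-in for the users, os_statistics, and installed_ports tables.
class SubmissionStore
  attr_reader :users, :os_statistics, :installed_ports

  def initialize(known_ports)
    @known_ports = known_ports   # port names present in the ports table
    @users = {}                  # uuid => user id
    @os_statistics = {}          # [user id, year, month] => os object
    @installed_ports = {}        # [user id, port name, year, month] => port object
  end

  def store(submission, today = Date.today)
    # Step 1: find the user by UUID, or add a new entry.
    user_id = (@users[submission["id"]] ||= @users.size + 1)
    month = [today.year, today.month]

    # Step 2: create this month's OS entry, or update it if it exists.
    @os_statistics[[user_id, *month]] = submission["os"]

    # Step 3: skip ports missing from the ports table; otherwise create or
    # update this month's entry for this user and port.
    submission["active_ports"].each do |port|
      next unless @known_ports.include?(port["name"])
      @installed_ports[[user_id, port["name"], *month]] = port
    end
    user_id
  end
end

store = SubmissionStore.new(["aalib", "aspell"])
submission = {
  "id" => "00000000-0000-0000-0000-000000000000",   # placeholder UUID
  "os" => { "osx_version" => "10.6" },
  "active_ports" => [
    { "name" => "aalib", "version" => "1.4rc5_4" },
    { "name" => "not-in-ports-table", "version" => "1.0" }
  ]
}
store.store(submission, Date.new(2011, 9, 1))
store.store(submission, Date.new(2011, 9, 20))   # same month: entries are updated
store.store(submission, Date.new(2011, 10, 2))   # new month: new entries are added
puts store.installed_ports.size
```

Keying each entry by user and month is what makes a resubmission within the same month an update rather than a duplicate.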

== OS Statistics Page ==

The OS statistics page provides visualizations of the data in the `os_statistics` table. It shows a pie chart for each of the following:

 * MacPorts Version
 * OSX Versions
 * OS Arch
 * OS Platform
 * Build Arch
 * gcc Versions
 * XCode Versions

These pie charts show the percentage of the user population running each version (or arch / platform) in each category.
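
The percentages behind one such chart can be computed as in the following Ruby sketch. The method name and the sample values are made up for illustration; the Rails application may aggregate differently, for example directly in SQL.

```ruby
# Turn a list of reported values (one per user) into percentage shares,
# one pie slice per distinct value.
def percentage_breakdown(values)
  counts = Hash.new(0)
  values.each { |value| counts[value] += 1 }
  total = values.size.to_f
  counts.transform_values { |count| (100.0 * count / total).round(1) }
end

# Made-up osx_version values for four users.
shares = percentage_breakdown(["10.6", "10.6", "10.7", "10.5"])
shares.each { |version, percent| puts "#{version}: #{percent}%" }
```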

== Port Page ==

Every port in the MacPorts repository has an associated port page. This page displays basic information about the port, such as:

 * Name
 * Current version
 * Licenses
 * Categories
 * Variants

This page also shows the following visualizations of the data in the `installed_ports` table for this particular port:

 * Line chart of installation counts over the past 12 months.
 * Top versions over the past 12 months. This finds the 5 most popular versions currently in use and tracks how their popularity has changed over the past 12 months. Popularity is measured by the number of installations of each version per month.
 * Pie chart of all versions. This shows the distribution of all versions currently in use, for example that `x%` of users of this port are using version `y`.
 * Pie chart of all variants in use, analogous to the chart of all versions.
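
The "top versions" chart can be approximated by the Ruby sketch below. It assumes that each `installed_ports` row has been reduced to a version and a `YYYY-MM` month string, and that ties are broken by version name. Both choices, like the names used, are illustrative rather than taken from the application.

```ruby
# One row of installed_ports data for a single port: the reported version
# and the month ("YYYY-MM") in which the row was created.
Install = Struct.new(:version, :month)

# Pick the `limit` most installed versions in `current_month`, then count
# their installations for every month in `months`.
def top_version_trends(installs, months, current_month, limit = 5)
  current = Hash.new(0)
  installs.each { |i| current[i.version] += 1 if i.month == current_month }
  top = current.sort_by { |version, count| [-count, version] }.first(limit).map(&:first)

  top.each_with_object({}) do |version, trends|
    trends[version] = months.map do |month|
      installs.count { |i| i.version == version && i.month == month }
    end
  end
end

# Made-up data covering two months.
installs = [
  Install.new("1.0", "2011-08"), Install.new("0.9", "2011-08"),
  Install.new("1.0", "2011-09"), Install.new("1.1", "2011-09"),
  Install.new("1.1", "2011-09")
]
trends = top_version_trends(installs, ["2011-08", "2011-09"], "2011-09")
trends.each { |version, counts| puts "#{version}: #{counts.join(', ')}" }
```

Each resulting series has one count per month, which maps directly onto one line of the chart.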

== Installed Ports Page ==

This page shows summary information for installed ports, such as:

 * The number of participating users
 * The number of ports in the MacPorts repository
 * The average number of ports installed per user
 * The most popular port this month and its number of installs
 * The most popular port this year and its number of installs
 * A bar chart of the 25 most installed ports along with their install counts
 * A table of the 25 most installed ports along with their install counts
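
Several of these figures can be derived from the same pass over the data, as in this Ruby sketch. It takes `[user_id, port_name]` pairs as input; the method name, the result keys, and the alphabetical tie-break are assumptions for illustration.

```ruby
# rows: one [user_id, port_name] pair per installed port.
# Returns the user count, the average number of ports per user, the most
# installed port, and the `top_n` most installed ports with their counts.
def installed_ports_summary(rows, top_n = 25)
  users = rows.map(&:first).uniq
  counts = Hash.new(0)
  rows.each { |_user, port| counts[port] += 1 }
  top = counts.sort_by { |port, count| [-count, port] }.first(top_n)
  {
    users: users.size,
    average_per_user: users.empty? ? 0.0 : rows.size.to_f / users.size,
    most_popular: top.first,
    top_ports: top
  }
end

# Made-up installation data for three users.
rows = [
  [1, "zlib"], [1, "aalib"],
  [2, "zlib"],
  [3, "zlib"], [3, "aspell"], [3, "aalib"]
]
summary = installed_ports_summary(rows)
puts "users: #{summary[:users]}, average ports per user: #{summary[:average_per_user]}"
summary[:top_ports].each { |port, count| puts "#{port}: #{count}" }
```

Restricting `rows` to the current month or the current year before calling the method yields the two "most popular port" figures.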
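
== Home Page ==

The home page has links to all other pages as well as a search area for ports. It also displays a line chart of the number of participating users over the past 12 months.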
     374The home has links to all other pages as well as a search area for ports. It also displays a line chart of the number of participating users over the past 12 months.