This is my approach to getting more (and hopefully better) metrics about the Ambassadors Project with my poor Python programming skills. During March I added the new ambassadors to the statistics page by hand. That's not the way we want to go.
Francesco Crippa wrote a Python script to count the members of the Fedora Ambassadors Project. His idea was to collect the diffs of the Country List page. Unfortunately this page is updated manually and is therefore not very reliable. IMHO the better way is to get the data from the individual category pages (for example CategoryAmbassadorsFrance). The personal wiki pages of the ambassadors are often more up to date because Thomas Chung adds the category after verification.
The script is quite simple and, at the moment, just a working prototype. It starts from a list of countries, fetches all links (including the wiki names) from the matching category pages, filters and parses them with a regex, and prints a number. This number is the count of all ambassadors in the selected countries.
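import sys, urllib, os, re
from BeautifulSoup import BeautifulSoup

def countryList(countries):
    """This function will get the names from Fedora Project category pages"""
    urlList = []
    for country in countries:
        URL = urllib.urlopen('http://fedoraproject.org/wiki/CategoryAmbassadors' + country)
        soup = BeautifulSoup(URL)
        # Collect every <a> tag of the category page as a string
        for tag in soup("a"):
            attrs = dict(tag.attrs)
            links = str(tag)
            urlList.append(links)
    return urlList

def cleanList(rawHTML):
    """Keep only the links that carry the category search marker"""
    result = []
    for line in rawHTML:
        if "%28CategoryCategory%5Cb%29" in line:
            result.append(line)
    return result

def get_names(cleanHTML):
    """Extracting the names of the ambassadors."""
    re1 = '.*?'                  # Non-greedy match on filler
    re2 = '(?:[a-z][a-z]+)'      # Uninteresting word, skipped
    re3 = '((?:[a-z][a-z]+))'    # Third word, captured as the wiki name
    # Compile once, outside the loop
    rg = re.compile(re1 + re2 + re1 + re2 + re1 + re3, re.IGNORECASE | re.DOTALL)
    result = []
    for line in cleanHTML:
        A_name_temp = rg.search(line)
        if A_name_temp:          # search() returns None if nothing matches
            A_name = A_name_temp.group(1)
            result.append(A_name)
    return result

def doubleNames(all_names):
    """Remove double entries...Gerold for example ;-)"""
    result = []
    for i in all_names:
        if i not in result:
            result.append(i)
    return result

def count(names):
    """Get the number of ambassadors"""
    length = len(names)
    print(length)

def main():
    # Countries in EMEA, only a few for testing (56 ambassadors in these 5 countries).
    # Use the names as they appear in the category page titles, e.g. 'France'.
    countries = []
    countryHTML = countryList(countries)
    cleanHTML = cleanList(countryHTML)
    all_names = get_names(cleanHTML)
    names = doubleNames(all_names)
    # If you want to know the names of the ambassadors in the region you selected
    # print(names)
    count(names)

if __name__ == "__main__":
    main()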
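To make the filter, extract and de-duplicate steps easier to try without hitting the wiki, here is a minimal sketch for Python 3 that uses only the standard library. It reuses the marker string and the regex from the script. The sample links, the names in them (JaneDoe, JohnSmith) and the helper extract_names are made up for illustration, and the real markup on the wiki may look different.

```python
import re

# Same pattern as in the script: skip two alphabetic words, capture the third.
NAME_RE = re.compile(
    r'.*?(?:[a-z][a-z]+).*?(?:[a-z][a-z]+).*?((?:[a-z][a-z]+))',
    re.IGNORECASE | re.DOTALL,
)
MARKER = "%28CategoryCategory%5Cb%29"


def extract_names(links):
    """Return the unique wiki names from the marked links, in first-seen order."""
    names = []
    for link in links:
        if MARKER not in link:
            continue  # skip links without the category marker
        match = NAME_RE.search(link)
        if match:  # search() returns None when nothing matches
            names.append(match.group(1))
    # dict.fromkeys keeps insertion order (Python 3.7+) and drops repeats
    return list(dict.fromkeys(names))


# Hypothetical sample links (assumption: the wiki name is the third word).
SAMPLE_LINKS = [
    '<a href="/wiki/JaneDoe?highlight=%28CategoryCategory%5Cb%29">JaneDoe</a>',
    '<a href="/wiki/JohnSmith?highlight=%28CategoryCategory%5Cb%29">JohnSmith</a>',
    '<a href="/wiki/JaneDoe?highlight=%28CategoryCategory%5Cb%29">JaneDoe</a>',
    '<a href="/wiki/FrontPage">FrontPage</a>',
]

names = extract_names(SAMPLE_LINKS)
print(len(names), names)
```

Here dict.fromkeys does the same job as the loop that removes double entries, and the check on the match object avoids a crash when a link does not fit the pattern.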
If you are a Python pro, feel free to tell me what I can do better in this script. Suggestions of any kind are welcome.
Maybe I will expand this script. I would like to have a log function (writing the data to a file with time and date), selection of the area through user input, and some other small things like error handling and a test of whether the script got all the data from the Fedora wiki. At the moment, with MoinMoin, it's a pain: it takes more than one run to get all the data because of 502 Proxy Errors. Sooner or later everything will be fine with MediaWiki.
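The retry and logging ideas could look roughly like this. It is only a sketch for Python 3 and not part of the script: fetch_with_retry, log_count, the number of attempts, the delay and the tab-separated log format are all assumptions of mine.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime


def fetch_with_retry(url, attempts=3, delay=2.0, opener=urllib.request.urlopen):
    """Fetch url, retrying when the server answers with a 5xx error such as 502."""
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Give up on client errors (4xx) and after the last attempt.
            if err.code < 500 or attempt == attempts:
                raise
            time.sleep(delay)  # wait a moment before asking the wiki again


def log_count(path, region, count, now=None):
    """Append one line with timestamp, region and count to the log file."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M")
    with open(path, "a", encoding="utf-8") as log:
        log.write("%s\t%s\t%d\n" % (stamp, region, count))
```

A run would then call fetch_with_retry once per category page and finish with something like log_count("ambassadors.log", "EMEA", len(names)), so a single proxy hiccup no longer means starting over.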