EuroPython 2012
Guido van Rossum
Write unit tests - write integration tests.
Keep being clever. Don’t think your tools will solve all the problems. If you want to be naive, then delegate software development to someone else, and go and make money or something!
Almost all functionality on YouTube is written in python (apart from the codecs). python clearly is fast enough for real work.
Functional programming - interesting computer science stuff - not useful very often when integrating third party packages in the real world.
Questions and Answers
assert
, why isn’t it used?
It is used by many people. Test your code with
-o
(what is that)? Even if-o
did not exist, then GVM would still say thatassert
is a valid part of the language. Clever because the code following is not evaluated unless the assertion isTrue
.Misguided to use the
assert
statement for unit tests, because-o
would disable all your tests.
Alex Martelli
Permission or Forgiveness
Sometimes the best answer, is neither!
The only way Grace Hopper could innovate so much in a bureaucratic US Navy was by doing it first and asking for forgiveness later. See Why does it tend to work in the slides (funny, but true):
if isinstance(x, numbers.Number):
… is an idiomatic, well-supported way to perform typeclass checking, but don’t forget, not ALL problems are nails.
Even better, when applicable, use defaults:
d.get(key, 'default')
Concurrency - perform desired operations within a transaction. Databases have been doing this for a while.
Favourite example, is source code control. Change local copy, commit, detect conflicts, forces reconciliation.
Also see, Launching a Product, very interesting to compare and think about the different strategies.
Mandate a code review. Pair programming is not a good alternative. People who were not actively involved in writing the code will spot bits of code which are unclear. Unpopular with young hot heads. Privacy and security cannot be an afterthought! Architectural, privacy and security reviews.
The benefits of paranoia: skills that make good programmers aren’t suitable for human interaction.
Febo Cincotti
Let your brain talk to computers
Brain, computer interfaces (BCIs).
PyPy - Current status and GIL-less future
You don’t want to program using threads.
Lightning Talks
Transifex
Translate into many different languages.
How I became a bitterman
Joined Canonical UbuntuOne - Port it to Windows - based on Twisted. Or, why I hate Windows paths (some funny examples).
os.normpath
doesn’t work,os.path.split
doesn’t work,os.listdir
doesn’t work!ShFileOperation
,ShellLink
(from Microsoft) doesn’t work!!
eGenix, PyRun
One file python runtime environment in a single, very tiny file. Install without any side-effects. No installer, just copy the file. Independant of the system python installation, 12MB, etc, etc…
Can be used for commercial projects. linux, Mac (Windows support might follow).
ScraperWiki
A tool for analysing data online and making collaborations.
Tell us how you would do it.
SQLAlchemy and Testing
github.com/riffm/testalchemy
Most reliable way is to recreate the database schema for every test, but this is very slow. Wrapping each test in a transaction and rolling back after every test is fast.
See
Restorable
context manager in the source code.My guess would be that this module is not very useful.
PSF Members Meeting
All invited to the meeting at 6:30pm. You can become a member…
OpenTaste.eu
Even python developers must eat. The site is made from Flask :)
RedEddy
Technical computing in the cloud
Not WIMP, but mobile, tablets, OpenGL, touch, etc, etc.
Cross platform, Windows, Android, Linux, iOS. Applications have been accepted in the Apple App Store. All written using the GPU, OpenGL.
Check out the gallery, search on Google Play for
kivy
.Talk on Friday, to share experiences, OpenGL and python on Android/iOS.
http://brochure.getpython.info/
python software foundation, printed brochure is ready!
Sponsorship options (including education).
Psycopg
Wrapper, old fashioned C extension. PyPy has tried to do something different. Alex Gaynor tried to write a subset of Psycopg. Another person tried to port using
ctypes
. Now written in pure python using actypes
wrapper. People started using it - now we have some problems. Not yet working on python 3. Need somebody to take charge of pypy integration. Could end up dropping the original Psycopg.
http://readthedocs.org/projects/artichoke/
Web micro framework. One file (very clever presentation). Similar to TurboGears.
DocBook to Sphinx
Sphinx - 30% faster by not doing index generation. SOLR/Lucene does it better anyway.
Natural language processing and geocoding
There is not a perfect solution, but that doesn’t matter. The beginning is usually good enough. The first 70-80% is easy and fun. To get started, read the NLTK documentation.
Moin-Moin - Whoosh
Fast, pythonic, pure python search library. It is rather nice :) Can do highlighting and has a built in spell checker. It is a library, not a separate server or process - just import it and use it.
Dynamic fields are a nice feature - you don’t need to know the name of the field - just give a
glob
field name.Works with Google App Engine.
You can also time limit searches.
python Anywhere
PyCon UK
Alex, PyCon UK is not dead. Coventry TechnoCentre, early bird from Friday 6th July, Friday 28th September for the weekend.
Book tickets using EventBrite.
Building JavaScript Widgets
Fitocracy.com
Django application. Django application. We spend too much time sitting… but we do like playing games.
pyRserve
A network bridge from python to the statistics package R
Social Eating Revolution
Gian Luca Ranno, Gnammo
Technology - Django, already have 2360 users and more than 30 events (within one year).
Django (easy to learn for designers), Rabbit MQ, logging, social-auth, Django PayPal, Fabric, Supervisor
Thank you for creating these great plugins.
Uploading to PyPI
Pelle
All of us should upload stuff. It is very easy to do.
@peralmq
Large XML with Unicode and namespaces
Need to stream…
Some people use
codecs.open()
andcodecs.write()
, but this feels like desperation.Wrote our own
loxun
- pure python, only writes XML, streaming, raises an error for some errors. Try it!!
How not to write a micro-framework
Daniel Pope@lordmauve
Qubes - a secure client OS
Following #euro2012
NHS Hack Day 2012
22-23rd September in Liverpool
Django Bitcoin
django-bitcoin
Open source currency. Governments cannot control.
MoinMoin
Rewritten - now version 2 - look at it again!Jinja 2, Flask, https://bitbucket.org/jek/flatland/ for validating formsHTML5Supports RST
batou
Christian Theune
OpenStack and OpenShift
http://www.pixelbeat.org/talks/openstack_python/
Operational details of a large python project.
What is OpenStack?
IaaS (like Amazon AWS)Open Source2 years old.Mainly written in python (300k lines of python)
Public or Private
Sensitive logic or dataAmazon have partnered with Eucalyptus to offer a private cloud (March 2012)OpenStack aims to support public and private.
Who uses OpenStack?
RackspaceHPSupports multi-tenancy
Overview
Nova == EC2 (central service)Swift == S3Glance, VM image, registrationKeystone, identity and authHorizon, Admin UI (Django)Quantum, networkingVolume, EBS
Compute Service
Postgres or MySQLChoice of queue e.g. RabbitMQ
python Technologies
SQLAlchemy, Django, eventlet, paste, PasteDeploy, httplib2, webob, routes, python-cloudfiles, sqlalchemy-migrate etc, etc.
Project Packaging
Consume through distros. Difficult to install yourself!!6 monthly release cycle.
Development
Always on trunkReleases done to stable branchgitGerrit, patch review server (created by Google for Android). Looks nice :)Jenkins (requires alot of ongoing maintenance).
Commit process
git branch; git commit
./run_tests
, unit tests within avirtualenv
, nose used, pep8git-review
python tool, used to submit togerrit
Related python projects
OZ, ISO - image - glance - nova - libvirt - KVMHeat API, AWS cloud formation, provision apps in the cloud.cloud-init, package install etc, https://launchpad.net/cloud-init/
Similar non python projects
oVirt, Java, for private data centresEucalyptus, C, less general, closed editions, EC2 functionalityCloudStack, Java, parts of this are closedOpenNebula, C++ datacentres
Try it
fedoraproject.org/wiki/Getting_started_with_OpenStackYou can run this on a VM!!
OpenShift
PaaS, you code the application, you want to deploy it. You don’t want to care about the deploy stuff.
Is a free PaaS by Red Hat Hosted at openshift.com Based on Amazon EC2
python, java, node.js, php, perl, ruby
MySQL, PostgreSQL, mongoDB
Your app will still use the basic stuff
Open source project, tutorials, live CD, runs in a VM, Apache 2 license
Can be run on OpenStack
Run your own multi-node, multi-tenant PaaS using OpenShift, OpenStack and Fedora on your own hardware.
A cartridge adds resources to your application e.g. PostgreSQL or MongoDB.
Check out the django-example
on github
3 free apps on their hosted version.
juju - Service Orchestration and Deployment
jamespage
on IRC#juju
Written in python and Twisted. Coordinates service install onto Ubuntu servers. Does not replace Puppet or Chef e.g. juju can deploy a database and an application. Can also scale up and down horizontally. Abstracts you from your underlying infrastructure.
The local provider is for developers. Can deploy onto EC2, OpenStack or bare metal servers.
Charms
Can be written in any languageHave a well defined structure.Has configuration options to allow the application to be personalized.Hooks - install, start, relation related (join, change), upgradeStore - several charms built. By default, charms will come from here.
Demo
Install demo Django application - SummitCharm written by Mark Mimms, and uses Michael Nelson’s generic Django charmMajority of configuration is done using puppet.Start (after installing local provider):juju bootstrap
juju deploy postgresql
juju deploy memcached
juju deploy --config europython.yaml local:summit europython-summit
(Summit is not in the Charm store, which explains the previous command)juju add-relation europython-summit postgresql:db
juju add-relation europython-summit memcached
juju expose europython-summit
(service is made public when it is exposed)juju status europython-status
juju debug-log
(a log aggregation)juju debug-hooks
(uses atmux
session)PostgreSQL charm cannot currently scale…
OpenStack is probably the most complicated charm set because of the multitude of options. Ubuntu JuJu MAAS (metal as a service) project was started to allow installation of JUJU on bare metal servers. Takes about 11 minutes to install OpenStack onto 9 servers.
http://tinyurl.com/juju-at-scale Testing on EC2 with 2000 nodes of Hadoop.. Took 7 or 8 hours to provision!
A couple of third party projects, https://launchpad.net/juju-jitsu/ and https://launchpad.net/charm-tools
Switching provider is dead easy…
Questions
Check out the Summit and Evernote charms to see how to pull code from source control systems.
Switching provider is dead easy… Can use current version of Ubuntu server. On the client, will work with OS10 and Debian.
Be careful with your data - the current charms might not take care of it
No automation for intelligent scale up and scale down, but juju will replace nodes which disappear.
Proposal to support verification e.g. smoke test, at some time in the future.
Diving into Flask
A.Mishkovskyi
Switched away from PHP to python with Flask. Second largest social network in Netherlands.
Considered Django, Pyramid etc.
Simple start application. Bus factor of one (everything written by one person). Loads of code behind the simple looking starting app.
End up with complex routes, loads of parameters… Flask has the ability to do
things in many different ways e.g. method specific parameters. How does this
work. Explored class based views. Can use manual dispatching. Much easier to
use a decorator… Or use class based views with method names e.g. get
…
Check out the source code in the View
class.
Routes… Rule
creates regexp. Map
holds all rules. Converters
map to python code. Rules can match URLs and subdomain. Rule
objects
are stored in a Map
in sorted order.
Modular applications - easier to develop. Pluggable. Blueprints - needed API
versioning, url_prefix
, splitting admin and API endpoints, each blueprint
has it’s own template folder. Blueprints are a simple proxy object. Great
example for writing plugins.
Wanted to use a proper ORM. There is only one - SQLAlchemy… Not obvious how
Flask-SQLAlchemy
actually works. Code just helps with debugging. Bind
is the SQLAlchemy engine or pure connection object. Bare metal, so you really
have to know what you are doing. If you specify __bind__
it will do the
proper thing… See get_bind
in Flask-SQLAlchemy
. To achieve
master-slave support, db.session.using_bind('slave')...
(custom code)!
Could use the bind for sharding etc… Using SQLAlchemy-migrate
which is
very old and not being actively maintained. Had to write a wrapper to run
migrate
. Consider switching to Alembic, which is written by Mike Bayer.
Is very mature right now.
Deferring tasks. Can now use Celery in Flask. Removes the hassle of using
amqplib/pika. Documentation is confusing and misleading. Flask-Script
is
a requirement for Flask-Celery
. Most of the commands work!! Is
over-engineered in many ways. Celery colourizes logs - they don’t like
colours! Solution - add after_setup_logger
signal and reassign our own
formatters. Also, set CELERYD_HIJACK_ROOT_LOGGER
to False
, but this
caused more problems. Solution - do not use root logger!! Two years old, but
nothing has been fixed. To monitor celery, subclass Polaroid
… Celery +
SQLAlchemy + MySQL issues - solution to drop whole connection each time the
worker starts - loses all connections (sounds like a complete nightmare)!!
Flask-Cache
, all views are non-cacheable, so not very useful. Wrote their
own. libredis
was in a pretty bad shape - have improved it.
Flask-DebugToolbar
, very good at identifying bottlenecks. Is a very good
example of blueprint based design.
Is no longer an April Fool’s joke. Still micro, but not in terms of features.
ecosystem, is not on a par with Flask in places. Interoperability is rough in places. Lacks BDFL for extensions - do not know which ones to use.
How to bootstrap a startup using Django
Gidsy is a marketplace where anyone can explore and book activities.
Why we chose Django
Big communityNetworkLanguageMany problems already solvedThe adminProven technology by similar use casesStable APIs in a well defined release processGood documentation with focus on proseHuge community of 3rd party components (2600 apps on chishop)
Haystack
Can write python, can integrate with Django. City page on the site is based on search technology. Spatial date will be very important in the future.
Customisable search abstractionIndexing, filtering, faceting, “more like this”Spatial search and sorting
TastyPie
Can easily hook into Django..
Highly customisable Web API libraryHooks for auth, throttling, caching, custom serialisation etcBackbone.js compatibleDjango has a very strong paradigm.
Celery
Async code execution, cronjobs (a few periodic tasks)Thumbnails, search index updates, caching etc.Collect stats without blockingVery easy to put on a separate server.
Memcache
Periodic cache refreshing for high traffic sitesFragment caching with dates and cache versionCache warmup during deployment.Using Celery to build data for pages. 37 signals had a great article on this a few months ago. Special field,
refresh_date
… if something was changed by the user, then all keys are invalidated. Tried redis, were not completely happy with it - have found memcache super simple.
A strong pattern. Framework based solutions from the Django community. Don’t have to think about all these things. Major usefullness of the Django ecosystem.
Workflow
Main branch is always deployableDevelopment happens in feature branchesCode reviews via pull requestsShared responsibilityStopped using gitflow - now use github workflow - very successful. Code reviews is a great way to improve the quality. Shared responsibility.
Testing
Separation of fast and slow testsFull test suite via Jenkins, soon Travis CIFast tests locally viatox
If you use the github workflow, Travis CI will test the feature branches :) Super important to make sure the product is ready for the customers.
Releasing
virtualenv
(wrapper) + piplocalshopi
for in-house software releasesdjango-setcon
for Django configuration. Class based settings files.foreman
for process management (written in Ruby).Using
foreman
, one command will set up an environment for a developer. Not using virtual machines because they take too long to set up.
Scaling Up
Initial set-up by hand.
Gets more difficult, each server downloads dependencies, external services could be down, which server is in charge?
Built their own deployment server which builds the latest release as a
tar
file.Builds are virtual envionmentsAtomic and orchestrated releasescollectstatic, migrate and other command centralised.Web interface for deploying and rolling backPushes status updates to New Relic and HipChatWill be relased as an open source package soon.
Provisioning
Follows DRYChef/Puppet/Salt (decided to use Chef)Documents infrastructure and changePlace to share and store secure dateRoles can be on one or many serversChallenge - separating deployment from the application.Dependencies come from the deployment server. On a new deploy, there is really nothing that Chef needs to do.
Use
knife
to run commands on the servers e.g.knife ssh "role:web" "sudo chef-client"
Instagram tool,
ec2-ssh
, has a simple syntax and the name never changes.
pychef
to access node data and manipulate it with python. Use it infabric
.
Operations
Log everything you could need for debuggingIf you deploy often then you need immediate feedbackUse services if you can: Mixpanel, NewRelic, Librato, Papertrail, Pagerduty
django-app-metrics
to push data to the services.
Summary
Only scale when you need to, but be preparedBe pragmaticAutomateContinuous integration and continuous deployment.Make routine tasks really easy
Questions
Django community is smaller than Ruby. Not necessarily harder to recruit.
Logging Module
Two targets for notifications: data related (errors in a file, new data, warnings) and system related (all other errors).
Requirements
Route messages to different peopleAccumulate messages relating to a single inputRemember types of messages to decide actionStore messages as actions in the database.
Development Requirements
Simple APIAvoid passing notification objects from place to placeA single process might handle many files in sequence
Logger
Root logger:
basicConfig()
Get the logger and send a message:
getLogger().error(error_message)
Or… use a named logger:
my_logger = getLogger('main.special')
Handler
does the actual output…
Might be helpful to think of it as follows:
What went wrong - MessageWhere did it go wrong - Logger nameHow important is it -Level
Who needs to know -Handler
Filters are given a copy of the log record. Message is not processed if filter fails.
Multiple handlers can be defined.
Logger hierarchy
Messages sent to X.Y.Z also go to X.Y and X (depending on filtering).getLogger('X.Y.Z').warn(text)
The logger does level cut off test and filtering before it does anything else. So, the hierarchy will not get checked if the level doesn’t match.
Filtering is possibly best done by the handler…
Lessons in Testing
5 billion page views. Use Django and Flask. Less than 20 engineers. Terrible at testing.
Lessons
No one likes writing tests. Time consuming to write. At least 50% of time writing tests.
Legacy (untested) code is expensive. Very expensive to add tests later. Add tests for regressions. Always write tests for new code. Becomes easier and easier to write them.
Slow or inaccurate - you can spend more time writing tests, or much more time running tests. So, moving towards integration tests. Interface contracts yield inaccuracy (i.e. they change).
Higher level tests are slower, but easier to write and understand i.e. unit vs integration tests.
Mocking is great, but is very fragile (they use
mock.readthedocs.org
). Very useful for testing services e.g. Twitter and internal APIs. Record live data for mocking - check out the Ruby VCR library.Limit what you test.
Assume APIs don’t change (it is mostly true).
Smoke tests… Very high level Selenium test.
Test the life-cycle of requests. Selenium kind of works… Very fragile, swapping some of the tests out to PhantomJS.
Don’t admit defeat!!
Start with a goal - write testable code - things will become much, much better.
Break up your code into functions e.g. abstract out the database calls.
Start writing tests, add helpers wherever it makes sense - tests will become much cleaner.
Create structure in the test suite. Don’t like the Django pattern (they have a much deeper folder structure).
Document best practices, make it obvious how to use your helpers. Help people write tests.
Continuously run tests, make people fix stuff immediately. Having visibility. Nobody cared about JavaScript tests until they were added to Jenkins.
Drive it into your culture. We don’t like to break production code.
Use code review Everything goes into code review… Breaking stuff into smaller chunks so development life-cycle can go faster.
Tools
If the right tools don’t exist, then build them.
Switched off the standard Django test system. They use nose for test discovery and uses standard unit test style. Can drop into
pdb
on test failure:nosetests --pdb --pdb-failure
. Check outnose-quickunit
anddjango-nose
.Record code coverage using
coverage.py
. Usecoverage run
in place ofpython
.Sentry, exception reporting, because tests are not enough! Data usually breaks code. Check out the stack trace - can often avoid having to contact the user.
CI - Jenkins. Has been mangled into something it is not at the moment. Wanted to test every commit (couldn’t do this with Jenkins). Have separated tests into chunks e.g. JS, integration, unit test.
Code review - http://phabricator.org/, (PHP), very well written. Very well integrated with GIT. Makes your commit message useful.
arc
is the command line interface - runslint
and unit tests.Gargoyle, selectively enable features in code. Silently launch features. Ease performance and for load testing. They call this dark launching - managed to test their real time module before anyone was using it. It has also failed several times without affecting anything.
Takeaway
Very hard for us to adopt it.
Test suite still has alot of holes.
Global fixture is a problem - speeded up - but hard to understand and creates problems with test isolation.
Alot of problems with people arriving from environments where they are not used to testing.
Integration testing has been awesome for us. Use Django, so database testing is pretty awesome. Unit testing can work right - if you do it from the start - works perfectly for a library. Ship fast - or a perfect product - you have to find the balance.
The process is evolving. Culture is key. Very hard to adopt. Came from PHP - had no idea what tests were at that point. Alot of people come from this hackish world. Wanted to release often, and didn’t want an expensive QA team.
Figure out the value for your own company.
Just do it?
Questions
For lint, use
pyflakes
andpep8
. Check out the modified version ofpyflakes
on his github page.git branches - follow the pattern used by the review tool. Developer works on feature branch.
rebase
so the branch is a single commit (for code review).master
has to be stable. Similar to gitflow (but not the same).Deploy - use
fabric
,rsync
etc.Making the case - Why are we deploying broken software to production? Could calculate the cost of failure and compare it to the cost of prevention. Must be doing something right, as we have no QA people - and are fairly stable.
Arbitrarily assign code reviewers. Don’t currently have a great solution. Not really sure how to guide reviewers. When to accept, and when to reject. When should we spend time doing it. Comment from audience - Review board is good, but not great.
Has personally only been successful using TDD when fixing a bug/regression. Culture for TDD is much harder to adopt. Has been some very impressive stories from companies with this culture. Wouldn’t stop a developer doing TDD as long as the code is good.
For a while, only wrote positive tests… The code reviewers and the developer will decide if the tests are worth writing. Alot of it is trial and error. Have mentors, peer review, discussions.
Test data - Django fixtures were a problem, so implemented global fixtures. Use a modified version of
django-nose
. Set-up global fixture before running tests. Loads using SQL and sets up REDIS. Rely on the test not being too absolute about what is in the fixture. Do not mock the database. Kind of iffy. Releaseddjango-mock
, but sounds like they don’t use it. All trade offs. No perfect solution.Eventual goal is for Jenkins to test each patch in isolation before merging.
eGenix PyRun
Open source project.
Motivation
Simple installation without side effects. Often difficult on linux. Disk space is cheap.
Small footprint and download.
Easy to add to installers.
Extensible - load .py, .pyc and C extension modules.
Project
Builds upon old mxCGIPython project
Support for 2.7 (not 3 yet)
Binaries available for linux, Mac OSX
Use Case
Distribution of scripts and applications. Loads almost twice as fast as regular python. Uses only a fraction of the space - 12MB file.
virtualenv
replacement. Simply copy into a folder. Forpip
etc, you will want to create a folder structure. Don’t need an activation script. Independant of python installation. Works without python installed. No symlinks to manage. Fully relocatable. Small enough to have multiple copies - 13MB withpip
andsetuptools
. 36MB is you want to compile extensions (after compile, you can remove alot of this).Testing and scripting
Application private python installations e.g. dedicated python for Trac.
Embedded devices (just a thought - not tested).
How does it work?
Based on python’s
tools/freeze
(with a couple of patches). Is used on the whole standard library.PyRun searches relative to it’s executable folder making it easily relocatable.
Uses it’s own lib/directory for extensions.
Added tricks to make it compatible with
distutils
,setuptools
andpip
. Not yet tested withbuildout
.If you find things which don’t work, then please let us know.
Missing
dbm, crypt, readline, parser, tkinter, multi-processing, test packages. These modules can be loaded as regular external
.so
modules.
What doesn’t work?
When run with
regrtest.py
- some of the test suite modules do not work. They do work when run standalone.Standard libary modules that require access to resources.
Alternatives
py2execx_Freezebbfreezeetc
Future
Better documentation.
More flexible configuration.
Windows support
setup.py
Demo
tar
# only need the bin folder
bin/pyrun
tar setuptools
../bin/pyrun ...
tar pip
cd pip
bin/pip install
Continous introspection
Cast
@erikvanzijst - author ofdogslow
andinterruptingcow
@brodie - author ofgeordi
Performance Problems
conq
, their ssh shell was importing Django and Bitbucket code, and took nearly 1.5 seconds per request. Switched to using direct SQL, and massive reductions in load. 60% load decrease on all web servers. 16 times faster to start up. To learn - this wouldn’t have been seen in the development environment, but did cause problems to all of their users.
Common Causes
Slow SQL queries (or too many).
Lock contention - between threads, database table/row locks, file locks (hg/git).
Excessive IO (disk/network)
Regular expressions - the a fast regular expression can sometimes take forever.
503 - worker pool full. Could be a denial of service attack.
500 - if request times out (Gunicorn
SIGKILL
). Process does not know that it is going to be killed.
Libraries to help
dogslow
is Django middleware which emails traceback of slow requests. Has no performance penalty.
django-geordi
, selectively profile individual requests. Add?__geordi__
to any URL to enable theVisorMiddleware
. Produces a PDF call graph showing where the process takes the time. It runs outside the worker pool as a Celery task, so shouldn’t cause load problems.
interruptingcow
- allows you to time-box chunks of python code e.g. allow the process to take up to 20 seconds - throw an exception if it takes longer. Supports nested timeouts - can be used to make parts of a request optional.
Becoming a better programmer
Shortcut
apt-get --purge remove java
Use mind-maps to help you process information. Not very useful for other people.
Very funny talk… although you probably had to be here!
BDD at BSkyB, Collaboratively coding correctly
Replace a legacy sales system. python 2.7, REST API. Cannot afford defects when selling. Cannot afford to price incorrectly.
Why we use BDD
Do it right the first timeDeliver what was required with high quality codeTesters and developers write testsEasy reuseRefactoring
Testing is everyones responsibility
History - Developers did testing - testing is role of dedicated QA team - introduction of agile - testing is everyones responsibility TDD/BDD
Other tools - Fitnesse, Selenium, nose
Agile - Sprints, planning games, retrospectives, fail fast, adapt quickly
Two weekly sprints, only two bugs and they have never been deployed.
What is BDD
FocusCollaborationSimpleFeedback cycle
Gherkin
GivenWhenThenAndFeatureBackgroundScenarioScenario Outline
Process
Story cardDefuzz (15-20 minute chat with business analyst)QA - BDD (write the test)Write code (using TDD)Review with business analystCard complete
Test
Understood by all stakeholdersSimple - plain EnglishSteps file - regular expressions,
Tools
Cucumber - RubyLettuce - almost a direct port of CucumberFreshen - used by BSkyB - usesnose
test runnerBehave - seems to be gathering momentum. Almost identical to Cucumber.All pretty much the same.
Demo
nosetests --with-freshen -v --nocapture my.feature
Do it right the first time every time.
Questions
Step re-use is done by simple collaboration - team share and help each other.
The testing team work with the business analyst to write useful tests. Need someone in the team who understands how the test will be written.
Do unit testing before writing any code. Acceptance tests are written up front in Freshen.
Work in two weekly sprints.
Can the tests serve as documentation? BSkyB have separate documentation which is written by the business analyst.
How much time does it take? We can spend so much of time writing tests. BDD tests are verbose. BDD clarifies exactly what needs to be done.
How do you know how much a feature will change? The application has a road- map and the BDD tests define the features.
zc.buildout
Problems to solve
Install and configure software in a reroducible way
python and other packages
Does not build software from source (
make
etc)Isolated from other applications on the same machine - and from other buildout environments on the same machine.
What is zc.buildout
Developed by Jim Fulton (Zope) in 2006
Demo
# download
wget bootstrap.py
# create a config file, then bootstrap the environment
python bootstrap.py -d
bin/buildout
The work is done by a recipe. They are downloaded when buildout runs. The
recipe comes as an egg e.g. zc.recipe.egg
invokes easy_install
API.
When the configuration is changed, everything previously installed by that recipe will be removed and re-installed.
Each script sets up it’s own python path, so the eggs do not have to be installed into the system python folders.
The recipe sets up the bin/buildout
script.
Versions are pinned [versions]]
, they will always be honoured. Good
practice to set allow-picked-versions = false
KGS - known good set of pinnings
e.g. Can install nginx, recipe - zc.recipe.cmmi
Python Web Applications in Multihost, Low Latency Environments
Trading systems. Using python on the web server, and JavaScript, jQuery SVG on the browser.
WSGI Frameworks works in similar ways
Create request object from an environment provided by web serverDispatch URL to an appropriate controller function which returns a responseExecute the controller functionReturned to web server.
Request Object
Environment - methodSessionetc…
Dispatch URL
Regular expressionsRewrites (mod_rewrite
)WildcardsDecorators
Controller Function
Generate the contentHandle ORM, cache, cookiesReturn responseError handling
State
Browser stateServer stateProcess stateSession state
Shared State
Get or create a session, modify, store sessionRace conditions - need to synchroniseHow to synchronise? SQL, lock file, messaging, RPC, DLM (distributed lock manager).
dlm.py
published on ActiveState.com. If app crashes, all other processes will wait. Single point of failure.Apache
mod_wsgi
can run background processes. Apache will start, end and restart the process. SeeWSGIDaemonProcess
for details.
Fun with GET or POST data
How to check equivalency? Encode multiple forms into a single query string. Not sure about this - don’t know why you would
Guidelines to writing an API with python
Reuse existing frameworks and customise them according to our customer needs. This did not work in one case, because the framework did not support an important feature. Patches to the framework were not accepted - so they tried a monkey patch (so they could keep using updates to the framework). They then put it in a middleware. So… what should the strategy have been?
How about writing an API to replace the existing framework implementation?
Steps
What is a discount calculator? Start by saying what it is not e.g. it is not reporting… This will allow us to concentrate on the API without contaminating our thought process with other concerns.
Understand the data.
Started by using a dictionary to collect the data - then converted to a class.
Lessons learned the hard way
Evolve - from
__init__
to infinity. Don’t try to do too much. Don’t try and satisfy too many people. Always try and find the most simple form of what you are trying to do. If you talk about it, and find yourself talking about multiple conditions, then simplify.Know your domain from different angles.
Document your process. Why is more important than the How.
Don’t be afraid to test your hypothesis and then throw it away.
Other Stuff
Downloads for Android gaming: http://thp.io/2012/europython/downloads.html
Keynote about https://www.torproject.org/ Tor is free software and an open network that helps you defend against a form of network surveillance that threatens personal freedom and privacy, confidential business activities and relationships, and state security known as traffic analysis.
Slides for Programming mobile apps with python
Check this out Advanced REST client Application
To Do
Check out the with
statement in connection to unit testing…
How about a python contractors cooperative? If interested, contact rob.collins@pythonpro.co.uk. http://pycontract.com
Use mind-maps to help you process information. Not very useful for other people - but will help your own brain.
Check out the Background
keyword in Freshen. What is the equivalent in
Lettuce?
Read the following:
Check out https://github.com/inglesp/prescons
Check out django-ztask
(should not use Celery apparently)
Check out django-pjax
JavaScript library written by some guys from GitHub.
Manipulates browser history.
Check out http://discorporate.us/projects/flatland/ Form validation etc…
Advanced Flask Patterns
Mysteriously applicable to other things…
Will only work with new version of Flask (released on Sunday).
Apps are entirely independent. Now have an app.app_context()
, the
current_app
will point to the current application. Similar for the request
object.
Request stack and application stack are independent:
with app.request_context() as ctx:
Because requests are expensive, you can now use the application context.
Runtime state is request bound (short lived), test bound, user controlled. If your view function returns, your context will disappear.
State bound data:
request - HTTP request and session data
app - Database connections and object caching.
Old pattern had issues:
Requires an active request for a database connection.
Always connects to the database even if it isn’t used.
Once you start using
g.
you expose an implementation detail.
New pattern seems weird and complicated. The trivial example will not work with multiple applications. Not so bad in actual use… Slides are incorrect
Teardown always happens unless a chained teardown failed. Could move
transaction commit
or abort
to teardown method.
Recommend using an extension for database handling.
Explicit response creation - using make_response
. Normally you don’t want
to do this (you can call make_response
on the object returned from
make_response
- this is useful for decorators - one person in the
room had made a custom return type which converted objects into JSON data*).
Deep copying objects in python is slow and nearly impossible! Faster to use JSON!
How to share between applications? Blueprints are similar to applications. Solution seems ugly - Armin would welcome suggestions.
Extension Primer
Are very vaguely defined
Do not use a plug-in system.
Extensions should no longer use self.app
. They should use the application
object from the context.
Making DISQUS Realtime
Back-end Django and Postgres
Real time is an entirely new architecture.
Why do real time (less than 10 seconds)?
Getting new data to the user ASAPIncreased engagementLooks awesomeWe can sell it
Old realtime used polling which used jQuery to poll memcache. Was kinda #failscale!
Real*er* Time
Tested dark on 50% of network as is still a WIPHave seen 1.5 million concurrently connected users45 thousand connections per second165 thousand messages per second.2 seconds latency end to end
How do we do it?
nodejs and mongodb (no this is a python conference)gevent, gunicorn, flask, thoonk (a queue built on redis)redis (pub-sub), nginx, haproxy
Architecture
Django - new posts onto redis queue - backend gevent server - redis pub/sub - frontend gunicorn and flask - nginx and haproxy
Backend
Listens to Thoonk queue
Cleans and formats message - this is the final format before http publish - compress data now (gzip)
Publish message to pubsub forum:id, thread:id, user:id, post:id
Average processing time is 0.2 seconds. Queue maintenance - timeout 5 seconds. Separate pub/sub and non pub/sub redis. Quarantine failing messages (what jobs to re-queue, get and cancel jobs). Transactions can be picky. Planning on using zookeeper??
gevent
is nice
gevent spawn helpers, https://gist.github.com/3053495
Start, fail, start, fail, start, fail, kill
To yeild a thread
sleep(0)
Front End
Needs to be fast
Pools redis connections
Routes messages from pubsub to http
New request - create/register a subscription with the pool - sub/queue returns a python queue based on the channel.
Listener receives the message on a pubsub channel. If that channel has a subscriber, pass it on.
Long pollingish. Long help HTTP connection. Stream JSON over this connection. Why not web-sockets - because they don’t work yet - and are not stable. They will use them, with a fall back to long polling-ish. Must be JSON - as text will get buffered. With millions of connections, had to pool redis pub/sub.
Timeouts - needless reclaiming of resources. Maximise usage of cheap things (connection count). Minimise expensive things (requests per second). Getting rid of timeouts and increasing timeouts has increased concurrency.
Testing
Darktime - Use existing network to load test (at the beginning a few user complaints - cannot hide them from the browser console).
Darkesttime - load testing a single thread. Discovered alot of flaws in the architecture.
Have knobs you can twiddle.
Stats
Measure all the things
Especially hard when numbers don’t line up.
Try to express things as +1 and -1 if you can.
Is hard in distributed things.
I used
scales
from greplin metrics for py
Lessons
Do hard work early
Defer work you might never need
End to end ACKs are good, but expensive.
Timeouts are not free.
Greenlets are effectively free
Pub/sub is effectively free.
Nginx for real time, you must have
proxy_buffering off
Questions
Something faster than
pywsgi
?FapWS
Between WSGI and Web-Sockets, you could use ZeroMQ. Don’t think it works cross language.
Flask just loads the routes.. Use Blueprints to load the same endpoints multiple times. No database access.
Do you have to have Nginx and haproxy - can we run Gunicorn straight onto the web? No - you need Nginx and haproxy.
Gargoyle also has JavaScript options to switch things on and off.
Why REDIS? Needed pub/sub and a queue. ZeroMQ also provides pub/sub, but uses a broker - so hard to measure.
nydus
- consistent hashing for Redis.
Discovering Descriptors
https://github.com/inglesp/Discovering-Descriptors
__get__
, __set__
and __delete__
makes it a descriptor.
Slots restricts the attributes that a class can have.
Properties - similar to descriptor in many ways. Good for storing a single value but representing it in different ways.
Properties work best when they know about the class
Descriptors are more general, can often apply to any class.
Use descriptors if behaviour is different for classes and instances
Properties are syntactic sugar.
Read
Data Model Reference
Descriptor HowTo Guide
Unifying types and classes in Python 2.2
Guido’s History of Python blog.
Read Code
Lots of good examples in Django such as related objects.
Hybrid attributes in SQLAlchemy
python source
Tools/demo/eiffel.py
$ grep __get__ site_packages
Play
Implement methods,
__slots__
, properties in pure python.
Cubes - lightweight OLAP
https://github.com/Stiivi/cubes
Small, lightweight framework. Is one year old - does not have permissions etc.
Aggregation browsing, slicing and dicing.
Two parts - modelling and reporting (aggregating).
Four parts: Model, Aggregation Browser, Backends, http Server
Model
Business analyst view of the data. Different to your normal transactional view of the database. Smallest part of the data is called a fact. A cube is a collection of measurable facts.
Dimensions e.g. time, type - provides context for facts - used to filter - has a hierarchy.
Label attribute describes the data. The key allows for slicing.
Cubes can be localised.
Browser
Displays data
No pre-defined ways to store the data. Denormalised or snowflake.
For the browser to work, you need the model and the data.
The cell provides the data from a filter or selection. Can be multi- dimensional. Cells have a path - which describes the meaning of the key.
Three cut types - point, set, range.
Has an implicit hierarchy e.g. months within a year.
Can create
cross_table
Slicer is an OLAP server which uses HTTP and JSON:
slicer serve slicer.ini
. Also aslicer
command line tool.
SQL Backend
Supports star or snowflake (extended star) schema. Can also browse a denormalised table.
Future
Would be nice to have some formatters for visualisation libraries.
JavaScript library (check out
cubes-js
)More backends.
Open Data
Shared repository of models
Shared repository of dimensions
Public cubes - open slicer HTTP APIs
Simple module for Django i.e. read the Django models and then use the slicer server.
Stay light. Want to keep it simple and lightweight.
Python @ Layar
or, building complex and scalable systems using python and AWS
Case Study - Mobile augmented reality. Alot of python in the back end. Mobile clients are native code.
Django back end
Comprehensive feature setBuild web pages and APIActive communityMany good extensionsCan handle high volumes (Christophe Pettus, http://thebuild.com)Handles user registration, catalogue, web hostingFiles are stored on S3 (slow), database is MySQL on Amazon RDS (not the best choice for Django, but easier on AWS as it is setup for you).
Web facing
Two Django boxes with AWS load balancerDjango instances autoscale when load goes upPopular data in memcachedScaling database - bigger machine or read replicas
Logging
SentryGroup and count on similar messagesOne Sentry install for all your services
Visual Search Engine
Image recognition - CatchoomTornado withBoost.Python
interfacing to C++ binariesSharded for scale-out, redundant for HA and read speedStorage on EBS volumes (more expensive, but much faster)
Analytics
MySQL database collects dataDjango app stores SQL queries for aggregationcron job executes queries and stores resultsMore SQL queries feed HighCharts for fancy graphicsNote: this does not scale.
Long Running Jobs - (Spencer - home grown Twisted app)
Extracting images from PDFs, analyse imagesMultiprocessing rather than multi-threadingDefault 1 instance, easily scales to 20Calling separate programs to do processing lets you use anythingOnly 1300 lines of TwistedBasically simple queuing with background tasks.
AWS
ConvenientPay for what you useBasic monitoringWeb interface and command line toolsNot the most bang for your buckAssume no guaranteesDoes not excuse you from having Ops!Databases are very expensiveBackup outside of Amazon
Tips
python has alot to offerAutomate - Fabric and ChefDeploy early, darktest, waffle, gargoyleUsedjango-ztask
not CeleryCache from the beginning. Think about every query as you write it.
Going massive with uWSGI and nginx
Identify your context - trusted or untrusted?
System resources - memory, CPU, disk space, network bandwidth
How many sysadmins do you have?
Try to never reload system services for updating config
Let users do hard work. Good docs needed.
nginx - cheap, fast HTTP, SPDY proxy
uWSGI - for app hosting and management
server {
listen 80;
servername $hostname;
location / {
include uwsgi_params;
uwsgi_pass unix:/tmp/$host.socket;
Read about vassal
(I think these might be configuration files)
# single folder
uwsgi --emporer /etc/vassals
uwsgi --emporer "/etc/vassals/*/*.ini"
[uwsgi]
customer = customer001
uid = %(customer)
gid = %(customer)
socket = /tmp/example.com.socket
wsgi-file=/var/apps/yourapp/ap...
Linux Control Groups (cgroups
)
For security to limit CPU and memory
Is getting too complicated for me now!! Check out the slides…
uWSGIfastrouter
for single Nginx and multiple uWSGI servers.
What is missing
Static files servingLong running tasks and external daemonsdatabase
ssh
givessh
to users… (please)… Why?
Working on
ssh keys for secured subscriptionOther event systems (zeromq, redis)etc, etc
Questions
Is Nginx really needed? YesCan I build the next PaaS/ISP/hosting platform? YesHA Proxy is really good at something??
In Search of Reduced Loading Times
Uses Django and PostgreSQL
Optimizing SQL
django-debug-toolbar
django-devserver
django.db.backends
loggerDatabase logging,log_min_duratiion_statement
in PostgreSQl
Less Queries
select_related
- adds aJOIN
to get the data for the related table.
prefetch_related
for many to many and reverse for foreign keys. Will write a query to get all the data in just two requests from the database.
.iterator()
Tells Django to not cache the results from the database. Prevents unnecessary caching of results.
annotate
Always use
values()
beforeannotate
, so the SQL query only does aGROUP BY
on the required columns.
Raw SQL
Don’t be afraid to use raw SQL. Two methods,
Manager.raw()
anddjango.db.connection.cursor
RawQuerySet
is like aQuerySet
, but is not aQuerySet
. The objects returned are valid models.
defer()
and only()
defer
- columns to omit from theSELECT
listonly
- columns to specify in theSELECT
list
Bulk Operations
bulk_create
(django-bulk
for older versions of Django)COPY
for PostgreSQLDon’t forget to take advantage of the native features of your database.
De-normalisation
Mostly for read-only data - and only when you see performance issues.
Meta.Options.ordering
Don’t use it!!! as it will add an
ORDER BY
statement to every query.
Caching
Don’t use the database for sessions. Use
memcached
or signed cookies.
Template Compilation
You can pre-compile templates:
django.templates.loaders.cached.Loader
.Be very careful not to use any state in your template e.g. different template output for each user.
Entity Tags/Last-Modified
Allow to use browser cache (304 status code)
Worth doing only if it is easy to calculate the entity tag. No point doing anything expensive. Will save you bandwidth. Often good for home-pages if they don’t change very often. You could store the last updated time in your cache.
Will not work with personalised pages. Could think about using JavaScript for personalised sections.
Proxy might intercept the 304 status code. The request might not even get to your server.
Optimising Algorithms
Be careful with regular expressions…
I/O
Threads for I/OAsync I/O
PJAX
django-pjax
JavaScript library written by some guys from GitHub. Manipulates browser history.
Kivy
Runs on Android, iOS, Windows, OSX and Linux
Based on OpenGL ES 2.0. API for 2D and 3D graphics. OpenGL managed by Khronos Group.
Available for 90% of Android and 84% of iOS devices. Number is always growing.
Goal
Create a frameworkHandle all devicesCode once in python - deploy anywhereBased on CythonRapid prototyping
Community
5 core developers35 contributorsOver 500 users on mailing list
Performance improving with each release.
New language, kv
for widgets (a bit like CSS)
Demo
Multi-touch does not work with QT. Can only receive one event at a time. Kivy is multi-touch by default.
Android Play store has a Kivy app showing all the widgets.
Not doing native applications - applications will look the same on all devices.
Next version
SVG graphicsSimple 3D model loaderBetter documentation
Future
Grow the communityMore widgetsUnified build/packagingFaster executionHTML5? :)
Questions
License - LGPL - Will do a blog postSize of binaries