Common Pitfalls with Django and South (andrewingram.net)
108 points by andrewingram on Dec 4, 2012 | 38 comments


(Author of South here) Nice article - good set of common pitfalls, too. The multiple people one is especially tricky for people when they first encounter it, it seems.

I'm hoping I can resolve some of these issues (like the permissions thing) when we start moving parts of the code into Django proper.


Just want to thank Andrew for being a good custodian of south.


I've often gotten bitten by writing migrations in multiple apps during development and then later finding out that a new install can't be run due to circular dependencies.

Then it's quite tricky to untangle them. Maybe there's a way to do a dry-run full migration (fresh install) test, or even a check and warning when creating a new migration. Of course, it usually involves multiple sibling apps that have too much knowledge of each other's FKs.
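
One way such a check could look, as a rough sketch only: a depth-first search for a loop in a map of migration dependencies. The `deps` map and the labels in it are made up for illustration; nothing here reads real South migrations or uses South's API.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of labels, or None if there is none.

    `dependencies` maps a migration label to the labels it depends on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for dep in dependencies.get(node, ()):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path, so the loop is closed
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in list(dependencies):
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical example: two sibling apps whose migrations depend on each other.
deps = {
    "blog.0002": ["blog.0001", "shop.0002"],
    "shop.0002": ["shop.0001", "blog.0002"],
    "blog.0001": [],
    "shop.0001": [],
}
cycle = find_cycle(deps)
```

Run against a dependency map built from the real migration files, a non-empty result would be the warning to raise before the migration is committed.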

Anyway, thanks for South!


Thanks for the update and for creating such an indispensable tool. Regarding your last line, are there any plans to integrate South into Django mainline code in the near future? Is there a place to read the philosophy or roadmap for this?


There is, hopefully in a couple of releases' time. There are some posts I made to the mailing list about it a while ago - I can't look them up quickly though - and a talk I did at DjangoCon this year.


Do you have a recommended workflow for multiple contributors using Git? Also do you ever intend to fix the django permissions bug that requires a syncdb --all?

Not really HN discussion points I guess but I have recently started using South on a project and the lack of clear direction here confused me. This bug has been known for quite a while but I couldn't find any best practise recommendations for either situation.


Unfortunately the permissions thing is really, really tough to fix - and since I don't use Django permissions myself or in any work projects, it's rarely high on my priorities, alas.

For Git stuff, just make sure you're aware of who is writing migrations where and know what to do when you get a conflict. Making sure you have lots of smaller apps rather than one giant one really helps.


Everything stems from the fact South attempts to freeze your model definitions in time for each migration. While this works, often you need more than just the model definition (what effectively generates the DDL), but the surrounding codebase / Django magic as well (methods, model managers, etc.).

Another pain point with South is that you run your migrations after pushing a new release. That makes for hard zero-downtime updates, because your codebase and the database are out of sync for a while before you can run `django-admin.py migrate`.

After having used South, I feel the best approach still is to write your own SQL migrations by hand and run them against your database pre-deploy, never destroy data (erase/update columns/tables), and then make sure your codebase handles the new/absent columns/tables transparently. This way you can revert a broken release just by pushing the previous version.
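
A minimal sketch of that hand-written approach, using SQLite from the Python standard library so it is self-contained. The `applied_migration` table, the migration names, and the SQL are all assumptions made up for illustration; a real setup would target the production database engine.

```python
import sqlite3

# Hypothetical hand-written migrations: additive only, applied in order.
MIGRATIONS = [
    ("0001_create_article",
     "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"),
    ("0002_add_summary",
     "ALTER TABLE article ADD COLUMN summary TEXT"),
]


def apply_migrations(conn, migrations):
    """Apply each migration not yet recorded; return the names that were run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_migration (name TEXT PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT name FROM applied_migration")}
    ran = []
    for name, sql in migrations:
        if name in done:
            continue  # already applied on an earlier run
        conn.execute(sql)
        conn.execute("INSERT INTO applied_migration (name) VALUES (?)", (name,))
        conn.commit()
        ran.append(name)
    return ran


conn = sqlite3.connect(":memory:")
first_run = apply_migrations(conn, MIGRATIONS)
second_run = apply_migrations(conn, MIGRATIONS)  # nothing left to do
```

Because every step only adds tables or columns, the previous release keeps working against the migrated schema, which is what makes the revert-by-redeploy idea possible.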


I've always had luck just running the migrations beforehand -- the only real caveat there is that you need to do a two-step migration process if you're removing fields.

If all your changes are additions, and your app is managed by supervisord, it's fairly trivial to pull the code updates, run the migrations, then bounce your supervisor instance.

Like I said though, if you're deleting, the process is to stage out the deletes into a separate migration so that your codebase is using new or added/replacement columns before you remove tables or you're kind of screwed.

Also bear in mind that this knowledge is somewhat dated as all my recent Django ORM experience is based on Oracle (ugh) which I haven't gotten South to work with.


> I've always had luck just running the migrations beforehand -- the only real caveat there is that you need to do a two-step migration process if you're removing fields. If all your changes are additions, and your app is managed by supervisord, it's fairly trivial to pull the code updates, run the migrations, then bounce your supervisor instance.

The problem is that some configurations need workers to restart every X requests to avoid memory leaks, and that makes the deployment process completely non-deterministic if you're pushing new code and expecting workers to still serve the old one. There are troubles with dynamic imports too, which happen on large codebases.


That's a good point that I didn't consider. I'm using Gunicorn and now that you mention it, I dunno how much is or isn't dynamic, but my process is dynamic so it takes seconds to happen and may or may not be throwing errors.

I've never noticed any errors deploying as described, but that doesn't necessarily mean anything in light of this information.


I've seen a couple of articles that have suggested both "don't use the orm" and "load fixtures using loaddata in your datamigration". These things contradict each other. loaddata uses your models, which will be potentially out of date. We have had to manually load up the fixture using json.loads, then create the model instances using orm.Foo and setattr.
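
Roughly what that manual loading might look like, as a sketch rather than our exact code. It assumes the fixture is in Django's usual JSON layout (`model`, `pk`, `fields`) and that the frozen ORM can be indexed as `orm['app.model']`; the function name `load_fixture` is made up here.

```python
import json


def load_fixture(orm, raw_json):
    """Create and save one instance per fixture entry via the frozen ORM.

    Handles plain fields only; foreign keys and many-to-many values in a
    fixture would need extra handling.
    """
    created = []
    for entry in json.loads(raw_json):
        model = orm[entry["model"]]  # assumed lookup form: orm['app.model']
        obj = model()
        if entry.get("pk") is not None:
            obj.pk = entry["pk"]
        for field_name, value in entry["fields"].items():
            setattr(obj, field_name, value)
        obj.save()
        created.append(obj)
    return created
```

Inside a data migration this would be called from `forwards(self, orm)` with the fixture file's contents, so the instances are built from the frozen model definitions rather than the current ones.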

Also, not a great solution, but we monkeypatched South to simply use the number in the file name, i.e. 0001.py. This way if two developers try to commit a migration, they cause a conflict, and we can fix it then instead of finding out later. It makes the filename less friendly, but we usually use grep/ack to find a migration anyways.
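
A different way to catch the same clash without patching South, sketched here as an assumption rather than what we actually run: scan a migrations directory listing for numeric prefixes that appear more than once. The filenames are hypothetical.

```python
import re
from collections import defaultdict


def duplicate_migration_numbers(filenames):
    """Return {number: [filenames]} for every numeric prefix used more than once."""
    by_number = defaultdict(list)
    for name in filenames:
        match = re.match(r"(\d+)_.*\.py$", name)
        if match:
            by_number[match.group(1)].append(name)
    return {
        number: sorted(names)
        for number, names in by_number.items()
        if len(names) > 1
    }


# Hypothetical directory listing after two branches were merged.
names = ["__init__.py", "0001_initial.py", "0002_add_slug.py", "0002_add_author.py"]
clashes = duplicate_migration_numbers(names)
```

Something like this could run in a pre-commit hook or CI job with `os.listdir()` on each app's migrations folder, failing the build when the result is non-empty.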


This is actually a really good point and a glaring oversight on my part, I'll have to edit that part of the post.

[Update: I've changed the last paragraph of the article to reference your comment and this solution]

I think the solution is to create a version of loaddata that knows how to work with South's frozen ORM. A quick search turned this up: http://stackoverflow.com/questions/5472925/django-loading-da...


Another approach is to use dumpscript from django-command-extensions and then search-and-replace to use orm instead of your own models.

http://code.google.com/p/django-command-extensions/wiki/Dump...


I use South on a fairly substantial django project (35 models, 85+ migrations). After a few dozen migrations we got really frustrated with south. We still use south, but now run all migrations via db.execute("""CREATE ... """) with manual SQL rather than using the south models and the south wrappers. There were too many instances when there would be some little thing wrong, and we would get long, very hard to debug error messages. Also a number of times I would get my local install into a weird state where I could not run migrations, and I'd have to manually patch up. By doing straight SQL I always know exactly what is running, and so it is both easier to debug locally and feels safer to run against the prod data.

When we switched to db.execute, we lost the ability to do backwards migrations, but that's ok. Backwards migrations are really, really dangerous. In fact, deleting columns, ever, is really dangerous. If you are deleting data permanently it is always better to hand craft precisely what you want to do, and have everyone look it over, rather than running a backwards migration.


I've used South on at least 20 projects, some of which were very large (1yr+ of development, 100+ migrations, incl data migrations and raw SQL execution) and generally had Good Times throughout (aside from the coordination of migration on multiple developers' branches that is less of a South problem and more of a workflow one).

Caveat/qualifier: every project was on Postgres.


Backwards migrations are really only designed for quick testing in development - you really shouldn't be running them in production unless you just made a major screwup.

It's entirely possible to shoot yourself in the foot, but that's also what makes South versatile enough. Hell, there's a good reason that I recommend raw SQL in migrations as a solution sometimes - you'll still get the record-keeping and dependency stuff, so it's not like you're throwing it all away.


Here's another:

If you look at your history and think 'oh, migration 0005 hasn't been run', don't do 'migrate 0005'. That will of course take you back to 0005, deleting any modifications made after 0005.
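
To make the pitfall concrete, here is a toy model of what "migrate to a target" means. It is not South's code; it assumes a simple linear history in which the first `applied_count` migrations have been applied, and the function name is made up.

```python
def plan_migrate(history, applied_count, target):
    """Return (direction, migrations) for moving a linear history to `target`.

    `history` is the ordered list of migration names; the first
    `applied_count` of them are assumed to be applied already.
    """
    target_count = history.index(target) + 1
    if target_count >= applied_count:
        return "forwards", history[applied_count:target_count]
    # Target is behind the current state: everything after it gets reversed.
    return "backwards", history[target_count:applied_count][::-1]


history = ["0001", "0002", "0003", "0004", "0005", "0006", "0007"]
plan = plan_migrate(history, applied_count=7, target="0005")
```

With all seven applied, targeting 0005 plans the reversal of 0007 and then 0006. Checking `./manage.py migrate --list` first shows what has actually been applied.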


We've just started using South on Heroku, and one thing I've been confused by is that I have to deploy the code to the live server in order to be able to run the migration commands, thus causing the site to run on the new code that depends on the post-migration database structure for a short time.

Does anyone know an easy way to run the migration on a Heroku Postgres database without pushing the code to a live server?


One way is to use symlinks to link in your live build. Then your deployment process can upload your new codebase and run the migrations before switching the symlink so that your new codebase is being served.

Where I work, we have a templated Django project that has a fabfile to do this: https://github.com/tangentlabs/tangent-django-boilerplate

If you look in the deploy function (https://github.com/tangentlabs/tangent-django-boilerplate/bl...), you can see the flow is something like:

    def deploy():
        deploy_codebase()
        ...
        migrate()
        ...
        switch_symlink()
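
The symlink switch itself could be sketched like this. This is not the code from the linked fabfile; it is a standalone illustration that assumes Python 3 on a POSIX system, with a made-up `releases/` and `current` layout.

```python
import os
import tempfile


def switch_symlink(link_path, new_target):
    """Point `link_path` at `new_target` by renaming a temporary link over it."""
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an interrupted deploy
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)  # rename over the old link (atomic on POSIX)


# Hypothetical layout: releases/<name> directories and a `current` link.
root = tempfile.mkdtemp()
old_release = os.path.join(root, "releases", "r1")
new_release = os.path.join(root, "releases", "r2")
os.makedirs(old_release)
os.makedirs(new_release)
current = os.path.join(root, "current")

switch_symlink(current, old_release)  # initial deploy
switch_symlink(current, new_release)  # after the migrations have run
```

Renaming a new link over the old one means the web server never sees a moment where `current` is missing.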


This is a very creative solution. I like it.


Andrew Godwin wrote an interesting piece for the [Lanyrd blog](http://lanyrd.com/blog/2012/lanyrds-big-move/) where they talk a bit about putting their database in read-only mode. I know this doesn't cover your situation but it might help ease the transition of what to do while you are starting your migrations.


You should be able to get your database info by running

    heroku pg:credentials DATABASE_URL

And then you can just set it up as the database of a different install. Probably better to deploy the code before migrating though. Use maintenance mode to keep people out while you do.


As part of your Procfile you could add --migrate to the syncdb that happens.


Push the migration separately before you push the updated model definitions, when you're adding fields.


I would love to see some code examples, especially for the cases where something breaks, until you have your wits about you.


I wish this article was around a year or two ago. I had an awful experience with South, so much so that I switched to Ruby and Sequel. Sequel totally decouples your migrations from your model. It's pretty nice actually and ends up saving quite a bit of headache.


Wow. Must have been a pretty bad experience to warrant switching languages and frameworks!

Care to elaborate? I've always found South pretty straightforward.


A lot of the things described here: circular references, conflicts among the team, trouble with simple data imports. These things are a result of coupling models to migrations. Having the models describe and generate the migration, which in turn generates the DB, is troublesome.


However, the reverse approach, with migrations defining the models (as in Rails), often leads to mismatches as well. There's no easy solution to this problem, alas.

When model-based migrations work, they work really well, and help reinforce code schema/database schema being entirely in sync.


I recently started doing professional Django training. Migrations haven't come up in training sessions yet but I'm not sure what I would recommend. I feel South might be too intimidating to those just learning Django. Are there any alternatives?


When I first started doing Django, I found South intimidating. I think it's mostly just some vocabulary and concepts that end up feeling more complicated than they have to be (I remember the orm freezing being totally confusing, and things like my team members always 'faking' migrations baffling me).

That said, I think South, introduced properly and carefully, is the easiest way to do migrations by a landslide. If anything, I'd expect people to eventually grow out of it, not into it, and not most people either. I think a number of the commenters talking about hitting bumps in the road once they got to (gasp!) dozens of models or migrations probably bailed prematurely. Our project is 300+ models and well over 500 migrations now, and South still suits us pretty nicely.

If you'd like any help reviewing tutorial materials, I'd be happy to lend a hand. I haven't looked lately, but I'd love to see a better intro blog post for South show up. A couple years ago when I was learning I found what's out there just a little bit obtuse (not to knock Andrew's very solid tutorial at all, I just could have used something even more stripped down and careful about introducing concepts to get started).


I would recommend South.

I have found it has a very nice, shallow learning curve: if you are doing simple stuff, South is simple. You don't even have to look at the generated migrations, just use the management commands. As you need more complex things, you can learn more details about South.


I've heard good things about nashvegas[0]. It's a bit more manual of a tool, but maybe less surprising to new users.

[0] - http://nashvegas.readthedocs.org/en/latest/


I was thinking of writing up a blog post showing how I handle migrations manually, which has worked really well for me and is dead simple. I guess the catch is the user would have to know some SQL.


I've been using South for a few months now and I didn't even know half those pitfalls could exist. I'm using South just for migrating simple changes to my DB and it is wonderful for that, but it seems like there is more depth to what it can do.

Great article.


I will defend ORMs until my dying day, but South is an entirely new beast. Took me a few days of screwing around to be able to safely dig myself out of my own holes.


Just started using Python South (and Ruby Taps), so great tips! Thanks.



