Google App Engine: Uploading Data Tutorial

May 26, 2009

If you are one of my regular blog readers, you can probably skip this article safely. (Yup, you have my permission! ;) )

I’m going to talk about the gory details of bulkloading Google App Engine’s (GAE) ReferenceProperty and ListProperty, the two properties that confuse most developers due to the lack of documentation. We’ll use my app, Amanda & Jennifer’s Art Gallery, for this tutorial. Hopefully this will help you save some frustrating hours!

If you find any mistakes or a better way to implement anything in this article, please let me know! :) I’m always up for learning new techniques and improving my code.

A. DataStore Models

You might want to read Rafe Kaplan’s excellent “Modeling Entity Relationships” to understand how relationships work in GAE. By following his guideline on setting the ListProperty, you can avoid the quota problem that has bitten some developers.

OK, let’s start! My gallery app has 4 models: Artist, Artwork, Medium, and Subject. Since the Medium and Subject models are done in the same way, we’ll just use Medium for this tutorial. Note that all of the models presented here are simplified versions to allow us to better focus on the problems at hand.

# ajgallery.py

from google.appengine.ext import db

class Artist(db.Model):
  name = db.StringProperty(required=True)

class Artwork(db.Model):
  name = db.StringProperty(required=True)
  # Back-reference: artist.artworks returns this artist's artworks.
  artist = db.ReferenceProperty(Artist, collection_name='artworks')
  photo = db.LinkProperty(required=True)
  thumbnail = db.StringProperty(required=True)
  # Keys of the Medium entities this artwork belongs to.
  media = db.ListProperty(db.Key)

class Medium(db.Model):
  name = db.StringProperty(required=True)

  @property
  def members(self):
    # All artworks whose media list contains this medium's key.
    return Artwork.gql("WHERE media = :1", self.key())
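
To make the two relationship styles concrete, here is a minimal in-memory sketch in plain Python (no App Engine SDK needed). The dictionaries, the string "keys", the second artwork, and the helper names `artworks_of` and `members_of` are all made up for illustration. They only mimic what `artist.artworks` and `medium.members` return, relying on the fact that an equality filter on a list property matches any entity whose list contains the value.

```python
# Plain-Python stand-ins for datastore entities; keys are just strings here.
# "Sunset" is an invented second artwork, not part of the gallery data.
artworks = [
    {"name": "David", "artist": "artist:Amanda",
     "media": ["medium:t_pen", "medium:watercolor"]},
    {"name": "Sunset", "artist": "artist:Jennifer",
     "media": ["medium:watercolor"]},
]

def artworks_of(artist_key):
    """Mimics the back-reference created by collection_name='artworks'."""
    return [a["name"] for a in artworks if a["artist"] == artist_key]

def members_of(medium_key):
    """Mimics "WHERE media = :1": matches when the list contains the key."""
    return [a["name"] for a in artworks if medium_key in a["media"]]

print(artworks_of("artist:Amanda"))
print(members_of("medium:watercolor"))
```

The point of storing the key list on Artwork is that a medium never has to carry a growing list of its members; membership is answered by a query instead.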

B. Data

GAE takes CSV (comma-separated values) files. I use a Google Docs spreadsheet to fill in the data and then export it in CSV format. It’s pretty straightforward except for two places: ReferenceProperty and ListProperty.

The following is a sample of my Spreadsheet data:

name   artist  photo               thumbnail    media
David  Amanda  http://…/david.jpg  an img link  ['t_pen','watercolor']

  1. For the artist ReferenceProperty, I have the string “Amanda”, which matches a name in the Artist model. It will later be replaced with a Key.
  2. For the media ListProperty, I have the string list “['t_pen','watercolor']”. Note: each list item is wrapped in single quotes, which matters for the later processing.

After exporting the data to the CSV file, I got:

David,Amanda,http://.../david.jpg,"an img link","['t_pen','watercolor']"

Notice that the list is turned into a string.
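
A quick way to see this for yourself is to parse the exported row with Python's standard `csv` module. This is only a sketch that runs under a current Python 3 interpreter, outside of App Engine; the variable names are mine, and the elided URL is kept exactly as it appears above.

```python
import csv
import io

# The exported row from above, as it would appear in the CSV file.
row_text = 'David,Amanda,http://.../david.jpg,"an img link","[\'t_pen\',\'watercolor\']"\n'

reader = csv.reader(io.StringIO(row_text))
name, artist, photo, thumbnail, media = next(reader)

# The media column arrives as one plain string, not as a list.
print(type(media).__name__)
print(media)
```

So whatever loads this file has to turn that last column back into a list by itself.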

C. Loader Files

This is the part that gets most developers, or at least it got me. I’ll only mention my Artwork loader file artwork_loader.py here, but you should be able to figure out the rest from it.

# artwork_loader.py

from google.appengine.ext import bulkload
from google.appengine.api import datastore_types
from ajgallery import Artist, Medium

class ArtworkLoader(bulkload.Loader):
  def __init__(self):
    # One (property name, converter) pair per CSV column, in column order.
    bulkload.Loader.__init__(self, 'Artwork',
                               [('name', str),
                                ('artist', str),
                                ('photo', datastore_types.Link),
                                ('thumbnail', str),
                                ('media', str.split)
                               ])

  def HandleEntity(self, entity):
    # update artist key
    # (assumes the Artist was loaded already; get() returns None otherwise)
    f = Artist.all().filter("name =", entity['artist']).get()
    entity['artist'] = f.key()

    # update media key list
    temp_list = []
    # eval() runs whatever is in the cell, so only load CSV files you trust.
    temp = eval(entity['media'][0])
    for medium in temp:
      m = Medium.all().filter("name =", medium).get()
      temp_list.append(m.key())
    entity['media'] = temp_list
    return entity

if __name__ == '__main__':
  bulkload.main(ArtworkLoader())

For explanations of “('media', str.split)” and “# update artist key”, please follow the corresponding links. Regarding “# update media key list”:

  1. entity['media'] is a list of strings, i.e. ["['t_pen','watercolor']"] or ["['t_pen']"]. Since we are only interested in the first and only string element, we take entity['media'][0].
  2. However, now we have a string, not a list of strings. (This is the same as what we got in our CSV file.) The easiest way to change it into a list is to call eval() on it.
  3. Starting from an empty list temp_list, we append all the matching Medium keys to it. We don’t append() directly to entity['media'] and pop() the first element later, because append() does not work for Key. You can work around it with “str(m.key())”, but then the keys turn into strings and you can’t use the cool “medium.members” to get all of a medium’s artworks. Andreas Blixt wrote a wrapper to solve the key validation problem; you might want to check it out.
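
The three steps above can be tried outside App Engine with a small sketch. Two things here are my own substitutions, not what the loader does: `ast.literal_eval` stands in for eval() (it only accepts literals, but needs Python 2.6 or later), and a plain dictionary with made-up key strings stands in for the Medium query. The function names are hypothetical too.

```python
import ast

def parse_media_cell(cell):
    """Turn the CSV cell "['t_pen','watercolor']" into a list of names."""
    pieces = str.split(cell)            # same converter the loader uses
    return ast.literal_eval(pieces[0])  # first and only element

def names_to_keys(names, key_by_name):
    """Build a fresh list of keys, failing loudly on an unknown name."""
    keys = []
    for n in names:
        if n not in key_by_name:
            raise ValueError("no Medium named %r" % n)
        keys.append(key_by_name[n])
    return keys

# Hypothetical keys; in the real loader these come from Medium entities.
medium_keys = {"t_pen": "key-1", "watercolor": "key-2"}

names = parse_media_cell("['t_pen','watercolor']")
print(names_to_keys(names, medium_keys))
```

One thing this makes visible: str.split splits on whitespace, so the “first and only element” trick depends on the cell containing no spaces. A medium name with a space in it would be cut in two before the list is ever parsed.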

D. Updating The Driver File

Now we need to update our ajgallery.py to take advantage of the keys we are going to produce:

# ajgallery.py

class ArtistHandler(webapp.RequestHandler):
  def get(self, target):
    :
    artist = db.GqlQuery("SELECT * FROM Artist WHERE name = :1", target).get()
    artworks = artist.artworks
    :

class MediumHandler(webapp.RequestHandler):
  def get(self, target):
    :
    medium = db.GqlQuery("SELECT * FROM Medium WHERE name = :1", target).get()
    artworks = medium.members
    :

Beautiful! We are almost there! :)

E. Uploading Data

Many thanks to Thomas Brox Røst for his wonderful “Porting legacy databases to Google App Engine” article. For some strange reason, I wasn’t able to upload my data to either my development server or App Engine by following GAE’s “Uploading Data”. But Thomas Brox Røst’s article describes each step in detail, and that’s what we’ll follow here.

We need to add new handlers to the app.yaml file:

# app.yaml

handlers:
- url: /load-artists
  script: artist_loader.py
  login: admin

- url: /load-media
  script: medium_loader.py
  login: admin

- url: /load-artworks
  script: artwork_loader.py
  login: admin

First we want to load our data into our development server to make sure that everything works fine. To do so, we comment out the “login: admin” lines in the app.yaml file and then run the following commands (assuming the app is running at http://localhost:8080/ and bulkload_client.py is under /usr/local/bin/). Artists and media go first, because the Artwork loader looks them up by name:

python /usr/local/bin/bulkload_client.py \
  --filename=artists.csv \
  --kind="Artist" \
  --url="http://localhost:8080/load-artists"  

python /usr/local/bin/bulkload_client.py \
  --filename=media.csv \
  --kind="Medium" \
  --url="http://localhost:8080/load-media" 

python /usr/local/bin/bulkload_client.py \
  --filename=artworks.csv \
  --kind="Artwork" \
  --url="http://localhost:8080/load-artworks"

Taking a look at our SDK console, we see that our data was loaded properly and that both Artwork.artist and Artwork.media were replaced with the proper artist and medium keys! And the app works as expected on our development server! I think it’s time to upload everything to App Engine! :D
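
Before pushing to production, one optional extra check is to confirm that every artist and medium name used in the artwork data actually exists in the other two files, since the loader looks them up by name. The sketch below is my own addition, not part of the bulkloader. It assumes the name is the first column of artists.csv and media.csv, inlines hypothetical file contents as strings, and uses `ast.literal_eval` (Python 2.6 or later) to read the media cell.

```python
import ast
import csv
import io

def missing_references(artists_csv, media_csv, artworks_csv):
    """List (artwork, kind, name) triples whose referenced name is undefined."""
    # Assumption: the name is the first column of artists.csv and media.csv.
    artists = {row[0] for row in csv.reader(io.StringIO(artists_csv)) if row}
    media = {row[0] for row in csv.reader(io.StringIO(media_csv)) if row}

    missing = []
    for row in csv.reader(io.StringIO(artworks_csv)):
        if not row:
            continue
        name, artist, _photo, _thumbnail, media_cell = row
        if artist not in artists:
            missing.append((name, "Artist", artist))
        for medium in ast.literal_eval(media_cell):
            if medium not in media:
                missing.append((name, "Medium", medium))
    return missing

# Hypothetical file contents, inlined to keep the sketch self-contained.
artists_csv = "Amanda\nJennifer\n"
media_csv = "t_pen\n"
artworks_csv = 'David,Amanda,http://.../david.jpg,"an img link","[\'t_pen\',\'watercolor\']"\n'

print(missing_references(artists_csv, media_csv, artworks_csv))
```

With this sample data the check reports that “watercolor” is missing from the media file, which is exactly the kind of gap that would otherwise surface as an error halfway through the artwork upload.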

First, we upload the app as usual (with the “login: admin” lines restored in app.yaml). Next, we need the cookie that grants permission to upload data. To get it, we type the following URL into a browser:

http://your.appspot.com/load-artists

Actually, any loader handler will do. GAE will redirect us to authenticate ourselves. Once we do that, a page with the proper cookie will be displayed in the browser. Copy and paste that cookie into the following upload commands:

python /usr/local/bin/bulkload_client.py \
  --filename=artists.csv \
  --kind="Artist" \
  --url="http://your.appspot.com/load-artists" \
  --cookie='...'

python /usr/local/bin/bulkload_client.py \
  --filename=media.csv \
  --kind="Medium" \
  --url="http://your.appspot.com/load-media" \
  --cookie='...'

python /usr/local/bin/bulkload_client.py \
  --filename=artworks.csv \
  --kind="Artwork" \
  --url="http://your.appspot.com/load-artworks" \
  --cookie='...'
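
Since the three uploads differ only in file, kind, and handler path, the command lines can also be generated rather than typed. This is just a convenience sketch: the function name `bulkload_commands` is hypothetical, it only builds the argument lists (it does not run anything), and it uses the same flags shown above with each file sent to the handler declared for it in app.yaml.

```python
def bulkload_commands(base_url, cookie,
                      client="/usr/local/bin/bulkload_client.py"):
    """Build one argument list per upload, in dependency order."""
    # Artists and media first: the Artwork loader looks them up by name.
    jobs = [
        ("artists.csv", "Artist", "/load-artists"),
        ("media.csv", "Medium", "/load-media"),
        ("artworks.csv", "Artwork", "/load-artworks"),
    ]
    return [
        ["python", client,
         "--filename=" + filename,
         "--kind=" + kind,
         "--url=" + base_url + path,
         "--cookie=" + cookie]
        for filename, kind, path in jobs
    ]

# '...' is a placeholder for the cookie copied from the browser.
for argv in bulkload_commands("http://your.appspot.com", "..."):
    print(" ".join(argv))
```

Each returned list could be handed to something like subprocess.call if you want the script to perform the uploads as well.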

Congratulations! We have now successfully uploaded our data! It’s time to try our new app! And of course it works beautifully, just as we expected! ;)

Entry filed under: Geek Zone, Google App Engine.
