Google App Engine: Uploading Data Tutorial
May 26, 2009
If you are one of my regular blog readers, you can probably skip this article. (Yup, you have my permission!)
I’m going to talk about the gory details of bulkloading Google App Engine’s (GAE) ReferenceProperty and ListProperty, the two properties that confuse most developers due to the lack of documentation. We’ll use my app, Amanda & Jennifer’s Art Gallery, for this tutorial. Hopefully this will help you save some frustrating hours!
If you find any mistakes or a better way to implement anything in this article, please let me know!
I’m always up for learning new techniques and improving my code.
A. Datastore Models
You might want to read Rafe Kaplan’s excellent “Modeling Entity Relationships” to understand how relationships work in GAE. By following his guideline on setting the ListProperty, you can avoid the quota problem that has bitten some developers.
OK, let’s start! My gallery app has 4 models: Artist, Artwork, Medium, and Subject. Since the Medium and Subject models are done in the same way, we’ll just use Medium for this tutorial. Note that all of the models presented here are simplified versions to allow us to better focus on the problems at hand.
```python
# ajgallery.py
from google.appengine.ext import db

class Artist(db.Model):
    name = db.StringProperty(required=True)

class Artwork(db.Model):
    name = db.StringProperty(required=True)
    artist = db.ReferenceProperty(Artist, collection_name='artworks')
    photo = db.LinkProperty(required=True)
    thumbnail = db.StringProperty(required=True)
    media = db.ListProperty(db.Key)

class Medium(db.Model):
    name = db.StringProperty(required=True)

    @property
    def members(self):
        # All artworks whose media list contains this medium's key
        return Artwork.gql("WHERE media = :1", self.key())
```
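To see what the two relationships give you, here is a plain-Python imitation of them. It is a sketch under assumptions, not App Engine code: the hypothetical `Key` class stands in for `db.Key`, an ordinary list stands in for the datastore, and the second artwork, "Sketch", is made up for illustration. It relies on the datastore rule that an equality filter on a list property matches an entity when any element of the list equals the value, which is what makes `WHERE media = :1` work.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for db.Key: a kind plus a name."""
    kind: str
    name: str

@dataclass
class Artwork:
    name: str
    artist: Optional[Key] = None                    # like ReferenceProperty(Artist)
    media: List[Key] = field(default_factory=list)  # like ListProperty(db.Key)

def artworks_by_artist(artworks, artist_key):
    # Mimics the back-reference artist.artworks (collection_name='artworks'):
    # every Artwork whose artist property equals the given key.
    return [a for a in artworks if a.artist == artist_key]

def medium_members(artworks, medium_key):
    # Mimics "WHERE media = :1": on a list property, equality matches
    # when ANY element of the list equals the value.
    return [a for a in artworks if medium_key in a.media]

amanda = Key("Artist", "Amanda")
t_pen = Key("Medium", "t_pen")
watercolor = Key("Medium", "watercolor")

gallery = [
    Artwork("David", amanda, [t_pen, watercolor]),
    Artwork("Sketch", amanda, [t_pen]),  # hypothetical second artwork
]

print([a.name for a in medium_members(gallery, watercolor)])  # ['David']
print([a.name for a in medium_members(gallery, t_pen)])       # ['David', 'Sketch']
print([a.name for a in artworks_by_artist(gallery, amanda)])  # ['David', 'Sketch']
```

The point of storing keys (rather than names) in `media` is that the reverse lookup becomes a single equality match on the list.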
B. Data
GAE’s bulk loader takes comma-separated (CSV) files. I use Google Docs’s Spreadsheet to fill in the data and then export it in CSV format. It’s pretty straightforward except in two places: ReferenceProperty and ListProperty.
The following is a sample of my Spreadsheet data:
| name | artist | photo | thumbnail | media |
|---|---|---|---|---|
| David | Amanda | http://…/david.jpg | an img link | ['t_pen','watercolor'] |
- For the artist ReferenceProperty, I have the string “Amanda”, which matches a name in the Artist model. It will later be replaced with a Key.
- For the media ListProperty, I have a string written like a list, “['t_pen','watercolor']”. Note: I use single quotes around each list item, and no spaces, so the string can be processed later.
After exporting the data to the CSV file, I got:
```
David,Amanda,http://.../david.jpg,"an img link","['t_pen','watercolor']"
```
Notice that the list is turned into a string.
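You can confirm this by parsing the exported row with Python’s standard `csv` module. This is only an illustration (the bulk loader does its own CSV reading), and the photo URL is left elided exactly as above: the fifth field comes back as the text `['t_pen','watercolor']`, not as a list.

```python
import csv
import io

# The exported row from above; the photo URL stays elided as in the text.
row_text = 'David,Amanda,http://.../david.jpg,"an img link","[\'t_pen\',\'watercolor\']"\n'

row = next(csv.reader(io.StringIO(row_text)))
name, artist, photo, thumbnail, media = row

print(media)        # ['t_pen','watercolor']
print(type(media))  # <class 'str'>
```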
C. Loader Files
This is the part that gets most developers, or at least it got me. I’ll only mention my Artwork loader file artwork_loader.py here, but you should be able to figure out the rest from it.
```python
# artwork_loader.py
from google.appengine.ext import bulkload
from google.appengine.api import datastore_types
from ajgallery import Artist, Medium

class ArtworkLoader(bulkload.Loader):
    def __init__(self):
        # Each (property, converter) pair: the converter is applied to the raw CSV field
        bulkload.Loader.__init__(self, 'Artwork',
                                 [('name', str),
                                  ('artist', str),
                                  ('photo', datastore_types.Link),
                                  ('thumbnail', str),
                                  ('media', str.split)
                                 ])

    def HandleEntity(self, entity):
        # update artist key (Artist entities must already be loaded)
        f = Artist.all().filter("name =", entity['artist']).get()
        entity['artist'] = f.key()
        # update media key list (Medium entities must already be loaded)
        temp_list = []
        # eval() runs the string as Python code; only use it on data you trust
        temp = eval(entity['media'][0])
        for medium in temp:
            m = Medium.all().filter("name =", medium).get()
            temp_list.append(m.key())
        entity['media'] = temp_list
        return entity

if __name__ == '__main__':
    bulkload.main(ArtworkLoader())
```
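If you want to check the conversion logic without a running server, the body of `HandleEntity` can be imitated in plain Python. This is a hypothetical stand-in, not the bulkload API: the entity is an ordinary dict, the `artist_keys` and `medium_keys` dicts replace the `Artist.all().filter(...)` and `Medium.all().filter(...)` lookups, and strings such as `"Artist:Amanda"` play the role of real keys. It also swaps `eval()` for `ast.literal_eval()` (Python 2.6 and later), which only accepts literals, and raises a readable error when a name has no match instead of failing on `None.key()`.

```python
import ast

def handle_entity(entity, artist_keys, medium_keys):
    """Hypothetical sketch of HandleEntity's logic on a plain dict.

    artist_keys / medium_keys map a name to a key, standing in for the
    Artist and Medium datastore queries in the real loader.
    """
    # update artist key: replace the artist's name with its key
    artist_name = entity['artist']
    if artist_name not in artist_keys:
        raise ValueError('no Artist named %r; load artists first' % artist_name)
    entity['artist'] = artist_keys[artist_name]

    # update media key list: entity['media'] holds one string such as
    # "['t_pen','watercolor']"; literal_eval turns it into a real list
    names = ast.literal_eval(entity['media'][0])
    missing = [n for n in names if n not in medium_keys]
    if missing:
        raise ValueError('no Medium named %r; load media first' % missing)
    entity['media'] = [medium_keys[n] for n in names]
    return entity

entity = {'name': 'David', 'artist': 'Amanda',
          'media': ["['t_pen','watercolor']"]}
artist_keys = {'Amanda': 'Artist:Amanda'}
medium_keys = {'t_pen': 'Medium:t_pen', 'watercolor': 'Medium:watercolor'}

print(handle_entity(entity, artist_keys, medium_keys))
# {'name': 'David', 'artist': 'Artist:Amanda', 'media': ['Medium:t_pen', 'Medium:watercolor']}
```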
For explanations of “('media', str.split)” and “# update artist key”, please click on the corresponding links. In short, the loader passes each CSV field through the converter listed next to it, so str.split turns the media string into a one-element list (the string contains no whitespace to split on), and HandleEntity swaps the artist’s name for the key of the matching Artist. As for “# update media key list”:
- entity['media'] is a list holding a single string, i.e. ["['t_pen','watercolor']"] or ["['t_pen']"]. Since we only care about its first (and only) element, we take entity['media'][0].
- That gives us a string, not a list. (This is exactly what we had in our CSV file.) The easiest way to turn it into a list is eval().
- Starting with an empty list, temp_list, we append the key of each matching Medium to it. We don’t append() the keys directly to entity['media'] and pop() the first element later, because append() does not work for Key. You can work around that with “str(m.key())”, but then the keys become strings and you can’t use the cool “medium.members” to get all of a medium’s artworks. Andreas Blixt wrote a wrapper that solves the key validation problem; you might want to check it out.
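The conversion chain in the list above can be traced step by step in plain Python. This sketch assumes, as the loader’s `('media', str.split)` entry implies, that the loader calls `str.split` on the raw CSV field; `ast.literal_eval` stands in for `eval()`. Because `str.split` splits on whitespace, the trick only works when the spreadsheet cell contains no spaces.

```python
import ast

raw = "['t_pen','watercolor']"          # the media field as read from the CSV

after_split = raw.split()               # what ('media', str.split) produces
print(after_split)                      # ["['t_pen','watercolor']"]

first = after_split[0]                  # entity['media'][0]: back to a string
media_names = ast.literal_eval(first)   # safer replacement for eval()
print(media_names)                      # ['t_pen', 'watercolor']

# Pitfall: a space after the comma makes str.split cut the text in two,
# so element [0] is no longer a complete Python literal.
spaced = "['t_pen', 'watercolor']"
print(spaced.split())                   # ["['t_pen',", "'watercolor']"]
```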
D. Updating The Driver File
Now we need to update our ajgallery.py to take advantage of the keys we are going to produce:
```python
# ajgallery.py
class ArtistHandler(webapp.RequestHandler):
    def get(self, target):
        # ...
        artist = db.GqlQuery("SELECT * FROM Artist WHERE name = :1", target).get()
        artworks = artist.artworks  # back-reference created by collection_name='artworks'
        # ...

class MediumHandler(webapp.RequestHandler):
    def get(self, target):
        # ...
        medium = db.GqlQuery("SELECT * FROM Medium WHERE name = :1", target).get()
        artworks = medium.members   # the Medium.members property defined above
        # ...
```
Beautiful! We are almost there!
E. Uploading Data
Many thanks to Thomas Brox Røst’s wonderful “Porting legacy databases to Google App Engine” article. For some strange reason, I wasn’t able to upload my data to either my development server or App Engine by following GAE’s “Uploading Data” guide. Thomas Brox Røst’s article, however, describes each step in detail, and that’s what we’ll follow here.
We need to add new handlers to the app.yaml file:
```yaml
# app.yaml
handlers:
- url: /load-artists
  script: artist_loader.py
  login: admin
- url: /load-media
  script: medium_loader.py
  login: admin
- url: /load-artworks
  script: artwork_loader.py
  login: admin
```
First, we load our data into the development server to make sure that everything works. To do so, we comment out the “login: admin” lines in app.yaml and then run the following commands (assuming the app is running at http://localhost:8080/ and bulkload_client.py is in /usr/local/bin/):
```shell
python /usr/local/bin/bulkload_client.py \
    --filename=artists.csv \
    --kind="Artist" \
    --url="http://localhost:8080/load-artists"

python /usr/local/bin/bulkload_client.py \
    --filename=media.csv \
    --kind="Medium" \
    --url="http://localhost:8080/load-media"

python /usr/local/bin/bulkload_client.py \
    --filename=artworks.csv \
    --kind="Artwork" \
    --url="http://localhost:8080/load-artworks"
```
Looking at the SDK console, we can see that our data loaded properly and that both Artwork.artist and Artwork.media were replaced with the proper artist and medium keys! The app also works as expected on the development server. I think it’s time to upload everything to App Engine!
First, restore the “login: admin” lines in app.yaml so that only administrators can reach the loader URLs, then upload your app as usual. Next, we need a cookie that gives us permission to upload data. To get it, we open the following URL in a browser:
http://your.appspot.com/load-artists
Actually, any loader handler will do. GAE will redirect us to authenticate ourselves. Once we do, a page with the proper cookie is displayed in the browser. Copy and paste that cookie into the following upload commands:
```shell
python /usr/local/bin/bulkload_client.py \
    --filename=artists.csv \
    --kind="Artist" \
    --url="http://your.appspot.com/load-artists" \
    --cookie='...'

python /usr/local/bin/bulkload_client.py \
    --filename=media.csv \
    --kind="Medium" \
    --url="http://your.appspot.com/load-media" \
    --cookie='...'

python /usr/local/bin/bulkload_client.py \
    --filename=artworks.csv \
    --kind="Artwork" \
    --url="http://your.appspot.com/load-artworks" \
    --cookie='...'
```
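Because ArtworkLoader.HandleEntity looks up existing Artist and Medium entities, the artworks have to be loaded after the artists and media. A small wrapper can keep that order and pair each CSV file with its own loader URL. This is a hypothetical helper, not part of the SDK: it only uses the `--filename`, `--kind`, `--url` and `--cookie` options shown above, assumes bulkload_client.py lives in /usr/local/bin/, and just prints the commands unless you pass `run=True`.

```python
import subprocess

CLIENT = "/usr/local/bin/bulkload_client.py"  # assumed install location

# Order matters: artworks reference artists and media by key.
LOADS = [
    ("artists.csv", "Artist", "/load-artists"),
    ("media.csv", "Medium", "/load-media"),
    ("artworks.csv", "Artwork", "/load-artworks"),
]

def build_commands(base_url, cookie=None):
    """Return one bulkload_client.py argument list per CSV file."""
    commands = []
    for filename, kind, path in LOADS:
        cmd = ["python", CLIENT,
               "--filename=" + filename,
               "--kind=" + kind,
               "--url=" + base_url.rstrip("/") + path]
        if cookie is not None:        # only needed for the deployed app
            cmd.append("--cookie=" + cookie)
        commands.append(cmd)
    return commands

def upload(base_url, cookie=None, run=False):
    for cmd in build_commands(base_url, cookie):
        print(" ".join(cmd))
        if run:
            subprocess.run(cmd, check=True)  # stop at the first failure

upload("http://localhost:8080")  # dry run against the development server
```

Passing the arguments as a list means the cookie needs no shell quoting, and `check=True` stops the sequence before the artworks are loaded if an earlier step fails.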
Congratulations! We have successfully uploaded our data, and it’s time to try our new app. And of course, it works beautifully, just as we expected!
Entry filed under: Geek Zone, Google App Engine.