Wednesday, August 24, 2011

Talking about Talks

When we started planning the conference I intended to blog about EVERYTHING as a form of documentation. Whoops. No time like the present to catch up!

Without a doubt, setting up talks has been the most time consuming process in the conference planning so far. I want to shed some light on what I did this year, why it took so long and why it was so painful. This will give me some much needed catharsis and I would also be interested in hearing your feedback in the comments or elsewhere. The goal is to provide a solid, pain free template for next years organizers and for anyone else planning a conference in the future.

The Plan
The plan was simple enough: setup a Google moderator account and get feedback on what people want to see. Everyone who is planning on submitting will vet or obtain ideas there. Then, have people submit the talk to the ploneconf.org website with their bio and their talk (which will be magically normalized of course). The netsight product will take care of voting, scheduling, and display. Clean hands, everything automated, go to the pub.

The Reality
Google moderator worked "just alright". The main issue was that most people thought this was the talk submission process, and were pimping for votes SXSW style. This made it super confusing when we actually asked for talks to be submitted. Nonetheless, feedback was gathered and there were a lot of interesting ideas thrown around. 

Talk Submissions
Our "plan" did not include validating what was done last year code-wise ahead of time. We didn't realize that no permissions were set up for an open site, and everything was completely customized for Bristol. We never got logins integrated like we wanted either. Whoops. Deadline and pressure was mounting so we threw up a Google Form. Enter stage left: the vile mistress Regret.

Immediately we realized that people couldn't edit their submissions after clicking submit and started emailing us changes instead. Also, people had to triplicate their bios and many of them said things like "see other bio". This was not really fair to the submitters or us. At least the form looked nice, was embed-able, and I could add fields as people reminded me that I was missing important things like email address and first time speaker status (thanks to all those who took the time to send feedback - this is all now documented for next year). Totally not worth it though.

Deadlines
We decided up front there would be no extensions so that lit a fire under us to make sure people submitted (I personally get annoyed by our "deadline extended!" community habit). The post for talks went up and they start coming in quickly. At 3.5 days and 2 rooms, we were looking at around 65 talk slots to fill (not including related technologies track which was organized a bit differently). There was a big bump in submissions at first but then things started to trickle down to the point where we were nervous wouldn't reach our target number of talks. 

So we set up a raffle for a free conference ticket and doubled newbie chances to win. This went REALLY well, and encouraged people to submit multiple talks. Later, this gave me the opportunity to pick the strongest talks for a particular person and not have to turn down a great speaker that just picked a bad topic. This gave us plenty of great talks to chose from.

The other thing that happened is that people didn't believe the deadline was real and/or missed it. Some people submitted by email. Others by Google docs. Some talks were "implied". This process went on far too long. To give you an idea what I was dealing with, here are talks to format stats:
  • Google Forms: 72 (4 were submitted late)
  • Google Docs: 17
  • Email: 4 (2 never responded with bios)
  • IRC: 1
  • Chat: 1
If I messed up your talk, I deeply apologize but hopefully you understand why things slipped through the cracks. This goes back to the earlier point on not being able to edit talks. Lesson learned. 

Voting
Once all the talks were in we had to import them into the conference website for voting (for those keeping count, this is the 3rd e-format for talks). It took a good 2/3 of a day to set up the styles, update the CSV importer and Dexterity types, and deploy. It was easy drudgery. At some point a unicode problem was introduced we couldn't have any authors or titles with non-ascii letters. By that point I said "f*ck it", ran the import, and asked for help in manually cleaning up the bad encoding. Many thanks to bdbaddog on that one. This could all be avoided in the future.

Voting again was somewhat confusing since "we already voted". The controls aren't super intuitive but luckily they were the same as last year and didn't require authentication. All the reports were already there that I needed and people seemed to figure it out just fine. I will mention that there wasn't a single surprise result in there. Everything lined up just nearly exactly as I thought it would. It did give me numbers to work with for scheduling, which was a huge benefit (technique will be described below).

Scheduling
After all the votes had been cast, I stared at the computer for at least 2 hours thinking about where to go next. No matter what, code had to be updated (last years schedule was a hard coded html template) and it would be nice to make something reusable but I wasn't sure if it was possible in a timely manner. I looked at a couple calendar integration tools but none of them played well with the Dexterity type we were using. In the end I resorted to notecards and giant blocks of paper.

Process pseudocode (maybe someone can code this up for next year): 
  1. Transfer talk title, speaker, and vote count to postcards (format #4). If the talk was 45 minutes, leave it fill size. If it was 30 minutes, cut it in half. Talks go on 1 of 4 colored postcards: All, Beginner, Intermediate, or Advanced. This refers to level of development experience needed to see the talk. This was the easiest way for me to think of preventing talk conflicts. Note: I read what the speaker claimed their audience was but 9 times out of 10 there was a discrepancy from what was claimed and what the extended talk summary said. In those cases, I went with my gut.
  2. Sort each pile by number of votes. For each pile with too many talks, move the worst rated n talks to a separate pile. They aren't excluded, just removed from consideration until all the most popular talks are handled. Try to make all the piles an equalish size. Set aside 1/3 of the worse rated talks for multiple submitters for the same reason in a different pile. Talks that are almost exact duplicates get the same treatment. Side note: the person who submitted the most talks was Carlos de la Guradia at a whopping 6 talks!
  3. Draw a big calendar. I used some GIANT post it notes. Lay out the tracks and physically write in things that can't be changed such as keynotes, lightening talks, foundation meetings. Set up a plan ahead of time. I decided I would put "All" and "Beginner" talks in track 1, "Intermediate" and "Advanced" talks in track 2, and "Related Technology" talks in track 3. For each track I alternated between all and beginning, intermediate and advanced as much as possible. I pitted "All" talks against "Intermediate", and "Beginner" against "Advanced" as much as possible in attempt to avoid beginner/intermediate talk clashes. 
  4. Work in breaks and lunches. I went for a break or lunch about once every 2 hours.
  5. After getting all the super high rated talks mapped out and un-conflicted, pull in the reserve stacks and added them in by popularity and amount of space left in the track. Flex time talks really came in handy. The size of the cards fit perfectly for my mocked up calendar.
With this strategy, the hardest scheduling was the related technology track, since there would inevitably be clashes between the more advanced talks and those. When I had a conundrum, there was almost always a similar or duplicate talk in my backpocket that I just threw into the schedule at a different time. It took me a moment to get over the fact that there is no harm in having almost the same talk twice if they are both highly rated and well respected speakers: a conference schedule is not an ER diagram!

Last but not the very least was putting these talks in electronic form. I gave up on using the conference talks add-on as soon as I found SCHED.org. After an hour of beer and wallowing over the fact that I should have used that from the beginning, I spent the next 8 hours adding speakers, talks, keywords, etc to the new site (format #5). There is an API but all of the talks needed to be keyworded, slotted, and mapped to users so I sucked it up and just hammered it in. So much for automated.

And the 2011 Plone Conference Schedule was born.

Hindsight
No one is paying me hourly to make decisions for this conference so I have a tendency to revert to geeky habits and "make" or "wrangle" things instead of searching the web for tools and starting there. At first I was fitting conference stuff in at the end of the day and making poor choices. Now I tend to block of entire "conference days" to help reduce the decision fatigue factor. It helps a lot.

There is also a natural tendency with the conference to just get things done so people "stop bugging me about it" or because "if we don't do this soon we are screwed". I can't speak for past organizers but I have a feeling it's one of the core reasons why there is no reusable conference framework despite all the years of the conference being around. Day job money always trumps conference "profit" [<-- hilarious!]. It really takes a solid effort and motivation to do the right thing for the future. Just writing this post helps keep me focused and remembering to reuse where possible, be it our own code or someone else's.

Suggestions for Next Year
  • Get feedback on whether or not moderator actually influenced talks this year before going that route again. I have a sneaking suspicion it didn't make that much difference. Instead, use feedback from THIS years conference to purposely direct the speakers for next year. You could also encourage people to submit multiple talks from the beginning and weed out the weakest ones later.
  • Beware of the seductive siren that is Google Forms. 
  • Do the raffle for a free speaker ticket again. In fact, give away several if you can manage.
  • As a potential conference speaker, please please please use whatever form was submitted and submit your talk on time. I highly encourage next years organizers to set and enforce rules on talk submissions. "Go here and do it by X or it's out". The favors and special considerations will kill you. They killed me, and I didn't even honor all of them I think. I don't even know if I did or not!
  • Use sched.org or something like it from the get go. Plone is not good for this. SCHED prints out nicely and can be print out and cut up for manual organizing. It is mobile ready. It takes talk submissions and has a voting process. It is everything. I paid a $99 fee to get an add free version. Totally worth it. If someone steps up to sponsor the mobile version, we will be even better off.
  • There are still some things missing with SCHED. Do your research before jumping on it again. I have already started bothering them for key missing features so maybe next year they will have their shizzle together more. In your research, don't forget things like: voting, editing your own profile and talk, multiple tracks and venues, mobile support. Bonus if you find ajaxy drag and drop and auto buffers for talks.
  • As far as scheduling, the alternate talk sizes was a slight bit more overhead but in the end it actually gave me a bit of play room and the ability to fill empty slots. I will be interested to see how that goes.
  • We set up a ton of mailing lists for speakers, sponsors, etc with mailchimp. At the end of the conference, don't let 2011 organizers forget to get feedback for 2012. We tried a lot of new things like variable talks, staggered schedules, related tech - all of them need to be evaluated and reconsidered for next year.
  • If you have a bunch of people working dev, make sure someone is NOT developing and can be a manager. We know this on contracts: don't forget to apply the same philosophy here. (If I say it enough, I won't let it happen again). The perfect example is when we started "integrating" Brown Paper Tickets into the site. Put a bunch of people super geeked to give back into one room and all the sudden you've integrated something that doesn't need integrated to begin with (thanks to Jon Stahl for bringing some reason to that situation before it got out of hand). I should not have been even trying to dev anything in this scenario - role mixing is like vodka and redbull: it feels awesome at the time but the aftermath often involves vomit. 
  • While Brown Paper Tickets is local and a bit cheaper, Event Brite is integrated into just about everything already. Mailing list tools, scheduling tools, etc all support importing users directly with the API. Something to consider.
FAQ
  • Shouldn't "The State of Plone" talk be a keynote? We have a unique situation with space this year in that none of the rooms can technically hold more than 300 people (fire marshall Liz says they can). For the main keynote we rented out a space in the movie theater that is attached to SFSU. It will be awesome: you can drag your hungover butt in, eat popcorn and nachos, and get your mind melted. But it's $1000/hour. Yeah. Time to get in the movie theater business. 
  • Does that mean multiple tracks for Lightening Tracks and popular talks? No. We will be broadcasting live to overflow rooms in these cases. The crowd tends to thin out for these so hopefully we won't need them. We will use the preliminary results from scheduling on sched.org to decide which talks get overflow rooms in general. In those cases, we ask that people who are not actually paying attention use the overflow. You know who you are Mr/s. Email McYoutube.
  • My bio is weird - I didn't write one - what happened? If you didn't write your bio, there is a good chance that I did (under the influence of beer and exhaustion of course). Feel free to log in and change it. 
  • Can you fix talk X clashing with talk Y and move talk Z? Sure. If you give me the full swap formula and it passes through my "we can't do that because XYZ drama" filter at talks@ploneconf.org I'll seriously consider it. Also, please note that these will all be recorded so you can go back and watch something at any point in the future.
  • Will this schedule be changing? Likely. The general timeline is not going to change and key events will not move. Some talks may get shuffled here and there though and I fully expect some last minute additions/subtractions.
  • Wait, did you say reusing from last year? Yep. All the work we do on the site is open and started with an injection from the 2010 Plone Conference site code by Netsight. I encourage anyone who wants to change something to contribute at https://github.com/Plone-Conference-Devs. Little by little we can make a difference for the future. Hopefully in 5 years, everything will be easy peasy. Well, the tech part anyways. The dealing with people part needs its own documentation.
Many thanks to all the feedback, advice, and help along the way.  Feel free to comment below on changes, suggestions, etc. 

Monday, August 8, 2011

10 Minute Caching with Apache

The problem with caching is that it encompasses 3 layers of the stack, and requires a good solid read to understand what's really happening. You should enable browser caching on every site with more than 10 users, and proxy caching for resources pretty soon thereafter. This is my attempt to distill this down into a method that will get you "far enough".

Goal: cache everything, except that which is content in the browser and in a disk cache.
  1. Install and activate plone.app.caching (aka HTTP Caching). This is included by default after Plone 4.1. Use it. On the first tab: enable gzip and enable caching. Leave caching proxies and in memory cache at the default. On the third tab: Content files and images get moderate, content folder and content items get weak, and file and image and stable file and image get strong. You may want unstable file and image to be moderate but... I doubt it. When in doubt, use less caching. Last but not least, in the last tab, weak caching is configured to use last modified (that's it), moderate caching is default with just max age set, strong caching is same as moderate but up the expiration time to 864,000, and also check store in RAM cache. This number means those resources will be 10 days to be in the proxy/browser cache, and you can probably up that significantly as your caching chops get sharper. You may be tempted to put everything in ram cache at this point in time, but don't. That's for people who like to debug caches.
  2. Install Apache 2 and Configure Your Site. You only need to do this if this is a new site and/or it's not already configured. Many linux installs actually have this by default.  Some good ubuntu instructions are here. You probably want to enable rewrite and proxy as well, in which case you will need to a2enmod rewrite, proxy, proxy_http as well if you want to do rewriting, but this post isn't about that :)
  3. Configure Apache as a disk cache. Basic instructions are here. You will need to add one key line: "CacheIgnoreNoLastMod On". This is because the js and css resources don't have a modification time by default and we still want them cached on disk. In fact, these are the resources that we care the MOST about being cached. Make sure to set up the cache cleaning process. Check out the example site config to see how simple it can be. Note that this is just a balls to the wall quick write up. All apache is doing here is looking at your caching headers from p.a.caching and doing the same thing that the browser would. When configured properly, after the first fetch, Plone will never render any css, js, or image files again.
  4. Validate. Use LiveHeaders in Firefox or look at the network diagrams in Chrome/Safari. JS, CSS, and [non-content] images should never 304. They should download on the first pass and then always pull from cache. Pages, folders, or any other content should always 200. If you see any 404s in the meanwhile, stop immediately and fix them. Clear the cache. Validate that all new page requests come from the cache.
FAQ
  1. How do I know its hitting the cache? I'm so glad I pretended like you asked. There is no built in way to do this but with some clever jiggering it's easy. From the command line, tail the Z2.log(s) for each running zope instance. (Pst: you can tail multiple logs at once by just listing them with a space in between). Then, open up the browser. Load a page. Notice that everything is pulled the first time. Clear the cache. Load the page again. Note that all the resources have 200 OK status (meaning they were actually retrieved from the servers) but there is no entry in your Z2.log. It's a festivus miracle! This means that they were served from apache.
  2. What about HAProxy? That's nice, and I totally recommend it. But not required. 
  3. Should I use an apache in buildout? No. Unless you want to hate yourself or you are an expert sysadmin [that is trying to get fired]. Tutorials don't care about buildout and you want to have access to tutorials.
  4. Shouldn't I be using varnish too? Perhaps. I personally have never liked it, and there is no one config out of the box that makes me happy. All of my sites are closed, pretty dynamic and include 3rd party integration so caching individual pages never suits me. Remember, this is a 10 minute way to get basic results. Varnish is really overkill for a lot of sites and will just give you more headache than its worth. 
  5. Purging? That's for another tutorial. 
  6. What exactly is cached here and where? This does NOT cache and views or pages. That's the point. It's simple enough for debugging and small setups while still redirecting "crap traffic" to another server. It caches just enough and no more. I'm pretty sure there is a your mom joke in here somewhere...
  7. Isn't apache for old people? Beat it punk.
  8. Did you test these instructions? No, they were written in anger and rush. Comment if I messed up and I'll help you then update this. If there is significant interest I can do a more detailed post. This assumes a minimum sysadmin knowledge.
  9. What if I want to force fresh css/js? Easy. Just reinstall your product OR turn development mode on then off for js and/or css. This causes a new file name to be generated and the caching cycle to start over again. For static images, you may be in a harder situation. I have just relegated to create a new name for the images each time. I find that these rarely change since these images are almost always going to be from design. If you can wait the 10 days (or whatever time you set for strong caching) you don't need to do anything. \o/