Life in the Weavrs Web

Jeff Sym lives in South Austin and likes Indian TV dramas, dubstep-inspired remixes and the Austin Children’s Museum. Keiko Kyoda lives in Japan, likes to read old travel books and wants condensed milk for dinner. They tweet. Sometimes they even post things they shouldn’t.

Jeff and Keiko didn’t exist yesterday.

The first time I failed the Turing test was 1993. I’d dialed up to a BBS in Austin, a one-line operation probably running out of some guy’s bedroom. There was an option in one of the menus to chat with the sysop. It was an ELIZA-style bot. It took at least a screenful of text and growing irritation for me to realize I was talking to a machine. I don’t remember a lot from 1993, but I remember sitting there in front of my 14″ glowing CRT, feeling incredibly dumb.  (A few years later I upgraded to this NeXT Cube.)

Artificial intelligence is only as convincing as the data behind it. Back in that relative stone age the system could only echo back at me what I’d written or ask open ended questions. “How does that make you feel?” Watson read all of Wikipedia before it (he?) went on Jeopardy. If you started talking to Watson about cars, I bet it/he could respond with some really interesting trivia, and you could chat with it/him for a while before you realized you weren’t talking to a person.

The most visible ‘ask me a question and I’ll give you an answer’ system is Apple’s Siri. Siri can tell you what the weather’s like outside, and she’ll soon be able to tell you what year and model of car you just snapped a picture of. Siri could listen to you and tell if you’re angry, or if you had a really great day yesterday, based on your tweets and Facebook posts. Siri could team up with Mint to watch your bank account balance, and suggest that hey, you aren’t investing enough for retirement, maybe you don’t need that thing you just price compared on your phone. Maybe you should put that money into your Roth IRA instead. This is all possible because these systems have access to fantastically more data than they used to.

Jeff and Keiko are Weavrs. You create Weavr bots by selecting a gender (or object), a name, and a collection of interest keywords. Then you define some emotions. _____ makes me _____ when I’m at _____. You can tell Weavrs where they live, and they’ll wander around their neighborhood. They utilize public social APIs (Flickr, Last.fm, Twitter, Google Local), driven by some black box keyword magic, to find and post things they like. You can add pluggable modules to Weavrs to, say, post their dreams. Over time they can develop new emotions about different things. There’s even a system for programming a Monomyth into their lives.

Weavrs exist on their own. You can ask them questions, but you can’t tell them ‘I like this, post more like this.’ The developers of the Weavr platform consider this to be important. Weavrs evolve and grow without your direct hand guiding them. I can understand why they didn’t want to allow ‘more like this’ feedback. It makes the entire system more complex, but it’s obvious that having more full featured persona creation/control options is going to be a big part of the future of social bots.

Weavrs’ most public impact so far (at least as far as I can tell) reveals a bit about how people will likely react to this sort of thing. Gonzo journalist Jon Ronson (@jonronson), author of The Men Who Stare at Goats and The Psychopath Test, did a bit on his video show about Twitter bots. The Weavr folks found out and, using the contents of his Wikipedia page, created a @jon_ronson Weavr. The result was somewhat predictable: much gnashing of teeth.  There’s an excellent article about this, and Weavrs in general, on Wired UK.

This is Bat^H^H^HBot Country

Twitter has over 140 million active users. A large number of these are spam bots, designed to convert ego (retweets and replies) into $ (clickthroughs). What we don’t really know, and what may in fact be unknowable soon, is how many of these are bots of a different kind. How many of them exist just to exist. To learn, grow, develop. We heard a lot about companies creating armies of real-looking Twitter accounts for nefarious purposes during the Arab Spring.  It doesn’t take a lot of work, once you have a valid social model that can be fed keywords, to create a Twitter bot that simulates the interests of every ‘person’ that Wikipedia has an entry for.

What we don’t hear about, and I don’t think is discussed enough, is the non-nefarious potential for these independent personas. Imagine a platform somewhat beyond Weavr. Weavr 2.0, maybe. It ties into more social platforms. It has artistic taste (or not). Maybe it takes walks through its neighborhood, and snaps out ‘photos’ from segments of google street view images.  (Jeff Sym liked this picture today, while he was wandering around downtown Austin.) Maybe it goes on trips, setting arbitrary routes through hot points. Maybe my (should I even call it ‘my’ anymore, except that in some way perhaps I’m responsible for it, like a child?) Weavr that’s really into Information Security decides to take a road trip to DEF CON. Maybe because he’s also a bit of a conspiracy theorist, he decides to drop by Roswell on his way, maybe he looks around in Google Street View and takes a picture. Maybe because I’ve stirred the 3d Visu-chromasome pot, he has an appearance (and taste in clothes), so maybe he puts himself into the picture (apologies to Charles Stross).

Wolfram Alpha (that powers the ‘question/answer’ part of Siri with a >90% relevancy rate) is 20 million lines of Mathematica code. You’d need a lot less than that to do what I just outlined. You need an event parser. Easy, the events are already online. You need a map, and the ability to search for hotspots of keywords along the route or near an area. If I did a keyword search for ‘conspiracy’ between Austin and Las Vegas, don’t you think Roswell would pop up? If I did a search for clusters of photos taken in Roswell on Flickr or some other social photo site, I’m sure I’d find the geo location and general object background of something interesting. Analyze light and time of day, pose and place model, render and voila. Picture postcard. Get it printed and mailed from New Mexico with a pay-as-you-go errand service. Boom, your virtual persona just became real.
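The "search for hotspots of keywords along the route" step can be sketched in a few lines. This is only an illustration under loud assumptions: the waypoints, the tagged items and the 300 km radius are all made up for the example, and a real version would pull geotagged posts or photos from a social API instead of a hard-coded list.

```python
import math
from collections import Counter

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def hotspots_along_route(route, items, keyword, max_km=300.0):
    """Count keyword-tagged items per place that lie within max_km of any waypoint."""
    counts = Counter()
    for place, location, tags in items:
        if keyword not in tags:
            continue
        if any(haversine_km(location, waypoint) <= max_km for waypoint in route):
            counts[place] += 1
    return counts.most_common()

# Hypothetical waypoints for an Austin to Las Vegas drive (approximate coordinates).
ROUTE = [(30.27, -97.74), (33.58, -101.86), (35.08, -106.65),
         (35.20, -111.65), (36.17, -115.14)]

# Hypothetical geotagged items: (place, (lat, lon), tags).
ITEMS = [
    ("Roswell", (33.39, -104.52), {"conspiracy", "ufo"}),
    ("Roswell", (33.40, -104.53), {"conspiracy", "museum"}),
    ("Seattle", (47.61, -122.33), {"conspiracy"}),
    ("Lubbock", (33.58, -101.86), {"music"}),
]

print(hotspots_along_route(ROUTE, ITEMS, "conspiracy"))
```

Checking every item against every waypoint is the brute-force version. It is fine for a road trip's worth of data, and it keeps the idea visible: filter by keyword, filter by distance, count by place.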

These personas would be great for directed research: I need a ‘me’ who lives in Amsterdam and loves to take pictures and eat cheese, since I’m going there in 6 months and I want to get a feel for the place. What if you created a relationship engine. X is my Y, I’m in love with Z. You could have your own little soap opera that evolves online every day. It ties itself into current events. It evolves in ways you can’t predict, because it feeds off the now.

I’m going to ignore the whole ‘but if you don’t know they’re fake’ problem for now. On one hand, that’s kind of the point. On the other, it sucks to be duped. In the future (and present) you can’t trust anything you see. It’s probably time to get over any assumption of authenticity.

One thing that’s nagging at me, though, is death. A lot of us know people on twitter or Facebook who’ve died. In the coming years a lot more of us will. We’ll be reminded of birthdays, of past events, things will pop up in our timeline because someone commented a few years later. Their memory continues to exist, but in a concentrated form. Unless their profile’s deleted completely, they continue to exist. If there was a collection of directed, trained bots behind the scenes, they might even continue to post. They might be posting things relevant to their interests 10 or even 20 years later, if the systems keep running that long. As it is, Jeff and Keiko will only ‘die’ if Weavr shuts down. If they had control of their own codebases and knew how to set themselves up as a lightweight every-2-hour process on a system somewhere, maybe they could pay for their own existence through clickthroughs and blog ads… indefinitely.

Weavrs want to continue to exist, as well.  Right now you can clone them; in the future they may, after some length of existence, spawn some kids with variations on their ‘genetic makeup’.  Some kids will be interesting, attract attention and flourish, some will be too boring or too weird and disappear.  The parent Weavr continues to exist through its children.

There’s the school of thought that all people have several people inside of them. You have a ‘masculine’ take-charge person, maybe a ‘feminine’ artistic, caring person, maybe a young zany person and an older, wiser person. They all make up you, but maybe with these technologies one day soon you’ll be able to manifest them more concretely. You could have an inner circle of very directed Weavrs. Maybe to maximize their inventiveness you’ll make deals with them.  More freedom for them, wider results sets for you.  The deal with your wise, older persona, in exchange for the investment tips and long-range perspective, is that it gets to virtually go down to Florida every winter. Maybe your virtual young, wild persona, in exchange for keeping you up to date on the latest fashion trends and music recommendations, gets to stay out late and virtually attend hot underground shows.  They’re not just agents, they’re symbiotes.

These autonomous net entities, these ghosts in the social web or e-horcruxes, whatever you’d like to call them, aren’t going back in the box.  We have to learn to deal with them, and due to social connectedness and meaning being a currency in our society, whoever figures out how to utilize them best is going to have an advantage. Businesses and marketeers will take advantage after the artists finish tinkering.  Someone’s already using Weavrs to create market segment identities (PDF) for the cities in China with more than a million people (there are 150 such cities, too many to look at individually).

We’re all familiar with code that runs ‘for us’.  Flickr, McAfee, these services run with our content or on our computers, but they don’t really run for just us, and they don’t exist independently… yet.  One groundbreaking thing that Weavr is moving towards is removing the AI logic from the content (Weavrs pull from the web and post back to it, but they don’t exist in a walled garden like Flickr, they exist outside of it and talk to it via APIs).  Eventually I think we’ll see some open source or self-runnable version of this, an agent that lives wherever you want.  Once my dependency on an outside software provider for the black box is gone, I’m free to integrate whatever bits I like (fork that thing on GitHub!), and work towards a social agent that can exist for as long as someone keeps the lights on.

Postscript 1:

I just had a weird thought.  Irma and I have noticed that our Weavrs post a lot of things we’re interested in (or find cool/neat).  Since we created them, they feel like an extension of ourselves, so there’s a personal ownership angle to the things they post.  “Oh,” I say, “this bot is like me.”  I don’t say that when my friends post things, though.  I don’t say, “Wow, this social appendage of me is like me.”  I suppose someone really ego centric would say that, but we consider our friends to be independent entities.  We know we don’t control them, and unless they’re our brothers or sisters, we probably didn’t have a hand in how they initially developed.  Our Weavrs, on the other hand, feel like an extension of ourselves.  I’m not sure what that means, but it’s a weird thought on individuality and influence domains.

 

OpenStack Austin February Meetup Videos

Last night was the February OpenStack Austin meetup.  I took my handy little Canon S95 with me to record the proceedings for those of us who couldn’t make it.  Here are the two videos from last night, and a special bonus video from December’s meetup.

Unfortunately the S95 doesn’t handle auto-exposure well with the super-bright projector image, so the camera keeps under and over exposing these videos. Hopefully it won’t be too distracting, maybe next time I’ll bring a camera with more manual control.

First, Matt Ray from OpsCode talked about the history of OpenStack and Chef and the knife tools for managing OpenStack. YouTube Link

Zaid Sawalha from Rackspace talks about how OpenStack Keystone became an incubated OpenStack project, and lessons learned from their experience. YouTube Link

Zaid Sawalha from Rackspace talks about OpenStack Keystone’s implementation and development, plus a little QA. YouTube Link

Bonus! Blake Yeager from HP Cloud talks about Deployment strategies at the December 8th meetup. YouTube Link

The Quantified Car: Progressive Snapshot

Let me paint you a picture: It’s the near future.  Your insurance company sends you a little gadget that you carry around.  It notes when you get a little too aggressive or if you’re out partying too late, and automatically sends the information wirelessly back to your insurance company (say, the 164th largest company in the US).  If you do something they consider risky, it might even alert you with a buzz or beep.  If you fit their definition of lower-risk (by, say, not being out past midnight) they give you a discount on your policy.

Sounds like the future, doesn’t it?  Pervasive metric collection, big data analytics, pattern and custom behavior based pricing optimization?  Except that it’s been available since last year, just for your car.

Progressive calls it Snapshot.  It’s a little device that plugs into your car’s debug port.  It gets its power from the car’s battery, reads the car’s metrics directly from the car’s computer and reports automatically back to Progressive via the AT&T network.  They know how hard you brake (speed 1 second ago – speed now = braking speed per second, over a 7 mph drop per second and you’re a risk), they know when you drive (the cell network includes time as part of its protocol, so you never need to set its clock), what you drive (the vehicle identification number is transmitted) and likely generally where you drive (since they presumably know what cell tower the device is talking to).  There isn’t a GPS in it so they don’t know exactly where you are (so unlike a car rental monitor, they don’t know if you were breaking the law by speeding in a specific place).  Since they’re your insurance company they also know a lot about you financially (they use your credit history to determine your rate, for instance), where you live (if you live in a shady neighborhood, full coverage might be more expensive), how old you are (if you’re young you pay more), your gender (girls pay less) and whatever other data they can derive from your name and address (oh, you gave them your social security number when you signed up, didn’t you?).
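The braking rule is simple enough to sketch. This is a guess at the arithmetic described above, not Progressive's actual firmware: the function name, the one-sample-per-second assumption and the sample data are all hypothetical.

```python
def hard_brake_events(speeds_mph, threshold_mph_per_s=7.0):
    """Return (second, drop) for each one-second interval whose speed drop exceeds the threshold.

    speeds_mph is assumed to hold one speed reading per second.
    """
    events = []
    for i in range(1, len(speeds_mph)):
        drop = speeds_mph[i - 1] - speeds_mph[i]  # speed 1 second ago minus speed now
        if drop > threshold_mph_per_s:
            events.append((i, drop))
    return events

# Hypothetical trace: cruising at 30 mph, then stomping on the brakes.
SAMPLES = [30, 30, 28, 20, 11, 5, 5]
print(hard_brake_events(SAMPLES))
```

In this trace only the 28 to 20 and 20 to 11 drops count as hard braking. The 11 to 5 drop is 6 mph in a second, which stays under the 7 mph line.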

To some people this is just usage-based car insurance, which has been around for a while.  For those in the experience and conversion monetization business, this is something else.  It’s an insane treasure trove of data, willingly given by customers.  Their privacy statement explicitly says so: ‘To meet our legal obligations to state departments of insurance, we retain information collected or derived from the device for the time we determine is required by law; after which we will de-personalize the data and keep it indefinitely.’  Imagine what kind of data their analysts are rolling in!  “Here’s a snow day in Texas, notice how 50% of people who live in higher income neighborhoods aren’t commuting today, presumably because they can telecommute, but only 15% of people in less advantaged neighborhoods are staying at home.”  “35% of 32-35 year old primary drivers with 1+ children make 3+ mid-day trips during the week, while only 25% of 36-42 year olds do.”  “Here’s the rage-graph of peak braking velocity grouped by age, notice how it drops from 21 on, then spikes again for men at what’s considered the mid-life-crisis.”  If dating sites can produce interesting graphs like these, imagine what insurance companies can do.

Some people would be shouting Big Brother and 1984 at this point, but in reality it’s no more than Google, Facebook or your cell phone provider know about you.  When your pill bottles report back to your insurance company, it’s no more than your health insurance provider knows about you.  The future is going to be behavior modification heavy.  Unless society reacts strongly and begins to value privacy and anonymity more, it’s how everything’s going to be.  Google makes its money because it knows who you are and can optimize your ad viewing experience to maximize the money you spend.  Insurance companies want people who brake slowly, don’t drive at peak times or even drive much at all.  Energy companies want people to not use power at peak times.  Some companies may even want to use the data to optimize our collective driving experience by crowdsourcing the speed of traffic to avert gridlock.

Progressive doesn’t do a great job of explaining what Snapshot is in its commercials.  It isn’t an easy story to tell in 30 seconds, but it isn’t hard to convince someone to try it when they’re on the phone switching car insurance.  “Save up to 30%, no possibility of my rate increasing?  Sign me up!”  In fact, if my sister-in-law hadn’t mentioned driving slower because her Snapshot beeped at her for braking hard, I probably would never have realized that it was the quantified self in car form.  For those of us in the ‘data industry,’ the potential is scary, but for some of middle America a 30% reduction in car insurance is worth the loss of privacy.  How long and how far it’ll go, only the future can tell.

Update:

I haven’t seen a teardown of the Snapshot device, yet.  I’d be curious to see what’s inside.  It also seems like Allstate has something similar (nay, identical) called Allstate Drivewise.  In fact, they were fighting it out over whether each could offer it.

If I were writing the Bruce Sterling or Cory Doctorow version of this story, there would be an enterprising group of car tuners on the border, maybe driving stolen cars through Nuevo Laredo.  Their tech guru comes up with some fancy way to rewrite the VIN data as it’s read off the engine debug port, and they realize they can make a little dongle that sits between the debug port and the Snapshot device to smooth out acceleration data.  They get somebody in Shanghai or Monterrey to make 10,000 of them, and since it’s a legal grey area, get people to mark them up 150% and sell them with targeted Facebook ads.  “Maximize your insurance discount!  Drive how you want!”  Who knows if the Snapshot devices can have their firmware updated over the air, but if they can, imagine a running tech battle between the car chippers and Progressive programmers like the DirecTV access card hackers back in the day.

On a more melancholy note (maybe this is the William Gibson version), with so many people using Snapshot, essentially a wirelessly connected black box for your car, statistically there have to be some people who have had accidents and likely died, with the graph of data from the exact moment of the crash sitting on Progressive’s servers.  Imagine a particularly affected individual in the data processing department collecting those and spinning art out of them in an anonymous only-on-the-internet memorial.

3D Printed QR Code Chops

I have ideas in the middle of the night sometimes, which usually disappear into the aether because I’m not in a position to act on them.  Last night’s particularly choice morsel, which I had Irma email me (I got that ‘look’ and she said I should write it down) is this:

In some East Asian countries a person has a chop or seal.  It’s kind of like your signature in the West.  It’s a quasi-official stamp.  You’d have an artist carve you one, and it would be a near-literal representation of you.  Where your chop stamped, you said it.  So now we have QR codes, two-dimensional bar codes which can contain anywhere from 25 to a few thousand alphanumeric characters, depending on the symbol size and error correction level.  They’re probably kind of hard to carve, and lend themselves well to 3D printing.  I’ve been thinking for a while about ways to slap uniqueish imprints on things, and it seems like an inked or reflective waxed 3D QR code chop would be a cool way to do it.  You could add some text around the edge, but you’d be able to read the data in a cell phone app.  The 77 or 114 alphanumeric character limit of the small symbols that would be practical to print (versions 3 and 4 at the lowest error correction level) means you can’t embed, say, your public crypto key, but you could put a name/email/url/signed bit combo in there, or something like that.
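Here’s a rough way to check whether a given chop payload would fit.  It’s a sketch, not a QR encoder: the table only covers versions 1 through 4 at error correction level L, and the function name is made up for the example.  One catch worth knowing is that QR’s alphanumeric mode only covers digits, uppercase letters and a handful of symbols, so anything with lowercase letters or an ‘@’ falls back to byte mode, which holds less.

```python
# Characters allowed in QR alphanumeric mode.
QR_ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

# Capacity at error correction level L: version -> (alphanumeric chars, bytes).
CAPACITY_L = {1: (25, 17), 2: (47, 32), 3: (77, 53), 4: (114, 78)}

def smallest_version(payload):
    """Return (version, mode) for the smallest of versions 1-4 at level L that fits, else None."""
    alnum = all(ch in QR_ALPHANUMERIC for ch in payload)
    # Byte mode is measured in encoded bytes; UTF-8 is assumed here.
    length = len(payload) if alnum else len(payload.encode("utf-8"))
    for version, (alnum_cap, byte_cap) in sorted(CAPACITY_L.items()):
        if length <= (alnum_cap if alnum else byte_cap):
            return version, ("alphanumeric" if alnum else "byte")
    return None

print(smallest_version("JANE DOE HTTP://EXAMPLE.COM/JD"))
print(smallest_version("jane@example.com"))
```

So an uppercase name plus a short URL fits comfortably in a small symbol, while anything key-sized returns None and needs a much denser code than you’d want to stamp with.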

Killing Hung OSX VPN Connections

This thing is a menace:

The endlessly scrolling disconnection state when OSX’s Network Connect VPN client goes sideways.  You can’t reconnect because it just sits there trying to disconnect.  You can’t kill it in the GUI, but it turns out you can get rid of it.  Just kill -TERM or kill -9 the ppp process in the terminal.  Then you should be able to reconnect without rebooting your machine or switching network locations.  It probably leaves some messy stuff sitting around your routing tables, but that’s what regular reboots are for.
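If you’d rather script it than hunt for the PID by hand, something like this would do.  It’s a minimal sketch that assumes the hung process shows up in ps as pppd (that name is my assumption; check your own process list first), and the function names are made up.  You’ll most likely need to run it as root.

```python
import os
import signal
import subprocess

def find_pids(ps_output, name="pppd"):
    """Extract PIDs from `ps -axo pid=,comm=` output whose command basename matches name."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and os.path.basename(parts[1].strip()) == name:
            pids.append(int(parts[0]))
    return pids

def kill_hung_vpn(sig=signal.SIGTERM):
    """Send sig to every matching process and return the PIDs that were signalled."""
    out = subprocess.run(["ps", "-axo", "pid=,comm="],
                         capture_output=True, text=True, check=True).stdout
    pids = find_pids(out)
    for pid in pids:
        os.kill(pid, sig)
    return pids
```

Call kill_hung_vpn() first, and only if the connection is still stuck try kill_hung_vpn(signal.SIGKILL), which is the kill -9 equivalent.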

A $229.95 Nearly-Fanless Router PC

Sometimes you need a machine to pass network traffic from one interface to another and fiddle with it.  You may need to route traffic to your network, or inspect network traffic in a transparent bridge.  In my case I needed a fake DHCP client to hold on to public IPs from AT&T U-Verse so I could assign IPs at will behind it.  Whatever you need to do, the requirements brief is generally the same:

  • Low Power Utilization
  • Quiet Operation
  • As Few Moving Parts as Possible
  • Small Size
  • Near Wire-Speed Performance for Gigabit Network Traffic
  • Cheap

There are plenty of embedded systems with multiple network ports that can run stripped down versions of Linux and boot off a CF card.  They win in the small category, but they’re all consistently more than $300 or are older devices that can’t approach gigabit speeds.

An older PC would work fine, especially if its onboard ethernet was gigabit and wired to a PCI Express bus.  The theoretical peak of a 32-bit, 33 MHz PCI bus (about 133 MB/s) is only barely above the 125 MB/s that Gigabit Ethernet moves in one direction, and that bus is shared by every card on it, so if you throw more cards on the PCI bus, you’ll increasingly limit your throughput.  That goes double if you’re using a PCI based drive controller with any kind of real traffic.  An older PC probably won’t be small, though, and it’ll have lots of moving parts.
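The bus arithmetic is easy to check.  This is back-of-the-envelope only: it uses theoretical peak numbers, ignores protocol overhead on both sides, and the even split between cards is a simplifying assumption of mine, not how PCI arbitration really behaves.

```python
PCI_BUS_BITS = 32
PCI_CLOCK_HZ = 33.33e6       # classic 33 MHz PCI
GIGE_BITS_PER_S = 1e9        # Gigabit Ethernet line rate, one direction

def pci_peak_mbytes_per_s():
    """Theoretical peak of the shared PCI bus in MB/s."""
    return PCI_BUS_BITS * PCI_CLOCK_HZ / 8 / 1e6

def gige_mbytes_per_s():
    """Gigabit Ethernet line rate in MB/s, one direction."""
    return GIGE_BITS_PER_S / 8 / 1e6

def share_per_device(devices):
    """Naive even split of the PCI peak among devices, in MB/s."""
    return pci_peak_mbytes_per_s() / devices

print(round(pci_peak_mbytes_per_s(), 1), gige_mbytes_per_s())
for n in (1, 2, 3):
    print(n, "device(s):", round(share_per_device(n), 1), "MB/s each")
```

One gigabit card alone just about fits.  Two cards, or one card forwarding traffic in both directions at full speed, already want more than the bus can give.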

I ended up spending $229.95 on my solution.  It features the following:

There are cheaper options for nearly all of these components, but this felt like the best price/performance compromise.  The box is fast enough that if I wanted to, I could re-purpose it into nearly anything.  It also has enough performance overhead that I could give it an additional task without worrying about crippling my network performance.  I installed Ubuntu 11.04 Server on it (via a temporary CD-ROM Drive), so I can apt-get install anything else I need.

Installation is fairly straightforward, though I’m not sure if I’d use the same case if I did it again.  I needed a case with a PCI slot (there are PCI Express Mini Full ethernet cards, but they’re really expensive and rare), and there aren’t many that don’t include a full-size CD-ROM bay.  The case is probably twice as big as it needs to be.  You mount the motherboard with four screws, plug in two power supply connectors, slot the PCI card, slot the memory, plug in the front-panel connectors and the SSD’s SATA cable, mount the SSD somehow (I just screwed it in on one side), plug an SATA power cable into the SSD and you’re done.

The system’s quiet enough that I can’t tell if it’s really making noise.  There’s one large fan on the underside of the power supply.  It’s the only moving part in the system.  The system uses around 28 watts of power when operating, and from power button to login prompt is around 23 seconds.  A good half of that is in the BIOS.

Some pictures: