SMS Signature Overview
Maybe you’ve seen them before. Every single text message you get from one of your friends ends with “<3 Stacee” or “Hugs&Kisses”. It’s called a text message signature. They are just like email signatures: users set up their phones to automatically insert a message at the end of every text message. The signature can be your name, a short phrase, or anything else you like, and signatures are very common on newer phone models.
For example, I could set up my phone so that every message I send ends with “-=Ben=-” so people know it’s from me: “I’ll meet you at the coffee shop at 3pm -=Ben=-”. If you’ve seen these before, you know exactly what I’m talking about. If you haven’t, you probably think it’s crazy that someone would fill up part of their precious 160 characters with a signature!
Over the past year, we’ve processed millions of text messages. We’ve seen all sorts of different SMS signatures and unexpected user behavior, and we want to share some of our data. Specifically, we wanted to understand how prevalent SMS signatures are and what we can do to improve our text message processing algorithms.
First, we chose a single keyword that we knew was actively promoted. We then extracted 6,000 unique text messages that started with that keyword in the past year. Next, we removed all duplicate messages so we only had a single message for each phone. Then we removed multiple different messages from any given phone; it turns out that people change their signatures more than you think (some change them weekly)! Finally, we manually went through the list and discounted any extraneous messages that were obviously manually typed and clearly not a signature. Ambiguous messages were also removed from the list. This left 5,242 unique text messages to analyze.
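The automated part of that cleanup can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code we ran: the `Message` type, its field names, the phone numbers, and the sample texts are all hypothetical, and the manual review step is not modeled.

```python
from typing import Iterable, List, NamedTuple


class Message(NamedTuple):
    phone: str  # sender's phone number (hypothetical field name)
    text: str   # raw message body


def one_message_per_phone(messages: Iterable[Message], keyword: str) -> List[Message]:
    """Keep messages whose first word is `keyword`, at most one per phone."""
    kept = {}
    wanted = keyword.lower()
    for msg in messages:
        words = msg.text.lower().split()
        if not words or words[0] != wanted:
            continue  # does not start with the promoted keyword
        # setdefault keeps the first message seen for each phone, so exact
        # repeats and later messages with a changed signature are both dropped.
        kept.setdefault(msg.phone, msg)
    return list(kept.values())


sample = [
    Message("555-0100", "FISH halibut <3 Stacee"),
    Message("555-0100", "FISH halibut <3 Stacee"),   # exact duplicate
    Message("555-0100", "FISH salmon Hugs&Kisses"),  # same phone, new signature
    Message("555-0101", "fish tuna"),
    Message("555-0102", "hello there"),              # wrong keyword
]
result = one_message_per_phone(sample, "fish")
```

Keeping the first message per phone is an arbitrary choice in this sketch; keeping the most recent one would work equally well for counting how many users have a signature.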
(To protect the privacy of our customers and users, we are unable to publish the list of signatures, but we are able to share aggregate results with you.)
- 647 users out of 5,242 had SMS signatures, or 12.34% of users
- The average signature length is 11 characters
- The longest signature was a whopping 91 characters!
- 513 signatures, or 79% of them, had punctuation or other non-alphanumeric characters in them
- 13.8% of all signatures contain a heart <3 and 7.7% contain a smile =)
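The percentages above can be rechecked from the raw counts, and the same kind of aggregate is easy to compute for any list of signatures. The snippet below is a minimal sketch: the three sample signatures are the ones quoted earlier in this post (not entries from our dataset), the user total of 25 is made up, and treating every character that is not a letter, digit, or whitespace as punctuation is an assumption about how to count.

```python
import re

# Assumption: anything that is not a letter, digit, or whitespace is punctuation.
NON_ALNUM = re.compile(r"[^A-Za-z0-9\s]")


def pct(part: int, whole: int) -> float:
    """Return part / whole as a percentage."""
    return 100 * part / whole


def signature_stats(signatures: list, total_users: int) -> dict:
    """Aggregate statistics for a non-empty list of signature strings."""
    n = len(signatures)
    lengths = [len(s) for s in signatures]
    return {
        "pct_users_with_signature": pct(n, total_users),
        "avg_length": sum(lengths) / n,
        "max_length": max(lengths),
        "pct_with_punctuation": pct(sum(bool(NON_ALNUM.search(s)) for s in signatures), n),
        "pct_with_heart": pct(sum("<3" in s for s in signatures), n),
        "pct_with_smile": pct(sum("=)" in s for s in signatures), n),
    }


# Recheck the published figures from their counts.
share_with_signature = round(pct(647, 5242), 2)  # 12.34
share_with_punctuation = round(pct(513, 647))    # 79

# Illustrative input only: signatures quoted in this post, hypothetical user total.
stats = signature_stats(["<3 Stacee", "Hugs&Kisses", "-=Ben=-"], total_users=25)
```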
We were also curious whether SMS signatures varied by demographic. We repeated our analysis for a known audience that was significantly older than the average. We do not know (nor could we share) the exact ages of users, so let’s just define the term ‘older’ to mean exclusively adults, as opposed to teenagers or college students. In this case, only 18 out of 414, or 4.3% of adults, use SMS signatures, compared with 12.3% above.
By now you have probably guessed why this is such a significant topic for Mobile Commons. SMS signatures might be fun when texting your friends, but they pose a significant technological challenge for companies like ours. Our products are built around recognizing keywords for opting in and search terms for querying data while on the go. How can we reliably parse keywords and queries when messages have seemingly random text added to them?!
This is best illustrated with an example. Consider FishPhone, an SMS service that lets you text FISH and the name of a fish to 30644 and get back the health and environmental impact of that species. For example, “fish halibut” tells you the environmental impact of halibut. But what happens if the text message reads “fish halibut peace,<3,and :)”? Or “fish atlantic salmon xoxobaby girlxoxo”? How can we tell what you’re searching for? It gets even harder when you allow multi-word keywords (e.g. you can text WHERE IS and a bottled water brand to 69866 to find out how far away your water’s source is).
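To make the difficulty concrete, here is one naive approach, sketched in Python: look for the longest run of words after the keyword that appears in a list of known search terms, and treat whatever is left as a probable signature. This is not our production algorithm. The `KNOWN_SPECIES` set and the `split_query` function are hypothetical names invented for this illustration.

```python
from typing import Optional, Set, Tuple

# Hypothetical vocabulary; a real service would load this from its own data.
KNOWN_SPECIES = {"halibut", "salmon", "atlantic salmon", "tuna"}


def split_query(message: str, keyword: str, vocabulary: Set[str]) -> Tuple[Optional[str], str]:
    """Return (query, leftover) for a message that starts with `keyword`.

    `query` is the longest run of words right after the keyword that is in
    `vocabulary`, or None if nothing matches. `leftover` is the remaining
    (lowercased) text, which may be a signature.
    """
    words = message.lower().split()
    if not words or words[0] != keyword:
        return None, message
    rest = words[1:]
    # Try the longest candidate first so "atlantic salmon" beats "atlantic".
    for end in range(len(rest), 0, -1):
        candidate = " ".join(rest[:end])
        if candidate in vocabulary:
            return candidate, " ".join(rest[end:])
    return None, " ".join(rest)


first = split_query("fish halibut peace,<3,and :)", "fish", KNOWN_SPECIES)
second = split_query("fish atlantic salmon xoxobaby girlxoxo", "fish", KNOWN_SPECIES)
```

Here `first` is `("halibut", "peace,<3,and :)")` and `second` is `("atlantic salmon", "xoxobaby girlxoxo")`. The sketch breaks down quickly, though: a misspelled fish name, a signature typed without a leading space, or a signature that happens to be a valid search term all defeat exact matching.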
This is a great example of a real-world problem. You can write the best algorithms in the world for parsing text messages, but until you start processing live user data, you have no idea what you’re likely to encounter or how wrong your assumptions were. Unfortunately I can’t reveal our algorithms (that’s part of our secret sauce), but I can say that they have evolved tremendously over the past couple of years.
In message processing systems that rely only on a single keyword, SMS signatures aren’t really a problem; your system can just grab the first word. When you start allowing multi-word keywords (e.g. text “do something” to 30644), keywords with search terms, or data collection (text your email address and zip code), signatures become quite difficult to deal with. We’re quite proud of the natural language processing algorithms we’ve written over the past two years to elegantly handle these sorts of real-world challenges. We constantly tweak our algorithms and run regression tests to ensure we have the most sophisticated and accurate text message processing product available!
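For the data collection case, a common first step is to pull out only the pieces that look like the requested fields and ignore everything else. The Python sketch below illustrates that idea and is not a description of our product: the two regular expressions are deliberately simple assumptions (a loose email pattern and a five-digit US zip code), and `extract_fields` is a hypothetical name.

```python
import re

# Deliberately loose patterns, chosen for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
ZIP_RE = re.compile(r"\b\d{5}\b")  # five-digit US zip code


def extract_fields(message: str) -> dict:
    """Return the first email-like and zip-like tokens found, or None for each."""
    email = EMAIL_RE.search(message)
    zip_code = ZIP_RE.search(message)
    return {
        "email": email.group(0) if email else None,
        "zip": zip_code.group(0) if zip_code else None,
    }


fields = extract_fields("ben@example.com 10001 -=Ben=-")
```

With this input, `fields` is `{"email": "ben@example.com", "zip": "10001"}` and the trailing signature is simply never matched. A signature that itself contains five digits or an @ sign would still confuse this approach, which is why pattern matching alone is not enough on live data.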