I was traveling in China recently and noticed something interesting on the subway. Older people using their phones almost always hold the phone up and talk into it. Younger people just type. At first I thought the older folks couldn’t type well. Turns out that’s not it. A lot of them just prefer talking. A Chinese friend told me WeChat blew up early on partly because of its walkie-talkie style voice messages.

It got me thinking. Why do people seem to love voice so much once they try it? Then it hit me. Humans have been speaking for something like 100,000 years. Writing is maybe 5,000 years old. Mass literacy is a couple hundred years old. Typing is the historical exception. Talking is the default.

This is already happening for human-to-human communication. Tools like Wispr Flow have a lot of heavy users now. You say something, it becomes text, you send it. The end product is still text, but the input side is voice.

What I’m more curious about is the next step: voice for talking to machines. For as long as computers have existed, we’ve talked to them with numbers, text, and code. Siri-era voice could only trigger preset commands. LLMs change that. You can say something vague and an agent can break it down and act on it. Products like Owlfy are doing this for desktops. Rabbit pitched the same idea years ago with their “Large Action Model.” They didn’t pull it off, but the direction made sense.

If this actually works out, it’s the third big shift in how people use computers. Command line, then GUI, then just talking. Each shift made computers usable for way more people.

Of course I could be totally wrong. Voice has real downsides. It’s hard to skim, listening is slower than reading, and it’s awkward in public. Picture an office where everyone is talking to their screen. Kind of weird.

So I’m curious. When you’re interacting with a computer or a system, do you reach for voice or keyboard and mouse first? What’s the difference for you?

submitted by /u/TheseSir8010
Originally posted by u/TheseSir8010 on r/ArtificialInteligence
