Hundreds of people gathered for the first lecture at what had become the world’s most important conference on artificial intelligence — row after row of faces. Some were East Asian, a few were Indian, and a few were women. But the vast majority were white men. More than 5,500 people attended the meeting, five years ago in Barcelona, Spain. Timnit Gebru, then a graduate student at Stanford University, remembers counting only six Black people other than herself, all of whom she knew, all of whom were men. The homogeneous crowd crystallized for her a glaring issue. The big thinkers of tech …
When tech companies created the facial recognition systems that are rapidly remaking government surveillance and chipping away at personal privacy, they may have received help from an unexpected source: your face. Companies, universities and government labs have used millions of images collected from a hodgepodge of online sources to develop the technology. Now, researchers have built an online tool, Exposing.AI, that lets people search many of these image collections for their old photos. The tool, which matches images from the Flickr online photo-sharing service, offers a window onto the vast amounts of data needed to build a wide variety …
In recent years we’ve seen a whole bunch of visual, style and fashion-focused search engines cropping up, tailored to helping people find the perfect threads to buy online by applying computer vision and other AI technologies to perform smarter-than-keywords visual search that can easily match and surface specific shapes and styles. Think startups like Donde Search, Glisten and Stye.ai, to name a few.
Early-stage London-based Cadeera, which is in the midst of raising a seed round, wants to apply a similar AI visual search approach, but for interior décor. All through the pandemic it’s been working on a prototype …
URUMQI, China — At the end of a desolate road rimmed by prisons, deep within a complex bristling with cameras, American technology is powering one of the most invasive parts of China’s surveillance state.
The computers inside the complex, known as the Urumqi Cloud Computing Center, are among the world’s most powerful. They can watch more surveillance footage in a day than one person could in a year. They look for faces and patterns of human behavior. They track cars. They monitor phones.
The Chinese government uses these computers to watch untold numbers of people in Xinjiang, a western region of China where Beijing has unleashed a campaign of surveillance and suppression in the name of combating terrorism.
Chips made by Intel and Nvidia, the American semiconductor companies, have powered the complex since it opened in 2016. By 2019, at a time when reports said that Beijing was using advanced technology to imprison and track Xinjiang’s mostly Muslim minorities, new U.S.-made chips helped the complex join the list of the world’s fastest supercomputers. Both Intel and Nvidia say they were unaware of what they called misuse of their technology.
Powerful American technology and its potential misuse cut to the heart of the decisions the Biden administration must face as it tackles the country’s increasingly bitter relationship with China. The Trump administration last year banned the sale of advanced semiconductors and other technology to Chinese companies implicated in national security or human rights issues. A crucial early question for Mr. Biden will be whether to firm up, loosen or rethink those restrictions.
Some figures in the technology industry argue that the ban went too far, cutting off valuable sales of American products with plenty of harmless uses and spurring China to create its own advanced semiconductors. Indeed, China is spending billions of dollars to develop high-end chips.
By contrast, critics of the use of American technology in repressive systems say that buyers exploit workarounds and that the industry and officials should track sales and usage more closely.
Companies often point out that they have little say over where their products end up. The chips in the Urumqi complex, for example, were sold by Intel and Nvidia to Sugon, the Chinese company backing the center. Sugon is an important supplier to Chinese military and security forces, but it also makes computers for ordinary companies.
That argument is not good enough anymore, said Jason Matheny, the founding director of Georgetown University’s Center for Security and Emerging Technology and a former U.S. intelligence official.
“Government and industry need to be more thoughtful now that technologies are advancing to a point where you could be doing real-time surveillance using a single supercomputer on millions of people potentially,” he said.
There is no evidence the sales of Nvidia or Intel chips, which predate the Trump order, broke any laws. Intel said it no longer sells semiconductors for supercomputers to Sugon. Still, both companies continue to sell chips to the Chinese firm.
The Urumqi complex’s existence and use of U.S. chips are no secret, and there was no shortage of clues that Beijing was using it for surveillance in Xinjiang. Since 2015, when the complex began development, state media and Sugon have boasted of its ties to the police.
In five-year-old marketing materials distributed in China, Nvidia promoted the Urumqi complex’s capabilities and boasted that the “high capacity video surveillance application” there had won customer satisfaction.
Nvidia said that the materials referred to older versions of its products and that video surveillance then was a normal part of the discussion around “smart cities,” an effort in China to use technology to solve urban issues like pollution, traffic and crime. A spokesman for Nvidia said the company had no reason to believe its products would be used “for any improper purpose.”
The spokesman added that Sugon “hasn’t been a significant Nvidia customer” since last year’s ban. He also said that Nvidia had not provided technical assistance for Sugon since then.
A spokesman for Intel, which still sells Sugon lower-end chips, said it would restrict or stop business with any customer that it found had used its products to violate human rights.
Publicity over Intel’s China business appears to have had an impact within the company. One business unit last year drafted ethics guidelines for its technology’s A.I. applications, according to three people familiar with the matter who asked not to be named because Intel had not made the guidelines public.
Sugon said in a statement that the complex was originally aimed at tracking license plates and managing other smart city tasks, but its systems proved ineffective and were switched to other uses. But as recently as September, official Chinese government media described the complex as a center for processing video and images for managing cities.
Advances in technology have given the authorities around the world substantial power to watch and sort people. In China, leaders have pushed technology to an even greater extreme. Artificial intelligence and genetic testing are used to screen people to see whether they are Uighurs, one of Xinjiang’s minority groups. Chinese companies and the authorities claim their systems can detect religious extremism or opposition to the Communist Party.
The Urumqi Cloud Computing Center — also sometimes called the Xinjiang Supercomputing Center — broke onto the list of the world’s fastest computers in 2018, ranking No. 221. In November 2019, new chips helped push its computer to No. 135.
Two data centers run by Chinese security forces sit next door, a way to potentially cut down on lag time, according to experts. Also nearby are six prisons and re-education centers.
When a New York Times reporter tried to visit the center in 2019, he was followed by plainclothes police officers. A guard turned him away.
The official Chinese media and Sugon’s previous statements depict the complex as a surveillance center, among other uses. In August 2017, local officials said that the center would support a Chinese police surveillance project called Sharp Eyes and that it could search 100 million photos in a second. By 2018, according to company disclosures, its computers could connect to 10,000 video feeds and analyze 1,000 simultaneously, using artificial intelligence.
“With the help of cloud computing, big data, deep learning and other technologies, the intelligent video analysis engine can integrate police data and applications from video footage, Wi-Fi hot spots, checkpoint information, and facial recognition analysis to support the operations of different departments” within the Chinese police, Sugon said in a 2018 article posted to an official social media account.
On the occasion of a visit by local Communist Party leaders to the complex that year, it wrote on its website that the computers had “upgraded the thinking from after-the-fact tracking to before-the-fact predictive policing.”
In Xinjiang, predictive policing often serves as shorthand for pre-emptive arrests aimed at behavior deemed disloyal or threatening to the party. That could include a show of Muslim piety, links to family living overseas, owning two phones, or not owning a phone at all, according to Uighur testimony and official Chinese policy documents.
Technology helps sort vast amounts of data that humans cannot process, said Jack Poulson, a former Google engineer and founder of the advocacy group Tech Inquiry.
“When you have something approaching a surveillance state, your primary limitation is on your ability to identify events of interest within your feeds,” he said. “The way you scale up your surveillance is through machine learning and large scale A.I.”
The Urumqi complex went into development before reports of abuses in Xinjiang were widespread. By 2019, governments around the world were protesting China’s conduct in Xinjiang. That year, the Sugon computer appeared on the international supercomputing rankings, using Intel Xeon Gold 5118 processors and Nvidia Tesla V100 advanced artificial intelligence chips.
It is not clear how or whether Sugon will obtain chips powerful enough to keep the Urumqi complex on that list. But lesser technology typically used to run harmless tasks can also be used for surveillance and suppression. Customers can also use resellers in other countries or chips made by American companies overseas.
Last year, the police in two Xinjiang counties, Yanqi and Qitai, purchased surveillance systems that ran on lower-level Intel chips, according to government procurement documents. The Kizilsu Kyrgyz Autonomous Prefecture public security bureau in April purchased a computing platform that used servers running less-powerful Intel chips, according to the documents, though the agency had been placed on a Trump administration blacklist last year for its involvement in surveillance.
China’s dependence on American chips has, for now, helped the world push back, said Maya Wang, a China researcher with Human Rights Watch.
“I’m afraid in a few years’ time, Chinese companies and government will find their own way to develop chips and these capabilities,” Ms. Wang said. “Then there will be no way to get a handle on trying to stop these abuses.”
Paul Mozur reported from Urumqi, China, and Don Clark from San Francisco.
You wait ages for foot scanning startups to help with the tricky fit issue that troubles online shoe shopping and then two come along at once: Launching today in time for Black Friday sprees is Xesto — which like Neatsy, which we wrote about earlier today, also makes use of the iPhone’s TrueDepth camera to generate individual 3D foot models for shoe size recommendations.
The Canadian startup hasn’t always been focused on feet. It has a long-standing research collaboration with the University of Toronto, alma mater of its CEO and co-founder Sophie Howe (its other co-founder and chief scientist, Afiny Akdemir, is also pursuing a Math PhD there) — and was actually founded back in 2015 to explore business ideas in human computer interaction.
But Howe tells us it moved into mobile sizing shortly after the 2017 launch of the iPhone X — which added a 3D depth camera to Apple’s smartphone. Since then Apple has added the sensor to additional iPhone models, pushing it within reach of a larger swathe of iOS users. So you can see why startups are spying a virtual fit opportunity here.
“This summer I had an aha! moment when my boyfriend saw a pair of fancy shoes on a deep discount online and thought they would be a great gift. He couldn’t remember my foot length at the time, and knew I didn’t own that brand so he couldn’t have gone through my closet to find my size,” says Howe. “I realized in that moment shoes as gifts are uncommon because they’re so hard to get correct because of size, and no one likes returning and exchanging gifts. When I’ve bought shoes for him in the past, I’ve had to ruin the surprise by calling him – and I’m not the only one. I realized in talking with friends this was a feature they all wanted without even knowing it… Shoes have such a cult status in wardrobes and it is time to unlock their gifting potential!”
Howe slid into this TechCrunch writer’s DMs with the eye-catching claim that Xesto’s foot-scanning technology is more accurate than Neatsy’s — sending a Xesto scan of her foot compared to Neatsy’s measure of it to back up the boast. (Aka: “We are under 1.5 mm accuracy. We compared against Neatsy right now and they are about 1.5 cm off of the true size of the app,” as she put it.)
Another big difference is Xesto isn’t selling any shoes itself. Nor is it interested in just sneakers; it’s shoe-type agnostic. If you can put it on your feet, it wants to help you find the right fit, is the idea.
Right now the app is focused on the foot scanning process and the resulting 3D foot models — showing shoppers their feet in a 3D point cloud view and a photorealistic view, as well as providing granular foot measurements.
There’s also a neat feature that lets you share your foot scans so, for example, a person who doesn’t have their own depth sensing iPhone could ask to borrow a friend’s to capture and take away scans of their own feet.
Helping people who want to be bought (correctly fitting) shoes as gifts is the main reason they’ve added foot scan sharing, per Howe — who notes shoppers can create and store multiple foot profiles on an account “for ease of group shopping”.
“Xesto is solving two problems: Buying shoes [online] for yourself, and buying shoes for someone else,” she tells TechCrunch. “Problem 1: When you buy shoes online, you might be unfamiliar with your size in the brand or model. If you’ve never bought from a brand before, it is very risky to make a purchase because there is very limited context in selecting your size. With many brands you translate your size yourself.
“Problem 2: People don’t only buy shoes for themselves. We enable gift and family purchasing (within a household or remote!) by sharing profiles.”
Xesto is doing its size predictions based on comparing a user’s (<1.5mm accurate) foot measurements to brands’ official sizing guidelines — with more than 150 shoe brands currently supported.
Howe says it plans to incorporate customer feedback into these predictions — including by analyzing online reviews where people tend to specify if a particular shoe runs larger or smaller than expected. So it’s hoping to keep honing the model’s accuracy.
“What we do is remove the uncertainty of finding your size by taking your 3D foot dimensions and correlate that to the brands sizes (or shoe model, if we have them),” she says. “We use the brands size guides and customer feedback to make the size recommendations. We have over 150 brands currently supported and are continuously adding more brands and models. We also recommend if you have extra wide feet you read reviews to see if you need to size up (until we have all that data robustly gathered).”
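To make the size-matching idea concrete, here's a minimal sketch of how a measured foot length might be correlated to a brand's size guide. The brand name and guide values below are invented for illustration; Xesto's actual guides, tolerances and matching logic aren't public.

```python
# Hypothetical size guide: brand -> list of (max foot length in mm, size
# label), ascending. These numbers are made up for illustration only.
SIZE_GUIDES = {
    "ExampleBrand": [(250, "US 7"), (257, "US 7.5"), (264, "US 8"), (271, "US 8.5")],
}

def recommend_size(brand: str, foot_length_mm: float) -> str:
    """Return the smallest size whose guide length accommodates the foot."""
    for guide_mm, label in SIZE_GUIDES[brand]:
        if foot_length_mm <= guide_mm:
            return label
    # Foot longer than every guide entry: fall back to the largest size.
    return SIZE_GUIDES[brand][-1][1]
```

A 255 mm foot would land on "US 7.5" in this toy guide; the real system presumably also weighs width and the customer feedback Howe mentions.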
Asked about the competitive landscape, given all this foot scanning action, Howe admits there’s a number of approaches trying to help with virtual shoe fit — such as comparative brand sizing recommendations or even foot scanning with pieces of paper. But she argues Xesto has an edge because of the high level of detail of its 3D scans — and on account of its social sharing feature. Aka this is an app to make foot scans you can send your bestie for shopping keepsies.
“What we do that is unique is only use 3D depth data and computer vision to create a 3D scan of the foot with under 1.5mm accuracy (unmatched as far as we’ve seen) in only a few minutes,” she argues. “We don’t ask you any information about your feet, or to use a reference object. We make size recommendations based on your feet alone, then let you share them seamlessly with loved ones. Size sharing is a unique feature we haven’t seen in the sizing space that we’re incredibly excited about (not only because we will get more shoes as gifts :D).”
Xesto’s iOS app is free for shoppers to download. It’s also entirely free to create and share your foot scan in glorious 3D point cloud — and will remain so according to Howe. The team’s monetization plan is focused on building out partnerships with retailers, which is on the slate for 2021.
“Right now we’re not taking any revenue but next year we will be announcing partnerships where we work directly within brands ecosystems,” she says, adding: “[We wanted to offer] the app to customers in time for Black Friday and the holiday shopping season. In 2021, we are launching some exciting initiatives in partnership with brands. But the app will always be free for shoppers!”
Since being founded around five years ago, Howe says Xesto has raised a pre-seed round from angel investors and secured national advanced research grants, as well as taking in some revenue over its lifetime. The team has one patent granted and one pending for their technologies, she adds.
Research papers come out far too rapidly for anyone to read them all, especially in the field of machine learning, which now affects (and produces papers in) practically every industry and company. This column aims to collect the most relevant recent discoveries and papers — particularly in but not limited to artificial intelligence — and explain why they matter.
This week: a startup that’s using UAV drones for mapping forests, a look at how machine learning built to map social media networks can be applied to biology, predicting Alzheimer’s from speech patterns, improving computer vision for space-based sensors and other news regarding recent technological advances.
Predicting Alzheimer’s through speech patterns
Machine learning tools are being used to aid diagnosis in many ways, since they’re sensitive to patterns that humans find difficult to detect. IBM researchers have potentially found such patterns in speech that are predictive of the speaker developing Alzheimer’s disease.
The system needs only a couple of minutes of ordinary speech in a clinical setting. The team used a large set of data (the Framingham Heart Study) going back to 1948, allowing patterns of speech to be identified in people who would later develop Alzheimer’s. The accuracy rate is about 71%, or 0.74 area under the curve for those of you who are more statistically informed. That’s far from a sure thing, but current basic tests are barely better than a coin flip in predicting the disease this far ahead of time.
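For readers unfamiliar with that statistic, area under the ROC curve is the probability that a randomly chosen positive case gets a higher risk score than a randomly chosen negative one (0.5 is a coin flip, 1.0 is perfect). Here's a tiny self-contained illustration; the labels and scores are invented, not the IBM model's output.

```python
def auc(labels, scores):
    """AUC as the pairwise win rate of positive over negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0, 0]          # 1 = later developed the disease
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # hypothetical risk scores
```

With these toy numbers the positives outrank the negatives in 11 of 12 pairs, so the AUC is about 0.92; the reported 0.74 means the real model gets such pairs right roughly three times out of four.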
This is very important because the earlier Alzheimer’s can be detected, the better it can be managed. There’s no cure, but there are promising treatments and practices that can delay or mitigate the worst symptoms. A non-invasive, quick test of well people like this one could be a powerful new screening tool and is also, of course, an excellent demonstration of the usefulness of this field of tech.
(Don’t read the paper expecting to find exact symptoms or anything like that — the array of speech features isn’t really the kind of thing you can look out for in everyday life.)
Making sure your deep learning network generalizes to data outside its training environment is a key part of any serious ML research. But few attempt to set a model loose on data that’s completely foreign to it. Perhaps they should!
Researchers from Uppsala University in Sweden took a model used to identify groups and connections in social media, and applied it (not unmodified, of course) to tissue scans. The tissue had been treated so that the resultant images produced thousands of tiny dots representing mRNA.
Normally the different groups of cells, representing types and areas of tissue, would need to be manually identified and labeled. But the graph neural network, created to identify social groups based on similarities like common interests in a virtual space, proved it could perform a similar task on cells. (See the image at top.)
“We’re using the latest AI methods — specifically, graph neural networks, developed to analyze social networks — and adapting them to understand biological patterns and successive variation in tissue samples. The cells are comparable to social groupings that can be defined according to the activities they share in their social networks,” said Uppsala’s Carolina Wählby.
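Graph neural networks operate on nodes and edges, so the first step in adapting one from social networks to tissue is turning the detected mRNA dots into a graph. Here's a hedged sketch of that preprocessing step, with invented coordinates; the Uppsala pipeline's real preprocessing (and the GNN itself) is more involved.

```python
import math

def knn_graph(points, k=2):
    """Connect each point to its k nearest neighbours.

    Returns {node_index: [indices of the k nearest other points]},
    the adjacency structure a graph neural network would consume.
    """
    edges = {}
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j)
            for j, q in enumerate(points) if j != i
        )
        edges[i] = [j for _, j in dists[:k]]
    return edges

# Toy "mRNA dot" coordinates: three dots clustered near the origin,
# one far away (as if in a different tissue region).
dots = [(0, 0), (0, 1), (1, 0), (10, 10)]
```

On a real slide there would be thousands of dots, and node features (which gene each dot represents) would ride along with the graph, playing the role that "common interests" play in the social-network setting.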
It’s an interesting illustration not just of the flexibility of neural networks, but of how structures and architectures repeat at all scales and in all contexts. As without, so within, if you will.
Drones in nature
The vast forests of our national parks and timber farms have countless trees, but you can’t put “countless” on the paperwork. Someone has to make an actual estimate of how well various regions are growing, the density and types of trees, the range of disease or wildfire, and so on. This process is only partly automated, as aerial photography and scans only reveal so much, while on-the-ground observation is detailed but extremely slow and limited.
Treeswift aims to take a middle path by equipping drones with the sensors they need to both navigate and accurately measure the forest. By flying through much faster than a walking person, they can count trees, watch for problems and generally collect a ton of useful data. The company is still very early-stage, having spun out of the University of Pennsylvania and acquired an SBIR grant from the NSF.
“Companies are looking more and more to forest resources to combat climate change but you don’t have a supply of people who are growing to meet that need,” Steven Chen, co-founder and CEO of Treeswift and a doctoral student in Computer and Information Science (CIS) at Penn Engineering, said in a Penn news story. “I want to help make each forester do what they do with greater efficiency. These robots will not replace human jobs. Instead, they’re providing new tools to the people who have the insight and the passion to manage our forests.”
Another area where drones are making lots of interesting moves is underwater. Oceangoing autonomous submersibles are helping map the sea floor, track ice shelves and follow whales. But they all have a bit of an Achilles’ heel in that they need to periodically be picked up, charged and their data retrieved.
Purdue engineering professor Nina Mahmoudian has created a docking system by which submersibles can easily and automatically connect for power and data exchange.
The craft needs a special nosecone, which can find and plug into a station that establishes a safe connection. The station can be an autonomous watercraft itself, or a permanent feature somewhere — what matters is that the smaller craft can make a pit stop to recharge and debrief before moving on. If it’s lost (a real danger at sea), its data won’t be lost with it.
You can see the setup in action below:
Sound in theory
Drones may soon become fixtures of city life as well, though we’re probably some ways from the automated private helicopters some seem to think are just around the corner. But living under a drone highway means constant noise — so people are always looking for ways to reduce turbulence and resultant sound from wings and propellers.
Researchers at the King Abdullah University of Science and Technology found a new, more efficient way to simulate the airflow in these situations; fluid dynamics is essentially as complex as you make it, so the trick is to apply your computing power to the right parts of the problem. They were able to render only flow near the surface of the theoretical aircraft in high resolution, finding that past a certain distance there was little point in knowing exactly what was happening. Improvements to models of reality don’t always need to be better in every way — after all, the results are what matter.
Machine learning in space
Computer vision algorithms have come a long way, and as their efficiency improves they are beginning to be deployed at the edge rather than at data centers. In fact it’s become fairly common for camera-bearing objects like phones and IoT devices to do some local ML work on the image. But in space it’s another story.
Performing ML work in space was until fairly recently simply too expensive power-wise to even consider. That’s power that could be used to capture another image, transmit the data to the surface, etc. HyperScout 2 is exploring the possibility of ML work in space, and its satellite has begun applying computer vision techniques immediately to the images it collects before sending them down. (“Here’s a cloud — here’s Portugal — here’s a volcano…”)
For now there’s little practical benefit, but object detection can be combined with other functions easily to create new use cases, from saving power when no objects of interest are present, to passing metadata to other tools that may work better if informed.
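One concrete way on-board detection pays off is in deciding what's worth the downlink at all. This is an illustrative sketch (not HyperScout's actual software) of the idea: only frames whose detected labels match a watchlist get queued for transmission, and the rest never cost bandwidth or power.

```python
def frames_to_downlink(frames, interesting=frozenset({"volcano", "ship"})):
    """Select frames worth transmitting to the ground.

    frames: list of (frame_id, set of labels the on-board detector found).
    Returns the ids whose labels intersect the watchlist.
    """
    return [fid for fid, labels in frames if labels & interesting]

# Hypothetical capture sequence: mostly clouds, two frames of interest.
captured = [
    (1, {"cloud"}),
    (2, {"cloud", "volcano"}),
    (3, set()),
    (4, {"ship", "coastline"}),
]
```

In this toy run only frames 2 and 4 would be sent down, which is exactly the kind of triage ("here's a cloud, skip it") the satellite's classifier enables.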
In with the old, out with the new
Machine learning models are great at making educated guesses, and in disciplines where there’s a large backlog of unsorted or poorly documented data, it can be very useful to let an AI make a first pass so that graduate students can use their time more productively. The Library of Congress is doing it with old newspapers, and now Carnegie Mellon University’s libraries are getting into the spirit.
CMU’s million-item photo archive is in the process of being digitized, but to make it useful to historians and curious browsers it needs to be organized and tagged — so computer vision algorithms are being put to work grouping similar images, identifying objects and locations, and doing other valuable basic cataloguing tasks.
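A common recipe for this kind of grouping, sketched below under assumptions (CMU hasn't published its pipeline), is to represent each image as a fixed-length feature vector, e.g. from a pretrained network, and link images whose vectors point in nearly the same direction by cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def group_similar(vectors, threshold=0.95):
    """Greedy grouping: each vector joins the first group whose
    representative (its first member) it matches above the threshold."""
    groups = []  # list of lists of indices into `vectors`
    for i, v in enumerate(vectors):
        for g in groups:
            if cosine(v, vectors[g[0]]) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups
```

Real archives would use proper embeddings and a smarter clustering algorithm, but the principle is the same: near-duplicate photos collapse into one group for a human cataloguer to label once.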
“Even a partly successful project would greatly improve the collection metadata, and could provide a possible solution for metadata generation if the archives were ever funded to digitize the entire collection,” said CMU’s Matt Lincoln.
A very different project, yet one that seems somehow connected, is this work by a student at the Escola Politécnica da Universidade de Pernambuco in Brazil, who had the bright idea to try sprucing up some old maps with machine learning.
The tool they used takes old line-drawing maps and attempts to create a sort of satellite image based on them using a Generative Adversarial Network; GANs pit two models against each other, with a generator trying to produce content that a discriminator can’t tell apart from the real thing.
Well, the results aren’t what you might call completely convincing, but it’s still promising. Such maps are rarely accurate but that doesn’t mean they’re completely abstract — recreating them in the context of modern mapping techniques is a fun idea that might help these locations seem less distant.
Virtual meetings are a fundamental part of how we interact with each other these days, but even when (if!?) we find better ways to mitigate the effects of COVID-19, many think that they will be here to stay. That means there is an opportunity out there to improve how they work — because let’s face it, Zoom Fatigue is real and I for one am not super excited anymore to be a part of your Team.
Mmhmm, the video presentation startup from former Evernote CEO Phil Libin, has ambitions to change the conversation (literally and figuratively) about what we can do with the medium; its first efforts have included things like the ability to manipulate presentation material around your video in real time to mimic newscasts. Today it is announcing an acquisition as it continues to home in on a wider launch of its product, currently in a closed beta.
It has acquired Memix, an outfit out of San Francisco that has built a series of filters you can apply to videos — either pre-recorded or streaming — to change the lighting, details in the background, or across the whole of the screen, and an app that works across various video platforms to apply those filters.
Like mmhmm, Memix is today focused on building tools that you use on existing video platforms — not building a video player itself. Memix today comes in the form of a virtual camera, accessible via Windows apps for Zoom, WebEx and Microsoft Teams; or web apps like Facebook Messenger, Houseparty and others that run on Chrome, Edge and Firefox.
Libin said in an interview that the plan will be to keep that virtual camera operating as is while it works on integrating the filters and Memix’s technology into mmhmm, while also laying the groundwork for building more on top of the platform.
Libin’s view is that while there are already a lot of video products and users in the market today, we are just at the start of it all, with technology and our expectations changing rapidly. We are shifting, he said, from wanting to reproduce existing experiences (like meetings) to creating completely new ones that might actually be better.
“There is a profound change in the world that we are just at the beginning of,” he said in an interview. “The main thing is that everything is hybrid. If you imagine all the experiences we can have, from in-person to online, or recorded to live, up to now almost everything in life fit neatly into one of those quadrants. The boundaries were fixed. Now all these boundaries have melted away we can rebuild every experience to be natively hybrid. This is a monumental change.”
That is a concept that the Memix founders have not just been thinking about, but also building the software to make it a reality.
“There is a lot to do,” said Pol Jeremias-Vila, one of the co-founders. “One of our ideas was to try to provide people who do streaming professionally an alternative to the really complicated set-ups you currently use,” which can involve expensive cameras, lights, microphones, stands and more. “Can we bring that to a user just with a couple of clicks? What can be done to put the same kind of tech you get with all that hardware into the hands of a massive audience?”
Memix’s team of two — co-founders Inigo Quilez and Pol Jeremias-Vila, Spaniards who met not in Spain but the Bay Area — are not coming on board full-time, but they will be helping with the transition and integration of the tech.
Libin said that he first became aware of Quilez from a YouTube video he’d posted on “The principles of painting with maths”, but that doesn’t give a lot away about the two co-founders. They are in reality graphic engineering whizzes, with Jeremias-Vila currently the lead graphics software engineer at Pixar, and Quilez until last year a product manager and lead engineer at Facebook, where he created, among other things, the Quill VR animation and production tool for Oculus.
Because working the kind of hours that people put in at tech companies wasn’t quite enough time to work on graphics applications, the pair started another effort called Beauty Pi (not to be confused with Beauty Pie), which has become a home for various collaborations between the two that had nothing to do with their day jobs. Memix had been bootstrapped by the pair as a project built out of that. Other efforts have included Shadertoy, a community and platform for creating Shaders (a computer program created to shade in 3D scenes).
The Memix founders’ background points to an interesting opportunity in the world of video right now. In part because of all the focus (sorry not sorry!) on video as a medium under our current pandemic circumstances, but also because of the advances in broadband, devices, apps and video technology, we’re seeing a huge proliferation of startups building interesting variations and improvements on the basic concept of video streaming.
Just in the area of videoconferencing alone, some of the hopefuls have included Headroom, which launched the other week with a really interesting AI-based approach to helping its users get more meaningful notes from meetings, and using computer vision to help presenters “read the room” better by detecting if people are getting bored, annoyed and more.
Vowel is also bringing a new set of tools not just to annotate meetings and their corresponding transcriptions in a better way, but to then be able to search across all your sessions to follow up items and dig into what people said over multiple events.
And Descript, which originally built a tool to edit audio tracks, earlier this week launched a video component, letting users edit visuals and what you say in those moving pictures, by cutting, pasting and rewriting a word-based document transcribing the sound from that video. All of these have obvious B2B angles, like mmhmm, and they are just the tip of the iceberg.
Indeed, the huge amount of IP out there is interesting in itself. Yet the jury is still out on where all of it would best live and thrive as the space continues to evolve, with more defined business models (and leading companies) only now emerging.
That presents an interesting opportunity not just for the biggies like Zoom, Google and Microsoft, but also players who are building entirely new platforms from the ground up.
Mmhmm is a notable company in that context. Not only does it have the reputation and inspiration of Libin behind it — a force powerful enough that even his foray into the ill-fated world of chatbots got headlines — but it’s also backed by the likes of Sequoia, which led a $21 million round earlier this month.
Libin said he doesn’t like to think of his startup as a consolidator, or the industry in a consolidation play, as that implies a degree of maturity in an area that he still feels is just getting started.
“We’re looking at this not so much as consolidation, which to me means market share,” he said. “Our main criteria is that we wanted to work with teams that we are in love with.”