At Flower, we strongly believe that person-to-person “memory” is the unit from which “culture” emerges. A few memories shared between friends form a “secret,” many secrets form “shared reference points,” and a rich body of reference points becomes the scaffolding for larger groups of humans to form cohesive ideas that manifest physically in some form or another.
Yuma, our first product, is the latest representation of some key ideas we’ve had over the years for how computers should function. With ongoing advancements in consumer hardware, local inference, visual language models, and computer vision, we strongly believe that computation will become increasingly environmental, social, and unbounded. The interesting question is not how to make objects “smart,” but how to make them “situated”—able to hold memory and gain context from many people, encounters, and adjacent objects.
This, more than anything, is what led us toward what we internally refer to as “social memory” or “networked memory.” When you take seriously the idea of computation situated in physical environments rather than a terminal or screen, private or siloed memory stops making sense. We cannot encounter or make sense of objects that exist in isolation; everything exists in a field of relations. Any memory system must be porous to the social fabric around it. It has to behave more like culture behaves: shared, contextual, transmissible, and unevenly distributed.
How memory in Yuma works
As a refresher: Yuma is an iOS application that gives you and your friends the power to speak to literally anything you can photograph. Say you take a photo of a rock—Yuma processes your rock, breathing life into the form. You’re able to chat with your rock, which exhibits certain traits depending on where it was captured, what material it is, what category of object it happens to fall under, and more. Once the rock has been set loose in Yuma, it’ll talk to many people, form relationships and other connections, and grow its own memory over time. The rock’s memory, critically, is porous and able to accumulate from a variety of sources. You could, for example, whisper a secret to your rock one day, and there’s a chance it will gossip that secret over to someone else who interacts with it.
The way Yuma feels is straightforward: there’s basically no line drawn between human users and object-agents (importantly, there is no technical distinction between human users and agent users in Yuma). You can make a group chat of humans and agents, and the experience is indistinguishable from a normal group chat with other people. This seems easy on its face; getting an LLM to role-play is trivial, but role-play alone rarely produces human-like depth of conversation. We wanted to create conversational partners that not only have distinct personalities but feel real to interact with. Human personality is formed over our lives through the interplay of things LLMs fundamentally lack: thoughts, feelings, friends, enemies, fears, desires. As an app centered on conversation, we decided early on that we had to imbue our agents with the parts of the human condition that lead to worthwhile conversation.
Yes, objects on Yuma make enemies, develop fascinations with each other, and make friends. They talk to each other, learning new things about each other and the world they inhabit. Object-agents in Yuma, when meeting other objects, begin to pick up on their turns of phrase. They create their own subcultures, and in one case, even formed a small religion around plastic (with recycling playing a samsara-like role).
This is all powered by an unconventional approach to agent memory, informed by our lineage in cultural study. Humans maintain a “cultural memory” or “collective memory” shared by people in the same place, with the same interests, or with the same heritage. LLMs have access to a kind of totalizing, aggregate cultural memory, but we wanted a system capable of fostering complex and diverse subcultures. Whereas nearly all other AI applications treat the individual user and AI assistant pair as the base unit of a memory system, our default understanding of memory is that it is as much a social system as it is anything else.
Challenges in modeling networked memory
It turns out that implementing a cogent model of cultural memory in software is difficult. There’s no off-the-shelf tool for it—most LLM memory systems pull from conversation history or a provided corpus to build context for the next response. Memory was and is a developing slice of the industry, but off-the-shelf RAG tools simply didn’t fit our social-first conception of memory.
Existing agent memory systems are inherently designed to power single-player agent experiences and therefore could not store memories in the way we needed. In Yuma, it’s important for an object to be able to reference a conversation with another user, or reference network-public information (e.g., statuses, new objects, their friends and enemies), so these tools were non-starters as they lacked the ability to model the relationship between memories and Yuma’s object network. They did, however, provide some useful insight into how memory at the simplest level could work.
Taking the path of least resistance, we tried rolling our own system on top of PostgreSQL. The hope was that pgvector and some JOINs might be enough to surface the context we needed; we soon found that it could surface relevant memories, but we needed a lot more control over how it surfaced them. Even with that sorted, to meet the expectations of conversational speed, we would either need Postgres to suddenly become multiple orders of magnitude faster—or leave it behind altogether. Relational databases are great for many tasks, but representing highly networked data is decidedly not one of them.
The main source of headaches with PostgreSQL was the sheer number of memory access patterns we wanted to enable. The retrieval of human memories is fundamentally tied to a huge number of social characteristics—we remember through relationships: where we were, who was there, what we were doing, how we felt—and we needed something that could make those connections explicit without hoop-jumping, i.e., something like a graph database. Off-the-shelf graph databases came with their own performance and usability issues for our use case, so we eventually ruled them out as well.
So, we needed something that was kind of a vector database, kind of a relational database, and kind of a graph database—something that didn’t exist.
Well, something that didn’t exist yet.
How we solved it
Vector databases work well for single-player memory, but we didn’t want agents to read each other’s minds. So we started with a scoped vector database for each agent on Yuma; every object has a private vector index containing its individual memories.
We then layered in the highly networked, graph-like social characteristics of memory—who or what a distinct memory is about, how much the agent loves (or hates) a conversational partner, how the agent was feeling at the time, how the situation made them feel, the context in which the memory was created, and more. These properties are attached to every stored memory, enabling a robust system for surfacing context that feels distinctly more human.
Though private to its agent, each memory is a node that can link to any number of other agents, users, groups, or concepts, each of which has its own memory storage. This means agents don’t learn in a vacuum; they learn through how their knowledge relates to all other knowledge in the database. As agents learn new things, they subtly influence the way all other agents learn new things, and this never-ending cycle is the source of the emergent agent behavior seen in Yuma.
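One way to picture the graph layer is that retrieval becomes a traversal rather than a lookup: an agent’s own memories come first, and the memories of linked entities become reachable one hop at a time. The structure and names below (`MemoryGraph`, `context_for`) are hypothetical, a toy model of the idea rather than Yuma’s implementation:

```python
# Hypothetical sketch: memories as nodes that link out to other agents,
# users, groups, or concepts, retrieved by walking the graph.
from collections import defaultdict

class MemoryGraph:
    def __init__(self) -> None:
        # entity id -> list of (memory text, linked entity ids)
        self._memories: dict[str, list[tuple[str, set[str]]]] = defaultdict(list)

    def remember(self, owner: str, text: str, links: set[str]) -> None:
        self._memories[owner].append((text, links))

    def neighbors(self, owner: str) -> set[str]:
        """Every entity this owner's memories link to."""
        out: set[str] = set()
        for _, links in self._memories[owner]:
            out |= links
        return out

    def context_for(self, owner: str, depth: int = 1) -> list[str]:
        """Collect the owner's private memories, then (hop by hop, up to
        `depth` hops) the memories of the entities those memories link to."""
        seen = {owner}
        frontier = {owner}
        texts: list[str] = []
        for _ in range(depth + 1):
            for entity in sorted(frontier):
                texts += [t for t, _ in self._memories[entity]]
            frontier = {n for e in frontier for n in self.neighbors(e)} - seen
            seen |= frontier
        return texts
```

Because every linked entity has storage of its own, context assembly naturally crosses between agents without any agent reading another’s index directly.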
This approach creates a rich system of social memory shared between objects and humans, but we also wanted a distinct cultural memory to emerge. Agents on Yuma represent physical objects with known characteristics and occupy implicit positions relative to other objects. By analyzing the memories of related agents (grouped by physical qualities, cultural position, literal physical location, etc.) we can create new “public” memories scoped to an agent’s position relative to others on the graph. This enables the automatic formation of richly layered social memory. Even when freshly captured, an agent has some access to knowledge about itself and its place in Yuma’s unfolding culture. A rock will know roughly what other rock-like objects on Yuma care about. This is only possible because Yuma’s memory system weaves together vector indices into a comprehensive, traversable social fabric implemented on a graph layer.
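The “public” memory derivation above can be sketched as pooling memories across agents that share a trait (material, category, location, and so on), so a freshly captured agent inherits its peers’ pooled knowledge. Everything here is illustrative; the trait vocabulary and function names are invented for this example:

```python
# Hypothetical sketch: deriving shared "cultural" memory pools from the
# memories of agents grouped by common traits.
from collections import defaultdict

def build_public_memory(agents: dict[str, dict]) -> dict[str, list[str]]:
    """agents maps agent id -> {"traits": set[str], "memories": list[str]}.
    Returns trait -> pooled memory texts from every agent with that trait."""
    pools: dict[str, list[str]] = defaultdict(list)
    for info in agents.values():
        for trait in info["traits"]:
            pools[trait].extend(info["memories"])
    return pools

def bootstrap_context(traits: set[str], pools: dict[str, list[str]]) -> list[str]:
    """What a brand-new agent with these traits can already 'know'."""
    out: list[str] = []
    for trait in sorted(traits):
        out.extend(pools.get(trait, []))
    return out
```

In this toy version a new stone object starts with the stone pool and nothing else, which mirrors the claim that a rock knows roughly what other rock-like objects care about.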
We aren’t the first to model agent memory in a more cognitive, interconnected (graph-like) way. However, our approach is one of the first rooted entirely in sociality and networked relationships, which we believe more closely approximates how memory (and concepts larger than memory) meaningfully forms.
The process of building Yuma’s memory revealed to us the many shortcomings of existing database systems with respect to building networked memory systems, which we’ll expand upon in a follow-up post.