OpenCUI


Open Frontend Framework for Chatbot Development
Schema Grounded Chatbots for Any Services
overview
What is OpenCUI?
ROLE
Product Manager and Product Designer
DURATION
Oct. 2017 - Present
TEAM
Engineering Team (4 Full-time, 2 Part-time)
1 Product Intern
WEBSITE
opencui.io
OpenCUI is an open frontend framework for building schema grounded conversational interfaces with state-of-the-art natural language understanding, taking a dynamic statecharts approach.

In this schema grounded approach, the goal is to build conversational interfaces for the data types required by an API schema. Natural language text and voice are first converted into schemas, which represent what service users want and how they want it. Structured data returned from business logic is then rendered into natural text in the given language and style. This allows businesses to declare business logic clearly and explicitly, and to retain control over the conversation.
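The round trip can be sketched in a few lines. This is a conceptual illustration only: the schema, the `understand` stub, and the `render` template are hypothetical, not part of the actual OpenCUI API, and a real system would use trained NLU models rather than string matching.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema for a table-reservation service.
@dataclass
class ReserveTable:
    date: Optional[str] = None
    time: Optional[str] = None
    guests: Optional[int] = None

def understand(utterance: str) -> ReserveTable:
    # Stand-in for dialog understanding: text is converted into the schema.
    # Faked here for a single utterance to keep the sketch self-contained.
    if "two" in utterance and "Friday" in utterance:
        return ReserveTable(date="Friday", guests=2)
    return ReserveTable()

def render(result: dict) -> str:
    # Structured data from business logic rendered back into text via a template.
    return f"Your table for {result['guests']} is booked on {result['date']}."

frame = understand("Book a table for two on Friday")
# The time slot is still empty, so the chatbot knows what to ask next.
assert frame.guests == 2 and frame.time is None
```

The key point is that the conversation state lives in typed structures, not in free text, so business logic can inspect and drive it directly.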

To make building conversational apps as cost-effective as building graphical user interface (GUI) apps, we provide reusable (meaning composable and reconfigurable) modules and components, which save both time and cost. The framework follows the same principles, such as separation of concerns and model-view-controller, and the same workflows, such as version control with git for collaboration.
BACKGROUND
Why?


Why can most businesses afford to build a web app, but not a conversational one with the same functionality? This simple question motivated us to take on the challenge of democratizing conversational user experiences.

Before OpenCUI, we had been working in the field of conversational user experience for almost seven years, from voice assistants to chatbot development platforms. Along the way, we gained valuable insights into conversational interface development:
1. Voice-powered virtual assistants are very limited in helping people
In 2017, we built a voice assistant based on an intelligent skill search engine, covering the complete experience from understanding to execution, and launched it as the native system voice app on LeTV, Lenovo, and Gome phones. After launch, it performed robustly, handling over 0.5 million daily queries from 2.5 million weekly active users.

We also provided an efficient skill extension model, in which users and developers could teach the assistant a skill by demonstrating it, then publish the skill on our skill platform so that all users could use it and further validate its usefulness.

In the beginning, we thought this was a lightweight model that would be easy to operate, because:
1. Apps could connect to the assistant automatically, so developers did not need to know the API of every app.
2. Developers did not need to spend time and energy negotiating integrations with third-party apps.
3. The many efficient ways to generate skills would help the skill catalog grow exponentially.

But it backfired. There was a limit to what we could do and how much we could help. Apps in China change frequently and drastically, and different users often run different versions of the same app, so the same skill silently diverges into many different variants. Covering everything at the interface level is therefore impossible, while at the API level, app developers are unwilling to expose much.

Since the user interface largely depends on business logic, and business logic can and will vary from business to business, instead of building entire conversational apps for businesses, we aim to provide conversational interface building tools that empower business developers to build conversational experiences themselves. Therefore, we decided to open up the skill platform we had been using internally to extend mobile app skills, and let developers use it to unlock more possibilities.
2. Conversation driven platforms do not serve businesses well
Because the skill platform had been used to extend app skills, its design was initially conversation based. We quickly found that business logic is typically described as a process: full control over each step a user takes is possible in a graphical user interface, but not in conversational interaction.

Users' expressions are varied and unpredictable, and users will not follow an established route.

In a conversational user interface, users can say anything at any time. This is advantageous for users: there is no learning curve, since they can simply state what service they want. Moreover, during a conversation a user might switch topics without providing all the information needed; the chatbot should use conversational history to automatically complete such utterances and figure out what the user wants. Without some form of factorization, the number of conversational paths a flow based model must cover grows exponentially with complexity, so a flow based approach to defining conversational interaction becomes prohibitively costly.
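A back-of-envelope calculation makes the exponential blowup concrete. The numbers here are illustrative assumptions, not measurements: suppose a service needs n slots and each slot can be addressed in k different ways in a given turn (filled, skipped, corrected, asked about, ...).

```python
# A flow that enumerates every ordering and outcome must cover on the
# order of k**n distinct paths, while a factorized per-slot model only
# needs about n * k cases, since each slot is handled independently.
n_slots, k_ways = 8, 4

flow_paths = k_ways ** n_slots        # exponential in the number of slots
factorized_cases = n_slots * k_ways   # linear in the number of slots

assert flow_paths == 65536
assert factorized_cases == 32
```

Even at this modest size, the flow based model needs thousands of times more paths, which is why OpenCUI factorizes behavior per slot instead.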
Need expensive expertise in machine learning and natural language understanding

Understanding human language is hard: different texts can mean the same thing, and the meaning of the same text can change with context. This is one of the key complications of building a conversational user interface. The popular approach relies on standard ML/NLU tasks like text classification and named entity recognition. While these standard tasks are well studied, applying them to new business use cases requires serious customization, which typically calls for expensive ML/NLU expertise. It is therefore not easy for a regular dev team to customize them for arbitrary use cases. In fact, accuracy is not the most important metric for dialog understanding: to deploy a chatbot into production, everything needs to be hot-fixable by the operations team.

In conclusion, if a solution takes too long or costs too much, it is not commercially viable.
3. Conversational interface development should start from service
Services are the reason businesses build chatbots. Without first fully deciding what services are provided, it is impossible to design the user interface, as there is no focus. Instead of focusing on the conversations, conversational interaction development should be driven by the service, where the actual business logic is implemented, usually accessible via API calls and provided by the backend team.

An API schema provides a natural boundary for both design and implementation. Given the set of APIs, it should be immediately clear whether a given conversation is relevant or not.
An API schema is typically the result of careful deliberation between the product owner and the software architect, so it is usually normalized to be minimal and orthogonal. This means similar functionalities are generally served by the same APIs, so there is no need to establish equivalence between user intentions at the language level; all we have to do is map language onto the APIs.

A schema concisely describes a service. With the schema decided, teams with different responsibilities can design and build upon it independently, without worrying about how the other parts are handled.
SOLUTION
How?


Goal

1. Schema Driven Development
Users interact with a business because it can do something better or cheaper, so there is no need to respond intelligently to every possible user utterance. For any given business, it is enough to focus only on the conversations related to the services that business provides, which are defined by its API schema.
2. Separate Different Concerns
Different people can work on different aspects: the backend team implements the actual service, the conversational user interface builder takes care of interaction logic, and the CUI designer provides the script for a better user experience.
3. No need to worry about NLU
Understanding errors should be easy to hotfix, without expert involvement. There should also be a stable interface between language understanding and the rest of the chatbot, so we can migrate to new technology easily. Today's BERT will be tomorrow's naive Bayes.
4. Dynamic Statecharts for Arbitrary CUI
Instead of a single state machine, a CUI needs to keep track of multiple statecharts, since it is typical for humans to switch in and out of topics so that problems can be solved more effectively.
5. Reusable Component
If we had to build everything from scratch every time, we would not be as effective as we could be. Conversational behavior should be reusable in one shot, with the only changes being what genuinely needs to be customized.

Design

1. 4 Layers of Chatbot
To Achieve Goal #1, 2, 3
Separation of concerns is essential to increasing productivity and reducing the cost of building things. We decompose a chatbot into 4 layers: schema, interaction, language, and channel, so that different aspects can be handled by different people.

Schema Layer. Defines the interface to the backend service, including the data structures needed to invoke the service, both input and output.

Interaction Layer. Defines interaction logic: how the input parameters needed by the service should be collected from the user via conversational interactions, in a language independent fashion. Decisions at this level include, for example, whether to prompt the user for a given slot, whether to give recommendations when prompting, and what to do if input validation fails.

Language Layer. Converts back and forth between natural language and types (structured semantics). Builders use expression exemplars for dialog understanding and templates for text generation to control its behavior.

Channel Layer. Different channels encode conversation-relevant information in different ways; channel implementations are defined in this layer so that the same bot can be accessed from different channels.

Based on the clear separation between interaction and language encouraged by schema grounded CUI, OpenCUI uses a set of production friendly ML/NLU models that a regular dev team can easily customize for any use case with just utterance exemplars and templates.

When building a multilingual chatbot, the builder should declare business logic at the schema layer first, design the conversational interaction logic at the interaction layer, and supply language-specific data in each language layer. This makes it easy to provide a consistent multilingual experience, with the same structured data propagating to every language side on commit. There is no need to duplicate the bot or repeat the process for each language; builders just fill in the blanks in different languages.
In the channel layer, we offer universal messages, which let builders provide a richer in-conversation experience than standard text messages. A message is a structured encoding of how information should be rendered to users on a channel. On OpenCUI, regardless of which channel a message is defined for, it is just a templated string that encodes a JSON object.

Therefore, builders only need to define a message once, as a universal message, and OpenCUI automatically generates the corresponding styles for all channels, instead of requiring a separate definition for every channel.
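The "templated string that encodes a JSON object" idea can be sketched as follows. The template syntax, field names, and the text-only channel adapter here are all illustrative assumptions, not the actual OpenCUI message format.

```python
import json
from string import Template

# A universal message: one templated string encoding a JSON object.
universal = Template('{"title": "$title", "buttons": [{"label": "$label"}]}')

# Fill the template with conversation data, then decode the structure.
message = json.loads(universal.substitute(title="Choose a time", label="7pm"))

def to_plain_text(msg: dict) -> str:
    # A minimal "channel" adapter: a real platform would emit each
    # channel's native rich-message markup from the same structure.
    labels = "/".join(b["label"] for b in msg["buttons"])
    return f'{msg["title"]} [{labels}]'

assert to_plain_text(message) == "Choose a time [7pm]"
```

Because the structure, not the final markup, is what the builder defines, each channel adapter can render the same message natively.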
2. CUI component
To Achieve Goal #5
A CUI component is a language independent abstraction of the task oriented, cooperative, turn-taking conversation between chatbot and user. We assume the chatbot provides services that users might want. Every CUI component helps reach the overall goal of the conversation: figuring out what users want and how they want it, and of course delivering the service. In other words, each CUI component carries a clear goal, toward building a common understanding of, or delivering, what users want.

The desired dynamic behavior of a CUI component is declaratively defined in the form of annotations. These are designed as "controls" through which builders specify the desired behavior of their chatbots. Annotations can be defined at the frame level, which defines the behavior of the entire frame, or at the slot level, which defines slot-specific behavior:

1. Initialization: tries to fill the slot based on business logic first.
2. Prompt: lets you provide the template for the SlotRequest dialog act, needed to request the user's preference for the given slot.
3. Value Recommendation: provides a list of filling candidates, based on business data and logic, for the user to choose from. This avoids wasting user effort on filling a slot with an unservable value.
4. Value Check: examines whether the proposed value is servable according to business rules.
5. Confirmation: gives the user a second chance to verify the proposed value.
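The five stages above can be sketched as a fill loop for a single slot, with each stage as a pluggable hook. The function name and signature are hypothetical, intended only to show how the stages interact, not to reflect the OpenCUI runtime.

```python
def fill_slot(slot_name, *, init=None, recommend=None, check=None,
              confirm=None, user_answers=()):
    """Run one slot through the five stages; user replies are scripted."""
    answers = iter(user_answers)
    value = init() if init else None            # 1. Initialization
    while value is None:
        if recommend:
            candidates = recommend()            # 3. Value Recommendation
        value = next(answers)                   # 2. Prompt: reply consumed here
        if check and not check(value):          # 4. Value Check
            value = None                        # re-prompt on failure
    if confirm and not confirm(value):          # 5. Confirmation
        return None
    return value

# The user first proposes an unservable time, then a servable one.
servable = {"6pm", "7pm"}
value = fill_slot(
    "time",
    recommend=lambda: sorted(servable),
    check=lambda v: v in servable,
    confirm=lambda v: True,
    user_answers=["5pm", "7pm"],
)
assert value == "7pm"
```

Enabling or disabling each hook corresponds to the builder's per-slot configuration decisions described above.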

By simply deciding whether to enable each of these five components, and how to configure them, builders are guided by OpenCUI to a reasonable CUI interaction logic systematically. This way, builders can focus on the unique, cost-effective services that bring actual value to users and make their lives better.

It should be clear that the message a bot sends to the user in a single turn can contain output generated from more than one of these stages.
3. Beyond Slot Filling
To Achieve Goal #3, 4
The 5 stage slot filling interaction logic defined by the chatbot builder is intentionally modeled as a dynamic statechart, or dynamic composite state machine. The deterministic nature warranted by this conceptual model makes it easy for a business to control interaction logic according to its needs, in both the building and debugging phases.

In particular, filling each entity slot is essentially a deterministic state machine that moves from the start state to the end state through the 5 stages of slot filling, following the transition table defined by the corresponding CUI components. A frame slot is filled by a composite state machine, governed by the frame level CUI components as well as the components defined on each of its slots.

Transition is a low level annotation that gives builders the ability to control the state machine directly. It is an optional frame level annotation that lets you define transitions between slots hosted directly or indirectly by the hosting frame. A transition is configured in two parts, triggering and update actions, where triggering defines under what conditions the corresponding action sequence is executed.
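The deterministic statechart view of a single entity slot can be made concrete with a small transition table. The state and event names below are illustrative labels for the five stages, not identifiers from OpenCUI.

```python
# (current state, event) -> next state, covering the slot-filling stages.
TRANSITIONS = {
    ("start", "init_done"): "prompt",
    ("prompt", "value_proposed"): "check",
    ("check", "check_failed"): "prompt",      # re-prompt on a bad value
    ("check", "check_passed"): "confirm",
    ("confirm", "confirmed"): "end",
    ("confirm", "rejected"): "prompt",        # user changed their mind
}

def run(events, state="start"):
    """Deterministically step the machine through a sequence of events."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state

# A value fails the check once, is re-prompted, then passes and is confirmed.
final = run(["init_done", "value_proposed", "check_failed",
             "value_proposed", "check_passed", "confirmed"])
assert final == "end"
```

Because every (state, event) pair has exactly one successor, the builder can trace and debug any conversation path step by step, which is the control the text above argues for.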
4. Reusability
To Achieve Goal #5
Reusability is one of the key design goals of OpenCUI, helping businesses reduce the cost of building personalized services. Builders can use existing components instead of starting from scratch: create once, reuse often. Four different mechanisms are available.

Import. Instead of building functionality from scratch, the first choice on OpenCUI is to import the relevant components. Where the right components exist, builders only need to provide business dependent data to serve their users.

Clone. Clone is another way to reuse. Instead of building a chatbot from an empty slate, one can start from an existing chatbot by cloning it.

Inherit. We support inheriting/implementing frames and intents, so behavior can be reused by adding to an existing frame instead of building one from scratch.

Compose. Builders can use a frame as a slot of a larger frame, composing increasingly complex behavior.
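Composition is the easiest mechanism to show in code: a whole frame becomes a single slot of a larger frame, and its filling behavior comes along with it. The `Address` and `Delivery` types are hypothetical examples; on OpenCUI this relationship is declared on the platform rather than written by hand.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Address:
    # A self-contained frame with its own slots (and, on the platform,
    # its own prompts, checks, and confirmations).
    city: Optional[str] = None
    street: Optional[str] = None

@dataclass
class Delivery:
    item: Optional[str] = None
    address: Optional[Address] = None   # the whole frame reused as one slot

order = Delivery(item="flowers",
                 address=Address(city="Beijing", street="Main St"))
assert order.address.city == "Beijing"
```

Because `Address` carries its own interaction behavior, any frame that nests it gets address collection for free, which is the point of composition.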
next steps
What's Next?

Community

Connect with the conversational user interface builders, developers, and designers who are contributing to OpenCUI and building the future of CUI. In the community, they can:

1. Build conversational user interface for business.
2. Share best practices of component.
3. Get extension applications.
4. Get announcements first.
5. Expand CUI knowledge.
6. Build meaningful relationships.
STATEMENT
Statement
I am very fortunate to have been involved in product design and development from zero to one. The above is the high level core design concept of OpenCUI, the best practice we have abstracted from two products and real business experience over the past six years. For more information, you can visit our website: opencui.io.

I'm also one of the core contributors to our website, responsible for its style and documentation. So you might see some of the same content in my portfolio and on our website.

If you are interested in OpenCUI, or if you have any questions, please feel free to contact me. I will get right back to you.

Email: lu.zeng@opencui.io