Taxi: The Semantic Revolution in API Contracts

30/10/2024


In the increasingly interconnected world of modern software development, effectively managing and understanding the vast oceans of data and the intricate web of API contracts has become a formidable challenge. Traditional approaches often fall short, focusing on isolated systems rather than the holistic ecosystem. This is precisely where Taxi steps in, offering a revolutionary approach to documenting data and API contracts, fundamentally shifting our perspective from mere structure to profound semantic meaning. It's not about how data is physically arranged, but what it truly represents and how it relates across your entire organisation.

What's so great about Taxi? Its documentation goals are broader than HTTP APIs alone: it also covers message queues, serverless functions, databases, and more. OpenAPI, by comparison, is the de facto standard: pretty much everyone understands what it is, its tooling ecosystem is excellent, and it is more mature, having had the benefit of thousands of developers collaborating for years.

Table of Contents

What is Taxi?
Taxi vs. Traditional Schema Languages
Structural vs. Semantic Contracts: A Deeper Dive
Taxi and OpenAPI Interoperability
Taxi and Other Protocols (Protobuf, Avro)
TaxiQL and GraphQL Comparison
Core Principles and Design Philosophy of Taxi
Taxi's Relationship with Orbital and Adoption
Frequently Asked Questions (FAQs)

What is Taxi?

At its heart, Taxi is a dedicated language designed to describe how data and APIs within an expansive ecosystem relate to one another. Its primary ambition is to empower consumers to seamlessly compose services together, liberating them from the rigid constraints of being tightly coupled to underlying API implementations. Unlike conventional schema languages that merely define the shape of data, Taxi delves deeper, describing data semantically. This means it employs rich, meaningful types – types that convey the actual meaning of the data rather than just its arbitrary field name. This profound shift allows for the development of incredibly powerful tooling, capable of discovering and mapping data based on its inherent meaning, leading to unprecedented levels of automation and understanding.

Consider a simple scenario. In many systems, you might find a customerId field within an Invoice object and an id field within a Customer object. Structurally, these might both be String or Int. However, with Taxi, you can imbue these with semantic richness:

model Customer {
   id: CustomerId inherits String
   firstName: FirstName inherits String
   lastName: LastName inherits String
}

model Invoice {
   invoiceId: InvoiceId inherits Int
   customerId: CustomerId
}

Notice how String and Int now carry a richer context. More importantly, we immediately discern that Invoice.customerId and Customer.id refer to the exact same piece of information – a CustomerId. This seemingly small detail is incredibly powerful. By building out this comprehensive set of semantically rich terms, we begin to forge a Taxonomy of truly interchangeable types. This taxonomy serves as an invaluable resource, allowing us to meticulously document how data is intended to be used across the organisation. Advanced tooling, such as Orbital, can then leverage this semantic understanding to automatically link and discover data, paving the way for significantly automated API integration processes.
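To make the linking idea concrete, here is a minimal sketch of how two services might expose these types. The service and operation names (InvoiceService, CustomerService, getInvoice, getCustomer) are hypothetical and do not come from any real system; the types are declared standalone so the fragment reads on its own.

```taxi
// Semantic types shared across the whole taxonomy
type CustomerId inherits String
type InvoiceId inherits Int

model Customer {
   id : CustomerId
}

model Invoice {
   invoiceId : InvoiceId
   customerId : CustomerId
}

// Hypothetical service that returns invoices
service InvoiceService {
   operation getInvoice(id : InvoiceId) : Invoice
}

// Hypothetical service that accepts the same CustomerId
// that appears on an Invoice
service CustomerService {
   operation getCustomer(id : CustomerId) : Customer
}
```

Because Invoice.customerId and the input to getCustomer share the CustomerId type, tooling has enough information to infer that a Customer can be looked up from an Invoice, without anyone writing that mapping by hand.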

Taxi vs. Traditional Schema Languages

The landscape of schema languages today is vast, with many excellent tools performing specific tasks. However, a common characteristic among them is their introspective focus. They excel at describing what a single API does, detailing its inputs, outputs, and internal structure. Where they often falter, and where Taxi shines, is in articulating how that particular API relates to other APIs within the broader organisational ecosystem. Questions like: 'How does the data returned from this API relate to other data within our organisation?' or 'How do the inputs required by this API connect to other available data sources?' often go unanswered, leading to fragmented understanding and significant manual effort in integration. Taxi directly addresses this critical gap. By employing semantics, it meticulously models how data interrelates between disparate systems, providing a unified, coherent view of your data landscape. Crucially, Taxi doesn't demand a complete overhaul of your existing infrastructure. It generally works harmoniously with your current schema languages, effectively 'polyfilling' them with essential semantic metadata, enriching your existing definitions without requiring wholesale replacement.

Structural vs. Semantic Contracts: A Deeper Dive

The fundamental distinction between traditional schema approaches and Taxi's philosophy boils down to the difference between structural and semantic contracts. Understanding this distinction is key to appreciating Taxi's unique value proposition.

Aspect	Structural Contract Focus	Semantic Contract Focus (Taxi)
Accessibility	Where is this API accessible? (e.g., port, transport mechanism, path, HTTP verbs).	How do the inputs and outputs from this system relate to other systems?
Encoding	How are requests/responses encoded? (e.g., JSON, Protobuf).	What does the data mean? (e.g., `CustomerId` is a concept, not just a string).
Data Shape	What keys are present in maps & arrays returned? What inputs are expected?	What are the side effects of calling this method? (Beyond just data returned).

While structural contracts are undeniably important for the mechanics of API interaction, they tell us little about the purpose or interchangeability of the data. Semantic contracts, on the other hand, provide this crucial layer of meaning, enabling intelligent tooling and a deeper understanding across the entire data estate.

Taxi and OpenAPI Interoperability

Taxi has been designed with strong interoperability in mind, particularly with widely adopted standards like OpenAPI. While it is entirely possible to use Taxi to comprehensively describe REST-ish APIs, much in the same vein as OpenAPI, the most common and often most effective approach is to combine the two. It's generally more practical and powerful to embed Taxi metadata directly within your existing OpenAPI specifications. This 'best of both worlds' strategy allows you to leverage OpenAPI's robust capabilities for describing the functional aspects of an API – its endpoints, parameters, and responses – while simultaneously using Taxi's metadata to describe how those inputs and the data returned relate semantically to other systems within your sprawling ecosystem.
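For the less common case of describing a REST-ish API in Taxi alone, a sketch might look like the following. The URL, service name, and operation name are hypothetical, and the annotation names (@HttpOperation, @PathVariable) are assumptions based on Taxi's HTTP conventions that should be checked against the current Taxi documentation before use.

```taxi
type CustomerId inherits String

model Customer {
   id : CustomerId
}

// Hypothetical HTTP service; the path is a placeholder
service CustomerService {
   // Assumed annotation names; verify against the Taxi docs
   @HttpOperation(method = "GET", url = "/customers/{id}")
   operation getCustomer(@PathVariable("id") id : CustomerId) : Customer
}
```

The annotations carry the structural details (verb and path), while the parameter and return types carry the semantic ones.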

In a direct comparison, Taxi offers several distinct advantages over OpenAPI:

Feature	Taxi's Strength	OpenAPI's Strength
Readability & Writability	Easily sketch specs by hand due to its dedicated DSL (Domain Specific Language).	YAML/JSON based, widely understood syntax.
Type System	Richer, more expressive, full semantic type system with a compiler for validation.	Robust structural type definition.
Inter-API Relationships	Excels at describing how data relates between multiple APIs.	Primarily focused on describing a single API.
Documentation Scope	Broader goals, including Message Queues, Serverless Functions, Databases, etc.	Primarily focused on HTTP APIs.

Despite Taxi's strengths, it’s important to acknowledge where OpenAPI holds sway. It remains the de facto standard in API documentation; virtually everyone in the industry understands what OpenAPI is. Furthermore, its tooling ecosystem is incredibly mature and diverse, benefitting from the collaborative efforts of thousands of developers over many years. This maturity provides a robust foundation for many existing projects. Taxi is not intended to replace OpenAPI entirely, but rather to augment it, providing a crucial layer of semantic understanding that OpenAPI alone cannot offer.

Taxi and Other Protocols (Protobuf, Avro)

Beyond schema languages like OpenAPI or RAML, there are also encoding specifications such as Protobuf and Avro. These tools are fundamentally designed to document how payloads are serialised into bytes – essentially, the wire format for data transmission. This is a crucial distinction: Taxi is not, and is not intended to be, a serialization protocol. Instead, it operates at a higher level of abstraction, focusing on the meaning of the data, regardless of its underlying serialization format. It complements these encoding specifications by providing the semantic context for the data they transmit.

TaxiQL and GraphQL Comparison

Taxi, and particularly its query language, TaxiQL, shares some overarching goals with GraphQL: both aim to provide a unified entry point for composing data from multiple APIs. Despite these shared objectives, their underlying approaches and philosophies differ in subtle but important ways.

Aspect	Taxi / TaxiQL Approach	GraphQL Approach
Integration Automation	Automates integration between systems using semantic types as links, reducing manual glue code.	Relies on resolvers to manually stitch together APIs, requiring maintenance.
Ecosystem Integration	Works with existing API specs (OpenAPI, RAML, JsonSchema, Protobuf, Databases) without replacement.	GraphQL federation often requires "GraphQL everywhere" for seamless integration.
Schema Evolution	Consumers define data contracts; query engine satisfies, decoupling consumers from publisher changes.	Single schema; refactoring can impact all consumers, increasing cost of change.

One of Taxi's core aims is to enable software to automate the integration between disparate systems, often without the need for engineers to manually write 'glue code'. This is achieved by leveraging its powerful semantic types, which serve as the intelligent links between various data sources. In stark contrast, GraphQL's approach heavily relies on the implementation of 'resolvers' – pieces of code that explicitly define how to retrieve and stitch together data from underlying APIs. While powerful, these resolvers introduce a maintenance burden; any breaking changes in upstream systems necessitate corresponding updates within the resolver code.

Furthermore, GraphQL federation, while a robust solution, often implies a widespread adoption of GraphQL across an organisation. Taxi, on the other hand, is designed to be more accommodating to existing architectures. It aims to work seamlessly with the API specifications you already have in place, serving as a bridge between diverse formats like OpenAPI, RAML, JsonSchema, and Protobuf, without demanding that your existing schemas be replaced. While Taxi can also bridge between databases, similar to GraphQL, this does require a specific Taxi schema in addition to your existing DDL (Data Definition Language). The challenge of neatly embedding Taxi metadata directly into DDL scripts is an ongoing area of exploration, and the community is always keen to hear innovative ideas on this front.

Finally, the approach to schema evolution differs significantly. GraphQL typically operates with a single, monolithic schema from which consumers can cherry-pick desired fields. While convenient, this unified schema can become challenging to refactor or evolve, as changes might necessitate corresponding modifications across all its consumers. TaxiQL, conversely, is engineered to allow consumers to precisely define the contract of the data they require. It then becomes the responsibility of the query engine to satisfy this contract. This consumer-driven contract definition means that even as publisher contracts undergo changes, consumers largely remain decoupled, thereby keeping the cost of change remarkably low.
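As an illustration of a consumer-defined contract, a TaxiQL query might be sketched as follows. It assumes the Customer and Invoice models from earlier in this article, plus services (not shown) that can supply invoices and look up customers by CustomerId; the output field names are hypothetical, and the exact query syntax should be confirmed against the TaxiQL documentation.

```taxi
// Ask for invoices, reshaped into the contract this consumer wants
find { Invoice[] } as {
   invoice : InvoiceId
   // These types do not live on Invoice; the query engine is
   // expected to work out how to fetch them via CustomerId
   customerFirstName : FirstName
   customerLastName : LastName
}[]
```

The consumer names the types it needs rather than the services or fields that hold them, which is what keeps it decoupled when a publisher renames a field or moves data to another API.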

Core Principles and Design Philosophy of Taxi

As a language, Taxi is meticulously crafted with several core principles underpinning its design, each contributing to its unique effectiveness in the realm of data documentation and API contracts:

Readability: Taxi boasts a familiar and intuitive syntax, making it remarkably easy for developers to write and, perhaps more importantly, to understand. This focus on clarity ensures that the documentation itself is accessible and not another hurdle to overcome.
Expressiveness: Taxi is built to be profoundly expressive. It empowers users to describe not only the structural elements but also the rich semantic meaning of their data, along with the often quirky and specific contracts of their APIs. This deep level of detail facilitates a comprehensive understanding of data's purpose.
Typesafe: At its core, Taxi is a strongly typed, expressive language. It is purpose-built specifically for describing API operations and types, complete with a robust compiler that rigorously validates API contracts, ensuring correctness and consistency across your ecosystem.
Tooling: A primary driver behind Taxi's design is its ability to enable next-generation tooling integration. The richness of its syntax allows for a far more detailed expression of what services can do, moving beyond merely documenting where to find them. This foundational capability is what allows for automation in areas like data discovery and API integration.
Extensibility: Taxi is inherently designed for extensibility. It provides mechanisms for users to refine and compose existing API schemas, allowing for the addition of crucial context, annotations, and the improvement of type signatures over time. This adaptability ensures that Taxi can evolve alongside your organisation's data needs.

Taxi's Relationship with Orbital and Adoption

It's worth noting that Taxi is heavily utilised to power Orbital. The two projects have a symbiotic relationship, each influencing the other as they evolve. The expressiveness of Taxi is precisely what enables Orbital to automate complex integration tasks between various services. However, it is crucial to understand that Taxi is explicitly designed to be a standalone tool. It is not coupled to Orbital, so there is plenty you can achieve with Taxi entirely on its own, independent of the Orbital platform.

One of Taxi's most appealing characteristics is its ease of incremental adoption. This means you don't need to undertake a massive, disruptive migration. You can seamlessly set it up alongside your existing documentation solutions, such as Swagger or legacy XML schemas, and gradually migrate or augment functionality at your convenience. In fact, rather than being a replacement, Taxi is specifically designed to complement these existing tools. You can happily use your current Swagger definitions or XSDs as your foundational schema, and then intelligently overlay semantic data using Taxi, enriching your documentation without discarding your prior investments.

Frequently Asked Questions (FAQs)

To further clarify the power and utility of Taxi, here are some frequently asked questions:

What problem does Taxi primarily solve?
Taxi solves the problem of understanding how data and APIs relate across an entire ecosystem. Traditional schema languages often describe individual APIs in isolation. Taxi bridges this gap by adding semantic meaning, allowing tooling to discover, map, and automate integrations based on what data means, not just its structure.

Is Taxi a replacement for OpenAPI or GraphQL?
No, not entirely. Taxi is designed to complement tools like OpenAPI, enriching them with semantic metadata. While it can describe APIs, its main strength is in describing relationships between APIs. For GraphQL, while both aim for API composition, Taxi offers automated integration via semantic types, whereas GraphQL relies on manual resolvers. Taxi is generally more adaptable to existing diverse API landscapes without requiring a full ecosystem change.

Can I use Taxi with my existing APIs and schemas?
Absolutely. Taxi is built for incremental adoption and interoperability. You can embed Taxi metadata into your existing OpenAPI, RAML, JsonSchema, or even Protobuf specifications. It works by 'polyfilling' your current schemas with semantic information, enhancing them without requiring a complete overhaul.

Is Taxi an open-source project?
Yes, Taxi is an open-source project released under the Apache 2 license. Its source code is freely available and hosted on GitHub, encouraging community contributions and transparency.

What exactly is 'semantic data' in the context of Taxi?
'Semantic data' refers to data that is described by its meaning or purpose, rather than just its format or name. For example, instead of just knowing a field is a 'String', Taxi allows you to define it as a CustomerId, FirstName, or InvoiceId. This semantic type conveys the intent and context of the data, making it inherently more understandable and allowing systems to reason about its interchangeability and relationships across different parts of an organisation.

What is the Taxi Playground?
The Taxi Playground is an online environment designed for quickly writing Taxi specifications and visualising them through generated diagrams. It's an excellent tool for experimenting with the language and understanding its capabilities without local setup.

How does Taxi help with API integration?
By providing a rich semantic understanding of data, Taxi enables powerful tooling (like Orbital) to automatically discover and map data between different systems. This drastically reduces the need for manual 'glue code' and complex, bespoke integration logic, accelerating development and reducing errors.

Does Taxi handle data serialization?
No, Taxi does not concern itself with how data is serialised into bytes (e.g., JSON, Protobuf, Avro). It operates at a higher level, focusing on the meaning and relationships of data, complementing existing serialization protocols rather than replacing them.
