
Open Source Licenses & Acknowledgments

IDPD, operated by Metayage Private Limited, incorporates third-party open-source machine learning models and libraries to power prompt enrichment in its AI-assisted invention disclosure and patent analysis features. We gratefully acknowledge the authors and maintainers of these components.

These components are used unmodified via Python package management. Attribution is provided below as a matter of good practice and enterprise transparency.

Summary

# | Component | Author / Publisher | License | Commercial Use
1 | BAAI/bge-base-en-v1.5 | Beijing Academy of Artificial Intelligence | MIT | Permitted
2 | KeyBERT | Maarten Grootendorst | MIT | Permitted
3 | google/bert-for-patents | Google LLC | Apache 2.0 | Permitted
4 | vectara/hallucination_evaluation_model (HHEM-2.1) | Vectara Inc. | Apache 2.0 | Permitted
5 | cross-encoder/nli-deberta-v3-base | UKP Lab & Microsoft | MIT | Permitted
6 | Qwen2.5 (qwen2.5:1.5b-instruct) | Alibaba Cloud (Tongyi Qianwen) | Qwen License (commercial use permitted) | Permitted
7 | YAKE (Yet Another Keyword Extractor) | LIAAD – INESC TEC | LGPL-3.0 | Permitted (unmodified library)

MIT Licensed Components

The following components are licensed under the MIT License, which permits use, modification, and distribution, including commercial use, provided that the copyright notice and permission notice are retained in copies of the software. They are acknowledged here accordingly.

MIT

1. BAAI/bge-base-en-v1.5

Author: Beijing Academy of Artificial Intelligence (BAAI)

Source: huggingface.co/BAAI/bge-base-en-v1.5

Use in IDPD: Semantic text embeddings used for invention similarity search (FAISS vector store) and as the backbone model for KeyBERT keyphrase extraction.

Changes: Used as-is via the sentence-transformers library. No modifications to model weights or architecture.
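The similarity search described above can be illustrated with a minimal sketch. It is a pure-Python stand-in for a vector index, not IDPD's actual FAISS configuration, which this page does not specify. The function names `normalize` and `top_k_similar` and the toy vectors are assumptions made for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k_similar(query, corpus, k=2):
    """Return (index, score) pairs for the k corpus vectors closest to the query."""
    q = normalize(query)
    scored = []
    for idx, vec in enumerate(corpus):
        v = normalize(vec)
        scored.append((idx, sum(a * b for a, b in zip(q, v))))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors (hypothetical); real BGE base embeddings have 768 dimensions.
corpus = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
results = top_k_similar([1.0, 0.05, 0.0], corpus, k=2)
```

A production vector store performs the same ranking over many more vectors, typically with an index built for speed rather than a linear scan.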

MIT

2. KeyBERT

Author: Maarten Grootendorst

Source: github.com/MaartenGr/KeyBERT

Use in IDPD: Semantic keyphrase extraction from invention disclosures using shared BGE embedding weights. Extracted keyphrases are used to guide and focus AI patent drafting prompts.

Changes: Used as-is via pip. No modifications to source code.

Apache 2.0 Licensed Components

The following components are licensed under the Apache License, Version 2.0, which permits use, modification, and distribution in commercial products. Attribution is provided below.

Apache 2.0

3. google/bert-for-patents

Author: Google LLC

Source: huggingface.co/google/bert-for-patents

Use in IDPD: Patent-domain language model (BERT-Large, 340M parameters, pre-trained on 100M+ patent documents from Google Patents Public Data). Used for patent-specific keyphrase extraction to construct EPO prior art search queries. Because it was trained on patent text, the model is familiar with patent claim phrasing (comprising, wherein, said) and IPC code vocabulary, which is intended to yield more relevant technical keyphrases than general-purpose models.

Changes: Used as-is via the HuggingFace transformers library with mean-pooling for text embedding. No modifications to model weights or architecture.

License: apache.org/licenses/LICENSE-2.0
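The mean-pooling step mentioned in the entry above can be sketched as follows. This is a minimal pure-Python illustration of the general technique, assuming per-token vectors and a 0/1 attention mask as inputs. The function name `mean_pool` and the toy values are hypothetical, and the actual IDPD code is not shown on this page.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("embeddings and mask must have the same length")
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    # Component-wise mean over the non-padding tokens only.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Two real tokens and one padding token (toy 2-dimensional vectors).
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)  # [2.0, 3.0]
```

Excluding padded positions matters because padding vectors would otherwise pull the sentence embedding toward values unrelated to the text.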

Apache 2.0

4. vectara/hallucination_evaluation_model (HHEM-2.1-Open)

Author: Vectara Inc.

Source: huggingface.co/vectara/hallucination_evaluation_model

Use in IDPD: Factual consistency scoring (hallucination detection) for the patent process chatbot. After the AI assistant generates a response, HHEM-2.1 scores the response against the retrieved knowledge context (0–1 scale, where 1.0 = fully grounded). This score is displayed to the user as a trust indicator badge on each chatbot answer. Benchmark: 96.4% AUC on hallucination detection tasks.

Changes: Used as-is. The DeBERTa tokenizer (cross-encoder/nli-deberta-v3-base) is used due to a custom configuration class in HHEM-2.1 that requires explicit tokenizer selection.

License: apache.org/licenses/LICENSE-2.0
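As an illustration of how a 0 to 1 consistency score might be turned into a trust indicator badge, here is a minimal sketch. The thresholds (0.8 and 0.5), the badge labels, and the function name `trust_badge` are assumptions chosen for the example; this page does not state the cut-offs IDPD actually uses.

```python
def trust_badge(score, high=0.8, medium=0.5):
    """Map a factual-consistency score in [0, 1] to a badge label.

    The thresholds are hypothetical defaults, not IDPD's real values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


# Example scores as they might come back from a consistency model.
badges = [trust_badge(s) for s in (0.95, 0.62, 0.10)]
```

Keeping the thresholds as parameters makes it straightforward to recalibrate the badge once real score distributions are observed.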

Apache 2.0

6. Qwen2.5 (qwen2.5:1.5b-instruct) — via Ollama

Author: Alibaba Cloud — Tongyi Qianwen Team

Source: huggingface.co/Qwen/Qwen2.5-1.5B-Instruct

Use in IDPD: On-premises large language model powering the patent process chatbot. All inference runs locally on the IDPD server; no user queries leave the server infrastructure. The model answers patent process questions using retrieved knowledge (RAG) and is limited to 200 output tokens per response.

Changes: Served unmodified via Ollama (MIT-licensed inference runtime). No fine-tuning applied.

License: Qwen License Agreement — permits commercial use. Full terms: Qwen License
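The 200-token output limit described in the entry above could be expressed in a request to a local Ollama server roughly as follows. This sketch only builds the JSON body and sends nothing. The prompt template and the function name `build_generate_request` are assumptions, while `num_predict` is Ollama's option for capping generated tokens.

```python
import json


def build_generate_request(question, context_passages, max_tokens=200):
    """Build a JSON-serializable body for Ollama's /api/generate endpoint."""
    context = "\n".join(f"- {passage}" for passage in context_passages)
    # Hypothetical RAG prompt template; the real template is not documented here.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return {
        "model": "qwen2.5:1.5b-instruct",
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": max_tokens},  # cap on generated tokens
    }


payload = build_generate_request(
    "What is a provisional application?",
    ["A provisional application establishes an early filing date."],
)
body = json.dumps(payload)
```

With a default local installation, a body like this would be POSTed to `http://localhost:11434/api/generate`, so the question and retrieved context stay on the same machine.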

LGPL-3.0 Licensed Components

LGPL-3.0

7. YAKE (Yet Another Keyword Extractor)

Author: Ricardo Campos et al., LIAAD – INESC TEC, University of Beira Interior

Source: github.com/LIAAD/yake

Use in IDPD: Statistical (non-neural) keyword extraction from invention descriptions during AI prompt pre-processing. Complements KeyBERT with fast, domain-agnostic keyword identification (<5ms per call, no model weights required).

LGPL-3.0 compliance note: YAKE is used as an unmodified third-party library, installed via pip and imported at runtime. No modifications have been made to YAKE's source code. Under LGPL-3.0, using an unmodified library in this way does not require IDPD's own source code to be released under the LGPL. The full LGPL-3.0 license is available at gnu.org/licenses/lgpl-3.0.html.

MIT Licensed Components (continued)

MIT

5. cross-encoder/nli-deberta-v3-base

Author: UKP Lab (Technische Universität Darmstadt) & Microsoft Corporation

Source: huggingface.co/cross-encoder/nli-deberta-v3-base

Use in IDPD: Tokenizer for the HHEM-2.1 hallucination evaluation model. HHEM-2.1 shares the DeBERTa-v3-base architecture; this tokenizer is used in place of HHEM's own configuration class to resolve a HuggingFace AutoTokenizer compatibility issue.

Changes: Tokenizer used as-is. Model weights are not used (only tokenizer).

Third-Party Service Providers

In addition to the on-premises open-source components above, IDPD integrates the following external services for patent data and email delivery. These services process requests on their own infrastructure and are disclosed here for transparency. A provider that handles personal data on behalf of Metayage Private Limited acts as a data processor.

EPO Open Patent Services (OPS) — Patent Prior Art Data

Operator: European Patent Office (EPO), EPO Headquarters, 80298 Munich, Germany

Website: epo.org — Open Patent Services

Developer portal: developers.epo.org

Use in IDPD: When a user requests a prior art search for an invention, IDPD submits a structured keyword query to the EPO OPS v3.2 REST API. The API returns bibliographic data (title, abstract, applicant, publication date, patent family) from EPO's corpus of 90+ million patent documents across ~90 national patent offices. Prior art results are displayed informationally; they do not constitute legal advice or a freedom-to-operate opinion.

Data transmitted: Search queries derived from the invention's title and technical keyphrases only. No inventor names, personal data, or confidential invention details are transmitted to EPO.

EPO OPS Terms of Use: EPO OPS Terms of Use

EPO Privacy Notice: EPO Data Protection Notice
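To make the data-minimization point above concrete, the following sketch builds a search query from a title and keyphrases only. The `ta` (title and abstract) index and the `published-data/search` path follow the OPS v3.2 conventions, but the query shape, the `max_terms` limit, and the function names are assumptions rather than IDPD's actual query builder. A real call also requires an OAuth access token, which is omitted here.

```python
from urllib.parse import urlencode

OPS_SEARCH_URL = "https://ops.epo.org/3.2/rest-services/published-data/search"


def build_ops_query(title, keyphrases, max_terms=5):
    """Combine the title and keyphrases into a CQL query over title/abstract."""
    cleaned = []
    for term in [title] + list(keyphrases):
        # Drop embedded quotes and collapse whitespace so each phrase is safe to quote.
        term = " ".join(term.replace('"', " ").split())
        if term and term.lower() not in (c.lower() for c in cleaned):
            cleaned.append(term)
    return " or ".join(f'ta="{term}"' for term in cleaned[:max_terms])


def build_ops_url(title, keyphrases):
    """Return the full search URL with the query URL-encoded in the q parameter."""
    return OPS_SEARCH_URL + "?" + urlencode({"q": build_ops_query(title, keyphrases)})


query = build_ops_query("Solar tracker", ["dual axis", "Dual Axis"])
url = build_ops_url("Solar tracker", ["dual axis"])
```

Because the function accepts only a title and keyphrases, inventor names and full disclosure text have no path into the outgoing request.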

Brevo — Transactional Email Delivery

Operator: Sendinblue SAS, 55 rue d'Amsterdam, 75008 Paris, France

Website: brevo.com

Use in IDPD: All transactional emails are delivered via Brevo's SMTP relay. Brevo processes recipient email addresses and email content solely to deliver messages on behalf of Metayage Private Limited.

Privacy policy: brevo.com/legal/privacypolicy

For questions about this page or our use of open-source software, contact ip@myipstrategy.com.

Last Updated: June 02, 2026