IDPD, operated by Metayage Private Limited, incorporates third-party open-source machine learning models and libraries to power prompt enrichment in its AI-assisted invention disclosure and patent analysis features. We gratefully acknowledge the authors and maintainers of these components.
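These components are used unmodified and are obtained through standard distribution channels: Python packages installed with pip, model weights published on Hugging Face, and, for Qwen2.5, the Ollama runtime. Attribution is provided below as a matter of good practice and enterprise transparency.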
| # | Component | Author / Publisher | License | Commercial Use |
|---|---|---|---|---|
| 1 | BAAI/bge-base-en-v1.5 | Beijing Academy of Artificial Intelligence | MIT | Permitted |
| 2 | KeyBERT | Maarten Grootendorst | MIT | Permitted |
| 3 | google/bert-for-patents | Google LLC | Apache 2.0 | Permitted |
| 4 | vectara/hallucination_evaluation_model (HHEM-2.1) | Vectara Inc. | Apache 2.0 | Permitted |
| 5 | cross-encoder/nli-deberta-v3-base | UKP Lab & Microsoft | MIT | Permitted |
| 6 | Qwen2.5 (qwen2.5:1.5b-instruct) | Alibaba Cloud — Tongyi Qianwen | Qwen License (commercial use permitted) | Permitted |
| 7 | YAKE (Yet Another Keyword Extractor) | LIAAD – INESC TEC | LGPL-3.0 | Permitted (unmodified library) |
The following components are licensed under the MIT License, which permits use, modification, and distribution, including commercial use, on the condition that the copyright notice and permission notice are included in copies of the software. They are acknowledged here in keeping with that condition and as a matter of good practice.
BAAI/bge-base-en-v1.5
Author: Beijing Academy of Artificial Intelligence (BAAI)
Source: huggingface.co/BAAI/bge-base-en-v1.5
Use in IDPD: Semantic text embeddings used for invention similarity search (FAISS vector store) and as the backbone model for KeyBERT keyphrase extraction.
Changes: Used as-is via the sentence-transformers library. No modifications to model weights or architecture.
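As a rough illustration of what similarity search over embeddings involves, the sketch below ranks stored vectors by cosine similarity to a query vector. It is a pure-Python stand-in and not IDPD's implementation: the toy three-dimensional vectors take the place of real BGE embeddings, the brute-force loop takes the place of the FAISS index, and the names `normalize` and `top_k` are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def top_k(query, corpus, k=2):
    """Return the k (doc_id, score) pairs most similar to the query.

    corpus maps a document id to its embedding (toy vectors here,
    standing in for real model output).
    """
    q = normalize(query)
    scored = [
        (doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for doc_id, vec in corpus.items()
    ]
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

corpus = {
    "disclosure-a": [1.0, 0.0, 0.0],
    "disclosure-b": [0.9, 0.1, 0.0],
    "disclosure-c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.05, 0.0], corpus, k=2)
```

Normalizing both sides first means the ranking depends only on direction, not on vector length, which is the usual reason for pairing normalized embeddings with an inner-product index.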
KeyBERT
Author: Maarten Grootendorst
Source: github.com/MaartenGr/KeyBERT
Use in IDPD: Semantic keyphrase extraction from invention disclosures using shared BGE embedding weights. Extracted keyphrases are used to guide and focus AI patent drafting prompts.
Changes: Used as-is via pip. No modifications to source code.
The following components are licensed under the Apache License, Version 2.0, which permits use, modification, and distribution in commercial products. Attribution is provided below.
google/bert-for-patents
Author: Google LLC
Source: huggingface.co/google/bert-for-patents
Use in IDPD: Patent-domain language model (BERT-Large, 340M parameters, pre-trained on 100M+ patent documents from Google Patents Public Data). Used for patent-specific keyphrase extraction to construct EPO prior art search queries. Because the model was trained on patent text, its vocabulary covers claim language (comprising, wherein, said) and IPC code terms, which is intended to yield more relevant technical keyphrases than a general-purpose model.
Changes: Used as-is via the HuggingFace transformers library with mean-pooling for text embedding. No modifications to model weights or architecture.
License: apache.org/licenses/LICENSE-2.0
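Mean-pooling turns the per-token vectors a BERT-style model produces into a single text embedding. The sketch below shows one common form, a mask-aware average that ignores padding positions. It assumes plain Python lists in place of tensors, and the function name `mean_pool` is hypothetical; whether IDPD excludes padding in exactly this way is not stated above.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask value is required per token")
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    # Component-wise mean over the tokens that were kept.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

# Toy example: three real tokens followed by one padding position.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
```

Here the padding vector `[9.0, 9.0]` is excluded, so the result is the average of the first three tokens only.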
vectara/hallucination_evaluation_model (HHEM-2.1)
Author: Vectara Inc.
Source: huggingface.co/vectara/hallucination_evaluation_model
Use in IDPD: Factual consistency scoring (hallucination detection) for the patent process chatbot. After the AI assistant generates a response, HHEM-2.1 scores the response against the retrieved knowledge context (0–1 scale, where 1.0 = fully grounded). This score is displayed to the user as a trust indicator badge on each chatbot answer. Benchmark: 96.4% AUC on hallucination detection tasks.
Changes: Used as-is. The DeBERTa tokenizer (cross-encoder/nli-deberta-v3-base) is used due to a custom configuration class in HHEM-2.1 that requires explicit tokenizer selection.
License: apache.org/licenses/LICENSE-2.0
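To make the trust indicator concrete, the sketch below maps a 0 to 1 consistency score onto a badge label. The thresholds (0.8 and 0.5), the label text, and the function name `trust_badge` are hypothetical placeholders chosen for illustration; the cut-offs IDPD actually uses are not stated here.

```python
def trust_badge(score, high=0.8, low=0.5):
    """Map a 0-1 consistency score to a badge label.

    The thresholds and label names are hypothetical placeholders.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= high:
        return "well grounded"
    if score >= low:
        return "partially grounded"
    return "low grounding"

badges = [trust_badge(s) for s in (0.95, 0.6, 0.2)]
```

Rejecting out-of-range values early keeps a scoring failure from being rendered as a misleading badge.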
Qwen2.5 (qwen2.5:1.5b-instruct)
Author: Alibaba Cloud, Tongyi Qianwen Team
Source: huggingface.co/Qwen/Qwen2.5-1.5B-Instruct
Use in IDPD: On-premises large language model powering the patent process chatbot. All inference runs locally on the IDPD server; no user queries leave the server infrastructure. The model answers patent process questions using retrieved knowledge (RAG) and is limited to 200 output tokens per response.
Changes: Served unmodified via Ollama (MIT-licensed inference runtime). No fine-tuning applied.
License: Qwen License Agreement — permits commercial use. Full terms: Qwen License
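A local, token-capped request of the kind described above might look like the following sketch against Ollama's HTTP API. The prompt template, the helper names `build_payload` and `ask`, and the use of the default local address are assumptions made for illustration; only the model tag and the 200-token limit come from this page.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address

def build_payload(question, context, max_tokens=200):
    """Build a non-streaming generate request capped at max_tokens output tokens."""
    # Hypothetical prompt template; the real one is not documented here.
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return {
        "model": "qwen2.5:1.5b-instruct",
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": max_tokens},
    }

def ask(question, context):
    """Send the request to a locally running Ollama server and return its text."""
    body = json.dumps(build_payload(question, context)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

payload = build_payload("What is a provisional application?", "(retrieved passages)")
```

Calling `ask(...)` requires an Ollama server running on the same machine with the model already pulled; `build_payload` alone has no network dependency.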
YAKE (Yet Another Keyword Extractor)
Author: Ricardo Campos et al., LIAAD – INESC TEC, University of Beira Interior
Source: github.com/LIAAD/yake
Use in IDPD: Statistical (non-neural) keyword extraction from invention descriptions during AI prompt pre-processing. Complements KeyBERT with fast, domain-agnostic keyword identification (<5ms per call, no model weights required).
LGPL-3.0 compliance note: YAKE is used as an unmodified third-party library, installed with pip and imported at runtime as a separate package. No modifications have been made to YAKE's source code. On that basis, LGPL-3.0 does not oblige IDPD to release its own code as open source. The full LGPL-3.0 license is available at gnu.org/licenses/lgpl-3.0.html.
cross-encoder/nli-deberta-v3-base
Author: UKP Lab (Technische Universität Darmstadt) & Microsoft Corporation
Source: huggingface.co/cross-encoder/nli-deberta-v3-base
Use in IDPD: Tokenizer for the HHEM-2.1 hallucination evaluation model. HHEM-2.1 shares the DeBERTa-v3-base architecture; this tokenizer is used in place of HHEM's own configuration class to resolve a HuggingFace AutoTokenizer compatibility issue.
Changes: Tokenizer used as-is. Model weights are not used (only tokenizer).
In addition to the on-premises components above, IDPD uses the following external services: the EPO Open Patent Services API, which provides patent data features, and Brevo, which delivers transactional email. These services process the data sent to them on their own infrastructure. Brevo acts as a data processor and is disclosed here for transparency.
EPO Open Patent Services (OPS)
Operator: European Patent Office (EPO), EPO Headquarters, 80298 Munich, Germany
Website: epo.org — Open Patent Services
Developer portal: developers.epo.org
Use in IDPD: When a user requests a prior art search for an invention, IDPD submits a structured keyword query to the EPO OPS v3.2 REST API. The API returns bibliographic data (title, abstract, applicant, publication date, patent family) from EPO's corpus of 90+ million patent documents across ~90 national patent offices. Prior art results are displayed informationally; they do not constitute legal advice or a freedom-to-operate opinion.
Data transmitted: Search queries derived from the invention's title and technical keyphrases only. No inventor names, personal data, or confidential invention details are transmitted to EPO.
EPO OPS Terms of Use: EPO OPS Terms of Use
EPO Privacy Notice: EPO Data Protection Notice
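The sketch below shows how a search query limited to a title and keyphrases could be assembled for the OPS published-data search service. It is an assumption-laden illustration: the endpoint path and the CQL index names `ti` (title) and `ta` (title or abstract) follow the OPS documentation as best recalled and should be checked against it, the helper names are hypothetical, and the access token that OPS requests require is omitted. No request is sent.

```python
from urllib.parse import quote

# Assumed OPS v3.2 search endpoint returning bibliographic data.
OPS_SEARCH_URL = "https://ops.epo.org/3.2/rest-services/published-data/search/biblio"

def build_cql(title, keyphrases):
    """Combine a title and keyphrases into one CQL expression.

    Double quotes are stripped from inputs so they cannot break
    the quoted terms; empty keyphrases are skipped.
    """
    def clean(text):
        return " ".join(text.replace('"', " ").split())

    clauses = [f'ti="{clean(title)}"']
    clauses += [f'ta="{clean(phrase)}"' for phrase in keyphrases if clean(phrase)]
    return " or ".join(clauses)

def build_search_url(title, keyphrases):
    """Return the GET URL for the search; no request is sent here."""
    return f"{OPS_SEARCH_URL}?q={quote(build_cql(title, keyphrases))}"

cql = build_cql("Solar tracking mount", ["dual-axis actuator", "irradiance sensor"])
url = build_search_url("Solar tracking mount", ["dual-axis actuator", "irradiance sensor"])
```

Because the query is built only from the two arguments passed in, nothing else from a disclosure record can reach the outgoing URL, which mirrors the data-minimization statement above.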
Brevo (transactional email)
Operator: Sendinblue SAS, 55 rue d'Amsterdam, 75008 Paris, France
Website: brevo.com
Use in IDPD: All transactional emails are delivered via Brevo's SMTP relay. Brevo processes recipient email addresses and email content solely to deliver messages on behalf of Metayage Private Limited.
Privacy policy: brevo.com/legal/privacypolicy
For questions about this page or our use of open-source software, contact ip@myipstrategy.com.
Last Updated: June 02, 2026