Commit 3ae0655

Merge pull request #25 from linkml/faq
Added a FAQ and additional tests for ML and LLM inference
2 parents a064a5c + 03bff0c commit 3ae0655

19 files changed: +986 -76 lines changed

README.md

Lines changed: 3 additions & 0 deletions
@@ -7,6 +7,8 @@ common query, index, and storage operations.
77

88
For full documentation, see [https://linkml.io/linkml-store/](https://linkml.io/linkml-store/)
99

10+
See [these slides](https://docs.google.com/presentation/d/e/2PACX-1vSgtWUNUW0qNO_ZhMAGQ6fYhlXZJjBNMYT0OiZz8DDx8oj7iG9KofRs6SeaMXBBOICGknoyMG2zaHnm/embed?start=false&loop=false&delayms=3000) for a high-level overview.
11+
1012
__Warning__ LinkML-Store is still undergoing changes and refactoring,
1113
APIs and command line options are subject to change!
1214

@@ -132,3 +134,4 @@ make app
132134
## Background
133135

134136
See [these slides](https://docs.google.com/presentation/d/e/2PACX-1vSgtWUNUW0qNO_ZhMAGQ6fYhlXZJjBNMYT0OiZz8DDx8oj7iG9KofRs6SeaMXBBOICGknoyMG2zaHnm/embed?start=false&loop=false&delayms=3000) for more details
137+

docs/about.rst

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
1-
.. _about
1+
.. _about:
22

33
About
44
=========================================================
@@ -74,4 +74,4 @@ This frameworks also allows *composable indexes*. Currently two indexers are sup
7474
Metadata and Configuration
7575
--------------------------
7676

77-
- :py:mod:`ClientConfig<linkml_store.api.config.ClientConfig>` provides a structure for configuring the client
77+
- :py:mod:`ClientConfig<linkml_store.api.config.ClientConfig>` provides a structure for configuring the client

docs/faq.rst

Lines changed: 302 additions & 0 deletions
@@ -0,0 +1,302 @@
1+
.. _faq:
2+
3+
Frequently Asked Questions
4+
==========================
5+
6+
General
7+
-------
8+
9+
What is this project?
10+
~~~~~~~~~~~~~~~~~~~~~~
11+
12+
linkml-store is a data management solution that provides a common interface to multiple backends,
13+
including DuckDB, MongoDB, Neo4J, and Solr.
14+
It is designed to make it easier to work with data in different forms (tabular, JSON, columnar, RDF),
15+
provide expressive validation at scale, and let you mix and match different backends.
16+
17+
For a high-level overview, see `these slides <https://docs.google.com/presentation/d/e/2PACX-1vSgtWUNUW0qNO_ZhMAGQ6fYhlXZJjBNMYT0OiZz8DDx8oj7iG9KofRs6SeaMXBBOICGknoyMG2zaHnm/embed?start=false&loop=false&delayms=3000>`_.
18+
19+
20+
Is this a database engine?
21+
~~~~~~~~~~~~~~~~~~~~~~~~~~
22+
23+
No, linkml-store is not a database engine in itself. It is designed to be used *in combination*
24+
with your favorite database engines.
25+
26+
Do I need to know LinkML to use this?
27+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
28+
29+
No, you do not need to know LinkML to use linkml-store. In fact, you can use linkml-store in
30+
"YOLO mode" where you don't even specify a schema (a schema will be induced as far as possible).
31+
32+
However, for serious applications we recommend you always provide a LinkML schema for your
33+
different datasets.
34+
35+
For more information on LinkML, see the `LinkML documentation <https://linkml.io/linkml/>`_.
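
As a rough illustration of what inducing a schema from data can mean, the sketch below guesses one type label per field from a list of plain dicts. It is not linkml-store's actual induction logic: the function name ``induce_schema`` and the type labels are hypothetical, chosen only for this example.

```python
from typing import Any, Dict, List

# Hypothetical mapping from Python types to simple type labels.
TYPE_NAMES = {bool: "boolean", int: "integer", float: "float", str: "string"}


def induce_schema(objs: List[Dict[str, Any]]) -> Dict[str, str]:
    """Guess one type label per field from example objects (illustrative only)."""
    schema: Dict[str, str] = {}
    for obj in objs:
        for key, value in obj.items():
            if value is None:
                continue  # missing values tell us nothing about the type
            label = TYPE_NAMES.get(type(value), "object")
            previous = schema.setdefault(key, label)
            if previous != label:
                # Conflicting evidence: fall back to a catch-all label.
                schema[key] = "any"
    return schema


people = [{"name": "Alice", "age": 42}, {"name": "Bob", "age": None, "tags": ["x"]}]
print(induce_schema(people))
```

A guess like this is only as good as the sample it sees, which is one reason an explicit schema is preferable for serious use.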
36+
37+
Can I use the command line?
38+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
39+
40+
Yes, linkml-store provides a command line interface.
41+
42+
See the `Command Line Tutorial <https://linkml.io/linkml-store/tutorials/Command-Line-Tutorial.html>`_ for examples.
43+
44+
All commands can be used via the base ``linkml-store`` command:
45+
46+
.. code-block:: bash
47+
48+
    linkml-store --help
49+
50+
Note that some command line options may change until this package reaches 1.0.0.
51+
52+
Can I use the Python API?
53+
~~~~~~~~~~~~~~~~~~~~~~~~~
54+
55+
Yes, linkml-store provides a Python API.
56+
57+
See the `Python Tutorial <https://linkml.io/linkml-store/tutorials/Python-Tutorial.html>`_ for examples.
58+
59+
Example:
60+
61+
.. code-block:: python
62+
63+
    from linkml_store import Client
64+
65+
    client = Client()
66+
    db = client.attach_database("duckdb")
67+
    collection = db.attach_collection("my_collection")
68+
    collection.insert({"name": "Alice", "age": 42})
69+
    result = collection.find({"name": "Alice"})
70+
71+
72+
Can I use a web API?
73+
~~~~~~~~~~~~~~~~~~~~
74+
75+
Yes, you can stand up a web API.
76+
77+
To start, first create a config file, e.g. ``db/conf.yaml``.
78+
79+
Then run:
80+
81+
.. code-block:: bash
82+
83+
    export LINKML_STORE_CONFIG=./db/conf.yaml
84+
    make api
85+
86+
Can I use a web UI?
87+
~~~~~~~~~~~~~~~~~~~~
88+
89+
We provide a *very rudimentary* web UI. To start, first create a config file, e.g. ``db/conf.yaml``.
90+
91+
Then run:
92+
93+
.. code-block:: bash
94+
95+
    export LINKML_STORE_CONFIG=./db/conf.yaml
96+
    make app
97+
98+
99+
What is CRUDSI?
100+
~~~~~~~~~~~~~~~
101+
102+
CRUDSI is our not particularly serious name for the design pattern that linkml-store follows.
103+
104+
Many database engines and database solutions implement a CRUD layer:
105+
106+
* Create
107+
* Read
108+
* Update
109+
* Delete
110+
111+
linkml-store adds two more operations:
112+
113+
* Search
114+
* Inference
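
To make the six operations concrete, here is a toy in-memory collection with one method per letter. It is a sketch only: the class ``ToyCollection`` and all of its method names are hypothetical and are not the linkml-store API. Search is a plain substring match, and inference simply predicts the most common known value of a field.

```python
from collections import Counter
from typing import Any, Dict, List, Optional

Obj = Dict[str, Any]


class ToyCollection:
    """Hypothetical in-memory collection illustrating CRUD + Search + Inference."""

    def __init__(self) -> None:
        self.rows: List[Obj] = []

    def create(self, obj: Obj) -> None:
        self.rows.append(dict(obj))

    def read(self, where: Obj) -> List[Obj]:
        # Exact-match query: every key/value pair in `where` must match.
        return [r for r in self.rows if all(r.get(k) == v for k, v in where.items())]

    def update(self, where: Obj, changes: Obj) -> int:
        matched = self.read(where)
        for row in matched:
            row.update(changes)
        return len(matched)

    def delete(self, where: Obj) -> int:
        matched = self.read(where)
        self.rows = [r for r in self.rows if r not in matched]
        return len(matched)

    def search(self, text: str) -> List[Obj]:
        # Naive search: case-insensitive substring match on string values.
        needle = text.lower()
        return [
            r for r in self.rows
            if any(isinstance(v, str) and needle in v.lower() for v in r.values())
        ]

    def infer(self, target: str) -> Optional[Any]:
        # Naive inference: predict a field as its most common known value.
        known = [r[target] for r in self.rows if r.get(target) is not None]
        return Counter(known).most_common(1)[0][0] if known else None


people = ToyCollection()
people.create({"name": "Alice", "city": "Paris"})
people.create({"name": "Bob", "city": "Paris"})
people.create({"name": "Carol", "city": "Oslo"})
people.update({"name": "Bob"}, {"age": 30})
people.delete({"name": "Carol"})
print(people.search("ali"))
print(people.infer("city"))
```

The point of the pattern is that search and inference sit behind the same collection interface as the four CRUD operations, whatever backend implements them.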
115+
116+
117+
Is this an AI/Machine Learning/LLM/Vector database platform?
118+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
119+
120+
linkml-store is first and foremost a *data management* platform. However,
121+
we do provide optional integrations to AI and ML tooling. In particular, you can plug and
122+
play different solutions for implementing search indexes, including LLM textual embeddings.
123+
124+
Additionally, we believe that robust data management using rich and expressive semantic
125+
schemas (in combination with the database engine of your choice) is the key to
126+
making data **AI-ready**.
127+
128+
Is linkml-store production ready?
129+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
130+
131+
linkml-store is currently not as mature as the core LinkML products. Be warned that
132+
the API and command line options may change. Things are moving fast, however,
133+
so check back here later!
134+
135+
Are there tutorials?
136+
~~~~~~~~~~~~~~~~~~~~
137+
138+
See :ref:`tutorials`
139+
140+
Installation
141+
------------
142+
143+
How do I install linkml-store?
144+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
145+
146+
.. code-block:: bash
147+
148+
    pip install "linkml-store[all]"
149+
150+
This installs both required and optional dependencies. We recommend this for now.
151+
152+
As a developer, how do I install linkml-store?
153+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
154+
155+
Check out the repo and, as with all LinkML projects, use Poetry:
156+
157+
.. code-block:: bash
158+
159+
    git clone <URL>
160+
    cd linkml-store
161+
    make install
162+
163+
Backend Integrations
164+
--------------------
165+
166+
What is a database integration?
167+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
168+
169+
This framework provides different integrations (aka adapters or implementations) that can hook into
170+
your favorite backend database (if your database engine is not supported, please be patient - or
171+
consider contributing one as a PR!)
172+
173+
Does linkml-store support DuckDB?
174+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
175+
176+
Yes, linkml-store supports DuckDB as a backend. DuckDB is a modern in-process columnar database.
177+
178+
See the :ref:`tutorial <tutorials>` for examples.
179+
180+
Note that currently for DuckDB we bypass the `standard linkml to SQL to relational mapping <https://linkml.io/linkml/generators/sqltable.html>`_ step,
181+
and instead use DuckDB more like a data frame store. Nested objects and lists are stored directly
182+
(using DuckDB's JSON integrations behind the scenes), rather than fully normalized.
183+
184+
Does linkml-store support MongoDB?
185+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
186+
187+
Yes, linkml-store supports MongoDB as a backend. MongoDB is a popular NoSQL database.
188+
189+
See the `MongoDB how-to guide <https://linkml.io/linkml-store/how-to/Use-MongoDB.html>`_ for examples.
190+
191+
Does linkml-store support Neo4J?
192+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
193+
194+
Yes, linkml-store supports Neo4J as a backend. Neo4J is a popular graph database.
195+
196+
See the `Neo4J how-to guide <https://linkml.io/linkml-store/how-to/Use-Neo4J.html>`_ for examples.
197+
198+
Does linkml-store support Solr?
199+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
200+
201+
Currently we provide only read support for Solr. We are working on write support.
202+
203+
See the `Solr how-to guide <https://linkml.io/linkml-store/how-to/Query-Solr-using-CLI.html>`_ for examples.
204+
205+
Can I use linkml-store with my favorite triplestore?
206+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
207+
208+
Not yet! This is a surprising omission given LinkML's roots in the semantic web community. However,
209+
this is planned soon, so check back later.
210+
211+
Data model
212+
----------
213+
214+
What is the data model in linkml-store?
215+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
216+
217+
linkml-store has a simple data model:
218+
219+
* A :class:`.Client` provides a top-level interface over one or more databases.
220+
* A :class:`.Database` consists of one or more possibly heterogeneous collections.
221+
* A :class:`.Collection` is a queryable set of objects of a similar type.
222+
223+
Search
224+
------
225+
226+
Can I use LLM vector embeddings for search?
227+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
228+
229+
Yes, you can use LLM vector embeddings for search. This is an optional feature.
230+
231+
See `How to use semantic search <https://linkml.io/linkml-store/how-to/Use-Semantic-Search.html>`_ for examples.
232+
233+
Do I need to use an LLM for search?
234+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
235+
236+
No, but currently other options are limited. You can use a naive trigram index, or if your backend
237+
supports search out of the box (e.g. Solr), then linkml-store should wire directly into it.
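
For intuition about what a naive, non-LLM text index can look like, the sketch below ranks strings by the overlap of their character trigrams (Jaccard similarity). This is an illustration under that assumption, not linkml-store's indexer; ``trigrams``, ``similarity``, and ``search`` are hypothetical helper names.

```python
from typing import List, Set, Tuple


def trigrams(text: str) -> Set[str]:
    """Return the set of character trigrams of a padded, lower-cased string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity between the trigram sets of two strings."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def search(query: str, docs: List[str]) -> List[Tuple[float, str]]:
    """Rank documents by trigram similarity to the query, best first."""
    return sorted(((similarity(query, d), d) for d in docs), reverse=True)


docs = ["DuckDB backend", "MongoDB backend", "Solr search"]
for score, doc in search("duckdb", docs):
    print(f"{score:.2f}  {doc}")
```

An index like this tolerates typos and partial words but has no notion of meaning, which is the gap that embedding-based search is meant to fill.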
238+
239+
Validation
240+
----------
241+
242+
Does linkml-store provide validation?
243+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
244+
245+
Yes, linkml-store provides expressive validation using the LinkML framework.
246+
247+
Note that currently validation primarily leverages JSON Schema integrations, but the intent is to
248+
provide validation integrations directly with underlying backend stores.
249+
250+
Does linkml-store provide referential integrity validation?
251+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
252+
253+
See `Check Referential Integrity <https://linkml.io/linkml-store/how-to/Check-Referential-Integrity.html>`_ for examples.
254+
255+
Inference
256+
---------
257+
258+
What is inference in linkml-store?
259+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
260+
261+
We have a very flexible notion of inference. It can encompass:
262+
263+
* Statistical or Machine Learning (ML) inference, e.g. via supervised learning
264+
* Ontological inference, e.g. via reasoning over an ontology
265+
* Rule-based or procedural inference
266+
* LLM-based inference
267+
268+
How do I do standard ML inference?
269+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
270+
271+
Currently we provide integrations to scikit-learn, but only expose DecisionTree classifiers for now.
272+
Remember, linkml-store is not a full-fledged ML platform; you should use packages like XGBoost, PyTorch,
273+
or scikit-learn directly for more complex ML tasks.
274+
275+
See `Predict Missing Data <https://linkml.io/linkml-store/how-to/Predict-Missing-Data.html>`_ for examples.
276+
277+
See also the `Command Line Tutorial <https://linkml.io/linkml-store/tutorials/Command-Line-Tutorial.html>`_ for
278+
a simple example.
279+
280+
How do I do LLM inference?
281+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
282+
283+
See the `Command Line Tutorial <https://linkml.io/linkml-store/tutorials/Command-Line-Tutorial.html>`_ (see
284+
the final section) for an example.
285+
286+
How do I do rule-based inference?
287+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
288+
289+
Check back later for tutorials. For now, you can read about:
290+
291+
- the `LinkML expression language <https://linkml.io/linkml/schemas/expression-language.html>`_
292+
- `Rules in LinkML <https://linkml.io/linkml/schemas/advanced.html#rules>`_
293+
294+
In the future we plan to provide bindings for rule engines, Datalog engines, and OWL reasoners.
295+
296+
Contributing
297+
------------
298+
299+
How do I contribute to linkml-store?
300+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
301+
302+
We welcome contributions! Please see the `LinkML contributing guide <https://linkml.io/linkml/contributing.html>`_.

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ common query, index, and storage operations.
1616
tutorials/index
1717
how-to/index
1818
reference/index
19+
faq
1920

2021
Indices and tables
2122
==================

src/linkml_store/api/stores/duckdb/duckdb_collection.py

Lines changed: 3 additions & 0 deletions
@@ -36,6 +36,9 @@ def insert(self, objs: Union[OBJECT, List[OBJECT]], **kwargs):
3636
        logger.info(f"Inserting into: {self.alias} // T={table.name}")
3737
        engine = self.parent.engine
3838
        col_names = [c.name for c in table.columns]
39+
        bad_objs = [obj for obj in objs if not isinstance(obj, dict)]
40+
        if bad_objs:
41+
            logger.error(f"Bad objects: {bad_objs}")
3942
        objs = [{k: obj.get(k, None) for k in col_names} for obj in objs]
4043
        with engine.connect() as conn:
4144
            with conn.begin():
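
Note that the added check only logs non-dict objects; the list comprehension that follows still calls `obj.get` on every object, so input without a `.get` method would be expected to fail there. As a hedged sketch of a stricter alternative (not part of this commit), a helper could reject bad input before building rows. The name `normalize_rows` is hypothetical, and `OBJECT` is redefined here only so the example is self-contained.

```python
from typing import Any, Dict, List, Union

OBJECT = Dict[str, Any]  # stand-in for the project's own OBJECT alias


def normalize_rows(objs: Union[OBJECT, List[OBJECT]], col_names: List[str]) -> List[OBJECT]:
    """Project each object onto the known columns, rejecting non-dict input."""
    if isinstance(objs, dict):
        objs = [objs]  # accept a single object as well as a list
    bad_objs = [obj for obj in objs if not isinstance(obj, dict)]
    if bad_objs:
        # Fail fast instead of logging and continuing.
        raise TypeError(f"Expected dict objects, got: {bad_objs!r}")
    # Missing columns become None; keys that are not columns are dropped.
    return [{k: obj.get(k) for k in col_names} for obj in objs]


print(normalize_rows({"name": "Alice", "extra": 1}, ["name", "age"]))
```

Whether to raise or to skip the offending objects is a design choice; raising keeps a partially invalid batch from being silently half-inserted.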
