Commit 950716f

Fix typo
1 parent b93d339 commit 950716f

1 file changed

+5
-5
lines changed

_posts/2024-08-28-what-is-vector.md

Lines changed: 5 additions & 5 deletions
@@ -27,11 +27,11 @@ wechat: false
2727

2828
## Introduction
2929

30-
Recently, I heard about this technology several times in different occasions. It makes me wonder what vector is and how it is useful in artificial intelligence (AI). This is a sturdy note for helping myself to better understand this technology, so the content may not be accurate. But I hope that it can help you to understand this technology as well. In this article, we are going to explore the definition of a vector, the motivation of using vector in different databases, and the use cases of vectors in different industries. Now, let's get started!
30+
Recently, I heard about this technology several times on different occasions. It makes me wonder what a vector is and how it is useful in artificial intelligence (AI). This is a study note that will help me to understand this technology better, so the content may not be accurate. But I hope that it can help you to understand this technology as well. In this article, we are going to explore the definition of a vector, the motivation for using vectors in different databases, and the use cases of vectors in different industries. Now, let's get started!
3131

3232
## Definition
3333

34-
According to Wikipedia, vectors are mathematical representation of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, with the number of dimensions ranging from a few hundreds to tens of thousands, depending on the complexity of the data being represented. A vector's position in the space represents its characteristics. Words. phrases, or entire documents, as well as images, audio, and other types of data can all be vectorized.
34+
According to Wikipedia, vectors are mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, with the number of dimensions ranging from a few hundred to tens of thousands, depending on the complexity of the data being represented. A vector's position in the space represents its characteristics. Words, phrases, or entire documents, as well as images, audio, and other types of data can all be vectorized.
3535

3636
```mermaid
3737
mindmap
@@ -55,16 +55,16 @@ mindmap
5555
Vectors belong to a larger category of _tensors_. In machine learning (ML), "tensor" is used as a generic term for an array of numbers—or an array of arrays of numbers—in n-dimensional space, functioning like a mathematical bookkeeping device for data.
5656

5757
- A scalar is a zero-dimensional tensor containing a single number.
58-
- A vector is a one dimensional tensor containing multiple scars in the same type of data.
58+
- A vector is a one-dimensional tensor containing multiple scalars of the same type of data.
5959
- A tuple is a first-order tensor containing scalars of more than one type of data, such as a mix of strings and numbers.
6060
- A matrix is a two-dimensional tensor containing multiple vectors of the same type of data.
61-
- Tensors with three or more dimensions, like a 3-dimensional tensors used to represent color images in computer vision algorithms, are referred to as multidimensional arrays or N-dimensional tensors.
61+
- Tensors with three or more dimensions, like 3-dimensional tensors used to represent color images in computer vision algorithms, are referred to as multidimensional arrays or N-dimensional tensors.
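
The list above can be sketched with plain Python values. This is only an illustration, assuming nested lists stand in for tensors; the `rank` helper is a hypothetical name introduced here, and it assumes non-empty, regularly shaped inputs.

```python
def rank(t):
    """Count nesting depth: 0 for a scalar, 1 for a vector, 2 for a matrix, and so on.

    Hypothetical helper for illustration; assumes non-empty, regular nesting.
    """
    depth = 0
    while isinstance(t, (list, tuple)):
        depth += 1
        t = t[0]
    return depth


scalar = 3.14                          # zero-dimensional: a single number
vector = [1.0, 2.0, 3.0]               # one-dimensional: scalars of one type
record = ("cat", 4, 3.5)               # tuple: first-order, mixed types
matrix = [[1.0, 2.0], [3.0, 4.0]]      # two-dimensional: vectors of one type
image = [                              # three-dimensional: 2x2 pixels, 3 color channels
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

for name, value in [("scalar", scalar), ("vector", vector), ("record", record),
                    ("matrix", matrix), ("image", image)]:
    print(name, "has rank", rank(value))
```

In practice, ML libraries provide their own array types for this, but the nesting idea is the same.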
6262

6363
## Vectorization
6464

6565
If you want to convert text into vectors, you would typically interact with the LLM at a specific stage in the following process.
6666

67-
* **Tokenization:** The tax is first tokenized, which means breaking down into text, into smaller units, token usually words or sub words. This is the first step, but it's not yet the factorization process.
67+
* **Tokenization:** The text is first tokenized, which means breaking it down into smaller units. Tokens are usually words or sub-words. This is the first step, but it's not yet the vectorization process.
6868
* **Embedding (Vectorization):** After tokenization, the text is passed through an **embedding layer**. This is where the interaction with the LLM happens. The LLM takes the tokens and converts them into dense numerical representations, called **vectors**. These vectors are high-dimensional (e.g. 768 dimensions for BERT-base; other models use different sizes) and contain semantic information about the text.
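
The two steps above can be sketched in a few lines. This is a toy illustration, not how BERT or GPT work internally: the tokenizer is a simple regex split, and the embedding table is filled with random numbers, whereas a real model learns those values during training. All function names here (`tokenize`, `build_vocab`, `make_embedding_table`, `embed`) are hypothetical.

```python
import random
import re


def tokenize(text):
    """Step 1: break text into lowercase word tokens (real models use sub-word tokenizers)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(tokens):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def make_embedding_table(vocab_size, dim, seed=0):
    """One vector per token id. Random here; learned in a real model (assumption)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(text, vocab, table):
    """Step 2: look up the vector for each token. Unknown tokens raise KeyError."""
    return [table[vocab[tok]] for tok in tokenize(text)]


tokens = tokenize("Vectors capture meaning. Vectors are lists of numbers.")
vocab = build_vocab(tokens)
table = make_embedding_table(len(vocab), dim=4)  # 4 dimensions instead of hundreds
vectors = embed("vectors are numbers", vocab, table)
print(len(vectors), "tokens, each mapped to", len(vectors[0]), "numbers")
```

The table lookup is the part a trained embedding layer replaces with learned values, which is what gives the vectors their semantic information.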
6969

7070
Here is an example from Anshu's article [Understanding the Fundamental Limitations of Vector-Based Retrieval for Building LLM-powered Chatbot](https://medium.com/thirdai-blog/understanding-the-fundamental-limitations-of-vector-based-retrieval-for-building-llm-powered-48bb7b5a57b3), where a corpus of text documents is broken down into smaller blocks of text (chunks). Each chunk is then fed to a trained language model like BERT or GPT to generate a vector representation, also known as an embedding. The embedding is then stored in the vector database.
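
A minimal sketch of that chunk, embed, and store flow is shown here. It is an assumption-heavy stand-in: `toy_embed` hashes words into a small fixed-size vector instead of calling a trained model like BERT or GPT, and `ToyVectorStore` is a hypothetical in-memory list, not a real vector database.

```python
import math
import re
import zlib


def chunk(text, size=8):
    """Split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(text, dim=16):
    """Stand-in for a language model: hash each word into one of `dim` slots,
    then scale the vector to unit length."""
    vec = [0.0] * dim
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(tok.encode()) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class ToyVectorStore:
    """Hypothetical in-memory store holding (embedding, chunk) pairs."""

    def __init__(self):
        self.rows = []

    def add(self, text):
        self.rows.append((toy_embed(text), text))

    def search(self, query, k=1):
        """Return the k chunks with the highest cosine similarity to the query."""
        q = toy_embed(query)
        scored = [(sum(a * b for a, b in zip(q, v)), t) for v, t in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


document = ("Vectors are lists of numbers. A vector database stores embeddings. "
            "Similar texts end up close together in the vector space.")
chunks = chunk(document)
store = ToyVectorStore()
for c in chunks:
    store.add(c)

print(store.search("where are embeddings stored", k=1))
```

Because the vectors are unit length, the dot product in `search` equals cosine similarity. A word-hashing embedding only matches on shared words, so it does not capture the semantic similarity a trained model would.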
