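A vector database implementation based on 《从零构建向量数据库》.
Supports FaissIndex and HnswlibIndex; supports hybrid scalar-vector queries; supports persistent data storage; the database is accessed over HTTP requests to insert or query vectors.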
Ubuntu:
sudo apt-get install cmake openssl libssl-dev libz-dev libcpprest-dev gfortran
echo "export VECTORDB_CODE_BASE=_______" >> ~/.bashrc # root path of the downloaded code, e.g. /home/zhouzj/project/vectorDB
source ~/.bashrc
Switch to the project directory:
cd third_party
bash build.sh
You can use bash build.sh --help to see more details.
Switch to the project directory:
$ mkdir build
$ cd build
$ cmake ..
$ make
If you want to compile the system in debug mode, pass the following flag to cmake:
$ cmake -DCMAKE_BUILD_TYPE=Debug ..
$ make -j`nproc`
You can find vectordb_config under the project directory and modify each item according to your preferences. See the example and explanation in common/vector_cfg.cpp.
Make sure the paths named in your vectordb_config exist, for example:
mkdir ~/vectordb1/
cd ~/vectordb1
mkdir snap
mkdir storage
mkdir ~/test_vectordb/
cd ~/test_vectordb
mkdir snap
mkdir storage
When you run vdb_server it uses ~/vectordb1; the contents are retained when vdb_server is restarted. To reset, remove them and create the directories again like this:
cd ~/vectordb1
rm -rf wal snap* storage*
mkdir snap
mkdir storage
After changing third_party/proto, you should run
cd third_party
bash build.sh --package protobuf
to rebuild .pb.h and .pb.cc.
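The data-directory steps described earlier (create snap and storage, or remove wal, snap* and storage* to reset) can be wrapped in a small helper. This is a minimal sketch: the function name init_vdb_dir and the --reset flag are hypothetical and not part of the project, and the directory you pass must match the paths in your vectordb_config.

```shell
# Create a vdb_server data directory, or reset an existing one.
# Usage: init_vdb_dir DIR [--reset]
# (init_vdb_dir and --reset are illustrative names, not project commands.)
init_vdb_dir() {
    dir=${1:-}
    mode=${2:-}
    if [ -z "$dir" ]; then
        echo "usage: init_vdb_dir DIR [--reset]" >&2
        return 1
    fi
    mkdir -p "$dir" || return 1
    if [ "$mode" = "--reset" ]; then
        # Same cleanup as the manual steps: drop the WAL, snapshots and storage.
        rm -rf "$dir"/wal "$dir"/snap* "$dir"/storage*
    fi
    # Recreate the two sub-directories the server expects.
    mkdir -p "$dir/snap" "$dir/storage"
}
```

For example, init_vdb_dir ~/vectordb1 prepares a fresh directory, and init_vdb_dir ~/vectordb1 --reset wipes the retained contents before the next start of vdb_server.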
Switch to the project directory:
cd build
./bin/vdb_server
You can open another terminal and enter commands, following the example commands in test/test.h.
Remember to change the port so that it matches the one in vectordb_config.
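To avoid retyping the host and port in every request, the commands can be routed through a small wrapper. This is a sketch under assumptions: the helper names are hypothetical, the default port 7781 is only a placeholder that must be replaced with the value in your vectordb_config, and the request bodies reuse the UserService/upsert and UserService/search examples shown elsewhere in this README.

```shell
# Assumed defaults: change them to the address and port in your vectordb_config.
VDB_HOST=${VDB_HOST:-localhost}
VDB_PORT=${VDB_PORT:-7781}

# Build the full URL for a service path such as UserService/upsert.
vdb_url() {
    printf 'http://%s:%s/%s\n' "$VDB_HOST" "$VDB_PORT" "$1"
}

# POST a JSON body ($2) to a service path ($1) and print the raw response.
vdb_post() {
    curl -sS -X POST -H "Content-Type: application/json" -d "$2" "$(vdb_url "$1")"
}

# Insert one vector, then search for it with a scalar filter.
vdb_smoke_test() {
    vdb_post UserService/upsert \
        '{"vectors": [0.999], "id": 6, "int_field": 47, "indexType": "FLAT"}' || return 1
    echo
    vdb_post UserService/search \
        '{"vectors": [0.999], "k": 5, "indexType": "FLAT", "filter": {"fieldName": "int_field", "value": 47, "op": "="}}'
}
```

With the server running, VDB_PORT=<your port> followed by vdb_smoke_test sends both requests; changing the port in one place keeps every request consistent with vectordb_config.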
You can build and run individual Google Test targets like this:
Switch to the project directory:
$ mkdir build
$ cd build
$ cmake ..
$ make faiss_index_test
$ cd test
$ ./faiss_index_test
Switch to the project directory:
cd build
./bin/vdb_server 1
In another terminal:
cd build
./bin/vdb_server 2
Here 1 and 2 are node IDs.
Once the node information is configured in vectordb_config, you can run ./vdb_server $nodeid in a terminal (if nodeid is omitted, it defaults to 1).
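When several nodes run on one machine, starting each in its own terminal can be replaced by background jobs. The following is a sketch only: start_nodes, stop_nodes, and the vdb_node_<id>.log / vdb_node_<id>.pid file names are hypothetical conveniences, and the binary path assumes you are in the build directory.

```shell
# Path to the server binary, relative to the build directory.
VDB_SERVER_BIN=${VDB_SERVER_BIN:-./bin/vdb_server}

# Start one vdb_server per node ID in the background.
# Output goes to vdb_node_<id>.log and the PID to vdb_node_<id>.pid.
# Usage (from the build directory): start_nodes 1 2
start_nodes() {
    for node_id in "$@"; do
        "$VDB_SERVER_BIN" "$node_id" > "vdb_node_${node_id}.log" 2>&1 &
        echo "$!" > "vdb_node_${node_id}.pid"
        echo "started node $node_id (pid $!)"
    done
}

# Stop the nodes that start_nodes launched.
stop_nodes() {
    for node_id in "$@"; do
        if [ -f "vdb_node_${node_id}.pid" ]; then
            # The process may already have exited, so ignore kill errors.
            kill "$(cat "vdb_node_${node_id}.pid")" 2>/dev/null || true
            rm -f "vdb_node_${node_id}.pid"
        fi
    done
}
```

Each node ID passed to start_nodes must already be described in vectordb_config; stop_nodes 1 2 shuts the same nodes down again.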
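Choose one node as the primary and add the other nodes to the cluster as followers:
curl -X POST -H "Content-Type: application/json" -d '{"nodeId": 2, "endpoint": "127.0.0.1:8082"}' http://localhost:7781/AdminService/AddFollower
Send the request to the port of whichever node you want to be the primary. The payload after -d describes the follower to add; add one follower at a time. When all followers have been added, use ListNode to check that they have joined the cluster:
curl -X GET http://localhost:7781/AdminService/ListNode
You can also use GetNode to check the status of the current node:
curl -X GET http://localhost:7781/AdminService/GetNode
Send an upsert request to the primary node; a search on a follower node then also finds the data: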
curl -X POST -H "Content-Type: application/json" -d '{"vectors": [0.999], "id":6, "int_field":47,"indexType": "FLAT"}' http://localhost:7781/UserService/upsert
curl -X POST -H "Content-Type: application/json" -d '{"vectors": [0.999], "k": 5 , "indexType": "FLAT","filter":{"fieldName":"int_field","value":47,"op":"="}}' http://localhost:7782/UserService/search
Traffic forwarding, failover, and cluster sharding are currently supported.
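Because followers are added one at a time, forming a larger cluster means repeating the AddFollower request once per node. A loop can do this, as in the sketch below; the function names and the NODEID=ENDPOINT argument format are hypothetical, and only the AdminService/AddFollower and AdminService/ListNode paths come from this README.

```shell
# JSON body for AdminService/AddFollower: node ID ($1) and endpoint ($2).
follower_payload() {
    printf '{"nodeId": %s, "endpoint": "%s"}\n' "$1" "$2"
}

# Add followers to the primary one at a time, then list the cluster nodes.
# Usage: add_followers PRIMARY_URL NODEID=ENDPOINT...
add_followers() {
    primary=$1
    shift
    for spec in "$@"; do
        node_id=${spec%%=*}    # text before the first "="
        endpoint=${spec#*=}    # text after the first "="
        curl -sS -X POST -H "Content-Type: application/json" \
            -d "$(follower_payload "$node_id" "$endpoint")" \
            "$primary/AdminService/AddFollower" || return 1
        echo
    done
    # Confirm the membership once every follower has been submitted.
    curl -sS -X GET "$primary/AdminService/ListNode"
}
```

For the two-node example above, the call would be add_followers http://localhost:7781 2=127.0.0.1:8082; further NODEID=ENDPOINT pairs can be appended for additional followers.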
Once a vdb_server cluster is set up, use vdb_server_master to manage the cluster metadata. etcd must be deployed manually beforehand.
cd build
./bin/vdb_server_master
The following endpoints manage the vdb_server cluster information (role 1 is a follower, role 0 is the primary):
# Get the info of one node
curl -X POST -H "Content-Type: application/json" -d '{"instanceId" : 1,"nodeId": 1}' http://localhost:6060/MasterService/GetNodeInfo
# Get the info of all nodes under an instance
curl -X POST -H "Content-Type: application/json" -d '{"instanceId" : 1}' http://localhost:6060/MasterService/GetInstance
# Add the info of node 2
curl -X POST -H "Content-Type: application/json" -d '{"instanceId": 1, "nodeId": 2, "url": "http://127.0.0.1:7782", "role": 1, "status": 0}' http://localhost:6060/MasterService/AddNode
# Remove the info of node 2
curl -X DELETE -H "Content-Type: application/json" -d '{"instanceId" : 1,"nodeId": 2}' http://localhost:6060/MasterService/RemoveNode
# Update the partition config:
curl -X POST http://localhost:6060/MasterService/UpdatePartitionConfig -H "Content-Type: application/json" -d '{
"instanceId": 1,
"partitionKey": "id",
"numberOfPartitions": 2,
"partitions": [
{"partitionId": 0, "nodeId": 1},
{"partitionId": 0, "nodeId": 2},
{"partitionId": 0, "nodeId": 3},
{"partitionId": 1, "nodeId": 4}
]
}'
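Writing the partitions array by hand becomes error-prone as nodes are added, so the body can be generated from PARTITIONID:NODEID pairs. This is a sketch: partition_config is a hypothetical helper, and the range check assumes partition IDs run from 0 to numberOfPartitions - 1, which matches the example above but is not confirmed by the project.

```shell
# Build the JSON body for MasterService/UpdatePartitionConfig.
# Usage: partition_config INSTANCE_ID PARTITION_KEY NUM_PARTITIONS PID:NID...
partition_config() {
    instance_id=$1
    key=$2
    count=$3
    shift 3
    entries=""
    for pair in "$@"; do
        pid=${pair%%:*}    # partition ID, before the ":"
        nid=${pair#*:}     # node ID, after the ":"
        # Assumption: valid partition IDs are 0 .. NUM_PARTITIONS-1.
        if [ "$pid" -ge "$count" ]; then
            echo "partitionId $pid is out of range (numberOfPartitions=$count)" >&2
            return 1
        fi
        # Join entries with ", " (nothing is added before the first entry).
        entries="${entries}${entries:+, }{\"partitionId\": $pid, \"nodeId\": $nid}"
    done
    printf '{"instanceId": %s, "partitionKey": "%s", "numberOfPartitions": %s, "partitions": [%s]}\n' \
        "$instance_id" "$key" "$count" "$entries"
}
```

The request above then becomes curl -X POST http://localhost:6060/MasterService/UpdatePartitionConfig -H "Content-Type: application/json" -d "$(partition_config 1 id 2 0:1 0:2 0:3 1:4)".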
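# Get the partition config:
curl -X POST -H "Content-Type: application/json" -d '{"instanceId" : 1}' http://localhost:6060/MasterService/GetPartitionConfig
Use vdb_server_proxy to provide a unified traffic entry point (reads and writes are split; writes always go to the primary node).
cd build
./bin/vdb_server_master
If partitions are configured in the master, requests are also forwarded according to the partition (each partition needs a primary node).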
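Since reads and writes are split, a search sent right after an upsert may be served by a follower. The sketch below writes through the proxy and then reads back with forceMaster set, so the read goes to the primary. The helper names are hypothetical, the default proxy address http://localhost:6061 is taken from the requests in this README, and whether a follower can briefly lag behind the primary is an assumption rather than documented behavior.

```shell
# Build a ProxyService/search body.
# Usage: proxy_search_payload VECTOR_VALUE K FIELD FIELD_VALUE [true]
# The forceMaster key is only added when the fifth argument is "true".
proxy_search_payload() {
    extra=""
    if [ "${5:-false}" = "true" ]; then
        extra=', "forceMaster": true'
    fi
    printf '{"vectors": [%s], "k": %s, "indexType": "FLAT", "filter": {"fieldName": "%s", "value": %s, "op": "="}%s}\n' \
        "$1" "$2" "$3" "$4" "$extra"
}

# Write through the proxy, then read the vector back from the primary.
proxy_write_then_read() {
    proxy=${1:-http://localhost:6061}
    curl -sS -X POST -H "Content-Type: application/json" \
        -d '{"id": 6, "vectors": [0.9], "int_field": 49, "indexType": "FLAT"}' \
        "$proxy/ProxyService/upsert" || return 1
    echo
    curl -sS -X POST -H "Content-Type: application/json" \
        -d "$(proxy_search_payload 0.9 5 int_field 49 true)" \
        "$proxy/ProxyService/search"
}
```

Leaving out the fifth argument produces an ordinary read request, which the proxy is free to route under its read/write splitting.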
# Read request
curl -X POST -H "Content-Type: application/json" -d '{"vectors": [0.9], "k": 5, "indexType": "FLAT", "filter":{"fieldName":"int_field","value":49, "op":"="}}' http://localhost:6061/ProxyService/search
# Write request
curl -X POST -H "Content-Type: application/json" -d '{"id": 6, "vectors": [0.9], "int_field": 49, "indexType": "FLAT"}' http://localhost:6061/ProxyService/upsert
# Force the read to go to the primary
curl -X POST -H "Content-Type: application/json" -d '{"vectors": [0.89], "k": 5, "indexType": "FLAT", "filter":{"fieldName":"int_field","value":49 ,"op":"="},"forceMaster" : true}' http://localhost:6061/ProxyService/search
- 《从零构建向量数据库》
- faiss
- hnswlib
- openblas
- brpc
- rapidjson
- httplib
- spdlog
- gflags
- protobuf
- glog
- crypto
- leveldb
- ssl
- z
- rocksdb
- snappy
- lz4
- bz2
- roaring
- gtest
- backward-cpp
- nuraft
- curl
- etcdclient
Many thanks to Xiaoccer: the patches, build.sh, and CMakeLists.txt under third_party are adapted from the third_party content of TineVecDB.
vectorDB is licensed under the MIT License. For more details, please refer to the LICENSE file.
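Everything in the first five chapters of 《从零构建向量数据库》 is implemented so far, with the following improvements on top:
Structured the code and added a CMakeLists for one-step builds; added one-step builds for the third-party libraries;
Moved paths, ports, and similar settings into a config file; the config file is read and parsed, and a Cfg singleton exposes the parsed settings
Added Google Test and integrated it with CMake, so unit tests can be built and run; some unit tests have been added so far
Added backward-cpp; just add the targets that need backward in the CMakeLists, for example add_backward(vdb_server)
Added clang-tidy and clang-format
Added a singleton template class and turned IndexFactory into a true singleton
Used the snappy compression algorithm to compress and decompress the WAL, reducing the space taken by WAL files
Replaced httplib with protobuf + brpc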