Skip to content

Commit c355157

Browse files
prakriti-solankeykaustubh-darekarkartikpersistentdependabot[bot]praveshkumar1988
authored
Staging to Main branch (#1492)
* different url web page having same title issue fixed (#1110) * Text file encoding issue (#1126) * Resolved UnicodeDecodeError for some txt files * changes done to process utf-8 file with unstructured only * Resolved UnicodeDecodeError issue for files having other than utf-8 encoding (#1128) * Resolved UnicodeDecodeError for some txt files * changes done to process utf-8 file with unstructured only * ensure file path is correct and as per intended format to avoid security risks * Sanitizing additional instruction (#1130) * resolved UnboundLocalError: local variable 'graphDb_data_Access' referenced before assignment (#1129) * connection not there message for data resources (#1131) * dockerfile updates and utils functions change * fix: readonly issue fix * Resolved uploaded file extraction failing on deployed version (#1136) * UI fixes v0.7.2 (#1138) * UI fixes * dropdown option fix * Message Overlay and button size fixes * select components size change from large to small post package update (#1135) * connection status check * connection check update * fix * added dropdown for graph visulization * fixes: added explore icon fixed responsivness issue * document_status api fix --------- Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com> * Update BreakDownPopOver.tsx * chunk_count_val * type error * spell fixes and protected route fixes * top entities not found - bug resolved (#1150) * top entities not fpund bug resolved * refined code * limiting content fetching to current wikipedia page (#1151) * added the link for login redirectig * removed loading statw * added the padding and changed the message * Bump re-resizable from 6.9.17 to 6.11.2 in /frontend (#1149) Bumps [re-resizable](https://github.com/bokuweb/react-resizable-box) from 6.9.17 to 6.11.2. - [Release notes](https://github.com/bokuweb/react-resizable-box/releases) - [Changelog](https://github.com/bokuweb/re-resizable/blob/master/CHANGELOG.md) - [Commits](https://github.com/bokuweb/react-resizable-box/commits) --- updated-dependencies: - dependency-name: re-resizable dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump eslint-plugin-react from 7.35.0 to 7.37.4 in /frontend (#1148) Bumps [eslint-plugin-react](https://github.com/jsx-eslint/eslint-plugin-react) from 7.35.0 to 7.37.4. - [Release notes](https://github.com/jsx-eslint/eslint-plugin-react/releases) - [Changelog](https://github.com/jsx-eslint/eslint-plugin-react/blob/master/CHANGELOG.md) - [Commits](https://github.com/jsx-eslint/eslint-plugin-react/compare/v7.35.0...v7.37.4) --- updated-dependencies: - dependency-name: eslint-plugin-react dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump @types/node from 20.14.14 to 22.13.9 in /frontend (#1152) Bumps [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node) from 20.14.14 to 22.13.9. - [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases) - [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node) --- updated-dependencies: - dependency-name: "@types/node" dependency-type: direct:development update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump eslint-config-prettier from 8.10.0 to 10.0.2 in /frontend (#1146) Bumps [eslint-config-prettier](https://github.com/prettier/eslint-config-prettier) from 8.10.0 to 10.0.2. - [Release notes](https://github.com/prettier/eslint-config-prettier/releases) - [Changelog](https://github.com/prettier/eslint-config-prettier/blob/main/CHANGELOG.md) - [Commits](https://github.com/prettier/eslint-config-prettier/compare/v8.10.0...v10.0.2) --- updated-dependencies: - dependency-name: eslint-config-prettier dependency-type: direct:development update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump react-dropzone from 14.3.5 to 14.3.8 in /frontend (#1145) Bumps [react-dropzone](https://github.com/react-dropzone/react-dropzone) from 14.3.5 to 14.3.8. - [Release notes](https://github.com/react-dropzone/react-dropzone/releases) - [Commits](https://github.com/react-dropzone/react-dropzone/compare/v14.3.5...v14.3.8) --- updated-dependencies: - dependency-name: react-dropzone dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update dependabot.yml * Update the query to check DB is gds version (#1153) * Entity details shown for entity mode (#1154) * Dev (#1132) * Read only mode for unauthenticated users (#1046) * llm name changes * build fix * default mode fix * ragas model names update * lint fixes * Chunk Entities API condition * added the tooltip for unsupported lllms for ragas metric loading * removed unused imports * multimode fix when we get error response * mode changes for score display * fix: Fixed the details state handling between multiple chats feature: Added the warning banner If selected llm model is not supported for raga's evaluation * Fix: Entity Mode Width Fix * diffbot fix for async (#797) * Minor changes (#798) * added congig variable for default diffbot chat model * fulltext index creation is skipped when the labels are empty * entity vector change * added optinal to communities for entity mode * updated the entity query --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * New: Added the supported llm models for ragas evaluation * Fix: Communitites Tab is displayed based communitites length * added the conversation download button (#800) * model name correction * chatmode switch mode fix * Add API payload GCP logging (#805) * Adding Links to get neighboring nodes (#796) * addition of link * added neighbours query * implemented with driver * updated the query * communitiesInfo name change * communities.tsx removed * api integration * modified response * entities change * chunk and communities * chunk space removal * added element id to chunks * loading on click * format changes * added file name for Dcoumrnt node * chat token cut off model name update * icon change * duplicate sources removal * Entity change --------- Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> * added error message for doc retriver (#807) * copy row (#803) * copy row * column for copy * column copy * Raga's Evaluation For Multi Modes (#806) * Updatedmodels for ragas eval * context utilization metrics removed * updated supported llms for ragas * removed context utilization * Implemented Parallel API * multi api calls error resolved * MultiMode Metrics * Fix: Metric Evalution For Single Mode * multi modes ragas evaluation * api payload changes * metric api output format changed * multi mode ragas changes * removed pre process dataset * api response changes * Multimode metrics api integration * nan error for no answer resolved * QA integration changes --------- Co-authored-by: kaustubh-darekar <kaustubh_darekar@persistent.com> * lint fixes * fix: multimode metrics state handling fix: lint fixes * fix: Multimode metrics mode change state issue fix: chunk list style issue * fix: list style fix * Correct TYPO mistake * added new env for ragas embedding model * Props name changes (#811) * Props name changes * removed the accesstoken from row on copy action * props changes for dropzone component * graph view changes --------- Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com> * test * view graph * nodes count and relationshipcount updation fix * sourceUrl Fix * empty string "" fix to keep the default values we should keep the value blank instead "" * prop changes * props changes * retry condition update for failed files (#820) * Chat modes name changes (#815) * Props name changes * removed the accesstoken from row on copy action * updated chat mode names * Chat Modes Name Changes * lint fixes * using readble format In UI * removal of size to avoid console warning * key add --------- Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com> * Youtube transcript fix with proxy (#822) * update script for async func * ragas changes for graph retrieval mode. context added in api output (#825) * Remove extract latency from logging and add LIMIT in duplicate nodes * Document updates (#828) * document updated with ragas evaluation information * formatting changes * chatbot api documentation updated * api details added in document * function name changed for drop create vector index api * Update README.md * updated api structire in docs (#827) * Update backend_docs.adoc * 821 llm model listing (#823) * added logic for document filters * LLM models * message change * link added * removed the text --------- Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> * Exclude session lable node from duplicate nodes list * Added the tooltip for disabled llm option (#835) * node size changes * mode removal of rows check * formatting * Exclude __Entity__ node label from duplicate node list * Update README.md * Update README.md * Update README.md * Update README.md * fixed the youtube link * Security header and GZIPMiddleware (#847) * Added security header all API * Add GZipMiddleware * Chunk Text Details (#850) * Community title added * Added api for fetching chunk text details * output format changed for chunk text * integrated the service layer for chunkdata * added the chunks * formatting output of llm call for title generation * formatting llm output for title generation * added flex row * Changes related to pagination of fetch chunk api * Integrated the pagination * page changes error resolved for fetch chunk api * for get neighbours api , community title added in properties * moving community title related changes to separate branch * Removed Query module from fastapi import statement * icon changes --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Communities Id to Title (#851) * Staging to main (#735) * Dev (#537) * format fixes and graph schema indication fix * Update README.md * added chat modes variable in env updated the readme * spell fix * added the chat mode in env table * added the logos * fixed the overflow issues * removed the extra fix * Fixed specific scenario "when the text from schema closes it should reopen the previous modal" * readme changes * removed dev console logs * added new retrieval query (#533) * format fixes and tab rendering fix * fixed the setting modal reopen issue --------- Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com> Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> * disabled the sumbit buttom on loading * Deduplication tab (#566) * de-duplication API * Update De-Duplicate query * created the Deduplication tab * added the API service * added the removeable tags for similar nodes in deduplication tab * Integrate Tag * added GraphLabel * added loader state * added the merge service * integrated the merge API * Merge Query issue fixed * Auto refresh the duplicate nodes after merging operation * added the description for de duplication * reset on merging --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * Update frontend_docs.adoc (#538) * Update frontend_docs.adoc * doc update * Images * Images folder change * Images folder change * test image * Update frontend_docs.adoc * image change * Update frontend_docs.adoc * Update frontend_docs.adoc * added the Graph Mode SS * added the Query SS * Update frontend_docs.adoc * conflics fix * conflict fix * Update frontend_docs.adoc --------- Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * updated langchain versions (#565) * Update the De-Duplication query * Node relationship id type none issue (#547) * de-duplication API * Update De-Duplicate query * Issue fixed Nodes,Relationship Id and Type None or Blank * added the tooltips * type fix * Unneccory import * added score threshold and added some error handling (#571) * Update requirements.txt * Tooltip and other UI fixes (#572) * Staging To Main (#495) * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * recent merges * pdf deletion due to out of diskspace * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * Convert is_cancelled value from string to bool * added the default page size * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * offset in chunks (#389) * page number in gcs loader (#393) * added youtube timestamps (#392) * chat pop up button (#387) * expand * minimize-icon * css changes * chat history * chatbot wider Side Nav * expand icon * chatbot UI * Delete * merge fixes * code suggestions --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * chunks create before extraction using is_pre_process variable (#383) * chunks create before extraction using is_pre_process variable * Return total pages for Model * update requirement.txt * total pages on uplaod API * added the Confirmation Dialog * added the selected files into the confirmation modal * format and lint fixes * added the stop watch image * fileselection on alert dialog * Add timeout in docker for gunicorn workers * Add cancel icon to info popup (#384) * Info Modal Changes * css changes * recent merges * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * added the default page size * Convert is_cancelled value from string to bool * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Save Total Pages in DB * Added total Pages * file selection when we didn't select anything from Main table * added the danger icon only for large files * added the overflow for more files and file selection for all new files * moved the interface to types * added the icon accoroding to the source * set total page for wiki and youtube * h3 heading * merge * updated the alert on basis if total pages * deleted chunks * polling based on total pages * isNan check * large file based on file size for s3 and gcs * file source in server side event * time calculation based on chunks for gcs and s3 --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com> Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com> Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * fixed the layout issue * Populate graph schema (#399) * crreate new endpoint populate_graph_schema and update the query for getting lables from DB * Added main.py changes * conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396) * added the condtion * removed llms * Fixed issue : Remove extra unused param * get emb only if used (#278) * Chatbot chunks (#402) * Added file name to the content sent to LLM * added chunk text in the response * increased the docs parts sent to llm * Modified graph query * mardown rendering * youtube starttime * icons * offset changes * removed the files due to codespace space issue --------- Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Settings modal to support generating the labels from the llm by using text given by user (#405) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * fixed css issue * fixed status blank issue * Modified response when no docs is retrived (#413) * Fixed env/docker-compose for local deployments + README doc (#410) * Fixed env/docker-compose for local deployments + README doc * wrong place for ENV in README * by default, removed langsmith + fixed knn score string to float * by default, removed langsmith + fixed knn score string to float * Fixed strings in docker-compose env * Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop) * Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that * Support for all unstructured files (#401) * all unstructured files * responsiveness * added file type * added the extensions * spell mistake * ppt file changes --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * Extract schema using direct ChatOpenAI API and Chain * integrated the checkbox for schema to text dialog * Update SettingModal.tsx --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * gcs file content read via storage client (#417) * gcs file content read via storage client * added the access token the file state --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * pypdf2 to read files from gcs (#420) * 407 remove driver from frontend (#416) * removed driver * removed API * connecting to database on page refresh --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Css handling of info modal and Tooltips (#418) * css change * toolTips * Sidebar Tooltips * copy to clip * css change * added image types * added gcs * type fix * docker changes * speech * added the toolip for dropzone sources --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Fixed retrival bugs (#421) * yarn format fixes * changed the delete message * added the cancel button * changed the message on tooltip * added space * UI fixes * tooltip for setting * updated req * wikipedia URL input (#424) * accept only wikipedia links * added wikipedia link * added wikilink regex * wikipedia single url only * changed the alert message * wording change * pushed validation state persist error --------- Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * speech and copy (#422) * speech and copy * startTime * added chunk properties * tooltips --------- Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Fixed issue for out of range in KNN API * solved conflicts * conflict solved * Remove logging info from update KNN API * tooltip changes * format and lint fixes * responsiveness changes * Fixed issue for total pages GCS, S3 * UI polishing (#428) * button and tooltip changes * checking validation on change * settings module populate fix * format fixes * opening the modal after auth success * removed the limit * added the scrobar for dropdowns * speech state (#426) * speech state * Button Details changes * delete wording change * Total pages in buckets (#431) * page number NA for buckets * added N/A for gcs and s3 pages * total pages for gcs * remove unwanted logger --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * removed the max width * Update FileTable.tsx * Update the docker file * Modified prompt (#438) * Update Dockerfile * Update Dockerfile * Update Dockerfile * rendering Fix * Local file upload gcs (#442) * Uplaod file to GCS * GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled * Add life cycle rule on uploaded bucket * pdf upload local and gcs bucket check * delete files when processed and extract changes --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * Modified chat length and entities used (#443) * metadata for unstructured files (#446) * Unstructured file metadata (#447) * metadata for unstructured files * sleep in gcs upload * updated * icons added to chunks (#435) * icons added to chunks * info modal icons * Dev (#433) * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * recent merges * pdf deletion due to out of diskspace * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * Convert is_cancelled value from string to bool * added the default page size * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * offset in chunks (#389) * page number in gcs loader (#393) * added youtube timestamps (#392) * chat pop up button (#387) * expand * minimize-icon * css changes * chat history * chatbot wider Side Nav * expand icon * chatbot UI * Delete * merge fixes * code suggestions --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * chunks create before extraction using is_pre_process variable (#383) * chunks create before extraction using is_pre_process variable * Return total pages for Model * update requirement.txt * total pages on uplaod API * added the Confirmation Dialog * added the selected files into the confirmation modal * format and lint fixes * added the stop watch image * fileselection on alert dialog * Add timeout in docker for gunicorn workers * Add cancel icon to info popup (#384) * Info Modal Changes * css changes * recent merges * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * added the default page size * Convert is_cancelled value from string to bool * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Save Total Pages in DB * Added total Pages * file selection when we didn't select anything from Main table * added the danger icon only for large files * added the overflow for more files and file selection for all new files * moved the interface to types * added the icon accoroding to the source * set total page for wiki and youtube * h3 heading * merge * updated the alert on basis if total pages * deleted chunks * polling based on total pages * isNan check * large file based on file size for s3 and gcs * file source in server side event * time calculation based on chunks for gcs and s3 --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com> Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com> Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * fixed the layout issue * Populate graph schema (#399) * crreate new endpoint populate_graph_schema and update the query for getting lables from DB * Added main.py changes * conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396) * added the condtion * removed llms * Fixed issue : Remove extra unused param * get emb only if used (#278) * Chatbot chunks (#402) * Added file name to the content sent to LLM * added chunk text in the response * increased the docs parts sent to llm * Modified graph query * mardown rendering * youtube starttime * icons * offset changes * removed the files due to codespace space issue --------- Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Settings modal to support generating the labels from the llm by using text given by user (#405) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * fixed css issue * fixed status blank issue * Modified response when no docs is retrived (#413) * Fixed env/docker-compose for local deployments + README doc (#410) * Fixed env/docker-compose for local deployments + README doc * wrong place for ENV in README * by default, removed langsmith + fixed knn score string to float * by default, removed langsmith + fixed knn score string to float * Fixed strings in docker-compose env * Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop) * Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that * Support for all unstructured files (#401) * all unstructured files * responsiveness * added file type * added the extensions * spell mistake * ppt file changes --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * Extract schema using direct ChatOpenAI API and Chain * integrated the checkbox for schema to text dialog * Update SettingModal.tsx --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * gcs file content read via storage client (#417) * gcs file content read via storage client * added the access token the file state --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * pypdf2 to read files from gcs (#420) * 407 remove driver from frontend (#416) * removed driver * removed API * connecting to database on page refresh --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Css handling of info modal and Tooltips (#418) * css change * toolTips * Sidebar Tooltips * copy to clip * css change * added image types * added gcs * type fix * docker changes * speech * added the toolip for dropzone sources --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Fixed retrival bugs (#421) * yarn format fixes * changed the delete message * added the cancel button * changed the message on tooltip * added space * UI fixes * tooltip for setting * updated req * wikipedia URL input (#424) * accept only wikipedia links * added wikipedia link * added wikilink regex * wikipedia single url only * changed the alert message * wording change * pushed validation state persist error --------- Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * speech and copy (#422) * speech and copy * startTime * added chunk properties * tooltips --------- Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Fixed issue for out of range in KNN API * solved conflicts * conflict solved * Remove logging info from update KNN API * tooltip changes * format and lint fixes * responsiveness changes * Fixed issue for total pages GCS, S3 * UI polishing (#428) * button and tooltip changes * checking validation on change * settings module populate fix * format fixes * opening the modal after auth success * removed the limit * added the scrobar for dropdowns * speech state (#426) * speech state * Button Details changes * delete wording change * Total pages in buckets (#431) * page number NA for buckets * added N/A for gcs and s3 pages * total pages for gcs * remove unwanted logger --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * removed the max width * Update FileTable.tsx * Update the docker file * Modified prompt (#438) * Update Dockerfile * Update Dockerfile * Update Dockerfile * rendering Fix * Local file upload gcs (#442) * Uplaod file to GCS * GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled * Add life cycle rule on uploaded bucket * pdf upload local and gcs bucket check * delete files when processed and extract changes --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * Modified chat length and entities used (#443) * metadata for unstructured files (#446) * Unstructured file metadata (#447) * metadata for unstructured files * sleep in gcs upload * updated * icons added to chunks (#435) * icons added to chunks * info modal icons --------- Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com> Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com> Co-authored-by: Ajay Meena <meenajy1996@gmail.com> Co-authored-by: Morgan Senechal <morgan@neo4j.com> Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com> * fixed gcs status message issue * added if check for failed count * Null issue Fixed from backend for upload API and graph_document when model name mismatch * added word break issue * Added neo4j-rust-ext * processing time estimation based on bytes * File extension upper case fixed, File delete from GCS or local based on env variable. * timer per byte * Update Dockerfile * Adding sort rows on the table (#451) * Gcs upload folder hashed (#453) * implement foldername hashed in GCS bucket uplaod * Raise exception if invalid model selected * folder name for gcs upload --------- Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * upload all unstructuredfiles to gcs (#455) * Mofified chunk query (#454) * Added libre office for fixing error -- soffice command was not found. Please install libreoffice on your system and try again. - Install instructions: https://www.libreoffice.org/get-help/install-howto/ - Mac: https://formulae.brew.sh/cask/libreoffice - Debian: https://wiki.debian.org/LibreOffice" * Fix the PARTIAL CONTENT issue * File-table no data found (#456) * 'file-table'' * review comment * Llm format change (#459) * changed the llm models format to lowercase * added the error message * llm model changes * format fixes * removed unused import * added the capitalize method * delete files from merged_file_path only if source is local file --------- Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * commented total page code (#460) * format fixes * removed the disabled check on dropdown * Large file env * DEV to STAGING (#461) * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * recent merges * pdf deletion due to out of diskspace * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * Convert is_cancelled value from string to bool * added the default page size * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * offset in chunks (#389) * page number in gcs loader (#393) * added youtube timestamps (#392) * chat pop up button (#387) * expand * minimize-icon * css changes * chat history * chatbot wider Side Nav * expand icon * chatbot UI * Delete * merge fixes * code suggestions --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * chunks create before extraction using is_pre_process variable (#383) * chunks create before extraction using is_pre_process variable * Return total pages for Model * update requirement.txt * total pages on uplaod API * added the Confirmation Dialog * added the selected files into the confirmation modal * format and lint fixes * added the stop watch image * fileselection on alert dialog * Add timeout in docker for gunicorn workers * Add cancel icon to info popup (#384) * Info Modal Changes * css changes * recent merges * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * added the default page size * Convert is_cancelled value from string to bool * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Save Total Pages in DB * Added total Pages * file selection when we didn't select anything from Main table * added the danger icon only for large files * added the overflow for more files and file selection for all new files * moved the interface to types * added the icon accoroding to the source * set total page for wiki and youtube * h3 heading * merge * updated the alert on basis if total pages * deleted chunks * polling based on total pages * isNan check * large file based on file size for s3 and gcs * file source in server side event * time calculation based on chunks for gcs and s3 --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com> Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com> Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * fixed the layout issue * Populate graph schema (#399) * crreate new endpoint populate_graph_schema and update the query for getting lables from DB * Added main.py changes * conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396) * added the condtion * removed llms * Fixed issue : Remove extra unused param * get emb only if used (#278) * Chatbot chunks (#402) * Added file name to the content sent to LLM * added chunk text in the response * increased the docs parts sent to llm * Modified graph query * mardown rendering * youtube starttime * icons * offset changes * removed the files due to codespace space issue --------- Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Settings modal to support generating the labels from the llm by using text given by user (#405) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * fixed css issue * fixed status blank issue * Modified response when no docs is retrived (#413) * Fixed env/docker-compose for local deployments + README doc (#410) * Fixed env/docker-compose for local deployments + README doc * wrong place for ENV in README * by default, removed langsmith + fixed knn score string to float * by default, removed langsmith + fixed knn score string to float * Fixed strings in docker-compose env * Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop) * Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that * Support for all unstructured files (#401) * all unstructured files * responsiveness * added file type * added the extensions * spell mistake * ppt file changes --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * Extract schema using direct ChatOpenAI API and Chain * integrated the checkbox for schema to text dialog * Update SettingModal.tsx --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * gcs file content read via storage client (#417) * gcs file content read via storage client * added the access token the file state --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * pypdf2 to read files from gcs (#420) * 407 remove driver from frontend (#416) * removed driver * removed API * connecting to database on page refresh --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Css handling of info modal and Tooltips (#418) * css change * toolTips * Sidebar Tooltips * copy to clip * css change * added image types * added gcs * type fix * docker changes * speech * added the toolip for dropzone sources --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Fixed retrival bugs (#421) * yarn format fixes * changed the delete message * added the cancel button * changed the message on tooltip * added space * UI fixes * tooltip for setting * updated req * wikipedia URL input (#424) * accept only wikipedia links * added wikipedia link * added wikilink regex * wikipedia single url only * changed the alert message * wording change * pushed validation state persist error --------- Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * speech and copy (#422) * speech and copy * startTime * added chunk properties * tooltips --------- Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Fixed issue for out of range in KNN API * solved conflicts * conflict solved * Remove logging info from update KNN API * tooltip changes * format and lint fixes * responsiveness changes * Fixed issue for total pages GCS, S3 * UI polishing (#428) * button and tooltip changes * checking validation on change * settings module populate fix * format fixes * opening the modal after auth success * removed the limit * added the scrobar for dropdowns * speech state (#426) * speech state * Button Details changes * delete wording change * Total pages in buckets (#431) * page number NA for buckets * added N/A for gcs and s3 pages * total pages for gcs * remove unwanted logger --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * removed the max width * Update FileTable.tsx * Update the docker file * Modified prompt (#438) * Update Dockerfile * Update Dockerfile * Update Dockerfile * rendering Fix * Local file upload gcs (#442) * Uplaod file to GCS * GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled * Add life cycle rule on uploaded bucket * pdf upload local and gcs bucket check * delete files when processed and extract changes --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * Modified chat length and entities used (#443) * metadata for unstructured files (#446) * Unstructured file metadata (#447) * metadata for unstructured files * sleep in gcs upload * updated * icons added to chunks (#435) * icons added to chunks * info modal icons * fixed gcs status message issue * added if check for failed count * Null issue Fixed from backend for upload API and graph_document when model name mismatch * added word break issue * Added neo4j-rust-ext * processing time estimation based on bytes * File extension upper case fixed, File delete from GCS or local based on env variable. * timer per byte * Update Dockerfile * Adding sort rows on the table (#451) * Gcs upload folder hashed (#453) * implement foldername hashed in GCS bucket uplaod * Raise exception if invalid model selected * folder name for gcs upload --------- Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * upload all unstructuredfiles to gcs (#455) * Mofified chunk query (#454) * Added libre office for fixing error -- soffice command was not found. Please install libreoffice on your system and try again. - Install instructions: https://www.libreoffice.org/get-help/install-howto/ - Mac: https://formulae.brew.sh/cask/libreoffice - Debian: https://wiki.debian.org/LibreOffice" * Fix the PARTIAL CONTENT issue * File-table no data found (#456) * 'file-table'' * review comment * Llm format change (#459) * changed the llm models format to lowercase * added the error message * llm model changes * format fixes * removed unused import * added the capitalize method * delete files from merged_file_path only if source is local file --------- Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * commented total page code (#460) * format fixes * removed the disabled check on dropdown * Large file env --------- Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com> Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com> Co-authored-by: Ajay Meena <meenajy1996@gmail.com> Co-authored-by: Morgan Senechal <morgan@neo4j.com> Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com> * DEV to STAGING (#462) * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * recent merges * pdf deletion due to out of diskspace * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * Convert is_cancelled value from string to bool * added the default page size * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * offset in chunks (#389) * page number in gcs loader (#393) * added youtube timestamps (#392) * chat pop up button (#387) * expand * minimize-icon * css changes * chat history * chatbot wider Side Nav * expand icon * chatbot UI * Delete * merge fixes * code suggestions --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * chunks create before extraction using is_pre_process variable (#383) * chunks create before extraction using is_pre_process variable * Return total pages for Model * update requirement.txt * total pages on uplaod API * added the Confirmation Dialog * added the selected files into the confirmation modal * format and lint fixes * added the stop watch image * fileselection on alert dialog * Add timeout in docker for gunicorn workers * Add cancel icon to info popup (#384) * Info Modal Changes * css changes * recent merges * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * added the default page size * Convert is_cancelled value from string to bool * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Save Total Pages in DB * Added total Pages * file selection when we didn't select anything from Main table * added the danger icon only for large files * added the overflow for more files and file selection for all new files * moved the interface to types * added the icon accoroding to the source * set total page for wiki and youtube * h3 heading * merge * updated the alert on basis if total pages * deleted chunks * polling based on total pages * isNan check * large file based on file size for s3 and gcs * file source in server side event * time calculation based on chunks for gcs and s3 --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com> Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com> Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * fixed the layout issue * Populate graph schema (#399) * crreate new endpoint populate_graph_schema and update the query for getting lables from DB * Added main.py changes * conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396) * added the condtion * removed llms * Fixed issue : Remove extra unused param * get emb only if used (#278) * Chatbot chunks (#402) * Added file name to the content sent to LLM * added chunk text in the response * increased the docs parts sent to llm * Modified graph query * mardown rendering * youtube starttime * icons * offset changes * removed the files due to codespace space issue --------- Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Settings modal to support generating the labels from the llm by using text given by user (#405) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * fixed css issue * fixed status blank issue * Modified response when no docs is retrived (#413) * Fixed env/docker-compose for local deployments + README doc (#410) * Fixed env/docker-compose for local deployments + README doc * wrong place for ENV in README * by default, removed langsmith + fixed knn score string to float * by default, removed langsmith + fixed knn score string to float * Fixed strings in docker-compose env * Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop) * Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that * Support for all unstructured files (#401) * all unstructured files * responsiveness * added file type * added the extensions * spell mistake * ppt file changes --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415) * added the json * added schema from text dialog * integrated the schemaAPI * added the alert * resize fixes * Extract schema using direct ChatOpenAI API and Chain * integrated the checkbox for schema to text dialog * Update SettingModal.tsx --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * gcs file content read via storage client (#417) * gcs file content read via storage client * added the access token the file state --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * pypdf2 to read files from gcs (#420) * 407 remove driver from frontend (#416) * removed driver * removed API * connecting to database on page refresh --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Css handling of info modal and Tooltips (#418) * css change * toolTips * Sidebar Tooltips * copy to clip * css change * added image types * added gcs * type fix * docker changes * speech * added the toolip for dropzone sources --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Fixed retrival bugs (#421) * yarn format fixes * changed the delete message * added the cancel button * changed the message on tooltip * added space * UI fixes * tooltip for setting * updated req * wikipedia URL input (#424) * accept only wikipedia links * added wikipedia link * added wikilink regex * wikipedia single url only * changed the alert message * wording change * pushed validation state persist error --------- Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * speech and copy (#422) * speech and copy * startTime * added chunk properties * tooltips --------- Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * Fixed issue for out of range in KNN API * solved conflicts * conflict solved * Remove logging info from update KNN API * tooltip changes * format and lint fixes * responsiveness changes * Fixed issue for total pages GCS, S3 * UI polishing (#428) * button and tooltip changes * checking validation on change * settings module populate fix * format fixes * opening the modal after auth success * removed the limit * added the scrobar for dropdowns * speech state (#426) * speech state * Button Details changes * delete wording change * Total pages in buckets (#431) * page number NA for buckets * added N/A for gcs and s3 pages * total pages for gcs * remove unwanted logger --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * removed the max width * Update FileTable.tsx * Update the docker file * Modified prompt (#438) * Update Dockerfile * Update Dockerfile * Update Dockerfile * rendering Fix * Local file upload gcs (#442) * Uplaod file to GCS * GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled * Add life cycle rule on uploaded bucket * pdf upload local and gcs bucket check * delete files when processed and extract changes --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * Modified chat length and entities used (#443) * metadata for unstructured files (#446) * Unstructured file metadata (#447) * metadata for unstructured files * sleep in gcs upload * updated * icons added to chunks (#435) * icons added to chunks * info modal icons * fixed gcs status message issue * added if check for failed count * Null issue Fixed from backend for upload API and graph_document when model name mismatch * added word break issue * Added neo4j-rust-ext * processing time estimation based on bytes * File extension upper case fixed, File delete from GCS or local based on env variable. * timer per byte * Update Dockerfile * Adding sort rows on the table (#451) * Gcs upload folder hashed (#453) * implement foldername hashed in GCS bucket uplaod * Raise exception if invalid model selected * folder name for gcs upload --------- Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * upload all unstructuredfiles to gcs (#455) * Mofified chunk query (#454) * Added libre office for fixing error -- soffice command was not found. Please install libreoffice on your system and try again. - Install instructions: https://www.libreoffice.org/get-help/install-howto/ - Mac: https://formulae.brew.sh/cask/libreoffice - Debian: https://wiki.debian.org/LibreOffice" * Fix the PARTIAL CONTENT issue * File-table no data found (#456) * 'file-table'' * review comment * Llm format change (#459) * changed the llm models format to lowercase * added the error message * llm model changes * format fixes * removed unused import * added the capitalize method * delete files from merged_file_path only if source is local file --------- Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> * commented total page code (#460) * format fixes * removed the disabled check on dropdown * Large file env --------- Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com> Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com> Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com> Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com> Co-authored-by: Ajay Meena <meenajy1996@gmail.com> Co-authored-by: Morgan Senechal <morgan@neo4j.com> Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com> * added upload api * changed the dropzone error message * Dev to staging (#466) * Integration_qa test (#375) * Test IntegrationQA added * update test cases * update test * update node count assertions * test changes * update changes * modification test * Code refatctor test cases * Handle allowedlist issue in test * test changes * update test * test case execution * test chatbot updates * test case update file * added file --------- Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com> * recent merges * pdf deletion due to out of diskspace * fixed status blank issue * Rendering the file name instead of link for gcs and s3 sources in the info modal * Convert is_cancelled value from string to bool * added the default page size * Issue fixed Processed chunked as 0 when file re-process again * Youtube timestamps (#386) * Wikipedia source to accept all valid urls * wikipedia url to support multiple languages * integrated wiki langauge param for extract api * Youtube video timestamps --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * groq llm integration backend (#286) * groq llm integration backend * groq and description in node properties * added groq in options --------- Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com> * offset in chunks (#389) * page number in gcs loader (#393) * added youtube timestamps (#392) * chat pop up button (#387) * expand * minimize-icon * css changes * chat history * chatbot wider Side Nav * expand icon * chatbot UI …
1 parent 2c0421d commit c355157

56 files changed

Lines changed: 2259 additions & 725 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

README.md

Lines changed: 64 additions & 53 deletions
Large diffs are not rendered by default.

backend/README.md

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -64,8 +64,6 @@ Update the environment variable in `.env` file. Refer example.env in backend fol
6464

6565
`OPENAI_API_KEY`: Open AI key to use incase of openai embeddings
6666

67-
`EMBEDDING_MODEL` : "all-MiniLM-L6-v2" or "openai" or "vertexai"
68-
6967
`NEO4J_URI` : Neo4j URL
7068

7169
`NEO4J_USERNAME` : Neo4J database username

backend/example.env

Lines changed: 11 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
OPENAI_API_KEY = "" #Mandatory- Default_Value = "openai_api_key" #This is required if you are using openai embedding model
2-
EMBEDDING_MODEL = "" #OPTIONAL- Default_Value ="" #this can be openai or vertexai or by default all-MiniLM-L6-v2
3-
RAGAS_EMBEDDING_MODEL = "" #OPTIONAL- Default_Value ="openai" #Keep blank if you want to use all-MiniLM-L6-v2 for ragas embeddings
2+
EMBEDDING_MODEL = "" #OPTIONAL- Default_Value ="all-MiniLM-L6-v2" -- Embedding model to be used for text embedding
3+
EMBEDDING_PROVIDER = "" #OPTIONAL- Default_Value ="sentence-transformer" -- Embedding provider to be used for text embedding
44
IS_EMBEDDING = "" #OPTIONAL- Default_Value ="True" --Flag to enable text embedding
55
BUCKET_UPLOAD_FILE = "" #OPTIONAL- Default_Value ="gcs bucket name" -- use the gcs bucket to upload local file to gcs cloud
66
BUCKET_FAILED_FILE = "" #OPTIONAL- Default_Value ="gcs bucket name" -- use the gcs bucket for failed file while extraction
@@ -20,7 +20,7 @@ DUPLICATE_SCORE_VALUE = "" #OPTIONAL- Default_Value = "0.97" -- Node score value
2020
DUPLICATE_TEXT_DISTANCE = "" #OPTIONAL- Default_Value = "3" --This value used to find distance for all node pairs in the graph and calculated based on node properties
2121
DEFAULT_DIFFBOT_CHAT_MODEL="" #OPTIONAL- Default_Value = "openai_gpt_5_mini" #whichever model specified here , need to add config for that model in below format)
2222
GRAPH_CLEANUP_MODEL="" #OPTIONAL- Default_Value = "openai_gpt_5_mini" -- Model name to clean-up graph in post processing
23-
BEDROCK_EMBEDDING_MODEL="" #OPTIONAL - Default_Value = "model_name,aws_access_key,aws_secret_key,region_name" -- If want to use bedrock embedding #model_name="amazon.titan-embed-text-v1"
23+
BEDROCK_EMBEDDING_MODEL_KEY="" #OPTIONAL - Default_Value = "aws_access_key,aws_secret_key,region_name" -- If want to use bedrock embedding
2424
YOUTUBE_TRANSCRIPT_PROXY="" #Mandatory --Proxy key required to process youtube video for getting transcript --Sample Value ="https://user:pass@domain:port"
2525
EFFECTIVE_SEARCH_RATIO="" #OPTIONAL- Default_Value = "5"
2626
MAX_TOKEN_CHUNK_SIZE="" #OPTIONAL- Default_Value = "10000" #Max token used to process/extract the file content.
@@ -44,10 +44,11 @@ LLM_MODEL_CONFIG_bedrock_nova_micro_v1="amazon.nova-micro-v1:0,aws_access_key,aw
4444
LLM_MODEL_CONFIG_bedrock_nova_lite_v1="amazon.nova-lite-v1:0,aws_access_key,aws_secret_key,region_name"
4545
LLM_MODEL_CONFIG_bedrock_nova_pro_v1="amazon.nova-pro-v1:0,aws_access_key,aws_secret_key,region_name"
4646
LLM_MODEL_CONFIG_ollama_llama3="llama3_model_name,model_local_url"
47-
TRACK_TOKEN_USAGE="true" #Add this if you want to track token usage
48-
DAILY_TOKENS_LIMIT="250000" #Mandatory if TRACK_TOKEN_USAGE is true
49-
MONTHLY_TOKENS_LIMIT="1000000" #Mandatory if TRACK_TOKEN_USAGE is true
50-
TOKEN_TRACKER_DB_URI="" #Mandatory if TRACK_TOKEN_USAGE is true
51-
TOKEN_TRACKER_DB_USERNAME="" #Mandatory if TRACK_TOKEN_USAGE is true
52-
TOKEN_TRACKER_DB_PASSWORD="" #Mandatory if TRACK_TOKEN_USAGE is true
53-
TOKEN_TRACKER_DB_DATABASE="" #Mandatory if TRACK_TOKEN_USAGE is true
47+
TRACK_USER_USAGE="true" #Add this if you want to track token usage
48+
DAILY_TOKENS_LIMIT="250000" #Mandatory if TRACK_USER_USAGE is true
49+
MONTHLY_TOKENS_LIMIT="1000000" #Mandatory if TRACK_USER_USAGE is true
50+
TOKEN_TRACKER_DB_URI="" #Mandatory if TRACK_USER_USAGE is true
51+
TOKEN_TRACKER_DB_USERNAME="" #Mandatory if TRACK_USER_USAGE is true
52+
TOKEN_TRACKER_DB_PASSWORD="" #Mandatory if TRACK_USER_USAGE is true
53+
TOKEN_TRACKER_DB_DATABASE="" #Mandatory if TRACK_TOKEN_USAGE is true
54+
AUTHENTICATION_REQUIRED="" #OPTIONAL- Default_Value = "False" -- Enable authentication for API access

backend/score.py

Lines changed: 128 additions & 23 deletions
Large diffs are not rendered by default.

backend/src/QA_integration.py

Lines changed: 22 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -39,8 +39,6 @@
3939
)
4040
load_dotenv()
4141

42-
EMBEDDING_MODEL = get_value_from_env("EMBEDDING_MODEL", "sentence_transformer")
43-
4442
class SessionChatHistory:
4543
history_dict = {}
4644

@@ -291,7 +289,7 @@ def retrieve_documents(doc_retriever, messages):
291289

292290
return docs,transformed_question
293291

294-
def create_document_retriever_chain(llm, retriever):
292+
def create_document_retriever_chain(llm, retriever, embedding_provider, embedding_model):
295293
try:
296294
logging.info("Starting to create document retriever chain")
297295

@@ -305,7 +303,7 @@ def create_document_retriever_chain(llm, retriever):
305303
output_parser = StrOutputParser()
306304

307305
splitter = TokenTextSplitter(chunk_size=CHAT_DOC_SPLIT_SIZE, chunk_overlap=0)
308-
EMBEDDING_FUNCTION , _ = load_embedding_model(EMBEDDING_MODEL)
306+
EMBEDDING_FUNCTION , _ = load_embedding_model(embedding_provider, embedding_model)
309307
embeddings_filter = EmbeddingsFilter(
310308
embeddings=EMBEDDING_FUNCTION,
311309
similarity_threshold=CHAT_EMBEDDING_FILTER_SCORE_THRESHOLD
@@ -334,7 +332,7 @@ def create_document_retriever_chain(llm, retriever):
334332
logging.error(f"Error creating document retriever chain: {e}", exc_info=True)
335333
raise
336334

337-
def initialize_neo4j_vector(graph, chat_mode_settings):
335+
def initialize_neo4j_vector(graph, chat_mode_settings, embedding_provider, embedding_model):
338336
try:
339337
retrieval_query = chat_mode_settings.get("retrieval_query")
340338
index_name = chat_mode_settings.get("index_name")
@@ -346,7 +344,7 @@ def initialize_neo4j_vector(graph, chat_mode_settings):
346344

347345
if not retrieval_query or not index_name:
348346
raise ValueError("Required settings 'retrieval_query' or 'index_name' are missing.")
349-
EMBEDDING_FUNCTION , _ = load_embedding_model(EMBEDDING_MODEL)
347+
EMBEDDING_FUNCTION , _ = load_embedding_model(embedding_provider, embedding_model)
350348
if keyword_index:
351349
neo_db = Neo4jVector.from_existing_graph(
352350
embedding=EMBEDDING_FUNCTION,
@@ -397,10 +395,10 @@ def create_retriever(neo_db, document_names, chat_mode_settings,search_k, score_
397395
logging.info(f"Successfully created retriever with search_k={search_k}, score_threshold={score_threshold}")
398396
return retriever
399397

400-
def get_neo4j_retriever(graph, document_names,chat_mode_settings, score_threshold=CHAT_SEARCH_KWARG_SCORE_THRESHOLD):
398+
def get_neo4j_retriever(graph, document_names,chat_mode_settings, score_threshold=CHAT_SEARCH_KWARG_SCORE_THRESHOLD, embedding_provider=None, embedding_model=None):
401399
try:
402400

403-
neo_db = initialize_neo4j_vector(graph, chat_mode_settings)
401+
neo_db = initialize_neo4j_vector(graph, chat_mode_settings, embedding_provider, embedding_model)
404402
# document_names= list(map(str.strip, json.loads(document_names)))
405403
search_k = chat_mode_settings["top_k"]
406404
ef_ratio = get_value_from_env("EFFECTIVE_SEARCH_RATIO", 5, "int")
@@ -412,7 +410,7 @@ def get_neo4j_retriever(graph, document_names,chat_mode_settings, score_threshol
412410
raise Exception(f"An error occurred while retrieving the Neo4jVector index or creating the retriever. Please drop and create a new vector index '{index_name}': {e}") from e
413411

414412

415-
def setup_chat(model, graph, document_names, chat_mode_settings):
413+
def setup_chat(model, graph, document_names, chat_mode_settings, embedding_provider, embedding_model):
416414
start_time = time.time()
417415
try:
418416
if model == "diffbot":
@@ -421,8 +419,8 @@ def setup_chat(model, graph, document_names, chat_mode_settings):
421419
llm, model_name, _ = get_llm(model=model)
422420
logging.info(f"Model called in chat: {model} (version: {model_name})")
423421

424-
retriever = get_neo4j_retriever(graph=graph, chat_mode_settings=chat_mode_settings, document_names=document_names)
425-
doc_retriever = create_document_retriever_chain(llm, retriever)
422+
retriever = get_neo4j_retriever(graph=graph, chat_mode_settings=chat_mode_settings, document_names=document_names, embedding_provider=embedding_provider, embedding_model=embedding_model)
423+
doc_retriever = create_document_retriever_chain(llm, retriever, embedding_provider, embedding_model)
426424

427425
chat_setup_time = time.time() - start_time
428426
logging.info(f"Chat setup completed in {chat_setup_time:.2f} seconds")
@@ -433,23 +431,23 @@ def setup_chat(model, graph, document_names, chat_mode_settings):
433431

434432
return llm, doc_retriever, model_name
435433

436-
def process_chat_response(messages, history, question, model, graph, document_names, chat_mode_settings, email=None, uri=None):
434+
def process_chat_response(messages, history, question, model, graph, document_names, chat_mode_settings, email=None, uri=None, embedding_provider=None, embedding_model=None):
437435
try:
438-
if get_value_from_env("TRACK_TOKEN_USAGE", "false", "bool"):
439-
try:
440-
track_token_usage(email, uri, 0, model)
441-
except LLMGraphBuilderException as e:
442-
logging.error(str(e))
443-
raise RuntimeError(str(e))
444-
llm, doc_retriever, model_version = setup_chat(model, graph, document_names, chat_mode_settings)
436+
# if get_value_from_env("TRACK_USER_USAGE", "false", "bool"):
437+
# try:
438+
# track_token_usage(email, uri, 0, model, operation_type="chat")
439+
# except LLMGraphBuilderException as e:
440+
# logging.error(str(e))
441+
# raise RuntimeError(str(e))
442+
llm, doc_retriever, model_version = setup_chat(model, graph, document_names, chat_mode_settings,embedding_provider, embedding_model)
445443

446444
docs,transformed_question = retrieve_documents(doc_retriever, messages)
447445

448446
if docs:
449447
content, result, total_tokens,formatted_docs = process_documents(docs, question, messages, llm, model, chat_mode_settings)
450-
if get_value_from_env("TRACK_TOKEN_USAGE", "false", "bool"):
451-
latest_token = track_token_usage(email=email, uri=uri, usage=total_tokens, last_used_model=model)
452-
logging.info(f"Total token usage {latest_token} for user {email} ")
448+
# if get_value_from_env("TRACK_USER_USAGE", "false", "bool"):
449+
# latest_token = track_token_usage(email=email, uri=uri, usage=total_tokens, last_used_model=model, operation_type="chat")
450+
# logging.info(f"Total token usage {latest_token} for user {email} ")
453451
else:
454452
content = "I couldn't find any relevant documents to answer your question."
455453
result = {"sources": list(), "nodedetails": list(), "entities": list()}
@@ -664,7 +662,7 @@ def get_chat_mode_settings(mode,settings_map=CHAT_MODE_CONFIG_MAP):
664662

665663
return chat_mode_settings
666664

667-
def QA_RAG(graph, model, question, document_names, session_id, mode, write_access=True, email=None, uri=None):
665+
def QA_RAG(graph, model, question, document_names, session_id, mode, write_access=True, email=None, uri=None, embedding_provider=None, embedding_model=None):
668666
logging.info(f"Chat Mode: {mode}")
669667

670668
history = create_neo4j_chat_message_history(graph, session_id, write_access)
@@ -695,7 +693,7 @@ def QA_RAG(graph, model, question, document_names, session_id, mode, write_acces
695693
"user": "chatbot"
696694
}
697695
else:
698-
result = process_chat_response(messages,history, question, model, graph, document_names,chat_mode_settings, email, uri)
696+
result = process_chat_response(messages,history, question, model, graph, document_names,chat_mode_settings, email, uri, embedding_provider, embedding_model)
699697

700698
result["session_id"] = session_id
701699

backend/src/communities.py

Lines changed: 10 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -316,9 +316,9 @@ def create_community_summaries(gds, model, email, uri):
316316
token_usage = 0
317317
try:
318318
#pre check if user allowed to create community summaries
319-
if get_value_from_env("TRACK_TOKEN_USAGE", "false", "bool"):
319+
if get_value_from_env("TRACK_USER_USAGE", "false", "bool"):
320320
try:
321-
track_token_usage(email, uri, 0, model)
321+
track_token_usage(email, uri, 0, model,operation_type="precheck")
322322
except LLMGraphBuilderException as e:
323323
logging.error(str(e))
324324
raise RuntimeError(str(e))
@@ -361,21 +361,20 @@ def create_community_summaries(gds, model, email, uri):
361361

362362
finally:
363363
try:
364-
if get_value_from_env("TRACK_TOKEN_USAGE", "false", "bool"):
364+
if get_value_from_env("TRACK_USER_USAGE", "false", "bool"):
365365
if callback_handler:
366366
usage = callback_handler.report()
367367
token_usage = usage.get("total_tokens", 0)
368368
if email and token_usage > 0:
369369
email = email.strip().replace('"', '')
370-
latest_token = track_token_usage(email, uri, token_usage, model)
370+
latest_token = track_token_usage(email, uri, token_usage, model,operation_type="community_summary")
371371
logging.info(f"In community : Total token usage {latest_token} for user {email} ")
372372
except Exception as err:
373373
logging.warning(f"Failed to track token usage: {err}")
374374

375-
def create_community_embeddings(gds):
375+
def create_community_embeddings(gds, embedding_provider, embedding_model):
376376
try:
377-
embedding_model = get_value_from_env("EMBEDDING_MODEL","sentence_transformer")
378-
embeddings, dimension = load_embedding_model(embedding_model)
377+
embeddings, dimension = load_embedding_model(embedding_provider, embedding_model)
379378
logging.info(f"Embedding model '{embedding_model}' loaded successfully.")
380379

381380
logging.info("Fetching community details.")
@@ -467,7 +466,7 @@ def create_fulltext_index(gds, index_type):
467466
logging.error("An error occurred while creating the full-text index.", exc_info=True)
468467
logging.error(f"Error details: {str(e)}")
469468

470-
def create_community_properties(gds, model, email, uri):
469+
def create_community_properties(gds, model, email, uri, embedding_provider, embedding_model):
471470
commands = [
472471
(CREATE_COMMUNITY_CONSTRAINT, "created community constraint to the graph."),
473472
(CREATE_COMMUNITY_LEVELS, "Successfully created community levels."),
@@ -484,7 +483,7 @@ def create_community_properties(gds, model, email, uri):
484483
create_community_summaries(gds, model, email, uri)
485484
logging.info("Successfully created community summaries.")
486485

487-
embedding_dimension = create_community_embeddings(gds)
486+
embedding_dimension = create_community_embeddings(gds, embedding_provider, embedding_model)
488487
logging.info("Successfully created community embeddings.")
489488

490489
create_vector_index(gds=gds,index_type=ENTITY_VECTOR_INDEX_NAME,embedding_dimension=embedding_dimension)
@@ -518,7 +517,7 @@ def clear_communities(gds):
518517
raise
519518

520519

521-
def create_communities(uri, username, password, database,email=None,model=COMMUNITY_CREATION_DEFAULT_MODEL):
520+
def create_communities(uri, username, password, database,email=None,model=COMMUNITY_CREATION_DEFAULT_MODEL, embedding_provider=None, embedding_model=None):
522521
try:
523522
gds = get_gds_driver(uri, username, password, database)
524523
clear_communities(gds)
@@ -527,7 +526,7 @@ def create_communities(uri, username, password, database,email=None,model=COMMUN
527526
write_communities_success = write_communities(gds, graph_project)
528527
if write_communities_success:
529528
logging.info("Starting Community properties creation process.")
530-
create_community_properties(gds,model,email,uri)
529+
create_community_properties(gds,model,email,uri, embedding_provider, embedding_model)
531530
logging.info("Communities creation process completed successfully.")
532531
else:
533532
logging.warning("Failed to write communities. Constraint was not applied.")

backend/src/entities/source_extract_params.py

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,8 @@ class SourceScanExtractParams(BaseModel):
2323
language: Optional[str] = Field(None, description="Language")
2424
retry_condition: Optional[str] = Field(None, description="Retry condition")
2525
additional_instructions: Optional[str] = Field(None, description="Additional instructions")
26-
26+
embedding_provider: Optional[str] = Field(None, description="Embedding provider")
27+
embedding_model: Optional[str] = Field(None, description="Embedding model")
2728

2829
def get_source_scan_extract_params(
2930
source_url: Optional[str] = Form(None),
@@ -46,6 +47,8 @@ def get_source_scan_extract_params(
4647
language: Optional[str] = Form(None),
4748
retry_condition: Optional[str] = Form(None),
4849
additional_instructions: Optional[str] = Form(None),
50+
embedding_provider: Optional[str] = Form(None),
51+
embedding_model: Optional[str] = Form(None),
4952
) -> SourceScanExtractParams:
5053
return SourceScanExtractParams(
5154
source_url=source_url,
@@ -68,4 +71,6 @@ def get_source_scan_extract_params(
6871
language=language,
6972
retry_condition=retry_condition,
7073
additional_instructions=additional_instructions,
74+
embedding_provider=embedding_provider,
75+
embedding_model=embedding_model
7176
)

backend/src/entities/source_node.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,3 +31,4 @@ class sourceNode:
3131
access_token:str=None
3232
retry_condition:str=None
3333
token_usage:int=None
34+
embedding_model:str=None

0 commit comments

Comments
 (0)