feat(extend): add Extend AI document processing integration #3869
Merged
Changes from all commits (6 commits, all by waleedlatif1):

- `5583f94` feat(extend): add Extend AI document processing integration
- `2634fdb` fix(extend): cast json response to fix type error
- `073315b` fix(extend): correct API request body structure per Extend docs
- `4830a4c` fix(extend): address PR review comments
- `3b0d080` fix(extend): sync integrations.json bgColor to #000000
- `eb90d96` lint
**New file: documentation page for the Extend block** (`@@ -0,0 +1,39 @@`)

```mdx
---
title: Extend
description: Parse and extract content from documents
---

import { BlockInfoCard } from "@/components/ui/block-info-card"

<BlockInfoCard
  type="extend_v2"
  color="#000000"
/>

## Usage Instructions

Integrate Extend AI into the workflow. Parse and extract structured content from documents or file references.

## Tools

### `extend_parser`

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `filePath` | string | No | URL to a document to be processed |
| `file` | file | No | Document file to be processed |
| `fileUpload` | object | No | File upload data from file-upload component |
| `outputFormat` | string | No | Target output format (markdown or spatial). Defaults to markdown. |
| `chunking` | string | No | Chunking strategy (page, document, or section). Defaults to page. |
| `engine` | string | No | Parsing engine (parse_performance or parse_light). Defaults to parse_performance. |
| `apiKey` | string | Yes | Extend API key |

#### Output

This tool does not produce any outputs.
```
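The route in this PR maps the documented inputs onto the Extend parse request body (`outputFormat` becomes `config.target`, `chunking` becomes `config.chunkingStrategy`, and `config` is only attached when at least one option is set). A minimal TypeScript sketch of that mapping, using the parameter names from the table above; the function name and the example URL are illustrative, not part of the PR:

```typescript
// Build the Extend parse request body from the documented tool inputs.
// Mirrors the mapping in the API route; names here are illustrative.
interface ParserInputs {
  fileUrl: string
  outputFormat?: 'markdown' | 'spatial'
  chunking?: 'page' | 'document' | 'section'
  engine?: 'parse_performance' | 'parse_light'
}

function buildExtendRequestBody(inputs: ParserInputs): Record<string, unknown> {
  const body: Record<string, unknown> = { file: { fileUrl: inputs.fileUrl } }

  // Optional settings are collected separately so that `config` is
  // omitted entirely when the caller sets none of them.
  const config: Record<string, unknown> = {}
  if (inputs.outputFormat) config.target = inputs.outputFormat
  if (inputs.chunking) config.chunkingStrategy = { type: inputs.chunking }
  if (inputs.engine) config.engine = inputs.engine
  if (Object.keys(config).length > 0) body.config = config

  return body
}

const body = buildExtendRequestBody({
  fileUrl: 'https://example.com/invoice.pdf',
  outputFormat: 'markdown',
  chunking: 'page',
})
console.log(JSON.stringify(body))
// → {"file":{"fileUrl":"https://example.com/invoice.pdf"},"config":{"target":"markdown","chunkingStrategy":{"type":"page"}}}
```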
**Changed file: `integrations.json`**

```diff
@@ -39,6 +39,7 @@
   "enrich",
   "evernote",
   "exa",
+  "extend",
   "fathom",
   "file",
   "firecrawl",
```
**New file: Extend parse API route** (`@@ -0,0 +1,188 @@`)

```ts
import { createLogger } from '@sim/logger'
import { type NextRequest, NextResponse } from 'next/server'
import { z } from 'zod'
import { checkInternalAuth } from '@/lib/auth/hybrid'
import {
  secureFetchWithPinnedIP,
  validateUrlWithDNS,
} from '@/lib/core/security/input-validation.server'
import { generateRequestId } from '@/lib/core/utils/request'
import { RawFileInputSchema } from '@/lib/uploads/utils/file-schemas'
import { isInternalFileUrl } from '@/lib/uploads/utils/file-utils'
import { resolveFileInputToUrl } from '@/lib/uploads/utils/file-utils.server'

export const dynamic = 'force-dynamic'

const logger = createLogger('ExtendParseAPI')

const ExtendParseSchema = z.object({
  apiKey: z.string().min(1, 'API key is required'),
  filePath: z.string().optional(),
  file: RawFileInputSchema.optional(),
  outputFormat: z.enum(['markdown', 'spatial']).optional(),
  chunking: z.enum(['page', 'document', 'section']).optional(),
  engine: z.enum(['parse_performance', 'parse_light']).optional(),
})

export async function POST(request: NextRequest) {
  const requestId = generateRequestId()

  try {
    const authResult = await checkInternalAuth(request, { requireWorkflowId: false })

    if (!authResult.success || !authResult.userId) {
      logger.warn(`[${requestId}] Unauthorized Extend parse attempt`, {
        error: authResult.error || 'Missing userId',
      })
      return NextResponse.json(
        {
          success: false,
          error: authResult.error || 'Unauthorized',
        },
        { status: 401 }
      )
    }

    const userId = authResult.userId
    const body = await request.json()
    const validatedData = ExtendParseSchema.parse(body)

    logger.info(`[${requestId}] Extend parse request`, {
      fileName: validatedData.file?.name,
      filePath: validatedData.filePath,
      isWorkspaceFile: validatedData.filePath ? isInternalFileUrl(validatedData.filePath) : false,
      userId,
    })

    const resolution = await resolveFileInputToUrl({
      file: validatedData.file,
      filePath: validatedData.filePath,
      userId,
      requestId,
      logger,
    })

    if (resolution.error) {
      return NextResponse.json(
        { success: false, error: resolution.error.message },
        { status: resolution.error.status }
      )
    }

    const fileUrl = resolution.fileUrl
    if (!fileUrl) {
      return NextResponse.json({ success: false, error: 'File input is required' }, { status: 400 })
    }

    const extendBody: Record<string, unknown> = {
      file: { fileUrl },
    }

    const config: Record<string, unknown> = {}

    if (validatedData.outputFormat) {
      config.target = validatedData.outputFormat
    }

    if (validatedData.chunking) {
      config.chunkingStrategy = { type: validatedData.chunking }
    }

    if (validatedData.engine) {
      config.engine = validatedData.engine
    }

    if (Object.keys(config).length > 0) {
      extendBody.config = config
    }

    const extendEndpoint = 'https://api.extend.ai/parse'
    const extendValidation = await validateUrlWithDNS(extendEndpoint, 'Extend API URL')
    if (!extendValidation.isValid) {
      logger.error(`[${requestId}] Extend API URL validation failed`, {
        error: extendValidation.error,
      })
      return NextResponse.json(
        {
          success: false,
          error: 'Failed to reach Extend API',
        },
        { status: 502 }
      )
    }

    const extendResponse = await secureFetchWithPinnedIP(
      extendEndpoint,
      extendValidation.resolvedIP!,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Accept: 'application/json',
          Authorization: `Bearer ${validatedData.apiKey}`,
          'x-extend-api-version': '2025-04-21',
        },
        body: JSON.stringify(extendBody),
      }
    )

    if (!extendResponse.ok) {
      const errorText = await extendResponse.text()
      logger.error(`[${requestId}] Extend API error:`, errorText)
      let clientError = `Extend API error: ${extendResponse.statusText || extendResponse.status}`
      try {
        const parsedError = JSON.parse(errorText)
        if (parsedError?.message || parsedError?.error) {
          clientError = (parsedError.message ?? parsedError.error) as string
        }
      } catch {
        // errorText is not JSON; keep generic message
      }
      return NextResponse.json(
        {
          success: false,
          error: clientError,
        },
        { status: extendResponse.status }
      )
    }

    const extendData = (await extendResponse.json()) as Record<string, unknown>

    logger.info(`[${requestId}] Extend parse successful`)

    return NextResponse.json({
      success: true,
      output: {
        id: extendData.id ?? null,
        status: extendData.status ?? 'PROCESSED',
        chunks: extendData.chunks ?? [],
        blocks: extendData.blocks ?? [],
        pageCount: extendData.pageCount ?? extendData.page_count ?? null,
        creditsUsed: extendData.creditsUsed ?? extendData.credits_used ?? null,
      },
    })
  } catch (error) {
    if (error instanceof z.ZodError) {
      logger.warn(`[${requestId}] Invalid request data`, { errors: error.errors })
      return NextResponse.json(
        {
          success: false,
          error: 'Invalid request data',
          details: error.errors,
        },
        { status: 400 }
      )
    }

    logger.error(`[${requestId}] Error in Extend parse:`, error)

    return NextResponse.json(
      {
        success: false,
        error: error instanceof Error ? error.message : 'Internal server error',
      },
      { status: 500 }
    )
  }
}
```
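The error path of the handler prefers a `message` or `error` field from the upstream JSON body and falls back to a generic status-based message when the body is not JSON. That fallback can be factored into a small helper; this sketch reproduces the logic from the route above (the helper name is illustrative, not part of the PR):

```typescript
// Derive a client-facing error message from an upstream response body.
// Prefers a JSON `message`/`error` field; otherwise keeps a generic
// message built from the HTTP status. Mirrors the logic in the route.
function extractClientError(errorText: string, status: number, statusText: string): string {
  let clientError = `Extend API error: ${statusText || status}`
  try {
    const parsedError = JSON.parse(errorText)
    if (parsedError?.message || parsedError?.error) {
      clientError = (parsedError.message ?? parsedError.error) as string
    }
  } catch {
    // errorText is not JSON; keep the generic message
  }
  return clientError
}

console.log(extractClientError('{"message":"invalid api key"}', 401, 'Unauthorized'))
// → invalid api key
```

Note that `statusText || status` means an empty `statusText` (common with HTTP/2 responses) still yields a usable message such as `Extend API error: 401`.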