Elevate your data handling capabilities with Extracta.ai's Document Data Extraction API. Our solution lets your systems automatically extract structured data from a wide range of documents, such as scanned images, PDFs, emails, invoices, and contracts. Tailored to the needs of various industries, the API automates document-driven workflows, significantly reducing manual effort and improving overall efficiency.
Whether you're a software developer, a business analyst, or a data scientist, our Document Data Extraction API is designed to streamline your data processing tasks, allowing you to focus on what truly matters - driving your business forward. Start with Extracta.ai today and transform the way you handle documents forever.
Structure your request with the mandatory parameters 'name', 'language', 'fields', and 'file'. Each field requires a 'key'; 'description' and 'example' are optional. The document itself goes in 'file', either as a base64-encoded string or as a file URL.
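The required-parameter rules above can be checked on the client before a request is sent. This is a minimal sketch, assuming the request body layout shown in the Request Format section (`name`, `language`, and `fields` inside `extractionDetails`, with `file` at the top level); `validate_request` is a hypothetical helper, not part of the Extracta.ai API.

```python
def validate_request(body: dict) -> list:
    """Return a list of problems in a request body; an empty list means it looks valid.

    Hypothetical client-side check; the API performs its own validation.
    """
    details = body.get("extractionDetails")
    if not isinstance(details, dict):
        return ["missing 'extractionDetails' object"]
    problems = []
    # 'name', 'language' and 'fields' are mandatory inside extractionDetails.
    for param in ("name", "language", "fields"):
        if not details.get(param):
            problems.append(f"missing required parameter '{param}'")
    # Every field needs a 'key'; 'description' and 'example' are optional.
    for i, field in enumerate(details.get("fields") or []):
        if not field.get("key"):
            problems.append(f"fields[{i}] has no 'key'")
    # The document goes in 'file' as a base64 string or a URL.
    if not body.get("file"):
        problems.append("missing required parameter 'file'")
    return problems


body = {
    "extractionDetails": {"name": "Invoice", "language": "English", "fields": [{"key": "total"}]},
    "file": "https://example.com/invoice.pdf",
}
print(validate_request(body))  # []
```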
## API Documentation
This section explains how to structure your Document Parsing API requests to Extracta.ai. Follow the format below to ensure successful data extraction:
## Request Format
```
{
  "extractionDetails": {
    "name": "Extraction Name", // required - Name your extraction process
    "language": "Supported Language", // required - Choose from the supported languages
    "fields": [
      {
        "key": "Field Key", // required - Define the key for data extraction
        "description": "Field Description", // optional - Describe the field
        "example": "Field Example" // optional - Provide an example value
      },
      ...
    ]
  },
  "file": "base64String or file URL" // required - Provide the document as a base64 string or as a URL
}
```
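As a sketch of how the `file` value might be produced, the helper below passes URLs through unchanged and base64-encodes local files. It assumes the API expects plain base64 (no `data:` URI prefix), which this page does not state; `file_value` and `build_request_body` are hypothetical names.

```python
import base64
from pathlib import Path


def file_value(source: str) -> str:
    """Return a value for the 'file' parameter.

    URLs are passed through; local paths are read and base64-encoded
    (assumption: plain base64 without a data-URI prefix).
    """
    if source.startswith(("http://", "https://")):
        return source
    return base64.b64encode(Path(source).read_bytes()).decode("ascii")


def build_request_body(source: str, name: str, language: str, fields: list) -> dict:
    """Assemble a request body in the format shown above."""
    return {
        "extractionDetails": {"name": name, "language": language, "fields": fields},
        "file": file_value(source),
    }


body = build_request_body(
    "https://deveatery.com/extracta/cv.png",
    name="CV - Extraction",
    language="English",
    fields=[{"key": "name", "description": "the name of the person in the CV"}],
)
```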
## Advanced Format
In addition to the basic format outlined in the previous sections, Extracta.ai also supports more complex data structures for specialized extraction needs. This advanced format allows the definition of **nested objects and arrays**, catering to a broader range of data representation.
### Type `object`
The **object** type represents a structured object with multiple **properties**. Each property is defined as an object within an array, and can include its own **key**, **description**, **type**, and **example**.
```
{
  "key": "personal_info",
  "description": "Personal information of the person", // optional
  "type": "object",
  "properties": [
    {
      "key": "name",
      "description": "Name of the person", // optional
      "example": "Alex Smith", // optional
      "type": "string" // optional
    },
    {
      "key": "email",
      "description": "Email of the person",
      "example": "alex.smith@example.com",
      "type": "string"
    },
    ...
  ]
}
```
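When field definitions are generated in code, small helpers keep the nested shape consistent. This sketch builds the `personal_info` object field shown above; `string_field` and `object_field` are hypothetical helpers, and `type` is set explicitly even though it would default to `string`.

```python
import json


def string_field(key, description=None, example=None):
    """A leaf field; optional keys are only included when given."""
    field = {"key": key, "type": "string"}
    if description is not None:
        field["description"] = description
    if example is not None:
        field["example"] = example
    return field


def object_field(key, properties, description=None):
    """An 'object' field whose 'properties' is a list of string fields."""
    field = {"key": key, "type": "object", "properties": properties}
    if description is not None:
        field["description"] = description
    return field


personal_info = object_field(
    "personal_info",
    [
        string_field("name", "Name of the person", "Alex Smith"),
        string_field("email", "Email of the person"),
    ],
    description="Personal information of the person",
)
print(json.dumps(personal_info, indent=2))
```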
### Type `array`
The **array** type is used for lists of **items**, such as a collection of work experiences. The items key contains an object defining the structure of each item in the array.
```
{
  "key": "work_experience",
  "description": "Work experience of the person", // optional
  "type": "array",
  "items": {
    "type": "object",
    "properties": [
      {
        "key": "title",
        "description": "Title of the job", // optional
        "example": "Software Engineer", // optional
        "type": "string" // optional
      },
      {
        "key": "start_date",
        "description": "Start date of the job",
        "example": "2022",
        "type": "string"
      },
      ...
    ]
  }
}
```
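The same approach extends to the `array` type: the `items` entry is an object definition whose properties describe each element. This sketch rebuilds the `work_experience` field above; `array_of_objects` is a hypothetical helper name.

```python
import json


def string_field(key, description=None, example=None):
    """A leaf field; optional keys are only included when given."""
    field = {"key": key, "type": "string"}
    if description is not None:
        field["description"] = description
    if example is not None:
        field["example"] = example
    return field


def array_of_objects(key, properties, description=None):
    """An 'array' field whose 'items' describe one object per list element."""
    field = {
        "key": key,
        "type": "array",
        "items": {"type": "object", "properties": properties},
    }
    if description is not None:
        field["description"] = description
    return field


work_experience = array_of_objects(
    "work_experience",
    [
        string_field("title", "Title of the job", "Software Engineer"),
        string_field("start_date", "Start date of the job", "2022"),
    ],
    description="Work experience of the person",
)
print(json.dumps(work_experience, indent=2))
```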
### Notes on Usage
- For both `object` and `array` types, the `example` parameter is applicable only for their inner properties/items.
- When defining fields, if no `type` is specified, it defaults to `string`.
- For `object` and `array` types, the inner fields can only be of type `string`. Each property of an object, and each property of the object that describes an array's items, must be a string; deeper nesting is not supported. This keeps data representation consistent and simple.
- These advanced field types enable more detailed and structured data representation, enhancing the capabilities of Extracta.ai's data extraction process.
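The usage notes above can be expressed as a small pre-flight check. This is a sketch under our reading of the notes: a missing `type` means `string`, `example` is rejected on `object`/`array` fields themselves, array `items` must describe an object (as in the documented example), and all inner properties must be strings. `check_field` is a hypothetical helper; the API remains the authority on what it accepts.

```python
def field_type(field: dict) -> str:
    """Return the field's type, treating a missing 'type' as 'string'."""
    return field.get("type", "string")


def check_field(field: dict) -> list:
    """Return rule violations for one field definition (empty list if none)."""
    key = field.get("key", "<no key>")
    kind = field_type(field)
    if kind == "string":
        return []
    if kind not in ("object", "array"):
        return [f"{key}: unknown type {kind!r}"]
    errors = []
    # 'example' only applies to inner properties/items.
    if "example" in field:
        errors.append(f"{key}: 'example' is only allowed on inner properties")
    if kind == "object":
        inner = field.get("properties", [])
    else:
        items = field.get("items", {})
        if items.get("type") != "object":
            errors.append(f"{key}: 'items' should describe an object")
        inner = items.get("properties", [])
    if not inner:
        errors.append(f"{key}: no inner properties defined")
    # Inner fields may only be strings (no further nesting).
    for prop in inner:
        if field_type(prop) != "string":
            errors.append(f"{key}.{prop.get('key')}: inner fields must be strings")
    return errors
```

For example, an `object` field containing another `object` property yields one error naming the offending property.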
## Supported File Types
Extracta.ai can process documents in **image (JPG, PNG), PDF, and DOCX formats**, so a wide range of documents can be submitted for extraction.
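A client can reject unsupported files before uploading them. This sketch checks the file extension against the formats listed above; treating `.jpeg` as equivalent to JPG is our assumption, and since the FAQ further down also mentions text files, the set may need adjusting once you confirm what the endpoint accepts.

```python
from pathlib import Path

# Formats listed in this section; '.jpeg' is assumed to be accepted as JPG.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".docx"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported("cv.PNG"))     # True
print(is_supported("notes.txt"))  # False
```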
## Supported Languages
Extracta.ai currently supports document extraction in the following languages: **Romanian, English, French, Spanish, Arabic, Portuguese, German, Italian**. Support for 20 more languages is planned.
**Note**: If an unsupported language is specified, the API returns an error message indicating an invalid language choice. Check this documentation for newly added languages.
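To fail fast instead of waiting for the API's invalid-language error, the `language` value can be checked locally. This sketch assumes the API expects the language names exactly as written in the list above (case-sensitive English names); `check_language` is a hypothetical helper, and the set must be updated as languages are added.

```python
# Languages listed in this section; exact spelling and casing are assumed.
SUPPORTED_LANGUAGES = {
    "Romanian", "English", "French", "Spanish",
    "Arabic", "Portuguese", "German", "Italian",
}


def check_language(language: str) -> str:
    """Return the language unchanged, or raise ValueError if it is not listed."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return language
```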
## Process Document - Endpoint Features
| Object | Description |
|---|---|
| Request Body | [Required] JSON |
Example response:
```
{
  "name": "Darren Charles",
  "email": "[email protected]",
  "phone": "+1-709-680-9033",
  "address": "9 Corpus Christi, Texas",
  "soft_skills": "highly motivated, ability to translate business strategies, learn new things",
  "hard_skills": "Matlab, MeVisLab, Keras, CUDA, Git, DataStage, MQTT",
  "last_job": "Trainee With English Communications",
  "years_of_experience": "Ongoing"
}
```
Example request (cURL):
```
curl --location --request POST 'https://zylalabs.com/api/3606/document+data+extraction+api/4000/process+document' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data-raw '{
  "extractionDetails": {
    "name": "CV - Extraction",
    "language": "English",
    "fields": [
      {
        "key": "name",
        "description": "the name of the person in the CV",
        "example": "Johan Smith"
      },
      {
        "key": "email",
        "description": "the email of the person in the CV",
        "example": "johan.smith@example.com"
      },
      {
        "key": "phone",
        "description": "the phone number of the person",
        "example": "123 333 4445"
      },
      {
        "key": "address",
        "description": "the complete address of the person",
        "example": "1234 Main St, New York, NY 10001"
      },
      {
        "key": "soft_skills",
        "description": "the soft skills of the person",
        "example": ""
      },
      {
        "key": "hard_skills",
        "description": "the hard skills of the person",
        "example": ""
      },
      {
        "key": "last_job",
        "description": "the last job of the person",
        "example": "Software Engineer"
      },
      {
        "key": "years_of_experience",
        "description": "the years of experience of last job",
        "example": "5"
      }
    ]
  },
  "file": "https://deveatery.com/extracta/cv.png"
}'
```
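The cURL call above can be reproduced with Python's standard library. This is a minimal sketch: the endpoint URL and `Authorization: Bearer` header come from this page, while the explicit `Content-Type: application/json` header and the 60-second timeout are our assumptions. `build_request` and `process_document` are hypothetical helper names; building the request separately makes it easy to inspect before sending.

```python
import json
import urllib.request

API_URL = "https://zylalabs.com/api/3606/document+data+extraction+api/4000/process+document"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build the POST request for the Process Document endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; the body is JSON
        },
        method="POST",
    )


def process_document(api_key: str, body: dict) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(build_request(api_key, body), timeout=60) as resp:
        return json.load(resp)
```

Usage: `process_document("YOUR_API_KEY", body)` with a body shaped like the cURL `--data-raw` payload. Non-2xx responses raise `urllib.error.HTTPError`, which callers may want to catch.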
| Header | Description |
|---|---|
| Authorization | [Required] Should be `Bearer access_key`. See "Your API Access Key" above when you are subscribed. |
No long-term commitment. Upgrade, downgrade, or cancel anytime. Free Trial includes up to 50 requests.
The API handles both structured and unstructured documents, such as PDFs, Word documents, text files, and scanned images (PNG, JPG), applying OCR where needed.
The API returns the fields you define in the request, extracted from the document and organized as JSON, which makes it easy to integrate into applications. For a CV, these might include name, email, phone, address, and skills.
Users can customize requests by defining specific extraction criteria in the 'fields' parameter. Each field can include a 'key', 'description', and 'example', allowing tailored data extraction based on unique business needs.
The response contains one entry per requested field. In the CV example above, it includes "name", "email", "phone", "address", "soft_skills", "hard_skills", "last_job", and "years_of_experience".
The response data is structured in a JSON format, where each key corresponds to a specific piece of extracted information. This organization allows for straightforward parsing and integration into various applications.
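As an illustration of that parsing, the sketch below loads part of the example response shown earlier and splits a comma-separated skills string into a list. It assumes the response body is the flat object from the example (a live response may be wrapped differently), and splitting on commas is our own post-processing choice, not something the API does.

```python
import json

# Subset of the example response from the Process Document section.
raw = """{
  "name": "Darren Charles",
  "phone": "+1-709-680-9033",
  "hard_skills": "Matlab, MeVisLab, Keras, CUDA, Git, DataStage, MQTT",
  "years_of_experience": "Ongoing"
}"""


def split_list(value: str) -> list:
    """Split a comma-separated string into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


data = json.loads(raw)
hard_skills = split_list(data["hard_skills"])
print(data["name"])
print(hard_skills)
```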
The API can extract a wide range of information, including personal details, contact information, skills, work experience, and educational background from various document types, such as resumes and invoices.
Data accuracy is maintained through advanced extraction algorithms that leverage state-of-the-art technology. The API requires no pre-training, ensuring rapid and precise extraction from diverse document formats.
The endpoint accepts parameters such as 'name' (extraction name), 'language' (one of the supported languages), and 'fields' (the data keys to extract). Users must also provide a valid document in the 'file' parameter, either as a base64-encoded string or as a file URL.
Typical use cases include automating data entry from resumes, invoices, contracts, and other documents, streamlining workflows in HR, finance, and legal sectors, and enhancing data processing efficiency across various industries.
Zyla API Hub is like a big store for APIs, where you can find thousands of them all in one place. We also offer dedicated support and real-time monitoring of all APIs. Once you sign up, you can pick and choose which APIs you want to use. Just remember, each API needs its own subscription. But if you subscribe to multiple ones, you'll use the same key for all of them, making things easier for you.