Data acquisition is the first step in any Enterprise Perception System (EPS): gathering raw data from machine-driven and human-driven sources. This data is the basis for perceptual analysis, decision-making, and the generation of insights. Historically, organizations relied primarily on human-generated data for decision-making, but with the advent of advanced sensors and automation technologies, machine-generated data has increasingly become the norm. The diversity of data sources, from automated sensors to human inputs, lets an EPS adapt to different contexts. This section covers the types of data acquisition, the evolving role of both machines and humans as sensors, and the technologies that support this process.
In an Enterprise Perception System, data arrives in a variety of formats, which can be broadly categorized as structured, semi-structured, and unstructured. Understanding the formats an EPS collects helps in designing systems that integrate, process, and analyze this information effectively, because each format calls for different technologies.
Structured data refers to data that is highly organized and easily searchable in databases or tables. It typically conforms to a specific format and schema, making it easier to store, retrieve, and analyze.
EXAMPLES:
• Customer Records: Detailed information about customers, such as names, addresses, contact details, and purchase history, stored in relational databases like CRM systems.
• Bank Transactions: Transactional data such as deposit and withdrawal amounts, timestamps, and account numbers, stored in highly structured formats in financial databases.
• GPS coordinates: Location data collected from devices such as smartphones or vehicles.
ADVANTAGES:
• Easy to store and process in relational databases or structured query systems.
• Ideal for automated data processing and integration with other systems.
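As an illustration of why a fixed schema makes structured data easy to query, the following is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are hypothetical and chosen only to mirror the bank-transaction example above; a production financial database would look quite different.

```python
import sqlite3

# Hypothetical sample rows: (account_number, kind, amount_cents, timestamp).
SAMPLE_ROWS = [
    ("ACC-1", "deposit", 10000, "2024-01-01T09:00:00"),
    ("ACC-1", "withdrawal", 3000, "2024-01-02T14:30:00"),
    ("ACC-2", "deposit", 5000, "2024-01-03T11:15:00"),
]


def build_transactions_db(rows=SAMPLE_ROWS):
    """Create an in-memory database with an explicit schema and load rows."""
    conn = sqlite3.connect(":memory:")
    # The schema fixes the name, type, and constraints of every field up front.
    conn.execute(
        """
        CREATE TABLE transactions (
            id             INTEGER PRIMARY KEY,
            account_number TEXT    NOT NULL,
            kind           TEXT    NOT NULL CHECK (kind IN ('deposit', 'withdrawal')),
            amount_cents   INTEGER NOT NULL CHECK (amount_cents > 0),
            ts             TEXT    NOT NULL
        )
        """
    )
    conn.executemany(
        "INSERT INTO transactions (account_number, kind, amount_cents, ts) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def account_balances(conn):
    """Return {account_number: balance_in_cents} using a single SQL query."""
    cursor = conn.execute(
        """
        SELECT account_number,
               SUM(CASE WHEN kind = 'deposit' THEN amount_cents
                        ELSE -amount_cents END)
        FROM transactions
        GROUP BY account_number
        ORDER BY account_number
        """
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    print(account_balances(build_transactions_db()))
```

Because every row conforms to the same schema, the aggregation needs no per-record parsing or validation logic; the database enforces the structure at insert time.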
Semi-structured data does not follow a rigid structure but still contains organizational elements, such as tags or markers, that make it easier to process than unstructured data.
EXAMPLES:
• Crawled web content: Data collected through web scraping or crawling often comes in semi-structured formats like HTML, where some parts (e.g., tags, metadata) are structured, but the content itself can vary widely.
• JSON or XML data: Common in APIs, where responses from sensors or software systems might contain information in structured fields but with variable formatting.
• Time-series data: Sensor readings over time, which often include a mix of structured (e.g., timestamps) and semi-structured components (e.g., varying attributes collected at different times).
• System event logs: Often semi-structured, containing identifiable sections but varying in content (e.g., different event types may log different data fields).
ADVANTAGES:
• Offers flexibility in data storage while retaining enough structure to allow for meaningful analysis.
• Useful for aggregating data from a variety of sources with different formats.
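One common way to handle semi-structured records is to extract the fields every record is expected to share and keep the remaining, variable fields in a generic container. The sketch below assumes newline-delimited JSON events in which "ts" and "type" are always present; those field names, the sample events, and the function names are hypothetical and not part of any particular EPS.

```python
import json

# Hypothetical raw events: same envelope, different payload fields per type.
RAW_EVENTS = [
    '{"ts": "2024-01-01T09:00:00Z", "type": "login", "user": "alice"}',
    '{"ts": "2024-01-01T09:05:00Z", "type": "temperature", '
    '"sensor_id": "s-17", "celsius": 21.5}',
    "not valid json",
]

# Fields assumed to be present in every well-formed event.
COMMON_FIELDS = ("ts", "type")


def normalize(line):
    """Parse one event; return None if it is malformed or lacks common fields."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    if any(field not in record for field in COMMON_FIELDS):
        return None
    normalized = {field: record[field] for field in COMMON_FIELDS}
    # Everything outside the shared envelope is kept, but not forced into a schema.
    normalized["attributes"] = {
        key: value for key, value in record.items() if key not in COMMON_FIELDS
    }
    return normalized


def normalize_all(lines):
    """Normalize every line, silently dropping records that cannot be used."""
    return [rec for rec in map(normalize, lines) if rec is not None]


if __name__ == "__main__":
    for event in normalize_all(RAW_EVENTS):
        print(event)
```

The shared fields can then be indexed and queried like structured data, while the "attributes" mapping preserves whatever each source chose to report.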
Unstructured data consists of information that does not follow any specific format or structure, making it more difficult to store and analyze. However, it often provides rich, context-heavy insights.
EXAMPLES:
• Images and video feeds: Captured by security cameras or other monitoring devices, used in computer vision applications.
• Text data: Transcripts of voice descriptions, user feedback, emails, or social media posts. Often analyzed with Natural Language Processing (NLP) techniques to extract insights.
• Audio data: Voice recordings from customer support calls, human descriptions of events, or even sensor-generated sound data.
ADVANTAGES:
• Rich in context, providing deeper insights that cannot be obtained from structured data alone.
• Essential for complex, perceptual tasks such as facial recognition, sentiment analysis, or audio pattern recognition.
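Unstructured text has no fields to query, so structure has to be derived from the content itself. As a deliberately simple illustration, the sketch below counts the most frequent terms across a few feedback messages. The sample messages, the stopword list, and the function name are hypothetical, and term counting is only a stand-in for the far richer NLP techniques a real system would likely use.

```python
import re
from collections import Counter

# Small, hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "is", "was", "and", "to", "it", "my", "of", "i", "but"}


def top_terms(texts, n=3):
    """Return the n most frequent non-stopword terms across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and split into runs of letters/apostrophes; punctuation is dropped.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(token for token in tokens if token not in STOPWORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    feedback = [
        "The delivery was late and the box was damaged.",
        "Late delivery again, but support was helpful.",
        "Support resolved my late delivery quickly.",
    ]
    print(top_terms(feedback))
```

Even this crude pass turns free text into something countable, which hints at why unstructured sources reward the extra processing effort.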
Storing and Indexing Data for Retrieval
Data Quality and Accuracy
Data acquisition in an EPS can be broadly categorized based on the nature of the data and the type of “sensor” collecting it. Both machines and humans contribute valuable data, though their methods of collection and the nature of their inputs differ. The balance between human-generated and machine-generated data in organizations has shifted dramatically in recent decades.